Insights from the deep learning industry.
When doing machine learning in production, the choice of model is just one of many important criteria. Equally important are defining the problem, gathering high-quality data, and designing the architecture of the machine learning pipeline.
Selko.io builds solutions for multi-disciplinary project teams working in large companies. These teams work according to project documents that usually run to several hundred pages. Finding the sections relevant to each team member is a real burden in this project-based working environment.
This article tells the story of how we at Selko.io productionized our machine learning workflows. We'll describe Selko's route from starting the company to developing our first ML models. We'll also walk through how we built a fully working machine learning solution combining our UI, backend, and orchestration layer for machine learning tasks, and, of course, how we went from a homegrown ML orchestration platform to Valohai. To give you some context, let's first dive into the history of the company.
One of the key challenges for a data science team is finding an accurately labelled dataset for the problem at hand. While it is easy to build a basic model that is reasonably accurate for a demo to the business, going beyond that towards a production-worthy solution requires gold-standard ground-truth data.
Apache Airflow is a popular platform for creating, scheduling, and monitoring workflows in Python. It has more than 15k stars on GitHub and is used by data engineers at companies like Twitter, Airbnb, and Spotify.
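To give a feel for how such workflows are defined, here is a minimal DAG sketch (assuming Apache Airflow 2.x is installed; the DAG id, schedule, and task are purely illustrative). Note that a DAG file like this is a pipeline definition picked up by the Airflow scheduler, not a standalone script you run directly:

```python
# Illustrative Airflow DAG: one daily task. All names here are
# hypothetical examples, not from any real pipeline.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract():
    # Placeholder for real work, e.g. pulling data from an API.
    print("extracting data")


with DAG(
    dag_id="example_pipeline",      # hypothetical DAG name
    start_date=datetime(2020, 1, 1),
    schedule_interval="@daily",     # run once per day
    catchup=False,                  # don't backfill past runs
) as dag:
    PythonOperator(task_id="extract", python_callable=extract)
```

Dropped into Airflow's `dags/` folder, a file like this is what the scheduler parses to create, schedule, and monitor the workflow.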
Introduction
After looking at many of the Java/JVM-based NLP libraries listed on Awesome AI/ML/DL, I decided to pick the Apache OpenNLP library. One reason is that another developer, who had looked at it previously, recommended it. Besides, it's an Apache project, and the Apache Software Foundation has been a great supporter of F/OSS Java projects for the last two decades or so (see Wikipedia). It also goes without saying that Apache OpenNLP is released under the Apache 2.0 license.
ML Is Unlike Industrialization, Electricity and IT
Only the companies that invest in machine learning today will exist 10 years from now. The ones that watch from the sidelines will be eaten by their competition.
What is continuous integration?
Continuous Integration (CI) in software development is the practice of testing that a change in one place doesn't break something else. Continuous Delivery (CD), on the other hand, is an extension of CI where every change in the code is also deployed. Both are core practices of Extreme Programming, i.e. rapid small-batch development, which in turn has been the main contributor to advances in rapid software development.
Introduction
We are all aware of machine learning tools and cloud services that work via the browser and give us an interface for our day-to-day data analysis, model training and evaluation, and other tasks, with varying degrees of efficiency.