Valohai blog

Insights from the deep learning industry.

What did I Learn about CI/CD for Machine Learning

Most software development teams have adopted continuous integration and delivery (CI/CD) to iterate faster. However, a machine learning model depends not only on the code but also on the data and hyperparameters, so releasing a new model to production is more complex than in traditional software development.

  • 13 min read
  • Jun 10, 2020 4:09:27 PM

Bayesian Hyperparameter Optimization with Valohai

Grid search and random search are the best-known methods for hyperparameter tuning, and both are first-class citizens inside the Valohai platform. You define your search space, hit go, and Valohai starts all your machines and searches over the parameter ranges you have defined. It is all automatic: you don't launch or shut down machines by hand, and you don't accidentally leave machines running and costing you money. But we've been missing one central method for hyperparameter tuning: Bayesian optimization. Not anymore!
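To make the two baseline strategies concrete, here is a minimal sketch in plain Python. It is not Valohai's API: the search space, the parameter names, and the `toy_loss` function are hypothetical stand-ins for a real training run.

```python
import itertools
import random

# Hypothetical search space; names and values are illustrative only.
SEARCH_SPACE = {
    "learning_rate": [0.001, 0.01, 0.1],
    "batch_size": [16, 32, 64],
}


def toy_loss(params):
    """Stand-in for a real training run (assumption): smaller is better."""
    lr_term = (params["learning_rate"] - 0.01) ** 2
    batch_term = (params["batch_size"] - 32) ** 2 / 1e4
    return lr_term + batch_term


def grid_search(space):
    """Yield every combination of the listed values."""
    names = list(space)
    for values in itertools.product(*(space[name] for name in names)):
        yield dict(zip(names, values))


def random_search(space, n_trials, seed=0):
    """Yield n_trials combinations, sampling each parameter independently."""
    rng = random.Random(seed)
    for _ in range(n_trials):
        yield {name: rng.choice(values) for name, values in space.items()}


# Grid search evaluates all 9 combinations; random search only 5 of them.
best_grid = min(grid_search(SEARCH_SPACE), key=toy_loss)
best_random = min(random_search(SEARCH_SPACE, n_trials=5), key=toy_loss)
```

Grid search is exhaustive but grows multiplicatively with every added parameter, while random search caps the number of trials at the cost of possibly missing the best combination. Neither uses the results of earlier trials to pick the next one, which is the gap Bayesian optimization aims to fill.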

Classifying 4M Reddit posts in 4k subreddits: an end-to-end machine learning pipeline

Finding the right subreddit to submit your post can be tricky, especially for people new to Reddit. There are thousands of active subreddits with overlapping content. If it is no easy task for a human, I didn’t expect it to be easier for a machine. Currently, redditors can ask for suitable subreddits in a special subreddit: r/findareddit.

  • 16 min read
  • Apr 1, 2020 2:46:02 PM

Machine Learning and Remote Work

Many companies and teams are going fully remote for the first time due to the coronavirus. We at Valohai are big believers in remote work. Having worked as a distributed team for a good four years, we would like to share some of our thoughts on remote work in machine learning. Many of the major pain points we have seen revolve around tooling.

Using DVC to version control your ML experiment data

In this blog post, we will explore how you can use DVC for data version control and how you can automate data version control with and without DVC inside the Valohai platform. DVC is an open source command-line tool for version controlling your binary data in the same way you version control code in Git. You hook it up to your data store (e.g. AWS S3 or Azure Blob Storage) and then use it the same way you use Git for pulling and pushing files.

Machine Learning in the cloud vs on-premises

The cloud is just somebody else’s computer. It’s a running joke among developers that the cloud is just a word for somebody else’s computer. But the fact remains that by leveraging the cloud you can reap benefits that you couldn’t achieve with your on-premises server farm.

Three ways to categorize machine learning platforms

Machine learning (ML) platforms take many forms and usually solve only one or a few parts of the ML problem space. So how do you make sense of the different platforms that all call themselves ML platforms?

Production Machine Learning Pipeline for Text Classification with fastText

When doing machine learning in production, the choice of the model is just one of the many important criteria. Equally important are the definition of the problem, gathering high-quality data and the architecture of the machine learning pipeline.

  • 12 min read
  • Jan 30, 2020 12:34:50 PM

Identify relevant text from complex documents

builds solutions for multi-disciplinary project teams working in large companies. These teams work according to project documents that usually have several hundreds of pages. Finding the relevant sections for each team member is a real burden in the project-based working environment.
