When meeting with teams that are working with machine learning today, there is one point above all others that I try to teach: the importance of storing and versioning machine learning experiments, and especially how many things actually need to be stored.
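To give a rough sense of how much there is to store, here is a minimal Python sketch of a per-run experiment record. The `ExperimentRecord` class, its fields and the `fingerprint()` helper are hypothetical. They are one possible checklist, not Valohai’s data model, and the commit hash and dataset bytes in the example are placeholders.

```python
import hashlib
import json
import platform
from dataclasses import asdict, dataclass, field


def sha256_of_bytes(data: bytes) -> str:
    """Content hash that pins down the exact version of a dataset."""
    return hashlib.sha256(data).hexdigest()


@dataclass
class ExperimentRecord:
    # Hypothetical checklist of what to store for every training run.
    code_commit: str        # version-control commit of the training code
    docker_image: str       # exact environment the code ran in
    data_sha256: str        # fingerprint of the training data
    hyperparameters: dict   # learning rate, batch size, ...
    random_seed: int        # seed for every random number generator used
    python_version: str = field(default_factory=platform.python_version)
    metrics: dict = field(default_factory=dict)  # filled in after training

    def fingerprint(self) -> str:
        """Short ID derived from the run's inputs only."""
        inputs = asdict(self)  # deep copy, so popping below leaves self intact
        inputs.pop("metrics")  # outputs should not change a run's identity
        payload = json.dumps(inputs, sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()[:12]


record = ExperimentRecord(
    code_commit="3f2a9c1",  # placeholder commit hash
    docker_image="python:3.11-slim",
    data_sha256=sha256_of_bytes(b"feature,label\n0.5,1\n"),  # placeholder data
    hyperparameters={"learning_rate": 0.001, "batch_size": 32},
    random_seed=42,
)
print(record.fingerprint())
```

Leaving `metrics` out of the fingerprint is a deliberate choice in this sketch: the ID describes what went into a run, so two runs with identical inputs share it even if their results differ.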
You know what really grinds my gears? When I have a deep learning model to train and I have to SSH into my AWS instance, install all the drivers and libraries, run my code and then forget to shut down the machine! Once, I left one running over the weekend, and it cost my employer over $10,000!
Recreating experiments inside Valohai could be a whole lot easier, and we’ve heard your cries!
With the latest release, live today, whenever you copy an old experiment to re-run it, Valohai now copies the tags and title over from the previous experiment. Tags are also now automatically propagated down to the individual executions when you create a task with several of them, e.g. during a hyperparameter sweep.
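The copy-and-propagate behavior described above can be modeled in a few lines. This is a hypothetical Python sketch, not Valohai’s API or implementation: `Execution`, `Task`, `copy_execution` and `create_task` are illustrative names, and copying the parameters along with the title and tags is an assumption of the sketch.

```python
from dataclasses import dataclass, field
from itertools import count

_next_id = count(1)  # hypothetical ID generator


@dataclass
class Execution:
    title: str
    tags: set[str] = field(default_factory=set)
    parameters: dict[str, float] = field(default_factory=dict)
    id: int = field(default_factory=lambda: next(_next_id))


def copy_execution(old: Execution) -> Execution:
    """Re-run an old experiment: title and tags carry over, the ID is new.

    Copying the parameters as well is an assumption of this sketch.
    """
    return Execution(title=old.title, tags=set(old.tags), parameters=dict(old.parameters))


@dataclass
class Task:
    title: str
    tags: set[str]
    executions: list[Execution] = field(default_factory=list)


def create_task(title: str, tags: set[str], parameter_grid: list[dict[str, float]]) -> Task:
    """Create one execution per parameter combination (e.g. a hyperparameter
    sweep); every execution inherits the task's tags."""
    task = Task(title=title, tags=set(tags))
    for params in parameter_grid:
        task.executions.append(Execution(title=title, tags=set(tags), parameters=dict(params)))
    return task


sweep = create_task("lr-sweep", {"resnet", "baseline"}, [{"lr": 0.1}, {"lr": 0.01}])
rerun = copy_execution(sweep.executions[0])
print(rerun.title, sorted(rerun.tags))  # lr-sweep ['baseline', 'resnet']
```

Each execution gets its own copy of the tag set, so tagging one re-run later does not silently retag the original or its siblings.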
We also improved the creation of new experiments. When creating a new execution, you now have a dropdown for selecting your Docker image. We pre-fill the filterable box with our list of recommended Docker images, but you can naturally point to any custom-made Docker image as well. And if you have defined a default image in your valohai.yaml file, it will be selected from the get-go.
We have also fixed several bugs and made a handful of smaller improvements to the UI and the API, which you can read more about in the patch notes.
All of us have seen the fear-mongering headlines about how artificial intelligence is going to steal our jobs and how we should be very careful with biased AI algorithms. Bias means that the algorithm favors certain groups of people or otherwise steers decisions toward an unfair outcome. Bias can mean giving raises only to white male employees, inflating the criminal risk scores of certain ethnic groups, or filling your news feed only with the topics and points of view you already consume, instead of educating you with a broad, balanced view of the world.
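One common way to put a number on the first kind of bias is demographic parity: compare how often each group receives the favorable outcome. Below is a minimal Python sketch with made-up toy data; the function names are ours. A ratio well below 1.0 is a warning sign rather than proof of unfairness (the "four-fifths rule" from US employment guidance flags ratios under 0.8).

```python
from collections import defaultdict


def selection_rates(decisions):
    """Share of positive decisions (e.g. 'got a raise') per group.

    `decisions` is an iterable of (group, got_positive_outcome) pairs.
    """
    totals = defaultdict(int)
    positives = defaultdict(int)
    for group, positive in decisions:
        totals[group] += 1
        positives[group] += int(positive)
    return {group: positives[group] / totals[group] for group in totals}


def disparate_impact_ratio(decisions):
    """Lowest group selection rate divided by the highest; 1.0 means equal rates."""
    rates = selection_rates(decisions)
    highest = max(rates.values())
    return min(rates.values()) / highest if highest > 0 else 1.0


# Made-up toy data: group A gets 4 raises out of 5, group B gets 1 out of 5.
toy = [("A", True)] * 4 + [("A", False)] + [("B", True)] + [("B", False)] * 4
print(selection_rates(toy))         # {'A': 0.8, 'B': 0.2}
print(disparate_impact_ratio(toy))  # 0.25
```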
Valohai and Microsoft cross lightsabers in the battle for artificial intelligence through Microsoft’s global ScaleUp Program.
Lately we’ve been playing around with IBM PowerAI to make sure our customers can leverage it for large-scale on-premises training. PowerAI itself is IBM’s deep learning solution, a combination of software and hardware that helps you train deep learning models quickly. Today we’re happy to announce that Valohai fully supports PowerAI, and our customers can start using it!
If developers were the rock stars of the dotcom era, Data Scientists are quickly overtaking them as the new Whitesnake cover bands of the 2020s. Although both might sport the same hobo beards, Data Scientists are getting their work done with nothing but sticks and stones, while we Software Engineers have every tool in the universe.
Developing a machine learning model for a new project starts with some common groundwork and exploration to understand your data and figure out which approaches to try. A popular choice for this groundwork is Jupyter, an environment where you write Python code interactively. Jupyter lets you evaluate and revise code cell by cell, which makes it an attractive, visual choice (and often the right one) for this stage of data science work. Since Jupyter kernels, the processes backing a notebook’s execution, retain their internal state while the code is edited and revised, they provide a highly interactive, fast-feedback environment.
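That retained state is also a trap: a notebook’s result can depend on the order in which cells were run, not just on what the cells contain. The toy Python simulation below imitates a long-lived kernel by executing "cells" with `exec` against one shared namespace. It is an illustration of the effect, not how Jupyter is implemented, and `run_cells` is a hypothetical helper.

```python
def run_cells(cells, order):
    """Execute notebook 'cells' in the given order against one shared namespace,
    the way a long-lived kernel keeps variables between cell runs."""
    namespace = {}
    for index in order:
        exec(cells[index], namespace)
    return namespace


cells = [
    "threshold = 0.5",                            # cell 0
    "data = [0.2, 0.6, 0.9]",                     # cell 1
    "kept = [x for x in data if x > threshold]",  # cell 2
    "threshold = 0.8",                            # cell 3: an edit made later
]

top_to_bottom = run_cells(cells, [0, 1, 2, 3])
interactive = run_cells(cells, [0, 1, 3, 2])  # cell 2 re-run after cell 3
print(top_to_bottom["kept"])  # [0.6, 0.9]
print(interactive["kept"])    # [0.9]
```

The same four cells give two different answers, and nothing in the saved notebook records which one you saw.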
Reproducibility and replicability are cornerstones of the scientific method. Every so often there’s a sensationalized news article about a new scientific study with astounding results (for instance, we’re looking forward to seeing what’s hot at ICML 2018; we’re attending, so come say hi!), and in these cases it’s not uncommon that other scientists have no way to verify the results themselves, be it due to missing or proprietary data or faulty methodology. This, naturally, casts doubt on the entire study.
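For computational experiments, one small but necessary ingredient of reproducibility is controlling randomness. The minimal Python sketch below uses a toy "experiment" (`noisy_experiment` is a hypothetical stand-in for a training run). It shows that the same code with the same seed gives a bit-identical result.

```python
import random
import statistics


def noisy_experiment(seed: int, n: int = 1000) -> float:
    """Toy 'experiment': mean of n noisy measurements from a seeded RNG."""
    rng = random.Random(seed)  # private generator, independent of global state
    samples = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return statistics.mean(samples)


first = noisy_experiment(seed=2018)
replication = noisy_experiment(seed=2018)
assert first == replication  # same seed, same code -> identical result
print(first)
```

A seed alone is not enough. The data, the code and the environment have to be recorded too. Python, for instance, only guarantees that `random()` reproduces the same sequence across interpreter versions, so other methods such as `gauss()` are safest to replay on the same version.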