This is the first part of a tutorial series about reinforcement learning. We will start with some theory and then move on to more practical things in the next part. During this series, you will not only learn how to train your model but also the best workflow for training it in the cloud with full version control, using the Valohai deep learning management platform.
This tutorial will demonstrate how to take a single cell in a local Jupyter Notebook and run it in the cloud, using the Valohai platform and its command-line client (CLI).
Valohai now supports random search for hyperparameter optimization through its Tasks feature. Random search was shown in the aptly named paper Random Search for Hyper-Parameter Optimization to be an efficient way to find “neighborhoods” of likely-to-be-optimal hyperparameter values, which can then be iterated on further to find the really good ones.
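As a rough illustration of the idea (not Valohai's implementation; the search space and scoring function here are invented for the sketch), random search simply samples each hyperparameter independently and keeps the best-scoring trial:

```python
import math
import random

# Hypothetical search space -- the ranges are illustrative only.
def sample_params(rng):
    return {
        "learning_rate": 10 ** rng.uniform(-5, -1),  # log-uniform sampling
        "batch_size": rng.choice([16, 32, 64, 128]),
    }

def evaluate(params):
    # Stand-in for training a model and returning a validation score.
    # This toy objective peaks near learning_rate=0.01, batch_size=32.
    return (-abs(math.log10(params["learning_rate"]) + 2)
            - abs(params["batch_size"] - 32) / 64)

def random_search(n_trials=50, seed=0):
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_trials):
        params = sample_params(rng)
        score = evaluate(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

best, score = random_search()
print(best)
```

The trials are independent, which is also why this maps so naturally onto running many cloud executions in parallel.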
Since the rise of the deep learning revolution, springboarded by the Krizhevsky et al. 2012 ImageNet victory, people have thought that data, processing power and data scientists are the three key ingredients for building AI solutions. The companies with the largest datasets, the most GPUs to train neural networks on, and the smartest data scientists were going to dominate forever.
Watch a recording of the webinar on version control in machine learning that was held on the 22nd of November 2018. During the webinar we discussed the topics below and answered multiple questions asked by the attendees.
PocketFlow is an open-source framework from Tencent for automatically compressing and optimizing deep learning models. Edge devices in particular, such as mobile phones or IoT hardware, can be very limited in computing resources, so sacrificing a bit of model performance for a much smaller memory footprint and lower computational requirements is a smart tradeoff.
Microsoft's Cognitive Toolkit, or CNTK, is an open-source framework for building deep learning models. This relatively new framework has been gaining traction, so we decided to make sure Valohai supports it well. One of its benefits over competing frameworks has been CNTK’s ground-up support for multi-node, multi-GPU training, something that TensorFlow, for instance, has struggled to tackle well. If you are working with really large datasets, it may be worth a try.
Synthetic data is information that is artificially created rather than recorded from real-world events. A simple example would be generating a user profile for John Doe rather than using an actual user's profile. This way you can, in theory, generate vast amounts of training data for deep learning models, with virtually unlimited variation.
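A minimal sketch of the John Doe idea: the field names and value pools below are invented for illustration, and a real generator would of course model much richer distributions.

```python
import random

# Toy value pools -- purely illustrative, not real data.
FIRST_NAMES = ["John", "Jane", "Alex", "Maria"]
LAST_NAMES = ["Doe", "Smith", "Garcia", "Lindqvist"]

def synthetic_profile(rng):
    """Generate one fake user profile from the value pools above."""
    first = rng.choice(FIRST_NAMES)
    last = rng.choice(LAST_NAMES)
    return {
        "name": f"{first} {last}",
        "age": rng.randint(18, 90),
        "email": f"{first.lower()}.{last.lower()}@example.com",
    }

# A seeded generator makes the synthetic dataset reproducible.
rng = random.Random(42)
profiles = [synthetic_profile(rng) for _ in range(3)]
for profile in profiles:
    print(profile)
```

Because generation is just sampling, the dataset can be made as large as the training job needs, and no real user's information is ever exposed.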
You might have heard that every individual subject to automated decision making by machine learning models has a right to an explanation of the result. I bet you feel drops of sweat forming on your forehead when you receive an inquiry from a manager saying they need details about how a certain decision was made. If thinking about this scenario gives you chills, you are in the right place. Read on and learn how to tackle the transparency issue.
When meeting with teams that are working with machine learning today, there is one point above everything else that I try to teach: the importance of storing and versioning machine learning experiments, and especially how many things there actually are that need to be stored.