My name is Ruksi and I'm a machine learning engineer at Valohai.
We at Valohai are building a machine learning platform-as-a-service. Underneath this mouthful of a buzzword, we are trying to solve a real-world problem that I've seen tackled over and over again in dozens of organizations, big and small, that apply or research machine learning.
The chain of events that leads to the problem we are solving usually starts with one data scientist trying out machine learning on a previously unsolved or suboptimally solved problem. Experiments are run locally on a laptop, and experiment details are stored in an Excel document.
Then, suddenly, the data scientist starts getting results; say, a model that works better than anyone imagined. It's rushed into production or used behind the scenes for the organization's own data analytics. More people are assigned to the machine learning team. Someone converts the Excel document into a Google document so everyone can see the results. Code is (hopefully!) pushed to version control to reduce duplication. There will be meetings, a lot of meetings, to share knowledge.
After a while, the original model grows more complex: training now needs several preprocessing steps and even a few post-processing steps, more models are created, and maybe a second machine learning team is established. Then the teams hit the limits of what a laptop can compute. Working with the DevOps and SysOps teams, the data scientists get a few local nodes or a cloud-based cluster of worker nodes. They get a pipeline of sorts running in a month, maybe two, maybe six, but it never has all the requested features, and the team's time ends up split between actual machine learning research and building infrastructure or handling other mundane DevOps/SysOps tasks.
And this is how most companies that use machine learning end up with their own in-house pipelines and platforms for it. Currently, there simply isn't a service that provides a tool-agnostic, provider-independent machine learning platform to handle all the extra work that surrounds machine learning.
This is where our Valohai platform comes in.
Our core focus areas are:
- Empowering Professionals: Let data scientists focus on what they’re good at; we’ll handle the rest, e.g. infrastructure, version control, deployments, record keeping, chaining multiple executions and sharing results and code. We are not simplifying or dumbing down machine learning with drag-and-drop interfaces; we are taking care of everything else.
- Reducing Manual Labor: Experiment details, results, notes and comments are accessible in real time to you and your project collaborators through a web browser and an API. Rerunning experiments or automating hyperparameter optimization can be triggered by providing only a few configuration values.
- Keeping It Cost-Efficient: Get started today with preconfigured pay-for-what-you-use cloud providers, and effortlessly add cost-efficient local nodes to your organization account whenever you want.
Here are feature categories extracted from our product roadmap; these are the feature sets we are currently working on:
- Each experiment, along with its parameters and input files, is recorded in enough detail that anyone with access to the system can reproduce it.
- All executed code is version controlled and can be shared between data scientists working within the same organization.
- All input files are stored and downloadable through the web interface and API, and we also support fetching inputs from external sources such as AWS S3.
- Data scientists don't have to worry about the actual networking or hardware infrastructure.
- We maintain multiple automatically scaled clusters of worker nodes backed by AWS or GCP.
- We provide an installable agent that links your own hardware to your account as computation nodes.
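To illustrate what recording an experiment "in enough detail that anyone can reproduce it" can mean in practice, here is a minimal Python sketch. It is not how Valohai stores its records: the function names and field names (`make_experiment_record`, `code_commit`, `docker_image`) are hypothetical, and the idea is simply to capture the code version, the runtime environment, the parameters and a content hash of every input file, then derive a stable ID from all of them.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def sha256_of_file(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_experiment_record(params: dict, input_paths: list[str],
                           code_commit: str, docker_image: str) -> dict:
    """Build a hypothetical record of everything needed to rerun an experiment."""
    record = {
        "code_commit": code_commit,    # which version-controlled code ran
        "docker_image": docker_image,  # which runtime environment it ran in
        "parameters": dict(sorted(params.items())),
        # Hash input contents so a silently changed file is detectable.
        "inputs": {p: sha256_of_file(Path(p)) for p in sorted(input_paths)},
    }
    # Derive an ID from the canonical JSON form of the record: identical code,
    # environment, parameters and inputs always yield the same ID.
    canonical = json.dumps(record, sort_keys=True).encode("utf-8")
    record["record_id"] = hashlib.sha256(canonical).hexdigest()[:12]
    return record


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        data = Path(tmp) / "train.csv"
        data.write_text("x,y\n1,2\n")
        record = make_experiment_record(
            {"learning_rate": 0.001, "epochs": 10},
            [str(data)],
            code_commit="abc1234",               # hypothetical commit hash
            docker_image="example/ml-env:1.0",   # hypothetical image name
        )
        print(json.dumps(record, indent=2))
```

Because parameter order does not affect the ID, two runs that differ only in how their parameters were listed are recognized as the same experiment.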
Shared Record Keeping
- Everything an experiment produces is recorded and is accessible and searchable through the web browser: logs, errors, metadata such as loss per step, and, naturally, output files such as model weights.
- No need for manual record keeping and syncing of results.
- Allows automatic sharing of results and notes between data scientists inside the same team or organization.
- Data scientists have full control of the execution of experiments; when, where and how to start, stop or retry an experiment.
- Experiment metadata and outputs are visible in real time, visualized, and comparable with other executions to support exploratory research.
- Searching for the parameters that minimize or maximize an output metadata value, such as validation loss, can be automated with various hyperparameter optimization algorithms.
- Runtime environments use GPU-enabled Docker containers, so you can run any Linux environment, programming language or machine learning library, even clustered approaches.
- Instead of downloading and installing your dependencies again before every experiment, you can use our Docker image building pipeline so the setup happens only once.
- If your workflow requires a more complex set of interdependent or sequential steps, such as cleaning data, extracting features, labeling, training, generating mock filler data, and validation, our execution pipelining system can chain them together.
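The idea behind chaining interdependent steps can be sketched in a few lines of Python. This is a minimal illustration, not Valohai's pipeline configuration format or API: `run_pipeline` and the toy step functions are hypothetical. Each step declares which steps it depends on, the standard library's `graphlib.TopologicalSorter` (Python 3.9+) orders them so every dependency runs first, and each step receives its dependencies' outputs.

```python
from graphlib import TopologicalSorter
from typing import Any, Callable

# A step receives a dict mapping dependency names to their outputs.
Step = Callable[[dict[str, Any]], Any]


def run_pipeline(steps: dict[str, Step],
                 depends_on: dict[str, set[str]]) -> dict[str, Any]:
    """Run every step after all of its dependencies, passing their outputs along."""
    sorter = TopologicalSorter()
    for name in steps:
        sorter.add(name, *depends_on.get(name, ()))
    outputs: dict[str, Any] = {}
    # static_order() yields dependencies before dependents and raises
    # graphlib.CycleError if the steps depend on each other in a loop.
    for name in sorter.static_order():
        inputs = {dep: outputs[dep] for dep in depends_on.get(name, ())}
        outputs[name] = steps[name](inputs)
    return outputs


if __name__ == "__main__":
    # Hypothetical toy steps mirroring the stages named above.
    steps = {
        "clean_data": lambda _: [x for x in [4.0, None, 1.0, 3.0, 0.0] if x is not None],
        "extract_features": lambda d: [x / 4.0 for x in d["clean_data"]],
        "label": lambda d: [int(x >= 2.0) for x in d["clean_data"]],
        # "Training" here just picks a threshold: the mean feature value.
        "train": lambda d: sum(d["extract_features"]) / len(d["extract_features"]),
        "validate": lambda d: sum(
            int(f >= d["train"]) == y
            for f, y in zip(d["extract_features"], d["label"])
        ) / len(d["label"]),
    }
    depends_on = {
        "extract_features": {"clean_data"},
        "label": {"clean_data"},
        "train": {"extract_features"},
        "validate": {"train", "extract_features", "label"},
    }
    results = run_pipeline(steps, depends_on)
    print("accuracy:", results["validate"])  # prints: accuracy: 1.0
```

Declaring dependencies rather than a fixed order lets independent steps, such as `extract_features` and `label` here, be scheduled in any order or in parallel, while the pipeline still guarantees that `validate` sees every output it needs.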
We've started onboarding subscribers to our private beta to try out the platform.
If you want a chance to try the platform, apply to our closed beta at valohai.com.