Valohai blog

PocketFlow with Valohai

PocketFlow is an open-source framework from Tencent for automatically compressing and optimizing deep learning models. Edge devices such as mobile phones and IoT devices can be very limited in computing resources, so sacrificing a bit of model performance for a much smaller memory footprint and lower computational requirements is often a smart tradeoff.

PocketFlow offers a toolkit for improving inference efficiency through model compression with little or no performance degradation. Given the desired compression and/or acceleration ratios, it automatically chooses proper hyperparameters to generate a highly efficient compressed model for deployment.

In this article, we take a ResNet-20 model pre-trained on the CIFAR-10 dataset and compress it with PocketFlow on the Valohai platform.

Getting Started

Start by signing up as a Valohai user if you haven’t already. After signing in, create a new project and link it with the example repository using the settings/repository tab.


All the settings and parameters are set to very light defaults for a quick test run, so you can safely test the execution as is.



Valohai downloads the necessary input data for you based on the configuration file. In this case it is the pre-trained model and the dataset it was trained with.

For this example, we are using the ResNet-20 model and the CIFAR-10 dataset.

Evaluating the Uncompressed Model

Before using PocketFlow to compress a model, we need to measure the model’s current performance so we can compare it to the compressed performance afterwards.

Our baseline uncompressed accuracy comes out as approximately 92%.
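To make the metric concrete, here is a minimal sketch of how a top-1 accuracy figure like the one above can be computed from per-class scores. The function name and the toy data are hypothetical and are not taken from the example repository or from PocketFlow.

```python
def top1_accuracy(scores, labels):
    """Return the fraction of samples whose best-scoring class equals the label."""
    correct = 0
    for row, label in zip(scores, labels):
        # Index of the highest score in this row is the predicted class.
        predicted = max(range(len(row)), key=row.__getitem__)
        if predicted == label:
            correct += 1
    return correct / len(labels)


# Toy scores for 4 images and 3 classes (made-up data, not CIFAR-10).
scores = [
    [2.0, 0.1, 0.3],
    [0.2, 1.5, 0.1],
    [0.3, 0.2, 0.9],
    [1.1, 0.4, 0.2],
]
labels = [0, 1, 2, 1]

baseline = top1_accuracy(scores, labels)
print(f"accuracy: {baseline:.2f}")
```

The same function can be reused unchanged on the compressed model's outputs, which keeps the before and after numbers directly comparable.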


PocketFlow comes with three categories of compression algorithms.

Channel pruning is an algorithm that trims entire channels of a convolutional neural network (CNN), based on their impact on the performance of the model.

Weight sparsification applies the same idea at a finer granularity. Instead of pruning entire channels, it trims out individual weights.

Weight quantization aims to figure out which high-precision (for example 32-bit) weights can be replaced with lower-precision (for example 8-bit) counterparts without sacrificing model performance.
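The three categories above can be illustrated with a small NumPy sketch. This is not how PocketFlow implements them: it uses simple magnitude-based criteria and uniform quantization as stand-ins, and every function and parameter name here (prune_channels, keep_ratio, sparsify_weights, quantize_weights) is hypothetical.

```python
import numpy as np


def prune_channels(weights, keep_ratio):
    """Keep the output channels with the largest L1 norm.

    weights has shape (out_channels, in_channels, k, k). Ranking by
    magnitude is a simple stand-in, not PocketFlow's criterion.
    """
    n_keep = max(1, int(round(weights.shape[0] * keep_ratio)))
    norms = np.abs(weights).sum(axis=(1, 2, 3))
    keep = np.sort(np.argsort(norms)[-n_keep:])  # preserve channel order
    return weights[keep], keep


def sparsify_weights(weights, sparsity):
    """Zero out the smallest-magnitude fraction of individual weights."""
    n_zero = int(weights.size * sparsity)
    if n_zero == 0:
        return weights.copy()
    threshold = np.sort(np.abs(weights), axis=None)[n_zero - 1]
    return np.where(np.abs(weights) <= threshold, 0.0, weights)


def quantize_weights(weights, bits=8):
    """Map weights to 2**bits evenly spaced levels and back to floats."""
    lo, hi = float(weights.min()), float(weights.max())
    levels = 2 ** bits - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = np.round((weights - lo) / scale).astype(np.int32)
    return codes * scale + lo, codes


# Random stand-in for one convolutional layer: 6 filters of shape 3x3x3.
rng = np.random.default_rng(0)
conv = rng.normal(size=(6, 3, 3, 3)).astype(np.float32)

pruned, kept = prune_channels(conv, keep_ratio=0.5)
sparse = sparsify_weights(conv, sparsity=0.5)
dequantized, codes = quantize_weights(conv, bits=8)

print("channels kept:", pruned.shape[0], "of", conv.shape[0])
print("zeroed weights:", int(np.count_nonzero(sparse == 0)), "of", conv.size)
print("max quantization error:", float(np.max(np.abs(dequantized - conv))))
```

Note the difference in what each function returns: channel pruning yields a physically smaller tensor, sparsification keeps the shape but fills it with zeros, and quantization keeps every weight but stores it as a small integer code.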

In this example we will execute a specific channel pruning algorithm called discrimination-aware channel pruning (Zhuang et al., 2018).

More details about the algorithm:


For our execution, we will use these parameters:

batch_size is how many images are processed per batch.

dcp_nb_iters_block and dcp_nb_iters_layer tell the algorithm how many iterations to use per block and layer when figuring out which channels to prune.

dcp_prune_ratio is how much to compress. In this case it is 0.33, which means the compressed model will be ⅓ the size of the original.

np_epochs_rat is the ratio that controls how much we retrain the model after the compression. A higher value simply trains longer.

samples is how many images we use for the final evaluation.

Evaluating the Compressed Model


After the compression, the model still needs to be retrained. Only then can we measure the actual compressed performance.

From the graph above, we see that compressing the model down to ⅓ of its original size had only a small effect on its performance.

Final accuracy was 91.3%, which is only a 0.7 percentage point drop from the original baseline of approximately 92%.
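The size of the drop can be expressed either in percentage points or relative to the baseline, and the two are easy to mix up. The short sketch below computes both from the figures quoted in this article. It treats the approximate 92% baseline as exactly 92.0, which is an assumption, and the function name is hypothetical.

```python
def accuracy_drop(baseline_pct, compressed_pct):
    """Return the drop in percentage points and as a percent of the baseline."""
    absolute = baseline_pct - compressed_pct      # percentage points
    relative = 100.0 * absolute / baseline_pct    # percent of baseline accuracy
    return absolute, relative


# Figures from this article; the baseline is only approximate.
absolute, relative = accuracy_drop(92.0, 91.3)
print(f"{absolute:.1f} points, {relative:.2f}% relative")
# prints "0.7 points, 0.76% relative"
```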


Juha Kiili
Senior Software Developer with gaming industry background shape-shifted into full-stack ninja. I have the biggest monitor.
