JeVois tutorial: Recognize digits with tiny-dnn

Posted on August 31, 2017

This post shows you how to build a JeVois (1.3) module that recognizes handwritten digits with tiny-dnn (the JeVois-patched version).

In this tutorial, you will learn:

  • how to create a new C++ module for JeVois from scratch
  • how to include external libraries into your project
  • how to write components
  • basic usage of tiny-dnn

 

0> Theory

A “module” in JeVois is like a “program” in common projects. The core of a module is the process function:

virtual void process(jevois::InputFrame && inframe, jevois::OutputFrame && outframe) override

which takes the input frame (to read) and an empty output frame (to write) as arguments.
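For reference, a minimal module that just copies the camera frame to the output could look roughly like this (a sketch based on the boilerplate that jevois-create-module generates; the class name is illustrative):

#include <jevois/Core/Module.H>
#include <jevois/Image/RawImageOps.H>

class PassThrough : public jevois::Module
{
  public:
    using jevois::Module::Module; // inherit the base constructor
    virtual ~PassThrough() { }

    virtual void process(jevois::InputFrame && inframe, jevois::OutputFrame && outframe) override
    {
      jevois::RawImage const inimg = inframe.get(true); // read the camera frame
      jevois::RawImage outimg = outframe.get();         // get an empty output buffer
      outimg.require("output", inimg.width, inimg.height, inimg.fmt);
      jevois::rawimage::paste(inimg, outimg, 0, 0);     // copy input into output
      inframe.done();                                   // release the camera buffer
      outframe.send();                                  // stream the result over USB
    }
};

JEVOIS_REGISTER_MODULE(PassThrough);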

JeVois chooses which module to load at boot based on a configuration file. On your PC, if you have installed JeVois, you should find a file named /jevois/config/videomappings.cfg, where you can modify the mappings. To launch a particular mapping, look up its index:

  • Launch JeVois by running jevois-daemon
  • Type listmappings and press Enter
  • Note the ID of the mapping you want to launch
  • Exit jevois-daemon
  • Execute jevois-daemon --videomapping=xx on the command line

I usually comment out every mapping except the one I want to debug to make this process easier. Another thing to explain is that the “USB mode/width/height/fps” in a mapping entry simply means the output mode/width/height/fps, because JeVois streams its output over USB.
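For instance, a videomappings.cfg trimmed down for debugging might look like this (the comment character is #; SomeOtherModule is a placeholder, and the Tutorial Mnist line is the one we will add later in this tutorial):

# USB-mode w h fps   CAM-mode w h fps   Vendor Module
# YUYV 640 480 30.0 YUYV 640 480 30.0 JeVois SomeOtherModule   <- commented out while debugging
YUYV 560 240 30.0 YUYV 320 240 30.0 Tutorial Mnist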

So here is what we are going to do:

  • Write and compile a module
  • Write and compile components
  • Compile and install everything on the PC, then add the mapping to the configuration file to simulate
  • Cross-compile and flash it onto the JeVois

 

1> Setup files 

Create an empty module:

> jevois-create-module Tutorial Mnist

Compile it and setup the mapping:

> cd mnist
> ./rebuild-host.sh
> sudo vim /jevois/config/videomappings.cfg # if you are not familiar with vim, try sudo gedit /jevois/config/videomappings.cfg

Usually, I just comment out everything else during development. Add a new line at the end:

YUYV 560 240 30.0 YUYV 320 240 30.0 Tutorial Mnist

This means our expected output is 560 pixels wide, 240 pixels high, at 30 fps in YUYV mode, and our input is 320 wide, 240 high, at 30 fps in YUYV mode. The last two fields are the vendor’s name and the module’s name. Our output consists of two parts: the original camera input and, next to it, the binarized image (hence 560 = 320 + 240, as explained in section 2).

Now save the file and you can launch the basic module by running

> jevois-daemon

You should see a basic module just displaying the original image.

 

We can now include tiny-dnn in our project. Before that, you have to compile jevoisbase first, because we are going to copy its patched version of tiny-dnn directly:

> mkdir Contrib
> cp -r /path/to/jevoisbase/Contrib/tiny-dnn/ Contrib/tiny-dnn

Add 

include_directories(Contrib)
include_directories(Contrib/tiny-dnn)

to CMakeLists.txt (before the line jevois_project_finalize()) so that the headers can be found during compilation.
Next, you can include tiny-dnn in the module by adding

#include "tiny-dnn/tiny_dnn/tiny_dnn.h"

to src/Modules/Mnist/Mnist.C. You need to rebuild the project since CMakeLists.txt has changed. You might get a lot of unused-variable warnings while compiling tiny-dnn; to avoid them, you can add

set(CMAKE_CXX_FLAGS  "${CMAKE_CXX_FLAGS} -Wno-unused")

to CMakeLists.txt to suppress these warnings.

 

The generated project contains only one module by default, but you can include multiple modules in the same project, like jevoisbase does. That way we can reuse the components across modules. Let’s implement only one module first, though. To add components, we have to create some folders and files:

> mkdir -p src/Components/MNISTnn
> touch src/Components/MNISTnn/MNISTnn.C
> mkdir -p include/jevoisnn/Components/MNISTnn
> touch  include/jevoisnn/Components/MNISTnn/MNISTnn.H
> mkdir -p src/Components/NN
> touch src/Components/NN/NN.C
> mkdir -p include/jevoisnn/Components/NN
> touch  include/jevoisnn/Components/NN/NN.H

Here we define two components, NN and MNISTnn. NN wraps the APIs provided by tiny-dnn and exposes a simple interface. MNISTnn inherits from NN and specifies the network structure, the pretraining procedure, the preprocessing procedure, and so on. You can easily extend NN to build other networks.
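To keep the roles straight, the intended hierarchy is (full code follows in sections 3 and 4):

// jevois::Component          (JeVois base class for all components)
//   └── NNBase<NetType>      (NN: a generic wrapper around a tiny-dnn network)
//         └── MNISTnn        (LeNet structure + MNIST pretraining)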

All components are compiled into a single library file, which is linked with each module’s object file to generate the final binaries. We have to change these lines in CMakeLists.txt:

# change line `project(mnist)` to
project(jevoisnn) 
# add this line
jevois_setup_library(src/Components jevoisnn 1.0) 
# change line `jevois_setup_modules(src/Modules "")` to
jevois_setup_modules(src/Modules jevoisnn)
# change line `target_link_libraries( Mnist ...` to
target_link_libraries(jevoisnn ${JEVOIS_OPENCV_LIBS} opencv_imgproc opencv_core) 
# add this line
include_directories(include) 
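After all these edits, the relevant lines of CMakeLists.txt look roughly like this (a sketch; the boilerplate generated by jevois-create-module is unchanged and jevois_project_finalize() still comes last):

project(jevoisnn)
include_directories(include)
include_directories(Contrib)
include_directories(Contrib/tiny-dnn)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-unused")
jevois_setup_library(src/Components jevoisnn 1.0)
jevois_setup_modules(src/Modules jevoisnn)
target_link_libraries(jevoisnn ${JEVOIS_OPENCV_LIBS} opencv_imgproc opencv_core)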

Now compile again; you should see that all components are compiled and merged into a single library file named libjevoisnn.so.1.0:

Scanning dependencies of target modinfo_Mnist
Scanning dependencies of target jevoisnn
[ 16%] Built target modinfo_Mnist
[ 50%] Building CXX object CMakeFiles/jevoisnn.dir/src/Components/MNISTnn/MNISTnn.C.o
[ 50%] Building CXX object CMakeFiles/jevoisnn.dir/src/Components/NN/NN.C.o
Scanning dependencies of target Mnist
[ 66%] Building CXX object CMakeFiles/Mnist.dir/src/Modules/Mnist/Mnist.C.o
[ 83%] Linking CXX shared library libjevoisnn.so
[ 83%] Built target jevoisnn
[100%] Linking CXX shared library Mnist.so
[100%] Built target Mnist
[ 16%] Built target modinfo_Mnist
[ 50%] Built target Mnist
[100%] Built target jevoisnn
Install the project...
-- Install configuration: ""
-- Up-to-date: /jevois/modules/Tutorial/Mnist
-- Up-to-date: /jevois/modules/Tutorial/Mnist/screenshot1.png
-- Up-to-date: /jevois/modules/Tutorial/Mnist/postinstall
-- Up-to-date: /jevois/modules/Tutorial/Mnist/icon.png
-- Installing: /jevois/modules/Tutorial/Mnist/Mnist.so
-- Installing: /usr/lib/libjevoisnn.so.1.0
-- Installing: /usr/lib/libjevoisnn.so

 

2> Double Displays

The reason why we set the output to 560*240 rather than 320*240 (i.e. the same as the input) is that we want to show the original image (320*240) side by side with the binarized digit (240*240, resized from 32*32).

However, if you run jevois-daemon now, the default program will fail. It says:

ERR Log::warnAndIgnoreException: Caught std::exception [FTL RawImage::require: Incorrect format for RawImage output: want 320x240 YUYV but image is 560x240 YUYV]

This is because at line 77 of src/Modules/Mnist/Mnist.C, we assert that the output has the same shape as the input:

77       outimg.require("output", inimg.width, inimg.height, inimg.fmt);

Now let’s rewrite the process function to enable a double-display mode.

First, let’s add a timer to measure the FPS.

      static jevois::Timer timer("processing", 60, LOG_DEBUG); 
      timer.start();
      
      //...


      std::string const & fpscpu = timer.stop();
      jevois::rawimage::writeText(outimg, fpscpu, 3, outimg.height - 13, jevois::yuyv::White);

And keep the synchronizing calls at the end; they release the camera buffer and send the output frame over USB:

      inframe.done();
      outframe.send();

In the middle part, we first take the input and output frames and assert the final shape and format:

      jevois::RawImage const inimg = inframe.get(true);
      jevois::RawImage outimg = outframe.get();
      unsigned int const w = inimg.width, h = inimg.height;

      inimg.require("input", w, h, V4L2_PIX_FMT_YUYV);
      outimg.require("output", w+h, h, inimg.fmt);

We can then process the original image and generate an input for the network. For MNIST, we want a binarized image, which contains only two values (background and foreground):

      cv::Mat grayimg, digit, bindigit, bigdigit;
      // YUYV to gray
      grayimg = jevois::rawimage::convertToCvGray(inimg);
      // resize to 32*32 
      cv::resize(grayimg, digit, cv::Size(32, 32));
      // threshold the image to generate a binarized image
      threshold(digit, bindigit, 80, 255, cv::THRESH_BINARY_INV);
      // resize the binarized image to h*h for display (note: the interpolation
      // flag is the sixth argument of cv::resize, not the fourth)
      cv::resize(bindigit, bigdigit, cv::Size(h, h), 0, 0, cv::INTER_NEAREST);

Finally, paste the images into the output frame. memcpy is not recommended because it is too primitive and error-prone; here we use jevois::rawimage::paste instead.

      jevois::rawimage::paste(inimg, outimg, 0, 0);
      jevois::rawimage::pasteGreyToYUYV(bigdigit, outimg, w, 0);
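
Putting all the pieces together, the complete double-display process function looks roughly like this (a sketch assembled from the snippets above, with no error handling):

virtual void process(jevois::InputFrame && inframe, jevois::OutputFrame && outframe) override
{
  static jevois::Timer timer("processing", 60, LOG_DEBUG);
  timer.start();

  // Get the frames and enforce the 560x240 output layout:
  jevois::RawImage const inimg = inframe.get(true);
  jevois::RawImage outimg = outframe.get();
  unsigned int const w = inimg.width, h = inimg.height;
  inimg.require("input", w, h, V4L2_PIX_FMT_YUYV);
  outimg.require("output", w + h, h, inimg.fmt);

  // Preprocess: gray -> 32x32 -> binarize -> enlarge for display:
  cv::Mat grayimg = jevois::rawimage::convertToCvGray(inimg), digit, bindigit, bigdigit;
  cv::resize(grayimg, digit, cv::Size(32, 32));
  cv::threshold(digit, bindigit, 80, 255, cv::THRESH_BINARY_INV);
  cv::resize(bindigit, bigdigit, cv::Size(h, h), 0, 0, cv::INTER_NEAREST);

  // Compose the two displays:
  jevois::rawimage::paste(inimg, outimg, 0, 0);
  jevois::rawimage::pasteGreyToYUYV(bigdigit, outimg, w, 0);

  // FPS overlay and synchronization:
  std::string const & fpscpu = timer.stop();
  jevois::rawimage::writeText(outimg, fpscpu, 3, outimg.height - 13, jevois::yuyv::White);
  inframe.done();
  outframe.send();
}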

Try to compile it again and run jevois-daemon; you should now see the camera image and the binarized digit side by side.

 

3> Component NN (NN.C and NN.H)

We will implement a shortened version of jevoisbase/components/ObjectRecognition. This component is an abstract base class for concrete networks. Most of the credit should be given to jevoisbase (license header omitted). The code is largely self-explanatory.

// File NN.H

#pragma once

#include <jevois/Component/Component.H>
#include <stdarg.h> // needed by tiny_dnn

// Defines used to optimize tiny-dnn:
#define CNN_USE_TBB
#undef CNN_USE_DOUBLE

#include <tiny-dnn/tiny_dnn/config.h> // for float_t, etc. this does not include much code
#include <tiny-dnn/tiny_dnn/util/aligned_allocator.h> // for aligned_allocator
#include <opencv2/core/core.hpp>

namespace tiny_dnn { template <typename NetType> class network; }

// Abstract base class for Neural network 
template <typename NetType>
class NNBase : public jevois::Component
{
  public:
    // Type used by tiny-dnn for the results:
    typedef std::vector<tiny_dnn::float_t, tiny_dnn::aligned_allocator<tiny_dnn::float_t, 64> > vec_t;

    // Constructor
    NNBase(std::string const & instance);
    virtual ~NNBase();

    virtual void define() = 0;
    virtual void pretrain(std::string const & path) = 0;
    virtual vec_t predict(cv::Mat const & img, bool normalize = true);

  protected:
    virtual void postInit() override; // called automatically after the component is constructed
    tiny_dnn::network<NetType> * net; 
};
// File NN.C

#include <jevoisnn/Components/NN/NN.H>
#include <jevois/Debug/Log.H>
#include <fstream>

#include "tiny-dnn/tiny_dnn/tiny_dnn.h"

template <typename NetType>
NNBase<NetType>::NNBase(std::string const & instance) :
    jevois::Component(instance), net(new tiny_dnn::network<NetType>())
{ }

template <typename NetType>
NNBase<NetType>::~NNBase()
{ delete net; }

template <typename NetType>
void NNBase<NetType>::postInit()
{
  // Load from file, if available, otherwise trigger training:
  std::string const wpath = absolutePath("tiny-dnn/" + instanceName() + "/weights.tnn");

  try
  {
    net->load(wpath);
    LINFO("Loaded pre-trained weights from " << wpath);
  }
  catch (...)
  {
    LINFO("Could not load pre-trained weights from " << wpath);
    this->define();
    this->pretrain(absolutePath("tiny-dnn/" + instanceName()));

    LINFO("Saving trained weights to /tmp/weights.tnn");
    net->save("/tmp/weights.tnn"); // works without sudo 
    LINFO("Weights saved. Network ready to work.");
  }
}


template <typename NetType>
typename NNBase<NetType>::vec_t // type of vec_t
  NNBase<NetType>::predict(cv::Mat const & img, bool normalize)
{
  auto inshape = (*net)[0]->in_shape()[0];

  if (img.cols != int(inshape.width_) ||
      img.rows != int(inshape.height_) ||
      img.channels() != int(inshape.depth_)) LFATAL("Incorrect input image size or format");
  
  // Convert input image to vec_t with values in [-1..1]:
  size_t const sz = inshape.size();
  tiny_dnn::vec_t data(sz);
  unsigned char const * in = img.data; tiny_dnn::float_t * out = &data[0];
  for (size_t i = 0; i < sz; ++i) *out++ = (*in++) * (2.0F / 255.0F) - 1.0F;

  // Normalized score:
  if (normalize)
  {
    auto scores = net->predict(data);

    // Normalize activation values between 0...100:
    tiny_dnn::layer * lastlayer = (*net)[net->depth() - 1];
    std::pair<tiny_dnn::float_t, tiny_dnn::float_t> outrange = lastlayer->out_value_range();
    tiny_dnn::float_t const mi = outrange.first;
    tiny_dnn::float_t const ma = outrange.second;

    for (tiny_dnn::float_t & s : scores) s = tiny_dnn::float_t(100) * (s - mi) / (ma - mi);
    return scores;
  }
  else
    return net->predict(data);
}

template class NNBase<tiny_dnn::sequential>;
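
A note on that last line: since the template member functions of NNBase are defined in NN.C rather than in the header, we explicitly instantiate NNBase<tiny_dnn::sequential> so the linker can find those definitions when other translation units use the class.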

 

 

4> Component MNISTnn (MNISTnn.H and MNISTnn.C)

Still, most of the credit goes to jevoisbase.
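
// File MNISTnn.H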

#pragma once

#include <jevoisnn/Components/NN/NN.H>
#include "tiny-dnn/tiny_dnn/nodes.h"

class MNISTnn : public NNBase<tiny_dnn::sequential>
{
  public:
    //! Constructor, loads the given CNN, its sizes must match our (fixed) internal network structure
    /*! All network data is assumed to be in the module's path plus "tiny-dnn/<instance>". In there, we will look for
        weights.tnn, and if not found, we will train the network using data in that path and then save weights.tnn. */
    MNISTnn(std::string const & instance);
    virtual ~MNISTnn();

    virtual void define() override;
    virtual void pretrain(std::string const & path) override;
};
// MNISTnn.C

#include <jevoisnn/Components/MNISTnn/MNISTnn.H>
#include <jevois/Debug/Log.H>
#include "tiny-dnn/tiny_dnn/tiny_dnn.h"

MNISTnn::MNISTnn(std::string const & instance) :
    NNBase<tiny_dnn::sequential>(instance) { }

MNISTnn::~MNISTnn() { }

void MNISTnn::define()
{
  // LeNet for MNIST handwritten digit recognition: 32x32 in, 10 classes out:
#define O true
#define X false
  static bool const tbl[] = {
    O, X, X, X, O, O, O, X, X, O, O, O, O, X, O, O,
    O, O, X, X, X, O, O, O, X, X, O, O, O, O, X, O,
    O, O, O, X, X, X, O, O, O, X, X, O, X, O, O, O,
    X, O, O, O, X, X, O, O, O, O, X, X, O, X, O, O,
    X, X, O, O, O, X, X, O, O, O, O, X, O, O, X, O,
    X, X, X, O, O, O, X, X, O, O, O, O, X, O, O, O
  };
#undef O
#undef X
  // by default will use backend_t::tiny_dnn unless you compiled
  // with -DUSE_AVX=ON and your device supports AVX intrinsics
  tiny_dnn::core::backend_t backend_type = tiny_dnn::core::default_engine();
    
  // Construct network:
  (*net) << tiny_dnn::convolutional_layer<tiny_dnn::activation::tan_h>
    (32, 32, 5, 1, 6, tiny_dnn::padding::valid, true, 1, 1, backend_type) // C1, 1@32x32-in, 6@28x28-out
    
         << tiny_dnn::average_pooling_layer<tiny_dnn::activation::tan_h>
    (28, 28, 6, 2) // S2, 6@28x28-in, 6@14x14-out
    
         << tiny_dnn::convolutional_layer<tiny_dnn::activation::tan_h>
    (14, 14, 5, 6, 16, tiny_dnn::core::connection_table(tbl, 6, 16),
     tiny_dnn::padding::valid, true, 1, 1, backend_type ) // C3, 6@14x14-in, 16@10x10-out

         << tiny_dnn::average_pooling_layer<tiny_dnn::activation::tan_h>
    (10, 10, 16, 2) // S4, 16@10x10-in, 16@5x5-out
    
         << tiny_dnn::convolutional_layer<tiny_dnn::activation::tan_h>
    (5, 5, 5, 16, 120, tiny_dnn::padding::valid, true, 1, 1, backend_type) // C5, 16@5x5-in, 120@1x1-out

         << tiny_dnn::fully_connected_layer<tiny_dnn::activation::tan_h>
    (120, 10, true, backend_type); // F6, 120-in, 10-out
}
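
As a quick sanity check on the sizes in the comments: a valid 5x5 convolution with stride 1 maps an n×n plane to (n-4)×(n-4), giving 32→28, 14→10 and 5→1; each 2×2 average pooling halves the spatial size, 28→14 and 10→5; and the final fully connected layer maps the 120 outputs of C5 to the 10 digit classes.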

void MNISTnn::pretrain(std::string const & path)
{
  LINFO("Load training data from directory " << path);

  // Load MNIST dataset:
  std::vector<tiny_dnn::label_t> train_labels, test_labels;
  std::vector<tiny_dnn::vec_t> train_images, test_images;
  LINFO("Load training labels...");
  tiny_dnn::parse_mnist_labels(std::string(path) + "/train-labels.idx1-ubyte", &train_labels);
  LINFO("Load training images...");
  tiny_dnn::parse_mnist_images(std::string(path) + "/train-images.idx3-ubyte", &train_images, -1.0, 1.0, 2, 2);
  LINFO("Load test labels...");
  tiny_dnn::parse_mnist_labels(std::string(path) + "/t10k-labels.idx1-ubyte", &test_labels);
  LINFO("Load test images...");
  tiny_dnn::parse_mnist_images(std::string(path) + "/t10k-images.idx3-ubyte", &test_images, -1.0, 1.0, 2, 2);
  
  LINFO("Start training...");
  int minibatch_size = 10;
  int num_epochs = 30;
  tiny_dnn::timer t;
  
  // Create callbacks:
  auto on_enumerate_epoch = [&](){
    LINFO(t.elapsed() << "s elapsed.");
    tiny_dnn::result res = net->test(test_images, test_labels);
    LINFO(res.num_success << "/" << res.num_total << " success/total validation score so far");
    t.restart();
  };

  auto on_enumerate_minibatch = [&](){ };

  // Training:
  tiny_dnn::adagrad optimizer;
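  // Scale the learning rate by sqrt(batch size): larger minibatches average out
  // gradient noise, so proportionally larger steps are safe.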
  optimizer.alpha *= static_cast<tiny_dnn::float_t>(std::sqrt(minibatch_size));

  net->train<tiny_dnn::mse>(optimizer, train_images, train_labels, minibatch_size, num_epochs,
                            on_enumerate_minibatch, on_enumerate_epoch);

  LINFO("Training complete");

  // Test and show results:
  net->test(test_images, test_labels).print_detail(std::cout);
}

Notice that the training data can be found in Contrib/tiny-dnn/data/; we will copy it into the module’s directory in the next section.

 

5> Pretrain MNISTnn

First, in the definition of class Mnist, we should define a smart pointer that refers to the network:

  protected:
    std::shared_ptr<MNISTnn> mnistdnn;

and override the constructor:

//  using jevois::Module::Module;
    Mnist(std::string const & instance): jevois::Module(instance)
    { 
      mnistdnn = addSubComponent<MNISTnn>("mnistnn"); // the instance name determines the data path: tiny-dnn/mnistnn
    }

Don’t forget to include the header

#include <jevoisnn/Components/MNISTnn/MNISTnn.H>

Rebuild and run jevois-daemon; you will see an error because the training data cannot be found in the module’s directory yet. We can copy the data over:

> sudo mkdir -p /jevois/modules/Tutorial/Mnist/tiny-dnn/mnistnn
> sudo cp -p Contrib/tiny-dnn/data/* /jevois/modules/Tutorial/Mnist/tiny-dnn/mnistnn

I reduced the number of epochs (num_epochs in MNISTnn.C) to 3 for this demo. Note that if the program tried to save weights.tnn under /jevois, you would have to run jevois-daemon with administrative privileges; that is why I modified the code so that weights.tnn is saved to /tmp/weights.tnn instead. After pretraining, we manually copy it to the corresponding path to avoid pretraining again.

Now launch it again and you will see:

ERR VideoMapping::videoMappingsFromStream: No default video mapping provided, using first one with UVC output
INF Engine::Engine: Loaded 1 vision processing modes.
INF Engine::onParamChange: Using [stdio] hardware (4-pin connector) serial port
INF Engine::onParamChange: No USB serial port used
INF Engine::postInit: Initalizing Python...
INF Engine::postInit: Starting camera device /dev/video0
INF Camera::Camera: [10] V4L2 camera /dev/video0 card WebCam SC-10HDD12636N bus usb-0000:00:1a.0-1.4
INF Engine::postInit: Using display for video output
INF Engine::setFormatInternal: OUT: YUYV 560x240 @ 30fps CAM: YUYV 320x240 @ 30fps MOD: Tutorial:Mnist
INF Camera::setFormat: Camera set video format to 320x240 YUYV
INF Engine::setFormatInternal: Instantiating dynamic loader for /jevois/modules/Tutorial/Mnist/Mnist.so
INF NN::postInit: Could not load pre-trained weights from /jevois/modules/Tutorial/Mnist/tiny-dnn/mnistnn/weights.tnn
INF MNISTnn::pretrain: Load training data from directory /jevois/modules/Tutorial/Mnist/tiny-dnn/mnistnn
INF MNISTnn::pretrain: Load training labels...
INF MNISTnn::pretrain: Load training images...
INF MNISTnn::pretrain: Load test labels...
INF MNISTnn::pretrain: Load test images...
INF MNISTnn::pretrain: Start training...
INF MNISTnn::operator(): 141.69s elapsed.
INF MNISTnn::operator(): 9644/10000 success/total validation score so far
INF MNISTnn::operator(): 150.455s elapsed.
INF MNISTnn::operator(): 9728/10000 success/total validation score so far
INF MNISTnn::pretrain: Training complete
accuracy:97.28% (9728/10000)
    *     0     1     2     3     4     5     6     7     8     9 
    0   969     0     4     0     1     2     6     2     6     4 
    1     0  1127     3     0     0     0     2     6     1     7 
    2     0     2  1007     2     1     0     1    15     0     0 
    3     0     3     1   985     0     7     1     1     3     6 
    4     2     0     5     0   948     2     2     2     5    11 
    5     2     1     0     8     0   869     3     0     8     1 
    6     4     1     1     0     7     6   942     0     2     2 
    7     1     0     6     7     0     0     0   970     3     5 
    8     1     1     5     5     2     2     1     1   941     3 
    9     1     0     0     3    23     4     0    31     5   970 
INF NN::postInit: Saving trained weights to /tmp/weights.tnn
INF NN::postInit: Weights saved. Network ready to work.
INF Engine::setFormatInternal: Module [Mnist] loaded, initialized, and ready.
INF Camera::streamOn: 27 buffers of 153600 bytes allocated
INF READY JEVOIS 1.3.2
Terminated

Pretraining might take quite a long time; for me, it takes 150+ seconds per epoch. After that, you can find weights.tnn under /tmp/. Execute

> sudo cp /tmp/weights.tnn /jevois/modules/Tutorial/Mnist/tiny-dnn/mnistnn/

to copy the weights so that we don’t have to pretrain again. Launch jevois-daemon once more and you will see a line

INF NN::postInit: Loaded pre-trained weights from /jevois/modules/Tutorial/Mnist/tiny-dnn/mnistnn/weights.tnn

indicating that the weights were found and loaded.

 

6> Recognize the Digit

After all this preparation, we can now implement the prediction part of the process function:

      // prediction 
      auto res = mnistdnn->predict(bindigit, true);

That’s it, that’s all of it. Quite simple, isn’t it? predict returns a vector of 10 class scores, normalized to 0..100 by NNBase::predict since we pass normalize = true. Now we want to output the top-3 predictions:

      // sort & print top-3
      std::vector<std::pair<double, int> > scores;
      for (int i = 0; i < 10; i++) scores.emplace_back(res[i], i);
      std::sort(scores.begin(), scores.end(), std::greater<std::pair<double, int>>());

      std::ostringstream display;
      for (int i = 0; i < 3; ++i)
        display << "[" << scores[i].second << "]: " 
                << scores[i].first << " | ";
      jevois::rawimage::writeText(outimg, display.str(), 3, outimg.height - 26, jevois::yuyv::White);
      LINFO(display.str()); // in case you don't see it on the screen

Compile and run again to see the result, which (in my case) was extremely frustrating. :P

There are many reasons why your system might work miserably: i) your handwriting style is different from the pretraining data, ii) the preprocessing procedure is wrong, iii) there is too much noise in the background, iv) the number of training epochs is insufficient, and so on. It is very important to remember that achieving high accuracy in pretraining is still far from applying the model in real scenarios.

Actually, if you train the network long enough (more than 5 epochs) and the image is clean, your program should be able to give the correct label of the digit within its top-3 predictions.

 

 

Possible problems

  • Problems when generating the modinfo files.
  • Sometimes an incremental rebuild with `make` might fail.

Where to proceed from here:

  • Give a fancier result visualization.
  • Build a detection-recognition system.
  • Augment the data by adding noise.
  • Use a Spatial Transformer Network to align the image.
  • Add the USE_NNPACK feature. (Done. No significant improvement on my laptop.)
  • Rewrite the threshold as a parameter (see the sketch after this list).
  • Develop a better thresholding method to binarize the image.
  • Build a larger network for more complex tasks (e.g. facial recognition, object recognition).
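
For the threshold parameter item, here is a minimal sketch using JeVois’s parameter mechanism (the parameter name, description, and category are my own; double-check against the jevois::Parameter documentation):

static jevois::ParameterCategory const ParamCateg("Mnist Options");

//! Parameter: threshold passed to cv::threshold() when binarizing the digit
JEVOIS_DECLARE_PARAMETER(thresh, int, "Binarization threshold", 80, ParamCateg);

class Mnist : public jevois::Module,
              public jevois::Parameter<thresh>
{
  // ... then, inside process(), replace the hard-coded 80:
  // threshold(digit, bindigit, thresh::get(), 255, cv::THRESH_BINARY_INV);
};

You could then change it at runtime, e.g. with setpar thresh 100 in the jevois-daemon console.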