Vanilla Neural Networks

Ah finally, digital brains. They were conceptualized decades ago, but recent advancements in neural network architectures and algorithms have made them increasingly interesting. Their use for image classification is especially interesting: there, neural networks are given a pre-defined structure with feature detection, similar to how our brain does this, implementing multiple stages like the parts of our visual cortex.

Note we are not going to use any of the great ‘Big-3’ frameworks like Torch, Caffe or TensorFlow. Those are total overkill if you want to start learning about the basics. We’re going to be using C++ and the lightweight FANN library for neural networks.

PRE-REQUISITES OF THIS POST:
You can read
You can infer
You can deduce

A priori reading:

This post is a combination of (in this order)
A book on NN: http://neuralnetworksanddeeplearning.com/chap1.html
The MNIST dataset: http://yann.lecun.com/exdb/mnist/
The FANN library: http://leenissen.dk/fann/html/files2/gettingstarted-txt.html

I assume that you are able to skim through those in the above order, to see what this is about. The book on NN is highly recommended.

A neural network consists of many of these:
One Simplified Neuron IO

A neural network can be used to describe every digital or mathematical operation, given enough neurons and enough layers. This is very important to keep in mind. I’m totally ignoring the difference between sigmoids and perceptrons here; I’ll keep it practical. However, for humans this is hard to work with, since we’d like to keep things abstract and say “4+5*1=9” instead of messing around with bits and adders and biases and weights by using neurons to solve a problem. It’s just not very straightforward, and it is not the interesting use of a neural network. What’s interesting is a case where you have a lot of inputs, and know the outputs. Then you let the computer calculate what needs to happen in between, as a transfer function, to get from the input to the output.

Minimal Example

I have known inputs and outputs, and I let the computer calculate what needs to happen in between.
We use 2 inputs (i1 and i2) and 1 output (o).
when i1=1 and i2=1, o=0
when i1=0 and i2=1, o=1
when i1=1 and i2=0, o=1
when i1=0 and i2=0, o=0

Essentially.. this is an ‘XOR’ function! Let’s see if we can teach the computer this, without knowing shit about bits and bytes.
We use ‘FANN’, so install that. I assume you can wield a compiler (easy).

Data file

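The first line of the data file is ‘[datapoints] [numinputs] [numoutputs]’. Every sample after that takes two lines: the inputs ‘[i1] [i2]’, then the output ‘[o]’. Because we use a symmetric sigmoid, whose output lies between -1 and 1, the file writes -1 wherever the table above says 0.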

xor.data

4 2 1
-1 -1
-1
-1 1
1
1 -1
1
1 1
-1

Main code & training

Compile with:

g++ main.cpp -o main -lfann && ./main

#include "fann.h"
 
int main()
{
    const unsigned int num_input = 2;
    const unsigned int num_output = 1;
    const unsigned int num_layers = 3;
    const unsigned int num_neurons_hidden = 3;
    const float desired_error = (const float) 0.001;
    const unsigned int max_epochs = 500000;
    const unsigned int epochs_between_reports = 1000;
 
    struct fann *ann = fann_create_standard(num_layers, num_input, num_neurons_hidden, num_output);
 
    fann_set_activation_function_hidden(ann, FANN_SIGMOID_SYMMETRIC);
    fann_set_activation_function_output(ann, FANN_SIGMOID_SYMMETRIC);
 
    fann_train_on_file(ann, "xor.data", max_epochs, epochs_between_reports, desired_error);
 
    fann_save(ann, "xor_float.net");
 
    fann_destroy(ann);
 
    return 0;
}

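Now it will calculate a solution, and it will probably find a correct one. The solution is not unique: the weights start out random, so every run can end up with different weights that still reproduce the table. You can test it yourself with new input data using a small execution program.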

Testing

Compile with:

g++ execute.cpp -o execute -lfann && ./execute

In the code below, I define the inputs that need to be tested inline. You’ll probably see this works.

#include <stdio.h>
#include "floatfann.h"
 
int main()
{
    fann_type *calc_out;
    fann_type input[2];
 
    struct fann *ann = fann_create_from_file("xor_float.net");
 
    input[0] = -1;
    input[1] = 1;
    calc_out = fann_run(ann, input);
 
    printf("xor test (%f,%f) -> %f\n", input[0], input[1], calc_out[0]);
 
    fann_destroy(ann);
    return 0;
}

Homebrew OCR Neural Network!

The previous example was just too boring, but it proves our case. Now let’s blow ourselves out of the water by jumping into an utterly non-trivial case: OCR. Let’s use the MNIST database for this. Download and unpack all data files from here: http://yann.lecun.com/exdb/mnist/.

The MNIST database consists of tens of thousands of handwritten digits (0-9), each in a 28×28 pixel image.

Now let’s use the FANN framework again to try and solve this. Note that we have 28×28=784 inputs, and we have.. yes, 10 outputs, whose indices (0-9) stand for their respective numbers (‘zero’,..,‘nine’). Note that suddenly we have multiple outputs. Interesting! This means that one not only needs to define what something is but also what it’s not! Hence, if an input is a ‘two’, your output array is defined as [-1,-1,1,-1,-1,-1,-1,-1,-1,-1]. This is where your philosophical and psychological insight should kick in.. knowing what something is, is also knowing what it isn’t.
For example, you will now become aware that the computer is about to learn what a ‘3’ is, but does it really know? If I draw a very small ‘3’, would it still know what a ‘3’ is? Probably not if you didn’t teach it about scale and size! I mean, the dot on an ‘i’ is similar to a period ‘.’; its only difference is defined by location! Hence, location is important. Shit, things are getting complex.
This is why the MNIST database has all these characters centered, same-sized and cropped in a 28×28 window. Ideal for us noobs. Now let’s teach our computer shit with FANN.

Training

#include "fann.h"
#include <iostream>
#include <vector>
#include <fstream>
#include <cstdio>  // fopen, fprintf, printf
 
// clang++ main.cpp -o main -lfann -I/opt/local/include/ -L/opt/local/lib/ && ./main
 
using namespace std;
int ReverseInt (int i){
	unsigned char ch1, ch2, ch3, ch4;
	ch1=i&255;
	ch2=(i>>8)&255;
	ch3=(i>>16)&255;
	ch4=(i>>24)&255;
	return((int)ch1<<24)+((int)ch2<<16)+((int)ch3<<8)+ch4;
}
int n_rows;
int n_cols;
 
void ReadMNIST(int NumberOfImages, int DataOfAnImage, vector<vector<int> > &arr, vector<int> &arr_labels){
	//Images
	arr.resize(NumberOfImages, vector<int>(DataOfAnImage));
 
	ifstream file ("t10k-images-idx3-ubyte", ios::binary);
	ifstream file_labels ("t10k-labels-idx1-ubyte", ios::binary);
 
	//ifstream file ("train-images-idx3-ubyte", ios::binary);
	//ifstream file_labels ("train-labels-idx1-ubyte", ios::binary);
 
	if (file.is_open()) {
	    int magic_number=0;
	    int number_of_images=0;
	    n_rows=0;
	    n_cols=0;
	    file.read((char*)&magic_number,sizeof(magic_number));
	    magic_number= ReverseInt(magic_number);
	    file.read((char*)&number_of_images,sizeof(number_of_images));
	    number_of_images= ReverseInt(number_of_images);
	    file.read((char*)&n_rows,sizeof(n_rows));
	    n_rows= ReverseInt(n_rows);
	    file.read((char*)&n_cols,sizeof(n_cols));
	    n_cols= ReverseInt(n_cols);
	    for(int i=0;i<number_of_images && i<NumberOfImages;++i){ // never read more images than arr can hold
	        for(int r=0;r<n_rows;++r){
	            for(int c=0;c<n_cols;++c){
	                unsigned char temp=0;
	                file.read((char*)&temp,sizeof(temp));
	                arr[i][(n_cols*r)+c]= (int)temp; // row-major index: row * width + column
	            }
	        }
	    }
	} else {
		cout << "file (images) read error" << endl;
	}
 
	if (file_labels.is_open()){
	    //Labels
		arr_labels.resize(NumberOfImages);
	    int label=0;
		int magic_number=0;
		int number_of_images=0;
		// The label file header only holds a magic number and a count, so n_rows and n_cols are left untouched.
		file_labels.read((char*)&magic_number,sizeof(magic_number));
		magic_number= ReverseInt(magic_number);
		file_labels.read((char*)&number_of_images,sizeof(number_of_images));
		number_of_images= ReverseInt(number_of_images);
 
	    for(int i=0;i<number_of_images && i<NumberOfImages;++i){ // never read more labels than arr_labels can hold
 
            unsigned char temp=0;
            file_labels.read((char*)&temp, sizeof(temp));
            arr_labels[i] = (int)temp;
 
	    }
 
	} else {
		cout << "file (labels) read error" << endl;
	}
 
 
 
}
 
 
 
int main(){
	/*
    int k=0;
    for (int x=-100;x<100;x++){
        for (int y=-100;y<100;y++){
            if (y==x) continue;
            printf("%d %d\n%d\n", x,y, x>y ? -1 : 1);
            k++;
        }
    }
 
    printf("k=%d\n", k);*/
 
 
    const unsigned int num_input = 28*28; //784
    const unsigned int num_output = 10;
    const unsigned int num_layers = 3;
    const unsigned int num_neurons_hidden1 = 30;
    //const unsigned int num_neurons_hidden2 = 20;
    //const unsigned int num_neurons_hidden3 = 20;
    const float desired_error = (const float) 0.001;
    const unsigned int max_epochs = 10000;
    const unsigned int epochs_between_reports = 10;
 
	int numsets = 10000;
 
    vector< vector<int> > ar;
    vector<int> ar_labels;
    ReadMNIST(numsets, num_input, ar, ar_labels);
 
 
 
 
	FILE *fp;
	fp=fopen("handwriting.data", "w");
	fprintf(fp, "%d %d %d\n", numsets, num_input, num_output );
 
	bool printstd=true;
 
    for (int k=0; k<numsets; k++){
    	//if (k<3){ printstd=false;}
    	if (printstd) { cout << " === NUM = " << ar_labels[k] << " ===" << endl; }
	    for (size_t i=0;i<ar[k].size();i++){
	    	fprintf(fp, "%d ", (int)ar[k][i] );
 
	    	if (printstd){
		    	printf("%3d ", (int)ar[k][i]);
		    	if ((i+1)%28==0) cout <<  endl; // break the line after every 28-pixel row
	    	}
	    }
	    }
	    int numnow = ar_labels[k];
	    fprintf(fp, "\n" );
	    for (int i=0;i<10;i++){
			fprintf(fp, "%d ", i==numnow ? 1 : -1 );
	    }
	    fprintf(fp, "\n" );
	    if (printstd){ cout << endl; }
 
	}
 
	fclose(fp);
 
 
 
 
 
 
 
    struct fann *ann = fann_create_standard(num_layers, num_input, num_neurons_hidden1, num_output);
 
 
    fann_set_activation_function_hidden(ann, FANN_SIGMOID_SYMMETRIC); //Symmetric means it's between -1 and 1
    fann_set_activation_function_output(ann, FANN_SIGMOID_SYMMETRIC);
 
    fann_train_on_file(ann, "handwriting.data", max_epochs, epochs_between_reports, desired_error);
    //int max_neurons=200;
    //int neurons_between_reports=1;
    //fann_cascadetrain_on_file(ann, "handwriting.data", max_neurons, neurons_between_reports, desired_error);
 
    fann_save(ann, "xor_float.net"); // file name kept from the XOR example; the test program loads this same file
 
    fann_destroy(ann);
 
    return 0;
}

Testing with your own images

Let’s test with our own image, ‘sample_digit.png’, which is a grayscale image of 28×28 pixels with a dark digit on a light background (the code below inverts it to match MNIST):
sample_digit

Because we use our own image, we use OpenCV to load it. Compile with:

g++ execute.cpp -o execute -lfann -I/opt/local/include/ -L/opt/local/lib/ `pkg-config --cflags --libs opencv` && ./execute

Note that I include the /opt/ paths because I am on a Mac and use MacPorts.

#include <stdio.h>
#include "floatfann.h"
#include <opencv2/opencv.hpp>
 
//clang++ execute.cpp -o execute -lfann -I/opt/local/include/ -L/opt/local/lib/ `pkg-config --cflags --libs opencv` && ./execute
 
using namespace cv;
using namespace std;
 
int main()
{
    fann_type *calc_out;
 
 
    struct fann *ann = fann_create_from_file("xor_float.net");
 
    cout << endl;
    cout << "reading image" << endl;
    string filename = "sample_digit.png";
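    // Invert the image so the digit pixels get the high values, as in the MNIST data.
    // CV_LOAD_IMAGE_GRAYSCALE is the OpenCV 2.x name; newer versions call it cv::IMREAD_GRAYSCALE.
    Mat matInput = 255-imread(filename, CV_LOAD_IMAGE_GRAYSCALE);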
 
    cout << "image size=" << matInput.cols << "x" << matInput.rows << endl;
 
    // std::vector instead of a variable-length array, which is not standard C++
    std::vector<fann_type> input(matInput.cols * matInput.rows);
 
    int i=0;
    for (int y=0;y<matInput.rows; y++){ // rows first: same row-major order as the training data
        for (int x=0;x<matInput.cols;x++){
            int val = (int)matInput.at<unsigned char>(y,x); // at() takes (row, column)
            input[i]=val;
            i++;
            printf("%3d ", val);
        }
        cout << endl;
    }
    cout << endl;
 
    calc_out = fann_run(ann, input.data());
 
    float max=-1;
    int maxid=-1;
    std::string strCalcs = "";
    for (int i=0; i<10; i++){
        strCalcs += std::to_string(calc_out[i]) + " ";
        //cout << calc_out[i]  << endl;
        if (calc_out[i] > max){
           // cout << " HAVE MAX" << endl;
            max = calc_out[i];
            maxid = i;
        }
    }
 
    cout << "probably number:" << maxid << " " <<  max*100 << "% confident" <<  endl;
 
    printf("recognition test (%s) -> %s\n", filename.c_str(), strCalcs.c_str());
 
 
    /*
    float num1=100;
    float num2=num1;
 
    input[0] = num1;
    input[1] = num2;
    calc_out = fann_run(ann, input);
    printf("xor test (%f,%f) -> %f\n", input[0], input[1], calc_out[0]);
 
    input[0] = num1;
    input[1] = num1+1;
    calc_out = fann_run(ann, input);
    printf("xor test (%f,%f) -> %f\n", input[0], input[1], calc_out[0]);
 
    input[0] = num1+1;
    input[1] = num1;
    calc_out = fann_run(ann, input);
    printf("xor test (%f,%f) -> %f\n", input[0], input[1], calc_out[0]);
    */
 
    fann_destroy(ann);
    return 0;
}

You’re on your own now. I managed to get an 80% recognition rate with FANN, 30 hidden neurons and this dataset. By using the Caffe framework (easier and more advanced) I managed to get 99%. That’s the next step.

Some results:
Screenshot 1
Screenshot 2

Tim Zaman

MSc Biorobotics. Specialization in computer vision and deep learning. Works at NVIDIA.
