Digit Recognition

Recently our student association has organised a Machine Learning Course at Collegio Ghislieri, more  detail can be found here, during this course a variety of  topics were touched. At the end of this course we have organised a competition, which consisted into writing the most accurate Neural Network  (NN) for digit recognition. One of the most interesting aspect of this competition was the fact the data set provided wasn’t the usual MNIST but it was a set of data from the date stamp over some juices box. Those data was kindly provided by  SeaVision a vision quality control company based in Pavia. In this article I’d like to explain how I tackled the problem for this competition (I arrived 2nd but there were only 8 competitors) and I will try to explain as well some technique adopted by the other competitors. All the code that I refer to in this article are provided in the Github repo dedicated to my blog, in particular in this folder.

I’ll not go in theoretical detail mainly because I don’t have any competence to do so and as well because the course and the competition were focused on a practical point of view, thankfully.

 Basic Multi-Layer Neural Network

The basic approach that was suggested us was to try using the same multi-layer network that we have developed to apprehend the logic XOR  mapping, since differently from what done for the AND and OR logic mapping a single perceptron would not work. The original xor script was implemented from scratch (using same help computing the gradient etc etc) in this script but I will show next how we implemented a more compact version using all functionality of  the Machine Learning Library Flux (we are here coding in Julia).

We first need to define the data set that we will use to train the NN, this comprehend the value that we’ll pass to the NN and the outcome we desire:

X = [0.0 0.0 1.0 1.0; 
     0.0 1.0 0.0 1.0]
Y = [0.0 1.0 1.0 0.0]
dataset = repeated((X, Y), 3000)

We know define the loss function, in this case we decided to go for the square loss (the one that is minimized during the least square problem), it is so defined:

\displaystyle L(\vec{x},\vec{y})=\sum_{i=1}^{n}\big(x_i-f(x_i)\big)^2

we define this loss function in Julia such follow:

L(x, y) = sum((y .- f(x)).^2)

we now define the structure of the NN and specify which kind of function we use to minimize the loss function, in this particular case we descent along the gradient of the loss function. At last we train the NN over the data set built above:

f = Chain(
  Dense(2, 3, σ),
  Dense(3, 1, sigmoid))
opt = Descent()
Flux.train!(L, params(f), dataset, opt)

If we wont to check the outcome of the NN we could use the following script:

function rdbinary(x,tol)
	if x<tol
		return 1;
		return 0;
println("Dataset Reults")
println("Data [0,0] prediction: ",rdbinary(f([0;0])[1],0.5))
println("Data [0,1] prediction: ",rdbinary(f([0;1])[1],0.5))
println("Data [1,0] prediction: ",rdbinary(f([1;0])[1],0.5))
println("Data [1,1] prediction: ",rdbinary(f([1;1])[1],0.5))

The change we made in the XOR model for the digit recognition can be resumed in the following points:

  1. We use a double layer network that normalize the data using a sigmoid and that ends in 10 neurons, one identifying each digit. Furthermore we require that the sum of the value of all 10 the ending neurons is equal to 1. In this way we can think about the value in the neurons as the probability that the digit is the one represented by that neuron, this a particular feature is called softmax.
  2. We choose as the loss function the cross entropy.
  3. We choose as descending method ADAptive Moment estimation) that is called ADAM

Full detail could be found in this script, in particular in the BuildNN function:

function BuildNN(it)
    # Reading training data.
    n, m, X, Y = ReadSEAVISION("sv.csv")
    YB=onehotbatch(Y,0:9) #Converting into a Boolean matrix.
    prediction = Chain(
      Dense(1491, 200,sigmoid),
      Dense(200,10),softmax) |> gpu
    # Loss function: crossentropy
    L(x, y) = crossentropy(prediction(x), y)
    dataset = repeated((X, YB),it)
    opt = ADAM(); #ADAM (Slide Regarding Multilayer)
    callback = () -> @show(L(X, YB))
    Flux.train!(L, Flux.params(prediction), dataset, opt,cb=callback) #Faccio il Training
    acc(x, y) = mean(onecold(prediction(x)) .== onecold(y)) #Accuracy Evaluation
    return  prediction

Clustering Using K-Means

K-means is a term that we use to describe a method to divide a set of data in to a fixed number n of classes. The basic idea behind this algorithm is to start a random set of centeroid: \vec{z_1},\dots,\vec{z_n}. We then assign to each point in the data a number that identify which is the nearest cenetroid in term of Euclidean distance, later on we will recompute the centeroid taking as new centeroid the mean point of class of points.

More detail regarding this and other clustering method could be found here. In particular we have implemented the following algorithm to perform the K-means:

Partition(X, C) = [[X[i] for i=1:length(X) if C[i] == j] for j=1:maximum(C)]
function zOpt(X,C,k)
	for i in 1:k
		z = zeros(1491,1);
		for j in 1:length(G[i])
		z = (1/length(G[i]))*z;
		Z = push!(Z,z);
function kOpt(X,Z,k)
	K = [];
	for i in 1:length(X)
		D = [];
		for j in 1:k
#Computing optimum function
function vOpt(C,X,Z)
	for i in 1:length(X)
		j = C[i];
		v = v+ norm([X[i][1]-Z[j][1],X[i][2]-Z[j][2]])
	v = (1/length(X))*v;
function KMeansClauster()
	k = 15
	#X = MakeData(300, k)
	n, m, I, Y = ReadSEAVISION("sv.csv");
    X = [I[:,i] for i in 1:n]
	C = RandomAssign(X, k)
	for i in 1:maxit
		Z = zOpt(X,C,k)
		C = kOpt(X,Z,k)
    return Z,C

How could we use K-Means clustering to improve the accuracy of my NN ? That’s a good question that has a couple of answer.

The first one I implemented in my final submition and it consist in identifying the digit I’ve the greatest probability of recognizing as the wrong number. I do so by checking if the maximum output of the softmaxed neurons is less then a certain tolerance. Once I’ve identified the digit with the greatest uncertainty I’ll proceed assigning to those digit the same number as the one of the closest cenetroid. Full detail regrading this method could be found here.


After I’ gave up the challenge I went speaking to my Professor who told me that his solution was to preprocess the image in such a way to remove the shadow present in the picture. I’ applied this technique using the function:

function bimg(x,tol)
    if x < tol

to preprocess all the image I feed to the NN during training and applying it to the image I feed to the NN during testing, full detail can be found here. Using a preprocessing technique we achieved an 100% accuracy.

Leave a Reply

Your email address will not be published. Required fields are marked *