To gain a deep, practical understanding of neural networks by building a Multi-Layer Perceptron (MLP) entirely from scratch in C++, using only standard libraries. The final product needed to integrate with a pre-existing terminal-based drawing program and achieve a target accuracy of >90% on recognizing user-drawn digits.
I was the sole architect and developer for this personal project, responsible for the neural network implementation, dataset creation, and integration with the UI.
I designed a classic Multi-Layer Perceptron (MLP) with three layers:
Input Layer: 784 nodes (a flattened 28x28 binary pixel grid).
Hidden Layer: 64 nodes using a Sigmoid activation function.
Output Layer: 10 nodes (for digits 0-9) using a Softmax function to output probabilities (a sketch of a stable softmax follows this list).
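For illustration, here is a minimal, numerically stable softmax in C++. This is a sketch, not the project's actual code; it assumes the logits arrive as a std::vector<double>.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Numerically stable softmax: subtracting the max logit before
// exponentiating avoids overflow without changing the result.
std::vector<double> softmax(const std::vector<double>& z) {
    std::vector<double> out(z.size());
    double zmax = *std::max_element(z.begin(), z.end());
    double sum = 0.0;
    for (std::size_t i = 0; i < z.size(); ++i) {
        out[i] = std::exp(z[i] - zmax);
        sum += out[i];
    }
    for (double& v : out) v /= sum;  // probabilities now sum to 1
    return out;
}
```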
Design Decisions:
C++ from Scratch: I chose C++ for its performance, which was critical for efficient training without relying on external libraries like Eigen or PyTorch. This forced me to manually manage memory and implement all linear algebra operations.
Single Hidden Layer: A single hidden layer with 64 nodes provided a balance between model capacity and training time. By the universal approximation theorem, a single hidden layer has sufficient representational capacity for a problem of this scope, and it kept the parameter count manageable for a custom implementation (784×64 + 64 + 64×10 + 10 = 50,890, roughly 51,000 weights and biases).
Sigmoid Activation: I selected the Sigmoid function for the hidden layer for its straightforward derivative, which simplified my implementation of backpropagation (a sketch of both the activation and a hand-rolled matrix-vector product follows this list).
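To make the last two points concrete, here is a minimal sketch, not the project's code, of the sigmoid, its derivative, and the kind of hand-rolled matrix-vector product that working without Eigen requires:

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Sigmoid activation and its derivative. The identity
// sigma'(x) = sigma(x) * (1 - sigma(x)) is what makes the
// backpropagation update so convenient to implement by hand.
double sigmoid(double x) { return 1.0 / (1.0 + std::exp(-x)); }
double sigmoid_prime(double x) {
    double s = sigmoid(x);
    return s * (1.0 - s);
}

// Hand-rolled affine transform: y = W * x + b, with W stored
// row-major as a flat vector (rows = outputs, cols = inputs).
std::vector<double> affine(const std::vector<double>& W,
                           const std::vector<double>& b,
                           const std::vector<double>& x,
                           std::size_t rows, std::size_t cols) {
    std::vector<double> y(rows);
    for (std::size_t r = 0; r < rows; ++r) {
        double acc = b[r];
        for (std::size_t c = 0; c < cols; ++c)
            acc += W[r * cols + c] * x[c];
        y[r] = acc;
    }
    return y;
}
```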
[Figure: terminal output during training. Each line is printed after training on 100 digits; a lower cost indicates a better network.]
The core of the project was the implementation of the backpropagation algorithm to train the network.
I implemented a forward pass to calculate activations and a backward pass to compute the gradients of the loss (Mean Squared Error) with respect to every weight and bias.
The algorithm used stochastic gradient descent with a fixed learning rate to iteratively update the parameters. Handling the matrix operations and ensuring correct gradient flow through the layers were the most complex parts of the project.
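The sketch below shows what one SGD step of this forward/backward scheme can look like for the 784-64-10 architecture. It is a minimal reconstruction under the assumptions stated above (sigmoid hidden layer, softmax output, MSE loss, fixed learning rate), not the project's actual source; the function name sgd_step and the row-major weight layout are my choices.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;

// One stochastic-gradient-descent step for a 784-64-10 MLP with a
// sigmoid hidden layer, softmax output, and mean-squared-error loss.
// W1 is 64x784 row-major, W2 is 10x64 row-major; y is a one-hot target.
void sgd_step(Vec& W1, Vec& b1, Vec& W2, Vec& b2,
              const Vec& x, const Vec& y, double lr) {
    const std::size_t IN = 784, HID = 64, OUT = 10;

    // ---- Forward pass: compute and cache every activation. ----
    Vec z1(HID), a1(HID);
    for (std::size_t h = 0; h < HID; ++h) {
        double acc = b1[h];
        for (std::size_t i = 0; i < IN; ++i) acc += W1[h * IN + i] * x[i];
        z1[h] = acc;
        a1[h] = 1.0 / (1.0 + std::exp(-acc));  // sigmoid
    }
    Vec z2(OUT), a2(OUT);
    double zmax = -1e300, sum = 0.0;
    for (std::size_t o = 0; o < OUT; ++o) {
        double acc = b2[o];
        for (std::size_t h = 0; h < HID; ++h) acc += W2[o * HID + h] * a1[h];
        z2[o] = acc;
        if (acc > zmax) zmax = acc;
    }
    for (std::size_t o = 0; o < OUT; ++o) { a2[o] = std::exp(z2[o] - zmax); sum += a2[o]; }
    for (std::size_t o = 0; o < OUT; ++o) a2[o] /= sum;  // softmax

    // ---- Backward pass. With L = 1/2 * sum_i (a2_i - y_i)^2, the output
    // delta goes through the full softmax Jacobian a2_i*(d_ij - a2_j). ----
    Vec delta2(OUT, 0.0);
    for (std::size_t j = 0; j < OUT; ++j)
        for (std::size_t i = 0; i < OUT; ++i)
            delta2[j] += (a2[i] - y[i]) * a2[i] * ((i == j ? 1.0 : 0.0) - a2[j]);

    // Hidden delta: backpropagate through W2, then the sigmoid derivative.
    Vec delta1(HID);
    for (std::size_t h = 0; h < HID; ++h) {
        double acc = 0.0;
        for (std::size_t o = 0; o < OUT; ++o) acc += W2[o * HID + h] * delta2[o];
        delta1[h] = acc * a1[h] * (1.0 - a1[h]);
    }

    // ---- Parameter update with a fixed learning rate. ----
    for (std::size_t o = 0; o < OUT; ++o) {
        b2[o] -= lr * delta2[o];
        for (std::size_t h = 0; h < HID; ++h)
            W2[o * HID + h] -= lr * delta2[o] * a1[h];
    }
    for (std::size_t h = 0; h < HID; ++h) {
        b1[h] -= lr * delta1[h];
        for (std::size_t i = 0; i < IN; ++i)
            W1[h * IN + i] -= lr * delta1[h] * x[i];
    }
}
```

Note that because the loss is MSE rather than cross-entropy, the output delta must pass through the full softmax Jacobian instead of collapsing to the familiar a2 - y shortcut.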
I created a dedicated Network struct to hold the weights, biases, and layer sizes, using pointers to pass this structure efficiently between functions during training and inference.
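As a sketch only (the field and function names here are illustrative, not the project's actual identifiers), such a struct might look like:

```cpp
#include <cstddef>
#include <vector>

// Illustrative layout only; the project's actual field names may differ.
struct Network {
    std::size_t inputSize = 784, hiddenSize = 64, outputSize = 10;
    std::vector<double> W1, b1;  // hidden layer: hiddenSize*inputSize weights, hiddenSize biases
    std::vector<double> W2, b2;  // output layer: outputSize*hiddenSize weights, outputSize biases
};

// Training and inference routines take a Network* so the ~51,000
// parameters are never copied between calls.
void train_sample(Network* net, const std::vector<double>& pixels, int label);
int  predict(const Network* net, const std::vector<double>& pixels);
```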
I identified a critical issue: a model trained on a standard dataset (like MNIST) would perform poorly on my drawing program's distinct style (binary pixels, thin lines).
Solution: I developed a data collection tool that presented a random digit to draw and saved my 28x28 binary drawing to a labeled file (one possible on-disk format is sketched below).
Outcome: I hand-drew a custom dataset of over 3,000 digits. This ensured the training data distribution matched the real-world application, a key factor in the model's high final accuracy.
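The write-up doesn't show the file format, so the following is a hypothetical sketch of one simple way such labeled samples could be stored: one line per sample, the label digit followed by 784 binary pixels.

```cpp
#include <fstream>
#include <vector>

// Hypothetical save format (the project's actual format is not shown):
// one sample per line, the label digit followed by 784 '0'/'1' pixels.
void save_sample(std::ofstream& out, int label,
                 const std::vector<int>& pixels /* 784 binary values */) {
    out << label << ' ';
    for (int p : pixels) out << (p ? '1' : '0');
    out << '\n';
}
```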
The final step was integrating the trained model with the drawing program.
I modified my existing drawing application to process the canvas, segment multi-digit numbers (one common approach is sketched below), and feed each 28x28 segment into the neural network for classification.
Upon pressing 'g', the program displays its prediction in real time.
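The write-up doesn't detail the segmentation method; a common approach, shown here as an assumption rather than the project's documented technique, is to split the canvas on runs of empty pixel columns.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Assumed segmentation approach (not the project's documented method):
// scan the canvas for runs of non-empty columns; each run is treated as
// one digit and later rescaled/centred into a 28x28 grid.
std::vector<std::pair<std::size_t, std::size_t>>
segment_columns(const std::vector<std::vector<int>>& canvas) {
    std::size_t rows = canvas.size(), cols = rows ? canvas[0].size() : 0;
    std::vector<std::pair<std::size_t, std::size_t>> spans;  // [start, end)
    std::size_t start = 0;
    bool inRun = false;
    for (std::size_t c = 0; c < cols; ++c) {
        bool any = false;
        for (std::size_t r = 0; r < rows; ++r)
            if (canvas[r][c]) { any = true; break; }
        if (any && !inRun) { start = c; inRun = true; }
        if (!any && inRun) { spans.emplace_back(start, c); inRun = false; }
    }
    if (inRun) spans.emplace_back(start, cols);
    return spans;
}
```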
The project was a significant success, exceeding the initial accuracy goal.
Quantitative Results: In a controlled test of 100 digits I drew, the model achieved a 96% recognition accuracy. This demonstrated the effectiveness of the core implementation and the value of a well-matched dataset.
Technical Validation: The success proved that my from-scratch implementation of forward propagation, backpropagation, and gradient descent was mathematically correct and functionally robust.
Limitations and Insight: As expected, the model's performance was highly specialized. It excelled with my drawing style but was confused by digits drawn differently (e.g., a '4' with a closed top).
Conclusion: Building a neural network from the ground up provided an unparalleled understanding of the mechanics of deep learning. It solidified my knowledge of optimization, the importance of data quality, and the computational considerations involved in training. This low-level experience gives me a strong foundation for working effectively with high-level frameworks like TensorFlow and PyTorch.
[Video: demonstration of the final program]