But what is a neural network? | Deep learning

Written by Massa Medi
Neural networks have revolutionized the world of machine learning, powering everything from automated bank check readings to breakthrough image recognition systems. But have you ever wondered what actually happens beneath the surface when a computer “sees” a handwritten digit? In this deep dive, we’ll break down, layer by layer, exactly how neural networks decipher scrappy, pixelated digits—no advanced math degree required. We’ll explore the math, demystify the jargon, and get you inspired by how these models “think,” whether you’re a total newcomer or an aspiring AI architect.
Astonishing Human Pattern Recognition
Imagine glancing at a blurry “3,” hastily scribbled and rendered at a hilariously low resolution of just 28 by 28 pixels. Your brain barely flinches—it just knows it’s a “3.” Now pause and take a moment to reflect on just how wild that is. You, mere human, can instantly recognize an array of awkward threes—a shaky “3”, a bold “3”, a skinny “3”—even though the exact pixels, the specific light-sensitive cells in your retina firing with each different shape, change every time. Despite these variations, your visual cortex makes sense of the chaos and recognizes the underlying concept of “three.”
Computers, in contrast, face a daunting challenge. If someone challenged you to write a program that takes a 28x28 grid of pixels and reliably outputs the correct digit, that leap from human intuition to code is, frankly, enormous. The task balloons from “comically trivial” for our brains to “formidably complex” for a line-by-line computer program.
Why Machine Learning and Neural Networks Matter
In today’s world, the relevance—and necessity—of machine learning and neural networks is almost beyond question. These technologies power self-driving cars, speech recognition, medical image analysis, and so much more. But while we constantly hear terms like “deep learning” and “AI,” what do they actually mean? And how are neural networks more than just buzzwords?
The goal here is no mere surface tour. We’re rolling up our sleeves, starting from scratch, to build and visualize a neural network designed to recognize handwritten digits. This classic example serves as the perfect on-ramp to neural network theory—and, by the end, you’ll not only understand the structure, but also what happens when you hear about a neural network “learning.”
Unpacking the Neural Network: From Pixels to Predictions
At its heart, a neural network is a digital homage to the brain, loosely inspired by networks of neurons firing in biological tissue. But let’s break it down—what is a “neuron” in this context, and how are they connected?
What Is a Digital “Neuron”?
In the context of neural networks, a “neuron” is a very simple element: it holds a single number, specifically between 0 and 1. The network’s input layer, for example, consists of 784 neurons, one for each pixel in a 28x28 grayscale digit image. Each of these neurons holds a value representing how bright the corresponding pixel is: 0 for a black pixel, 1 for a white one, and values in between for the various shades of gray. The technical term for this number is the neuron’s activation.
Think of it like lights lighting up on a grid: the brighter the neuron, the higher its activation.
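To make that concrete, here’s a minimal NumPy sketch (not from the original article) of turning a 28x28 image into the 784 input activations. The `image` array here is a random stand-in for a real handwritten digit:

```python
import numpy as np

# A toy stand-in for a 28x28 grayscale digit image, with pixel values in [0, 255].
# (In practice this would come from a dataset such as MNIST.)
image = np.random.randint(0, 256, size=(28, 28))

# The input layer is just these 784 pixel brightnesses, scaled to [0, 1]
# and flattened into one long column of activations.
input_activations = (image / 255.0).reshape(784)

print(input_activations.shape)                            # (784,)
print(input_activations.min(), input_activations.max())   # everything sits between 0 and 1
```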
Layers Upon Layers
The first layer of the network holds these 784 activations. Jump to the last layer—there are 10 neurons, each representing one of the digits, 0 through 9. The activation level here indicates how confident the network is that the input image matches that particular digit. In between, sit the “hidden layers”—the enigmatic middlemen whose roles we’ll soon clarify.
For our purposes, let’s stick to a classic, “plain vanilla” architecture: two hidden layers, each with 16 neurons—an arbitrary but visually handy number for illustration. In reality, architectures can vary, sometimes wildly so, but this structure is the ideal learning ground for neural network basics.
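To see what that architecture amounts to in code, here’s a small sketch, assuming NumPy and randomly initialized (untrained) parameters; the layer sizes mirror the 784 → 16 → 16 → 10 structure described above:

```python
import numpy as np

# The architecture described above: 784 inputs, two hidden layers of 16, 10 outputs.
layer_sizes = [784, 16, 16, 10]

rng = np.random.default_rng(0)

# One weight matrix and one bias vector per pair of adjacent layers.
# Before training these are arbitrary numbers; "learning" means tuning them.
weights = [rng.standard_normal((n_out, n_in))
           for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(n_out) for n_out in layer_sizes[1:]]

total_params = sum(w.size + b.size for w, b in zip(weights, biases))
print(total_params)  # 13002 -- the "just over 13,000" parameters counted later in the article
```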
How Do Layers Interact?
The magic of a neural network lies in how the activations in one layer influence those in the next. As in the brain, where groups of neurons firing can trigger others, each neuron in one layer is connected to every neuron in the layer ahead via so-called “weights.” After training, these connections encode the logic of recognition.
Feed in an image—say, a digit “9.” All 784 input neurons light up according to each pixel’s brightness. This pattern triggers a series of activations in the first hidden layer, which in turn triggers the next hidden layer, then the output. The neuron in the output layer with the highest activation is the network’s “guess” as to which digit has been shown.
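As a tiny illustration of that last step, the network’s “guess” is simply the index of the largest output activation; the numbers below are made up purely for demonstration:

```python
import numpy as np

# Suppose the forward pass has produced these 10 output activations,
# one per digit 0-9 (fabricated values for illustration).
output_activations = np.array([0.02, 0.01, 0.05, 0.10, 0.03,
                               0.04, 0.02, 0.08, 0.11, 0.93])

# The network's "guess" is the digit whose output neuron is most active.
predicted_digit = int(np.argmax(output_activations))
print(predicted_digit)  # 9
```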
Why Do Layered Networks Work?
What’s our intuition for thinking this sort of layered setup might exhibit intelligent behavior?
- Feature Decomposition: When you recognize digits, you break the task down—an “8” has two loops, a “4” has three straight lines, a “9” has a loop and a stalk. The hope is that, in a perfect scenario, each neuron in the penultimate layer corresponds to a sub-component like a loop or a line. When an image contains a feature (a loop up top, say), the relevant neuron “lights up.”
- Edge Detection: Hidden layers may detect smaller features: the first hidden layer might capture tiny edge segments; subsequent layers combine edges into bigger structures, like loops or lines, eventually piecing these into digits.
- Versatility Across Domains: This abstraction works beyond digits. Image and speech recognition both thrive on transforming raw sensor data into increasingly sophisticated patterns—first sounds or edges, then syllables or shapes, then words or objects.
How Does a Neuron “Detect” a Pattern?
Suppose we want a neuron in the second layer to spot a specific edge—say, a horizontal stroke in the upper left. How is this possible? That’s where the network’s parameters—weights—enter.
- Each neuron is connected to all 784 input neurons. Each connection has a weight—a number reflecting how much importance to give to that specific pixel.
- The neuron computes a “weighted sum”: multiply each pixel’s activation by its corresponding weight, then add them all up. If the neuron only cares about a specific region, the weights everywhere else are set to zero. (A code sketch of one such neuron follows this list.)
- Edge Detection: To detect an edge, the neuron can assign positive weights to the pixels inside the region it cares about and negative weights to the pixels just around it (imagine a glowing green/red heatmap!). The weighted sum is then largest when the target pixels are bright while their surroundings are dark—exactly when a little edge sits in the intended spot.
- Bias: To ensure a neuron only activates when the pattern is convincingly present, we add a bias—an extra number (like minus 10)—before passing the sum through the activation function, so the weighted sum has to clear a threshold (here, 10) before the neuron meaningfully fires.
- Activation Function: We want every neuron’s output to land between 0 and 1. For that, a sigmoid function, or “logistic curve,” is commonly used: very negative inputs get squished toward 0, very positive ones toward 1, with the steepest change happening around an input of zero.
- Multiple Neurons, Multiple Features: Each neuron in a layer can “look for” a totally different pattern, each with its own weights and bias. With 16 neurons in the first hidden layer, that’s 784 weights per neuron plus a bias—12,560 parameters for that layer alone. Add up all the layers, and it’s just over 13,000 parameters for this small network!
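Here is the single-neuron sketch promised above, in NumPy. The pixel region, weights, and bias are made up purely for illustration:

```python
import numpy as np

def sigmoid(x):
    # Squishes any real number into the range (0, 1).
    return 1.0 / (1.0 + np.exp(-x))

# One hypothetical hidden neuron looking at all 784 input pixels.
rng = np.random.default_rng(1)
pixels = rng.random(784)       # stand-in input activations in [0, 1]

weights = np.zeros(784)        # start by ignoring every pixel
region = np.arange(40, 60)     # a made-up patch of pixels this neuron cares about
weights[region] = 1.0          # positive weights inside the patch
weights[region - 28] = -0.5    # negative weights one row above it (index - 28 moves up a row)

bias = -5.0                    # the weighted sum must clear ~5 before the neuron "fires"

activation = sigmoid(weights @ pixels + bias)
print(activation)              # a single number between 0 and 1
```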
The Core of Learning: Tuning Weights and Biases
The learning part of machine learning is all about finding the right values for all those weights and biases—those 13,000+ dials and knobs—so the network performs its pattern-recognition magic.
Imagine, for a moment, trying to set them all by hand. Painstakingly zeroing in on which neurons should activate for every edge, loop, or squiggle. It’s both a fascinating intellectual exercise and a reminder that these algorithms aren’t just black boxes. Understanding what weights and biases do gives you a foundation to analyze why a network succeeds—or struggles—and helps demystify the whole “AI” thing.
Math Made Beautiful: Matrix Notation
Here’s where a little notation elegance comes in. Rather than tracking thousands of numbers one by one, we group:
- Activations: All the neurons’ activations in a layer become a single column vector.
- Weights: All connections between two layers are captured in a single matrix, where each row holds the weights feeding into one neuron of the next layer.
- Matrix Multiplication: To get the next set of activations, multiply your weight matrix by the activation vector. Add a bias vector. Then feed each component through a sigmoid (or other) function.
Why does this matter? It makes the code simple (and blazing fast, thanks to optimized matrix libraries). So, the network as a whole is nothing more than a complex mathematical function: input 784 numbers (pixels), output 10 numbers (digits), with matrix multiplications and non-linear squishification along the way.
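Put together, the whole forward pass is just “multiply, add bias, squish,” repeated once per layer: a' = sigmoid(W·a + b). A minimal NumPy sketch, with randomly initialized (untrained) weights and biases standing in for a learned network, might look like this:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def feedforward(a, weights, biases):
    # One matrix multiply + bias + squish per layer: a' = sigmoid(W @ a + b)
    for W, b in zip(weights, biases):
        a = sigmoid(W @ a + b)
    return a

# Randomly initialized 784 -> 16 -> 16 -> 10 network (untrained, so its output is meaningless).
sizes = [784, 16, 16, 10]
rng = np.random.default_rng(0)
weights = [rng.standard_normal((m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
biases = [rng.standard_normal(m) for m in sizes[1:]]

pixels = rng.random(784)                   # stand-in for a flattened digit image
output = feedforward(pixels, weights, biases)
print(output.round(3))                     # 10 numbers, each between 0 and 1
print("guess:", int(np.argmax(output)))    # index of the most active output neuron
```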
Reality Check: Why So Complicated?
Neural networks may seem (and often are) complicated—with thousands of parameters and matrix math galore. But that’s actually a reassuring sign: if we want computers to take on “messy” pattern recognition, they need this kind of capacity. And, crucially, we need methods that let the network learn those parameter settings automatically by analyzing mountains of sample data—a topic for the next article.
A Nod to the Activation Wars: Sigmoid vs ReLU
Before we wrap up, a quick side note on activation functions—a topic that sparks debate within deep learning circles. In early neural networks, the sigmoid “S-curve” function was the workhorse, inspired by biological neurons flipping “on” and “off.” But over time, the ReLU (Rectified Linear Unit) became the new standard. It’s simple: output zero for negative values, or the input itself for positives. This function not only sped up and stabilized training, but also worked wonders for very deep networks. While sigmoids linger in textbooks and legacy code, ReLU is the practical star of most production networks today.
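For a side-by-side feel of the two functions, here’s a tiny NumPy comparison (illustrative only):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    # Zero for negative inputs, the input itself for positive ones.
    return np.maximum(0.0, x)

xs = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
print(sigmoid(xs))  # squashed into (0, 1): roughly 0.000, 0.269, 0.5, 0.731, 1.000
print(relu(xs))     # [ 0.  0.  0.  1. 10.]
```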
Looking Forward: What’s Next?
That’s a full tour of structure—how a neural network is wired up, what neurons, weights, and biases actually mean, and how the whole thing turns pixel grids into digit predictions. In the sequel, we’ll demystify learning—how all those weights and biases are tuned using raw data and clever optimization.
For hands-on learners: at the end of this small series, you’ll be pointed to resources where you can download the code, tinker, and explore neural networks on your own computer.
Bonus: Expert Insights—Sigmoid vs. ReLU
To bring another voice into this, let’s hear from Lisha Lee, a PhD-trained deep learning theorist and venture capital pro. Reflecting on activation functions:
Early neural networks used the sigmoid function, motivated by the idea of neurons being “on” or “off.” But in modern networks, that’s considered a bit old-school. Tools like the Rectified Linear Unit (ReLU) make networks much easier to train, especially as they get deeper and more complex. ReLUs are motivated partly by how biological neurons function—if activated, they output their input directly; if not, they stay silent. It simplifies the math and, as it turns out, improves training in practice.
Final Thoughts & Resources
This article is only the beginning. Stay tuned for our next deep dive into how neural networks learn, adapt, and sometimes surprise us. And if you’re keen to see more, subscribe—the next installment will cover the full training process and offer pointers to further reading and hands-on resources.
Special thanks to everyone supporting this work—especially on Patreon, and to voice-of-experience guests who make the big ideas clearer for everyone.