In machine learning, a classifier assigns a class label to data points, such as identifying objects in an image. A Convolutional Neural Network (CNN) is a specialized type of neural network that excels at this task, particularly for image data.
Components of a CNN:
- Tensors: These are n-dimensional arrays (a generalization of matrices) that hold the data. In CNNs, the tensors passed between layers are usually 3-dimensional (height × width × channels), except at the output layer.
- Neurons: A neuron is a function that takes multiple inputs and yields a single output. In convolutional layers, each neuron's output is a 2D activation map.
- Layers: A collection of neurons performing the same operation. Layers in CNNs include convolutional, pooling, and fully connected layers.
- Kernel Weights and Biases: These per-neuron parameters are learned during training, which lets the classifier adapt to the task and dataset. The kernel weights determine which features a convolutional neuron extracts.
How CNNs Work:
- Input Layer: Represents the input image as a 3D tensor with three channels, one each for the red, green, and blue color components.
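- Convolutional Layers: These layers extract features by sliding learned kernels over their input, which is the image for the first convolutional layer and the previous layer's activation maps after that. Each kernel is a small matrix of weights; at every position it is multiplied elementwise with the overlapping patch of the input, and the products are summed (and a bias added) to produce one value of an activation map.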
- Pooling Layers: These layers reduce the spatial dimensions (height and width) of their input, which cuts the number of parameters in later layers and the overall computation. Max-pooling, a common choice, keeps the maximum value from each kernel-sized window of the input.
- Flatten Layer: Converts the 3D stack of activation maps into a 1D vector, which is then fed into fully connected layers for classification.
- Activation Functions:
  - ReLU: Adds non-linearity to the model by applying max(0, x) to each value, replacing negative values with zero; this lets the network learn patterns that a purely linear model cannot.
  - Softmax: Converts the output layer's logits (raw class scores) into probabilities that sum to 1 across all classes.
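The convolution step from the Convolutional Layers item can be sketched in plain Python. This is a minimal single-channel illustration, not how CNN Explainer or TensorFlow.js implement it: the function name `conv2d_single` and the sample image and kernel are hypothetical, and the sketch assumes stride 1 with no padding. As in deep learning frameworks, the kernel is not flipped, so strictly speaking this computes a cross-correlation.

```python
def conv2d_single(image, kernel, bias=0.0):
    """Slide `kernel` over a 2D `image` (stride 1, no padding).

    Hypothetical helper: at each position the kernel is multiplied
    elementwise with the overlapping patch, the products are summed,
    and the bias is added, giving one value of the activation map.
    """
    k_h, k_w = len(kernel), len(kernel[0])
    out_h = len(image) - k_h + 1
    out_w = len(image[0]) - k_w + 1
    activation_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = bias
            for a in range(k_h):
                for b in range(k_w):
                    total += image[i + a][j + b] * kernel[a][b]
            row.append(total)
        activation_map.append(row)
    return activation_map


# Hypothetical 4x4 single-channel input and 2x2 kernel.
image = [
    [1, 2, 3, 0],
    [0, 1, 2, 3],
    [3, 0, 1, 2],
    [2, 3, 0, 1],
]
kernel = [
    [0, 1],
    [-1, 0],
]
activation_map = conv2d_single(image, kernel)
print(activation_map)  # 3x3 map: a 2x2 kernel fits 3 times along each axis
```

For a multi-channel input, a convolutional neuron applies one such kernel per input channel and sums the resulting maps before adding its single bias.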
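Max-pooling, as described in the Pooling Layers item, can be sketched the same way. The sketch below assumes a hypothetical `max_pool2d` helper with a 2×2 window and stride 2, a common configuration that halves the height and width of an activation map.

```python
def max_pool2d(feature_map, size=2, stride=2):
    """Max-pool one 2D activation map: keep the largest value in each window.

    Hypothetical helper; windows that would run past the edge are skipped.
    """
    out_h = (len(feature_map) - size) // stride + 1
    out_w = (len(feature_map[0]) - size) // stride + 1
    return [
        [
            max(
                feature_map[i * stride + a][j * stride + b]
                for a in range(size)
                for b in range(size)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


# Hypothetical 4x4 activation map.
fmap = [
    [1, 3, 2, 0],
    [4, 6, 1, 1],
    [0, 2, 5, 7],
    [1, 1, 3, 8],
]
pooled = max_pool2d(fmap)  # 4x4 -> 2x2
print(pooled)  # [[6, 2], [2, 8]]
```

Pooling has no weights to learn; it only shrinks the activation maps, which is why the layers that follow end up with fewer parameters.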
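The Flatten Layer item amounts to reading every value of the 3D tensor into one list. This sketch assumes a hypothetical `flatten` helper and a channels-first layout stored as nested lists indexed `[channel][row][col]`; frameworks that store tensors channels-last (TensorFlow's default) flatten the same values in a different order, but the vector length is identical.

```python
def flatten(tensor):
    """Flatten a 3D tensor stored as nested lists [channel][row][col] into 1D.

    Assumes a channels-first layout; a channels-last layout yields the
    same values in a different order.
    """
    return [value for channel in tensor for row in channel for value in row]


# Hypothetical stack of two 2x2 activation maps (shape 2 x 2 x 2).
maps = [
    [[1, 2], [3, 4]],
    [[5, 6], [7, 8]],
]
vector = flatten(maps)
print(vector)  # [1, 2, 3, 4, 5, 6, 7, 8]
```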
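The two activation functions listed above are short enough to write out directly. This is a plain-Python sketch with hypothetical `relu` and `softmax` helpers operating on flat lists; the softmax subtracts the largest logit before exponentiating, a standard trick that avoids overflow without changing the resulting probabilities.

```python
import math


def relu(values):
    """ReLU applied elementwise: max(0, x) for each value."""
    return [max(0.0, v) for v in values]


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1.

    Subtracting the maximum logit before exponentiating avoids overflow
    and does not change the result.
    """
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


print(relu([-2.0, 0.5, 3.0]))  # [0.0, 0.5, 3.0]

# Hypothetical logits for a three-class output layer.
probs = softmax([2.0, 1.0, 0.1])
print(probs, sum(probs))
```

Softmax preserves the ordering of the logits, so the class with the highest raw score also receives the highest probability.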
Hyperparameters:
- Padding: Adds extra pixels around the border of the input, typically so that the output keeps the same spatial dimensions as the input. Zero-padding, which fills these pixels with zeros, is the most common choice.
- Kernel Size: Determines the dimensions of the sliding window over the input. Smaller kernels capture finer details, while larger kernels capture broader features.
- Stride: Specifies the number of pixels the kernel moves at each step. A smaller stride makes neighboring windows overlap more, producing a larger output and extracting more features; a larger stride shrinks the output's spatial dimensions more quickly.
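Zero-padding, from the Padding item, can be illustrated with a hypothetical `zero_pad` helper that surrounds a 2D input with `p` rows and columns of zeros on every side.

```python
def zero_pad(image, p):
    """Return a copy of a 2D `image` with `p` rows/columns of zeros on every side."""
    width = len(image[0]) + 2 * p
    padded = [[0] * width for _ in range(p)]  # top border
    for row in image:
        padded.append([0] * p + list(row) + [0] * p)  # left and right borders
    padded.extend([0] * width for _ in range(p))  # bottom border
    return padded


padded = zero_pad([[1, 2], [3, 4]], 1)
for row in padded:
    print(row)
```

With a 3×3 kernel and stride 1, one ring of zero-padding is exactly enough for the output to keep the input's height and width.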
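Padding, kernel size, and stride together determine the spatial size of a convolution's output. Along one dimension, with input size W, kernel size K, padding P, and stride S, the standard relation is floor((W - K + 2P) / S) + 1, assuming the kernel must fit entirely inside the padded input. The hypothetical `conv_output_size` helper below evaluates it; the 64-pixel input is only an illustrative value.

```python
def conv_output_size(input_size, kernel_size, stride=1, padding=0):
    """Spatial output size along one dimension: floor((W - K + 2P) / S) + 1."""
    return (input_size - kernel_size + 2 * padding) // stride + 1


print(conv_output_size(64, 3))                        # 62: no padding shrinks the map
print(conv_output_size(64, 3, padding=1))             # 64: padding preserves the size
print(conv_output_size(64, 3, stride=2, padding=1))   # 32: stride 2 roughly halves it
```

The same relation gives the output size of a pooling layer; for example, a 2×2 window with stride 2 maps 64 to 32.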
Interactive Features:
- Upload and classify your own images.
- Adjust activation map colorscales and hyperparameters to see their impact.
- Explore network details and operations through interactive visualizations.
Implementation and Development: CNN Explainer uses TensorFlow.js for in-browser deep learning, with visualizations created using JavaScript, Svelte, and D3.js. The project was developed by a team from Georgia Tech and Oregon State, with support from various institutions and grants.