import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from matplotlib import pyplot as plt
from scipy.optimize import check_grad
# Load the digits dataset
digits = load_digits()
X = digits.data
y = digits.target
print(f"Dataset shape: X={X.shape}, y={y.shape}")
print(f"Unique labels: {np.unique(y)}")INSEA Techniques de réduction de dimension - 2025
TP 5: Optimization and supervised representations
Author: Hicham Janati
How to follow this lab:
- The goal is to understand AND retain in the long term: resist copy-pasting, prefer typing manually.
- Getting stuck while programming is completely normal: search online, use documentation, or use the AI.
- When prompting the AI, you must be specific. Explain that your goal is to learn, not to get an instant solution no matter what. Ask for short, explained answers with alternatives.
- NEVER ASK THE AI TO PRODUCE MORE THAN ONE LINE OF CODE!
- Adopt the Solve-It method: always try to solve a question or predict the output of code before running it. Learning happens when you confirm your understanding, and even more when you're wrong and surprised.
Part 1: Logistic Regression from Scratch
In this first part, we will implement logistic regression from scratch using only NumPy. This will help you understand the fundamental concepts of machine learning: loss functions, gradients, and optimization.
We’ll work with the digits dataset from scikit-learn (that we used last week), which contains images of handwritten digits (0-9). For simplicity, we’ll start with a binary classification problem: distinguishing digit 0 from digit 1.
1.1 Loading and preparing the data
Let’s start by loading the digits dataset and preparing it for binary classification:
Question 1:
Filter the dataset to keep only digits 0 and 1. Then split the data into training and test sets (use 80% for training, 20% for test). What are the shapes of your training and test sets?
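If you get stuck, here is a minimal sketch of one possible approach (the names mask, X_bin and y_bin are only suggestions, not imposed by the lab):
# Sketch (assumes X and y loaded above)
mask = (y == 0) | (y == 1)          # keep only digits 0 and 1
X_bin, y_bin = X[mask], y[mask]
X_train, X_test, y_train, y_test = train_test_split(
    X_bin, y_bin, test_size=0.2, random_state=0
)
print(X_train.shape, X_test.shape)  # roughly 80% / 20% of about 360 samples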
Question 2:
Normalize the features by subtracting the mean and dividing by the standard deviation. Why is this important? Apply the normalization to both training and test sets, but compute the mean and standard deviation only from the training set.
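A minimal sketch, assuming X_train and X_test from Question 1; the small constant 1e-8 is an arbitrary guard against features with zero variance:
# Statistics are computed on the training set only, then reused for the test set
mean_train = X_train.mean(axis=0)
std_train = X_train.std(axis=0)
X_train_norm = (X_train - mean_train) / (std_train + 1e-8)
X_test_norm = (X_test - mean_train) / (std_train + 1e-8)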
1.2 The logistic regression model
Logistic regression models the probability that a sample belongs to class 1 using the sigmoid function:
\[P(y=1 | x) = \sigma(w^T x) = \frac{1}{1 + e^{-w^T x}}\]
where \(w\) is the weight vector (including the bias term) and \(\sigma\) is the sigmoid function.
Question 3:
Implement the sigmoid function. What is the range of its output? What happens when the input is very large (positive or negative)?
def sigmoid(z):
"""
Compute the sigmoid function: sigma(z) = 1 / (1 + exp(-z))
Args:
z: Input (can be a scalar or array)
Returns:
Sigmoid of z
"""
# TODO: implement the sigmoid function
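If you want to sanity-check your answer, here is one possible sketch (not the only valid one); the clipping bound of 500 is an arbitrary safeguard against overflow warnings for very negative inputs:
def sigmoid(z):
    # clip to avoid overflow in exp(-z) when z is very negative
    z = np.clip(z, -500, 500)
    return 1.0 / (1.0 + np.exp(-z))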
Question 4:
Implement a function that computes the predictions (probabilities) for a given weight vector \(w\) and input data \(X\). The function should return probabilities for each sample.
def predict_proba(X, w):
"""
Compute the probability P(y=1 | x) for each sample in X.
Args:
X: Input data (n_samples, n_features)
w: Weight vector (n_features,)
Returns:
Probabilities (n_samples,)
"""
# test it with random weights
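A possible one-line implementation and a quick test, assuming the sigmoid above and the normalized data X_train_norm from Question 2:
def predict_proba(X, w):
    # probability P(y=1 | x) for each row of X
    return sigmoid(X @ w)

rng = np.random.default_rng(0)
w_random = rng.normal(size=X_train_norm.shape[1])
print(predict_proba(X_train_norm[:5], w_random))  # five probabilities in (0, 1)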
1.3 The loss function
For binary classification, we use the binary cross-entropy loss (also called log loss):
\[L(w) = -\frac{1}{n} \sum_{i=1}^{n} \left[ y_i \log(\sigma(w^T x_i)) + (1-y_i) \log(1-\sigma(w^T x_i)) \right]\]
This loss function penalizes confident wrong predictions more than uncertain ones.
Question 5:
Implement the loss function. What happens if \(\sigma(w^T x_i) = 0\) when \(y_i = 1\)? How can we avoid numerical issues?
def compute_loss(X, y, w):
"""
Compute the binary cross-entropy loss.
Args:
X: Input data (n_samples, n_features)
y: True labels (n_samples,)
w: Weight vector (n_features,)
Returns:
Loss value (scalar)
"""
return loss
# Test the loss function
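A possible sketch, assuming predict_proba from the previous question and the data from Questions 1-2; clipping the probabilities away from 0 and 1 is one simple way to avoid log(0):
def compute_loss(X, y, w, eps=1e-12):
    p = predict_proba(X, w)
    p = np.clip(p, eps, 1 - eps)  # guard against log(0)
    loss = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
    return loss

print(compute_loss(X_train_norm, y_train, w_random))  # sanity check with random weights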
1.4 Computing the gradient
To minimize the loss function using gradient descent, we need to compute its gradient with respect to the weights \(w\). Compute the gradient of the binary cross-entropy loss with pen and paper, then implement the gradient function. Verify that the output has the same shape as the weight vector \(w\).
def compute_gradient(X, y, w):
"""
Compute the gradient of the loss with respect to w.
Args:
X: Input data (n_samples, n_features)
y: True labels (n_samples,)
w: Weight vector (n_features,)
Returns:
Gradient vector (n_features,)
"""
return gradient
# Test the gradient
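A possible sketch, using the vectorized form of the gradient \(\nabla_w L(w) = \frac{1}{n} X^\top (\sigma(Xw) - y)\) and assuming predict_proba from above; the shape check is a quick sanity test:
def compute_gradient(X, y, w):
    n = X.shape[0]
    gradient = X.T @ (predict_proba(X, w) - y) / n
    return gradient

g = compute_gradient(X_train_norm, y_train, w_random)
print(g.shape)  # should match w_random.shape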
1.5 Gradient checking
Before implementing gradient descent, it’s crucial to verify that our gradient computation is correct. We can do this by computing a numerical approximation of the gradient and comparing it with our analytical gradient.
Question 7:
You can use scipy.optimize.check_grad to verify that your gradient implementation of the loss is correct. Here is an example with the function \[ x \mapsto f(x, a, b) = a \|x\|^2 + b^\top x \] whose gradient is given by \[ \nabla_x f(x, a, b) = 2 a x + b\] Adapt this pattern to your own loss and gradient; a possible sketch is given after the example below.
# Wrapper functions for check_grad
import numpy as np
from scipy.optimize import check_grad
dim = 10
a = 5
b = np.random.randn(dim)
def f(x, a, b):
return a * np.linalg.norm(x)**2 + b @ x
def grad_f(x, a, b):
return 2 * a * x + b
def loss_wrapper(x):
return f(x, a, b)
def grad_wrapper(x):
return grad_f(x, a, b)
x_check = np.random.randn(dim)
error = check_grad(loss_wrapper, grad_wrapper, x_check)
print(f"Gradient check error: {error:.2e}")1.6 Gradient descent
1.6 Gradient descent
Now that we’ve verified our gradient, we can implement gradient descent to minimize the loss function. The update rule is:
\[w_{t+1} = w_t - \alpha \nabla_w L(w_t)\]
where \(\alpha\) is the learning rate.
Question 8:
Implement gradient descent. The function should:
1. Initialize the weights (you can use zeros or small random values)
2. For each iteration:
   - Compute the gradient
   - Update the weights
   - Optionally store the loss for visualization
3. Return the final weights and the history of losses
Visualize the loss curve as a function of the number of iterations.
def gradient_descent(X, y, learning_rate=0.01, n_iterations=1000, verbose=True):
"""
Perform gradient descent to minimize the loss.
Args:
X: Input data (n_samples, n_features)
y: True labels (n_samples,)
learning_rate: Step size for gradient descent
n_iterations: Number of iterations
verbose: Whether to print progress
Returns:
w: Final weight vector
loss_history: List of loss values at each iteration
"""
1.7 Effect of learning rate
The learning rate is a crucial hyperparameter. If it’s too small, convergence is slow. If it’s too large, the algorithm might diverge or oscillate.
Question 10:
Run gradient descent with different learning rates (e.g., 0.001, 0.01, 0.1, 1.0) and compare:
- The convergence speed
- The final loss value
- Whether the algorithm converges or diverges
Visualize the loss curves for different learning rates on the same plot and explain what you observe.
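A possible sketch, reusing the gradient_descent function above; 500 iterations is an arbitrary choice to keep the comparison fast:
# Same data, several learning rates on one plot
for lr in [0.001, 0.01, 0.1, 1.0]:
    _, losses_lr = gradient_descent(X_train_norm, y_train,
                                    learning_rate=lr, n_iterations=500, verbose=False)
    plt.plot(losses_lr, label=f"lr = {lr}")
plt.xlabel("Iteration")
plt.ylabel("Loss")
plt.legend()
plt.show()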
1.8 Evaluating on test data
Now that we’ve trained our model, we need to evaluate its performance on unseen test data. This is crucial to assess whether our model generalizes well.
Question 12:
Implement a function that:
1. Computes predictions (probabilities) for the test data
2. Converts probabilities to binary predictions (threshold = 0.5)
3. Computes the accuracy: (number of correct predictions) / (total number of samples)
What is the accuracy on the training set? On the test set? Are they similar?
def predict(X, w, threshold=0.5):
"""
Make binary predictions.
Args:
X: Input data (n_samples, n_features)
w: Weight vector (n_features,)
threshold: Decision threshold
Returns:
Binary predictions (n_samples,)
"""
def compute_accuracy(X, y, w):
"""
Compute the accuracy of predictions.
Args:
X: Input data (n_samples, n_features)
y: True labels (n_samples,)
w: Weight vector (n_features,)
Returns:
Accuracy (scalar between 0 and 1)
"""
Part 2: Neural Networks with PyTorch
In this second part, we’ll explore neural networks using PyTorch. We’ll build a neural network with one hidden layer and learn about automatic differentiation, which is one of the key features that makes deep learning frameworks powerful.
2.1 Introduction to PyTorch
PyTorch is a deep learning framework that provides automatic differentiation (autograd). This means we don’t need to manually compute gradients—PyTorch tracks operations and can compute gradients automatically.
Let’s start by importing PyTorch and understanding tensors:
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import TensorDataset, DataLoader
print(f"PyTorch version: {torch.__version__}")
# Create a simple tensor
x = torch.tensor([1.0, 2.0, 3.0])
print(f"Tensor x: {x}")
print(f"Tensor shape: {x.shape}")
print(f"Tensor dtype: {x.dtype}")2.2 Automatic Differentiation (Autograd)
The key feature of PyTorch is automatic differentiation. When we create a tensor with requires_grad=True, PyTorch tracks all operations on it and can compute gradients automatically.
Question 14:
Run the following code and explain what happens. What is the difference between requires_grad=True and requires_grad=False?
# Create a tensor that requires gradient computation
x = torch.tensor([2.0], requires_grad=True)
print(f"x: {x}")
print(f"x.requires_grad: {x.requires_grad}")
# Define a simple function: y = x^2
y = x ** 2
print(f"y: {y}")
print(f"y.requires_grad: {y.requires_grad}")
print(f"y.grad_fn: {y.grad_fn}") # This shows the operation that created y
# Compute the gradient
y.backward() # This computes dy/dx
print(f"x.grad: {x.grad}") # Should be 2*x = 4.0Question 15:
Try a more complex function: \(z = x^2 + 2xy + y^2\) where \(x=2\) and \(y=3\). Compute \(\frac{\partial z}{\partial x}\) and \(\frac{\partial z}{\partial y}\) using PyTorch’s autograd. Verify manually that the gradients are correct.
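One possible sketch if you want to check your approach (note that it reuses the names x and y from the question statement):
# Partial derivatives of z = x^2 + 2xy + y^2 at x=2, y=3
x = torch.tensor(2.0, requires_grad=True)
y = torch.tensor(3.0, requires_grad=True)
z = x**2 + 2*x*y + y**2
z.backward()
print(x.grad)  # dz/dx = 2x + 2y = 10
print(y.grad)  # dz/dy = 2x + 2y = 10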
2.3 Building a Neural Network
Now let’s build a neural network with one hidden layer. We’ll use PyTorch’s nn.Module class, which provides a convenient way to define neural networks.
Our network will have:
- Input layer: 64 features (the 8x8 image flattened)
- Hidden layer: 32 neurons with ReLU activation
- Output layer: 10 neurons (one for each digit 0-9); the softmax is applied implicitly by the cross-entropy loss during training
Question 17:
We create a neural network class that inherits from nn.Module. What is the purpose of the __init__ and forward methods?
class NeuralNetwork(nn.Module):
def __init__(self, input_size=64, hidden_size=32, output_size=10):
"""
Initialize the neural network.
Args:
input_size: Number of input features
hidden_size: Number of neurons in the hidden layer
output_size: Number of output classes
"""
super(NeuralNetwork, self).__init__()
self.fc1 = nn.Linear(input_size, hidden_size) # Input to hidden
self.fc2 = nn.Linear(hidden_size, output_size) # Hidden to output
self.relu = nn.ReLU() # Activation function
def forward(self, x):
"""
Forward pass through the network.
Args:
x: Input tensor (batch_size, input_size)
Returns:
Output tensor (batch_size, output_size)
"""
x = self.fc1(x)
x = self.relu(x)
x = self.fc2(x)
return x
# Create an instance of the network
model = NeuralNetwork(input_size=64, hidden_size=32, output_size=10)
print(model)
# Test with a random input
x_test = torch.randn(5, 64) # Batch of 5 samples
output = model(x_test)
print(f"\nInput shape: {x_test.shape}")
print(f"Output shape: {output.shape}")2.4 Preparing the Data
Before training, we need to convert our NumPy arrays to PyTorch tensors and create data loaders for efficient batch processing.
Question 18:
We convert the digits dataset to PyTorch tensors. Use all 10 classes (not just 0 and 1). What is the difference between torch.tensor() and torch.from_numpy()?
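A small illustration of the difference, assuming nothing beyond the imports above: torch.from_numpy shares memory with the NumPy array, while torch.tensor makes a copy:
a = np.zeros(3)
t_shared = torch.from_numpy(a)   # view on the same buffer as a
t_copy = torch.tensor(a)         # independent copy
a[0] = 7.0
print(t_shared)  # reflects the change in a
print(t_copy)    # unchanged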
# Load full digits dataset (all 10 classes)
digits = load_digits()
X_full = digits.data
y_full = digits.target
# Split into train and test
X_train_full, X_test_full, y_train_full, y_test_full = train_test_split(
X_full, y_full, test_size=0.2, random_state=42
)
# Normalize
mean_train_full = X_train_full.mean(axis=0)
std_train_full = X_train_full.std(axis=0)
X_train_full_norm = (X_train_full - mean_train_full) / (std_train_full + 1e-8)
X_test_full_norm = (X_test_full - mean_train_full) / (std_train_full + 1e-8)
# Convert to PyTorch tensors
X_train_tensor = torch.from_numpy(X_train_full_norm).float()
y_train_tensor = torch.from_numpy(y_train_full).long()
X_test_tensor = torch.from_numpy(X_test_full_norm).float()
y_test_tensor = torch.from_numpy(y_test_full).long()
print(f"Training set: {X_train_tensor.shape}, {y_train_tensor.shape}")
print(f"Test set: {X_test_tensor.shape}, {y_test_tensor.shape}")
# Create data loaders for batch processing
train_dataset = TensorDataset(X_train_tensor, y_train_tensor)
test_dataset = TensorDataset(X_test_tensor, y_test_tensor)
batch_size = 32
train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=batch_size, shuffle=False)
print(f"\nNumber of batches in training set: {len(train_loader)}")
print(f"Batch size: {batch_size}")2.5 Training the Model
Now we’ll train the neural network. The training loop involves:
1. Forward pass: compute predictions
2. Compute the loss
3. Backward pass: compute gradients
4. Update the weights using an optimizer
We implement the training loop using:
- The cross-entropy loss (nn.CrossEntropyLoss)
- The stochastic gradient descent optimizer (optim.SGD)
- A learning rate of 0.01
Train for 100 epochs and print the loss every 10 epochs.
# Create model, loss function, and optimizer
model = NeuralNetwork(input_size=64, hidden_size=32, output_size=10)
criterion = nn.CrossEntropyLoss() # Loss function for multi-class classification
optimizer = optim.SGD(model.parameters(), lr=0.01) # Stochastic Gradient Descent
# Training loop
n_epochs = 100
train_losses = []
for epoch in range(n_epochs):
epoch_loss = 0.0
n_batches = 0
# Iterate over batches
for batch_X, batch_y in train_loader:
optimizer.zero_grad()
# Forward pass
outputs = model(batch_X)
# Compute loss
loss = criterion(outputs, batch_y)
# Backward pass
loss.backward()
# Update weights
optimizer.step()
epoch_loss += loss.item()
n_batches += 1
avg_loss = epoch_loss / n_batches
train_losses.append(avg_loss)
if (epoch + 1) % 10 == 0:
print(f"Epoch [{epoch+1}/{n_epochs}], Loss: {avg_loss:.4f}")
# Plot training loss
plt.figure(figsize=(10, 5))
plt.plot(train_losses)
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.title('Training Loss')
plt.grid(True)
plt.show()
Question 20:
Evaluate the model on the test set. Compute the accuracy. How does it compare to the training accuracy?
def evaluate_model(model, data_loader):
"""
Evaluate the model on a dataset.
Returns:
accuracy: Classification accuracy
"""
model.eval() # Set model to evaluation mode (disables dropout, etc.)
correct = 0
total = 0
with torch.no_grad(): # Disable gradient computation for efficiency
for batch_X, batch_y in data_loader:
outputs = model(batch_X)
_, predicted = torch.max(outputs.data, 1) # Get predicted class
total += batch_y.size(0)
correct += (predicted == batch_y).sum().item()
accuracy = correct / total
return accuracy
# Evaluate on training and test sets
train_accuracy = evaluate_model(model, train_loader)
test_accuracy = evaluate_model(model, test_loader)
print(f"Training accuracy: {train_accuracy:.4f}")
print(f"Test accuracy: {test_accuracy:.4f}")