INSEA             Techniques de réduction de dimension - 2025

TP 2: Numpy: introduction to scientific computing

            Author: Hicham Janati

How to follow this lab:

The goal is to understand AND retain in the long term: resist copy-pasting, prefer typing manually.
Getting stuck while programming is completely normal: search online, use documentation, or use the AI.
When prompting the AI, you must be specific. Explain that your goal is to learn, not to get an instant solution no matter what. Ask for short, explained answers with alternatives.
NEVER ASK THE AI TO PRODUCE MORE THAN ONE LINE OF CODE!
Adopt the Solve-It method: always try to solve a question or predict the output of code before running it. Learning happens when you confirm your understanding—and even more when you’re wrong and surprised.

2 - Scientific computing

2.1 Floating point numbers

Floats in Python are represented using binary fractions / powers in base 2. Numbers between 0-1 for example are sums of the form: \[ \sum_{k=1} \frac{a_k}{2^k} \] with the \(a_k\) being equal to either 0 or 1.

You can check how numbers are represented (and whether this representation is exact) using decimal

from decimal import Decimal

print(Decimal(0.1))

Question 1: explain the following results

0.1

0.1 + 0.1 + 0.1

0.1 + 0.1 + 0.1 + 0.1

a = 1e16
b = -1e16
c = 1.0

(a + b) + c

a + (b + c)

Always be careful of the order of magnitude of the numbers you’re manipulating ! For more information on how floating pointing numbers work, see Python doc – floats

2.2 Computing with Python types is slow !

Calculating a sum with a loop using built-in ints. You can “time” the operation with the magic command %%time.

%%time

N = 10_000_000 # underscores in whole numbers are ignored by Python

numbers = []  # create an empty list
for ii in range(N):
    numbers.append(ii) # add the number to the list
total = sum(numbers)

print(f"The total is {total}")

The %%time at the beginning of the cell above is an example of magic commands. It allows you to measure the time the processor (CPU) took to execute the entire cell. Magic commands with a single percent sign apply only to a single line.

%time print("The total is", sum([ii for ii in range(N)]))

Always prefer list comprehensions !

Lists in python can contain any mix of types, to run the sum above, Python needs to do type checks at each step before computing the +. Moreover, everything in Python is an object, even integers which have their own attributes and methods, making built-in types even slower. Numpy allows to circumvent this limitation by creating arrays where elements have the same type.

2.3 Numpy arrays

You can create a simple numpy array using:

import numpy as np

x = np.arange(5)
x

We can check its type:

x.dtype

You can change its type:

x = np.ones(5)
x

x = np.ones(5).astype(int)
x

Numpy arrays can handle all elementary operations element-wise:

x = np.ones(5)
print(x)
x = x + 1
print(x)

you can create a uniform grid of floats between -10, 10 using linspace and compute the function \(x \mapsto x^2\), again power is applied element-wise:

x = np.linspace(-10, 10, 21)
print(x)
y = x ** 2
print(y)

comparisons are also applied element-wise:

x = np.linspace(-10, 10, 21)
print(x)
print(x > 0)

If you pass a boolean array to an array, it will return the elements of the array where the boolean is true:

x[x > 0]

you also have methods you can use to compute stats or transform the array:

x = np.random.randint(-5, 10, 10)
print(x)
print(x.sum(), x.max(), x.min(), x.mean())

print(f"absolute value of x: {np.abs(x)}")

Question 2

Filter out in this random array keeping only numbers such that |x| > 1:

x = np.random.randn(10)

Numpy integers and Python (built-in) ints are not the same:

Question 3: Run the following cells, explain:

We compute powers of 2:

x = np.ones(1).astype(int) * 2
y = x ** 62
y

y = x ** 63
y

But using built-in integers:

y = 2 ** 62
y

y = 2 ** 63
y

Back to the sum computation. With numpy, we can run it in one line:

%time print(np.arange(N).sum())

As you can see, NumPy is a lot faster than pure Python sums, and the code is much shorter. NumPy should always be preferred over Python’s native lists when working solely with numbers and matrices. In a Python list, each int is an entire object (not a fixed 64 bits, it can be as large as your RAM !), to do a sum, Python needs to loop over each object, check its type, unpack it to get its value then compute the sum. In Numpy, arrays are continguous bytes in memory (each number next to the other): no type checks, almost pure CPU computations.

Moreover, when doing vector or matrix computations, NumPy vectorizes operations: instead of going through rows/columns one by one in a for loop, the operations (dot products) are applied simultaneously. This is possible because Numpy runs in C code not in pure Python. The main reason is that C allows for using multiple processor threads to run the code in parallel. This is not possible in Python because it has a Global Interpreter Lock (GIL) which only allows for one single thread.

Note: Since Python 3.14, it’s now possible to release the GIL in Python and use multiple threads. This feature is still experimental and not supported by most libraries.

The dot product between two arrays ( x ) and ( y ) of length ( n ) is defined as:
\[ \langle x, y \rangle = x^{\top} y = \sum_{i=1}^n x_i y_i. \]

Question 4:

Complete the following cells to compare the speed of dot products using native Python loops and the numpy.dot operation to see how much faster numpy is:

%%time

N = 10_000_000
x = np.random.randn(N)  # creates a list of random numbers following the Gaussian distribution
y = np.random.randn(N)

result_loops = 0

## To do 
for ii in range(N):
    result_loops += x[ii] * y[ii]

%%time
result_numpy = x @ y ## to do

Numpy slicing

NumPy offers a simple way to select subsets of an array, called slicing. To obtain a slice from the 3rd to the 5th element, for example:

x = np.arange(10)
print("All the array: ", x)
print("A slice: ", x[2:5])

Remember that Python starts indexing at 0 and that a slice [start:end] includes start but excludes end. If start is omitted, it is set to 0, and if end is omitted, it corresponds to the last index of the array. To select the first 6 elements:

x[:6]

To exclude the last element, you can use negative indices. For example:

x[:-1]

x[:-3]

Slices can also have a third parameter called step. So far, we have omitted this argument, which defaults to 1. Starting from 0, we can select the even indices by using a step of 2:

x[0:10:2]

We can omit the start and end arguments since they don’t do anything in this case:

x[::2]

Question 5:

Create a slice that selects the odd numbers in reverse order using a single slice:

Numpy views vs copies

Are numpy arrays like lists when it comes to slices ? Does a slice copy the array ?

x = np.arange(10)
y = x[:4]

print(f"x: {x} | id: {id(x)}")
print(f"y: {y} | id: {id(y)}")

Question 6: `id` is not the same ! is it a copy ? Change elements of `y`, does `x` change ?

Check the flags attribute of x and y: (x.flags), do you notice any difference ?

y is a different Python object (different id) but they point to the same underlying data. y is called a view. Numpy array creates views with all these operations: - slices y = x[2:5] - reshapes y = x.reshape(2, 5) - transposes y = x.T

You can check whether two arrays share the same data:

np.shares_memory(x, y)

If you want to create a copy of the array, you should always use the copy method:

y = x.copy()
np.shares_memory(x, y)

id(y.data)

2.4 Linear algebra with numpy

You can create a simple identity matrix:

A = np.eye(3)
A

Just like with vectors, all operations are element-wise:

B = 2 * A + 0.5
B

C = B - 10 * np.eye(3)
C

C > 0

The shape of a numpy array is by default not a column and not a row: just an array. To make it a row or a column, we must add an “imaginary” axis to change its shape:

a = np.array([1, 2, 3])
a.shape

b = a[np.newaxis, :]
b.shape

c = a[:, np.newaxis]
c.shape

This is equivalent to a reshape:

b = a.reshape(3, 1)
b.shape

The numpy dot operation takes into account the shapes before computing: if the inputs do not have 2D shapes (simple arrays) the assumption is that they are columns by default, so dot computes a scalar product:

x, y = np.arange(10), np.arange(10)
print(f"shape of x: {x.shape} | shape of y: {y.shape}")
x.dot(y)

Question 7:

Reshape the arrays such that you implement specifically the row column product \(x^\top y\). What changed compared to the result above ?

The dot operation is not only for scalar product, it conducts any matrix-matrix operation as long as the shapes are compatible.

Question 8:

We want to implement the square 10x10 matrix with general term \(x_i y_j\). How can you do this with reshapes and dot ?

Question 9

In the last example, the dot is equivalent to * (there is no sum !) does * give the same result ?

Question 10:

Keeping the reshapes you used above, what is the term of the general matrix you obtain if you replace * with + or - ?

With numpy, you can create 3D tensors like:

x = np.arange(12).reshape(2, 2, 3)
print(x.shape)

Question 11:

Generalizing the logic with two vectors, how can you create the tensor with general term \(X_{ijk} = a_i + b_j + c_k\) ?

a, b, c = np.arange(5), np.arange(5), np.arange(5)

We can apply operations (max, min, mean, sum) on matrices:

A = np.arange(10).reshape(2, 5)
print(A)
print(f"sum of A: {A.sum()}")

We can also choose to apply it on the rows only or columns only. Axis 0 means: we sum over the rows (sum over i), Index 1 we sum over the columns (sum over j):

A.sum(axis=0)

A.sum(axis=1)

Question 11:

All numpy methods that “aggregate” data have an axis argument. Using this arg for the np.linalg.norm function, compute the L2 norm of each of rows of this matrix:

A = np.random.randn(10, 5)

Question 12:

We have two vectors \(a\) and \(b\). We would like to compute the distance matrix \(D_{ij} = (a_i - b_j)^2\). Implement this using built-in Python and Numpy. Which is faster ?

a = np.random.randn(10)
b = np.random.randn(20)

Question 13:

We have two datasets \(A\) and \(B\) . We would like to compute the distance matrix \(D_{ij} = \|A_i - B_j\|^2\). Implement this using built-in Python and Numpy.

A = np.random.randn(10, 4)
B = np.random.randn(8, 4)

Question 14

How can you use the functionnp.linalg.norm to solve the last question ? which is faster ?

2.5 Matrix inverse and solving linear systems

We can solve linear systems like \(Ax = b\) and find the unknown \(x\) using:

A = np.random.randn(10, 10)
b = np.random.randn(10)

x = np.linalg.solve(A, b)
print(x)

[-1.23075777 -1.38216739  0.156997    1.2443776  -0.94236743 -0.47399296
 -1.59190301 -1.57523578 -0.78723647  0.70608696]

To evaluate the accuracy, never compare floats exactly with == since floats are approximations (Section 1). Instead we use approximate comparisons:

abs(A.dot(x) - b).max()

np.float64(1.4988010832439613e-15)

which are used in the comparisons functions of numpy like:

np.allclose(A.dot(x), b)

True

We can also compute the solution by inverting the matrix:

Ainv = np.linalg.inv(A)
x2 = Ainv.dot(b)

abs(x2 - x).max()

np.float64(1.2212453270876722e-15)

Both solutions are identical.

Question 15

Using different sizes of A and b, compare the time it takes to solve the linear system vs inverting the matrix.

Question 16

Compare both solutions with this matrix A and vector b, how can you explain this ?

grid = np.arange(5)
A = np.arange(25).reshape(5, 5) / 25
b = np.linspace(0, 1, 5)
A[0] = A[1:].sum(axis=0) + 1e-10

x = np.linalg.solve(A, b)
x2 = np.linalg.inv(A).dot(b)
abs(x2 - x).max()

np.float64(1.5)

Question 17:

You need to implement a formula involving a matrix inverse like: \[ y = \| A^\top B^{-1} z \| \] What is the best implementation ?

A = np.random.randn(10, 10)
B = np.random.randn(10, 10)
z = np.random.randn(10)

y = # todo

TP 2: Numpy: introduction to scientific computing

How to follow this lab:

2 - Scientific computing

2.1 Floating point numbers

Question 1: explain the following results

2.2 Computing with Python types is slow !

2.3 Numpy arrays

Question 2

Question 3: Run the following cells, explain:

Question 4:

Numpy slicing

Question 5:

Numpy views vs copies

Question 6: id is not the same ! is it a copy ? Change elements of y, does x change ?

2.4 Linear algebra with numpy

Question 7:

Question 8:

Question 9

Question 10:

Question 11:

Question 11:

Question 12:

Question 13:

Question 14

2.5 Matrix inverse and solving linear systems

Question 15

Question 16

Question 17:

Question 6: `id` is not the same ! is it a copy ? Change elements of `y`, does `x` change ?