
Introduction to NumPy

What is NumPy?

NumPy (Numerical Python) is a Python library for working with arrays of numbers. It underpins much of the Python data science stack: Pandas, scikit-learn, and OpenCV's Python bindings build directly on NumPy arrays, and TensorFlow and PyTorch convert to and from them.

Why not just use Python lists?

You can store numbers in a Python list:

nums = [1, 2, 3, 4, 5]

But:

  1. Slow for math: Python lists do everything element-by-element with overhead.
  2. Verbose: no built-in array + 5 to add 5 to every element.
  3. Memory-heavy: each element is a full Python object.

NumPy fixes all three:

import numpy as np

# Add 5 to every element — without writing a loop
a = np.array([1, 2, 3, 4, 5])
print(a + 5)

# Multiply every element by 10
print(a * 10)

# Compute statistics
print("sum:", a.sum())
print("mean:", a.mean())
print("max:", a.max())

To do that with plain lists you'd need:

nums = [1, 2, 3, 4, 5]
result = [x + 5 for x in nums]    # comprehension
total = sum(nums)
mean = total / len(nums)

NumPy is FAST

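A vectorized operation is often 10-100× faster than the equivalent Python loop; the exact figure depends on the operation, the array size, and your machine. Run this to measure it yourself: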

import numpy as np
import time

N = 1_000_000

# Pure Python
py_list = list(range(N))
t = time.perf_counter()    # high-resolution timer meant for benchmarking
result = [x * 2 for x in py_list]
py_time = time.perf_counter() - t

# NumPy
np_array = np.arange(N)
t = time.perf_counter()
result = np_array * 2
np_time = time.perf_counter() - t

print(f"Python list comprehension: {py_time*1000:.1f} ms")
print(f"NumPy vectorized:          {np_time*1000:.1f} ms")
print(f"Speedup: {py_time/np_time:.0f}x")

NumPy is fast because:

  • Operations happen in compiled C code, not Python.
  • Data lives in a contiguous block of memory, so there is no pointer chasing.
  • Each element is a fixed-size raw number, not a Python object.
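The memory points above can be inspected directly. This is a rough sketch, assuming CPython: sys.getsizeof is only an approximation of a list's true footprint, and the variable names are illustrative.

```python
import sys
import numpy as np

a = np.arange(1_000, dtype=np.int64)

# Every element occupies the same number of bytes
print("bytes per element:", a.itemsize)              # 8 for int64
print("total data bytes :", a.nbytes)                # itemsize * number of elements
print("contiguous       :", a.flags["C_CONTIGUOUS"])

# A list stores pointers to separate Python int objects,
# so count the list itself plus every object it points to
py_list = list(range(1_000))
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(x) for x in py_list)
print("list + int objects (approx.):", list_bytes)
```

The list figure should come out several times larger than the array's nbytes, though the exact ratio varies by Python version and platform.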

Installing NumPy

In a real Python environment:

pip install numpy

In this tutorial, NumPy is already loaded — every ▶ Run block has it available.

A peek at what NumPy can do

import numpy as np

# 1D array
a = np.array([1, 2, 3, 4, 5])
print("1D:", a)

# 2D array (matrix)
b = np.array([[1, 2, 3],
              [4, 5, 6]])
print("2D:")
print(b)
print("shape:", b.shape)
print("dtype:", b.dtype)

# Quick generators
print("zeros:", np.zeros(5))
print("ones :", np.ones((2, 3)))
print("range:", np.arange(0, 10, 2))
print("evenly spaced:", np.linspace(0, 1, 5))

Where NumPy gets used

Field                 What for
Data Science / ML     Feature matrices, statistics, distance calculations
Computer Vision       Images are just 3D arrays (height × width × RGB)
Scientific Computing  Simulations, signal processing, optimization
Finance               Time series, options pricing, risk models
Pandas                DataFrames are wrappers around NumPy arrays
Deep Learning         Tensors are NumPy-style arrays on a GPU
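To make the computer vision row concrete, here is a minimal sketch that builds a made-up 2 × 3 pixel image by hand. The pixel values and the naive channel-average grayscale are assumptions for illustration, not how any particular imaging library does it.

```python
import numpy as np

# A made-up 2 x 3 pixel image: height x width x RGB, values 0-255
image = np.zeros((2, 3, 3), dtype=np.uint8)
image[0, 0] = [255, 0, 0]      # top-left pixel is pure red
image[1, 2] = [0, 0, 255]      # bottom-right pixel is pure blue

print("shape:", image.shape)   # (2, 3, 3)
print("red channel:")
print(image[:, :, 0])

# Naive grayscale: average the three channels of each pixel
gray = image.mean(axis=2)
print("gray shape:", gray.shape)   # (2, 3)
```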

What you'll learn in this tutorial

# Chapter
2 Creating arrays
3 Array attributes — shape, dtype, ndim
4 Indexing & slicing
5 Reshaping
6 Math operations & universal functions
7 Broadcasting
8 Aggregations — sum, mean, std, axis
9 Sorting & searching
10 Linear algebra
11 Random sampling
12 Concatenating & splitting
13 Boolean masks & filtering
14 Real-world examples

Prerequisites

  • Python basics — lists, loops, functions.
  • That's it. NumPy is approachable.

A note on the runnable code

Every Python block in this tutorial has a ▶ Run button. Click it to execute the code in your browser with no setup, then edit the code and re-run as often as you like. The first run downloads Python (~10 MB); later runs start instantly.

Practice

What does this print?

Expected: [ 2  4  6  8 10]

import numpy as np
a = np.array([1, 2, 3, 4, 5])
print(a * 2)

Print the sum of the array (10), not '1234'

Expected: 10

import numpy as np
a = np.array([1, 2, 3, 4])
print("".join(str(x) for x in a))    # bug: this concatenates the digits as strings

Quiz — Quick check

What you remember

Q1. Why is np_array * 2 faster than [x * 2 for x in py_list]?

  • NumPy uses multi-threading by default
  • NumPy operations run in compiled C code over a contiguous memory block
  • NumPy caches list comprehensions
  • Python lists are deprecated

Why: NumPy stores numbers in a contiguous typed buffer and runs vectorized ops in C — no Python-level loop overhead, no boxed objects.

Q2. Which of these is NOT a typical use case for NumPy?

  • Image processing (images are 3D arrays)
  • Statistics on numerical data
  • Linear algebra
  • Parsing JSON documents

Why: NumPy is for numerical array data. JSON parsing is text/object work — use the standard json module instead.

Q3. What does np.array([1, 2, 3]).dtype print on a typical 64-bit system?

  • float64
  • int64
  • int32
  • int8

Why: NumPy infers the dtype from the input. All ints default to the platform's default integer, which is int64 on most 64-bit systems (on Windows with NumPy older than 2.0 it was int32). Mix in a float and it would promote to float64.
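A quick way to check the inference rule yourself. This sketch only uses np.array and its dtype argument; the variable names are illustrative.

```python
import numpy as np

ints = np.array([1, 2, 3])                        # all ints
mixed = np.array([1, 2.5, 3])                     # one float promotes everything
forced = np.array([1, 2, 3], dtype=np.float32)    # explicit dtype overrides inference

print(ints.dtype)     # platform default integer, typically int64
print(mixed.dtype)    # float64
print(forced.dtype)   # float32
```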

Common doubts

When should I use a Python list and when a NumPy array?

Use a list for mixed-type, small, or variable-length data (["alice", 25, True]). Use a NumPy array for fixed-type numerical data, especially when you'll do math on it. As a rough rule of thumb, once you have more than a thousand or so numbers, or any element-wise math, NumPy usually wins by a wide margin.
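A short sketch of what happens if you put mixed-type data into an array anyway: NumPy has to pick a single dtype, so the numbers stop being numbers. The record below is the same illustrative example as above.

```python
import numpy as np

record = ["alice", 25, True]      # a list keeps each item's own type
print([type(x).__name__ for x in record])   # ['str', 'int', 'bool']

as_array = np.array(record)       # NumPy picks one dtype for everything
print(as_array.dtype.kind)        # 'U' means Unicode string
print(as_array)
```

Because every element has become a string, arithmetic on as_array no longer makes sense, which is why a list (or a dict) is the better fit for records like this.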

Is import numpy as np mandatory, or just convention?

Just convention — but universal. Every NumPy tutorial, book, and codebase uses np. Importing it differently (e.g. import numpy as numpy) works but makes your code feel alien to other readers.

Does NumPy work on the GPU?

NumPy itself is CPU-only. For GPU, look at CuPy (NumPy-compatible API on NVIDIA), JAX (research/ML, GPU+TPU), or PyTorch tensors (also work on GPU). The API is intentionally similar so you can port code between them.
