Stacking & Splitting Arrays¶

How to combine arrays end-to-end and break them back apart.

`np.concatenate()` — the general workhorse¶

import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

print(np.concatenate([a, b]))                   # [1 2 3 4 5 6]

For 2D, pick the axis:

import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

# Vertical (axis 0) — stack rows
print(np.concatenate([A, B], axis=0))

print()
# Horizontal (axis 1) — stack columns
print(np.concatenate([A, B], axis=1))

Convenience shortcuts¶

np.vstack = vertical stack = concatenate(axis=0); 1D inputs are first promoted to rows
np.hstack = horizontal stack = concatenate(axis=1); for 1D inputs it concatenates along axis 0
np.dstack = depth stack = stack along the third axis
np.stack = joins along a NEW axis

import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

print("vstack:")
print(np.vstack([a, b]))             # treats 1D as rows
# shape (2, 3)

print("\nhstack:")
print(np.hstack([a, b]))             # [1 2 3 4 5 6]
# shape (6,)

print("\nstack (new axis):")
print(np.stack([a, b]))               # shape (2, 3) — like vstack here
print(np.stack([a, b], axis=1))       # shape (3, 2) — pairs them as columns

stack vs vstack — stack creates a NEW axis; vstack just joins along the existing first axis.

import numpy as np

A = np.ones((3, 4))
B = np.zeros((3, 4))

print("vstack:", np.vstack([A, B]).shape)    # (6, 4) — concat along axis 0
print("stack :", np.stack([A, B]).shape)      # (2, 3, 4) — NEW axis
print("dstack:", np.dstack([A, B]).shape)     # (3, 4, 2) — stack along last axis
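One way to see how the two relate, as a minimal sketch: `np.stack` behaves like giving every input a new length-1 axis and then concatenating along it. The names `stacked` and `manual` are illustrative only.

```python
import numpy as np

A = np.ones((3, 4))
B = np.zeros((3, 4))

# np.stack inserts the new axis at position 0 by default
stacked = np.stack([A, B])

# Same result by hand: add a leading axis to each input, then concatenate
manual = np.concatenate([A[None, ...], B[None, ...]], axis=0)

print(stacked.shape)                     # (2, 3, 4)
print(np.array_equal(stacked, manual))   # True
```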

Adding a column to a 2D array¶

import numpy as np

A = np.array([
    [1, 2],
    [3, 4],
    [5, 6],
])
new_col = np.array([10, 20, 30])

# Reshape new_col to a column vector
result = np.hstack([A, new_col[:, None]])
print(result)

Splitting — opposite of stacking¶

import numpy as np

a = np.arange(12)
print(np.split(a, 3))           # split into 3 equal parts
print(np.split(a, [3, 7]))      # split at indices 3 and 7

# Same with vsplit / hsplit
matrix = np.arange(24).reshape(4, 6)
print("Top half:")
top, bottom = np.vsplit(matrix, 2)
print(top)
print("Bottom half:")
print(bottom)
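To illustrate that splitting really is the inverse of stacking, here is a small round-trip sketch. It reuses the `matrix` from above; `rebuilt`, `left`, `middle`, and `right` are names chosen for this example.

```python
import numpy as np

matrix = np.arange(24).reshape(4, 6)

# Split into two row blocks, then stack them back together
top, bottom = np.vsplit(matrix, 2)
rebuilt = np.vstack([top, bottom])
print(np.array_equal(rebuilt, matrix))   # True

# Same idea column-wise: three blocks of two columns each
left, middle, right = np.hsplit(matrix, 3)
print(left.shape)                        # (4, 2)
print(np.array_equal(np.hstack([left, middle, right]), matrix))   # True
```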

`np.array_split()` — handles uneven splits¶

import numpy as np

a = np.arange(10)

# `split` would fail because 10 elements cannot be divided into 3 equal parts
# np.split(a, 3)  # ValueError

# array_split happily makes uneven chunks
for chunk in np.array_split(a, 3):
    print(chunk)
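A quick sketch of what "uneven" means in practice: the leading chunks take the extra elements. The variable names `chunks` and `sizes` are illustrative.

```python
import numpy as np

a = np.arange(10)
chunks = np.array_split(a, 3)

# The leading chunks absorb the remainder, so the sizes are 4, 3, 3
sizes = [len(chunk) for chunk in chunks]
print(sizes)   # [4, 3, 3]

# np.split raises when an equal division is impossible
try:
    np.split(a, 3)
except ValueError as exc:
    print("ValueError:", exc)
```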

Repeating arrays — `np.repeat` and `np.tile`¶

import numpy as np

a = np.array([1, 2, 3])

# Repeat each element 3 times
print(np.repeat(a, 3))             # [1 1 1 2 2 2 3 3 3]

# Tile the whole array 3 times
print(np.tile(a, 3))                # [1 2 3 1 2 3 1 2 3]

# 2D tile
print(np.tile(a, (2, 3)))           # shape (2, 9): 2 rows, each row is `a` tiled 3 times

Flip and roll¶

import numpy as np

a = np.array([1, 2, 3, 4, 5])

print(np.flip(a))                  # [5 4 3 2 1]
print(np.roll(a, 2))                # [4 5 1 2 3] — shift right by 2
print(np.roll(a, -1))               # [2 3 4 5 1] — shift left by 1

For 2D, flip an axis:

import numpy as np

m = np.array([
    [1, 2, 3],
    [4, 5, 6],
])

print(np.flip(m, axis=0))     # flip vertically
print()
print(np.flip(m, axis=1))     # flip horizontally

Insert and append¶

import numpy as np

a = np.array([1, 2, 3, 4, 5])

# Insert 99 at index 2
print(np.insert(a, 2, 99))

# Insert multiple
print(np.insert(a, 2, [99, 88, 77]))

# Append to the end
print(np.append(a, 999))
print(np.append(a, [100, 200]))

Delete¶

import numpy as np

a = np.array([10, 20, 30, 40, 50])
print(np.delete(a, 2))             # remove index 2 → [10 20 40 50]
print(np.delete(a, [0, 4]))         # remove indices 0 and 4

# For 2D
m = np.arange(12).reshape(3, 4)
print(np.delete(m, 1, axis=0))     # remove row 1
print(np.delete(m, 1, axis=1))     # remove column 1

np.insert, np.append, and np.delete all return COPIES; the original array is left unchanged, because NumPy arrays are fixed-size.
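A short sketch that makes the copy behaviour visible. The variable names `inserted`, `deleted`, and `appended` are illustrative; the point is that `a` itself never changes unless you rebind it.

```python
import numpy as np

a = np.array([10, 20, 30, 40, 50])

inserted = np.insert(a, 2, 99)
deleted = np.delete(a, 0)
appended = np.append(a, 60)

print(a)          # [10 20 30 40 50]  (unchanged)
print(inserted)   # [10 20 99 30 40 50]
print(deleted)    # [20 30 40 50]
print(appended)   # [10 20 30 40 50 60]

# To "modify" a, rebind the name to the returned array
a = np.append(a, 60)
print(a.shape)    # (6,)
```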

A real example — building a feature matrix¶

import numpy as np

rng = np.random.default_rng(0)

# 5 samples, each with 3 base features
X = rng.random((5, 3))

# Add a bias column (intercept) at the start
ones = np.ones((5, 1))
X_with_bias = np.hstack([ones, X])

print("Shape before:", X.shape)
print("Shape after :", X_with_bias.shape)
print(X_with_bias.round(2))

Prepending a column of ones lets the intercept be learned as just another weight, which is the trick that allows fitting a linear regression with one matrix expression (see Linear Algebra).
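As a rough sketch of where the bias column pays off, the snippet below fits weights with `np.linalg.lstsq`. The targets `y` and the weights `true_w` are made up for illustration and are not part of the original example; they are built from known values so the fit can be checked.

```python
import numpy as np

rng = np.random.default_rng(0)

X = rng.random((5, 3))
X_with_bias = np.hstack([np.ones((5, 1)), X])

# Hypothetical targets, generated from known weights (assumption for this demo)
true_w = np.array([2.0, 1.0, -3.0, 0.5])   # [intercept, w1, w2, w3]
y = X_with_bias @ true_w

# Least-squares solve of X_with_bias @ w ≈ y
w, residuals, rank, _ = np.linalg.lstsq(X_with_bias, y, rcond=None)

print(w.shape)                  # (4,)
print(np.allclose(w, true_w))   # True when X_with_bias has full column rank
```

The first entry of `w` plays the role of the intercept because it multiplies the column of ones.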

Cheatsheet¶

Goal	Function
Combine along existing axis	`np.concatenate([a, b], axis=...)`
Stack as rows (1D arrays become rows)	`np.vstack([a, b])`
Join side by side (adds columns for 2D)	`np.hstack([a, b])`
Stack creating a NEW axis	`np.stack([a, b], axis=...)`
Stack along the third (depth) axis	`np.dstack([a, b])`
Split into N equal parts	`np.split(a, N)`
Split allowing uneven	`np.array_split(a, N)`
Split at indices	`np.split(a, [i1, i2])`
Repeat each elem	`np.repeat(a, n)`
Tile the array	`np.tile(a, n)`
Reverse	`np.flip(a, axis=...)`
Cyclic shift	`np.roll(a, n)`
Insert	`np.insert(a, idx, val)`
Append	`np.append(a, val)`
Delete	`np.delete(a, idx)`

Common pitfalls¶

❗ Shape mismatch: to vstack 2D arrays, the column counts must match; to hstack them, the row counts must match.
❗ np.append is slow for repeated use: it creates a new array every time. For building up data, use a Python list and convert once at the end.
❗ stack vs concatenate: stack creates a new dimension, concatenate joins along an existing one. Easy to mix up.
❗ 1D vs 2D in hstack: hstack([1Darr, 1Darr]) concatenates into one longer 1D array. To stack as columns, reshape each to (n, 1) first or use np.column_stack.
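The shape-related pitfalls above can be reproduced in a few lines. This is a minimal sketch with made-up arrays `A`, `B`, `a`, and `b`.

```python
import numpy as np

A = np.ones((2, 3))
B = np.ones((2, 4))

# vstack needs matching column counts: 3 != 4, so this raises
try:
    np.vstack([A, B])
except ValueError as exc:
    print("vstack failed:", exc)

# hstack only needs matching row counts, so the same pair works
print(np.hstack([A, B]).shape)                     # (2, 7)

# 1D inputs: hstack concatenates, it does not build columns
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
print(np.hstack([a, b]).shape)                     # (6,)
print(np.hstack([a[:, None], b[:, None]]).shape)   # (3, 2)
```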

Practice¶

What does this print?

Expected: (6,)

import numpy as np
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
print(np.concatenate([a, b]).shape)

Combine A and B into a (2, 3, 4) array by stacking on a new axis

Expected: (2, 3, 4)

import numpy as np
A = np.ones((3, 4))
B = np.zeros((3, 4))
print(np.vstack([A, B]).shape)    # bug: vstack gives (6, 4); use stack for new axis

Quiz — Quick check¶

What you remember

Q1. What's the difference between np.stack and np.concatenate?

No difference
np.stack creates a new axis; np.concatenate joins along an existing axis
np.stack is faster
np.concatenate is deprecated

Why: Stacking 2 arrays of shape (3, 4) with np.stack → (2, 3, 4). With np.concatenate(axis=0) → (6, 4). Different operations, easy to confuse.

Q2. Why use np.array_split instead of np.split?

It's faster
array_split handles uneven splits gracefully; split raises an error
split only works on 2D
Same function, different name

Why: np.split(arr_of_10, 3) raises a ValueError because 10 elements can't be divided into 3 equal parts. np.array_split makes uneven chunks instead (here, sizes 4, 3, 3).

Q3. What does np.repeat([1, 2, 3], 2) produce?

[1, 2, 3, 1, 2, 3]
[1, 1, 2, 2, 3, 3]
[2, 4, 6]
[[1, 2, 3], [1, 2, 3]]

Why: repeat duplicates each element the given number of times. tile is the one that repeats the whole array — np.tile([1, 2, 3], 2) gives [1, 2, 3, 1, 2, 3].

Common doubts¶

Why is np.append(arr, val) slow when called in a loop?

Because NumPy arrays have a fixed size: every np.append call allocates a new buffer and copies the old data into it. N appends therefore cost O(N²) work in total. Build a Python list with .append() (amortized O(1) per call), then convert once with np.array(my_list) at the end.
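The two patterns side by side, as a small sketch. The loop body (squaring `i`) and the names `slow`, `values`, and `fast` are placeholders for whatever data you are actually collecting.

```python
import numpy as np

# Slow pattern: every np.append call copies the whole array built so far
slow = np.array([], dtype=np.int64)
for i in range(5):
    slow = np.append(slow, i * i)

# Preferred pattern: grow a Python list, convert once at the end
values = []
for i in range(5):
    values.append(i * i)
fast = np.array(values)

print(np.array_equal(slow, fast))   # True
```

With only five items the difference is invisible; it starts to matter once the loop runs many thousands of times.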

Difference between np.flip and np.roll?

np.flip reverses an axis — [1, 2, 3] → [3, 2, 1]. np.roll shifts cyclically — np.roll([1, 2, 3], 1) → [3, 1, 2] (last element wraps to front). Different operations, easy to mix up.

When should I use hstack vs np.column_stack?

For 2D arrays they're equivalent. For 1D arrays they're different: np.hstack([a, b]) concatenates as 1D [a..., b...]. np.column_stack([a, b]) treats each 1D array as a column, producing a 2D matrix. Use column_stack when you want columns.
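A compact comparison of the two functions, sketched with small made-up arrays:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

print(np.hstack([a, b]).shape)         # (6,)
print(np.column_stack([a, b]).shape)   # (3, 2)
print(np.column_stack([a, b]))
# [[1 4]
#  [2 5]
#  [3 6]]

# For 2D inputs the two functions agree
A = np.ones((3, 2))
B = np.zeros((3, 1))
print(np.array_equal(np.hstack([A, B]), np.column_stack([A, B])))   # True
```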

What's next¶

→ Boolean Masks & Filtering