Reshaping — pivot, melt, stack, unstack¶

Real data comes in two shapes:

Long format (tidy): one row per observation, repeating keys.
Wide format: one row per entity, columns for each variable/period.

Reshaping = converting between them.

LONG (tidy)                    WIDE
+-------+------+-------+       +-------+---+---+---+
| city  | year | sales |       | city  |2024|2025|...
+-------+------+-------+       +-------+---+---+
| Mum   | 2024 |  100  |       | Mum   |100|150|
| Mum   | 2025 |  150  |  ⇄    | Delhi | 80|120|
| Delhi | 2024 |   80  |       +-------+---+---+
| Delhi | 2025 |  120  |
+-------+------+-------+

`pivot` — long → wide (simple)¶

import pandas as pd

df = pd.DataFrame({
    "city":  ["Mum","Mum","Delhi","Delhi","Pune","Pune"],
    "year":  [2024, 2025, 2024, 2025, 2024, 2025],
    "sales": [100, 150, 80, 120, 60, 90],
})
print("Long:")
print(df)
print()

wide = df.pivot(index="city", columns="year", values="sales")
print("Wide (after pivot):")
print(wide)

pivot requires unique (index, columns) pairs. If there are duplicates, use pivot_table.

`pivot_table` — pivot with aggregation¶

When the same (row, column) combo appears multiple times, pivot_table aggregates:

import pandas as pd

df = pd.DataFrame({
    "city":   ["Mum","Mum","Mum","Delhi","Delhi","Pune","Pune"],
    "year":   [2024, 2024, 2025, 2024, 2025, 2024, 2025],
    "sales":  [100, 200, 150, 80, 120, 60, 90],
})

# Default aggfunc = "mean"
print(df.pivot_table(index="city", columns="year", values="sales", aggfunc="mean"))
print()

# Sum
print(df.pivot_table(index="city", columns="year", values="sales", aggfunc="sum"))
print()

# Fill missing with 0, add row/column totals (margins)
print(df.pivot_table(
    index="city", columns="year", values="sales",
    aggfunc="sum",
    fill_value=0,
    margins=True,
    margins_name="TOTAL",
))

Multiple value columns / multiple aggregations¶

import pandas as pd

df = pd.DataFrame({
    "city":    ["Mum","Mum","Delhi","Delhi","Pune","Pune"],
    "year":    [2024, 2025, 2024, 2025, 2024, 2025],
    "revenue": [100, 150, 80, 120, 60, 90],
    "units":   [10, 15, 8, 12, 6, 9],
})

# Mean of multiple value columns
print(df.pivot_table(
    index="city", columns="year",
    values=["revenue", "units"],
    aggfunc="mean",
))
print()

# Multiple aggs
print(df.pivot_table(
    index="city", columns="year", values="revenue",
    aggfunc=["mean", "sum"],
))

`melt` — wide → long¶

The opposite of pivot. Useful when you have a CSV with year columns and you want one row per (entity, year):

import pandas as pd

wide = pd.DataFrame({
    "city":  ["Mum","Delhi","Pune"],
    "2024":  [100, 80, 60],
    "2025":  [150, 120, 90],
})
print("Wide:")
print(wide)
print()

long = wide.melt(
    id_vars="city",                  # columns to keep
    value_vars=["2024","2025"],       # columns to unpivot
    var_name="year",
    value_name="sales",
)
print("Long:")
print(long)

melt is what makes a dataset "tidy" — one observation per row. Most plotting libraries (seaborn, plotly) expect tidy data.

`stack` and `unstack` — index pivoting¶

For DataFrames with a MultiIndex (or MultiIndex columns):

import pandas as pd

df = pd.DataFrame({
    "city":  ["Mum","Mum","Delhi","Delhi"],
    "year":  [2024, 2025, 2024, 2025],
    "sales": [100, 150, 80, 120],
})

# Make a multi-index DataFrame
indexed = df.set_index(["city","year"])
print("Multi-index:")
print(indexed)
print()

# Unstack — last index level becomes columns
print("Unstacked:")
print(indexed.unstack())
print()

# Stack — columns become an inner index
print("Stacked back:")
print(indexed.unstack().stack())

stack/unstack are the index-aware versions of pivot/melt.

`crosstab` — frequency cross-tabulation¶

When you just want a contingency table:

import pandas as pd

df = pd.DataFrame({
    "city":   ["Mum","Mum","Delhi","Delhi","Pune","Pune","Mum","Delhi"],
    "gender": ["M","F","M","M","F","F","M","F"],
})

print(pd.crosstab(df["city"], df["gender"]))
print()

# With margins (totals)
print(pd.crosstab(df["city"], df["gender"], margins=True))
print()

# As percentages of rows
print(pd.crosstab(df["city"], df["gender"], normalize="index").round(2))

Explode — list-cells become multiple rows¶

When a column contains lists:

import pandas as pd

df = pd.DataFrame({
    "user":   ["Alice","Bob","Carol"],
    "tags":   [["py","ml"], ["js"], ["py","ml","ai"]],
})
print("Before:")
print(df)
print()

print("After explode:")
print(df.explode("tags").reset_index(drop=True))

Useful for splitting comma-separated values into rows.

A practical example — monthly sales report¶

import pandas as pd
import numpy as np

rng = np.random.default_rng(0)
n = 100

sales = pd.DataFrame({
    "date":    pd.date_range("2025-01-01", periods=n, freq="3D"),
    "region":  rng.choice(["N","S","E","W"], size=n),
    "product": rng.choice(["A","B","C"], size=n),
    "revenue": rng.integers(100, 1000, size=n),
})
sales["month"] = sales["date"].dt.month

# Wide report: rows=region, cols=month, values=total revenue
report = sales.pivot_table(
    index="region",
    columns="month",
    values="revenue",
    aggfunc="sum",
    fill_value=0,
    margins=True,
    margins_name="TOTAL",
)
print(report)

Wide-to-long with multi-column patterns — `wide_to_long`¶

For columns like salary_2023, salary_2024, salary_2025:

import pandas as pd

wide = pd.DataFrame({
    "name":         ["Alice","Bob","Carol"],
    "salary_2023":  [50000, 60000, 75000],
    "salary_2024":  [55000, 65000, 80000],
    "salary_2025":  [60000, 70000, 90000],
})
print("Wide:")
print(wide)
print()

long = pd.wide_to_long(
    wide,
    stubnames="salary",
    i="name",
    j="year",
    sep="_",
).reset_index()
print("Long:")
print(long)

When to use which¶

You have...	You want...	Use
Long (tidy)	Wide table	`pivot`
Long with duplicates	Aggregated wide	`pivot_table`
Wide	Long (tidy)	`melt`
Two-column frequency	Cross-tab	`crosstab`
MultiIndex DataFrame	Index → column	`unstack`
Wide column with index	Column → index	`stack`
List-cell column	Each item → its own row	`explode`
Wide columns like `x_2023, x_2024`	Long	`wide_to_long`

Cheatsheet¶

# pivot — long to wide (no aggregation needed)
df.pivot(index="row", columns="col", values="v")

# pivot_table — long to wide WITH aggregation
df.pivot_table(index="r", columns="c", values="v", aggfunc="mean",
               fill_value=0, margins=True)

# melt — wide to long
df.melt(id_vars=["keep"], value_vars=["a","b"], var_name="var", value_name="val")

# stack / unstack — pivot at index level
df.set_index(["r","c"]).unstack()

# crosstab — contingency table
pd.crosstab(df["a"], df["b"])

# explode — list-cell to rows
df.explode("col")

# wide_to_long — multi-column unpivot
pd.wide_to_long(df, stubnames="x", i="id", j="year", sep="_")

Common pitfalls¶

❗ pivot with duplicates — raises an error. Use pivot_table.
❗ pivot_table defaults to mean — set aggfunc="sum" explicitly when you want totals.
❗ MultiIndex result — pivoting on multiple values gives a MultiIndex for columns. Flatten with df.columns = ["_".join(...)] if needed.
❗ melt value type changes — if you melt columns of different dtypes, the resulting value column becomes object.
❗ NaN in pivot — by default unmatched cells are NaN. Use fill_value=0 (or another value) to clean up.

Practice¶

What does this print?

Expected: 2

import pandas as pd
df = pd.DataFrame({"city": ["A","B","A","B"], "year": [2024, 2024, 2025, 2025], "sales": [100, 200, 150, 250]})
wide = df.pivot(index="city", columns="year", values="sales")
print(wide.shape[1])

Reshape from wide to long format (use melt — currently the data isn't tidy)

Expected: 6

import pandas as pd
wide = pd.DataFrame({"city": ["A", "B"], "2024": [100, 80], "2025": [150, 120], "2026": [200, 160]})
long = wide                  # bug: still wide — need to melt
print(len(long) * 1)

Quiz — Quick check¶

What you remember

Q1. When should you use pivot_table instead of pivot?

Always — pivot is deprecated
When the same (index, columns) combination appears multiple times and needs aggregation
For 3D data
For text columns only

Why: pivot requires unique (row, column) pairs and raises if duplicates exist. pivot_table aggregates duplicates (default: mean). Pick based on whether your data has duplicates.

Q2. What does melt do?

Removes columns
Reshapes from wide to long — multiple value columns become rows
Sorts the DataFrame
Joins two DataFrames

Why: melt is the opposite of pivot. Useful when columns represent values of a variable (years, products, regions) and you want one row per (entity, variable) instead.

Q3. What does df.explode("tags") do when tags contains lists?

Removes the column
Combines all lists into one
Creates one row per list element — duplicating the other columns
Raises an error

Why: If row 1 has tags=["py", "ml"], explode turns it into two rows (one with py, one with ml) and duplicates the rest of the columns. Useful when a column was crammed with multi-value data.

Common doubts¶

When should I use tidy (long) vs wide format?

Use tidy for analysis, modeling, and plotting (seaborn, plotly assume tidy). Use wide for presentation, executive reports, and Excel-style spreadsheets. Pandas makes converting between them easy with pivot/melt.

Why does my pivot table have MultiIndex columns?

Because you pivoted on multiple value columns or multiple aggregations. Flatten with df.columns = ["_".join(col).strip("_") for col in df.columns] or use pivot_table(...) with a single value column and a named aggregation.

What's the difference between unstack and pivot?

They produce similar results but work differently. unstack moves an index level to become columns — requires a MultiIndex first. pivot works on columns directly. Use pivot for cleaner code; unstack when you already have a MultiIndex from groupby.

What's next¶

→ Time Series