Reshaping — pivot, melt, stack, unstack¶
Real data comes in two shapes:
- Long format (tidy): one row per observation, repeating keys.
- Wide format: one row per entity, columns for each variable/period.
Reshaping = converting between them.
LONG (tidy) WIDE
+-------+------+-------+ +-------+---+---+---+
| city | year | sales | | city |2024|2025|...
+-------+------+-------+ +-------+---+---+
| Mum | 2024 | 100 | | Mum |100|150|
| Mum | 2025 | 150 | ⇄ | Delhi | 80|120|
| Delhi | 2024 | 80 | +-------+---+---+
| Delhi | 2025 | 120 |
+-------+------+-------+
pivot — long → wide (simple)¶
import pandas as pd
df = pd.DataFrame({
"city": ["Mum","Mum","Delhi","Delhi","Pune","Pune"],
"year": [2024, 2025, 2024, 2025, 2024, 2025],
"sales": [100, 150, 80, 120, 60, 90],
})
print("Long:")
print(df)
print()
wide = df.pivot(index="city", columns="year", values="sales")
print("Wide (after pivot):")
print(wide)
pivot requires unique (index, columns) pairs. If there are duplicates, use pivot_table.
pivot_table — pivot with aggregation¶
When the same (row, column) combo appears multiple times, pivot_table aggregates:
import pandas as pd
df = pd.DataFrame({
"city": ["Mum","Mum","Mum","Delhi","Delhi","Pune","Pune"],
"year": [2024, 2024, 2025, 2024, 2025, 2024, 2025],
"sales": [100, 200, 150, 80, 120, 60, 90],
})
# Default aggfunc = "mean"
print(df.pivot_table(index="city", columns="year", values="sales", aggfunc="mean"))
print()
# Sum
print(df.pivot_table(index="city", columns="year", values="sales", aggfunc="sum"))
print()
# Fill missing with 0, add row/column totals (margins)
print(df.pivot_table(
index="city", columns="year", values="sales",
aggfunc="sum",
fill_value=0,
margins=True,
margins_name="TOTAL",
))
Multiple value columns / multiple aggregations¶
import pandas as pd
df = pd.DataFrame({
"city": ["Mum","Mum","Delhi","Delhi","Pune","Pune"],
"year": [2024, 2025, 2024, 2025, 2024, 2025],
"revenue": [100, 150, 80, 120, 60, 90],
"units": [10, 15, 8, 12, 6, 9],
})
# Mean of multiple value columns
print(df.pivot_table(
index="city", columns="year",
values=["revenue", "units"],
aggfunc="mean",
))
print()
# Multiple aggs
print(df.pivot_table(
index="city", columns="year", values="revenue",
aggfunc=["mean", "sum"],
))
melt — wide → long¶
The opposite of pivot. Useful when you have a CSV with year columns and you want one row per (entity, year):
import pandas as pd
wide = pd.DataFrame({
"city": ["Mum","Delhi","Pune"],
"2024": [100, 80, 60],
"2025": [150, 120, 90],
})
print("Wide:")
print(wide)
print()
long = wide.melt(
id_vars="city", # columns to keep
value_vars=["2024","2025"], # columns to unpivot
var_name="year",
value_name="sales",
)
print("Long:")
print(long)
melt is what makes a dataset "tidy" — one observation per row. Most plotting libraries (seaborn, plotly) expect tidy data.
stack and unstack — index pivoting¶
For DataFrames with a MultiIndex (or MultiIndex columns):
import pandas as pd
df = pd.DataFrame({
"city": ["Mum","Mum","Delhi","Delhi"],
"year": [2024, 2025, 2024, 2025],
"sales": [100, 150, 80, 120],
})
# Make a multi-index DataFrame
indexed = df.set_index(["city","year"])
print("Multi-index:")
print(indexed)
print()
# Unstack — last index level becomes columns
print("Unstacked:")
print(indexed.unstack())
print()
# Stack — columns become an inner index
print("Stacked back:")
print(indexed.unstack().stack())
stack/unstack are the index-aware versions of pivot/melt.
crosstab — frequency cross-tabulation¶
When you just want a contingency table:
import pandas as pd
df = pd.DataFrame({
"city": ["Mum","Mum","Delhi","Delhi","Pune","Pune","Mum","Delhi"],
"gender": ["M","F","M","M","F","F","M","F"],
})
print(pd.crosstab(df["city"], df["gender"]))
print()
# With margins (totals)
print(pd.crosstab(df["city"], df["gender"], margins=True))
print()
# As percentages of rows
print(pd.crosstab(df["city"], df["gender"], normalize="index").round(2))
Explode — list-cells become multiple rows¶
When a column contains lists:
import pandas as pd
df = pd.DataFrame({
"user": ["Alice","Bob","Carol"],
"tags": [["py","ml"], ["js"], ["py","ml","ai"]],
})
print("Before:")
print(df)
print()
print("After explode:")
print(df.explode("tags").reset_index(drop=True))
Useful for splitting comma-separated values into rows.
A practical example — monthly sales report¶
import pandas as pd
import numpy as np
rng = np.random.default_rng(0)
n = 100
sales = pd.DataFrame({
"date": pd.date_range("2025-01-01", periods=n, freq="3D"),
"region": rng.choice(["N","S","E","W"], size=n),
"product": rng.choice(["A","B","C"], size=n),
"revenue": rng.integers(100, 1000, size=n),
})
sales["month"] = sales["date"].dt.month
# Wide report: rows=region, cols=month, values=total revenue
report = sales.pivot_table(
index="region",
columns="month",
values="revenue",
aggfunc="sum",
fill_value=0,
margins=True,
margins_name="TOTAL",
)
print(report)
Wide-to-long with multi-column patterns — wide_to_long¶
For columns like salary_2023, salary_2024, salary_2025:
import pandas as pd
wide = pd.DataFrame({
"name": ["Alice","Bob","Carol"],
"salary_2023": [50000, 60000, 75000],
"salary_2024": [55000, 65000, 80000],
"salary_2025": [60000, 70000, 90000],
})
print("Wide:")
print(wide)
print()
long = pd.wide_to_long(
wide,
stubnames="salary",
i="name",
j="year",
sep="_",
).reset_index()
print("Long:")
print(long)
When to use which¶
| You have... | You want... | Use |
|---|---|---|
| Long (tidy) | Wide table | pivot |
| Long with duplicates | Aggregated wide | pivot_table |
| Wide | Long (tidy) | melt |
| Two-column frequency | Cross-tab | crosstab |
| MultiIndex DataFrame | Index → column | unstack |
| Wide column with index | Column → index | stack |
| List-cell column | Each item → its own row | explode |
Wide columns like x_2023, x_2024 |
Long | wide_to_long |
Cheatsheet¶
# pivot — long to wide (no aggregation needed)
df.pivot(index="row", columns="col", values="v")
# pivot_table — long to wide WITH aggregation
df.pivot_table(index="r", columns="c", values="v", aggfunc="mean",
fill_value=0, margins=True)
# melt — wide to long
df.melt(id_vars=["keep"], value_vars=["a","b"], var_name="var", value_name="val")
# stack / unstack — pivot at index level
df.set_index(["r","c"]).unstack()
# crosstab — contingency table
pd.crosstab(df["a"], df["b"])
# explode — list-cell to rows
df.explode("col")
# wide_to_long — multi-column unpivot
pd.wide_to_long(df, stubnames="x", i="id", j="year", sep="_")
Common pitfalls¶
- ❗
pivotwith duplicates — raises an error. Usepivot_table. - ❗
pivot_tabledefaults tomean— setaggfunc="sum"explicitly when you want totals. - ❗ MultiIndex result — pivoting on multiple values gives a MultiIndex for columns. Flatten with
df.columns = ["_".join(...)]if needed. - ❗
meltvalue type changes — if you melt columns of different dtypes, the resultingvaluecolumn becomesobject. - ❗ NaN in pivot — by default unmatched cells are NaN. Use
fill_value=0(or another value) to clean up.
Practice¶
What does this print?
Expected: 2
Reshape from wide to long format (use melt — currently the data isn't tidy)
Expected: 6
Quiz — Quick check¶
What you remember
Q1. When should you use pivot_table instead of pivot?
- Always —
pivotis deprecated - When the same (index, columns) combination appears multiple times and needs aggregation
- For 3D data
- For text columns only
Why:
pivotrequires unique (row, column) pairs and raises if duplicates exist.pivot_tableaggregates duplicates (default: mean). Pick based on whether your data has duplicates.
Q2. What does melt do?
- Removes columns
- Reshapes from wide to long — multiple value columns become rows
- Sorts the DataFrame
- Joins two DataFrames
Why:
meltis the opposite ofpivot. Useful when columns represent values of a variable (years, products, regions) and you want one row per (entity, variable) instead.
Q3. What does df.explode("tags") do when tags contains lists?
- Removes the column
- Combines all lists into one
- Creates one row per list element — duplicating the other columns
- Raises an error
Why: If row 1 has
tags=["py", "ml"], explode turns it into two rows (one withpy, one withml) and duplicates the rest of the columns. Useful when a column was crammed with multi-value data.
Common doubts¶
When should I use tidy (long) vs wide format?
Use tidy for analysis, modeling, and plotting (seaborn, plotly assume tidy). Use wide for presentation, executive reports, and Excel-style spreadsheets. Pandas makes converting between them easy with pivot/melt.
Why does my pivot table have MultiIndex columns?
Because you pivoted on multiple value columns or multiple aggregations. Flatten with df.columns = ["_".join(col).strip("_") for col in df.columns] or use pivot_table(...) with a single value column and a named aggregation.
What's the difference between unstack and pivot?
They produce similar results but work differently. unstack moves an index level to become columns — requires a MultiIndex first. pivot works on columns directly. Use pivot for cleaner code; unstack when you already have a MultiIndex from groupby.