Skip to content

Reshaping — pivot, melt, stack, unstack

Real data comes in two shapes:

  • Long format (tidy): one row per observation, repeating keys.
  • Wide format: one row per entity, columns for each variable/period.

Reshaping = converting between them.

LONG (tidy)                    WIDE
+-------+------+-------+       +-------+---+---+---+
| city  | year | sales |       | city  |2024|2025|...
+-------+------+-------+       +-------+---+---+
| Mum   | 2024 |  100  |       | Mum   |100|150|
| Mum   | 2025 |  150  |  ⇄    | Delhi | 80|120|
| Delhi | 2024 |   80  |       +-------+---+---+
| Delhi | 2025 |  120  |
+-------+------+-------+

pivot — long → wide (simple)

import pandas as pd

df = pd.DataFrame({
    "city":  ["Mum","Mum","Delhi","Delhi","Pune","Pune"],
    "year":  [2024, 2025, 2024, 2025, 2024, 2025],
    "sales": [100, 150, 80, 120, 60, 90],
})
print("Long:")
print(df)
print()

wide = df.pivot(index="city", columns="year", values="sales")
print("Wide (after pivot):")
print(wide)

pivot requires unique (index, columns) pairs. If there are duplicates, use pivot_table.

pivot_table — pivot with aggregation

When the same (row, column) combo appears multiple times, pivot_table aggregates:

import pandas as pd

df = pd.DataFrame({
    "city":   ["Mum","Mum","Mum","Delhi","Delhi","Pune","Pune"],
    "year":   [2024, 2024, 2025, 2024, 2025, 2024, 2025],
    "sales":  [100, 200, 150, 80, 120, 60, 90],
})

# Default aggfunc = "mean"
print(df.pivot_table(index="city", columns="year", values="sales", aggfunc="mean"))
print()

# Sum
print(df.pivot_table(index="city", columns="year", values="sales", aggfunc="sum"))
print()

# Fill missing with 0, add row/column totals (margins)
print(df.pivot_table(
    index="city", columns="year", values="sales",
    aggfunc="sum",
    fill_value=0,
    margins=True,
    margins_name="TOTAL",
))

Multiple value columns / multiple aggregations

import pandas as pd

df = pd.DataFrame({
    "city":    ["Mum","Mum","Delhi","Delhi","Pune","Pune"],
    "year":    [2024, 2025, 2024, 2025, 2024, 2025],
    "revenue": [100, 150, 80, 120, 60, 90],
    "units":   [10, 15, 8, 12, 6, 9],
})

# Mean of multiple value columns
print(df.pivot_table(
    index="city", columns="year",
    values=["revenue", "units"],
    aggfunc="mean",
))
print()

# Multiple aggs
print(df.pivot_table(
    index="city", columns="year", values="revenue",
    aggfunc=["mean", "sum"],
))

melt — wide → long

The opposite of pivot. Useful when you have a CSV with year columns and you want one row per (entity, year):

import pandas as pd

wide = pd.DataFrame({
    "city":  ["Mum","Delhi","Pune"],
    "2024":  [100, 80, 60],
    "2025":  [150, 120, 90],
})
print("Wide:")
print(wide)
print()

long = wide.melt(
    id_vars="city",                  # columns to keep
    value_vars=["2024","2025"],       # columns to unpivot
    var_name="year",
    value_name="sales",
)
print("Long:")
print(long)

melt is what makes a dataset "tidy" — one observation per row. Most plotting libraries (seaborn, plotly) expect tidy data.

stack and unstack — index pivoting

For DataFrames with a MultiIndex (or MultiIndex columns):

import pandas as pd

df = pd.DataFrame({
    "city":  ["Mum","Mum","Delhi","Delhi"],
    "year":  [2024, 2025, 2024, 2025],
    "sales": [100, 150, 80, 120],
})

# Make a multi-index DataFrame
indexed = df.set_index(["city","year"])
print("Multi-index:")
print(indexed)
print()

# Unstack — last index level becomes columns
print("Unstacked:")
print(indexed.unstack())
print()

# Stack — columns become an inner index
print("Stacked back:")
print(indexed.unstack().stack())

stack/unstack are the index-aware versions of pivot/melt.

crosstab — frequency cross-tabulation

When you just want a contingency table:

import pandas as pd

df = pd.DataFrame({
    "city":   ["Mum","Mum","Delhi","Delhi","Pune","Pune","Mum","Delhi"],
    "gender": ["M","F","M","M","F","F","M","F"],
})

print(pd.crosstab(df["city"], df["gender"]))
print()

# With margins (totals)
print(pd.crosstab(df["city"], df["gender"], margins=True))
print()

# As percentages of rows
print(pd.crosstab(df["city"], df["gender"], normalize="index").round(2))

Explode — list-cells become multiple rows

When a column contains lists:

import pandas as pd

df = pd.DataFrame({
    "user":   ["Alice","Bob","Carol"],
    "tags":   [["py","ml"], ["js"], ["py","ml","ai"]],
})
print("Before:")
print(df)
print()

print("After explode:")
print(df.explode("tags").reset_index(drop=True))

Useful for splitting comma-separated values into rows.

A practical example — monthly sales report

import pandas as pd
import numpy as np

rng = np.random.default_rng(0)
n = 100

sales = pd.DataFrame({
    "date":    pd.date_range("2025-01-01", periods=n, freq="3D"),
    "region":  rng.choice(["N","S","E","W"], size=n),
    "product": rng.choice(["A","B","C"], size=n),
    "revenue": rng.integers(100, 1000, size=n),
})
sales["month"] = sales["date"].dt.month

# Wide report: rows=region, cols=month, values=total revenue
report = sales.pivot_table(
    index="region",
    columns="month",
    values="revenue",
    aggfunc="sum",
    fill_value=0,
    margins=True,
    margins_name="TOTAL",
)
print(report)

Wide-to-long with multi-column patterns — wide_to_long

For columns like salary_2023, salary_2024, salary_2025:

import pandas as pd

wide = pd.DataFrame({
    "name":         ["Alice","Bob","Carol"],
    "salary_2023":  [50000, 60000, 75000],
    "salary_2024":  [55000, 65000, 80000],
    "salary_2025":  [60000, 70000, 90000],
})
print("Wide:")
print(wide)
print()

long = pd.wide_to_long(
    wide,
    stubnames="salary",
    i="name",
    j="year",
    sep="_",
).reset_index()
print("Long:")
print(long)

When to use which

You have... You want... Use
Long (tidy) Wide table pivot
Long with duplicates Aggregated wide pivot_table
Wide Long (tidy) melt
Two-column frequency Cross-tab crosstab
MultiIndex DataFrame Index → column unstack
Wide column with index Column → index stack
List-cell column Each item → its own row explode
Wide columns like x_2023, x_2024 Long wide_to_long

Cheatsheet

# pivot — long to wide (no aggregation needed)
df.pivot(index="row", columns="col", values="v")

# pivot_table — long to wide WITH aggregation
df.pivot_table(index="r", columns="c", values="v", aggfunc="mean",
               fill_value=0, margins=True)

# melt — wide to long
df.melt(id_vars=["keep"], value_vars=["a","b"], var_name="var", value_name="val")

# stack / unstack — pivot at index level
df.set_index(["r","c"]).unstack()

# crosstab — contingency table
pd.crosstab(df["a"], df["b"])

# explode — list-cell to rows
df.explode("col")

# wide_to_long — multi-column unpivot
pd.wide_to_long(df, stubnames="x", i="id", j="year", sep="_")

Common pitfalls

  • pivot with duplicates — raises an error. Use pivot_table.
  • pivot_table defaults to mean — set aggfunc="sum" explicitly when you want totals.
  • MultiIndex result — pivoting on multiple values gives a MultiIndex for columns. Flatten with df.columns = ["_".join(...)] if needed.
  • melt value type changes — if you melt columns of different dtypes, the resulting value column becomes object.
  • NaN in pivot — by default unmatched cells are NaN. Use fill_value=0 (or another value) to clean up.

Practice

What does this print?

Expected: 2

import pandas as pd
df = pd.DataFrame({"city": ["A","B","A","B"], "year": [2024, 2024, 2025, 2025], "sales": [100, 200, 150, 250]})
wide = df.pivot(index="city", columns="year", values="sales")
print(wide.shape[1])

Reshape from wide to long format (use melt — currently the data isn't tidy)

Expected: 6

import pandas as pd
wide = pd.DataFrame({"city": ["A", "B"], "2024": [100, 80], "2025": [150, 120], "2026": [200, 160]})
long = wide                  # bug: still wide — need to melt
print(len(long) * 1)

Quiz — Quick check

What you remember

Q1. When should you use pivot_table instead of pivot?

  • Always — pivot is deprecated
  • When the same (index, columns) combination appears multiple times and needs aggregation
  • For 3D data
  • For text columns only

Why: pivot requires unique (row, column) pairs and raises if duplicates exist. pivot_table aggregates duplicates (default: mean). Pick based on whether your data has duplicates.

Q2. What does melt do?

  • Removes columns
  • Reshapes from wide to long — multiple value columns become rows
  • Sorts the DataFrame
  • Joins two DataFrames

Why: melt is the opposite of pivot. Useful when columns represent values of a variable (years, products, regions) and you want one row per (entity, variable) instead.

Q3. What does df.explode("tags") do when tags contains lists?

  • Removes the column
  • Combines all lists into one
  • Creates one row per list element — duplicating the other columns
  • Raises an error

Why: If row 1 has tags=["py", "ml"], explode turns it into two rows (one with py, one with ml) and duplicates the rest of the columns. Useful when a column was crammed with multi-value data.

Common doubts

When should I use tidy (long) vs wide format?

Use tidy for analysis, modeling, and plotting (seaborn, plotly assume tidy). Use wide for presentation, executive reports, and Excel-style spreadsheets. Pandas makes converting between them easy with pivot/melt.

Why does my pivot table have MultiIndex columns?

Because you pivoted on multiple value columns or multiple aggregations. Flatten with df.columns = ["_".join(col).strip("_") for col in df.columns] or use pivot_table(...) with a single value column and a named aggregation.

What's the difference between unstack and pivot?

They produce similar results but work differently. unstack moves an index level to become columns — requires a MultiIndex first. pivot works on columns directly. Use pivot for cleaner code; unstack when you already have a MultiIndex from groupby.

What's next

Time Series