Sorting & Ranking

Ordering a DataFrame by column values (or index), and assigning ranks.

Sort by a single column — `sort_values`

import pandas as pd

df = pd.DataFrame({
    "name":   ["Alice","Bob","Carol","Dave","Eve"],
    "age":    [25, 30, 35, 28, 22],
    "salary": [50000, 80000, 75000, 60000, 90000],
})

# Ascending (default)
print(df.sort_values("age"))
print()

# Descending
print(df.sort_values("salary", ascending=False))
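When two rows share the same value, the default `kind="quicksort"` makes no promise about which one comes first in a single-column sort. Passing `kind="stable"` keeps tied rows in their original order. A small sketch with made-up data (Bob and Dave are both 30):

```python
import pandas as pd

# Hypothetical data with a tie on "age"
df = pd.DataFrame({
    "name": ["Alice", "Bob", "Carol", "Dave"],
    "age":  [25, 30, 35, 30],
})

# kind="stable" guarantees Bob stays ahead of Dave, as in the original order
stable = df.sort_values("age", kind="stable")
print(stable["name"].tolist())   # ['Alice', 'Bob', 'Dave', 'Carol']
```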

Sort by multiple columns

import pandas as pd

df = pd.DataFrame({
    "name":   ["Alice","Bob","Carol","Dave","Eve","Frank"],
    "city":   ["Mumbai","Delhi","Mumbai","Delhi","Pune","Mumbai"],
    "salary": [50000, 80000, 75000, 60000, 90000, 100000],
})

# Sort by city (asc), then within each city by salary (desc)
print(df.sort_values(
    by=["city", "salary"],
    ascending=[True, False],
))

Sort by index — `sort_index`

import pandas as pd

df = pd.DataFrame({
    "val": [3, 1, 2, 4]
}, index=["d", "b", "c", "a"])

print(df.sort_index())
print()

# Descending
print(df.sort_index(ascending=False))
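`sort_index` can also order the columns by their labels instead of the rows: pass `axis=1`. A minimal example with hypothetical column names:

```python
import pandas as pd

df = pd.DataFrame({"c": [1, 2], "a": [3, 4], "b": [5, 6]})

# axis=1 sorts the column labels; the row order is untouched
by_cols = df.sort_index(axis=1)
print(by_cols.columns.tolist())   # ['a', 'b', 'c']
```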

NaN handling

By default NaN goes to the bottom, whether you sort ascending or descending. Override with na_position:

import pandas as pd
import numpy as np

df = pd.DataFrame({
    "name":  ["Alice","Bob","Carol","Dave"],
    "score": [85, np.nan, 90, 70],
})

print("NaN at bottom (default):")
print(df.sort_values("score"))
print()

print("NaN at top:")
print(df.sort_values("score", na_position="first"))
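Reversing the direction does not move NaN to the top; it stays last unless you ask otherwise. Reusing the same data:

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({
    "name":  ["Alice", "Bob", "Carol", "Dave"],
    "score": [85, np.nan, 90, 70],
})

# Descending: NaN is still placed last by default
desc = df.sort_values("score", ascending=False)
print(desc["name"].tolist())            # ['Carol', 'Alice', 'Dave', 'Bob']

# na_position="first" works the same way in either direction
desc_nan_first = df.sort_values("score", ascending=False, na_position="first")
print(desc_nan_first["name"].tolist())  # ['Bob', 'Carol', 'Alice', 'Dave']
```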

In-place sort

import pandas as pd

df = pd.DataFrame({"x": [3, 1, 4, 1, 5]})

# Default — returns a new sorted DataFrame
sorted_df = df.sort_values("x")
print("original:", df["x"].tolist())
print("sorted  :", sorted_df["x"].tolist())

# In-place
df.sort_values("x", inplace=True)
print("after in-place:", df["x"].tolist())
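With `inplace=True` the method sorts the existing DataFrame and returns `None`, so combining it with reassignment is a common slip that leaves you holding `None`:

```python
import pandas as pd

df = pd.DataFrame({"x": [3, 1, 4, 1, 5]})

# inplace=True sorts df itself and returns None
result = df.sort_values("x", inplace=True)
print(result)             # None
print(df["x"].tolist())   # [1, 1, 3, 4, 5]

# Reassigning the return value of an in-place call throws the data away
df2 = pd.DataFrame({"x": [3, 1, 2]})
df2 = df2.sort_values("x", inplace=True)
print(df2)                # None
```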

Reset the index after sorting

Sorting reshuffles rows but keeps the old index. Often you want a clean integer index:

import pandas as pd

df = pd.DataFrame({
    "x": [3, 1, 4, 1, 5],
})

sorted_df = df.sort_values("x").reset_index(drop=True)
print(sorted_df)

drop=True discards the old index. Without it, the old index becomes a new "index" column.
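For comparison, here is the same sort without `drop=True`. The extra "index" column holds each row's original label, which can be handy for tracing rows back. `kind="stable"` is used only so the tie between the two 1s resolves predictably:

```python
import pandas as pd

df = pd.DataFrame({"x": [3, 1, 4, 1, 5]})

# reset_index() without drop=True: the old labels become a column named "index"
kept = df.sort_values("x", kind="stable").reset_index()
print(kept)
print(kept.columns.tolist())   # ['index', 'x']
```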

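Sort with a custom key

Sort by string length, by month name, by anything computable. The key function receives the column as a Series and must return a Series of the same length; rows are ordered by the returned values while the column itself is left unchanged: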

import pandas as pd

df = pd.DataFrame({
    "name": ["bee", "Aardvark", "cat", "Donkey", "elephant"],
})

# Case-insensitive
print(df.sort_values("name", key=lambda s: s.str.lower()))
print()

# By name length
print(df.sort_values("name", key=lambda s: s.str.len()))
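Sorting month names in calendar order works the same way: map each name to a number inside the key. The `month_order` dict and the sales figures below are assumptions made up for this sketch:

```python
import pandas as pd

df = pd.DataFrame({
    "month": ["Mar", "Jan", "Dec", "Feb"],
    "sales": [120, 100, 90, 110],
})

# Hypothetical lookup table: month abbreviation -> calendar position
month_order = {"Jan": 1, "Feb": 2, "Mar": 3, "Apr": 4, "May": 5, "Jun": 6,
               "Jul": 7, "Aug": 8, "Sep": 9, "Oct": 10, "Nov": 11, "Dec": 12}

# key gets the whole "month" column as a Series; map() returns the numbers to sort by
by_month = df.sort_values("month", key=lambda s: s.map(month_order))
print(by_month["month"].tolist())   # ['Jan', 'Feb', 'Mar', 'Dec']
```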

Top / Bottom N — `nlargest` / `nsmallest`

Faster than sort + slice when you only need the top N:

import pandas as pd

df = pd.DataFrame({
    "name":   ["Alice","Bob","Carol","Dave","Eve","Frank","Grace","Henry"],
    "salary": [50000, 80000, 75000, 60000, 90000, 100000, 70000, 110000],
})

# Top 3 by salary
print(df.nlargest(3, "salary"))
print()

# Bottom 3 by salary
print(df.nsmallest(3, "salary"))
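On data with ties, `nlargest` keeps the first occurrence by default (`keep="first"`); the other options are `keep="last"` and `keep="all"`, which returns every row tied for last place even if that means more than n rows. A sketch with invented salaries:

```python
import pandas as pd

df = pd.DataFrame({
    "name":   ["Alice", "Bob", "Carol", "Dave"],
    "salary": [90000, 80000, 80000, 60000],   # Bob and Carol tied
})

# Default keep="first": of the tied pair, the earlier row (Bob) wins
top2 = df.nlargest(2, "salary")
print(top2["name"].tolist())       # ['Alice', 'Bob']

# keep="all": both tied rows are kept, so 3 rows come back for n=2
top2_all = df.nlargest(2, "salary", keep="all")
print(top2_all)
```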

Sorting categoricals with a custom order

If a column has a logical order (small < medium < large), tell Pandas:

import pandas as pd

df = pd.DataFrame({
    "size":  ["medium", "small", "large", "small", "large", "medium"],
    "price": [25, 10, 50, 12, 55, 23],
})

# Default: alphabetical (large, medium, small) — wrong!
print(df.sort_values("size"))
print()

# Categorical with explicit order
df["size"] = pd.Categorical(df["size"], categories=["small","medium","large"], ordered=True)
print(df.sort_values("size"))
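If you would rather leave the column as plain strings, a `key` that maps each label to its position gives the same order for a single sort. The `size_order` dict is an assumption for this sketch. The trade-off: an ordered categorical carries its order into later sorts, comparisons and `min`/`max`, whereas the key only affects this one call.

```python
import pandas as pd

df = pd.DataFrame({
    "size":  ["medium", "small", "large", "small", "large", "medium"],
    "price": [25, 10, 50, 12, 55, 23],
})

# Assumed ranking of the labels for this example
size_order = {"small": 0, "medium": 1, "large": 2}

# "size" keeps its plain string values; only the sort uses the mapping.
# kind="stable" keeps rows with the same size in their original order.
result = df.sort_values("size", key=lambda s: s.map(size_order), kind="stable")
print(result)
```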

Ranking — `.rank()`

Assign ranks to values:

import pandas as pd

df = pd.DataFrame({
    "name":   ["Alice","Bob","Carol","Dave"],
    "score":  [85, 92, 78, 92],     # Bob and Dave tied
})

df["rank"]        = df["score"].rank(ascending=False)
df["rank_min"]    = df["score"].rank(ascending=False, method="min")
df["rank_dense"]  = df["score"].rank(ascending=False, method="dense")

print(df)

Method	Behavior with ties
`average` (default)	average rank of tied positions (e.g. 1.5, 1.5)
`min`	lowest of the tied ranks (e.g. 1, 1, 3)
`max`	highest of the tied ranks (e.g. 2, 2, 3)
`first`	ties ranked in order of appearance (e.g. 1, 2, 3)
`dense`	like `min` but no gaps in the rank sequence (e.g. 1, 1, 2)
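The table is easiest to check side by side. A sketch using the same four scores (highest = rank 1), plus `astype(int)` for integer ranks and `pct=True` for percentile ranks:

```python
import pandas as pd

scores = pd.Series([85, 92, 78, 92], index=["Alice", "Bob", "Carol", "Dave"])

# One column per tie-handling method; highest score gets rank 1
ranks = pd.DataFrame({
    m: scores.rank(ascending=False, method=m)
    for m in ["average", "min", "max", "first", "dense"]
})
print(ranks)

# rank() returns floats; cast to int once you know there are no NaNs
dense_int = scores.rank(ascending=False, method="dense").astype(int).tolist()
print(dense_int)   # [2, 1, 3, 1]

# pct=True divides each (average) rank by the number of values
pct = scores.rank(pct=True).tolist()
print(pct)         # [0.5, 0.875, 0.25, 0.875]
```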

Reverse a DataFrame

import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3, 4, 5]})

print(df[::-1])               # reverse rows
print()
print(df.iloc[::-1])           # same — explicit

Sorting groups — common pattern

Within each city, sort employees by salary descending:

import pandas as pd
import numpy as np

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "name":   [f"User{i}" for i in range(1, 11)],
    "city":   rng.choice(["Mumbai","Delhi","Pune"], size=10),
    "salary": rng.integers(40_000, 200_000, size=10),
})

result = df.sort_values(["city", "salary"], ascending=[True, False])
print(result)

Top-K per group

Use groupby + head (after sorting):

import pandas as pd
import numpy as np

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "city":   rng.choice(["Mumbai","Delhi","Pune"], size=15),
    "name":   [f"User{i}" for i in range(1, 16)],
    "salary": rng.integers(40_000, 200_000, size=15),
})

# Top 2 highest-paid per city
result = (
    df
    .sort_values("salary", ascending=False)
    .groupby("city")
    .head(2)
)
print(result.sort_values(["city", "salary"], ascending=[True, False]))
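Another way to get the top 2 per city, sketched on the same synthetic data: rank salaries within each group with `groupby(...).rank()` and keep rows ranked 2 or better. Rows stay in their original order, and `method="first"` breaks salary ties so no city returns more than 2 rows.

```python
import pandas as pd
import numpy as np

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "city":   rng.choice(["Mumbai","Delhi","Pune"], size=15),
    "name":   [f"User{i}" for i in range(1, 16)],
    "salary": rng.integers(40_000, 200_000, size=15),
})

# Rank within each city (1 = highest salary); the result aligns with df's index
within_city = df.groupby("city")["salary"].rank(method="first", ascending=False)

# Boolean filter keeps the top 2 rows per city, in their original order
top2 = df[within_city <= 2]
print(top2)
```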

Cheatsheet

Goal	Code
Sort by a column	`df.sort_values("c")`
Sort descending	`df.sort_values("c", ascending=False)`
Sort by multiple	`df.sort_values(["c1","c2"], ascending=[True,False])`
Sort by index	`df.sort_index()`
Reset index	`.reset_index(drop=True)`
Top-N rows	`df.nlargest(n, "c")`
Bottom-N rows	`df.nsmallest(n, "c")`
Rank a column	`df["c"].rank()`
Custom key	`df.sort_values("c", key=lambda s: s.str.lower())`
NaN placement	`na_position="first"` or `"last"`
Top-N per group	`df.sort_values(...).groupby("g").head(n)`

Common pitfalls

❗ Sort doesn't modify by default — assign the result or use inplace=True.
❗ String columns sort character by character — "10" sorts before "2". Convert to numeric or pass a key such as key=pd.to_numeric.
❗ Old index sticks after sorting — use .reset_index(drop=True) if you want a clean integer index.
❗ rank() returns floats whatever the method — pick a tie rule (e.g. method="min" or method="dense") and cast with .astype(int) for plain integer ranks; the cast fails if the column contains NaN.
❗ Multiple-column sort with mixed ascending — pass a list matching the columns.
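For the string-number pitfall, passing `pd.to_numeric` as the key sorts by numeric value while leaving the column's strings unchanged. A minimal sketch with made-up codes:

```python
import pandas as pd

df = pd.DataFrame({"code": ["10", "2", "33", "1"]})

# Plain string sort compares character by character: "10" < "2"
plain = df.sort_values("code")["code"].tolist()
print(plain)     # ['1', '10', '2', '33']

# key=pd.to_numeric sorts by numeric value; the stored strings are untouched
numeric = df.sort_values("code", key=pd.to_numeric)["code"].tolist()
print(numeric)   # ['1', '2', '10', '33']
```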

Practice

What does this print?

Expected: [1, 2, 3]

import pandas as pd
df = pd.DataFrame({"x": [3, 1, 2]})
print(df.sort_values("x")["x"].tolist())

Fix the code below so it returns the top 3 highest salaries (use nlargest)

Expected: [110000, 100000, 90000]

import pandas as pd
df = pd.DataFrame({"salary": [50000, 80000, 75000, 60000, 90000, 100000, 70000, 110000]})
print(df.sort_values("salary").head(3)["salary"].tolist())  # bug: sorts ascending — need descending or nlargest

Quiz — Quick check

What you remember

Q1. What does df.sort_values("x") return?

Modifies df in place and returns None
A new sorted DataFrame — df is unchanged
An error if x has NaN
A sorted Series

Why: sort_values, like most DataFrame methods, returns a new object and leaves df unchanged. To keep the sorted result, reassign it (df = df.sort_values("x")) or pass inplace=True, which modifies df and returns None.

Q2. Where does NaN go in a default ascending sort?

Top
Bottom
Removed
Raises an error

Why: Pandas sorts NaN to the bottom in both ascending and descending order. Override with na_position="first".

Q3. What's faster than df.sort_values("x", ascending=False).head(3) for "top 3 by x"?

df.head(3).sort_values("x") (wrong result)
df.nlargest(3, "x")
df[df["x"] > df["x"].mean()]
df.sort_index().head(3)

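Why: nlargest never sorts the whole column. It uses a selection step to find the cutoff value in roughly linear time and then orders only the rows at or above that cutoff, whereas sort_values sorts all N rows in O(N log N). For large DataFrames where n is small, the difference can be significant.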

Common doubts

Why does my index look weird after sorting?

Because sorting reorders rows but keeps their original labels. The first row in the result might have index 42 if that row was sorted to the top. Add .reset_index(drop=True) for a clean 0..N-1 index.

How do I sort a string column case-insensitively?

Pass a key= argument: df.sort_values("name", key=lambda s: s.str.lower()). Without the key, "Zebra" comes before "alpha" because uppercase letters have lower Unicode codepoints than lowercase.

What's rank for?

Assigning a position (1st, 2nd, 3rd) within a Series. Used for percentile calculations, leaderboards, and ML feature engineering. Pay attention to method= for tie handling: min, dense, and average give different answers.

What's next

→ GroupBy — Split-Apply-Combine