Skip to content

Sorting & Ranking

Ordering a DataFrame by column values (or index), and assigning ranks.

Sort by a single column — sort_values

import pandas as pd

df = pd.DataFrame({
    "name":   ["Alice","Bob","Carol","Dave","Eve"],
    "age":    [25, 30, 35, 28, 22],
    "salary": [50000, 80000, 75000, 60000, 90000],
})

# Ascending (default)
print(df.sort_values("age"))
print()

# Descending
print(df.sort_values("salary", ascending=False))

Sort by multiple columns

import pandas as pd

df = pd.DataFrame({
    "name":   ["Alice","Bob","Carol","Dave","Eve","Frank"],
    "city":   ["Mumbai","Delhi","Mumbai","Delhi","Pune","Mumbai"],
    "salary": [50000, 80000, 75000, 60000, 90000, 100000],
})

# Sort by city (asc), then within each city by salary (desc)
print(df.sort_values(
    by=["city", "salary"],
    ascending=[True, False],
))

Sort by index — sort_index

import pandas as pd

df = pd.DataFrame({
    "val": [3, 1, 2, 4]
}, index=["d", "b", "c", "a"])

print(df.sort_index())
print()

# Descending
print(df.sort_index(ascending=False))

NaN handling

By default NaN goes to the bottom. Override with na_position:

import pandas as pd
import numpy as np

df = pd.DataFrame({
    "name":  ["Alice","Bob","Carol","Dave"],
    "score": [85, np.nan, 90, 70],
})

print("NaN at bottom (default):")
print(df.sort_values("score"))
print()

print("NaN at top:")
print(df.sort_values("score", na_position="first"))

In-place sort

import pandas as pd

df = pd.DataFrame({"x": [3, 1, 4, 1, 5]})

# Default — returns a new sorted DataFrame
sorted_df = df.sort_values("x")
print("original:", df["x"].tolist())
print("sorted  :", sorted_df["x"].tolist())

# In-place
df.sort_values("x", inplace=True)
print("after in-place:", df["x"].tolist())

Reset the index after sorting

Sorting reshuffles rows but keeps the old index. Often you want a clean integer index:

import pandas as pd

df = pd.DataFrame({
    "x": [3, 1, 4, 1, 5],
})

sorted_df = df.sort_values("x").reset_index(drop=True)
print(sorted_df)

drop=True discards the old index. Without it, the old index becomes a new "index" column.

Sort with a custom key

Sort by string length, by month name, by anything computable:

import pandas as pd

df = pd.DataFrame({
    "name": ["bee", "Aardvark", "cat", "Donkey", "elephant"],
})

# Case-insensitive
print(df.sort_values("name", key=lambda s: s.str.lower()))
print()

# By name length
print(df.sort_values("name", key=lambda s: s.str.len()))

Top / Bottom N — nlargest / nsmallest

Faster than sort + slice when you only need the top N:

import pandas as pd

df = pd.DataFrame({
    "name":   ["Alice","Bob","Carol","Dave","Eve","Frank","Grace","Henry"],
    "salary": [50000, 80000, 75000, 60000, 90000, 100000, 70000, 110000],
})

# Top 3 by salary
print(df.nlargest(3, "salary"))
print()

# Bottom 3 by salary
print(df.nsmallest(3, "salary"))

Sorting categorical with custom order

If a column has a logical order (small < medium < large), tell Pandas:

import pandas as pd

df = pd.DataFrame({
    "size":  ["medium", "small", "large", "small", "large", "medium"],
    "price": [25, 10, 50, 12, 55, 23],
})

# Default: alphabetical (large, medium, small) — wrong!
print(df.sort_values("size"))
print()

# Categorical with explicit order
df["size"] = pd.Categorical(df["size"], categories=["small","medium","large"], ordered=True)
print(df.sort_values("size"))

Ranking — .rank()

Assign ranks to values:

import pandas as pd

df = pd.DataFrame({
    "name":   ["Alice","Bob","Carol","Dave"],
    "score":  [85, 92, 78, 92],     # Bob and Dave tied
})

df["rank"]        = df["score"].rank(ascending=False)
df["rank_min"]    = df["score"].rank(ascending=False, method="min")
df["rank_dense"]  = df["score"].rank(ascending=False, method="dense")

print(df)
Method Behavior with ties
average (default) average rank of tied positions (e.g. 1.5, 1.5)
min lowest of the tied ranks (e.g. 1, 1, 3)
max highest of the tied ranks
first order ties by appearance
dense like min but no gaps in the rank sequence

Reverse a DataFrame

import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3, 4, 5]})

print(df[::-1])               # reverse rows
print()
print(df.iloc[::-1])           # same — explicit

Sorting groups — common pattern

Within each city, sort employees by salary descending:

import pandas as pd
import numpy as np

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "name":   [f"User{i}" for i in range(1, 11)],
    "city":   rng.choice(["Mumbai","Delhi","Pune"], size=10),
    "salary": rng.integers(40_000, 200_000, size=10),
})

result = df.sort_values(["city", "salary"], ascending=[True, False])
print(result)

Top-K per group

Use groupby + head (after sorting):

import pandas as pd
import numpy as np

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "city":   rng.choice(["Mumbai","Delhi","Pune"], size=15),
    "name":   [f"User{i}" for i in range(1, 16)],
    "salary": rng.integers(40_000, 200_000, size=15),
})

# Top 2 highest-paid per city
result = (
    df
    .sort_values("salary", ascending=False)
    .groupby("city")
    .head(2)
)
print(result.sort_values(["city", "salary"], ascending=[True, False]))

Cheatsheet

Goal Code
Sort by a column df.sort_values("c")
Sort descending df.sort_values("c", ascending=False)
Sort by multiple df.sort_values(["c1","c2"], ascending=[True,False])
Sort by index df.sort_index()
Reset index .reset_index(drop=True)
Top-N rows df.nlargest(n, "c")
Bottom-N rows df.nsmallest(n, "c")
Rank a column df["c"].rank()
Custom key df.sort_values("c", key=lambda s: s.str.lower())
NaN placement na_position="first" or "last"
Top-N per group df.sort_values(...).groupby("g").head(n)

Common pitfalls

  • Sort doesn't modify by default — assign the result or use inplace=True.
  • String columns sort alphabetically — "10" sorts before "2". Convert to numeric or pass a key.
  • Old index sticks after sorting — use .reset_index(drop=True) if you want a clean integer index.
  • rank() returns floats by default — use method="min" and cast to int if you want plain ranks.
  • Multiple-column sort with mixed ascending — pass a list matching the columns.

Practice

What does this print?

Expected: [1, 2, 3]

import pandas as pd
df = pd.DataFrame({"x": [3, 1, 2]})
print(df.sort_values("x")["x"].tolist())

Get the top 3 highest salaries (use nlargest)

Expected: [110000, 100000, 90000]

import pandas as pd
df = pd.DataFrame({"salary": [50000, 80000, 75000, 60000, 90000, 100000, 70000, 110000]})
print(df.sort_values("salary").head(3)["salary"].tolist())  # bug: sorts ascending — need descending or nlargest

Quiz — Quick check

What you remember

Q1. What does df.sort_values("x") return?

  • Modifies df in place and returns None
  • A new sorted DataFrame — df is unchanged
  • An error if x has NaN
  • A sorted Series

Why: All Pandas methods are immutable by default. To modify in place, pass inplace=True or reassign: df = df.sort_values("x").

Q2. Where does NaN go in a default ascending sort?

  • Top
  • Bottom
  • Removed
  • Raises an error

Why: Pandas sorts NaN to the bottom in both ascending and descending order. Override with na_position="first".

Q3. What's faster than df.sort_values("x", ascending=False).head(3) for "top 3 by x"?

  • df.head(3).sort_values("x") (wrong result)
  • df.nlargest(3, "x")
  • df[df["x"] > df["x"].mean()]
  • df.sort_index().head(3)

Why: nlargest uses a partial-sort algorithm — O(N log K) instead of O(N log N). For huge DataFrames where K is small, the difference is significant.

Common doubts

Why does my index look weird after sorting?

Because sorting reorders rows but keeps their original labels. The first row in the result might have index 42 if that row was sorted to the top. Add .reset_index(drop=True) for a clean 0..N-1 index.

How do I sort a string column case-insensitively?

Pass a key= argument: df.sort_values("name", key=lambda s: s.str.lower()). Without the key, "Zebra" comes before "alpha" because uppercase letters have lower Unicode codepoints than lowercase.

What's rank for?

Assigning a position (1st, 2nd, 3rd) within a Series. Used for percentile calculations, leaderboards, and ML feature engineering. Pay attention to method= for tie handling — min, dense, and average give different answers.

What's next

GroupBy — Split-Apply-Combine