State & Schema

1. Why this matters

LangChain chains pass implicit data through | pipes — each step gets whatever the previous step returned. That breaks down when:

You need to share data between non-adjacent nodes (e.g., the classifier in step 1 and the formatter in step 5).
A loop needs to accumulate values across iterations (chat history, retry counter).
Two parallel branches both want to append to the same list.

State + reducers solve all of that explicitly.

2. Mental model

Think of the graph as a Pregel-style compute over a shared, typed dictionary:

flowchart LR
    S0["State at start<br/>{a:1, b:'x', msgs:[]}"] --> N1["Node A<br/>returns {a: 2, msgs: ['hi']}"]
    N1 -->|engine merges using reducers| S1["State after A<br/>{a:2, b:'x', msgs:['hi']}"]
    S1 --> N2["Node B<br/>returns {msgs: ['hello']}"]
    N2 -->|add_messages reducer appends| S2["State after B<br/>{a:2, b:'x', msgs:['hi','hello']}"]

Two rules to remember:

1. Nodes return updates, not full state: just the keys they changed.
2. Reducers control merging. Without a reducer, the new value replaces the old one. With Annotated[list, add_messages], the new value is appended.
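The merge step can be imitated in a few lines of plain Python. This is only a sketch of the idea, not LangGraph's implementation: `apply_update` and the `reducers` mapping are hypothetical names introduced here, and the values mirror the diagram above.

```python
import operator


def apply_update(state: dict, update: dict, reducers: dict) -> dict:
    """Return a new state with `update` merged in, one key at a time."""
    merged = dict(state)  # copy so the previous snapshot stays untouched
    for key, value in update.items():
        reducer = reducers.get(key)
        if reducer is None:
            merged[key] = value                        # no reducer: replace
        else:
            merged[key] = reducer(merged[key], value)  # reducer decides the merge
    return merged


reducers = {"msgs": operator.add}  # only `msgs` accumulates
s0 = {"a": 1, "b": "x", "msgs": []}
s1 = apply_update(s0, {"a": 2, "msgs": ["hi"]}, reducers)   # Node A's partial update
s2 = apply_update(s1, {"msgs": ["hello"]}, reducers)        # Node B's partial update
print(s2)  # {'a': 2, 'b': 'x', 'msgs': ['hi', 'hello']}
```

Key `b` is never returned by either node, so it carries through unchanged.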

3. Architecture / Flow

How a single state key gets updated:

flowchart TD
    A[Node returns:<br/> messages: new_msg] --> B{Reducer defined?}
    B -->|No| C[Replace: state.messages = new_msg]
    B -->|operator.add or add_messages| D[Append: state.messages = old + new_msg]
    C --> E[New state snapshot]
    D --> E

4. Core concepts

TypedDict — Python's typed dictionary. The standard way to define state. Fast, simple, no validation.
Pydantic BaseModel — alternative state schema. Slower but gives validation and defaults.
Annotated[Type, Reducer] — attach a reducer to a field. Without it: replace. With it: merge per the reducer.
operator.add — the simplest reducer: old + new. Concatenates lists and strings, sums numbers.
add_messages — the messages-aware reducer. Appends new messages, de-duplicates by ID, lets you target updates by ID. Use this for any list of LangChain messages.
MessagesState — a pre-built TypedDict that's just {"messages": Annotated[list[BaseMessage], add_messages]}. Saves typing.
Partial state return — a node returns only the keys it changed. The engine handles the merge.
Reading state — inside a node, do state["key"] — it's a regular dict.
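To see that a reducer is ordinary metadata attached to a field, the schema can be inspected with the standard library alone. A minimal sketch: `reducer_for` is a hypothetical helper written for this illustration, not a LangGraph function.

```python
import operator
from typing import Annotated, TypedDict, get_type_hints


class CounterState(TypedDict):
    name: str                                  # no reducer: replace
    log: Annotated[list[str], operator.add]    # reducer: concatenate


# include_extras=True keeps the Annotated metadata instead of stripping it
hints = get_type_hints(CounterState, include_extras=True)


def reducer_for(key: str):
    """Return the first callable in the field's Annotated metadata, or None."""
    metadata = getattr(hints[key], "__metadata__", ())
    return next((m for m in metadata if callable(m)), None)


print(reducer_for("name"))                  # None
print(reducer_for("log") is operator.add)   # True
```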

5. Code — minimal working example

from typing import TypedDict, Annotated
import operator
from langgraph.graph import StateGraph, START, END

class CounterState(TypedDict):
    # Without Annotated → replace
    name: str
    # With operator.add → append/accumulate
    log: Annotated[list[str], operator.add]
    count: Annotated[int, operator.add]      # ints add too!

def step_a(state):
    return {"log": ["A ran"], "count": 1}

def step_b(state):
    return {"log": ["B ran"], "count": 1, "name": "B"}

g = StateGraph(CounterState)
g.add_node("a", step_a)
g.add_node("b", step_b)
g.add_edge(START, "a")
g.add_edge("a", "b")
g.add_edge("b", END)
graph = g.compile()

print(graph.invoke({"name": "init", "log": [], "count": 0}))
# {'name': 'B', 'log': ['A ran', 'B ran'], 'count': 2}
# name was replaced, log was appended to, count was summed

6. Code — real-world pattern

Chat state with add_messages reducer — the most common LangGraph pattern:

from typing import TypedDict, Annotated
from langgraph.graph import StateGraph, START, END
from langgraph.graph.message import add_messages
from langchain_core.messages import BaseMessage, HumanMessage, AIMessage, SystemMessage
from langchain_openai import ChatOpenAI

class ChatState(TypedDict):
    messages: Annotated[list[BaseMessage], add_messages]

llm = ChatOpenAI(model="gpt-4o-mini")

def chat_node(state: ChatState):
    response = llm.invoke(state["messages"])
    return {"messages": [response]}   # add_messages will APPEND, not replace

g = StateGraph(ChatState)
g.add_node("chat", chat_node)
g.add_edge(START, "chat")
g.add_edge("chat", END)
chatbot = g.compile()

result = chatbot.invoke({
    "messages": [
        SystemMessage("You are helpful."),
        HumanMessage("What is LangGraph in one sentence?"),
    ]
})
for m in result["messages"]:
    print(m.type, "→", m.content)

Pydantic state with validation:

from pydantic import BaseModel, Field
from typing import Literal

class TweetState(BaseModel):
    topic: str
    draft: str = ""
    iteration: int = 0
    status: Literal["draft", "approved", "rejected"] = "draft"

# Pass the class to StateGraph — same as with TypedDict
g = StateGraph(TweetState)

Custom reducer — keep only the last N items:

from typing import Annotated, TypedDict

def keep_last_10(old: list, new: list) -> list:
    return (old + new)[-10:]

class State(TypedDict):
    recent_events: Annotated[list[str], keep_last_10]
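A reducer is a plain two-argument function, so it can be checked without building a graph. The sketch below calls `keep_last_10` directly with made-up event names.

```python
def keep_last_10(old: list, new: list) -> list:
    return (old + new)[-10:]


window = [f"event-{i}" for i in range(8)]                            # 8 items so far
window = keep_last_10(window, ["event-8", "event-9", "event-10"])    # 11 candidates, 10 kept

print(len(window), window[0], window[-1])  # 10 event-1 event-10
```

The oldest entry (`event-0`) falls out of the window once the combined list exceeds ten items.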

Inspecting state mid-run via streaming:

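# `chatbot` is the compiled chat graph from above.
# stream_mode="values" yields the full state after each step.
for event in chatbot.stream({"messages": [HumanMessage("hi")]},
                            stream_mode="values"):
    print("State now:", event)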

7. Common pitfalls

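❗ Returning the full state. Nodes should return only the keys they changed. A full-state return still goes through the reducers, so a field with operator.add gets its old contents added again, and replace-style fields are rewritten with the copy the node read, which can conflict with updates from parallel branches.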
❗ Forgetting Annotated[..., add_messages] on a messages list. Without it, every chat turn REPLACES the conversation. The bot forgets everything.
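❗ Adding Annotated then writing the full new list anyway. The reducer already appends: return {"messages": [new_msg]} to add one message, not {"messages": old + [new_msg]}. With operator.add the second form duplicates the whole history. add_messages usually hides the mistake by de-duplicating on message ID, but the node still resends every message on every step.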
❗ Pydantic states forgetting defaults. Pydantic raises if a required field is missing in the initial state — give defaults or pass them all explicitly.
❗ Storing huge blobs in state. With a checkpointer attached, state is saved at every step, so large blobs balloon storage. Keep state lean: store big data in a side store and reference it by ID.
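The "write the full list anyway" pitfall is easy to reproduce with a plain concatenating reducer. A small sketch using operator.add on strings (made-up values, no graph needed):

```python
import operator

history = ["hi", "hello"]     # value already held in state
new_msg = "how are you?"

good = operator.add(history, [new_msg])            # node returned only the new item
bad = operator.add(history, history + [new_msg])   # node returned old + new

print(good)  # ['hi', 'hello', 'how are you?']
print(bad)   # ['hi', 'hello', 'hi', 'hello', 'how are you?']
```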

8. When to use vs not use

Choice	When
`TypedDict` state	Default — fast, simple
Pydantic `BaseModel` state	You want validation, defaults, computed fields
`MessagesState` shortcut	Pure chatbot with no other state
Custom reducer	Sliding windows, max-of, set-union, dedup
Replace (no reducer)	Single-writer fields (current_step, classification result)
`operator.add` reducer	Simple accumulators (counters, log lists)
`add_messages` reducer	Any list of LangChain messages

9. Cheatsheet

from typing import TypedDict, Annotated
from operator import add
from langgraph.graph import StateGraph, START, END, MessagesState
from langgraph.graph.message import add_messages
from langchain_core.messages import BaseMessage, HumanMessage, AIMessage

# Replace (default)
class S(TypedDict):
    user_id: str
    current_step: str

# Append a list
class S(TypedDict):
    logs: Annotated[list[str], add]

# Sum a counter
class S(TypedDict):
    count: Annotated[int, add]

# Messages (most common)
class S(TypedDict):
    messages: Annotated[list[BaseMessage], add_messages]

# Or just use the prebuilt
class S(MessagesState):
    extra_field: str            # add your own keys alongside messages

# Custom reducer
def dedup_union(old: set, new: set) -> set:
    return (old or set()) | new

# In a node — return partial update
def my_node(state: S) -> dict:
    return {"messages": [AIMessage("hi")]}    # appended
    # NOT: return {**state, "messages": ...}  # don't return full state

10. Q&A — recall test

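Q: Why don't nodes return the full state? A: A partial update tells the engine exactly which keys changed, so reducers only see new values. Returning the full state feeds existing values back through the reducers (duplicating accumulated lists under operator.add) and rewrites keys the node never meant to touch, which can clash with parallel updates.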
Q: What does Annotated[list, add_messages] do that Annotated[list, operator.add] doesn't? A: add_messages is messages-aware: appends, de-duplicates by message ID, lets you target replacements by ID. operator.add just concatenates lists blindly.
Q: When should you use a Pydantic state instead of TypedDict? A: When you want validation (raise on bad input), defaults for missing fields, or computed properties. Costs a bit of speed but worth it for production-ish state.
Q: Two parallel nodes both return {"logs": ["X"]} and {"logs": ["Y"]}. Final value of logs if the reducer is operator.add? A: Both items are kept, e.g. ["X", "Y"] (order may depend on the engine). Without a reducer there is no merge rule: LangGraph raises InvalidUpdateError because a plain key can receive only one value per step.
Q: Should you store a large vector store inside state? A: No. State is checkpointed every step — keep it tiny. Store the vector store outside (FAISS file, Pinecone) and reference by name/path in state if needed.
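The difference between ID-aware merging and blind concatenation can be sketched with plain dicts. `merge_by_id` below is a simplified, hypothetical stand-in written for this illustration. It is not the real add_messages, which works on LangChain message objects and handles more cases.

```python
def merge_by_id(old: list[dict], new: list[dict]) -> list[dict]:
    """Replace entries whose id already exists, append the rest."""
    merged = list(old)
    position = {m["id"]: i for i, m in enumerate(merged)}
    for msg in new:
        if msg["id"] in position:
            merged[position[msg["id"]]] = msg   # same id: overwrite in place
        else:
            position[msg["id"]] = len(merged)
            merged.append(msg)                  # new id: append
    return merged


old = [{"id": "1", "text": "hi"}, {"id": "2", "text": "helo"}]
new = [{"id": "2", "text": "hello"}, {"id": "3", "text": "bye"}]

print([m["text"] for m in merge_by_id(old, new)])  # ['hi', 'hello', 'bye']
print([m["text"] for m in old + new])              # ['hi', 'helo', 'hello', 'bye']
```

Plain concatenation keeps both versions of message "2". The ID-aware merge keeps only the corrected one.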

Practice

What does this print?

Expected: True

from typing import TypedDict
class State(TypedDict):
    query: str
    results: list
s: State = {"query": "hello", "results": []}
print("query" in s and isinstance(s["results"], list))

Use Annotated with operator.add to ACCUMULATE messages across nodes

Expected: True

use_reducer_for_accumulation = False       # bug to fix: set to True so the list accumulates
print(use_reducer_for_accumulation)

Quiz — Quick check

What you remember

Q1. What's the typical type used to define LangGraph state?

dict
TypedDict (or Pydantic BaseModel)
list
set

Why: TypedDict gives static type checking while remaining dict-like at runtime. Pydantic adds validation. TypedDict is the default in LangGraph examples.

Q2. What does Annotated[list, operator.add] mean for a state field?

New values returned by nodes are APPENDED (concatenated) to the existing list — reducer pattern
Validates the field is a list
Marks it as required
Makes it immutable

Why: Without a reducer, a node returning {"messages": [new_msg]} replaces the entire messages list. With the operator.add reducer, the new list is concatenated onto the existing one. For lists of LangChain messages, add_messages is the usual choice because it also de-duplicates by ID.

Q3. Should state stay small?

Yes — state is checkpointed at every step; large state means slow checkpoints
No, bigger is better
Doesn't matter
Only for production

Why: With a checkpointer attached, the state is saved at every step. Store identifiers and small results in state; load heavy resources from external stores (vector DB, file system, S3) on demand.
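One way to keep state small is to pass a reference instead of the payload. A minimal sketch under stated assumptions: `BLOB_STORE`, `save_blob`, `load_blob`, and both node names are hypothetical, an in-memory dict stands in for a real external store, and `dict.update` stands in for the engine's merge.

```python
import uuid

BLOB_STORE: dict[str, bytes] = {}   # stand-in for S3, a file system, or a database


def save_blob(data: bytes) -> str:
    """Store the payload outside the state and return a short identifier."""
    blob_id = uuid.uuid4().hex
    BLOB_STORE[blob_id] = data
    return blob_id


def load_blob(blob_id: str) -> bytes:
    return BLOB_STORE[blob_id]


def ingest_node(state: dict) -> dict:
    payload = b"x" * 1_000_000                    # pretend this is a large document
    return {"document_id": save_blob(payload)}    # only the ID enters the state


def summarize_node(state: dict) -> dict:
    data = load_blob(state["document_id"])        # fetch the payload on demand
    return {"summary": f"{len(data)} bytes"}


state: dict = {}
state.update(ingest_node(state))
state.update(summarize_node(state))
print(state["summary"])  # 1000000 bytes
```

The state ends up holding a 32-character ID and a short summary rather than the megabyte payload.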

Common doubts

TypedDict or Pydantic for state?

TypedDict for simple state. Pydantic when you want runtime validation, defaults, or complex constraints. Pydantic is slower but catches bugs early. Default to TypedDict; upgrade if needed.

What goes in state vs config?

State = changes during the run (messages, intermediate results). Config = set once at the start (user_id, thread_id). Access config in nodes via the second arg.
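A node that reads config might look like the sketch below. To stay dependency-free it uses a plain dict and calls the node directly. In LangGraph the second parameter is typically annotated as RunnableConfig (from langchain_core.runnables), and the same dict would be passed as the `config` argument of `invoke`. The node name and the values are made up for illustration.

```python
def greet_node(state: dict, config: dict) -> dict:
    # Run-scoped values conventionally live under config["configurable"]
    user_id = config["configurable"]["user_id"]
    return {"greeting": f"Hello, {user_id}!"}   # partial state update


config = {"configurable": {"user_id": "u-123", "thread_id": "t-1"}}
update = greet_node({"messages": []}, config)
print(update)  # {'greeting': 'Hello, u-123!'}
```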

Can two nodes write to the same field concurrently?

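Yes, and that is where reducers matter. With Annotated[list, operator.add], both contributions are concatenated. Without a reducer, two writes to the same key in the same step are rejected with InvalidUpdateError rather than merged; across sequential steps the later write simply replaces the earlier one. Use a custom reducer for other merge logic.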
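With a reducer, same-step writes are folded into the current value one after another. The sketch below simulates that fold in plain Python for both possible application orders. It illustrates the idea only and is not how the engine schedules branches.

```python
import operator
from functools import reduce
from itertools import permutations

writes = [["X"], ["Y"]]   # updates from two branches in the same step
current: list[str] = []   # value of the key before the step

# Fold the writes into the current value in every possible order
results = [reduce(operator.add, order, current) for order in permutations(writes)]
print(results)  # [['X', 'Y'], ['Y', 'X']]
```

Both contributions survive either way; only the order differs, which is why code reading such a list should not rely on a particular ordering of parallel writes.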