State & Schema
1. Why this matters
LangChain chains pass implicit data through | pipes — each step gets whatever the previous step returned. That breaks down when:
- You need to share data between non-adjacent nodes (e.g., the classifier in step 1 and the formatter in step 5).
- A loop needs to accumulate values across iterations (chat history, retry counter).
- Two parallel branches both want to append to the same list.
State + reducers solve all of that explicitly.
2. Mental model
Think of the graph as a Pregel-style compute over a shared, typed dictionary:
flowchart LR
S0[State at start<br/>{a:1, b:'x', msgs:[]}] --> N1[Node A<br/>returns {a: 2, msgs: ['hi']}]
N1 -->|engine merges using reducers| S1[State after A<br/>{a:2, b:'x', msgs:['hi']}]
S1 --> N2[Node B<br/>returns {msgs: ['hello']}]
N2 -->|add_messages reducer appends| S2[State after B<br/>{a:2, b:'x', msgs:['hi','hello']}]
Two rules to remember:
1. Nodes return updates, not full state. Just the keys they changed.
2. Reducers control merging. Without a reducer, new value replaces old. With Annotated[list, add_messages], new value appends.
3. Architecture / Flow
How a single state key gets updated:
flowchart TD
A[Node returns:<br/> messages: new_msg] --> B{Reducer defined?}
B -->|No| C[Replace: state.messages = new_msg]
B -->|operator.add or add_messages| D[Append: state.messages = old + new_msg]
C --> E[New state snapshot]
D --> E
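The decision in the flowchart can be imitated in a few lines of plain Python. This is a minimal sketch for intuition only, not LangGraph's actual implementation; apply_update and the reducers mapping are hypothetical names introduced here.

```python
import operator
from typing import Any, Callable

Reducer = Callable[[Any, Any], Any]


def apply_update(state: dict, update: dict, reducers: dict[str, Reducer]) -> dict:
    """Return a new state snapshot with `update` merged into `state`."""
    new_state = dict(state)  # copy so the previous snapshot is left untouched
    for key, value in update.items():
        reducer = reducers.get(key)
        if reducer is None or key not in state:
            new_state[key] = value  # no reducer: replace
        else:
            new_state[key] = reducer(state[key], value)  # reducer: merge
    return new_state


# Only "msgs" has a reducer; "a" and "b" use plain replacement.
reducers = {"msgs": operator.add}

s0 = {"a": 1, "b": "x", "msgs": []}
s1 = apply_update(s0, {"a": 2, "msgs": ["hi"]}, reducers)
s2 = apply_update(s1, {"msgs": ["hello"]}, reducers)
print(s2)  # {'a': 2, 'b': 'x', 'msgs': ['hi', 'hello']}
```

Each call takes a partial update and produces a fresh snapshot, which mirrors the "nodes return updates, not full state" rule.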
4. Core concepts
- TypedDict: Python's typed dictionary. The standard way to define state. Fast, simple, no runtime validation.
- Pydantic BaseModel: alternative state schema. Slower but gives validation and defaults.
- Annotated[Type, Reducer]: attaches a reducer to a field. Without it: replace. With it: merge per the reducer.
- operator.add: the simplest reducer, equivalent to old + new. Works for lists, ints, strings.
- add_messages: the messages-aware reducer. Appends new messages, de-duplicates by ID, lets you target updates by ID. Use this for any list of LangChain messages.
- MessagesState: a pre-built TypedDict that is essentially {"messages": Annotated[list[BaseMessage], add_messages]}. Saves typing.
- Partial state return: a node returns only the keys it changed. The engine handles the merge.
- Reading state: inside a node, use state["key"] with a TypedDict state (it is a regular dict at runtime). With a Pydantic state, use attribute access (state.key).
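To see how a reducer can travel with a field, the sketch below reads Annotated metadata using only the standard library. It assumes the reducer is the first callable in the metadata; find_reducers and DemoState are hypothetical names, and LangGraph's own schema parsing may work differently.

```python
import operator
from typing import Annotated, TypedDict, get_type_hints


class DemoState(TypedDict):
    name: str                                  # no reducer: replace
    log: Annotated[list[str], operator.add]    # reducer attached as metadata


def find_reducers(schema: type) -> dict:
    """Map each field to the first callable in its Annotated metadata, or None."""
    found = {}
    for field, hint in get_type_hints(schema, include_extras=True).items():
        metadata = getattr(hint, "__metadata__", ())  # empty for plain types
        found[field] = next((m for m in metadata if callable(m)), None)
    return found


reducers = find_reducers(DemoState)
print(reducers["name"])                # None
print(reducers["log"](["a"], ["b"]))   # ['a', 'b']
```

The type checker sees list[str]; the extra argument to Annotated is ignored by Python itself and is only meaningful to a framework that looks for it.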
5. Code: minimal working example
from typing import TypedDict, Annotated
import operator
from langgraph.graph import StateGraph, START, END


class CounterState(TypedDict):
    # Without Annotated → replace
    name: str
    # With operator.add → append/accumulate
    log: Annotated[list[str], operator.add]
    count: Annotated[int, operator.add]  # ints add too!


def step_a(state: CounterState) -> dict:
    return {"log": ["A ran"], "count": 1}


def step_b(state: CounterState) -> dict:
    return {"log": ["B ran"], "count": 1, "name": "B"}


g = StateGraph(CounterState)
g.add_node("a", step_a)
g.add_node("b", step_b)
g.add_edge(START, "a")
g.add_edge("a", "b")
g.add_edge("b", END)
graph = g.compile()

print(graph.invoke({"name": "init", "log": [], "count": 0}))
# {'name': 'B', 'log': ['A ran', 'B ran'], 'count': 2}
# name: replaced, log: appended, count: summed
6. Code: real-world pattern
Chat state with the add_messages reducer, the most common LangGraph pattern:
from typing import TypedDict, Annotated
from langgraph.graph import StateGraph, START, END
from langgraph.graph.message import add_messages
from langchain_core.messages import BaseMessage, HumanMessage, AIMessage, SystemMessage
from langchain_openai import ChatOpenAI


class ChatState(TypedDict):
    messages: Annotated[list[BaseMessage], add_messages]


llm = ChatOpenAI(model="gpt-4o-mini")


def chat_node(state: ChatState):
    response = llm.invoke(state["messages"])
    return {"messages": [response]}  # add_messages will APPEND, not replace


g = StateGraph(ChatState)
g.add_node("chat", chat_node)
g.add_edge(START, "chat")
g.add_edge("chat", END)
chatbot = g.compile()

result = chatbot.invoke({
    "messages": [
        SystemMessage("You are helpful."),
        HumanMessage("What is LangGraph in one sentence?"),
    ]
})
for m in result["messages"]:
    print(m.type, "→", m.content)
Pydantic state with validation:
from typing import Literal
from pydantic import BaseModel
from langgraph.graph import StateGraph


class TweetState(BaseModel):
    topic: str  # required: must be supplied in the initial state
    draft: str = ""
    iteration: int = 0
    status: Literal["draft", "approved", "rejected"] = "draft"


# Pass the class to StateGraph, same as with TypedDict
g = StateGraph(TweetState)
Custom reducer that keeps only the last N items:
from typing import Annotated, TypedDict


def keep_last_10(old: list, new: list) -> list:
    return (old + new)[-10:]


class State(TypedDict):
    recent_events: Annotated[list[str], keep_last_10]
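Because a reducer is an ordinary function of (old, new), it can be exercised without building a graph. The sketch below applies keep_last_10 to three batches of made-up event names to show the sliding window in action.

```python
def keep_last_10(old: list, new: list) -> list:
    return (old + new)[-10:]


events: list[str] = []
# Three batches of four events each: 12 events in total.
for batch_start in range(0, 12, 4):
    batch = [f"event-{i}" for i in range(batch_start, batch_start + 4)]
    events = keep_last_10(events, batch)

print(len(events))            # 10
print(events[0], events[-1])  # event-2 event-11
```

Testing reducers in isolation like this is a cheap way to catch merge bugs before wiring them into a state schema.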
Inspecting state mid-run via streaming (using the chatbot graph compiled above):
for event in chatbot.stream(
    {"messages": [HumanMessage("hi")]},
    stream_mode="values",
):
    print("State now:", event)
7. Common pitfalls
- ❗ Returning the full state. Nodes should return ONLY the keys they changed. Returning the full state works for replace-only fields, but it defeats the merge model: reducer fields get their old value merged in again, and the node can clobber or conflict with updates from parallel branches.
- ❗ Forgetting Annotated[..., add_messages] on a messages list. Without it, every chat turn REPLACES the conversation. The bot forgets everything.
- ❗ Adding Annotated, then writing the full new list anyway. The reducer already merges, so return {"messages": [new_msg]} to append one message, not {"messages": old + [new_msg]}. With operator.add the second form duplicates the old items; with add_messages the old messages are matched by ID and rewritten, which is wasteful at best.
- ❗ Pydantic states forgetting defaults. Pydantic raises if a required field is missing in the initial state, so give defaults or pass every field explicitly.
- ❗ Storing huge blobs in state. With a checkpointer, state is saved at every step, so large blobs balloon storage. Keep state lean; store big data in a side store and reference it by ID.
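The "writing the full new list anyway" pitfall is easiest to see with operator.add, which concatenates blindly. A minimal sketch that calls the reducer directly, the way an engine would after a node returns:

```python
import operator

state_log = ["A ran"]

# Correct: the node returns only the new item and the reducer appends it.
good = operator.add(state_log, ["B ran"])

# Pitfall: the node already concatenated, so the reducer repeats the old items.
bad = operator.add(state_log, state_log + ["B ran"])

print(good)  # ['A ran', 'B ran']
print(bad)   # ['A ran', 'A ran', 'B ran']
```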
8. When to use vs not use
| Choice | When |
|---|---|
| TypedDict state | Default: fast, simple |
| Pydantic BaseModel state | You want validation, defaults, computed fields |
| MessagesState shortcut | Pure chatbot with no other state |
| Custom reducer | Sliding windows, max-of, set-union, dedup |
| Replace (no reducer) | Single-writer fields (current_step, classification result) |
| operator.add reducer | Simple accumulators (counters, log lists) |
| add_messages reducer | Any list of LangChain messages |
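The custom-reducer row lists max-of, set-union, and dedup. Each is a small two-argument function; the sketch below shows one possible version of each. The function names are hypothetical, chosen here for illustration.

```python
def max_of(old: int, new: int) -> int:
    """Keep the largest value seen so far."""
    return max(old, new)


def set_union(old: set, new: set) -> set:
    """Accumulate unique items; insertion order is not tracked."""
    return old | new


def dedup_append(old: list, new: list) -> list:
    """Append only items not already present, preserving first-seen order."""
    seen = set(old)
    merged = list(old)
    for item in new:
        if item not in seen:
            seen.add(item)
            merged.append(item)
    return merged


print(max_of(3, 7))                              # 7
print(set_union({"a"}, {"a", "b"}) == {"a", "b"})  # True
print(dedup_append(["x", "y"], ["y", "z", "z"]))   # ['x', 'y', 'z']
```

Any of these can be attached to a field the same way as operator.add, for example Annotated[int, max_of].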
9. Cheatsheet
from typing import TypedDict, Annotated
from operator import add
from langgraph.graph import StateGraph, START, END, MessagesState
from langgraph.graph.message import add_messages
from langchain_core.messages import BaseMessage, HumanMessage, AIMessage

# Replace (default)
class S(TypedDict):
    user_id: str
    current_step: str

# Append a list
class S(TypedDict):
    logs: Annotated[list[str], add]

# Sum a counter
class S(TypedDict):
    count: Annotated[int, add]

# Messages (most common)
class S(TypedDict):
    messages: Annotated[list[BaseMessage], add_messages]

# Or just use the prebuilt
class S(MessagesState):
    extra_field: str  # add your own keys alongside messages

# Custom reducer
def dedup_union(old: set, new: set) -> set:
    return (old or set()) | new

# In a node: return a partial update
def my_node(state: S) -> dict:
    return {"messages": [AIMessage("hi")]}  # appended
    # NOT: return {**state, "messages": ...}  # don't return full state
10. Q&A: recall test
- Q: Why don't nodes return the full state? A: Returning a partial update lets the engine apply reducers correctly. Returning the full state feeds old values back through the reducers and can conflict with updates from parallel branches.
- Q: What does Annotated[list, add_messages] do that Annotated[list, operator.add] doesn't? A: add_messages is messages-aware: it appends, de-duplicates by message ID, and lets you target replacements by ID. operator.add just concatenates lists blindly.
- Q: When should you use a Pydantic state instead of TypedDict? A: When you want validation (raise on bad input), defaults for missing fields, or computed properties. It costs a bit of speed but is worth it for production-ish state.
- Q: Two parallel nodes return {"logs": ["X"]} and {"logs": ["Y"]}. What is the final value of logs if the reducer is operator.add? A: ["X", "Y"] (order may depend on the engine). Without a reducer, LangGraph rejects two writes to the same key in one step (it raises InvalidUpdateError) rather than silently picking a winner.
- Q: Should you store a large vector store inside state? A: No. State is checkpointed every step, so keep it tiny. Store the vector store outside (FAISS file, Pinecone) and reference it by name or path in state if needed.
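The difference between an ID-aware merge and blind concatenation can be sketched with plain dicts. merge_by_id below is a simplified, hypothetical stand-in and not the real add_messages, which works on LangChain message objects and does more (for example, assigning IDs to messages that lack one).

```python
import operator


def merge_by_id(old: list[dict], new: list[dict]) -> list[dict]:
    """Simplified stand-in: replace on a matching "id", otherwise append."""
    merged = list(old)
    index_by_id = {msg["id"]: i for i, msg in enumerate(merged)}
    for msg in new:
        if msg["id"] in index_by_id:
            merged[index_by_id[msg["id"]]] = msg  # targeted replacement
        else:
            index_by_id[msg["id"]] = len(merged)
            merged.append(msg)
    return merged


history = [{"id": "1", "content": "hi"}]
update = [{"id": "1", "content": "hi (edited)"}, {"id": "2", "content": "hello"}]

by_id = merge_by_id(history, update)
blind = operator.add(history, update)

print([m["content"] for m in by_id])  # ['hi (edited)', 'hello']
print(len(blind))                     # 3
```

The ID-aware merge ends with two messages, one of them edited in place; plain concatenation keeps both copies of message "1".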
Practice
What does this print?
Expected: True
Use Annotated with operator.add to ACCUMULATE messages across nodes
Expected: True
Quiz: Quick check
What you remember
Q1. What's the typical type used to define LangGraph state?
- dict
- TypedDict (or Pydantic BaseModel)
- list
- set
Why: TypedDict gives static type checking while remaining dict-like at runtime. Pydantic adds validation. TypedDict is the default in LangGraph examples.
Q2. What does Annotated[list, operator.add] mean for a state field?
- New values returned by nodes are APPENDED (concatenated) to the existing list — reducer pattern
- Validates the field is a list
- Marks it as required
- Makes it immutable
Why: Without a reducer, a node returning {"messages": [new_msg]} REPLACES the entire messages list. With the operator.add reducer, the new list is appended. This is the standard accumulation pattern; for chat history specifically, add_messages is the better fit.
Q3. Should state stay small?
- Yes — state is checkpointed at every step; large state means slow checkpoints
- No, bigger is better
- Doesn't matter
- Only for production
Why: Each step writes the full state. Store identifiers/IDs/small results in state; load heavy resources from external stores (vector DB, file system, S3) on demand.
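One way to keep state small is to park heavy payloads in an external store and carry only an identifier. The sketch below uses an in-memory dict as a hypothetical stand-in for a real store (S3, a file system, a vector DB); BLOB_STORE, save_blob, and both node functions are illustrative names, and the nodes are called directly rather than through a compiled graph.

```python
import uuid

BLOB_STORE: dict[str, bytes] = {}  # hypothetical stand-in for an external store


def save_blob(data: bytes) -> str:
    """Persist the payload and return a short ID that is cheap to checkpoint."""
    blob_id = str(uuid.uuid4())
    BLOB_STORE[blob_id] = data
    return blob_id


def load_node(state: dict) -> dict:
    """Store the heavy payload externally; return only its ID as the update."""
    payload = b"x" * 1_000_000  # pretend this is a large document
    return {"doc_id": save_blob(payload)}


def summarize_node(state: dict) -> dict:
    """Fetch the payload on demand using the ID kept in state."""
    payload = BLOB_STORE[state["doc_id"]]
    return {"summary": f"{len(payload)} bytes"}


state: dict = {}
state.update(load_node(state))       # plain replace, as for fields without a reducer
state.update(summarize_node(state))
print(state["summary"])  # 1000000 bytes
```

The state ends up holding a UUID string and a short summary, while the megabyte of data stays outside anything that would be checkpointed.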
Common doubts
TypedDict or Pydantic for state?
TypedDict for simple state. Pydantic when you want runtime validation, defaults, or complex constraints. Pydantic is slower but catches bugs early. Default to TypedDict; upgrade if needed.
What goes in state vs config?
State = changes during the run (messages, intermediate results). Config = set once at the start (user_id, thread_id). Access config in nodes via the second arg.
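A sketch of the "second arg" pattern: in LangGraph the config parameter is normally annotated as RunnableConfig, which behaves like a dict whose per-run values sit under the "configurable" key. A plain dict is used here so the example runs without the library, the node is called directly instead of through a graph, and greet_node and the user_id value are illustrative.

```python
def greet_node(state: dict, config: dict) -> dict:
    # Per-run settings live under the "configurable" key of the config.
    user_id = config["configurable"]["user_id"]
    return {"greeting": f"Hello, {user_id}!"}


config = {"configurable": {"user_id": "u-123", "thread_id": "t-1"}}
update = greet_node({"messages": []}, config)
print(update)  # {'greeting': 'Hello, u-123!'}
```

With a compiled graph, the same dict would typically be passed as the config argument of invoke or stream, and it stays constant for the whole run while state changes step by step.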
Can two nodes write to the same field concurrently?
Yes, provided the field has a reducer. Without one, two writes to the same key in the same step raise InvalidUpdateError instead of one silently winning. With Annotated[list, operator.add], both contributions concatenate. Write a custom reducer for any other merge logic.
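When several branches write to one reducer-backed field in the same step, the result is a fold of the reducer over those writes. The sketch below (fold_writes is a hypothetical helper, not a LangGraph API) shows why the choice of reducer matters if the write order is not something you control.

```python
import operator
from functools import reduce


def fold_writes(current, writes, reducer):
    """Apply the reducer to each write in turn, starting from the current value."""
    return reduce(reducer, writes, current)


# List concatenation is order-sensitive.
writes = [["X"], ["Y"]]
ab = fold_writes([], writes, operator.add)
ba = fold_writes([], list(reversed(writes)), operator.add)
print(ab, ba)  # ['X', 'Y'] ['Y', 'X']

# Set union is commutative, so the order of parallel writes does not matter.
set_writes = [{"X"}, {"Y"}]
u1 = fold_writes(set(), set_writes, operator.or_)
u2 = fold_writes(set(), list(reversed(set_writes)), operator.or_)
print(u1 == u2)  # True
```

If downstream code depends on a stable result regardless of branch ordering, a commutative reducer (sum, max, set union) is the safer choice.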