Tools & Tool-Calling¶
1. Why this matters¶
LLMs are isolated text-completion machines. To answer "What's the weather in Paris?" or "Is x.com paying its taxes?" the model needs tools — functions it can call to fetch real, fresh, computed data.
Modern OpenAI / Anthropic / Gemini models support native function calling — they reliably return structured tool-call requests instead of text. LangChain standardizes this across providers so the same tool definition works for all of them.
Tools are also the foundation of agents (next chapter) — an agent is just an LLM that calls tools in a loop.
2. Mental model¶
A tool is:
The LLM sees name + description + schema, decides if/when to call, and emits a structured request. Your code runs the function — the LLM never executes code itself.
sequenceDiagram
actor U as User
participant M as LLM
participant App as Your code
participant T as Tool
U->>M: "Weather in Paris?"
M-->>App: tool_call: get_weather city="Paris"
App->>T: get_weather("Paris")
T-->>App: "18°C, sunny"
App->>M: ToolMessage content="18°C, sunny"
M->>U: "It's 18°C and sunny in Paris."
The model can also choose not to call any tool and just answer directly. It can also call multiple tools in parallel.
3. Architecture / Flow¶
Three ways to define a tool, escalating in power:
flowchart TD
A{What do you need?} -->|Quick prototype<br/>type hints + docstring| B["@tool decorator<br/> 80–90% of cases "]
A -->|Strict validation,<br/>Pydantic schema| C[StructuredTool.from_function<br/>+ Pydantic args_schema]
A -->|Custom logic / state /<br/>async / error handling| D[Subclass BaseTool]
style B fill:#e8f5e9
4. Core concepts¶
- Tool name — short identifier the model uses to refer to it (
get_weather). - Description — natural language explanation of when to use it. This is the most important field — the model picks tools based on the description.
args_schema— Pydantic model describing the input args, with field descriptions. Drives both validation and the function-call schema sent to the LLM.- Tool calling —
model.bind_tools([tool1, tool2])returns a new model that may emittool_callson itsAIMessages. ToolMessage— the message type used to feed a tool's result back to the model.InjectedToolArg— mark an arg as "passed by the runtime, not the LLM" (e.g., user_id from auth). Keeps the LLM from inventing one.- Toolkit — a bundled collection of related tools (e.g.,
SQLDatabaseToolkit= list_tables + describe_table + run_query).
5. Code — minimal working example¶
from langchain_core.tools import tool
from langchain_openai import ChatOpenAI
@tool
def multiply(a: int, b: int) -> int:
"""Multiply two integers together."""
return a * b
model = ChatOpenAI(model="gpt-4o-mini").bind_tools([multiply])
reply = model.invoke("What is 23 times 17?")
print(reply.tool_calls)
# [{'name': 'multiply', 'args': {'a': 23, 'b': 17}, 'id': 'call_xxx'}]
Then run the tool and feed the result back:
from langchain_core.messages import HumanMessage, ToolMessage
messages = [HumanMessage("What is 23 times 17?")]
ai_msg = model.invoke(messages)
messages.append(ai_msg)
for call in ai_msg.tool_calls:
result = multiply.invoke(call["args"])
messages.append(ToolMessage(content=str(result), tool_call_id=call["id"]))
final = model.invoke(messages)
print(final.content) # "23 times 17 is 391."
6. Code — real-world pattern¶
StructuredTool with Pydantic schema — strict validation for production:
from pydantic import BaseModel, Field
from langchain_core.tools import StructuredTool
class WeatherInput(BaseModel):
city: str = Field(description="City name in English, e.g., 'Paris'")
units: str = Field(default="celsius", description="'celsius' or 'fahrenheit'")
def get_weather_fn(city: str, units: str) -> str:
# ... real API call ...
return f"{city}: 18°{units[0].upper()}, sunny"
weather_tool = StructuredTool.from_function(
func=get_weather_fn,
name="get_weather",
description="Get current weather for a city. Use ONLY when the user asks about weather.",
args_schema=WeatherInput,
return_direct=False, # if True, tool output is returned to the user without LLM post-processing
)
Injected args — pass runtime context (user_id, request_id) that the LLM should NOT control:
from typing import Annotated
from langchain_core.tools import InjectedToolArg
@tool
def get_account_balance(
account_id: str,
user_id: Annotated[str, InjectedToolArg], # never exposed to LLM
) -> str:
"""Get the balance for a given account."""
if not user_owns_account(user_id, account_id):
return "Access denied."
return f"Balance: ${lookup(account_id)}"
# When invoking, you supply user_id from auth; the LLM only fills account_id
tool_call = ai_msg.tool_calls[0]
result = get_account_balance.invoke({**tool_call["args"], "user_id": current_user.id})
BaseTool subclass — full control + async:
from langchain_core.tools import BaseTool
from typing import Optional
class DatabaseQueryTool(BaseTool):
name: str = "db_query"
description: str = "Run a read-only SQL query on the analytics DB."
args_schema: type = QueryInput
def _run(self, query: str) -> str:
return run_sql(query)
async def _arun(self, query: str) -> str:
return await arun_sql(query)
Use a pre-built tool (e.g., Tavily for web search):
from langchain_community.tools.tavily_search import TavilySearchResults
search = TavilySearchResults(max_results=4)
results = search.invoke("latest news about LangChain")
Wire into an LCEL agent loop — the simplest agent is just model.bind_tools(...) + a loop. For real agents, use the Agents chapter or LangGraph.
7. Common pitfalls¶
- ❗ Weak tool descriptions. "Calculates things" is useless — the model can't tell when to use it. Be precise: "Multiply two integers. Use only for arithmetic, not for currency or date math."
- ❗ Letting the LLM control sensitive args. Tenant IDs, user IDs, auth tokens — wrap with
InjectedToolArgand supply them server-side. - ❗ Hand-parsing tool_calls when you don't need to. Use
create_react_agent(LangGraph),AgentExecutor, or an LCEL agent loop instead of writing your own dispatch. - ❗ Tools with side effects + no idempotency. Models retry on error. A
send_emailtool that doesn't dedupe byrequest_idwill send 3 emails. - ❗ Too many tools at once. Models start mis-selecting around 15–20+ tools. Either narrow per request or use a hierarchical/router pattern.
- ❗ Returning huge blobs to the model. Tool output goes back into the context window. Return concise text — paginate or summarize before returning.
8. When to use vs not use¶
| Choose | When |
|---|---|
@tool decorator |
Default. Simple, typed, prototyping |
StructuredTool.from_function() |
Need explicit Pydantic schema, descriptions per field, production validation |
BaseTool subclass |
Async + sync, custom error handling, state |
| Pre-built community tool | Common needs (search, calculator, SQL, Python REPL) — don't reinvent |
| Don't use tools at all | One-shot answer the model already knows; or use a deterministic chain instead |
9. Cheatsheet¶
# Decorator (most common)
from langchain_core.tools import tool
@tool
def my_tool(arg: str) -> str:
"""Description matters — model picks based on this."""
return ...
@tool("custom_name", return_direct=True)
def my_tool(...): ...
# StructuredTool
from langchain_core.tools import StructuredTool
tool_obj = StructuredTool.from_function(
func=fn,
name="...",
description="...",
args_schema=PydanticModel,
return_direct=False,
)
# Bind to model and emit tool_calls
model_with_tools = ChatOpenAI(model="gpt-4o-mini").bind_tools(
tools=[tool1, tool2],
tool_choice="auto", # or "required", or a specific name
strict=True, # OpenAI strict schema
)
ai_msg = model_with_tools.invoke("...")
ai_msg.tool_calls # [{'name': ..., 'args': {...}, 'id': ...}]
# Invoke a tool
result = my_tool.invoke({"arg": "value"})
# Feed result back
from langchain_core.messages import ToolMessage
ToolMessage(content=str(result), tool_call_id=call["id"])
# Injected args (server-side context, hidden from LLM)
from typing import Annotated
from langchain_core.tools import InjectedToolArg
@tool
def t(public_arg: str, user_id: Annotated[str, InjectedToolArg]) -> str: ...
# Common pre-built tools (langchain_community)
from langchain_community.tools.tavily_search import TavilySearchResults
from langchain_community.tools import DuckDuckGoSearchRun
from langchain_community.utilities import PythonREPL
from langchain_community.agent_toolkits import SQLDatabaseToolkit
10. Q&A — recall test¶
-
Q: Does the LLM execute tool code itself? A: No. The LLM emits a structured
tool_callrequest (name + args). Your application code runs the function and feeds the result back via aToolMessage. -
Q: What's the single most important field on a tool? A: The description. The model decides whether to use a tool based on it. Vague descriptions → wrong tool selected.
-
Q: Difference between
@toolandStructuredTool.from_function? A:@toolis a quick decorator with implicit schema from type hints + docstring.StructuredTool.from_functiontakes an explicit Pydanticargs_schemafor stricter validation + better per-field descriptions. -
Q: What problem does
InjectedToolArgsolve? A: Some args (user_id, tenant, auth token) must be supplied by your server, not chosen by the LLM.InjectedToolArgkeeps them out of the LLM-visible schema while still letting the tool receive them at invocation. -
Q: Why might the model emit zero
tool_calls? A: It decided no tool was needed and answered directly. That's normal —tool_callsis empty when the model is confident it can answer. -
Q: When does tool-calling break down? A: Too many tools (~15+), tools with overlapping descriptions, or tools returning massive blobs. Mitigations: narrow the tool set per request, sharpen descriptions, summarize tool output before returning.
Practice¶
What does this print?
Expected: 15
Add type hints and a clear docstring (LLMs use these to decide when to call the tool)
Expected: True
Quiz — Quick check¶
What you remember
Q1. What's the most important part of a tool definition for the LLM?
- The function name
- The docstring + type hints — the LLM reads these to decide WHEN to call the tool
- The return type
- The arguments
Why: The LLM doesn't see your function body. It only sees the description (docstring) and the parameter schema (from type hints). Vague descriptions = wrong tool calls.
Q2. What happens after the LLM calls a tool?
- You execute the tool with the LLM's arguments and send the result back as another message; the LLM continues with that info
- The tool runs automatically inside the LLM
- Execution is instant
- The LLM can't continue
Why: Tool calling is a back-and-forth. LLM proposes the call → your code runs it → result goes back to LLM → LLM continues reasoning with the result. LangChain (or LangGraph) orchestrates this loop.
Q3. How many tools can an LLM reliably use?
- Unlimited
- ~10-15 — beyond that, tool choice quality drops as descriptions blur together
- Exactly 1
- 100
Why: When tools overlap or descriptions are vague, the LLM picks wrong tools. Sharpen descriptions, group related tools into a single "router" tool, or use per-task tool subsets.
Common doubts¶
Should I use LangChain tools or write raw function calls myself?
For simple cases, raw is fine. LangChain tools save you the parsing/dispatching boilerplate and integrate cleanly with LangGraph agents. For 2+ tools, LangChain pays off. For 1 tool, do whatever's simpler.
What's the difference between sync and async tools?
Sync tools (@tool on a regular function) block. Async tools (@tool on an async def) let you await without blocking the rest of the chain — important when calling slow APIs. In a long-running agent, async tools dramatically improve throughput.
How do I handle tools that need authentication or user-scoped state?
Use dependency injection — pass user_id / auth context via the chain's configurable or RunnableConfig. Tools access it from there. Avoid global state; it doesn't scale to multi-tenant servers.