Skip to content

Tools & Tool-Calling

1. Why this matters

LLMs are isolated text-completion machines. To answer "What's the weather in Paris?" or "Is x.com paying its taxes?" the model needs tools — functions it can call to fetch real, fresh, computed data.

Modern OpenAI / Anthropic / Gemini models support native function calling — they reliably return structured tool-call requests instead of text. LangChain standardizes this across providers so the same tool definition works for all of them.

Tools are also the foundation of agents (next chapter) — an agent is just an LLM that calls tools in a loop.

2. Mental model

A tool is:

name + description + input schema + a Python function

The LLM sees name + description + schema, decides if/when to call, and emits a structured request. Your code runs the function — the LLM never executes code itself.

sequenceDiagram
    actor U as User
    participant M as LLM
    participant App as Your code
    participant T as Tool
    U->>M: "Weather in Paris?"
    M-->>App: tool_call: get_weather city="Paris"
    App->>T: get_weather("Paris")
    T-->>App: "18°C, sunny"
    App->>M: ToolMessage content="18°C, sunny"
    M->>U: "It's 18°C and sunny in Paris."

The model can also choose not to call any tool and just answer directly. It can also call multiple tools in parallel.

3. Architecture / Flow

Three ways to define a tool, escalating in power:

flowchart TD
    A{What do you need?} -->|Quick prototype<br/>type hints + docstring| B["@tool decorator<br/> 80–90% of cases "]
    A -->|Strict validation,<br/>Pydantic schema| C[StructuredTool.from_function<br/>+ Pydantic args_schema]
    A -->|Custom logic / state /<br/>async / error handling| D[Subclass BaseTool]
    style B fill:#e8f5e9

4. Core concepts

  • Tool name — short identifier the model uses to refer to it (get_weather).
  • Description — natural language explanation of when to use it. This is the most important field — the model picks tools based on the description.
  • args_schema — Pydantic model describing the input args, with field descriptions. Drives both validation and the function-call schema sent to the LLM.
  • Tool callingmodel.bind_tools([tool1, tool2]) returns a new model that may emit tool_calls on its AIMessages.
  • ToolMessage — the message type used to feed a tool's result back to the model.
  • InjectedToolArg — mark an arg as "passed by the runtime, not the LLM" (e.g., user_id from auth). Keeps the LLM from inventing one.
  • Toolkit — a bundled collection of related tools (e.g., SQLDatabaseToolkit = list_tables + describe_table + run_query).

5. Code — minimal working example

from langchain_core.tools import tool
from langchain_openai import ChatOpenAI

@tool
def multiply(a: int, b: int) -> int:
    """Multiply two integers together."""
    return a * b

model = ChatOpenAI(model="gpt-4o-mini").bind_tools([multiply])

reply = model.invoke("What is 23 times 17?")
print(reply.tool_calls)
# [{'name': 'multiply', 'args': {'a': 23, 'b': 17}, 'id': 'call_xxx'}]

Then run the tool and feed the result back:

from langchain_core.messages import HumanMessage, ToolMessage

messages = [HumanMessage("What is 23 times 17?")]
ai_msg = model.invoke(messages)
messages.append(ai_msg)

for call in ai_msg.tool_calls:
    result = multiply.invoke(call["args"])
    messages.append(ToolMessage(content=str(result), tool_call_id=call["id"]))

final = model.invoke(messages)
print(final.content)   # "23 times 17 is 391."

6. Code — real-world pattern

StructuredTool with Pydantic schema — strict validation for production:

from pydantic import BaseModel, Field
from langchain_core.tools import StructuredTool

class WeatherInput(BaseModel):
    city: str = Field(description="City name in English, e.g., 'Paris'")
    units: str = Field(default="celsius", description="'celsius' or 'fahrenheit'")

def get_weather_fn(city: str, units: str) -> str:
    # ... real API call ...
    return f"{city}: 18°{units[0].upper()}, sunny"

weather_tool = StructuredTool.from_function(
    func=get_weather_fn,
    name="get_weather",
    description="Get current weather for a city. Use ONLY when the user asks about weather.",
    args_schema=WeatherInput,
    return_direct=False,   # if True, tool output is returned to the user without LLM post-processing
)

Injected args — pass runtime context (user_id, request_id) that the LLM should NOT control:

from typing import Annotated
from langchain_core.tools import InjectedToolArg

@tool
def get_account_balance(
    account_id: str,
    user_id: Annotated[str, InjectedToolArg],   # never exposed to LLM
) -> str:
    """Get the balance for a given account."""
    if not user_owns_account(user_id, account_id):
        return "Access denied."
    return f"Balance: ${lookup(account_id)}"

# When invoking, you supply user_id from auth; the LLM only fills account_id
tool_call = ai_msg.tool_calls[0]
result = get_account_balance.invoke({**tool_call["args"], "user_id": current_user.id})

BaseTool subclass — full control + async:

from langchain_core.tools import BaseTool
from typing import Optional

class DatabaseQueryTool(BaseTool):
    name: str = "db_query"
    description: str = "Run a read-only SQL query on the analytics DB."
    args_schema: type = QueryInput

    def _run(self, query: str) -> str:
        return run_sql(query)

    async def _arun(self, query: str) -> str:
        return await arun_sql(query)

Use a pre-built tool (e.g., Tavily for web search):

from langchain_community.tools.tavily_search import TavilySearchResults

search = TavilySearchResults(max_results=4)
results = search.invoke("latest news about LangChain")

Wire into an LCEL agent loop — the simplest agent is just model.bind_tools(...) + a loop. For real agents, use the Agents chapter or LangGraph.

7. Common pitfalls

  • Weak tool descriptions. "Calculates things" is useless — the model can't tell when to use it. Be precise: "Multiply two integers. Use only for arithmetic, not for currency or date math."
  • Letting the LLM control sensitive args. Tenant IDs, user IDs, auth tokens — wrap with InjectedToolArg and supply them server-side.
  • Hand-parsing tool_calls when you don't need to. Use create_react_agent (LangGraph), AgentExecutor, or an LCEL agent loop instead of writing your own dispatch.
  • Tools with side effects + no idempotency. Models retry on error. A send_email tool that doesn't dedupe by request_id will send 3 emails.
  • Too many tools at once. Models start mis-selecting around 15–20+ tools. Either narrow per request or use a hierarchical/router pattern.
  • Returning huge blobs to the model. Tool output goes back into the context window. Return concise text — paginate or summarize before returning.

8. When to use vs not use

Choose When
@tool decorator Default. Simple, typed, prototyping
StructuredTool.from_function() Need explicit Pydantic schema, descriptions per field, production validation
BaseTool subclass Async + sync, custom error handling, state
Pre-built community tool Common needs (search, calculator, SQL, Python REPL) — don't reinvent
Don't use tools at all One-shot answer the model already knows; or use a deterministic chain instead

9. Cheatsheet

# Decorator (most common)
from langchain_core.tools import tool

@tool
def my_tool(arg: str) -> str:
    """Description matters — model picks based on this."""
    return ...

@tool("custom_name", return_direct=True)
def my_tool(...): ...

# StructuredTool
from langchain_core.tools import StructuredTool
tool_obj = StructuredTool.from_function(
    func=fn,
    name="...",
    description="...",
    args_schema=PydanticModel,
    return_direct=False,
)

# Bind to model and emit tool_calls
model_with_tools = ChatOpenAI(model="gpt-4o-mini").bind_tools(
    tools=[tool1, tool2],
    tool_choice="auto",          # or "required", or a specific name
    strict=True,                  # OpenAI strict schema
)
ai_msg = model_with_tools.invoke("...")
ai_msg.tool_calls   # [{'name': ..., 'args': {...}, 'id': ...}]

# Invoke a tool
result = my_tool.invoke({"arg": "value"})

# Feed result back
from langchain_core.messages import ToolMessage
ToolMessage(content=str(result), tool_call_id=call["id"])

# Injected args (server-side context, hidden from LLM)
from typing import Annotated
from langchain_core.tools import InjectedToolArg

@tool
def t(public_arg: str, user_id: Annotated[str, InjectedToolArg]) -> str: ...

# Common pre-built tools (langchain_community)
from langchain_community.tools.tavily_search import TavilySearchResults
from langchain_community.tools import DuckDuckGoSearchRun
from langchain_community.utilities import PythonREPL
from langchain_community.agent_toolkits import SQLDatabaseToolkit

10. Q&A — recall test

  • Q: Does the LLM execute tool code itself? A: No. The LLM emits a structured tool_call request (name + args). Your application code runs the function and feeds the result back via a ToolMessage.

  • Q: What's the single most important field on a tool? A: The description. The model decides whether to use a tool based on it. Vague descriptions → wrong tool selected.

  • Q: Difference between @tool and StructuredTool.from_function? A: @tool is a quick decorator with implicit schema from type hints + docstring. StructuredTool.from_function takes an explicit Pydantic args_schema for stricter validation + better per-field descriptions.

  • Q: What problem does InjectedToolArg solve? A: Some args (user_id, tenant, auth token) must be supplied by your server, not chosen by the LLM. InjectedToolArg keeps them out of the LLM-visible schema while still letting the tool receive them at invocation.

  • Q: Why might the model emit zero tool_calls? A: It decided no tool was needed and answered directly. That's normal — tool_calls is empty when the model is confident it can answer.

  • Q: When does tool-calling break down? A: Too many tools (~15+), tools with overlapping descriptions, or tools returning massive blobs. Mitigations: narrow the tool set per request, sharpen descriptions, summarize tool output before returning.

Practice

What does this print?

Expected: 15

# A tool is just a function with a docstring describing what it does
def add(a: int, b: int) -> int:
    """Add two integers."""
    return a + b
print(add(7, 8))

Add type hints and a clear docstring (LLMs use these to decide when to call the tool)

Expected: True

def search(q):                       # bug: no type hints, no docstring — LLM can't reason about it
    return f"results for {q}"
has_docstring = search.__doc__ is not None
print(not has_docstring)             # we expect None

Quiz — Quick check

What you remember

Q1. What's the most important part of a tool definition for the LLM?

  • The function name
  • The docstring + type hints — the LLM reads these to decide WHEN to call the tool
  • The return type
  • The arguments

Why: The LLM doesn't see your function body. It only sees the description (docstring) and the parameter schema (from type hints). Vague descriptions = wrong tool calls.

Q2. What happens after the LLM calls a tool?

  • You execute the tool with the LLM's arguments and send the result back as another message; the LLM continues with that info
  • The tool runs automatically inside the LLM
  • Execution is instant
  • The LLM can't continue

Why: Tool calling is a back-and-forth. LLM proposes the call → your code runs it → result goes back to LLM → LLM continues reasoning with the result. LangChain (or LangGraph) orchestrates this loop.

Q3. How many tools can an LLM reliably use?

  • Unlimited
  • ~10-15 — beyond that, tool choice quality drops as descriptions blur together
  • Exactly 1
  • 100

Why: When tools overlap or descriptions are vague, the LLM picks wrong tools. Sharpen descriptions, group related tools into a single "router" tool, or use per-task tool subsets.

Common doubts

Should I use LangChain tools or write raw function calls myself?

For simple cases, raw is fine. LangChain tools save you the parsing/dispatching boilerplate and integrate cleanly with LangGraph agents. For 2+ tools, LangChain pays off. For 1 tool, do whatever's simpler.

What's the difference between sync and async tools?

Sync tools (@tool on a regular function) block. Async tools (@tool on an async def) let you await without blocking the rest of the chain — important when calling slow APIs. In a long-running agent, async tools dramatically improve throughput.

How do I handle tools that need authentication or user-scoped state?

Use dependency injection — pass user_id / auth context via the chain's configurable or RunnableConfig. Tools access it from there. Avoid global state; it doesn't scale to multi-tenant servers.