LangChain Guide

LangChain agents can use Catalogian as a tool provider through the OpenAI Responses API endpoint. This guide walks through setting up a LangChain agent that can list sources, query delta events, and browse snapshot data.

Prerequisites

  • Python 3.10+
  • Catalogian API key (cat_live_), which requires a Brand plan or higher
  • pip install langchain langchain-openai openai

Setting up the Catalogian client

Catalogian implements the OpenAI Responses API format, so you can use the standard OpenAI Python SDK pointed at Catalogian's endpoint:

import os
from openai import OpenAI

catalogian = OpenAI(
    base_url="https://api.catalogian.com/v1",
    api_key=os.environ["CATALOGIAN_KEY"],  # cat_live_...
)
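The SDK does not validate the key until the first request, so a quick prefix check before constructing the client can surface configuration mistakes early. A minimal sketch (the get_catalogian_key helper is ours, not part of any SDK; the cat_live_ prefix comes from the Prerequisites above):

```python
import os

def get_catalogian_key(env_var: str = "CATALOGIAN_KEY") -> str:
    """Read the API key from the environment and sanity-check its prefix."""
    key = os.environ.get(env_var, "")
    if not key.startswith("cat_live_"):
        raise RuntimeError(f"{env_var} is missing or is not a cat_live_ key")
    return key
```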

Defining Catalogian tools

Define the tools your agent can use. Here are the most common ones:

CATALOGIAN_TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "list_sources",
            "description": "List all product feed sources",
            "parameters": {"type": "object", "properties": {}},
        },
    },
    {
        "type": "function",
        "function": {
            "name": "get_delta",
            "description": "Get recent delta events (changes) for a source",
            "parameters": {
                "type": "object",
                "properties": {
                    "sourceSlug": {
                        "type": "string",
                        "description": "The source slug (e.g. 'acme-products')",
                    },
                    "limit": {
                        "type": "integer",
                        "description": "Max events to return (default 10)",
                    },
                },
                "required": ["sourceSlug"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "profile_snapshot",
            "description": "Get field cardinality, null rates, and value distributions for a source",
            "parameters": {
                "type": "object",
                "properties": {
                    "sourceSlug": {
                        "type": "string",
                        "description": "The source slug",
                    },
                },
                "required": ["sourceSlug"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "search_snapshot",
            "description": "Full-text search across all row fields in a source",
            "parameters": {
                "type": "object",
                "properties": {
                    "sourceSlug": {"type": "string"},
                    "query": {"type": "string", "description": "Search term"},
                },
                "required": ["sourceSlug", "query"],
            },
        },
    },
]
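A malformed tool schema tends to fail silently: the model simply never calls the tool, or calls it with the wrong arguments. A small consistency check over the definitions above catches the most common mistake, a name listed in required that is missing from properties (the validate_tool_defs helper is ours, shown as a sketch):

```python
def validate_tool_defs(tool_defs: list[dict]) -> list[str]:
    """Check each function-tool schema: every 'required' name must be declared
    in 'properties'. Returns the tool names on success."""
    names = []
    for tool in tool_defs:
        fn = tool["function"]
        params = fn.get("parameters", {})
        declared = set(params.get("properties", {}))
        missing = [r for r in params.get("required", []) if r not in declared]
        if missing:
            raise ValueError(f"{fn['name']}: required but undeclared: {missing}")
        names.append(fn["name"])
    return names
```

Running validate_tool_defs(CATALOGIAN_TOOLS) at import time is a cheap guard against schema drift as you add tools.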

Building the tool executor

Create a function that dispatches tool calls to Catalogian:

import json

def call_catalogian_tool(tool_name: str, arguments: dict) -> str:
    """Execute a Catalogian tool via the Responses API."""
    response = catalogian.responses.create(
        model="catalogian-1",
        input=json.dumps(arguments) if arguments else "{}",
        tools=[{"type": "function", "name": tool_name}],
        tool_choice={"type": "function", "name": tool_name},
    )
    # Extract the text content from the response
    for item in response.output:
        if hasattr(item, "content"):
            for part in item.content:
                if hasattr(part, "text"):
                    return part.text
    return "No result"
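The extraction loop above assumes the response output is a list of items whose content parts carry a text attribute. You can sanity-check that walk offline by replaying it against stub objects; here types.SimpleNamespace stands in for the SDK's response models:

```python
from types import SimpleNamespace

def extract_text(output) -> str:
    """Same walk as in call_catalogian_tool: the first text part wins."""
    for item in output:
        if hasattr(item, "content"):
            for part in item.content:
                if hasattr(part, "text"):
                    return part.text
    return "No result"

# A stubbed output: one item without content (skipped), one with a text part.
stub = [
    SimpleNamespace(kind="reasoning"),
    SimpleNamespace(content=[SimpleNamespace(text='{"sources": []}')]),
]
```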

Creating the LangChain agent

Wrap the Catalogian tools as LangChain tools and create an agent:

from langchain_core.tools import StructuredTool
from langchain_openai import ChatOpenAI
from langchain.agents import AgentExecutor, create_openai_tools_agent
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

from pydantic import create_model

# Wrap each Catalogian tool as a LangChain tool
def make_langchain_tool(tool_def):
    func_def = tool_def["function"]
    params = func_def.get("parameters", {})
    props = params.get("properties", {})
    required = set(params.get("required", []))
    type_map = {"string": str, "integer": int, "number": float, "boolean": bool}

    def invoke(**kwargs):
        return call_catalogian_tool(func_def["name"], kwargs)

    # Build an explicit args schema from the JSON Schema properties; the
    # **kwargs signature alone would give the LLM an empty parameter schema.
    fields = {
        name: (type_map.get(spec.get("type"), str), ... if name in required else None)
        for name, spec in props.items()
    }
    args_schema = create_model(f"{func_def['name']}_args", **fields)

    return StructuredTool.from_function(
        func=invoke,
        name=func_def["name"],
        description=func_def["description"],
        args_schema=args_schema,
    )

tools = [make_langchain_tool(t) for t in CATALOGIAN_TOOLS]

# Create the agent with your preferred LLM
llm = ChatOpenAI(model="gpt-4o", temperature=0)

prompt = ChatPromptTemplate.from_messages([
    ("system", """You are a product data analyst with access to Catalogian,
a product feed monitoring service. Use the available tools to answer
questions about product catalog changes, feed health, and data quality.
Always call profile_snapshot before analyzing a source for the first time."""),
    MessagesPlaceholder("chat_history", optional=True),
    ("human", "{input}"),
    MessagesPlaceholder("agent_scratchpad"),
])

agent = create_openai_tools_agent(llm, tools, prompt)
executor = AgentExecutor(agent=agent, tools=tools, verbose=True)

Using the agent

# Ask about recent changes
result = executor.invoke({
    "input": "What changed in acme-products in the last 24 hours?"
})
print(result["output"])

# Search for a specific product
result = executor.invoke({
    "input": "Find all products matching 'wireless headphones' in acme-products"
})
print(result["output"])

# Get a data quality report
result = executor.invoke({
    "input": "Profile the acme-products feed and tell me about data quality issues"
})
print(result["output"])

All available tools

Catalogian exposes 16 tools through both MCP and the Responses API. See the full list in the MCP Integration docs — all tools are available via the Responses API as well.

Best practice: Always call profile_snapshot before querying a source for the first time. It returns field names, types, cardinality, and null rates — giving the LLM context it needs to write effective queries. See Agent Best Practices.
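If you want to enforce the profile-first rule in code rather than rely on the system prompt, a thin wrapper can profile each source once before its first real query. A sketch, with the fetch function injected so it can be stubbed; the query_with_profile helper and _profiled cache are ours, not part of Catalogian or LangChain:

```python
# Sources that have already been profiled in this session.
_profiled: set[str] = set()

def query_with_profile(source_slug: str, tool_name: str, arguments: dict, fetch):
    """Run profile_snapshot once per source before any other tool call.

    fetch(tool_name, arguments) is expected to behave like call_catalogian_tool.
    """
    if source_slug not in _profiled:
        fetch("profile_snapshot", {"sourceSlug": source_slug})
        _profiled.add(source_slug)
    return fetch(tool_name, arguments)
```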

Building with CrewAI instead? CrewAI Guide →