Diffbot MCP for AI Agents

Securely connect your AI agents and chatbots (Claude, ChatGPT, Cursor, etc.) to Diffbot through MCP or the direct API to extract article content, analyze product listings, enrich structured web data, and automate web research through natural language.

Diffbot
API Key

Diffbot is an AI-powered platform for extracting and structuring data from any web page. It transforms unstructured web content into rich, linked, and queryable data for analytics, research, and automation.

35 Tools

TOOLS

Supported Tools

Every Diffbot action and event your agent gets out of the box.

Combine Entity Profiles

Combine multiple entity profiles into a unified view using the Diffbot Knowledge Graph.

Create Bulk Extract Job

Tool to submit a bulk extract job to process multiple URLs with Extract APIs.

Create or Update Custom API

Tool to create or update the parameters and ruleset of a Custom API.

Create Bulk Enhance Job

Tool to submit a bulk enhance job to enrich multiple entities asynchronously.

Delete Custom API

Tool to delete custom API definitions for a given URL pattern.

Delete KG Enhance Bulkjob

Tool to delete an Enhance Bulkjob.

Download Bulk Job Results

Tool to download results of a bulk enhance job with filtering options via POST request.

Enhance Entity with Knowledge Graph

Enrich a person or organization with comprehensive data from the Diffbot Knowledge Graph.

Diffbot Extract Job

Tool to extract structured job posting data from job listing pages.

Diffbot Extract List

Tool to extract structured data from list-style pages like news indexes, product listings, and directory pages.

Get Diffbot Account Details

Retrieves comprehensive Diffbot account information including subscription plan details, credit balance, usage history, and account status.

Diffbot Analyze

Automatically analyzes a web page to determine its type and extract structured data.

Get Article Data

Tool to extract information from articles, including authors, publication dates, and images.

Get Bulk Job Data

Tool to download extracted results from a completed bulk job.

Get Bulk Job Status

Tool to poll the status of a specific Diffbot Knowledge Graph Enhance bulk job.

Get Bulk Job Results

Tool to download the results of a completed Enhance Bulkjob.

Get Bulk Single Result

Tool to download the result of a single job within a Diffbot bulk enhance job.

Get Crawl Data

Download extracted results from a completed crawl job.

Get Discussion Thread

Extract structured discussion threads from web pages including forums, comment sections, product reviews, Reddit discussions, and blog comments.

Diffbot Get Event

Tool to extract event details from web pages.

Diffbot Get Image

Tool to extract detailed information about images, including dimensions and recognition data.

Get KG Coverage Report by ID

Download Knowledge Graph coverage report by report ID.

Diffbot Get Product

Tool to extract product information such as specifications, prices, availability, and reviews.

Get Video Data

Tool to extract information from videos, including titles, descriptions, and embedded HTML.

List Bulk Jobs

Tool to list all Bulk jobs associated with a specific token.

List Bulk Jobs Status For Token

Tool to get the status of all bulk enhance jobs for a token.

List Custom APIs

Tool to retrieve all Custom APIs and their extraction rules currently defined on your Diffbot token.

Manage Crawl Job

Manages Diffbot crawl jobs: pause, restart, delete, or view status.

Resolve Lost ID

Tool to resolve lost IDs in the Knowledge Graph.

Diffbot Knowledge Graph Search

Search the Diffbot Knowledge Graph using DQL (Diffbot Query Language).

Search Crawl Job Data

Tool to query crawl job collections using DQL (Diffbot Query Language).
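
Both search tools above take a DQL query string. As a rough illustration, the sketch below composes a simple query from a type and a set of exact-match filters. The `dqlQuery` helper is hypothetical (it is not part of Diffbot or Composio), and the `name` field in the usage line is assumed to exist on the entity type being queried; Diffbot's DQL documentation is the authority on field names and syntax.

```typescript
// Hypothetical helper: builds a query such as  type:Organization name:"Diffbot"
// Space-separated clauses are combined, so every filter must match.
function dqlQuery(entityType: string, filters: Array<[field: string, value: string]>): string {
  const clauses = filters.map(([field, value]) => {
    // Escaping rules for embedded quotes are not covered here, so reject them
    // rather than guess at the syntax.
    if (value.includes('"')) {
      throw new Error(`Value for ${field} must not contain double quotes`);
    }
    return `${field}:"${value}"`;
  });
  return [`type:${entityType}`, ...clauses].join(' ');
}

const query = dqlQuery('Organization', [['name', 'Diffbot']]);
console.log(query); // type:Organization name:"Diffbot"
```

The resulting string is what an agent would pass as the query argument to either search tool.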

Start Bulk Job

Tool to start a Bulk Extract job.

Start Crawl Job

Initiates a Diffbot crawl job that spiders a website starting from seed URLs and processes discovered pages with a specified Extract API.

Stop Bulk Job

Tool to pause (stop) a running Bulk job.

Stop KG Bulk Job By ID

Tool to stop an active Knowledge Graph Enhance bulk job by its ID.
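
Several of the tools above are asynchronous: a bulk or crawl job is created, its status is polled, and the results are downloaded once it completes. The sketch below shows one way to structure the polling step with exponential backoff. It is a minimal sketch, not Composio's or Diffbot's implementation: `waitForJob`, the `JobStatus` shape, and its `done` field are all hypothetical, and `checkStatus` stands in for whatever call wraps the status tool in your agent.

```typescript
// Hypothetical status shape; the real tool response may use different fields.
type JobStatus = { done: boolean; detail?: string };

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Calls checkStatus until the job reports done. The wait doubles after each
// unfinished check (capped at maxDelayMs); gives up after maxAttempts checks.
async function waitForJob(
  checkStatus: () => Promise<JobStatus>,
  { initialDelayMs = 1000, maxDelayMs = 30000, maxAttempts = 20 } = {},
): Promise<JobStatus> {
  let delay = initialDelayMs;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const status = await checkStatus();
    if (status.done) return status;
    if (attempt < maxAttempts) {
      await sleep(delay);
      delay = Math.min(delay * 2, maxDelayMs);
    }
  }
  throw new Error(`Job not finished after ${maxAttempts} status checks`);
}
```

Backoff keeps the number of status calls low for long-running jobs; the attempt cap ensures a stalled job surfaces as an error instead of an endless loop.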

SETUP GUIDE

Connect Diffbot MCP Tool with your Agent

1

Install Composio

bash
npm install @composio/core ai @ai-sdk/openai @ai-sdk/mcp
Install the Composio SDK, the Vercel AI SDK, its OpenAI provider, and its MCP client package
2

Create Tool Router Session

typescript
import { Composio } from '@composio/core';

const composio = new Composio({ apiKey: 'your-composio-api-key' });

console.log("Creating Tool Router session...");
const { mcp } = await composio.create('your-user-id');
console.log(`Tool Router session created: ${mcp.url}`);
Initialize the Composio client and create a Tool Router session
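
Step 2 passes the API key as a string literal for brevity. In a real project the key would more likely come from the environment. The helper below is a minimal sketch of that pattern; `requireEnv` is hypothetical, and the variable name `COMPOSIO_API_KEY` is an assumed convention rather than something this guide specifies.

```typescript
// Hypothetical helper: returns the named variable, or fails fast with a clear
// message when it is missing or blank.
function requireEnv(env: Record<string, string | undefined>, name: string): string {
  const value = env[name];
  if (value === undefined || value.trim() === '') {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// In a Node.js app this could be used as:
//   const composio = new Composio({ apiKey: requireEnv(process.env, 'COMPOSIO_API_KEY') });
```

Taking the environment as a parameter keeps the helper easy to test and free of Node-specific globals.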
3

Connect to AI Agent

typescript
import { openai } from '@ai-sdk/openai';
import { experimental_createMCPClient as createMCPClient } from '@ai-sdk/mcp';
import { generateText, stepCountIs } from 'ai';

const client = await createMCPClient({
  transport: {
    type: 'http',
    url: mcp.url,
    headers: { 'x-api-key': 'your-composio-api-key' }
  }
});

const tools = await client.tools();

const { text } = await generateText({
  model: openai('gpt-4o'),
  tools,
  messages: [{ role: 'user', content: 'Extract product details from https://www.example.com/product/12345' }],
  stopWhen: stepCountIs(5)
});

console.log(`Agent: ${text}`);
Use the MCP server with your AI agent
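
The MCP client in step 3 holds an open connection to the server, so it is worth closing once the agent has finished. The sketch below wraps that cleanup in a small helper. `withClient` and `Closable` are hypothetical names; the only assumption about the client is that it exposes an async `close()` method, which the AI SDK MCP client does.

```typescript
// Minimal structural type: anything with an async close() method.
interface Closable {
  close(): Promise<void>;
}

// Runs work(client) and always closes the client afterwards, even if work throws.
async function withClient<C extends Closable, T>(
  client: C,
  work: (client: C) => Promise<T>,
): Promise<T> {
  try {
    return await work(client);
  } finally {
    await client.close();
  }
}

// Possible usage with the client from step 3:
//   const text = await withClient(client, async (c) => {
//     const tools = await c.tools();
//     const result = await generateText({ model: openai('gpt-4o'), tools, messages, stopWhen: stepCountIs(5) });
//     return result.text;
//   });
```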
SETUP GUIDE

Connect Diffbot API Tool with your Agent

1

Install Composio

bash
npm install @composio/core @composio/openai openai
Install the Composio SDK, its OpenAI provider, and the OpenAI SDK
2

Initialize Composio and Create Tool Router Session

typescript
import OpenAI from 'openai';
import { Composio } from '@composio/core';
import { OpenAIResponsesProvider } from '@composio/openai';

const composio = new Composio({
  provider: new OpenAIResponsesProvider(),
});
const openai = new OpenAI({});
const session = await composio.create('your-user-id');
Import and initialize Composio client, then create a Tool Router session
3

Execute Diffbot Tools via Tool Router with Your Agent

typescript
const tools = session.tools;
const response = await openai.responses.create({
  model: 'gpt-4.1',
  tools: tools,
  input: [{
    role: 'user',
    content: 'Extract product info from this Amazon page'
  }],
});
const result = await composio.provider.handleToolCalls(
  'your-user-id',
  response.output
);
console.log(result);
Get tools from Tool Router session and execute Diffbot actions with your Agent

Why Use Composio?

AI Native Diffbot Integration

  • Supports both Diffbot MCP and direct API-based integrations
  • Structured, LLM-friendly schemas for reliable tool execution
  • Rich coverage for extracting, analyzing, and enriching web data

Managed Auth

  • Built-in API key management with secure storage and rotation
  • Central place to manage, scope, and revoke Diffbot API keys
  • Per-user and per-environment credentials instead of hard-coded keys

Agent Optimized Design

  • Tools are tuned using real error and success rates to improve reliability over time
  • Comprehensive execution logs so you always know what ran, when, and on whose behalf

Enterprise Grade Security

  • Fine-grained RBAC so you control which agents and users can access Diffbot
  • Scoped, least-privilege access to Diffbot resources
  • Full audit trail of agent actions to support review and compliance
FAQ

Frequently asked questions

Yes, Diffbot requires you to configure your own API key credentials. Once set up, Composio handles secure credential storage and API request handling for you.

Yes! Composio's Tool Router enables agents to use multiple toolkits.

Composio is SOC 2 and ISO 27001 compliant, with all data encrypted in transit and at rest.

Composio maintains and updates all toolkit integrations automatically, so your agents always work with the latest API versions.

Start with Diffbot. It takes 30 seconds.

Managed auth, hosted MCP servers, and every Diffbot tool your agent needs. Free to start.
