
Build a Datadog reliability dashboard with Claude

Connect the Datadog MCP server to Claude, ask for an infra, APM, and SLO dashboard from your live observability data, and publish it to a link your team comments on directly — no extra BI tool, no screenshots pasted into Slack.

What you'll build
A self-contained reliability dashboard — service SLOs and error budgets, p95/p99 latency and error rate per service, host health and resource saturation, open monitors and incidents — generated by Claude from your real Datadog data, then published to a drafty.im/canvas/… link. Your team clicks the exact tile or number they want changed and leaves a note. Claude reads the comments and ships a revised version to the same URL.

This is an end-to-end example: connect a data source over MCP, generate a dashboard from live numbers, and close the review loop on one link. Total time, start to shared link, is under fifteen minutes. The same shape works for any of the other examples — only the connection step changes.

Here's the finished dashboard, published to a canvas — click any tile or number to leave a comment, exactly as your team would:

[Embedded live canvas: comment on any element]

The three moving parts

  1. The Datadog MCP server gives Claude read access to your observability data — metrics, logs, traces and spans, monitors, hosts, incidents, dashboards — through a controlled set of tools. You approve what it can touch.
  2. Claude pulls the numbers and writes a single self-contained HTML dashboard. You iterate on it in the artifact panel until it's right.
  3. Drafty turns that HTML into a stable link your team reviews. Comments pin to the exact element; Claude ships the fix to the same URL.

The generation step is fast now. The part this example is really about is the third one — getting the dashboard in front of an on-call team or a reliability review without losing their feedback to a screenshot circled in Preview.

Step 1 — Connect the Datadog MCP server

Datadog runs an official remote MCP server at https://mcp.datadoghq.com/api/unstable/mcp-server/mcp. You connect once; it authenticates over OAuth, so no API key is pasted into a config file.

In Claude Code:

claude
claude mcp add --transport http datadog https://mcp.datadoghq.com/api/unstable/mcp-server/mcp

Then run /mcp inside Claude Code and follow the OAuth prompt to authorize your Datadog organization. If you're not on the US1 site, swap the host for your region's endpoint — for example mcp.datadoghq.eu for EU.

In Claude Desktop: open Settings → Connectors → Add custom connector, paste https://mcp.datadoghq.com/api/unstable/mcp-server/mcp, and authorize with OAuth the same way.
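
As a rough illustration of the host swap described above, the sketch below builds the endpoint and the matching `claude mcp add` command for a site. The helper names (`mcp_endpoint`, `add_command`) and the site labels are hypothetical. It also assumes the URL path stays the same across regions, and it lists only the two hosts this guide mentions rather than guessing at others.

```python
# Hypothetical helper, not part of any Datadog or Claude tooling.
# Assumption: only the host changes between regions; the path is kept as-is.
MCP_PATH = "/api/unstable/mcp-server/mcp"

# Only the hosts named in this guide. Other regions are left out on purpose.
MCP_HOSTS = {
    "US1": "mcp.datadoghq.com",
    "EU": "mcp.datadoghq.eu",
}


def mcp_endpoint(site: str) -> str:
    """Return the full MCP server URL for a known site label."""
    key = site.upper()
    if key not in MCP_HOSTS:
        raise ValueError(f"No MCP host recorded for site {site!r}; check Datadog's docs")
    return f"https://{MCP_HOSTS[key]}{MCP_PATH}"


def add_command(site: str, name: str = "datadog") -> str:
    """Build the `claude mcp add` command line for the given site."""
    return f"claude mcp add --transport http {name} {mcp_endpoint(site)}"


if __name__ == "__main__":
    print(add_command("EU"))
```

For a site that is not in the table, the helper raises instead of inventing a hostname, which is the safer default when the endpoint is still under an `unstable` path.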

Safety first
Datadog gates the MCP server with two roles: mcp_read for retrieving data and mcp_write for creating or modifying resources. A reporting dashboard only reads and has no reason to hold write permissions, so connect with an account that has mcp_read and not mcp_write. Never paste an application or API key into a config file, and never commit one.
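
One way to make that rule hard to forget is a small pre-flight check. The sketch below is illustrative only: `check_read_only` is a hypothetical function, and the role list is assumed to be something you copy by hand from your Datadog role settings, not something fetched from an API.

```python
READ_ROLE = "mcp_read"
WRITE_ROLE = "mcp_write"


def check_read_only(roles):
    """Raise PermissionError unless the roles fit a read-only dashboard.

    `roles` is any iterable of role names (assumed to be entered manually).
    """
    granted = set(roles)
    if WRITE_ROLE in granted:
        raise PermissionError(f"{WRITE_ROLE} is granted; a reporting dashboard should not hold it")
    if READ_ROLE not in granted:
        raise PermissionError(f"{READ_ROLE} is missing; the dashboard cannot read data")
    return True


if __name__ == "__main__":
    print(check_read_only(["mcp_read"]))
```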

Step 2 — Pull the numbers

Ask Claude in plain language. It uses the MCP server's read tools (get_metrics, list_metrics, get_monitors, list_hosts, list_incidents, get_logs, list_spans) to fetch real data:

claude
Using the Datadog MCP server, pull everything we need for a reliability dashboard for the last 24 hours: for our top services, the SLO attainment and remaining error budget, p95 and p99 latency and request error rate; host count with CPU and memory saturation; the count of monitors currently in Alert and Warn; and any open incidents with severity. Summarize the figures before you build anything.

Claude calls Datadog, returns the figures, and you sanity-check them against your Datadog dashboards and SLO pages before going further. This is the moment to catch a wrong assumption — the wrong time window, a service tag that doesn't match, an SLO target you misremembered — while it's cheap.
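
When sanity-checking the SLO figures, it can help to redo the error-budget arithmetic by hand. The sketch below uses the common definition (budget = 100% minus the target, remaining = the share of that budget not yet spent). The function names and the sample numbers are made up for illustration, and Datadog's own calculation may differ by SLO type.

```python
import math


def error_budget_remaining(target_pct: float, attained_pct: float) -> float:
    """Fraction of the error budget still unspent (negative means overspent)."""
    allowed = 100.0 - target_pct      # total budget, in percentage points
    if allowed <= 0:
        raise ValueError("target must be below 100%")
    used = 100.0 - attained_pct       # budget consumed so far
    return 1.0 - used / allowed


def budget_minutes(target_pct: float, window_days: int) -> float:
    """Total allowed 'bad' minutes for a time-based SLO over the window."""
    return window_days * 24 * 60 * (100.0 - target_pct) / 100.0


if __name__ == "__main__":
    # Hypothetical figures: a 99.9% target with 99.95% attained.
    remaining = error_budget_remaining(99.9, 99.95)
    print(f"remaining budget: {remaining:.0%}")
    print(f"30-day budget: {budget_minutes(99.9, 30):.1f} minutes")
    assert math.isclose(remaining, 0.5, abs_tol=1e-9)
```

If the figure Claude reports is far from what this arithmetic gives for the same target and attainment, the time window or the SLO target is the first thing to recheck.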

Step 3 — Build the dashboard

Once the numbers look right, ask for the artifact:

claude
Build a single self-contained HTML dashboard from those figures. SLO attainment and error budget as the hero row, then a per-service table with p95/p99 latency, error rate, and request volume. A panel for host health and resource saturation, and a panel listing monitors in Alert/Warn and any open incidents. Clean, no external dependencies — inline the CSS and any chart code.

Claude renders it live in the artifact panel. Iterate in place; you're not regenerating from scratch.
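
To make "self-contained" concrete, here is a minimal sketch of what such a file amounts to: one HTML string with the CSS inlined and no external URLs. It is not the HTML Claude produces. The function name, column keys, and styling are assumptions chosen for the example.

```python
from html import escape

# Inline stylesheet: nothing is fetched from the network.
CSS = """
body { font-family: system-ui, sans-serif; margin: 2rem; }
.hero { display: flex; gap: 1rem; }
.tile { border: 1px solid #ccc; border-radius: 8px; padding: 1rem; }
table { border-collapse: collapse; margin-top: 1.5rem; }
th, td { border-bottom: 1px solid #ddd; padding: 0.4rem 0.8rem; text-align: left; }
"""

# Hypothetical column keys for the per-service table.
COLUMNS = ["service", "p95_ms", "p99_ms", "error_rate_pct", "requests"]


def render_dashboard(title, hero, services):
    """Render a hero row (label -> value) and a per-service table as one HTML string."""
    tiles = "".join(
        f'<div class="tile"><strong>{escape(str(v))}</strong><br>{escape(k)}</div>'
        for k, v in hero.items()
    )
    head = "".join(f"<th>{escape(c)}</th>" for c in COLUMNS)
    rows = "".join(
        "<tr>" + "".join(f"<td>{escape(str(s[c]))}</td>" for c in COLUMNS) + "</tr>"
        for s in services
    )
    return (
        "<!doctype html><html><head><meta charset='utf-8'>"
        f"<title>{escape(title)}</title><style>{CSS}</style></head><body>"
        f"<h1>{escape(title)}</h1><div class='hero'>{tiles}</div>"
        f"<table><thead><tr>{head}</tr></thead><tbody>{rows}</tbody></table>"
        "</body></html>"
    )


if __name__ == "__main__":
    page = render_dashboard(
        "Reliability",
        {"checkout SLO": "99.95%"},
        [{"service": "checkout", "p95_ms": 120, "p99_ms": 340,
          "error_rate_pct": 0.4, "requests": 1000}],
    )
    print(len(page), "bytes of HTML")
```

Every value passes through `escape`, so a service name or tag containing markup is shown as text instead of being interpreted by the browser.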

Step 4 — Publish to Drafty for review

A Claude artifact link is a preview, not a stable URL: iterate on the artifact, and the link you already sent still shows the old version. Ask Claude to publish it to a Drafty canvas instead, so the link you share always stays current:

claude
Publish this dashboard to Drafty as a canvas and give me the shareable link.

Claude pushes the dashboard and hands back a drafty.im/canvas/… link that renders on any device. Send it — your team opens it in a browser, no login and no Claude account needed.

Step 5 — The review loop

This is the part that's not obvious until you've done it once.

A reviewer clicks the specific tile, chart, or number they want changed and leaves a pinned comment — "this checkout SLO looks too healthy, are we counting the synthetic test traffic?" The comment is anchored to that element, not floating in a Slack thread. Claude reads the comments through the CLI, reruns the relevant Datadog query if needed, and pushes a revised dashboard to the same URL. The reviewer refreshes and sees the change; the thread stays attached to the element.

The mechanic matters because of what it removes. A Slack message about a chart produces "the number on the left looks wrong." A pinned comment on the actual tile produces "this — exclude synthetic monitors from the error rate." One of those produces a correct revision; the other produces a guess.
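
The synthetic-traffic comment is a good example of a revision that changes the number itself, not just its presentation. The sketch below works through the arithmetic with made-up counts. The `Traffic` type and both functions are hypothetical and say nothing about how Datadog tags synthetic requests.

```python
from dataclasses import dataclass


@dataclass
class Traffic:
    requests: int
    errors: int


def error_rate_pct(t: Traffic) -> float:
    """Error rate as a percentage; 0.0 when there is no traffic."""
    if t.requests == 0:
        return 0.0
    return 100.0 * t.errors / t.requests


def exclude(total: Traffic, synthetic: Traffic) -> Traffic:
    """Subtract synthetic-test traffic from the totals."""
    return Traffic(total.requests - synthetic.requests, total.errors - synthetic.errors)


if __name__ == "__main__":
    # Hypothetical counts: synthetic checks add many requests and no errors.
    total = Traffic(requests=10_000, errors=50)
    synthetic = Traffic(requests=4_000, errors=0)
    print(f"with synthetic:    {error_rate_pct(total):.2f}%")
    print(f"without synthetic: {error_rate_pct(exclude(total, synthetic)):.2f}%")
```

With these counts the rate rises once synthetic requests are removed, which is the "looks too healthy" effect the reviewer suspected.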

Keeping it fresh

An MCP-generated dashboard is a snapshot — it holds the numbers Claude pulled when it built it; it doesn't re-query Datadog when someone opens the link. For a weekly reliability review or an incident postmortem snapshot, that's fine.

To make it a live canvas that always shows the current state, copy this prompt — Claude sets up the refresh for you and schedules it to run on its own:

claude
Turn this Datadog dashboard into a live canvas: every morning, re-pull the latest SLOs, latency, error rates, host health, and open monitors from Datadog via the MCP server, rebuild the dashboard, and push a new version to the same canvas URL so the link always shows the current state. Schedule it to run daily on its own.

The link stays stable while the content updates underneath it — see keeping a canvas updated automatically.

What to watch for

Datadog dashboard with Claude — FAQ

Do I need to paste my Datadog API or application key anywhere?
No. The remote Datadog MCP server authenticates over OAuth, so you authorize your organization through a consent screen instead of pasting a key. Connect with an account scoped to mcp_read for a reporting dashboard — never paste an application or API key into a config file, and never commit one to a repo.
Is the dashboard live or a snapshot?
A snapshot. It contains the numbers Claude pulled when it built the file; it does not re-query Datadog when someone opens the link. To refresh it, ask Claude to re-pull the data and push a new version to the same URL, or put that on a daily schedule so the stable link always shows the current state.
Can my team comment without a Datadog or Claude account?
Yes. The dashboard is published to a Drafty canvas link that renders in any browser. Reviewers click the exact element they want changed and leave a pinned comment with no login required. Only the person connecting Datadog needs access to the account.
Is it safe to give Claude access to my Datadog data?
Yes, provided you connect with the mcp_read role only; a reliability dashboard never needs more than that. Every tool call is mediated by the MCP server, and you approve actions in Claude. Don't grant the mcp_write role for a read-only reporting task.
How is this different from a native Datadog dashboard?
Native Datadog dashboards query live data against widgets and monitors you maintain — the right choice for always-on operational views and on-call. This approach is for a fast, shareable snapshot you can spin up in minutes and iterate by talking to Claude, then collect feedback on inline — a reliability review, a postmortem, or a status summary for people who don't live in Datadog. Different jobs: one is a standing system, the other is a quick reviewable deliverable.