
Agents don't need to see websites with markup and styling. Anything beyond plain Markdown is money wasted on context tokens. By inspecting the Accept header on incoming requests, you can serve a lean Markdown version of your pages to LLMs while humans continue to see normal HTML. This technique was inspired by a post from the Bun team on X and is now live on skeptrune.com. You can verify it right now:
curl -H "Accept: text/markdown" https://www.skeptrune.com
curl -H "Accept: text/plain" https://www.skeptrune.com
The motivation is both economic and strategic. The Bun team reported a 10x token drop for Markdown vs HTML. Frontier labs charge per token, so cheaper pages get scraped more often, are more likely to end up in training data, and earn a little extra lift from AI assistants and search.

Static site generators are already halfway there

Static site generators like Astro and Gatsby already generate a big folder of HTML files, typically in a dist or public folder through npm run build. The only missing piece is converting those HTML files to Markdown. There’s a great CLI tool for this: html-to-markdown. Install it as a dev dependency:
npm install -D @wcj/html-to-markdown-cli
Here’s a Bash script that converts all HTML files in dist/html to Markdown files in dist/markdown, preserving the directory structure:
#!/usr/bin/env bash
# convert-to-markdown.sh
mkdir -p dist/markdown

find dist/html -type f -name "*.html" | while read -r file; do
    relative_path="${file#dist/html/}"
    dest_path="dist/markdown/${relative_path%.html}.md"
    mkdir -p "$(dirname "$dest_path")"
    npx @wcj/html-to-markdown-cli "$file" --stdout > "$dest_path"
done
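If the parameter expansions in the script look opaque, they can be sanity-checked in isolation. A small sketch with a hypothetical file path, not part of the build itself:

```shell
# Mirror the two expansions from convert-to-markdown.sh:
# strip the dist/html/ prefix, then swap the .html suffix for .md.
file="dist/html/posts/hello/index.html"
relative_path="${file#dist/html/}"          # posts/hello/index.html
dest_path="dist/markdown/${relative_path%.html}.md"
echo "$dest_path"                           # dist/markdown/posts/hello/index.md
```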
Wire this into your package.json as a post-build action:
"scripts": {
    "build": "astro build && yarn mv-html && yarn convert-to-markdown",
    "mv-html": "mkdir -p dist/html && find dist -type f -name '*.html' -not -path 'dist/html/*' -exec sh -c 'for f; do dest=\"dist/html/${f#dist/}\"; mkdir -p \"$(dirname \"$dest\")\"; mv -f \"$f\" \"$dest\"; done' sh {} +",
    "convert-to-markdown": "bash convert-to-markdown.sh"
}
Moving all HTML files to dist/html first is only necessary if you’re using Cloudflare Workers, which will serve existing static assets before falling back to your Worker. If you’re using a traditional reverse proxy, skip that step and convert directly from dist to dist/markdown.
After finishing this setup, there's a simpler Cloudflare-specific alternative: add "run_worker_first": ["*"] to the assets block of your wrangler.jsonc. This forces the Worker to run before static-asset matching, so you don't have to move files around at all.
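For reference, a sketch of what that alternative looks like, assuming current Wrangler versions accept `run_worker_first` inside the assets block alongside the same worker.js and ASSETS names used in the configuration below:

```jsonc
{
  "main": "worker.js",
  "assets": {
    "directory": "./dist",
    "binding": "ASSETS",
    // Run the Worker before Cloudflare's static-asset matching.
    "run_worker_first": ["*"]
  }
}
```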

Cloudflare Workers configuration

If you’re hosting on Cloudflare Workers, configuring this requires more steps than a traditional reverse proxy. If you’re using Nginx or Caddy, skip this section — you’ll have an easier time. Cloudflare Workers force you into a different paradigm. What would normally be a simple Nginx rewrite rule becomes custom wrangler.jsonc configuration, shadow directories, and JavaScript that manually checks headers and uses env.ASSETS.fetch to serve files. This is also what makes Next.js middleware click: it’s not middleware in the REST API sense. It’s more like “use this where you would normally have a real reverse proxy.” Both Cloudflare Workers and Next.js Middleware are JavaScript-based reverse proxies that intercept requests before they hit your application.

wrangler.jsonc

Reference a new worker script and bind your build output directory as a static asset namespace:
{
  "main": "worker.js",
  "assets": {
    "directory": "./dist",
    "binding": "ASSETS"
  }
}

Worker script

Below is a minimal worker that inspects the Accept header and serves Markdown when requested, otherwise falling back to HTML:
export default {
  async fetch(request, env) {
    const url = new URL(request.url);
    const acceptHeader = request.headers.get("accept") || "";
    const acceptTypes = acceptHeader.split(",");

    // Markdown wins when text/plain or text/markdown is listed before text/html.
    const plainIndex = acceptTypes.findIndex(
      (t) => t.includes("text/plain") || t.includes("text/markdown")
    );
    const htmlIndex = acceptTypes.findIndex((t) => t.includes("text/html"));
    const prefersMarkdown =
      plainIndex !== -1 && (htmlIndex === -1 || plainIndex < htmlIndex);

    const tryServeContent = async (format) => {
      // Agents requesting Markdown at the root get the sitemap so they can
      // discover every page on the site.
      if (format === "markdown" && url.pathname === "/") {
        const sitemapResponse = await env.ASSETS.fetch(
          new Request(new URL("/sitemap-0.xml", request.url))
        );
        if (sitemapResponse.ok) {
          return new Response(await sitemapResponse.text(), {
            headers: {
              "Content-Type": "application/xml; charset=utf-8",
              "Cache-Control": "public, max-age=3600",
            },
          });
        }
      }

      const dir = format === "markdown" ? "/markdown" : "/html";
      const ext = format === "markdown" ? ".md" : ".html";
      const contentType =
        format === "markdown"
          ? "text/plain; charset=utf-8"
          : "text/html; charset=utf-8";

      // Map /foo and /foo/ to /<dir>/foo/index<ext>; pass explicit
      // .md/.html paths through unchanged.
      let distPath = `${dir}${url.pathname}`;
      if (distPath.endsWith("/")) {
        distPath += `index${ext}`;
      } else if (!distPath.endsWith(ext)) {
        distPath += `/index${ext}`;
      }

      try {
        const response = await env.ASSETS.fetch(
          new Request(new URL(distPath, request.url))
        );
        if (response.ok) {
          return new Response(await response.text(), {
            headers: {
              "Content-Type": contentType,
              "Cache-Control": "public, max-age=3600",
            },
          });
        }
      } catch (error) {
        console.error(`Error fetching ${distPath}:`, error);
      }

      return null;
    };

    // Serve the preferred format first, falling back to the other.
    const formats = prefersMarkdown ? ["markdown", "html"] : ["html", "markdown"];
    for (const format of formats) {
      const response = await tryServeContent(format);
      if (response) return response;
    }

    return env.ASSETS.fetch(
      new Request(new URL("/html/404.html", request.url))
    );
  },
};
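The content-negotiation check can be exercised in isolation. Here is the same logic extracted into a standalone function for illustration; the function name is mine, and this is not part of the worker itself:

```javascript
// Mirrors the worker's check: Markdown wins when text/plain or
// text/markdown appears in the Accept header before text/html.
function prefersMarkdown(acceptHeader) {
  const acceptTypes = (acceptHeader || "").split(",");
  const plainIndex = acceptTypes.findIndex(
    (t) => t.includes("text/plain") || t.includes("text/markdown")
  );
  const htmlIndex = acceptTypes.findIndex((t) => t.includes("text/html"));
  return plainIndex !== -1 && (htmlIndex === -1 || plainIndex < htmlIndex);
}

console.log(prefersMarkdown("text/markdown"));                   // true
console.log(prefersMarkdown("text/html,application/xhtml+xml")); // false
console.log(prefersMarkdown("text/plain,text/html"));            // true: plain listed first
```

Note that this is ordering-based, not a full q-value parser; curl's default `Accept: */*` falls through to HTML, which is the behavior you want for humans and generic clients.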
Make the root path / serve your sitemap (sitemap-0.xml in Astro's default output) instead of Markdown content for your homepage. That way, an agent visiting your root URL can see all the links on your site and discover content efficiently.

Caddy configuration

If you’re using a traditional reverse proxy, Caddy makes this significantly simpler. Here’s a complete Caddyfile configuration:
your-personal-domain.com {
    root * /path/to/your/dist

    @markdown {
        header Accept *text/markdown* *text/plain*
        not header Accept *text/html*
    }
    handle @markdown {
        rewrite * /markdown{path}
        try_files {path} {path}/index.md /markdown/index.md
        file_server
    }

    handle {
        rewrite * /html{path}
        try_files {path} {path}/index.html /html/index.html
        file_server
    }

    handle_errors {
        rewrite * /html/404.html
        file_server
    }
}
Nginx configuration is left as an exercise for the reader — or the reader’s LLM of choice.

Conclusion: a more accessible web for agents

By serving lean, semantic Markdown to LLM agents, you can cut token usage by roughly 10x (per the Bun team's numbers) while making your content more accessible and efficient for the AI systems that increasingly browse the web. This optimization isn't just about saving money on tokens. It's about GEO (Generative Engine Optimization) for a world where millions of users discover content through AI assistants. Cheaper pages get scraped more often, are more likely to end up in training data, and earn more visibility from assistants and search. Astro's flexibility made this implementation surprisingly straightforward: only a couple of hours to get a personal blog and a production app to support the feature. For a fun exercise, copy the URL of a blog post and ask your favorite LLM to "Use the blog post to write a Cloudflare Worker for my own site." See how it does. Source code for a working implementation is at github.com/skeptrunedev/personal-site.