# Wildcard TLS for Multi-Tenant SaaS with DNS-01 Challenges

> Provision wildcard TLS for multi-tenant SaaS using DNS-01, Caddy, and Cloudflare. Covers Docker, cert-manager on Kubernetes, and certificate security.

AI app builders are everywhere now. You enter a prompt, get a deployed product on `your-app.builder.com`, and ship. Replit, Bolt, Lovable, v0, and dozens of other similar platforms launched in the past few months, and they all need instant subdomain provisioning with HTTPS for every user. This pattern isn't new—multi-tenant SaaS has used `tenant-id.foo.com` subdomains forever—but the explosion of AI builders that spin up hundreds of new subdomains daily makes the certificate management problem more visible. You can't provision individual certificates for every generated app; you need wildcard certificates.

## The problem: per-tenant certificates don't scale

If you provision individual certificates for each tenant, you're running ACME challenges for every new tenant signup, managing certificate renewals for potentially tens of thousands of certificates, and hitting rate limits from Let's Encrypt.

<Warning>
  Let's Encrypt enforces a hard limit of **50 certificates per registered domain per week**. At scale, per-tenant certificate provisioning will hit this wall and block new signups entirely.
</Warning>

The full set of Let's Encrypt rate limits relevant here:

* 50 certificates per registered domain per week
* 5 failed validation attempts per account per hostname per hour
* 300 new orders per account per 3 hours

With a wildcard certificate, you provision one certificate regardless of tenant count, so you'll never hit the 50-per-week limit. This is a massive operational advantage over per-tenant certificates.
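
To make the scaling problem concrete, the sketch below works out how long per-tenant issuance would take under the 50-per-week limit listed above. It is back-of-the-envelope arithmetic only: it ignores renewals and any rate-limit adjustments, and `weeksToIssue` is a name made up for this example.

```go
package main

import "fmt"

// weeksToIssue returns how many weeks it takes to issue one certificate per
// tenant when at most perWeek certificates can be issued each week.
func weeksToIssue(tenants, perWeek int) int {
    if tenants <= 0 || perWeek <= 0 {
        return 0
    }
    return (tenants + perWeek - 1) / perWeek // ceiling division
}

func main() {
    const limit = 50 // certificates per registered domain per week
    for _, tenants := range []int{50, 500, 10000} {
        fmt.Printf("%d tenants -> %d week(s) of per-tenant issuance\n",
            tenants, weeksToIssue(tenants, limit))
    }
}
```

A single wildcard certificate counts once against that limit, no matter how many tenants it serves.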

## Wildcard certificates: one cert, infinite tenants

A wildcard certificate for `*.foo.com` covers all first-level subdomains. Any subdomain directly under your base domain gets automatic TLS coverage from a single certificate.

```
tenant-a.foo.com     ✓
tenant-b.foo.com     ✓
tenant-xyz.foo.com   ✓
```

The wildcard does not extend to the apex domain or to nested subdomains:

```
foo.com                      ✗  (apex domain)
api.tenant-a.foo.com         ✗  (nested subdomain)
```
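
The matching rule can be expressed in a few lines. This is a simplified sketch of the "exactly one label" rule, not the code any particular TLS library uses; `coveredByWildcard` is a hypothetical helper.

```go
package main

import (
    "fmt"
    "strings"
)

// coveredByWildcard reports whether host is covered by a wildcard name such
// as "*.foo.com". The wildcard stands for exactly one DNS label, so neither
// the apex nor nested subdomains match.
func coveredByWildcard(wildcard, host string) bool {
    if !strings.HasPrefix(wildcard, "*.") {
        return false
    }
    suffix := strings.ToLower(wildcard[1:]) // ".foo.com"
    host = strings.ToLower(strings.TrimSuffix(host, "."))
    if !strings.HasSuffix(host, suffix) {
        return false
    }
    label := strings.TrimSuffix(host, suffix)
    return label != "" && !strings.Contains(label, ".")
}

func main() {
    for _, host := range []string{"tenant-a.foo.com", "foo.com", "api.tenant-a.foo.com"} {
        fmt.Printf("%-22s covered=%v\n", host, coveredByWildcard("*.foo.com", host))
    }
}
```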

For most multi-tenant systems this is exactly what you want: one certificate, provisioned once, renewed automatically, working for every tenant you'll ever onboard.

## Why you must use DNS-01 challenges

To get a wildcard certificate from Let's Encrypt (or any ACME-compliant CA), you must use the **DNS-01** challenge type. The more common HTTP-01 challenge does not work for wildcards.

With HTTP-01, the CA verifies domain ownership by requesting a specific file at `http://your-domain/.well-known/acme-challenge/token`. For `*.foo.com` there's no single HTTP endpoint to verify—the wildcard represents infinite possible subdomains.

DNS-01 solves this by verifying ownership at the DNS level:

1. Your ACME client requests a wildcard certificate for `*.foo.com`.
2. Let's Encrypt generates a challenge token and instructs you to create a TXT record at `_acme-challenge.foo.com` whose value is derived from that token and your ACME account key.
3. Let's Encrypt queries public DNS for that TXT record.
4. If the record exists with the correct value, Let's Encrypt knows you control the domain and issues the certificate.
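
For reference, here is a sketch of how an ACME client derives the record name and value in step 2, following the DNS-01 rules in RFC 8555. Strictly, the published value is a digest of the token combined with the account key thumbprint rather than the raw token. The token and thumbprint below are placeholder strings, and the function names are invented for this example.

```go
package main

import (
    "crypto/sha256"
    "encoding/base64"
    "fmt"
    "strings"
)

// challengeRecordName returns the TXT record name for a DNS-01 challenge.
// The wildcard label is dropped: "*.foo.com" validates at
// "_acme-challenge.foo.com".
func challengeRecordName(domain string) string {
    return "_acme-challenge." + strings.TrimPrefix(domain, "*.")
}

// challengeRecordValue derives the TXT value from the challenge token and
// the account key thumbprint:
// base64url(SHA-256(token + "." + thumbprint)), without padding.
func challengeRecordValue(token, accountThumbprint string) string {
    keyAuth := token + "." + accountThumbprint
    sum := sha256.Sum256([]byte(keyAuth))
    return base64.RawURLEncoding.EncodeToString(sum[:])
}

func main() {
    // Placeholder inputs; a real client receives the token from the CA and
    // computes the thumbprint from its own account key.
    name := challengeRecordName("*.foo.com")
    value := challengeRecordValue("example-token", "example-thumbprint")
    fmt.Println(name, "TXT", value)
}
```

In practice Caddy does this for you; the sketch only shows why the TXT value you see in your zone does not look like the token in the ACME logs.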

This means your certificate provisioning system needs **programmatic access** to your DNS provider's API to create and delete TXT records on demand.

## How DNS-01 automation works

The key to wildcard certificates is automating the DNS-01 challenge. This requires your web server or load balancer to have API access to your DNS provider. When Let's Encrypt needs to verify domain ownership, your system creates a temporary TXT record, waits for DNS propagation, completes the challenge, and cleans up the record.

The examples below use Caddy as the reverse proxy and Cloudflare as the DNS provider, but the architecture is the same regardless of your stack: Nginx with cert-manager on Kubernetes and HAProxy with acme.sh follow the same approach. The pattern is always:

```
web server + DNS provider plugin + ACME client = automated wildcard certificates
```
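
The create, wait, validate, and clean-up sequence can be sketched as a single function. This is an illustration of the flow only, not Caddy's implementation: the `txtProvider` interface, `memProvider`, and the callback parameters are all hypothetical stand-ins.

```go
package main

import (
    "context"
    "errors"
    "fmt"
)

// txtProvider is a hypothetical minimal view of a DNS provider plugin.
type txtProvider interface {
    CreateTXT(ctx context.Context, name, value string) error
    DeleteTXT(ctx context.Context, name, value string) error
}

// solveDNS01 publishes the challenge record, waits until it is visible,
// asks the CA to validate, and always removes the record afterwards.
func solveDNS01(ctx context.Context, p txtProvider, name, value string,
    waitVisible func(ctx context.Context, name, value string) error,
    notifyCA func(ctx context.Context) error) (err error) {

    if err = p.CreateTXT(ctx, name, value); err != nil {
        return fmt.Errorf("create TXT: %w", err)
    }
    // Clean up even when propagation or validation fails.
    defer func() {
        if delErr := p.DeleteTXT(ctx, name, value); delErr != nil {
            err = errors.Join(err, fmt.Errorf("delete TXT: %w", delErr))
        }
    }()

    if err = waitVisible(ctx, name, value); err != nil {
        return fmt.Errorf("wait for propagation: %w", err)
    }
    if err = notifyCA(ctx); err != nil {
        return fmt.Errorf("CA validation: %w", err)
    }
    return nil
}

// memProvider is an in-memory stand-in used to exercise the flow.
type memProvider struct{ records map[string]string }

func (m *memProvider) CreateTXT(_ context.Context, name, value string) error {
    m.records[name] = value
    return nil
}

func (m *memProvider) DeleteTXT(_ context.Context, name, _ string) error {
    delete(m.records, name)
    return nil
}

// runExample drives the flow and reports how many records are left behind.
func runExample(caAccepts bool) (int, error) {
    p := &memProvider{records: map[string]string{}}
    err := solveDNS01(context.Background(), p, "_acme-challenge.foo.com", "example-value",
        func(context.Context, string, string) error { return nil }, // pretend DNS has propagated
        func(context.Context) error {
            if !caAccepts {
                return errors.New("validation rejected")
            }
            return nil
        })
    return len(p.records), err
}

func main() {
    left, err := runExample(true)
    fmt.Println("records left:", left, "error:", err)
}
```

The deferred delete is the design point worth copying: a failed challenge should not leave stale `_acme-challenge` records in the zone.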

### The architecture (Cloudflare example)

The system has three layers:

* **Caddy** — the web server that needs TLS certificates
* **`caddy-dns/cloudflare`** — a thin adapter (\~120 lines of Go) that sits between Caddy and the actual DNS API client
* **`libdns/cloudflare`** — handles the real work of talking to Cloudflare's API

Caddy handles the web server and ACME logic, `certmagic` handles certificate management and renewal, `libdns/cloudflare` handles DNS API calls, and the plugin just connects them together.

This same pattern exists for every major DNS provider:

<Tabs>
  <Tab title="Cloudflare">
    ```bash theme={null}
    xcaddy build --with github.com/caddy-dns/cloudflare
    ```
  </Tab>

  <Tab title="AWS Route53">
    ```bash theme={null}
    xcaddy build --with github.com/caddy-dns/route53
    ```
  </Tab>

  <Tab title="GCP Cloud DNS">
    ```bash theme={null}
    xcaddy build --with github.com/caddy-dns/googleclouddns
    ```
  </Tab>

  <Tab title="Azure DNS">
    ```bash theme={null}
    xcaddy build --with github.com/caddy-dns/azure
    ```
  </Tab>
</Tabs>

The code structure is nearly identical across all providers—you just swap the API client.

### Building Caddy with DNS provider support

Standard Caddy doesn't include DNS provider modules. You need to build a custom binary with the plugin compiled in.

```bash theme={null}
# Install xcaddy (Caddy's build tool)
go install github.com/caddyserver/xcaddy/cmd/xcaddy@latest

# Build Caddy with the Cloudflare DNS plugin
xcaddy build --with github.com/caddy-dns/cloudflare
```

This uses Caddy's module system to compile the plugin into a single binary. The result is a `caddy` executable that includes the DNS provider integration. You can include multiple providers if you manage domains across different DNS platforms.

### Configuring your Caddyfile

Once you've built Caddy with the DNS provider plugin, the configuration is minimal:

```caddyfile theme={null}
*.foo.com {
    tls {
        dns cloudflare {env.CF_API_TOKEN}
    }

    # Your reverse proxy config
    reverse_proxy localhost:8000
}
```

Three lines of TLS configuration give you automatic wildcard certificate provisioning, automatic renewal 30 days before expiration, DNS-01 challenges handled transparently, and zero ongoing maintenance.

### Getting DNS provider credentials

Your web server needs API credentials to manage DNS records. The required permissions are consistent across providers: read access to list zones/domains, and write access to create and delete TXT records.

<Tabs>
  <Tab title="Cloudflare">
    Create an API token at `https://dash.cloudflare.com/profile/api-tokens` with these permissions:

    * `Zone.Zone:Read`
    * `Zone.DNS:Edit`

    ```bash theme={null}
    export CF_API_TOKEN="your_token_here"
    ```
  </Tab>

  <Tab title="AWS Route53">
    Create an IAM user or role with these permissions:

    * `route53:ListHostedZones`
    * `route53:GetChange`
    * `route53:ChangeResourceRecordSets`
  </Tab>

  <Tab title="GCP Cloud DNS">
    Create a service account with the `dns.admin` role scoped to your DNS zone (not project-wide).
  </Tab>
</Tabs>

<Tip>
  Follow the principle of least privilege. Grant only the permissions needed for DNS challenge automation—nothing more. If your token leaks, the blast radius should be limited to DNS operations on specific zones.
</Tip>

The `{env.CF_API_TOKEN}` placeholder in the Caddyfile is replaced with the environment variable's value when Caddy starts.

## What happens under the hood

When you start Caddy with the configuration above, the complete certificate provisioning flow runs automatically.

<Steps>
  <Step title="Configuration parsing">
    Caddy reads your Caddyfile and encounters the `dns cloudflare` directive. The plugin's `UnmarshalCaddyfile()` function reads the token argument, which at this point is still the literal `{env.CF_API_TOKEN}` placeholder.
  </Step>

  <Step title="Token validation">
    Once the placeholder has been resolved, the plugin validates the token format using the regex `^[A-Za-z0-9_-]{35,50}$`. In the plugin code this check runs inside `Provision()`. It catches common mistakes, such as wrapping the token in quotes or leaving the environment variable unset, before they produce cryptic API errors.
  </Step>

  <Step title="Module provisioning">
    Caddy calls the plugin's `Provision()` function, which replaces environment variable placeholders with actual values and performs final validation.
  </Step>

  <Step title="Certificate check">
    Caddy checks its certificate cache (default `~/.local/share/caddy/certificates/acme-v02.api.letsencrypt.org-directory/`) to see if a valid certificate for `*.foo.com` already exists. If so, it loads it and the flow ends here.
  </Step>

  <Step title="ACME challenge request">
    If no valid certificate exists, Caddy's ACME client requests a certificate from Let's Encrypt. Let's Encrypt responds with a DNS-01 challenge: "Prove you control `foo.com` by creating a TXT record at `_acme-challenge.foo.com` with value `xyz123_random_token`."
  </Step>

  <Step title="DNS record creation">
    The plugin calls the Cloudflare API to create the challenge record. First, it queries for the zone ID:

    ```http theme={null}
    GET https://api.cloudflare.com/client/v4/zones?name=foo.com
    Authorization: Bearer your_token_here
    ```

    Then it creates the TXT record with the challenge token:

    ```http theme={null}
    POST https://api.cloudflare.com/client/v4/zones/{zone_id}/dns_records
    Authorization: Bearer your_token_here
    Content-Type: application/json

    {
      "type": "TXT",
      "name": "_acme-challenge.foo.com",
      "content": "xyz123_random_token",
      "ttl": 120
    }
    ```

    The short TTL (2 minutes) is intentional because these records are temporary. AWS Route53 uses `ChangeResourceRecordSets`, GCP Cloud DNS uses `changes.create`, and Azure uses its DNS REST API. Different endpoints, same result.
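
    As an illustration, the record-creation request shown above could be built with Go's standard library as follows. This is a sketch, not the `libdns/cloudflare` implementation; `txtRecord` and `newCreateTXTRequest` are names invented here, and the zone ID is a placeholder.

    ```go
    package main

    import (
        "bytes"
        "encoding/json"
        "fmt"
        "net/http"
    )

    // txtRecord mirrors the JSON body of the record-creation request.
    type txtRecord struct {
        Type    string `json:"type"`
        Name    string `json:"name"`
        Content string `json:"content"`
        TTL     int    `json:"ttl"`
    }

    // newCreateTXTRequest builds (but does not send) the request that creates
    // the challenge record in the given zone.
    func newCreateTXTRequest(zoneID, apiToken, name, content string) (*http.Request, error) {
        body, err := json.Marshal(txtRecord{Type: "TXT", Name: name, Content: content, TTL: 120})
        if err != nil {
            return nil, err
        }
        endpoint := fmt.Sprintf("https://api.cloudflare.com/client/v4/zones/%s/dns_records", zoneID)
        req, err := http.NewRequest(http.MethodPost, endpoint, bytes.NewReader(body))
        if err != nil {
            return nil, err
        }
        req.Header.Set("Authorization", "Bearer "+apiToken)
        req.Header.Set("Content-Type", "application/json")
        return req, nil
    }

    func main() {
        req, err := newCreateTXTRequest("ZONE_ID", "your_token_here",
            "_acme-challenge.foo.com", "xyz123_random_token")
        if err != nil {
            panic(err)
        }
        fmt.Println(req.Method, req.URL.String())
        // Sending it would be: resp, err := http.DefaultClient.Do(req)
    }
    ```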
  </Step>

  <Step title="DNS propagation wait">
    Caddy polls DNS until the TXT record is visible. By default, it starts from your system's DNS resolver. You can configure a custom resolver for faster propagation checks:

    ```caddyfile theme={null}
    *.foo.com {
        tls {
            dns cloudflare {env.CF_API_TOKEN}
            resolvers 1.1.1.1
        }
    }
    ```

    Using your DNS provider's public resolver (1.1.1.1 for Cloudflare, 8.8.8.8 for Google) is often faster because records propagate to the provider's own resolvers first. This step is critical—if propagation is incomplete when Let's Encrypt checks, the challenge fails.
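
    If you want to reproduce the propagation check outside Caddy while debugging, a polling loop against a specific resolver is enough. The sketch below is an assumption-laden illustration, not Caddy's own logic: `resolverAt`, `waitForTXT`, and `simulate` are hypothetical helpers, and `main` uses a fake lookup so it runs without network access.

    ```go
    package main

    import (
        "context"
        "fmt"
        "net"
        "time"
    )

    // resolverAt returns a resolver that sends queries to addr (for example
    // "1.1.1.1:53") instead of the system default.
    func resolverAt(addr string) *net.Resolver {
        return &net.Resolver{
            PreferGo: true,
            Dial: func(ctx context.Context, network, _ string) (net.Conn, error) {
                var d net.Dialer
                return d.DialContext(ctx, network, addr)
            },
        }
    }

    // waitForTXT polls lookup until name returns want or ctx expires.
    func waitForTXT(ctx context.Context, lookup func(context.Context, string) ([]string, error),
        name, want string, interval time.Duration) error {
        for {
            values, _ := lookup(ctx, name) // lookup errors are expected until the record appears
            for _, v := range values {
                if v == want {
                    return nil
                }
            }
            select {
            case <-ctx.Done():
                return fmt.Errorf("record %s not visible: %w", name, ctx.Err())
            case <-time.After(interval):
            }
        }
    }

    // simulate runs waitForTXT against a fake lookup that starts returning the
    // record on the appearAfter-th poll, and reports how many polls were made.
    func simulate(appearAfter int) (int, error) {
        polls := 0
        lookup := func(context.Context, string) ([]string, error) {
            polls++
            if polls < appearAfter {
                return nil, nil
            }
            return []string{"expected-value"}, nil
        }
        ctx, cancel := context.WithTimeout(context.Background(), time.Second)
        defer cancel()
        err := waitForTXT(ctx, lookup, "_acme-challenge.foo.com", "expected-value", time.Millisecond)
        return polls, err
    }

    func main() {
        polls, err := simulate(3)
        fmt.Println("polls:", polls, "error:", err)
        // Against real DNS, pass resolverAt("1.1.1.1:53").LookupTXT as the lookup.
    }
    ```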
  </Step>

  <Step title="Challenge completion">
    Caddy tells Let's Encrypt "The TXT record is ready, check it." Let's Encrypt queries multiple DNS servers worldwide to verify the record exists. Once verified, it issues the wildcard certificate.
  </Step>

  <Step title="Cleanup">
    The plugin automatically deletes the temporary TXT record to keep your DNS zone clean:

    ```http theme={null}
    DELETE https://api.cloudflare.com/client/v4/zones/{zone_id}/dns_records/{record_id}
    Authorization: Bearer your_token_here
    ```
  </Step>

  <Step title="Certificate storage and renewal">
    Caddy stores the certificate chain and private key in its certificate cache. Certificates are automatically renewed 30 days before expiration—the entire DNS-01 flow repeats with zero human intervention.
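
    The renewal decision itself is a simple date comparison. The sketch below assumes the fixed 30-day window described here; Caddy's own scheduling may differ in detail, and `needsRenewal` and `renewsWithDaysLeft` are hypothetical names.

    ```go
    package main

    import (
        "fmt"
        "time"
    )

    // renewalWindow is the assumed lead time before expiry.
    const renewalWindow = 30 * 24 * time.Hour

    // needsRenewal reports whether a certificate expiring at notAfter is
    // inside the renewal window at time now. For a parsed certificate,
    // notAfter would come from x509.Certificate.NotAfter.
    func needsRenewal(notAfter, now time.Time, window time.Duration) bool {
        return !now.Before(notAfter.Add(-window))
    }

    // renewsWithDaysLeft is a convenience wrapper for experimenting with the rule.
    func renewsWithDaysLeft(days int) bool {
        now := time.Now()
        notAfter := now.Add(time.Duration(days) * 24 * time.Hour)
        return needsRenewal(notAfter, now, renewalWindow)
    }

    func main() {
        for _, days := range []int{60, 31, 30, 5} {
            fmt.Printf("%2d days left -> renew=%v\n", days, renewsWithDaysLeft(days))
        }
    }
    ```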
  </Step>
</Steps>

## The code: how the plugin works

The entire `caddy-dns/cloudflare` plugin is \~120 lines of Go. Here are the key parts.

### Module registration

```go theme={null}
type Provider struct{ *cloudflare.Provider }

func init() {
    caddy.RegisterModule(Provider{})
}

func (Provider) CaddyModule() caddy.ModuleInfo {
    return caddy.ModuleInfo{
        ID:  "dns.providers.cloudflare",
        New: func() caddy.Module { return &Provider{new(cloudflare.Provider)} },
    }
}
```

The plugin wraps `github.com/libdns/cloudflare` and registers itself as a Caddy module with the ID `dns.providers.cloudflare`. When you write `dns cloudflare` in your Caddyfile, Caddy loads this module.

### Caddyfile parsing

The parsing logic handles both inline and block configuration syntaxes:

```go theme={null}
func (p *Provider) UnmarshalCaddyfile(d *caddyfile.Dispenser) error {
    d.Next() // consume directive name

    if d.NextArg() {
        // Single token syntax: cloudflare {env.CF_API_TOKEN}
        p.Provider.APIToken = d.Val()
    } else {
        // Block syntax: cloudflare { api_token ... }
        for nesting := d.Nesting(); d.NextBlock(nesting); {
            switch d.Val() {
            case "api_token":
                if d.NextArg() {
                    p.Provider.APIToken = d.Val()
                }
            case "zone_token":
                if d.NextArg() {
                    p.Provider.ZoneToken = d.Val()
                }
            }
        }
    }

    if p.Provider.APIToken == "" {
        return d.Err("missing API token")
    }
    return nil
}
```

Both syntaxes are valid:

```caddyfile theme={null}
# Inline syntax (recommended)
dns cloudflare {env.CF_API_TOKEN}

# Block syntax (also accepts zone_token for dual-token setups)
dns cloudflare {
    api_token {env.CF_API_TOKEN}
}
```

### Token validation

```go theme={null}
var cloudflareTokenRegexp = regexp.MustCompile(`^[A-Za-z0-9_-]{35,50}$`)

func (p *Provider) Provision(ctx caddy.Context) error {
    // Replace placeholders like {env.CF_API_TOKEN} with actual values
    p.Provider.APIToken = caddy.NewReplacer().ReplaceAll(p.Provider.APIToken, "")

    if !cloudflareTokenRegexp.MatchString(p.Provider.APIToken) {
        return fmt.Errorf("API token '%s' appears invalid", p.Provider.APIToken)
    }
    return nil
}
```

The regex assumes Cloudflare API tokens are 35–50 characters of alphanumerics, dashes, or underscores. If you accidentally wrap the token in quotes or the environment variable is unset, this catches the problem immediately with a clear error, rather than a cryptic "Invalid request headers" from the Cloudflare API.
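
To see which inputs the check rejects, you can run the same regex on its own. The sample token below is made up for illustration, and `looksLikeToken` is a hypothetical helper that wraps the pattern from the plugin.

```go
package main

import (
    "fmt"
    "regexp"
    "strings"
)

// tokenPattern is the same expression the plugin uses.
var tokenPattern = regexp.MustCompile(`^[A-Za-z0-9_-]{35,50}$`)

// looksLikeToken reports whether s has the expected token shape.
func looksLikeToken(s string) bool { return tokenPattern.MatchString(s) }

func main() {
    wellFormed := strings.Repeat("a1B2", 10) // 40 characters, not a real token
    candidates := []string{
        wellFormed,
        `"` + wellFormed + `"`, // wrapped in quotes
        "",                     // environment variable unset
        "{env.CF_API_TOKEN}",   // unresolved placeholder
    }
    for _, c := range candidates {
        fmt.Printf("%-44q valid=%v\n", c, looksLikeToken(c))
    }
}
```

Only the first candidate passes; the other three are the failure modes described above.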

### The actual DNS operations

The plugin doesn't implement DNS operations directly. It delegates to `libdns/cloudflare`, which implements the `libdns` interfaces for adding and removing records:

```go theme={null}
type RecordAppender interface {
    AppendRecords(ctx context.Context, zone string, recs []Record) ([]Record, error)
}

type RecordDeleter interface {
    DeleteRecords(ctx context.Context, zone string, recs []Record) ([]Record, error)
}
```

Caddy's ACME client calls these methods at the appropriate times during the DNS-01 challenge. The plugin is just the adapter that makes Caddy aware of the Cloudflare DNS provider.

## Debugging and common issues

<AccordionGroup>
  <Accordion title="&#x22;Invalid request headers&#x22;">
    Your API token is malformed or the environment variable isn't set. Verify it:

    ```bash theme={null}
    echo $CF_API_TOKEN
    ```

    If the output is empty, that's the problem. When the environment variable isn't set, the `{env.CF_API_TOKEN}` placeholder has no real token to resolve to, so Caddy has nothing valid to send to Cloudflare and authentication fails.
  </Accordion>

  <Accordion title="&#x22;timed out waiting for record to fully propagate&#x22;">
    The DNS propagation check is timing out. There are three common causes:

    * **DNS caching** — your local resolver is caching the old "record doesn't exist" response. Add `resolvers 1.1.1.1` to your TLS block.
    * **Private DNS** — `foo.com` is defined in `/etc/hosts` or resolved by a private DNS server, causing public verification to fail. Use a public resolver or temporarily remove the private DNS entry.
    * **Zone access** — the token doesn't have `Zone.Zone:Read` permission. Verify permissions in your DNS provider dashboard.
  </Accordion>

  <Accordion title="&#x22;expected 1 zone, got 0&#x22;">
    The plugin can't find the zone for your domain. This happens if:

    * The domain isn't in Cloudflare DNS
    * The API token doesn't have `Zone.Zone:Read` permission
    * The zone name doesn't match (e.g., you're requesting `*.sub.foo.com` but only `foo.com` is registered in Cloudflare)
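
    When diagnosing a zone mismatch, it helps to list the zone names that could hold the challenge record and check which of them actually exists in your Cloudflare account. The sketch below is a debugging aid under simple assumptions, not how the plugin resolves zones; `zoneCandidates` is a hypothetical helper, and it does not account for multi-label public suffixes such as `co.uk`.

    ```go
    package main

    import (
        "fmt"
        "strings"
    )

    // zoneCandidates lists the zones that could hold the challenge record for
    // a certificate name, most specific first.
    func zoneCandidates(certName string) []string {
        name := strings.TrimSuffix(strings.TrimPrefix(certName, "*."), ".")
        labels := strings.Split(name, ".")
        var zones []string
        for i := 0; i < len(labels)-1; i++ { // stop before the bare TLD
            zones = append(zones, strings.Join(labels[i:], "."))
        }
        return zones
    }

    func main() {
        for _, name := range []string{"*.foo.com", "*.sub.foo.com"} {
            fmt.Println(name, "->", zoneCandidates(name))
        }
    }
    ```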
  </Accordion>

  <Accordion title="Certificate Transparency logs">
    All certificates issued by public CAs are logged to Certificate Transparency logs. You can inspect your wildcard cert at [https://crt.sh](https://crt.sh)—search for `%.foo.com` to find wildcard certificates.

    This is a feature, not a bug. It proves certificates were issued legitimately and helps detect mis-issuance. It does mean anyone can see that `foo.com` has a wildcard certificate, but they cannot enumerate individual tenant subdomains from that.
  </Accordion>
</AccordionGroup>

## Production deployment patterns

### Docker Compose

```yaml theme={null}
services:
  caddy:
    build:
      context: .
      dockerfile: Dockerfile.caddy
    ports:
      - "443:443"
      - "80:80"
    environment:
      - CF_API_TOKEN=${CF_API_TOKEN}
    volumes:
      - ./Caddyfile:/etc/caddy/Caddyfile
      - caddy_data:/data
      - caddy_config:/config
    restart: unless-stopped

volumes:
  caddy_data:
  caddy_config:
```

The `caddy_data` volume persists certificates across container restarts. The `caddy_config` volume persists Caddy's runtime configuration.

### Dockerfile with Cloudflare plugin

```dockerfile theme={null}
FROM caddy:builder AS builder

RUN xcaddy build \
    --with github.com/caddy-dns/cloudflare

FROM caddy:latest

COPY --from=builder /usr/bin/caddy /usr/bin/caddy
```

This multi-stage build compiles Caddy with the Cloudflare plugin in the builder stage, then copies just the binary to the final image.

### Kubernetes with cert-manager

If you're running Kubernetes, consider using cert-manager instead of running ACME clients on your web servers. Cert-manager is purpose-built for Kubernetes certificate lifecycle management and supports DNS-01 challenges with all major cloud providers.

```yaml theme={null}
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: wildcard-foo-com
spec:
  secretName: wildcard-tls
  issuerRef:
    name: letsencrypt-prod
    kind: ClusterIssuer
  dnsNames:
  - "*.foo.com"
  - "foo.com"
---
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: admin@foo.com
    privateKeySecretRef:
      name: letsencrypt-prod
    solvers:
    - dns01:
        cloudflare:
          email: admin@foo.com
          apiTokenSecretRef:
            name: cloudflare-api-token
            key: api-token
```

Cert-manager provisions the certificate as a Kubernetes Secret, which your Ingress controller (nginx, Traefik, Envoy, etc.) can reference. Swap `cloudflare` for `route53`, `cloudDNS`, or `azureDNS` with the appropriate credential references.

### Multi-region deployments

File-based certificate storage works for single-server deployments, but multi-region requires shared storage. You have three options:

* Mount the certificate directory from a network filesystem (NFS, EFS, or cloud-provider equivalents)
* Use Caddy storage plugins for S3, Consul, Redis, or other distributed stores
* Run certificate provisioning centrally and distribute via your secrets management system

The simplest approach for most systems: run certificate provisioning in one region, store certificates in a secrets manager (Vault, AWS Secrets Manager, GCP Secret Manager, Azure Key Vault), and distribute to all regions. This keeps the ACME logic centralized while making certificates available everywhere.
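
If your edge servers are written in Go, one way to consume a centrally distributed certificate is to hold it behind `tls.Config.GetCertificate` and swap it when a renewed copy arrives. This is a minimal sketch under that assumption; `certHolder` is a hypothetical type, and fetching the PEM data from your secrets manager is left out.

```go
package main

import (
    "crypto/tls"
    "errors"
    "fmt"
    "sync/atomic"
)

// certHolder serves whichever wildcard certificate was most recently loaded,
// so a renewed certificate can be swapped in without restarting the server.
type certHolder struct {
    current atomic.Pointer[tls.Certificate]
}

// Update parses PEM data (as fetched from a secrets manager) and swaps it in.
func (h *certHolder) Update(certPEM, keyPEM []byte) error {
    cert, err := tls.X509KeyPair(certPEM, keyPEM)
    if err != nil {
        return fmt.Errorf("parse certificate: %w", err)
    }
    h.current.Store(&cert)
    return nil
}

// GetCertificate plugs into tls.Config.GetCertificate.
func (h *certHolder) GetCertificate(*tls.ClientHelloInfo) (*tls.Certificate, error) {
    if cert := h.current.Load(); cert != nil {
        return cert, nil
    }
    return nil, errors.New("no certificate loaded yet")
}

func main() {
    holder := &certHolder{}
    cfg := &tls.Config{GetCertificate: holder.GetCertificate}
    _ = cfg // pass to http.Server{TLSConfig: cfg} in a real service

    // Invalid PEM is rejected and the previous certificate stays in place.
    err := holder.Update([]byte("not a cert"), []byte("not a key"))
    fmt.Println("update rejected:", err != nil)
}
```

Because a failed `Update` never replaces the stored certificate, a bad push from the distribution system degrades to serving the old certificate rather than taking TLS down.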

## Security considerations

The wildcard certificate's private key protects all your tenant subdomains. If it leaks, an attacker can impersonate any tenant. Protect it like you'd protect your database credentials.

### Public Suffix List registration

If you're running a multi-tenant platform where each tenant gets a subdomain, you should submit your domain to the [Public Suffix List](https://publicsuffix.org/). The PSL is a registry that browsers use to determine security boundaries between sites.

Without PSL registration, browsers treat `tenant-a.foo.com` and `tenant-b.foo.com` as the same site. This means one tenant could potentially set cookies readable by another tenant—a serious security and privacy issue.

When you add `foo.com` to the PSL, browsers treat each tenant subdomain as an independent site. Cookies set by `tenant-a.foo.com` cannot be read by `tenant-b.foo.com`. Major platforms including GitHub (`github.io`), Vercel (`vercel.app`), and Netlify (`netlify.app`) are all registered on the PSL. If you're building tenant infrastructure, you should be too.

Submit via the [PSL GitHub repository](https://github.com/publicsuffix/list) with documentation proving you control the domain and explaining your multi-tenant use case.
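
To see why the listing changes the security boundary, the sketch below computes the "site" of two tenant hosts with and without `foo.com` in the suffix list. It uses a toy suffix set in place of the real Public Suffix List and ignores the list's wildcard and exception rules; `registrableDomain` and `sameSite` are hypothetical helpers.

```go
package main

import (
    "fmt"
    "strings"
)

// registrableDomain returns the public suffix plus one label, using a toy
// suffix set in place of the real Public Suffix List.
func registrableDomain(host string, suffixes map[string]bool) string {
    labels := strings.Split(strings.ToLower(host), ".")
    // Scan from the most specific candidate so the longest suffix wins.
    for i := 0; i < len(labels); i++ {
        if suffixes[strings.Join(labels[i:], ".")] {
            if i == 0 {
                return "" // the host is itself a public suffix
            }
            return strings.Join(labels[i-1:], ".")
        }
    }
    return host
}

// sameSite reports whether two hosts share a registrable domain.
func sameSite(a, b string, suffixes map[string]bool) bool {
    ra, rb := registrableDomain(a, suffixes), registrableDomain(b, suffixes)
    return ra != "" && ra == rb
}

func main() {
    withoutListing := map[string]bool{"com": true}
    withListing := map[string]bool{"com": true, "foo.com": true}

    fmt.Println("unlisted:", sameSite("tenant-a.foo.com", "tenant-b.foo.com", withoutListing))
    fmt.Println("listed:  ", sameSite("tenant-a.foo.com", "tenant-b.foo.com", withListing))
}
```

Without the listing both tenants resolve to the site `foo.com`; with it, each tenant subdomain becomes its own site.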

### Token scope limiting

<Warning>
  Do not use global credentials. Scope your DNS provider tokens to the minimum required permissions on specific zones only.
</Warning>

* **Cloudflare** — scope tokens to specific zones with only `Zone.Zone:Read` and `Zone.DNS:Edit`
* **AWS Route53** — use IAM policies that grant access only to specific hosted zones, not all DNS resources in your account
* **GCP Cloud DNS** — create service accounts with the `dns.admin` role scoped to individual zones, not project-wide access

If your token leaks, the blast radius should be limited to DNS operations on specific zones—not your entire cloud account.

### Certificate revocation tradeoffs

If you need to revoke a wildcard certificate, you can't selectively revoke it for one tenant—revocation affects all tenants. This is a fundamental tradeoff of wildcard certificates.

If you need per-tenant revocation capability, you need per-tenant certificates. For most systems, the operational simplicity of wildcards outweighs this limitation.

## When not to use wildcard certificates

Wildcards are the wrong choice in these situations:

* **Tenants bring their own domains** — if tenants use `tenant-a.com` instead of `tenant-a.foo.com`, you need per-tenant certificates with ACME HTTP-01 challenges
* **Deep subdomain nesting** — `*.foo.com` doesn't cover `api.tenant-a.foo.com`; if your architecture requires nested subdomains, you need multiple wildcard certificates or per-tenant certificates
* **Regulatory compliance requiring certificate isolation** — some compliance frameworks require cryptographic isolation between tenants; if your wildcard private key is compromised, all tenants are affected
* **Per-tenant certificate revocation** — if you need to revoke access for individual tenants by revoking their certificate, wildcards won't work

## Summary

For multi-tenant systems with `tenant-id.foo.com` subdomains, wildcard certificates are the right choice. The implementation pattern is the same regardless of your infrastructure: pick a web server (Caddy, Nginx, HAProxy), integrate with your DNS provider's API (Cloudflare, Route53, Cloud DNS, Azure DNS), and let ACME automation handle the rest.

The alternative—per-tenant certificates—is operationally complex, technically fragile, and doesn't scale past a few hundred tenants. Wildcard certificates are the pragmatic choice, and modern tooling makes them trivial to implement across any cloud platform.
