
Premium Content Playbook (2025): Keep SEO, Block AI Training & Scrapers From Your Members Area

Updated: October 19, 2025
When your best work lives behind a paywall, the last thing you want is AI models and “reader” apps quietly training on it. In this guide, we’ll let Google/Bing/Apple crawl your public content while blocking AI training and scraper bots from your premium paths (e.g., /premium/, /members/, /courses/). We’ll do it in layers: robots.txt ➝ WAF rules ➝ server rules ➝ WordPress gate ➝ quick tests.


TL;DR (Copy & Go)

Selective opt-out robots.txt (only blocks premium)

# --- Keep SEO bots working sitewide ---
User-agent: Googlebot
Disallow:
User-agent: Bingbot
Disallow:
User-agent: Applebot
Disallow:

# --- Opt-out of AI training for premium path only ---
User-agent: Google-Extended
Disallow: /premium/
User-agent: Applebot-Extended
Disallow: /premium/

# --- Block common AI harvesters in premium path ---
User-agent: GPTBot
Disallow: /premium/
User-agent: CCBot
Disallow: /premium/
User-agent: PerplexityBot
Disallow: /premium/

# --- Housekeeping ---
User-agent: *
Allow: /
Sitemap: https://yourdomain.com/sitemap_index.xml

Why this setup?

  • Public content = SEO friendly. Googlebot/Bingbot/Applebot can crawl everything they need.
  • Premium content = opt-out from AI training and blocked for common “AI reader” crawlers.
  • Defense in depth. robots.txt is polite control; some bots ignore it, so we add WAF and server rules plus a login gate.

Step-by-Step Guide

1) Decide your structure (2 minutes)

  • Use one folder for paid stuff, e.g., /premium/.
  • Keep a public preview on /blog/... or /premium/slug/ with an excerpt.
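For example, a minimal layout (paths are illustrative):

/premium/course-a/lesson-1/    (gated: full lesson, login required)
/blog/course-a-overview/       (public: excerpt plus a link to join)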

2) Add the selective robots.txt (3 minutes)

WordPress:

  • Yoast SEO → Tools → File editor (or Rank Math → General Settings → Edit robots.txt)
  • Paste the TL;DR block above → Save
  • Test in browser: https://yourdomain.com/robots.txt

Non-WP:

  • Upload robots.txt to webroot (public_html/), then open it in the browser.

Note: robots.txt is public, so don’t list individual private URLs in it. Disallowing the folder is enough.

3) Cloudflare WAF rule (edge-level blocking)

Target: requests to /premium/ with “AI” UAs → Block (or Managed Challenge if you want a softer gate).

Expression (example):

(http.request.uri.path starts_with "/premium/")
and (
  http.user_agent contains "GPTBot" or
  http.user_agent contains "CCBot" or
  http.user_agent contains "PerplexityBot" or
  http.user_agent contains "Perplexity-User" or
  http.user_agent contains "Claude-Web" or
  http.user_agent contains "AI2Bot" or
  http.user_agent contains "DataForSeoBot"
)

Tip: Create a Skip/Allow rule for (Googlebot|Bingbot|Applebot) and place it above the block rule so legit crawlers never get challenged; a sketch follows.
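A minimal skip-rule sketch, assuming your plan exposes Cloudflare’s built-in verified-bot signal (cf.client.bot), which is harder to spoof than a raw UA match:

(http.request.uri.path starts_with "/premium/") and (cf.client.bot)

Set the action to Skip (remaining custom rules) and order it above the block rule.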

4) Server rules (Nginx/Apache) — optional but strong

Nginx (inside your server block). Note that try_files can’t live inside an if block in Nginx, so the login check uses a flag variable instead:

location ^~ /premium/ {
  # Kill known AI/scraper UAs outright
  if ($http_user_agent ~* "(GPTBot|CCBot|PerplexityBot|Perplexity-User|Claude-Web|AI2Bot|DataForSeoBot)") { return 403; }

  # Humans need a WP login; major SEO crawlers may fetch previews
  set $needs_login 1;
  if ($http_user_agent ~* "(Googlebot|Bingbot|Applebot)") { set $needs_login 0; }
  if ($http_cookie ~* "wordpress_logged_in_") { set $needs_login 0; }
  if ($needs_login) {
    return 302 https://yourdomain.com/login/?redirect_to=$scheme://$host$request_uri;
  }

  add_header X-Robots-Tag "noarchive, nosnippet" always;
  try_files $uri $uri/ /index.php?$args;
}
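After editing, validate the config and reload (commands assume a systemd host; adjust for your distro):

sudo nginx -t && sudo systemctl reload nginx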

Apache (.htaccess):

RewriteEngine On

# Block AIs on premium
RewriteCond %{REQUEST_URI} ^/premium/ [NC]
RewriteCond %{HTTP_USER_AGENT} (GPTBot|CCBot|PerplexityBot|Perplexity-User|Claude-Web|AI2Bot|DataForSeoBot) [NC]
RewriteRule ^ - [F,L]

# Gate humans (must be logged in)
RewriteCond %{REQUEST_URI} ^/premium/ [NC]
RewriteCond %{HTTP:Cookie} !wordpress_logged_in_ [NC]
RewriteRule ^ https://yourdomain.com/login/?redirect_to=https://%{HTTP_HOST}%{REQUEST_URI} [R=302,L]

# Optional header to avoid snippet/archive leakage
# (FilesMatch can't match a path, so flag the URI with SetEnvIf instead)
<IfModule mod_setenvif.c>
  SetEnvIf Request_URI "^/premium/" PREMIUM_PATH
</IfModule>
<IfModule mod_headers.c>
  Header set X-Robots-Tag "noarchive, nosnippet" env=PREMIUM_PATH
</IfModule>
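.htaccess changes take effect on the next request, so no reload is needed. If you move these rules into the main server config instead, test and reload first (commands assume a Debian/Ubuntu-style host):

sudo apachectl configtest && sudo systemctl reload apache2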

5) WordPress-only fallback (if you can’t edit server)

Drop this into a small mu-plugin (e.g., wp-content/mu-plugins/premium-gate.php) or your theme’s functions.php:

add_action('template_redirect', function () {
  $uri = $_SERVER['REQUEST_URI'] ?? '';
  if (strpos($uri, '/premium/') === 0) {
    // Let major SEO bots view previews (a UA check is spoofable,
    // so treat this layer as convenience, not security)
    $ua = $_SERVER['HTTP_USER_AGENT'] ?? '';
    if (preg_match('/Googlebot|Bingbot|Applebot/i', $ua)) { return; }

    // Everyone else must be logged in
    if (!is_user_logged_in()) {
      wp_safe_redirect( wp_login_url( home_url($uri) ) );
      exit;
    }
  }
});
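Because that UA check is spoofable, you can spot-check a suspicious “Googlebot” hit with a reverse-then-forward DNS lookup (the IP below is the example from Google’s own verification docs; substitute one from your logs):

# Reverse lookup: real Googlebot IPs resolve to *.googlebot.com or *.google.com
host 66.249.66.1

# Forward-confirm: the returned hostname should resolve back to the same IP
host crawl-66-249-66-1.googlebot.com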

6) Testing (1 minute each)

Run these from a terminal or your monitoring box:

# See your robots
curl -s https://yourdomain.com/robots.txt

# AI UA should be blocked on premium
curl -I -A "GPTBot" https://yourdomain.com/premium/example/
curl -I -A "Perplexity-User" https://yourdomain.com/premium/example/

# Human (no login) should redirect to login
curl -I https://yourdomain.com/premium/example/

# Googlebot should be allowed to fetch preview
curl -I -A "Googlebot" https://yourdomain.com/premium/example/

Short FAQ

Q1. Will this hurt my Google rankings?
No—Googlebot stays allowed. We only block Google-Extended (AI training) and certain AI/scraper UAs on /premium/.

Q2. Why do I need WAF if I already set robots.txt?
Because robots.txt is a polite request. Some bots ignore it. WAF blocks at the edge before they touch PHP/MySQL.

Q3. Do I need both Nginx/Apache rules and the WP snippet?
Pick one strong layer plus WAF. If you can edit server config, use Nginx/Apache. If not, use the WP snippet as a fallback.

Q4. Should I block /premium/ from Google completely?
Usually no. Keep a public preview page indexed for discovery; gate the full content behind login.
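If the gated page itself stays indexed, tell Google it’s paywalled rather than cloaked by adding its paywalled-content structured data (see Sources below). A minimal sketch; the ".paywall" selector is an assumption, point it at whatever class wraps your gated section:

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "Your premium post title",
  "isAccessibleForFree": false,
  "hasPart": {
    "@type": "WebPageElement",
    "isAccessibleForFree": false,
    "cssSelector": ".paywall"
  }
}
</script>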

Q5. Can I use a different path, like /members/ or /courses/?
Absolutely. Replace /premium/ everywhere (robots, WAF, server, WP code).

Q6. What about new AI bots I haven’t listed?
Review logs monthly. Add new UAs/IPs to your WAF expression. This landscape changes fast.
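A quick way to run that review from the command line (the log path is an assumption; point it at your own access log):

grep -iE "GPTBot|CCBot|PerplexityBot|Perplexity-User|Claude-Web|AI2Bot|DataForSeoBot" \
  /var/log/nginx/access.log | awk '{print $1}' | sort | uniq -c | sort -rn | head

This prints the top client IPs sending known AI-crawler UAs; anything new or high-volume is a candidate for your WAF list.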

Bonus: Copy Pack (one place for your VA)

robots.txt (selective)

User-agent: Googlebot
Disallow:
User-agent: Bingbot
Disallow:
User-agent: Applebot
Disallow:

User-agent: Google-Extended
Disallow: /premium/
User-agent: Applebot-Extended
Disallow: /premium/

User-agent: GPTBot
Disallow: /premium/
User-agent: CCBot
Disallow: /premium/
User-agent: PerplexityBot
Disallow: /premium/

User-agent: *
Allow: /
Sitemap: https://yourdomain.com/sitemap_index.xml

Cloudflare WAF expression

(http.request.uri.path starts_with "/premium/") and
(http.user_agent contains "GPTBot" or http.user_agent contains "CCBot" or
 http.user_agent contains "PerplexityBot" or http.user_agent contains "Perplexity-User" or
 http.user_agent contains "Claude-Web" or http.user_agent contains "AI2Bot" or
 http.user_agent contains "DataForSeoBot")

Nginx location

location ^~ /premium/ {
  if ($http_user_agent ~* "(GPTBot|CCBot|PerplexityBot|Perplexity-User|Claude-Web|AI2Bot|DataForSeoBot)") { return 403; }
  set $needs_login 1;
  if ($http_user_agent ~* "(Googlebot|Bingbot|Applebot)") { set $needs_login 0; }
  if ($http_cookie ~* "wordpress_logged_in_") { set $needs_login 0; }
  if ($needs_login) { return 302 https://yourdomain.com/login/?redirect_to=$scheme://$host$request_uri; }
  add_header X-Robots-Tag "noarchive, nosnippet" always;
  try_files $uri $uri/ /index.php?$args;
}

Apache .htaccess

RewriteEngine On
RewriteCond %{REQUEST_URI} ^/premium/ [NC]
RewriteCond %{HTTP_USER_AGENT} (GPTBot|CCBot|PerplexityBot|Perplexity-User|Claude-Web|AI2Bot|DataForSeoBot) [NC]
RewriteRule ^ - [F,L]

RewriteCond %{REQUEST_URI} ^/premium/ [NC]
RewriteCond %{HTTP:Cookie} !wordpress_logged_in_ [NC]
RewriteRule ^ https://yourdomain.com/login/?redirect_to=https://%{HTTP_HOST}%{REQUEST_URI} [R=302,L]

<IfModule mod_setenvif.c>
  SetEnvIf Request_URI "^/premium/" PREMIUM_PATH
</IfModule>
<IfModule mod_headers.c>
  Header set X-Robots-Tag "noarchive, nosnippet" env=PREMIUM_PATH
</IfModule>


Sources & Further Reading

  • Google-Extended announcement: An update on web publisher controls — what Google-Extended is and why it exists.
  • Google common crawlers (incl. Google-Extended): tokens, behavior & robots.txt notes.
  • Vertex AI: Grounding with Google Search — honors the Google-Extended disallow.
  • Applebot & Applebot-Extended: About Applebot — Apple’s crawler, the features it powers, and control notes.
  • OpenAI GPTBot: GPTBot docs — how to allow/deny via robots.txt.
  • Common Crawl CCBot: CCBot page | FAQ — UA string, robots.txt, crawl-delay.
  • Perplexity crawlers: PerplexityBot & Perplexity-User — roles, IP JSON, behavior.
  • Cloudflare on “stealth crawling” claims: Cloudflare blog | coverage in The Verge.
  • Cloudflare WAF: User Agent Blocking — create UA rules; actions (Block/Challenge).
  • Google: Subscription & Paywalled Content — structured data & SEO best practices.
  • Edit robots.txt in Yoast: Yoast guide — File Editor path + tips.
  • Edit robots.txt in Rank Math: Rank Math guide — virtual robots.txt editor.
