
Premium Content Playbook (2025): Keep SEO, Block AI Training & Scrapers From Your Members Area

Updated: October 19, 2025
When your best work lives behind a paywall, the last thing you want is AI models and “reader” apps quietly training on it. In this guide, we’ll let Google/Bing/Apple crawl your public content while blocking AI training and scraper bots from your premium paths (e.g., /premium/, /members/, /courses/). We’ll do it in layers: robots.txt ➝ WAF rules ➝ server rules ➝ WordPress gate ➝ quick tests.


TL;DR (Copy & Go)

Selective opt-out robots.txt (only blocks premium)

# --- Keep SEO bots working sitewide ---
User-agent: Googlebot
Disallow:
User-agent: Bingbot
Disallow:
User-agent: Applebot
Disallow:

# --- Opt-out of AI training for premium path only ---
User-agent: Google-Extended
Disallow: /premium/
User-agent: Applebot-Extended
Disallow: /premium/

# --- Block common AI harvesters in premium path ---
User-agent: GPTBot
Disallow: /premium/
User-agent: CCBot
Disallow: /premium/
User-agent: PerplexityBot
Disallow: /premium/

# --- Housekeeping ---
User-agent: *
Allow: /
Sitemap: https://yourdomain.com/sitemap_index.xml

Why this setup?

  • Public content = SEO friendly. Googlebot/Bingbot/Applebot can crawl everything they need.
  • Premium content = opt-out from AI training and blocked for common “AI reader” crawlers.
  • Defense in depth. robots.txt is polite control; some bots ignore it, so we add WAF and server rules plus a login gate.

Step-by-Step Guide

1) Decide your structure (2 minutes)

  • Use one folder for paid stuff, e.g., /premium/.
  • Keep a public preview on /blog/... or /premium/slug/ with an excerpt.
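For example, a minimal layout (paths are illustrative):

/premium/course-a/lesson-1/    (gated: full lesson, login required)
/blog/course-a-overview/       (public: excerpt plus a link to join)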

2) Add the selective robots.txt (3 minutes)

WordPress:

  • Yoast SEO → Tools → File editor (or Rank Math → General Settings → Edit robots.txt)
  • Paste the TL;DR block above → Save
  • Test in browser: https://yourdomain.com/robots.txt

Non-WP:

  • Upload robots.txt to webroot (public_html/), then open it in the browser.

Note: robots.txt is public, so don’t list individual private URLs in it. Disallowing the folder is enough.

3) Cloudflare WAF rule (edge-level blocking)

Target: requests to /premium/ with “AI” UAs → Block (or Managed Challenge if you want a softer gate).

Expression (example):

(http.request.uri.path starts_with "/premium/")
and (
  http.user_agent contains "GPTBot" or
  http.user_agent contains "CCBot" or
  http.user_agent contains "PerplexityBot" or
  http.user_agent contains "Perplexity-User" or
  http.user_agent contains "Claude-Web" or
  http.user_agent contains "AI2Bot" or
  http.user_agent contains "DataForSeoBot"
)

Tip: Create a Skip/Allow rule for (Googlebot|Bingbot|Applebot) and place it above the block rule so legit crawlers never get challenged; a sketch follows.
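A minimal skip-rule sketch, assuming your plan exposes Cloudflare’s built-in verified-bot signal (cf.client.bot), which is harder to spoof than a raw UA match:

(http.request.uri.path starts_with "/premium/") and (cf.client.bot)

Set the action to Skip (remaining custom rules) and order it above the block rule.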

4) Server rules (Nginx/Apache) — optional but strong

Nginx (inside your server block). Note that try_files can’t live inside an if block in Nginx, so the login check uses a flag variable instead:

location ^~ /premium/ {
  # Kill known AI/scraper UAs outright
  if ($http_user_agent ~* "(GPTBot|CCBot|PerplexityBot|Perplexity-User|Claude-Web|AI2Bot|DataForSeoBot)") { return 403; }

  # Humans need a WP login; major SEO crawlers may fetch previews
  set $needs_login 1;
  if ($http_user_agent ~* "(Googlebot|Bingbot|Applebot)") { set $needs_login 0; }
  if ($http_cookie ~* "wordpress_logged_in_") { set $needs_login 0; }
  if ($needs_login) {
    return 302 https://yourdomain.com/login/?redirect_to=$scheme://$host$request_uri;
  }

  add_header X-Robots-Tag "noarchive, nosnippet" always;
  try_files $uri $uri/ /index.php?$args;
}
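After editing, validate the config and reload (commands assume a systemd host; adjust for your distro):

sudo nginx -t && sudo systemctl reload nginx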

Apache (.htaccess):

RewriteEngine On

# Block AIs on premium
RewriteCond %{REQUEST_URI} ^/premium/ [NC]
RewriteCond %{HTTP_USER_AGENT} (GPTBot|CCBot|PerplexityBot|Perplexity-User|Claude-Web|AI2Bot|DataForSeoBot) [NC]
RewriteRule ^ - [F,L]

# Gate humans (must be logged in)
RewriteCond %{REQUEST_URI} ^/premium/ [NC]
RewriteCond %{HTTP:Cookie} !wordpress_logged_in_ [NC]
RewriteRule ^ https://yourdomain.com/login/?redirect_to=https://%{HTTP_HOST}%{REQUEST_URI} [R=302,L]

# Optional header to avoid snippet/archive leakage
# (FilesMatch can't match a path, so flag the URI with SetEnvIf instead)
<IfModule mod_setenvif.c>
  SetEnvIf Request_URI "^/premium/" PREMIUM_PATH
</IfModule>
<IfModule mod_headers.c>
  Header set X-Robots-Tag "noarchive, nosnippet" env=PREMIUM_PATH
</IfModule>
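.htaccess changes take effect on the next request, so no reload is needed. If you move these rules into the main server config instead, test and reload first (commands assume a Debian/Ubuntu-style host):

sudo apachectl configtest && sudo systemctl reload apache2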

5) WordPress-only fallback (if you can’t edit server)

Drop this into a small mu-plugin (e.g., wp-content/mu-plugins/premium-gate.php) or your theme’s functions.php:

add_action('template_redirect', function () {
  $uri = $_SERVER['REQUEST_URI'] ?? '';
  if (strpos($uri, '/premium/') === 0) {
    // Let major SEO bots view previews (a UA check is spoofable,
    // so treat this layer as convenience, not security)
    $ua = $_SERVER['HTTP_USER_AGENT'] ?? '';
    if (preg_match('/Googlebot|Bingbot|Applebot/i', $ua)) { return; }

    // Everyone else must be logged in
    if (!is_user_logged_in()) {
      wp_safe_redirect( wp_login_url( home_url($uri) ) );
      exit;
    }
  }
});
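Because that UA check is spoofable, you can spot-check a suspicious “Googlebot” hit with a reverse-then-forward DNS lookup (the IP below is the example from Google’s own verification docs; substitute one from your logs):

# Reverse lookup: real Googlebot IPs resolve to *.googlebot.com or *.google.com
host 66.249.66.1

# Forward-confirm: the returned hostname should resolve back to the same IP
host crawl-66-249-66-1.googlebot.com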

6) Testing (1 minute each)

Run these from a terminal or your monitoring box:

# See your robots
curl -s https://yourdomain.com/robots.txt

# AI UA should be blocked on premium
curl -I -A "GPTBot" https://yourdomain.com/premium/example/
curl -I -A "Perplexity-User" https://yourdomain.com/premium/example/

# Human (no login) should redirect to login
curl -I https://yourdomain.com/premium/example/

# Googlebot should be allowed to fetch preview
curl -I -A "Googlebot" https://yourdomain.com/premium/example/

Short FAQ

Q1. Will this hurt my Google rankings?
No—Googlebot stays allowed. We only block Google-Extended (AI training) and certain AI/scraper UAs on /premium/.

Q2. Why do I need WAF if I already set robots.txt?
Because robots.txt is a polite request. Some bots ignore it. WAF blocks at the edge before they touch PHP/MySQL.

Q3. Do I need both Nginx/Apache rules and the WP snippet?
Pick one strong layer plus WAF. If you can edit server config, use Nginx/Apache. If not, use the WP snippet as a fallback.

Q4. Should I block /premium/ from Google completely?
Usually no. Keep a public preview page indexed for discovery; gate the full content behind login.
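If the gated page itself stays indexed, tell Google it’s paywalled rather than cloaked by adding its paywalled-content structured data (see Sources below). A minimal sketch; the ".paywall" selector is an assumption, point it at whatever class wraps your gated section:

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "Your premium post title",
  "isAccessibleForFree": false,
  "hasPart": {
    "@type": "WebPageElement",
    "isAccessibleForFree": false,
    "cssSelector": ".paywall"
  }
}
</script>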

Q5. Can I use a different path, like /members/ or /courses/?
Absolutely. Replace /premium/ everywhere (robots, WAF, server, WP code).

Q6. What about new AI bots I haven’t listed?
Review logs monthly. Add new UAs/IPs to your WAF expression. This landscape changes fast.
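A quick way to run that review from the command line (the log path is an assumption; point it at your own access log):

grep -iE "GPTBot|CCBot|PerplexityBot|Perplexity-User|Claude-Web|AI2Bot|DataForSeoBot" \
  /var/log/nginx/access.log | awk '{print $1}' | sort | uniq -c | sort -rn | head

This prints the top client IPs sending known AI-crawler UAs; anything new or high-volume is a candidate for your WAF list.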

Bonus: Copy Pack (one place for your VA)

robots.txt (selective)

User-agent: Googlebot
Disallow:
User-agent: Bingbot
Disallow:
User-agent: Applebot
Disallow:

User-agent: Google-Extended
Disallow: /premium/
User-agent: Applebot-Extended
Disallow: /premium/

User-agent: GPTBot
Disallow: /premium/
User-agent: CCBot
Disallow: /premium/
User-agent: PerplexityBot
Disallow: /premium/

User-agent: *
Allow: /
Sitemap: https://yourdomain.com/sitemap_index.xml

Cloudflare WAF expression

(http.request.uri.path starts_with "/premium/") and
(http.user_agent contains "GPTBot" or http.user_agent contains "CCBot" or
 http.user_agent contains "PerplexityBot" or http.user_agent contains "Perplexity-User" or
 http.user_agent contains "Claude-Web" or http.user_agent contains "AI2Bot" or
 http.user_agent contains "DataForSeoBot")

Nginx location

location ^~ /premium/ {
  if ($http_user_agent ~* "(GPTBot|CCBot|PerplexityBot|Perplexity-User|Claude-Web|AI2Bot|DataForSeoBot)") { return 403; }
  set $needs_login 1;
  if ($http_user_agent ~* "(Googlebot|Bingbot|Applebot)") { set $needs_login 0; }
  if ($http_cookie ~* "wordpress_logged_in_") { set $needs_login 0; }
  if ($needs_login) { return 302 https://yourdomain.com/login/?redirect_to=$scheme://$host$request_uri; }
  add_header X-Robots-Tag "noarchive, nosnippet" always;
  try_files $uri $uri/ /index.php?$args;
}

Apache .htaccess

RewriteEngine On
RewriteCond %{REQUEST_URI} ^/premium/ [NC]
RewriteCond %{HTTP_USER_AGENT} (GPTBot|CCBot|PerplexityBot|Perplexity-User|Claude-Web|AI2Bot|DataForSeoBot) [NC]
RewriteRule ^ - [F,L]

RewriteCond %{REQUEST_URI} ^/premium/ [NC]
RewriteCond %{HTTP:Cookie} !wordpress_logged_in_ [NC]
RewriteRule ^ https://yourdomain.com/login/?redirect_to=https://%{HTTP_HOST}%{REQUEST_URI} [R=302,L]

<IfModule mod_setenvif.c>
  SetEnvIf Request_URI "^/premium/" PREMIUM_PATH
</IfModule>
<IfModule mod_headers.c>
  Header set X-Robots-Tag "noarchive, nosnippet" env=PREMIUM_PATH
</IfModule>


Sources & Further Reading

  • Google-Extended announcement: An update on web publisher controls — what Google-Extended is and why it exists.
  • Google common crawlers (incl. Google-Extended): tokens, behavior & robots.txt notes.
  • Vertex AI: Grounding with Google Search — honors the Google-Extended disallow.
  • Applebot & Applebot-Extended: About Applebot — Apple’s crawler, the features it powers, and control notes.
  • OpenAI GPTBot: GPTBot docs — how to allow/deny via robots.txt.
  • Common Crawl CCBot: CCBot page | FAQ — UA string, robots.txt, crawl-delay.
  • Perplexity crawlers: PerplexityBot & Perplexity-User — roles, IP JSON, behavior.
  • Cloudflare on “stealth crawling” claims: Cloudflare blog | coverage in The Verge.
  • Cloudflare WAF: User Agent Blocking — create UA rules; actions (Block/Challenge).
  • Google: Subscription & Paywalled Content — structured data & SEO best practices.
  • Edit robots.txt in Yoast: Yoast guide — File Editor path + tips.
  • Edit robots.txt in Rank Math: Rank Math guide — virtual robots.txt editor.
