AI Discovery Layer

Endpoint Overview

All 14 endpoints are registered on the template_redirect hook at priority 1, so they run before WordPress renders any page. Each endpoint checks the request path, verifies that the Discovery Hub is enabled, generates its content, sends headers, and exits.

Every endpoint inherits from BaseHandler:

abstract class BaseHandler implements Contracts\Handler {
    abstract protected function path(): string;
    abstract protected function contentType(): string;
    abstract protected function generate(): string;

    public function handle(): void {
        if (URLManager::get_request_path() !== $this->path()) return;
        $settings = get_option('cybermaps_settings', []);
        if (empty($settings['enable_discovery_hub'])) return;
        $output = $this->generate();
        Integrity::send_headers($output);
        header('Content-Type: ' . $this->contentType());
        echo $output;
        exit;
    }
}
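
To make the template-method shape concrete, here is a minimal sketch of a concrete endpoint. It is not the plugin's code: `SketchHandler`, `SkillHandler`, and `respond()` are hypothetical names, the WordPress calls are replaced by plain arguments, and the response is returned rather than echoed so the example runs on its own.

```php
<?php
declare(strict_types=1);

// Hypothetical, simplified stand-in for BaseHandler: no WordPress calls, no exit.
abstract class SketchHandler {
    abstract protected function path(): string;
    abstract protected function contentType(): string;
    abstract protected function generate(): string;

    /** Returns null when the request is for another path or the hub is disabled. */
    public function respond(string $requestPath, bool $hubEnabled): ?array {
        if ($requestPath !== $this->path() || !$hubEnabled) {
            return null;
        }
        return [
            'Content-Type' => $this->contentType(),
            'body'         => $this->generate(),
        ];
    }
}

// A concrete endpoint only has to supply its path, MIME type, and content.
final class SkillHandler extends SketchHandler {
    protected function path(): string { return '/skill.md'; }
    protected function contentType(): string { return 'text/markdown; charset=utf-8'; }
    protected function generate(): string { return "# Example Site\n\nSkills go here.\n"; }
}

$response = (new SkillHandler())->respond('/skill.md', true);
```

A request for any other path, or any request while the hub is disabled, yields `null` here, which corresponds to the early `return` statements in `handle()`.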

`/llms.txt` and `/llms-full.txt`

Implements the llmstxt.org proposal: a machine-readable site overview for large language models.

Structure:

# Site Title
> Site description and mission statement

targeted_agents:
  - GPTBot
  - Claude-Web
  - anthropic-ai

## 📚 Research & Information Hub
- [Post Title](url): AI-extracted snippet

## 🎯 Conversion & Action Center
- [Page Title](url): AI-extracted snippet

## Optional
- TL;DR Summary: /llms-tldr.txt
- Knowledge Graph: /knowledge-graph.json
- REST API: /wp-json/cybermaps/v1/discovery
- MCP Server: /.well-known/mcp/server-card.json
- JSON Feed: /feed.json

## YAML Sitemap
/path:
  priority: 0.8
  updated: 2026-05-26
  type: informational

Content Map: Posts are split into intent-based silos (informational vs. transactional) using IntentEngine::calculate(). Configurable through:

llms_included_types: which post types to include
llms_exclude_ids: specific post IDs to exclude
llms_filter_taxonomies: taxonomy term filtering
_cybermaps_exclude_ai post meta: per-post exclusion checkbox

Dual output: llms.txt includes 20 posts. llms-full.txt includes 100 posts. Both share the same YAML frontmatter and sitemap block.
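
The filtering and the two post budgets can be sketched as follows. This is an illustration under assumptions, not the plugin's implementation: `cm_sketch_select_posts()` and the plain-array post shape are hypothetical, taxonomy filtering (`llms_filter_taxonomies`) is left out, and an empty `llms_included_types` list includes nothing in this sketch.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: applies the settings named above to candidate posts.
 * Each post is a plain array: ['id' => int, 'type' => string, 'exclude_ai' => bool].
 */
function cm_sketch_select_posts(array $posts, array $settings, int $limit): array {
    $types    = $settings['llms_included_types'] ?? [];
    $excluded = array_map('intval', $settings['llms_exclude_ids'] ?? []);

    $kept = array_filter($posts, static function (array $post) use ($types, $excluded): bool {
        return in_array($post['type'], $types, true)
            && !in_array($post['id'], $excluded, true)
            && empty($post['exclude_ai']); // stands in for the _cybermaps_exclude_ai post meta
    });

    // Reindex, then cut to the endpoint's post budget.
    return array_slice(array_values($kept), 0, $limit);
}

$posts = [
    ['id' => 1, 'type' => 'post',    'exclude_ai' => false],
    ['id' => 2, 'type' => 'page',    'exclude_ai' => true],  // excluded per post
    ['id' => 3, 'type' => 'product', 'exclude_ai' => false], // type not included
    ['id' => 4, 'type' => 'post',    'exclude_ai' => false], // excluded by ID
];
$settings = ['llms_included_types' => ['post', 'page'], 'llms_exclude_ids' => [4]];

$short = cm_sketch_select_posts($posts, $settings, 20);  // llms.txt budget
$full  = cm_sketch_select_posts($posts, $settings, 100); // llms-full.txt budget
```

With this sample data only post 1 survives all three checks, so both outputs list the same single post.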

`/llms-tldr.txt`

Single-paragraph summary with:

Numeric quality scoring (calculate_semantic_score()): content length, heading count, media presence, outgoing links
Topic clustering via shared taxonomy terms
Duplicate detection: posts sharing ≥2 taxonomy terms are collapsed
Quality filtering: excludes shortcode-only content, empty posts, and demo/placeholder text
Hard 80K token cap: output is truncated with a message if it exceeds the budget
AI-generated snippets preferred over raw excerpts
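
The duplicate-detection rule in the list above can be sketched like this. The function name and post shape are hypothetical, and the sketch assumes posts arrive already sorted by quality so that the first post of a cluster is the one kept; the source does not say which duplicate wins.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: drops any post that shares at least $minShared
 * taxonomy terms with a post that was already kept.
 * Each post is ['id' => string, 'terms' => string[]] with unique terms.
 */
function cm_sketch_collapse_duplicates(array $posts, int $minShared = 2): array {
    $kept = [];
    foreach ($posts as $post) {
        $isDuplicate = false;
        foreach ($kept as $other) {
            $shared = array_intersect($post['terms'], $other['terms']);
            if (count($shared) >= $minShared) {
                $isDuplicate = true;
                break;
            }
        }
        if (!$isDuplicate) {
            $kept[] = $post;
        }
    }
    return $kept;
}

$posts = [
    ['id' => 'A', 'terms' => ['seo', 'ai', 'wordpress']],
    ['id' => 'B', 'terms' => ['seo', 'ai']],            // shares two terms with A
    ['id' => 'C', 'terms' => ['wordpress', 'hosting']], // shares one term with A
];

$unique = cm_sketch_collapse_duplicates($posts);
```

Here B is collapsed into A because they share `seo` and `ai`, while C is kept because it overlaps with A on a single term.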

`/.well-known/ai.json`

AI Discovery Protocol (ADP) manifest. Schema version 3.0:

{
  "schema_version": "3.0",
  "ai_manifest": {
    "capabilities": ["content_discovery", "semantic_search", "structured_data"],
    "endpoints": {
      "llms_txt": "https://example.com/llms.txt",
      "knowledge_graph": "https://example.com/knowledge-graph.json",
      "search_api": "https://example.com/wp-json/cybermaps/v1/search"
    },
    "targeted_agents": ["GPTBot", "Claude-Web", "PerplexityBot"]
  }
}

The targeted_agents list is derived from CrawlerRegistry::get_targeted_agents(): all bots with LLM access enabled, respecting per-bot override settings.
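
As a rough illustration of "enabled bots, with per-bot overrides taking precedence", consider the sketch below. The data shapes, the `llm_access` key, the `ExampleBot` entry, and the function name are assumptions made for the example; the internals of `CrawlerRegistry` are not shown in this document.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: a per-bot override (true/false) wins over the
 * registry default; bots without an override keep their default.
 */
function cm_sketch_targeted_agents(array $registry, array $overrides): array {
    $agents = [];
    foreach ($registry as $name => $bot) {
        $enabled = $overrides[$name] ?? $bot['llm_access'];
        if ($enabled) {
            $agents[] = $name;
        }
    }
    return $agents;
}

$registry = [
    'GPTBot'        => ['llm_access' => true],
    'Claude-Web'    => ['llm_access' => true],
    'PerplexityBot' => ['llm_access' => false],
    'ExampleBot'    => ['llm_access' => true],
];
$overrides = ['PerplexityBot' => true, 'ExampleBot' => false];

$targeted = cm_sketch_targeted_agents($registry, $overrides);
```

With these inputs the override switches `PerplexityBot` on and `ExampleBot` off, producing the same three agents as the manifest example.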

`/ai-sitemap.xml`

AI-optimized XML sitemap with semantic metadata:

<url>
  <loc>https://example.com/post-slug</loc>
  <ai:visual_weight>0.85</ai:visual_weight>
  <ai:info_gain>0.72</ai:info_gain>
  <ai:intent>informational</ai:intent>
</url>

info_gain is a float (0.0-1.0) computed from content freshness, recency, and structural quality. visual_weight comes from media attachment analysis. intent comes from IntentEngine.
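
A sitemap entry like the one shown could be assembled roughly as follows. This is a sketch: the function name is hypothetical, the scores are taken as ready-made inputs rather than computed, and clamping `visual_weight` to the 0.0-1.0 range is an assumption (the text only states that range for `info_gain`).

```php
<?php
declare(strict_types=1);

/** Hypothetical helper: renders one <url> element with the ai: metadata fields. */
function cm_sketch_sitemap_entry(string $loc, float $visualWeight, float $infoGain, string $intent): string {
    // Clamp scores to [0.0, 1.0] and print them with two decimals.
    $score = static fn (float $v): string => number_format(max(0.0, min(1.0, $v)), 2, '.', '');
    // Escape text for safe use inside XML elements.
    $esc = static fn (string $s): string => htmlspecialchars($s, ENT_XML1 | ENT_QUOTES, 'UTF-8');

    return "<url>\n"
        . '  <loc>' . $esc($loc) . "</loc>\n"
        . '  <ai:visual_weight>' . $score($visualWeight) . "</ai:visual_weight>\n"
        . '  <ai:info_gain>' . $score($infoGain) . "</ai:info_gain>\n"
        . '  <ai:intent>' . $esc($intent) . "</ai:intent>\n"
        . "</url>\n";
}

$entry = cm_sketch_sitemap_entry('https://example.com/post-slug', 0.85, 0.72, 'informational');
```

A complete sitemap would also need the enclosing `urlset` element and a declaration for the `ai:` namespace prefix, neither of which is shown in this document.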

`/knowledge-graph.json`

Full schema.org graph for the entire site. Includes:

Identity data from Identity Hub (organization/person, social profiles, contact points)
Post type listings with ItemList schema
Intent silos with CollectionPage schema
Catalog references for ecommerce sites

Cached for 1 hour. Invalidated on save_post and settings changes.
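
The cache-then-invalidate behaviour can be illustrated with a small in-memory sketch. It does not reflect where the plugin actually stores the cached graph, which this document does not say; `SketchTtlCache` is a hypothetical class, and the explicit `$now` argument exists only to make the example deterministic.

```php
<?php
declare(strict_types=1);

// Hypothetical in-memory cache with a time-to-live per entry.
final class SketchTtlCache {
    /** @var array<string, array{value: string, expires: int}> */
    private array $entries = [];

    /** Returns the cached value, or builds and stores it when missing or expired. */
    public function remember(string $key, int $ttl, callable $build, ?int $now = null): string {
        $now ??= time();
        $hit = $this->entries[$key] ?? null;
        if ($hit !== null && $hit['expires'] > $now) {
            return $hit['value'];
        }
        $value = $build();
        $this->entries[$key] = ['value' => $value, 'expires' => $now + $ttl];
        return $value;
    }

    /** Would be called from save_post and settings-change hooks. */
    public function invalidate(string $key): void {
        unset($this->entries[$key]);
    }
}

$cache = new SketchTtlCache();
$graph = $cache->remember('knowledge_graph', 3600, static fn (): string => '{"@graph":[]}');
```

Within the hour, repeated calls reuse the stored string; calling `invalidate('knowledge_graph')` forces the next request to rebuild it.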

`/feed.json`

JSON Feed v1.1 for programmatic content access:

{
  "version": "https://jsonfeed.org/version/1.1",
  "title": "Site Title",
  "home_page_url": "https://example.com",
  "feed_url": "https://example.com/feed.json",
  "items": [
    {
      "id": "https://example.com/post-slug",
      "url": "https://example.com/post-slug",
      "title": "Post Title",
      "content_text": "...",
      "date_published": "2026-05-26T17:17:00+00:00"
    }
  ]
}

Cached for 15 minutes.
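
For orientation, a feed with this shape can be produced with plain `json_encode()`. The sketch below uses a hypothetical `cm_sketch_feed_item()` helper and hard-coded sample values; how the plugin queries posts and fills `content_text` is not covered here.

```php
<?php
declare(strict_types=1);

/** Hypothetical helper: maps one post to a JSON Feed 1.1 item. */
function cm_sketch_feed_item(string $url, string $title, string $text, DateTimeImmutable $published): array {
    return [
        'id'             => $url, // the permalink doubles as the stable item ID
        'url'            => $url,
        'title'          => $title,
        'content_text'   => $text,
        'date_published' => $published->format(DATE_ATOM), // RFC 3339 style timestamp
    ];
}

$feed = [
    'version'       => 'https://jsonfeed.org/version/1.1',
    'title'         => 'Site Title',
    'home_page_url' => 'https://example.com',
    'feed_url'      => 'https://example.com/feed.json',
    'items'         => [
        cm_sketch_feed_item(
            'https://example.com/post-slug',
            'Post Title',
            'Plain-text body of the post.',
            new DateTimeImmutable('2026-05-26 17:17:00', new DateTimeZone('UTC'))
        ),
    ],
];

$json = json_encode($feed, JSON_PRETTY_PRINT | JSON_UNESCAPED_SLASHES);
```

`JSON_UNESCAPED_SLASHES` keeps URLs readable (`https://example.com` rather than `https:\/\/example.com`); both forms are valid JSON.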

`/.well-known/ai-plugin.json`

OpenAI/ChatGPT plugin manifest. Compatible with ChatGPT, Copilot, and other plugin-capable AI platforms:

{
  "schema_version": "v1",
  "name_for_model": "Site Name",
  "name_for_human": "Site Name Discovery",
  "description_for_model": "Access the knowledge graph, semantic search...",
  "api": {
    "type": "openapi",
    "url": "https://example.com/wp-json/cybermaps/v1/discovery"
  }
}

`/.well-known/mcp/server-card.json`

Model Context Protocol (MCP) server card. Advertises tools and resources for MCP-compatible AI clients.

`/skill.md`

Machine-readable site skills and capabilities. Describes what the site offers and how AI agents can interact with it. Uses Markdown format for both human and machine readability.

Response Headers

All discovery endpoints include:

Header	Purpose
`Content-Type`	Appropriate MIME type per endpoint
`Cache-Control`	`max-age` matching the endpoint’s cache TTL
`ETag`	SHA-256 hash for conditional requests
`Last-Modified`	Most recent post modification time (GMT)
`X-Cybermaps-Token-Count`	Estimated token count for LLM budget planning
`Content-Digest`	SHA-256 hash for integrity verification
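
The header set in the table can be sketched as a pure function of the response body. Several details are assumptions, not facts about `Integrity::send_headers()`: the function names are hypothetical, the ETag is assumed to be the hex SHA-256 of the body, the token estimate uses a rough four-characters-per-token heuristic, and `Content-Digest` follows the `sha-256=:<base64>:` form from RFC 9530.

```php
<?php
declare(strict_types=1);

/** Hypothetical helper: computes the response headers for one endpoint body. */
function cm_sketch_integrity_headers(string $body, int $maxAge, int $lastModified): array {
    $rawDigest = hash('sha256', $body, true); // raw 32-byte digest

    return [
        'Cache-Control'           => 'max-age=' . $maxAge,
        'ETag'                    => '"' . bin2hex($rawDigest) . '"',
        'Last-Modified'           => gmdate('D, d M Y H:i:s', $lastModified) . ' GMT',
        'X-Cybermaps-Token-Count' => (string) (int) ceil(strlen($body) / 4), // heuristic estimate
        'Content-Digest'          => 'sha-256=:' . base64_encode($rawDigest) . ':',
    ];
}

/** True when the client's If-None-Match value matches, so a 304 could be sent. */
function cm_sketch_is_not_modified(?string $ifNoneMatch, string $etag): bool {
    return $ifNoneMatch !== null && trim($ifNoneMatch) === $etag;
}

$headers = cm_sketch_integrity_headers("# Site Title\n> Description\n", 3600, 1779815820);
$cached  = cm_sketch_is_not_modified($headers['ETag'], $headers['ETag']);
```

A client that stored the ETag can resend it in `If-None-Match`; when it matches, the server can answer `304 Not Modified` without regenerating or resending the body.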
