AI Discovery Layer

Endpoint Overview

All 14 endpoints are registered on the template_redirect hook at priority 1, so they run before WordPress renders any page. Each endpoint checks the request path, verifies that the Discovery Hub is enabled, generates its content, sends headers, and exits.

Every endpoint inherits from BaseHandler:

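abstract class BaseHandler implements Contracts\Handler {
    // Request path this endpoint answers, e.g. "/llms.txt".
    abstract protected function path(): string;
    // MIME type sent in the Content-Type header.
    abstract protected function contentType(): string;
    // Builds the full response body.
    abstract protected function generate(): string;

    public function handle(): void {
        // Ignore requests for any other path.
        if (URLManager::get_request_path() !== $this->path()) return;
        // Do nothing unless the Discovery Hub is enabled.
        $settings = get_option('cybermaps_settings', []);
        if (empty($settings['enable_discovery_hub'])) return;
        $output = $this->generate();
        Integrity::send_headers($output);
        header('Content-Type: ' . $this->contentType());
        echo $output;
        // Stop before WordPress renders a page.
        exit;
    }
}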
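
As an illustration of how a concrete endpoint might plug into this pattern, here is a minimal sketch. SketchHandler is a simplified stand-in for BaseHandler written for this example: it takes the request path and settings as arguments and returns the response instead of sending headers and exiting, so it can run outside WordPress. SkillHandlerSketch and its body text are hypothetical and not taken from the plugin.

```php
<?php
// Simplified stand-in for BaseHandler (assumption): returns the response
// instead of echoing and exiting, so the flow can be exercised standalone.
abstract class SketchHandler
{
    abstract protected function path(): string;
    abstract protected function contentType(): string;
    abstract protected function generate(): string;

    /** Returns ['type' => ..., 'body' => ...], or null when the handler does not apply. */
    public function respond(string $requestPath, array $settings): ?array
    {
        if ($requestPath !== $this->path()) {
            return null; // Some other path was requested.
        }
        if (empty($settings['enable_discovery_hub'])) {
            return null; // Discovery Hub is disabled.
        }
        return ['type' => $this->contentType(), 'body' => $this->generate()];
    }
}

// Hypothetical handler for /skill.md; the generated text is illustrative only.
final class SkillHandlerSketch extends SketchHandler
{
    protected function path(): string
    {
        return '/skill.md';
    }

    protected function contentType(): string
    {
        return 'text/markdown; charset=utf-8';
    }

    protected function generate(): string
    {
        return "# Site Skills\n\n- Content discovery\n";
    }
}
```

A subclass only has to supply the path, the MIME type, and the body; the path check and the enabled check stay in one place.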

/llms.txt and /llms-full.txt

These endpoints implement the llmstxt.org proposal, a machine-readable site overview for large language models.

Structure:

# Site Title
> Site description and mission statement

targeted_agents:
  - GPTBot
  - Claude-Web
  - anthropic-ai

## 📚 Research & Information Hub
- [Post Title](url): AI-extracted snippet

## 🎯 Conversion & Action Center
- [Page Title](url): AI-extracted snippet

## Optional
- TL;DR Summary: /llms-tldr.txt
- Knowledge Graph: /knowledge-graph.json
- REST API: /wp-json/cybermaps/v1/discovery
- MCP Server: /.well-known/mcp/server-card.json
- JSON Feed: /feed.json

## YAML Sitemap
/path:
  priority: 0.8
  updated: 2026-05-26
  type: informational

Content Map: Posts are split into intent-based silos (informational vs. transactional) using IntentEngine::calculate(). The content map is configurable through:

  • llms_included_types: which post types to include
  • llms_exclude_ids: specific post IDs to exclude
  • llms_filter_taxonomies: taxonomy term filtering
  • _cybermaps_exclude_ai post meta: per-post exclusion checkbox

Dual output: llms.txt includes 20 posts. llms-full.txt includes 100 posts. Both share the same YAML frontmatter and sitemap block.
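
The silo split and the two post limits can be sketched roughly as follows. This is an assumption-laden illustration, not the plugin's code: sketch_llms_sections() is a hypothetical helper, each post is assumed to arrive as an array with title, url, snippet, and intent keys (the intent standing in for the result of IntentEngine::calculate()), and the limit is assumed to apply before the split.

```php
<?php
/**
 * Group posts into the two llms.txt sections and cap the total.
 * Posts with an unknown or missing intent fall back to the informational silo.
 */
function sketch_llms_sections(array $posts, int $limit): string
{
    $headings = [
        'informational' => '## 📚 Research & Information Hub',
        'transactional' => '## 🎯 Conversion & Action Center',
    ];
    $silos = ['informational' => [], 'transactional' => []];

    // Apply the post limit first (20 for llms.txt, 100 for llms-full.txt).
    foreach (array_slice($posts, 0, $limit) as $post) {
        $intent = $post['intent'] ?? '';
        if (!isset($silos[$intent])) {
            $intent = 'informational';
        }
        $silos[$intent][] = sprintf('- [%s](%s): %s', $post['title'], $post['url'], $post['snippet']);
    }

    // Emit only the sections that actually contain posts.
    $out = [];
    foreach ($silos as $intent => $lines) {
        if ($lines) {
            $out[] = $headings[$intent] . "\n" . implode("\n", $lines);
        }
    }
    return implode("\n\n", $out) . "\n";
}
```

Under these assumptions, sketch_llms_sections($posts, 20) would produce the body for llms.txt and sketch_llms_sections($posts, 100) the body for llms-full.txt.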

/llms-tldr.txt

Single-paragraph summary with:

  • Numeric quality scoring (calculate_semantic_score()): content length, heading count, media presence, outgoing links
  • Topic clustering via shared taxonomy terms
  • Duplicate detection: posts sharing ≥2 taxonomy terms are collapsed
  • Quality filtering: excludes shortcode-only content, empty posts, and demo/placeholder text
  • Hard 80K token cap: output is truncated with a message if it exceeds the budget
  • AI-generated snippets preferred over raw excerpts
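
Two of the rules above, duplicate collapsing and the token cap, can be sketched as follows. Both helpers are hypothetical. The sketch assumes each post carries a terms array of taxonomy term IDs, that the first post of a duplicate pair is the one kept, and that a token is roughly four bytes of text; the plugin's actual estimator and truncation message are not documented here.

```php
<?php
/** Keep a post only if it shares fewer than two taxonomy terms with every post already kept. */
function sketch_collapse_duplicates(array $posts): array
{
    $kept = [];
    foreach ($posts as $post) {
        $isDuplicate = false;
        foreach ($kept as $other) {
            if (count(array_intersect($post['terms'], $other['terms'])) >= 2) {
                $isDuplicate = true;
                break;
            }
        }
        if (!$isDuplicate) {
            $kept[] = $post;
        }
    }
    return $kept;
}

/** Append lines until the token budget is reached, then add a truncation notice. */
function sketch_cap_tokens(array $lines, int $maxTokens = 80000): string
{
    $out = [];
    $used = 0;
    foreach ($lines as $line) {
        $cost = (int) ceil(strlen($line) / 4); // Rough estimate: four bytes per token.
        if ($used + $cost > $maxTokens) {
            $out[] = '[Output truncated: token budget reached]';
            break;
        }
        $out[] = $line;
        $used += $cost;
    }
    return implode("\n", $out);
}
```

Cutting at line boundaries rather than at a byte offset avoids splitting a multibyte character or a summary entry in half.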

/.well-known/ai.json

AI Discovery Protocol (ADP) manifest. Schema version 3.0:

{
  "schema_version": "3.0",
  "ai_manifest": {
    "capabilities": ["content_discovery", "semantic_search", "structured_data"],
    "endpoints": {
      "llms_txt": "https://example.com/llms.txt",
      "knowledge_graph": "https://example.com/knowledge-graph.json",
      "search_api": "https://example.com/wp-json/cybermaps/v1/search"
    },
    "targeted_agents": ["GPTBot", "Claude-Web", "PerplexityBot"]
  }
}

The targeted_agents list is derived from CrawlerRegistry::get_targeted_agents(), which returns all bots with LLM access enabled and respects per-bot override settings.
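
One way such a lookup could resolve defaults against overrides is sketched below. The data shapes are assumptions made for this example: the registry is taken to map each bot name to its default LLM access flag, and the overrides to map bot names to an explicit on/off value. sketch_targeted_agents() is a hypothetical helper, not the CrawlerRegistry implementation.

```php
<?php
/**
 * Resolve the targeted agents list.
 * $registry:  bot name => default LLM access (bool)
 * $overrides: bot name => explicit per-bot setting (bool), wins over the default
 */
function sketch_targeted_agents(array $registry, array $overrides): array
{
    $agents = [];
    foreach ($registry as $bot => $defaultAccess) {
        $allowed = array_key_exists($bot, $overrides)
            ? (bool) $overrides[$bot]
            : (bool) $defaultAccess;
        if ($allowed) {
            $agents[] = $bot;
        }
    }
    return $agents;
}
```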

/ai-sitemap.xml

AI-optimized XML sitemap with semantic metadata:

<url>
  <loc>https://example.com/post-slug</loc>
  <ai:visual_weight>0.85</ai:visual_weight>
  <ai:info_gain>0.72</ai:info_gain>
  <ai:intent>informational</ai:intent>
</url>

info_gain is a float in the range 0.0 to 1.0, computed from content freshness, recency, and structural quality. visual_weight is derived from media attachment analysis, and intent comes from IntentEngine.
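
A minimal sketch of rendering one such entry is shown below. The helper names are hypothetical and the scores are assumed to be computed elsewhere. The sketch only clamps info_gain into its documented range, escapes the values for XML, and formats the numbers with two decimals.

```php
<?php
/** Clamp a score into the 0.0 to 1.0 range. */
function sketch_clamp01(float $value): float
{
    return max(0.0, min(1.0, $value));
}

/** Render a single <url> entry; scores are passed in, not computed here. */
function sketch_sitemap_entry(string $loc, float $visualWeight, float $infoGain, string $intent): string
{
    // %F formats floats without locale-specific decimal separators.
    return sprintf(
        "<url>\n  <loc>%s</loc>\n  <ai:visual_weight>%.2F</ai:visual_weight>\n"
        . "  <ai:info_gain>%.2F</ai:info_gain>\n  <ai:intent>%s</ai:intent>\n</url>",
        htmlspecialchars($loc, ENT_XML1 | ENT_QUOTES, 'UTF-8'),
        $visualWeight,
        sketch_clamp01($infoGain),
        htmlspecialchars($intent, ENT_XML1 | ENT_QUOTES, 'UTF-8')
    );
}
```

Escaping matters mainly for URLs with query strings, where a bare ampersand would make the sitemap invalid XML.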

/knowledge-graph.json

Full schema.org graph for the entire site. Includes:

  • Identity data from Identity Hub (organization/person, social profiles, contact points)
  • Post type listings with ItemList schema
  • Intent silos with CollectionPage schema
  • Catalog references for ecommerce sites

Cached for 1 hour. Invalidated on save_post and settings changes.
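
The cache-then-invalidate behavior can be illustrated with a small in-memory sketch. SketchTtlCache is a hypothetical class written for this example. The source does not say which storage the plugin uses; in WordPress this would more likely be transients or the object cache, with invalidation wired to the save_post hook.

```php
<?php
/** Minimal in-memory cache with a time-to-live and explicit invalidation. */
final class SketchTtlCache
{
    private $entries = [];

    /** Return the cached value, or build and store it when missing or expired. */
    public function remember(string $key, int $ttl, callable $build, ?int $now = null)
    {
        $now = $now ?? time();
        if (isset($this->entries[$key]) && $this->entries[$key]['expires'] > $now) {
            return $this->entries[$key]['value'];
        }
        $value = $build();
        $this->entries[$key] = ['value' => $value, 'expires' => $now + $ttl];
        return $value;
    }

    /** Drop an entry, e.g. after a post is saved or settings change. */
    public function invalidate(string $key): void
    {
        unset($this->entries[$key]);
    }
}
```

With a TTL of 3600 seconds, the graph is rebuilt at most once per hour unless an invalidation forces an earlier rebuild.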

/feed.json

JSON Feed v1.1 for programmatic content access:

{
  "version": "https://jsonfeed.org/version/1.1",
  "title": "Site Title",
  "home_page_url": "https://example.com",
  "feed_url": "https://example.com/feed.json",
  "items": [
    {
      "id": "https://example.com/post-slug",
      "url": "https://example.com/post-slug",
      "title": "Post Title",
      "content_text": "...",
      "date_published": "2026-05-26T17:17:00+00:00"
    }
  ]
}

Cached for 15 minutes.
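
A feed with this shape can be assembled with json_encode(), as in the sketch below. sketch_json_feed() is a hypothetical helper, and the input format (posts as arrays with url, title, text, and a Unix timestamp) is an assumption made for the example.

```php
<?php
/** Build a JSON Feed 1.1 document from a list of posts. */
function sketch_json_feed(string $title, string $homeUrl, array $posts): string
{
    $items = [];
    foreach ($posts as $post) {
        $items[] = [
            'id' => $post['url'],
            'url' => $post['url'],
            'title' => $post['title'],
            'content_text' => $post['text'],
            // DATE_ATOM yields an RFC 3339 timestamp such as 2026-05-26T17:17:00+00:00.
            'date_published' => gmdate(DATE_ATOM, $post['timestamp']),
        ];
    }

    $feed = [
        'version' => 'https://jsonfeed.org/version/1.1',
        'title' => $title,
        'home_page_url' => $homeUrl,
        'feed_url' => rtrim($homeUrl, '/') . '/feed.json',
        'items' => $items,
    ];

    return json_encode(
        $feed,
        JSON_PRETTY_PRINT | JSON_UNESCAPED_SLASHES | JSON_UNESCAPED_UNICODE | JSON_THROW_ON_ERROR
    );
}
```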

/.well-known/ai-plugin.json

OpenAI/ChatGPT plugin manifest. Compatible with ChatGPT, Copilot, and other plugin-capable AI platforms:

{
  "schema_version": "v1",
  "name_for_model": "Site Name",
  "name_for_human": "Site Name Discovery",
  "description_for_model": "Access the knowledge graph, semantic search...",
  "api": {
    "type": "openapi",
    "url": "https://example.com/wp-json/cybermaps/v1/discovery"
  }
}

/.well-known/mcp/server-card.json

Model Context Protocol (MCP) server card. Advertises tools and resources for MCP-compatible AI clients.

/skill.md

Machine-readable site skills and capabilities. Describes what the site offers and how AI agents can interact with it. Uses Markdown format for both human and machine readability.

Response Headers

All discovery endpoints include:

Header                     Purpose
Content-Type               Appropriate MIME type per endpoint
Cache-Control              max-age matching the endpoint’s cache TTL
ETag                       SHA-256 hash for conditional requests
Last-Modified              Most recent post modification time (GMT)
X-Cybermaps-Token-Count    Estimated token count for LLM budget planning
Content-Digest             SHA-256 hash for integrity verification
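
To make the header set concrete, here is a minimal sketch that derives these values from a response body. It is not the plugin's Integrity class. The helper names are hypothetical, the ETag is assumed to be the quoted hex digest, the Content-Digest value is assumed to follow the RFC 9530 form (sha-256=:base64:), and the token count uses a rough four-bytes-per-token estimate.

```php
<?php
/** Build the caching and integrity headers for a response body. */
function sketch_discovery_headers(string $body, string $contentType, int $ttl, int $lastModified): array
{
    $digest = hash('sha256', $body, true); // Raw binary SHA-256 digest.

    return [
        'Content-Type' => $contentType,
        'Cache-Control' => 'max-age=' . $ttl,
        'ETag' => '"' . bin2hex($digest) . '"',
        'Last-Modified' => gmdate('D, d M Y H:i:s', $lastModified) . ' GMT',
        'X-Cybermaps-Token-Count' => (string) (int) ceil(strlen($body) / 4),
        'Content-Digest' => 'sha-256=:' . base64_encode($digest) . ':',
    ];
}

/** True when the client's If-None-Match value matches the current ETag. */
function sketch_is_not_modified(array $headers, ?string $ifNoneMatch): bool
{
    return $ifNoneMatch !== null && trim($ifNoneMatch) === $headers['ETag'];
}
```

When sketch_is_not_modified() returns true, a server could answer with 304 Not Modified and skip the body, which is what makes the ETag useful for conditional requests.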