AI Discovery Layer
Endpoint Overview
All 14 endpoints are registered on the template_redirect hook at priority 1, so they run before WordPress renders any page. Each endpoint checks the request path, verifies that the Discovery Hub is enabled, generates its content, sends headers, and exits.
Every endpoint inherits from BaseHandler:

    abstract class BaseHandler implements Contracts\Handler {
        abstract protected function path(): string;
        abstract protected function contentType(): string;
        abstract protected function generate(): string;

        public function handle(): void {
            // Only respond when the request path matches this endpoint.
            if (URLManager::get_request_path() !== $this->path()) return;
            // Skip when the Discovery Hub is disabled in the plugin settings.
            $settings = get_option('cybermaps_settings', []);
            if (empty($settings['enable_discovery_hub'])) return;
            $output = $this->generate();
            // Send integrity headers and the MIME type, then the body, and stop.
            Integrity::send_headers($output);
            header('Content-Type: ' . $this->contentType());
            echo $output;
            exit;
        }
    }
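A concrete endpoint only has to supply the three abstract methods. The sketch below illustrates that shape with a simplified stand-in base class that returns the response instead of sending headers and exiting, so it runs outside WordPress. `SketchHandler`, `SkillHandler`, `respond()`, and the MIME type are hypothetical names chosen for illustration, not the plugin's actual classes.

```php
<?php
// Simplified stand-in for BaseHandler: same three abstract methods, but the
// response is returned so the example can run without WordPress.
abstract class SketchHandler {
    abstract protected function path(): string;
    abstract protected function contentType(): string;
    abstract protected function generate(): string;

    /** Returns null when the request is not for this endpoint or the hub is off. */
    public function respond(string $requestPath, array $settings): ?array {
        if ($requestPath !== $this->path()) return null;
        if (empty($settings['enable_discovery_hub'])) return null;
        return ['content_type' => $this->contentType(), 'body' => $this->generate()];
    }
}

// Hypothetical concrete endpoint; the body is illustrative only.
final class SkillHandler extends SketchHandler {
    protected function path(): string { return '/skill.md'; }
    protected function contentType(): string { return 'text/markdown; charset=utf-8'; }
    protected function generate(): string { return "# Site Skills\n"; }
}

$handler = new SkillHandler();
$hit  = $handler->respond('/skill.md', ['enable_discovery_hub' => 1]); // matched
$miss = $handler->respond('/about', ['enable_discovery_hub' => 1]);    // wrong path
$off  = $handler->respond('/skill.md', []);                            // hub disabled
```

Because each handler returns early on a path mismatch, all handlers can be attached to the same hook without interfering with one another.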
/llms.txt and /llms-full.txt
Implements the llmstxt.org proposal: a machine-readable site overview for large language models.
Structure:

    # Site Title
    > Site description and mission statement
    targeted_agents:
      - GPTBot
      - Claude-Web
      - anthropic-ai
    ## 📚 Research & Information Hub
    - [Post Title](url): AI-extracted snippet
    ## 🎯 Conversion & Action Center
    - [Page Title](url): AI-extracted snippet
    ## Optional
    - TL;DR Summary: /llms-tldr.txt
    - Knowledge Graph: /knowledge-graph.json
    - REST API: /wp-json/cybermaps/v1/discovery
    - MCP Server: /.well-known/mcp/server-card.json
    - JSON Feed: /feed.json
    ## YAML Sitemap
    /path:
      priority: 0.8
      updated: 2026-05-26
      type: informational
Content Map: Posts are split into intent-based silos (informational vs. transactional) using IntentEngine::calculate(). Configurable through:
- llms_included_types: which post types to include
- llms_exclude_ids: specific post IDs to exclude
- llms_filter_taxonomies: taxonomy term filtering
- _cybermaps_exclude_ai post meta: per-post exclusion checkbox
Dual output: llms.txt includes 20 posts. llms-full.txt includes 100 posts. Both share the same YAML frontmatter and sitemap block.
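The silo split and the dual post limit can be sketched as a single render function called with different limits (20 for llms.txt, 100 for llms-full.txt). This is a minimal illustration under stated assumptions: the function name, the post array shape, and the pre-computed `intent` field are hypothetical, and the real output also carries the frontmatter and sitemap block.

```php
<?php
/**
 * Render a minimal llms.txt body from already-classified posts.
 * Each post is ['title' => ..., 'url' => ..., 'snippet' => ..., 'intent' => ...];
 * this shape is an assumption for the sketch, not the plugin's data model.
 */
function sketch_render_llms_txt(string $title, string $description, array $posts, int $limit): string {
    $posts = array_slice($posts, 0, $limit);
    $silos = ['informational' => [], 'transactional' => []];
    foreach ($posts as $post) {
        // Anything not marked transactional falls into the informational silo.
        $intent = $post['intent'] === 'transactional' ? 'transactional' : 'informational';
        $silos[$intent][] = sprintf('- [%s](%s): %s', $post['title'], $post['url'], $post['snippet']);
    }
    $headings = [
        'informational' => '## 📚 Research & Information Hub',
        'transactional' => '## 🎯 Conversion & Action Center',
    ];
    $lines = ["# {$title}", "> {$description}", ''];
    foreach ($headings as $intent => $heading) {
        if ($silos[$intent] === []) continue; // omit empty silos
        $lines[] = $heading;
        array_push($lines, ...$silos[$intent]);
        $lines[] = '';
    }
    return implode("\n", $lines);
}

$posts = [
    ['title' => 'Guide', 'url' => 'https://example.com/guide', 'snippet' => 'How it works', 'intent' => 'informational'],
    ['title' => 'Pricing', 'url' => 'https://example.com/pricing', 'snippet' => 'Plans', 'intent' => 'transactional'],
    ['title' => 'Extra', 'url' => 'https://example.com/extra', 'snippet' => 'More', 'intent' => 'informational'],
];
$short = sketch_render_llms_txt('Site Title', 'Site description', $posts, 2);
$full  = sketch_render_llms_txt('Site Title', 'Site description', $posts, 100);
```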
/llms-tldr.txt
Single-paragraph summary with:
- Numeric quality scoring (calculate_semantic_score()): content length, heading count, media presence, outgoing links
- Topic clustering via shared taxonomy terms
- Duplicate detection: posts sharing ≥2 taxonomy terms are collapsed
- Quality filtering: excludes shortcode-only content, empty posts, and demo/placeholder text
- Hard 80K token cap: output is truncated with a message if it exceeds the budget
- AI-generated snippets preferred over raw excerpts
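The duplicate-detection and token-cap rules above can be sketched as two small functions. Both are illustrative: the function names, the input shape, the truncation message, and the rough estimate of four characters per token are assumptions, not the plugin's actual implementation.

```php
<?php
/**
 * Keep a post only if it shares fewer than $threshold taxonomy terms with
 * every post already kept. $posts maps post ID => list of term slugs
 * (a hypothetical shape chosen for this sketch).
 */
function sketch_collapse_duplicates(array $posts, int $threshold = 2): array {
    $kept = [];
    foreach ($posts as $id => $terms) {
        $terms = array_unique($terms);
        $isDuplicate = false;
        foreach ($kept as $keptTerms) {
            if (count(array_intersect($terms, $keptTerms)) >= $threshold) {
                $isDuplicate = true;
                break;
            }
        }
        if (!$isDuplicate) $kept[$id] = $terms;
    }
    return array_keys($kept);
}

/**
 * Truncate text to a token budget, assuming roughly 4 bytes per token.
 * Byte-based for simplicity; a multibyte-safe cut would need mb_substr().
 */
function sketch_apply_token_cap(string $text, int $maxTokens = 80000): string {
    $maxChars = $maxTokens * 4;
    if (strlen($text) <= $maxChars) return $text;
    return substr($text, 0, $maxChars) . "\n[Truncated: output exceeded the token budget]";
}

$uniqueIds = sketch_collapse_duplicates([
    1 => ['seo', 'ai', 'wp'],
    2 => ['seo', 'ai'],     // shares two terms with post 1, so it is collapsed
    3 => ['wp', 'news'],    // shares only one term with post 1, so it is kept
    4 => ['news'],
]);
$capped = sketch_apply_token_cap(str_repeat('a', 50), 10);
```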
/.well-known/ai.json
AI Discovery Protocol (ADP) manifest. Schema version 3.0:

    {
      "schema_version": "3.0",
      "ai_manifest": {
        "capabilities": ["content_discovery", "semantic_search", "structured_data"],
        "endpoints": {
          "llms_txt": "https://example.com/llms.txt",
          "knowledge_graph": "https://example.com/knowledge-graph.json",
          "search_api": "https://example.com/wp-json/cybermaps/v1/search"
        },
        "targeted_agents": ["GPTBot", "Claude-Web", "PerplexityBot"]
      }
    }
The targeted_agents list is derived from CrawlerRegistry::get_targeted_agents(): all bots with LLM access enabled, respecting per-bot override settings.
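One way to picture the override logic is a default flag per bot that a per-bot setting can flip in either direction. The sketch below assumes that model; the function name, both array shapes, and `ExampleBot` are hypothetical and do not reflect how CrawlerRegistry actually stores its data.

```php
<?php
/**
 * Resolve which bots to advertise. $registry maps bot name => default LLM
 * access flag; $overrides maps bot name => per-bot setting that wins over
 * the default. Both shapes are assumptions made for this sketch.
 */
function sketch_targeted_agents(array $registry, array $overrides): array {
    $agents = [];
    foreach ($registry as $bot => $defaultAllowed) {
        $allowed = array_key_exists($bot, $overrides)
            ? (bool) $overrides[$bot]
            : (bool) $defaultAllowed;
        if ($allowed) $agents[] = $bot;
    }
    return $agents;
}

$agents = sketch_targeted_agents(
    ['GPTBot' => true, 'Claude-Web' => true, 'PerplexityBot' => false, 'ExampleBot' => true],
    ['PerplexityBot' => true, 'ExampleBot' => false] // overrides flip both defaults
);
```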
/ai-sitemap.xml
AI-optimized XML sitemap with semantic metadata:

    <url>
      <loc>https://example.com/post-slug</loc>
      <ai:visual_weight>0.85</ai:visual_weight>
      <ai:info_gain>0.72</ai:info_gain>
      <ai:intent>informational</ai:intent>
    </url>
info_gain is a float (0.0-1.0) computed from content freshness, recency, and structural quality. visual_weight comes from media attachment analysis. intent comes from IntentEngine.
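As a rough illustration of how one entry might be serialized, the sketch below clamps both scores to the documented 0.0-1.0 range and escapes the text values. It assumes the scores are computed elsewhere; the function names are hypothetical, and the scoring formulas themselves are not shown because the source does not specify them.

```php
<?php
/** Clamp a score into the documented 0.0-1.0 range. */
function sketch_clamp_score(float $value): float {
    return max(0.0, min(1.0, $value));
}

/** Build one <url> entry from precomputed scores (hypothetical helper). */
function sketch_sitemap_entry(string $loc, float $visualWeight, float $infoGain, string $intent): string {
    // %.2F is the locale-independent float format, so the decimal point is always ".".
    return sprintf(
        "<url>\n  <loc>%s</loc>\n  <ai:visual_weight>%.2F</ai:visual_weight>\n"
        . "  <ai:info_gain>%.2F</ai:info_gain>\n  <ai:intent>%s</ai:intent>\n</url>",
        htmlspecialchars($loc, ENT_XML1 | ENT_QUOTES, 'UTF-8'),
        sketch_clamp_score($visualWeight),
        sketch_clamp_score($infoGain),
        htmlspecialchars($intent, ENT_XML1 | ENT_QUOTES, 'UTF-8')
    );
}

// An out-of-range info_gain of 1.4 is clamped to 1.00 in the output.
$entry = sketch_sitemap_entry('https://example.com/post-slug?a=1&b=2', 0.85, 1.4, 'informational');
```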
/knowledge-graph.json
Full schema.org graph for the entire site. Includes:
- Identity data from Identity Hub (organization/person, social profiles, contact points)
- Post type listings with ItemList schema
- Intent silos with CollectionPage schema
- Catalog references for ecommerce sites
Cached for 1 hour. Invalidated on save_post and settings changes.
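The caching behavior (a one-hour TTL plus explicit invalidation) can be sketched with a small in-memory class. In WordPress the storage would more plausibly be a transient or the object cache, and `invalidate()` would be wired to the save_post and settings-update hooks; the class below and its method names are hypothetical and exist only to show the lifecycle.

```php
<?php
/** In-memory stand-in for a TTL cache with explicit invalidation. */
final class SketchGraphCache {
    private ?string $value = null;
    private int $expiresAt = 0;
    private int $ttlSeconds;

    public function __construct(int $ttlSeconds = 3600) {
        $this->ttlSeconds = $ttlSeconds;
    }

    /** Return the cached value, rebuilding it when missing or expired. */
    public function get(callable $build, int $now): string {
        if ($this->value === null || $now >= $this->expiresAt) {
            $this->value = $build();
            $this->expiresAt = $now + $this->ttlSeconds;
        }
        return $this->value;
    }

    /** Would be called when a post is saved or settings change. */
    public function invalidate(): void {
        $this->value = null;
    }
}

$builds = 0;
$build = function () use (&$builds): string {
    $builds++;
    return '{"@context":"https://schema.org"}';
};

$cache = new SketchGraphCache();
$cache->get($build, 1000);        // first request builds the graph
$cache->get($build, 2000);        // served from cache
$cache->invalidate();             // e.g. a post was saved
$cache->get($build, 2100);        // rebuilt after invalidation
$cache->get($build, 2100 + 3600); // TTL elapsed, rebuilt again
```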
/feed.json
JSON Feed v1.1 for programmatic content access:

    {
      "version": "https://jsonfeed.org/version/1.1",
      "title": "Site Title",
      "home_page_url": "https://example.com",
      "feed_url": "https://example.com/feed.json",
      "items": [
        {
          "id": "https://example.com/post-slug",
          "url": "https://example.com/post-slug",
          "title": "Post Title",
          "content_text": "...",
          "date_published": "2026-05-26T17:17:00+00:00"
        }
      ]
    }
Cached for 15 minutes.
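A feed with this structure can be assembled from plain arrays and `json_encode()`. The sketch below is one possible way to do it, not the plugin's generator: the function name and the input post shape (`url`, `title`, `text`, `timestamp`) are assumptions made for illustration.

```php
<?php
/**
 * Build a JSON Feed 1.1 document from plain post arrays.
 * Each post is ['url' => ..., 'title' => ..., 'text' => ..., 'timestamp' => int];
 * that input shape is hypothetical.
 */
function sketch_json_feed(string $title, string $homeUrl, array $posts): string {
    $items = [];
    foreach ($posts as $post) {
        $items[] = [
            'id' => $post['url'],
            'url' => $post['url'],
            'title' => $post['title'],
            'content_text' => $post['text'],
            // DATE_ATOM with gmdate() yields e.g. 2026-05-26T17:17:00+00:00.
            'date_published' => gmdate(DATE_ATOM, $post['timestamp']),
        ];
    }
    $feed = [
        'version' => 'https://jsonfeed.org/version/1.1',
        'title' => $title,
        'home_page_url' => $homeUrl,
        'feed_url' => rtrim($homeUrl, '/') . '/feed.json',
        'items' => $items,
    ];
    return json_encode($feed, JSON_UNESCAPED_SLASHES | JSON_PRETTY_PRINT | JSON_THROW_ON_ERROR);
}

$json = sketch_json_feed('Site Title', 'https://example.com', [[
    'url' => 'https://example.com/post-slug',
    'title' => 'Post Title',
    'text' => '...',
    'timestamp' => gmmktime(17, 17, 0, 5, 26, 2026),
]]);
$decoded = json_decode($json, true);
```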
/.well-known/ai-plugin.json
OpenAI/ChatGPT plugin manifest. Compatible with ChatGPT, Copilot, and other plugin-capable AI platforms:

    {
      "schema_version": "v1",
      "name_for_model": "Site Name",
      "name_for_human": "Site Name Discovery",
      "description_for_model": "Access the knowledge graph, semantic search...",
      "api": {
        "type": "openapi",
        "url": "https://example.com/wp-json/cybermaps/v1/discovery"
      }
    }
/.well-known/mcp/server-card.json
Model Context Protocol (MCP) server card. Advertises tools and resources for MCP-compatible AI clients.
/skill.md
Machine-readable site skills and capabilities. Describes what the site offers and how AI agents can interact with it. Uses Markdown format for both human and machine readability.
Response Headers
All discovery endpoints include:
| Header | Purpose |
|---|---|
| Content-Type | Appropriate MIME type per endpoint |
| Cache-Control | max-age matching the endpoint’s cache TTL |
| ETag | SHA-256 hash for conditional requests |
| Last-Modified | Most recent post modification time (GMT) |
| X-Cybermaps-Token-Count | Estimated token count for LLM budget planning |
| Content-Digest | SHA-256 hash for integrity verification |
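
The header values in the table could be derived from the response body roughly as follows. This is a sketch under several assumptions: the function names are hypothetical, the token estimate uses a crude four-bytes-per-token heuristic, and the Content-Digest value follows the `sha-256=:<base64>:` form from RFC 9530, which the plugin's Integrity class may or may not use.

```php
<?php
/**
 * Compute header values for a response body (illustrative only).
 * $maxAge is the endpoint's cache TTL in seconds; $lastModified is a Unix timestamp.
 */
function sketch_integrity_headers(string $body, int $maxAge, int $lastModified): array {
    $hex = hash('sha256', $body);                          // hex digest for the ETag
    $digest = base64_encode(hash('sha256', $body, true));  // raw digest, base64-encoded
    return [
        'Cache-Control' => 'public, max-age=' . $maxAge,
        'ETag' => '"' . $hex . '"',
        'Last-Modified' => gmdate('D, d M Y H:i:s', $lastModified) . ' GMT',
        // Assumption: about 4 bytes per token, rounded up.
        'X-Cybermaps-Token-Count' => (string) (int) ceil(strlen($body) / 4),
        'Content-Digest' => 'sha-256=:' . $digest . ':',
    ];
}

/** True when the client's If-None-Match matches, so a 304 could be sent. */
function sketch_is_not_modified(?string $ifNoneMatch, string $etag): bool {
    return $ifNoneMatch !== null && trim($ifNoneMatch) === $etag;
}

$headers = sketch_integrity_headers('hello', 3600, 0);
$cachedHit = sketch_is_not_modified($headers['ETag'], $headers['ETag']);
```

A client that stores the ETag can send it back in If-None-Match and skip re-downloading unchanged content, while Content-Digest lets it verify the body it did download.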