Semantic Sitemaps

Semantic Sitemaps & Information Gain

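Traditional XML sitemaps were designed for crawler-based search indexing: the Sitemaps protocol dates from 2005 and carries little more than a URL, a last-modified date, a change frequency, and a priority hint. CYBERMAPS introduces the Semantic Sitemap Extension, which adds high-density metadata intended for LLM training and RAG ingestion.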

AI Namespace Extensions

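We extend the standard Sitemap XML with elements in the ai: namespace (the prefix is declared with an xmlns:ai attribute on the root <urlset> element, as XML requires for any namespace prefix):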

  • ai:info_gain: A value from 0.0 to 1.0 indicating the “uniqueness” of the content. High info-gain pages are prioritized for training.
  • ai:visual_weight: Indicates how important the page's media is for multimodal models (such as GPT-4o and Gemini 1.5).
  • ai:intent: Classifies the page as informational, transactional, or navigational.
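
The constraints listed above can be checked before a value is ever written to the sitemap. The following is a minimal sketch, not the CYBERMAPS implementation: the AiMetadata class and ALLOWED_INTENTS set are hypothetical names introduced for illustration, and ai:visual_weight is left out because its value range is not specified here.

```python
from dataclasses import dataclass

# Intent labels taken from the ai:intent description above.
ALLOWED_INTENTS = {"informational", "transactional", "navigational"}


@dataclass(frozen=True)
class AiMetadata:
    """Hypothetical container for the ai: fields of one sitemap entry."""

    info_gain: float
    intent: str

    def __post_init__(self):
        # ai:info_gain is documented as a value from 0.0 to 1.0.
        if not 0.0 <= self.info_gain <= 1.0:
            raise ValueError(f"info_gain out of range: {self.info_gain}")
        # ai:intent must be one of the three documented classes.
        if self.intent not in ALLOWED_INTENTS:
            raise ValueError(f"unknown intent: {self.intent!r}")
```

Constructing AiMetadata(0.92, "informational") succeeds, while an out-of-range score or an unlisted intent raises ValueError, so malformed entries fail early rather than reaching the generated XML.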

Example Output

<url>
  <loc>https://example.com/deep-dive-post</loc>
  <lastmod>2026-05-27</lastmod>
  <ai:info_gain>0.92</ai:info_gain>
  <ai:intent>informational</ai:intent>
</url>
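
An entry like the one above could be generated with Python's standard xml.etree.ElementTree module. This is a sketch under stated assumptions: AI_NS is a placeholder URI (the real namespace URI for the ai: prefix is not given here), build_url_entry and build_sitemap are hypothetical helper names, and formatting the score to two decimals is an illustrative choice.

```python
import xml.etree.ElementTree as ET

# Namespace of the standard Sitemaps 0.9 schema.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
# ASSUMPTION: placeholder URI; substitute the extension's actual namespace.
AI_NS = "https://example.com/schemas/ai-sitemap"

# Serialize sitemap elements unprefixed and extension elements as ai:*.
ET.register_namespace("", SITEMAP_NS)
ET.register_namespace("ai", AI_NS)


def build_url_entry(loc, lastmod, info_gain, intent):
    """Build one <url> element carrying standard and ai: children."""
    url = ET.Element(f"{{{SITEMAP_NS}}}url")
    ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = loc
    ET.SubElement(url, f"{{{SITEMAP_NS}}}lastmod").text = lastmod
    ET.SubElement(url, f"{{{AI_NS}}}info_gain").text = f"{info_gain:.2f}"
    ET.SubElement(url, f"{{{AI_NS}}}intent").text = intent
    return url


def build_sitemap(entries):
    """Wrap entries (dicts of build_url_entry arguments) in a <urlset>."""
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for entry in entries:
        urlset.append(build_url_entry(**entry))
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    print(build_sitemap([{
        "loc": "https://example.com/deep-dive-post",
        "lastmod": "2026-05-27",
        "info_gain": 0.92,
        "intent": "informational",
    }]))
```

Building the tree with namespace-qualified tags lets the serializer emit the xmlns and xmlns:ai declarations on the root element, which keeps the output well-formed for namespace-aware parsers.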

Substance Scoring

Before a page is included in the AI sitemap, it must pass a Substance Audit. Pages with a low text-to-code ratio (little visible text relative to markup) or shortcode-heavy “thin content” are excluded automatically to preserve the quality of the discovery data.
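
One way such an audit might work is sketched below. It is a heuristic illustration only, not the actual Substance Audit: the thresholds min_ratio and max_shortcode_share, the bracket-style shortcode pattern, and the function names are all assumptions.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


# ASSUMPTION: shortcodes look like [name ...] or [/name], as in many CMSs.
SHORTCODE_RE = re.compile(r"\[/?[a-zA-Z][\w-]*[^\]]*\]")


def visible_text(html):
    """Return the page's visible text with whitespace collapsed."""
    extractor = _TextExtractor()
    extractor.feed(html)
    extractor.close()
    return " ".join("".join(extractor.chunks).split())


def passes_substance_audit(html, min_ratio=0.25, max_shortcode_share=0.5):
    """Hypothetical check: enough visible text, not dominated by shortcodes."""
    text = visible_text(html)
    if not text:
        return False
    ratio = len(text) / len(html)  # text-to-code ratio
    shortcode_chars = sum(len(m.group(0)) for m in SHORTCODE_RE.finditer(text))
    return ratio >= min_ratio and shortcode_chars / len(text) <= max_shortcode_share
```

A page consisting mostly of prose passes, while a page of nested empty containers, or one whose visible text is almost entirely unrendered shortcodes, is rejected. Real thresholds would need tuning against the site's own templates.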