Content Taxonomy and Schema for Turing Post Newsletter

· combray's blog


Date: 2025-12-01
Purpose: Define a consistent frontmatter schema for all newsletter content
Source: Analysis of ~130 emails in content/uncategorized/


Executive Summary #

The Turing Post newsletter contains several distinct content series, each identifiable by a pattern in its title. This document defines a frontmatter schema that can be applied consistently across all emails to enable filtering, categorization, and content discovery.


Content Series Identification #

Primary Series (Numbered Episodes) #

| Series | Title Pattern | Frontmatter Value | Example |
| --- | --- | --- | --- |
| Findings of the Day | FOD#N: | series: fod | "FOD#64: Golden Age for Indie Devs" |
| Topic Deep Dives | Topic N: | series: topic | "Topic 4: What is FSDP and YaFSDP?" |
| Agentic Series | 🦸🏻#N: or 🦸#N: | series: agentic | "🦸🏻#5: Building Blocks of Agentic Systems" |
| SF Insights | 🌁#N: | series: insights | "🌁#81: Key AI Concepts to Follow in 2025" |
| Concepts | Concepts: | series: concepts | "Concepts: RLHF, RLAIF, RLEF, RLCF" |
| Podcast/Interviews | 🎙️ | series: podcast | "🎙️🧩 TP/Inference: Sharon Zhou..." |

Non-Numbered Series #

| Series | Identification | Frontmatter Value |
| --- | --- | --- |
| Unicorn Profiles | Company deep-dive structure, "Unicorn Journey" in title | series: unicorn |
| Guest Posts | Guest Post: or Guest post: prefix | series: guestpost |
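
The title patterns in the two tables above can be sketched as an ordered list of regular expressions. This is a minimal sketch, not a definitive implementation: the names `SERIES_RULES` and `detect_series` are hypothetical, and it assumes each emoji prefix is a single code point (optionally followed by a skin-tone modifier) at the very start of the subject line. Unicorn profiles that are recognizable only by their structure, not by "Unicorn Journey" in the title, would still need manual review.

```python
import re
from typing import Optional, Tuple

# Ordered (pattern, series) rules: emoji prefixes first, then text prefixes.
# U+1F9B8 = superhero (optionally followed by a skin-tone modifier),
# U+1F301 = foggy, U+1F399 = studio microphone.
SERIES_RULES = [
    (re.compile("^\U0001F9B8[\U0001F3FB-\U0001F3FF]?#(\\d+):"), "agentic"),
    (re.compile("^\U0001F301#(\\d+):"), "insights"),
    (re.compile("^\U0001F399"), "podcast"),
    (re.compile(r"^FOD#(\d+):"), "fod"),
    (re.compile(r"^Topic (\d+):"), "topic"),
    (re.compile(r"^Concepts:"), "concepts"),
    (re.compile(r"^Guest [Pp]ost:"), "guestpost"),
    (re.compile(r"Unicorn Journey"), "unicorn"),
]


def detect_series(subject: str) -> Tuple[Optional[str], Optional[int]]:
    """Return (series, episode) for a subject line, or (None, None)."""
    for pattern, series in SERIES_RULES:
        match = pattern.search(subject)
        if match:
            # Only the numbered series define a capture group for the episode.
            episode = int(match.group(1)) if match.groups() else None
            return series, episode
    return None, None


print(detect_series("FOD#64: Golden Age for Indie Devs"))  # ('fod', 64)
```

Returning the episode together with the series keeps the two fields consistent, since both are read from the same match.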

Non-Series Content Types #

| Type | Identification | Frontmatter Value |
| --- | --- | --- |
| Curated Lists | Book lists, resource roundups, recaps | content_type: curated |
| Announcements | Welcome messages, holiday greetings | content_type: announcement |
| Sponsored Content | Webinars, ebook promotions, partner content | content_type: sponsored |
| Forwarded Emails | Fwd: prefix | content_type: forwarded |
| Surveys | Requests for reader input | content_type: survey |

Frontmatter Schema #

Required Fields #

---
title: "string"           # Cleaned title (remove redundant prefixes if needed)
date: YYYY-MM-DD          # Publication date
series: "string|null"     # One of: fod, topic, agentic, insights, concepts, podcast, unicorn, guestpost, or null
content_type: "string"    # One of: article, digest, explainer, interview, profile, curated, announcement, sponsored, forwarded, survey
---

Series-Specific Fields #

---
episode: integer          # Episode number when part of numbered series (e.g., 64 for FOD#64)
---

Categorization Fields #

---
primary_topic: "string"   # Main subject area (see Topic Categories below)
tags: ["string", ...]     # Specific technologies, techniques, or concepts
---

People and Organizations #

---
people_mentioned:         # Notable individuals referenced in the content
  - name: "string"        # Full name
    role: "string"        # Their role/title (optional)
    affiliation: "string" # Company/institution (optional)

companies_mentioned:      # Organizations referenced
  - "string"              # Company name

models_mentioned:         # AI models referenced
  - "string"              # Model name (e.g., "GPT-4", "Llama 3", "DeepSeek-V3")
---

Content Characteristics #

---
audience: "string"        # One of: technical, general, business, mixed
depth: "string"           # One of: overview, intermediate, deep-dive
has_research_papers: bool # Contains academic paper references
has_code_examples: bool   # Contains code snippets
is_premium: bool          # Premium subscriber content
---

Source Metadata #

---
source: "Turing Post"     # Always "Turing Post" for this newsletter
author: "string"          # Primary author (default: "Ksenia Se")
original_subject: "string" # Raw email subject line
original_file: "string"   # Original .eml filename
url: "string"             # Canonical URL (from footer: turingpost.com/p/...)
---

Field Value Reference #

primary_topic Values #

| Value | Description | Common Keywords |
| --- | --- | --- |
| llm | Language models, architectures, training | transformer, GPT, attention, tokenization |
| agents | AI agents, agentic workflows, multi-agent | agent, workflow, tool use, MCP, A2A, planning |
| infrastructure | ML ops, training infrastructure, optimization | FSDP, inference, GPU, scaling, deployment |
| research | Academic papers, benchmarks, novel methods | paper, benchmark, evaluation, SOTA |
| industry | Company news, funding, product launches | unicorn, funding, launch, valuation |
| business | Enterprise AI, productivity, use cases | enterprise, ROI, productivity, adoption |
| ethics | AI safety, policy, regulation, alignment | safety, alignment, regulation, bias, risk |
| hardware | Chips, robotics, physical AI | GPU, TPU, robot, embodied, chip |
| multimodal | Vision, audio, video AI | image, video, audio, vision, multimodal |
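
One way to use the keyword column above is as a rough first-pass suggestion for primary_topic. The sketch below is an assumption-laden illustration, not the method used to label the archive: `TOPIC_KEYWORDS` and `suggest_primary_topic` are hypothetical names, matching is whole-word with an optional plural "s", and ties go to whichever topic is listed first. Shared keywords such as GPU count toward both infrastructure and hardware, so the result is best treated as a hint for a human reviewer.

```python
import re
from typing import Optional

# Keywords copied from the table above, lowercased for matching.
TOPIC_KEYWORDS = {
    "llm": ["transformer", "gpt", "attention", "tokenization"],
    "agents": ["agent", "workflow", "tool use", "mcp", "a2a", "planning"],
    "infrastructure": ["fsdp", "inference", "gpu", "scaling", "deployment"],
    "research": ["paper", "benchmark", "evaluation", "sota"],
    "industry": ["unicorn", "funding", "launch", "valuation"],
    "business": ["enterprise", "roi", "productivity", "adoption"],
    "ethics": ["safety", "alignment", "regulation", "bias", "risk"],
    "hardware": ["gpu", "tpu", "robot", "embodied", "chip"],
    "multimodal": ["image", "video", "audio", "vision", "multimodal"],
}


def suggest_primary_topic(text: str) -> Optional[str]:
    """Return the topic with the most keyword hits, or None if there are none."""
    lowered = text.lower()
    scores = {}
    for topic, keywords in TOPIC_KEYWORDS.items():
        # Whole-word match, allowing a trailing plural "s" (e.g. "GPUs").
        scores[topic] = sum(
            len(re.findall(rf"\b{re.escape(kw)}s?\b", lowered)) for kw in keywords
        )
    best = max(scores, key=scores.get)  # first topic wins on a tie
    return best if scores[best] > 0 else None


print(suggest_primary_topic("What is FSDP? Scaling training across GPUs and deployment"))
```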

content_type Values #

| Value | Description |
| --- | --- |
| digest | Weekly roundup with multiple sections (typical for FOD) |
| explainer | Technical deep-dive on a specific concept (typical for Topic) |
| article | General article or essay |
| interview | Q&A or conversation format |
| profile | Company or person profile |
| curated | Lists, recommendations, roundups |
| announcement | Newsletter meta-content |
| sponsored | Partner/sponsored content |
| forwarded | Forwarded email (not original content) |
| survey | Reader survey or feedback request |

series Values #

| Value | Full Name |
| --- | --- |
| fod | Findings of the Day |
| topic | Topic Deep Dives |
| agentic | Agentic Series |
| insights | SF Insights |
| concepts | Concept Cards |
| podcast | TP/Inference Podcast |
| unicorn | Unicorn Series |
| guestpost | Guest Contributions |
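
The enumerated values above lend themselves to a simple validation pass over already-parsed frontmatter. The following is a minimal sketch, assuming the frontmatter has been loaded into a plain dict by whatever YAML parser the site uses; `ALLOWED`, `REQUIRED`, and `validate_frontmatter` are hypothetical names, and the check covers only the enumerated fields and the episode type, not dates or nested people entries.

```python
from typing import Any, Dict, List

# Allowed values, taken from the schema and reference tables in this document.
ALLOWED = {
    "series": ("fod", "topic", "agentic", "insights", "concepts",
               "podcast", "unicorn", "guestpost", None),
    "content_type": ("article", "digest", "explainer", "interview", "profile",
                     "curated", "announcement", "sponsored", "forwarded", "survey"),
    "primary_topic": ("llm", "agents", "infrastructure", "research", "industry",
                      "business", "ethics", "hardware", "multimodal"),
    "audience": ("technical", "general", "business", "mixed"),
    "depth": ("overview", "intermediate", "deep-dive"),
}
REQUIRED = ("title", "date", "series", "content_type")


def validate_frontmatter(meta: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    errors = []
    for field in REQUIRED:
        if field not in meta:
            errors.append(f"missing required field: {field}")
    for field, allowed in ALLOWED.items():
        if field in meta and meta[field] not in allowed:
            errors.append(f"invalid {field}: {meta[field]!r}")
    episode = meta.get("episode")
    # bool is a subclass of int in Python, so reject it explicitly.
    if episode is not None and (isinstance(episode, bool) or not isinstance(episode, int)):
        errors.append("episode must be an integer or null")
    return errors


print(validate_frontmatter({"title": "x", "date": "2024-08-26",
                            "series": "fod", "content_type": "digest", "episode": 64}))  # []
```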

Example Frontmatter #

FOD Digest Example #

---
title: "Golden Age for Indie Devs and Engineers"
date: 2024-08-26
series: fod
episode: 64
content_type: digest
primary_topic: industry
tags: ["cursor", "lerobot", "indie-hacking", "robotics", "fine-tuning"]
people_mentioned:
  - name: "Andrew Ng"
    affiliation: "Landing AI"
  - name: "Andrej Karpathy"
    affiliation: "OpenAI (former)"
companies_mentioned: ["Cursor", "Hugging Face", "OpenAI", "Anthropic", "Mistral", "Aleph Alpha"]
models_mentioned: ["GPT-4o", "Pharia-1-LLM", "Jamba-1.5"]
audience: mixed
depth: overview
has_research_papers: true
has_code_examples: false
is_premium: false
source: "Turing Post"
author: "Ksenia Se"
original_subject: "FOD#64: Golden Age for Indie Devs and Engineers"
original_file: "FOD#64_ Golden Age for Indie Devs and Engineers.eml"
url: "https://www.turingpost.com/p/fod64"
---

Topic Explainer Example #

---
title: "What is FSDP and YaFSDP?"
date: 2024-06-20
series: topic
episode: 4
content_type: explainer
primary_topic: infrastructure
tags: ["fsdp", "yafsdp", "distributed-training", "gpu-optimization", "pytorch"]
people_mentioned: []
companies_mentioned: ["Meta", "Yandex"]
models_mentioned: ["Llama 2", "Llama 3"]
audience: technical
depth: deep-dive
has_research_papers: true
has_code_examples: false
is_premium: true
source: "Turing Post"
author: "Ksenia Se"
original_subject: "Topic 4: What is FSDP and YaFSDP?"
original_file: "Topic 4_ What is FSDP and YaFSDP_.eml"
url: "https://www.turingpost.com/p/yafsdp"
---

Unicorn Profile Example #

---
title: "Glean: How to Outpace Competitors in Enterprise AI"
date: 2024-10-12
series: unicorn
episode: null
content_type: profile
primary_topic: business
tags: ["enterprise-search", "rag", "knowledge-management", "startup"]
people_mentioned:
  - name: "Arvind Jain"
    role: "CEO & Co-founder"
    affiliation: "Glean"
  - name: "Tony Gentilcore"
    role: "Co-founder"
    affiliation: "Glean"
  - name: "Bipul Sinha"
    affiliation: "Rubrik"
companies_mentioned: ["Glean", "OpenAI", "Google", "Rubrik", "Salesforce"]
models_mentioned: ["BERT"]
audience: business
depth: deep-dive
has_research_papers: false
has_code_examples: false
is_premium: true
source: "Turing Post"
author: "Ksenia Se"
original_subject: "Glean: How to Outpace Competitors in Enterprise AI (and Frustrate OpenAI in the Process)"
original_file: "Glean_ How to Outpace Competitors in Enterprise AI (and Frustrate OpenAI in the Process).eml"
url: "https://www.turingpost.com/p/glean"
---

Podcast Interview Example #

---
title: "Sharon Zhou on AI Hallucinations, Agents Hype, and Giving Developers the Keys to GenAI"
date: 2025-03-23
series: podcast
episode: 1
content_type: interview
primary_topic: llm
tags: ["hallucinations", "fine-tuning", "agents", "lamini", "enterprise-ai"]
people_mentioned:
  - name: "Sharon Zhou"
    role: "CEO & Co-founder"
    affiliation: "Lamini"
  - name: "Andrew Ng"
    affiliation: "Stanford / DeepLearning.AI"
  - name: "Andrej Karpathy"
    affiliation: "OpenAI (former)"
companies_mentioned: ["Lamini", "OpenAI", "Anthropic", "Colgate", "Meta"]
models_mentioned: ["Llama", "DeepSeek", "GPT-4"]
audience: mixed
depth: intermediate
has_research_papers: true
has_code_examples: false
is_premium: false
source: "Turing Post"
author: "Ksenia Se"
original_subject: "🎙️🧩 TP/Inference: Sharon Zhou on AI Hallucinations, Agents Hype, and Giving Developers the Keys to GenAI"
original_file: "🎙️🧩 TP_Inference_ Sharon Zhou on AI Hallucinations, Agents Hype, and Giving Developers the Keys to GenAI.eml"
url: "https://www.turingpost.com/p/sharonzhou"
---

Content to Exclude or Flag #

Should Be Excluded #

| Pattern | Reason | Example |
| --- | --- | --- |
| Fwd: prefix | Not original newsletter content | "Fwd: Invite: Adobe MAX in Miami" |
| Pure survey emails | Administrative, not content | "Calling AI/ML/Data Engineers – We Need Your Input" |

Should Be Flagged for Review #

| Pattern | Reason | Example |
| --- | --- | --- |
| Duplicate topics | Same content, different versions | "What is Mixture-of-Depths" vs "full version" |
| Holiday messages | Minimal content | "Happy 🩶🤍💜" |
| Single-link promotions | Primarily promotional | eBook announcements |
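
The exclude and flag rules above can be approximated once content_type has been assigned. This is a rough sketch under stated assumptions: `triage` is a hypothetical helper, holiday messages are assumed to carry content_type announcement and single-link promotions content_type sponsored (per the Non-Series Content Types table), and duplicate-topic detection is left out because it requires comparing emails against each other.

```python
def triage(subject: str, content_type: str) -> str:
    """Return "exclude", "review", or "keep" for a single email."""
    # Forwarded emails are not original newsletter content.
    if subject.strip().lower().startswith("fwd:"):
        return "exclude"
    if content_type in ("forwarded", "survey"):
        return "exclude"
    # Assumption: minimal-content and promotional emails fall under these types.
    if content_type in ("announcement", "sponsored"):
        return "review"
    return "keep"


print(triage("Fwd: Invite: Adobe MAX in Miami", "forwarded"))  # exclude
```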

Notable People (Frequently Mentioned) #

For reference, these individuals appear across multiple articles:

| Name | Typical Affiliation | Context |
| --- | --- | --- |
| Andrew Ng | Stanford / DeepLearning.AI | AI education, industry commentary |
| Sam Altman | OpenAI | Company announcements, strategy |
| Andrej Karpathy | Tesla / OpenAI (former) | Technical insights, education |
| Fei-Fei Li | Stanford / World Labs | AI research, policy |
| Yoshua Bengio | Mila | AI safety, research |
| Yann LeCun | Meta | AI research, debates |
| Demis Hassabis | DeepMind | Research announcements |
| Dario Amodei | Anthropic | AI safety, company strategy |
| Ilya Sutskever | SSI / OpenAI (former) | Research, departures |
| Harrison Chase | LangChain | Agents, frameworks |
| Nathan Lambert | Hugging Face / AI2 | RLHF, research commentary |

Implementation Notes #

  1. Series Detection Priority: Check for emoji prefixes first (🦸🏻, 🌁, 🎙️), then text patterns (FOD#, Topic, Concepts:, Guest Post:)

  2. Episode Number Extraction: Use a regular expression to capture the digits that follow the series identifier (e.g., 64 from FOD#64)

  3. URL Extraction: Parse the footer text for turingpost.com/p/ pattern

  4. People Extraction: Look for patterns like "Name, Role at Company" or "Name from Company" or quoted attributions

  5. Model Detection: Maintain a list of known model names (GPT-*, Claude, Llama, Mistral, DeepSeek, etc.)

  6. Company Detection: Look for known AI companies plus any mentioned in funding/acquisition context
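
Notes 3 and 5 above can be sketched with two small regular-expression helpers. Everything here is illustrative: `extract_url` and `detect_models` are hypothetical names, the slug character set is an assumption, the canonical form is modeled on the example URLs in this document, and the model patterns cover only the names listed in note 5. A bare "Mistral" match may refer to the company instead of a model, so results would need review.

```python
import re
from typing import List, Optional

# Assumption: slugs consist of letters, digits, hyphens, and underscores.
URL_RE = re.compile(r"(?:https?://)?(?:www\.)?turingpost\.com/p/([A-Za-z0-9_-]+)")

# Patterns for the model families named in note 5; extend as needed.
MODEL_PATTERNS = [
    r"GPT-\w+(?:\.\w+)*",
    r"Claude",
    r"Llama(?: \d+)?",
    r"Mistral",
    r"DeepSeek(?:-\w+)?",
]
MODEL_RE = re.compile(r"\b(?:" + "|".join(MODEL_PATTERNS) + r")")


def extract_url(body: str) -> Optional[str]:
    """Return the first turingpost.com/p/ link in canonical form, or None."""
    match = URL_RE.search(body)
    return f"https://www.turingpost.com/p/{match.group(1)}" if match else None


def detect_models(text: str) -> List[str]:
    """Return known model names in order of first appearance, without duplicates."""
    found: List[str] = []
    for match in MODEL_RE.finditer(text):
        if match.group(0) not in found:
            found.append(match.group(0))
    return found


print(extract_url("Read online: turingpost.com/p/fod64 | Unsubscribe"))
print(detect_models("GPT-4o beats Llama 3, while DeepSeek-V3 and GPT-4o tie."))
```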
