# Content Taxonomy and Schema for Turing Post Newsletter

- Date: 2025-12-01
- Purpose: Define a consistent frontmatter schema for all newsletter content
- Source: Analysis of ~130 emails in `content/uncategorized/`

## Executive Summary

The Turing Post newsletter contains several distinct content series, each identifiable by a recurring pattern in its title. This document defines a frontmatter schema that can be applied consistently across all emails to support filtering, categorization, and content discovery.

## Content Series Identification

### Primary Series (Numbered Episodes)

| Series | Title Pattern | Frontmatter Value | Example |
|---|---|---|---|
| Findings of the Day | `FOD#N:` | `series: fod` | "FOD#64: Golden Age for Indie Devs" |
| Topic Deep Dives | `Topic N:` | `series: topic` | "Topic 4: What is FSDP and YaFSDP?" |
| Agentic Series | `🦸🏻#N:` or `🦸#N:` | `series: agentic` | "🦸🏻#5: Building Blocks of Agentic Systems" |
| SF Insights | `🌁#N:` | `series: insights` | "🌁#81: Key AI Concepts to Follow in 2025" |
| Concepts | `Concepts:` | `series: concepts` | "Concepts: RLHF, RLAIF, RLEF, RLCF" |
| Podcast/Interviews | `🎙️` | `series: podcast` | "🎙️🧩 TP/Inference: Sharon Zhou..." |
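A minimal sketch of title-based detection for these series, assuming each subject line starts directly with the prefix from the table (no leading "Re:" or "Fwd:"). `detect_series` and `SERIES_PATTERNS` are hypothetical names. Emoji prefixes are tried first, as the Implementation Notes recommend, and the patterns tolerate an optional skin-tone modifier or emoji variation selector. The `🎙️` and `Concepts:` patterns contain no number, so the sketch returns `None` as the episode for them.

```python
import re
from typing import Optional, Tuple

# Emoji prefixes are checked before text prefixes.
# U+1F9B8 is the superhero emoji; the optional skin-tone modifier (U+1F3FB..U+1F3FF)
# covers both "🦸🏻#N:" and "🦸#N:". U+FE0F is the emoji variation selector that
# may follow 🌁 (U+1F301) or 🎙 (U+1F399).
SERIES_PATTERNS = [
    ("agentic", re.compile(r"^\U0001F9B8[\U0001F3FB-\U0001F3FF]?\uFE0F?#(\d+):")),
    ("insights", re.compile(r"^\U0001F301\uFE0F?#(\d+):")),
    ("podcast", re.compile(r"^\U0001F399\uFE0F?")),
    ("fod", re.compile(r"^FOD#(\d+):")),
    ("topic", re.compile(r"^Topic (\d+):")),
    ("concepts", re.compile(r"^Concepts:")),
]


def detect_series(subject: str) -> Tuple[Optional[str], Optional[int]]:
    """Return (series, episode) for a raw subject line, or (None, None) if nothing matches."""
    title = subject.strip()
    for series, pattern in SERIES_PATTERNS:
        match = pattern.match(title)
        if match:
            # Podcast and Concepts patterns have no capture group, hence no episode number.
            episode = int(match.group(1)) if pattern.groups else None
            return series, episode
    return None, None
```

For example, `detect_series("FOD#64: Golden Age for Indie Devs and Engineers")` returns `("fod", 64)`, while a subject that matches no prefix returns `(None, None)` and falls through to the non-numbered checks.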

### Non-Numbered Series

| Series | Identification | Frontmatter Value |
|---|---|---|
| Unicorn Profiles | Company deep-dive structure, "Unicorn Journey" in title | `series: unicorn` |
| Guest Posts | `Guest Post:` or `Guest post:` prefix | `series: guestpost` |
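Title-only detection of these two series could look like the hypothetical helper below. It matches the guest-post prefix case-insensitively and looks for the phrase "Unicorn Journey". The Glean profile in the examples has no such phrase in its title, so a `None` result does not rule out `series: unicorn`; the deep-dive structure of the body still has to be checked by hand.

```python
import re
from typing import Optional

# Case-insensitive so that both "Guest Post:" and "Guest post:" match.
GUEST_POST_PREFIX = re.compile(r"^guest post:", re.IGNORECASE)


def detect_unnumbered_series(subject: str) -> Optional[str]:
    """Return "guestpost", "unicorn", or None based on the subject line alone."""
    title = subject.strip()
    if GUEST_POST_PREFIX.match(title):
        return "guestpost"
    if "unicorn journey" in title.lower():
        return "unicorn"
    return None
```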

### Non-Series Content Types

| Type | Identification | Frontmatter Value |
|---|---|---|
| Curated Lists | Book lists, resource roundups, recaps | `content_type: curated` |
| Announcements | Welcome messages, holiday greetings | `content_type: announcement` |
| Sponsored Content | Webinars, ebook promotions, partner content | `content_type: sponsored` |
| Forwarded Emails | `Fwd:` prefix | `content_type: forwarded` |
| Surveys | Requests for reader input | `content_type: survey` |

## Frontmatter Schema

### Required Fields

```yaml
---
title: "string"           # Cleaned title (series prefix such as "FOD#64:" removed)
date: YYYY-MM-DD          # Publication date
series: "string|null"     # One of: fod, topic, agentic, insights, concepts, podcast, unicorn, guestpost, or null
content_type: "string"    # One of: article, digest, explainer, interview, profile, curated, announcement, sponsored, forwarded, survey
---
```
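The required fields can be checked mechanically before an email is filed. This sketch assumes the frontmatter has already been parsed into a Python dict (for instance by a YAML loader); `validate_required` is a hypothetical helper, and the allowed values are copied from the comments in the block above.

```python
import re
from datetime import date
from typing import Any, Dict, List

# Allowed values copied from the schema comments.
SERIES_VALUES = {"fod", "topic", "agentic", "insights", "concepts",
                 "podcast", "unicorn", "guestpost"}
CONTENT_TYPES = {"article", "digest", "explainer", "interview", "profile",
                 "curated", "announcement", "sponsored", "forwarded", "survey"}
REQUIRED = ("title", "date", "series", "content_type")


def validate_required(fm: Dict[str, Any]) -> List[str]:
    """Return human-readable problems with the required fields (empty list = valid)."""
    errors = [f"missing field: {key}" for key in REQUIRED if key not in fm]

    title = fm.get("title")
    if "title" in fm and (not isinstance(title, str) or not title.strip()):
        errors.append("title must be a non-empty string")

    if "date" in fm:
        value = fm["date"]
        # YAML loaders usually return a datetime.date for 2024-08-26; accept it as-is.
        if not isinstance(value, date):
            text = str(value)
            try:
                if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", text):
                    raise ValueError(text)
                date.fromisoformat(text)  # rejects impossible dates such as 2024-13-01
            except ValueError:
                errors.append(f"date is not a valid YYYY-MM-DD value: {text!r}")

    # series may be null (None) for non-series content.
    series = fm.get("series")
    if "series" in fm and series is not None and (
            not isinstance(series, str) or series not in SERIES_VALUES):
        errors.append(f"unknown series: {series!r}")

    content_type = fm.get("content_type")
    if "content_type" in fm and (
            not isinstance(content_type, str) or content_type not in CONTENT_TYPES):
        errors.append(f"unknown content_type: {content_type!r}")

    return errors
```

Returning a list of messages rather than raising on the first problem lets a batch run over all ~130 emails report every issue at once.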

### Series-Specific Fields

```yaml
---
episode: integer|null     # Episode number for numbered series (e.g., 64 for FOD#64); null otherwise
---
```
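A companion check for `episode`, written as a hypothetical helper. It assumes that `fod`, `topic`, `agentic`, and `insights` are the series whose title patterns include a number (the `🎙️` and `Concepts:` patterns do not). An episode is still allowed on other series, since the podcast example carries `episode: 1`.

```python
from typing import Any, Dict, List

# Assumption: these are the series whose title patterns include a number.
NUMBERED_SERIES = {"fod", "topic", "agentic", "insights"}


def check_episode(fm: Dict[str, Any]) -> List[str]:
    """Return problems with the episode field (empty list = valid)."""
    episode = fm.get("episode")
    if episode is None:
        # A missing key and an explicit null are treated the same way.
        if fm.get("series") in NUMBERED_SERIES:
            return [f"series {fm['series']!r} expects an episode number"]
        return []
    # bool is a subclass of int in Python, so rule it out explicitly.
    if isinstance(episode, bool) or not isinstance(episode, int) or episode < 1:
        return [f"episode must be a positive integer, got {episode!r}"]
    return []
```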

### Categorization Fields

```yaml
---
primary_topic: "string"   # Main subject area (see `primary_topic` Values below)
tags: ["string", ...]     # Specific technologies, techniques, or concepts
---
```
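Every tag in the example frontmatter is lowercase and hyphen-separated (`indie-hacking`, `gpu-optimization`). Assuming that is the intended convention, a normalizer like this hypothetical one keeps tags consistent across emails. It keeps only ASCII letters and digits, which would need revisiting if non-English tags appear.

```python
import re
from typing import Iterable, List


def normalize_tag(raw: str) -> str:
    """Lowercase a tag and join its words with single hyphens."""
    # Any run of characters other than ASCII letters/digits becomes one hyphen.
    tag = re.sub(r"[^a-z0-9]+", "-", raw.strip().lower())
    return tag.strip("-")


def normalize_tags(raw_tags: Iterable[str]) -> List[str]:
    """Normalize each tag, dropping empties and duplicates while keeping order."""
    tags: List[str] = []
    for raw in raw_tags:
        tag = normalize_tag(raw)
        if tag and tag not in tags:
            tags.append(tag)
    return tags
```

For example, `normalize_tag("Fine Tuning")` returns `"fine-tuning"`.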

### People and Organizations

```yaml
---
people_mentioned:         # Notable individuals referenced in the content
  - name: "string"        # Full name
    role: "string"        # Their role/title (optional)
    affiliation: "string" # Company/institution (optional)

companies_mentioned:      # Organizations referenced
  - "string"              # Company name

models_mentioned:         # AI models referenced
  - "string"              # Model name (e.g., "GPT-4", "Llama 3", "DeepSeek-V3")
---
```

### Content Characteristics

```yaml
---
audience: "string"        # One of: technical, general, business, mixed
depth: "string"           # One of: overview, intermediate, deep-dive
has_research_papers: bool # Contains academic paper references
has_code_examples: bool   # Contains code snippets
is_premium: bool          # Premium subscriber content
---
```
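`has_research_papers` and `has_code_examples` could be pre-filled from the email's HTML body and then confirmed by hand. The heuristics below are assumptions, not part of the schema: a link to arXiv, OpenReview, or doi.org counts as a paper reference, and any `<pre>` or `<code>` element counts as a code example. `is_premium` is left out because nothing in this document says how premium issues are marked.

```python
import re
from typing import Dict

# Assumed heuristics for pre-filling the boolean flags.
PAPER_LINK = re.compile(
    r"https?://(?:www\.)?(?:arxiv\.org|openreview\.net|doi\.org)/", re.IGNORECASE)
CODE_ELEMENT = re.compile(r"<(?:pre|code)\b", re.IGNORECASE)


def detect_characteristics(html_body: str) -> Dict[str, bool]:
    """Pre-fill the two flags that can be read directly off the HTML body."""
    return {
        "has_research_papers": bool(PAPER_LINK.search(html_body)),
        "has_code_examples": bool(CODE_ELEMENT.search(html_body)),
    }
```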

### Source Metadata

```yaml
---
source: "Turing Post"      # Always "Turing Post" for this newsletter
author: "string"           # Primary author (default: "Ksenia Se")
original_subject: "string" # Raw email subject line
original_file: "string"    # Original .eml filename
url: "string"              # Canonical URL (from footer: turingpost.com/p/...)
---
```
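The four examples in this document suggest that `original_file` is the raw subject with `:`, `?`, and `/` each replaced by `_`, plus `.eml`. The hypothetical helper below encodes only that inferred rule, which is useful for cross-checking `original_subject` against `original_file`; how the exporter treats other characters that are unsafe in filenames is not known from these examples.

```python
# Inferred from the examples: ":", "?" and "/" each become "_".
_FILENAME_SUBSTITUTIONS = str.maketrans({":": "_", "?": "_", "/": "_"})


def expected_eml_filename(subject: str) -> str:
    """Rebuild the expected .eml filename from a raw subject line."""
    return subject.translate(_FILENAME_SUBSTITUTIONS) + ".eml"
```

For example, `expected_eml_filename("Topic 4: What is FSDP and YaFSDP?")` returns `"Topic 4_ What is FSDP and YaFSDP_.eml"`, matching the Topic example.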

## Field Value Reference

### `primary_topic` Values

| Value | Description | Common Keywords |
|---|---|---|
| `llm` | Language models, architectures, training | transformer, GPT, attention, tokenization |
| `agents` | AI agents, agentic workflows, multi-agent | agent, workflow, tool use, MCP, A2A, planning |
| `infrastructure` | ML ops, training infrastructure, optimization | FSDP, inference, GPU, scaling, deployment |
| `research` | Academic papers, benchmarks, novel methods | paper, benchmark, evaluation, SOTA |
| `industry` | Company news, funding, product launches | unicorn, funding, launch, valuation |
| `business` | Enterprise AI, productivity, use cases | enterprise, ROI, productivity, adoption |
| `ethics` | AI safety, policy, regulation, alignment | safety, alignment, regulation, bias, risk |
| `hardware` | Chips, robotics, physical AI | GPU, TPU, robot, embodied, chip |
| `multimodal` | Vision, audio, video AI | image, video, audio, vision, multimodal |
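As a first pass before manual review, `primary_topic` could be guessed by counting the Common Keywords from the table in the text. This hypothetical scorer matches keywords at the start of a word (so "agent" also counts "agents" and "agentic") and breaks ties in table order. Note that "gpu" appears under both `infrastructure` and `hardware`, so it adds to both scores.

```python
import re
from collections import Counter
from typing import Optional

# Keywords copied from the "Common Keywords" column, lowercased.
TOPIC_KEYWORDS = {
    "llm": ["transformer", "gpt", "attention", "tokenization"],
    "agents": ["agent", "workflow", "tool use", "mcp", "a2a", "planning"],
    "infrastructure": ["fsdp", "inference", "gpu", "scaling", "deployment"],
    "research": ["paper", "benchmark", "evaluation", "sota"],
    "industry": ["unicorn", "funding", "launch", "valuation"],
    "business": ["enterprise", "roi", "productivity", "adoption"],
    "ethics": ["safety", "alignment", "regulation", "bias", "risk"],
    "hardware": ["gpu", "tpu", "robot", "embodied", "chip"],
    "multimodal": ["image", "video", "audio", "vision", "multimodal"],
}


def guess_primary_topic(text: str) -> Optional[str]:
    """Return the topic with the most keyword hits, or None if nothing matches."""
    lowered = text.lower()
    scores: Counter = Counter()
    for topic, keywords in TOPIC_KEYWORDS.items():
        for keyword in keywords:
            # Leading word boundary only: "agent" matches "agents" but not "reagent".
            scores[topic] += len(re.findall(rf"\b{re.escape(keyword)}", lowered))
    if max(scores.values()) == 0:
        return None
    # most_common keeps insertion (table) order among equal counts.
    return scores.most_common(1)[0][0]
```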

### `content_type` Values

| Value | Description |
|---|---|
| `digest` | Weekly roundup with multiple sections (typical for FOD) |
| `explainer` | Technical deep-dive on a specific concept (typical for Topic) |
| `article` | General article or essay |
| `interview` | Q&A or conversation format |
| `profile` | Company or person profile |
| `curated` | Lists, recommendations, roundups |
| `announcement` | Newsletter meta-content |
| `sponsored` | Partner/sponsored content |
| `forwarded` | Forwarded email (not original content) |
| `survey` | Reader survey or feedback request |
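The table pairs `digest` with FOD and `explainer` with Topic, and the example frontmatter pairs `podcast` with `interview` and `unicorn` with `profile`. A hypothetical default lookup built from those pairings might look like this; falling back to `article` for every other series (and for null) is an assumption, and an editor should override the default whenever an issue does not fit.

```python
from typing import Optional

# Pairings taken from the table and the example frontmatter.
DEFAULT_CONTENT_TYPE = {
    "fod": "digest",
    "topic": "explainer",
    "podcast": "interview",
    "unicorn": "profile",
}


def default_content_type(series: Optional[str]) -> str:
    """Suggest a starting content_type; "article" for other series or null (an assumption)."""
    return DEFAULT_CONTENT_TYPE.get(series, "article")
```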

### `series` Values

| Value | Full Name |
|---|---|
| `fod` | Findings of the Day |
| `topic` | Topic Deep Dives |
| `agentic` | Agentic Series |
| `insights` | SF Insights |
| `concepts` | Concept Cards |
| `podcast` | TP/Inference Podcast |
| `unicorn` | Unicorn Series |
| `guestpost` | Guest Contributions |

## Example Frontmatter

### FOD Digest Example

```yaml
---
title: "Golden Age for Indie Devs and Engineers"
date: 2024-08-26
series: fod
episode: 64
content_type: digest
primary_topic: industry
tags: ["cursor", "lerobot", "indie-hacking", "robotics", "fine-tuning"]
people_mentioned:
  - name: "Andrew Ng"
    affiliation: "Landing AI"
  - name: "Andrej Karpathy"
    affiliation: "OpenAI (former)"
companies_mentioned: ["Cursor", "Hugging Face", "OpenAI", "Anthropic", "Mistral", "Aleph Alpha"]
models_mentioned: ["GPT-4o", "Pharia-1-LLM", "Jamba-1.5"]
audience: mixed
depth: overview
has_research_papers: true
has_code_examples: false
is_premium: false
source: "Turing Post"
author: "Ksenia Se"
original_subject: "FOD#64: Golden Age for Indie Devs and Engineers"
original_file: "FOD#64_ Golden Age for Indie Devs and Engineers.eml"
url: "https://www.turingpost.com/p/fod64"
---
```

### Topic Explainer Example

```yaml
---
title: "What is FSDP and YaFSDP?"
date: 2024-06-20
series: topic
episode: 4
content_type: explainer
primary_topic: infrastructure
tags: ["fsdp", "yafsdp", "distributed-training", "gpu-optimization", "pytorch"]
people_mentioned: []
companies_mentioned: ["Meta", "Yandex"]
models_mentioned: ["Llama 2", "Llama 3"]
audience: technical
depth: deep-dive
has_research_papers: true
has_code_examples: false
is_premium: true
source: "Turing Post"
author: "Ksenia Se"
original_subject: "Topic 4: What is FSDP and YaFSDP?"
original_file: "Topic 4_ What is FSDP and YaFSDP_.eml"
url: "https://www.turingpost.com/p/yafsdp"
---
```

### Unicorn Profile Example

```yaml
---
title: "Glean: How to Outpace Competitors in Enterprise AI"
date: 2024-10-12
series: unicorn
episode: null
content_type: profile
primary_topic: business
tags: ["enterprise-search", "rag", "knowledge-management", "startup"]
people_mentioned:
  - name: "Arvind Jain"
    role: "CEO & Co-founder"
    affiliation: "Glean"
  - name: "Tony Gentilcore"
    role: "Co-founder"
    affiliation: "Glean"
  - name: "Bipul Sinha"
    affiliation: "Rubrik"
companies_mentioned: ["Glean", "OpenAI", "Google", "Rubrik", "Salesforce"]
models_mentioned: ["BERT"]
audience: business
depth: deep-dive
has_research_papers: false
has_code_examples: false
is_premium: true
source: "Turing Post"
author: "Ksenia Se"
original_subject: "Glean: How to Outpace Competitors in Enterprise AI (and Frustrate OpenAI in the Process)"
original_file: "Glean_ How to Outpace Competitors in Enterprise AI (and Frustrate OpenAI in the Process).eml"
url: "https://www.turingpost.com/p/glean"
---
```

### Podcast Interview Example

```yaml
---
title: "Sharon Zhou on AI Hallucinations, Agents Hype, and Giving Developers the Keys to GenAI"
date: 2025-03-23
series: podcast
episode: 1
content_type: interview
primary_topic: llm
tags: ["hallucinations", "fine-tuning", "agents", "lamini", "enterprise-ai"]
people_mentioned:
  - name: "Sharon Zhou"
    role: "CEO & Co-founder"
    affiliation: "Lamini"
  - name: "Andrew Ng"
    affiliation: "Stanford / DeepLearning.AI"
  - name: "Andrej Karpathy"
    affiliation: "OpenAI (former)"
companies_mentioned: ["Lamini", "OpenAI", "Anthropic", "Colgate", "Meta"]
models_mentioned: ["Llama", "DeepSeek", "GPT-4"]
audience: mixed
depth: intermediate
has_research_papers: true
has_code_examples: false
is_premium: false
source: "Turing Post"
author: "Ksenia Se"
original_subject: "🎙️🧩 TP/Inference: Sharon Zhou on AI Hallucinations, Agents Hype, and Giving Developers the Keys to GenAI"
original_file: "🎙️🧩 TP_Inference_ Sharon Zhou on AI Hallucinations, Agents Hype, and Giving Developers the Keys to GenAI.eml"
url: "https://www.turingpost.com/p/sharonzhou"
---
```

## Content to Exclude or Flag

### Should Be Excluded

| Pattern | Reason | Example |
|---|---|---|
| `Fwd:` prefix | Not original newsletter content | "Fwd: Invite: Adobe MAX in Miami" |
| Pure survey emails | Administrative, not content | "Calling AI/ML/Data Engineers – We Need Your Input" |
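A hypothetical subject-line filter for these two exclusion rules. The forwarded check mirrors the `Fwd:` pattern (also accepting `Fw:` in any case). Survey detection relies on a single assumed phrase taken from the example, because a bare keyword such as "survey" would also catch articles about research survey papers; whether an email is a *pure* survey still needs a human look at the body.

```python
import re

# "Fwd:" prefix, also accepting "Fw:", in any case.
FORWARDED = re.compile(r"^\s*fwd?:", re.IGNORECASE)
# Assumed survey phrase taken from the example subject; extend after review.
SURVEY_PHRASES = ("we need your input",)


def should_exclude(subject: str) -> bool:
    """True if the subject line matches one of the exclusion rules."""
    if FORWARDED.match(subject):
        return True
    lowered = subject.lower()
    return any(phrase in lowered for phrase in SURVEY_PHRASES)
```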

### Should Be Flagged for Review

| Pattern | Reason | Example |
|---|---|---|
| Duplicate topics | Same content, different versions | "What is Mixture-of-Depths" vs. its "full version" |
| Holiday messages | Minimal content | "Happy 🩶🤍💜" |
| Single-link promotions | Primarily promotional | eBook announcements |
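Duplicates are flagged rather than deleted, so grouping candidate subjects is enough. This sketch assumes the variants differ only in a "full version" marker, punctuation, or letter case; `normalize_title` and `find_duplicate_groups` are hypothetical names, and real variant titles may need extra rules.

```python
import re
from collections import defaultdict
from typing import Dict, List


def normalize_title(subject: str) -> str:
    """Lowercase, drop a "(full version)" marker and punctuation, collapse spaces."""
    title = subject.lower()
    title = re.sub(r"\(?\bfull version\b\)?", " ", title)
    title = re.sub(r"[^\w\s]", " ", title)
    return " ".join(title.split())


def find_duplicate_groups(subjects: List[str]) -> List[List[str]]:
    """Group subjects that normalize to the same title; return only groups of 2+."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for subject in subjects:
        groups[normalize_title(subject)].append(subject)
    return [group for group in groups.values() if len(group) > 1]
```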

## Notable People (Frequently Mentioned)

For reference, these individuals appear across multiple articles:

| Name | Typical Affiliation | Context |
|---|---|---|
| Andrew Ng | Stanford / DeepLearning.AI | AI education, industry commentary |
| Sam Altman | OpenAI | Company announcements, strategy |
| Andrej Karpathy | Tesla / OpenAI (former) | Technical insights, education |
| Fei-Fei Li | Stanford / World Labs | AI research, policy |
| Yoshua Bengio | Mila | AI safety, research |
| Yann LeCun | Meta | AI research, debates |
| Demis Hassabis | Google DeepMind | Research announcements |
| Dario Amodei | Anthropic | AI safety, company strategy |
| Ilya Sutskever | SSI / OpenAI (former) | Research, departures |
| Harrison Chase | LangChain | Agents, frameworks |
| Nathan Lambert | AI2 / Hugging Face (former) | RLHF, research commentary |

## Implementation Notes

1. Series Detection Priority: Check emoji prefixes first (🦸🏻 or 🦸, 🌁, 🎙️), then text prefixes (`FOD#`, `Topic N:`, `Concepts:`, `Guest Post:`/`Guest post:`). Emoji may carry a skin-tone modifier or a trailing variation selector, so match them leniently.
2. Episode Number Extraction: Use a regex to capture the integer that follows the series identifier (e.g., 64 from `FOD#64:`, 4 from `Topic 4:`).
3. URL Extraction: Parse the footer text for the `turingpost.com/p/` pattern.
4. People Extraction: Look for patterns such as "Name, Role at Company", "Name from Company", or quoted attributions.
5. Model Detection: Maintain a list of known model names (GPT-*, Claude, Llama, Mistral, DeepSeek, etc.).
6. Company Detection: Look for known AI companies, plus any company mentioned in a funding or acquisition context.
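The URL Extraction and Model Detection notes can be sketched as follows. The URL pattern assumes post links take the form `turingpost.com/p/<slug>`, with or without a scheme and `www.`, and rebuilds them in the `https://www.turingpost.com/p/...` form used in the examples; pass only the footer text, since the body may link to other posts. `MODEL_NAME` is a hypothetical starter pattern for the families named in the Model Detection note. Names such as "Mistral" are also company names, so matches still need review before they go into `models_mentioned`.

```python
import re
from typing import List, Optional

# The slug stops at the first character outside [A-Za-z0-9_-], so tracking
# query strings such as ?utm_source=... are dropped.
POST_URL = re.compile(r"(?:https?://)?(?:www\.)?turingpost\.com/p/([A-Za-z0-9_-]+)")

# Hypothetical starter pattern: "GPT-<version>" plus a few model families,
# optionally followed by a version such as "3", "3.5" or "-V3".
MODEL_NAME = re.compile(
    r"\bGPT-\d+(?:\.\d+)?[a-z]*"
    r"|\b(?:Claude|Llama|Mistral|DeepSeek)(?:[- ][A-Z]?\d+(?:\.\d+)?)?"
)


def extract_post_url(footer: str) -> Optional[str]:
    """Return the first Turing Post URL in the footer, normalized to https://www."""
    match = POST_URL.search(footer)
    return f"https://www.turingpost.com/p/{match.group(1)}" if match else None


def find_models(text: str) -> List[str]:
    """Return distinct model-name matches in order of first appearance."""
    seen: List[str] = []
    for match in MODEL_NAME.finditer(text):
        if match.group(0) not in seen:
            seen.append(match.group(0))
    return seen
```

For example, `find_models("GPT-4o beats Llama 3 and DeepSeek-V3; GPT-4o again.")` returns `["GPT-4o", "Llama 3", "DeepSeek-V3"]`. A maintained list of exact names would catch models such as "Pharia-1-LLM" that a family pattern misses.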

last updated: 2025-12-01