Date: 2025-12-01
Purpose: Define a consistent frontmatter schema for all newsletter content
Source: Analysis of ~130 emails in content/uncategorized/
Executive Summary #
The Turing Post newsletter contains several distinct content series, each with identifiable patterns in titles. This document defines a frontmatter schema that can be applied consistently across all emails to enable filtering, categorization, and content discovery.
Content Series Identification #
Primary Series (Numbered Episodes) #
| Series | Title Pattern | Frontmatter Value | Example |
|---|---|---|---|
| Findings of the Day | `FOD#N:` | `series: fod` | "FOD#64: Golden Age for Indie Devs" |
| Topic Deep Dives | `Topic N:` | `series: topic` | "Topic 4: What is FSDP and YaFSDP?" |
| Agentic Series | `🦸🏻#N:` or `🦸#N:` | `series: agentic` | "🦸🏻#5: Building Blocks of Agentic Systems" |
| SF Insights | `🌁#N:` | `series: insights` | "🌁#81: Key AI Concepts to Follow in 2025" |
| Concepts | `Concepts:` | `series: concepts` | "Concepts: RLHF, RLAIF, RLEF, RLCF" |
| Podcast/Interviews | `🎙️` | `series: podcast` | "🎙️🧩 TP/Inference: Sharon Zhou..." |
Non-Numbered Series #
| Series | Identification | Frontmatter Value |
|---|---|---|
| Unicorn Profiles | Company deep-dive structure, "Unicorn Journey" in title | `series: unicorn` |
| Guest Posts | `Guest Post:` or `Guest post:` prefix | `series: guestpost` |
Non-Series Content Types #
| Type | Identification | Frontmatter Value |
|---|---|---|
| Curated Lists | Book lists, resource roundups, recaps | `content_type: curated` |
| Announcements | Welcome messages, holiday greetings | `content_type: announcement` |
| Sponsored Content | Webinars, ebook promotions, partner content | `content_type: sponsored` |
| Forwarded Emails | `Fwd:` prefix | `content_type: forwarded` |
| Surveys | Requests for reader input | `content_type: survey` |
Frontmatter Schema #
Required Fields #
```yaml
---
title: "string"         # Cleaned title (remove redundant prefixes if needed)
date: YYYY-MM-DD        # Publication date
series: "string|null"   # One of: fod, topic, agentic, insights, concepts, podcast, unicorn, guestpost, or null
content_type: "string"  # One of: article, digest, explainer, interview, profile, curated, announcement, sponsored, forwarded, survey
---
```
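A minimal validation sketch for the required fields, assuming the frontmatter has already been parsed into a Python dict (the helper name and error format are illustrative):

```python
# Allowed values copied from the schema above; the validator itself is a
# sketch, not part of the specification.
ALLOWED_SERIES = {"fod", "topic", "agentic", "insights", "concepts",
                  "podcast", "unicorn", "guestpost", None}
ALLOWED_CONTENT_TYPES = {"article", "digest", "explainer", "interview", "profile",
                         "curated", "announcement", "sponsored", "forwarded", "survey"}

def validate_required(fm: dict) -> list[str]:
    """Return a list of problems found in the required frontmatter fields."""
    problems = [f"missing field: {f}"
                for f in ("title", "date", "series", "content_type") if f not in fm]
    if fm.get("series") not in ALLOWED_SERIES:
        problems.append(f"unknown series: {fm.get('series')!r}")
    if fm.get("content_type") not in ALLOWED_CONTENT_TYPES:
        problems.append(f"unknown content_type: {fm.get('content_type')!r}")
    return problems
```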
Series-Specific Fields #
```yaml
---
episode: integer  # Episode number when part of a numbered series (e.g., 64 for FOD#64)
---
```
Categorization Fields #
```yaml
---
primary_topic: "string"  # Main subject area (see primary_topic Values below)
tags: ["string", ...]    # Specific technologies, techniques, or concepts
---
```
People and Organizations #
```yaml
---
people_mentioned:            # Notable individuals referenced in the content
  - name: "string"           # Full name
    role: "string"           # Their role/title (optional)
    affiliation: "string"    # Company/institution (optional)

companies_mentioned:         # Organizations referenced
  - "string"                 # Company name

models_mentioned:            # AI models referenced
  - "string"                 # Model name (e.g., "GPT-4", "Llama 3", "DeepSeek-V3")
---
```
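For readers mapping this schema into code, one possible typed representation of these nested fields is sketched below; the class names are illustrative and not part of the schema:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Person:
    name: str                          # Full name (required)
    role: Optional[str] = None         # Role/title (optional in the schema)
    affiliation: Optional[str] = None  # Company/institution (optional)

@dataclass
class Mentions:
    people_mentioned: list[Person] = field(default_factory=list)
    companies_mentioned: list[str] = field(default_factory=list)
    models_mentioned: list[str] = field(default_factory=list)
```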
Content Characteristics #
```yaml
---
audience: "string"          # One of: technical, general, business, mixed
depth: "string"             # One of: overview, intermediate, deep-dive
has_research_papers: bool   # Contains academic paper references
has_code_examples: bool     # Contains code snippets
is_premium: bool            # Premium subscriber content
---
```
Source Metadata #
```yaml
---
source: "Turing Post"       # Always "Turing Post" for this newsletter
author: "string"            # Primary author (default: "Ksenia Se")
original_subject: "string"  # Raw email subject line
original_file: "string"     # Original .eml filename
url: "string"               # Canonical URL (from footer: turingpost.com/p/...)
---
```
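Taken together, a metadata dict covering these fields can be serialized back into a frontmatter block. The sketch below uses PyYAML and is one possible approach, not a prescribed implementation:

```python
import yaml  # PyYAML

def render_frontmatter(fields: dict) -> str:
    """Serialize a metadata dict as a YAML frontmatter block."""
    body = yaml.safe_dump(fields, sort_keys=False, allow_unicode=True)
    return f"---\n{body}---\n"

print(render_frontmatter({
    "title": "What is FSDP and YaFSDP?",
    "date": "2024-06-20",
    "series": "topic",
    "episode": 4,
    "content_type": "explainer",
    "source": "Turing Post",
}))
```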
Field Value Reference #
primary_topic Values #
| Value | Description | Common Keywords |
|---|---|---|
| `llm` | Language models, architectures, training | transformer, GPT, attention, tokenization |
| `agents` | AI agents, agentic workflows, multi-agent | agent, workflow, tool use, MCP, A2A, planning |
| `infrastructure` | ML ops, training infrastructure, optimization | FSDP, inference, GPU, scaling, deployment |
| `research` | Academic papers, benchmarks, novel methods | paper, benchmark, evaluation, SOTA |
| `industry` | Company news, funding, product launches | unicorn, funding, launch, valuation |
| `business` | Enterprise AI, productivity, use cases | enterprise, ROI, productivity, adoption |
| `ethics` | AI safety, policy, regulation, alignment | safety, alignment, regulation, bias, risk |
| `hardware` | Chips, robotics, physical AI | GPU, TPU, robot, embodied, chip |
| `multimodal` | Vision, audio, video AI | image, video, audio, vision, multimodal |
content_type Values #
| Value | Description |
|---|---|
| `digest` | Weekly roundup with multiple sections (typical for FOD) |
| `explainer` | Technical deep-dive on a specific concept (typical for Topic) |
| `article` | General article or essay |
| `interview` | Q&A or conversation format |
| `profile` | Company or person profile |
| `curated` | Lists, recommendations, roundups |
| `announcement` | Newsletter meta-content |
| `sponsored` | Partner/sponsored content |
| `forwarded` | Forwarded email (not original content) |
| `survey` | Reader survey or feedback request |
series Values #
| Value | Full Name |
|---|---|
| `fod` | Findings of the Day |
| `topic` | Topic Deep Dives |
| `agentic` | Agentic Series |
| `insights` | SF Insights |
| `concepts` | Concept Cards |
| `podcast` | TP/Inference Podcast |
| `unicorn` | Unicorn Series |
| `guestpost` | Guest Contributions |
Example Frontmatter #
FOD Digest Example #
```yaml
---
title: "Golden Age for Indie Devs and Engineers"
date: 2024-08-26
series: fod
episode: 64
content_type: digest
primary_topic: industry
tags: ["cursor", "lerobot", "indie-hacking", "robotics", "fine-tuning"]
people_mentioned:
  - name: "Andrew Ng"
    affiliation: "Landing AI"
  - name: "Andrej Karpathy"
    affiliation: "OpenAI (former)"
companies_mentioned: ["Cursor", "Hugging Face", "OpenAI", "Anthropic", "Mistral", "Aleph Alpha"]
models_mentioned: ["GPT-4o", "Pharia-1-LLM", "Jamba-1.5"]
audience: mixed
depth: overview
has_research_papers: true
has_code_examples: false
is_premium: false
source: "Turing Post"
author: "Ksenia Se"
original_subject: "FOD#64: Golden Age for Indie Devs and Engineers"
original_file: "FOD#64_ Golden Age for Indie Devs and Engineers.eml"
url: "https://www.turingpost.com/p/fod64"
---
```
Topic Explainer Example #
```yaml
---
title: "What is FSDP and YaFSDP?"
date: 2024-06-20
series: topic
episode: 4
content_type: explainer
primary_topic: infrastructure
tags: ["fsdp", "yafsdp", "distributed-training", "gpu-optimization", "pytorch"]
people_mentioned: []
companies_mentioned: ["Meta", "Yandex"]
models_mentioned: ["Llama 2", "Llama 3"]
audience: technical
depth: deep-dive
has_research_papers: true
has_code_examples: false
is_premium: true
source: "Turing Post"
author: "Ksenia Se"
original_subject: "Topic 4: What is FSDP and YaFSDP?"
original_file: "Topic 4_ What is FSDP and YaFSDP_.eml"
url: "https://www.turingpost.com/p/yafsdp"
---
```
Unicorn Profile Example #
```yaml
---
title: "Glean: How to Outpace Competitors in Enterprise AI"
date: 2024-10-12
series: unicorn
episode: null
content_type: profile
primary_topic: business
tags: ["enterprise-search", "rag", "knowledge-management", "startup"]
people_mentioned:
  - name: "Arvind Jain"
    role: "CEO & Co-founder"
    affiliation: "Glean"
  - name: "Tony Gentilcore"
    role: "Co-founder"
    affiliation: "Glean"
  - name: "Bipul Sinha"
    affiliation: "Rubrik"
companies_mentioned: ["Glean", "OpenAI", "Google", "Rubrik", "Salesforce"]
models_mentioned: ["BERT"]
audience: business
depth: deep-dive
has_research_papers: false
has_code_examples: false
is_premium: true
source: "Turing Post"
author: "Ksenia Se"
original_subject: "Glean: How to Outpace Competitors in Enterprise AI (and Frustrate OpenAI in the Process)"
original_file: "Glean_ How to Outpace Competitors in Enterprise AI (and Frustrate OpenAI in the Process).eml"
url: "https://www.turingpost.com/p/glean"
---
```
Podcast Interview Example #
```yaml
---
title: "Sharon Zhou on AI Hallucinations, Agents Hype, and Giving Developers the Keys to GenAI"
date: 2025-03-23
series: podcast
episode: 1
content_type: interview
primary_topic: llm
tags: ["hallucinations", "fine-tuning", "agents", "lamini", "enterprise-ai"]
people_mentioned:
  - name: "Sharon Zhou"
    role: "CEO & Co-founder"
    affiliation: "Lamini"
  - name: "Andrew Ng"
    affiliation: "Stanford / DeepLearning.AI"
  - name: "Andrej Karpathy"
    affiliation: "OpenAI (former)"
companies_mentioned: ["Lamini", "OpenAI", "Anthropic", "Colgate", "Meta"]
models_mentioned: ["Llama", "DeepSeek", "GPT-4"]
audience: mixed
depth: intermediate
has_research_papers: true
has_code_examples: false
is_premium: false
source: "Turing Post"
author: "Ksenia Se"
original_subject: "🎙️🧩 TP/Inference: Sharon Zhou on AI Hallucinations, Agents Hype, and Giving Developers the Keys to GenAI"
original_file: "🎙️🧩 TP_Inference_ Sharon Zhou on AI Hallucinations, Agents Hype, and Giving Developers the Keys to GenAI.eml"
url: "https://www.turingpost.com/p/sharonzhou"
---
```
Content to Exclude or Flag #
Should Be Excluded #
| Pattern | Reason | Example |
|---|---|---|
| `Fwd:` prefix | Not original newsletter content | "Fwd: Invite: Adobe MAX in Miami" |
| Pure survey emails | Administrative, not content | "Calling AI/ML/Data Engineers – We Need Your Input" |
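A hedged sketch of the exclusion check, assuming classification runs on the raw subject line; the survey-detection phrases are guesses based on the example above, not a fixed list:

```python
def should_exclude(subject: str) -> bool:
    """Exclude forwarded emails and pure survey requests from processing."""
    if subject.startswith("Fwd:"):
        return True
    survey_phrases = ("we need your input", "reader survey", "quick survey")
    return any(phrase in subject.lower() for phrase in survey_phrases)
```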
Should Be Flagged for Review #
| Pattern | Reason | Example |
|---|---|---|
| Duplicate topics | Same content, different versions | "What is Mixture-of-Depths" vs "full version" |
| Holiday messages | Minimal content | "Happy ..." emoji-only greetings |
| Single-link promotions | Primarily promotional | eBook announcements |
Notable People (Frequently Mentioned) #
For reference, these individuals appear across multiple articles:
| Name | Typical Affiliation | Context |
|---|---|---|
| Andrew Ng | Stanford / DeepLearning.AI | AI education, industry commentary |
| Sam Altman | OpenAI | Company announcements, strategy |
| Andrej Karpathy | Tesla / OpenAI (former) | Technical insights, education |
| Fei-Fei Li | Stanford / World Labs | AI research, policy |
| Yoshua Bengio | Mila | AI safety, research |
| Yann LeCun | Meta | AI research, debates |
| Demis Hassabis | DeepMind | Research announcements |
| Dario Amodei | Anthropic | AI safety, company strategy |
| Ilya Sutskever | SSI / OpenAI (former) | Research, departures |
| Harrison Chase | LangChain | Agents, frameworks |
| Nathan Lambert | Hugging Face / AI2 | RLHF, research commentary |
Implementation Notes #
- Series Detection Priority: Check for emoji prefixes first (🦸🏻, 🌁, 🎙️), then text patterns (`FOD#`, `Topic`, `Concepts:`, `Guest Post:`)
- Episode Number Extraction: Use regex to extract numbers after series identifiers
- URL Extraction: Parse the footer text for the `turingpost.com/p/` pattern (see the sketch after this list)
- People Extraction: Look for patterns like "Name, Role at Company", "Name from Company", or quoted attributions
- Model Detection: Maintain a list of known model names (GPT-*, Claude, Llama, Mistral, DeepSeek, etc.)
- Company Detection: Look for known AI companies plus any mentioned in funding/acquisition context
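For the URL extraction step, a minimal regex over the footer text might look like the following; the exact footer format is an assumption, so the pattern may need adjusting:

```python
import re

# Matches canonical post URLs of the form turingpost.com/p/<slug>,
# with or without scheme and "www."; the footer format is assumed.
URL_PATTERN = re.compile(r"(?:https?://)?(?:www\.)?turingpost\.com/p/[\w-]+")

def extract_canonical_url(footer_text: str) -> str | None:
    match = URL_PATTERN.search(footer_text)
    if not match:
        return None
    url = match.group(0)
    if not url.startswith("http"):
        url = "https://" + url
    return url
```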