Arabian bedtime stories, rekindled for modern souls. In four languages. By one person.
I run Sheherazade's Lantern, a podcast that produces original Arabian-inspired fiction in English, Arabic, Chinese, and Spanish. Each episode runs 15-18 minutes with a full voice cast: 8 distinct characters per language, each with a different accent/voice.
Ten episodes done. Four languages. One person.
Here's exactly how the pipeline works, where the complexity really lives, and the one thing that matters more than all the engineering put together.
[Character Universe] → [LLM Story Gen] → [Parse & Save] → [Audio Production]
                                                                  ↓
[SEO + RSS + Sitemap] ← [Static Site Gen] ← [Deploy via API] ← [Cover Art]
Every component is CLI-driven, no web UI, no database, no server. The entire project lives in a ~/.hermes/projects/ directory on a Linux server.
The #1 problem with AI-generated fiction: every story starts from zero. Characters don't carry over, the world resets, nothing compounds.
The fix: a shared character universe stored as JSON files.
// universe/characters.json (simplified)
{
"yueya": {
"name": "Yueya (Moon-Child)",
"archetype": "The Star-Seeker",
"traits": ["curious", "dreamy", "innocent"],
"signature_objects": ["brass astrolabe"],
"voice": "en-GB-MaisieNeural",
"appearances": 3,
"status": "active"
},
"rashid": {
"name": "Rashid the Mapmaker",
"archetype": "The Unreliable Chronicler",
"traits": ["scatterbrained", "charming", "obsessive"],
"voice": "en-IE-ConnorNeural", // Irish accent
"appearances": 2,
"status": "active"
}
// ... 5 more characters
}
The character selection algorithm:
Sort by appearance_count (ascending → promote underused characters).

This single trick turns a collection of stories into a canon. Characters form relationships. Artifacts reappear. Listeners who skip an episode miss context. The half-life of each story goes from "read-once" to "re-listen to catch the callback."
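A minimal sketch of that least-used-first selection, assuming the characters.json shape shown above (the `pick_cast` helper and the sample data are mine, not the project's code):

```python
def pick_cast(characters: dict, n: int = 3) -> list[str]:
    """Pick the n least-used active characters so screen time rotates evenly."""
    active = {k: v for k, v in characters.items() if v["status"] == "active"}
    return sorted(active, key=lambda k: active[k]["appearances"])[:n]

universe = {
    "yueya":  {"appearances": 3, "status": "active"},
    "rashid": {"appearances": 2, "status": "active"},
    "nadia":  {"appearances": 5, "status": "retired"},
}
pick_cast(universe, 2)  # ["rashid", "yueya"]
```

Retired characters never get drawn, and whoever has appeared least gets promoted first, which is what keeps the canon from collapsing onto one or two favorites.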
Observed effect: After 10 episodes with the same 7 characters, new stories come together faster because the LLM has consistent character voices to draw on.
This is where I lost the most time to a deceptive bug.
Each language has a full voice palette: 8 distinct voices mapped to the same 8 characters:
# English: different accents so listeners identify characters instantly
VOICE_MAP_EN = {
"THE LANTERN KEEPER": "en-GB-SoniaNeural", # BBC English
"YUEYA": "en-GB-MaisieNeural", # Young British
"RASHID": "en-IE-ConnorNeural", # Irish
"NADIA": "en-US-JennyNeural", # American
"DUNYAZAD": "en-IN-NeerjaExpressiveNeural", # Indian
"LAYLA": "en-AU-NatashaNeural", # Australian
# ...
}
For Chinese, I use voices from mainland China, Taiwan, and Hong Kong to create the same effect โ each character sounds from a different region.
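The Chinese map might look like this. The voice names are real edge-tts voices, but the character assignments here are illustrative, not the project's actual palette:

```python
# Mainland / Taiwan / Hong Kong voices stand in for the accent split.
VOICE_MAP_CN = {
    "THE LANTERN KEEPER": "zh-CN-XiaoxiaoNeural",   # mainland China
    "YUEYA":              "zh-TW-HsiaoChenNeural",  # Taiwan
    "RASHID":             "zh-HK-WanLungNeural",    # Hong Kong
    "NADIA":              "zh-CN-XiaoyiNeural",     # mainland China
}
```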
edge-tts (the Python library for Microsoft Edge's TTS) reads XML tags as speech. I tried to use SSML for emotional control:
# ❌ WRONG: edge-tts escapes the XML internally
comm = edge_tts.Communicate(
'<mstts:express-as type="cheerful">Hello there!</mstts:express-as>',
voice
)
# Output: "speak version 1.0 m s s t s express as type cheerful Hello there"
The library calls xml.sax.saxutils.escape() on your text before wrapping it in a <speak> envelope. Custom SSML is architecturally impossible.
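The behavior is easy to reproduce with the same stdlib helper the library uses: once the angle brackets are entities, the markup is literal spoken text, and the voice reads the tag names aloud instead of applying the style.

```python
from xml.sax.saxutils import escape

ssml = '<mstts:express-as type="cheerful">Hello there!</mstts:express-as>'
escaped = escape(ssml)
# '&lt;mstts:express-as type="cheerful"&gt;Hello there!&lt;/mstts:express-as&gt;'
# Wrapped in a <speak> envelope, those entities decode back into visible
# tag text, which the TTS engine then pronounces word by word.
```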
The fix: Segment-by-segment plain-text concatenation.
# ✅ CORRECT: plain text only, voice and rate as parameters
for segment in parsed_segments:
comm = edge_tts.Communicate(
segment["text"], # plain text
segment["voice"], # voice name
rate=segment["rate"] # e.g. "+5%" for excited
)
await comm.save(temp_path)
# Concatenate with pydub
Each 1,500-word story becomes 70-120 segments. A 5-episode batch takes ~25 minutes (each segment is a network call).
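Where do those parsed segments come from? A minimal sketch of the script parser, assuming a hypothetical `SPEAKER [emotion]: line` script convention (the post doesn't show the real script format, and these mapping tables are abbreviated):

```python
import re

VOICE_MAP = {"NADIA": "en-US-JennyNeural", "RASHID": "en-IE-ConnorNeural"}
EMOTION_RATE = {"excited": "+8%", "calm": "-4%", "default": "+0%"}

def parse_script(script: str) -> list[dict]:
    """Split 'SPEAKER [emotion]: line' rows into TTS-ready segments."""
    segments = []
    for line in script.splitlines():
        m = re.match(r"^([A-Z ]+?)(?:\s*\[(\w+)\])?:\s*(.+)$", line)
        if not m:
            continue  # narration handling omitted in this sketch
        speaker, emotion, text = m.groups()
        segments.append({
            "text": text,  # plain text only, never markup
            "voice": VOICE_MAP.get(speaker.strip(), "en-GB-SoniaNeural"),
            "rate": EMOTION_RATE.get(emotion or "default", "+0%"),
        })
    return segments

segs = parse_script("NADIA [excited]: The shelf remembered!\nRASHID: Impossible.")
# segs[0]["rate"] == "+8%"; segs[1]["voice"] == "en-IE-ConnorNeural"
```

Each dict feeds one `edge_tts.Communicate` call in the loop above, which is why a 1,500-word story turns into roughly a hundred network round-trips.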
Emotional range comes from rate modulation only:
| Emotion | Rate |
|---|---|
| Default | +0% |
| Excited | +5% to +10% |
| Calm | -3% to -5% |
| Sad/Grief | -8% to -10% |
No SSML. No XML. Plain text. This is the universal approach that works for all languages.
Cover art for a podcast is more important than most people think. It's the first thing people see in Apple Podcasts or Spotify.
The approach: Royalty-free photography from Unsplash + PIL overlays.
from io import BytesIO

from PIL import Image, ImageDraw, ImageFilter, ImageFont
import requests

# Download high-res photo
resp = requests.get("https://images.unsplash.com/photo-1509316785289-025f5b846b35?w=1920")
img = Image.open(BytesIO(resp.content))
# Crop to square
img = img.crop(...)  # center crop
# Apply Gaussian blur for dreamy anime feel
img = img.filter(ImageFilter.GaussianBlur(radius=1.5))
# Add edge vignette: progressive darkening
draw = ImageDraw.Draw(img, "RGBA")  # "RGBA" mode so the alpha in fill blends
for r in range(1500, 200, -30):
    alpha = ...  # increasing toward edges
    draw.ellipse([cx - r, cy - r, cx + r, cy + r], fill=(0, 0, 0, alpha))
# Typography only: episode number, title, brand
draw.text((x, y), "#01", fill=(255, 255, 255, 200))
draw.text((x, y + 80), "The Map of the Moving Streets", ...)
Critical rule learned the hard way: no PIL-drawn circles, no glow rings, no badge backgrounds behind episode numbers. My first version had a white glow circle behind the episode number, and the feedback was... emphatically negative. The photo provides all the visual depth. Typography is all you need.
The website is 11ty (Eleventy), a zero-JS static site generator.
src/
├── _data/
│   ├── site.json        # Multilingual config (names, taglines, footer per lang)
│   └── episodes.json    # ALL episode data in one file
├── _includes/
│   ├── layout.njk       # Apple aesthetic template
│   └── rss.njk          # RSS template (generates per-language feeds)
├── en/ cn/ ar/ es/      # One Nunjucks page per language
└── sitemap.njk          # Auto-generates sitemap.xml
Each language gets:
- Its own URL prefix (/en/, /ar/)
- Its own RSS feed (/en/podcast.xml, /ar/podcast.xml)
- `<link rel="alternate" hreflang="...">` tags pointing to all language variants

Adding a new episode: edit one JSON file → npm run build → deploy. No database, no CMS, no hosting bill (GitHub Pages is free).
{
"en": {
"episodes": [
{
"number": "06",
"title": "The Spice That Remembered",
"audio": "audio/episodes/ep06_anime.mp3",
"duration": "16:24",
"description": "Nadia the spice merchant has a shelf...",
"characters": ["Nadia", "Rashid"]
}
]
},
"cn": { /* same episode, Chinese */ }
}
Inside <script type="application/ld+json">, Nunjucks' autoescaping converts characters like ' into HTML entities (&#39;), which breaks JSON-LD parsing. Fix: a custom filter.
// .eleventy.js
eleventyConfig.addFilter("json_str", (str) => {
return JSON.stringify(str).slice(1, -1);
});
// Template: use | json_str | safe
"name": "{{ site.names[lang] | json_str | safe }}"
The server doesn't have git-remote-https, so a plain git push is impossible. The solution: GitHub's Git Data API for atomic multi-file commits.
import requests, base64

# OWNER, REPO, and HEADERS (with a personal access token) are defined elsewhere
def api(method, path, data=None):
    url = f"https://api.github.com/repos/{OWNER}/{REPO}/{path}"
    return requests.request(method, url, headers=HEADERS, json=data).json()
# 1. Get current HEAD
ref = api("GET", "git/refs/heads/main")
base_sha = ref["object"]["sha"]
# 2. Get base tree
commit = api("GET", f"git/commits/{base_sha}")
base_tree = commit["tree"]["sha"]
# 3. Create blobs for each file
blobs = []
for git_path, local_path in files.items():
with open(local_path, "rb") as f:
content = f.read()
    is_binary = git_path.endswith(".mp3")
    if is_binary:
        blob_data = {"content": base64.b64encode(content).decode(),
                     "encoding": "base64"}
    else:
        # Text files must be sent as UTF-8 strings, not base64,
        # or the API stores the base64 gibberish verbatim
        blob_data = {"content": content.decode("utf-8"),
                     "encoding": "utf-8"}
    blob = api("POST", "git/blobs", blob_data)
    blobs.append({"path": git_path, "mode": "100644",
                  "type": "blob", "sha": blob["sha"]})
# 4. Create tree โ commit โ update ref
tree = api("POST", "git/trees", {"base_tree": base_tree, "tree": blobs})
new_commit = api("POST", "git/commits", {
"message": "Add Ep06: The Spice That Remembered (all 4 languages)",
"tree": tree["sha"],
"parents": [base_sha]
})
api("PATCH", "git/refs/heads/main", {"sha": new_commit["sha"]})
One deploy script. Zero infrastructure. Atomic commits with 4-language audio, RSS, and homepage in a single push.
After building all this, I ran the pipeline through an HLT (Half-life × Leverage × Transferability) framework to evaluate where to invest next.
| Metric | Score | Analysis |
|---|---|---|
| H (Half-life) | 8/10 | Classic tales + shared universe → content ages well |
| L (Leverage) | 3/10 | Bottleneck: great content, zero audience |
| T (Transferability) | 7/10 | Platform-independent, portable across hosting |
Score: 8 × 3 × 7 = 168 (decent, but far below potential).
The diagnostic is clear: the Library Trap, beautiful content locked in a room with no visitors. Distribution is the only bottleneck.
After fixing distribution (L → 7): 9 × 7 × 8 = 504, higher than my top-ranked activity (writing tech blogs, 448).
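The score is just the product of the three axes; a tiny helper makes the before/after comparison explicit (the framework and all the numbers are the ones above):

```python
def hlt(h: int, l: int, t: int) -> int:
    """Composite score: Half-life x Leverage x Transferability."""
    return h * l * t

hlt(8, 3, 7)  # 168  -- today
hlt(9, 7, 8)  # 504  -- after distribution work
```

A multiplicative score is the point: one weak axis caps the whole product, so the cheapest win is always raising the lowest factor, not polishing the highest.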
I can generate a 4-language episode in ~20 minutes. I can build a 10-episode backlog in a weekend. The entire pipeline (character universe, multi-voice TTS, cover art, multilingual RSS, SEO) took about two weeks to build.
But none of that matters if nobody listens.
The real work isn't the pipeline. It's distribution: getting the episodes in front of listeners.
If you're building an AI content pipeline: spend 20% of your time on the pipeline and 80% on distribution. The production side is solved. The distribution side is not โ and that's where the real leverage lives.
The full source code and project structure are at github.com/bossman-lab/sheherazades-lantern. Questions? Comments? Reach out below.
One more thing: the podcast itself is live at bossman-lab.github.io/sheherazades-lantern โ pick a language and give it a listen.