A two-worker Cloudflare system that scrapes, parses, stores and serves the AES Schwalbach substitution schedule. A cron worker snapshots PDFs into R2 every hour; a second worker parses the raw HTML live, merges dual sources, exposes a clean JSON API, and drives a React + Tailwind frontend at vp.l3on.org.
Every hour, the cron worker launches a headless browser (via @cloudflare/puppeteer) and navigates to both subst_001.htm and subst_002.htm on the Untis portal.
Each page is rendered with page.pdf() and uploaded to R2 under the key DD.MM.YYYY/subst_00N.pdf, along with custom metadata (date, source URL, polledAt).
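The naming scheme above can be sketched as a few pure helpers. This is a minimal sketch, not the project's code: the helper names and the `sourceUrl` field name are hypothetical, and the commented `put()` call assumes an R2 binding called `VPLAN_BUCKET`. R2's `put()` accepts a `customMetadata` object of string values, which is why `polledAt` is serialised to ISO 8601.

```typescript
// Minimal sketch of the snapshot naming scheme: R2 key DD.MM.YYYY/subst_00N.pdf
// plus string-valued custom metadata. Field names are assumptions.

type SnapshotMeta = {
  date: string; // DD.MM.YYYY
  sourceUrl: string; // page that was rendered
  polledAt: string; // ISO 8601 timestamp of the poll
};

function pad(n: number, width: number): string {
  return String(n).padStart(width, "0");
}

// month is 1-based (January = 1)
function formatDate(day: number, month: number, year: number): string {
  return `${pad(day, 2)}.${pad(month, 2)}.${year}`;
}

// index 1 -> subst_001.pdf, index 2 -> subst_002.pdf
function snapshotKey(date: string, index: number): string {
  return `${date}/subst_${pad(index, 3)}.pdf`;
}

function snapshotMeta(date: string, sourceUrl: string, polledAt: Date): SnapshotMeta {
  return { date, sourceUrl, polledAt: polledAt.toISOString() };
}

// In the cron handler (binding name VPLAN_BUCKET is hypothetical):
//   const pdf = await page.pdf();
//   await env.VPLAN_BUCKET.put(snapshotKey(date, 1), pdf, {
//     customMetadata: snapshotMeta(date, url, new Date()),
//   });
```

With these helpers, `snapshotKey(formatDate(6, 5, 2025), 2)` yields `06.05.2025/subst_002.pdf`, so one R2 "folder" per school day holds both snapshots.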
A GET /vplan/parse/:date endpoint on the vplan-worker extracts plain text from the stored PDFs via unpdf (WASM, no Node) and runs a custom token-stream parser over it. The parser handles the flat Untis table layout: it splits rows on repeating day+date anchors, expands multi-class cells, and detects Entfall via "x" markers in column 12.
The second worker fetches subst_001.htm and subst_002.htm live on each request (with a 60-second Cloudflare cache). It decodes each response using the charset from the Content-Type header (often ISO-8859-1, for German umlauts).
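Charset-aware decoding can be sketched with the standard `TextDecoder`. This is a minimal sketch under two assumptions: a missing `charset=` parameter means UTF-8, and an unrecognised label also falls back to UTF-8. The helper names are hypothetical.

```typescript
// Sketch: pick the charset from a Content-Type header and decode the body.
// Falling back to UTF-8 (no charset, or unknown label) is an assumption.

function charsetFromContentType(contentType: string | null): string {
  if (!contentType) return "utf-8";
  const match = /charset\s*=\s*"?([^";\s]+)"?/i.exec(contentType);
  return match ? match[1].toLowerCase() : "utf-8";
}

function decodeBody(bytes: Uint8Array, contentType: string | null): string {
  const charset = charsetFromContentType(contentType);
  try {
    return new TextDecoder(charset).decode(bytes);
  } catch {
    // TextDecoder throws a RangeError for labels it does not know.
    return new TextDecoder("utf-8").decode(bytes);
  }
}

// In a Worker this would be fed from something like:
//   const res = await fetch(url);
//   const html = decodeBody(new Uint8Array(await res.arrayBuffer()),
//                           res.headers.get("Content-Type"));
```

Decoding the raw bytes instead of calling `res.text()` matters here, because `text()` always decodes as UTF-8 and would turn ISO-8859-1 umlauts into replacement characters.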
Rows are read from the elements carrying the .list* CSS classes and reassembled into 12-column substitution records.
A row counts as Entfall (cancelled) if any of three conditions holds: an explicit "x" in column 12, an info field containing "entfall", or both teacher and room being empty dashes.
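The three Entfall rules can be sketched as a single predicate. The field names are hypothetical, and treating an empty string the same as "-" or "---" is an assumption about what "empty dashes" covers.

```typescript
// Sketch of the three Entfall rules. Field names and the exact set of
// "empty dash" spellings are assumptions.

interface Row {
  teacher: string;
  room: string;
  info: string;
  col12: string; // 12th column; "x" marks a cancellation
}

// "", "-", "---" all count as empty (assumption)
function isDash(value: string): boolean {
  return /^-*$/.test(value.trim());
}

function isEntfall(row: Row): boolean {
  const explicitX = row.col12.trim().toLowerCase() === "x";
  const infoSaysEntfall = row.info.toLowerCase().includes("entfall");
  const nobodyAssigned = isDash(row.teacher) && isDash(row.room);
  return explicitX || infoSaysEntfall || nobodyAssigned;
}
```

Combining the rules with a plain OR keeps the check robust when Untis omits the "x" marker but still writes "Entfall" into the info text, or blanks out both teacher and room.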
The admin dual endpoint fetches both sources in parallel with Promise.allSettled. It compares their "Stand:" timestamps to decide which is primary, then merges them into a deduplicated map keyed by datum|klasse|stunde|fach. Multi-class rows like "07A, 07B, 07C" are expanded into individual records.
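The merge step can be sketched as follows. This is a minimal sketch, not the project's implementation: it assumes each "Stand:" line has already been parsed into a `Date`, that the newer source is the primary one and wins on key collisions, and that a rejected fetch simply contributes no entries. The `Entry` and `Source` shapes are hypothetical.

```typescript
// Sketch of the timestamp-aware merge over Promise.allSettled results.
// Assumption: the source with the newer "Stand:" timestamp is primary.

interface Entry {
  datum: string;
  klasse: string;
  stunde: string;
  fach: string;
  info: string;
}

interface Source {
  stand: Date; // parsed "Stand:" timestamp
  entries: Entry[];
}

// "07A, 07B, 07C" -> three entries, one per class
function expandClasses(entry: Entry): Entry[] {
  return entry.klasse
    .split(",")
    .map((k) => k.trim())
    .filter((k) => k.length > 0)
    .map((klasse) => ({ ...entry, klasse }));
}

function keyOf(e: Entry): string {
  return `${e.datum}|${e.klasse}|${e.stunde}|${e.fach}`;
}

function mergeSources(results: PromiseSettledResult<Source>[]): Entry[] {
  const fulfilled = results
    .filter((r): r is PromiseFulfilledResult<Source> => r.status === "fulfilled")
    .map((r) => r.value)
    // Oldest first, so the newest (primary) source is written last and wins.
    .sort((a, b) => a.stand.getTime() - b.stand.getTime());

  const merged = new Map<string, Entry>();
  for (const source of fulfilled) {
    for (const entry of source.entries.flatMap(expandClasses)) {
      merged.set(keyOf(entry), entry);
    }
  }
  return [...merged.values()];
}
```

Expanding classes before keying is what makes deduplication work: "07A, 07B" in one source and "07A" in the other collide on the 07A key instead of surviving as two unrelated rows.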
The React frontend is served as static assets via env.ASSETS. It auto-refreshes every 60 s, splits compound lesson slots like "1-2" into individual hour cards, and persists the class filter and view mode to localStorage.
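Splitting a compound slot into per-hour cards can be sketched with one small function. The function name is hypothetical, and tolerating spaces around the "-" and returning an empty list for unparseable slots are assumptions.

```typescript
// Sketch: turn a compound slot such as "1-2" into one entry per hour.
// Spaces around "-" are tolerated; unparseable input yields [] (assumptions).
function expandLessonSlot(slot: string): number[] {
  const range = /^\s*(\d+)\s*-\s*(\d+)\s*$/.exec(slot);
  if (range) {
    const from = Number(range[1]);
    const to = Number(range[2]);
    if (from <= to) {
      // "3-5" -> [3, 4, 5]
      return Array.from({ length: to - from + 1 }, (_, i) => from + i);
    }
  }
  const single = Number(slot.trim());
  return Number.isInteger(single) && single > 0 ? [single] : [];
}
```

Each returned hour can then be rendered as its own card carrying a copy of the original substitution record, so a double lesson shows up in both hour slots.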
Each chat room is a Durable Object with two SQLite tables: messages and meta. Messages older than one hour are pruned on each write to keep storage lean.
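The schema and the prune-on-write step can be sketched against a minimal stand-in for `state.storage.sql.exec(query, ...bindings)`. The column names are hypothetical, and the `SqlExec` type exists only so the logic can run outside a Durable Object.

```typescript
// Sketch of the room schema and prune-on-write. Column names are assumptions;
// `exec` stands in for state.storage.sql.exec(query, ...bindings).
type SqlExec = (query: string, ...bindings: (string | number | null)[]) => unknown;

const ONE_HOUR_MS = 60 * 60 * 1000;

function initSchema(exec: SqlExec): void {
  exec(`CREATE TABLE IF NOT EXISTS messages (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    author TEXT NOT NULL,
    text TEXT NOT NULL,
    created_at INTEGER NOT NULL
  )`);
  // Key/value table, e.g. for the AI cooldown timestamp.
  exec(`CREATE TABLE IF NOT EXISTS meta (key TEXT PRIMARY KEY, value TEXT)`);
}

function storeMessage(exec: SqlExec, author: string, text: string, now: number): void {
  exec("INSERT INTO messages (author, text, created_at) VALUES (?, ?, ?)", author, text, now);
  // Prune on every write: rows older than one hour are dropped.
  exec("DELETE FROM messages WHERE created_at < ?", now - ONE_HOUR_MS);
}
```

Pruning inside the write path means no alarm or cron is needed; a room that goes quiet keeps its last hour of messages until the next write.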
Rooms use the WebSocket Hibernation API (state.acceptWebSocket), so the object only consumes compute when messages arrive. Incoming JSON is validated, stored, and broadcast to all other connected sockets in the same room.
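Validation and fan-out can be sketched as two pure functions. In a hibernating Durable Object they would run inside the `webSocketMessage(ws, message)` handler, with the peers coming from `state.getWebSockets()`. The message shape, the 500-character limit, and the `SocketLike` interface are assumptions made so the logic runs stand-alone.

```typescript
// Sketch: validate an incoming frame and fan it out to every other socket.
// Message shape and MAX_TEXT are assumptions.

interface ChatMessage { author: string; text: string; }
interface SocketLike { send(data: string): void; }

const MAX_TEXT = 500;

function parseChatMessage(raw: string): ChatMessage | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null; // not JSON at all
  }
  if (typeof data !== "object" || data === null) return null;
  const { author, text } = data as Record<string, unknown>;
  if (typeof author !== "string" || typeof text !== "string") return null;
  const trimmed = text.trim();
  if (author.trim() === "" || trimmed === "" || trimmed.length > MAX_TEXT) return null;
  return { author: author.trim(), text: trimmed };
}

// Returns how many peers the message was delivered to.
function broadcast(peers: SocketLike[], sender: SocketLike, msg: ChatMessage): number {
  const payload = JSON.stringify(msg);
  let sent = 0;
  for (const peer of peers) {
    if (peer === sender) continue; // only the *other* sockets
    try {
      peer.send(payload);
      sent++;
    } catch {
      // send() on a socket that is already closing can throw; skip it.
    }
  }
  return sent;
}
```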
Mentioning @ai or @smartypants in a message triggers a call to Groq (llama-3.3-70b-versatile) through a fetch with a 15-second timeout. The last 20 messages are sent as conversation context, and a 3-second per-room cooldown (stored in the meta table) prevents API flooding.
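The trigger, cooldown, and context window can be sketched as pure helpers. The mention regex, the system prompt, the `"ai"` author name, and the message shape are assumptions; the URL is Groq's OpenAI-compatible chat completions endpoint. The body would then be sent with something like `fetch(GROQ_URL, { method: "POST", headers, body: JSON.stringify(body), signal: AbortSignal.timeout(15_000) })`.

```typescript
// Sketch of the @ai flow: detect a mention, enforce the cooldown, and build
// the request body. Regex, prompt and message shape are assumptions.

interface StoredMessage { author: string; text: string; }

const GROQ_URL = "https://api.groq.com/openai/v1/chat/completions";
const MODEL = "llama-3.3-70b-versatile";
const CONTEXT_SIZE = 20;
const COOLDOWN_MS = 3_000;

// "@ai" or "@smartypants" at the start of the text or after whitespace
function mentionsBot(text: string): boolean {
  return /(^|\s)@(ai|smartypants)\b/i.test(text);
}

// lastCallAt would be read from the meta table; null means "never called".
function cooldownElapsed(lastCallAt: number | null, now: number): boolean {
  return lastCallAt === null || now - lastCallAt >= COOLDOWN_MS;
}

function buildGroqBody(history: StoredMessage[]) {
  const recent = history.slice(-CONTEXT_SIZE); // last 20 messages only
  return {
    model: MODEL,
    messages: [
      { role: "system", content: "You are a helpful assistant in a school chat room." },
      ...recent.map((m) => ({
        role: m.author === "ai" ? "assistant" : "user",
        content: `${m.author}: ${m.text}`,
      })),
    ],
  };
}
```

Checking the cooldown before building the body keeps a burst of mentions from ever reaching the network; only the first one inside each 3-second window produces a request.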
| Method | Path | What it does | Auth |
|---|---|---|---|
| GET | /api/vertretungsplan/today | Live parse of subst_001.htm → JSON entries | none |
| GET | /api/vertretungsplan/tomorrow | Live parse of subst_002.htm → JSON entries | none |
| GET | /api/admin/vplan-dual | Parallel fetch + timestamp-aware merge of both sources | X-Admin-Code |
| GET | /api/news | Scrape + decode AES news feed, cache 5 min | none |
| GET | /api/calendar | Proxy school iCal feed, cache 10 min | none |
| GET | /api/chat/:code/history | Last hour of chat messages for a room | none |
| GET | /api/chat/:code/ws | WebSocket upgrade → Durable Object room | none |
| GET | /api/hub/files | List all files in the R2 file hub | none |
| POST | /api/hub/upload | Multipart file upload to R2 (PNG/JPEG/PDF ≤10 MB) | none |
| POST | /api/mensa/sync | FlareSolverr bypass → scrape & store Mensa PDFs | env secret |
| GET | /vplan/parse/:date | Extract & parse stored R2 PDFs for a date (vplan-worker) | X-API-Key |
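One way the main worker's rows of the table could map onto a router is sketched below. Only the methods and path patterns come from the table; the handler names and the matching code are hypothetical, and /vplan/parse/:date is left out because it lives on the vplan-worker.

```typescript
// Sketch: match method + path against the routes in the table above.
// Handler names are hypothetical.

type Params = Record<string, string>;
interface RouteMatch { handler: string; params: Params; }

const ROUTES: [string, string, string][] = [
  ["GET", "/api/vertretungsplan/today", "today"],
  ["GET", "/api/vertretungsplan/tomorrow", "tomorrow"],
  ["GET", "/api/admin/vplan-dual", "vplanDual"],
  ["GET", "/api/news", "news"],
  ["GET", "/api/calendar", "calendar"],
  ["GET", "/api/chat/:code/history", "chatHistory"],
  ["GET", "/api/chat/:code/ws", "chatSocket"],
  ["GET", "/api/hub/files", "hubFiles"],
  ["POST", "/api/hub/upload", "hubUpload"],
  ["POST", "/api/mensa/sync", "mensaSync"],
];

function matchRoute(method: string, pathname: string): RouteMatch | null {
  const parts = pathname.split("/").filter(Boolean);
  for (const [m, pattern, handler] of ROUTES) {
    if (m !== method) continue;
    const pat = pattern.split("/").filter(Boolean);
    if (pat.length !== parts.length) continue;
    const params: Params = {};
    let ok = true;
    for (let i = 0; i < pat.length; i++) {
      if (pat[i].startsWith(":")) {
        params[pat[i].slice(1)] = decodeURIComponent(parts[i]); // ":code" segment
      } else if (pat[i] !== parts[i]) {
        ok = false;
        break;
      }
    }
    if (ok) return { handler, params };
  }
  return null;
}
```

Auth checks such as X-Admin-Code would then run inside the individual handlers, since only a few routes require them.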
HTML responses are decoded only after reading the charset= parameter from the Content-Type header, so German umlauts (ä, ö, ü, ß) reliably survive the round-trip.
Chat state lives in each Durable Object's embedded SQLite database, accessed via state.storage.sql.exec(). This keeps all room state co-located with the WebSocket connections and avoids cross-datacenter round-trips.
The PDF parser uses unpdf (compiled to WASM) to extract text from stored R2 PDFs without any Node.js runtime. The resulting token stream is then walked with a state machine that anchors on repeating "Mi 6.5." / "Do 7.5." day+date patterns to split rows.
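A minimal two-state version of that row splitter (before the first anchor, then inside a row) can be sketched as a single pass. It assumes the extracted text has already been split into whitespace-separated tokens and that a row starts wherever a German weekday abbreviation is immediately followed by a "D.M." date token; the function name is hypothetical.

```typescript
// Sketch of anchor-based row splitting over a token stream.
// Assumption: a row starts at "<weekday> <D.M.>", e.g. "Mi" "6.5.".

const WEEKDAY = /^(Mo|Di|Mi|Do|Fr|Sa|So)$/;
const DAY_DATE = /^\d{1,2}\.\d{1,2}\.$/;

function splitRows(tokens: string[]): string[][] {
  const rows: string[][] = [];
  let current: string[] | null = null; // null = still in the page header
  for (let i = 0; i < tokens.length; i++) {
    const isAnchor =
      WEEKDAY.test(tokens[i]) && i + 1 < tokens.length && DAY_DATE.test(tokens[i + 1]);
    if (isAnchor) {
      current = []; // anchor found: open a new row
      rows.push(current);
    }
    // Tokens before the first anchor (page header) are dropped.
    if (current) current.push(tokens[i]);
  }
  return rows;
}
```

Requiring the date token right after the weekday keeps a lone "Mi" or "Do" inside a row (for example in a teacher or subject cell) from starting a spurious record.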