CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
What this repo is
Dr. George Papagiannakis’ personal academic website, deployed to GitHub Pages at https://papagiannakis.github.io. It is a fork of academicpages, which is itself a fork of the Minimal Mistakes Jekyll theme. Most of the codebase is upstream theme machinery — day-to-day work is content edits, not code changes.
Running locally
bundle install # delete Gemfile.lock first if you hit version errors
bundle exec jekyll liveserve # serves at http://localhost:4000 with live reload
_config.dev.yml overrides the production _config.yml for local dev (sets url: http://localhost:4000, disables analytics). Jekyll picks it up automatically with liveserve via the hawkins plugin declared in Gemfile.
Note: _config.yml is not reloaded automatically — restart the server after changing it.
There is no test suite and no linter. The package.json scripts (uglify, watch:js, build:js) are upstream theme tooling for rebuilding assets/js/main.min.js from the un-minified sources in assets/js/; they are rarely needed.
Content model
Content is organized as Jekyll collections, declared in _config.yml under collections:. Each collection is a folder of markdown files with YAML frontmatter:
- _publications/ — papers, with fields like title, collection, permalink, excerpt, date, venue, paperurl, citation
- _talks/ — talks (uses the dedicated talk layout)
- _teaching/ — courses
- _portfolio/ — projects (currently hidden from nav)
- _posts/ — blog posts
- _pages/ — top-level pages (about, CV, contact, publications index, etc.)
Each collection has defaults in _config.yml that set its layout and which features (author profile, comments, share) are enabled — when adding a new collection or page type, set defaults there rather than per-file.
Main nav links live in _data/navigation.yml. Author/bio/social data is in _config.yml under author: and rendered by _includes/author-profile.html.
Bulk content generation
markdown_generator/ holds Jupyter notebooks and equivalent .py scripts that convert TSV/CSV/BibTeX into the per-item markdown files for _publications/ and _talks/:
- publications.ipynb / publications.py — from publications.tsv
- PubsFromBib.ipynb / pubsFromBib.py — from BibTeX
- talks.ipynb / talks.py — from talks.tsv
These scripts write into the collection folders. Prefer editing the TSV/BibTeX and regenerating over hand-editing many .md files; for one-off changes, edit the markdown directly.
talkmap.py / talkmap.ipynb scrape the location: field from _talks/*.md, geocode via Nominatim, and emit a Leaflet cluster map into talkmap/. Run from inside _talks/. The /talkmap/ page is linked from talks.html when talkmap_link: true in _config.yml.
Theme structure (when you need to touch presentation)
- _layouts/ — page templates (single, talk, archive, splash, …)
- _includes/ — partials referenced from layouts (header, footer, author profile, archive item renderers per collection)
- _sass/ — SCSS partials; entry point is assets/css/main.scss, compiled by Jekyll
- assets/ — compiled CSS, JS bundles, fonts, images used by the theme itself
- images/ and files/ — site content assets (publication figures, PDFs). Files in files/ are served at /files/... because _config.yml's include: lists it explicitly.
Upstream patching
Per CONTRIBUTING.md and README.md, the academicpages upstream tracks theme changes via GitHub issues labeled code change. If pulling in upstream fixes, browse those issues rather than merging the whole upstream branch — this fork has diverged in content and config and a straight merge will conflict heavily.
Workflow: ingest publication
When the user says “ingest publication” (or equivalent: “add this paper”, “publish this”), run this exact flow. Ask for only two things, nothing else:
- The DOI (e.g., 10.1007/s00264-023-06038-8, or a full URL like https://doi.org/10.1007/s00264-023-06038-8 — strip the URL prefix). For arXiv-only papers, the arXiv DOI form is 10.48550/arXiv.<id>.
- The PDF filename inside files/ (e.g., kenanidis-et-al-00264-023-06038-8.pdf). The user is responsible for putting the PDF there; verify it exists with ls files/<name> and warn if missing, but proceed anyway — they may push the PDF in the same commit. A small normalization/check sketch follows this list.
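A minimal sketch of that input handling, assuming the DOI may arrive bare or as a doi.org URL; normalize_doi and check_pdf are illustrative names, not existing repo code:

```python
import os
import re

def normalize_doi(raw: str) -> str:
    """Strip an optional https://doi.org/ (or dx.doi.org) prefix from the user's input."""
    return re.sub(r"^https?://(dx\.)?doi\.org/", "", raw.strip())

def check_pdf(filename: str) -> bool:
    """Warn (but do not abort) if the paper PDF is not in files/ yet."""
    path = os.path.join("files", filename)
    if not os.path.isfile(path):
        print(f"warning: {path} not found; proceeding anyway, the PDF may land in the same commit")
        return False
    return True

doi = normalize_doi("https://doi.org/10.1007/s00264-023-06038-8")  # -> "10.1007/s00264-023-06038-8"
```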
Fetch the BibTeX from the DOI
Run CrossRef content negotiation:
curl -sLH "Accept: application/x-bibtex" "https://doi.org/<DOI>"
This returns a @article{...} / @inproceedings{...} / @incollection{...} etc. entry with title, author, journal/booktitle, volume, number, pages, year, month, publisher, DOI, sometimes abstract. Use that BibTeX as the source for every derivation below.
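For reference, the same content negotiation in Python (stdlib only); fetch_bibtex is an illustrative helper rather than repo code, and the leading-@ check is just one simple way to detect an HTML landing page:

```python
import urllib.request

def fetch_bibtex(doi: str) -> str:
    """Ask doi.org for BibTeX via content negotiation (CrossRef/DataCite)."""
    req = urllib.request.Request(
        f"https://doi.org/{doi}",
        headers={"Accept": "application/x-bibtex"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        text = resp.read().decode("utf-8", errors="replace").strip()
    # An HTML landing page instead of a @article{...} / @inproceedings{...} entry means the
    # registrar ignored content negotiation: treat that as "wrong DOI, ask the user".
    if not text.startswith("@"):
        raise ValueError("DOI did not resolve to BibTeX, double-check it with the user")
    return text
```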
Fallbacks:
- If curl returns empty or HTML (not BibTeX), the DOI is wrong or the registrar doesn’t support content negotiation. Stop and ask the user to double-check the DOI.
- For arXiv DOIs (10.48550/arXiv.<id>), CrossRef may not have the entry. Fall back to the arXiv API: curl -s "http://export.arxiv.org/api/query?id_list=<id>" and parse the Atom XML for <title>, <author><name>, <summary> (abstract), <published> (date). Set venue to "arXiv preprint". A parsing sketch follows this list.
- If the fetched BibTeX has no abstract field, that’s fine — leave excerpt blank. Don’t ask the user for an abstract.
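A hedged sketch of the arXiv fallback, assuming the standard Atom response from the export API; fetch_arxiv_metadata and the returned dict keys are illustrative:

```python
import urllib.request
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

def fetch_arxiv_metadata(arxiv_id: str) -> dict:
    """Fallback for 10.48550/arXiv.<id> DOIs that CrossRef doesn't know about."""
    url = f"http://export.arxiv.org/api/query?id_list={arxiv_id}"
    with urllib.request.urlopen(url, timeout=30) as resp:
        feed = ET.fromstring(resp.read())
    entry = feed.find(f"{ATOM}entry")
    return {
        # arXiv titles/summaries contain hard line breaks; collapse the whitespace.
        "title": " ".join(entry.findtext(f"{ATOM}title", "").split()),
        "authors": [a.findtext(f"{ATOM}name") for a in entry.findall(f"{ATOM}author")],
        "excerpt": entry.findtext(f"{ATOM}summary", "").strip(),
        "pub_date": entry.findtext(f"{ATOM}published", "")[:10],  # YYYY-MM-DD
        "venue": "arXiv preprint",
    }
```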
Derive everything else from the BibTeX
Map BibTeX fields to the CSV schema (pub_date, title, venue, excerpt, citation, url_slug, paper_url):
- pub_date (YYYY-MM-DD):
  - year + month present → YYYY-MM-01 (use month number; convert May → 05).
  - Only year → YYYY-01-01.
  - @misc with eprint = {2408.02275} (arXiv) → use the arXiv submission date if derivable from the ID’s YYMM prefix → 20YY-MM-01.
  - Call out the chosen date explicitly in the diff so the user can correct it before committing.
- title → BibTeX title field, with braces stripped.
- venue → journal (for @article), booktitle (for @inproceedings/@incollection), publisher (for @book), or "arXiv preprint" for arXiv @misc.
- excerpt → BibTeX abstract field if present; otherwise leave blank (the .py handles empty via a len > 5 check, so a blank cell is fine).
- citation → assemble in the existing house style, matching rows already in publications.csv: Authors (Last, F., Last, F., …), "Title", in Venue (optional: eds. X & Y), Publisher, doi:<doi>, Year. Use the exact author order from BibTeX. The doi:<doi> segment is mandatory (DOI is a required input — see step 1). For arXiv, this is doi:10.48550/arXiv.<id>.
- url_slug → follow the established paper-<type>-<N> pattern:
  - @article → paper-journal-<N>
  - @inproceedings / @conference → paper-conference-<N>
  - @incollection / @inbook → paper-chapter-<N>
  - @misc (arXiv) → paper-arxiv-<N>
  - <N> = (max existing N for that type, found by ls _publications/ + grep) + 1. If unclear, pick the next integer above the max across all types. See the sketch after this list.
- paper_url → https://papagiannakis.github.io/files/<filename> using the filename the user provided.
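A sketch of these derivations, assuming the BibTeX entry has already been parsed into an entry type plus a dict of fields (e.g., with bibtexparser); derive_pub_date, derive_venue, and next_pub_slug are illustrative helpers, and edge cases (missing year, odd month strings) still need the manual judgment described above:

```python
import calendar
import re
from pathlib import Path

# Map both "May" and "may"/"5" style month values to a month number.
MONTHS = {m.lower(): i for i, m in enumerate(calendar.month_name) if m}
MONTHS.update({m.lower(): i for i, m in enumerate(calendar.month_abbr) if m})

def derive_pub_date(entry_type: str, fields: dict) -> str:
    """YYYY-MM-01 from year+month, YYYY-01-01 from year alone, arXiv @misc from the eprint YYMM prefix."""
    if entry_type == "misc" and re.match(r"\d{4}\.\d{4,5}$", fields.get("eprint", "")):
        yy, mm = fields["eprint"][:2], fields["eprint"][2:4]
        return f"20{yy}-{mm}-01"
    month = (fields.get("month") or "").strip().lower()
    mm = MONTHS.get(month, int(month) if month.isdigit() else 1)
    return f"{fields['year']}-{mm:02d}-01"

def derive_venue(entry_type: str, fields: dict) -> str:
    if entry_type == "article":
        return fields.get("journal", "")
    if entry_type in ("inproceedings", "conference", "incollection"):
        return fields.get("booktitle", "")
    if entry_type == "book":
        return fields.get("publisher", "")
    return "arXiv preprint"  # arXiv @misc

SLUG_PREFIX = {"article": "paper-journal", "inproceedings": "paper-conference",
               "conference": "paper-conference", "incollection": "paper-chapter",
               "inbook": "paper-chapter", "misc": "paper-arxiv"}

def next_pub_slug(entry_type: str) -> str:
    """Highest existing <N> for this slug type in _publications/, plus one."""
    prefix = SLUG_PREFIX.get(entry_type, "paper-arxiv")
    ns = [int(m.group(1)) for p in Path("_publications").glob("*.md")
          if (m := re.search(rf"{prefix}-(\d+)\.md$", p.name))]
    return f"{prefix}-{max(ns, default=0) + 1}"
```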
Steps to execute
- Append one row to markdown_generator/publications.csv. Use proper RFC-4180 CSV quoting — fields with commas, quotes, or newlines wrapped in "…", embedded " doubled to "". Do not touch publications.tsv (it’s stale demo data).
- Write the markdown file directly to _publications/<pub_date>-<url_slug>.md. Match the exact frontmatter shape that publications.py would produce (see _publications/2024-08-05-paper-arxiv-100.md as the canonical template — fields in this order: title, collection: publications, permalink: /publication/<pub_date>-<url_slug>, excerpt (if non-empty), date, venue, paperurl, citation). HTML-escape & → &amp;, " → &quot;, ' → &apos; in title, excerpt, venue, citation to match the script’s output. Body: [Download paper here](<paper_url>), then the unescaped excerpt, then Recommended citation: <citation>. A sketch of both steps follows this list.
- Do NOT run publications.py — it reads .tsv not .csv and would overwrite hand-edited .md files. Writing the single new file directly produces the same end state without collateral damage.
- Show the diff (git diff for the CSV, git status + cat for the new .md) and pause for the user’s OK.
- On confirmation, commit with a message matching the existing style — short, lowercase, e.g. added <venue-shortname> paper <short-title> (see git log --oneline _publications/ for examples like added arxiv paper SIG-ASIA, updated MAGES publication).
- Pause again and ask before git push origin main — pushing is the public deploy step. Never push without an explicit OK in this turn. GitHub Pages rebuilds in ~30s after the push.
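A sketch of the append-and-write steps, assuming a row dict keyed by the CSV column names; the helper names are illustrative, and the exact frontmatter quoting should be checked against the canonical template file rather than trusted from this sketch:

```python
import csv

CSV_PATH = "markdown_generator/publications.csv"
COLS = ["pub_date", "title", "venue", "excerpt", "citation", "url_slug", "paper_url"]

def escape(s: str) -> str:
    """The generator's escaping: & -> &amp;, " -> &quot;, ' -> &apos; (ampersand first)."""
    return s.replace("&", "&amp;").replace('"', "&quot;").replace("'", "&apos;")

def append_csv_row(row: dict) -> None:
    """Single-row append; csv.writer emits RFC-4180 quoting (embedded quotes doubled)."""
    with open(CSV_PATH, "a", newline="") as f:
        csv.writer(f).writerow([row.get(c, "") for c in COLS])

def write_publication_md(row: dict) -> str:
    """Write _publications/<pub_date>-<url_slug>.md with frontmatter in the documented order."""
    stem = f"{row['pub_date']}-{row['url_slug']}"
    fm = ["---",
          f"title: \"{escape(row['title'])}\"",
          "collection: publications",
          f"permalink: /publication/{stem}"]
    if row.get("excerpt"):
        fm.append(f"excerpt: '{escape(row['excerpt'])}'")
    fm += [f"date: {row['pub_date']}",
           f"venue: '{escape(row['venue'])}'",
           f"paperurl: '{row['paper_url']}'",
           f"citation: '{escape(row['citation'])}'",
           "---"]
    body = (f"\n[Download paper here]({row['paper_url']})\n\n"
            f"{row.get('excerpt', '')}\n\n"
            f"Recommended citation: {row['citation']}\n")
    path = f"_publications/{stem}.md"
    with open(path, "w") as f:
        f.write("\n".join(fm) + "\n" + body)
    return path
```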
Don’t do
- Don’t ask the user for the abstract, slug, date, citation format, or anything else — derive or default everything from the BibTeX. If a field is genuinely underivable (e.g., truly missing year), leave it blank or pick a sensible default and flag it in the diff for the user to fix.
- Don’t run any bulk regeneration script. Single-row append, single-file write.
- Don’t push without confirmation, even if the user OK’d the commit.
- Don’t modify publications.tsv. It is not the source of truth and is not used by the live site.
Workflow: ingest talk
When the user says “ingest talk” (or equivalent: “add this talk”, “publish this talk”), run this flow. Talks are looser than publications — there is no DOI/CrossRef equivalent, so metadata extraction is best-effort and must be confirmed by the user before files are written, not after.
Inputs — accept any of these three shapes
Try them in order; whichever the user provides is what you use.
- URL + filename (canonical, most analogous to the publication flow):
  - URL to the talk’s announcement / event page (conference program, IEEE/ACM session listing, keynote speakers page, etc.).
  - Filename inside files/ for the slides PDF (e.g., GP-IEEEVRkeynote2023.pdf). If the talk is video-only, the user can give a video URL instead — set talk_url to that URL directly and skip the files/ check.
- Pasted text blob + filename — if there’s no clean URL, the user pastes the CFP/program text/announcement and provides the slides filename.
- Explicit fields + filename — if the user just lists title:, venue:, date:, location:, type:, optional description:, take them verbatim.
Resolve metadata
- For Option 1: WebFetch the URL and extract title, venue, date, location, description (abstract), type.
- For Option 2: parse the blob for the same fields.
- For Option 3: trust the fields as given.
- For all: if talk_url is a slides PDF, verify with ls files/<name> and warn (don’t block) if missing.
Map to the talks schema
Columns in markdown_generator/talks.csv: title, type, url_slug, venue, date, location, talk_url, description.
- title — from the source.
- type — one of Talk, Tutorial, Keynote Talk, Invited Talk, Panel. Default to Talk if unclear. The type controls the slug prefix.
- url_slug — talk-<N> for nearly everything. The actual house convention (visible in talks.csv) is a single global counter, not type-prefixed: Invited Talks, Tutorials, workshops, and some Keynote Talks all use talk-<N>. The prefix keynote-<N> is an occasional exception applied to some keynotes (e.g., keynote-17, keynote-25) but is not consistent — other keynotes use talk-<N> (e.g., talk-24 is a Keynote Talk). Default to talk-<N> unless there’s a clear reason to mirror the keynote- style. <N> is a single global counter across both prefixes: extract trailing integers from every url_slug in markdown_generator/talks.csv and every filename in _talks/*.md, take the max, add 1 (see the sketch after this list). The CSV is the more reliable source because _talks/ is sparse.
- venue — conference/event name (e.g., IEEE Virtual Reality 2023).
- date — YYYY-MM-DD. If only year+month known, default day to 01.
- location — "City, Country" or "online" (matches existing rows).
- talk_url — https://papagiannakis.github.io/files/<filename> for slides, or the raw video/event URL.
- description — abstract/summary. May be empty.
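A sketch of the global counter derivation, assuming the CSV and filename layouts described above; next_talk_slug is an illustrative helper, not repo code:

```python
import csv
import re
from pathlib import Path

def next_talk_slug(prefix: str = "talk") -> str:
    """Global counter across talk-<N> and keynote-<N>, from the CSV plus _talks/ filenames."""
    numbers = [0]
    with open("markdown_generator/talks.csv", newline="") as f:
        for row in csv.DictReader(f):
            if (m := re.search(r"(\d+)$", row.get("url_slug", ""))):
                numbers.append(int(m.group(1)))
    for p in Path("_talks").glob("*.md"):
        if (m := re.search(r"(?:talk|keynote)-(\d+)", p.stem)):
            numbers.append(int(m.group(1)))
    return f"{prefix}-{max(numbers) + 1}"
```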
Steps to execute
- Resolve and show derived values FIRST — before writing anything, print the extracted/derived values and ask the user to confirm or correct. This is the key difference from publications: CrossRef gives clean data, but talk extraction is fuzzy. Echo back something like:
    title: …
    type: Keynote Talk
    url_slug: keynote-18
    venue: …
    date: 2026-05-03
    location: …
    talk_url: …
    description: <first 200 chars>…
  Wait for an OK or corrections.
- Append one row to markdown_generator/talks.csv with proper RFC-4180 CSV quoting (commas/quotes/newlines wrapped in "…", embedded " doubled to ""). Do not touch talks.tsv (stale demo data).
- Write the markdown file to _talks/<date>-<url_slug>.md. Match the frontmatter shape that markdown_generator/talks.py produces — fields in this order: title (double-quoted), collection: talks, type (double-quoted), permalink: /talks/<date>-<url_slug>, venue (double-quoted) if present, date, location (double-quoted) if present. Body: optional [More information here](<talk_url>) line, then the description. Note talks.py does not HTML-escape the way publications.py does, so keep raw " / ' in the description — but be careful that the title and venue, which the script wraps in double quotes, do not themselves contain an unescaped ". If they do, replace it with &quot; or use a different quoting style. A frontmatter-writing sketch follows this list.
- Do NOT run talks.py — same reasoning as publications: it reads .tsv not .csv, and would clobber any hand-edited talk pages.
- Show the diff (git diff for the CSV, git status for the new .md and any slides PDF) and pause for the user’s OK to commit.
- On confirmation, commit with a message matching the publication style — short, lowercase, e.g. added <venue-shortname> <type> on <short-topic> (e.g., added IEEE VR 2023 keynote on geometric algebra).
- Regenerate the talk map (see next subsection). This produces a separate commit when the map output changes.
- Pause and ask before git push origin main — push both commits together. Never push without explicit OK. GitHub Pages rebuilds in ~30s.
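A hedged sketch of the frontmatter write, assuming a row dict keyed by the talks.csv columns; write_talk_md is illustrative, and the authoritative field order and quoting are whatever markdown_generator/talks.py and the existing _talks/*.md files show:

```python
def write_talk_md(row: dict) -> str:
    """Frontmatter in the talks.py field order; values are not HTML-escaped, unlike publications."""
    stem = f"{row['date']}-{row['url_slug']}"
    lines = ["---",
             f'title: "{row["title"]}"',
             "collection: talks",
             f'type: "{row["type"]}"',
             f"permalink: /talks/{stem}"]
    if row.get("venue"):
        lines.append(f'venue: "{row["venue"]}"')
    lines.append(f"date: {row['date']}")
    if row.get("location"):
        lines.append(f'location: "{row["location"]}"')
    lines.append("---")
    body = ""
    if row.get("talk_url"):
        body += f"\n[More information here]({row['talk_url']})\n"
    if row.get("description"):
        body += f"\n{row['description']}\n"
    path = f"_talks/{stem}.md"
    with open(path, "w") as f:
        f.write("\n".join(lines) + "\n" + body)
    return path
```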
Regenerate the talk map
The home page and /talks/ link to a Leaflet cluster map at talkmap/map.html, generated from the location: field of every _talks/*.md. Each new talk’s location needs to be added to talkmap/org-locations.js so its pin appears.
Canonical source is _talks/talkmap.ipynb. Do NOT use the root-level talkmap.ipynb or talkmap.py — both are stale upstream copies (the .py calls Nominatim() with no user_agent which breaks on geopy >= 2.0; the root .ipynb predates the rewrite).
The current notebook is cache-then-incremental: it parses the existing org-locations.js, scans _talks/*.md for location: fields, and geocodes only locations not already cached. It uses Komoot’s Photon geocoder (free, no key, lenient rate limits) — Nominatim was abandoned because it bans us within seconds when we re-geocode every location on every run (which the original notebook did).
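A simplified sketch of that incremental pass, assuming geopy's Photon geocoder; parsing and rewriting org-locations.js is deliberately left out because its exact format is defined by the notebook, which remains canonical:

```python
import re
from pathlib import Path

from geopy.geocoders import Photon

def missing_locations(cached: set) -> list:
    """Collect location: values from _talks/*.md that are not in the cache yet."""
    found = set()
    for md in Path("_talks").glob("*.md"):
        for line in md.read_text().splitlines():
            if (m := re.match(r'\s*location:\s*"?(.+?)"?\s*$', line)):
                found.add(m.group(1))
    return sorted(found - cached)

geocoder = Photon(user_agent="talkmap", timeout=10)
for loc in missing_locations(cached=set()):   # the real cache comes from org-locations.js
    result = geocoder.geocode(loc)
    if result:
        print(loc, "->", result.latitude, result.longitude)
```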
To run it, either open it in Jupyter or execute non-interactively from the repo root:
/Users/papagian/opt/anaconda3/bin/jupyter nbconvert --to notebook --execute --inplace _talks/talkmap.ipynb
The notebook is idempotent — running it with no new talks reports Missing: [] and rewrites org-locations.js with the same content (no diff). After running, check git diff talkmap/:
- If the diff is just a few + lines for the new talk’s location → expected, commit as updated talkmap.
- If the diff shows many - lines → something went wrong, revert with git checkout -- talkmap/ and investigate.
- If git status talkmap/ shows nothing → the new talk’s location was already in the cache (e.g., another talk had the same city). Skip the talkmap commit.
Caveats:
- Dependencies: getorg, geopy (Photon is in geopy), and jupyter. Anaconda has all of these by default. If you get ModuleNotFoundError from a different Python, run python3 -m pip install --user getorg geopy jupyter.
- Wrong-interpreter trap on macOS: pip and python3 often point at different installs (anaconda vs system). Use the full paths /Users/papagian/opt/anaconda3/bin/jupyter and /Users/papagian/opt/anaconda3/bin/python to be safe, or use python3 -m pip ... for installs.
- Photon quirks: the geocoded address field may name a nearby business rather than the city center (e.g., “Coventry, UK” resolved to “FEV UK Ltd, Cheetah Road, Coventry”). The lat/lon coordinates are still correct for the city, which is all the cluster map needs. Don’t worry about the verbose address string.
- If Photon also fails (rare — happens during outages): manually geocode via the Photon web UI at https://photon.komoot.io, copy the lat/lon, and append a new entry to org-locations.js by hand.
- map.html is static and never regenerated by the new notebook. Only org-locations.js changes per talk.
After the script runs, check the diff:
git status talkmap/ && git diff --stat talkmap/
If map.html or org-locations.js changed, make a separate commit matching the existing house style — git log --oneline talkmap* shows the pattern: updated talkmap, updated talk map, fix talkmap. Use updated talkmap.
If nothing changed (e.g., the new talk’s location was already represented), skip the talkmap commit silently.
Don’t do
- Don’t write files before the user confirms the derived metadata. Talk extraction is fuzzy — confirm first.
- Don’t run talks.py or any bulk regeneration script.
- Don’t run talkmap.py or the root talkmap.ipynb — both are stale. Use _talks/talkmap.ipynb.
- Don’t switch back to Nominatim “to match the original data”. The cache preserves Nominatim’s original results; only new locations use Photon. Mixing is fine — both report WGS84 lat/lon to enough precision for a cluster pin.
- Don’t modify talks.tsv. It is not the source of truth.
- Don’t invent a talk_url — if the user gave neither a slides filename nor a video/event URL, leave the column blank and skip the “More information here” line in the markdown body.
- Don’t skip talkmap regeneration silently when the script fails — surface the error so the user knows the map is stale.
