Flor y Canto Nahuatl exists to establish Eastern Huasteca Nahuatl as a teachable spoken standard, Modern Standard Nahuatl as a respected written norm, and a living poetic high register that reconnects Nahuatl speech, literature, song, and public life.
What this is · Why it matters · What you can do with it
The largest freely available structured dataset of Classical Nahuatl in existence — paired with a formal framework for standardizing the language across speech, writing, and literature, and an open instructional track with 32 bilingual lessons mapped from beginner to advanced. Everything is open. Everything is free. Here's where to start depending on who you are.
28,709 parsed entries from Siméon's 1885 dictionary in structured JSON. 8,465 Wiktionary lexical rows across four Nahuatl varieties. Two UD treebanks. 55,904 classical examples. Provenance-tracked, license-tagged, queryable. Use it for NLP, typology, historical linguistics, or computational work.
GitHub repository →
A three-layer framework that respects spoken Nahuatl as the foundation, establishes a clear written standard, and creates space for poetry, song, and literature. Governance documents define how the language is handled across registers, so the standard serves speakers, not the other way around.
Read the governance documents →
Structured JSON and JSONL ready for ingestion. CSV exports for quick analysis. A provenance pipeline you can fork. Build a dictionary app, a learning tool, a search engine, or a language model: the data is released under CC BY-SA 3.0 / GFDL, and public-domain sources remain public domain. No paywall. No API key. Just download it.
Browse the data →
An open instructional track built from Nāhuatlahtolli: 32 canonical lessons in English and Spanish, proficiency-mapped from A1 to B2, with vocabulary prioritization, dialogue extraction, assessment items, and product bundles. Built for reuse: fork it, adapt it, teach with it.
See the curriculum →
Speech · Writing · Literature
Eastern Huasteca Nahuatl. The living spoken base drawn from community speech. All pronunciation, phonology, and conversational register grounded here.
Modern Standard Nahuatl. The neutral written reference norm for education, publishing, governance, and formal communication. Clear, consistent, teachable.
The elevated literary register for poetry, song, ceremony, and public oratory. Classical resonance with modern clarity. Where the language creates, not just communicates.
Open · Structured · Provenance-tracked
The first machine-readable dataset of Classical Nahuatl. Parsed from Siméon's 1885 dictionary, Wiktionary entries for four varieties (Classical, Central, Eastern Huasteca, Highland Puebla), and two Universal Dependencies treebanks. Every entry carries provenance, license tracking, and source confidence scoring.
All data is free and open under CC BY-SA 3.0 / GFDL. Public-domain sources remain public domain.
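As a rough illustration of working with the JSONL exports, the sketch below reads one record per line and counts entries per variety. The field names (`word`, `lang`, `pos`, `license`) and the three sample rows are assumptions made up for this example; check the actual files for their real schema before relying on any of them.

```python
import io
import json
from collections import Counter

# Hypothetical rows standing in for a downloaded JSONL file.
# Field names are assumptions, not the project's documented schema.
SAMPLE_JSONL = """\
{"word": "atl", "lang": "Classical Nahuatl", "pos": "noun", "license": "CC BY-SA 3.0"}
{"word": "calli", "lang": "Classical Nahuatl", "pos": "noun", "license": "CC BY-SA 3.0"}
{"word": "atl", "lang": "Eastern Huasteca Nahuatl", "pos": "noun", "license": "CC BY-SA 3.0"}
"""


def read_jsonl(stream):
    """Yield one dict per non-empty line of a JSONL stream."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


def count_by_variety(records, key="lang"):
    """Count records per value of `key`, using 'unknown' when it is missing."""
    return Counter(rec.get(key, "unknown") for rec in records)


counts = count_by_variety(read_jsonl(io.StringIO(SAMPLE_JSONL)))
for variety, n in sorted(counts.items()):
    print(f"{variety}: {n}")
```

With a real download, `io.StringIO(SAMPLE_JSONL)` would be replaced by an open file handle, for example `open(path, encoding="utf-8")`.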
Open · Bilingual · Proficiency-Mapped
A complete open instructional pipeline built from Nāhuatlahtolli — the University of Texas COERLL course for Eastern Huasteca Nahuatl. 32 canonical lessons in English and Spanish, cleaned, deduplicated, and organized into proficiency bands from A1 through a B2 bridge.
The pipeline includes lesson normalization, unit assembly, vocabulary prioritization, dialogue extraction, assessment generation, product bundling, and a spoken EHN primer foundation. Every stage produces a SQLite database that builds on the last — fork any stage and extend it.
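One way to picture "every stage produces a SQLite database that builds on the last" is the sketch below: each stage copies the previous stage's database and then adds its own tables. This is not the project's actual pipeline code. The helper `run_stage`, the stage functions, and the table and column names are all hypothetical, and in-memory databases stand in for the real files.

```python
import sqlite3


def run_stage(previous, build):
    """Copy the previous stage's database (if any), then apply this stage."""
    current = sqlite3.connect(":memory:")
    if previous is not None:
        previous.backup(current)  # copies every table into the new database
    build(current)
    current.commit()
    return current


def normalize_lessons(db):
    # Hypothetical first stage: one row per cleaned lesson.
    db.execute("CREATE TABLE lessons (id INTEGER PRIMARY KEY, title TEXT)")
    db.executemany(
        "INSERT INTO lessons (title) VALUES (?)",
        [("Greetings",), ("Numbers",)],
    )


def assemble_units(db):
    # Hypothetical second stage: derives units from the lessons table.
    db.execute(
        "CREATE TABLE units ("
        "id INTEGER PRIMARY KEY, "
        "lesson_id INTEGER REFERENCES lessons(id), "
        "band TEXT)"
    )
    db.execute("INSERT INTO units (lesson_id, band) SELECT id, 'A1' FROM lessons")


stage1 = run_stage(None, normalize_lessons)
stage2 = run_stage(stage1, assemble_units)

tables = [
    row[0]
    for row in stage2.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]
print(tables)  # ['lessons', 'units']
```

Because each stage works on a copy, forking at any point means taking that stage's database and writing a new `build` function, while the earlier stage's output stays untouched.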
Alphabet, greetings, questions, names, numbers, colors, basic verbs, possessives, diminutives, household vocabulary, food, fields, conditional, non-specific objects.
Professions, past tense (three parts), city navigation, colors and numbers in context. Building toward independent sentence production.
Intransitive and transitive verbs, time division, family, appearance, reflexives, likes/dislikes, imperatives, market language, illness vocabulary, cleansing ceremonies.
Constitutional framework · Version 0.1
Code · Data · Curriculum · Music
Source code, parsers, governance documents, and project infrastructure. Includes fcn_source_parsers.py, fcn_legal_ingest.py, and the full pipeline.
Public data files: Siméon parsed JSON, Kaikki JSONL across four varieties, UD treebanks, classical example bank, and lexical rows.
Original compositions in Nahuatl. Worship, poetry, and song in the tradition of in xochitl in cuicatl. New uploads daily.
28,709 structured entries from Rémi Siméon's 1885 Dictionnaire de la langue nahuatl. The first machine-readable version. 6.2 MB JSON.
The FCN master lexicon in SQLite — all sources merged, register-tagged, provenance-tracked, with orthography candidates and editorial status. 100 MB.
The complete Phase 8 instructional pipeline in SQLite — lessons, dialogues, vocabulary, units, assessments, product bundles, and spoken EHN primer.
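The table layout of the SQLite downloads is not described here, so a reasonable first step is to list each table and its row count before writing queries. The sketch below does that with Python's standard `sqlite3` module. The `entries` table and its columns are invented for the demo and are not the project's schema.

```python
import sqlite3


def summarize_tables(conn):
    """Return {table_name: row_count} for every user table in the database."""
    names = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
            "ORDER BY name"
        )
    ]
    summary = {}
    for name in names:
        # Quote the identifier so unusual table names do not break the query.
        quoted = '"' + name.replace('"', '""') + '"'
        summary[name] = conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]
    return summary


# In-memory stand-in; with a real download, connect to the file path instead.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE entries (headword TEXT, register TEXT)")  # hypothetical
demo.executemany(
    "INSERT INTO entries VALUES (?, ?)",
    [("atl", "spoken"), ("calli", "written")],
)
print(summarize_tables(demo))  # {'entries': 2}
```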