core
Core phoneme infrastructure — CMU dict access, phoneme utilities, and inverted indices.
phonemenal.core
Core phoneme infrastructure — CMU dict loading, inverted pronunciation index, leader/trailer phoneme indexing, and syllable-aware phoneme splitting.
All dict access is lazy-loaded and cached. First call triggers NLTK's CMU dict download if not already present.
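The lazy-load-and-cache behavior described above can be illustrated with a minimal sketch. It assumes a `functools.lru_cache` pattern and a small hand-written dictionary; `load_toy_dict` and `LOAD_COUNT` are hypothetical names, and the library's actual caching mechanism may differ.

```python
from functools import lru_cache

LOAD_COUNT = 0  # counts how many times the expensive load actually runs


@lru_cache(maxsize=None)
def load_toy_dict() -> dict[str, list[list[str]]]:
    """Hypothetical stand-in for an expensive dictionary load."""
    global LOAD_COUNT
    LOAD_COUNT += 1
    # Illustrative entries only, shaped like {word: [[phoneme, ...], ...]}.
    return {
        "two": [["T", "UW1"]],
        "too": [["T", "UW1"]],
        "read": [["R", "EH1", "D"], ["R", "IY1", "D"]],
    }


# Nothing is loaded until the first call; later calls reuse the cached object.
first = load_toy_dict()
second = load_toy_dict()
```

Because the cached function returns the same object each time, callers should treat the result as read-only.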
get_dict() -> dict[str, list[list[str]]]
cached
Return the CMU pronouncing dictionary as {word: [[phoneme, ...], ...]}.
get_entries() -> list[tuple[str, list[str]]]
cached
Return CMU dict entries as [(word, [phoneme, ...]), ...].
get_inverted() -> dict[tuple[str, ...], set[str]]
cached
Build inverted pronunciation index: phoneme tuple → set of words.
This is the core data structure for finding exact homophones: given a pronunciation, look up all words that share it.
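To make the inverted index concrete, here is a minimal sketch that builds one from a small hand-written dictionary. The `toy_dict` entries and the `build_inverted` helper are illustrative assumptions, not the library's implementation.

```python
from collections import defaultdict

# Illustrative entries only, shaped like {word: [[phoneme, ...], ...]}.
toy_dict = {
    "two": [["T", "UW1"]],
    "too": [["T", "UW1"]],
    "to": [["T", "UW1"], ["T", "AH0"]],
    "cat": [["K", "AE1", "T"]],
}


def build_inverted(pron_dict):
    """Map each pronunciation (as a hashable tuple) to the words that share it."""
    inverted = defaultdict(set)
    for word, pronunciations in pron_dict.items():
        for pron in pronunciations:
            # Lists are not hashable, so each pronunciation becomes a tuple key.
            inverted[tuple(pron)].add(word)
    return dict(inverted)


inverted = build_inverted(toy_dict)
homophones = inverted[("T", "UW1")]
```

A word with several pronunciations appears under each of them, which is why "to" lands in two different sets here.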
get_leader_trailer() -> tuple[dict[str, set[str]], dict[str, set[str]]]
cached
Build leader/trailer phoneme indices for fast candidate filtering.
Returns (leader_index, trailer_index) where:
- leader_index: first phoneme → set of words starting with that phoneme
- trailer_index: last phoneme → set of words ending with that phoneme
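The two indices can be sketched as follows. This is a minimal illustration on hand-written data; `build_leader_trailer` is a hypothetical helper, and it assumes the indices are keyed on phonemes exactly as they appear in each pronunciation.

```python
from collections import defaultdict

# Illustrative entries only, shaped like {word: [[phoneme, ...], ...]}.
toy_dict = {
    "cat": [["K", "AE1", "T"]],
    "kit": [["K", "IH1", "T"]],
    "dog": [["D", "AO1", "G"]],
}


def build_leader_trailer(pron_dict):
    """Index words by the first and by the last phoneme of each pronunciation."""
    leader = defaultdict(set)
    trailer = defaultdict(set)
    for word, pronunciations in pron_dict.items():
        for pron in pronunciations:
            if not pron:
                continue  # skip empty pronunciations defensively
            leader[pron[0]].add(word)
            trailer[pron[-1]].add(word)
    return dict(leader), dict(trailer)


leader_index, trailer_index = build_leader_trailer(toy_dict)

# Intersecting the two sets narrows candidates before any costlier comparison.
candidates = leader_index["K"] & trailer_index["T"]
```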
get_phonemes(word: str) -> list[list[str]]
Get all pronunciations for a word from CMU dict.
Returns list of pronunciations (each a list of phonemes), or empty list if the word is not in the dictionary.
strip_stress(phoneme: str) -> str
is_vowel(phoneme: str) -> bool
split_phonemes_by_syllables(phonemes: list[str]) -> list[list[str]]
Split a phoneme list into syllable groups.
Syllable boundaries are marked by vowels, which in CMU notation are the phonemes ending in a stress digit (0 for unstressed, 1 for primary stress, 2 for secondary stress). Each syllable contains its onset consonants followed by the vowel. Consonants after the last vowel are appended to the final syllable.
Example
['R', 'IY0', 'T', 'R', 'AE1', 'K', 'SH', 'AH0', 'N'] → [['R', 'IY0'], ['T', 'R', 'AE1'], ['K', 'SH', 'AH0', 'N']]
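The splitting rule can be sketched in a few lines. This is an illustration of the rule as described, not the library's source; `split_by_syllables` is a hypothetical name, and the handling of an input with no vowels at all (returned as a single group) is an assumption.

```python
def split_by_syllables(phonemes: list[str]) -> list[list[str]]:
    """Close a syllable at every vowel (a phoneme ending in a stress digit)."""
    syllables: list[list[str]] = []
    current: list[str] = []
    for phoneme in phonemes:
        current.append(phoneme)
        if phoneme and phoneme[-1].isdigit():
            # A vowel ends the syllable: onset consonants plus the vowel.
            syllables.append(current)
            current = []
    if current:
        if syllables:
            # Trailing consonants join the final syllable.
            syllables[-1].extend(current)
        else:
            # Assumption: an input with no vowel is kept as one group.
            syllables.append(current)
    return syllables


result = split_by_syllables(["R", "IY0", "T", "R", "AE1", "K", "SH", "AH0", "N"])
```

Under this rule the number of syllables equals the number of vowels in the input, as long as at least one vowel is present.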
syllable_count(phonemes: list[str]) -> int
phonemes_to_str(phonemes: list[str]) -> str
find_words_by_pronunciation(phonemes: list[str] | tuple[str, ...]) -> set[str]
Find all words with the exact given pronunciation.
find_words_by_leader(phoneme: str) -> set[str]
Find all words whose pronunciation starts with the given phoneme.
find_words_by_trailer(phoneme: str) -> set[str]
Find all words whose pronunciation ends with the given phoneme.
normalize_name(name: str) -> str
Normalize a name for phonetic comparison.
Strips separators (hyphens, underscores, dots) and lowercases.
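A minimal sketch of this normalization, assuming only the three separators named above are removed; `normalize` is a hypothetical stand-in, and the library may treat other characters (such as whitespace or digits) differently.

```python
def normalize(name: str) -> str:
    """Remove hyphens, underscores, and dots, then lowercase the result."""
    for separator in "-_.":
        name = name.replace(separator, "")
    return name.lower()


normalized = normalize("Py-Torch_Lightning.io")
```

Names that differ only in separators or letter case collapse to the same string, so they can be compared phonetically as a single token.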
get_phonemes_or_fallback(word: str) -> Optional[list[list[str]]]
Get phonemes from CMU dict, returning None if not found.
Callers should use fallback.phonetic_key() for words not in the dict.
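The calling pattern this implies can be sketched as below. Every name here is a hypothetical stand-in: `lookup_or_none` mimics the None-returning lookup on toy data, and `toy_key` is a placeholder that does not reproduce the algorithm or signature of `fallback.phonetic_key()`.

```python
from typing import Optional

# Illustrative entry only, shaped like {word: [[phoneme, ...], ...]}.
TOY_DICT = {"cat": [["K", "AE1", "T"]]}


def lookup_or_none(word: str) -> Optional[list[list[str]]]:
    """Return the pronunciations for a word, or None when it is missing."""
    return TOY_DICT.get(word.lower())


def toy_key(word: str) -> str:
    """Placeholder fallback key (drops vowel letters); NOT the library's algorithm."""
    return "".join(ch for ch in word.lower() if ch not in "aeiou")


def describe(word: str):
    """Prefer dictionary phonemes; switch to the fallback key on None."""
    pronunciations = lookup_or_none(word)
    if pronunciations is None:
        return ("fallback", toy_key(word))
    return ("dict", pronunciations)


known = describe("cat")
unknown = describe("numpy")
```

Testing explicitly for None keeps the two paths distinct, in contrast to `get_phonemes`, which signals a missing word with an empty list.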