splitting
Compound word splitting and homophone permutation recombination.
phonemenal.splitting
Given a concatenated word like "bluevoyage", the module splits it into components (["blue", "voyage"]), finds homophones for each component, and joins every combination back together ("blewvoyage", "bleuvoyage", etc.).
split(word: str) -> list[str]
Split a compound/concatenated word into component words.
Uses wordninja's probabilistic word segmentation, which is based on English Wikipedia unigram word frequencies.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| word | str | Input string (e.g. "bluevoyage"). | required |
Returns a list of component words (e.g. ["blue", "voyage"]), or [word] if no meaningful split is found.
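To illustrate how frequency-based segmentation can work, here is a minimal sketch that uses dynamic programming over a tiny vocabulary. It is not wordninja's or phonemenal's code: `TOY_WORDS`, the cost function, and `toy_split` are hypothetical names made up for this example, and a real segmenter would rely on a much larger frequency-ranked word list.

```python
import math

# Hypothetical toy vocabulary, ordered from most to least frequent.
TOY_WORDS = ["the", "a", "age", "blue", "voyage", "blew", "voy"]

# Assumed cost model: words further down the list (rarer) cost more.
COST = {w: math.log(i + 2) for i, w in enumerate(TOY_WORDS)}
MAX_LEN = max(len(w) for w in TOY_WORDS)


def toy_split(word: str) -> list[str]:
    """Split `word` into the lowest-cost sequence of known words."""
    n = len(word)
    # best[i] = (total cost, start index of the last word) for word[:i]
    best = [(0.0, 0)] + [(math.inf, 0)] * n
    for end in range(1, n + 1):
        for start in range(max(0, end - MAX_LEN), end):
            piece = word[start:end]
            if piece in COST and best[start][0] + COST[piece] < best[end][0]:
                best[end] = (best[start][0] + COST[piece], start)
    if math.isinf(best[n][0]):
        return [word]  # no segmentation found: return the input unchanged
    # Walk the back-pointers from the end of the word to recover the pieces.
    parts = []
    i = n
    while i > 0:
        start = best[i][1]
        parts.append(word[start:i])
        i = start
    return parts[::-1]


print(toy_split("bluevoyage"))  # ['blue', 'voyage']
print(toy_split("xyz"))         # ['xyz']
```

The fallback to `[word]` mirrors the documented behaviour of `split` when no meaningful split exists.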
component_homophones(word: str) -> dict[str, list[str]]
Find exact homophones for each component of a compound word.
Returns a dict mapping each component to a list of its homophones. A component without homophones maps to [component], a list containing only itself.
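The shape of the returned mapping can be sketched as follows. This is an illustration only: `TOY_HOMOPHONES` and `toy_component_homophones` are hypothetical stand-ins, the real function takes the compound word itself rather than a pre-split list, and whether a component with homophones also appears in its own list is not stated here, so this sketch assumes it does not.

```python
# Hypothetical homophone table; the real library looks these up itself.
TOY_HOMOPHONES = {"blue": ["blew", "bleu"]}


def toy_component_homophones(components: list[str]) -> dict[str, list[str]]:
    """Map each component to its homophones, or to [component] if none."""
    return {c: TOY_HOMOPHONES.get(c, [c]) for c in components}


result = toy_component_homophones(["blue", "voyage"])
print(result)  # {'blue': ['blew', 'bleu'], 'voyage': ['voyage']}
```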
homophone_permutations(word: str, *, include_original: bool = True, max_permutations: int = 100) -> list[str]
Generate all homophone recombinations of a compound word.
Splits the word, finds homophones for each component, and produces all combinations by joining them.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| word | str | Input compound word. | required |
| include_original | bool | Whether to include the original word in results. | True |
| max_permutations | int | Cap on results to avoid combinatorial explosion. | 100 |
Returns a list of recombined strings.
Example
"bluevoyage" → ["bluevoyage", "blewvoyage", "bleuvoyage", ...]
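The recombination step amounts to a Cartesian product over the per-component choices. The sketch below shows one way to do that with `itertools.product`, under stated assumptions: `toy_homophone_permutations` is a hypothetical helper that takes pre-split components and a homophone table as inputs, and the ordering of results and the exact point at which the cap is applied in the real function may differ.

```python
from itertools import islice, product


def toy_homophone_permutations(
    components: list[str],
    homophones: dict[str, list[str]],
    *,
    include_original: bool = True,
    max_permutations: int = 100,
) -> list[str]:
    """Join every combination of component spellings, up to a cap."""
    original = "".join(components)
    # Each component may keep its own spelling or take one of its homophones.
    choices = [
        [c] + [h for h in homophones.get(c, []) if h != c]
        for c in components
    ]
    combos = ("".join(parts) for parts in product(*choices))
    if not include_original:
        combos = (s for s in combos if s != original)
    # islice stops early, so the full product is never materialised.
    return list(islice(combos, max_permutations))


table = {"blue": ["blew", "bleu"]}  # hypothetical homophone data
print(toy_homophone_permutations(["blue", "voyage"], table))
# ['bluevoyage', 'blewvoyage', 'bleuvoyage']
print(toy_homophone_permutations(["blue", "voyage"], table, include_original=False))
# ['blewvoyage', 'bleuvoyage']
```

Because the number of combinations is the product of the choice counts per component, a lazy generator combined with a cap keeps memory use bounded even for words with many components.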