similarity
Phonetic similarity scoring with three complementary algorithms.
The scoring functions (ppc, pld, lcs, and composite) return normalized scores between 0.0 (completely different) and 1.0 (identical). Pass raw=True to any of them to get a (score, details) tuple with intermediate computation data.
phonemenal.similarity
Three phonetic similarity algorithms, all normalized to 0.0–1.0.
- PPC-A (Positional Phoneme Correlation — Absolute): Measures overlap of positional phoneme patterns between two words. Based on building forward and reverse phoneme combinations with positional padding.
- PLD (Phoneme Levenshtein Distance): Syllable-level edit distance using rapidfuzz. Treats each syllable as an atomic unit so distance reflects how many whole syllables differ.
- LCS (Longest Common Subsequence): Ratio-based scoring on phonetic keys or raw phoneme sequences.
Composite scoring combines all three with configurable weights.
ppc(word1: str, word2: str, *, raw: bool = False) -> float | tuple[float, dict]
Positional Phoneme Correlation — Absolute (PPC-A).
Builds positional phoneme combinations by traversing forward and reverse directions with padding, then measures set intersection.
Returns a normalized score in the range 0.0–1.0 (higher = more similar). If raw=True, returns (score, details_dict) with intermediate values.
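The description above can be read in more than one way. The following is a minimal sketch of one possible reading, not the library's actual implementation. It assumes that a "combination" is a pair of adjacent phonemes tagged with its position and direction, that a single padding symbol is added at each end, and that the score is the intersection size divided by the larger set. The names `PAD`, `positional_combos`, and `ppc_sketch` are hypothetical, and the phonemes are written without stress markers.

```python
PAD = "_"  # hypothetical padding symbol, not taken from the library


def positional_combos(phonemes, direction):
    """Return position-tagged pairs of adjacent phonemes for one direction."""
    seq = list(phonemes) if direction == "fwd" else list(reversed(phonemes))
    padded = [PAD] + seq + [PAD]
    return {
        (direction, i, padded[i], padded[i + 1])
        for i in range(len(padded) - 1)
    }


def ppc_sketch(phonemes1, phonemes2):
    """Share of positional combinations the two words have in common."""
    a = positional_combos(phonemes1, "fwd") | positional_combos(phonemes1, "rev")
    b = positional_combos(phonemes2, "fwd") | positional_combos(phonemes2, "rev")
    # Assumed normalization: divide by the larger of the two sets.
    return len(a & b) / max(len(a), len(b))


cat = ["K", "AE", "T"]
bat = ["B", "AE", "T"]
dog = ["D", "AO", "G"]
print(ppc_sketch(cat, cat))  # 1.0
print(ppc_sketch(cat, bat))  # 0.5
print(ppc_sketch(cat, dog))  # 0.0
```

Tagging each pair with its index is what makes the measure positional: a shared phoneme pair only counts when it occurs at the same offset from the start (forward) or from the end (reverse).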
pld(word1: str, word2: str, *, raw: bool = False) -> float | tuple[float, dict]
Phoneme Levenshtein Distance at syllable level.
Each syllable is treated as an atomic unit (tuple of phonemes). Distance is computed between syllable sequences, then normalized to 0.0–1.0 where 1.0 = identical and 0.0 = maximally different.
If raw=True, returns (score, details_dict).
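To illustrate what "syllable as an atomic unit" means, here is a small sketch under stated assumptions. The library is documented as using rapidfuzz; this version substitutes a plain-Python Levenshtein routine so it runs without dependencies. The syllable splits in the example and the normalization by the longer sequence are assumptions, and `levenshtein` and `pld_sketch` are hypothetical names.

```python
def levenshtein(a, b):
    """Edit distance between two sequences of hashable items."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def pld_sketch(syllables1, syllables2):
    """Similarity in 0.0-1.0 from a syllable-level edit distance."""
    longest = max(len(syllables1), len(syllables2))
    if longest == 0:
        return 1.0  # two empty sequences are treated as identical
    # Assumed normalization: distance divided by the longer sequence.
    return 1.0 - levenshtein(syllables1, syllables2) / longest


# Each syllable is a tuple of phonemes; the splits here are illustrative.
window = [("W", "IH", "N"), ("D", "OW")]
winter = [("W", "IH", "N"), ("T", "ER")]
print(pld_sketch(window, winter))  # 0.5
```

Because whole tuples are compared, the second syllables count as one substitution even though they differ in two phonemes.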
lcs(word1: str, word2: str, *, use_phonemes: bool = True, raw: bool = False) -> float | tuple[float, dict]
Longest Common Subsequence ratio.
When use_phonemes=True (default), compares phoneme sequences looked up in the CMU Pronouncing Dictionary. When use_phonemes=False, compares raw character strings (useful as a fallback for words not in the dictionary).
Returns a score in the range 0.0–1.0, where 1.0 means identical sequences. If raw=True, returns (score, details_dict).
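As a rough illustration of a ratio-based LCS score, the sketch below computes the LCS length with dynamic programming and normalizes it as 2 * LCS / (len1 + len2). That formula is an assumption; the documentation does not state which ratio the library uses. `lcs_length` and `lcs_ratio` are hypothetical names. The same code accepts phoneme lists or plain strings, mirroring the two modes of use_phonemes.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two sequences."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_ratio(a, b):
    """Assumed normalization: twice the LCS length over the combined length."""
    total = len(a) + len(b)
    if total == 0:
        return 1.0  # two empty sequences are treated as identical
    return 2 * lcs_length(a, b) / total


# Phoneme sequences (stress markers omitted for brevity).
print(lcs_ratio(["K", "AE", "T"], ["B", "AE", "T"]))
# Character fallback for words missing from the dictionary.
print(lcs_ratio("cat", "cat"))  # 1.0
```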
composite(word1: str, word2: str, *, weights: tuple[float, float, float] = (1.0, 1.0, 1.0), raw: bool = False) -> float | tuple[float, dict]
Weighted composite of PPC-A, PLD, and LCS scores.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| word1 | str | First word to compare. | required |
| word2 | str | Second word to compare. | required |
| weights | tuple[float, float, float] | (ppc_weight, pld_weight, lcs_weight). Default equal weighting. | (1.0, 1.0, 1.0) |
| raw | bool | If True, return (composite_score, details_dict). | False |
Returns:
| Type | Description |
|---|---|
| float \| tuple[float, dict] | Composite similarity score between 0.0 and 1.0, or (composite_score, details_dict) if raw=True. |
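A weighted composite that stays within 0.0–1.0 is most naturally a weighted average. The sketch below assumes the three scores are combined that way, with the weighted sum divided by the sum of the weights; the library's exact formula is not shown in this page. `composite_sketch` is a hypothetical name and takes precomputed scores rather than words.

```python
def composite_sketch(scores, weights=(1.0, 1.0, 1.0)):
    """Weighted average of (ppc, pld, lcs) scores, each in 0.0-1.0."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(s * w for s, w in zip(scores, weights)) / total


# Equal weighting is a plain mean of the three scores.
print(composite_sketch((0.6, 0.3, 0.9)))
# Doubling the first weight pulls the result toward the PPC-A score.
print(composite_sketch((1.0, 0.0, 0.0), weights=(2.0, 1.0, 1.0)))  # 0.5
```

Under this assumption only the ratio between weights matters: (2.0, 1.0, 1.0) and (4.0, 2.0, 2.0) give the same result.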
compare(word1: str, word2: str, *, weights: Optional[tuple[float, float, float]] = None) -> dict
Full comparison report between two words.
Returns a dict with all three individual scores, the composite score, and pronunciation details.