
homophones

Exact and near-homophone discovery using the CMU Pronouncing Dictionary.

phonemenal.homophones

Exact and near-homophone discovery via CMU pronunciation dictionary inversion.

  • find(): exact homophones (identical pronunciation)
  • find_similar(): near-homophones above a similarity threshold
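"Dictionary inversion" here means turning a word-to-pronunciation mapping into a pronunciation-to-words mapping, so that every word sharing a phoneme sequence can be looked up in one step. The following is a minimal sketch of that idea, not the module's actual implementation: `TOY_DICT` and `invert` are hypothetical names, and the tiny table only imitates CMU-style ARPAbet entries.

```python
from collections import defaultdict

# Hypothetical toy table imitating CMU-style entries:
# word -> list of pronunciations, each a list of ARPAbet phonemes.
TOY_DICT = {
    "knight": [["N", "AY1", "T"]],
    "night": [["N", "AY1", "T"]],
    "note": [["N", "OW1", "T"]],
}


def invert(pron_dict):
    """Map each pronunciation (as a tuple) to the sorted words that share it."""
    inverse = defaultdict(set)
    for word, prons in pron_dict.items():
        for pron in prons:
            # Lists are unhashable, so use a tuple as the dictionary key.
            inverse[tuple(pron)].add(word)
    return {pron: sorted(words) for pron, words in inverse.items()}


INVERSE = invert(TOY_DICT)
print(INVERSE[("N", "AY1", "T")])  # ['knight', 'night']
print(INVERSE[("N", "OW1", "T")])  # ['note']
```

Building the inverse index once makes each homophone lookup a single dictionary access instead of a scan over every entry.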

find(word: str, *, include_self: bool = False) -> list[str]

Find exact homophones — words with identical pronunciation.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| word | str | Word to look up (case-insensitive). | required |
| include_self | bool | If True, include the input word in results. | False |

Returns a list of homophones, sorted alphabetically. The list is empty if the word has no known pronunciation.

Source code in phonemenal/homophones.py
def find(word: str, *, include_self: bool = False) -> list[str]:
    """Find exact homophones — words with identical pronunciation.

    Args:
        word: Word to look up (case-insensitive).
        include_self: If True, include the input word in results.

    Returns list of homophones sorted alphabetically.
    """
    pronunciations = get_phonemes(word)
    if not pronunciations:
        return []

    homophones: set[str] = set()
    for pron in pronunciations:
        matches = find_words_by_pronunciation(pron)
        homophones.update(matches)

    if not include_self:
        homophones.discard(word.lower())

    return sorted(homophones)
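The function above takes the union of matches over every pronunciation of the input, so a word with several pronunciations can pick up homophones from each of them. A small sketch of that behaviour follows. It assumes toy stand-ins (`toy_get_phonemes`, `toy_find_words_by_pronunciation`, `toy_find`, and the `TOY_PRONS` table are all hypothetical) in place of the real helpers, whose implementations are not shown on this page.

```python
# Hypothetical toy data: "read" has two pronunciations.
TOY_PRONS = {
    "read": [("R", "IY1", "D"), ("R", "EH1", "D")],
    "reed": [("R", "IY1", "D")],
    "red": [("R", "EH1", "D")],
    "rod": [("R", "AA1", "D")],
}


def toy_get_phonemes(word):
    """Stand-in for get_phonemes: all pronunciations, or [] if unknown."""
    return TOY_PRONS.get(word.lower(), [])


def toy_find_words_by_pronunciation(pron):
    """Stand-in for find_words_by_pronunciation: words having this pronunciation."""
    return [w for w, prons in TOY_PRONS.items() if pron in prons]


def toy_find(word, *, include_self=False):
    """Same control flow as find(), run against the toy table."""
    homophones = set()
    for pron in toy_get_phonemes(word):
        homophones.update(toy_find_words_by_pronunciation(pron))
    if not include_self:
        homophones.discard(word.lower())
    return sorted(homophones)


print(toy_find("Read"))                     # ['red', 'reed']
print(toy_find("read", include_self=True))  # ['read', 'red', 'reed']
print(toy_find("xyzzy"))                    # []
```

Note that "red" and "reed" both appear for "read" even though they are not homophones of each other, because each matches a different pronunciation of the input.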

find_similar(word: str, *, candidates: list[str] | None = None, min_score: float = 0.75, weights: tuple[float, float, float] = (1.0, 1.0, 1.0), max_results: int = 20) -> list[dict]

Find near-homophones above a similarity threshold.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| word | str | Word to compare against. | required |
| candidates | list[str] \| None | Words to score against `word`. If None, only exact homophones are returned, each with a score of 1.0; pass a candidate list for broader scanning. | None |
| min_score | float | Minimum composite similarity score (0.0–1.0). Applied only when candidates is given. | 0.75 |
| weights | tuple[float, float, float] | (ppc_weight, pld_weight, lcs_weight) for composite scoring. | (1.0, 1.0, 1.0) |
| max_results | int | Maximum number of results to return. | 20 |

Returns a list of dicts with 'word', 'score', and 'is_exact_homophone' keys, sorted by score in descending order and truncated to max_results.

Source code in phonemenal/homophones.py
def find_similar(
    word: str,
    *,
    candidates: list[str] | None = None,
    min_score: float = 0.75,
    weights: tuple[float, float, float] = (1.0, 1.0, 1.0),
    max_results: int = 20,
) -> list[dict]:
    """Find near-homophones above a similarity threshold.

    Args:
        word: Word to compare against.
        candidates: List of words to check. If None, checks exact homophones
                   only (use with a candidate list for broader scanning).
        min_score: Minimum composite similarity score (0.0–1.0).
        weights: (ppc_weight, pld_weight, lcs_weight) for composite scoring.
        max_results: Maximum results to return.

    Returns list of dicts with 'word', 'score', and 'is_exact_homophone' keys,
    sorted by score descending.
    """
    results: list[dict] = []
    exact = set(find(word))

    if candidates is None:
        # Without candidates, just return exact homophones with score 1.0
        for h in sorted(exact):
            results.append({"word": h, "score": 1.0, "is_exact_homophone": True})
        return results[:max_results]

    word_lower = word.lower()

    for candidate in candidates:
        if normalize_name(candidate) == normalize_name(word_lower):
            continue

        score = composite(word, candidate, weights=weights)
        if score >= min_score:
            results.append(
                {
                    "word": candidate,
                    "score": round(score, 4),
                    "is_exact_homophone": candidate.lower() in exact,
                }
            )

    results.sort(key=lambda r: r["score"], reverse=True)
    return results[:max_results]
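The candidate branch above follows a filter, sort, truncate pattern: score each candidate, keep those at or above `min_score`, order by score, and cut to `max_results`. The sketch below reproduces that pattern on toy data. It is only an illustration: `toy_score` uses `difflib.SequenceMatcher` over phoneme tuples as a stand-in, since the library's `composite` scorer is not shown here, and `TOY_PRONS` and `toy_find_similar` are hypothetical names.

```python
from difflib import SequenceMatcher

# Hypothetical toy table: one pronunciation per word.
TOY_PRONS = {
    "night": ("N", "AY1", "T"),
    "knight": ("N", "AY1", "T"),
    "might": ("M", "AY1", "T"),
    "note": ("N", "OW1", "T"),
    "banana": ("B", "AH0", "N", "AE1", "N", "AH0"),
}


def toy_score(a, b):
    """Stand-in similarity (NOT the library's composite): matcher ratio over phonemes."""
    return SequenceMatcher(None, TOY_PRONS[a], TOY_PRONS[b], autojunk=False).ratio()


def toy_find_similar(word, candidates, *, min_score=0.75, max_results=20):
    """Filter by threshold, sort by score descending, truncate."""
    target = TOY_PRONS[word]
    results = []
    for cand in candidates:
        if cand == word:
            continue  # skip the query word itself
        score = toy_score(word, cand)
        if score >= min_score:
            results.append(
                {
                    "word": cand,
                    "score": round(score, 4),
                    "is_exact_homophone": TOY_PRONS[cand] == target,
                }
            )
    # list.sort is stable, so tied scores keep their candidate order.
    results.sort(key=lambda r: r["score"], reverse=True)
    return results[:max_results]


cands = ["might", "note", "knight", "banana", "night"]
print(toy_find_similar("night", cands))
# [{'word': 'knight', 'score': 1.0, 'is_exact_homophone': True}]
print([r["word"] for r in toy_find_similar("night", cands, min_score=0.6)])
# ['knight', 'might', 'note']
```

Lowering `min_score` widens the net: with the stand-in scorer, "might" and "note" each share two of three phonemes with "night" and score 0.6667, so they appear only once the threshold drops below that value.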