Variant Generation

The variants and splitting modules generate sound-alike names proactively — useful for exploring the phonetic neighborhood of any word. Applications include typosquatting detection, brand-name screening, domain squatting analysis, and search-term expansion.

Phonetic Variants

The variants.generate() function applies phonetic substitution rules to produce names that sound like the input but are spelled differently:

from phonemenal import variants

variants.generate("flask")
# → {"phlask", "flazk", "flasc", ...}

variants.generate("click")
# → {"clik", "klick", "klik", ...}

Substitution Rules

phonemenal applies 22 bidirectional substitution patterns:

Pattern     Example
ph ↔ f      flask → phlask
ck ↔ k      click → clik
x ↔ ks      flux → fluks
s ↔ z       flask → flazk
i ↔ y       click → clyck
qu ↔ kw     quest → kwest
ch ↔ c      rich → ric
sh ↔ s      bash → bas
th ↔ t      math → mat
er ↔ or     server → servor
le ↔ el     bottle → bottel
ai ↔ ay     train → trayn
And more...

Each substitution is applied in both directions, and double letters are toggled (e.g., l → ll, ll → l).
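
To make the mechanics concrete, here is a minimal sketch of one round of bidirectional substitution plus double-letter toggling. It is not phonemenal's implementation: `RULES` holds only a handful of pairs from the table above, and `one_step_variants` is a hypothetical helper name chosen for this example.

```python
# Illustrative sketch only; not phonemenal's actual implementation.
# RULES is a small subset of the pairs listed in the table above.
RULES = [("ph", "f"), ("ck", "k"), ("x", "ks"), ("s", "z"), ("i", "y")]


def one_step_variants(name: str) -> set[str]:
    """Return spellings reachable by exactly one substitution or toggle."""
    results = set()
    for left, right in RULES:
        # Apply each rule in both directions.
        for src, dst in ((left, right), (right, left)):
            start = name.find(src)
            while start != -1:
                # Rewrite only this single occurrence of src.
                results.add(name[:start] + dst + name[start + len(src):])
                start = name.find(src, start + 1)
    for i, ch in enumerate(name):
        if not ch.isalpha():
            continue
        if i + 1 < len(name) and name[i + 1] == ch:
            # Collapse a doubled letter: "ll" -> "l".
            results.add(name[:i] + name[i + 1:])
        elif i == 0 or name[i - 1] != ch:
            # Double a letter that is not already part of a run: "l" -> "ll".
            results.add(name[:i + 1] + ch + name[i + 1:])
    results.discard(name)
    return results
```

Each result differs from the input by a single edit. Feeding the results back through the function would yield multi-edit spellings such as "klik" from "click", at the cost of a rapidly growing candidate set.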

Separator Permutations

By default, variants include separator permutations — hyphens, underscores, and concatenation:

variants.generate("my-package")
# Includes: "my_package", "mypackage", "my-package" variations

# Disable separator permutations
variants.generate("my-package", include_separators=False)
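
As a rough picture of what a separator permutation involves, the sketch below splits a name on hyphens and underscores and rejoins the parts with each joiner. The helper name `separator_permutations` and the `SEPARATORS` tuple are assumptions made for this example; phonemenal's own handling may differ.

```python
import re

# Illustrative sketch only; phonemenal's own separator handling may differ.
SEPARATORS = ("-", "_", "")


def separator_permutations(name: str) -> set[str]:
    """Rejoin the separator-delimited parts of name with each joiner."""
    # Split on hyphens and underscores, dropping empty parts.
    parts = [part for part in re.split(r"[-_]", name) if part]
    return {sep.join(parts) for sep in SEPARATORS}
```

This version applies one joiner uniformly across the whole name, so a three-part name never gets a mix of hyphens and underscores. Covering mixed forms would mean choosing a joiner independently for each gap.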

Morphological Variants

For typosquatting that exploits suffix confusion rather than pronunciation:

from phonemenal import variants

variants.generate_morphological("packaging")
# → {"packaged", "packager", "packages", ...}

These aren't phonetic substitutions. They swap suffixes such as -ing ↔ -ed, -er, -es, which is a common typosquatting vector.
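
The suffix-swapping idea can be sketched in a few lines. The `SUFFIXES` tuple below contains only the four suffixes named above, and `suffix_swaps` is a hypothetical helper, not part of phonemenal.

```python
# Illustrative sketch only; the suffix list is limited to those named above.
SUFFIXES = ("ing", "ed", "er", "es")


def suffix_swaps(name: str) -> set[str]:
    """Replace a recognized trailing suffix with each of the other suffixes."""
    results = set()
    for suffix in SUFFIXES:
        # Require a non-empty stem so a bare suffix is never rewritten.
        if name.endswith(suffix) and len(name) > len(suffix):
            stem = name[: -len(suffix)]
            results.update(stem + other for other in SUFFIXES if other != suffix)
    return results


print(sorted(suffix_swaps("packaging")))
# ['packaged', 'packager', 'packages']
```

Names that end in none of the listed suffixes produce an empty set, so this strategy only adds candidates for inflected package names.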


Compound Word Splitting

Package names are often compound words without separators. The splitting module uses ML-based segmentation to break them apart:

from phonemenal import splitting

splitting.split("bluevoyage")   # → ["blue", "voyage"]
splitting.split("fastapi")      # → ["fast", "api"]
splitting.split("pytorch")      # → ["py", "torch"]
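
For intuition about what segmentation has to do, the sketch below swaps the ML model for a much simpler technique: dynamic programming over a tiny hand-written vocabulary. It is a stand-in for illustration, not how phonemenal splits names, and both `VOCAB` and `split_compound` are assumptions made for this example.

```python
# Toy vocabulary, an assumption for this sketch. A real splitter would rely
# on a learned model or a large frequency-ranked word list.
VOCAB = frozenset({"blue", "voyage", "fast", "api", "py", "torch"})


def split_compound(name: str, vocab: frozenset = VOCAB) -> list[str]:
    """Split name into the fewest vocabulary words, or [name] if impossible."""
    # best[i] holds the shortest known segmentation of name[:i], or None.
    best = [None] * (len(name) + 1)
    best[0] = []
    for end in range(1, len(name) + 1):
        for start in range(end):
            prefix = best[start]
            word = name[start:end]
            if prefix is None or word not in vocab:
                continue
            if best[end] is None or len(prefix) + 1 < len(best[end]):
                best[end] = prefix + [word]
    return best[-1] if best[-1] is not None else [name]
```

A dictionary approach like this fails on any word outside its vocabulary, which is one reason a learned segmenter is the better fit for package names.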

Homophone Permutations

Once a name is split, phonemenal finds homophones for each component and generates all recombinations:

splitting.component_homophones("bluevoyage")
# → {"blue": ["blew", "bleu"], "voyage": ["voyage"]}

splitting.homophone_permutations("bluevoyage")
# → ["bluevoyage", "blewvoyage", "bleuvoyage", ...]

This catches attacks like publishing blewvoyage to target users of bluevoyage.

# Cap the output for names with many homophones
splitting.homophone_permutations("bluevoyage", max_permutations=50)
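
The recombination step is essentially a Cartesian product over the spellings of each component. The sketch below shows that idea with a lazily evaluated cap. The `HOMOPHONES` table and the `recombine` helper are hypothetical, and unlike the output shown earlier, each list here includes the original spelling for simplicity.

```python
from itertools import islice, product

# Hypothetical homophone table for this sketch; phonemenal derives its own.
HOMOPHONES = {"blue": ["blue", "blew", "bleu"], "voyage": ["voyage"]}


def recombine(components, max_permutations=None):
    """Join every combination of per-component spellings, optionally capped."""
    # Components without known homophones contribute only themselves.
    options = [HOMOPHONES.get(part, [part]) for part in components]
    combos = ("".join(choice) for choice in product(*options))
    # islice stops early, so a cap avoids materializing every combination.
    return list(islice(combos, max_permutations))


print(recombine(["blue", "voyage"]))
# ['bluevoyage', 'blewvoyage', 'bleuvoyage']
```

The number of combinations is the product of the per-component list lengths, so names with several homophone-rich components grow quickly. That is the case a cap such as `max_permutations` guards against.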

Combining Approaches

For thorough coverage, combine all generation strategies:

from phonemenal import variants, splitting

name = "bluevoyage"

# Phonetic variants of the whole name
phonetic = variants.generate(name)

# Morphological variants
morphological = variants.generate_morphological(name)

# Homophone permutations of components
permutations = set(splitting.homophone_permutations(name))

# Union of all candidates
all_candidates = phonetic | morphological | permutations

Or let the scanning module handle this automatically with scan_with_reverse():

from phonemenal.scanning import scan_with_reverse

# known_packages and check_existence are supplied by the caller; they are not defined here.
matches = scan_with_reverse(
    candidates=["bluevoyage"],
    known_names=known_packages,
    exists_fn=check_existence,
    include_morphological=True,
)
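
The call above relies on two caller-supplied names. One plausible way to define them for local experimentation is sketched below. The source does not specify what `exists_fn` receives or returns, so the single-name-in, boolean-out signature here is an assumption, and the registry contents are invented for this example.

```python
# Hypothetical stand-ins for the names used in the call above.
# Assumption: exists_fn is called with one name and returns a bool.
known_packages = {"bluevoyage", "flask", "requests"}

# Pretend registry contents; a real check would query a package index.
_REGISTRY = {"bluevoyage", "blewvoyage", "flask"}


def check_existence(name: str) -> bool:
    """Return True if name is already published (here: in the fake set)."""
    return name in _REGISTRY
```

Keeping the existence check behind a plain callable makes it easy to swap the fake set for a network lookup later, or to add caching without touching the scan itself.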