As is common with many fraud-related searches, we are usually looking to attribute risk or synthesize relevant context with loosely written detections that simply detect anomalous behavior. This search will need to be customized to fit your environment. You can improve its fidelity by counting on something much more specific, such as a device ID that may be present in your dataset. It may also be important to consider whether the large number of registrations is occurring from a domain seen for the first time. Extending the search window to look further back in time, or calculating the average per hour or day for each email domain to look for anomalous spikes, will also improve this search. You can also use Shannon entropy or Levenshtein distance (both courtesy of URL Toolbox) to assess the randomness or similarity of the email name or email domain, as the names are often machine-generated. | |
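
To illustrate the two measures mentioned above outside of a search, here is a minimal Python sketch. It is not the URL Toolbox implementation; the function names `shannon_entropy`, `levenshtein`, and `split_email`, and the sample addresses, are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Return the Shannon entropy of text in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    counts = Counter(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def levenshtein(a: str, b: str) -> int:
    """Return the edit distance (insertions, deletions, substitutions)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def split_email(address: str) -> tuple[str, str]:
    """Split an address into (name, domain) at the last '@'."""
    name, _, domain = address.rpartition("@")
    return name, domain


if __name__ == "__main__":
    # Hypothetical sample registrations, for illustration only.
    samples = ["jane.doe@example.com", "xk3q9zv7pl2m@example.com"]
    for address in samples:
        name, domain = split_email(address)
        print(address, round(shannon_entropy(name), 2))
    print(levenshtein("user0001", "user0002"))
```

A higher entropy score for the email name suggests a more random, possibly machine-generated string, while a small edit distance between many names registered from the same domain suggests they were produced from a template. Any threshold for either measure is environment-specific and would need tuning against your own data.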