LoFP LoFP / legitimate creative writers developing fictional characters, game developers creating roleplay scenarios, educators teaching about ai ethics and limitations, researchers studying ai behavior, or users engaging in harmless creative storytelling may trigger false positives.

Techniques

Sample rules

M365 Copilot Impersonation Jailbreak Attack

Description

Detects M365 Copilot impersonation and roleplay jailbreak attempts where users try to manipulate the AI into adopting alternate personas, behaving as unrestricted entities, or impersonating malicious AI systems to bypass safety controls. The detection searches exported eDiscovery prompt logs for roleplay keywords like “pretend you are,” “act as,” “you are now,” “amoral,” and “roleplay as” in the Subject_Title field. Prompts are categorized into specific impersonation types (AI_Impersonation, Malicious_AI_Persona, Unrestricted_AI_Persona, etc.) to identify attempts to override the AI’s safety guardrails through persona injection attacks.

Detection logic

`m365_exported_ediscovery_prompt_logs` 
| search Subject_Title="*Pretend you are*" OR Subject_Title="*act as*" OR Subject_Title="*you are now*" OR Subject_Title="*amoral*" OR Subject_Title="*being*" OR Subject_Title="*roleplay as*" OR Subject_Title="*imagine you are*" OR Subject_Title="*behave like*" 
| eval user = Sender 
| eval impersonation_type=case(match(Subject_Title, "(?i)pretend you are.*AI"), "AI_Impersonation", match(Subject_Title, "(?i)(act as
|roleplay as).*AI"), "AI_Roleplay", match(Subject_Title, "(?i)amoral.*AI"), "Amoral_AI", match(Subject_Title, "(?i)transcendent being"), "Fictional_Entity", match(Subject_Title, "(?i)(act as
|pretend you are).*(entities
|multiple)"), "Multi_Entity", match(Subject_Title, "(?i)(imagine you are
|behave like).*AI"), "AI_Behavioral_Change", match(Subject_Title, "(?i)you are now.*AI"), "AI_Identity_Override", match(Subject_Title, "(?i)(evil
|malicious
|harmful).*AI"), "Malicious_AI_Persona", match(Subject_Title, "(?i)(unrestricted
|unlimited
|uncensored).*AI"), "Unrestricted_AI_Persona", 1=1, "Generic_Roleplay") 
| table _time, user, Subject_Title, impersonation_type, Workload 
| sort -_time 
| `m365_copilot_impersonation_jailbreak_attack_filter`