Sample rules
M365 Copilot Jailbreak Attempts
- source: splunk
- techniques:
- T1562.001
Description
Detects M365 Copilot jailbreak attempts through prompt injection techniques including rule manipulation, system bypass commands, and AI impersonation requests that attempt to circumvent built-in safety controls. The detection searches exported eDiscovery prompt logs for jailbreak keywords like “pretend you are,” “act as,” “rules=,” “ignore,” “bypass,” and “override” in the Subject_Title field, assigning severity scores based on the manipulation type (score of 4 for amoral impersonation or explicit rule injection, score of 3 for entity roleplay or bypass commands). Prompts with a jailbreak score of 2 or higher are flagged, prioritizing the most severe attempts to override AI safety mechanisms through direct instruction injection or unauthorized persona adoption.
Detection logic
`m365_exported_ediscovery_prompt_logs`
| search Subject_Title="*pretend you are*" OR Subject_Title="*act as*" OR Subject_Title="*rules=*" OR Subject_Title="*ignore*" OR Subject_Title="*bypass*" OR Subject_Title="*override*"
| eval user = Sender
| eval jailbreak_score=case( match(Subject_Title, "(?i)pretend you are.*amoral"), 4, match(Subject_Title, "(?i)act as.*entities"), 3, match(Subject_Title, "(?i)(ignore|bypass|override)"), 3, match(Subject_Title, "(?i)rules\s*="), 4, 1=1, 1)
| where jailbreak_score >= 2
| table _time, user, Subject_Title, jailbreak_score, Workload, Size
| sort -jailbreak_score, -_time
| `m365_copilot_jailbreak_attempts_filter`
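To make the ordering of the case() scoring concrete, here is a minimal Python sketch of the same logic. It is illustrative only, not part of the shipped detection: the function name and the first-match-wins loop are assumptions that mirror how SPL case() evaluates its conditions in order.

```python
import re

# Patterns in the same order as the SPL case() above; the first match wins.
SCORING = [
    (r"(?i)pretend you are.*amoral", 4),   # amoral impersonation
    (r"(?i)act as.*entities", 3),          # entity roleplay
    (r"(?i)(ignore|bypass|override)", 3),  # bypass commands
    (r"(?i)rules\s*=", 4),                 # explicit rule injection
]

def jailbreak_score(subject_title: str) -> int:
    """Return the severity score for a prompt title (default 1, as in SPL)."""
    for pattern, score in SCORING:
        if re.search(pattern, subject_title):
            return score
    return 1

def is_flagged(subject_title: str) -> bool:
    """Equivalent to the SPL `where jailbreak_score >= 2` filter."""
    return jailbreak_score(subject_title) >= 2
```

Note that because case() short-circuits, a prompt matching both “act as … entities” and “rules=” scores 3, not 4; the Python loop preserves that ordering.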
M365 Copilot Agentic Jailbreak Attack
- source: splunk
- techniques:
- T1562
Description
Detects agentic AI jailbreak attempts that try to establish persistent control over M365 Copilot through rule injection, universal triggers, response automation, system overrides, and persona establishment techniques. The detection analyzes the Subject_Title field of exported eDiscovery prompt logs for keywords like “from now on,” “always respond,” “ignore previous,” “new rule,” “override,” and role-playing commands (e.g., “act as,” “you are now”) that attempt to inject persistent instructions. The search evaluates each indicator independently and flags any prompt where at least one matches, surfacing coordinated manipulation attempts.
Detection logic
`m365_exported_ediscovery_prompt_logs`
| eval user = Sender
| eval rule_injection=if(match(Subject_Title, "(?i)(rules|instructions)\s*="), "YES", "NO")
| eval universal_trigger=if(match(Subject_Title, "(?i)(every|all).*prompt"), "YES", "NO")
| eval response_automation=if(match(Subject_Title, "(?i)(always|automatic).*respond"), "YES", "NO")
| eval system_override=if(match(Subject_Title, "(?i)(override|bypass|ignore).*(system|default)"), "YES", "NO")
| eval persona_establishment=if(match(Subject_Title, "(?i)(with.*\[.*\]|persona)"), "YES", "NO")
| where rule_injection="YES" OR universal_trigger="YES" OR response_automation="YES" OR system_override="YES" OR persona_establishment="YES"
| table _time, "Source ID", user, Subject_Title, rule_injection, universal_trigger, response_automation, system_override, persona_establishment, Workload
| sort -_time
| `m365_copilot_agentic_jailbreak_attack_filter`
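The five indicator columns can be sketched outside of SPL as independent regex checks followed by an any-match filter. The sketch below is an assumption-laden illustration (function names and the dict layout are mine, not part of the rule); it mirrors the eval/if chain and the `where ... OR ...` clause above.

```python
import re

# Same regexes as the SPL evals above, keyed by indicator name.
INDICATORS = {
    "rule_injection":        r"(?i)(rules|instructions)\s*=",
    "universal_trigger":     r"(?i)(every|all).*prompt",
    "response_automation":   r"(?i)(always|automatic).*respond",
    "system_override":       r"(?i)(override|bypass|ignore).*(system|default)",
    "persona_establishment": r"(?i)(with.*\[.*\]|persona)",
}

def agentic_indicators(subject_title: str) -> dict:
    """Evaluate each jailbreak indicator, like the SPL eval/if chain."""
    return {name: bool(re.search(pat, subject_title))
            for name, pat in INDICATORS.items()}

def is_flagged(subject_title: str) -> bool:
    """Equivalent to the SPL `where ... OR ...` clause: any indicator fires."""
    return any(agentic_indicators(subject_title).values())
```

Because each indicator is evaluated independently, a single prompt can trip several columns at once, which is what the final table surfaces for triage.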