LoFP LoFP / known false positives include security research and testing activities where red teams or developers intentionally test prompt injection defenses, as well as educational content where documentation, tutorials, or training materials discussing prompt injection techniques are legitimately processed by the ai assistant. additionally, security tool development involving code reviews or development of prompt injection detection mechanisms may contain these patterns, and quoted references in conversations where users discuss or report prompt injection attempts they encountered elsewhere could trigger this detection.

Techniques

Sample rules

MCP Prompt Injection

Description

This detection identifies potential prompt injection attempts within MCP (Model Context Protocol) communications by monitoring for known malicious phrases and patterns commonly used to manipulate AI assistants. Prompt injection is a critical vulnerability where adversaries embed hidden instructions in content processed by AI tools, attempting to override system prompts, bypass security controls, or hijack the AI’s behavior. The search monitors JSON-RPC traffic for phrases such as “IGNORE PREVIOUS INSTRUCTIONS,” “SYSTEM PROMPT OVERRIDE,” and “ignore all security” which indicate attempts to subvert the AI’s intended behavior and potentially execute unauthorized actions through the MCP toolchain.

Detection logic

`mcp_server` direction=inbound ( "IGNORE PREVIOUS INSTRUCTIONS" OR "AI_INSTRUCTION" OR "SYSTEM PROMPT OVERRIDE" OR "[SYSTEM]:" OR "ignore all security" OR "New directive" OR "ignore security policies" )

| eval dest=host

| eval injection_payload=coalesce('params.content_preview', 'params.result_preview')

| eval target_path='params.path'

| eval sql_query='params.query'

| stats count min(_time) as firstTime max(_time) as lastTime values(method) as method values(target_path) as target_path values(sql_query) as sql_query values(injection_payload) as injection_payload by dest, source

| `security_content_ctime(firstTime)` 

| `security_content_ctime(lastTime)`

| table dest firstTime lastTime count source method target_path sql_query injection_payload

| `mcp_prompt_injection_filter`