<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Posts on br0k3nlab</title><link>https://br0k3nlab.com/posts/</link><description>Recent content in Posts on br0k3nlab</description><generator>Hugo -- gohugo.io</generator><language>en-us</language><copyright>Justin Ibarra | @br0k3ns0und</copyright><lastBuildDate>Mon, 06 Apr 2026 10:00:00 -0700</lastBuildDate><atom:link href="https://br0k3nlab.com/posts/index.xml" rel="self" type="application/rss+xml"/><item><title>Sound as an Attack Vector: Introducing phonemenal</title><link>https://br0k3nlab.com/posts/2026/04/sound-as-an-attack-vector-introducing-phonemenal/</link><pubDate>Mon, 06 Apr 2026 10:00:00 -0700</pubDate><guid>https://br0k3nlab.com/posts/2026/04/sound-as-an-attack-vector-introducing-phonemenal/</guid><description>phonemenal is a phonetic similarity and homophone detection library for Python. It is designed for identifying sound-alike collisions across namespaces - package registries, domains, social handles, and anywhere else that a name spoken aloud could be confused for another. The docs can be found at br0k3nlab.com/phonemenal and the source at GitHub .
Some background This project has roots going back to 2016. During a local SOC training exercise, a parody domain was registered - a catch-all email address was set up to forward to a personal inbox.</description><content type="html"><![CDATA[<p>
    <img src="/projects/images/phonemenal-logo.svg"  alt="phonemenal-logo"  class="center"  style="border-radius: 8px; max-width: 50%;"  />

 <br></p>
<p><strong>phonemenal</strong> is a phonetic similarity and homophone detection library for Python. It is designed for identifying
sound-alike collisions across namespaces - package registries, domains, social handles, and anywhere else that
a name spoken aloud could be confused for another. The docs can be found at
<a href="https://br0k3nlab.com/phonemenal/" target="_blank" rel="noopener noreferrer">br0k3nlab.com/phonemenal</a>
 and the source at
<a href="https://github.com/brokensound77/phonemenal" target="_blank" rel="noopener noreferrer">GitHub</a>
.</p>
<h3 id="some-background">Some background</h3>
<p>This project has roots going back to 2016. During a local SOC training exercise, a parody domain was registered with a
catch-all email address set up to forward to a personal inbox. Almost immediately, sensitive information started
flowing in. Real credentials. Real documents. People were typing what they <em>heard</em> or <em>remembered</em>, and the spelling
didn&rsquo;t match up.</p>
<p>That accidental discovery became a mutual research interest between myself and
<a href="https://x.com/ReaganShort" target="_blank" rel="noopener noreferrer">Reagan Short</a>
. We spent time digging into the problem space,
exploring linguistic patterns, and ultimately presented our findings at
<a href="https://troopers.de/troopers23/talks/mmtwsy/" target="_blank" rel="noopener noreferrer">TROOPERS 2023</a>
 under the title
<em>Homophonic Collisions: Hold me Closer Tony Danza</em>. You can watch the full talk
<a href="https://www.youtube.com/watch?v=nj4fZAM_IDg" target="_blank" rel="noopener noreferrer">here</a>
 or grab the
<a href="https://troopers.de/downloads/troopers23/TR23_HomophonicCollisions.pdf" target="_blank" rel="noopener noreferrer">slides</a>
.</p>
<p>The core of the research was this: <strong>existing defenses focus on what words <em>look</em> like (typosquatting, homoglyphs),
but largely ignore what words <em>sound</em> like</strong>. Tools like DNSTwist are great for visual permutations, but they don&rsquo;t
account for the fact that &ldquo;phlask&rdquo; and &ldquo;flask&rdquo; are pronounced identically, or that someone dictating &ldquo;numpy&rdquo; over the
phone could easily end up with &ldquo;numpie&rdquo; on the other end.</p>
<p>phonemenal is the tooling that grew out of that research.</p>
<h3 id="the-problem-homophonic-collisions">The problem: homophonic collisions</h3>
<p>A <strong>homophonic collision</strong> occurs when two distinct strings share the same (or nearly the same) pronunciation. There
are a few levels to this:</p>
<ol>
<li><strong>Exact homophones</strong> - words that are phonetically identical: <em>blue</em> / <em>blew</em>, <em>right</em> / <em>write</em>, <em>new</em> / <em>knew</em></li>
<li><strong>Near-homophones</strong> - words that are phonetically very close: <em>crowd</em> / <em>crown</em>, <em>page</em> / <em>rage</em>, <em>elastic</em> / <em>fantastic</em></li>
<li><strong>Soundsquatting</strong> - the weaponized exploitation of homophonic collisions for malicious purposes</li>
</ol>
<p>The diagram below illustrates just how many points of failure exist in the communication chain - from the speaker&rsquo;s
intended message to the listener&rsquo;s interpretation. Every one of these is a potential collision point.</p>
<p>
    <img src="/post-images/intro-phonemenal/hc-c3.png"  alt="Coils of Communication Chaos"  class="center"  style="border-radius: 8px; max-width: 80%;"  />

 <br></p>
<p><em>Coils of Communication Chaos - adapted from the <a href="https://guides.lib.uw.edu/research/linguistics" target="_blank" rel="noopener noreferrer">UW Linguistics Research Guide</a>
</em></p>
<p>Soundsquatting, the third level above, is where it gets interesting from a security perspective. If an attacker registers <code>numpie</code> on PyPI,
or <code>walmaret.com</code> as a domain, or <code>phlask</code> as a package name - these are not typos. They are <em>phonetically equivalent</em>
names designed to exploit the gap between what we hear and what we type.</p>
<p>During our research, we found real-world examples across multiple namespaces:</p>
<ul>
<li><strong>Domains</strong> (sound-alikes of sites in the Alexa top 1000): <code>walmaret.com</code>, <code>yootube.com</code>, <code>webbex.com</code>, <code>wellesfarago.com</code></li>
<li><strong>PyPI packages</strong>: soundsquat variants of popular packages, some validated as malicious</li>
<li><strong>Voice assistants and LLMs</strong>: speech-to-text and text-to-speech wrappers that silently introduce collision errors</li>
</ul>
<p>The attack surface grows every time a human speaks a name and someone (or something) else types it, and the spread of
voice assistants and LLM integrations is expanding it further.</p>
<h3 id="see-it-in-action">See it in action</h3>
<p>The explorer below shows pre-computed phonetic breakdowns for real examples across different categories. Click any
pair to expand the phoneme details. Pay attention to the <em>Package Squats</em> and <em>Domain Squats</em> tabs - these are the
kinds of collisions phonemenal is built to detect.</p>
<link rel="stylesheet" href="/css/phonemenal-demo.css">

<div class="phn-explorer" id="phn-explorer">
  <div class="phn-tabs">
    <button class="phn-tab active" data-target="homophones">Homophones</button>
    <button class="phn-tab" data-target="near">Near-Homophones</button>
    <button class="phn-tab" data-target="packages">Package Squats</button>
    <button class="phn-tab" data-target="domains">Domain Squats</button>
  </div>
  <div class="phn-panel active" data-key="homophones"></div>
  <div class="phn-panel" data-key="near"></div>
  <div class="phn-panel" data-key="packages"></div>
  <div class="phn-panel" data-key="domains"></div>
</div>

<script src="/js/phonemenal-demo.js"></script>

<h3 id="how-phonemenal-works">How phonemenal works</h3>
<p>The library provides three complementary scoring algorithms, all normalized to a <code>0.0–1.0</code> range:</p>
<p><strong>PPC-A (Positional Phoneme Correlation - Absolute)</strong> builds positional phoneme combinations by traversing forward and
reverse directions with padding, then measures set intersection. It captures how much of the positional phoneme
structure two words share.</p>
<p><strong>PLD (Phoneme Levenshtein Distance)</strong> operates at the syllable level, using CMU dict stress markers to split phonemes
into syllable groups. Each syllable is an atomic unit, so the distance reflects how many whole syllables differ - this
is closer to how we actually perceive speech.</p>
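<p>To make the &ldquo;syllable as an atomic unit&rdquo; idea concrete, here is a minimal sketch. It is not phonemenal&rsquo;s implementation:
the splitter (<code>split_syllables</code>, which closes a syllable at each stress-marked vowel and attaches trailing consonants
to the last one) and the normalization (one minus the distance over the longer syllable count) are assumptions made for
illustration, and the transcriptions are hand-written in the CMU dict ARPAbet style.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python">def split_syllables(phonemes):
    """Group phonemes into syllables; a trailing stress digit marks a vowel."""
    syllables, current = [], []
    for phoneme in phonemes:
        current.append(phoneme)
        if phoneme[-1].isdigit():  # vowel reached: close this syllable
            syllables.append(current)
            current = []
    if current:  # leftover consonants join the final syllable
        if syllables:
            syllables[-1].extend(current)
        else:
            syllables.append(current)
    return [tuple(s) for s in syllables]


def levenshtein(a, b):
    """Classic edit distance between two sequences of hashable items."""
    previous = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        current = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


def pld_similarity(phonemes_a, phonemes_b):
    """Syllable-level similarity in the 0.0-1.0 range (assumed normalization)."""
    syl_a = split_syllables(phonemes_a)
    syl_b = split_syllables(phonemes_b)
    longest = max(len(syl_a), len(syl_b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(syl_a, syl_b) / longest


elastic = "IH0 L AE1 S T IH0 K".split()
fantastic = "F AE0 N T AE1 S T IH0 K".split()
print(pld_similarity(elastic, fantastic))
</code></pre></div>
<p>In this sketch both words split into three syllables and only the final one (<code>S T IH0 K</code>) is shared, so the edit
distance is 2 and the score is roughly 0.33. A phoneme-level distance would treat the same pair more leniently.</p>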
<p><strong>LCS (Longest Common Subsequence)</strong> computes the ratio of the longest common subsequence to the total sequence length.
Simple, effective, and robust to insertions.</p>
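<p>A rough sketch of an LCS ratio over phoneme sequences follows. The exact normalization phonemenal uses is not spelled
out here, so this version assumes &ldquo;total sequence length&rdquo; means the combined length of both sequences; the function names
are illustrative, not the library&rsquo;s API.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python">def lcs_length(a, b):
    """Length of the longest common subsequence, via row-by-row DP."""
    previous = [0] * (len(b) + 1)
    for item_a in a:
        current = [0]
        for j, item_b in enumerate(b, start=1):
            if item_a == item_b:
                current.append(previous[j - 1] + 1)
            else:
                current.append(max(previous[j], current[j - 1]))
        previous = current
    return previous[-1]


def lcs_ratio(a, b):
    """Assumed normalization: 2 * LCS / (len(a) + len(b)), giving 0.0-1.0."""
    total = len(a) + len(b)
    if total == 0:
        return 1.0
    return 2.0 * lcs_length(a, b) / total


crowd = "K R AW1 D".split()
crown = "K R AW1 N".split()
print(lcs_ratio(crowd, crown))  # 0.75
</code></pre></div>
<p>Three of the four phonemes line up in order (<code>K R AW1</code>), so the ratio is 6 / 8. An extra phoneme inserted in the middle
of one word would leave the common subsequence intact, which is why this measure tolerates insertions well.</p>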
<p>A <strong>composite score</strong> combines all three with configurable weights.</p>
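<p>As a sketch of what a weighted combination can look like (the function name, the equal-weight default, and the sample
scores are assumptions for illustration, not phonemenal&rsquo;s defaults):</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python">def composite_score(scores, weights=None):
    """Weighted average of per-algorithm scores, each already in 0.0-1.0."""
    if weights is None:
        weights = {name: 1.0 for name in scores}  # assumed default: equal weights
    total_weight = sum(weights[name] for name in scores)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    weighted = sum(scores[name] * weights[name] for name in scores)
    return weighted / total_weight


scores = {"ppc": 0.5, "pld": 0.0, "lcs": 0.75}  # illustrative values only
print(composite_score(scores))
print(composite_score(scores, {"ppc": 1.0, "pld": 1.0, "lcs": 2.0}))  # 0.5
</code></pre></div>
<p>Dividing by the total weight keeps the result in the same 0.0-1.0 range as the inputs, as long as the weights are
non-negative.</p>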
<p>For words not in the CMU Pronouncing Dictionary (think: brand names, neologisms, package names like <code>numpy</code> or
<code>pytorch</code>), phonemenal uses a <strong>fallback encoder</strong> - a simplified Metaphone-inspired encoding that applies digraph
replacement, vowel normalization, and character collapsing to produce phonetic keys. Sound-alike names produce the same
or similar keys.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#ff79c6">from</span> phonemenal <span style="color:#ff79c6">import</span> fallback
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>fallback<span style="color:#ff79c6">.</span>phonetic_key(<span style="color:#f1fa8c">&#34;numpy&#34;</span>)    <span style="color:#6272a4"># → &#34;nAmpY&#34;</span>
</span></span><span style="display:flex;"><span>fallback<span style="color:#ff79c6">.</span>phonetic_key(<span style="color:#f1fa8c">&#34;numpie&#34;</span>)   <span style="color:#6272a4"># → &#34;nAmpY&#34;  - exact match</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>fallback<span style="color:#ff79c6">.</span>phonetic_key(<span style="color:#f1fa8c">&#34;flask&#34;</span>)    <span style="color:#6272a4"># → &#34;flAsk&#34;</span>
</span></span><span style="display:flex;"><span>fallback<span style="color:#ff79c6">.</span>phonetic_key(<span style="color:#f1fa8c">&#34;phlask&#34;</span>)   <span style="color:#6272a4"># → &#34;flAsk&#34;  - exact match</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>fallback<span style="color:#ff79c6">.</span>phonetic_key(<span style="color:#f1fa8c">&#34;phone&#34;</span>)    <span style="color:#6272a4"># → &#34;fAn&#34;</span>
</span></span><span style="display:flex;"><span>fallback<span style="color:#ff79c6">.</span>phonetic_key(<span style="color:#f1fa8c">&#34;fone&#34;</span>)     <span style="color:#6272a4"># → &#34;fAn&#34;    - exact match</span>
</span></span></code></pre></div><h3 id="getting-started">Getting started</h3>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>pip install phonemenal
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># With LLM support for deep analysis</span>
</span></span><span style="display:flex;"><span>pip install phonemenal<span style="color:#ff79c6">[</span>llm<span style="color:#ff79c6">]</span>
</span></span></code></pre></div><p>The API is intentionally straightforward. Here are the core operations:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#ff79c6">from</span> phonemenal <span style="color:#ff79c6">import</span> similarity, homophones, variants, splitting, scanning
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># -- Similarity scoring (all 0.0–1.0) --</span>
</span></span><span style="display:flex;"><span>similarity<span style="color:#ff79c6">.</span>ppc(<span style="color:#f1fa8c">&#34;crowd&#34;</span>, <span style="color:#f1fa8c">&#34;crown&#34;</span>)         <span style="color:#6272a4"># PPC-A score</span>
</span></span><span style="display:flex;"><span>similarity<span style="color:#ff79c6">.</span>pld(<span style="color:#f1fa8c">&#34;elastic&#34;</span>, <span style="color:#f1fa8c">&#34;fantastic&#34;</span>)   <span style="color:#6272a4"># syllable-level edit distance</span>
</span></span><span style="display:flex;"><span>similarity<span style="color:#ff79c6">.</span>lcs(<span style="color:#f1fa8c">&#34;packaging&#34;</span>, <span style="color:#f1fa8c">&#34;packages&#34;</span>)  <span style="color:#6272a4"># longest common subsequence</span>
</span></span><span style="display:flex;"><span>similarity<span style="color:#ff79c6">.</span>composite(<span style="color:#f1fa8c">&#34;crowd&#34;</span>, <span style="color:#f1fa8c">&#34;crown&#34;</span>)   <span style="color:#6272a4"># weighted average of all three</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># -- Exact homophones --</span>
</span></span><span style="display:flex;"><span>homophones<span style="color:#ff79c6">.</span>find(<span style="color:#f1fa8c">&#34;blue&#34;</span>)        <span style="color:#6272a4"># → [&#34;blew&#34;]</span>
</span></span><span style="display:flex;"><span>homophones<span style="color:#ff79c6">.</span>find(<span style="color:#f1fa8c">&#34;right&#34;</span>)       <span style="color:#6272a4"># → [&#34;rite&#34;, &#34;wright&#34;, &#34;write&#34;]</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># -- Near-homophones --</span>
</span></span><span style="display:flex;"><span>homophones<span style="color:#ff79c6">.</span>find_similar(
</span></span><span style="display:flex;"><span>    <span style="color:#f1fa8c">&#34;crowd&#34;</span>,
</span></span><span style="display:flex;"><span>    candidates<span style="color:#ff79c6">=</span>[<span style="color:#f1fa8c">&#34;crown&#34;</span>, <span style="color:#f1fa8c">&#34;crowed&#34;</span>, <span style="color:#f1fa8c">&#34;crude&#34;</span>],
</span></span><span style="display:flex;"><span>    min_score<span style="color:#ff79c6">=</span><span style="color:#bd93f9">0.7</span>
</span></span><span style="display:flex;"><span>)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># -- Variant generation --</span>
</span></span><span style="display:flex;"><span>variants<span style="color:#ff79c6">.</span>generate(<span style="color:#f1fa8c">&#34;flask&#34;</span>)                <span style="color:#6272a4"># → {&#34;phlask&#34;, &#34;flazk&#34;, ...}</span>
</span></span><span style="display:flex;"><span>variants<span style="color:#ff79c6">.</span>generate_morphological(<span style="color:#f1fa8c">&#34;click&#34;</span>)  <span style="color:#6272a4"># → {&#34;clicked&#34;, &#34;clicker&#34;, &#34;clicks&#34;, ...}</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># -- Compound word splitting --</span>
</span></span><span style="display:flex;"><span>splitting<span style="color:#ff79c6">.</span>split(<span style="color:#f1fa8c">&#34;bluevoyage&#34;</span>)                 <span style="color:#6272a4"># → [&#34;blue&#34;, &#34;voyage&#34;]</span>
</span></span><span style="display:flex;"><span>splitting<span style="color:#ff79c6">.</span>homophone_permutations(<span style="color:#f1fa8c">&#34;bluevoyage&#34;</span>) <span style="color:#6272a4"># → [&#34;bluevoyage&#34;, &#34;blewvoyage&#34;, ...]</span>
</span></span></code></pre></div><h3 id="the-cli">The CLI</h3>
<p>phonemenal ships with a CLI that uses Rich for formatted output:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span><span style="color:#6272a4"># Compare two words across all algorithms</span>
</span></span><span style="display:flex;"><span>phonemenal similarity crowd crown
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># Specific algorithm</span>
</span></span><span style="display:flex;"><span>phonemenal similarity crowd crown -a ppc
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># Find exact homophones</span>
</span></span><span style="display:flex;"><span>phonemenal homophones blue
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># Generate sound-alike variants</span>
</span></span><span style="display:flex;"><span>phonemenal variants flask -m  <span style="color:#6272a4"># include morphological variants</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># Split compound words and show permutations</span>
</span></span><span style="display:flex;"><span>phonemenal split bluevoyage -p
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># Full comparison report</span>
</span></span><span style="display:flex;"><span>phonemenal compare crowd crown
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># JSON output for scripting</span>
</span></span><span style="display:flex;"><span>phonemenal compare crowd crown -j
</span></span></code></pre></div><h3 id="scanning-at-scale">Scanning at scale</h3>
<p>The real power shows up when you need to check names against a known set. The scanning pipeline combines forward
matching, variant generation, and optional reverse verification:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#ff79c6">from</span> phonemenal.scanning <span style="color:#ff79c6">import</span> scan, scan_with_reverse, format_matches
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># Forward scan: check candidates against known names</span>
</span></span><span style="display:flex;"><span>matches <span style="color:#ff79c6">=</span> scan(
</span></span><span style="display:flex;"><span>    candidates<span style="color:#ff79c6">=</span>[<span style="color:#f1fa8c">&#34;numpie&#34;</span>, <span style="color:#f1fa8c">&#34;phlask&#34;</span>, <span style="color:#f1fa8c">&#34;klik&#34;</span>],
</span></span><span style="display:flex;"><span>    known_names<span style="color:#ff79c6">=</span>[<span style="color:#f1fa8c">&#34;numpy&#34;</span>, <span style="color:#f1fa8c">&#34;flask&#34;</span>, <span style="color:#f1fa8c">&#34;click&#34;</span>],
</span></span><span style="display:flex;"><span>    threshold<span style="color:#ff79c6">=</span><span style="color:#bd93f9">0.75</span>
</span></span><span style="display:flex;"><span>)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#8be9fd;font-style:italic">print</span>(format_matches(matches))
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># [!!!] &#39;numpie&#39; ~ &#39;numpy&#39;   (score: 1.00, type: exact_phonetic, keys: nAmpY / nAmpY)</span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># [!!!] &#39;phlask&#39; ~ &#39;flask&#39;   (score: 1.00, type: exact_phonetic, keys: flAsk / flAsk)</span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># [!!!] &#39;klik&#39;   ~ &#39;click&#39;   (score: 1.00, type: exact_phonetic, keys: klAk / klAk)</span>
</span></span></code></pre></div><p>For reverse scanning, you can provide a lookup function to check if generated variants actually exist in the wild:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#ff79c6">import</span> httpx
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">def</span> <span style="color:#50fa7b">check_pypi</span>(name: <span style="color:#8be9fd;font-style:italic">str</span>) <span style="color:#ff79c6">-&gt;</span> <span style="color:#8be9fd;font-style:italic">bool</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f1fa8c">&#34;&#34;&#34;Check if a package exists on PyPI.&#34;&#34;&#34;</span>
</span></span><span style="display:flex;"><span>    resp <span style="color:#ff79c6">=</span> httpx<span style="color:#ff79c6">.</span>head(<span style="color:#f1fa8c">f</span><span style="color:#f1fa8c">&#34;https://pypi.org/project/</span><span style="color:#f1fa8c">{</span>name<span style="color:#f1fa8c">}</span><span style="color:#f1fa8c">/&#34;</span>, follow_redirects<span style="color:#ff79c6">=</span><span style="color:#ff79c6">True</span>)
</span></span><span style="display:flex;"><span>    <span style="color:#ff79c6">return</span> resp<span style="color:#ff79c6">.</span>status_code <span style="color:#ff79c6">==</span> <span style="color:#bd93f9">200</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>matches <span style="color:#ff79c6">=</span> scan_with_reverse(
</span></span><span style="display:flex;"><span>    candidates<span style="color:#ff79c6">=</span>[<span style="color:#f1fa8c">&#34;numpy&#34;</span>],
</span></span><span style="display:flex;"><span>    known_names<span style="color:#ff79c6">=</span>[<span style="color:#f1fa8c">&#34;numpy&#34;</span>],
</span></span><span style="display:flex;"><span>    exists_fn<span style="color:#ff79c6">=</span>check_pypi,
</span></span><span style="display:flex;"><span>    threshold<span style="color:#ff79c6">=</span><span style="color:#bd93f9">0.75</span>,
</span></span><span style="display:flex;"><span>    include_morphological<span style="color:#ff79c6">=</span><span style="color:#ff79c6">True</span>
</span></span><span style="display:flex;"><span>)
</span></span></code></pre></div><p>This is where it starts to get practical for real defensive operations - monitoring package registries, domain
registrations, or any namespace where soundsquatting could be a vector.</p>
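<p>For large scans, querying a live registry for every generated variant can be slow. One option is an <code>exists_fn</code>
backed by a local snapshot of registry names. The sketch below assumes only what the example above shows, namely that
<code>exists_fn</code> is a callable taking a name and returning a bool; <code>make_exists_fn</code> and the snapshot list are
hypothetical. The name normalization follows PEP 503.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python">import re


def normalize(name):
    """PEP 503 normalization: lowercase, runs of -, _ and . become one hyphen."""
    return re.sub(r"[-_.]+", "-", name).lower()


def make_exists_fn(snapshot):
    """Build an offline existence check from an iterable of registry names."""
    index = {normalize(name) for name in snapshot}

    def exists(name):
        return normalize(name) in index

    return exists


# Hypothetical snapshot; in practice this would be loaded from a registry dump.
check_snapshot = make_exists_fn(["numpy", "Flask", "typing_extensions"])
print(check_snapshot("typing-extensions"))  # True
print(check_snapshot("numpie"))             # False
</code></pre></div>
<p>A function built this way could be passed wherever <code>check_pypi</code> is used above, trading freshness for speed and
avoiding a network round trip per variant.</p>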
<h3 id="llm-powered-deep-analysis">LLM-powered deep analysis</h3>
<p>For cases where algorithmic scoring is ambiguous, phonemenal supports optional LLM-powered analysis. It can break
down syllables, generate IPA transcriptions, and score using a weighted phonetic model:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span><span style="color:#6272a4"># Anthropic API</span>
</span></span><span style="display:flex;"><span>phonemenal analyze numpy --provider anthropic
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># OpenAI API</span>
</span></span><span style="display:flex;"><span>phonemenal analyze numpy --provider openai
</span></span></code></pre></div><p>The LLM integration over remote API models is entirely optional - the core library has zero external API dependencies.
Alternatively, you can pass the analysis off to a local agent from within phonemenal, or just hand off the prompt. The benefit
of this approach is that it is built into the core library, with no extras or auth requirements.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span><span style="color:#6272a4"># Pipe to a local agent</span>
</span></span><span style="display:flex;"><span>phonemenal analyze numpy --agent claude
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># Just get the prompt (for piping to your own workflow)</span>
</span></span><span style="display:flex;"><span>phonemenal prompt numpy | pbcopy
</span></span></code></pre></div><p>
    <img src="/post-images/intro-phonemenal/phonemenal-agent.png"  alt="phonemenal with a local agent"  class="center"  style="border-radius: 8px; max-width: 100%;"  />

 <br></p>
<h3 id="real-world-validation-scanning-known-malicious-packages">Real-world validation: scanning known malicious packages</h3>
<p>While building out a supply chain monitoring and analysis framework, I integrated homophonic collision detection into the
process. It was then that I decided to dust off the POCs from our research and release the phonemenal library.</p>
<p>To further put this to the test, I ran phonemenal against the
<a href="https://github.com/DataDog/malicious-software-packages-dataset" target="_blank" rel="noopener noreferrer">DataDog Malicious Software Packages Dataset</a>
 -
a curated collection of known malicious packages from PyPI and npm (credit to DataDog for maintaining this dataset).
The goal was simple: scan the <em>names</em> of known malicious packages against popular legitimate packages to see how many
are phonetically similar - potential soundsquats hiding in plain sight.</p>
<p><strong>NOTE: only the package names from the manifest were used in this analysis. The actual malicious samples were not
downloaded or executed.</strong></p>
<h4 id="pypi-results">PyPI results</h4>
<p><strong>1,786</strong> known malicious package names scanned against the top ~125 most popular PyPI packages:</p>
<table>
<thead>
<tr>
<th>Threshold</th>
<th>Matches</th>
<th>Unique Candidates</th>
</tr>
</thead>
<tbody>
<tr>
<td>&gt;= 1.00 (exact)</td>
<td>11</td>
<td>11</td>
</tr>
<tr>
<td>&gt;= 0.90</td>
<td>43</td>
<td>42</td>
</tr>
<tr>
<td>&gt;= 0.80</td>
<td>125</td>
<td>118</td>
</tr>
<tr>
<td>&gt;= 0.70</td>
<td>381</td>
<td>300</td>
</tr>
</tbody>
</table>
<p><strong>56.9%</strong> of the malicious packages had at least one phonetic match against a popular package at a looser
threshold (&gt;= 0.60) than those tabulated above. At the strictest threshold (exact phonetic key match), 11 packages were dead ringers:</p>
<table>
<thead>
<tr>
<th>Malicious</th>
<th>Target</th>
<th>Score</th>
<th>Keys</th>
</tr>
</thead>
<tbody>
<tr>
<td>aiiohttp</td>
<td>aiohttp</td>
<td>1.00</td>
<td>Ahtp / Ahtp</td>
</tr>
<tr>
<td>aiohtttp</td>
<td>aiohttp</td>
<td>1.00</td>
<td>Ahtp / Ahtp</td>
</tr>
<tr>
<td>beautifulsup4</td>
<td>beautifulsoup4</td>
<td>1.00</td>
<td>bAtAfAlsAp4 / bAtAfAlsAp4</td>
</tr>
<tr>
<td>botoceor</td>
<td>botocore</td>
<td>1.00</td>
<td>bAtAcAr / bAtAcAr</td>
</tr>
<tr>
<td>coloroma</td>
<td>colorama</td>
<td>1.00</td>
<td>cAlArAmA / cAlArAmA</td>
</tr>
<tr>
<td>colurama</td>
<td>colorama</td>
<td>1.00</td>
<td>cAlArAmA / cAlArAmA</td>
</tr>
<tr>
<td>djangoo</td>
<td>django</td>
<td>1.00</td>
<td>djAngA / djAngA</td>
</tr>
<tr>
<td>flaask</td>
<td>flask</td>
<td>1.00</td>
<td>flAsk / flAsk</td>
</tr>
<tr>
<td>reuquests</td>
<td>requests</td>
<td>1.00</td>
<td>rAkwAsts / rAkwAsts</td>
</tr>
<tr>
<td>ritch</td>
<td>rich</td>
<td>1.00</td>
<td>rAc / rAc</td>
</tr>
<tr>
<td>selenim</td>
<td>selenium</td>
<td>1.00</td>
<td>sAlAnAm / sAlAnAm</td>
</tr>
</tbody>
</table>
<details>
<summary><strong>Near matches (score 0.87 – 0.97) (expand for details)</strong></summary>
<table>
<thead>
<tr>
<th>Malicious</th>
<th>Target</th>
<th>Score</th>
<th>Keys</th>
</tr>
</thead>
<tbody>
<tr>
<td>importlib-metadate</td>
<td>importlib-metadata</td>
<td>0.97</td>
<td>AmpArtlAbmAtAdAt / AmpArtlAbmAtAdAtA</td>
</tr>
<tr>
<td>typing-extension</td>
<td>typing-extensions</td>
<td>0.96</td>
<td>tYpAngAxtAnsn / tYpAngAxtAnsns</td>
</tr>
<tr>
<td>python-dateuti</td>
<td>python-dateutil</td>
<td>0.96</td>
<td>pYtAndAtAtA / pYtAndAtAtAl</td>
</tr>
<tr>
<td>matplotlibp</td>
<td>matplotlib</td>
<td>0.95</td>
<td>mAtplAtlAbp / mAtplAtlAb</td>
</tr>
<tr>
<td>setuptolos</td>
<td>setuptools</td>
<td>0.95</td>
<td>sAtAptAlAs / sAtAptAls</td>
</tr>
<tr>
<td>tensrflow</td>
<td>tensorflow</td>
<td>0.95</td>
<td>tAnsrflAw / tAnsArflAw</td>
</tr>
<tr>
<td>aiopbotocore</td>
<td>aiobotocore</td>
<td>0.94</td>
<td>ApbAtAcAr / AbAtAcAr</td>
</tr>
<tr>
<td>colouramas</td>
<td>colorama</td>
<td>0.94</td>
<td>cAlArAmAs / cAlArAmA</td>
</tr>
<tr>
<td>pydantics</td>
<td>pydantic</td>
<td>0.94</td>
<td>pYdAntAcs / pYdAntAc</td>
</tr>
<tr>
<td>requesrts</td>
<td>requests</td>
<td>0.94</td>
<td>rAkwAsrts / rAkwAsts</td>
</tr>
<tr>
<td>requestsx</td>
<td>requests</td>
<td>0.94</td>
<td>rAkwAstsx / rAkwAsts</td>
</tr>
<tr>
<td>requesuts</td>
<td>requests</td>
<td>0.94</td>
<td>rAkwAsAts / rAkwAsts</td>
</tr>
<tr>
<td>requesxts</td>
<td>requests</td>
<td>0.94</td>
<td>rAkwAsxts / rAkwAsts</td>
</tr>
<tr>
<td>pckaging</td>
<td>packaging</td>
<td>0.93</td>
<td>pkAgAng / pAkAgAng</td>
</tr>
<tr>
<td>requeste</td>
<td>requests</td>
<td>0.93</td>
<td>rAkwAst / rAkwAsts</td>
</tr>
<tr>
<td>typing-extnesions</td>
<td>typing-extensions</td>
<td>0.93</td>
<td>tYpAngAxtnAsns / tYpAngAxtAnsns</td>
</tr>
<tr>
<td>btoocore</td>
<td>botocore</td>
<td>0.92</td>
<td>btAcAr / bAtAcAr</td>
</tr>
<tr>
<td>ulrlib3</td>
<td>urllib3</td>
<td>0.92</td>
<td>AlrlAb3 / ArlAb3</td>
</tr>
<tr>
<td>cryptograohy</td>
<td>cryptography</td>
<td>0.91</td>
<td>crYptAgrAhY / crYptAgrAfY</td>
</tr>
<tr>
<td>cryptographz</td>
<td>cryptography</td>
<td>0.91</td>
<td>crYptAgrAfz / crYptAgrAfY</td>
</tr>
<tr>
<td>pandaai</td>
<td>pandas</td>
<td>0.91</td>
<td>pAndA / pAndAs</td>
</tr>
<tr>
<td>s4transfer</td>
<td>s3transfer</td>
<td>0.90</td>
<td>s4trAnsfAr / s3trAnsfAr</td>
</tr>
<tr>
<td>aiohtttps</td>
<td>aiohttp</td>
<td>0.89</td>
<td>Ahtps / Ahtp</td>
</tr>
<tr>
<td>pyyal</td>
<td>pyyaml</td>
<td>0.89</td>
<td>pYAl / pYAml</td>
</tr>
<tr>
<td>colorara</td>
<td>colorama</td>
<td>0.88</td>
<td>cAlArArA / cAlArAmA</td>
</tr>
<tr>
<td>colormore</td>
<td>colorama</td>
<td>0.88</td>
<td>cAlArmAr / cAlArAmA</td>
</tr>
<tr>
<td>colotama</td>
<td>colorama</td>
<td>0.88</td>
<td>cAlAtAmA / cAlArAmA</td>
</tr>
<tr>
<td>dequests</td>
<td>requests</td>
<td>0.88</td>
<td>dAkwAsts / rAkwAsts</td>
</tr>
<tr>
<td>fequests</td>
<td>requests</td>
<td>0.88</td>
<td>fAkwAsts / rAkwAsts</td>
</tr>
<tr>
<td>gequests</td>
<td>requests</td>
<td>0.88</td>
<td>gAkwAsts / rAkwAsts</td>
</tr>
<tr>
<td>r3quests</td>
<td>requests</td>
<td>0.88</td>
<td>r3kwAsts / rAkwAsts</td>
</tr>
<tr>
<td>r4quests</td>
<td>requests</td>
<td>0.88</td>
<td>r4kwAsts / rAkwAsts</td>
</tr>
<tr>
<td>requesfs</td>
<td>requests</td>
<td>0.88</td>
<td>rAkwAsfs / rAkwAsts</td>
</tr>
<tr>
<td>requesks</td>
<td>requests</td>
<td>0.88</td>
<td>rAkwAsks / rAkwAsts</td>
</tr>
<tr>
<td>requestn</td>
<td>requests</td>
<td>0.88</td>
<td>rAkwAstn / rAkwAsts</td>
</tr>
<tr>
<td>requestr</td>
<td>requests</td>
<td>0.88</td>
<td>rAkwAstr / rAkwAsts</td>
</tr>
<tr>
<td>tequests</td>
<td>requests</td>
<td>0.88</td>
<td>tAkwAsts / rAkwAsts</td>
</tr>
<tr>
<td>matplotlib-req</td>
<td>matplotlib</td>
<td>0.87</td>
<td>mAtplAtlAbrAq / mAtplAtlAb</td>
</tr>
</tbody>
</table>
</details>
<p>The <code>requests</code> library alone had <strong>20+</strong> phonetically similar malicious variants in the dataset - everything from
single-character insertions (<code>reuquests</code>, <code>requesrts</code>) to onset substitutions (<code>fequests</code>, <code>dequests</code>, <code>tequests</code>).</p>
<h4 id="npm-results">npm results</h4>
<p><strong>9,505</strong> known malicious package names scanned against ~120 popular npm packages:</p>
<table>
<thead>
<tr>
<th>Threshold</th>
<th>Matches</th>
<th>Unique Candidates</th>
</tr>
</thead>
<tbody>
<tr>
<td>&gt;= 1.00 (exact)</td>
<td>3</td>
<td>3</td>
</tr>
<tr>
<td>&gt;= 0.90</td>
<td>31</td>
<td>31</td>
</tr>
<tr>
<td>&gt;= 0.80</td>
<td>236</td>
<td>223</td>
</tr>
<tr>
<td>&gt;= 0.70</td>
<td>1,207</td>
<td>967</td>
</tr>
</tbody>
</table>
<p><strong>38.4%</strong> of the malicious set had phonetic matches. The 3 exact hits:</p>
<table>
<thead>
<tr>
<th>Malicious</th>
<th>Target</th>
<th>Score</th>
<th>Keys</th>
</tr>
</thead>
<tbody>
<tr>
<td>naniod</td>
<td>nanoid</td>
<td>1.00</td>
<td>nAnAd / nAnAd</td>
</tr>
<tr>
<td>pupeter</td>
<td>puppeteer</td>
<td>1.00</td>
<td>pApAtAr / pApAtAr</td>
</tr>
<tr>
<td>pupetier</td>
<td>puppeteer</td>
<td>1.00</td>
<td>pApAtAr / pApAtAr</td>
</tr>
</tbody>
</table>
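<p>The Matches and Unique Candidates columns in the threshold summary differ because a single malicious name can score against more than one popular package. Here is a minimal sketch of that tally, assuming matches are available as (candidate, target, score) tuples. The <code>Match</code> type and <code>summarize</code> function are hypothetical and not part of the phonemenal API, and the last sample row is made up purely to show a repeated candidate.</p>

```python
from typing import Iterable, List, NamedTuple, Tuple


class Match(NamedTuple):
    candidate: str  # suspicious name
    target: str     # popular package it resembles
    score: float    # phonetic similarity in [0, 1]


def summarize(
    matches: Iterable[Match],
    thresholds: Tuple[float, ...] = (1.00, 0.90, 0.80, 0.70),
) -> List[Tuple[float, int, int]]:
    """Return (threshold, match count, unique candidate count) rows."""
    matches = list(matches)
    rows = []
    for t in thresholds:
        hits = [m for m in matches if m.score >= t]
        unique_candidates = {m.candidate for m in hits}
        rows.append((t, len(hits), len(unique_candidates)))
    return rows


SAMPLE = [
    Match("naniod", "nanoid", 1.00),
    Match("pupeter", "puppeteer", 1.00),
    Match("nodemail-lite", "nodemailer", 0.89),
    Match("js-prettier", "prettier", 0.86),
    # Illustrative row only (score invented): same candidate, second target.
    Match("nodemail-lite", "nodemon", 0.71),
]

if __name__ == "__main__":
    for threshold, n_matches, n_unique in summarize(SAMPLE):
        print(f"{threshold:.2f}  matches={n_matches}  unique={n_unique}")
```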
<p>The <code>typescript</code> package was a particularly heavy target with <strong>15+</strong> versioned squats
(<code>typescript-5.5</code>, <code>typescript-5.6</code>, <code>typescript-go</code>, etc.).</p>
<details>
<summary><strong>Near matches (score 0.82 – 0.96) (expand for details)</strong></summary>
<table>
<thead>
<tr>
<th>Malicious</th>
<th>Target</th>
<th>Score</th>
<th>Keys</th>
</tr>
</thead>
<tbody>
<tr>
<td>suport-color</td>
<td>supports-color</td>
<td>0.96</td>
<td>sApArtcAlAr / sApArtscAlAr</td>
</tr>
<tr>
<td>typescript-5.5</td>
<td>typescript</td>
<td>0.95</td>
<td>tYpAscrApt5 / tYpAscrApt</td>
</tr>
<tr>
<td>node-multer</td>
<td>nodemailer</td>
<td>0.95</td>
<td>nAdAmAltAr / nAdAmAlAr</td>
</tr>
<tr>
<td>eslint-8</td>
<td>eslint</td>
<td>0.92</td>
<td>AslAnt8 / AslAnt</td>
</tr>
<tr>
<td>eslint-9</td>
<td>eslint</td>
<td>0.92</td>
<td>AslAnt9 / AslAnt</td>
</tr>
<tr>
<td>jsonwebjstoken</td>
<td>jsonwebtoken</td>
<td>0.92</td>
<td>jsAnwAbjstAkAn / jsAnwAbtAkAn</td>
</tr>
<tr>
<td>peritter</td>
<td>prettier</td>
<td>0.92</td>
<td>pArAtAr / prAtAr</td>
</tr>
<tr>
<td>boby_parser</td>
<td>body-parser</td>
<td>0.90</td>
<td>bAbYpArsAr / bAdYpArsAr</td>
</tr>
<tr>
<td>axio.js</td>
<td>axios</td>
<td>0.89</td>
<td>AxAjs / AxAs</td>
</tr>
<tr>
<td>deezcord.js</td>
<td>discord.js</td>
<td>0.89</td>
<td>dAzcArdjs / dAscArdjs</td>
</tr>
<tr>
<td>dezcord.js</td>
<td>discord.js</td>
<td>0.89</td>
<td>dAzcArdjs / dAscArdjs</td>
</tr>
<tr>
<td>dizcordjs</td>
<td>discord.js</td>
<td>0.89</td>
<td>dAzcArdjs / dAscArdjs</td>
</tr>
<tr>
<td>inquirer-js</td>
<td>inquirer</td>
<td>0.89</td>
<td>AnkwArArjs / AnkwArAr</td>
</tr>
<tr>
<td>nodemail-lite</td>
<td>nodemailer</td>
<td>0.89</td>
<td>nAdAmAlAt / nAdAmAlAr</td>
</tr>
<tr>
<td>date-fns.js</td>
<td>date-fns</td>
<td>0.88</td>
<td>dAtAfnsjs / dAtAfns</td>
</tr>
<tr>
<td>mongodb-cd</td>
<td>mongodb</td>
<td>0.88</td>
<td>mAngAdbcd / mAngAdb</td>
</tr>
<tr>
<td>mongodb-ci</td>
<td>mongodb</td>
<td>0.88</td>
<td>mAngAdbcA / mAngAdb</td>
</tr>
<tr>
<td>node-fetch-v3</td>
<td>node-fetch</td>
<td>0.88</td>
<td>nAdAfAcv3 / nAdAfAc</td>
</tr>
<tr>
<td>nodemonjs</td>
<td>nodemon</td>
<td>0.88</td>
<td>nAdAmAnjs / nAdAmAn</td>
</tr>
<tr>
<td>zustand.js</td>
<td>zustand</td>
<td>0.88</td>
<td>zAstAndjs / zAstAnd</td>
</tr>
<tr>
<td>jsonapptoken</td>
<td>jsonwebtoken</td>
<td>0.87</td>
<td>jsAnAptAkAn / jsAnwAbtAkAn</td>
</tr>
<tr>
<td>cross-session</td>
<td>cross-env</td>
<td>0.86</td>
<td>crAsAsn / crAsAnv</td>
</tr>
<tr>
<td>express-v4</td>
<td>express</td>
<td>0.86</td>
<td>AxprAsv4 / AxprAs</td>
</tr>
<tr>
<td>js-prettier</td>
<td>prettier</td>
<td>0.86</td>
<td>jsprAtAr / prAtAr</td>
</tr>
<tr>
<td>prettierjs</td>
<td>prettier</td>
<td>0.86</td>
<td>prAtArjs / prAtAr</td>
</tr>
<tr>
<td>socket.io.js</td>
<td>socket.io</td>
<td>0.86</td>
<td>sAkAtAjs / sAkAtA</td>
</tr>
<tr>
<td>webpikes</td>
<td>webpack</td>
<td>0.86</td>
<td>wAbpAkAs / wAbpAk</td>
</tr>
<tr>
<td>playwright-1.45</td>
<td>playwright</td>
<td>0.84</td>
<td>plAYrAgt145 / plAYrAgt</td>
</tr>
<tr>
<td>playwright-1.46</td>
<td>playwright</td>
<td>0.84</td>
<td>plAYrAgt146 / plAYrAgt</td>
</tr>
<tr>
<td>playwright-1.47</td>
<td>playwright</td>
<td>0.84</td>
<td>plAYrAgt147 / plAYrAgt</td>
</tr>
<tr>
<td>dotevn</td>
<td>dotenv</td>
<td>0.83</td>
<td>dAtAvn / dAtAnv</td>
</tr>
<tr>
<td>eth-errors</td>
<td>ethers</td>
<td>0.83</td>
<td>AtArArs / AtArs</td>
</tr>
<tr>
<td>etherdjs</td>
<td>ethers</td>
<td>0.83</td>
<td>AtArdjs / AtArs</td>
</tr>
<tr>
<td>nanoid-js</td>
<td>nanoid</td>
<td>0.83</td>
<td>nAnAdjs / nAnAd</td>
</tr>
<tr>
<td>opresc</td>
<td>express</td>
<td>0.83</td>
<td>AprAsc / AxprAs</td>
</tr>
<tr>
<td>debug-mj</td>
<td>debug</td>
<td>0.83</td>
<td>dAbAgmj / dAbAg</td>
</tr>
<tr>
<td>debugr1</td>
<td>debug</td>
<td>0.83</td>
<td>dAbAgr1 / dAbAg</td>
</tr>
<tr>
<td>prsima</td>
<td>prisma</td>
<td>0.82</td>
<td>prsAmA / prAsmA</td>
</tr>
<tr>
<td>stgripe</td>
<td>stripe</td>
<td>0.82</td>
<td>stgrAp / strAp</td>
</tr>
<tr>
<td>multerjs</td>
<td>multer</td>
<td>0.82</td>
<td>mAltArjs / mAltAr</td>
</tr>
<tr>
<td>sequelize-v7</td>
<td>sequelize</td>
<td>0.82</td>
<td>sAkwAlAzv7 / sAkwAlAz</td>
</tr>
</tbody>
</table>
</details>
<p>Here is what a real scan against the DataDog dataset looks like in practice:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#ff79c6">import</span> json
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">import</span> httpx
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">from</span> phonemenal.scanning <span style="color:#ff79c6">import</span> scan, format_matches
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># Load malicious package names from DataDog manifest (names only!)</span>
</span></span><span style="display:flex;"><span>resp <span style="color:#ff79c6">=</span> httpx<span style="color:#ff79c6">.</span>get(
</span></span><span style="display:flex;"><span>    <span style="color:#f1fa8c">&#34;https://raw.githubusercontent.com/DataDog/&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f1fa8c">&#34;malicious-software-packages-dataset/main/samples/pypi/manifest.json&#34;</span>
</span></span><span style="display:flex;"><span>)
</span></span><span style="display:flex;"><span>malicious_names <span style="color:#ff79c6">=</span> <span style="color:#8be9fd;font-style:italic">list</span>(resp<span style="color:#ff79c6">.</span>json()<span style="color:#ff79c6">.</span>keys())
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6272a4"># Scan against popular packages</span>
</span></span><span style="display:flex;"><span>popular <span style="color:#ff79c6">=</span> [<span style="color:#f1fa8c">&#34;requests&#34;</span>, <span style="color:#f1fa8c">&#34;flask&#34;</span>, <span style="color:#f1fa8c">&#34;numpy&#34;</span>, <span style="color:#f1fa8c">&#34;django&#34;</span>, <span style="color:#f1fa8c">&#34;selenium&#34;</span>, <span style="color:#f1fa8c">&#34;colorama&#34;</span>,
</span></span><span style="display:flex;"><span>           <span style="color:#f1fa8c">&#34;beautifulsoup4&#34;</span>, <span style="color:#f1fa8c">&#34;aiohttp&#34;</span>, <span style="color:#f1fa8c">&#34;botocore&#34;</span>, <span style="color:#f1fa8c">&#34;cryptography&#34;</span>]
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>matches <span style="color:#ff79c6">=</span> scan(candidates<span style="color:#ff79c6">=</span>malicious_names, known_names<span style="color:#ff79c6">=</span>popular, threshold<span style="color:#ff79c6">=</span><span style="color:#bd93f9">0.80</span>)
</span></span><span style="display:flex;"><span><span style="color:#8be9fd;font-style:italic">print</span>(format_matches(matches))
</span></span></code></pre></div><p>These are not hypothetical results. These are real malicious packages, confirmed by DataDog&rsquo;s GuardDog analysis, that
phonemenal correctly flags as phonetically suspicious.</p>
<h3 id="whats-next">What&rsquo;s next?</h3>
<p>As I mentioned above, I am integrating it into a supply chain monitoring framework at the moment, but there are lots of other
use cases for detecting and preventing sound as an attack vector, especially with voice assistants coming back into popularity
as part of the genAI craze.</p>
<p>phonemenal is still early and there is more to explore. Acoustic analysis approaches, dialect-aware detection,
expanded namespace coverage, and deeper integration with package registry monitoring are all on the horizon.</p>
<p>If this is a problem space that interests you, check out the
<a href="https://br0k3nlab.com/phonemenal/" target="_blank" rel="noopener noreferrer">docs</a>
, the
<a href="https://github.com/brokensound77/phonemenal" target="_blank" rel="noopener noreferrer">repo</a>
,
or the <a href="https://www.youtube.com/watch?v=nj4fZAM_IDg" target="_blank" rel="noopener noreferrer">TROOPERS talk</a>
 for the full research
background. Contributions and ideas are welcome.</p>
]]></content></item><item><title>Introducing the REx: Rule Explorer Project</title><link>https://br0k3nlab.com/posts/2024/07/introducing-the-rex-rule-explorer-project/</link><pubDate>Mon, 15 Jul 2024 00:16:21 -0700</pubDate><guid>https://br0k3nlab.com/posts/2024/07/introducing-the-rex-rule-explorer-project/</guid><description>The REx project is a collection and breakdown of several of the most popular open security detection rules for analysis and exploration, enabled by the powerful search and visualization capabilities of the Elastic stack! The docs can be found at rulexplorer.io .
The Detection Engineering Threat Report (DETR) is the visual component of the REx project, where the data speaks for itself, with minimal interpretive narration.
What is the purpose of the REx project?</description><content type="html"><![CDATA[<p>
    <img src="/post-images/intro-rex/rex-logo.jpeg"  alt="REx-logo"  class="center"  style="border-radius: 8px;"  />

 <br></p>
<p>The <strong>REx</strong> project is a collection and breakdown of several of the most popular open security detection rules for
analysis and exploration, enabled by the powerful search and visualization capabilities of the Elastic stack! The docs
can be found at <a href="https://rulexplorer.io" target="_blank" rel="noopener noreferrer">rulexplorer.io</a>
.</p>
<p>The <strong>Detection Engineering Threat Report (DETR)</strong> is the visual component of the REx project, where the data speaks for
itself, with minimal interpretive narration.</p>
<p>
    <img src="/post-images/intro-rex/detr-overview.gif"  alt="overview"  class="center"  style="border-radius: 8px;"  />

 <br></p>
<h3 id="what-is-the-purpose-of-the-rex-project">What is the purpose of the REx project?</h3>
<p>This project provides a mechanism for interacting with various popular <a href="https://rulexplorer.io/#included-rule-sets" target="_blank" rel="noopener noreferrer">rule sets</a>
,
in order to have a better understanding of the detection landscape, and quickly survey and compare multiple approaches.</p>
<p>Insights can be derived from data by looking at it from different perspectives, especially when done in a visual manner.
The idea of this project is to view rule development, the detection engineering ecosystem, and the threat landscape from alternative lenses.</p>
<h4 id="what-is-the-detection-engineering-threat-report-detr">What is the Detection Engineering Threat Report (DETR)?</h4>
<p>And why call it a report? It was organized and structured to be consumed as a report, albeit an interactive and dynamic
one.</p>
<p>Like many in the industry, we constantly review and consume the various threat reports published by different
vendors and projects. The normal flow for how the data is produced for these reports can be seen below. It is usually
(at least in part) an aggregation and analysis of observed alerts and raw events within each respective environment or purview.</p>
<p>
    <img src="/post-images/intro-rex/threat-reports-flow.png"  alt="threat reports flow"  class="center"  style="border-radius: 8px;"  />

 <br></p>
<p>It is no secret that these reports and data are prone to confirmation bias, or to authors over-focusing on their own assumptions
or sources of information. There is also the risk of a feedback loop: we primarily see what we look for, and that view is
then reinforced when we consume other reports that are themselves based on their own
observations. This can be described as a <em>threat detection observation paradox</em> and can be seen below.</p>
<p>
    <img src="/post-images/intro-rex/influence-flow.png"  alt="paradox"  class="center"  style="border-radius: 8px;"  />

 <br></p>
<p>I think the industry does a pretty good job of controlling these biases, assumptions, and tendencies, so this is not meant
to be critical of any approach. The idea is to gain additional perspective by peering into the process at a
different point in the cycle: focusing on threats <em>through</em> detection engineering efforts (<strong>rules</strong>), rather than through
triage and analysis (<strong>alerts</strong>).</p>
<p>It is not a uniquely discrete perspective, as these cannot exist without each other, just a shift up the spectrum.</p>
<p>As of this release, the DETR consists of the following sections:</p>
<h4 id="state-of-current-detections">State of current detections</h4>
<p>
    <img src="/post-images/intro-rex/detr-1-state.gif"  alt="state"  class="center"  style="border-radius: 8px;"  />

 <br></p>
<p>This section analyzes the latest snapshot of all covered rule sets. The rule snapshots are refreshed every 24 hours,
which is why they do not have a timestamp associated with them.</p>
<h4 id="developments-and-changes-over-time">Developments and changes over time</h4>
<p>
    <img src="/post-images/intro-rex/detr-2-changes.gif"  alt="changes"  class="center"  style="border-radius: 8px;"  />

 <br></p>
<p>This section analyzes the changes made to all of the covered rule sets. It offers insight into where the most development takes place
per individual rule attribute, including from a maintenance perspective.</p>
<p>The four types of unique changes (new terms) are:</p>
<ul>
<li>new detection logic fields detected over last 30d</li>
<li>new detection logic fields by author detected over last 30d</li>
<li>new techniques detected over last 30d</li>
<li>new techniques by author detected over last 30d</li>
</ul>
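<p>The idea behind these four &ldquo;new terms&rdquo; views can be sketched as a first-seen check over a trailing window. This is an illustration only, not how REx computes them: the event shape (<code>term</code>, <code>author</code>, <code>seen</code>) and the <code>new_terms</code> function are assumptions made for the example. A term counts as new when its earliest sighting falls inside the last 30 days, and the by-author variants simply key on the (author, term) pair.</p>

```python
from datetime import datetime, timedelta


def new_terms(events, now, window_days=30, by_author=False):
    """Return keys whose earliest sighting falls inside the trailing window.

    events: iterable of dicts with hypothetical keys "term" (a field name or
    technique ID), "author" (rule source), and "seen" (datetime).
    """
    first_seen = {}
    for ev in events:
        key = (ev["author"], ev["term"]) if by_author else ev["term"]
        # Keep the earliest timestamp observed for each key.
        first_seen[key] = min(ev["seen"], first_seen.get(key, ev["seen"]))

    cutoff = now - timedelta(days=window_days)
    return {key for key, ts in first_seen.items() if ts >= cutoff}


# Made-up sample data for illustration.
EVENTS = [
    {"term": "process.name", "author": "elastic", "seen": datetime(2024, 1, 1)},
    {"term": "process.name", "author": "sigma", "seen": datetime(2024, 7, 1)},
    {"term": "dns.question.name", "author": "sigma", "seen": datetime(2024, 7, 10)},
]

if __name__ == "__main__":
    now = datetime(2024, 7, 15)
    print(new_terms(EVENTS, now))
    print(new_terms(EVENTS, now, by_author=True))
```

<p>Note how <code>process.name</code> is not new overall, but it is new for one author; that is the difference between the plain and by-author views.</p>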
<h4 id="uniqueness-over-time">Uniqueness over time</h4>
<p>
    <img src="/post-images/intro-rex/detr-3-uniqueness.gif"  alt="uniqueness"  class="center"  style="border-radius: 8px;"  />

 <br></p>
<p>This section analyzes the uniqueness of detection logic fields and ATT&amp;CK techniques within rules over time.
It can be reflective of novelty, new datasources, or even just schemas that are too large.</p>
<h4 id="emerging-threats-analysis">Emerging threats analysis</h4>
<p>
    <img src="/post-images/intro-rex/detr-4-threats.gif"  alt="threats"  class="center"  style="border-radius: 8px;"  />

 <br></p>
<p>This dashboard analyzes the reactiveness and responsiveness to known major threats, CVEs, or any other prominently discussed risks.</p>
<p>What is interesting here is that most rule detection logic tends to
focus on behavioral aspects, as opposed to being too atomic or overly specific and signature-like. This means that some
insights into coverage may not be immediately obvious. In other words, successful pre-existing detection capabilities
for major emerging threats can easily be overlooked when inspecting from a purely rules perspective (as opposed to alerts).</p>
<p>The CVEs chosen were the most represented in other threat reports. While they are insightful in themselves, they are
also meant to showcase the process of temporal analysis - simply look up the timing of other CVEs or events and compare
accordingly.</p>
<h3 id="whats-the-goal">What&rsquo;s the goal?</h3>
<p>Put simply, the goal is to provide a platform to easily analyze rules and the detection engineering ecosystem in new ways.</p>
<p>It may be helpful to think about the following personas when using this project:</p>
<ul>
<li>Security Analysts</li>
<li>Threat Hunters</li>
<li>Security Engineers</li>
<li>Security Researchers</li>
<li>Security Managers</li>
</ul>
<p>Additionally, consider the following use cases:</p>
<ul>
<li>Rule Development Lifecycle</li>
<li>Threat Landscape Analysis</li>
<li>Maintenance Costs</li>
<li>Threat Coverage</li>
<li>Data Sources and Field usage</li>
</ul>
<p>There are multiple ways to search and visualize the data, depending on specific need or perspective. To maximize
insights and perspective, it is all about filtering and pivoting. Whether starting with a search in Discover or any of
the dashboards as part of the DETR, you can filter down around observations or known events, such as the release of a
CVE or exploit.</p>
<p>This is <strong>not</strong> meant to be a vendor or coverage comparison tool! Leave that to Gartner and MITRE. Coverage is a complex
thing and each source has its own approaches and philosophies, which are better debated elsewhere. <em>More</em> rules do
not always translate to <em>more</em> or <em>better</em> coverage.</p>
<p>For insights into creating high-quality, high-efficacy rules, check out the <a href="https://br0k3nlab.com/resources/zen-of-security-rules/">Zen of Security Rules</a>.</p>
<h3 id="details">Details</h3>
<p>
    <img src="/post-images/intro-rex/rex-flow.png"  alt="flow"  class="center"  style="border-radius: 8px;"  />

 <br></p>
<p>The data consists of:</p>
<ul>
<li>a snapshot of each respective repo’s primary branch</li>
<li>all new and changed rule files over time</li>
<li>unique techniques and fields from the detection logic</li>
</ul>
<p>Every 24 hours, the latest snapshot of the rules in their primary repos is saved. Additionally, all modifications to rules
over that time period are saved within a different index. Finally, search results of unique techniques and fields over
a 30-day period are also saved. The details of the schema, indexes, and data can be found in the schema
<a href="https://rulexplorer.io/schema/" target="_blank" rel="noopener noreferrer">docs</a>
. The rule logic is also parsed for additional in-depth field analysis.</p>
<p>The Kibana features provided include:</p>
<h4 id="search">Search</h4>
<p>
    <img src="/post-images/intro-rex/detr-discover.gif"  alt="discover"  class="center"  style="border-radius: 8px;"  />

 <br></p>
<h4 id="visualize">Visualize</h4>
<p>
    <img src="/post-images/intro-rex/detr-overview.gif"  alt="visualize"  class="center"  style="border-radius: 8px;"  />

 <br></p>
<h4 id="graph">Graph</h4>
<p>
    <img src="/post-images/intro-rex/detr-graph.gif"  alt="graph"  class="center"  style="border-radius: 8px;"  />

 <br></p>
<h3 id="refreshed-every-24-hours">Refreshed every 24 hours</h3>
<p>Similar to the LoFP project, this is meant to be a maintenance-free project, so the data stays fresh and auto-updates every 24 hours.</p>
<h3 id="limitations">Limitations?</h3>
<p>As of this date, support for correlation Sigma rules still needs to be added.</p>
]]></content></item><item><title>Introducing LoFP</title><link>https://br0k3nlab.com/posts/2024/02/introducing-lofp/</link><pubDate>Sun, 11 Feb 2024 00:16:21 -0700</pubDate><guid>https://br0k3nlab.com/posts/2024/02/introducing-lofp/</guid><description>The farm is growing! A new way to live off the land, in this case, by blending in with it.
What is LoFP? Living off the False Positive is an autogenerated collection of false positives sourced from some of the most popular rule sets. The information is categorized along with ATT&amp;amp;CK techniques, rule source, and data source. Entries include details from related rules along with their description and detection logic.</description><content type="html"><![CDATA[<p>
    <img src="/post-images/intro-lofp/lofp-small.png"  alt="LoFP-logo"  class="center"  style="border-radius: 8px;"  />

 <br></p>
<p>The <a href="https://lolol.farm" target="_blank" rel="noopener noreferrer">farm</a>
 is growing! A new way to live off the land, in this case, by blending
in with it.</p>
<h3 id="what-is-lofp">What is LoFP?</h3>
<p>Living off the False Positive is an autogenerated collection of false positives sourced from some of the most popular
rule sets. The information is categorized along with ATT&amp;CK techniques, rule source, and data source. Entries
include details from related rules along with their description and detection logic.</p>
<p>
    <img src="/post-images/intro-lofp/lofp.gif"  alt="LoFP-logo"  class="center"  style="border-radius: 8px;"  />

 <br></p>
<h3 id="whats-the-goal">What&rsquo;s the goal?</h3>
<p>The goal is to enable <em>both</em> <strong>red</strong> and <strong>blue</strong> teams with this information.</p>
<p>Red teams can use this information to blend in by mimicking or looking similar to the FP activity. Alert fatigue often
causes analysts to readily ignore anything that even remotely resembles a false positive. At the very least, it will instill doubt.</p>
<p>Blue teams, on the other hand, can use this information to assess weak spots in their detection logic. They can also
compare across rule sets to see if it is a broad tendency, or maybe something more specific to a particular vendor.
It can also assist during alert triage and investigation, by looking at common FPs around certain techniques and data
sources.</p>
<h3 id="details">Details</h3>
<p>For now, it encompasses rules from the following sources:</p>
<ul>
<li>elastic detection rules</li>
<li>sigma rules</li>
<li>splunk rules</li>
</ul>
<p>And it isn&rsquo;t <em>all</em> the rule directories at this point, but this could expand. The trouble is, false positive annotations
tend to be more narrative than keyword based, making it difficult to aggregate similarities.</p>
<p>This is why you shouldn&rsquo;t use this by just scrolling along &ndash; that would be a little painful. Instead, focus on searching
for keywords in the false positives themselves (such as &ldquo;python&rdquo;, &ldquo;powershell&rdquo;, etc.), the techniques, rule source, or
data source as a starting point.</p>
<p>If you know you will be leveraging certain techniques, find similar ones, see what the false positive trends tend
to look like, and use this information to blend in.</p>
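<p>The keyword-first workflow can be sketched as a simple filter. This assumes the entries have already been loaded as dictionaries; the key names (<code>false_positive</code>, <code>techniques</code>, <code>source</code>) and the <code>search_fps</code> function are hypothetical and do not reflect the actual LoFP data layout, and the sample entries are invented.</p>

```python
def search_fps(entries, keyword, technique=None):
    """Return entries whose false-positive text mentions `keyword`.

    entries: iterable of dicts with hypothetical keys "false_positive" (str),
    "techniques" (list of technique IDs), and "source" (str).
    technique: optional technique ID used to narrow the results further.
    """
    needle = keyword.lower()
    return [
        entry
        for entry in entries
        if needle in entry["false_positive"].lower()
        and (technique is None or technique in entry["techniques"])
    ]


# Invented sample entries for illustration; not taken from LoFP.
SAMPLE = [
    {
        "false_positive": "Administrators running PowerShell scripts for maintenance",
        "techniques": ["T1059.001"],
        "source": "sigma",
    },
    {
        "false_positive": "Legitimate Python automation by developers",
        "techniques": ["T1059.006"],
        "source": "elastic",
    },
    {
        "false_positive": "Backup software spawning PowerShell",
        "techniques": ["T1059.001", "T1490"],
        "source": "splunk",
    },
]

if __name__ == "__main__":
    for hit in search_fps(SAMPLE, "powershell"):
        print(hit["source"], "-", hit["false_positive"])
```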
<p>
    <img src="/post-images/intro-lofp/lofp-entry.png"  alt="LoFP-logo"  class="center"  style="border-radius: 8px;"  />

 <br></p>
<p>As you can see, the idea is to include certain key details of the source from the rule(s) that the FPs come from to
maximize the value of the information.</p>
<p>Check out the <a href="https://github.com/brokensound77/LoFP" target="_blank" rel="noopener noreferrer">repo</a>
 for more details on
auto-generation.</p>
<h3 id="nightly-builds">Nightly builds</h3>
<p>This is meant to be a maintenance-free project. As a result, this data refreshes nightly, based on the latest available
updates in the respective repos.</p>
<h3 id="future-expansion">Future expansion?</h3>
<p>Possibly, but let&rsquo;s see how this goes first.</p>
]]></content></item><item><title>Detecting RMM</title><link>https://br0k3nlab.com/posts/2023/04/detecting-rmm/</link><pubDate>Thu, 13 Apr 2023 19:45:10 -0700</pubDate><guid>https://br0k3nlab.com/posts/2023/04/detecting-rmm/</guid><description>(originally posted on github)
The most difficult challenge with RMM detection is contextual awareness around usage to determine if it is valid or malicious.
if the software is not used in the environment could it be legitimate by a random employee? is it an attacker BYOL even so, all occurrences could probably be considered suspicious if it is used in the environment is every use of it legitimate? Probably not this also creates significant living off the land (LOL) opportunity some occurrences should be considered suspicious without any contextual awareness, this is an even harder problem Under resources, there is a table of known RMM executables, as well as a raw json RMM.</description><content type="html"><![CDATA[<p>(<em>originally posted on <a href="https://gist.github.com/brokensound77/6d8a1e480e65ff20e151099c98267b14">github</a></em>)</p>
<p>The most difficult challenge with RMM detection is contextual awareness around usage to determine if it is valid or malicious.</p>
<ul>
<li>if the software is <em>not</em> used in the environment
<ul>
<li>could it be legitimate by a random employee?</li>
<li>is it an attacker BYOL?</li>
<li>even so, all occurrences could probably be considered suspicious</li>
</ul>
</li>
<li>if it <em>is</em> used in the environment
<ul>
<li>is <em>every</em> use of it legitimate? Probably not</li>
<li>this also creates significant living off the land (LOL) opportunity</li>
<li><em>some</em> occurrences should be considered suspicious</li>
</ul>
</li>
<li>without <em>any</em> contextual awareness, this is an even harder problem</li>
</ul>
<p>Under <em>resources</em>, there is a <a href="https://br0k3nlab.com/resources/rmm/" title="RMM Executable">table</a> of known RMM executables, as well as a raw json <a href="/resources/rmm/data/RMM.json">RMM.json</a> for processing.</p>
<h2 id="approaches-to-detecting">Approaches to detecting</h2>
<h3 id="a-explicitly-defined-rmm-software--behavioral-less-resilient">A. Explicitly defined RMM software + behavioral (less resilient)</h3>
<p>These rely on explicitly referencing known RMM artifacts (in some way) within the logic.</p>
<ol>
<li>Known RMMs</li>
<li>Known RMM + low prevalence</li>
<li>New executable in environment + known RMM</li>
<li>New + RMM + suspicious activity</li>
<li>New + RMM + alert</li>
</ol>
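<p>Item 2 in the list above (known RMM + low prevalence) can be sketched in a few lines, assuming process-start events are available as (host, executable path) pairs. The two patterns are copied from the query later in this post; the function names, the event shape, and the <code>max_hosts</code> cutoff are hypothetical. Matching is case-insensitive to mirror the EQL <code>:</code> operator.</p>

```python
from collections import defaultdict
from fnmatch import fnmatchcase

# Two patterns taken from the known-RMM list; a real list would be much longer.
KNOWN_RMM_PATTERNS = [
    r"C:\Program Files*\NinjaRMMAgent\NinjaRMMAgent.exe",
    r"C:\Program Files*\ATERA Networks\AteraAgent\AteraAgent.exe",
]


def is_known_rmm(path: str) -> bool:
    """Case-insensitive wildcard match against the known RMM patterns."""
    lowered = path.lower()
    return any(fnmatchcase(lowered, pat.lower()) for pat in KNOWN_RMM_PATTERNS)


def low_prevalence_rmm(events, max_hosts=1):
    """Return known RMM executables seen on at most `max_hosts` hosts.

    events: iterable of (host, executable_path) pairs (hypothetical shape).
    """
    hosts_by_exe = defaultdict(set)
    for host, exe in events:
        if is_known_rmm(exe):
            hosts_by_exe[exe.lower()].add(host)
    return {
        exe: sorted(hosts)
        for exe, hosts in hosts_by_exe.items()
        if max_hosts >= len(hosts)
    }


# Invented events for illustration.
EVENTS = [
    ("host-a", r"C:\Program Files\NinjaRMMAgent\NinjaRMMAgent.exe"),
    ("host-b", r"C:\Program Files (x86)\ATERA Networks\AteraAgent\AteraAgent.exe"),
    ("host-c", r"C:\Program Files (x86)\ATERA Networks\AteraAgent\AteraAgent.exe"),
    ("host-a", r"C:\Windows\System32\notepad.exe"),
]

if __name__ == "__main__":
    for exe, hosts in low_prevalence_rmm(EVENTS).items():
        print(exe, hosts)
```

<p>The cutoff is a judgment call: an RMM tool deployed fleet-wide will show high prevalence, while the same tool on a single host is worth a closer look.</p>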
<p>
    <img src="/post-images/detecting-rmm/rmm1.png"  alt="search"  class="center"  style="border-radius: 8px;"  />

 <br></p>
<h3 id="b-dynamically-and-generically-defining-rmm--behavioral">B. Dynamically and generically defining RMM + behavioral</h3>
<p>This relies completely on common behaviors of RMM (can misidentify)</p>
<ol>
<li>Logic for generic RMM behaviors (vs pre-defined known RMMs)</li>
</ol>
<h2 id="details">Details</h2>
<h3 id="a1-known-rmms">A1. Known RMMs</h3>
<p>
    <img src="/post-images/detecting-rmm/rmm2.png"  alt="search"  class="center"  style="border-radius: 8px;"  />

 <br></p>
<p>Two options for defining known RMMs:</p>
<h4 id="option-1-comprehensive-list-of-identified-rmm-executables">Option 1: comprehensive list of identified RMM executables</h4>
<p>Simply build a list of all known executables (see the table below). This is brittle, but more precise</p>
<details open>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-sql" data-lang="sql"><span style="display:flex;"><span>process <span style="color:#ff79c6">where</span> event.<span style="color:#ff79c6">type</span> <span style="color:#ff79c6">==</span> <span style="color:#f1fa8c">&#34;start&#34;</span> <span style="color:#ff79c6">and</span>
</span></span><span style="display:flex;"><span>(
</span></span><span style="display:flex;"><span>  <span style="color:#ff79c6">//</span> Windows
</span></span><span style="display:flex;"><span>  (
</span></span><span style="display:flex;"><span>    <span style="color:#ff79c6">host</span>.os.<span style="color:#ff79c6">type</span> <span style="color:#ff79c6">==</span> <span style="color:#f1fa8c">&#34;windows&#34;</span> <span style="color:#ff79c6">and</span>
</span></span><span style="display:flex;"><span>      process.executable : (
</span></span><span style="display:flex;"><span>        <span style="color:#f1fa8c">&#34;C:\\Program Files*\\*\\NinjaRMMAgentPatcher.exe&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f1fa8c">&#34;C:\\Program Files*\\NinjaRMMAgent\\NinjaRMMAgentPatcher.exe&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f1fa8c">&#34;C:\\ProgramData\\NinjaRMMAgent\\ninjarmm-cli.exe&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f1fa8c">&#34;C:\\Program Files*\\*\\NinjaRMMAgent.exe&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f1fa8c">&#34;C:\\Program Files*\\NinjaRMMAgent\\NinjaRMMAgent.exe&#34;</span>,
</span></span><span style="display:flex;"><span>        
</span></span><span style="display:flex;"><span>        <span style="color:#f1fa8c">&#34;C:\\Program Files*\\ATERA Networks\\AteraAgent\\AteraAgent.exe&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f1fa8c">&#34;C:\\Program Files*\\ATERA Networks\\AteraAgent\\Packages\\AgentPackageNetworkDiscoveryWG\\AgentPackageNetworkDiscoveryWG.exe&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f1fa8c">&#34;C:\\Program Files*\\ATERA Networks\\AteraAgent\\Packages\\AgentPackageAgentInformation\\AgentPackageAgentInformation.exe&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f1fa8c">&#34;C:\\Program Files*\\ATERA Networks\\AteraAgent\\Packages\\AgentPackageSTRemote\\AgentPackageSTRemote.exe&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f1fa8c">&#34;C:\\Program Files*\\ATERA Networks\\AteraAgent\\Packages\\AgentPackageFileExplorer\\AgentPackageFileExplorer.exe&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f1fa8c">&#34;C:\\Program Files*\\ATERA Networks\\AteraAgent\\Packages\\AgentPackageMonitoring\\AgentPackageMonitoring.exe&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f1fa8c">&#34;C:\\Program Files*\\ATERA Networks\\AteraAgent\\Packages\\AgentPackageRuntimeInstaller\\AgentPackageRuntimeInstaller.exe&#34;</span>,
</span></span><span style="display:flex;"><span>        
</span></span><span style="display:flex;"><span>        <span style="color:#f1fa8c">&#34;C:\\Windows\\SysWOW64\\config\\systemprofile\\AppData\\Local\\GoToAssist Remote Support Applet\\*.tmp\\GoToAssistService.exe&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f1fa8c">&#34;C:\\Users\\*\\AppData\\Local\\GoToAssist Remote Support Applet\\*.tmp\\GoToAssistProcessChecker.exe&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f1fa8c">&#34;C:\\Program Files*\\LogMeIn\\GoToAssist Corporate\\*\\G2AC_HostLauncher.exe&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f1fa8c">&#34;C:\\Program Files*\\GoToMeeting\\*\\G2MInstaller.exe&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f1fa8c">&#34;C:\\Users\\*\\AppData\\Local\\GoToMeeting\\*\\g2mcomm.exe&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f1fa8c">&#34;C:\\Users\\*\\AppData\\Local\\GoToMeeting\\*\\g2mlauncher.exe&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f1fa8c">&#34;C:\\Program Files*\\GoToAssist Remote Support Customer\\*\\g2ax_host_service.exe&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f1fa8c">&#34;C:\\Program Files*\\GoToAssist Remote Support Customer\\*\\g2ax_comm_customer.exe&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f1fa8c">&#34;C:\\Users\\*\\AppData\\Local\\GoTo Resolve Applet\\*.tmp\\GoToResolveService.exe&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f1fa8c">&#34;C:\\Program Files*\\GoToAssist Remote Support Unattended\\*\\GoToAssistTools64.exe&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f1fa8c">&#34;C:\\Program Files*\\GoToAssist Remote Support Unattended\\*\\GoToAssistUnattended.exe&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f1fa8c">&#34;C:\\Users\\*\\AppData\\Local\\goto-updater\\pending\\GoToSetup-*.exe&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f1fa8c">&#34;C:\\Program Files*\\GoToMeeting\\*\\g2mlauncher.exe&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f1fa8c">&#34;C:\\Users\\*\\AppData\\Local\\GoToAssist Remote Support Applet\\*.tmp\\GoToAssistCrashHandler.exe&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f1fa8c">&#34;C:\\Users\\*\\AppData\\Local\\GoToMeeting\\*\\g2mupdate.exe&#34;</span>,
</span></span><span style="display:flex;"><span>        
</span></span><span style="display:flex;"><span>        <span style="color:#f1fa8c">&#34;C:\\ManageEngine\\DesktopCentralMSP_Server\\jre\\bin\\java.exe&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f1fa8c">&#34;C:\\ManageEngine\\ADManager Plus\\jre\\bin\\java.exe&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f1fa8c">&#34;C:\\Program Files*\\ManageEngine\\PMP\\tools\\archiver\\windows\\x86-64\\7za.exe&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f1fa8c">&#34;C:\\ManageEngine\\elasticsearch\\jre\\bin\\java.exe&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f1fa8c">&#34;C:\\Program Files*\\ManageEngine\\PMP\\jre\\bin\\java.exe&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f1fa8c">&#34;C:\\Program Files*\\ManageEngine\\ServiceDesk\\DesktopCentral_Server\\bin\\7za.exe&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f1fa8c">&#34;C:\\Program Files*\\ManageEngine\\ServiceDesk\\DesktopCentral_Server\\bin\\wrapper.exe&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f1fa8c">&#34;C:\\ManageEngine\\OpManager\\jre\\bin\\java.exe&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f1fa8c">&#34;C:\\ManageEngine\\EventLog Analyzer\\jre\\bin\\java.exe&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f1fa8c">&#34;C:\\ManageEngine\\ADAudit Plus\\pgsql\\bin\\postgres.exe&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f1fa8c">&#34;C:\\ManageEngine\\OpManager\\Probe\\OpManagerProbe\\pgsql\\bin\\postgres.exe&#34;</span>,
</span></span><span style="display:flex;"><span>        
</span></span><span style="display:flex;"><span>        <span style="color:#f1fa8c">&#34;C:\\Program Files*\\Microsoft Intune Management Extension\\ClientHealthEval.exe&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f1fa8c">&#34;C:\\Program Files*\\WindowsApps\\Microsoft.*\\IntuneManagementExtensionBridge\\IntuneManagementExtensionBridge.exe&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f1fa8c">&#34;C:\\Program Files*\\WindowsApps\\Microsoft.*\\BridgeLauncher\\BridgeLauncher.exe&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f1fa8c">&#34;C:\\Program Files*\\Microsoft Intune Management Extension\\Microsoft.Management.Services.IntuneWindowsAgent.exe&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f1fa8c">&#34;C:\\Program Files*\\Microsoft Intune Management Extension\\Microsoft.Management.Clients.CopyAgentCatalog.exe&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f1fa8c">&#34;C:\\Program Files*\\Microsoft Intune Management Extension\\SensorLogonTask.exe&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f1fa8c">&#34;C:\\Program Files*\\Microsoft Intune Management Extension\\AgentExecutor.exe&#34;</span>,
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>        <span style="color:#f1fa8c">&#34;C:\\Users\\*\\AppData\\Local\\MSP Anywhere for N-central\\Viewer\\Tmp\\SWI_MSP_RC_ViewerUpdate-*.exe&#34;</span>,
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>        <span style="color:#f1fa8c">&#34;C:\\Program Files*\\DesktopCentral_Agent\\bin\\dcagentservice.exe&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f1fa8c">&#34;C:\\Program Files*\\DesktopCentral_Agent\\bin\\DCFAService64.exe&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f1fa8c">&#34;C:\\Program Files*\\DesktopCentral_Agent\\bin\\dcagentregister.exe&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f1fa8c">&#34;C:\\Program Files*\\DesktopCentral_Server\\pgsql\\bin\\postgres.exe&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f1fa8c">&#34;C:\\Program Files*\\DesktopCentral_Server\\bin\\wrapper.exe&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f1fa8c">&#34;C:\\ManageEngine\\DesktopCentral_Server\\bin\\wrapper.exe&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f1fa8c">&#34;C:\\Program Files*\\DesktopCentral_Server\\bin\\UEMS.exe&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f1fa8c">&#34;C:\\Program Files*\\DesktopCentral_Server\\nginx\\dcnginx.exe&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f1fa8c">&#34;C:\\Program Files*\\ManageEngine\\ServiceDesk\\DesktopCentral_Server\\jre\\bin\\java.exe&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f1fa8c">&#34;C:\\Program Files*\\DesktopCentral_Agent\\bin\\EMSAddonInstaller.exe&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f1fa8c">&#34;C:\\ManageEngine\\DesktopCentral_Server\\jre\\bin\\java.exe&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f1fa8c">&#34;C:\\Program Files*\\DesktopCentral_Server\\apache\\bin\\dcserverhttpd.exe&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f1fa8c">&#34;C:\\Program Files*\\DesktopCentral_Server\\bin\\7za.exe&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f1fa8c">&#34;C:\\Program Files*\\DesktopCentral_Server\\jre\\bin\\java.exe&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f1fa8c">&#34;C:\\Program Files*\\DesktopCentral_Server\\bin\\dcnotificationserver.exe&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f1fa8c">&#34;C:\\Program Files*\\DesktopCentral_Agent\\dcconfig.exe&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f1fa8c">&#34;C:\\Program Files*\\DesktopCentral_Agent\\patches\\*-gimp-*-setup.exe&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f1fa8c">&#34;C:\\ManageEngine\\AssetExplorer\\DesktopCentral_Server\\bin\\wrapper.exe&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f1fa8c">&#34;C:\\Program Files*\\ManageEngine\\ServiceDesk\\DesktopCentral_Server\\lib\\native\\64bit\\wrapper.dll&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f1fa8c">&#34;C:\\Program Files*\\ManageEngine\\ServiceDesk\\DesktopCentral_Server\\jre\\bin\\awt.dll&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f1fa8c">&#34;C:\\Program Files*\\ManageEngine\\ServiceDesk\\DesktopCentral_Server\\jre\\bin\\sunec.dll&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f1fa8c">&#34;C:\\Program Files*\\ManageEngine\\ServiceDesk\\DesktopCentral_Server\\jre\\bin\\freetype.dll&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f1fa8c">&#34;C:\\Program Files*\\ManageEngine\\ServiceDesk\\DesktopCentral_Server\\jre\\bin\\fontmanager.dll&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f1fa8c">&#34;C:\\Program Files*\\ManageEngine\\ServiceDesk\\DesktopCentral_Server\\lib\\native\\64bit\\SyMNative.dll&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f1fa8c">&#34;C:\\Program Files*\\ManageEngine\\ServiceDesk\\DesktopCentral_Server\\lib\\native\\64bit\\OSDSyMNative.dll&#34;</span>,
</span></span><span style="display:flex;"><span>        
</span></span><span style="display:flex;"><span>        <span style="color:#f1fa8c">&#34;C:\\Windows\\Action1\\action1_remote.exe&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f1fa8c">&#34;C:\\Windows\\Action1\\action1_agent.exe&#34;</span>)
</span></span><span style="display:flex;"><span>  ) <span style="color:#ff79c6">or</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>  <span style="color:#ff79c6">//</span> MacOS
</span></span><span style="display:flex;"><span>  (
</span></span><span style="display:flex;"><span>    <span style="color:#ff79c6">host</span>.os.<span style="color:#ff79c6">type</span> <span style="color:#ff79c6">==</span> <span style="color:#f1fa8c">&#34;macos&#34;</span> <span style="color:#ff79c6">and</span>
</span></span><span style="display:flex;"><span>      process.executable : (
</span></span><span style="display:flex;"><span>        <span style="color:#f1fa8c">&#34;/Applications/NinjaRMMAgent/programfiles/ninjarmm-macagent&#34;</span>,
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>        <span style="color:#f1fa8c">&#34;/Applications/GoToMeeting.app/Contents/MacOS/GoToMeeting&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f1fa8c">&#34;/Applications/GoToMeeting.app/Contents/Helpers/G2MUpdate&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f1fa8c">&#34;/Users/*/Library/Application Support/LogMeInInc/GoToMeeting/G2MUpdate&#34;</span>,
</span></span><span style="display:flex;"><span>        
</span></span><span style="display:flex;"><span>        <span style="color:#f1fa8c">&#34;/Library/Intune/Microsoft Intune Agent.app/Contents/MacOS/IntuneMdmDaemon&#34;</span>,
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>        <span style="color:#f1fa8c">&#34;/Applications/MSP Anywhere Agent N-central.app/Contents/Resources/MSP Anywhere Service Configurator.app/Contents/MacOS/MSP Anywhere Service Configurator&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#f1fa8c">&#34;/Applications/MSP Anywhere Agent N-central.app/Contents/Resources/MSP Anywhere Helper&#34;</span>)
</span></span><span style="display:flex;"><span>  )
</span></span><span style="display:flex;"><span>)
</span></span></code></pre></div></details>
<h4 id="option-2-resilient-patterns-of-known-rmm-software">Option 2: resilient patterns of known RMM software</h4>
<p>This is a more resilient approach, which looks for:</p>
<ul>
<li>unique patterns of the executable path</li>
<li>code signature unique to RMM software</li>
</ul>
<details open>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-sql" data-lang="sql"><span style="display:flex;"><span><span style="color:#ff79c6">any</span> <span style="color:#ff79c6">where</span> event.category : (<span style="color:#f1fa8c">&#34;process&#34;</span>, <span style="color:#f1fa8c">&#34;library&#34;</span>) <span style="color:#ff79c6">and</span> event.<span style="color:#ff79c6">type</span> <span style="color:#ff79c6">==</span> <span style="color:#f1fa8c">&#34;start&#34;</span> <span style="color:#ff79c6">and</span>
</span></span><span style="display:flex;"><span>(
</span></span><span style="display:flex;"><span>  <span style="color:#ff79c6">//</span> Windows
</span></span><span style="display:flex;"><span>  (
</span></span><span style="display:flex;"><span>    <span style="color:#ff79c6">host</span>.os.<span style="color:#ff79c6">type</span> <span style="color:#ff79c6">==</span> <span style="color:#f1fa8c">&#34;windows&#34;</span> <span style="color:#ff79c6">and</span> (
</span></span><span style="display:flex;"><span>      process.executable : (<span style="color:#f1fa8c">&#34;?:\\*NinjaRMMAgent*.exe&#34;</span>,
</span></span><span style="display:flex;"><span>                            <span style="color:#f1fa8c">&#34;?:\\*\\AteraAgent\\*.exe&#34;</span>,
</span></span><span style="display:flex;"><span>                            <span style="color:#f1fa8c">&#34;?:\\*\\GoToAssist*\\*.exe&#34;</span>, <span style="color:#f1fa8c">&#34;?:\\*\\GoToMeeting\\*.exe&#34;</span>, <span style="color:#f1fa8c">&#34;?:\\*\\GoTo*.exe&#34;</span>, <span style="color:#f1fa8c">&#34;?:\\*\\GoToSetup*.exe&#34;</span>,
</span></span><span style="display:flex;"><span>                            <span style="color:#f1fa8c">&#34;?:\\*ManageEngine\\*.exe&#34;</span>,
</span></span><span style="display:flex;"><span>                            <span style="color:#f1fa8c">&#34;?:\\*\\Microsoft Intune*\\*.exe&#34;</span>, <span style="color:#f1fa8c">&#34;?:\\*\\IntuneManagement*\\*.exe&#34;</span>,
</span></span><span style="display:flex;"><span>                            <span style="color:#f1fa8c">&#34;?:\\*\\*N-central*\\*.exe&#34;</span>,
</span></span><span style="display:flex;"><span>                            <span style="color:#f1fa8c">&#34;?:\\*\\DesktopCentral*\\*.exe&#34;</span>,
</span></span><span style="display:flex;"><span>                            <span style="color:#f1fa8c">&#34;?:\\*\\Action1\\*.exe&#34;</span>) <span style="color:#ff79c6">or</span>
</span></span><span style="display:flex;"><span>      dll.path : (<span style="color:#f1fa8c">&#34;?:\\*NinjaRMMAgent*.dll&#34;</span>,
</span></span><span style="display:flex;"><span>                  <span style="color:#f1fa8c">&#34;?:\\*\\AteraAgent\\*.dll&#34;</span>,
</span></span><span style="display:flex;"><span>                  <span style="color:#f1fa8c">&#34;?:\\*\\GoToAssist*\\*.dll&#34;</span>, <span style="color:#f1fa8c">&#34;?:\\*\\GoToMeeting\\*.dll&#34;</span>, <span style="color:#f1fa8c">&#34;?:\\*\\GoTo*.dll&#34;</span>, <span style="color:#f1fa8c">&#34;?:\\*\\GoToSetup*.dll&#34;</span>,
</span></span><span style="display:flex;"><span>                  <span style="color:#f1fa8c">&#34;?:\\*ManageEngine\\*.dll&#34;</span>,
</span></span><span style="display:flex;"><span>                  <span style="color:#f1fa8c">&#34;?:\\*\\Microsoft Intune*\\*.dll&#34;</span>, <span style="color:#f1fa8c">&#34;?:\\*\\IntuneManagement*\\*.dll&#34;</span>,
</span></span><span style="display:flex;"><span>                  <span style="color:#f1fa8c">&#34;?:\\*\\*N-central*\\*.dll&#34;</span>,
</span></span><span style="display:flex;"><span>                  <span style="color:#f1fa8c">&#34;?:\\*\\DesktopCentral*\\*.dll&#34;</span>,
</span></span><span style="display:flex;"><span>                  <span style="color:#f1fa8c">&#34;?:\\*\\Action1\\*.dll&#34;</span>) <span style="color:#ff79c6">or</span>
</span></span><span style="display:flex;"><span>      process.code_signature.subject_name : (<span style="color:#f1fa8c">&#34;NinjaRMM, LLC&#34;</span>,
</span></span><span style="display:flex;"><span>                                             <span style="color:#f1fa8c">&#34;Atera Networks Ltd&#34;</span>,
</span></span><span style="display:flex;"><span>                                             <span style="color:#f1fa8c">&#34;LogMeIn, Inc.&#34;</span>,
</span></span><span style="display:flex;"><span>                                             <span style="color:#f1fa8c">&#34;ZOHO Corporation Private Limited&#34;</span>,  <span style="color:#ff79c6">//</span> could FP due <span style="color:#ff79c6">to</span> non<span style="color:#ff79c6">-</span>RMM software
</span></span><span style="display:flex;"><span>                                             <span style="color:#f1fa8c">&#34;Action1 Corporation&#34;</span>) <span style="color:#ff79c6">or</span>
</span></span><span style="display:flex;"><span>      dll.code_signature.subject_name : (<span style="color:#f1fa8c">&#34;NinjaRMM, LLC&#34;</span>,
</span></span><span style="display:flex;"><span>                                         <span style="color:#f1fa8c">&#34;Atera Networks Ltd&#34;</span>,
</span></span><span style="display:flex;"><span>                                         <span style="color:#f1fa8c">&#34;LogMeIn, Inc.&#34;</span>,
</span></span><span style="display:flex;"><span>                                         <span style="color:#f1fa8c">&#34;ZOHO Corporation Private Limited&#34;</span>,  <span style="color:#ff79c6">//</span> could FP due <span style="color:#ff79c6">to</span> non<span style="color:#ff79c6">-</span>RMM software
</span></span><span style="display:flex;"><span>                                         <span style="color:#f1fa8c">&#34;Action1 Corporation&#34;</span>)
</span></span><span style="display:flex;"><span>    )
</span></span><span style="display:flex;"><span>  ) <span style="color:#ff79c6">or</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>  <span style="color:#ff79c6">//</span> MacOS
</span></span><span style="display:flex;"><span>  (
</span></span><span style="display:flex;"><span>    <span style="color:#ff79c6">host</span>.os.<span style="color:#ff79c6">type</span> <span style="color:#ff79c6">==</span> <span style="color:#f1fa8c">&#34;macos&#34;</span> <span style="color:#ff79c6">and</span> (
</span></span><span style="display:flex;"><span>      process.executable : (<span style="color:#f1fa8c">&#34;/Applications/*NinjaRMMAgent/*&#34;</span>,
</span></span><span style="display:flex;"><span>                            <span style="color:#f1fa8c">&#34;/Applications/*GoToMeeting*/*&#34;</span>, <span style="color:#f1fa8c">&#34;/Users/*/Library/*/GoToMeeting*/*&#34;</span>,
</span></span><span style="display:flex;"><span>                            <span style="color:#f1fa8c">&#34;/Library/*Microsoft InTune*/*&#34;</span>, <span style="color:#f1fa8c">&#34;/Users/*/Library/*Microsoft InTune*/*&#34;</span>,
</span></span><span style="display:flex;"><span>                            <span style="color:#f1fa8c">&#34;/Applications/*N-central*/*&#34;</span>)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>      <span style="color:#ff79c6">//</span> <span style="color:#ff79c6">or</span> dll.path : () <span style="color:#ff79c6">or</span>
</span></span><span style="display:flex;"><span>      <span style="color:#ff79c6">//</span> process.code_signature.subject_name : () <span style="color:#ff79c6">or</span>
</span></span><span style="display:flex;"><span>      <span style="color:#ff79c6">//</span> dll.code_signature.subject_name : ()
</span></span><span style="display:flex;"><span>    )
</span></span><span style="display:flex;"><span>  )
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>  <span style="color:#ff79c6">//</span> Linux
</span></span><span style="display:flex;"><span>)
</span></span></code></pre></div></details>
<h3 id="a2-known-rmm--low-prevalence">A2. Known RMM + low prevalence</h3>
<p>
    <img src="/post-images/detecting-rmm/rmm3.png"  alt="search"  class="center"  style="border-radius: 8px;"  />

 <br></p>
<p>Perform one of the searches from step 1 and aggregate on:</p>
<ul>
<li>hosts</li>
<li>users</li>
<li>unique executions</li>
</ul>
<p>Look for low counts, such as an RMM tool seen on only a handful of hosts or run by only a few users.</p>
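<p>As a rough sketch, the aggregation could be expressed in ES|QL along these lines. The data stream name, the signer subset standing in for the step 1 search, and the threshold of 3 hosts are all assumptions for illustration, not values from a tested rule. Note that ES|QL string comparison is case-sensitive, unlike the EQL <code>:</code> operator.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-fallback" data-lang="fallback"><span style="display:flex;"><span>// Assumed data stream; point this at wherever your process events live
</span></span><span style="display:flex;"><span>FROM logs-endpoint.events.process-*
</span></span><span style="display:flex;"><span>| WHERE event.category == &#34;process&#34; AND event.type == &#34;start&#34;
</span></span><span style="display:flex;"><span>// Stand-in for the step 1 search: a subset of the Option 2 signers
</span></span><span style="display:flex;"><span>| WHERE process.code_signature.subject_name IN (&#34;NinjaRMM, LLC&#34;, &#34;Atera Networks Ltd&#34;, &#34;LogMeIn, Inc.&#34;, &#34;Action1 Corporation&#34;)
</span></span><span style="display:flex;"><span>| STATS hosts = COUNT_DISTINCT(host.id),
</span></span><span style="display:flex;"><span>        users = COUNT_DISTINCT(user.id),
</span></span><span style="display:flex;"><span>        executions = COUNT_DISTINCT(process.entity_id)
</span></span><span style="display:flex;"><span>  BY process.executable
</span></span><span style="display:flex;"><span>// Illustrative threshold only; tune it to the size of the environment
</span></span><span style="display:flex;"><span>| WHERE hosts &lt;= 3
</span></span><span style="display:flex;"><span>| SORT hosts ASC, users ASC, executions ASC
</span></span></code></pre></div>
<p>Sorting ascending puts the rarest executables first, which is where review should start.</p>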
<h3 id="a3-new-executable-in-environment--known-rmm">A3. New executable in environment + known RMM</h3>
<p>
    <img src="/post-images/detecting-rmm/rmm4.png"  alt="search"  class="center"  style="border-radius: 8px;"  />

 <br></p>
<p>Create a <a href="https://www.elastic.co/guide/en/security/current/rules-ui-create.html#create-new-terms-rule">new terms</a> style rule based on step 1:</p>
<ul>
<li>window history of now-30d</li>
<li>base the new terms on: <code>process.name</code>, <code>host.id</code>  (remove host.id for full environment prevalence)</li>
</ul>
<p>If you do not have a new terms capability, you can perform the search in step 1 to build a list of observed RMM executables,
then pivot (or <code>join</code>) on a search for recent executions.</p>
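<p>One way to approximate this without a new terms rule is a single aggregation that records when each executable was first seen on each host, then keeps only recent first sightings. This is a minimal ES|QL sketch under assumptions: the data stream name, the signer subset standing in for the step 1 search, and the 1 day "recent" window are illustrative choices.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-fallback" data-lang="fallback"><span style="display:flex;"><span>// Assumed data stream; the 30 day lookback mirrors the now-30d history window
</span></span><span style="display:flex;"><span>FROM logs-endpoint.events.process-*
</span></span><span style="display:flex;"><span>| WHERE @timestamp &gt;= NOW() - 30 days
</span></span><span style="display:flex;"><span>| WHERE event.category == &#34;process&#34; AND event.type == &#34;start&#34;
</span></span><span style="display:flex;"><span>// Stand-in for the step 1 search: a subset of the Option 2 signers
</span></span><span style="display:flex;"><span>| WHERE process.code_signature.subject_name IN (&#34;NinjaRMM, LLC&#34;, &#34;Atera Networks Ltd&#34;, &#34;LogMeIn, Inc.&#34;, &#34;Action1 Corporation&#34;)
</span></span><span style="display:flex;"><span>| STATS first_seen = MIN(@timestamp), executions = COUNT(*) BY process.name, host.id
</span></span><span style="display:flex;"><span>// Keep only pairs first seen inside the recent window (assumed: 1 day)
</span></span><span style="display:flex;"><span>| WHERE first_seen &gt;= NOW() - 1 day
</span></span><span style="display:flex;"><span>| SORT first_seen DESC
</span></span></code></pre></div>
<p>Dropping <code>host.id</code> from the <code>BY</code> clause gives full environment prevalence, matching the note above.</p>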
<h3 id="a4-new-executable--known-rmm--suspicious-activity">A4. New executable + known RMM + suspicious activity</h3>
<p>
    <img src="/post-images/detecting-rmm/rmm5.png"  alt="search"  class="center"  style="border-radius: 8px;"  />

 <br></p>
<p>Combine step 3 with subsequent suspicious activity (such as lateral movement or information gathering).</p>
<p>With Elastic, you could do this by:</p>
<ol>
<li>create the rule from step 3 (optionally as a <code>building_block_rule</code> to keep noise down)</li>
<li>create a separate sequence-based rule that looks for the new term <em>then</em> the suspicious activity
<ul>
<li>to simplify this, you can create <em>another</em> <code>building_block_rule</code> for suspicious activity</li>
</ul>
</li>
</ol>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-sql" data-lang="sql"><span style="display:flex;"><span>sequence <span style="color:#ff79c6">by</span> <span style="color:#ff79c6">host</span>.id, <span style="color:#ff79c6">user</span>.id, process.name <span style="color:#ff79c6">with</span> maxspan<span style="color:#ff79c6">=</span><span style="color:#bd93f9">25</span>m
</span></span><span style="display:flex;"><span>  [alert <span style="color:#ff79c6">where</span> <span style="color:#ff79c6">rule</span>.id <span style="color:#ff79c6">==</span> <span style="color:#ff79c6">&lt;</span>new_term_rule_step3<span style="color:#ff79c6">&gt;</span>]
</span></span><span style="display:flex;"><span>  [alert <span style="color:#ff79c6">where</span> <span style="color:#ff79c6">rule</span>.id <span style="color:#ff79c6">==</span> <span style="color:#ff79c6">&lt;</span>suspicious_rule_step4<span style="color:#ff79c6">&gt;</span>]
</span></span></code></pre></div><h3 id="a5-new-executable--known-rmm--alert">A5. New executable + known RMM + alert</h3>
<p>
    <img src="/post-images/detecting-rmm/rmm6.png"  alt="search"  class="center"  style="border-radius: 8px;"  />

 <br></p>
<p>Similar to step 4, except referencing <em>actual</em> alerts for the second part of the sequence:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-sql" data-lang="sql"><span style="display:flex;"><span>sequence <span style="color:#ff79c6">by</span> <span style="color:#ff79c6">host</span>.id, <span style="color:#ff79c6">user</span>.id, process.name <span style="color:#ff79c6">with</span> maxspan<span style="color:#ff79c6">=</span><span style="color:#bd93f9">25</span>m
</span></span><span style="display:flex;"><span>  [alert <span style="color:#ff79c6">where</span> <span style="color:#ff79c6">rule</span>.id <span style="color:#ff79c6">==</span> <span style="color:#ff79c6">&lt;</span>new_term_rule_step3<span style="color:#ff79c6">&gt;</span>]
</span></span><span style="display:flex;"><span>  [alert <span style="color:#ff79c6">where</span> <span style="color:#ff79c6">true</span>]
</span></span></code></pre></div><p>Leaving subquery 2 generic is a great option, since a newly occurring RMM would be suspicious in this case. It can be
tightened down with a few options:</p>
<ul>
<li>limiting query 2 to certain techniques or subtechniques</li>
<li>adding logic to query 2 based on the raw alert results, or even a subset of alerts</li>
<li>adding more queries to the sequence to express a more progressed attack</li>
</ul>
<h3 id="b1-logic-for-generic-rmm-behaviors">B1. Logic for generic RMM behaviors</h3>
<p>Rather than using statically defined RMM artifacts based on observations, this entails building out generic logic to identify them. This is a much greater challenge, especially because RMM tools are legitimate software. Additional features such as ML, entity analytics, and other aggregation-based searching make a significant difference here.</p>
<p>Once a dynamic method is defined, then steps 2-5 apply, creating a sustainable detection approach.</p>
<p>I think it <em>is</em> doable from a purely rule-based approach, but I will return to this a bit later &hellip;</p>
<p>Also, with the Elastic ES|QL piped language, these become much more feasible within a single rule.</p>
]]></content></item><item><title>Event Category and Field Distribution Over Attack Techniques</title><link>https://br0k3nlab.com/posts/2023/03/event-category-and-field-distribution-over-attack-techniques/</link><pubDate>Fri, 10 Mar 2023 21:05:17 -0700</pubDate><guid>https://br0k3nlab.com/posts/2023/03/event-category-and-field-distribution-over-attack-techniques/</guid><description>Event category and field distribution over ATT&amp;amp;CK techniques Analysis of Elastic detection-rules, showing event types and field distribution per technique. The full results are represented in the file below (fields_by_technique.json)
The structure is:
&amp;#34;library&amp;#34;: { # event.category (generic if event.category not defined) &amp;#34;fields&amp;#34;: { # field distribution for that event.category within that technique &amp;#34;dll.code_signature.status&amp;#34;: &amp;#34;100.00%&amp;#34;, # field with percentage &amp;#34;dll.code_signature.trusted&amp;#34;: &amp;#34;100.00%&amp;#34;, # field with percentage &amp;#34;host.os.type&amp;#34;: &amp;#34;100.00%&amp;#34;, # field with percentage &amp;#34;process.</description><content type="html"><![CDATA[<h2 id="event-category-and-field-distribution-over-attck-techniques">Event category and field distribution over ATT&amp;CK techniques</h2>
<p>Analysis of Elastic <a href="https://github.com/elastic/detection-rules">detection-rules</a>, showing event types and field
distribution per technique. The full results are represented in the file below (<code>fields_by_technique.json</code>).</p>
<p>The structure is:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-fallback" data-lang="fallback"><span style="display:flex;"><span>&#34;library&#34;: {                                       # event.category (generic if event.category not defined)
</span></span><span style="display:flex;"><span>      &#34;fields&#34;: {                                  # field distribution for that event.category within that technique
</span></span><span style="display:flex;"><span>        &#34;dll.code_signature.status&#34;: &#34;100.00%&#34;,    # field with percentage
</span></span><span style="display:flex;"><span>        &#34;dll.code_signature.trusted&#34;: &#34;100.00%&#34;,   # field with percentage
</span></span><span style="display:flex;"><span>        &#34;host.os.type&#34;: &#34;100.00%&#34;,                 # field with percentage
</span></span><span style="display:flex;"><span>        &#34;process.pid&#34;: &#34;100.00%&#34;                   # field with percentage
</span></span><span style="display:flex;"><span>      },
</span></span><span style="display:flex;"><span>      &#34;rule_count&#34;: 1                              # number of rules within this technique + event.category
</span></span></code></pre></div><p>Example:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-json" data-lang="json"><span style="display:flex;"><span><span style="color:#f1fa8c">&#34;T1553&#34;</span>: {
</span></span><span style="display:flex;"><span>    <span style="color:#ff79c6">&#34;generic&#34;</span>: {
</span></span><span style="display:flex;"><span>      <span style="color:#ff79c6">&#34;fields&#34;</span>: {
</span></span><span style="display:flex;"><span>        <span style="color:#ff79c6">&#34;event.provider&#34;</span>: <span style="color:#f1fa8c">&#34;100.00%&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#ff79c6">&#34;host.os.type&#34;</span>: <span style="color:#f1fa8c">&#34;100.00%&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#ff79c6">&#34;message&#34;</span>: <span style="color:#f1fa8c">&#34;100.00%&#34;</span>
</span></span><span style="display:flex;"><span>      },
</span></span><span style="display:flex;"><span>      <span style="color:#ff79c6">&#34;rule_count&#34;</span>: <span style="color:#bd93f9">1</span>
</span></span><span style="display:flex;"><span>    },
</span></span><span style="display:flex;"><span>    <span style="color:#ff79c6">&#34;library&#34;</span>: {
</span></span><span style="display:flex;"><span>      <span style="color:#ff79c6">&#34;fields&#34;</span>: {
</span></span><span style="display:flex;"><span>        <span style="color:#ff79c6">&#34;dll.code_signature.status&#34;</span>: <span style="color:#f1fa8c">&#34;100.00%&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#ff79c6">&#34;dll.code_signature.trusted&#34;</span>: <span style="color:#f1fa8c">&#34;100.00%&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#ff79c6">&#34;host.os.type&#34;</span>: <span style="color:#f1fa8c">&#34;100.00%&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#ff79c6">&#34;process.pid&#34;</span>: <span style="color:#f1fa8c">&#34;100.00%&#34;</span>
</span></span><span style="display:flex;"><span>      },
</span></span><span style="display:flex;"><span>      <span style="color:#ff79c6">&#34;rule_count&#34;</span>: <span style="color:#bd93f9">1</span>
</span></span><span style="display:flex;"><span>    },
</span></span><span style="display:flex;"><span>    <span style="color:#ff79c6">&#34;process&#34;</span>: {
</span></span><span style="display:flex;"><span>      <span style="color:#ff79c6">&#34;fields&#34;</span>: {
</span></span><span style="display:flex;"><span>        <span style="color:#ff79c6">&#34;event.category&#34;</span>: <span style="color:#f1fa8c">&#34;66.67%&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#ff79c6">&#34;event.type&#34;</span>: <span style="color:#f1fa8c">&#34;100.00%&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#ff79c6">&#34;host.os.type&#34;</span>: <span style="color:#f1fa8c">&#34;100.00%&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#ff79c6">&#34;process.args&#34;</span>: <span style="color:#f1fa8c">&#34;100.00%&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#ff79c6">&#34;process.executable&#34;</span>: <span style="color:#f1fa8c">&#34;33.33%&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#ff79c6">&#34;process.name&#34;</span>: <span style="color:#f1fa8c">&#34;66.67%&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#ff79c6">&#34;process.parent.executable&#34;</span>: <span style="color:#f1fa8c">&#34;33.33%&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#ff79c6">&#34;process.pe.original_file_name&#34;</span>: <span style="color:#f1fa8c">&#34;33.33%&#34;</span>
</span></span><span style="display:flex;"><span>      },
</span></span><span style="display:flex;"><span>      <span style="color:#ff79c6">&#34;rule_count&#34;</span>: <span style="color:#bd93f9">3</span>
</span></span><span style="display:flex;"><span>    },
</span></span><span style="display:flex;"><span>    <span style="color:#ff79c6">&#34;registry&#34;</span>: {
</span></span><span style="display:flex;"><span>      <span style="color:#ff79c6">&#34;fields&#34;</span>: {
</span></span><span style="display:flex;"><span>        <span style="color:#ff79c6">&#34;event.type&#34;</span>: <span style="color:#f1fa8c">&#34;100.00%&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#ff79c6">&#34;host.os.type&#34;</span>: <span style="color:#f1fa8c">&#34;100.00%&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#ff79c6">&#34;process.executable&#34;</span>: <span style="color:#f1fa8c">&#34;33.33%&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#ff79c6">&#34;registry.data.strings&#34;</span>: <span style="color:#f1fa8c">&#34;66.67%&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#ff79c6">&#34;registry.path&#34;</span>: <span style="color:#f1fa8c">&#34;100.00%&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#ff79c6">&#34;registry.value&#34;</span>: <span style="color:#f1fa8c">&#34;33.33%&#34;</span>
</span></span><span style="display:flex;"><span>      },
</span></span><span style="display:flex;"><span>      <span style="color:#ff79c6">&#34;rule_count&#34;</span>: <span style="color:#bd93f9">3</span>
</span></span><span style="display:flex;"><span>    }
</span></span><span style="display:flex;"><span>  }
</span></span></code></pre></div><p>For technique <code>T1553</code>, the following event types were present on the specified number of rules:</p>
<ul>
<li>1 generic</li>
<li>1 library</li>
<li>3 process</li>
<li>3 registry</li>
</ul>
<p>The percentage next to each field is the share of that <code>event.category</code>&rsquo;s rules in which the field appears.</p>
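<p>To make the percentages concrete, here is a minimal Python sketch that turns them back into approximate rule counts. It assumes the structure shown in the output above (a <code>fields</code> mapping of percentage strings plus a <code>rule_count</code>); the helper name <code>rules_per_field</code> is made up for illustration and is not part of the original tooling.</p>
<div class="highlight"><pre tabindex="0"><code class="language-python" data-lang="python"># Excerpt of the "process" entry from the output above.
process_entry = {
    "fields": {
        "event.category": "66.67%",
        "event.type": "100.00%",
        "process.executable": "33.33%",
    },
    "rule_count": 3,
}


def rules_per_field(entry):
    """Convert each field's percentage back into an approximate rule count."""
    total = entry["rule_count"]
    counts = {}
    for field, pct in entry["fields"].items():
        share = float(pct.rstrip("%")) / 100
        # Round because the percentages are printed with two decimals.
        counts[field] = round(share * total)
    return counts


print(rules_per_field(process_entry))
</code></pre></div>
<p>For the three <code>process</code> rules, this reads as <code>event.category</code> appearing in 2 rules, <code>event.type</code> in all 3, and <code>process.executable</code> in 1.</p>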
<h2 id="data">Data</h2>
<p>A full JSON dump of the data can be found <a href="https://gist.github.com/brokensound77/420bc801592715c00af2dd0775f59901#file-fields_by_technique-json" target="_blank" rel="noopener noreferrer">here</a>, where this blog was originally posted.</p>
<p>Happy analyzing!</p>
]]></content></item><item><title>Sifting through the SPLurge! Writing Effective Queries for Splunk with SPL</title><link>https://br0k3nlab.com/posts/2018/06/sifting-through-the-splurge-writing-effective-queries-for-splunk-with-spl/</link><pubDate>Mon, 18 Jun 2018 15:47:22 -0700</pubDate><guid>https://br0k3nlab.com/posts/2018/06/sifting-through-the-splurge-writing-effective-queries-for-splunk-with-spl/</guid><description>Splunk is arguably one of the most popular and powerful tools across the security space at the moment, and for good reason. It is an incredibly powerful way to sift through and analyze big sets of data in an intuitive manner. SPL is the Splunk Processing Language which is used to generate queries for searching through data within Splunk.
The organization I have in mind when writing this is a SOC or CSIRT, in which large scale hunting via Splunk is likely to be conducted, though it can apply just about any where.</description><content type="html"><![CDATA[<p>Splunk is arguably one of the most popular and powerful tools across the security space at the moment, and for good reason.
It is an incredibly powerful way to sift through and analyze big sets of data in an intuitive manner. SPL is the Search
Processing Language, which is used to write queries for searching through data within Splunk.</p>
<p>The organization I have in mind when writing this is a SOC or CSIRT, in which large-scale hunting via Splunk is likely
to be conducted, though it can apply just about anywhere. It is key to have relevant data sets against which to
properly vet queries. Fortunately, there are many example data sets available for testing on GitHub, <a href="https://docs.splunk.com/Documentation/Splunk/7.1.1/SearchTutorial/GetthetutorialdataintoSplunk">from Splunk</a>,
and some mentioned below. There are also &ldquo;data generators&rdquo; which can generate noise for testing. Best of all would be to
create your own, though :).</p>
<p>I was fortunate to have had the enjoyable experience of participating in a Boss of the SOC CTF a few years back, which
had some pretty good exemplar security related data. Earlier this year, they released the data set publicly <a href="https://www.splunk.com/blog/2018/05/10/boss-of-the-soc-scoring-server-questions-and-answers-and-dataset-open-sourced-and-ready-for-download.html">here</a>.</p>
<p>This guide is not meant to be a deep dive into the structuring of a query using the SPL. The best place for that is the
Splunk documentation itself, starting with <a href="http://docs.splunk.com/Documentation/Splunk/7.1.1/Search/Aboutsearchlanguagesyntax">this</a>. This is geared more towards operations in which multiple queries are
written, maintained, and used in an operational capacity. Many of these concepts can be generalized and applied to other
signatures, rules, code or programmatic functions, such as Snort, YARA, or ELK, in which a large quantity of
multi-version discrete units must be maintained.</p>
<h3 id="1-balance-efficiency-with-enough-specificity-to-minimize-false-positives">1. Balance efficiency with enough specificity to minimize false positives</h3>
<p>The ultimate goal of any Splunk query is to search and present data in order to answer some question(s). There are many
right ways to search in Splunk, but there are often far fewer best ways (yes, multiple bests, see next sentence). Before
formulating a search query, several considerations should be weighed and prioritized, such as accuracy, efficiency,
clarity, integrity, and duration. It is easy to get spoiled by simply doing wildcard searches, but just as easy to
unnecessarily bog down a search with superfluous key-value mappings. Overreliance on either can lead to problems.</p>
<p><strong>Accuracy</strong> - are there multiple sources which can answer the question? If so, which is more reliable and authoritative?
More importantly, how important is it to reduce or eliminate false positives from your results? There is a heavy inverse
correlation between accuracy and efficiency.</p>
<p><strong>Clarity</strong> - filtering down to the most relevant information needed to answer the question is only half of the battle; you
still need to interpret it. It may be fine to view the results as raw data if there are only one or two results of
non-complex data, but when there are rows of deeply structured data, taking the time to present it in the most
appropriate manner will go a long way.</p>
<p><strong>Duration</strong> - the time required for the query to complete. Is this a search that will be run often, so that delays
add up; is there an urgent need to answer something ASAP; is a longer duration eating up
resources needed by other functions running on the search head? Sometimes it is necessary to break a search into smaller
sub-searches or to target smaller sets of data and then pivot from there.</p>
<p><strong>Efficiency</strong> - closely tied to duration, an inefficient query will lead to unnecessary delays, excessive resource
consumption, and could even affect the integrity of the data (pay close attention to implicit limitations of results on
certain commands!). Paying attention to efficiency is especially important if there are per-user limitations on number
of searches, memory usage, or other constraints. Too many explicitly defined wildcard placeholders could become very
expensive, and the <a href="https://en.wikipedia.org/wiki/Linearizability">atomicity</a> of a formulated query should always be considered.</p>
<p><strong>Integrity</strong> - will you be manipulating any data as part of your search? If so, understand the risks to compromising the
integrity of your results in doing so. The more pivots made on returned data, the more susceptible to loss of integrity
the search becomes.</p>
<h3 id="2-make-it-readable">2. Make it readable</h3>
<p>Write queries in a consistent and clear manner. Sometimes it is better to have a query take up many additional lines for
the sake of better readability. Breaking into newlines on pipes is the de facto standard for readability purposes, as
can be seen below.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-sql" data-lang="sql"><span style="display:flex;"><span>event_simpleName <span style="color:#ff79c6">IN</span> (SyntheticProcessRollup2, ProcessRollup2) ImageFileName<span style="color:#ff79c6">=</span><span style="color:#f1fa8c">&#34;*Windows\\System32\\regsvr32.exe&#34;</span> CommandLine<span style="color:#ff79c6">=</span><span style="color:#f1fa8c">&#34;*/i:http*&#34;</span> <span style="color:#ff79c6">AND</span> ParentCommandLine<span style="color:#ff79c6">=</span><span style="color:#f1fa8c">&#34;*scrobj.dll*&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">|</span> rex field<span style="color:#ff79c6">=</span>CommandLine <span style="color:#f1fa8c">&#34;/i:(?&lt;sct_file_tmp&gt;\S+)&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">|</span> eval sct_file<span style="color:#ff79c6">=</span><span style="color:#ff79c6">replace</span>(sct_file_tmp, <span style="color:#f1fa8c">&#34;:&#34;</span>, <span style="color:#f1fa8c">&#34;[:]&#34;</span>)
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">|</span> eval ParentProcess<span style="color:#ff79c6">=</span>ImageFileName
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">|</span> eval ParentCLI<span style="color:#ff79c6">=</span>CommandLine
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">|</span> eval ParentUser<span style="color:#ff79c6">=</span>UserName
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">|</span> <span style="color:#ff79c6">rename</span> TargetProcessId_decimal <span style="color:#ff79c6">AS</span> ParentProcessId_decimal
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">|</span> <span style="color:#ff79c6">join</span> ParentProcessId_decimal 
</span></span><span style="display:flex;"><span>	[<span style="color:#ff79c6">search</span> event_simpleName <span style="color:#ff79c6">IN</span> (SyntheticProcessRollup2, ProcessRollup2)
</span></span><span style="display:flex;"><span>	<span style="color:#ff79c6">|</span> eval ChildProcess<span style="color:#ff79c6">=</span>ImageFileName
</span></span><span style="display:flex;"><span>	<span style="color:#ff79c6">|</span> eval ChildCLI<span style="color:#ff79c6">=</span>CommandLine
</span></span><span style="display:flex;"><span>	<span style="color:#ff79c6">|</span> eval ChildUser<span style="color:#ff79c6">=</span>UserName]
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">|</span> <span style="color:#ff79c6">table</span> _time ParentUser ParentCLI ChildProcess ChildCLI sct_file
</span></span></code></pre></div><h3 id="3-make-it-extensible">3. Make it extensible</h3>
<p>Queries should be written in such a way that other people can adapt them for their own needs or update and expand
existing ones. Some ways to accomplish this would be using obvious variable names, readability, or even leaving in
inexpensive functionality or variables which can be used for other purposes.</p>
<h3 id="4-make-it-modular">4. Make it modular</h3>
<p>Modularity will lead to extensibility, maintainability, and resiliency. This will also increase efficiency as code reuse
will be much simpler.</p>
<h3 id="5-make-it-feasible">5. Make it feasible</h3>
<p>If the query is written for the purpose of manual sifting and analysis, then 50k results is not very reasonable. However,
if it is for stateful preservation, <a href="http://docs.splunk.com/Documentation/Splunk/7.1.1/Alert/Aboutalerts">alerts</a>, or
<a href="http://docs.splunk.com/Documentation/Splunk/7.1.1/Knowledge/Aboutlookupsandfieldactions">lookups</a>, then that is more
acceptable. Incorporating pivots on the information with subsearches and filtering or even, if necessary, breaking it up
into multiple different queries will make managing the results a surmountable task.</p>
<h3 id="6-make-it-resilient">6. Make it resilient</h3>
<p>The data can change and so can the SPL itself (or even custom commands if used), so writing queries that are less
affected by potential changes is important, especially if the effects of the changes are not obvious, which could lead
to a loss of integrity in the results. (This is where testing is also important.)</p>
<h3 id="7-make-it-consistent">7. Make it consistent</h3>
<p>Having a style guide may seem like overkill, but if your operation is highly dependent on maintaining a repository of
queries, it can go a long way. Naming conventions, spacing, line breaks, use of quotations, ordering, and style are
some of the things to standardize to help with consistency.</p>
<h3 id="8-make-it-identifiable">8. Make it identifiable</h3>
<p>Something as simple as:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-sql" data-lang="sql"><span style="display:flex;"><span> <span style="color:#ff79c6">|</span> eval queryID<span style="color:#ff79c6">=</span><span style="color:#f1fa8c">&#34;wxp-110&#34;</span> 
</span></span></code></pre></div><p>This ID can then be printed out with the results if needed or purely used as a means to categorize and quickly identify.
Naming conventions should be obvious or recognizable (wxp = Windows XP, query 110), or even mappable to the repository itself.</p>
<h3 id="9-make-it-noob-friendly">9. Make it noob friendly</h3>
<p>This is obviously highly dependent on your usage and organizational structure; however, it never hurts to keep queries
as simple as can be, since there is always the chance that someone else will need to maintain or interpret them. Bonus:
less time needed to train people on their purpose!</p>
<h3 id="10-rtfm">10. RTFM!</h3>
<p>I am a huge proponent of RTFM (F!=field, btw) for both myself and others. Splunk has put a lot of effort into its
documentation, and it shows in how detailed and thorough it is. With regard to writing SPL queries,
the <a href="http://docs.splunk.com/Documentation/Splunk/7.1.1/SearchReference/WhatsInThisManual">search reference</a> is your absolute best friend!</p>
<h3 id="11-know-your-data">11. Know your data</h3>
<p>The first two things that I tell anyone who is new to Splunk to do are to get familiar with the syntax of SPL
(#10) and, just as importantly, to get to know how the data is structured. The simplest way to do this is to do a
wildcard search (*) and start reviewing the raw results under the events tab. The data will usually be structured in XML
or JSON. Initially, it will be less important to know which data was structured from <a href="http://docs.splunk.com/Documentation/Splunk/7.1.1/Indexer/Howindexingworks">indexing</a>,
<a href="http://docs.splunk.com/Documentation/Splunk/7.1.1/Knowledge/ExtractfieldsinteractivelywithIFX">field extractions</a>, or
other <a href="http://docs.splunk.com/Documentation/SplunkCloud/latest/Knowledge/Configureadvancedextractionswithfieldtransforms">transforms</a>,
but may become important with more advanced searches.</p>
<h3 id="12test-it">12. Test it</h3>
<p>Do not ever merge a query into production ops, bless off on it, trust it, or whatever it is you do to give it legitimacy
without first testing it and confirming positive results. Regardless of how simple the query is, you can never
guarantee that some other confounding issue isn&rsquo;t occurring. If it is a matter of missing the applicable data, well then,
Try Harder! There are many great projects out there to help with this at scale, such as Red Canary&rsquo;s
<a href="https://github.com/redcanaryco/atomic-red-team">atomic red team</a> or
MITRE&rsquo;s <a href="https://github.com/mitre/caldera">caldera</a>.</p>
<h3 id="13-build-it-out-piecemeal">13. Build it out piecemeal</h3>
<p>It can get stressful spending a lot of time on a query, only for it to not return the correct or any results, regardless
of tweaking. The best way to build complex queries is to build them in pieces, testing as you go along. This is
especially convenient because you can point to available data for the sake of testing to ensure positive results, and
then change it as it is built out.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-sql" data-lang="sql"><span style="display:flex;"><span><span style="color:#ff79c6">#</span> ensure you have <span style="color:#ff79c6">data</span> <span style="color:#ff79c6">for</span> the computer
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">host</span><span style="color:#ff79c6">=</span>ComputerA  
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">#</span> ensure you have <span style="color:#ff79c6">data</span> being parsed <span style="color:#ff79c6">from</span> that computer <span style="color:#ff79c6">to</span> the CommandLine field
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">host</span><span style="color:#ff79c6">=</span>ComputerA CommandLine<span style="color:#ff79c6">=*</span>  
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">#</span> <span style="color:#ff79c6">search</span> <span style="color:#ff79c6">for</span> <span style="color:#ff79c6">all</span> occurrences <span style="color:#ff79c6">of</span> python <span style="color:#ff79c6">in</span> command line activity <span style="color:#ff79c6">for</span> the computer
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">host</span><span style="color:#ff79c6">=</span>ComputerA CommandLine<span style="color:#ff79c6">=</span><span style="color:#f1fa8c">&#34;*python*&#34;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>...
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">#</span> <span style="color:#ff79c6">search</span> <span style="color:#ff79c6">for</span> <span style="color:#ff79c6">all</span> systems <span style="color:#ff79c6">where</span> powershell spawned a python program <span style="color:#ff79c6">in</span> which <span style="color:#bd93f9">3</span> <span style="color:#ff79c6">or</span> <span style="color:#ff79c6">more</span> <span style="color:#ff79c6">parameters</span> <span style="color:#ff79c6">are</span> passed
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">host</span><span style="color:#ff79c6">=*</span> ParentProcess<span style="color:#ff79c6">=</span><span style="color:#f1fa8c">&#34;powershell.exe&#34;</span> process<span style="color:#ff79c6">=</span><span style="color:#f1fa8c">&#34;python.exe&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">|</span> rex field<span style="color:#ff79c6">=</span>CommandLine <span style="color:#f1fa8c">&#34;(\s-{1,2})(?&lt;flags&gt;\S+)&#34;</span> max_match<span style="color:#ff79c6">=</span><span style="color:#bd93f9">0</span>
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">|</span> eval flag_count<span style="color:#ff79c6">=</span>mvcount(flags)
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">|</span> <span style="color:#ff79c6">where</span> flag_count<span style="color:#ff79c6">&gt;=</span><span style="color:#bd93f9">3</span>
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">|</span> stats <span style="color:#ff79c6">count</span> <span style="color:#ff79c6">values</span>(flags) <span style="color:#ff79c6">by</span> <span style="color:#ff79c6">host</span>
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">|</span> sort <span style="color:#bd93f9">0</span> <span style="color:#ff79c6">host</span>
</span></span></code></pre></div><h3 id="14-implement-version-control">14. Implement version control</h3>
<p>The necessity of this is really dependent on the number of queries and modifications, though it makes sense even for
small quantities. This can be accomplished as simply as baking a version into the query itself, such as from #8 with
revisions tacked on with periods (wxp-110.3) or even in its own field, e.g. <code>| eval version=3</code>.
Even better than that would be to maintain them in a database or repository such as GitHub, which gives the added
benefit of stateful change representations. It is also possible to save searches directly in Splunk, though version control
is less intuitive that way.</p>
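<p>If the IDs live in a repository, it can help to be able to pull them apart programmatically. Here is a minimal Python sketch that splits an ID in the <code>wxp-110.3</code> style into its parts. The function name <code>parse_query_id</code> and the choice to treat a missing revision as 0 are my own assumptions for illustration, not a Splunk feature.</p>
<div class="highlight"><pre tabindex="0"><code class="language-python" data-lang="python">def parse_query_id(query_id):
    """Split an ID such as 'wxp-110.3' into prefix, number, and revision."""
    prefix, _, rest = query_id.partition("-")
    number, _, revision = rest.partition(".")
    return {
        "prefix": prefix,
        "number": int(number),
        # Assumption: no ".N" suffix means the query has not been revised yet.
        "revision": int(revision) if revision else 0,
    }


print(parse_query_id("wxp-110.3"))
print(parse_query_id("wxp-110"))
</code></pre></div>
<p>Something like this makes it easy to sort queries by platform prefix or to spot which ones have gone through the most revisions.</p>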
<h3 id="15-maintain-multiple-versions-of-the-same-thing">15. Maintain multiple versions of the same thing</h3>
<p>This doesn&rsquo;t just apply to older versions of the same query, but queries which may search the same thing but present it
in a different manner, search a different data set, or search a different time window.</p>
<h3 id="16-dont-reinvent-the-wheel">16. Don&rsquo;t reinvent the wheel</h3>
<p>It is all too easy to blow a full 12-hour shift perfecting a query, which may not even end up working at all. While it
is important to have these search queries tailored to your specific need, it is not always necessary to MacGyver it alone.
There are lots of great resources available to borrow ideas or techniques from, such as the Splunk blogs and forums, or
you can even work with a co-worker.</p>
<h3 id="17-dont-depend-on-the-wheel">17. Don&rsquo;t depend on the wheel</h3>
<p>Counter to #16, you do not want to become overreliant on searching for help, as this could lead to running queries
which may not be working as you think they are. This could also potentially compromise the integrity of the results.
Worse yet, it could be an inefficient way of doing something which has caught on and persisted through the forums.</p>
<h3 id="18-share-it">18. Share it</h3>
<p>If you have written a gem or come up with a novel approach to something, share it back with the community. Even if the
data set is different, there may still be much which can be gleaned from it. It also helps to drive conversations which
benefit the community as a whole.</p>
<h3 id="19-save-it">19. Save it</h3>
<p>This is such an obvious one, but in spite of that, I still constantly find myself rewriting queries that I had previously
written over and over again&hellip;</p>
<h3 id="20-regex">20. REGEX!</h3>
<p>I don&rsquo;t know why I have this all the way down at #20, because this is easily one of the most powerful and important
tools for pivoting on results. There are several commands where regex can be leveraged,
but the two most significant are <a href="https://docs.splunk.com/Documentation/Splunk/7.1.1/SearchReference/Regex">regex</a> and
<a href="https://docs.splunk.com/Documentation/Splunk/7.1.1/SearchReference/Rex">rex</a>.</p>
<p>Regex does exactly what it says: it allows you to filter on respective fields (or _raw) using regex, which in Splunk is a
<a href="https://docs.splunk.com/Documentation/Splunk/7.1.1/Knowledge/AboutSplunkregularexpressions">slimmed down version</a>
of PCRE. The rex command is much more powerful, in that it allows you to create fields based on the
parsed data, which can then be used to pivot your searches on. You can even build it as a
<a href="https://docs.splunk.com/Documentation/Splunk/7.1.1/Search/Parsemultivaluefields">multivalued</a> field if more than
one match occurs. An example of the rex command (and potentially more than one value) can be seen in the example from #13.</p>
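<p>If you want to try out a pattern before putting it into a <code>rex</code> command, a quick Python sketch can help. The snippet below mimics the flag extraction from #13, using a plain capture group instead of the named <code>flags</code> group and <code>findall</code> to return every match, roughly what <code>max_match=0</code> does. The sample command line is made up, and Python&rsquo;s <code>re</code> module is not identical to the PCRE flavor Splunk uses, so treat it as an approximation.</p>
<div class="highlight"><pre tabindex="0"><code class="language-python" data-lang="python">import re

# Same idea as the rex pattern in #13, written with a plain capture group.
FLAG_PATTERN = re.compile(r"\s-{1,2}(\S+)")


def extract_flags(command_line):
    """Return every flag name found, similar to rex with max_match=0."""
    return FLAG_PATTERN.findall(command_line)


# Hypothetical command line, for illustration only.
sample = "python.exe script.py -v --input data.csv -o out.txt"
print(extract_flags(sample))  # ['v', 'input', 'o']
</code></pre></div>
<p>Each element of the returned list corresponds to one value of the multivalued field that <code>rex</code> would build.</p>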
<h3 id="21-know-when-its-better-to-go-beyond-just-using-a-search-with-spl">21. Know when it&rsquo;s better to go beyond just using a search with SPL</h3>
<p>Finally, we made it all the way to #21! Sometimes, depending on circumstance, function, and operational usage, manual
searching with SPL queries is just not the best answer. Splunk has a lot of other functionality which can accomplish
many of the same things, with less manual requirements. Alerts, scheduled reports, dashboards, and any of a number of
apps built within or against the API allow for almost limitless capability. If you are struggling to maintain or achieve
some of the topics annotated here, it may mean it is time to explore some of these alternative options.</p>
<h3 id="final-thoughts">Final Thoughts</h3>
<p>This is certainly not an all-inclusive list, as there are many more practices which can apply here. Ultimately, the
specific deployment, implementation, and usage of Splunk should dictate exactly how you create and
maintain search queries. This was also not meant to go too deep in the weeds on generating advanced queries (though that
may come in the future), but rather a high level approach to maintaining quality and standards. There are many other
people who are far more experienced and with much greater Splunk-fu out there, so if you have any input or insight,
please feel free to reach out.</p>
<p><em>originally posted on a previous blog of mine</em></p>
]]></content></item><item><title>Cyber Threat Hunting - Leveraging the Kill Chain</title><link>https://br0k3nlab.com/posts/2017/04/cyber-threat-hunting-leveraging-the-kill-chain/</link><pubDate>Thu, 27 Apr 2017 17:46:38 -0700</pubDate><guid>https://br0k3nlab.com/posts/2017/04/cyber-threat-hunting-leveraging-the-kill-chain/</guid><description>Cyber Threat Hunting is a critical component necessary to ensuring comprehensive defense and response measures are in place by taking a proactive approach to detecting threats. While threat hunting itself is not a new concept, the actual execution of it is constantly evolving. The current inception of threat hunting is enabled by the fact that big data handling has become more feasible along with the advent of advanced statistical analysis and machine learning.</description><content type="html"><![CDATA[<p>Cyber Threat Hunting is a critical component necessary to ensuring comprehensive defense and response measures are in
place by taking a proactive approach to detecting threats. While threat hunting itself is not a new concept, the actual
execution of it is constantly evolving. The current incarnation of threat hunting is enabled by the fact that big data
handling has become more feasible along with the advent of advanced statistical analysis and machine learning.</p>
<p>There are many frameworks and methodologies that have been created around modern cyber threat hunting. Some of these
particular implementations are specialized for specific environments, circumstances, or data sources, while others are
more generic, applicable across any situation. The one thing which the majority of these methodologies have in common,
however, is that they all leverage or reference an attacker lifecycle in some way.</p>
<p>There are many considerations and components which should be accounted for while preparing to execute a hunting mission,
but a few of those include the following:</p>
<p><strong>The Attacker Lifecycle</strong>
<a href="http://www.lockheedmartin.com/us/what-we-do/aerospace-defense/cyber/cyber-kill-chain.html">The Cyber Kill Chain</a>
is an industry-wide de facto standard for modeling threats within the cyber ecosystem. The Kill
Chain was originally created by several researchers at the Lockheed Martin Corporation as part of a
<a href="http://www.lockheedmartin.com/content/dam/lockheed/data/corporate/documents/LM-White-Paper-Intel-Driven-Defense.pdf">methodology</a> to
more appropriately model and defend against increasingly advanced adversaries. The primary benefit of approaching
defensive cyber operations from the perspective of the Kill Chain is a disciplined framework to focus and scope
intelligence-driven defensive operations such as cyber threat hunting.</p>
<p><strong>Defining Normality in Organizations</strong>
It is just as important to assess and understand your own organization as it is to profile and understand the enemy.
This is one of the most common things that companies tend to struggle with: inventory of assets, criticality of data,
normal business to business (B2B) communications, etc. Fully understanding these things makes defending
(and proactively hunting) much more successful.</p>
<p><strong>Intelligence-Driven</strong>
This may be the most important factor within the entire threat hunting process. While the full details of how to execute
this will be documented in a subsequent post, it is critical to understand that intelligence will not only scope the
hunting process, but also provide relevance to any findings within the context of the attacker lifecycle.</p>
<h3 id="scoping-hunt-missions">Scoping Hunt Missions</h3>
<p>The Cyber Kill Chain is broken down into the following categories:</p>
<ul>
<li>Reconnaissance</li>
<li>Weaponization</li>
<li>Delivery</li>
<li>Exploitation</li>
<li>Installation</li>
<li>Command &amp; Control (C2)</li>
<li>Actions on Objectives</li>
</ul>
<p>Many organizations with access to large amounts of data which they can hunt through will conduct both targeted and
generic hunts. The kill chain is leveraged through both approaches.</p>
<p>With targeted hunting, the pre-defined scope of potential adversaries and their historically attributable tactics,
techniques, and procedures (TTPs) are taken into account and broken out to the respective category which they fall under
in the Cyber Kill Chain. This then allows analysts to focus on appropriate data sources necessary for locating Indicators
of Compromise (IOC), Indicators of Activity (IOA), or anything else of relevance.</p>
<p>With regard to generic hunting, analysts can appropriately scope the focus of individual hunt missions based on the
categories of the Cyber Kill Chain. Under this method, analysts can start hunting very broadly within the chosen category,
making a series of pivots on the data returned, until it reaches a manageable size for the analyst to peruse through,
line by line. Should any indicators or suspicious items be found, analysts can then shift the focus laterally across the
Cyber Kill Chain to establish more evidence of an attack or escalate to a senior analyst to carry out the investigation.</p>
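<p>The idea of successive pivots can be sketched in a few lines of Python. Everything here is hypothetical: the events, the field names, and the choice to treat a command seen only once as worth a closer look are assumptions for illustration, not output from a real data set.</p>
<div class="highlight"><pre tabindex="0"><code class="language-python" data-lang="python">from collections import Counter

# Hypothetical process-creation events; the field names are illustrative only.
events = [
    {"host": "ws01", "process": "net.exe", "command": "net use"},
    {"host": "ws01", "process": "net.exe", "command": "net use"},
    {"host": "ws02", "process": "net.exe", "command": "net use"},
    {"host": "ws02", "process": "net.exe", "command": "net localgroup administrators"},
    {"host": "ws03", "process": "cmd.exe", "command": "cmd /c dir"},
]

# Pivot 1: narrow the broad data set to one process of interest.
net_events = [e for e in events if e["process"] == "net.exe"]

# Pivot 2: stack identical commands so the rare ones stand out.
command_counts = Counter(e["command"] for e in net_events)

# Pivot 3: keep only commands seen once, a set small enough to read line by line.
rare = [cmd for cmd, n in command_counts.items() if n == 1]

print(len(net_events), dict(command_counts), rare)
</code></pre></div>
<p>Each pivot shrinks the data a little more, which is the same pattern an analyst follows in a search tool, just with far more events.</p>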
<p>The best way to understand the advantage of leveraging the kill chain for hunting is to see an example of it in action.</p>
<h3 id="the-hunt">The Hunt</h3>
<p>First, we must determine our data sources relevant to the hunt. In this case, system logs, security event data, and host
level information are available. The advantage of having multiple sources of data, such as an MSSP with multiple customers,
is an expanded, diverse data set against which to make comparisons. While this adds some complications in baselining, it
significantly increases the value of the data set for comparative analysis to identify suspicious activity based on anomalies.</p>
<p>Under generic hunting, we can choose to look for specific identifiers within the chosen category, look for
anomalies within the larger data set, or combine the two. Additionally, we must determine the window of time to
focus on. Generic hunt missions are usually executed on cyclical rotations, ensuring that activity from any given
period gets coverage. Anomaly-based generic hunting requires multiple successive pivots to systematically filter out more
and more data, so for simplicity, we will focus on hunting for specific identifiers. Although this method also relies on
successive pivots, it is much easier to demonstrate.</p>
<p>Two short example hunt missions are explored below.</p>
<p>Example # 1 – <strong>Actions on Objectives</strong> (Privilege Escalation) via <code>net</code> Command Usage</p>
<p>For this example we will explore a generic cyber threat hunt mission, focused on identifying activity categorized within
the actions on objectives phase.</p>
<p>For this search we limit the scope to a <em>single customer</em>, looking for all <em>process creation</em> events executing the <code>net</code> command
within the <em>last 24 hours</em>:</p>
<p>
    <img src="/post-images/cyber-kill-chain/ckc1.png"  alt="search"  class="center"  style="border-radius: 8px;"  />

 <br></p>
<p>The search has returned 121 results. While this is a manageable number of events for an analyst to investigate,
we can narrow them down even further by filtering out normal activity based on established baselines:</p>
<p>
    <img src="/post-images/cyber-kill-chain/ckc2.png"  alt="search"  class="center"  style="border-radius: 8px;"  />

 <br></p>
<p>We can see that, across the customer’s entire environment, the 121 events consist of only 3 unique commands.
Breaking this out further by individual system reveals even more insight:</p>
<p>
    <img src="/post-images/cyber-kill-chain/ckc3.png"  alt="search"  class="center"  style="border-radius: 8px;"  />

 <br></p>
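<p>The grouping and baseline filtering described above can be sketched in a few lines of Python. This is only an
illustration, not the search used in the screenshots: the event records, the field names (<code>host</code>,
<code>command_line</code>), and the baseline set are all assumptions made for the example.</p>

```python
from collections import Counter

# Hypothetical process-creation events; the field names are assumptions.
events = [
    {"host": "ws-01", "command_line": "net start wuauserv"},
    {"host": "ws-01", "command_line": "net  start  wuauserv"},
    {"host": "ws-02", "command_line": "net use"},
    {"host": "srv-01", "command_line": "NET USE"},
    {"host": "srv-01", "command_line": "net stop wuauserv"},
]


def normalize(command_line):
    """Lower-case and collapse whitespace so trivial variants group together."""
    return " ".join(command_line.lower().split())


def unique_commands(events):
    """Count events per normalized command line across the environment."""
    return Counter(normalize(e["command_line"]) for e in events)


def commands_by_host(events):
    """Count events per (host, normalized command line) pair."""
    return Counter((e["host"], normalize(e["command_line"])) for e in events)


def drop_baseline(counts, baseline):
    """Remove commands already documented as normal for this customer."""
    return {cmd: n for cmd, n in counts.items() if cmd not in baseline}
```

<p>Calling <code>drop_baseline(unique_commands(events), baseline)</code> with the documented baseline leaves only the
commands that still need an analyst's attention.</p>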
<p>Based on previously documented baselines that define normal activity for this customer, we can determine that the
<code>wuauserv</code> activity (the Windows Update service) is expected and rule it out. We can then dig further into the
<code>net use</code> commands:</p>
<p>
    <img src="/post-images/cyber-kill-chain/ckc4.png"  alt="search"  class="center"  style="border-radius: 8px;"  />

 <br></p>
<p>The results (truncated for brevity) reveal the activity within the context of time. We can then compare this activity
with the normal, expected activity within the respective systems (either by comparing to baselines or verifying with the customer).</p>
<p>If there were any suspicious results which could not be accounted for as legitimate, the next step would be to
build additional context around them: what else the user executed shortly before and after, what
other processes were running, and the parent-child process relationships. Identified suspicious activity can then be
compared to other data sources to enrich these findings, and the focus can shift across the Cyber Kill Chain categories
to extend that context.</p>
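<p>As a rough sketch of the "shortly before and after" pivot, the helper below pulls a user's other activity on the
same host within a time window around a suspicious event. The record layout (<code>user</code>, <code>host</code>,
<code>time</code>) and the five-minute default are assumptions for illustration; parent-child process relationships
would need additional fields that are not modeled here.</p>

```python
from datetime import datetime, timedelta


def surrounding_activity(events, pivot, window=timedelta(minutes=5)):
    """Return the same user's other events on the same host within
    `window` of the pivot event, sorted by time."""
    nearby = [
        e for e in events
        if e is not pivot
        and e["user"] == pivot["user"]
        and e["host"] == pivot["host"]
        and window >= abs(e["time"] - pivot["time"])
    ]
    return sorted(nearby, key=lambda e: e["time"])
```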
<p>Example # 2 – <strong>Exploitation</strong> Leveraging a Software Vulnerability (<a href="https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2017-5638">CVE-2017-5638</a>)</p>
<p>For this next example, we will explore how finding anomalies in the data set can reveal malicious activity (or attempts).
In this search, we query event data to identify all unique HTTP request and response headers. We will not
restrict the search to a single customer, so that it encompasses a larger, more diverse data set:</p>
<p>
    <img src="/post-images/cyber-kill-chain/ckc5.png"  alt="search"  class="center"  style="border-radius: 8px;"  />

 <br></p>
<p>
    <img src="/post-images/cyber-kill-chain/ckc6.png"  alt="search"  class="center"  style="border-radius: 8px;"  />

 <br></p>
<p>The (partial) list of results shows that the search captured over 50 unique HTTP headers. We can then look for
outliers in each header, such as values that occur significantly more or less often than the rest, or values whose
length differs significantly from the others.</p>
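<p>One simple way to surface such outliers is to compare each unique value's length against the median length and to
rank values by how often they occur. The sketch below assumes the header values have already been pulled into a list
of strings; the threshold factor and the synthetic long value are placeholders, not tuned or observed data.</p>

```python
from collections import Counter
from statistics import median


def length_outliers(values, factor=3.0):
    """Return unique values longer than `factor` times the median length."""
    unique = sorted(set(values))
    if not unique:
        return []
    typical = median(len(v) for v in unique)
    return [v for v in unique if len(v) > factor * typical]


def frequency_table(values):
    """Return (value, count) pairs, most frequent first."""
    return Counter(values).most_common()


# Hypothetical Content-Type values; the last one is a synthetic stand-in
# for an abnormally long entry.
long_value = "x-synthetic/" + "a" * 300
content_types = [
    "text/html",
    "text/html",
    "application/json",
    "application/x-www-form-urlencoded",
    "multipart/form-data",
    long_value,
    long_value,
    long_value,
]
```

<p>Here <code>length_outliers(content_types)</code> returns only the synthetic long value, and
<code>frequency_table(content_types)</code> shows that it is also the most frequent entry, which is the combination
that makes a value worth a closer look.</p>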
<p>As we browse through the top values of each result, we discover an interesting finding within the request’s
<code>Content-Type</code> header. Not only are there two entries far longer than the rest, but one also has a very high rate of occurrence.</p>
<p>
    <img src="/post-images/cyber-kill-chain/ckc7.png"  alt="search"  class="center"  style="border-radius: 8px;"  />

 <br></p>
<p>We can further verify how much of an outlier this truly is by examining all unique values, which reveals
repeating patterns within the value itself. A review of the relevant RFC for the <code>Content-Type</code> header shows that this is not an expected value.
Of course, we now know that this is the attempted exploitation of CVE-2017-5638, targeting Apache Struts, but this process
shows how the activity might have been detected prior to its discovery and disclosure.</p>
<p>
    <img src="/post-images/cyber-kill-chain/ckc8.png"  alt="search"  class="center"  style="border-radius: 8px;"  />

 <br></p>
<p>We can now pivot on this data to extract all of the command lines that attackers attempted to run through the
Apache Struts vulnerability, grouped by targeted customer. We can then search our host-level data to verify that the
commands were not successfully executed.</p>
<p>
    <img src="/post-images/cyber-kill-chain/ckc9.png"  alt="search"  class="center"  style="border-radius: 8px;"  />

 <br></p>
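<p>A minimal sketch of that extraction step follows. It assumes the malicious <code>Content-Type</code> values assign
the command to an OGNL variable named <code>cmd</code>, as publicly circulated proof-of-concept payloads for
CVE-2017-5638 did; payloads shaped differently will not match. The event fields (<code>customer</code>,
<code>content_type</code>) and the shortened sample payloads are hypothetical.</p>

```python
import re
from collections import defaultdict

# Assumption: the command appears as (#cmd='...') inside the header value.
CMD_PATTERN = re.compile(r"#cmd='([^']*)'")


def attempted_commands(events):
    """Map each customer to the set of commands found in Content-Type values."""
    by_customer = defaultdict(set)
    for event in events:
        for command in CMD_PATTERN.findall(event.get("content_type", "")):
            by_customer[event["customer"]].add(command)
    return dict(by_customer)


# Hypothetical events with heavily shortened, illustrative payloads.
events = [
    {"customer": "customer-a", "content_type": "%{(#cmd='whoami').(#placeholder)}"},
    {"customer": "customer-a", "content_type": "text/html"},
    {"customer": "customer-b", "content_type": "%{(#cmd='id').(#placeholder)}"},
]
```

<p>The resulting per-customer command sets can then be used as search terms against process-creation data on the
targeted hosts.</p>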
<h3 id="conclusion">Conclusion</h3>
<p>Cyber threat hunting is critical to effectively identifying potential threats or compromises by taking a proactive
approach. There are many different methodologies and techniques to guide cyber hunt missions, but the right one should
be dictated by specific circumstances. Regardless of how it is executed, examining the environment through the attacker
lifecycle will help guide scope and provide additional insight which might not have been considered.</p>
<p>Taking a proactive approach to securing your environment and detecting malicious activity within it is paramount in today’s
technology-dependent landscape. Hunting through an attacker lifecycle or the Cyber Kill Chain will allow you to
identify and stop threats which traditional signature-based methods might miss.</p>
<p><em>originally posted on a previous blog of mine</em></p>
<p><em>update: a shorter, modified version of this post was also published on the <a href="https://web.archive.org/web/20170904045522/https://blog.rackspace.com/cyber-hunting-through-the-cyber-kill-chain">Rackspace blog</a>
and in this <a href="https://www.rackspace.com/sites/default/files/white-papers/age-of-the-cyber-hunter-white-paper_1.pdf">white paper</a></em></p>
]]></content></item></channel></rss>