There are multiple steps involved:
Step 1: Prepare intermediate results. Each of them MUST meet all of these criteria:
- It appears exactly as-is in ALL inclusion files.
- It can't appear in ANY of the exclusion files.
- It must be at least 16 bytes long.
- It must not be too long (usually under 256 bytes, though sometimes up to 1024).
Applying these criteria often yields more than 10,000 snippets in total.
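The criteria above can be sketched as a simple filter. This is only an illustration, not the actual implementation: the names `is_candidate` and `fixed_length_candidates` are hypothetical, files are assumed to be already loaded as `bytes`, and candidates are enumerated naively as fixed-length windows of the first inclusion file.

```python
MIN_LEN = 16    # minimum snippet length, from the criteria above
MAX_LEN = 256   # usual upper bound; the text notes it can sometimes be 1024


def is_candidate(snippet, inclusion, exclusion, min_len=MIN_LEN, max_len=MAX_LEN):
    """Return True if `snippet` satisfies all of the Step 1 criteria."""
    if not (min_len <= len(snippet) < max_len):
        return False
    # Must appear exactly as-is in ALL inclusion files.
    if not all(snippet in data for data in inclusion):
        return False
    # Must not appear in ANY exclusion file.
    return not any(snippet in data for data in exclusion)


def fixed_length_candidates(inclusion, exclusion, length=MIN_LEN):
    """Naive enumeration (an assumption for this sketch): slide a window of
    `length` bytes over the first inclusion file and keep the windows that
    pass the filter."""
    if not inclusion:
        return set()
    first = inclusion[0]
    windows = {first[i:i + length] for i in range(len(first) - length + 1)}
    return {w for w in windows if is_candidate(w, inclusion, exclusion)}


inclusion_files = [b"AAAA0123456789abcdefXYZ", b"zz0123456789abcdefXYZqq"]
exclusion_files = [b"--123456789abcdefX--"]
found = fixed_length_candidates(inclusion_files, exclusion_files)
```

In this toy input, `found` holds the 16-byte windows shared by both inclusion files, minus the one that also occurs in the exclusion file. Real inputs produce far more candidates, which is why a second step is needed.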
Step 2: Keep the top snippets among the intermediate results.
We apply a statistical model to assign a score to each snippet from Step 1, and use that score to pick the top 30 of each type (e.g. top 30 binary, top 30 ascii, etc.).
The scores are inversely proportional to the expected number of goodware files matched by the snippet. That is, a low score means the snippet would match a lot of goodware, while a high score means it is expected to match no goodware at all.
The scores are approximate, so it pays off to look past the first or second snippet on the results page.
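The selection in Step 2 can be sketched as a per-type top-N pick. This is a minimal illustration under stated assumptions: the statistical model itself is not described here, so scores are taken as given, and the function name `top_snippets` and the `(type, snippet, score)` tuple layout are hypothetical.

```python
import heapq
from collections import defaultdict


def top_snippets(scored, per_type=30):
    """Keep the `per_type` highest-scoring snippets of each type.

    `scored` is an iterable of (snippet_type, snippet, score) tuples.
    The layout is an assumption made for this sketch.
    """
    by_type = defaultdict(list)
    for kind, snippet, score in scored:
        by_type[kind].append((score, snippet))
    return {
        kind: [snippet for _, snippet in
               heapq.nlargest(per_type, items, key=lambda pair: pair[0])]
        for kind, items in by_type.items()
    }


# Made-up scores: higher means fewer expected goodware matches.
scored = [
    ("ascii", b"GetProcAddress\x00\x00", 0.10),
    ("ascii", b"very_unusual_mutex_name", 0.90),
    ("ascii", b"custom_config_marker", 0.70),
    ("binary", b"\x8b\x45\x08\x33\xc1\x89\x45\xfc", 0.50),
]
best = top_snippets(scored, per_type=2)
```

Because the scores are approximate, a snippet ranked second or third within its type may still be the better choice in practice, so it is worth reviewing the whole list rather than only its head.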