Motivation
Both traditional fuzzers (AFL++, libFuzzer) and smart contract fuzzers (Echidna, Medusa) typically use a corpus: a starting set of interesting inputs for the program under test.
There is an opportunity for smart contract fuzzers to use a common corpus format to enable composability across a range of tools:
- tools that generate a starting corpus or extend an existing corpus (Optik, Halmos)
- tools that process an existing corpus (coverage reports, minimizers)
- fuzzers that both start with a given corpus and can add to it over the course of a run
End to end, you could imagine a “fuzzer heaven” workflow where:
- you start with a base corpus generator
- you run $FUZZER1 for a while, adding corpus entries
- you run $FUZZER2 for a while, potentially concurrently
- you run $FUZZER3 with no input corpus in the cloud, but grab its output corpus
- you process the results with other tools to merge the corpora, categorize unique findings, minimize sequences, generate a timeline view of findings, produce a coverage report, and so on
Proposed Format
A corpus is a directory with the following structure:
corpus
├── setUp.json
├── inputs
│   ├── toolname-seq001.json
│   ├── ...
│   └── toolname-seqXXX.json
└── outputs
    ├── toolname-seqYYY.json
    ├── ...
    └── toolname-seqZZZ.json
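A minimal sketch of how a tool might consume this layout. The directory names (`setUp.json`, `inputs`, `outputs`) come from the format above; the schema of the individual entries is not specified here, so each file is treated as opaque JSON:

```python
import json
from pathlib import Path

def load_corpus(root):
    """Enumerate a corpus directory laid out per the proposed format.

    Returns (setup, inputs, outputs), where setup is the parsed
    setUp.json (or None if absent) and inputs/outputs map each
    sequence filename to its parsed JSON contents.
    """
    root = Path(root)
    setup_path = root / "setUp.json"
    setup = json.loads(setup_path.read_text()) if setup_path.exists() else None

    def read_dir(name):
        d = root / name
        if not d.is_dir():
            return {}
        # Sort so that seq001..seqXXX enumerate in a stable order.
        return {p.name: json.loads(p.read_text())
                for p in sorted(d.glob("*.json"))}

    return setup, read_dir("inputs"), read_dir("outputs")
```

A merger or coverage tool could build on this by unioning the `inputs` maps of several corpora, prefixing filenames with the originating tool name to avoid collisions.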