Motivation
Both traditional fuzzers (AFL++, libFuzzer) and smart contract fuzzers (Echidna, Medusa) typically use a corpus: a starting set of interesting inputs for the program under test.
There is an opportunity for smart contract fuzzers to use a common corpus format to enable composability across a range of tools:
- tools that generate a starting corpus or extend an existing corpus (Optik, Halmos)
- tools that process an existing corpus (coverage reports, minimizers)
- fuzzers that both start with a given corpus and can add to it over the course of a run
End to end, you could imagine a “fuzzer heaven” workflow where:
- you start with a base corpus generator
- you run $FUZZER1 for a while, adding corpus entries
- you run $FUZZER2 for a while, potentially concurrently
- you run $FUZZER3 with no input corpus in the cloud, but grab its output corpus
- you process the results with other tools to merge the corpora, categorize unique findings, minimize sequences, generate a timeline view of findings, produce a coverage report, and so on
Proposed Format
A corpus is a directory with the following structure:
corpus
├── setUp.json
├── inputs
│   ├── toolname-seq001.json
│   ├── ...
│   └── toolname-seqXXX.json
└── outputs
    ├── toolname-seqYYY.json
    ├── ...
    └── toolname-seqZZZ.json
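A minimal sketch of how a tool might consume this layout. The directory names (`setUp.json`, `inputs`, `outputs`) come from the format above; the schema of the individual entries is not specified here, so each file is treated as opaque JSON:

```python
import json
from pathlib import Path

def load_corpus(root):
    """Enumerate a corpus directory laid out per the proposed format.

    Returns (setup, inputs, outputs), where setup is the parsed
    setUp.json (or None if absent) and inputs/outputs map each
    sequence filename to its parsed JSON contents.
    """
    root = Path(root)
    setup_path = root / "setUp.json"
    setup = json.loads(setup_path.read_text()) if setup_path.exists() else None

    def read_dir(name):
        d = root / name
        if not d.is_dir():
            return {}
        # Sort so that seq001..seqXXX enumerate in a stable order.
        return {p.name: json.loads(p.read_text())
                for p in sorted(d.glob("*.json"))}

    return setup, read_dir("inputs"), read_dir("outputs")
```

A merger or coverage tool could build on this by unioning the `inputs` maps of several corpora, prefixing filenames with the originating tool name to avoid collisions.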