The Art, Science, and Engineering of Fuzzing: A Survey
- Year
- 2021
- Authors
- Valentin J.M. Manès, HyungSeok Han, Choongwoo Han, Sang Kil Cha, Manuel Egele, Edward J. Schwartz, Maverick Woo
- DOI
- 10.1109/TSE.2019.2946563
Related
Persistent Notes
In-text annotations
“1 INTRODUCTION” Page 1
“At a high level, fuzzing refers to a process of repeatedly running a program with generated inputs that may be syntactically or semantically malformed." Page 1
“Furthermore, there has been an observable fragmentation in the terminology used by various fuzzers." Page 1
“2 SYSTEMIZATION, TAXONOMY, AND TEST PROGRAMS” Page 2
“generates a stream of random characters to be consumed by a target program” Page 2
OG definition Page 2
“2.1 Fuzzing & Fuzz Testing” Page 2
“Fuzzing is the execution of the PUT using input(s) sampled from an input space (the “fuzz input space”) that protrudes the expected input space of the PUT." Page 2
“Third, the sampling process is not necessarily randomized, as we will see in §5." Page 2
“Fuzz testing is the use of fuzzing to test if a PUT violates a correctness policy” Page 2
“A fuzzer is a program that performs fuzz testing on a PUT." Page 2
“A fuzz campaign is a specific execution of a fuzzer on a PUT with a specific correctness policy." Page 2
“For example, a correctness policy employed by early fuzzers tested only whether a generated input—the test case—crashed the PUT. However, fuzz testing can actually be used to test any policy observable from an execution, i.e., EM-enforceable [190]." Page 2
This is how the OG paper defined correctness Page 2
“A bug oracle is a program, perhaps as part of a fuzzer, that determines whether a given execution of the PUT violates a specific correctness policy." Page 2
“For instance, PerfFuzz [140] looks for inputs that reveal performance problems." Page 2
“A fuzz configuration of a fuzz algorithm comprises the parameter value(s) that control(s) the fuzz algorithm." Page 2
May be simple or complex depending on the algorithm. Page 2
“A seed is a (commonly well-structured) input to the PUT, used to generate test cases by modifying it." Page 2
“2.2 Paper Selection Criteria” Page 2
“2.3 Fuzz Testing Algorithm” Page 3
“The first part is the PREPROCESS function, which is executed at the beginning of a fuzz campaign. The second part is a series of five functions inside a loop: SCHEDULE, INPUTGEN, INPUTEVAL, CONFUPDATE, and CONTINUE. Each execution of this loop is called a fuzz iteration and each time INPUTEVAL executes the PUT on a test case is called a fuzz run." Page 3
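My own toy Python sketch of how I read this loop (not the paper's algorithm; the PUT, the mutation, and the "keep this test case" heuristic are all made up):

```python
import random

def run_put(data: bytes) -> bool:
    """Toy PUT stand-in: 'crashes' when the input starts with b'FUZ'."""
    return data.startswith(b"FUZ")

def fuzz(seeds, iterations=50_000):
    pool, bugs = list(seeds), []                       # PREPROCESS: build the seed pool
    for _ in range(iterations):                        # CONTINUE: fixed budget
        conf = random.choice(pool)                     # SCHEDULE: pick a configuration (here, a seed)
        tc = bytes(b ^ (1 << random.randrange(8)) if random.random() < 0.05 else b
                   for b in conf)                      # INPUTGEN: bit-flip mutation
        if run_put(tc):                                # INPUTEVAL: one fuzz run, crash = violation
            bugs.append(tc)
        elif random.random() < 0.01:                   # CONFUPDATE: occasionally keep the test case
            pool.append(tc)
    return bugs

print(len(fuzz([b"FUZZING", b"hello world"])))
```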
“2.4 Taxonomy of Fuzzers” Page 3
“2.4.1 Black-box Fuzzer” Page 3
“The term “black-box” is commonly used in software testing [164], [35] and fuzzing to denote techniques that do not see the internals of the PUT—these techniques can observe only the input/output behavior of the PUT, treating it as a black-box. In software testing, black-box testing is also called IO-driven or data-driven testing” Page 3
“2.4.2 White-box Fuzzer” Page 4
“At the other extreme of the spectrum, white-box fuzzing [93] generates test cases by analyzing the internals of the PUT and the information gathered when executing the PUT." Page 4
“While DSE is an active research area [93], [91], [41], [179], [116], works in DSE generally do not claim to be about white-box fuzzing and so we did not include many such works." Page 4
“2.4.3 Grey-box Fuzzer” Page 4
“Some fuzzers [81], [71], [214] take a middle ground approach dubbed grey-box fuzzing. In general, grey-box fuzzers can obtain some information internal to the PUT and/or its executions." Page 4
“Although there usually is some consensus among security experts, the distinction among black-, grey-, and white-box fuzzing is not always clear." Page 4
“2.5 Fuzzer Genealogy and Overview” Page 4
“3 PREPROCESS” Page 4
“3.1 Instrumentation” Page 4
“3.1.1 Execution Feedback” Page 7
“3.1.2 Thread Scheduling” Page 7
“However, instrumentation can also be used to trigger different non-deterministic program behaviors by explicitly controlling how threads are scheduled” Page 7
“Existing work has shown that even randomly scheduling threads can be effective at finding race condition bugs” Page 7
“3.1.3 In-Memory Fuzzing” Page 7
“When testing a large program, it is sometimes desirable to fuzz only a portion of the PUT without re-spawning a process for each fuzz iteration in order to minimize execution” Page 7
“Some fuzzers [9], [241] perform in-memory fuzzing on a function without restoring the state of the PUT after each iteration. We call such a technique in-memory API fuzzing." Page 7
“Although efficient, in-memory API fuzzing suffers from unsound fuzzing results: bugs (or crashes) found with in-memory fuzzing may not be reproducible, because (1) it is not always feasible to construct a valid calling context for the target function, and (2) there can be side-effects that are not captured across multiple function calls." Page 7
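My own toy illustration of in-memory API fuzzing and why it can be unsound: the target function is called in a loop inside one process, and the invented "bug" below only fires after accumulated state, so replaying the single crashing input in a fresh process would not reproduce it.

```python
import random

_calls = 0   # persistent state that survives across fuzz runs in the same process

def parse_record(data: bytes) -> int:
    """Hypothetical target function with a state-dependent bug."""
    global _calls
    _calls += 1
    if data.startswith(b"\xff") and _calls > 100:    # only "crashes" after 100 prior calls
        raise ValueError("corrupt header")
    return len(data)

def in_memory_api_fuzz(seed: bytes, runs: int = 10_000):
    crashes = []
    for _ in range(runs):                            # no process re-spawn, no state reset
        tc = bytes(random.randrange(256) if random.random() < 0.1 else b for b in seed)
        try:
            parse_record(tc)
        except ValueError:
            crashes.append(tc)                       # may not reproduce from a clean state
    return crashes

print(len(in_memory_api_fuzz(b"\xff" + b"record-data")))
```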
“3.2 Seed Selection” Page 7
“A common approach, which is known as minset, finds a minimal set of seeds that maximizes a coverage metric such as node coverage." Page 7
“For example, AFL’s minset is based on branch coverage with a logarithmic counter on each branch." Page 7
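My own greedy sketch of a minset computation (not AFL's actual algorithm): keep picking the seed that adds the most not-yet-covered nodes until coverage stops growing. The per-seed coverage sets are assumed to have been measured beforehand.

```python
def minset(coverage_by_seed: dict) -> list:
    chosen, covered = [], set()
    remaining = dict(coverage_by_seed)
    while remaining:
        # pick the seed contributing the most new coverage
        best = max(remaining, key=lambda s: len(remaining[s] - covered))
        gain = remaining.pop(best) - covered
        if not gain:                 # nothing left to add; remaining seeds are redundant
            break
        chosen.append(best)
        covered |= gain
    return chosen

print(minset({"a.png": {1, 2, 3}, "b.png": {2, 3}, "c.png": {4}}))   # -> ['a.png', 'c.png']
```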
“3.3 Seed Trimming” Page 8
“Smaller seeds are likely to consume less memory and entail higher throughput. Therefore, some fuzzers attempt to reduce the size of seeds prior to fuzzing them, which is called seed trimming." Page 8
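Sketch of seed trimming as I understand it (my own toy, with a fake coverage function standing in for real instrumentation feedback): drop chunks of the seed and keep the smaller version whenever the observed coverage is unchanged.

```python
def coverage(data: bytes) -> frozenset:
    return frozenset(b % 16 for b in data)       # stand-in for instrumentation feedback

def trim(seed: bytes, chunk: int = 4) -> bytes:
    target = coverage(seed)
    i = 0
    while i < len(seed):
        candidate = seed[:i] + seed[i + chunk:]
        if candidate and coverage(candidate) == target:
            seed = candidate                     # chunk was unnecessary; keep the smaller seed
        else:
            i += chunk                           # chunk matters; move past it
    return seed

print(len(trim(bytes(range(64)))))               # much smaller than the original 64 bytes
```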
“3.4 Preparing a Driver Application” Page 8
“4 SCHEDULING” Page 8
“4.1 The Fuzz Configuration Scheduling (FCS) Problem” Page 8
“Fundamentally, every scheduling algorithm confronts the same exploration vs. exploitation conflict” Page 8
“Woo et al. [235] dubbed this inherent conflict the Fuzz Configuration Scheduling (FCS) Problem." Page 8
“4.2 Black-box FCS Algorithms” Page 8
“Weighted Coupon Collector’s Problem with Unknown Weights” Page 8
“multi-armed bandit (MAB)” Page 8
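A toy epsilon-greedy take on the black-box FCS problem (my own sketch, just to capture the bandit flavor; the configurations and their hidden "bug probabilities" are invented): mostly fuzz the configuration with the best observed bugs-per-run, but keep exploring.

```python
import random

BUG_PROB = {"pdf": 0.02, "png": 0.005, "gif": 0.001}    # hidden, invented ground truth

def schedule(configs, rounds=5_000, eps=0.1):
    runs = {c: 0 for c in configs}
    bugs = {c: 0 for c in configs}
    for _ in range(rounds):
        tried = [c for c in configs if runs[c] > 0]
        if not tried or random.random() < eps:
            c = random.choice(configs)                           # explore
        else:
            c = max(tried, key=lambda k: bugs[k] / runs[k])      # exploit
        runs[c] += 1
        bugs[c] += random.random() < BUG_PROB[c]                 # simulated fuzz run
    return runs

print(schedule(list(BUG_PROB)))    # most runs should end up on "pdf"
```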
“4.3 Grey-box FCS Algorithms” Page 8
“the coverage attained when fuzzing a configuration. AFL [241] is the forerunner in this category and it is based on an evolutionary algorithm (EA)." Page 8
“5 INPUT GENERATION” Page 9
“Traditionally, fuzzers are categorized into either generation- or mutation-based fuzzers” Page 9
“model-based” Page 9
Generation Based Page 9
“model-less” Page 9
Mutation based Page 9
“5.1 Model-based (Generation-based) Fuzzers” Page 9
“5.1.1 Predefined Model” Page 9
“Some fuzzers use a model that can be configured by the user." Page 9
“Other model-based fuzzers target a specific language or grammar, and the model of this language is built into the fuzzer itself." Page 9
“5.1.2 Inferred Model” Page 9
“Inferring the model rather than relying on a predefined or user-provided model has recently been gaining traction." Page 9
“5.1.3 Encoder Model” Page 10
“Fuzzing is often used to test decoder programs which parse a certain file format. Many file formats have corresponding encoder programs, which can be thought of as an implicit model of the file format." Page 10
“5.2 Model-less (Mutation-based) Fuzzers” Page 10
“By mutating only a fraction of a valid file, it is often possible to generate a new test case that is mostly valid, but also contains abnormal values to trigger crashes of the PUT." Page 10
“5.2.1 Bit-Flipping” Page 10
“Bit-flipping is a common technique used by many model-less fuzzers” Page 10
“mutation ratio, which determines the number of bit positions to flip for a single execution of INPUTGEN” Page 10
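Minimal bit-flipping sketch with an explicit mutation ratio (my own code; the ratio semantics just follow the quote above: a fraction of all bit positions flipped per generated input):

```python
import random

def bit_flip(seed: bytes, ratio: float = 0.01) -> bytes:
    data = bytearray(seed)
    nflips = max(1, int(len(data) * 8 * ratio))              # mutation ratio -> number of bits
    for pos in random.sample(range(len(data) * 8), nflips):
        data[pos // 8] ^= 1 << (pos % 8)                     # flip one bit
    return bytes(data)

print(bit_flip(b"GIF89a..........", ratio=0.05).hex())
```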
“5.2.2 Arithmetic Mutation” Page 10
“AFL [241] and honggfuzz [213] contain another mutation operation where they consider a selected byte sequence as an integer and perform simple arithmetic on that value." Page 10
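Sketch of the arithmetic mutation idea (mine, not AFL's or honggfuzz's code): reinterpret four bytes at a random offset as a little-endian integer and nudge it by a small amount.

```python
import random

def arith_mutate(seed: bytes, max_delta: int = 35) -> bytes:
    data = bytearray(seed)
    off = random.randrange(len(data) - 3)                    # room for a 4-byte integer
    val = int.from_bytes(data[off:off + 4], "little")
    val = (val + random.randint(-max_delta, max_delta)) % (1 << 32)
    data[off:off + 4] = val.to_bytes(4, "little")
    return bytes(data)

print(arith_mutate(bytes(16)).hex())
```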
“5.2.3 Block-based Mutation” Page 10
“There are several block-based mutation methodologies, where a block is a sequence of bytes of a seed:" Page 10
insert, delete, random, permute order, resize, add block from another seed Page 10
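One possible coding of the block operations listed above (my own sketch; "donor" is a second seed used for the cross-seed operation, and the permute/resize operators are left out for brevity):

```python
import random

def block_mutate(seed: bytes, donor: bytes) -> bytes:
    data = bytearray(seed)
    start = random.randrange(len(data))
    length = random.randint(1, len(data) - start)
    op = random.choice(["insert", "delete", "randomize", "splice"])
    if op == "insert":
        data[start:start] = random.randbytes(length)           # grow the seed
    elif op == "delete":
        del data[start:start + length]                         # shrink the seed
    elif op == "randomize":
        data[start:start + length] = random.randbytes(length)  # overwrite a block
    else:
        d = random.randrange(len(donor))
        data[start:start + length] = donor[d:d + length]       # copy a block from another seed
    return bytes(data)

print(block_mutate(b"A" * 32, b"B" * 32))
```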
“5.2.4 Dictionary-based Mutation” Page 11
“Some fuzzers use a set of predefined values with potentially significant semantic meaning for mutation." Page 11
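Dictionary-based mutation sketch (my own; the token list below is illustrative, not from the paper): splice semantically meaningful values into the seed.

```python
import random

DICTIONARY = [b"\x00\x00\x00\x00", b"\xff\xff\xff\xff", b"\x7f\xff\xff\xff",
              b"-1", b"%n%n%n", b"../../etc/passwd"]

def dict_mutate(seed: bytes) -> bytes:
    data = bytearray(seed)
    token = random.choice(DICTIONARY)
    off = random.randrange(len(data) + 1)
    if random.random() < 0.5:
        data[off:off + len(token)] = token     # overwrite with the token
    else:
        data[off:off] = token                  # insert the token
    return bytes(data)

print(dict_mutate(b"GET /index.html HTTP/1.1"))
```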
“5.3 White-box Fuzzers” Page 11
“White-box fuzzers can also be categorized into either modelbased or model-less fuzzers." Page 11
“5.3.1 Dynamic Symbolic Execution” Page 11
“At a high level, classic symbolic execution [130], [42], [112] runs a program with symbolic values as inputs, which represent all possible values." Page 11
“by letting the user specify uninteresting parts of the code” Page 11
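Tiny illustration of the core DSE idea, assuming the z3-solver Python package is available (the branch condition is invented): treat the input as a symbolic value, collect the path constraint for the branch we want to reach, and ask the solver for a concrete input.

```python
from z3 import BitVec, Solver, sat

x = BitVec("input", 32)                    # symbolic stand-in for the program input

# Hypothetical PUT branch we want to reach:
#   if ((x * 3) ^ 0x1234) == 0x4321: buggy()
s = Solver()
s.add(((x * 3) ^ 0x1234) == 0x4321)        # path constraint for the buggy branch

if s.check() == sat:
    print("concrete test case:", s.model()[x])
```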
“5.3.2 Guided Fuzzing” Page 11
“Some fuzzers leverage static or dynamic program analysis techniques to enhance the effectiveness of fuzzing." Page 11
“5.3.3 PUT Mutation” Page 11
“One of the practical challenges in fuzzing is bypassing a checksum validation." Page 11
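The section's point is about mutating the PUT itself so the check no longer gets in the way; the complementary input-side trick I think of is to repair the checksum after mutating, sketched below. The trailing 4-byte big-endian CRC32 layout is purely an assumption for illustration.

```python
import random
import zlib

def mutate_and_fix_crc(seed: bytes) -> bytes:
    payload = bytearray(seed[:-4])                      # assume the last 4 bytes are the CRC
    pos = random.randrange(len(payload))
    payload[pos] = random.randrange(256)                # single-byte mutation
    crc = zlib.crc32(bytes(payload)).to_bytes(4, "big")
    return bytes(payload) + crc                         # re-attach a valid checksum

seed = b"PAYLOAD!" + zlib.crc32(b"PAYLOAD!").to_bytes(4, "big")
print(mutate_and_fix_crc(seed).hex())
```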
“6 INPUT EVALUATION” Page 12
“6.1 Bug Oracles” Page 12
“For example, if a stack buffer overflow overwrites a pointer on the stack with a valid memory address, the program might run to completion with an invalid result rather than crashing, but the fuzzer would not detect this." Page 12
“sanitizers” Page 12
“6.1.1 Memory and Type Safety” Page 12
“6.1.2 Undefined Behaviors” Page 12
“6.1.3 Input Validation” Page 12
“6.1.4 Semantic Difference” Page 13
“Semantic bugs are often discovered using a technique called differential testing [153], which compares the behavior of similar (but not identical) programs." Page 13
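Differential-testing sketch (my own toy parsers): feed the same inputs to two implementations that should agree and keep anything that makes them diverge.

```python
def parser_a(data: bytes) -> int:
    """Reference: length of the name field, up to the first NUL byte."""
    return data.index(0) if 0 in data else len(data)

def parser_b(data: bytes) -> int:
    """Second implementation with a subtle bug: silently caps the field at 8 bytes."""
    n = 0
    while n < len(data) and n < 8 and data[n] != 0:
        n += 1
    return n

def differential_test(inputs):
    return [d for d in inputs if parser_a(d) != parser_b(d)]

print(differential_test([b"short\x00", b"averylongname\x00"]))   # the second input diverges
```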
“6.2 Execution Optimizations” Page 13
“6.3 Triage” Page 13
“Triage is the process of analyzing and reporting test cases that cause policy violations." Page 13
“6.3.1 Deduplication” Page 13
“Deduplication is the process of pruning any test case from the output set that triggers the same bug as another test case." Page 13
“stack backtrace hashing, coverage-based deduplication, and semantics-aware deduplication." Page 13
“The underlying hypothesis of stack backtrace hashing is that similar crashes are caused by similar bugs, and vice versa. However, to the best of our knowledge, this hypothesis has never been directly tested. There is some reason to doubt its veracity: some crashes do not occur near the code that caused the crash. For example, a vulnerability that causes heap corruption might only cause a crash when an unrelated part of the code attempts to allocate memory and not when the heap overflow occurred." Page 13
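Stack backtrace hashing in one function (my own sketch; the frame lists are fake): two crashes land in the same bucket when their top-N frames hash identically, which is exactly where the heap-corruption caveat bites, since the crashing frames may have nothing to do with the root cause.

```python
import hashlib

def bucket(backtrace, top_n=3):
    # hash the top N frames; identical hashes are treated as the same bug
    return hashlib.sha1("|".join(backtrace[:top_n]).encode()).hexdigest()[:12]

crash1 = ["memcpy", "parse_chunk", "parse_file", "main"]
crash2 = ["memcpy", "parse_chunk", "parse_file", "fuzz_driver"]
crash3 = ["malloc", "render_init", "main"]

print(bucket(crash1) == bucket(crash2))   # True  -> treated as duplicates
print(bucket(crash1) == bucket(crash3))   # False -> kept as a distinct bug
```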
“6.3.2 Prioritization and Exploitability” Page 14
“Prioritization, a.k.a. the fuzzer taming problem [61], is the process of ranking or grouping violating test cases according to their severity and uniqueness." Page 14
“6.3.3 Test case minimization” Page 14
“Another important part of triage is test case minimization. Test case minimization is the process of identifying the portion of a violating test case that is necessary to trigger the violation, and optionally producing a test case that is smaller and simpler than the original but still causes a violation." Page 14
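A greedy chunk-removal sketch of test case minimization (mine; not full ddmin, and the crash predicate is a toy): keep dropping pieces as long as the violation still triggers.

```python
def crashes(data: bytes) -> bool:
    return b"<evil>" in data                 # toy stand-in for "the PUT violates the policy"

def minimize(tc: bytes) -> bytes:
    chunk = max(1, len(tc) // 2)
    while chunk >= 1:
        i = 0
        while i < len(tc):
            candidate = tc[:i] + tc[i + chunk:]
            if crashes(candidate):
                tc = candidate               # still violates without this chunk; keep smaller
            else:
                i += chunk                   # this chunk is needed; skip past it
        chunk //= 2
    return tc

print(minimize(b"lots of irrelevant padding <evil> and more padding here"))
```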
“7 CONFIGURATION UPDATING” Page 14
“7.1 Evolutionary Seed Pool Update” Page 14
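Sketch of the evolutionary seed-pool update as I read it (a toy coverage function stands in for edge-coverage feedback): a generated test case joins the pool only if it reaches coverage not seen before.

```python
def coverage(data: bytes) -> frozenset:
    return frozenset(data[:4])                # stand-in for real coverage feedback

def update_pool(pool, seen, test_case):
    new = coverage(test_case) - seen
    if new:                                   # "interesting": exercised new coverage
        pool.append(test_case)
        seen |= new
    return pool, seen

pool, seen = [], set()
for tc in [b"abcd", b"abce", b"xyz!", b"abcd"]:
    pool, seen = update_pool(pool, seen, tc)
print(pool)                                   # only inputs that added coverage are kept
```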
“7.2 Maintaining a Minset” Page 15
“8 RELATED WORK” Page 15
“9 CONCLUDING REMARKS” Page 15
%% Import Date: 2026-02-02T11:03:56.815-05:00 %%