50.003 - Auto Test Generation¶

Learning Outcomes¶

By the end of this unit you should be able to

explain the roles and functionalities of the test generator.
apply mutation-based fuzzing to generate test cases.
apply generation-based function to synthesize test cases.

Test Generation¶

Recall that process of software testing.

graph
  Input-->Program
  Program-->TO
  TO["Test Oracle"]-->Pass
  TO["Test Oracle"]-->Fail

So far we have been defining, curating and generating test cases manually.

In this unit we consider some approaches that generates test cases automatically.

graph
  TG["Test Generator"]-->Input
  Input-->Program
  Program-->TO
  TO["Test Oracle"]-->Pass
  TO["Test Oracle"]-->Fail

Fuzz Testing¶

Fuzzing or fuzz testing is an automated software testing technique that involves providing invalid, unexpected, or random data as inputs to a computer program.

Fuzzing aims to identify test inputs which reveal exploitable vulnerabilities.

One downside for fuzzing is that it can't generate the expected outputs. This drawback is minor since for most of the fuzzing test cases, error should be expected outputs.

Why fuzzing¶

Besides test case generation automation, there are several extra reasons why fuzzing should be considered.

A study found that one-quarter to one-third of all utilities on every UNIX system that the evaluators could get their hands on would crash in the presence of random input.
A study that looked at 30 different Windows applications found that 21% of programs crashed and 24% hung when sent random mouse and keyboard input, and every single application crashed or hung when sent random Windows event messages.
A study found that OS X applications, including Acrobat Reader, Apple Mail, Firefox, iChat, iTunes, MS Office, Opera, and Xcode, were even worse than the Windows ones.

The Base Line - Random Testing¶

The base line of fuzzing is to randomly generate inputs as sequences of characters or bytes. Such a approach can be easily implemented.

For instance, consider the javascript program simple-fuzzer.js

function random_fuzz() {
    let res = "";
    // strings of any length between 0 and 1024
    let length = Math.floor(Math.random() * 1024);

    //generate a random character at each location of the string
    for (let i =0 ; i < length ; i ++) {
        //generate a character between ASCII 32 and 128
        let c = String.fromCharCode(Math.floor(Math.random() * 96) + 32);
        res = res + c;
    }
    return res;
}

console.log(random_fuzz())

Running node simple-fuzzer.js produces

.u/m:SKx[-:\3t3b?z87f)6'35!t1Y``fGR[_JS:&9^RSIA6svw8G~!%!%_lHA9b\-a"cw$<|$2Gc^8&A*c$eT#wd<QUd}Y1ua>yDxh:=NC^3jI~KMivT'.j{Cp%cDco\2aG/cw3d$<Ih|vIm,_4d,oE8;nV!VXc:[1T,X{F)pIh8=+_[Xtm=_2X:9EWU_Oo*$|^0}nw~Dr1cQ(<dr}Y7/lH&mS$7?fx(F(]j)^9#j8.m2`<v_]="PAI{qvD+yxPusDx!/5ZXMFf'b"Q,.pt7K\\?i7Yj 8Tx)/E'`M[}=v{>GtDbijU%mwEZB?PG]Fo{{6]jXRD[0(<zgbsah1J!D&m\`aGQ8ehDbQN9>^C<I[AZ2;s'%_oj1}#RuCj~i=O"vG]Q!FJ`UP:?{Y|]o\P#7zi'8\Ck@!vC./j:C.\)iGx>Q]@zffZ24]lZw4L7BKk0dA{/ Z6`3v:NN8:h:@/qOC.oq{O^kgYD(#;|@_\i,l!x2P14G(|T+KL!bml:<[P+Sh!*]JY|\y\dL[

In the older days when softwares were developed using less advanced tools and techniques, this approach was effective. In modern days, in which many softwares were developed with proper design and having standard input validation in-place, e.g. type checking, regex and parser etc. Most of the randomly generated input will be immediately rejected, without reaching very "deep" in the test subject.

Mutation based fuzzing¶

One of the possible ways to overcome the limitation with the random testing is that we could consider generating fuzzy test cases based on known valid test inputs.

For instance, assuming that we note that one of the valid inputs to the test subject is the following string

1	`ISTDisApillarInSUTDbutItsNameiSGoingToChange`

One possible way to mutate it is to choose a character at a random position and replace it by ssomething else.

For instance, we randoml choose position (say 13) and replace the character I by character Y

1	`ISTDisApillarYnSUTDbutItsNameiSGoingToChange`

The above is consdered an mutant of the original input.

Here are some basic operations for mutation at some randomly chosen location.

Flipping a bit/boolean/integer/character
Trimming
Swapping characters/bits
Insert characters

Corhort Exercise (Graded)¶

Given an input string, implement a mutation operator that chooses a random position in the string and swaps the adjacent characters. Meaning if SUTD is an input string and 2 is chosen as the random position, the output should be SUDT. Careful about the string length bound check.

Generation based fuzzing¶

An alternative to Mutation-based fuzzing is touse generation based fuzzing. In generation based fuzzing, we assume that we have access to a set of grammar rules that defines the possible syntatically valid inputs to the test subject.

Formal Grammar¶

In computer science, we often use formal grammar as the specification of a language. A language denotes a set of sequences of terminals. A terminal is often a symbol, a byte, or a number.

For example, consider a language of arithmetic expressions which consists of +,-,*,/ as the operators and numbers as operands. In addition, we allow to use () to disambiguate ambiguous terms.

For instance 5 + 2 * 3 is one of the valid expressions according to the following EBNF grammar rules

S ::= Expr
Expr ::= Expr + Term | Expr - Term | Term
Term ::= Term * Factor | Term / Factor | Factor
Factor ::= -Integer | (Expr) | Integer | Integer.Integer
Integer ::= Digit | IntegerDigit
Digit ::=  0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

In the above, each line denotes a rule. A rule consists of a left-hand-side, a define operator ::= and a right-hand-side. The LHS is a non-terminal. A non-terminal is a meta symbol in the grammar which defines a potential expension. As a dual, we have terminals, which are the atomic/elementary symbols in the languages, such as 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, +, -, *, /, ,., (, ) and white space. The RHS of a rule is a set of alternatives sepearated by |. Each alternative is a sequence of terminals and non-terminals.

We could interpret each rule as the LHS non-terminal can be potentially expanded into the one of the alternatives in the RHS. For example, we consider the rule with Expr as the LHS, which says the expression non-terminal can be either a + expression (which consists another Expr, a + and a non-terminal Term), or a - expression, or a Term by itself.

Parse Tree¶

Given a grammar, a parser normally validates the input sequence by attempting to construct a parse tree. A parse tree records from which grammar rules the input sequence can be validated according to the grammar. We can also think of the parse tree as the result of the parsing.

For example, the exxpression 5 + 2 * 3 can be parsed into the following parse tree

graph
   N0("S") --> N1
   N1("Expr") --> N2
   N1("Expr") --> N3("+")
   N1("Expr") --> N4
   N2("Expr") --> N5
   N5("Term") --> N6
   N6("Factor") --> N7
   N7("Integer") --> N8
   N8("Digit") --> N9("5")
   N4("Term") --> N10
   N4("Term") --> N11("*")
   N4("Term") --> N12
   N10("Term") --> N13
   N13("Factor") --> N14
   N14("Integer") --> N15
   N15("Digit") --> 2
   N12("Factor") --> N16
   N16("Integer") --> N17
   N17("Digit") --> N18("3")

In the above parse tree, starting from the root, we find that the first rule we applied in parsing 5 + 2 * 3 is S ::= Expr. The subtree indicates that the second rule/alternative we apply is Expr ::= Expr + Term. The left subtree contains no branches, which leads us to the digit 5. The right subtree indicates that the rule we applied to parse the sub expression 2 * 3 is Term ::= Factor.

Generate Fuzz test using the grammar¶

Given we understand the input requirement, we can now define a better fuzzer w.r.t to the grammar.
The idea is to starting from the starting rule's nonterminal, randomly choose one grammar rule given the current non-terminal to generate a sub term (sub parse tree). We need to take note of

Do not over expand with the recursive alternative.
Do not always expand the same sub-tree / node.

With this we can generate a good set of fuzz test input passing the syntax checking of the target software and yet having enough randomness.

Cohort Exercise (Graded)¶

Use JavaScript to implement a fuzzer that will randomly generate inputs to the calculator conforming to the grammar. For now, you can hardcode the expression grammar.

Hint: Start with the initial rule S ::= Expr and at each point, apply a rule at random. For example, randomly choose any of the rules Expr ::= Term, Expr ::= Expr + Term or Expr – Term in the next step. Continue until a valid expression for the calculator is obtained. Make sure you do not expand the rules forever to avoid infinite loop.

Limitations of Fuzzing¶

Fuzzing offers many perks to maintain goodd quality of softwares.

Can provide results with little effort
Can reveal bugs that were missed in a manual audit
Provides an overall picture of the robustness of the target software

It also shares certain limitations

Will not find all bugs
The crashing test cases that are produced may be difficult to analyse, as the act of fuzzing does not give you much knowledge of how the software operates internally
Programs with complex inputs can require much more work to produce a smart enough fuzzer to get sufficient code coverage

Feedback-based Fuzzing¶

Besides getting the fuzzer to leverage the input specification, another orthothongal approach is to gather feedback from the test report.

The idea is to turn the test case generation problem into an optimization problem.

graph
    N1-->N2
    N2("Execute Test Cases")-->N3
    N3("Collect Feedback")-->N1("Generate Test Cases")

As illustrated by the above diagram, in a feedback-baed fuzzing testing framework, we first use a fuzzer to generate the test cases, Then execute the test cases against the test subject. Finally we collect the test outcomes and report as a feedback to the fuzzer which should improve the test effectivness.

One common way to measure test effectiveness is to measure the code coverage.

Exercise (Not Graded)¶

Recall that in Jest, we can generate the coverage report in text or html. Note that it can also generate json report. With it, we can feed the json report back to the fuzzer to generate a better set of tests to increase the code coverage.

Discuss among your team members to think of a possible implementation to incorporate feedback-based fuzzing

API Fuzzing¶

Fuzz testing can be applied to API testing in several aspect

Fuzzing the low level request - generates random bytes as HTTP requests to test the robustness of the API service.
Fuzzing the routes - through fuzzers, we could generate sequences of random valid / invalid requests to the test the API routers.
Fuzzing the high level request - generate GET/POST/DEL/PUT HTTP requests by fuzzing HTTP parameters or form parameters.

Some references can be found in the following

https://medium.com/@Magii/fuzzing-with-postman-599dce6317c7
https://github.com/KissPeter/APIFuzzer
https://docs.gitlab.com/ee/user/application_security/api_fuzzing/

Fuzzing test on UI¶

Fuzz testing can be applied to UI too.

Fuzzing the UI element event handler by generating random input actions.
Fuzzing the inputs to the HTML forms.

Some reference can be found in the following

https://www.fuzzingbook.org/html/GUIFuzzer.html