Web Security V - SQL injections and lookup tables
Imagine if you could convince a program to somehow execute code that it's not supposed to. An SQL injection is an attempt at just that.
SQL Injections

To start, here's a famous comic from xkcd that illustrates the disastrous effects of SQL injections on databases.

XKCD SQL injection comic
SQL injection vulnerabilities are common and exist on many small websites, and it generally does not take too long to spot them - even for a rookie. I've mentioned before that input fields on websites offer hackers a valuable window of opportunity to gain access to some server information, particularly its databases. To understand this, let's see how database queries are executed on the server in the first place.
$query = "SELECT * FROM users WHERE username='" . $username . "'";
mysql_query($query);
A query like this is very typical in handling database operations, and here it's used to pull up the record of some user registered on the website (if it exists), possibly to cross-reference it with login information that the user has entered. How can we exploit something like this? Let's say I was logging in to a website that asks for a username and password.
Now if I were to enter something like "amer" for the username and "password" for password (I would strongly discourage using "password" as a password in the real world, I'm just trying to make a point here), then the query would be executed as SELECT * FROM users where username='amer'. So how do I exploit this? Well, maybe if I provide a username that is not really a username but just SQL code, then maybe I can change the structure of this statement and get it to do something other than its intended purpose. Maybe I could get it to remove a user from the database. If this was a university database, for example, then maybe I could delete a student record - someone I don't like. Maybe I could create a fake student with a fake ID. Or maybe I could just get rid of the database altogether. You can see then how problematic it can be to have a vulnerability like this on a website and how important it is to make sure that queries are executed safely.

Before I get into protection mechanisms, let's look at examples of how I could exploit this query. If I entered something like ' OR '1'='1, then the query would be SELECT * FROM users WHERE username='' OR '1'='1'. Because '1'='1' is always true, this query will retrieve the record of every user in the database. What happens after that is left to the hacker's discretion. Different websites are structured differently on the backend, so there isn't a single way to read this information after retrieving it - finding a way to turn the information you have just retrieved into something useful (or disastrous, depending on your perspective) is generally done experimentally. Another trick used in injections is to block out the rest of the query after the point of injection with the comment characters /* (in MySQL, # or -- followed by a space serve the same purpose). For example, I might choose to enter ' OR '1'='1' /* as a username instead. This is an improvement over the original because now the rest of the statement is blocked and I have more control over what gets executed. It's important to note that a hacker cannot really see what the query statement looks like because this is all server-side code, but they can make repeated educated guesses to narrow the options down until they get a good idea of what it could be. For example, the query statement presented earlier could have been written differently.

$query = "SELECT * FROM users WHERE username='" . $username . "' AND password='" . $password . "'";
mysql_query($query);
Now if I try the old trick ' OR '1'='1 in the username only, it will not work because there is an extra clause in the query that isn't being taken into account. Better would be ' OR '1'='1' /*. Better yet would be to add a DROP statement at the end and get rid of the table altogether: 1'; DROP TABLE users. Of course, the hacker doesn't really know the table is called 'users' for sure, but that's not really the point. One thing to note is that multiple statements cannot always be executed in the same query. In this particular example, it will not execute because mysql_query() does not allow it. That's why the injection process is experimental; you have to try various combinations before figuring out how the queries are structured and executed.
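To make the string manipulation concrete, here is a minimal sketch that only builds the query text and prints it, without touching a database. The helper build_login_query() is a hypothetical name of mine; it simply mirrors the concatenation used in the snippets above.

```php
<?php
// Hypothetical helper: builds the query the same way the vulnerable
// snippets above do, by pasting raw input between single quotes.
function build_login_query($username, $password) {
    return "SELECT * FROM users WHERE username='" . $username .
           "' AND password='" . $password . "'";
}

// A normal login attempt.
echo build_login_query("amer", "password"), "\n";

// The comment trick: everything after the injected comment marker,
// including the password clause, ends up inside the comment.
echo build_login_query("' OR '1'='1' /*", "anything"), "\n";
```

Printing the second query shows the password clause stranded behind the /*, which is exactly why the input can no longer be trusted as plain data.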

There are various defence mechanisms for SQL injections. Most programming languages have built-in functions to take care of strings entered with malicious intent. I'm going to go ahead and borrow the PHP example provided by Wikipedia. The function of interest in PHP is mysql_real_escape_string(). There are some other ones out there, but they are deprecated and this function is standard in escaping input in PHP (as of the time of writing of this post; the mysql_* functions have since been deprecated in PHP 5.5 and removed in PHP 7.0 in favour of MySQLi and PDO).

$query = sprintf("SELECT * FROM users WHERE username='%s' AND password='%s'",
                  mysql_real_escape_string($username),
                  mysql_real_escape_string($password));
mysql_query($query);
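
An alternative to escaping is a prepared statement, where the query text and the user input travel separately. The following is only a sketch under a few assumptions: it uses PDO rather than the mysql_* functions above, it assumes the pdo_sqlite extension is available, and it uses an in-memory SQLite database as a stand-in for the real server so that it can run on its own. The find_user() helper and the sample row are hypothetical.

```php
<?php
// Assumption: the pdo_sqlite extension is available. An in-memory SQLite
// database stands in for the MySQL server so the sketch runs on its own.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec("CREATE TABLE users (username TEXT, password TEXT)");
$db->exec("INSERT INTO users VALUES ('amer', 'not-a-real-hash')");

// Hypothetical helper: looks a user up with a prepared statement.
// The ? placeholders are filled in by the driver, never by string pasting.
function find_user(PDO $db, $username, $password) {
    $stmt = $db->prepare(
        "SELECT username FROM users WHERE username = ? AND password = ?"
    );
    $stmt->execute([$username, $password]);
    return $stmt->fetchAll(PDO::FETCH_ASSOC);
}

$legit    = find_user($db, "amer", "not-a-real-hash");
$injected = find_user($db, "' OR '1'='1' /*", "anything");

echo count($legit), " row(s) for the real login\n";
echo count($injected), " row(s) for the injection attempt\n";
```

The injection string is treated as an odd-looking username and nothing more, so it matches no rows.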

Lookup Tables

Scenario: You just hacked into a user database of some website, and you have retrieved all the user and login information including passwords. But there's a twist: the passwords are hashed.

OK, so it could be worse: you could have an extra column of hashed information in the table indicative of password salting as well. Let's go ahead and make the assumption that the passwords were just hashed but not salted. What now? Well, I guess that depends on what you're trying to accomplish. If your intention was to hack an individual account and you can also write to the database, then it's very easy even if the password is hashed. Start by figuring out the hashing mechanism. A little inspection and perhaps some trial and error, and the algorithm can be figured out quite easily. After that, it's only a matter of running any string of your choice through that hashing algorithm and replacing the password hash of the account of interest in the database with the hash produced. Then, all you have to do is log in with the string you chose to hash as the password and voila, you're in. That said, it's unusual for hackers to break into databases just for a single account; there's usually a larger agenda involved.

A simple technique to get access to quite a few accounts is to cross-reference the hashes with hashes of popular weak passwords. A small program can be written to do the check on each account and find them all. Such an attack is called a dictionary attack. You probably already know that many websites will warn you against using such passwords and some go as far as preventing you from registering if you use a dictionary password or one that does not involve a combination of letters, numbers, and symbols. This is done for your own protection in the event of a database breach. This sort of protection makes dictionary attacks far less effective.
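
The "small program" could look something like the sketch below. Everything in it is made up for illustration: the usernames, the leaked hashes, and the four-word dictionary, and it assumes the passwords were hashed with plain unsalted md5().

```php
<?php
// Hypothetical leaked table: username => unsalted MD5 hash.
$leaked = [
    "alice" => md5("letmein"),
    "bob"   => md5("letmein"),
    "carol" => md5("x7!Qp2#vLm9"),
];

// A tiny stand-in for a real word list of popular weak passwords.
$dictionary = ["password", "123456", "letmein", "qwerty"];

// Hash each candidate once, then flag every account whose hash matches.
function dictionary_attack(array $leaked, array $dictionary) {
    $cracked = [];
    foreach ($dictionary as $candidate) {
        $candidate_hash = md5($candidate);
        foreach ($leaked as $user => $hash) {
            if ($hash === $candidate_hash) {
                $cracked[$user] = $candidate;
            }
        }
    }
    return $cracked;
}

print_r(dictionary_attack($leaked, $dictionary));
```

Only the accounts with a dictionary password fall; the account with the long random password is untouched by this particular attack.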

This brings us to the importance of lookup tables. It is often convenient to have a table that stores the pre-computed hashes of various common passwords that can be used to cross-reference the hashes in the database. This is called a lookup table, and it is particularly fast because the hashes are pre-computed, and this saves valuable computing time. The same process can be done effectively the other way around, and it is called a reverse lookup table. In this case, the compromised password hashes are stored in a lookup table, and then they're compared one by one with hashes of common passwords that are computed on the spot. Because the hash of a given string is always the same, this method can be used to find users with the same password quickly.

Finally, there's the rainbow table, which is sort of a compromise between the two. In the case of a lookup table, a lot of memory will be required to store the hashes of common passwords and so there's a limit on how large the lookup table can be. On the other hand, reverse lookup tables do not really use up memory aside from the compromised hashes, which are of finite size, because the hash of the password to test is computed on the spot and is not stored. However, it takes a lot of time to compare each potential password one by one, and so the issue with the reverse lookup table is the amount of time it takes. Rainbow tables attempt to balance this time-memory tradeoff by using hash chains to compare with the compromised password hashes.

Before I describe the algorithm to generate a rainbow table, we need to take a closer look at what a hash function is mathematically. A hash function maps some string into a hash. This hash is a fixed-length string drawn from a restricted set of characters. Multiple strings can be mapped to the same hash and this is called a hash collision, although it happens so infrequently for most hash functions in the context of cryptography that it is usually ignored. By definition then, a hash function does not have an inverse because it is not one-to-one. What makes hash functions useful in cryptography, though, is that they are hard to invert in practice.

Now I'm going to introduce the reverse of the hash function (note the reverse is not the same as the inverse) which maps a hash back to plaintext, and this is called a reduction function. The reduction function is really an arbitrary function that maps a hash to some plaintext. There's no specific rule that you need to adhere to when building the reduction function. So we can take a plaintext, put it through the hash function to produce a hash, put that hash through the reduction function to produce another plaintext, and repeat this process until we end up with a hash chain. Each chain is run for a fixed number of steps, and only its starting plaintext and final hash are stored. To crack a hash, we apply the same reduce-and-hash steps to it until we hit a stored endpoint, then replay that chain from its start to recover the plaintext that produced the hash.
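
Here is a rough sketch of building a single hash chain, just to show the alternation between hashing and reducing. The reduction function below is one I made up for the example (it maps a hash to a 4-digit PIN), and md5() is used only because it is the hash used elsewhere in this post; a real rainbow table would use many chains and a much larger password space.

```php
<?php
// Hypothetical reduction function: maps a hash back into the space of
// candidate passwords (here: 4-digit PINs). It is not an inverse of md5();
// it only has to turn any hash into *some* plausible plaintext. Mixing in
// the step number makes each position in the chain use a different reduction.
function reduce_hash($hash, $step) {
    $n = (hexdec(substr($hash, 0, 7)) + $step) % 10000;
    return str_pad((string)$n, 4, "0", STR_PAD_LEFT);
}

// Builds one chain: plaintext -> hash -> plaintext -> hash -> ...
// Only the first plaintext and the last hash would be stored in the table.
function build_chain($start, $length) {
    $plain = $start;
    $links = [];
    for ($step = 0; $step < $length; $step++) {
        $hash = md5($plain);
        $links[] = [$plain, $hash];
        $plain = reduce_hash($hash, $step);
    }
    return $links;
}

foreach (build_chain("0042", 5) as [$plain, $hash]) {
    echo $plain, " -> ", $hash, "\n";
}
```

The memory saving comes from throwing away everything between the two ends of the chain and recomputing it only when an endpoint matches.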

More on salting

The best defence mechanism against these attacks is to salt the passwords. Perhaps the use of the word "salting" in the context of cryptography owes its origins to the use of salt as an impurity in various chemicals to control some sort of physical property - that's just my hypothesis; I'm not really sure.

Motivation: Consider a database table that looks like this:

Name | Username | Password
Harvey Specter | hspecter | 0d107d09f5bbe40cade3de5c71e9e9b7
Mike Ross | mross | 0d107d09f5bbe40cade3de5c71e9e9b7
Louis Litt | littup | 7a4ff36b94c56abfe3474c5994c4a916

You may notice immediately that the password hashes for both Harvey and Mike are identical which indicates that there is a high probability that they also have identical passwords. Of course, this is not necessarily true because it may be a hash collision, but this is unlikely. Anyone with access to the database now can figure out both passwords just by analyzing one of them. This can be a problem in larger tables where the likelihood of users choosing the same silly passwords ('password', 'letmein', 'yoloswag', etc) is much higher. It is the developer's responsibility to make sure user information is well protected, even when the user is being negligent with regards to their choice of password. Another thing that makes this password-storage mechanism rather insecure is that the table passwords are easy to crack with lookup tables because the password strings are directly mapped to their respective hashes.

We can add an extra layer of security by salting these passwords. Salting is the process of appending (or prepending) an additional string to the password before hashing it.

$salt = "secret";
$hash = md5($password . $salt);
This is a weak form of salting; it adds little to the security of the information in the database. If the salt is shared among all users, then it won't take too long for a hacker to figure out what it is. Furthermore, this doesn't solve the problem of identical hashes for identical passwords. A better approach would be to use a unique salt for each registered user, stored alongside that user's hash so it can be reused when they log in. This adds a much deeper layer of security because a precomputed lookup table no longer matches anything; an attacker would have to build a new one for every salt.
$salt = bin2hex(openssl_random_pseudo_bytes(16));
$hash = md5($password . $salt);
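
To see what the per-user salt buys you, here is a small sketch. The helper names make_record() and check_password() are hypothetical, random_bytes() (PHP 7+) is swapped in for openssl_random_pseudo_bytes(), and md5() is kept only to mirror the snippet above, not because it is a good choice.

```php
<?php
// Hypothetical helper: creates the record stored for a new user.
// random_bytes() (PHP 7+) is swapped in for openssl_random_pseudo_bytes();
// md5() is kept only to mirror the snippet above.
function make_record($password) {
    $salt = bin2hex(random_bytes(16));
    return ["salt" => $salt, "hash" => md5($password . $salt)];
}

// Re-hashes the attempt with the user's stored salt and compares.
function check_password($attempt, array $record) {
    return hash_equals($record["hash"], md5($attempt . $record["salt"]));
}

$harvey = make_record("letmein");
$mike   = make_record("letmein");

var_dump($harvey["hash"] === $mike["hash"]);   // same password, separate salts
var_dump(check_password("letmein", $harvey));
var_dump(check_password("wrong", $harvey));
```

Both users picked the same password, but because each record carries its own salt, the stored hashes no longer give that away.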

The next post will likely be the final on web security for now, and it will be about what is perhaps the most underrated hacking tool - social engineering and exploiting the human factor in computer security.

Web Security IV - Database security

A lot of websites require user login and have user accounts set up on their web server. All user information including usernames, passwords (well, not quite; I'll get to that later...), emails, personal information, and any sort of dynamic information that you can see on the website is stored in a database. But first, an introduction to static and dynamic webpages.

Static and dynamic webpages

An informal definition: A static webpage appears the same to anyone who requests it, but a dynamic page varies based on the person accessing it. For example, your Gmail will always be different (hopefully) from someone else's page when they access it. The Toronto Star, on the other hand, appears the same regardless of who accesses it, so it can loosely be described as a static page. I say loosely here because it does in fact have a login system, but I'm going to ignore that for the purpose of this example. For dynamic websites, the server first checks for a logged-in user using local sessions and compares them with client-side cookies. If a handshake is successfully established, the next step would be to load the appropriate information from the database. Now, I'd like to make my definition for dynamic webpages a bit more formal.

Dynamic page: A page that loads dynamic (varying) content from a database or other form of organized data storage depending on the user accessing it - and the user is usually identified using sessions.

A dynamic page therefore requires programming on the server end to establish the logic flow of user verification. Please see the previous post in this series to get an idea of how this is done with sessions. As it currently stands, server-side programming is done predominantly with PHP, which is the reason why my example code is written in PHP. Keep in mind though that other languages are also frequently used including, but not limited to: Java in the form of JSP, Python, and Perl. Server-side programming introduces a whole new aspect that is not commonly dealt with in programming: information handling and security. When I say information handling, I don't just mean making sure the correct type of input is entered; I mean making sure that the input entered is not malicious in nature and that it can be processed as a query without obliterating the database. Unfortunately, some websites still pass database queries in the URL - a very risky feat that often gets exploited. Also under the category of "information handling", it is just as important to make sure that the correct user is seeing the information that you are retrieving from the database. It can be very problematic if a user were able to see information that they really shouldn't (imagine the sort of trouble that would ensue if such a problem existed on eBay).

Logging in

I briefly talked about the dangers of logging in as root on the database and the importance of setting up minimal privileges. I'm just going to reiterate: never log in as root on a database, and deny all unnecessary privileges. I don't know of any circumstance where you would need to DROP a table directly from a PHP script; in fact, if you are using DROP from a script, you're probably doing something terribly wrong. Deny these commands from MySQL and avoid the headache that you will inevitably get if you don't take these preventative measures.

Storing information

Now, this is the most important part of this article. First, let me provide you with a couple of scenarios involving horrible security blunders. See if you can spot the mistakes in each one:

  1. Your calculus course requires you to use MathXL. You try to log in, but you realize you forgot your password, so you click the "Forgot your password" link, and it sends you an email with your old password in it, so you are able to log in again.
  2. This is a sample of the PHP code in http://www.example.com/login.php used to validate a login request:
    
    /*
      Function get_mysql_array converts MySQL resource object to array manipulatable by PHP
    
      Parameters: 
      - $result is the result of the query executed, or the resource object
      - $array_type is a constant indicating the type of array to return (associative, numeric or both)
    
      Returns: an array of the information requested by the query
    */
    function get_mysql_array($result, $array_type = MYSQL_BOTH) {
      $records = array();
      while ($row = mysql_fetch_array($result, $array_type)) {
        $records[] = $row;
      }
      if (count($records) === 1) {
        return $records[0];
      }
      return $records;
    }
    
    //Get username and password strings from the HTML form
    $username = $_GET["username"];
    $password = $_GET["password"];
    
    $query_result = mysql_query("SELECT * FROM users WHERE username='$username' AND password='$password'")
      or die(mysql_error());
    
    $user_record = get_mysql_array($query_result, MYSQL_ASSOC);
    if (!empty($user_record)) {
      echo 'Welcome ' . $username . ', you\'re logged in now.';
    } else {
      echo "Invalid username or password.";
    }
    
    //End PHP script, some HTML mark up goes here
    
  3. This is the PHP code from www.example.com/register.php, used to register new users:
    
    //Contains important database functions, including "get_mysql_array()" 
    require_once("db_functions.php");
    
    /*
      Function sanitize() cleans an input string of any inappropriate or malicious characters or code.
      Parameters:
      - $input is the input string to be cleaned
      Returns: a clean version of the input string
    */
    function sanitize($input) {
      $input = trim($input);
      $input = htmlentities($input);
      $input = mysql_real_escape_string($input);
      return $input;
    }
    
    /*
      Function hash() salts a password string with a secret prepared string and hashes it using the md5 algorithm
      Parameters:
      - $plain_password is the password string as plaintext
      Returns: the salted and hashed passwords ready to be inserted into DB
    */
    function hash($plain_password) {
      $salt = "secret_salt123";
      $password = $plain_password . $salt;
      $password = md5($password);
      return $password;
    }
    
    $username = sanitize($_POST["username"]);
    $password = hash($_POST["password"]);
    
    $query = "INSERT INTO USERS (username, password) 
              VALUES ($username, $password)";
    mysql_query($query) or die(mysql_error());
      
    
Take a look at these scenarios carefully, and think of what the programmer is doing wrong in each case. In fact, there are few things being done right in any scenario. Here are the answers:
  1. There are a couple of big mistakes here. A website should never be emailing their users their passwords in plaintext (or any other form for that matter). If they are emailing passwords in plaintext, then they are also storing the password in the database in plaintext, or in some form that can be turned back into plaintext. This is a bad idea, and I'll expand on this in a bit. Second of all, if it isn't bad enough storing the password in plaintext, why are they sending it over unencrypted and insecure channels (note that it's http://www.mathxl.com and not https://www.mathxl.com) where anyone can sniff it? Good websites usually have better mechanisms to assist a user when they forget their password. A common one is to send the user an email with a link that includes some random token in the URL, which when opened prompts the user to enter a new password. A copy of the token is of course stored in the database as well and is checked against the one provided by the URL to verify that it is the same user. Another method, which is not recommended, is to reset the password to a random string and then email the user this random string, with the site itself served over SSL (https). The problem here isn't obscurity (or lack thereof) because SSL encrypts the string, so a "man in the middle attack" won't work here. The problem is that it opens up a DoS exploit (not quite in the traditional sense, but I would still classify it under DoS) because anyone who knows your email can reset your password without any verification, which can be very annoying if you are on the receiving end of this attack.

    Are you taking notes, Pearson?

  2. This sort of code is just plain unacceptable. One thing that really bothers me, right off the bat, is that database queries and code are being mixed with other PHP code.

    A note on OOP: PHP implemented Object Oriented Programming in PHP5, and so it should be used when necessary. While OOP isn't too useful for most server-side programming purposes, I would go as far as saying it is inexcusable to not use this feature when working with a database. The database should be treated as an object with methods specific to the sort of information that you are working with. One huge advantage to doing this is that if you ever decide to change the type of database you are using (many big companies do), you can do it much more easily if you only have to change the one Database class, instead of changing every single PHP script, which is next to impossible in most large projects. For example, Facebook started off using a MySQL database. MySQL databases are really good for most projects, especially start-ups, because MySQL is free and powerful. However, its performance declines when the database becomes very large, and usually, something like Oracle is preferred in these situations. Facebook is much too big at this point to even think about switching. I'm not sure exactly how Facebook deals with databases in their code, but I can make the fair assumption that they did not use OOP, since OOP was introduced in PHP about the same time that Facebook launched in 2004.

    However, this isn't really a security issue, just bad practice, and so I shouldn't really complain. The first real security mistake is that the input fields are using the GET global array.
    $username = $_GET["username"];
    $password = $_GET["password"];
    
    This exposes these strings in the URL. Better would be to use POST.
    $username = $_POST["username"];
    $password = $_POST["password"];
    
    Additionally, the original HTML code needs to reflect these changes as well.
    <form name="login" action="login.php" method="POST">
      <label for="username">Username:</label>
      <input type="text" id="username" name="username">
      <label for="password">Password:</label>
      <input type="password" id="password" name="password">
      <input type="submit" name="login_submit" value="Login">
    </form>
    
    POST still sends these strings in the body of the HTTP request, but unlike the URL, the request body is not kept in the browser's history or cache, and it's an important security measure to avoid caching sensitive information. Finally, any website using login systems should be implementing SSL to ensure that the whole request, headers and body, is encrypted.

    There's another security mistake in the execution of the query.

    $query_result = mysql_query(
      "SELECT * FROM users WHERE username='$username' AND password='$password'"
    );
    

    Here, the user input is never sanitized, and so this database is susceptible to SQL injections. These will be covered in more detail in the next article. Better would be to run it through some sanitizer function to ensure that the user doesn't run SQL queries on the database. Better yet would be to use MySQLi instead, which allows you to create prepared statements and plug in variables accordingly without the worry of injections. Altogether, this script is poorly written.

  3. This code is a huge improvement over the former, but still constitutes legitimate grounds for termination as a web programmer. The sanitize() function serves a good purpose. While sanitization is great, this script is missing an important part, which is validation - is the input entered by the user reasonably valid? For example, if the user is expected to enter an email, and the string received on the server does not contain an "@" symbol, then you can assume that the input is invalid, and registration should halt. Again, this isn't really a security issue, so I'll move to the next part of the script - the hash() function. This function is riddled with security holes. The first major issue is that the salt used is a fixed string which would be identical for every registering user. This effectively makes the salt useless.
    $salt = "secret_salt123";
    
    Better would be to use a random salt for every user.
    $salt = rand(0, 999999);
    
    This is better, but still doesn't even come close to meeting quality standards. First of all, rand() has a very low entropy, so it isn't really all that "random". Second, rand() produces only decimal numbers and within a limited interval.

    Better yet would be to use a stronger function for random generation.
    $salt = bin2hex(openssl_random_pseudo_bytes(16));
    
    Here, the openssl_random_pseudo_bytes() function takes in the length of the random string in bytes. I want a 32 digit hexadecimal string, so I will provide 16 bytes as the parameter (1 hexadecimal digit is 4 bits). The bin2hex() function converts the string from binary to hexadecimal.

    Finally, the best option available on a Linux server is none other than the famous character special file /dev/urandom.
    function get_urandom($length = 16) {
      if (is_readable("/dev/urandom")) {
        $file = fopen("/dev/urandom", "r");
        $rnd = fread($file, $length);
        fclose($file);
        return $rnd;
      }
      return NULL;
    }
    
    If you are running this on your own dedicated Linux server, then you can even get rid of the is_readable() if statement check - just make sure you have permissions to read the file. This gives you the raw random data. It is up to you what exactly you want to do with this value, but it's advisable to convert it to hex or run it through base64_encode() before using it as a salt.

    Note: If entropy is absolutely critical for the random number generation application, consider using /dev/random instead. This file will stall until the entropy pool has been regenerated to give you a more random string.

    The next big mistake is the hash function used. md5() is perhaps the most widely known cryptographic hashing function out there, but it is certainly not the most secure one - at least not anymore.

    Better would be to use crypt().

    $hash = crypt($password, $salt);
    
    This function has a little more to it than what was shown above, which I'll discuss soon. The salt should be tweaked a bit before passing it as a parameter.

    Lastly, this script suffers similar mistakes compared to the login script, particularly the SQL injection vulnerability.
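
For the hashing step discussed in scenario 3, here is a minimal sketch of one way to get a random per-user salt and a slow hash without assembling them by hand. It swaps in password_hash() and password_verify(), which are built into PHP 5.5 and newer, in place of the manual salt and crypt() call above; the sample password is made up.

```php
<?php
// Assumption: PHP 5.5 or newer, where password_hash() and
// password_verify() are built in. They are swapped in here for the
// manual salt + crypt() steps discussed above.
$stored = password_hash("correct horse", PASSWORD_DEFAULT);

// The salt and algorithm settings are embedded in $stored, so
// verification needs nothing else from the database.
var_dump(password_verify("correct horse", $stored));
var_dump(password_verify("wrong guess", $stored));
```

Since the salt is generated for you on every call, hashing the same password twice gives two different strings, which is the property the fixed salt in the register script was missing.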

Hashing

In accordance with the principle of Defense in Depth, it is important to put forth some precautionary measures in the event of a database breach. The first important step is to obfuscate sensitive information in the database. This usually means passwords and credit card numbers.

Note: It is usually better to depend on some third-party company for the handling and safe-guarding of credit card numbers. Storing such information locally is very dangerous - by doing so, you are essentially inviting hackers to steal them.

This can be done through a process called hashing. There is usually some confusion as to the differences between hashing and encrypting information. Hashing is fundamentally different from encryption in that it is a one-way process. You cannot "unhash" a string; the algorithm is not reversible (that's not entirely true: hash functions are only very difficult to reverse, but not impossible, as many computer scientists have demonstrated with md5). Please see Chapter II of Ralph Merkle's 1979 PhD Thesis for Stanford University to get a better idea of the development, history, and need for hash functions - in fact, I strongly recommend reading the entire thing. On the other hand, encrypted information is eventually run through a reverse algorithm to get the original information once it reaches its appropriate destination.

Number theory and computer security: It may be difficult to understand why exactly hashing algorithms aren't reversible, so I'll give an example to demonstrate a one-way function. Consider a function which returns the product of two (or more) very large prime numbers. Say it gives the product of these Fibonacci primes: 1066340417491710595814572169 and 19134702400093278081449423917, which is just 20404106545895102906154128522206995414761716518871165973 according to Wolfram Alpha. Is this function reversible? Well, in theory, a lot of things are reversible. A better question to ask would be: Is this function reversible in practice? Multiplying the two primes takes no time at all, while recovering them from a product on the order of 10^55 takes vastly more work; push the primes to a few hundred digits each, as real cryptosystems do, and no one can spend that much computation time. Not even Wolfram Alpha. You will find that number theory has a profound application in computer security and cryptography. Many modern-day hashing and encryption algorithms trace their origins back to mathematicians and number theorists. RSA, a widely known computer security firm, was founded by three mathematicians, and the RSA algorithm used in their security applications is based on prime factorization. It can be seen then how important the properties of numbers are in computer security.
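
The same lopsidedness shows up on a toy scale. The sketch below multiplies two primes near one million and then recovers them by brute-force trial division; the smallest_factor() helper is a hypothetical, deliberately naive one, and the numbers are chosen so that everything fits in a 64-bit PHP integer.

```php
<?php
// Forward direction: one multiplication.
$p = 999983;    // largest prime below one million
$q = 1000003;   // smallest prime above one million
$n = $p * $q;

// Reverse direction by brute force: try every odd candidate divisor.
// Hypothetical helper, far slower than real factoring algorithms.
function smallest_factor($n) {
    if ($n % 2 === 0) {
        return 2;
    }
    for ($d = 3; $d * $d <= $n; $d += 2) {
        if ($n % $d === 0) {
            return $d;
        }
    }
    return $n; // no divisor found, so $n itself is prime
}

$found = smallest_factor($n);
echo $n, " = ", $found, " x ", intdiv($n, $found), "\n";
```

The multiplication is a single operation, while the loop has to grind through roughly half a million candidates; every extra digit in the primes makes that gap wider.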

Hashing in a broad sense refers to mapping certain strings to shorter keys of some constant length (usually integers) using some algorithm. A good example of where this is used is an associative array. The keys in an associative array are mapped to their values using a hash function which converts the keys into a short integer. It is obvious here that it would be advantageous for hashing functions to be fast for a lot of applications, including the aforementioned example. In this article though, I'm going to focus on a specific subset of hashing functions called cryptographic hashing functions. When they are used to store passwords, cryptographic hash functions are different from other hash functions in that they work better slow. In fact, the slower the hash function, the harder the stored passwords are to crack. Of course, you don't want to make a new registering user wait 5 minutes while you hash their chosen password, so there's a limit to how slow you should be. The reason slower is more secure is that every guess in a dictionary attack, and every entry in a lookup table, costs the attacker that much more time to compute.
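
To get a feel for the difference in speed, here is a rough timing sketch. It compares md5() against bcrypt through password_hash() (PHP 5.5+); the time_call() helper and the sample password are hypothetical, and the actual numbers will vary from machine to machine.

```php
<?php
// Times a single call to a hashing function.
function time_call(callable $fn) {
    $start = microtime(true);
    $fn();
    return microtime(true) - $start;
}

$password = "hunter2";

// md5() is built for speed; bcrypt (via password_hash) is deliberately
// slow, and its "cost" option doubles the work with each increment.
$fast = time_call(function () use ($password) { md5($password); });
$slow = time_call(function () use ($password) {
    password_hash($password, PASSWORD_BCRYPT, ["cost" => 10]);
});

printf("md5:    %.6f s\n", $fast);
printf("bcrypt: %.6f s\n", $slow);
```

A delay of a fraction of a second is barely noticeable to one user logging in, but it adds up quickly for someone trying millions of guesses.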

I will pick up this post next time where I will discuss lookup and rainbow tables, salting, and SQL injections. Please stay tuned!

Web Security III - More on sessions
In the previous article of this series, I discussed sessions, the HTTP protocol, and some security issues associated with sessions. I didn't really get into the PHP code related to session security so I'm going to do that in this post. First, since HTTP is stateless, we need to start the session on every new HTTP request to our web server.
session_start();
This validates the session server-side with the session ID on the client-side, and if it is successful, the appropriate session information is loaded into the global array $_SESSION. So, if we're working with a user login system, we can store some unique identifier for the user when they login, which can be accessed on subsequent requests from $_SESSION.
session_start();
session_regenerate_id(true);

$login_err = "Invalid login information. Please try again.";

#Cleans strings of any html entities and escapes MySQL code
function clean($string) {
  $string = htmlentities($string);
  $string = mysql_real_escape_string($string);
  return $string;
}

$clean_username = "";
$clean_password = "";
$username = trim($_POST["username"]);
$password = $_POST["password"];
$clean_username = clean($username);
$clean_password = clean($password);

if ($username !== $clean_username || $password !== $clean_password) {
  echo $login_err;
  die;
} else {
  #Code to validate username and password from database goes here.

  /*
    If user is validated, proceed to store their username in session for future identification
  */

  $_SESSION["username"] = $clean_username;
}
This is how the login page, usually named login.php, might look. In other pages, the session can be validated using the $_SESSION global variable.
if (isset($_SESSION["username"])) {
  #User is already logged in.
  $username = $_SESSION["username"];
} else {
  #Send user back to login page.
  header("Location: http://www.example.com/login.php");
  die;
}
Note that session_regenerate_id(true) should be run on login/logout and at any important step (such as going to a payment transaction page). It regenerates the session ID, making it harder to hijack. Moreover, the function clean is important for validating user input to make sure users aren't inserting malicious code into these input fields. I'll discuss SQL injections in more detail in the future.
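One complementary way to validate input is to accept only values that match an expected pattern, rather than comparing the raw value to a cleaned copy. The sketch below assumes usernames are limited to 3-32 letters, digits, or underscores; that rule and the function name is_valid_username are hypothetical and not something this series specifies.

```php
<?php
// Accept only usernames made of 3-32 letters, digits or underscores.
// The character set and length limits are assumptions for illustration.
function is_valid_username(string $username): bool {
    // \A and \z anchor the pattern to the whole string, so a trailing
    // newline or injected SQL fragment cannot slip through.
    return preg_match('/\A[A-Za-z0-9_]{3,32}\z/', $username) === 1;
}

var_dump(is_valid_username('amer'));                   // bool(true)
var_dump(is_valid_username("amer' OR '1'='1"));        // bool(false)
```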
Session Fixation

I'm going to conclude this post by discussing a security vulnerability that doesn't happen too often, because PHP protects against it by default, but that can be devastating otherwise. Session fixation occurs when a third party tricks some client into opening a URL which sets the client's session ID to something pre-determined by the orchestrator of the attack. For example, the URL http://www.example.com/index.php?sid=123a9isodj12-031 is susceptible to session fixation because it passes the session ID through the URL. This opens up all sorts of security issues and should be avoided. In fact, parameters passed through the URL should be kept to a minimum. Operations such as SID handling or the execution of database queries should run through invisible server-side code and not through the URL, which is transparent to (and can be manipulated by) the client. For a more concrete example, if Facebook passed SIDs through the URL, I could send you a link in an email, or on my website, or wherever really, pointing to something like http://www.facebook.com/index.php?sid=abc123 under the assumption that you will innocently open this URL and log in. In the meantime, I would be sitting by my computer, refreshing that very URL in my browser, waiting for you to fall for the trap. When that happens, I would be logged into your account automatically!
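The PHP settings that guard against session fixation can also be set explicitly rather than left to the defaults. This is a minimal sketch: the function name harden_session_settings is hypothetical, and it assumes the code runs before session_start() and before any output is sent. The three ini directives themselves are standard PHP session settings.

```php
<?php
// Apply session settings that make fixation harder. This must run before
// session_start() and before any output is sent to the client.
function harden_session_settings(): void {
    // Only accept a session ID from a cookie, never from the URL.
    ini_set('session.use_only_cookies', '1');
    // Do not let PHP rewrite links to append the session ID to them.
    ini_set('session.use_trans_sid', '0');
    // Reject session IDs that the server did not generate itself.
    ini_set('session.use_strict_mode', '1');
}

harden_session_settings();
```

Combined with session_regenerate_id(true) on login, this means an ID planted by an attacker is neither read from the URL nor kept once the victim authenticates.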

Web Security II - Sessions

The HTTP protocol is stateless; nothing is remembered between one HTTP request and the next. That means that as you navigate from one webpage to another on the same website, the protocol itself carries no information over from the previous request. This can be a problem, particularly for dynamic websites that need to remember user information between requests. For example, a website which provides e-mail service needs to remember who the user is as they navigate through the different parts of the website. When you're using a website like Gmail, you're only asked to log in once, and after that the server remembers you for a reasonable amount of time even if you change pages. But how can that be possible if HTTP requests are stateless?

Sessions

The server doesn't rely on HTTP itself to remember information between requests; rather, it stores user information on the server in the form of sessions. When you access a dynamic website like Gmail, Hotmail, or Facebook, you are automatically assigned a unique Session ID (SID) - this is a random string of characters which is used to identify you. A copy of the SID is stored as a cookie on the client's computer, and another copy is stored on the server. Whenever you request a webpage, your browser sends the SID cookie along with the request, and the server tries to find a matching SID on its side. If it does find a matching one, then that means you are already logged in, and it will display your information accordingly. If it doesn't find a matching one, or your browser doesn't send the appropriate cookie at all, then it simply asks you to log in, creates an SID for you upon successful login, and then uses that to remember you for the remainder of the session. Sessions do eventually time out after an extended period of time. Each website chooses its own value for how long it should take for the session to expire.
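The lookup described above can be sketched as a toy session store. Everything here is an assumption made for illustration: the class name SessionStore, the in-memory array (a real server would keep sessions in files or a database), and the 128-bit SID length. Session expiry is left out to keep the sketch short.

```php
<?php
// Toy in-memory session store. A real server would persist sessions in files
// or a database; the array is an assumption that keeps the sketch runnable.
class SessionStore {
    private $sessions = [];

    // Create a session after a successful login and return its SID.
    public function create(string $username): string {
        $sid = bin2hex(random_bytes(16)); // 128 random bits as 32 hex characters
        $this->sessions[$sid] = ['username' => $username, 'created' => time()];
        return $sid;
    }

    // Look up the SID sent by the client; null means "ask the user to log in".
    public function find(?string $sid): ?array {
        if ($sid === null || !isset($this->sessions[$sid])) {
            return null;
        }
        return $this->sessions[$sid];
    }
}

$store = new SessionStore();
$sid = $store->create('amer');            // would be sent to the browser as a cookie
$session = $store->find($sid);            // a later request presenting that cookie
echo $session['username'], "\n";          // prints "amer"
var_dump($store->find('not-a-real-sid')); // NULL
```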

Session Hijacking - A Demonstration on Facebook

Essentially, your SID is just a temporary ticket to access your account on the server and whatever information you have stored in the database. For that reason, SIDs are popular targets for hackers looking to gain access to your account. Recall that SIDs are stored as cookies on the client's computer. Keep in mind that a cookie is really just a file stored on your computer, so it can be accessed, read, copied, and manipulated. An important thing to note about cookies is that each website has its own set of cookies which do not interfere with the cookies of other websites. Furthermore, cookies are browser-dependent. If you're logged in on some website now on one browser, you can try opening that same website on another browser, and you'll find that you won't be logged in on the second browser. Because each browser stores cookies independently of other browsers, you can log in to Facebook on Chrome and Firefox simultaneously on the same account using different SIDs. Now, I'm getting to the most exciting part of this post - I'm going to demonstrate session hijacking on the planet's largest social networking site. Here is an outline of the steps I'm going to take to complete this:

  • Open up Facebook on Chrome - it will not be logged in on Chrome.
  • Dissect Facebook's cookies and find the one storing the SID - I'll do this on Chrome because it has a better UI.
  • Open Facebook on Firefox - not logged in yet.
  • Modify Facebook's cookies on Firefox so that the SID on Firefox matches that of Chrome.
  • By simply refreshing the page on Firefox, I should be logged in automatically without having to enter my username or password. If that's the case, then the session has been hijacked successfully. Otherwise, it's back to the drawing board.

Naturally, things won't go according to plan, and I'll encounter some difficulties, so I'll change the plan as I move along. It may not seem so impressive that I'm essentially hacking my own Facebook account on a different browser, but it's the idea here that matters. In theory, this could be anyone's SID, and I could be hijacking their account from any computer in the world using any browser. In reality, the only hard part is obtaining the SID in the first place, which I'll be discussing later on. At this point, this post may seem more like a tutorial on hacking than on computer security. I'm not going to deny that; some of the most important things I have learned about computer security stem from malicious intent. If you want to learn how to protect something, you need to learn how it is attacked first. Anyways, I'll assume that the first step is trivial for most readers, so I'll go straight to the second. Chrome has some nice built-in developer tools that will come in handy for this task. You can open them up by right-clicking anywhere on the webpage and clicking the "Inspect Element" menu option. This should show a frame attached to the bottom of the browser. We're interested in the "Resources" tab, so go ahead and click on that. There are some interesting things in there like "Local Storage" and "Application Cache", but right now I just want to see "Cookies", so I'll click that, and Chrome will list the cookies that Facebook is storing on my computer.

Screenshot of Developer tools on Chrome
Fig. 1 - Facebook cookies on Google Chrome.

The cookies stored on your computer will likely be slightly different; you may be missing some or have a few extra - that's fine. Now I need to narrow down the cookies to find the one that stores the SID. By default, PHP gives it a name of PHPSESSID, but I would expect Facebook developers to have given it a different name at least for the sake of obscurity. There are a few ways to go about finding the right cookie. First of all, I know for a fact that the SID expires with the session, so it can't possibly be one of the first four that have explicit expiry dates. That leaves me with five cookies that expire on session end. At this point, I could try deleting the cookies one by one, and then try to refresh the page. If I am logged out after refreshing the page, then I know that this cookie is used to keep my session alive and is very likely the SID. Another thing I could do, which would probably be the better choice, would be to look at the list of cookies before logging in, and once again after logging in. By looking at the changes made to the cookies list, I can gain valuable clues as to which one is the SID. The new cookies that are added only after I've logged in are likely used for authentication. Note that the SID cookie will not necessarily be added after login; it may already be there, with its value changing upon login. Through the latter method I was able to determine that the cookies with the names c_user and xs are both used in combination with each other to keep the session alive because they appeared only after I logged in. But which one is the SID and what is the other cookie's purpose? It's not at all surprising that Facebook is using a combination of cookies for authentication, being a large website with an incredible amount of traffic.

A Note about Sessions, Randomness, and /dev/urandom: The SID string is not guaranteed to be unique because computers don't have infinite entropy for random number generation. This problem can be diminished by using /dev/urandom to supply the randomness instead of PHP's default mechanism and by increasing the amount of entropy read from it (these are set in php.ini, with session.hash_function selecting the hash that generates the SID and session.entropy_file dictating the source of 'randomness'). The file /dev/urandom collects random noise from the server's devices and can be used to generate the SID. It is a rare event, but it could happen that two users are assigned the same SID, and this might cause problems for a site like Facebook, which has so many users logged in at once. For this reason, it is good practice to add another cookie for authentication. This other cookie is usually the user ID, just a unique number that is assigned to you when you register and is used to identify your record in the database. The c_user cookie sounds like it fits this description; looking at its value, I can see that it's an integer and is very likely my user ID (it even looks very familiar). To confirm, I'll visit facebook.com/profile.php?id=X where X is the value of c_user, and as expected my very own profile page shows up. So that means the xs cookie must be the one that stores the SID. Now, make sure to copy the values of these cookies into some text file for later use.
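To put a rough number on how rare an SID collision is, the birthday bound gives an estimate. This is a sketch under the assumption that SIDs are uniformly random and independent; the function name and the session counts and bit lengths in the example calls are made up for illustration and say nothing about Facebook's actual setup.

```php
<?php
// Approximate probability that at least two of $n randomly generated SIDs
// collide when each SID carries $bits bits of randomness (birthday bound):
//   p ~= 1 - exp(-n(n-1) / (2 * 2^bits))
function sid_collision_probability(float $n, int $bits): float {
    $exponent = ($n * ($n - 1)) / (2 * pow(2, $bits));
    // -expm1(-x) equals 1 - exp(-x) but stays accurate when x is tiny.
    return -expm1(-$exponent);
}

// A billion concurrent sessions with 128-bit SIDs: vanishingly small.
printf("%.3e\n", sid_collision_probability(1e9, 128));
// The same number of sessions with only 32 bits: a collision is all but certain.
printf("%.3e\n", sid_collision_probability(1e9, 32));
```

The takeaway is that the amount of randomness in the SID matters far more than the number of users, which is why the entropy source is worth configuring carefully.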

The next part will be inserting these cookies in Firefox. Before that, a few things need to be understood here. First of all, cookies belong to a particular domain; that is, Google has cookies on your computer that are independent of Facebook's. Moreover, unless a cookie is explicitly scoped to the parent domain, the subdomains of Google or Facebook also have cookies that are independent of each other. For instance, the search engine Google can be accessed from www.google.com, and Gmail can be accessed from mail.google.com. Cookies set for just one of these hosts are not related to or grouped with the other's because they are different subdomains. There's more - domains can be accessed with or without the 'www' and will generally show the same page. You may have thought that it didn't matter which one you typed and that they are both the same, but in reality they are very different. In fact, they are treated as different hosts by the browsers and so will store such cookies independently of each other. You can see how this can be problematic because if each one stores its own cookies, then you could be logged in on facebook.com but logged out on www.facebook.com. For this reason, most websites redirect their non-www domain to their www domain. If you type in facebook.com in your browser right now, you'll notice that it will automatically redirect to www.facebook.com. This can be accomplished quite easily on an Apache server using mod_rewrite.

RewriteEngine On
RewriteCond %{HTTP_HOST} ^amerhesson\.com$
RewriteRule (.*) http://www.amerhesson.com/$1 [R=301,L]

This website also uses mod_rewrite to redirect to the www URL. Now, the reason I'm mentioning this is that we will need to indicate the host domain of the fake cookies we will be inserting, and we need to assign them to the correct subdomain or else it will not work. In this case, the correct host is www.facebook.com. Putting anything else, including facebook.com, will not work.
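The host-matching behaviour discussed above can be sketched as a small function. It is a simplified model of how browsers decide whether to send a cookie, loosely following the domain-matching rule in RFC 6265; it ignores paths, IP addresses, and public-suffix checks, and the function name cookie_sent_to is hypothetical.

```php
<?php
// Would a browser send a cookie to $requestHost?
// $setByHost is the host that originally set the cookie, and $domainAttr is
// the cookie's Domain attribute (null means a "host-only" cookie).
function cookie_sent_to(string $requestHost, string $setByHost, ?string $domainAttr = null): bool {
    $requestHost = strtolower($requestHost);
    if ($domainAttr === null) {
        // Host-only cookie: only the exact host that set it gets it back.
        return $requestHost === strtolower($setByHost);
    }
    // Domain cookie: the domain itself and all of its subdomains match.
    $domain = strtolower(ltrim($domainAttr, '.'));
    $suffix = '.' . $domain;
    return $requestHost === $domain
        || substr($requestHost, -strlen($suffix)) === $suffix;
}

var_dump(cookie_sent_to('www.facebook.com', 'www.facebook.com')); // bool(true)
var_dump(cookie_sent_to('facebook.com', 'www.facebook.com'));     // bool(false)
```

The second call is the case that matters for the demonstration: a cookie attached to www.facebook.com is not sent to facebook.com, so the host field has to match exactly.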

By default, Firefox only allows you to view cookies, and not add/modify/delete them. We can change that by installing a plug-in. I installed one called Edit Cookies. We'll go to the Facebook main page; we aren't logged in yet, but we're about to change that very quickly. Let's run the cookie editor in Firefox (should be under the tools menu) and add our two cookies that will allow us to bypass the login system. It will ask for four fields: Name, Content, Host, and Path. In the 'Name' field, we will put c_user, 'Content' will be the value that we grabbed from Chrome, 'Host' should be 'www.facebook.com', and 'Path' will just be the root folder '/', so that the cookie applies to the entire subdomain. Make sure that the 'Expire at end of session' option is selected and go ahead and add that cookie. Repeat this process for the xs cookie. Once you are done, you should see the cookies added to the list. Now, simply refresh the page, and you should be logged into your own account!

If you followed the steps correctly, that should work for you. If it doesn't work, keep experimenting until you get it right! OK, now the tougher part is knowing how to get the SID in the first place. This can be accomplished in a few ways. First of all, contrary to popular belief, you do not need to hack the Facebook servers to hack someone else's Facebook account. Because a copy of the SID is stored on the client's computer as a cookie, breaking into the client's computer to steal the SID would be sufficient to hijack their account. This is still no walk in the park, but it is much less daunting than attempting to hack the servers. Other things to look for are XSS exploits that can expose the site's cookies, especially on insecure browsers. I'll likely write a post on XSS at some point in this series, so please stay tuned.

Web Security I - Introduction
Think evil

Whenever you're writing server-side code, you should always be thinking of ways in which your code could be exploited, particularly when handling input from the client. Input fields provide a window of opportunity for users with malicious intents to interact with the server because input fields are usually processed on the server. I feel like I should mention this point because it happens all too often and opens up all kinds of exploits on the server - under no circumstances should you be executing client-side input with eval(). I really can't think of any legitimate reason to execute user input, but in the unlikely case that there is, make sure it is as strict as can possibly be. Input field values are also usually stored in some database where queries are executed to insert or modify user information. If some user knows for instance that an SQL database is being used on the server (generally, the underlying technology is not very difficult to figure out, especially when many companies openly disclose that information), they can enter malicious SQL code in the input fields in hopes of executing it on the server to retrieve information they shouldn't be able to see or to simply destroy as much information as they can on the database. This is called an SQL injection, and I will be writing about injections in more detail in the future including some common defense mechanisms against them.
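If user input really does need to choose what the server runs, one hedged alternative to eval() is to let the input select from a fixed list of server-defined operations. The operation names and the function run_allowed_operation below are hypothetical; the point of the sketch is only that the input is used as a lookup key and is never executed as code.

```php
<?php
// Map the operations a client may request to fixed, server-defined callables.
// User input only selects an entry from this list; it is never executed.
function run_allowed_operation(string $name, float $a, float $b): float {
    $allowed = [
        'add'      => function (float $x, float $y): float { return $x + $y; },
        'multiply' => function (float $x, float $y): float { return $x * $y; },
    ];
    if (!isset($allowed[$name])) {
        // Anything not on the list is rejected outright.
        throw new InvalidArgumentException('Unknown operation');
    }
    return $allowed[$name]($a, $b);
}

echo run_allowed_operation('add', 2, 3), "\n"; // prints 5
```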

Once you understand the various security issues associated with building a website, you will need to run it through a series of rigorous tests to ensure that your website meets quality security standards. However, just because you can't find exploits on your own system, doesn't mean a group of experienced hackers would break a sweat over it; you may need to get other people to test it who have experience in the area of computer security, or rather breaking security, and let them have a crack at it - literally.

Security through Obscurity

This is quite a common phrase used in computer security. It refers to adding security to the web server by making it difficult for a perpetrator to find their way around, or hiding the underlying technology used in a web server. For example, different websites use different languages, databases, and servers. The first step any sensible hacker would make is to gather as much information as possible about the target. If the web server doesn't make any attempts at hiding this information, then the hacker can simply proceed to the next step. That said, if a skilled hacker wants to compromise your system, hiding that sort of information will not stop them, as there are usually plenty of clues that they can use to figure it out anyway. A good web server will not depend on obscurity for security - not even as a fail-safe method. Obscurity may be used as an added measure for Defense in Depth, if at all, but never as a primary method of defense.

Never use this as an excuse to avoid documenting your server-side code, or to write obscure code in an effort to dissuade an unauthorized perpetrator from dissecting it. If a perpetrator does manage to get access to your server, then you've got bigger problems to worry about than exposed source code. That said, when writing JavaScript code, it can be particularly useful to run it through a minifier - I generally prefer to use Google's minifier. This does two important things. First, it compresses your JavaScript code considerably, making your webpages load faster. Second, it obfuscates code that would otherwise easily be dissected by the client. This is only useful if you are using JavaScript extensively in your website.

While I'm on the subject of code-documenting, I would like to share my stance on the issue. In my opinion, code-documenting should be kept to an absolute minimum. It is much better to write clean, easy to understand, maintainable code than to slap together a 1000-character-one-liner and write an essay's worth of comments describing what it does. I used to think one-line solutions were elegant. In reality, elegance depends on the efficiency of the approach taken to the solution, and not the number of lines needed. You should try to split up your code into separate lines when it becomes hard to understand or change. As a rule of thumb, if one line of code has to wrap to the next line on a wide-screen monitor, then you're probably doing it wrong. The reason why code-documenting is stressed so much by teachers or employers, especially early on in a programmer's career, is because usually programmers are inherently terrible at writing clear code - at least that's the impression that I've gotten so far. What should be stressed though is that it's OK to sacrifice some code efficiency for the sake of maintainability - always write your code with the future in mind.

Defense in Depth

Never rely on just one mechanism to protect your website. This rule applies to your server and database as well. After all, if one is compromised, they all are. Always have a fail-safe mechanism prepared, and it can never hurt to have an extra back-up, just in case. Defense in depth is also important to buy extra time when your server faces the inevitable hacking - even the best do. Usually, a big website would have network administrators working round-the-clock looking for suspicious activity and suppressing it. If they detect a breach, obviously they have procedures to take counter-measures, but if their system is secured on multiple levels, then the perpetrator shouldn't be able to just waltz in and wreak havoc on the entire system in a matter of minutes. With a secure server, they would be able to recover before extensive damage occurs.

Least Privilege

Never give anyone more privileges than what they need. This applies to the developers who are working on the website, and the clients who use the website. For example, if you're running a MySQL database, then by default you will have a root login with full access to the database. Some web servers may connect to their database with a root account. More often than not, this is the wrong approach. Usually, you will only use a handful of the statements that are provided by MySQL such as SELECT or INSERT. Generally, you will not need to modify the structure of a table within a PHP script, so you can deny privileges for statements like DROP or CREATE, and even most - if not all - administrative privileges like GRANT OPTION and SUPER. The account you use to connect to the database from a PHP script should have limited privileges. Connecting as root is generally discouraged. Even if you are accessing the database yourself, and not from an automated script, you should avoid root because a small mistake in your query could be devastating. I feel a bit hypocritical saying this because the database I'm using currently connects as root, but at the same time, my website (as of the time of writing of this post) does not accept any external user input, so it isn't really at risk of injections - I also do intend on fixing it soon. You may ask why exactly these privileges should be stripped if a user on the website will never be able to run SQL queries. If a user does succeed in making an SQL injection, you will want to limit their damage and prevent them from using potentially disastrous statements like DROP which could eradicate an entire table at once.
Even if you think you are validating your inputs properly, you should strip unnecessary privileges because this is in accordance with the principle of Defense in Depth. It won't deter a determined intruder, but it's still a good security measure to implement because it may buy you some time before the damage becomes extensive. This also applies to the files on your web server. Deny access whenever possible to files on your server - especially the ones containing sensitive information. For example, if you're running an Apache server, you will have an .htaccess file that you will likely want to guard vigilantly. Make sure that you set the minimum permission and ownership bits on this file - just enough for the server to run properly.

/var/www# chmod 644 .htaccess
/var/www# chown root .htaccess

That sets the permission bits to -rw-r--r-- so only root can write to .htaccess - everyone else can only read it. You may be tempted to set the permissions even lower by doing something like chmod 600, but that would be a mistake, as the server process, which normally does not run as root, needs to be able to read .htaccess to display pages properly. Otherwise, it would return a 403 error on every page on the server. It is also good practice to set minimum privileges within .htaccess itself when using authentication and authorization directives if you're running Apache, but that requires a separate post on its own.

I'll conclude this post with a real-life example of how the principle of least privilege is used. A valet is often given a special key that can only be used to drive the car, park it, and nothing else. This key cannot be used to open the car itself or its trunk. It may seem silly to deny these privileges to the valet when you're trusting them with your car, but it is in fact very important. You've got to put yourself in the shoes of an "evil" valet to understand why exactly that is. A valet intent on stealing the car would probably want to be discreet about it, so running off with the car instead of parking it would not be a good idea. However, if hypothetically the valet key could open the car, then the valet could just press the key into some memory foam to record its shape, make a copy, and use that copy to steal the car from the owner's house later at a more convenient time. But, since valet keys are specifically designed such that they can't be used to open the car doors, the owner would be relieved to know that their car is safe.

Whenever you're faced with a situation where you can reduce privileges without losing much functionality or taking much away from the user experience, do it - you don't even really have to think about the security implications. It may not improve your security considerably, but you can certainly trust that it wouldn't hurt it.
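Returning to the .htaccess permissions from earlier in this post, a script can check that a sensitive file is still writable only by its owner. This is a minimal sketch for a Unix-like server; the function names only_owner_can_write and octal_mode are hypothetical, and the path in the usage note is just the one from the chmod example.

```php
<?php
// Return true when only the file's owner may write to it, i.e. the group and
// "other" write bits (020 and 002) are both clear, as with mode 644.
function only_owner_can_write(string $path): bool {
    clearstatcache(true, $path);
    $mode = fileperms($path) & 0777;
    return ($mode & 0022) === 0;
}

// Format the permission bits as an octal string such as "644".
function octal_mode(string $path): string {
    clearstatcache(true, $path);
    return sprintf('%o', fileperms($path) & 0777);
}
```

For example, only_owner_can_write('/var/www/.htaccess') would be expected to return true after the chmod 644 command shown above, and false if someone later loosened the file to 666.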