To start, here's a famous comic from xkcd that illustrates the disastrous effects of SQL injections on databases.
$query = "SELECT * FROM users WHERE username='" . $username . "'";
mysql_query($query);
A query like this is very typical in database handling; here it pulls up the record of a user registered on the website (if it exists), possibly to cross-reference it with the login information the user has entered. How can we exploit something like this? Let's say I was logging in to a website that requires a username and password.
Now if I were to enter something like "amer" for the username and "password" for the password (I would strongly discourage using "password" as a password in the real world; I'm just trying to make a point here), then the query would be executed as
SELECT * FROM users WHERE username='amer'
So how do I exploit this? Well, if I provide a username that is not really a username but SQL code, then maybe I can change the structure of this statement and get it to do something other than its intended purpose. Maybe I could get it to remove a user from the database. If this were a university database, for example, then maybe I could delete a student record - someone I don't like. Maybe I could create a fake student with a fake ID. Or maybe I could just get rid of the database altogether. You can see then how problematic it is to have a vulnerability like this on a website and how important it is to make sure that queries are executed safely.
Before I get into protection mechanisms, let's look at examples of how I could exploit this query. If I entered something like ' OR '1'='1, then the query would be SELECT * FROM users WHERE username='' OR '1'='1'. Because '1'='1' is always true, this query retrieves the record of every user in the database. What happens after that is left to the hacker's discretion. Different websites are structured differently on the backend, so there isn't a single way to read this information after retrieving it - turning what you have just retrieved into something useful (or disastrous, depending on your perspective) is generally worked out experimentally.
Another trick used in injections is to block out the rest of the query after the point of injection with the comment characters /*. For example, I might choose to enter ' OR '1'='1' /* as the username instead. This is an improvement over the original because now the rest of the statement is commented out and I have more control over what gets executed. (Comment syntax varies between databases; in MySQL, -- followed by a space or # comments out the rest of the line.) It's important to note that a hacker cannot really see what the query statement looks like because this is all server-side code, but they can make repeated educated guesses to narrow the options down until they get a good idea of what it could be. For example, the query statement presented earlier could have been written differently:
$query = "SELECT * FROM users WHERE username='" . $username . "' AND password='" . $password . "'";
mysql_query($query);
Now if I try the old trick ' OR '1'='1 in the username only, it will not work because there is an extra clause in the query that isn't being taken into account. Better would be ' OR '1'='1' /*. Better yet would be to add a DROP statement at the end and get rid of the table altogether: 1'; DROP TABLE users. Of course, the hacker doesn't really know for sure that the table is called 'users', but that's not really the point. One thing to note is that multiple statements cannot always be executed in the same query. In this particular example, the DROP will not execute because mysql_query() does not allow more than one statement per call. That's why the injection process is experimental; you have to try various combinations before figuring out how the queries are structured and executed.
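The payloads above can be tried safely against a throwaway database. The following is a minimal sketch in Python with an in-memory SQLite database rather than the post's PHP and MySQL; the table contents and the helper names `find_user` and `login` are made up for illustration, and `-- ` is used as the comment marker in place of /*.

```python
import sqlite3

# Hypothetical in-memory table standing in for the post's `users` table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (username TEXT, password TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("amer", "password"), ("alice", "s3cret")],
)


def find_user(username):
    """Vulnerable lookup: the input is concatenated straight into the SQL."""
    query = "SELECT * FROM users WHERE username='" + username + "'"
    return conn.execute(query).fetchall()


def login(username, password):
    """Vulnerable two-clause variant, like the second query in the post."""
    query = (
        "SELECT * FROM users WHERE username='" + username
        + "' AND password='" + password + "'"
    )
    return conn.execute(query).fetchall()


print(find_user("amer"))                # the one matching row
print(find_user("' OR '1'='1"))         # tautology: every row comes back
print(login("' OR '1'='1", "x"))        # the extra clause defeats it: no rows
print(login("' OR '1'='1' -- ", "x"))   # comment hides the clause: every row

try:
    find_user("1'; DROP TABLE users; -- ")
except (sqlite3.Warning, sqlite3.ProgrammingError) as exc:
    # Like mysql_query(), execute() refuses to run stacked statements.
    print("stacked statement rejected:", exc)
```

The last call mirrors the point about mysql_query(): whether a stacked DROP runs depends on the database API, not just on the query string.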
There are various defence mechanisms against SQL injection. Most programming languages have built-in functions that neutralise strings entered with malicious intent. I'm going to go ahead and borrow the PHP example provided by Wikipedia. The function of interest in PHP is mysql_real_escape_string(). There are some other ones out there, but they are deprecated, and this function was the standard way of escaping input in PHP at the time of writing of this post. (The whole mysql_* extension has since been deprecated in PHP 5.5 and removed in PHP 7.0; parameterised queries through mysqli or PDO are the usual recommendation now.)
// Escape each user-supplied value before it is placed inside the quotes.
$query = sprintf("SELECT * FROM users WHERE username='%s' AND password='%s'",
    mysql_real_escape_string($username),
    mysql_real_escape_string($password));
mysql_query($query);
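For comparison, here is a hedged sketch of a different defence from the escaping shown above: binding the input as query parameters, so it is never parsed as SQL. It uses Python's sqlite3 module with a made-up table, and `login_safe` is a hypothetical helper name, not something from the post.

```python
import sqlite3

# Hypothetical in-memory table standing in for the post's `users` table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (username TEXT, password TEXT)")
conn.execute("INSERT INTO users VALUES (?, ?)", ("amer", "password"))


def login_safe(username, password):
    """Bind user input as parameters instead of splicing it into the SQL."""
    query = "SELECT * FROM users WHERE username = ? AND password = ?"
    return conn.execute(query, (username, password)).fetchall()


print(login_safe("amer", "password"))       # the one legitimate row
print(login_safe("' OR '1'='1' -- ", "x"))  # payload is just an odd username: no rows
```

Because the statement text is fixed and the values travel separately, there is no quoting for an attacker to break out of.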
Scenario: You just hacked into a user database of some website, and you have retrieved all the user and login information including passwords. But there's a twist: the passwords are hashed.
OK, so it could be worse: there could be an extra column of random-looking values in the table, which would indicate that the passwords are salted as well. Let's go ahead and assume that the passwords were just hashed but not salted. What now? Well, I guess that depends on what you're trying to accomplish. If your intention is to hack an individual account and you can write to the database, then it's very easy even if the password is hashed. Start by figuring out the hashing mechanism; with a little inspection and perhaps some trial and error, the algorithm can be figured out quite easily. After that, run any string of your choice through that hashing algorithm and overwrite the stored hash for the account of interest with the result. Then all you have to do is log in using the string you chose as the password, and voila, you're in. That said, it's unusual for hackers to break into databases just for a single account; there's usually a larger agenda involved.
A simple technique to get access to quite a few accounts is to cross-reference the stored hashes with the hashes of popular weak passwords. A small program can be written to run the check on each account and find them all. Such an attack is called a dictionary attack. You probably already know that many websites will warn you against using such passwords, and some go as far as preventing you from registering if you use a dictionary password or one that does not involve a combination of letters, numbers, and symbols. This is done for your own protection in the event of a database breach. This sort of protection makes dictionary attacks far less effective.
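The "small program" mentioned above could look something like the following sketch. The leaked rows, the word list, and the choice of unsalted MD5 are all assumptions made for illustration.

```python
import hashlib


def md5_hex(text):
    """Unsalted MD5, standing in for whatever hash the site actually used."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


# Hypothetical leaked rows: username -> unsalted password hash.
leaked = {
    "hspecter": md5_hex("letmein"),
    "mross": md5_hex("letmein"),
    "littup": md5_hex("correct horse battery staple"),
}

# Hypothetical word list of popular weak passwords.
wordlist = ["password", "123456", "letmein", "qwerty"]


def dictionary_attack(leaked_hashes, candidates):
    """Hash each candidate and report every account whose hash matches."""
    cracked = {}
    for word in candidates:
        digest = md5_hex(word)
        for user, stored in leaked_hashes.items():
            if stored == digest:
                cracked[user] = word
    return cracked


print(dictionary_attack(leaked, wordlist))
```

Only the two accounts whose password appears in the word list are recovered; the third survives because its password is not in the dictionary.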
This brings us to the importance of lookup tables. It is often convenient to have a table that stores the pre-computed hashes of various common passwords, which can then be cross-referenced with the hashes in the database. This is called a lookup table, and it is particularly fast because the hashes are pre-computed, which saves valuable computing time. The same process can be done the other way around, in which case it is called a reverse lookup table. Here, the compromised password hashes are stored in a lookup table, and the hashes of common passwords are computed one by one and compared against it. Because a given string always hashes to the same value, this method can be used to find users with the same password quickly.
Finally, there's the rainbow table, which is sort of a compromise between the two. A lookup table requires a lot of memory to store the hashes of common passwords, so there's a limit on how large it can be. A reverse lookup table, on the other hand, uses little memory beyond the compromised hashes themselves (which are of finite size), because the hash of each password to test is computed on the spot and is not stored. However, it takes a lot of time to hash and compare each potential password one by one, so the issue with the reverse lookup table is time. Rainbow tables attempt to balance this time-memory tradeoff by using hash chains to compare against the compromised password hashes.
Before I describe the algorithm to generate a rainbow table, we need to take a closer look at what a hash function is mathematically. A hash function maps some string into a hash. This hash is a fixed-length string drawn from a restricted set of characters. Multiple strings can be mapped to the same hash, and this is called a hash collision, although it happens so infrequently for most hash functions used in cryptography that it is usually ignored.
By definition then, a hash function does not have an inverse because it is not one-to-one. That's what makes hash functions so useful in cryptography - they are hard to invert. Now I'm going to introduce the reverse of the hash function (note that the reverse is not the same as the inverse), which maps a hash back to some plaintext; this is called a reduction function. The reduction function is really an arbitrary function: there's no specific rule you need to adhere to when building it, as long as it always maps the same hash to the same plaintext and its output looks like a plausible password. So we can take a hash, put it through the reduction function to produce a plaintext, put that plaintext back through the hash function to produce a new hash, and repeat this process until we end up with a hash chain. When building the table, each chain is run for a fixed number of steps and only its starting plaintext and final hash are stored. To crack a hash, we keep reducing and re-hashing it until we hit a stored chain end, then replay that chain from its start and terminate when we produce a hash that matches the one we had originally; the plaintext that produced it is the password.
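To make the chain idea concrete, here is a toy sketch under loud assumptions: passwords are 4-digit PINs, the hash is MD5, and a single made-up reduction function (`reduce_hash`) is used for every step. Real rainbow tables vary the reduction function from step to step, so this is a plain hash chain table, not a full rainbow table.

```python
import hashlib


def md5_hex(text):
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def reduce_hash(digest):
    """Arbitrary reduction: map a hex digest to a 4-digit PIN (toy password space)."""
    return format(int(digest, 16) % 10000, "04d")


def build_chain(start, length):
    """Hash, reduce, hash, ... for `length` hashes; return only the final hash."""
    digest = md5_hex(start)
    for _ in range(length - 1):
        digest = md5_hex(reduce_hash(digest))
    return digest


def build_table(starts, length):
    """Store just the chain ends, each mapped to the start(s) that reach it."""
    table = {}
    for start in starts:
        table.setdefault(build_chain(start, length), []).append(start)
    return table


def crack(target, table, length):
    """Walk the target forward until it hits a chain end, then replay that chain."""
    digest = target
    for _ in range(length):
        for start in table.get(digest, []):
            plain = start
            for _ in range(length):
                if md5_hex(plain) == target:
                    return plain
                plain = reduce_hash(md5_hex(plain))
        digest = md5_hex(reduce_hash(digest))
    return None


CHAIN_LENGTH = 50
table = build_table(["0000", "1234", "4321"], CHAIN_LENGTH)

# Pick a PIN known to sit inside one of the chains, then recover it from its hash.
pin = "1234"
for _ in range(10):
    pin = reduce_hash(md5_hex(pin))
recovered = crack(md5_hex(pin), table, CHAIN_LENGTH)
print(pin, recovered)
```

The table holds three entries yet covers up to 150 PINs; that is the time-memory tradeoff in miniature, since the missing links are recomputed on demand.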
The best defence mechanism against these attacks is to salt the passwords. Perhaps the use of the word "salting" in the context of cryptography owes its origins to the use of salt as an impurity in various chemicals to control some physical property - that's just my hypothesis; I'm not really sure.
Motivation: Consider a database table that looks like this:
| Name | Username | Password |
|---|---|---|
| Harvey Specter | hspecter | 0d107d09f5bbe40cade3de5c71e9e9b7 |
| Mike Ross | mross | 0d107d09f5bbe40cade3de5c71e9e9b7 |
| Louis Litt | littup | 7a4ff36b94c56abfe3474c5994c4a916 |
You may notice immediately that the password hashes for Harvey and Mike are identical, which indicates that there is a high probability that they also have identical passwords. Of course, this is not necessarily true because it may be a hash collision, but that is unlikely. Anyone with access to the database can now figure out both passwords just by analyzing one of them. This can be a problem in larger tables, where the likelihood of users choosing the same silly passwords ('password', 'letmein', 'yoloswag', etc.) is much higher. It is the developer's responsibility to make sure user information is well protected, even when the user is being negligent with regard to their choice of password. Another thing that makes this password-storage mechanism rather insecure is that the passwords in the table are easy to crack with lookup tables, because each password string maps directly to its hash.
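Both weaknesses can be sketched in a few lines. The rows below are copied from the table above; that the hashes are unsalted MD5 and the short list of common passwords are assumptions, so whether any row actually cracks depends on the list.

```python
import hashlib
from collections import defaultdict

# Rows copied from the table above: username -> stored hash.
rows = {
    "hspecter": "0d107d09f5bbe40cade3de5c71e9e9b7",
    "mross": "0d107d09f5bbe40cade3de5c71e9e9b7",
    "littup": "7a4ff36b94c56abfe3474c5994c4a916",
}

# Group users by hash: any group larger than one almost certainly shares a password.
users_by_hash = defaultdict(list)
for user, digest in rows.items():
    users_by_hash[digest].append(user)
shared = {h: u for h, u in users_by_hash.items() if len(u) > 1}
print(shared)

# Lookup table: hashes of common passwords, computed once ahead of time.
common = ["password", "letmein", "yoloswag"]
lookup = {hashlib.md5(p.encode("utf-8")).hexdigest(): p for p in common}

# Cracking is now one dictionary lookup per row; misses come back as None.
cracked = {user: lookup.get(digest) for user, digest in rows.items()}
print(cracked)
```

Whatever the lookup returns for Harvey it also returns for Mike, which is exactly the problem with identical unsalted hashes.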
We can add an extra layer of security by salting these passwords. Salting is the process of appending (or prepending) an additional string to the password before hashing it.
$salt = "secret";
$hash = md5($password . $salt);
This is a weak form of salting; it adds little to the security of the information in the database. If the salt is shared among all users, then it won't take too long for a hacker to figure out what it is. Furthermore, this doesn't solve the problem of identical hashes for identical passwords. A better approach would be to use a unique salt for each registered user. This adds a much deeper layer of security and eliminates the threat of lookup tables.
// Generate a fresh random salt per user; store it next to the hash so it can be reused at login.
$salt = bin2hex(openssl_random_pseudo_bytes(16));
$hash = md5($password . $salt);
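As a rough illustration of why per-user salts help, the sketch below mirrors the post's md5($password . $salt) construction in Python. The helper names `hash_password` and `verify` are hypothetical, and MD5 is kept only to match the post.

```python
import hashlib
import secrets


def hash_password(password, salt=None):
    """Return (salt, hash); a fresh random salt is drawn when none is given."""
    if salt is None:
        salt = secrets.token_hex(16)  # 16 random bytes, hex-encoded
    digest = hashlib.md5((password + salt).encode("utf-8")).hexdigest()
    return salt, digest


def verify(password, salt, stored):
    """Re-hash the attempt with the stored salt and compare."""
    return secrets.compare_digest(hash_password(password, salt)[1], stored)


salt_a, hash_a = hash_password("letmein")
salt_b, hash_b = hash_password("letmein")
print(hash_a != hash_b)                     # same password, different hashes
print(verify("letmein", salt_a, hash_a))    # correct password is accepted
print(verify("password", salt_a, hash_a))   # wrong password is rejected
```

The salt is stored in the clear next to the hash; its job is to make every user's hash unique so that precomputed tables no longer apply, not to act as a secret.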
The next post will likely be the final one on web security for now, and it will be about what is perhaps the most underrated hacking tool - social engineering and exploiting the human factor in computer security.