One reason PHP is everywhere is because the language makes it so darn easy to get a simple app up and running. Dozens of online tutorials demonstrate the basics of how to write a database app that you can quickly adapt to your specific needs.

But be careful! This strength can also be PHP’s great weakness. Because PHP makes it so easy to create working programs, it’s equally easy for inexperienced programmers to write code that is slow, hard to maintain, and riddled with security holes. Too many PHP programmers are blissfully cranking out code that seems to work just fine, but is actually filled with hidden dangers and maintenance issues.

This post is for them.

To help, let’s run through four PHP scripts to perform a simple task: Saving a user’s name and age to a MySQL database.

Each example will be an improvement on the one before. If you’re just learning PHP, the Level 1 example may look like what you know how to do, but it’s dangerous and should be improved. The next three examples are intended to give you better ideas of how to write safer, more easily maintainable code. Just remember, this progression of coding techniques is meant to be a high-level overview, not a reference on database usage in PHP. You’ll want to read the PHP manuals for that.

Level 1: Basic MySQL, open to bugs and SQL injection

Let’s look at a simple form that lets a user enter their name and email address into a form, so that your PHP program can write it to a database table:

<html>
   <head>
      <title>Entry form #1</title>
   </head>
   <body>
      <form method="POST" action="post1.php">
         Name: <input name="name" type="text">
         <br />
         Age: <input name="age" type="text">
         <br />
         <input type="submit" value="Save">
      </form>
   </body>
</html>

Now, here’s a PHP script that takes that input from the user and writes it into a MySQL database:


<?php

$servername = 'localhost';
$username   = 'secret';
$password   = 'secret';
$dbname     = 'users';

$name = $_POST['name'];
$age  = $_POST['age'];

// ...
// Validation of $age as a valid number omitted for space.
// ...

$conn = mysql_connect( $servername, $username, $password );
$db   = mysql_select_db( $dbname, $conn );
if ( !$db ) {
    die( "Can't use DB $dbname" . mysql_error() );
}

$result = mysql_query( "INSERT INTO users ( name, age ) VALUES ( '$name', $age )" );

if ( !$result ) {
    die( 'ERROR: ' . mysql_error() );
}

?>

This is the basic way to do database interactions in PHP. You might find code like this in an old book on PHP from the early 2000s, or a quick-and-dirty tutorial on a website.

(Note that one of our fields is a number. I have omitted any validation of the field, such as making sure that the field is actually a numeric value and not, for example, the string “twelve,” or that the age is not a negative number. This is important, but outside the scope of this article. The is_numeric function is the starting point. Similarly, numeric values are also not quoted in SQL. Your database may be forgiving and translate the string ’12’ into the numeric value 12, but the right way to do it is to not quote numeric values.)

This initial version of our page works, and it seems to make sense. You can see how the parts of the PHP code get put together, and it’s clear how you assemble the data from the user into a SQL statement that puts data into your database. You’ve gotten it working, and you’re satisfied that you’ve got some good working code.

Nevertheless, this form is the coding equivalent of climbing a rickety ladder. Let’s look at some of its most dangerous elements.

Problem characters, security, and escaping your data

First, the first version uses the old mysql_xxx functions. These have been deprecated in favor of the newer mysqli_xxx functions. That means that in a future release, the mysql_xxx functions may go away, and your code could stop working.

Second, your SQL could blow up in certain cases; for example, if a data field has an apostrophe in it. If someone enters “Catherine O’Leary” in the name field, the resulting SQL code will look like:

INSERT INTO users ( name, age ) VALUES ( 'Catherine O'Leary', 37 );

That will produce the following error:

ERROR: You have an error in your SQL syntax; check the manual
that corresponds to your MySQL server version for the right 
syntax to use near 'Leary', '' )' at line 1

The string value 'Catherine O'Leary' is not a valid string because of the apostrophe in the middle of a string delimited by apostrophes. The SQL interpreter thinks that the string stops after the O. In order for the program to accept an apostrophe in the middle of a string, it must be “escaped” by preceding it with a backslash, so it comes out like this:

INSERT INTO users ( name, age ) VALUES ( 'Catherine O\'Leary', 37 );

To escape the string, you can use the PHP function str_replace to replace the single quote with the backslash-quote sequence, but the better way is to use mysql_real_escape_string to do it for you. Just make sure you use mysql_real_escape_string and not the old, deprecated mysql_escape_string, which did not escape the strings based on character set.

There’s another even more important reason to escape your strings. SQL code constructed out of untrusted data from the internet is a giant security problem, vulnerable to a “SQL injection.” The SQL code that you are writing could be using data from anyone on the web. You’re executing code which was at least partially constructed from something potentially malicious. A hacker could embed other SQL code within your SQL command and steal your data or take over your server.

You may see code that uses various other PHP functions for handling escaping; that is usually a mistake as well. For example, you may see some programmers use str_replace or addslashes, but database-specific functions are the way to go. The database functions cover edge cases you may not be aware of specific to your database, and they also take into account character encoding. You may see people use functions for dealing with HTML data, such as htmlspecialchars or htmlentities, but HTML encoding and database encoding are entirely unrelated. The encoding for HTML is different than the encoding for a SQL statement, so you will get wrong results if you use the HTML functions for database encoding.

Level 2: Manually escaping data with MySQLi

Now, let’s look at a second version of the code, using MySQLi and proper escaping:


<?php

$servername = 'localhost';
$username   = 'secret';
$password   = 'secret';
$dbname     = 'users';

$name = $_POST['name'];
$age  = $_POST['age'];    // Validation of $age omitted.

$mysqli = new mysqli( $servername, $username, $password, $dbname );
if ( mysqli_connect_errno() ) {
    die( 'Connect failed: ' . mysqli_connect_error() );
}

$escaped_name  = $mysqli->real_escape_string( $name );

if ( !$mysqli->query( "INSERT INTO users ( name, age ) VALUES ( '$escaped_name', $age )" ) ) {
    die( 'ERROR: ' . $mysqli->error );
}

?>

The power of prepared statements

This version uses the newer mysqli extension instead of the mysql extension, and it properly escapes the input to prevent bugs and SQL injection attacks.

If this was the only code you were going to write, you could leave it like this and everything would be fine. However, dealing with data escaping and formatting in your SQL statements is prone to programmer error. What if you forget to escape some of the data? And no one wants to deal with quoting string data and not quoting numeric data. Fortunately, there’s an easier way: using prepared statements and placeholders.

Prepared statements require more lines of PHP code than a call to plain old mysqli_query, but they are safer and repeated calls can be faster to execute.

When you prepare a SQL statement, the database processes the SQL and prepares it for execution, but it doesn’t actually execute any code in the database. This can speed things when executing the same SQL query over and over because the query has to be prepared only once and then executed many times, instead of being both prepared and executed many times.

The most important advantage of prepared statements is that they let you use value placeholders, or bound variables. This allows you separate the SQL statement from the data on which it’s going to operate. This can make your SQL easier to understand, and prevents SQL injection without having to escape the data yourself.

Level 3: Prepared statements with MySQLi

Let’s see what these prepared statements look like in action:


<?php

$servername = 'localhost';
$username   = 'secret';
$password   = 'secret';
$dbname     = 'users';

$name = $_POST['name'];
$age  = $_POST['age'];    // Validation of $age omitted.

$mysqli = new mysqli( $servername, $username, $password, $dbname );
if ( mysqli_connect_errno() ) {
    die( 'Connect failed: ' . mysqli_connect_error() );
}

$stmt = $mysqli->prepare( 'INSERT INTO users ( name, age ) VALUES ( ?, ? )' );
if ( !$stmt ) {
    die( 'ERROR: ' . $mysqli->error );
}

$stmt->bind_param( 'si', $name, $age );
if ( !$stmt->execute() ) {
    die( 'ERROR: ' . $mysqli->error );
}

?>

Note how in this version, the query is not built from variables at all. It is a single string, created by the programmer, with placeholders (the question marks) that tell PHP where values will go when the query is executed. It’s impossible for outside data to mess up this SQL query because the text of the query is not built with any outside data.

The bind_param method then lets you specify what values will go with each placeholder:

$stmt->bind_param( 'si', $name, $age );

The 'si' means that there are two parameters, a string and an integer, in that order. You have to make sure that your placeholders and values match up.

Placeholders are never quoted in the SQL statement. They show where a value is going to go, and the type of each value is determined by the call to bind_param.

Only when the statement is actually executed with execute() does the outside data get involved. The data from the variables specified in bind_param is passed directly to the database, without having to be transformed into a SQL statement.

Abstracting with PDO

Let’s take our program to the third level by introducing the PDO package. PDO stands for PHP Data Objects, and makes the calls to the database even easier, although more abstract. PDO also eases the handling of bind parameters.

Another advantage of PDO is that it’s not tied to a specific database system. With our old version, if we wanted to change the database from MySQL to PostgreSQL, we would have to change all the code from mysqli_xxx calls to pg_xxx functions. With PDO, those differences between databases are abstracted away. To find out PDO drivers your system has installed, use this code:

print_r( PDO::getAvailableDrivers() );

Which on my system displays:

Array
(
    [0] => mysql
    [1] => pgsql
    [2] => sqlite
)

If the database system you want to use is not shown when you run that on your system, you’ll need to install the driver for it.

Level 4: Prepared statements with PDO

The PDO version does things a bit differently from what we’ve seen before:


<?php

$servername = 'localhost';
$username   = 'scratch';
$password   = 'scratch';
$dbname     = 'scratch';

$name = $_POST['name'];
$age  = $_POST['age']; // Validation of $age omitted.

try {
    $dbh = new PDO( "mysql:host=$servername;dbname=$dbname", $username, $password );

    $sth = $dbh->prepare( 'INSERT INTO users ( name, age ) VALUES ( :name, :age )' );
    $sth->bindParam( ':name', $name, PDO::PARAM_STR );
    $sth->bindParam( ':age', $age, PDO::PARAM_INT );
    $sth->execute();
}
catch( PDOException $e ) {
    die( 'Failed: ' . $e->getMessage() );
}

?>

First, note that all the work to be done is inside that try block, followed by a catch block. This saves us from the tedium of checking every operation for success or failure. If any of the statements fail inside that block, PDO will throw an exception, and the catch block will catch it and report on the error.

Second, note how the bind parameters are not positional as they were in the MySQLi example. Instead of just being question marks, the placeholders have names. This may involve a little extra typing, but it helps avoid common errors where the number or order of fields change. With named parameters, you can’t get your parameters out of order because the order doesn’t matter. If you prefer positional parameters instead of named ones, PDO still supports those as well.

The point

As you look over the progression of these examples, it may seem like going to extremes to create SQL code with named placeholders when you could slap together SQL in a string and be done with it.

It’s not.

Each of these examples is an improvement on the one before it. These techniques were created to solve problems, not just to make programmers’ lives more difficult. Following them can reduce program crashes and failures, and help eliminate security vulnerabilities. That’s more than worth a bit of extra effort on the frontend, right?

 

Background image courtesy of Shutterstock.com.

Andy Lester has been involved with open source for two decades, contributing many modules to Perl’s CPAN as well as speaking at open source conferences and user groups around the country. He’s the
author of the search tool ack and his book Land The Tech Job You Love is published by Pragmatic Bookshelf.

View posts by .

Interested in writing for New Relic Blog? Send us a pitch!