Compressing HTTP content has been around forever. Still, it's common for the traffic between our apps and our web API not to be gzipped. For some apps, this isn't a big deal because the payload between the app and the server is quite small, perhaps just a couple of KB of JSON or XML. For other apps, however, gzipping can make a big difference in the amount of time it takes to communicate with the server.

First, we'll quickly discuss HTTP compression generally. The most common options when it comes to compressing HTTP content are deflate, gzip, and sdch. The first option, deflate, is a close cousin of gzip, but some browsers (cough IE cough) didn't implement deflate correctly, so it fell out of favor. The last one, sdch, is a proprietary Google algorithm that is supported natively in Android and Chrome. Since we're dealing in iOS apps, this isn't really going to work for us, either.

This leaves the stalwart ole gzip. Gzip is supported out of the box in NSURLSession, which automatically tells servers that it can handle gzipped data. When the server sends down gzipped data, NSURLSession automatically decompresses it; you don't have to do anything. This is awesome, except that all of this automatic-ness can require a bit of hacking in some situations, which we will get to in a minute.

Not all network traffic is ideally suited for gzipping. PDFs and most image formats are already compressed, and gzipping them isn't going to make the files any smaller. Small files aren't ideally suited for gzipping either, as they can actually get slightly larger once the overhead of the gzip header and trailer is added. Gzipping works really well on text-based data. If your web API deals in moderate-to-large chunks of uncompressed or text-based data, gzip can make your network payloads much smaller. You can algorithmically determine whether your server should gzip a specific payload, or you could just gzip everything. Google tends to advocate gzipping all traffic, since the gains made for most payloads outweigh the negligible CPU cost of decompressing, and even some images may benefit because they can carry a fair amount of compressible metadata. Google has an interesting article on HTTP compression.
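If you go the "algorithmically determine" route, the decision can be as simple as a size check plus a content-type check. Here's a minimal sketch of that idea. The function name, the 1 KB threshold, and the list of MIME types are all assumptions made up for illustration, so tune them against your own traffic.

```php
<?php
// Hypothetical helper: decide whether a payload is worth gzipping.
// The 1 KB default threshold and the MIME type list are illustrative assumptions.
function shouldGzip(string $contentType, int $byteLength, int $minBytes = 1024): bool
{
    // Tiny payloads can end up larger once the gzip header and trailer are added.
    if ($byteLength < $minBytes) {
        return false;
    }

    // Formats that are already compressed gain little or nothing.
    $alreadyCompressed = [
        'application/pdf',
        'application/zip',
        'image/jpeg',
        'image/png',
        'image/gif',
    ];

    // Strip parameters such as "; charset=utf-8" before comparing.
    $type = strtolower(trim(explode(';', $contentType)[0]));

    return !in_array($type, $alreadyCompressed, true);
}
```

With these assumptions, a 240 KB JSON response would be gzipped, while a PNG of the same size or a 200-byte JSON response would be sent as-is.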

Is Gzipping Worth It?

I ran a couple of tests using some sample data. One example was a SQLite database file while the other was straight JSON.

File Type | Data Size | Gzipped Size
--- | --- | ---
SQLite database | 15 MB | 2.7 MB
JSON data | 234 KB | 11 KB

For the SQLite file, we end up with a file that is about 80% smaller, and for the JSON data, we're looking at around a 95% savings. That's huge. Or small. Whatever. Obviously, the results will vary with the data you throw at gzip, but for data that is well suited to it, gzipping is well worth it.
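The percentages above are just one minus the compressed size divided by the original size. If you want to run the same check on your own payloads, something like the following sketch would do it. Both function names are made up for this example, and it assumes the data is non-empty and small enough to gzip in memory.

```php
<?php
// Percentage saved when going from $originalBytes down to $compressedBytes.
function percentSaved(float $originalBytes, float $compressedBytes): float
{
    return (1 - $compressedBytes / $originalBytes) * 100;
}

// Hypothetical helper: gzip a non-empty string in memory and report the savings.
function gzipSavings(string $data): float
{
    return percentSaved(strlen($data), strlen(gzencode($data)));
}

// The figures from the sample data above (sizes in KB).
printf("SQLite: %.0f%%\n", percentSaved(15 * 1024, 2.7 * 1024)); // SQLite: 82%
printf("JSON:   %.0f%%\n", percentSaved(234, 11));               // JSON:   95%
```

To measure a real file, you could pass the result of file_get_contents() to gzipSavings().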

How Do I Make This Happen?

On the server

Apache, one of the most common web servers, allows you to turn on HTTP compression via its .htaccess files. This is great, but if turning on gzipping globally isn't an option, we can enable it on a per-request basis.

I'm going to use PHP and JSON for these examples.

Clients and servers let each other know their gzip capability via HTTP headers. The client advertises support with the Accept-Encoding: gzip request header, and the server marks a compressed response with the Content-Encoding: gzip response header. NSURLSession sends the Accept-Encoding header to servers by default, so we don't have to do anything there. On the server side, we'll have to get our hands a bit dirty.
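If clients other than our app might hit the same endpoint, it can be worth confirming that the request actually advertised gzip before compressing the response. Below is a minimal sketch of that check. The function name is hypothetical, and the parsing is deliberately simple: it only looks for a gzip token and treats q=0 as "not acceptable".

```php
<?php
// Hypothetical helper: true when an Accept-Encoding header value allows gzip.
function clientAcceptsGzip(?string $acceptEncoding): bool
{
    if ($acceptEncoding === null) {
        return false;
    }

    foreach (explode(',', $acceptEncoding) as $token) {
        // Each token may carry a quality value, e.g. "gzip;q=0.8".
        $parts = explode(';', $token);
        if (strtolower(trim($parts[0])) !== 'gzip') {
            continue;
        }

        // No q parameter means full preference; q=0 means "do not send gzip".
        $quality = 1.0;
        if (isset($parts[1]) && preg_match('/q\s*=\s*([0-9.]+)/i', $parts[1], $matches)) {
            $quality = (float) $matches[1];
        }

        return $quality > 0;
    }

    return false;
}

// In a request handler, PHP exposes the header as $_SERVER['HTTP_ACCEPT_ENCODING'].
$useGzip = clientAcceptsGzip($_SERVER['HTTP_ACCEPT_ENCODING'] ?? null);
```

When $useGzip is false, the script can fall back to echoing the plain JSON.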

In the PHP file that our app is communicating with, we likely have some version of this:

echo json_encode($data);
exit;

This outputs our JSON-encoded data which is sent down to the app. Great. We've decided that we want to gzip this output. To do so, first we need to get a little PHP-specific.

PHP has what are called output buffers. Unless output buffering is turned off, everything echoed out is kept in a variable (a buffer) until the end of the script, then the data is output all at once. To keep things simple, we want to gzip our entire output, so we need to clear out anything that's already in the output buffer.

We'll jump up a couple of lines and add some code.

while(ob_get_level())
{
    ob_end_clean();  
}

Output buffers can be nested, so we need to loop through and make sure that we "clean" all of the output buffers that may exist at the moment. Unless we've been echoing out data further up in the code, this is probably unnecessary, but better safe than sorry.
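To see the nesting in action, here's a small standalone sketch that opens two buffers, writes some stray output into them, and then runs a cleanup loop. It is a slight variation on the loop above: the extra ob_end_clean() check in the loop condition is my own addition, so the loop stops instead of spinning if a buffer refuses to close.

```php
<?php
// Simulate code further up the script that opened nested buffers.
ob_start();
echo 'stray output';
ob_start();               // a second buffer nested inside the first
echo 'more stray output';

$levelBefore = ob_get_level();

// Discard every buffer; stop if one refuses to close so we never loop forever.
while (ob_get_level() > 0 && ob_end_clean()) {
    // Nothing else to do: ob_end_clean() discards the buffer and drops one level.
}

$levelAfter = ob_get_level();

// The stray output was thrown away, so only this line is printed.
echo "Nesting level before: $levelBefore, after: $levelAfter\n";
```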

Now we'll alter our json_encode code from above.

// Create a json encoded string
$json = json_encode($data);         
// Get the uncompressed size.
$size = strlen($json);
// Gzip the string.
$compressed = gzencode($json);      
// Tell the app that the response is gzipped
header('Content-Encoding: gzip');   
// Send the uncompressed content length (we'll get to this in a moment)
header('X-Uncompressed-Content-Length: ' . $size);  
// Send the compressed content length
header('Content-Length: ' . strlen($compressed));   
// Output the gzipped data.
echo $compressed;                   
exit;                               
// We're done. 😌

That's it as far as the server goes. See that header('X-Uncompressed-Content-Length: ' . $size) bit? That's our hack for progress reporting.
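If you'd like to test the gzip step without actually sending anything, one option is to wrap it in a function that returns the header lines and the body. This is only a sketch: the function name and the array shape are made up for this example, and the error handling goes a bit beyond the script above.

```php
<?php
// Hypothetical helper: builds the headers and gzipped body for a JSON
// response without sending anything, so the result can be inspected.
function buildGzippedJsonResponse($data): array
{
    $json = json_encode($data);
    if ($json === false) {
        throw new RuntimeException('Could not JSON-encode the data.');
    }

    $compressed = gzencode($json);
    if ($compressed === false) {
        throw new RuntimeException('Could not gzip the JSON.');
    }

    return [
        'headers' => [
            'Content-Encoding: gzip',
            // strlen() counts bytes, which is what both length headers need.
            'X-Uncompressed-Content-Length: ' . strlen($json),
            'Content-Length: ' . strlen($compressed),
        ],
        'body' => $compressed,
    ];
}
```

A caller would pass each entry of 'headers' to header() and then echo 'body'. Running gzdecode() on the body should give back the original JSON, which makes for an easy sanity check.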

App Code

Since we've converted all of our network code over to NSURLSession from NSURLConnection (right?!?), we'll use NSURLSession for this example.

As we discussed before, NSURLSession handles all of the decompression for us. Awesome. Except it needs to back off and give us a little emotional space because it's trying too hard and it is getting in the way of progress.

Download progress specifically.

🙄

Here is the delegate method that we all know and love that gets called as data arrives from the server.

func URLSession(_ session: NSURLSession, dataTask dataTask: NSURLSessionDataTask, didReceiveData data: NSData)

There are a lot of interesting objects in there. We know that we can grab that dataTask, check its countOfBytesReceived property, divide it by its countOfBytesExpectedToReceive, and we have the progress percent.

Except.

With gzipped content, NSURLSession will always report countOfBytesExpectedToReceive as -1. Also, the countOfBytesReceived is the uncompressed count, not the actual number of bytes transmitted to the app. 😒 It really tries to hide the compression from you.

This is where our header hack comes into play. Server headers that start with X- are user-defined, non-standard headers. We sneaked the uncompressed download size into the header so we can make this the body of our delegate method.

if let response = dataTask.response as? NSHTTPURLResponse,
    let contentLengthString = response.allHeaderFields["X-Uncompressed-Content-Length"] as? String,
    let contentLength = Int64(contentLengthString) {
    // countOfBytesReceived is the uncompressed count, so it lines up with our header value.
    print("Download: \(dataTask.countOfBytesReceived)/\(contentLength)")
}

First we cast our response as an NSHTTPURLResponse (geez, that name) and we grab our uncompressed content length from the header. We then convert it to an Int64, and dividing the received byte count by that uncompressed total gives us the progress.

And 👏🏾, data transfer progress of gzipped data in our Swift app. Greatly reduced traffic. Faster downloads. Win/win/win.