Z, a must-use command line tool

Recently, I was updating my bash prompt and setting up some Zsh profiles when I found a great script called Z (https://github.com/rupa/z). Z keeps a history of the directories you use and lets you jump straight to the best match for a partial string, ranked by how frequently and how recently you’ve visited each directory. I manage a lot of servers with a lot of website directories, and I have a lot of git repos locally, so this tool has already saved me a ton of typing. Here are some examples:

z proj -> Takes me to ~/projects
z blog -> Takes me to ~/projects/personal/blog
z Doc -> Takes me to ~/Documents
z www -> Takes me to /var/www

As you can see, for such a simple and easy-to-use script, it can save you a lot of time. Just download it and source it from your .bashrc/.zshrc file.
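
For example, assuming you saved the script as ~/z.sh (any path works, just adjust the line to match), add this to your .bashrc or .zshrc:

. ~/z.sh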

Why you should be using strace

Strace is a debugging utility for *nix systems, used to monitor the system calls made by a program, as well as any signals the program receives. Strace is useful for seeing what an application is doing under the hood, which can be vital for finding subtle bugs in your code that are obscured by built-in functionality. I frequently use it with PHP to diagnose certain types of errors, and to quickly find what causes scripts to hang (if you just want to see what’s slow in your code, I recommend profiling your code instead).

What are system calls? They are requests to the low-level functions provided by the OS kernel, such as mmap, select, and recvfrom. Because so much goes through these calls, you’ll frequently get a lot of output from strace, and it can be difficult to find the problem at first, but once you understand some of the system calls it gets much easier.

How To Use Strace

For example, I have the following file:

<?php
// test.php
require 'cool.php';

Running this file with strace php test.php, I see output like the following:

getcwd("/home/brandon", 4096) = 14
lstat("/home/brandon/./cool.php", 0x7fff18039d20) = -1 ENOENT (No such file or directory)
lstat("/usr/share/php/cool.php", 0x7fff18039d20) = -1 ENOENT (No such file or directory)
lstat("/usr/share/pear/cool.php", 0x7fff18039d20) = -1 ENOENT (No such file or directory)
lstat("/home/brandon/cool.php", 0x7fff18039d20) = -1 ENOENT (No such file or directory)
getcwd("/home/brandon", 4096) = 14
lstat("/home/brandon/cool.php", 0x7fff1803be80) = -1 ENOENT (No such file or directory)
open("/home/brandon/cool.php", O_RDONLY) = -1 ENOENT (No such file or directory)

Here I can see all of the places PHP is looking for the file, and in which order. This shows me that PHP is failing to find cool.php (a contrived example, since PHP will throw a fatal error telling you this anyway). A better example of when this is useful is the gettext extension. I had an issue a while back where PHP wasn’t loading my language files. The code that actually loads the language file is buried in the extension, invisible to me. Using strace, however, I was able to see where PHP was looking for the language files, and correct the issue in my code.

Attaching Strace To A Running Process

Even more useful than starting a program under strace is attaching strace to an already-running process, such as Apache or PHP-FPM. This is a great way to see why a process is hanging or running slowly.

strace -p PROCESS_ID

This attaches strace to the given process ID (you’ll probably have to be root for it to work), and it will continually show strace output for as long as the process is running.
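
A few standard strace flags are worth knowing here: -f follows child processes, -s sets how many characters of each string argument are printed (the default truncates them), and -e trace=network restricts output to network-related calls. For example:

strace -p PROCESS_ID -f -s 200 -e trace=network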

A very useful shell function I have is straceall:

straceall () {
  # grep -v grep excludes the grep process itself from the matches
  ps -ef | grep "$1" | grep -v grep | awk '{ print "-p " $2 }' | xargs strace -v -s 100
}

This takes a process name as an argument and attaches strace to every process with that name. It lets me attach strace to all php-fpm processes, for example, by running:

straceall php-fpm

I keep the function in my .bashrc file.

Long-running PHP script gotcha

I was recently answering a question on Stack Overflow, and learned something quite interesting about PHP. This particular aspect of PHP mostly affects long-running scripts, such as PHP-based servers or daemons.

PHP does not re-use resource IDs internally, so eventually your script could hit an error where the resource ID overflows and wraps around into the negatives, which pretty much breaks everything. A bug report was opened about this in 2009, but the issue is still present. This is a huge issue, because resources are used all over, even in not-so-apparent places. For example, most file operations use resources, as do database connections and network sockets. Some of these are obvious, as the function returns a resource (such as socket_accept). Others are not, like file_get_contents, which actually uses two resources under the hood, incrementing the internal resource ID counter by two.

The maximum resource ID depends on your architecture (32 or 64 bit), and possibly on the options used to compile PHP. On 32 bit systems, it’s almost certainly going to be 2,147,483,647 (2^31 - 1). On 64 bit systems, it’s probably going to be 9,223,372,036,854,775,807 (2^63 - 1).

While this limit is far easier to hit on 32 bit systems, it’s still possible to hit on 64 bit systems if your script runs long enough. The limit is good to be aware of when designing your application, as you can mitigate it with techniques such as forking and splitting work across multiple processes.
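
As a rough illustration of the forking approach (a minimal sketch, not production code; handle_job is a hypothetical stand-in for whatever work your daemon does), the parent can spawn short-lived workers so that no single process lives long enough for its resource ID counter to overflow:

<?php
// Requires the pcntl extension (CLI only).
while (true) {
    $pid = pcntl_fork();
    if ($pid === -1) {
        die("fork failed\n");
    } elseif ($pid === 0) {
        // Child: handle a bounded batch of work, then exit so its
        // resource ID counter never approaches the maximum.
        for ($i = 0; $i < 100000; $i++) {
            handle_job(); // hypothetical function that uses resources
        }
        exit(0);
    }
    // Parent: wait for the worker to finish, then loop to fork a fresh one.
    pcntl_waitpid($pid, $status);
}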

Demonstration:

<?php
echo gmp_init("0x41682179fbf5")."\n";// Resource id #4
echo gmp_init("0x41682179fbf5")."\n";// Resource id #5
echo gmp_init("0x41682179fbf5")."\n";// Resource id #6
file_get_contents('/etc/hosts');
echo gmp_init("0x41682179fbf5")."\n";// Resource id #9
echo gmp_init("0x41682179fbf5")."\n";// Resource id #10
echo gmp_init("0x41682179fbf5")."\n";// Resource id #11

The above code demonstrates the resource IDs incrementing if you run it locally (it requires php5-gmp). You can also see file_get_contents incrementing the ID by two.

Optimizing Your PHP with Xdebug

I work with a lot of PHP applications, and part of my job is optimizing those applications to reduce server costs and maximize how many requests each server can handle. There are many ways to do this, but I’m going to talk about using a profiler, which is one of the better ways to start optimizing your code base.

You’ve probably heard people say not to get caught up in premature optimization; maybe you’ve heard them follow up with “profile your code to find the bottlenecks”. A lot of people don’t know what this means, or are under the impression that profiling is hard to set up. That’s certainly not true: profiling PHP is very simple, thanks to the wonderful Xdebug extension. Before we get started setting up Xdebug, you’re going to need a program to read the profiling output files, known as cachegrind files. I personally recommend KCacheGrind (also available on Windows), although WinCacheGrind is another good alternative. Finally, there is a very lightweight, web-based viewer called WebGrind. You should download and install one of these programs.

Before continuing, a small note about callgrind/cachegrind files: Callgrind is a tool included with Valgrind, a C/C++ analysis tool. A lot of different profilers support this format, so the information below about interpreting the results is language agnostic.

Installing Xdebug

These instructions are for Ubuntu, but it’s easy to find instructions for whatever platform you’re on; Xdebug is quite popular.

Simply install the Xdebug PHP extension using apt, and you’re set:

sudo apt-get install php5-xdebug

You may need to restart Apache/PHP-FPM. You can verify it’s installed using phpinfo, or by running php -i | grep xdebug.

Configuring Xdebug

Next, we need to enable the Xdebug profiler (it’s disabled by default). We have two options here: always on, or triggered by setting a GET parameter or cookie. I recommend the second option; it’ll make your life easier.

Edit your php.ini file, and add the following option:

xdebug.profiler_output_dir = /path/to/store/cachegrind/files

I use the following:

xdebug.profiler_output_dir = /var/www/_profiler

Make sure the directory exists.

Next, enable the profiler.

Always On:

xdebug.profiler_enable = 1

Enabled using the GET/POST/COOKIE param XDEBUG_PROFILE=1:

xdebug.profiler_enable_trigger = 1

The above settings are mutually exclusive, so only use one of them.
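
Putting the pieces together, the relevant php.ini section for my setup with the trigger option looks like this:

xdebug.profiler_output_dir = /var/www/_profiler
xdebug.profiler_enable_trigger = 1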

Dumping A Cachegrind File

Cachegrind files contain all of the profiling information for your application. Simply navigate to your site with the GET parameter set, like so:

http://localhost:11000/?XDEBUG_PROFILE=1

In the configured profiler output directory, you should see a file with a name like cachegrind.out.8803 (the number will be different for you).

Now, open this file using your cachegrind viewer.

[Screenshot: KCacheGrind displaying the cachegrind file, sorted by Self time]

This is an example from a local development machine running WordPress, which was loading quite slowly for some reason. The first thing you should do is sort by “Self” time, as I have here. Please note that the times shown are relative, and don’t mean a whole lot by themselves.

If you’re wondering about the difference between “Incl” (inclusive) and Self time, it’s quite simple. Inclusive time is the time required to run a function including the time taken by every other function it calls. That’s why your main file will have the highest inclusive time: it runs everything. Self time is far more useful; it’s the time spent in just that function, excluding any functions it calls. That’s the important column for optimization. Here, we can see that curl_exec was called 7 times, and took about 10 times longer than the next slowest call.

You can use the callers tab to see which functions called each function, and step through the execution of your program. You can also browse the source code for the function being called (You may have to map directories to your source directory if you’re running kcachegrind on another machine, do this from the settings). I did this for the above curl_exec, and was able to narrow down the trouble to the WordPress version check, which I then disabled as I do updates to this site manually.

On another site, I noticed the custom routing function was taking a substantial amount of time to map routes. When I looked at the function, it was doing a bunch of work to clean up and normalize the routes. This was a developer convenience feature, but had a high performance cost. I removed all of the normalization code, and fixed all of the routes to ensure they were already normalized, and it sped the site up by about 12ms.

The Call Graph

The call graph, found in a tab in the bottom right pane, is a useful visualization of the calling relationships around the currently selected function.

[Screenshot: call graph centered on get_option]

A call graph is a directed graph representing the calling relationships between the functions in your code. The functions pointing TO get_option (in the middle) are the functions that call get_option. The functions that get_option points to are the ones called from within get_option itself. This tool is useful for visualizing the cost of a function and where that cost comes from.

Conclusion

Profiling your code allows you to find the slowest calls and spend effort fixing those, instead of blindly fixing things that may have very little impact on performance. It’s an essential tool for any developer. I urge any programmers reading this to get familiar with a profiling tool, whether it’s Xdebug for PHP or the equivalent for your language of choice. It will make you far more productive, and you’ll wonder how you lived without it!

Basics Of Scaling: Load Balancers

Lately, I’ve been doing a lot of work on systems that require a high degree of scalability to handle large traffic spikes. This has led to a lot of questions from friends and colleagues about scaling, so I thought I’d do a blog series on the basics of scaling.

Why Use A Load Balancer?

One of the most important pieces of a high scale architecture is the load balancer. Load balancers allow you to distribute load (e.g. HTTP requests) across several servers. This is vital because it allows for horizontal expansion: you can increase your capacity (the number of users you can serve) simply by adding more servers. It really depends on your application, but I’ve found that the web boxes are frequently the first bottleneck, caused by your programming language of choice (e.g. Python/PHP/Ruby).

For example, a small PHP application running on a box with Nginx and PHP-FPM isn’t going to have a bottleneck at the database level, but it will be bottlenecked by PHP. Each request is handled by a separate PHP-FPM process, and each PHP-FPM process needs a certain amount of memory. Let’s say your server can handle 256 concurrent users. The easiest way to scale at this point is to add more web boxes, each of which increases your capacity by another 256 users. However, you need a way of distributing visitors between the web boxes, and that’s where the load balancer comes in.
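
To make the arithmetic concrete (the numbers here are illustrative assumptions, not measurements): if each PHP-FPM worker needs roughly 32 MB and you can give PHP 8 GB of the box’s memory, then:

8192 MB / 32 MB per worker = 256 concurrent workers

A second identical web box doubles that capacity to 512, and so on.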

Traditionally in a web server architecture, your load balancer(s) will sit in front of the rest of your stack, directly accepting requests from your users. All traffic will go through your load balancer(s), and get routed to a web box using a defined scheduling algorithm (e.g. sticky sessions, round robin, least load, etc).

[Diagram: load balancer sitting in front of a pool of web servers]

Other benefits of a load balancer include health checks that can remove an unhealthy server from the pool automatically (e.g. if the server isn’t responding, the load balancer will temporarily stop directing traffic to it). You also eliminate your application/web servers as a single point of failure. Servers can fail, and just be removed from the pool without your visitors noticing.

Which Load Balancer?

There are many load balancers available, divided into two categories – hardware and software load balancers. I recommend starting off with a software load balancer, as hardware load balancers are very expensive and designed for larger infrastructure. Software load balancers include multipurpose applications such as Varnish, Nginx and Apache, as well as dedicated applications like HAProxy.

I personally recommend HAProxy as it’s a superb piece of software, very fast and reliable. If you’re building your infrastructure in AWS, you could also use an ELB, although I don’t recommend it as those are closed systems and provide limited customization.

Your load balancer should go on a dedicated server, to maximize reliability and resources.

Which Scheduling Algorithm?

There are three common scheduling algorithms employed by load balancers, and you can typically configure which one you’d like to use.

Sticky sessions use cookies to ensure a given user always hits the same backend server. If your application stores state about a request/user on the server itself (rather than in a database or another shared source), sticky sessions can let you add load balancing without making code changes. I don’t recommend them, though, as they can put too much load on a single server.

Round robin sends each successive request to the next server in the pool, e.g. A -> B -> C -> D -> A. This is the most common load balancing technique, and is normally fine for most websites.

Load scheduling directs requests to the server with the lowest load (e.g. CPU load). This is the best load balancing technique and is still quite easy to set up. Not all load balancers support it, however.
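
As a rough sketch of how these map onto HAProxy configuration (the backend name and IPs are made up for this example; leastconn is HAProxy’s closest built-in to load-based scheduling, and the check keyword enables the health checks mentioned earlier):

backend web_servers
    balance roundrobin    # or: balance leastconn
    server web1 10.0.0.11:80 check
    server web2 10.0.0.12:80 check
    server web3 10.0.0.13:80 check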

Application Compatibility

Unless you use sticky sessions, you’ll probably have to make changes to your application for it to work behind a load balancer.

The biggest thing that normally causes issues is stateful information stored on an individual server. A great example of this is user sessions. PHP uses file-based sessions by default, so a user may log in on server A, and their next request goes to server B, which doesn’t have their session data. Luckily, PHP and most other languages and/or frameworks support alternative session storage mechanisms, such as Memcached or your database. I most frequently use Memcached-based sessions.
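
For example, with the php-memcached extension installed, switching PHP to Memcached-backed sessions takes just two php.ini settings (the address here assumes a Memcached instance on localhost with the default port; adjust for your setup):

session.save_handler = memcached
session.save_path = "127.0.0.1:11211"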

Any other state information that gets saved to a single server will need to be adapted to be stored in a shared location or replicated amongst servers.

Automatic Scaling

A logical next step is adding auto scaling to your web/application server cluster. In this system, you would have a management server of some sort that monitors your web server cluster, and automatically adds or removes servers based on load. This works best with a virtualization infrastructure or cloud based hosting such as Amazon AWS. Your management server would then add or remove the instance from your load balancer.

Again, if you’re using Amazon AWS, they have this built in. It’s called Auto Scaling groups, and it works well with Amazon ELBs (Elastic Load Balancers).

Closing Notes

Most large and/or complex architectures will include load balancers at several points throughout the infrastructure, and will normally use a cluster of load balancers at each point to avoid having a single point of failure. Certain DNS providers allow you to specify multiple IP addresses for a domain name, and will either return the IP closest to the user or rotate through them round robin style (this is what Google does). If you have 10 web servers but only 1 load balancer, you have a single point of failure that can take down everything. Load balancers are traditionally quite stable, but you should always avoid single points of failure if you want a robust architecture.

[Diagram: Imgur’s server architecture]

Imgur’s server architecture, pictured above, uses round robin DNS, multiple server clusters, and a cluster of load balancers (shown here as the “proxy cluster”). I’ll be covering all of the elements pictured above, and more, in upcoming articles.