Rant: Constants can’t be mutable

I’ve been working with Ruby lately, and a few people have told me that Ruby has constants, but they are mutable. Recently, I’ve been reading a new book on Ruby, and read the following passage:

A lot of other places mirror this (e.g. rubylearning.com). You cannot have constants if they are mutable. Mutable constants are just variables. Take a look at the definition of a constant from Wikipedia:

In computer programming, a constant is an identifier with an associated value which cannot be altered by the program during normal execution – the value is constant. This is contrasted with a variable, which is an identifier with a value that can be changed during normal execution – the value is variable. 

This is pretty basic stuff to most programmers (and people with common sense). Ruby constants are actually just variables that follow the naming convention for constants.

It’s a stretch, but you could say Ruby has weak constants (I’d prefer to just say that no, Ruby does not have constants).

If you use bash colors, check for a tty first!

Most of the command line programs I write use ANSI escape codes to output colored text (along with bold/underlined text). I find this makes the programs a lot easier to understand at a glance. For example, my test runners will output success in green, skipped in yellow, and failure in red. I can very quickly scroll through the output and spot failures.

However, if you’re going to do this, or if you already do this, you may run into problems with redirecting your program to a log file or piping it to another program. Your colors show up as ugly escape codes and make it harder to parse the log files. Pretty frequently, you only want colors if you’re outputting to an interactive terminal.

When you use colors in your command line program, you should support two features. Your program should have an option to disable colors, such as --no-color, and you should automatically detect if STDOUT is a TTY, and if not, disable colors.

Here’s a PHP example:

// Disable colors
if (in_array('--no-color', $argv) || !posix_isatty(STDOUT)) {
  $runner->color = false;
}

And here’s a Python example:

import sys
# Disable colors when stdout is not an interactive terminal
if not sys.stdout.isatty():
  color_mode = False

Since stdout won’t be a TTY if you are using redirection or piping, it’s a good thing to check before using color codes.
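
To tie the two recommendations together, here is a minimal Python sketch that honors both a --no-color flag and the TTY check. The helper names use_color and colorize are my own for illustration and are not from any library.

```python
import sys

GREEN = "\033[32m"   # ANSI escape code for green text
RESET = "\033[0m"    # ANSI escape code to reset attributes


def use_color(stream, argv):
    """Return True only if colors were not disabled and stream is a TTY."""
    if "--no-color" in argv:
        return False
    # Objects without an isatty() method are treated as non-TTY.
    isatty = getattr(stream, "isatty", None)
    return bool(isatty and isatty())


def colorize(text, code, enabled):
    """Wrap text in an ANSI color code when enabled, else return it unchanged."""
    return f"{code}{text}{RESET}" if enabled else text


if __name__ == "__main__":
    enabled = use_color(sys.stdout, sys.argv[1:])
    print(colorize("success", GREEN, enabled))
```

Run it directly and the word is green; pipe it through cat or pass --no-color and you get plain text.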

How Linux pipes work under the hood

Piping is one of the core concepts of Linux & Unix based operating systems. Pipes allow you to chain together commands in a very elegant way, passing output from one program to the input of another to get a desired end result.

Here’s a simple example of piping:

ls -la | sort | less

The above command gets a listing of the current directory using ls, sorts it alphabetically using the sort utility, and then paginates it for easier reading using less.

I’m not going to go into depth about pipes as I’ll assume you already know what they are (at least in the context of bash pipes). Instead, I’m going to show how pipes are implemented under the hood.

How Pipes Are Implemented

Before I explain how bash does pipes, I’ll explain how the kernel implements pipes (at a high level).

  • Linux has a VFS (virtual file system) module called pipefs, which gets mounted in kernel space during boot
  • pipefs is mounted alongside the root file system (/), not in it (pipefs’s root is pipe:)
  • pipefs cannot be directly examined by the user, unlike most file systems
  • The entry point to pipefs is the pipe(2) syscall
  • The pipe(2) syscall is used by shells and other programs to implement piping, and just creates a new file in pipefs, returning two file descriptors (one for the read end, opened using O_RDONLY, and one for the write end, opened using O_WRONLY)
  • pipefs is an in-memory file system
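
To make the pipe(2) behavior above concrete, here is a small Python sketch using os.pipe(), which wraps the syscall. The /proc lookup is a Linux-only assumption on my part, so it is skipped on systems without it; where it is available, the descriptor resolves to a name under the pipe: root.

```python
import os
import stat

# pipe(2) hands back two descriptors: (read end, write end)
read_fd, write_fd = os.pipe()

# The descriptor refers to a FIFO-type file, not a regular file
is_fifo = stat.S_ISFIFO(os.fstat(read_fd).st_mode)

# Linux only: /proc shows the pipefs name of the descriptor
proc_link = f"/proc/self/fd/{read_fd}"
pipe_name = os.readlink(proc_link) if os.path.islink(proc_link) else None

os.write(write_fd, b"hello through the pipe\n")
os.close(write_fd)               # no writers remain after this

data = os.read(read_fd, 1024)    # the bytes buffered in the pipe
eof = os.read(read_fd, 1024)     # empty: every write end is closed
os.close(read_fd)

print(is_fifo, pipe_name, data, eof)
```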

Pipe I/O, buffering, and capacity

A pipe has a limited capacity in Linux. When the pipe is full, a write(2) will block (or fail if the O_NONBLOCK flag is set). Different implementations of pipes have different limits, so applications shouldn’t rely on a pipe having a particular size. Applications should be designed to consume data as soon as it is available so the writing process doesn’t block. That said, knowing the pipe size is useful. Since Linux 2.6.11, the default pipe capacity is 65,536 bytes (16 pages of 4096 bytes); before that it was the size of a single page, e.g. 4096 bytes on i386. Since Linux 2.6.35, the capacity can also be queried and changed with fcntl(2).
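
If you want to see the capacity on your own machine, one rough way is to fill a pipe with non-blocking writes and count the bytes that fit. This is only a sketch: measure_pipe_capacity is a name I made up, and the number it reports depends on your kernel and settings.

```python
import os


def measure_pipe_capacity():
    """Fill a fresh pipe with non-blocking writes; return how many bytes fit."""
    read_fd, write_fd = os.pipe()
    os.set_blocking(write_fd, False)   # sets O_NONBLOCK on the write end
    chunk = b"x" * 1024
    total = 0
    try:
        while True:
            total += os.write(write_fd, chunk)
    except BlockingIOError:
        pass                           # EAGAIN: the pipe is full
    finally:
        os.close(read_fd)
        os.close(write_fd)
    return total


if __name__ == "__main__":
    print(measure_pipe_capacity())
```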

When a process attempts to read from an empty pipe, read(2) will block until data is available in the pipe. If all file descriptors pointing to the write end of the pipe have been closed, reading from the pipe will return EOF (read(2) will return 0).

If a process attempts to write to a full pipe, write(2) will block until enough data has been read from the pipe to allow the write call to succeed. If all file descriptors pointing to the read end of the pipe have been closed, writing to the pipe sends the SIGPIPE signal to the writing process. If this signal is ignored, write(2) fails with the error EPIPE.
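
Both closed-end cases are easy to reproduce. The sketch below assumes SIGPIPE is ignored (CPython does this at startup; I set it explicitly to make the assumption visible), so the failed write surfaces as EPIPE instead of killing the process.

```python
import errno
import os
import signal

# Ignore SIGPIPE so write(2) reports EPIPE instead of terminating us
signal.signal(signal.SIGPIPE, signal.SIG_IGN)

# Case 1: every write end is closed, so read(2) returns 0 bytes (EOF)
r, w = os.pipe()
os.close(w)
eof = os.read(r, 1)
os.close(r)

# Case 2: every read end is closed, so write(2) fails with EPIPE
r, w = os.pipe()
os.close(r)
try:
    os.write(w, b"data")
    write_errno = None
except BrokenPipeError as exc:
    write_errno = exc.errno
os.close(w)

print(eof, write_errno == errno.EPIPE)
```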

All of this is important when understanding pipe performance. If a process A is writing data at roughly the same speed as process B is reading it, pipes work very well and perform very well. An imbalance here can cause performance problems. See the next section for more information/examples.

How Shells Do Piping

Before continuing, you should be aware of how Linux creates new processes.

Shells implement piping in a manner very similar to how they implement redirection. Basically, the parent process calls pipe(2) once for each pair of processes that get piped together. In the example above, bash would need to call pipe(2) twice to create two pipes, one for piping ls to sort, and one to pipe sort to less. Then, bash forks itself once for each process (3 times for our example). Each child will run one command. However, before the children run their commands, they will overwrite one of stdin or stdout (or both). In our above example, it will work like this:

  • bash will create two pipes, one to pipe ls to sort, and one to pipe sort to less
  • bash will fork itself 3 times (1 parent process and 3 children, one for each command)
  • child 1 (ls) will set its stdout file descriptor to the write end of pipe A
  • child 2 (sort) will set its stdin file descriptor to the read end of pipe A (to read input from ls)
  • child 2 (sort) will set its stdout file descriptor to the write end of pipe B
  • child 3 (less) will set its stdin file descriptor to the read end of pipe B (to read input from sort)
  • each child will run its command

The kernel will automatically schedule processes so they roughly run in parallel. If child 1 writes too much to pipe A before child 2 has read it, child 1 will block for a while until child 2 has had time to read from the pipe. This normally allows for very high levels of efficiency, as one process doesn’t have to wait for the other to complete before it starts processing data. The blocking happens because pipes have a limited size (see the capacity discussion above).
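
The steps above generalize to any number of commands. Here is a rough Python sketch of that loop; run_pipeline is a hypothetical helper, not how bash is actually written. One deliberate difference: the last command’s stdout goes into a pipe that the parent reads, where a shell would leave it attached to the terminal, so the result can be returned.

```python
import os


def run_pipeline(commands):
    """Run commands chained stdout -> stdin; return the last command's output."""
    prev_read = None   # read end of the pipe that feeds the next child
    pids = []
    for argv in commands:
        read_end, write_end = os.pipe()   # carries this child's stdout
        pid = os.fork()
        if pid == 0:
            # Child: rewire stdin/stdout, then replace ourselves with the command
            try:
                if prev_read is not None:
                    os.dup2(prev_read, 0)     # stdin <- previous pipe
                    os.close(prev_read)
                os.dup2(write_end, 1)         # stdout -> this pipe
                os.close(read_end)
                os.close(write_end)
                os.execvp(argv[0], argv)
            finally:
                os._exit(127)                 # only reached if setup or exec failed
        # Parent: remember the child, drop pipe ends it no longer needs
        pids.append(pid)
        if prev_read is not None:
            os.close(prev_read)
        os.close(write_end)
        prev_read = read_end

    # Drain the final pipe until EOF, then reap every child
    chunks = []
    while True:
        chunk = os.read(prev_read, 4096)
        if not chunk:
            break
        chunks.append(chunk)
    os.close(prev_read)
    for pid in pids:
        os.waitpid(pid, 0)
    return b"".join(chunks)


if __name__ == "__main__":
    print(run_pipeline([["ls", "-la"], ["sort"]]).decode())
```

Note that the parent closes its copies of each pipe end as soon as the relevant child has been forked. If it kept a write end open, the downstream command would never see EOF.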

Pipe Example Code

Here is a C example of how a program like bash might implement piping. My example is pretty simple, and accepts two arguments: a directory and a string to search for. It will run ls -la to get the contents of the directory, and pipe them to grep to search for the string.

#include <unistd.h>
#include <stdio.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/wait.h> // for wait(2)

#define READ_END 0
#define WRITE_END 1

int main(int argc, char *argv[])
{
    pid_t pid_ls, pid_grep;
    int pipefd[2];

    // Syntax: test . filename
    if (argc < 3) {
        fprintf(stderr, "Please specify the directory to search and the filename to search for\n");
        return -1;
    }

    fprintf(stdout, "parent: Grepping %s for %s\n", argv[1], argv[2]);

    // Create an unnamed pipe
    if (pipe(pipefd) == -1) {
        fprintf(stderr, "parent: Failed to create pipe\n");
        return -1;
    }

    // Fork a process to run grep
    pid_grep = fork();

    if (pid_grep == -1) {
        fprintf(stderr, "parent: Could not fork process to run grep\n");
        return -1;
    } else if (pid_grep == 0) {
        fprintf(stdout, "child: grep child will now run\n");

        // Set fd[0] (stdin) to the read end of the pipe
        if (dup2(pipefd[READ_END], STDIN_FILENO) == -1) {
            fprintf(stderr, "child: grep dup2 failed\n");
            return -1;
        }

        // Close the pipe now that we've duplicated it
        close(pipefd[READ_END]);
        close(pipefd[WRITE_END]);

        // Setup the arguments/environment to call
        char *new_argv[] = { "/bin/grep", argv[2], 0 };
        char *envp[] = { "HOME=/", "PATH=/bin:/usr/bin", "USER=brandon", 0 };

        // Call execve(2) which will replace the executable image of this
        // process
        execve(new_argv[0], &new_argv[0], envp);

        // Execution will never continue in this process unless execve returns
        // because of an error
        fprintf(stderr, "child: Oops, grep failed!\n");
        return -1;
    }

    // Fork a process to run ls
    pid_ls = fork();

    if (pid_ls == -1) {
        fprintf(stderr, "parent: Could not fork process to run ls\n");
        return -1;
    } else if (pid_ls == 0) {
        fprintf(stdout, "child: ls child will now run\n");
        fprintf(stdout, "---------------------\n");

        // Set fd[1] (stdout) to the write end of the pipe
        if (dup2(pipefd[WRITE_END], STDOUT_FILENO) == -1) {
            fprintf(stderr, "ls dup2 failed\n");
            return -1;
        }

        // Close the pipe now that we've duplicated it
        close(pipefd[READ_END]);
        close(pipefd[WRITE_END]);

        // Setup the arguments/environment to call
        char *new_argv[] = { "/bin/ls", "-la", argv[1], 0 };
        char *envp[] = { "HOME=/", "PATH=/bin:/usr/bin", "USER=brandon", 0 };

        // Call execve(2) which will replace the executable image of this
        // process
        execve(new_argv[0], &new_argv[0], envp);

        // Execution will never continue in this process unless execve returns
        // because of an error
        fprintf(stderr, "child: Oops, ls failed!\n");
        return -1;
    }

    // Parent doesn't need the pipes
    close(pipefd[READ_END]);
    close(pipefd[WRITE_END]);

    fprintf(stdout, "parent: Parent will now wait for children to finish execution\n");

    // Wait for all children to finish
    while (wait(NULL) > 0);

    fprintf(stdout, "---------------------\n");
    fprintf(stdout, "parent: Children have finished execution, parent is done\n");

    return 0;
}

I’ve commented it thoroughly, so hopefully it makes sense.

Named vs Unnamed Pipes

In the above examples, we’ve been using unnamed/anonymous pipes. These pipes are temporary, and are discarded once your program finishes or all of their file descriptors are closed. They are the most common type of pipe.

Named pipes, also known as FIFOs (for first in, first out), get created as a named file on your file system. They allow multiple unrelated programs to open and use them. You can have multiple writers quite easily, with one reader, for a very simplistic client-server type design. For example, Nagios does this, with the master process reading a named pipe, and every child process writing commands to the named pipe.

Named pipes are created using the mkfifo command or syscall. Example:

mkfifo ~/test_pipe

Other than their creation, they work pretty much the same as unnamed pipes. Once you create them, you can open them using open(2). You must open the read end using O_RDONLY or the write end using O_WRONLY. Pipes are unidirectional, and POSIX leaves the result of opening a FIFO in read/write mode undefined, so portable programs shouldn’t rely on it.
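
Here is a small Python sketch of the create/open/read/write cycle. The function name fifo_round_trip and the file name test_pipe are just for illustration, and I use a thread as the writer to keep it short; in practice the writer would usually be a separate process.

```python
import os
import stat
import tempfile
import threading


def fifo_round_trip(message):
    """Send message through a named pipe and return what the reader received."""
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "test_pipe")
        os.mkfifo(path)                       # same effect as `mkfifo path`
        assert stat.S_ISFIFO(os.stat(path).st_mode)

        def writer():
            # Opening for writing blocks until a reader opens the other end
            with open(path, "wb") as fifo:
                fifo.write(message)

        thread = threading.Thread(target=writer)
        thread.start()
        # Opening for reading blocks until a writer opens the other end
        with open(path, "rb") as fifo:
            received = fifo.read()            # returns once the writer closes
        thread.join()
        return received


if __name__ == "__main__":
    print(fifo_round_trip(b"hello over a FIFO\n"))
```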

FIFOs are often used as a unidirectional IPC technique, for a system with multiple processes. A multithreaded application may also use named or unnamed pipes, as well as other IPC techniques such as shared memory segments.

FIFOs are created as a single inode, with the property i_pipe set as a reference to the actual pipe. While the name may exist on your filesystem, pipes don’t cause I/O to the underlying device, as once the inode is read, FIFOs behave like unnamed pipes and operate in-memory.

Analysis of a WordPress plugin exploit

This morning, I was reading ArsTechnica like I do every morning, and saw an article about how yet another popular WordPress plugin was found to have a remote execution vulnerability. The comments on the article were predictably bad and misinformed, so I decided to look into the security fix and see what caused the original issue (and how the exploit worked).

The plugin is Custom Contact Forms, which has over 670,000 downloads.

This bug is awful, and catastrophic for sites that allow registration by untrusted users.

First, this bug has been in the plugin for at least 3 years. I didn’t feel like figuring out exactly when it cropped up, but 3 years is a long time.

This bug allows any visitor on your blog (they don’t even need to be logged in) to download an export file of your contact form. That alone could be catastrophic depending on your site.

More importantly, this bug allows any authenticated user on your blog (of any privilege level) to execute arbitrary SQL commands. Let that sink in for a moment.

So, how did this bug come to be?

Looks like gross incompetence combined with a possible misunderstanding of the is_admin function. See, is_admin doesn’t check to see if the current user is an admin, it checks if the current user is in the admin area (/wp-admin), which as most WordPress users know, any user can access (it’s where the profile settings are). Even subscriber level users can access the admin area.

So let’s take a look at the code that caused the issue:

if (!is_admin()) { /* is front */
    // ...
} else { /* is admin */
    // ...
    add_action('init', array(&$custom_contact_admin, 'adminInit'), 1);
    // ...
}

Seen above is a shortened code snippet from the main plugin file that adds a hook to execute adminInit if the user is in the WordPress admin. Now let’s look at that hook:

function adminInit() {
    $this->downloadExportFile();
    $this->downloadCSVExportFile();
    $this->runImport();
}

The above function executes a few other functions. This is already worrying based on function names. I’d expect adminInit to check if the current user had some specific capability or role first, but it doesn’t. Maybe it still does in those functions?

function runImport() {
    if (isset($_POST['ccf_clear_import']) || isset($_POST['ccf_merge_import'])) {
        //chmod('modules/export/', 0777);
        ccf_utils::load_module('export/custom-contact-forms-export.php');
        $transit = new CustomContactFormsExport(parent::getAdminOptionsName());
        $settings['import_general_settings'] = ($_POST['ccf_import_overwrite_settings'] == 1) ? true : false;
        $settings['import_forms'] = ($_POST['ccf_import_forms'] == 1) ? true : false;
        $settings['import_fields'] = ($_POST['ccf_import_fields'] == 1) ? true : false;
        $settings['import_field_options'] = ($_POST['ccf_import_field_options'] == 1) ? true : false;
        $settings['import_styles'] = ($_POST['ccf_import_styles'] == 1) ? true : false;
        $settings['import_saved_submissions'] = ($_POST['ccf_import_saved_submissions'] == 1) ? true : false;
        $settings['mode'] = ($_POST['ccf_clear_import']) ? 'clear_import' : 'merge_import';
        $transit->importFromFile($_FILES['import_file'], $settings);
        ccf_utils::redirect('options-general.php?page=custom-contact-forms');
    }
}

Oh….. Guess not. Just as a note, the two download functions also don’t check permissions, which is how an attacker can dump your contact entries.

Now in this runImport function, the important call is to $transit->importFromFile. It takes an uploaded file, and does something with it. Let’s take a look:

function importFromFile($file, $settings = array('mode' => 'clear_import', 'import_general_settings' => false, 'import_forms' => true,'import_fields' => true, 'import_field_options' => true, 'import_styles' => true, 'import_saved_submissions' => false)) {
    $path = CCF_BASE_PATH. 'import/';
    $file_name = basename(time() . $file['name']);
    $file_extension = pathinfo($file['name'], PATHINFO_EXTENSION);
    if ( stripos( $file_extension, 'sql' ) ) {
        unlink( $file['tmp_name'] );
        wp_die( 'You can only import .sql files.' );
    }
    // ...
}

I’ve left out the bulk of the function, as you can probably see what it’s going to do. It takes a SQL file, and runs it. Since this function isn’t behind an authentication/capability/role check, that means anyone can upload any SQL file and run it….

So how would this have been avoided? A simple capability check is normally sufficient:

if ( current_user_can( 'manage_options' ) ) {
    // Is a real admin
}

You could also check to see if the user has the “administrator” role. So is this what the plugin author did to resolve the issue? Nope, he just removed the code….

So as you can see, this isn’t a security issue with WordPress, it’s just bad programming.

How Linux creates processes

A friend recently read my post on how bash redirection works, but didn’t quite understand my explanation of how bash launches another process and sets stdin/stdout/stderr, so this is a follow-up post.

Linux creates every process using the fork(2) or clone(2) syscalls. The only way to create a process is to fork your current process, and replace the executable image using the exec(2) syscall with the executable image of the process you want to run. Here’s how it works:

  1. A process (e.g. bash) wants to run another process, such as “ls” (to list files and directories)
  2. The process (bash) forks itself using the fork(2) or clone(2) syscall
  3. Forked processes are basically exact copies of their parent process, and resume execution at the exact same spot. Therefore you normally check if you’re the forked process by checking the return value of the fork(2) or clone(2) syscall. It’ll return different values for the parent and the child process. The parent will then go on to do something like call wait(2) to wait for the child process to complete execution
  4. The child process inherits pipes, file descriptors, state, etc from the parent process, but a lot of that isn’t needed now so the child process may clean up by closing open file descriptors and sockets, etc
  5. The child process calls execve(2) or another exec(2) syscall to replace itself with the target process. The exec(2) syscalls replace the current process image with a new one: the text, data and stack of the calling process are overwritten by the new program. However, most process attributes are preserved, such as the stdin/stdout/stderr file descriptors (so a child process could set these before running exec(2)). File descriptors are kept open in general, unless they are set to close-on-exec.

Here’s a code example that does the same thing:

#include <unistd.h>
#include <stdio.h>
#include <fcntl.h>
#include <sys/wait.h> // for wait(2)

int main(void)
{
  int pid = fork();

  if (pid == -1) {
    fprintf(stderr, "Could not fork process\n");
    return -1;
  } else if (pid == 0) {
    fprintf(stdout, "Child will now replace itself with ls\n");

    // Setup the arguments/environment to call
    char *argv[] = { "/bin/ls", "-la", 0 };
    char *envp[] = { "HOME=/", "PATH=/bin:/usr/bin", "USER=brandon", 0 };

    // Call execve(2) which will replace the executable image of this
    // process
    execve(argv[0], &argv[0], envp);

    // Execution will never continue in this process unless execve returns
    // because of an error
    fprintf(stderr, "Oops!\n");
    return -1;
  } else if (pid > 0) {
    int status;

    fprintf(stdout, "Parent will now wait for child to finish execution\n");
    wait(&status);
    fprintf(stdout, "Child has finished execution (returned %i), parent is done\n", status);
  }

  return 0;
}

The above code will fork the process, and run ls, with some helpful output to see what’s going on. Output should look like this:

Parent will now wait for child to finish execution
Child will now replace itself with ls
total 24
drwxrwxr-x 2 brandon brandon 4096 Aug 7 11:09 .
drwxr-xr-x 53 brandon brandon 4096 Aug 7 11:09 ..
-rwxrwxr-x 1 brandon brandon 8805 Aug 7 11:08 test
-rw-rw-rw- 1 brandon brandon 1038 Aug 7 11:08 test.c
Child has finished execution (returned 0), parent is done
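
The close-on-exec note in step 5 is worth seeing in action. In this Python sketch (child_can_write is a hypothetical helper), the child execs a fresh Python interpreter that tries to write to a pipe descriptor it was never told about except by number. Python creates descriptors close-on-exec by default, so the write only works when the flag is cleared first. I’m assuming the new interpreter does not happen to open its own file on the same descriptor number.

```python
import os
import sys


def child_can_write(inheritable):
    """Fork + exec a child that writes to an inherited pipe fd; report success."""
    read_fd, write_fd = os.pipe()                # created close-on-exec by Python
    os.set_inheritable(write_fd, inheritable)    # True clears close-on-exec

    # Program for the exec'd child: write to the fd number, ignore a bad fd
    script = (
        "import os\n"
        "try:\n"
        f"    os.write({write_fd}, b'hi')\n"
        "except OSError:\n"
        "    pass\n"
    )

    pid = os.fork()
    if pid == 0:
        try:
            os.close(read_fd)
            os.execv(sys.executable, [sys.executable, "-c", script])
        finally:
            os._exit(127)                        # only reached if exec failed

    os.close(write_fd)                           # parent keeps only the read end
    data = os.read(read_fd, 16)                  # b"" if the child never wrote
    os.close(read_fd)
    os.waitpid(pid, 0)
    return data == b"hi"


if __name__ == "__main__":
    print(child_can_write(True), child_can_write(False))
```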