I recently discovered a very useful PHP function called token_get_all(), which lets you tap into the Zend Engine's lexical scanner, the same tokenizer PHP itself uses (it is written in C, so it is very fast). The function accepts a string containing PHP code and returns the tokenized output as an array. Each element of that array is either a single-character string (such as =, ;, or even ") or an array containing three values: the token type, represented as an integer; the token text itself (a T_COMMENT token would contain the actual comment); and the line number the token started on. Hint: you can get the "nice" token name by calling the token_name() function on the token type.
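Here is a minimal sketch of what that looks like in practice. The helper name describe_tokens() and the sample input are made up for illustration; only token_get_all() and token_name() come from PHP itself.

```php
<?php
// Hypothetical sample input. The opening tag matters: without it the
// tokenizer treats the whole string as inline HTML (T_INLINE_HTML).
$code = '<?php $answer = 42; // the answer';

// Hypothetical helper: flattens the output of token_get_all() into
// rows of [token name, token text, line number].
function describe_tokens(string $code): array
{
    $rows = [];
    foreach (token_get_all($code) as $token) {
        if (is_array($token)) {
            // Full token: [integer type, text, starting line].
            [$type, $text, $line] = $token;
            $rows[] = [token_name($type), $text, $line];
        } else {
            // Single-character token such as "=" or ";".
            // It carries no type and no line number.
            $rows[] = ['(char)', $token, null];
        }
    }
    return $rows;
}

foreach (describe_tokens($code) as [$name, $text, $line]) {
    printf("%-12s line %-3s %s\n", $name, $line ?? '-', json_encode($text));
}
```

The is_array() check is the important part: because single-character tokens come back as plain strings, any code that loops over the result has to handle both shapes.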
This function lets you do a lot of different things, such as building powerful debugging tools, writing source code coverage tools, or building an awesome syntax highlighting library like I did. Most highlighting libraries just use some basic regex, so you end up with OK syntax highlighting but nothing special, because a regex only guesses at the structure that the tokenizer actually knows. Using the library I built, you can easily get syntax highlighting that rivals editors like Sublime Text 2 or Eclipse. I’ve put it on GitHub, so go check it out!
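To give a rough idea of how token-based highlighting can work, here is a small sketch. It is not the code from my library: the function name highlight_php(), the TOKEN_CLASSES map, and the CSS class names are all made up for this example, and a real highlighter would cover many more token types.

```php
<?php
// Hypothetical mapping from tokenizer token types to CSS class names.
const TOKEN_CLASSES = [
    T_COMMENT                  => 'comment',
    T_DOC_COMMENT              => 'comment',
    T_VARIABLE                 => 'variable',
    T_CONSTANT_ENCAPSED_STRING => 'string',
    T_LNUMBER                  => 'number',
    T_DNUMBER                  => 'number',
];

// Hypothetical helper: wraps recognised tokens in <span> elements and
// HTML-escapes everything, so the result is safe to place inside <pre>.
function highlight_php(string $code): string
{
    $html = '';
    foreach (token_get_all($code) as $token) {
        if (is_array($token)) {
            [$type, $text] = $token;
            $class = TOKEN_CLASSES[$type] ?? null;
        } else {
            // Single-character tokens have no type, so they get no class.
            $text  = $token;
            $class = null;
        }

        $escaped = htmlspecialchars($text, ENT_QUOTES);
        $html .= $class === null
            ? $escaped
            : '<span class="' . $class . '">' . $escaped . '</span>';
    }
    return $html;
}

echo '<pre>' . highlight_php('<?php $x = "a<b";') . '</pre>';
```

Because the class is chosen from the token type rather than from a pattern match, a $ or a // inside a string literal is never mistaken for a variable or a comment, which is exactly where regex-based highlighters tend to slip.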
My next mini-project will be turning this into a WordPress plugin.