Going Beyond grep for Searching Source Code

For developers who live on the command line, a shell prompt (bash or zsh) and an editor (vi or emacs) are the only IDE (integrated development environment) we need. For searching trees of source code, grep and maybe find will suit our needs.

grep will search whatever files you may have, but it wasn’t designed for source code, and so it doesn’t always deliver optimal results. Although grep is ubiquitous, there are other useful ways to search your source code. I created one of them, ack, back in 2004, and another, ag, was adapted from ack in 2012. Let’s take a look at what they offer.

What is ack?

First, ack doesn’t require specifying filenames to search. It assumes that you want to start in the current directory and descend through the tree (grep’s -r switch), searching through all text files, while ignoring version control directories like .svn and .git. These are usually the most common behaviors a programmer wants, so ack makes them the default.
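To make those defaults concrete, here is a rough sketch of what you would have to type to get similar behavior out of grep. The directory and file names are made up for the demo, and the grep flags only approximate ack's defaults.

```shell
# Build a small throwaway tree (all names here are made up for the demo).
demo=$(mktemp -d)
mkdir -p "$demo/src" "$demo/.git"
printf 'print("hello")\n' > "$demo/src/app.py"
printf 'print marker inside VCS metadata\n' > "$demo/.git/description"

# Roughly what a bare `ack print` does by default, spelled out for grep:
#   -r             descend into subdirectories
#   -n             show line numbers
#   -I             skip binary files
#   --exclude-dir  skip version control directories
hits=$(cd "$demo" && grep -rnI --exclude-dir=.git --exclude-dir=.svn print .)
printf '%s\n' "$hits"

rm -rf "$demo"
```

Only the file under src/ is reported; the copy of the word inside .git is skipped.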

The first big difference you’ll see with ack is that its output is more oriented to human-readability. Here’s grepping for print in a source tree:

[Screenshot: grep output for a search on print]

And here’s searching for print with ack. The output is colorized and results from the same file are grouped together visually. It’s designed to help programmers quickly and easily see the matches and tell where they came from.

[Screenshot: ack output for a search on print]

ack knows that you probably have a lot of different kinds of source code in your tree, and that you often don’t want to see results from certain types of files. ack lets you specify file types to include or exclude. If you want to search only Java files, then use the --java switch. If you want to ignore HTML and PHP files, use --nohtml --nophp.
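A small sketch of the type switches in use, assuming a throwaway tree with made-up file names. The ack commands run only if ack is installed; the grep lines are an approximate spelling of the same filters for comparison, not an exact equivalent of ack's type definitions.

```shell
# Throwaway tree with three file types (names are made up for the demo).
demo=$(mktemp -d)
printf 'System.out.print("java");\n' > "$demo/Main.java"
printf '<p>print</p>\n' > "$demo/index.html"
printf '<?php print "php"; ?>\n' > "$demo/page.php"

if command -v ack >/dev/null 2>&1; then
  # Only Java files:
  ack --java print "$demo"
  # Everything except HTML and PHP:
  ack --nohtml --nophp print "$demo"
fi

# For comparison, an approximate grep spelling of the same two filters.
only_java=$(cd "$demo" && grep -rl --include='*.java' print . | sort)
no_web=$(cd "$demo" && grep -rl --exclude='*.html' --exclude='*.php' print . | sort)
printf 'java only: %s\nno html/php: %s\n' "$only_java" "$no_web"

rm -rf "$demo"
```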

A handy option for dealing with case-insensitive matches was taken from the vim editor: smartcase. With --smart-case, ack will make the search case-insensitive, as if you’d specified -i, if the term you’re searching for is in all lowercase. This is an option you typically have in your .ackrc file, rather than specifying it on the command line.
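The smart-case rule is simple enough to sketch as a shell function around grep. This only illustrates the rule and is not how ack implements it; smart_grep is a hypothetical helper name.

```shell
# Sketch of the smart-case rule using plain grep.
# smart_grep is a hypothetical helper, not part of ack or grep.
smart_grep() {
  case $1 in
    *[[:upper:]]*) grep -- "$1" ;;     # pattern has an uppercase letter: match case exactly
    *)             grep -i -- "$1" ;;  # all lowercase: behave as if -i were given
  esac
}

lower=$(printf 'Print\nprint\nPRINT\n' | smart_grep print)
mixed=$(printf 'Print\nprint\nPRINT\n' | smart_grep Print)
printf 'lowercase pattern matched:\n%s\n' "$lower"
printf 'mixed-case pattern matched:\n%s\n' "$mixed"
```

The lowercase pattern matches all three lines, while the pattern with a capital letter matches only the line spelled exactly that way. With ack itself, a line reading --smart-case in your .ackrc gives you this behavior on every search.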

Don’t worry about learning an entirely new set of options. Although ack adds many new features, it maintains the most common options that you’re used to in grep, such as -i for case-insensitive, -w for word searching, -C for context, and so on.
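As a quick illustration of those shared options, here is a sketch that runs them through grep on a scratch file. The file contents are made up for the demo.

```shell
# Scratch file with made-up contents.
sample=$(mktemp)
printf 'one\nPrint this\nprinter\nfour\n' > "$sample"

# -i ignores case, -w matches whole words only, -C 1 adds one line of context.
out=$(grep -i -w -C 1 print "$sample")
printf '%s\n' "$out"

# The same flags carry over unchanged to ack:
#   ack -i -w -C 1 print "$sample"

rm -f "$sample"
```

Only "Print this" is a match, since "printer" is not a whole word; the lines on either side of it appear as context.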

Along comes ag

Geoff Greer was a happy ack user, but he wanted something even faster. So, he created ag, The Silver Searcher, with most of the features of ack, but rewritten in C. He used pthreads for parallelization, mmap for optimizing file I/O, and other speedy features, but at the expense of portability. (A Windows port was started but has not been kept up.)

Geoff made ag look exactly like ack, and at first ack and ag had almost identical feature sets, but over time ag added new features and ack moved to 2.0, and their feature sets diverged. Let’s look at some of the more interesting features unique to each tool.

ack lets you define your own filetypes

If you have source code files of a type that ack doesn’t know about, you can specify it with the --type-set switch. Say you have a lot of files written in COBOL, which is one of the few languages ack doesn’t support by default. To tell ack that files with a .cob extension are COBOL, just add this to your .ackrc file:

--type-set=cobol:ext:cob

You can also tell ack to base filetypes on an exact filename match, a regular expression match against the filename, or even a regex match against the first line of the file. Run ack --dump to see the default filetype specifications and examples of how to use --type-set and --type-add.
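Here is a sketch of what those other filter kinds might look like in an rc file, written to a scratch file so it doesn't touch a real .ackrc. The extra extension, the file name COPYBOOK, and both patterns are made up for illustration; check ack --dump on your version for the exact forms it accepts.

```shell
# Scratch file; in practice these lines would live in an .ackrc.
rc=$(mktemp)
cat > "$rc" <<'EOF'
# Replace any existing definition of the "cobol" type
--type-set=cobol:ext:cob
# Add further ways to recognize the same type (all values below are made up)
--type-add=cobol:ext:cbl
--type-add=cobol:is:COPYBOOK
--type-add=cobol:match:/[.]cpy$/
--type-add=cobol:firstlinematch:/IDENTIFICATION DIVISION/
EOF
cat "$rc"
```

The four filter kinds map onto the text above: ext for extensions, is for an exact filename, match for a regex against the filename, and firstlinematch for a regex against the first line of the file.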

ag doesn’t allow specifying your own filetypes. If you want to add a new filetype, you have to modify the source and rebuild.

ag lets you search within compressed files

ag’s --search-zip flag treats compressed files as if they were normal text files. This has long been a feature request for ack, but there’s no way to implement it without sacrificing portability.
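A minimal sketch of the flag in use, assuming ag is installed and was built with compression support. The file name is made up, and the last command shows the manual route you would take without such a flag.

```shell
# Throwaway directory holding one gzipped source file (name is made up).
demo=$(mktemp -d)
printf 'print("inside a gzip file")\n' > "$demo/old.py"
gzip "$demo/old.py"            # leaves only old.py.gz

if command -v ag >/dev/null 2>&1; then
  ag --search-zip print "$demo"
fi

# Without such a flag, the manual route is to decompress to a pipe first.
manual=$(gzip -dc "$demo/old.py.gz" | grep -n print)
printf '%s\n' "$manual"

rm -rf "$demo"
```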

ack offers custom output

ack allows you to output your matches in any format you want using the --output switch. Say you were searching for #include files in your C code. You would simply run:

$ ack '#include'
src/util.c
1:#include <ctype.h>
2:#include <string.h>
3:#include <stdio.h>
...

But what if you just wanted to see the filenames that were included? You’d form a capture group with parentheses and output just that, like so:

$ ack '#include <(.+)>' --output='$1'

src/util.c
1:ctype.h
2:string.h
3:stdio.h

Eliminate the filenames with -h and pipe the output to sort -u, and you get a deduped list of all include files in the project:

$ ack '#include <(.+)>' --output='$1' -h | sort -u

ctype.h
dirent.h
errno.h
fcntl.h
...

This feature is powerful because you can put any Perl expression inside the value for --output.

ag offers editor integration

ag has excellent support for integrating with various editors. ag’s --ackmate switch outputs results in a format that the TextMate editor can understand, and --vimgrep does the same for vim. ack doesn’t have equivalent switches, but can emulate them using the --output option.

ack allows project-level .ackrc files

ack allows you to have project-specific .ackrc files. Say you have a project that uses COBOL files, but you don’t want to have the custom --type-set settings in your global /etc/ackrc or your local ~/.ackrc. You can put a .ackrc file at the root directory of your project and put the settings in there. If you put this .ackrc under version control, then everyone working on your project automatically gets the COBOL --type-sets as well.
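As a sketch of the layout, the following sets up a stand-in project with its own .ackrc and, if ack is installed, searches from inside it. The directory, the source file, and its contents are made up for the demo.

```shell
# Stands in for the root of a real project.
project=$(mktemp -d)
mkdir -p "$project/src"

# Project-level settings live in .ackrc at the project root.
cat > "$project/.ackrc" <<'EOF'
--type-set=cobol:ext:cob
EOF
printf '       DISPLAY "HELLO".\n' > "$project/src/hello.cob"

if command -v ack >/dev/null 2>&1; then
  # Run from inside the project so ack can find the project's .ackrc.
  (cd "$project/src" && ack --cobol DISPLAY .)
fi
```

Committing that .ackrc alongside the code is what shares the settings with everyone else on the project.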

ag reuses your VCS’s ignore files

Version control systems typically let you list files that the VCS should ignore. ag reads those ignore lists and skips the same files when searching. If your .gitignore says to ignore all files with an .html extension, then ag will ignore them as well. ack lacks this feature.

grep, ack, or ag: which one should you use?

There’s no wrong answer. Each tool has its own strengths, and you should use whichever best fits your needs. This quick cheat sheet can help inform your decision:

grep

  • Available on all Unix-like systems by default, but not on Windows.
  • Everyone knows it, and it should be used for scripting purposes.

ack

  • Very portable, runs on any system that runs Perl, including Windows
  • Ignores backup files, binary files, your VCS’s work files, and other unwanteds
  • True Perl regular expressions, not PCRE, because it’s written in Perl
  • Flexible output with the --output option
  • User-definable file types
  • Project-level configuration

ag

  • Very fast
  • Uses your VCS’s ignore files to know what to ignore
  • Searches compressed files
  • Better editor integration
  • Not as portable; Windows version is out of date

That’s just a quick comparison. Checking the --help output of each of the tools will show more similarities and differences.

Remember that you’re not restricted to using only one tool. Feel free to use whatever is most appropriate at any given time: ack or ag for searching source code, and grep for other text files. Of course, there are plenty more search tools to explore as well, including the new Platinum Searcher (pt), which is written in Go. For info on more alternatives, check out http://beyondgrep.com/more-tools/.

Note: ack 1.x had different, sometimes-confusing default searching behaviors.  Also, versions of ack 2 between 2.00 and 2.11 had a serious security hole that was fixed in 2.12. Please make sure to use 2.12 or higher.


Andy Lester has been involved with open source for two decades, contributing many modules to Perl's CPAN as well as speaking at open source conferences and user groups around the country. He's the author of the search tool ack and his book Land The Tech Job You Love is published by Pragmatic Bookshelf.
