Tuesday, July 8, 2008

Bash: C Call Trees and Graphs

One day I had a need. (Or was it a desire? I can't remember off the top of my head.) My need was to find all the pathways through a large set of code between one function call and another, i.e. how many ways can this code get from a() to b()?

I googled around for a while and found some call-graphing tools out there, but none did exactly what I wanted, so I wrote my own.

As with many of the other similar call-graphing tools, a lot of the grunt work can be performed with existing tools, such as cscope for determining the callers and callees of functions, and graphviz for generating graph layouts and preparing them for display. (I recently heard of Egypt which uses gcc to generate an easier-to-parse intermediate representation of code, and the ubiquitous graphviz for the back-end visualization... While AFAIK it doesn't cull the graph down to a()->b(), its use of gcc is probably much quicker than my use of cscope. I will probably convert my implementation from cscope->bash->graphviz to gcc->X->graphviz at some point in the future, while I have an "X" in mind, that's a story for another entry...)

My solution, whipped out over a weekend or so, simply teases information out of a cscope database and presents the results to graphviz for display using a dozen or so fairly simple bash functions. I thought it would take a lot more work and cleverness to generate the results I wanted, but the solution's brute force nature leaves some performance on the table, which I intend to remedy when I have time.

My solution hammers on the filesystem with a lot of pipes and temporary files to shuffle intermediate results around. The most interesting part about it is probably the interplay between all_callees(), descend(), and filter_cscope_lines(). These combine to find out all the functions that are "downstream" of an initial set, i.e. all the functions the initial set calls, and the functions these call, and so on all the way down to the tips of this tree. (There is an equivalent mechanism for finding "upstream" functions.) It does this by:
  1. Stashing the initial function(s) (specified as an argument or piped in) in a temporary "working set" file,
  2. Iterating through each function name in the working set, calling cscope on each to find out what functions it calls,
  3. Appending newly discovered downstream functions (that don't already exist in the working set) to the working set.
As descend iterates through the working set, as long as new downstream functions are being found, they get appended to the working set where descend will eventually reach them. Descend will know when it's found all the downstream tips when it simply reaches the end of the working set.

On to usage:

First, you need to build a cscope database. A simple bash wrapper ("build_graphdb") for cscope takes care of most situations; simply running it in a directory will trigger cscope to parse source files it's natively aware of in that directory and it's subdirectories and build a database there named "cscope.out". If you want to instrument code in a read-only directory or need to change the name for any other reason, simply pick a new name via "set_graphdb ", and the calltree code will use that instead.

Once a cscope database has been built, you can show:
  1. all functions "downstream" of some function X, i.e. a tree of all functions called by X, ("downstream")
  2. all functions "upstream" of some function X, i.e. all the ways X could be called, ("upstream")
  3. all code paths that lead from function X to function Y. ("subgraph")
  4. all code paths between an arbitrary set of functions A, B, C, [...] Z ("relate")
The functions themselves:
These produce raw dot-format files:
  1. _downstream
  2. _upstream
  3. _subgraph
  4. _relate
These will generate, process and (maybe) display the dot-format files using graphviz tools to interpret the dot
files, and generate various vector- or image-file formats that can be displayed by yet other programs (I prefer
zgrviewer, but had issues last time I tried it, so have settled for xfig lately):
  1. downstream
  2. upstream
  3. subgraph
  4. relate
As I mentioned, these will generate and display graphs in xfig format. You can easily substitute your own "viewer" by creating a variation of the bash function "figviewer" below, and having the generic "ctviewer" (i.e. "calltree viewer") call your variation instead. One variant, "zgrviewer", is already set up to invoke zgrviewer, and "dot2jpg", "dot2png" and "dot2html" turn dot-format files into .jpg, .png and .html files respectively as well.

Here's I generated the graph above:

bash-3.2$ . calltree.sh
bash-3.2$ cd jli
bash-3.2$ build_graphdb
bash-3.2$ _subgraph jli_parse malloc > graph.dot
bash-3.2$ cat graph.dot | dot2png graph.png

If I want to view the images directly rather than generate an image file of it, I would simply replace the last two lines with "subgraph jli_parse malloc" to invoke the viewer referenced by the "ctviewer" shell function.

There are some extra vestigial features within the shell code that I was experimenting with:
  • A quick and dirty attempt to generate graphs that would highlight possible memory leaks, and
  • A more complex set of transformations that would generate an image file and an html link mask, the idea being to gain some more interactivity by letting me click on a graph in a browser and update it with something useful via a simple netcat-and-bash-based http server... I made a some good progress before abandoning the effort due to not really having a clear use/goal for the capability.
So without further ado, here is the bash script which implements call tree/graphs: simply paste it into a bash shell to use it (you'll also need to install at least cscope and graphviz to generate dot files, and maybe xfig or zgrviewer to view the graphs.):


echo calltree.sh

#use cscope to build reference files (./cscope.out by default, use set_graphdb to override name or location)
set_graphdb() { export GRAPHDB=$1; }
unset_graphdb() { unset GRAPHDB; }
build_graphdb() { cscope -bkRu ${GRAPHDB:+-f $GRAPHDB} && echo Created ${GRAPHDB:-cscope.out}...; }

# cscope queries
fdefine() { cscope ${GRAPHDB:+-f $GRAPHDB} -L1 $1; }
callees() { cscope ${GRAPHDB:+-f $GRAPHDB} -L2 $1; }
callers() { cscope ${GRAPHDB:+-f $GRAPHDB} -L3 $1; }

# given a set of function names, find out how they're related
filter_edges() { local sym cscope_line
while read -a sym; do
fdefine $sym | while read -a cscope_line; do
grep -wq ${cscope_line[1]} ${1:-<(echo)} &&
printf "${cscope_line[1]}\t[href=\"${cscope_line[0]}:${cscope_line[2]}\"]\t/*fdefine*/\n"
callees $sym | while read -a cscope_line; do
grep -wq ${cscope_line[1]} ${1:-<(echo)} &&
printf "$sym->${cscope_line[1]}\t[label=\"${cscope_line[0]}:${cscope_line[2]}\"]\t/*callee*/\n"
callers $sym | while read -a cscope_line; do
grep -wq ${cscope_line[1]} ${1:-<(echo)} &&
printf "${cscope_line[1]}->$sym\t[label=\"${cscope_line[0]}:${cscope_line[2]}\"]\t/*caller*/\n"

# present list of function names to filter_edges properly
edges() { local tfile=/tmp/edges.$RANDOM
cat > $tfile
filter_edges $tfile <$tfile
rm $tfile

# append unknown symbol names out of lines of cscope output
filter_cscope_lines() { local cscope_line
while read -a cscope_line; do
grep -wq ${cscope_line[1]} ${1:-/dev/null} || echo ${cscope_line[1]}

# given a set of function names piped in, help spit out all their callers or callees that aren't already in the set
descend() { local symbol
while read -a symbol; do
$1 $symbol | filter_cscope_lines $2

# discover functions upstream of initial set
all_callers() { local tfile=/tmp/all_callers.$RANDOM
cat ${1:+<(echo $1)} > $tfile
descend callers $tfile <$tfile >>$tfile
cat $tfile; rm $tfile

# discover functions downstream of initial set
all_callees() { local tfile=/tmp/all_callees.$RANDOM
cat ${1:+<(echo $1)} > $tfile
descend callees $tfile <$tfile >>$tfile
cat $tfile; rm $tfile

# intersection of all_callees(a) and all_callers(b)
call_tree() { local tfile=/tmp/graph_filter.$RANDOM
all_callees $1 | sort -u > $tfile
comm -12 $tfile <(all_callers $2 | sort -u);
rm $tfile

# all functions downstream of callers of argument
all_callerees() { callers $1 | filter_cscope_lines | all_callees; }

# odd experimental set of calls that might help spot potential memory leaks
call_leaks() { local tfile=/tmp/graph_filter.$RANDOM
all_callerees $1 | sort -u > $tfile
comm -2 $tfile <(all_callers $2 | sort -u)
rm $tfile

# all the ways to get from (a,b,...z) to (a,b,...z)
call_graph() { for a; do for b; do if [ $a != $b ]; then call_tree $a $b; fi; done; done; }

# wrap dot-format node and edge info with dot-format whole-graph description
graph() { printf "digraph iftree {\ngraph [rankdir=LR, concentrate=true];\nnode [shape=record];\nedge [];\n"; cat | sort -u; printf "}\n"; }

# filter out unwanted (as specified in “~/calltree.deny”) and/or unnecessary edges
graph_filter() { local tfile=/tmp/graph_filter.$RANDOM
cat > $tfile
grep fdefine $tfile
grep $1 $tfile | grep -vf ~/calltree.deny | cut -f1,3
rm $tfile

# how to invoke zgrviewer as a viewer
zgrviewer() { ~/bin/zgrviewer -Pdot $*; }
# how to invoke xfig as a viewer
figviewer() { xfig <(dot -Tfig $*); }

# specify a viewer
ctviewer() { figviewer $*; }

# add color to specified nodes
colornodes() { (cat; for x in $@; do echo "$x [color=red]"; done;) }

# generate dot files
_upstream() { all_callers $1 | edges | graph_filter ${2:-caller} | colornodes $1 | graph; }
_downstream() { all_callees $1 | edges | graph_filter ${2:-callee} | colornodes $1 | graph; }
_subgraph() { call_tree $1 $2 | edges | graph_filter ${3:-callee} | colornodes $1 $2 | graph; }
_relate() { call_graph $@ | edges | graph_filter callee | colornodes $@ | graph; }
_leaks() { call_leaks $1 $2 | edges | graph_filter ${3:-callee} | colornodes $1 $2 | graph; }

# generate dot files and invoke ctviewer
upstream() { _upstream $@ > /tmp/tfile; ctviewer /tmp/tfile; rm -f /tmp/tfile; }
downstream() { _downstream $@ > /tmp/tfile; ctviewer /tmp/tfile; rm -f /tmp/tfile; }
subgraph() { _subgraph $@ > /tmp/tfile; ctviewer /tmp/tfile; rm -f /tmp/tfile; }
relate() { _relate $@ > /tmp/tfile; ctviewer /tmp/tfile; rm -f /tmp/tfile; }
leaks() { _leaks $@ > /tmp/tfile; ctviewer /tmp/tfile; rm -f /tmp/tfile; }

# dot file conversions
dot2png() { dot -Tpng -o $1; }
dot2jpg() { dot -Tjpg -o $1; }
dot2html() { dot -Tpng -o $1.png -Tcmapx -o $1.map; (echo "<IMG SRC="$1.png" USEMAP="#iftree" />"; cat $1.map) > $1.html; }


Anonymous said...

Nice! Ahh, the power of Unix... put little utilities, powerful on their own, together to form mean tools :D I was looking for some pointers on how to do just this stuff. Cool!

Jason Nyberg said...

Thanks, even I'm surprised at just how simple it was to derive this relatively complex tool in so little bash. Well, the result looks simple, it took a lot of cogitating to figure out the graph filter for the "subgraph" tool in particular, and to boil the implementation down to what I presented here. I hope you found some helpful techniques in there!

Anonymous said...

Thank you for developing this handy utility.

Anonymous said...

will this work on windows?

Anonymous said...

Just an FYI, there seems to be a bug in ubuntu 12.04 grep (2.10) which makes these script not work and giving off the following error:

grep: input file `/tmp/all_callees.32664' is also the output

Looks really sweet however ;).

Anonymous said...

Did you get a chance to explore same thing using gcc?

marconeuro said...

doesn´t work on Ubuntu 12.10, neither on suse 11, what's the usage? calling it directly doesn´t do anything. With sh -x calltree.sh return

calltree.sh: 73: calltree.sh: Syntax error: "(" unexpected

line 73:
comm -12 $tfile <(all_callers $2 | sort -u);

Jason Nyberg said...

Sorry for not reviewing comments lately. Answers:

Anonymous from November 22, 2011: This is implemented in Bash, so if you can get a decent Bash running in Windows, I don't see why not, offhand. Try Cygwin.

Anonymous from June 20, 2012: Interesting, and frustrating... Seems like Bash is doing something tricky, because all of the greps in the calltree code output to stdout or output to bash pipes. And thanks for the compliment!

Anonymous from August 29, 2012: I have some awesome ideas but nothing implemented yet. Basically, I'm working on another little project which will generate these call graphs very nearly for free and very, very efficiently (relative to this bash implementation.)

marconeuro: See the bit in the blog post goes:

Here's [how (ugh, sloppy writing)] I generated the graph above:

bash-3.2$ . calltree.sh
bash-3.2$ cd jli
bash-3.2$ build_graphdb
bash-3.2$ _subgraph jli_parse malloc > graph.dot
bash-3.2$ cat graph.dot | dot2png graph.png

The first line "installs" the bash functions into a running bash shell.
The second, just puts you where the code you want to graph is located.
The third builds a cscope database which the bash calltree code will use.
The fourth creates a "dot" (i.e. graphviz) file which describes a graph.
The fifth renders the graph.

(Obviously, you need graphviz and whatever tool you're going to use to render the graph.)

To all: please take a look at my "C Calltrees in Bash, revisited" post for a slightly redesigned and slightly more featureful version of the Bash Call Grapher!

Thanks for checking it out,


Anonymous said...

Can you please share your "awesome ideas"? :)
Also, the one on which you are currently working, are you using gcc or is there still some another option?

Anonymous said...

For Anonymous Jun 20, 2012, 3:01.

It is a shell bug, see link below please:

And the solution to prevent it is simple:
on filter_cscope_lines: unuse "-q" for grep, and redirect output to > /dev/null, i.e.,
grep -w ${cscope_line[1]} ${1:-/dev/null} > /dev/null|| echo ${cscope_line[1]}

Anonymous said...

FYI: last line is missing the ;}
It may not have shown up in the text.