kafsemo.org

Turning data into graphs

2012-02-19

I liked the results from comparing the perl and module releases being tested over time. Most of the hoops to jump through involved release dates and getting R to render things how I wanted. Here are those hoops, along with how I jumped.

Gathering data

CPAN Testers has a results page for each module. On there, a nice friendly JSON button links to all the test results:

[
   {
      "status" : "PASS",
      "osvers" : "2.11",
      "state" : "pass",
      "osname" : "solaris",
      "platform" : "i86pc-solaris-thread-multi",
      "version" : "0.615",
      "distribution" : "XML-Writer",
      "perl" : "5.10.0",
      "fulldate" : "201201182135",
      ...
   },
...
]

This is really easy to parse with JSON.pm.
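The same parse is just as short elsewhere. As a rough sketch in Python (standing in for JSON.pm, which is what was actually used here), with a trimmed single-record sample in place of the downloaded file; load_reports is a hypothetical helper name:

```python
import json

# A trimmed sample in the shape shown above; real reports carry more fields.
SAMPLE = """
[
  {"status": "PASS", "osname": "solaris", "version": "0.615",
   "distribution": "XML-Writer", "perl": "5.10.0", "fulldate": "201201182135"}
]
"""

def load_reports(text):
    """Return (distribution-version, perl version, status) for each report."""
    return [
        (f"{r['distribution']}-{r['version']}", r["perl"], r["status"])
        for r in json.loads(text)
    ]

reports = load_reports(SAMPLE)
print(reports)
```

Only the module release, the perl version and the result matter for the graphs, so everything else is dropped at this point.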

Release dates

Once we have clones of the Perl and XML::Writer git repositories, we can get the times at which each release was tagged. Perl’s tags have used a number of conventions so, in essence:

while read p; do
  ...
  elif git rev-parse --quiet --verify perl-"$p" >/dev/null; then
    t="perl-$p"
  else
    t="v$p"
  ...

echo "$p,$t","`git show "$t" --pretty=format:Stamp,%ct | grep ^Stamp, | cut -f2 -d,`"

to get output like:

ID              Tag             Stamp
XML-Writer-0.3  xml-writer-0.3  944761768
XML-Writer-0.4  xml-writer-0.4  954899991
...
5.5.3           perl-5.005_03   922659709
5.6.0           perl-5.6.0      953789951
...
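Those stamps are seconds since the epoch, so it is worth checking that they land on plausible dates before going further. A minimal sketch in Python, assuming the comma-separated lines echoed by the loop have been captured as text; read_stamps is a hypothetical helper name:

```python
import csv
import io
from datetime import datetime, timezone

# Lines in the "ID,Tag,Stamp" shape echoed by the shell loop above.
RELEASES = """\
XML-Writer-0.3,xml-writer-0.3,944761768
5.5.3,perl-5.005_03,922659709
5.6.0,perl-5.6.0,953789951
"""

def read_stamps(text):
    """Map each release ID to its tag timestamp (seconds since the epoch)."""
    return {rid: int(stamp) for rid, _tag, stamp in csv.reader(io.StringIO(text))}

stamps = read_stamps(RELEASES)
for rid, stamp in stamps.items():
    # Show each stamp as a UTC date to compare against known release dates.
    print(rid, datetime.fromtimestamp(stamp, tz=timezone.utc).date())
```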

To CSV

We then process all this using a perl script to get CSV output with the corresponding release dates of the software used in each test.
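The join itself is simple. Here is a sketch of the idea in Python rather than the Perl script actually used, with made-up inputs. The column names xmlwriter and perl match the ones the R code reads later; to_csv, the sample data (including the unknown version 5.9.9) and the choice to skip tests with no known tag are all assumptions for illustration:

```python
import csv
import io

# Hypothetical inputs: release ID -> tag timestamp, and one
# (module release, perl version) pair per test report.
stamps = {"XML-Writer-0.4": 954899991, "5.5.3": 922659709, "5.6.0": 953789951}
tests = [
    ("XML-Writer-0.4", "5.6.0"),
    ("XML-Writer-0.4", "5.5.3"),
    ("XML-Writer-0.4", "5.9.9"),  # no tag known for this one
]

def to_csv(tests, stamps):
    """Write one row per test: module release date, perl release date."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["xmlwriter", "perl"])
    for module, perl in tests:
        # Assumption: skip reports for releases with no known tag.
        if module in stamps and perl in stamps:
            writer.writerow([stamps[module], stamps[perl]])
    return out.getvalue()

print(to_csv(tests, stamps), end="")
```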

Generating graphs

Get the data in with d <- read.csv('data.csv') and then start to plot. A simple plot(d$xmlwriter, d$perl) is a great start when analysing data. If we want more control over how the results look, R is happy to give it to us. Too much? Maybe!

Dates on the axes

First, plot without labels (plot(d$xmlwriter, d$perl, ..., xaxt='n', yaxt='n')), then add in axes for dates:

dates <- ISOdate(1999:2012, 1, 1)
axis(1, at=dates, labels=format(dates, "%Y"))
dates <- ISOdate(seq(2000, 2012, 2), 1, 1)
axis(2, at=dates, labels=format(dates, "%Y"))
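The at= positions work because ISOdate returns times counted in seconds since the epoch, the same units as the plotted stamps. A small sketch in Python of the tick positions being computed, assuming ISOdate's defaults of 12:00 GMT; year_ticks is a hypothetical name:

```python
from datetime import datetime, timezone

def year_ticks(years):
    """Return (epoch seconds, label) for noon UTC on 1 January of each year."""
    return [
        (int(datetime(y, 1, 1, 12, tzinfo=timezone.utc).timestamp()), str(y))
        for y in years
    ]

x_ticks = year_ticks(range(1999, 2013))     # like ISOdate(1999:2012, 1, 1)
y_ticks = year_ticks(range(2000, 2013, 2))  # like ISOdate(seq(2000, 2012, 2), 1, 1)
print(x_ticks[:2])
```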

Highlight certain releases

Use abline to show guidelines for specific releases:

# Perl releases (tag times, in seconds since the epoch):
releases <- c('v5.14.0'=1305397133, 'v5.12.0'=1271077269, 'v5.10.0'=1198315389, 'v5.8.0'=1027029998, 'v5.6.0'=953789951)

abline(h=releases, col='darkgray')

axis(4, at=releases, labels=names(releases), tick=FALSE, las=2)

Trends

For each XML::Writer release, I wanted to show the mean release date of the perls it had been tested with. We can use aggregate across the tests to group the perl release dates (d$perl) by the corresponding XML::Writer version, then summarise by taking the mean and plot it as a line:

points(aggregate(d$perl, list(d$xmlwriter), mean), type='l', col='blue')
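To spell out what that aggregate call computes, here is a sketch of the same group-then-mean step in Python, on made-up rows. mean_by_group is a hypothetical name, and the results are sorted by group so the points join up left to right, as they do in the R output:

```python
from collections import defaultdict
from statistics import fmean

# Made-up rows of (xmlwriter release stamp, perl release stamp), one per test.
rows = [
    (944761768, 922659709),
    (954899991, 953789951),
    (944761768, 953789951),
]

def mean_by_group(rows):
    """Group perl stamps by module stamp and return sorted (group, mean) pairs."""
    groups = defaultdict(list)
    for xmlwriter, perl in rows:
        groups[xmlwriter].append(perl)
    return [(key, fmean(values)) for key, values in sorted(groups.items())]

trend = mean_by_group(rows)
print(trend)
```

Each pair is one vertex of the blue line: the x value is an XML::Writer release date and the y value is the mean release date of the perls that tested it.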

This shows that the test perls are getting more recent; it’s nice to show and to quantify this kind of thing.

(Music: Urge Overkill, “Effigy”)