I liked the results from comparing the perl and module releases being tested over time. Most of the hoops I had to jump through involved release dates and getting R to render the plot the way I wanted. Here are those hoops, along with how I jumped them.
CPAN Testers has a results page for each module. On there, a nice friendly JSON button links to all the test results:
[
{
"status" : "PASS",
"osvers" : "2.11",
"state" : "pass",
"osname" : "solaris",
"platform" : "i86pc-solaris-thread-multi",
"version" : "0.615",
"distribution" : "XML-Writer",
"perl" : "5.10.0",
"fulldate" : "201201182135",
...
},
...
]
This is really easy to parse with JSON.pm.
Once we have clones of the Perl and XML::Writer git repositories, we can get the times at which those releases were tagged. Perl’s tags have used a number of conventions so, in essence:
while read p; do
...
elif git rev-parse --quiet --verify perl-"$p" >/dev/null; then
t="perl-$p"
else
t="v$p"
...
echo "$p,$t,$(git show "$t" --pretty=format:Stamp,%ct | grep '^Stamp,' | cut -f2 -d,)"
to get output like:
| ID | Tag | Stamp |
|---|---|---|
| XML-Writer-0.3 | xml-writer-0.3 | 944761768 |
| XML-Writer-0.4 | xml-writer-0.4 | 954899991 |
| ... | | |
| 5.5.3 | perl-5.005_03 | 922659709 |
| 5.6.0 | perl-5.6.0 | 953789951 |
| ... | | |
We then process all this with a Perl script to get CSV output in which each test is represented by the release dates of the perl and XML::Writer versions it used.
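The actual script isn't reproduced here. As a stand-in, this is a minimal sketch of the same join using awk rather than Perl. It assumes two hypothetical input files: one holding the `ID,Tag,Stamp` lines printed by the tag loop, and one with a line per test report beginning `version,perl` (any further fields are ignored). The function name `join_release_dates` is made up, and the output columns are named `xmlwriter` and `perl` only to line up with the `d$xmlwriter` and `d$perl` columns used in the R code.

```shell
# join_release_dates STAMPS REPORTS
#   STAMPS:  lines of "ID,Tag,Stamp", e.g. "XML-Writer-0.4,xml-writer-0.4,954899991"
#   REPORTS: lines starting "version,perl", e.g. "0.4,5.6.0" (assumed layout)
#   Prints a CSV of release timestamps, one row per test report.
join_release_dates() {
    awk -F, '
        NR == FNR { stamp[$1] = $3; next }      # first file: remember ID -> Stamp
        FNR == 1  { print "xmlwriter,perl" }    # header, printed once before the reports
        {
            x = stamp["XML-Writer-" $1]         # release date of the module version
            p = stamp[$2]                       # release date of the perl version
            if (x != "" && p != "") print x "," p   # skip reports with an unknown release
        }
    ' "$1" "$2"
}
```

Something like `join_release_dates stamps.csv reports.csv > data.csv` would then produce the file that R reads; reports whose perl or module version has no matching tag are silently dropped in this sketch, which may or may not be what you want.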
Get the data in with d <- read.csv('data.csv') and then start to plot.
A simple plot(x$key, x$value) is a great start when analysing data.
If we want control over how the results look, R is happy to give us
that control. Too much? Maybe!
First, plot without labels (plot(d$xmlwriter, d$perl, ..., xaxt='n', yaxt='n')), then add in
axes for dates:
dates <- ISOdate(1999:2012, 1, 1)
axis(1, at=dates, labels=format(dates, "%Y"))
dates <- ISOdate(seq(2000, 2012, 2), 1, 1)
axis(2, at=dates, labels=format(dates, "%Y"))
Use abline to show guidelines for specific releases:
# Perl releases:
abline(h=1305397133, col='darkgray')
abline(h=1271077269, col='darkgray')
abline(h=1198315389, col='darkgray')
abline(h=1027029998, col='darkgray')
abline(h=953789951, col='darkgray')
axis(4, at=c(1305397133, 1271077269, 1198315389, 1027029998, 953789951), labels=c('v5.14.0', 'v5.12.0', 'v5.10.0', 'v5.8.0', 'v5.6.0'), tick=FALSE, las=2)
For each XML::Writer release, I wanted to show the mean release date of
the perls it had been tested with. We can use aggregate across the tests
to group the perl release dates (d$perl) by the corresponding XML::Writer
version, then summarise by taking the mean and plot it as a line:
points(aggregate(d$perl, list(d$xmlwriter), mean), type='l', col='blue')
This shows that the test perls are getting more recent; it’s nice to show and to quantify this kind of thing.