CPAN is the central repository for Perl modules. As I write this, it’s claiming 6,689 separate modules – that’s a lot, and it’s sometimes hard to know where to start. CPAN uses an extensive network of mirrors, and one consequence is that download statistics aren’t available: no single server sees all the traffic. Of course, any such numbers would be flawed – they ignore caching, for one thing, and more importantly they ignore “resellers” of Perl, such as the repackaging done by Linux distributions. So flawed, yes, but interesting, also yes.
Anyway, code written, code run, on one week’s traffic from cpan.etla.org (thanks to Michael).
It’s reasonable to say that a lot of those look like dependencies, rather than modules that coders would use directly. Specifically, the first few are fetched by the CPAN downloader itself during the bootstrapping process. Net::Telnet seems to be used by a lot of monitoring and configuration tools – handy. HTML::Parser does what it claims to, and does it well. It’s particularly useful for scraping, and using URI::URL helps to make sure absolute and relative URLs are handled properly. If we presume that people who download DBI (Perl’s SQL database abstraction) did so for use with a driver, it looks like MySQL is the clear leader – no other DBD driver gets even close. Postgres and Oracle are joint second, with 27 each. For comparison, the Debian Popularity Contest (an opt-in survey of installations) shows HTML::Parser as the most popular non-core module (there, MySQL’s driver has only three times as many installations as Postgres’).
It’s clear that many people and systems are depending on these modules. Should more of them ship with a default Perl? Should they all be polished until they have perfect test coverage? (This is part of the mission of the Phalanx Project, which has a similar focus.) Should CPAN endorse more statistics? (As a simple solution, the code I used lends itself to aggregation – it generates a YAML file with per-file counts. If mirrors published these at a well-known URL, it would be simple to combine them to get a much larger sample.)
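To make the aggregation idea concrete, here is a rough sketch of how per-mirror counts might be combined. It is written in Python rather than Perl, and it is not the code used for the numbers above. It assumes each mirror publishes a flat mapping of unquoted `path: count` lines (a small subset of YAML, parsed by hand here to avoid needing a YAML library); the function names, the sample paths and the counts are all made up for illustration.

```python
from collections import Counter

def parse_counts(text):
    """Parse a flat 'path: count' mapping (an assumed, minimal subset of YAML)."""
    counts = Counter()
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments and the YAML document marker.
        if not line or line.startswith("#") or line == "---":
            continue
        # Split on the last colon so the count is always the final field.
        path, sep, value = line.rpartition(":")
        if not sep:
            continue
        counts[path.strip()] += int(value)
    return counts

def merge_counts(reports):
    """Sum the per-file counts across several mirror reports."""
    total = Counter()
    for report in reports:
        total.update(parse_counts(report))
    return total

# Hypothetical reports from two mirrors; the paths and numbers are invented.
mirror_a = """---
authors/id/X/XX/XXX/Foo-Bar-1.0.tar.gz: 12
authors/id/Y/YY/YYY/Baz-0.2.tar.gz: 3
"""
mirror_b = """---
authors/id/X/XX/XXX/Foo-Bar-1.0.tar.gz: 5
"""

combined = merge_counts([mirror_a, mirror_b])
for path, n in combined.most_common():
    print(n, path)
```

Because the counts are simply summed per file, reports can be merged in any order, and a mirror that publishes late can just be added to the running total.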