kafsemo.org

What’s the most popular Perl module?

2004-07-20

CPAN is the central repository for Perl modules. As I write this, it’s claiming 6,689 separate modules – that’s a lot, and it’s sometimes hard to know where to start. CPAN uses an extensive network of mirrors, and one consequence is that download statistics aren’t available. Of course, the numbers would be flawed – they ignore caching for one thing, but more importantly ignore “resellers” of Perl, such as repackaging in Linux distributions. So flawed, yes, but interesting, also yes.

Anyway, code written, code run, on one week’s traffic from cpan.etla.org (thanks to Michael).
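The counting step is simple enough to sketch. This is a minimal Python illustration, not the script that produced the numbers here. It assumes the mirror writes Apache-style access logs (the log format isn't stated above), and the names `count_downloads` and `guess_module` are made up for the example.

```python
import re
from collections import Counter

# Assumption: Apache common/combined log format, e.g.
#   ... "GET /path/to/Foo-Bar-1.0.tar.gz HTTP/1.1" 200 12345
REQUEST = re.compile(r'"GET (?P<path>\S+) HTTP/[\d.]+" (?P<status>\d{3}) ')


def count_downloads(lines):
    """Count successful (status 200) fetches of .tar.gz files, keyed by file name."""
    counts = Counter()
    for line in lines:
        match = REQUEST.search(line)
        if not match or match.group("status") != "200":
            continue
        # Drop any query string, then keep only the last path component.
        path = match.group("path").split("?", 1)[0]
        if path.endswith(".tar.gz"):
            counts[path.rsplit("/", 1)[-1]] += 1
    return counts


def guess_module(filename):
    """Naive guess at a module name: strip the version, swap '-' for '::'.

    This is only a heuristic. It gets Compress-Zlib right but not
    distributions such as libwww-perl, which has to be mapped to LWP by hand.
    """
    dist = re.sub(r"-v?\d[\d._]*\.tar\.gz$", "", filename)
    return dist.replace("-", "::")
```

Feeding it a week of log lines and calling `count_downloads(lines).most_common(20)` would give a ranking by file. The module column still needs some manual attention, because distribution names and module names don't always line up.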

Downloads  File                              Module
338        Compress-Zlib-1.33.tar.gz         Compress::Zlib
334        Archive-Tar-1.10.tar.gz           Archive::Tar
307        TermReadKey-2.21.tar.gz           Term::ReadKey
298        IO-Zlib-1.01.tar.gz               IO::Zlib
284        Net-Telnet-3.03.tar.gz            Net::Telnet
279        CPAN-1.76.tar.gz                  CPAN
278        Bundle-libnet-1.00.tar.gz         Bundle::libnet
273        HTML-Parser-3.36.tar.gz           HTML::Parser
270        Digest-1.08.tar.gz                Digest
269        Digest-MD5-2.33.tar.gz            Digest::MD5
268        Term-ReadLine-Perl-1.0203.tar.gz  Term::ReadLine::Perl
248        URI-1.31.tar.gz                   URI
242        libnet-1.18.tar.gz                (Many in Net::)
242        File-Spec-0.87.tar.gz             File::Spec
213        Data-Dumper-2.121.tar.gz          Data::Dumper
199        DBI-1.42.tar.gz                   DBI
183        HTML-Tagset-3.03.tar.gz           HTML::Tagset
171        libwww-perl-5.79.tar.gz           LWP
167        DBD-mysql-2.9003.tar.gz           DBD::mysql
160        Test-Harness-2.42.tar.gz          Test::Harness

It’s reasonable to say that a lot of those look like dependencies, rather than modules that coders would use directly. Specifically, the first few are fetched by the CPAN downloader itself during the bootstrapping process. Net::Telnet seems to be used by a lot of monitoring and configuration tools, which is handy. HTML::Parser does what it claims to do, and does it well. It’s particularly useful for scraping, and using URI::URL alongside it helps to make sure absolute and relative URLs are handled properly.

If we presume that people who download DBI (Perl’s SQL database abstraction) did so for use with a driver, it looks like MySQL is the clear leader: DBD::mysql has 167 downloads, and no other DBD driver gets even close. Postgres and Oracle are equal second, both with 27. For comparison, the Debian Popularity Contest (an opt-in survey of installations) shows HTML::Parser as the most popular non-core module, and there MySQL’s driver has only three times as many installations as Postgres’.

It’s clear that many people and systems are depending on these modules. Should more of them ship with a default Perl? Should they all be polished until they have perfect test coverage? (This is part of the mission of the Phalanx Project, which has a similar focus.) Should CPAN endorse more statistics? (As a simple solution, the code I used lends itself to aggregation – it generates a YAML file with per-file counts. If mirrors published these at a well-known URL, it would be simple to combine them to get a much larger sample.)
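To show how little work the combining step would be, here is a rough Python sketch. The exact layout of the YAML file isn't given above, so this assumes the simplest possible shape, one `file name: count` pair per line, and reads it with a deliberately tiny parser rather than a real YAML library. The names `parse_counts` and `combine` are invented for the example.

```python
from collections import Counter


def parse_counts(text):
    """Parse a flat 'file: count' mapping.

    Assumption: one pair per line. This handles only that tiny subset of
    YAML; it is not a general YAML parser.
    """
    counts = Counter()
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments and the YAML document marker.
        if not line or line.startswith("#") or line == "---":
            continue
        # Split on the last colon so the file name is kept intact.
        name, _, value = line.rpartition(":")
        counts[name.strip()] += int(value)
    return counts


def combine(documents):
    """Sum the per-file counts published by several mirrors."""
    total = Counter()
    for doc in documents:
        total.update(parse_counts(doc))
    return total
```

Given the text of each mirror's file, `combine([...]).most_common(20)` would produce the same kind of top twenty as above, only over a larger sample. Fetching the files from each mirror is left out, since the well-known URL is still hypothetical.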

(Music: Richard Thompson, “The End of the Rainbow”)