kafsemo.org: 2004-07-20: What’s the most popular Perl module?

CPAN is the central repository for Perl modules. As I write this, it’s claiming 6,689 separate modules – that’s a lot, and it’s sometimes hard to know where to start. CPAN uses an extensive network of mirrors, and one consequence is that download statistics aren’t available. Of course, the numbers would be flawed – they ignore caching for one thing, but more importantly ignore “resellers” of Perl, such as repackaging in Linux distributions. So flawed, yes, but interesting, also yes.

Anyway: code written to count per-file downloads, code run, on one week’s traffic from cpan.etla.org (thanks to Michael).

Downloads	File	Module
338	Compress-Zlib-1.33.tar.gz	Compress::Zlib
334	Archive-Tar-1.10.tar.gz	Archive::Tar
307	TermReadKey-2.21.tar.gz	Term::ReadKey
298	IO-Zlib-1.01.tar.gz	IO::Zlib
284	Net-Telnet-3.03.tar.gz	Net::Telnet
279	CPAN-1.76.tar.gz	CPAN
278	Bundle-libnet-1.00.tar.gz	Bundle::libnet
273	HTML-Parser-3.36.tar.gz	HTML::Parser
270	Digest-1.08.tar.gz	Digest
269	Digest-MD5-2.33.tar.gz	Digest::MD5
268	Term-ReadLine-Perl-1.0203.tar.gz	Term::ReadLine::Perl
248	URI-1.31.tar.gz	URI
242	libnet-1.18.tar.gz	(Many in Net::)
242	File-Spec-0.87.tar.gz	File::Spec
213	Data-Dumper-2.121.tar.gz	Data::Dumper
199	DBI-1.42.tar.gz	DBI
183	HTML-Tagset-3.03.tar.gz	HTML::Tagset
171	libwww-perl-5.79.tar.gz	LWP
167	DBD-mysql-2.9003.tar.gz	DBD::mysql
160	Test-Harness-2.42.tar.gz	Test::Harness
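
The counting itself isn't shown in this post, so what follows is only a rough sketch of how a table like the one above could be produced, not the code that was actually run. It assumes Apache-style access log lines, counts successful GETs for `.tar.gz` files, and guesses a module name from each file name. The names `count_downloads` and `guess_module`, and the assumed log format, are all made up for illustration.

```python
import re
from collections import Counter

# Assumption: Apache common/combined log format. The mirror's real log
# format is not stated in the post.
REQUEST = re.compile(r'"GET (\S+) HTTP/[\d.]+" (\d{3}) ')


def count_downloads(log_lines):
    """Count successful (HTTP 200) downloads of .tar.gz files, keyed by file name."""
    counts = Counter()
    for line in log_lines:
        match = REQUEST.search(line)
        if not match:
            continue  # not a GET request line we recognise
        path, status = match.groups()
        if status == "200" and path.endswith(".tar.gz"):
            # Keep only the file name, dropping any directory part.
            counts[path.rsplit("/", 1)[-1]] += 1
    return counts


def guess_module(filename):
    """Heuristic only: strip the version suffix and turn '-' into '::'."""
    dist = re.sub(r"-v?\d[\d._]*\.tar\.gz$", "", filename)
    return dist.replace("-", "::")


def report(counts, limit=20):
    """Return (downloads, file, guessed module) rows, most downloaded first."""
    return [(n, name, guess_module(name)) for name, n in counts.most_common(limit)]
```

The file-name heuristic would get most rows right but not all of them: TermReadKey, libnet and libwww-perl don't follow the Dist-Name pattern, so the Module column for those has to be filled in by hand.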

It’s reasonable to say that a lot of those look like dependencies, rather than modules that coders would use directly. Specifically, the first few are fetched by the CPAN downloader itself during the bootstrapping process. Net::Telnet seems to be used by a lot of monitoring and configuration tools – handy. HTML::Parser does what it claims, and does it well. It’s particularly useful for scraping, and using URI::URL helps to make sure absolute and relative URLs are handled properly. If we presume that people who download DBI (Perl’s SQL database abstraction) did so for use with a driver, it looks like MySQL is the clear leader – no other DBD driver gets even close. Postgres and Oracle are equal second, both with 27. For comparison, the Debian Popularity Contest (an opt-in survey of installations) shows HTML::Parser as the most popular non-core module (MySQL’s driver only has three times as many installations as Postgres’ here).

It’s clear that many people and systems are depending on these modules. Should more of them ship with a default Perl? Should they all be polished until they have perfect test coverage? (This is part of the mission of the Phalanx Project, which has a similar focus.) Should CPAN endorse more statistics? (As a simple solution, the code I used lends itself to aggregation – it generates a YAML file with per-file counts. If mirrors published these at a well-known URL, it would be simple to combine them to get a much larger sample.)
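
To make the aggregation idea concrete, here is a minimal sketch of combining several mirrors' reports. The actual layout of the YAML file isn't described above, so this assumes the simplest possible one: a flat mapping with one `filename: count` pair per line. It reads that subset by hand rather than using a YAML library, and `parse_flat_counts` and `combine` are hypothetical names.

```python
from collections import Counter


def parse_flat_counts(text):
    """Parse a flat 'filename: count' mapping (an assumed subset of YAML)."""
    counts = Counter()
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments and the YAML document marker.
        if not line or line.startswith("#") or line == "---":
            continue
        # Split on the last colon so the count is always the final field.
        name, _, value = line.rpartition(":")
        counts[name.strip()] += int(value)  # raises ValueError on malformed lines
    return counts


def combine(reports):
    """Sum per-file counts across the text of several mirrors' reports."""
    total = Counter()
    for text in reports:
        total.update(parse_flat_counts(text))  # Counter.update adds counts
    return total
```

Fetching each mirror's report from the well-known URL is left out; `combine` only needs the text of each report, and `combine(reports).most_common(20)` would then give a top-twenty list over the larger sample.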

(Music: Richard Thompson, “The End of the Rainbow”)