Mon, 21 May 2007

Playing with the package file

For Meikes thesis, we needed a csv-File from the Debian-Package file, so I wrote a small script to do that. You can get it from my repository (or via websvn). (Not yet perfect, but worked well enough for us; patches welcome.)

If you just need the csv to copy the data in a database (like she did), don't try to guess, how long a textfield (e.g. for depends or the long description) should be; we tried and failed several times (Sure... 2000 characters should be enough for a long description.). We finally used the following database (We used PostgreSQL, should work similar on other database systems): CREATE TABLE packages ( package character varying (75) NOT NULL, source character varying (75), priority character varying(10) NOT NULL, section character varying(20) NOT NULL, installedsize integer NOT NULL, maintainer character varying(150) NOT NULL, architecture character varying(4) NOT NULL, version character varying(40) NOT NULL, depends character varying(5000), conflicts character varying(5000), recommends character varying(5000), suggests character varying(5000), enhances character varying(500), predepends character varying(500), provides character varying(500), replaces character varying(500), buildessential character varying(3), essential character varying(3), filename character varying(200) NOT NULL, md5sum character(33) NOT NULL, origin character varying(100), sha1 character(41) NOT NULL, sha256 character(65) NOT NULL, size integer NOT NULL, tag character varying(1000), task character varying(400), description text NOT NULL, longdescription character varying(30000), CONSTRAINT pk PRIMARY KEY (package) );.

Now that you've created a table, you need to fill it. Easiest method is to use the COPY statement, as follows: copy packages (package, source, priority, section, installedsize, maintainer, version, depends, conflicts, recommends, suggests, enhances, predepends, replaces, buildessential, essential, filename, md5sum, origin, sha1, sha256, size, tag, task, description, longdescription) FROM '/path/to/your/csv-file' DELIMETER ';' CSV;.

You should now have a nice database to play with :)

Some examples... Which are the packages with the longest longdescription?

debian_packages=> select package, length(longdescription) as length from packages order by length desc limit 5;
        package         | length
------------------------+--------
 texlive-latex-extra    |  25337
 texlive-fonts-extra    |   5719
 emacs-goodies-el       |   4502
 xbase-clients          |   4403
 postgresql-contrib-7.4 |   4223
(5 rows)

Or... Which maintainer have the most packages?

debian_packages=> select maintainer, count(*) as anzahl  from packages group by maintainer order by anzahl desc limit 5;
                            maintainer                            | anzahl
------------------------------------------------------------------+--------
 Debian Qt/KDE Maintainers <debian-qt-kde@lists.debian.org>       |    465
 Debian QA Group <packages@qa.debian.org>                         |    458
 Debian X Strike Force <debian-x@lists.debian.org>                |    274
 Debian Perl Group <pkg-perl-maintainers@lists.alioth.debian.org> |    261
 Daniel Baumann <daniel@debian.org>                               |    255
(5 rows)

Or... Is there a package, whose sha256-sum contains Meikes birthday?

debian_packages=> select package, sha256 from packages where sha256 like '%261081%';
           package            |                              sha256
------------------------------+-------------------------------------------------------------------
 libobject-realize-later-perl | 5188126108146fc0b1328473125687fb388bd8d87f460e335952dbb1b66aa3d1
(1 row)

And another goodie (We needed to alter the database several times, to make this package fit into it): Which is the package with the longest name?

debian_packages=> select package, length(package) from packages order by length(package) desc limit 1;
                         package                         | length
---------------------------------------------------------+--------
 libmaypole-plugin-authentication-usersessioncookie-perl |     55
(1 row)

A lot of fun... and a lot of interesting facts to be discovered...

postet at 13:09 into [Debian] permanent link


About

Alexander Tolimar Reichle-Schmehl lives in Tuttlingen / Germany. He works as IT manager (specialized on Unix and SAN/Storage) for an international automotive supplier.

Links