Over my time working with Debian packages, I’ve always been concerned that I have been missing catchable mistakes by not running all the static checking tools I could run. As a result, I’ve been interested in writing some code that automates this process, a place where I can push a package and come back a few hours later to check on the results. This is great, since it provides a slightly less scary interface to new packagers, and helps them avoid feeling they’ve just been “told off” by a Developer.
I’ve spent the time to actually write this code, and I’ve called it debuild.me. The code itself is in its fourth iteration, and is built up from a few core components. The client / server code (lucy and ethel) is quite interconnected, but firehose works great on its own, and is a single, unified (and sane!) spec that is easy to hack with (or even on!). Hopefully, this means that our wrappers will be usable outside of debuild.me, which is a win for everyone.
Backend Design
The backend (lucy) was the first part I wanted to design. I made the decision (very early on) that everything was going to be 100% Python 3.3+. This lets me use some of the (frankly, sweet) tools in the stdlib. Since I’ve written this type of thing before (I’ve tried to write this tool many, many, many, many times), I had a rough sense of how I wanted to design the backend. Past iterations had suffered from an overly complex server half, so I decided to go ultra minimal with the design of debuild.me.
The backend watches a directory (using a simple inotify script) and processes .changes files as they come in. If the package is a source package, a set of jobs is triggered (such as lintian, build and desktop-file-validate), as well as a different set for binary packages (such as lintian, piuparts and adequate). Only people may upload source packages (without any debs), and only builders can upload binary packages (without source).
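As a rough illustration, the watcher doesn’t need to be much more than this (a minimal sketch assuming pyinotify, with a made-up incoming directory and a placeholder process_changes() hook standing in for lucy’s real job creation):

import pyinotify

INCOMING = "/srv/lucy/incoming"   # hypothetical incoming directory

def process_changes(path):
    # placeholder: lucy would parse the .changes and enqueue jobs here
    print("got", path)

class ChangesHandler(pyinotify.ProcessEvent):
    def process_default(self, event):
        # Only react to fully-written .changes files
        if event.pathname.endswith(".changes"):
            process_changes(event.pathname)

wm = pyinotify.WatchManager()
wm.add_watch(INCOMING, pyinotify.IN_CLOSE_WRITE | pyinotify.IN_MOVED_TO)
pyinotify.Notifier(wm, ChangesHandler()).loop()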
The client and server talk using XML-RPC with HTTP Basic auth. I’m going to (eventually) secure the transport layer with SSL, but for now this works as a proof of concept.
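Python 3’s stdlib makes that end of things nearly free; xmlrpc.client will send Basic auth if the credentials are embedded in the URL (hostname, port and credentials below are made up):

import xmlrpc.client

# user:password in the URL gets sent as HTTP Basic auth
lucy = xmlrpc.client.ServerProxy("http://ethel:secret@lucy.example.org:8080/")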
Since I like to keep my codebase simple and straightforward, I’ve used MongoDB as Lucy’s DB. This lets me map documents in Mongo to Python objects without any trouble. In addition, I evaluated some of the queue code out there (ZMQ, etc.), and it all seemed like overkill for my problem, and had a hard time keeping track of jobs that (must never!) get lost. As a result, I wrote my own (very simple) job queue in Mongo, which has no sense of scheduling (at all), but can do its job (and do it well).
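In practice that just means a job is a plain dict that goes straight into a collection; something along these lines (collection and field names here are my guesses rather than lucy’s actual schema, and this assumes a reasonably current pymongo):

import uuid
from pymongo import MongoClient

db = MongoClient().lucy           # hypothetical database name

job = {
    "_id": str(uuid.uuid1()),     # UUID-based _id, as described below
    "package": "hello_2.8-1",     # in reality, a link to the package document
    "type": "lintian",
    "arch": "all",
    "suite": "unstable",
    "builder": None,              # unassigned until a builder claims it
    "assigned_at": None,
}
db.jobs.insert_one(job)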
Jobs describe what’s to be built, with a link to the package document that the job relates to, and its arch and suite (don’t worry about the rest just yet). Jobs get assigned via a natural sort on their UUID-based _id, and go to the first builder that can process their arch / suite. Source packages are considered arch:all / suite:unstable (so they always get the most up-to-date linters on any arch that comes along).
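The assignment itself can then be a single atomic Mongo operation: take the first unassigned job (by _id) whose arch / suite the builder advertised, and stamp it with the builder’s name. Again, a sketch with made-up field names, not the real code:

import datetime
from pymongo import MongoClient, ReturnDocument

db = MongoClient().lucy

def claim_job(builder, arches, suites, types):
    # Atomically grab the first matching unassigned job, sorted by _id
    return db.jobs.find_one_and_update(
        {"builder": None, "arch": {"$in": arches},
         "suite": {"$in": suites}, "type": {"$in": types}},
        {"$set": {"builder": builder,
                  "assigned_at": datetime.datetime.utcnow()}},
        sort=[("_id", 1)],
        return_document=ReturnDocument.AFTER,
    )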
Lucy also allows uploads to be given an X-Lucy-Group tag to manage which set of packages they’re a part of. This comes in handy for doing partial archive rebuilds, or eventually for managing which jobs should be run on which uploads. This will allow me to run much more time-consuming tools on packages I want to review, versus rebuilding just to ensure packages don’t FTBFS or fail adequate checks.
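Since .changes files are just deb822 control data, pulling a tag like that back out is one-liner territory with python-debian (the file name is made up, and this is only a sketch of the idea):

from debian import deb822

with open("hello_2.8-1_source.changes") as fh:
    changes = deb822.Changes(fh)

group = changes.get("X-Lucy-Group", "default")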
Client Design
The buildd client (ethel) talks with lucy via XML-RPC to get assigned new jobs, release old jobs, close finished jobs, and upload package report data. When the etheld requests a new job, it also passes along which suites it knows of, which arches it can build, as well as which types of job it can run (stuff like lintian, build or cppcheck). Lucy then assigns the builder to that job (so that we don’t allocate the same job twice), and records what time it was assigned at.
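The negotiation itself is hardly more than a single call; something in this spirit (the method name and argument shapes here are illustrative, not lucy’s actual RPC surface):

import xmlrpc.client

lucy = xmlrpc.client.ServerProxy("http://ethel:secret@lucy.example.org:8080/")

# Advertise what this builder can do and ask for work
job = lucy.get_job(
    ["unstable", "testing"],          # suites this node knows of
    ["amd64", "all"],                 # arches it can build
    ["lintian", "build", "cppcheck"], # job types it can run
)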
Ethel then takes the result of the job (in the form of a firehose.model tree) and transmits it over the wire back to the Lucy server as a report (which also contains information on whether the build failed or not), at which point lucy hands back a location (on the lucy host) that the daemon can write the log to.
If the job was a binary build, the etheld process will dput the package to the server with a special X-Lucy-Job tag to signal which job that build relates to, so that future lint runs can fetch the deb files that the build produced.
Tooling
Ethel runs a set of static checkers over the source code. These are basically fancy wrappers around the tools we all know and love (like lintian, desktop-file-validate, or piuparts), except that they output Firehose in place of home-grown stdout. This allows us to programmatically deal with the output of these tools in a normal and consistent way.
Some of the more complex runners are made of three parts - a runner, wrapper and command. The server invokes the command routine, which invokes the runner (the command just provides a unified interface to all the runners), whose output gets parsed by the wrapper to turn it into a Firehose model tree.
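Structurally, a checker then looks something like the sketch below - the real wrappers build a proper firehose.model tree rather than plain dicts, and the invocation and parsing details are simplified here, but the split is the same:

import subprocess

def lintian_runner(dsc_path):
    # runner: shells out to the tool and captures its raw output
    proc = subprocess.Popen(["lintian", dsc_path],
                            stdout=subprocess.PIPE, universal_newlines=True)
    out, _ = proc.communicate()
    return out

def lintian_wrapper(output):
    # wrapper: parses the tool output; the real one emits a firehose.model tree
    issues = []
    for line in output.splitlines():
        severity, _, rest = line.partition(": ")
        issues.append({"severity": severity, "message": rest})
    return issues

def lintian_command(dsc_path):
    # command: the unified entry point that gets invoked for every tool
    return lintian_wrapper(lintian_runner(dsc_path))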
The goal here is that tons of very quick-running tools get run over a distributed network, and machine-readable reports get filed in a central location to aid in reviewing packages.
Ricky
In addition to the actual code to run builds, I’ve worked on a few tools to aid with using debuild.me for my DD-related life. I have some uncommon use-cases that are nice to support. One such use-case is the ability to rebuild packages from the archive (unmodified) to check that they rebuild OK against the target. This is handy for things like arch:all packages that get uploaded (since they never get rebuilt on the buildd machines, and FTBFSes are sadly common) or packages that have had a Build-Dependency change on them.
Ricky is able to create a .dsc URL to your friendly local mirror, and fetch that exact version of the package. Ricky can then also use the .dsc (in a monumental hack) to forge a package_version_source.changes file, sign it with an autobuild key and upload it to the debuild.me instance. Since it can also modify the .changes’s target distribution, you can also use this to test if a package will build on stable or testing, unmodified.
Fred
Fred is a wrapper around Ricky to help with fetching packages that may not exist yet. Fred also contains an email scraper that reads lists such as debian-devel-changes, adds an entry to fetch each upload when it becomes available on the local mirror, passes it to ricky, and allows debuild.me to rebuild new packages that match a set of criteria.
I’m currently playing around with the idea of rebuilding all incoming Python packages to ensure they don’t FTBFS in a clean chroot.
Loofah
Loofah is another wrapper around Ricky, but for manual use. Loofah is able to sync down the apt Sources list and place it in Mongo for fast queries. This then allows me to manually run rebuilds on any source package that fits a set of criteria (written in the form of a Mongo query), which gets pulled and uploaded by Ricky.
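The sync half is, again, mostly python-debian plus pymongo. A rough sketch, assuming an uncompressed Sources file on disk, and assuming the relation fields get split into lists so that the equality-style queries below can match a single build-dependency:

from debian import deb822
from pymongo import MongoClient

db = MongoClient().loofah     # hypothetical database name

def split_relations(value):
    # "debhelper (>= 9), python3-all-dev" -> ["debhelper", "python3-all-dev"]
    return [entry.split()[0] for entry in value.split(",") if entry.strip()]

with open("Sources") as fh:
    for src in deb822.Sources.iter_paragraphs(fh):
        doc = dict(src)
        for field in ("Build-Depends", "Build-Depends-Indep"):
            if field in doc:
                doc[field] = split_relations(doc[field])
        db.sources.replace_one({"Package": doc["Package"]}, doc, upsert=True)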
An example script to rebuild any packages that Build-Depend on python3-all-dev in Debian unstable / main would look like:
[ { "version": "unstable", "suite": "main" }, { "Build-Depends": "python3-all-dev" } ]
Or, a script to rebuild any package that depends on CDBS:
[ {}, {"$or": [{"Build-Depends": "cdbs"}, {"Build-Depends-Indep": "cdbs"}]} ]
You can use anything that exists in the Sources.gz file to query off of (including Maintainer!)
Future Work
The future work on debuild.me will be centered around making it easier for buildd nodes to be added to the network, with more and more automation in that process (likely in the form of debs). I also want to add better control over the jobs, so that packages I upload only go to my personal servers.
I’d also very much like to get better EC2 / Virtualization support integrated into the network, so that the buildd count grows with the queue size. This is a slightly hard problem that I’m keen to fix.
I’m also considering moving the log parsing code out of the workers, so that the parsing code can be fixed without upgrading all the workers. This would also drop the Firehose dep on the client code, which would be nice.
Migration from a debuild.me build into a local reprepro repo is something that would be fairly easy to do as well, likely done remotely via the XML-RPC interface, which would call a couple of reprepro commands (such as includedsc and includedeb) and publish the result to the user’s repo. This is a nice use of the debs that get built, and could also allow debuild.me to be used like a PPA system, while letting the user avoid migrating packages that have piuparts issues.
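Assuming the usual reprepro setup on the user’s end, that side of things would boil down to little more than this (basedir and codename are made up):

import subprocess

REPO = "/srv/reprepro"   # hypothetical reprepro basedir

def publish(codename, dsc, debs):
    # Pull the source and its binaries into the user's local repo
    subprocess.check_call(["reprepro", "-b", REPO, "includedsc", codename, dsc])
    for deb in debs:
        subprocess.check_call(["reprepro", "-b", REPO, "includedeb", codename, deb])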