Tuesday, February 22, 2011

Updating HP Printer Firmware from Linux

HP LaserJet printers provide a number of ways to update the system firmware, some of which are easier or more effective than others.  My favorite method so far: use curl to upload the new firmware via FTP.

First, download the firmware file from HP's support website.  Then push it to the printer:

curl -T ./hp_firmware_file.rfu ftp://myprinter.mydomain

This can be scripted to update printers in batches, too. :)
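A minimal sketch of such a batch run, assuming a plain-text file with one printer hostname per line.  The function name push_firmware, the DRY_RUN switch, and the list file are illustrative names of my own, not anything HP or curl provides; the curl invocation is the same one shown above.

```shell
# push_firmware FIRMWARE_FILE HOST_LIST
# Uploads one firmware file to every printer named in HOST_LIST.
# (push_firmware, DRY_RUN, and the list format are hypothetical.)
push_firmware() {
    firmware=$1
    list=$2
    while IFS= read -r host; do
        # Skip blank lines and "#" comments in the host list.
        case $host in ''|'#'*) continue ;; esac
        if [ "${DRY_RUN:-0}" = 1 ]; then
            # Only print what would be run.
            echo "curl -T $firmware ftp://$host"
        else
            # </dev/null keeps curl from consuming the host list on stdin.
            curl -T "$firmware" "ftp://$host" </dev/null \
                || echo "upload to $host failed" >&2
        fi
    done < "$list"
}
```

Running it once with DRY_RUN=1 (for example, DRY_RUN=1 push_firmware ./hp_firmware_file.rfu printers.txt) prints the commands without touching any printer, which is a cheap sanity check before flashing a whole fleet.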

Monday, February 14, 2011

PDF Printing and CUPS

CUPS servers can be used to handle print jobs from all the major operating systems.  When a new print job arrives, it is handled either as 'raw' or as one of a number of MIME types.

A raw job (often sent from Windows) has already been rendered by the client into the final, printer-ready binary stream.  CUPS doesn't touch raw jobs; it just forwards them on to the end device.

When a non-raw job comes in, CUPS knows it will have to do some processing before the file is ready to print.  Printer hardware is only capable of handling certain file types (often PostScript or PCL), so if the file to be printed is not in one of these formats, filters must be applied.  CUPS includes a number of filters to turn files into printer-ready PostScript.  Supported types include image files, text, PostScript, PDF, and others.  A good description of the process is available here: http://www.samba.org/samba/docs/man/Samba-HOWTO-Collection/CUPS-printing.html
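To watch the filter chain work on a single file without going through a print queue, CUPS ships a cupsfilter command that runs the filters by hand.  The wrapper below is only a sketch: to_postscript is a hypothetical helper name, CUPSFILTER is an illustrative override, and the /usr/sbin path is an assumption about where your distribution installs the binary.

```shell
# to_postscript INPUT OUTPUT
# Runs INPUT through the CUPS filter chain and writes PostScript to OUTPUT.
# (to_postscript and CUPSFILTER are hypothetical names; adjust the path
# if cupsfilter lives elsewhere on your system.)
to_postscript() {
    # -m selects the destination MIME type; cupsfilter writes to stdout.
    "${CUPSFILTER:-/usr/sbin/cupsfilter}" -m application/postscript "$1" > "$2"
}
```

For example, to_postscript report.pdf report.ps should exercise the same PDF-to-PostScript conversion the server would apply to a queued job, though the exact filters chosen depend on the local CUPS configuration.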

Most of the time this works fairly well.  Unfortunately, CUPS makes some naive assumptions about its filters which will backfire on occasion.  As of this posting, CUPS 1.4 may use the pdftops application from Poppler to handle PDF to PS conversion.  (This is how Debian and Ubuntu handle things at the moment.)  CUPS will hand the incoming PDF file off to the pdftops filter, expecting to get PostScript back in return.  The catch: some files may take several hours to finish processing!

CUPS runs its print queues in a FIFO style.  If you run a multi-user print server, and one user submits a PDF that spends 2-3 hours in pdftops, everyone else's jobs will back up behind it in the printer queue.  This makes users sad :(  and printer admins angry >:-|
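One way to spot a problem file before it blocks a queue is to run pdftops on it by hand under a time limit.  This is a rough sketch rather than anything CUPS does itself: check_pdf and the PDFTOPS override are hypothetical names, it relies on the timeout utility from GNU coreutils, and a standalone pdftops run may not behave exactly like the CUPS filter wrapper.

```shell
# check_pdf LIMIT_SECONDS FILE
# Returns 0 if pdftops finishes within the limit, 1 if it times out or fails.
# (check_pdf and PDFTOPS are hypothetical names used for illustration.)
check_pdf() {
    limit=$1
    pdf=$2
    # "-" sends the PostScript to stdout, which is discarded here.
    timeout "$limit" "${PDFTOPS:-pdftops}" "$pdf" - > /dev/null
    status=$?
    if [ "$status" -eq 124 ]; then
        # GNU timeout exits with 124 when the time limit was hit.
        echo "slow: $pdf exceeded ${limit}s in pdftops"
        return 1
    elif [ "$status" -ne 0 ]; then
        echo "error: pdftops exited with status $status on $pdf"
        return 1
    fi
    echo "ok: $pdf"
}
```

Something like check_pdf 60 suspect.pdf gives a quick yes/no answer on whether a file is likely to hold up the queue; the 60-second limit is an arbitrary choice.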

Ideally, Poppler should fix and improve pdftops to the point where it no longer takes more than a few seconds to process any file.  Unfortunately, this is hard work, complicated by the fact that a lot of people's print jobs are considered confidential, so the offending files often can't be handed over as test cases.  Being an angry admin, I need a more immediate solution.

It turns out that pdftops uses two backends: Splash and Cairo.  Cairo is a very well-known graphics library used in a number of high-profile applications.  Splash is much less well known, and as far as I can tell, there is very little, if any, documentation for it.  It doesn't even show up in a search of the Poppler Wiki.

After some time tracking down an offending PDF and profiling the bejesus out of it, I found that almost all of the processing delay was incurred by various components of Splash.  Regrettably, the time appears to have been spent in multiple, unrelated portions of the code, rather than in one easily addressed bug.  As soon as Poppler was compiled without support for Splash (--disable-splash-output), a PDF that previously took 3 hours to process was down to a couple of seconds.  Talk about an improvement!

One of the Poppler developers mentioned that Splash is used to rasterize PDF elements which are not natively supported by PostScript.  Turning this feature off may mangle the appearance of some printed files, so disable it at your own risk.  But if you find your print queues being frequently tied up by unruly pdftops processes, it's definitely worth a try.