Complete.Org: Mailing Lists: Archives: gopher: December 2007:
[gopher] Improved binary file detection in Bucktooth 0.2.2

To: gopher@xxxxxxxxxxxx
Subject: [gopher] Improved binary file detection in Bucktooth 0.2.2
From: brian@xxxxxxxxxxxxx
Date: Fri, 28 Dec 2007 01:23:39 -0600
Reply-to: gopher@xxxxxxxxxxxx

I'm using buckd to serve up binary files, and noticed that several
binary files (mostly older PDFs with a lot of text in the file header)
were being identified as item type "0" (text file) rather than "9"
(binary file). It turns out that buckd uses the Perl -B operator to
detect binary files.  To do this, Perl examines the first block or so
of the file for certain characteristics (nul bytes, high-order bits
set, etc.), and if more than 30% of those bytes look odd, it
identifies the file as binary.
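
As a rough illustration of the kind of check described above, here is
a minimal sketch in Perl. It is not Perl's actual -B implementation:
the function name looks_binary, the exact set of bytes treated as
"odd", and the handling of empty input are all assumptions made for
the example.

```perl
use strict;
use warnings;

# Approximation of a "-B"-style heuristic (assumed details, see above).
# $data holds the leading bytes of a file, e.g. the first 512.
sub looks_binary {
    my ($data) = @_;
    return 0 if length($data) == 0;   # assumption: empty input is text
    return 1 if $data =~ /\0/;        # any nul byte counts as binary

    # Count "odd" bytes: anything that is not tab, newline, carriage
    # return, or printable ASCII (this includes high-bit bytes).
    my $odd = () = $data =~ /[^\t\n\r\x20-\x7e]/g;

    # Binary only if more than 30% of the sampled bytes are odd.
    return ($odd / length($data) > 0.30) ? 1 : 0;
}
```

A buffer that opens with a long run of plain text and only a few
high-bit bytes stays under the threshold, which is consistent with the
older PDFs being reported as text.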

This wasn't accurate enough for my purposes, so I modified buckd.in to
call the UNIX "file" command and grep its output for the string "text"
(guaranteed to appear if a file is identified as a text file).  Any
file whose description lacks that string is served as type "9".
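
The approach above could be sketched as a standalone Perl helper along
these lines. The names description_is_text and file_says_text are
hypothetical and are not part of buckd. The sketch assumes a file(1)
that supports the -b (brief) option, and it treats a failure to run
file as "not text".

```perl
use strict;
use warnings;

# Decide text vs. binary from a description such as file(1) prints.
sub description_is_text {
    my ($desc) = @_;
    return ($desc =~ /text/) ? 1 : 0;
}

# Run file(1) on $path and report whether it describes the file as text.
# The list form of open avoids passing the path through a shell.
sub file_says_text {
    my ($path) = @_;
    open(my $fh, '-|', 'file', '-b', $path) or return 0;
    my $desc = do { local $/; <$fh> };   # slurp the whole description
    close($fh);
    return description_is_text(defined $desc ? $desc : '');
}
```

Without -b, the output of file begins with the file name, so a name
that itself contains "text" would also match. The list-form open keeps
names with spaces or shell metacharacters from being interpreted by a
shell.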

I just want to emphasize that this is *not* a problem with Bucktooth,
but rather a limitation of Perl's -B heuristic.

Here's the patchfile with the change.  I opted to modify buckd.in and
simply regenerate buckd.

--- buckd.in    2007-12-28 01:21:30.000000000 -0600
+++ buckd.in.new        2007-12-28 01:20:58.000000000 -0600
@@ -289,7 +289,7 @@
                ($xentr =~ /\.jpe?g$/i) ? "I" :
                ($xentr =~ /\.html?$/i) ? "h" :
                ($xentr =~ /\.hqx$/i) ? "4" :
-               (-B $xentr) ? "9" :
+               (grep(!/text/, `file $xentr`)) ? "9" :
        "0";
        $xentr =~ s/^$DIR//;
        return ($itype, ($pentr eq $xentr) ? '' : $xentr);

  --Brian


