Re: detecting file type by magic number


Subject: Re: detecting file type by magic number
From: Kevin Vajk (kvajk@ricochet.net)
Date: Thu Jan 20 2000 - 02:55:40 CST


On Thu, 20 Jan 2000, Paul Rohr wrote:

> How much of the file do each of these functions need? The thought was that
> the importer factory could open the file once and pass a copy of the first
> 1K or so bytes to a function in each importer (the body of which is just the
> guts of your functions).

Detecting AbiWord, RTF, or HTML doesn't need very much of the file.

Detecting an MS Word document can need up to about 2200 bytes.
There seem to be several different tests, for different versions
of Word or something. Weird, isn't it?
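
A rough sketch of the "open once, hand every sniffer the same buffer" idea
discussed above. Everything here is illustrative: the function names, the
4096-byte buffer size (picked only to sit above the roughly 2200 bytes Word
can need), and the specific markers tested (`{\rtf`, `<abiword`, `<html`) are
assumptions, not the tests from the actual patch.

```python
# Hypothetical buffer-based sniffers; none of these names come from AbiWord.
SNIFF_BYTES = 4096  # assumption: comfortably above the ~2200 bytes Word may need


def sniff_rtf(head: bytes) -> bool:
    # RTF files begin with the literal characters "{\rtf".
    return head.startswith(b"{\\rtf")


def sniff_abiword(head: bytes) -> bool:
    # Assumption: the root <abiword ...> element appears within the buffer.
    return b"<abiword" in head


def sniff_html(head: bytes) -> bool:
    # Very loose check; real HTML in the wild may need more care.
    lower = head.lower()
    return b"<html" in lower or b"<!doctype html" in lower


# Order matters only if two sniffers could both match the same buffer.
SNIFFERS = [("rtf", sniff_rtf), ("abiword", sniff_abiword), ("html", sniff_html)]


def detect(head: bytes):
    """Return the name of the first sniffer that matches, or None."""
    for name, sniffer in SNIFFERS:
        if sniffer(head):
            return name
    return None


def read_head(path: str) -> bytes:
    """Open the file once and return at most SNIFF_BYTES from the start."""
    with open(path, "rb") as f:
        return f.read(SNIFF_BYTES)
```

With this shape the sniffers never touch the file system themselves; only
`read_head` does, so it is the one piece that would need platform-specific
handling.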

Detecting a gzip'd file needs only two bytes (the magic number
0x1f 0x8b). That isn't very reliable on its own, but what can you do? :(
A match also doesn't necessarily mean it's a .zabw file; it could
be any other gzip-compressed data, which is why I use zlib to read
a few lines and run the abiword test on them. I don't know
how to pass a character buffer to zlib, though, so maybe I'm
thinking about this all wrong.
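
One possible way to run the check from an in-memory buffer instead of a file,
sketched in Python because its `zlib` module wraps the same library. The
helper names and the `<abiword` marker are assumptions for illustration. The
key point is that zlib's inflate interface reads from a caller-supplied
buffer, and a window-bits value of `16 + MAX_WBITS` tells it to expect a gzip
header; in C the analogous route would be `inflateInit2()` with that value and
`next_in`/`avail_in` pointing at the buffer, given a zlib recent enough to
support gzip decoding there.

```python
import gzip  # used only to build sample data when trying the helpers out
import zlib

GZIP_MAGIC = b"\x1f\x8b"


def inflate_head(buf: bytes, want: int = 256):
    """Decompress up to `want` bytes from a gzip buffer held in memory.

    Returns None if the buffer does not start with the gzip magic number
    or zlib rejects the data. (Hypothetical helper, not AbiWord code.)
    """
    if buf[:2] != GZIP_MAGIC:
        return None
    # 16 + MAX_WBITS selects gzip framing rather than raw zlib framing.
    d = zlib.decompressobj(16 + zlib.MAX_WBITS)
    try:
        # The second argument caps the output; a truncated stream is not
        # treated as an error, only corrupted data is.
        return d.decompress(buf, want)
    except zlib.error:
        return None


def looks_like_zabw(buf: bytes) -> bool:
    """True if buf is gzip data whose first bytes look like AbiWord XML."""
    text = inflate_head(buf)
    # Assumption: the <abiword ...> element shows up early in the document.
    return text is not None and b"<abiword" in text


if __name__ == "__main__":
    sample = gzip.compress(b'<?xml version="1.0"?>\n<abiword version="0.7">\n')
    print(looks_like_zabw(sample))
```

Because only the first `want` decompressed bytes are requested, the importer
factory can pass the same leading chunk of the file it already gave the other
sniffers, rather than reopening the file through zlib.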

> 2. (major) This way, the sniffers are isolated from any platform-specific
> file-handling logic. (Think Mac.)

Yikes.

> Agreed. Anyone interested in patching the abiword exporter to see how well
> this works?

I'll give it a try. You might look for somebody else too, though,
since:
  1. I am a C programmer who has never written in C++ before, *ever*.
  2. I don't know my way around the abiword source, at all.

Any potential problems I should be looking out for?

- Kevin Vajk
  <kvajk@ricochet.net>


