Re: detecting file type by magic number


Subject: Re: detecting file type by magic number
From: Paul Rohr (paul@abisource.com)
Date: Thu Jan 20 2000 - 02:40:46 CST


At 11:34 PM 1/19/00 -0800, Kevin Vajk wrote:
>Based mostly on the Linux "file" command (/usr/share/magic), I've
>written a few C functions which, given a filename, return 1 if
>they think the file is of the type they recognize, else 0.
>
>The functions I've written are:
> int file_is_abiword_1(char *filename);
> int file_is_gzipped_abiword_1(char *filename); /* needs zlib */
> int file_is_rtf(char *filename);
> int file_is_html(char *filename);
> int file_is_microsoft_word(char *filename);
>
>Any interest? Is this primitive approach of value?

How much of the file does each of these functions need? The thought was that
the importer factory could open the file once and pass a copy of the first
1K or so of bytes to a sniffer function in each importer (the body of which
would be just the guts of your functions).

The advantages of this are:

1. (minor) We open the file once, rather than once per importer.

2. (major) This way, the sniffers are isolated from any platform-specific
file-handling logic. (Think Mac.)
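
A minimal sketch of that arrangement, for illustration only. All names here
(sniff_fn, sniff_rtf, sniff_gzip, sniff_buffer, sniff_file, SNIFF_BYTES) are
hypothetical and not taken from the AbiWord tree. The two checks rely on the
well-known "{\rtf" and 0x1f 0x8b signatures. The gzip check only says the data
is gzip-compressed; deciding that it is a gzipped AbiWord file would still
need zlib, as Kevin notes.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define SNIFF_BYTES 1024  /* assumption: "the first 1K or so" of the file */

/* Hypothetical sniffer signature: looks at a buffer, never at a file. */
typedef int (*sniff_fn)(const unsigned char *buf, size_t len);

/* Returns 1 if the buffer starts with the RTF signature "{\rtf", else 0. */
int sniff_rtf(const unsigned char *buf, size_t len)
{
    static const char magic[] = "{\\rtf";
    size_t n = sizeof magic - 1;
    return len >= n && memcmp(buf, magic, n) == 0;
}

/* Returns 1 if the buffer starts with the gzip magic bytes, else 0. */
int sniff_gzip(const unsigned char *buf, size_t len)
{
    return len >= 2 && buf[0] == 0x1f && buf[1] == 0x8b;
}

struct sniffer {
    const char *name;
    sniff_fn    sniff;
};

/* One entry per importer; each importer contributes only its sniffer. */
static const struct sniffer sniffers[] = {
    { "rtf",  sniff_rtf  },
    { "gzip", sniff_gzip },
};

/* Platform-independent part: returns the first matching name, or NULL. */
const char *sniff_buffer(const unsigned char *buf, size_t len)
{
    size_t i;
    for (i = 0; i < sizeof sniffers / sizeof sniffers[0]; i++) {
        if (sniffers[i].sniff(buf, len))
            return sniffers[i].name;
    }
    return NULL;
}

/* The only place that touches the file: open once, read once, close. */
const char *sniff_file(const char *filename)
{
    unsigned char buf[SNIFF_BYTES];
    size_t len;
    FILE *fp = fopen(filename, "rb");

    if (fp == NULL)
        return NULL;
    len = fread(buf, 1, sizeof buf, fp);
    fclose(fp);
    return sniff_buffer(buf, len);
}
```

Only sniff_file() would need a per-platform variant; the sniffers themselves
can be exercised on plain byte buffers without any file at all.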

>By the way, moving the "<abiword..." line to the start of
>the .abw files would make it a lot easier to add an entry to
>/usr/share/magic so that "file foo.abw" could be made to
>work as expected on Linux/UNIX. I'd highly recommend doing
>this, as I think "file" is used by various file managers
>to determine what application to launch to open the file
>with.

Agreed. Anyone interested in patching the abiword exporter to see how well
this works?

Paul


