From: Dom Lachowicz (doml@appligent.com)
Date: Mon Feb 25 2002 - 20:01:27 GMT
Well, almost - it's about 95% done. I just need to rewrite one more
method, which right now behaves just as if my heuristic code hadn't
been committed.
I'll clean up my code shortly to use a Confidence datatype instead of a
UT_uint8.
Basically, every importer returns a normalized confidence in the range
[0,255], with 0 meaning "I'm not at all confident", 127 meaning "I'm
so-so", and 255 meaning "I can totally handle this file type". This
applies to both the recognizeContents and recognizeSuffix methods.
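As a rough sketch only, the scale could be captured in a small typedef
with named constants. The constant names here are made up for
illustration, and UT_uint8 is re-declared as a stand-in so the snippet
compiles on its own; none of this is the committed code.

```cpp
#include <cstdint>

// Stand-in for AbiWord's UT_uint8 so this sketch is self-contained
// (assumption: an unsigned 8-bit integer).
typedef std::uint8_t UT_uint8;

// The "Confidence datatype" mentioned above; the name is assumed.
typedef UT_uint8 Confidence;

// Illustrative names for the three points on the [0,255] scale.
const Confidence CONFIDENCE_NONE    = 0;    // "I'm not at all confident"
const Confidence CONFIDENCE_SOSO    = 127;  // "I'm so-so"
const Confidence CONFIDENCE_PERFECT = 255;  // "I can totally handle this file type"
```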
What I'm going to do is heavily weight the recognizeContents method
(maybe 85-15) and apply the following heuristic:
my_match = heuristic(contentsConfidence, suffixConfidence);
if ( my_match > best_match ) {
    best_match = my_match;
    best_filetype = my_match_filetype;
}
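To make the idea concrete, here is a minimal sketch of the selection
step, assuming the heuristic is a plain 85/15 weighted average (the
exact weighting is still a "maybe"). The Candidate struct, pickBest,
and the weight constants are hypothetical names for illustration, not
the actual importer code.

```cpp
#include <cstdint>

typedef std::uint8_t Confidence;  // assumed name; values lie in [0,255]

// Assumed 85/15 split between content sniffing and suffix matching.
constexpr unsigned kContentsWeight = 85;
constexpr unsigned kSuffixWeight   = 15;

// Weighted average of the two confidences; the result stays in [0,255].
constexpr Confidence heuristic(Confidence contents, Confidence suffix)
{
    return static_cast<Confidence>(
        (contents * kContentsWeight + suffix * kSuffixWeight) / 100);
}

// One importer's verdict on a file (hypothetical struct).
struct Candidate {
    int        filetype;  // stand-in for a file type identifier
    Confidence contents;  // what recognizeContents returned
    Confidence suffix;    // what recognizeSuffix returned
};

// Returns the file type with the highest combined score,
// or -1 if no candidate scores above 0.
constexpr int pickBest(const Candidate* candidates, int count)
{
    int best_filetype = -1;
    Confidence best_match = 0;
    for (int i = 0; i < count; ++i) {
        const Confidence my_match =
            heuristic(candidates[i].contents, candidates[i].suffix);
        if (my_match > best_match) {
            best_match = my_match;
            best_filetype = candidates[i].filetype;
        }
    }
    return best_filetype;
}
```

With these weights, an importer that is fully confident about the
contents but not the suffix scores 216, while one that only matches
the suffix scores 38, so the content match wins.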
This will fix a few bugs in Bugzilla.
Dom
This archive was generated by hypermail 2.1.4 : Mon Feb 25 2002 - 15:08:12 GMT