X-Git-Url: http://gitweb.fperrin.net/?p=Dictionary.git;a=blobdiff_plain;f=dictionary-format-v6.txt;h=06941cccff6db756412e8850763ca865506e0323;hp=e84daec3c7d52dacf13201f0bcbe933f36e897fa;hb=HEAD;hpb=ba3bb0e41a0cceb851065814f8b761709bcc1412

diff --git a/dictionary-format-v6.txt b/dictionary-format-v6.txt
index e84daec..06941cc 100644
--- a/dictionary-format-v6.txt
+++ b/dictionary-format-v6.txt
@@ -1,7 +1,10 @@
 This is a quick write-up of the old dictionary file format, v6.
+It is the format that is (unfortunately) still used by the Tolino
+ebook readers. The ConvertToV6 tool can be used to convert
+the new format to this old one.
 v6 is troublesome as it relies on Java serialization and thus
 there will be references to Java types.
-This hasn't been checked for correctness and likely has some bugs.
+This hasn't been checked much for correctness and likely has some bugs.
 Also, I really should have used some standard format for writing this...
 
 ===========================================
@@ -185,3 +188,17 @@ First part consists always the same 40 bytes:
 1 byte 0x74: String type
 [String]: stop word
 1 byte 0x78: blockdata end
+
+Note: Some even older dictionaries wrote out a LinkedHashSet instead of a
+HashSet.
+That adds the following bytes describing LinkedHashSet before the 0x72 above:
+ 0x72, // class
+ // Java String "java.util.LinkedHashSet"
+ 0x00, 0x17, 0x6a, 0x61, 0x76, 0x61, 0x2e, 0x75, 0x74, 0x69,
+ 0x6c, 0x2e, 0x4c, 0x69, 0x6e, 0x6b, 0x65, 0x64, 0x48, 0x61,
+ 0x73, 0x68, 0x53, 0x65, 0x74,
+ // serialization ID
+ 0xd8, 0x6c, 0xd7, 0x5a, 0x95, 0xdd, 0x2a, 0x1e,
+ 0x02, // flags
+ 0x00, 0x00, // fields count
+ 0x78 // blockdata end
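
The LinkedHashSet bytes in the note can be rebuilt field by field, which may
help when checking the listing or when writing a reader that has to accept
both variants. The following is only a sketch in Python, not code from the
Dictionary project: the names class_descriptor, LINKED_HASH_SET and
skip_linked_hash_set are made up for illustration, and the layout (class
marker, 2-byte length plus name, 8-byte serialization ID, flags, fields
count, blockdata end) is taken directly from the byte listing in the note.

```python
import struct

TC_CLASSDESC = 0x72      # "class" marker from the listing
TC_ENDBLOCKDATA = 0x78   # "blockdata end" marker from the listing
SC_SERIALIZABLE = 0x02   # flags byte from the listing


def class_descriptor(name: str, serial_id: int, flags: int) -> bytes:
    """Build a class descriptor with zero fields, laid out as in the note."""
    # Class names here are plain ASCII, so UTF-8 and Java's modified UTF-8 agree.
    encoded = name.encode("utf-8")
    return (
        bytes([TC_CLASSDESC])
        + struct.pack(">H", len(encoded)) + encoded  # 2-byte length, then name
        + struct.pack(">Q", serial_id)               # 8-byte serialization ID
        + bytes([flags])
        + struct.pack(">H", 0)                       # fields count
        + bytes([TC_ENDBLOCKDATA])
    )


# The serialization ID is written unsigned here to match the listing;
# Java treats the same 8 bytes as a signed long.
LINKED_HASH_SET = class_descriptor(
    "java.util.LinkedHashSet", 0xD86CD75A95DD2A1E, SC_SERIALIZABLE
)


def skip_linked_hash_set(data: bytes, pos: int) -> int:
    """Hypothetical reader helper: step over the extra descriptor if present."""
    if data.startswith(LINKED_HASH_SET, pos):
        return pos + len(LINKED_HASH_SET)
    return pos


if __name__ == "__main__":
    # Print the bytes as hex for comparison with the listing in the note.
    print(LINKED_HASH_SET.hex())
```

A reader could call skip_linked_hash_set at the position where the class
descriptor is expected and then continue parsing the HashSet descriptor as
usual. Files without the extra bytes leave the position unchanged.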