X-Git-Url: http://gitweb.fperrin.net/?p=Dictionary.git;a=blobdiff_plain;f=dictionary-format-v6.txt;h=06941cccff6db756412e8850763ca865506e0323;hp=17cbe23f36e7e7edf13e08d571a46c0c019a92ce;hb=HEAD;hpb=acfb5ff7f1ab0cafad4bc6a00d854ef829738ae3

diff --git a/dictionary-format-v6.txt b/dictionary-format-v6.txt
index 17cbe23..06941cc 100644
--- a/dictionary-format-v6.txt
+++ b/dictionary-format-v6.txt
@@ -1,7 +1,10 @@
 This is a quick write-up of the old dictionary file format, v6.
+It is the format that is (unfortunately) still used by the Tolino
+ebook readers. The ConvertToV6 tool can be used to convert
+the new format to this old one.
 v6 is troublesome as it relies on Java serialization and thus
 there will be references to Java types.
-This hasn't been checked for correctness and likely has some bugs.
+This hasn't been checked much for correctness and likely has some bugs.
 Also, I really should have used some standard format for writing this...
 
 ===========================================
@@ -156,3 +159,46 @@ list_of([Int]) list of indices into list_of(html_entry) (since v6)
   3: index into list_of([text_entry])
   4: index into list_of([index_entry]) (mark as "extra info/translation" entry)
   5: index into list_of([html_entry])
+
+=======================================
+
+Set
+
+Java serialization of java.util.HashSet.
+The first part always consists of the same 40 bytes:
+  0xac, 0xed, // magic
+  0x00, 0x05, // version
+  0x73, // object
+  0x72, // class
+  // Java String "java.util.HashSet"
+  0x00, 0x11, 0x6a, 0x61, 0x76, 0x61, 0x2e, 0x75, 0x74, 0x69,
+  0x6c, 0x2e, 0x48, 0x61, 0x73, 0x68, 0x53, 0x65, 0x74,
+  // serialization ID
+  0xba, 0x44, 0x85, 0x95, 0x96, 0xb8, 0xb7, 0x34,
+  0x03, // flags: serialized, custom serialization function
+  0x00, 0x00, // fields count
+  0x78, // blockdata end
+  0x70, // null (superclass)
+  0x77, 0x0c // blockdata short, 0xc bytes
+
+[Int]: capacity. Not used for anything, but set to >=
+[Float]: load factor. May affect performance of old QuickDic versions, set to 0.75f
+[Int]:
+ times:
+  1 byte 0x74: String type
+  [String]: stop word
+1 byte 0x78: blockdata end
+
+Note: Some even older dictionaries wrote out a LinkedHashSet instead of a
+HashSet.
+That adds the following bytes describing LinkedHashSet before the 0x72 above:
+  0x72, // class
+  // Java String "java.util.LinkedHashSet"
+  0x00, 0x17, 0x6a, 0x61, 0x76, 0x61, 0x2e, 0x75, 0x74, 0x69,
+  0x6c, 0x2e, 0x4c, 0x69, 0x6e, 0x6b, 0x65, 0x64, 0x48, 0x61,
+  0x73, 0x68, 0x53, 0x65, 0x74,
+  // serialization ID
+  0xd8, 0x6c, 0xd7, 0x5a, 0x95, 0xdd, 0x2a, 0x1e,
+  0x02, // flags
+  0x00, 0x00, // fields count
+  0x78 // blockdata end
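
To make the Set layout above easier to check, here is a minimal Python sketch that writes and reads it. It is an illustration under stated assumptions, not the ConvertToV6 tool's actual code. The names HASHSET_HEADER, write_stop_words and read_stop_words are hypothetical. The sketch assumes the third field of the 12-byte block is the number of entries (which is how java.util.HashSet serializes itself), that [String] is a 2-byte big-endian length followed by the bytes of the string, and that stop words are ASCII only, so plain ASCII matches Java's modified UTF-8. The default capacity of 16 is an arbitrary placeholder. The LinkedHashSet variant from the note is not handled.

```python
import struct

# The fixed 40-byte prefix from the write-up (plain HashSet case only).
HASHSET_HEADER = bytes([
    0xac, 0xed,  # magic
    0x00, 0x05,  # version
    0x73,        # object
    0x72,        # class
    0x00, 0x11,  # length of the class name (17)
]) + b"java.util.HashSet" + bytes([
    0xba, 0x44, 0x85, 0x95, 0x96, 0xb8, 0xb7, 0x34,  # serialization ID
    0x03,        # flags: serialized, custom serialization function
    0x00, 0x00,  # fields count
    0x78,        # blockdata end
    0x70,        # null (superclass)
    0x77, 0x0c,  # blockdata short, 0xc bytes
])


def write_stop_words(words, capacity=16, load_factor=0.75):
    """Serialize ASCII stop words in the Set layout described above.

    capacity=16 is an arbitrary illustrative value (assumption).
    """
    out = bytearray(HASHSET_HEADER)
    # 12-byte block: capacity, load factor, entry count, all big-endian.
    out += struct.pack(">ifi", capacity, load_factor, len(words))
    for word in words:
        data = word.encode("ascii")  # assumption: ASCII-only stop words
        out.append(0x74)             # String type
        out += struct.pack(">H", len(data)) + data
    out.append(0x78)                 # blockdata end
    return bytes(out)


def read_stop_words(blob):
    """Parse bytes produced by write_stop_words back into a list."""
    if blob[:len(HASHSET_HEADER)] != HASHSET_HEADER:
        raise ValueError("not the plain 40-byte HashSet header")
    pos = len(HASHSET_HEADER)
    # Capacity and load factor are read but not needed to recover the words.
    _capacity, _load_factor, count = struct.unpack_from(">ifi", blob, pos)
    pos += 12
    words = []
    for _ in range(count):
        if blob[pos] != 0x74:
            raise ValueError("expected String type marker 0x74")
        (length,) = struct.unpack_from(">H", blob, pos + 1)
        pos += 3
        words.append(blob[pos:pos + length].decode("ascii"))
        pos += length
    if blob[pos] != 0x78:
        raise ValueError("expected blockdata end marker 0x78")
    return words
```

Calling read_stop_words(write_stop_words(["a", "the"])) should give back the same list. A reader for real v6 files would also need to accept the longer LinkedHashSet prefix and non-ASCII strings.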