gitweb.fperrin.net Git - DictionaryPC.git/log
DictionaryPC.git
Reimar Döffinger [Wed, 15 Apr 2020 16:04:12 +0000 (18:04 +0200)]
Minor automated code simplifications.

Reimar Döffinger [Wed, 15 Apr 2020 15:50:19 +0000 (17:50 +0200)]
Avoid creating the same Matchers over and over.

Reimar Döffinger [Mon, 13 Apr 2020 23:41:22 +0000 (01:41 +0200)]
Move v6 writing code here from Android code repo.

Reimar Döffinger [Mon, 13 Apr 2020 14:35:44 +0000 (16:35 +0200)]
Add code to convert a Dictionary to the old v6 format.

Reimar Döffinger [Mon, 13 Apr 2020 13:47:28 +0000 (15:47 +0200)]
Remove more left-overs from xerces dependency.

Reimar Döffinger [Mon, 13 Apr 2020 13:42:17 +0000 (15:42 +0200)]
Get rid of xerces dependency.

Relying on the standard XML implementation is a lot slower,
but the WiktionarySplitter run still only takes a few minutes.

Reimar Döffinger [Mon, 13 Apr 2020 13:27:41 +0000 (15:27 +0200)]
Update runner scripts for compilation into bin/

Reimar Döffinger [Mon, 13 Apr 2020 13:03:32 +0000 (15:03 +0200)]
Consistent EOL format.

Reimar Döffinger [Mon, 13 Apr 2020 12:49:33 +0000 (14:49 +0200)]
Enable all compiler warnings (-Xlint:all).

Reimar Döffinger [Mon, 13 Apr 2020 12:40:14 +0000 (14:40 +0200)]
Remove long obsolete googlecode_upload.py

Reimar Döffinger [Mon, 13 Apr 2020 12:35:54 +0000 (14:35 +0200)]
Put compiled .class files into bin/ directory.

Cleaner than having them all over.
Also update gitignore file.

Reimar Döffinger [Mon, 13 Apr 2020 12:31:10 +0000 (14:31 +0200)]
Delete included jars, they are not the right versions anyway.

Reimar Döffinger [Mon, 13 Apr 2020 12:30:42 +0000 (14:30 +0200)]
Update run command to commons-text instead of commons-lang3

Reimar Döffinger [Mon, 13 Apr 2020 12:10:39 +0000 (14:10 +0200)]
Import cleanup/changes for Eclipse compatibility.

Unfortunately Eclipse insists on fiddling with imports
without knowing what it does, thus breaking compilation
and in one case causing a bug that will result in
subtly broken dictionaries.

Reimar Döffinger [Mon, 13 Apr 2020 12:03:31 +0000 (14:03 +0200)]
Minor code cleanup.

Reimar Döffinger [Sat, 11 Apr 2020 23:00:10 +0000 (01:00 +0200)]
Fix compile warnings.

Reimar Döffinger [Sat, 11 Apr 2020 22:50:09 +0000 (00:50 +0200)]
Remove unused functions that cause warnings.

Reimar Döffinger [Sat, 11 Apr 2020 22:44:55 +0000 (00:44 +0200)]
Explicitly specify encoding for compile command.

For better usability.

Reimar Döffinger [Sat, 11 Apr 2020 22:09:40 +0000 (00:09 +0200)]
Switch to new dictionary path.

Reimar Döffinger [Sat, 11 Apr 2020 19:28:32 +0000 (21:28 +0200)]
Also handle "paragraph end" newline character.

Reimar Döffinger [Sat, 11 Apr 2020 15:33:02 +0000 (17:33 +0200)]
Replace <sup></sup> sections with only digits by UTF-8.
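A minimal sketch of what such a replacement could look like, assuming that "UTF-8" here means the Unicode superscript digit characters. The class name `SuperscriptDigits` and its method are hypothetical and not taken from the repository.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only.
final class SuperscriptDigits {
    // Matches <sup>...</sup> sections that contain only ASCII digits.
    private static final Pattern SUP_DIGITS = Pattern.compile("<sup>([0-9]+)</sup>");
    // Unicode superscript forms of 0..9, indexed by digit value.
    private static final String SUPERSCRIPTS =
            "\u2070\u00b9\u00b2\u00b3\u2074\u2075\u2076\u2077\u2078\u2079";

    static String replace(String html) {
        Matcher m = SUP_DIGITS.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            StringBuilder sup = new StringBuilder();
            for (char c : m.group(1).toCharArray()) {
                sup.append(SUPERSCRIPTS.charAt(c - '0'));
            }
            // quoteReplacement keeps the replacement text literal.
            m.appendReplacement(out, Matcher.quoteReplacement(sup.toString()));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

Sections with non-digit content, such as `<sup>a</sup>`, are left untouched by this sketch.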

Reimar Döffinger [Sat, 11 Apr 2020 14:23:11 +0000 (16:23 +0200)]
Exclude some more special titles not relevant for us.

Reimar Döffinger [Sat, 4 Apr 2020 20:37:07 +0000 (22:37 +0200)]
Switch to https download URL.

Reimar Döffinger [Sun, 16 Jun 2019 08:34:40 +0000 (10:34 +0200)]
Explicitly request 4GB RAM to run WiktionarySplitter.

Reimar Döffinger [Wed, 9 Jan 2019 22:47:02 +0000 (23:47 +0100)]
Refine fix for Spanish wiktionary.

Reimar Döffinger [Wed, 9 Jan 2019 20:44:56 +0000 (21:44 +0100)]
Do not hard-code path to java binary.

Reimar Döffinger [Wed, 9 Jan 2019 20:43:52 +0000 (21:43 +0100)]
Improve wiktionary splitter for Spanish and Portuguese

Reimar Döffinger [Tue, 4 Dec 2018 20:16:07 +0000 (21:16 +0100)]
Add french-greek dictionary support.

Reimar Döffinger [Wed, 8 Aug 2018 22:29:16 +0000 (00:29 +0200)]
Add support for generating Romani dictionary.

Reimar Döffinger [Sun, 20 May 2018 12:41:48 +0000 (14:41 +0200)]
Move several files out of Util.

Reimar Döffinger [Sun, 20 May 2018 12:41:18 +0000 (14:41 +0200)]
Missing part of AR-ES support.

Reimar Döffinger [Mon, 26 Feb 2018 20:25:23 +0000 (21:25 +0100)]
Add AR-ES dictionary generation.

Reimar Döffinger [Mon, 26 Feb 2018 20:18:14 +0000 (21:18 +0100)]
Add German-Thai dictionary to generation list.

Reimar Döffinger [Sun, 15 Oct 2017 14:47:12 +0000 (16:47 +0200)]
Revert accidental changes to generate_dictionaries.sh.

Reimar Döffinger [Sun, 15 Oct 2017 14:25:32 +0000 (16:25 +0200)]
Reduce progress prints and optimize title check.

Reimar Döffinger [Sun, 15 Oct 2017 14:03:59 +0000 (16:03 +0200)]
Minor optimizations for endPage function.

Reimar Döffinger [Sun, 15 Oct 2017 13:36:19 +0000 (15:36 +0200)]
Move code out of loop that had no reason to be in it.

Reimar Döffinger [Sun, 15 Oct 2017 13:21:12 +0000 (15:21 +0200)]
Compress WiktionarySplitter output files.

Saves around 60% of disk space with no significant
difference in speed on a multi-core system.

Reimar Döffinger [Sun, 15 Oct 2017 10:08:25 +0000 (12:08 +0200)]
Support compressed input for parsers.

Reimar Döffinger [Sun, 15 Oct 2017 08:38:13 +0000 (10:38 +0200)]
Add a write buffer to wiktionary splitter outputs.

Around 20% faster processing, and will be useful when
adding compression support as well.
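The idea of a write buffer can be sketched as follows. This is an illustration under assumptions, not the splitter's actual code: the class names, the 64 KiB buffer size and the page contents are made up for the example.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of DictionaryPC.
final class BufferedSplitterOutput {
    // Counts how often the underlying sink is actually written to.
    static final class CountingStream extends ByteArrayOutputStream {
        int writeCalls = 0;

        @Override
        public synchronized void write(byte[] b, int off, int len) {
            writeCalls++;
            super.write(b, off, len);
        }
    }

    // Writes many small records through a buffer (size chosen arbitrarily).
    static void writePages(OutputStream sink, int pageCount) {
        try (OutputStream out = new BufferedOutputStream(sink, 64 * 1024)) {
            for (int i = 0; i < pageCount; i++) {
                out.write(("page " + i + "\n").getBytes(StandardCharsets.UTF_8));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With a buffer larger than the total output, the many small writes reach the sink as a single call. A compressing stream could later be placed between the buffer and the sink without changing the callers.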

Reimar Döffinger [Sun, 15 Oct 2017 08:25:05 +0000 (10:25 +0200)]
Cache compiled patterns.
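One common way to cache compiled patterns is a map keyed by the regex string. The sketch below assumes that approach; `PatternCache` is a hypothetical name and the commit may well use plain static fields instead.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.regex.Pattern;

// Hypothetical cache, for illustration only.
final class PatternCache {
    private static final Map<String, Pattern> CACHE = new ConcurrentHashMap<>();

    // Compiles each distinct regex once and returns the shared instance afterwards.
    static Pattern get(String regex) {
        return CACHE.computeIfAbsent(regex, Pattern::compile);
    }
}
```

`Pattern` instances are immutable and safe to share between threads. `Matcher` instances are not, so each caller still creates or resets its own matcher.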

Reimar Döffinger [Sat, 14 Oct 2017 17:55:06 +0000 (19:55 +0200)]
Add read-ahead buffer to decompress in parallel.

Allows using more than one CPU core for a good speedup.
Benchmarks:
Uncompressed files:    196.29 CPU, 5:18.34 wall clock time
xz-compressed, before: 299.19 CPU, 5:21.85 wall clock time
xz-compressed, after:  308.96 CPU, 3:29.60 wall clock time

(first was I/O limited, second was CPU-limited, now it is
almost only limited by CPU-time for XML parsing)
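A read-ahead buffer of this kind can be sketched with a background thread that fills a bounded queue while the consumer parses. Everything below is an assumption made for illustration: the class name, chunk size and queue depth are invented, and gzip from the JDK stands in for xz, which is not part of the standard library.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical sketch: reads (and thereby decompresses) the source on a second thread.
final class ReadAheadInputStream extends InputStream {
    private static final byte[] EOF = new byte[0]; // end marker, compared by identity
    private final BlockingQueue<byte[]> queue = new ArrayBlockingQueue<>(8);
    private volatile IOException failure;
    private byte[] current = new byte[0];
    private int pos = 0;
    private boolean done = false;

    ReadAheadInputStream(InputStream source) {
        Thread worker = new Thread(() -> {
            try {
                byte[] buf = new byte[64 * 1024];
                int n;
                while ((n = source.read(buf)) != -1) {
                    if (n > 0) queue.put(Arrays.copyOf(buf, n)); // blocks when the queue is full
                }
            } catch (IOException e) {
                failure = e;
            } catch (InterruptedException e) {
                failure = new IOException("read-ahead interrupted", e);
            } finally {
                try {
                    queue.put(EOF);
                } catch (InterruptedException ignored) {
                    Thread.currentThread().interrupt();
                }
            }
        }, "read-ahead");
        worker.setDaemon(true); // sketch only: no explicit shutdown handling
        worker.start();
    }

    @Override
    public int read() throws IOException {
        byte[] one = new byte[1];
        return read(one, 0, 1) == -1 ? -1 : (one[0] & 0xff);
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        if (len == 0) return 0;
        while (!done && pos == current.length) {
            try {
                current = queue.take();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IOException("interrupted while waiting for data", e);
            }
            pos = 0;
            if (current == EOF) done = true;
        }
        if (done) {
            if (failure != null) throw failure;
            return -1;
        }
        int n = Math.min(len, current.length - pos);
        System.arraycopy(current, pos, b, off, n);
        pos += n;
        return n;
    }

    // Demo helper: gzip-compress data, then decompress it through the read-ahead thread.
    static byte[] roundTrip(byte[] data) {
        try {
            ByteArrayOutputStream packed = new ByteArrayOutputStream();
            try (GZIPOutputStream gz = new GZIPOutputStream(packed)) {
                gz.write(data);
            }
            InputStream in = new ReadAheadInputStream(
                    new GZIPInputStream(new ByteArrayInputStream(packed.toByteArray())));
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) out.write(buf, 0, n);
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because the decompressing stream is read only on the worker thread, decompression and XML parsing can occupy different cores, which matches the drop in wall clock time at roughly constant CPU time reported above.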

Reimar Döffinger [Sat, 7 Oct 2017 19:48:29 +0000 (21:48 +0200)]
WiktionarySplitter: Support compressed inputs.

Unfortunately bzip2 decompression is very slow (slower
than the XML parsing in fact), so it might make sense to
re-compress the downloaded files from bzip2 to xz.
If the decompression could be done in a separate thread,
xz compression would even provide a speedup if the files
are on a slower (non-SSD) disk.

Reimar Döffinger [Sat, 2 Sep 2017 17:57:45 +0000 (19:57 +0200)]
Add logic for generating DE-RO dictionary.

Reimar Döffinger [Sat, 2 Sep 2017 17:54:19 +0000 (19:54 +0200)]
Update to use Dictionary Util subproject.

Reimar Döffinger [Sun, 20 Aug 2017 12:37:49 +0000 (14:37 +0200)]
Switch to FileChannel and using Util from Dictionary subproject.

Reimar Döffinger [Tue, 15 Aug 2017 20:42:23 +0000 (22:42 +0200)]
Add support for generating Low German dictionary.

Reimar Döffinger [Sun, 13 Aug 2017 11:38:46 +0000 (13:38 +0200)]
Prevent inserting duplicate Pairs.

A rather brute-force approach and not
generic, but it's at least an improvement.

Reimar Döffinger [Sat, 5 Aug 2017 18:28:44 +0000 (20:28 +0200)]
Add CollatorWrapper class to prepare for using ICU.

Android should still use java.text, since that is based
on ICU anyway.

Reimar Döffinger [Sat, 5 Aug 2017 08:08:12 +0000 (10:08 +0200)]
Add AR-TR dictionary generation.

Reimar Döffinger [Thu, 13 Apr 2017 20:51:45 +0000 (22:51 +0200)]
Clearer error message if newline could not be found.

Reimar Döffinger [Thu, 13 Apr 2017 20:18:42 +0000 (22:18 +0200)]
Fix wikisplit of Pennsylvania German

Reimar Döffinger [Thu, 13 Apr 2017 19:14:21 +0000 (21:14 +0200)]
Avoids false parse errors due to ]] vs ] ].

Reimar Döffinger [Thu, 13 Apr 2017 18:37:37 +0000 (20:37 +0200)]
Another fix to really skip comments.

Reimar Döffinger [Thu, 13 Apr 2017 18:25:41 +0000 (20:25 +0200)]
Make logging configurable, default to severe only.

Reimar Döffinger [Thu, 13 Apr 2017 18:25:08 +0000 (20:25 +0200)]
Fix skipping of comments.

Reimar Döffinger [Thu, 13 Apr 2017 16:45:37 +0000 (18:45 +0200)]
Add Old Church Slavonic and Pennsylvania German.

Reimar Döffinger [Thu, 23 Mar 2017 22:28:11 +0000 (23:28 +0100)]
Add Sicilian

Reimar Döffinger [Sun, 19 Mar 2017 20:52:37 +0000 (21:52 +0100)]
Update to work with latest Dictionary repo version.

Reimar Döffinger [Sat, 11 Feb 2017 18:11:53 +0000 (19:11 +0100)]
Fix PT spelling for category links.

Reimar Döffinger [Sat, 11 Feb 2017 18:09:40 +0000 (19:09 +0100)]
Add pt stoplist.

Reimar Döffinger [Sat, 11 Feb 2017 18:05:03 +0000 (19:05 +0100)]
Generate es and pt dictionaries, too.

Reimar Döffinger [Sat, 11 Feb 2017 16:48:18 +0000 (17:48 +0100)]
Fix crash in dictionary generation for PT input.

Reimar Döffinger [Sat, 11 Feb 2017 16:32:56 +0000 (17:32 +0100)]
Support pt and es wiktionary in splitter.

The ES format seems to have changed so we can
now actually use it.

Reimar Döffinger [Sat, 11 Feb 2017 16:16:53 +0000 (17:16 +0100)]
Add pt download, make curl follow "moved permanently" redirects.

Reimar Döffinger [Tue, 13 Dec 2016 21:33:04 +0000 (22:33 +0100)]
Add FR-PT dictionary.

Reimar Döffinger [Tue, 8 Nov 2016 22:28:19 +0000 (23:28 +0100)]
Apply astyle code formatting.

Reimar Döffinger [Thu, 13 Oct 2016 21:42:29 +0000 (23:42 +0200)]
Add support for generating IT-RU dictionary.

Reimar Döffinger [Wed, 5 Oct 2016 22:38:59 +0000 (00:38 +0200)]
Add ES-CA dictionary to generation list.

Reimar Döffinger [Wed, 5 Oct 2016 22:27:21 +0000 (00:27 +0200)]
Fix it-noun parser.

Removes mnull entries in EN-IT dictionary.

Reimar Döffinger [Wed, 5 Oct 2016 20:47:17 +0000 (22:47 +0200)]
Fix typo, not sure what it fixes though.

Reimar Döffinger [Wed, 5 Oct 2016 19:59:42 +0000 (21:59 +0200)]
Add commented out possible future improvements

Reimar Döffinger [Wed, 5 Oct 2016 19:56:18 +0000 (21:56 +0200)]
Partial progress to fix frwiktionary parsing.

Reimar Döffinger [Sun, 5 Jun 2016 12:26:53 +0000 (14:26 +0200)]
Add DE-cmn dictionary to generation list.

Reimar Döffinger [Sun, 5 Jun 2016 12:25:27 +0000 (14:25 +0200)]
Fix special cases for sub-languages like Mandarin.

Reimar Döffinger [Fri, 15 Apr 2016 19:19:56 +0000 (21:19 +0200)]
Support generation FR-AR dictionary.

Reimar Döffinger [Sat, 19 Mar 2016 21:20:07 +0000 (22:20 +0100)]
Add missing | in pattern.

Reimar Döffinger [Sun, 6 Mar 2016 14:02:56 +0000 (15:02 +0100)]
Some bugfixes for generation script.

Reset variables and fix second stop list.

Reimar Döffinger [Sun, 6 Mar 2016 13:34:01 +0000 (14:34 +0100)]
Add two more languages to generation list.

Reimar Döffinger [Mon, 11 Jan 2016 19:42:39 +0000 (20:42 +0100)]
Add support to generate pure translation-to-translation dictionaries.

Reimar Döffinger [Mon, 11 Jan 2016 19:26:23 +0000 (20:26 +0100)]
Ignore any local .class files.

Reimar Döffinger [Wed, 6 Jan 2016 16:15:30 +0000 (17:15 +0100)]
More generic run scripts.

Reimar Döffinger [Wed, 6 Jan 2016 16:10:12 +0000 (17:10 +0100)]
More robust compilation script.

Reimar Döffinger [Wed, 6 Jan 2016 16:09:59 +0000 (17:09 +0100)]
Fix compilation.

Reimar Döffinger [Thu, 17 Dec 2015 22:07:02 +0000 (23:07 +0100)]
Fix filtering out translation from French HTML.

Reimar Döffinger [Wed, 16 Dec 2015 22:37:40 +0000 (23:37 +0100)]
Re-add some English dictionaries after fixing them.

Reimar Döffinger [Wed, 16 Dec 2015 22:35:44 +0000 (23:35 +0100)]
Fix for splitting Mandarin/Cantonese/...

Reimar Döffinger [Wed, 16 Dec 2015 21:51:03 +0000 (22:51 +0100)]
Fix splitting of Greek/Ancient Greek.

Reimar Döffinger [Mon, 14 Dec 2015 22:49:56 +0000 (23:49 +0100)]
Improve tokenizer speed.

Reimar Döffinger [Sun, 13 Dec 2015 14:10:31 +0000 (15:10 +0100)]
Use default Java Collator.

Reimar Döffinger [Sun, 13 Dec 2015 13:45:13 +0000 (14:45 +0100)]
Switch to newer icu4j to fix hang bugs with EN-ZH.

Reimar Döffinger [Sun, 13 Dec 2015 12:22:42 +0000 (13:22 +0100)]
Fix parsing of examples with multiline foreign part.

Reimar Döffinger [Sun, 13 Dec 2015 01:42:54 +0000 (02:42 +0100)]
Minor code cleanup.

Reimar Döffinger [Sun, 13 Dec 2015 00:07:40 +0000 (01:07 +0100)]
Avoid replaceAll.

It uses regexp and is horribly slow, so use replace
where it works just as well.
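The difference between the two methods can be shown with a small sketch. The class name and the wiki-markup example are hypothetical. Relative speed depends on the JDK version, so it is worth measuring rather than assuming.

```java
// Hypothetical example class, not taken from the repository.
final class LiteralReplace {
    // String.replace treats both arguments as literal text.
    static String stripBold(String wikiText) {
        return wikiText.replace("'''", "");
    }

    // String.replaceAll interprets its first argument as a regular expression.
    static String stripBoldRegex(String wikiText) {
        return wikiText.replaceAll("'''", "");
    }

    // Removing dots: the literal version does what the name says.
    static String stripDots(String s) {
        return s.replace(".", "");
    }

    // The regex version treats "." as "any character" and removes everything.
    static String stripDotsRegex(String s) {
        return s.replaceAll(".", "");
    }
}
```

For a target without regex metacharacters both variants return the same string, which is the "where it works just as well" case. The dot example shows why the two are not interchangeable in general.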

Reimar Döffinger [Sun, 13 Dec 2015 00:06:44 +0000 (01:06 +0100)]
Free some memory as early as possible.

Reimar Döffinger [Sat, 12 Dec 2015 20:02:05 +0000 (21:02 +0100)]
Fix German name for latin.

Should fix almost all words missing in the DE-LA dictionary.

Reimar Döffinger [Sat, 12 Dec 2015 15:11:36 +0000 (16:11 +0100)]
Fix compilation against latest newformat branch.

Reimar Döffinger [Wed, 9 Dec 2015 17:20:49 +0000 (18:20 +0100)]
Encode URLs as ASCII, avoid UTF-8.

This is necessary for links to work on
Android 2.x.
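One way to produce ASCII-only URLs in Java is `java.net.URI.toASCIIString()`, which percent-encodes non-ASCII characters as UTF-8 bytes. The sketch below assumes that approach. The class name, the host and the `/wiki/` path layout are illustrative, and the commit itself may encode links differently.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper, for illustration only.
final class AsciiUrls {
    // Builds a link whose non-ASCII characters are percent-encoded.
    static String wikiLink(String host, String title) {
        try {
            return new URI("https", host, "/wiki/" + title, null).toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }
}
```

For example, `AsciiUrls.wikiLink("en.wiktionary.org", "caf\u00e9")` yields `https://en.wiktionary.org/wiki/caf%C3%A9`.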

Reimar Döffinger [Tue, 8 Dec 2015 18:56:51 +0000 (19:56 +0100)]
Improvements to wikisplit code.

Reimar Döffinger [Tue, 8 Dec 2015 05:17:48 +0000 (06:17 +0100)]
Switch script to generate version 7 zips.