DictionaryPC.git
8 weeks agoAttempt at using maven for compilation master
Frédéric Perrin [Sat, 13 Feb 2021 09:15:08 +0000 (09:15 +0000)]
Attempt at using maven for compilation

2 months agoEscape only unicode chars
Frédéric Perrin [Sun, 7 Feb 2021 00:35:19 +0000 (00:35 +0000)]
Escape only unicode chars

2 months agoMove library search to a common file
Frédéric Perrin [Sun, 7 Feb 2021 00:33:50 +0000 (00:33 +0000)]
Move library search to a common file

2 months agoEnable single-language dictionary
Frédéric Perrin [Sat, 6 Feb 2021 10:57:12 +0000 (10:57 +0000)]
Enable single-language dictionary

2 months agoFix pathnames
Frédéric Perrin [Sat, 30 Jan 2021 18:38:13 +0000 (18:38 +0000)]
Fix pathnames

2 months agoRevert "Fix compile warnings."
Frédéric Perrin [Sat, 30 Jan 2021 14:12:10 +0000 (14:12 +0000)]
Revert "Fix compile warnings."

This reverts commit 2182783b7ac6a22c23b37db4ba458ff12a6978dc.

3 months agoAlign with optimized Dictionary repo code.
Reimar Döffinger [Sun, 27 Dec 2020 20:16:51 +0000 (21:16 +0100)]
Align with optimized Dictionary repo code.

3 months agoUse optimized StringUtil.split() function.
Reimar Döffinger [Sun, 27 Dec 2020 18:31:13 +0000 (19:31 +0100)]
Use optimized StringUtil.split() function.

Also optimize the function extracting entry parts
in brackets or parentheses.

3 months agoSome minor optimizations.
Reimar Döffinger [Sun, 27 Dec 2020 17:15:51 +0000 (18:15 +0100)]
Some minor optimizations.

3 months agoMinor simplification of DictFileParser.
Reimar Döffinger [Sun, 27 Dec 2020 14:21:46 +0000 (15:21 +0100)]
Minor simplification of DictFileParser.

3 months agoMerge pull request #5 from christophlingg/patch-1
Reimar Döffinger [Sun, 20 Dec 2020 09:57:56 +0000 (10:57 +0100)]
Merge pull request #5 from christophlingg/patch-1

include dict.cc entries with subject labels

3 months agoMerge pull request #3 from zorun/swedish
Reimar Döffinger [Sun, 20 Dec 2020 09:55:20 +0000 (10:55 +0100)]
Merge pull request #3 from zorun/swedish

Add french-swedish dictionary support.

3 months agoinclude dict.cc entries with subject labels
Christoph Lingg [Sun, 20 Dec 2020 01:11:30 +0000 (02:11 +0100)]
include dict.cc entries with subject labels

I used this script to turn a DE-ES dict.cc file into a QuickDic dictionary compatible with my Tolino. Of the original 45k entries, more than 20k were dropped because they had a subject label:

> WARNING: Malformed line: Atomphysik {f} física {f} atómica noun [phys.]

This change allows lines to have 4 fields/columns: `language1`, `language2`, `word class`, `subject labels`.

see also https://github.com/natowi/quickdic-dictionary.dictionarypc/issues/1
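
As an illustration of the 4-column format described above, here is a minimal sketch of a line splitter. It assumes the columns are tab-separated and that lines with 2 to 4 fields are acceptable; both are assumptions, and `DictccLineSketch` and `parseFields` are hypothetical names, not the project's parser.

```java
import java.util.Arrays;
import java.util.List;

final class DictccLineSketch {
    /**
     * Splits one line into its columns. Returns null for a line with
     * fewer than 2 or more than 4 fields (assumed to be malformed).
     */
    static List<String> parseFields(String line) {
        // Limit -1 keeps trailing empty columns instead of dropping them.
        String[] fields = line.split("\t", -1);
        if (fields.length < 2 || fields.length > 4) {
            return null;
        }
        return Arrays.asList(fields);
    }

    public static void main(String[] args) {
        // language1, language2, word class, subject labels
        String line = "Atomphysik {f}\tf\u00edsica {f} at\u00f3mica\tnoun\t[phys.]";
        System.out.println(parseFields(line));
    }
}
```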

10 months agoAdd native-image.cmd to build a Windows native binary
Reimar Döffinger [Sun, 24 May 2020 19:45:03 +0000 (21:45 +0200)]
Add native-image.cmd to build a Windows native binary

10 months agoExplicitly specify source and target for javac command.
Reimar Döffinger [Sun, 24 May 2020 17:43:02 +0000 (19:43 +0200)]
Explicitly specify source and target for javac command.

Necessary to limit to Java 11 for compatibility with native-image.

11 months agoWiktionarySplitter: implement parallel processing
Reimar Döffinger [Sat, 25 Apr 2020 13:52:16 +0000 (15:52 +0200)]
WiktionarySplitter: implement parallel processing

Also reduce memory footprint to enable that.

11 months agoAdd scripts to compile a native image.
Reimar Döffinger [Sat, 25 Apr 2020 09:50:19 +0000 (11:50 +0200)]
Add scripts to compile a native image.

11 months agoChange scripts to be able to use a native-image binary instead.
Reimar Döffinger [Sat, 25 Apr 2020 09:47:23 +0000 (11:47 +0200)]
Change scripts to be able to use a native-image binary instead.

11 months agoAlso support CheckDictionariesMain in Runner.
Reimar Döffinger [Sat, 25 Apr 2020 09:34:08 +0000 (11:34 +0200)]
Also support CheckDictionariesMain in Runner.

11 months agoAdd Runner class to allow to run all of the programs.
Reimar Döffinger [Tue, 21 Apr 2020 20:39:52 +0000 (22:39 +0200)]
Add Runner class to allow to run all of the programs.

This makes it possible to ship a single runnable jar, or even
a native image, that provides all functionality.

11 months agoWrite v6 stoplist without using serialization features.
Reimar Döffinger [Tue, 21 Apr 2020 18:56:03 +0000 (20:56 +0200)]
Write v6 stoplist without using serialization features.

11 months agoAdd parallel HashMap for faster lookups.
Reimar Döffinger [Thu, 16 Apr 2020 23:19:51 +0000 (01:19 +0200)]
Add parallel HashMap for faster lookups.

11 months agoFix compilation.
Reimar Döffinger [Thu, 16 Apr 2020 22:46:32 +0000 (00:46 +0200)]
Fix compilation.

Caused by lack of testing of the testing script :)

11 months agoAvoid unnecessary use of String.format.
Reimar Döffinger [Thu, 16 Apr 2020 21:55:45 +0000 (23:55 +0200)]
Avoid unnecessary use of String.format.

11 months agoOptimize escapedFindEnd.
Reimar Döffinger [Thu, 16 Apr 2020 21:11:58 +0000 (23:11 +0200)]
Optimize escapedFindEnd.

11 months agoOptimize finding start of next token.
Reimar Döffinger [Thu, 16 Apr 2020 20:10:03 +0000 (22:10 +0200)]
Optimize finding start of next token.

11 months agoSimplify newline handling and regexes.
Reimar Döffinger [Thu, 16 Apr 2020 19:37:50 +0000 (21:37 +0200)]
Simplify newline handling and regexes.

11 months agoOptimize plaintext dispatch path.
Reimar Döffinger [Thu, 16 Apr 2020 19:21:29 +0000 (21:21 +0200)]
Optimize plaintext dispatch path.

About 11% faster.

11 months agoFix run command, commons-lang3 is still needed (by commons-text).
Reimar Döffinger [Wed, 15 Apr 2020 20:29:31 +0000 (22:29 +0200)]
Fix run command, commons-lang3 is still needed (by commons-text).

11 months agoOptimize comparisons for TreeMap.
Reimar Döffinger [Wed, 15 Apr 2020 20:17:55 +0000 (22:17 +0200)]
Optimize comparisons for TreeMap.

11 months agoMinor automated code simplifications.
Reimar Döffinger [Wed, 15 Apr 2020 16:04:12 +0000 (18:04 +0200)]
Minor automated code simplifications.

11 months agoAvoid creating the same Matchers over and over.
Reimar Döffinger [Wed, 15 Apr 2020 15:50:19 +0000 (17:50 +0200)]
Avoid creating the same Matchers over and over.
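
One common way to avoid re-creating Matchers is to keep a single instance and re-target it with `Matcher.reset(CharSequence)`. The sketch below shows that general technique; it is not taken from the commit, and the class name and the digit pattern are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class MatcherReuseSketch {
    private static final Pattern DIGITS = Pattern.compile("[0-9]+");

    // A single Matcher that is pointed at each new input in turn.
    // Matchers are not thread-safe, so this instance must stay on one thread.
    private final Matcher digits = DIGITS.matcher("");

    /** Counts the runs of digits in the given text. */
    int countNumbers(String text) {
        digits.reset(text); // reuse instead of DIGITS.matcher(text)
        int count = 0;
        while (digits.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        MatcherReuseSketch sketch = new MatcherReuseSketch();
        System.out.println(sketch.countNumbers("a1 b22 c333"));
    }
}
```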

11 months agoMove v6 writing code here from Android code repo.
Reimar Döffinger [Mon, 13 Apr 2020 23:41:22 +0000 (01:41 +0200)]
Move v6 writing code here from Android code repo.

11 months agoAdd code to convert a Dictionary to the old v6 format.
Reimar Döffinger [Mon, 13 Apr 2020 14:35:44 +0000 (16:35 +0200)]
Add code to convert a Dictionary to the old v6 format.

11 months agoRemove more left-overs from xerces dependency.
Reimar Döffinger [Mon, 13 Apr 2020 13:47:28 +0000 (15:47 +0200)]
Remove more left-overs from xerces dependency.

11 months agoGet rid of xerces dependency.
Reimar Döffinger [Mon, 13 Apr 2020 13:42:17 +0000 (15:42 +0200)]
Get rid of xerces dependency.

Relying on the standard XML implementation is a lot slower,
but the WiktionarySplitter run still only takes a few minutes.

11 months agoUpdate runner scripts for compilation into bin/
Reimar Döffinger [Mon, 13 Apr 2020 13:27:41 +0000 (15:27 +0200)]
Update runner scripts for compilation into bin/

11 months agoConsistent EOL format.
Reimar Döffinger [Mon, 13 Apr 2020 13:03:32 +0000 (15:03 +0200)]
Consistent EOL format.

11 months agoEnable all compiler warnings (-Xlint:all).
Reimar Döffinger [Mon, 13 Apr 2020 12:49:33 +0000 (14:49 +0200)]
Enable all compiler warnings (-Xlint:all).

11 months agoRemove long obsolete googlecode_upload.py
Reimar Döffinger [Mon, 13 Apr 2020 12:40:14 +0000 (14:40 +0200)]
Remove long obsolete googlecode_upload.py

11 months agoPut compiled .class files into bin/ directory.
Reimar Döffinger [Mon, 13 Apr 2020 12:35:54 +0000 (14:35 +0200)]
Put compiled .class files into bin/ directory.

Cleaner than having them all over.
Also update gitignore file.

11 months agoDelete included jars, they are not the right versions anyway.
Reimar Döffinger [Mon, 13 Apr 2020 12:31:10 +0000 (14:31 +0200)]
Delete included jars, they are not the right versions anyway.

11 months agoUpdate run command to commons-text instead of commons-lang3
Reimar Döffinger [Mon, 13 Apr 2020 12:30:42 +0000 (14:30 +0200)]
Update run command to commons-text instead of commons-lang3

11 months agoImport cleanup/changes for Eclipse compatibility.
Reimar Döffinger [Mon, 13 Apr 2020 12:10:39 +0000 (14:10 +0200)]
Import cleanup/changes for Eclipse compatibility.

Unfortunately Eclipse insists on fiddling with imports
without knowing what it does, thus breaking compilation
and in one case causing a bug that will result in
subtly broken dictionaries.

11 months agoMinor code cleanup.
Reimar Döffinger [Mon, 13 Apr 2020 12:03:31 +0000 (14:03 +0200)]
Minor code cleanup.

11 months agoFix compile warnings.
Reimar Döffinger [Sat, 11 Apr 2020 23:00:10 +0000 (01:00 +0200)]
Fix compile warnings.

11 months agoRemove unused functions that cause warnings.
Reimar Döffinger [Sat, 11 Apr 2020 22:50:09 +0000 (00:50 +0200)]
Remove unused functions that cause warnings.

11 months agoExplicitly specify encoding for compile command.
Reimar Döffinger [Sat, 11 Apr 2020 22:44:55 +0000 (00:44 +0200)]
Explicitly specify encoding for compile command.

For better usability.

11 months agoSwitch to new dictionary path.
Reimar Döffinger [Sat, 11 Apr 2020 22:09:40 +0000 (00:09 +0200)]
Switch to new dictionary path.

11 months agoAlso handle "paragraph end" newline character.
Reimar Döffinger [Sat, 11 Apr 2020 19:28:32 +0000 (21:28 +0200)]
Also handle "paragraph end" newline character.

11 months agoReplace <sup></sup> sections with only digits by UTF-8.
Reimar Döffinger [Sat, 11 Apr 2020 15:33:02 +0000 (17:33 +0200)]
Replace <sup></sup> sections with only digits by UTF-8.
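
A minimal sketch of such a replacement, assuming the intent is to turn digit-only `<sup>` sections into Unicode superscript characters. The class and method names are hypothetical and the actual implementation may differ.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SuperscriptSketch {
    // Only sections consisting entirely of ASCII digits are matched.
    private static final Pattern SUP_DIGITS = Pattern.compile("<sup>([0-9]+)</sup>");

    // Superscript forms of the digits 0 to 9, indexed by digit value.
    private static final String SUPERSCRIPTS =
            "\u2070\u00b9\u00b2\u00b3\u2074\u2075\u2076\u2077\u2078\u2079";

    static String replaceSupDigits(String input) {
        Matcher m = SUP_DIGITS.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            StringBuilder sup = new StringBuilder();
            for (char c : m.group(1).toCharArray()) {
                sup.append(SUPERSCRIPTS.charAt(c - '0'));
            }
            m.appendReplacement(out, Matcher.quoteReplacement(sup.toString()));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Sections with non-digit content are left untouched.
        System.out.println(replaceSupDigits("m<sup>2</sup> and <sup>a</sup>"));
    }
}
```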

11 months agoExclude some more special titles not relevant for us.
Reimar Döffinger [Sat, 11 Apr 2020 14:23:11 +0000 (16:23 +0200)]
Exclude some more special titles not relevant for us.

12 months agoSwitch to https download URL.
Reimar Döffinger [Sat, 4 Apr 2020 20:37:07 +0000 (22:37 +0200)]
Switch to https download URL.

19 months agoAdd french-swedish dictionary support.
Baptiste Jonglez [Thu, 15 Aug 2019 17:52:26 +0000 (19:52 +0200)]
Add french-swedish dictionary support.

21 months agoExplicitly request 4GB RAM to run WiktionarySplitter.
Reimar Döffinger [Sun, 16 Jun 2019 08:34:40 +0000 (10:34 +0200)]
Explicitly request 4GB RAM to run WiktionarySplitter.

2 years agoRefine fix for Spanish wiktionary.
Reimar Döffinger [Wed, 9 Jan 2019 22:47:02 +0000 (23:47 +0100)]
Refine fix for Spanish wiktionary.

2 years agoDo not hard-code path to java binary.
Reimar Döffinger [Wed, 9 Jan 2019 20:44:56 +0000 (21:44 +0100)]
Do not hard-code path to java binary.

2 years agoImprove wiktionary splitter for Spanish and Portuguese
Reimar Döffinger [Wed, 9 Jan 2019 20:43:52 +0000 (21:43 +0100)]
Improve wiktionary splitter for Spanish and Portuguese

2 years agoAdd french-greek dictionary support.
Reimar Döffinger [Tue, 4 Dec 2018 20:16:07 +0000 (21:16 +0100)]
Add french-greek dictionary support.

2 years agoAdd support for generating Romani dictionary.
Reimar Döffinger [Wed, 8 Aug 2018 22:29:16 +0000 (00:29 +0200)]
Add support for generating Romani dictionary.

2 years agoMove several files out of Util.
Reimar Döffinger [Sun, 20 May 2018 12:41:48 +0000 (14:41 +0200)]
Move several files out of Util.

2 years agoMissing part of AR-ES support.
Reimar Döffinger [Sun, 20 May 2018 12:41:18 +0000 (14:41 +0200)]
Missing part of AR-ES support.

3 years agoAdd AR-ES dictionary generation.
Reimar Döffinger [Mon, 26 Feb 2018 20:25:23 +0000 (21:25 +0100)]
Add AR-ES dictionary generation.

3 years agoAdd German-Thai dictionary to generation list.
Reimar Döffinger [Mon, 26 Feb 2018 20:18:14 +0000 (21:18 +0100)]
Add German-Thai dictionary to generation list.

3 years agoRevert accidental changes to generate_dictionaries.sh.
Reimar Döffinger [Sun, 15 Oct 2017 14:47:12 +0000 (16:47 +0200)]
Revert accidental changes to generate_dictionaries.sh.

3 years agoReduce progress prints and optimize title check.
Reimar Döffinger [Sun, 15 Oct 2017 14:25:32 +0000 (16:25 +0200)]
Reduce progress prints and optimize title check.

3 years agoMinor optimizations for endPage function.
Reimar Döffinger [Sun, 15 Oct 2017 14:03:59 +0000 (16:03 +0200)]
Minor optimizations for endPage function.

3 years agoMove code out of loop that had no reason to be in it.
Reimar Döffinger [Sun, 15 Oct 2017 13:36:19 +0000 (15:36 +0200)]
Move code out of loop that had no reason to be in it.

3 years agoCompress WiktionarySplitter output files.
Reimar Döffinger [Sun, 15 Oct 2017 13:21:12 +0000 (15:21 +0200)]
Compress WiktionarySplitter output files.

Saves around 60% of disk space with no significant
difference in speed on a multi-core system.

3 years agoSupport compressed input for parsers.
Reimar Döffinger [Sun, 15 Oct 2017 10:08:25 +0000 (12:08 +0200)]
Support compressed input for parsers.

3 years agoAdd a write buffer to wiktionary splitter outputs.
Reimar Döffinger [Sun, 15 Oct 2017 08:38:13 +0000 (10:38 +0200)]
Add a write buffer to wiktionary splitter outputs.

Around 20% faster processing, and will be useful when
adding compression support as well.
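
The effect of a write buffer can be sketched with `BufferedOutputStream`: many small writes are collected and passed to the underlying stream in a few large ones. The counting sink and the 8192-byte buffer size are illustrative assumptions, not values from the commit.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

final class WriteBufferSketch {
    /** Sink that discards data and only counts how often it is called. */
    static final class CountingSink extends OutputStream {
        int calls;

        @Override
        public void write(int b) {
            calls++;
        }

        @Override
        public void write(byte[] b, int off, int len) {
            calls++;
        }
    }

    /** Writes byteCount single bytes and returns the number of calls reaching the sink. */
    static int callsFor(int byteCount, boolean buffered) {
        CountingSink sink = new CountingSink();
        try (OutputStream out = buffered ? new BufferedOutputStream(sink, 8192) : sink) {
            for (int i = 0; i < byteCount; i++) {
                out.write('x');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.calls;
    }

    public static void main(String[] args) {
        System.out.println("unbuffered calls: " + callsFor(1000, false));
        System.out.println("buffered calls:   " + callsFor(1000, true));
    }
}
```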

3 years agoCache compiled patterns.
Reimar Döffinger [Sun, 15 Oct 2017 08:25:05 +0000 (10:25 +0200)]
Cache compiled patterns.
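
A pattern cache can be as small as a map from regex string to compiled `Pattern`. The following is a sketch of that idea under the assumption that patterns are looked up by their source string; `PatternCacheSketch` is a hypothetical name and the commit may cache patterns differently.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.regex.Pattern;

final class PatternCacheSketch {
    private static final Map<String, Pattern> CACHE = new ConcurrentHashMap<>();

    /** Returns the compiled pattern for a regex, compiling it only on first use. */
    static Pattern get(String regex) {
        return CACHE.computeIfAbsent(regex, Pattern::compile);
    }

    public static void main(String[] args) {
        // Both lookups return the same Pattern instance.
        System.out.println(get("a+") == get("a+"));
    }
}
```

Pattern objects are immutable and safe to share between threads, which is why a shared cache like this works; Matchers created from them are not.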

3 years agoAdd read-ahead buffer to decompress in parallel.
Reimar Döffinger [Sat, 14 Oct 2017 17:55:06 +0000 (19:55 +0200)]
Add read-ahead buffer to decompress in parallel.

Allows using more than one CPU core for a good speedup.
Benchmarks:
Uncompressed files:    196.29 CPU, 5:18.34 wall clock time
xz-compressed, before: 299.19 CPU, 5:21.85 wall clock time
xz-compressed, after:  308.96 CPU, 3:29.60 wall clock time

(The first was I/O-limited and the second CPU-limited; now it is
limited almost entirely by the CPU time for XML parsing.)
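
The general idea of a read-ahead buffer can be sketched as a background thread that decompresses into a pipe while the consumer reads from the other end. This sketch uses the JDK's piped streams and gzip as a stand-in for xz, which needs an external library; the names and buffer sizes are assumptions and the commit's implementation may look quite different.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

final class ReadAheadSketch {
    /** Wraps source so that it is read (and decompressed) on a separate thread. */
    static InputStream readAhead(InputStream source, int bufferSize) throws IOException {
        PipedInputStream consumerSide = new PipedInputStream(bufferSize);
        PipedOutputStream producerSide = new PipedOutputStream(consumerSide);
        Thread producer = new Thread(() -> {
            try (InputStream in = source; PipedOutputStream out = producerSide) {
                byte[] chunk = new byte[8192];
                int n;
                while ((n = in.read(chunk)) != -1) {
                    out.write(chunk, 0, n); // blocks while the pipe is full
                }
            } catch (IOException e) {
                // Sketch only: the consumer just sees an early end of stream.
                // Real code would have to report this error to the consumer.
            }
        }, "read-ahead");
        producer.setDaemon(true);
        producer.start();
        return consumerSide;
    }

    /** Round trip: compress text, then read it back through the read-ahead pipe. */
    static String demo(String text) {
        try {
            ByteArrayOutputStream compressed = new ByteArrayOutputStream();
            try (GZIPOutputStream gz = new GZIPOutputStream(compressed)) {
                gz.write(text.getBytes(StandardCharsets.UTF_8));
            }
            InputStream in = readAhead(
                    new GZIPInputStream(new ByteArrayInputStream(compressed.toByteArray())),
                    1 << 16);
            ByteArrayOutputStream result = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                result.write(chunk, 0, n);
            }
            return new String(result.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo("decompressed on a second thread"));
    }
}
```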

3 years agoWiktionarySplitter: Support compressed inputs.
Reimar Döffinger [Sat, 7 Oct 2017 19:48:29 +0000 (21:48 +0200)]
WiktionarySplitter: Support compressed inputs.

Unfortunately bzip2 decompression is very slow (slower
than the XML parsing in fact), so it might make sense to
re-compress the downloaded files from bzip2 to xz.
If the decompression could be done in a separate thread,
xz compression would even provide a speedup if the files
are on a slower (non-SSD) disk.

3 years agoAdd logic for generating DE-RO dictionary.
Reimar Döffinger [Sat, 2 Sep 2017 17:57:45 +0000 (19:57 +0200)]
Add logic for generating DE-RO dictionary.

3 years agoUpdate to use Dictionary Util subproject.
Reimar Döffinger [Sat, 2 Sep 2017 17:54:19 +0000 (19:54 +0200)]
Update to use Dictionary Util subproject.

3 years agoSwitch to FileChannel and using Util from Dictionary subproject.
Reimar Döffinger [Sun, 20 Aug 2017 12:37:49 +0000 (14:37 +0200)]
Switch to FileChannel and using Util from Dictionary subproject.

3 years agoAdd support for generating Low German dictionary.
Reimar Döffinger [Tue, 15 Aug 2017 20:42:23 +0000 (22:42 +0200)]
Add support for generating Low German dictionary.

3 years agoPrevent inserting duplicate Pairs.
Reimar Döffinger [Sun, 13 Aug 2017 11:38:46 +0000 (13:38 +0200)]
Prevent inserting duplicate Pairs.

A rather brute-force approach and not
generic, but it's at least an improvement.

3 years agoAdd CollatorWrapper class to prepare for using ICU.
Reimar Döffinger [Sat, 5 Aug 2017 18:28:44 +0000 (20:28 +0200)]
Add CollatorWrapper class to prepare for using ICU.

Android should still use java.text, since that is based
on ICU anyway.

3 years agoAdd AR-TR dictionary generation.
Reimar Döffinger [Sat, 5 Aug 2017 08:08:12 +0000 (10:08 +0200)]
Add AR-TR dictionary generation.

3 years agoClearer error message if newline could not be found.
Reimar Döffinger [Thu, 13 Apr 2017 20:51:45 +0000 (22:51 +0200)]
Clearer error message if newline could not be found.

3 years agoFix wikisplit of Pennsylvania German
Reimar Döffinger [Thu, 13 Apr 2017 20:18:42 +0000 (22:18 +0200)]
Fix wikisplit of Pennsylvania German

3 years agoAvoids false parse errors due to ]] vs ] ].
Reimar Döffinger [Thu, 13 Apr 2017 19:14:21 +0000 (21:14 +0200)]
Avoids false parse errors due to ]] vs ] ].

3 years agoAnother fix to really skip comments.
Reimar Döffinger [Thu, 13 Apr 2017 18:37:37 +0000 (20:37 +0200)]
Another fix to really skip comments.

3 years agoMake logging configurable, default to severe only.
Reimar Döffinger [Thu, 13 Apr 2017 18:25:41 +0000 (20:25 +0200)]
Make logging configurable, default to severe only.
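
A configurable log level with a severe-only default could be sketched with `java.util.logging` as follows. The system property name `dictionarypc.loglevel` and the class name are hypothetical; the commit does not say how the level is actually configured.

```java
import java.util.Locale;
import java.util.logging.Level;
import java.util.logging.Logger;

final class LogLevelSketch {
    /** Parses a level name, falling back to SEVERE when it is missing or unknown. */
    static Level configuredLevel(String value) {
        if (value == null || value.isEmpty()) {
            return Level.SEVERE;
        }
        try {
            return Level.parse(value.toUpperCase(Locale.ROOT));
        } catch (IllegalArgumentException e) {
            return Level.SEVERE;
        }
    }

    public static void main(String[] args) {
        // Hypothetical property, e.g. java -Ddictionarypc.loglevel=INFO ...
        Level level = configuredLevel(System.getProperty("dictionarypc.loglevel"));
        Logger logger = Logger.getLogger("sketch");
        logger.setLevel(level);
        logger.severe("always shown with the default level");
        logger.info("only shown when the level is INFO or lower");
    }
}
```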

3 years agoFix skipping of comments.
Reimar Döffinger [Thu, 13 Apr 2017 18:25:08 +0000 (20:25 +0200)]
Fix skipping of comments.

3 years agoAdd Old Church Slavonic and Pennsylvania German.
Reimar Döffinger [Thu, 13 Apr 2017 16:45:37 +0000 (18:45 +0200)]
Add Old Church Slavonic and Pennsylvania German.

4 years agoAdd Sicilian
Reimar Döffinger [Thu, 23 Mar 2017 22:28:11 +0000 (23:28 +0100)]
Add Sicilian

4 years agoUpdate to work with latest Dictionary repo version.
Reimar Döffinger [Sun, 19 Mar 2017 20:52:37 +0000 (21:52 +0100)]
Update to work with latest Dictionary repo version.

4 years agoFix PT spelling for category links.
Reimar Döffinger [Sat, 11 Feb 2017 18:11:53 +0000 (19:11 +0100)]
Fix PT spelling for category links.

4 years agoAdd pt stoplist.
Reimar Döffinger [Sat, 11 Feb 2017 18:09:40 +0000 (19:09 +0100)]
Add pt stoplist.

4 years agoGenerate es and pt dictionaries, too.
Reimar Döffinger [Sat, 11 Feb 2017 18:05:03 +0000 (19:05 +0100)]
Generate es and pt dictionaries, too.

4 years agoFix crash in dictionary generation for PT input.
Reimar Döffinger [Sat, 11 Feb 2017 16:48:18 +0000 (17:48 +0100)]
Fix crash in dictionary generation for PT input.

4 years agoSupport pt and es wiktionary in splitter.
Reimar Döffinger [Sat, 11 Feb 2017 16:32:56 +0000 (17:32 +0100)]
Support pt and es wiktionary in splitter.

The ES format seems to have changed so we can
now actually use it.

4 years agoAdd pt download, make curl follow "moved permanently" redirects.
Reimar Döffinger [Sat, 11 Feb 2017 16:16:53 +0000 (17:16 +0100)]
Add pt download, make curl follow "moved permanently" redirects.

4 years agoAdd FR-PT dictionary.
Reimar Döffinger [Tue, 13 Dec 2016 21:33:04 +0000 (22:33 +0100)]
Add FR-PT dictionary.

4 years agoApply astyle code formatting.
Reimar Döffinger [Tue, 8 Nov 2016 22:28:19 +0000 (23:28 +0100)]
Apply astyle code formatting.

4 years agoAdd support for generating IT-RU dictionary.
Reimar Döffinger [Thu, 13 Oct 2016 21:42:29 +0000 (23:42 +0200)]
Add support for generating IT-RU dictionary.

4 years agoAdd ES-CA dictionary to generation list.
Reimar Döffinger [Wed, 5 Oct 2016 22:38:59 +0000 (00:38 +0200)]
Add ES-CA dictionary to generation list.