gitweb.fperrin.net Git - DictionaryPC.git/log
Reimar Döffinger [Fri, 28 Aug 2015 04:44:02 +0000 (06:44 +0200)]
Small updates to dictionary generation.
Reimar Döffinger [Thu, 27 Aug 2015 20:38:38 +0000 (22:38 +0200)]
Add commons-lang3.jar to classpath.
Reimar Döffinger [Thu, 27 Aug 2015 17:08:16 +0000 (19:08 +0200)]
Update for new dictionary release URL.
Reimar Döffinger [Thu, 27 Aug 2015 17:04:23 +0000 (19:04 +0200)]
Add comment about hang issue.
Reimar Döffinger [Thu, 27 Aug 2015 16:05:57 +0000 (18:05 +0200)]
Add script to help with dictionary generation.
Also update en and sv stoplists and parse the
Spanish wiktionary a bit.
Reimar Döffinger [Wed, 26 Aug 2015 20:34:44 +0000 (22:34 +0200)]
Script additions/improvements for dictionary generation.
Reimar Döffinger [Wed, 26 Aug 2015 20:21:13 +0000 (22:21 +0200)]
Download latest wiktionary files.
The old ones referenced no longer exist,
so just try with the latest ones.
Reimar Döffinger [Wed, 26 Aug 2015 20:20:48 +0000 (22:20 +0200)]
Add script to update dictionary list in app.
Reimar Döffinger [Mon, 24 Aug 2015 20:14:10 +0000 (22:14 +0200)]
Update file location URL.
Reimar Döffinger [Mon, 24 Aug 2015 19:31:01 +0000 (21:31 +0200)]
Add horrible but working compile/run scripts.
Reimar Döffinger [Mon, 24 Aug 2015 19:29:49 +0000 (21:29 +0200)]
Replace com.sun.xml.internal.rngom.util.Uri.
I have no idea where that package can be found.
Reimar Döffinger [Mon, 24 Aug 2015 19:28:37 +0000 (21:28 +0200)]
Disable some debug code to allow compilation.
Thad Hughes [Thu, 26 Dec 2013 01:48:07 +0000 (17:48 -0800)]
Fixes for Malay$ and reorderings due to new ICU4J.
Thad Hughes [Tue, 3 Dec 2013 18:34:21 +0000 (10:34 -0800)]
Update WiktionaryLangs.
Thad Hughes [Fri, 29 Nov 2013 20:44:03 +0000 (12:44 -0800)]
Dictionary added by Phil.
Thad Hughes [Sun, 7 Apr 2013 18:01:16 +0000 (11:01 -0700)]
Updated DictionaryBuilder.jar.
Thad Hughes [Sun, 7 Apr 2013 17:57:04 +0000 (10:57 -0700)]
IT-TR dictionary test.
Thad Hughes [Sun, 7 Apr 2013 17:53:36 +0000 (10:53 -0700)]
go
Thad Hughes [Wed, 9 Jan 2013 05:53:42 +0000 (21:53 -0800)]
Fix Malay/Malayalam, add test for "buon g".
Thad Hughes [Sat, 5 Jan 2013 18:13:16 +0000 (10:13 -0800)]
Using new Chemnitz dictionary.
Thad Hughes [Sat, 5 Jan 2013 18:00:39 +0000 (10:00 -0800)]
Fix name of chemnitz dictionary.
Thad Hughes [Sat, 5 Jan 2013 17:57:44 +0000 (09:57 -0800)]
Fix AF-EN test.
Thad Hughes [Sat, 5 Jan 2013 06:17:34 +0000 (22:17 -0800)]
Fixed comment for German dictionary.
Thad Hughes [Thu, 3 Jan 2013 18:44:44 +0000 (10:44 -0800)]
Eliminated <ref>s.
Thad Hughes [Thu, 3 Jan 2013 05:09:21 +0000 (21:09 -0800)]
Skip Italian references.
Thad Hughes [Thu, 3 Jan 2013 03:35:08 +0000 (19:35 -0800)]
Split ZH into yue and cmn, fixed German heading.
Thad Hughes [Sun, 30 Dec 2012 06:36:12 +0000 (22:36 -0800)]
FR single lang.
Thad Hughes [Sun, 30 Dec 2012 06:35:44 +0000 (22:35 -0800)]
Update URL format and parsing, fix FR handling.
Thad Hughes [Sun, 23 Dec 2012 18:38:28 +0000 (10:38 -0800)]
Multi word search now looks for exact matches of TokenRows.
Thad Hughes [Sun, 23 Dec 2012 17:43:29 +0000 (09:43 -0800)]
Building dictionaries.
Thad Hughes [Sun, 16 Dec 2012 00:02:53 +0000 (16:02 -0800)]
Update to latest wiktionaries, update unit tests, der-top/mid/bottom.
Thad Hughes [Sat, 15 Dec 2012 23:34:20 +0000 (15:34 -0800)]
Fixed URL encoding in goldens.
Thad Hughes [Mon, 3 Dec 2012 21:47:50 +0000 (13:47 -0800)]
go
thadh [Sun, 7 Oct 2012 18:36:16 +0000 (11:36 -0700)]
Added simple parsing logic for DE and IT wiktionaries.
thadh [Thu, 4 Oct 2012 17:19:32 +0000 (10:19 -0700)]
Updated test cases to latest wiktionary dumps.
thadh [Thu, 4 Oct 2012 15:09:10 +0000 (08:09 -0700)]
Updated input locations. Moved pairs in builder.
thadh [Wed, 3 Oct 2012 23:12:33 +0000 (16:12 -0700)]
Fixed trailing ,s in Italian verb tenses.
Hotlink to URL at bottom of HTMLEntry page.
Parsing an only-English dictionary for the first time, yay!
thadh [Mon, 1 Oct 2012 17:41:33 +0000 (10:41 -0700)]
Format links properly.
thadh [Sun, 30 Sep 2012 17:04:56 +0000 (10:04 -0700)]
Synonyms, antonyms.
thadh [Tue, 25 Sep 2012 15:42:10 +0000 (08:42 -0700)]
Don't handle it-conj in EnParser.
thadh [Tue, 25 Sep 2012 05:47:22 +0000 (22:47 -0700)]
it-noun.
thadh [Tue, 25 Sep 2012 05:29:28 +0000 (22:29 -0700)]
Link forms, page limit Arabic, change HTML.
thadh [Tue, 25 Sep 2012 04:43:16 +0000 (21:43 -0700)]
Put links into HtmlEntry.
thadh [Sun, 23 Sep 2012 15:54:13 +0000 (08:54 -0700)]
Italian verb conjugations!
thadh [Sat, 22 Sep 2012 19:39:15 +0000 (12:39 -0700)]
it-conj (most of the way), unicode handling in strings.
thadh [Tue, 18 Sep 2012 19:55:59 +0000 (12:55 -0700)]
Expand Italian test to get verb conjugations.
thadh [Tue, 18 Sep 2012 19:46:23 +0000 (12:46 -0700)]
Basic general functions in WholeSectionParser.
thadh [Tue, 18 Sep 2012 18:51:02 +0000 (11:51 -0700)]
Skip lang=XX for the lang we care about.
thadh [Tue, 18 Sep 2012 18:36:15 +0000 (11:36 -0700)]
Skip w: and Image: wikiLinks.
thadh [Tue, 18 Sep 2012 18:14:34 +0000 (11:14 -0700)]
Delete Anagrams and References sections.
thadh [Tue, 18 Sep 2012 18:11:24 +0000 (11:11 -0700)]
Got rid of Category:.
thadh [Tue, 18 Sep 2012 17:49:56 +0000 (10:49 -0700)]
Get rid of trailing "en:word" crap.
thadh [Tue, 18 Sep 2012 17:45:40 +0000 (10:45 -0700)]
Reformat.
thadh [Tue, 18 Sep 2012 17:45:02 +0000 (10:45 -0700)]
Update unit tests for parsing function name.
thadh [Tue, 18 Sep 2012 17:13:51 +0000 (10:13 -0700)]
Fixed Builder, and escaping arg names.
thadh [Tue, 11 Sep 2012 00:46:51 +0000 (17:46 -0700)]
HtmlEntries don't count as main entries.
thadh [Mon, 10 Sep 2012 22:56:40 +0000 (15:56 -0700)]
Whitespace.
thadh [Mon, 10 Sep 2012 22:55:33 +0000 (15:55 -0700)]
Whitespace.
thadh [Mon, 10 Sep 2012 22:17:58 +0000 (15:17 -0700)]
Update goldens.
thadh [Mon, 10 Sep 2012 22:05:02 +0000 (15:05 -0700)]
Add some langs (Ancient Greek, Cantonese, Burmese(MY)), WholeSection
parser improvements, Splitter improvements. Builder uses WholeSection
parser.
thadh [Mon, 10 Sep 2012 03:40:36 +0000 (20:40 -0700)]
First decent implementation of HtmlEntry attached to TokenRow.
thadh [Mon, 20 Aug 2012 03:07:41 +0000 (20:07 -0700)]
Add TA=Tamil language.
thadh [Tue, 31 Jul 2012 15:40:55 +0000 (08:40 -0700)]
Test data added.
thadh [Sat, 28 Jul 2012 01:51:00 +0000 (18:51 -0700)]
Escape HTML. Test special ISO coding.
thadh [Tue, 24 Jul 2012 00:25:27 +0000 (17:25 -0700)]
gitignore
thadh [Tue, 24 Jul 2012 00:18:29 +0000 (17:18 -0700)]
Baseline HTML parsing done, goldens updated!
thadh [Sun, 22 Jul 2012 01:43:48 +0000 (18:43 -0700)]
Refactor code to generate dictionaries to make it all one loop!
thadh [Sat, 21 Jul 2012 18:06:55 +0000 (11:06 -0700)]
Update unit tests with new wiki data.
thadh [Sat, 21 Jul 2012 17:41:21 +0000 (10:41 -0700)]
Added WholeSection entries and parser.
Switched to real xerces parser because Sun fork was crashing on
enwiktionary data.
thadh [Tue, 17 Jul 2012 04:09:50 +0000 (21:09 -0700)]
Updated unit tests, added WholeSectionToHtmlParser.
Thad Hughes [Sun, 20 May 2012 23:31:08 +0000 (16:31 -0700)]
DictionaryBuilder prints sortable langs, JP->JA fix.
Thad Hughes [Fri, 11 May 2012 20:51:32 +0000 (13:51 -0700)]
Build fr_de dictionary from enwiktionary, yeah!
Thad Hughes [Thu, 10 May 2012 04:05:15 +0000 (21:05 -0700)]
Updated to latest enwiktionary.
Thad Hughes [Wed, 9 May 2012 22:09:04 +0000 (15:09 -0700)]
Unit tests working, looks like I'd been revamping the parsers.
Thad Hughes [Fri, 9 Mar 2012 02:15:03 +0000 (18:15 -0800)]
Added DictionaryBuilder.jar
Thad Hughes [Fri, 9 Mar 2012 00:58:31 +0000 (16:58 -0800)]
Update version to v004.
Thad Hughes [Thu, 8 Mar 2012 19:33:29 +0000 (11:33 -0800)]
Fixes to tr= and head= make Arabic,Thai look much better.
Thad Hughes [Thu, 8 Mar 2012 17:56:35 +0000 (09:56 -0800)]
Update unit tests for new wiktionary.
Thad Hughes [Thu, 8 Mar 2012 17:15:50 +0000 (09:15 -0800)]
Bug-fixes to WikiTokenizer (handle weird line-feed), update to newest
enwiktionary.
Thad Hughes [Tue, 6 Mar 2012 00:51:01 +0000 (16:51 -0800)]
EnTranslationToTranslationParser
Thad Hughes [Tue, 6 Mar 2012 00:50:29 +0000 (16:50 -0800)]
Fixed combining marks on Unicode regexes.
Thad Hughes [Fri, 10 Feb 2012 19:43:15 +0000 (11:43 -0800)]
Unit tests working again after refactoring!!!
Thad Hughes [Fri, 10 Feb 2012 18:49:08 +0000 (10:49 -0800)]
Major en refactoring underway.
Thad Hughes [Fri, 10 Feb 2012 16:51:06 +0000 (08:51 -0800)]
Rename enwiktionary package to wiktionary.
Thad Hughes [Wed, 8 Feb 2012 23:49:42 +0000 (15:49 -0800)]
gitignore
Thad Hughes [Wed, 8 Feb 2012 23:48:54 +0000 (15:48 -0800)]
Point unit tests at new wikiSplit/en/.
Thad Hughes [Wed, 8 Feb 2012 23:45:40 +0000 (15:45 -0800)]
Split EN, DE, IT, FR wiktionaries! Fix splitting to use entire header
line (hopefully this works ok).
Thad Hughes [Sat, 4 Feb 2012 01:07:51 +0000 (17:07 -0800)]
Todo
Thad Hughes [Tue, 31 Jan 2012 23:06:26 +0000 (15:06 -0800)]
Fix test.
Thad Hughes [Tue, 31 Jan 2012 22:56:05 +0000 (14:56 -0800)]
Moved normalization, more tests.
Thad Hughes [Mon, 30 Jan 2012 06:08:52 +0000 (22:08 -0800)]
Stoplist, more languages...
Thad Hughes [Thu, 26 Jan 2012 00:03:03 +0000 (16:03 -0800)]
zipSize, overrideStoplist-> special isMainEntry, tagalog, trying to
count {{t}} but failing
Thad Hughes [Tue, 24 Jan 2012 05:33:01 +0000 (21:33 -0800)]
Added Urdu!
Thad Hughes [Fri, 20 Jan 2012 01:45:32 +0000 (17:45 -0800)]
Newlines in info message.
Thad Hughes [Tue, 17 Jan 2012 21:13:38 +0000 (13:13 -0800)]
Better DictionaryInfo, IndexBuilder counts main TokenRows.
Thad Hughes [Mon, 16 Jan 2012 19:43:27 +0000 (11:43 -0800)]
Wiktionary upgrade!
Thad Hughes [Mon, 16 Jan 2012 17:55:25 +0000 (09:55 -0800)]
DictionaryInfo has full file URL.
Thad Hughes [Mon, 16 Jan 2012 07:14:01 +0000 (23:14 -0800)]
2 types of TokenRow.
Merge branch 'master' of
https://code.google.com/p/quickdic-dictionary.dictionarypc
Conflicts:
src/com/hughes/android/dictionary/engine/DictionaryBuilderMain.java
todo.txt
Thad Hughes [Mon, 16 Jan 2012 00:08:07 +0000 (16:08 -0800)]
Changing the way dictionaries are indexed (listed), new type of TokenRow
(to distinguish major from minor entries).
Thad Hughes [Wed, 11 Jan 2012 22:14:29 +0000 (14:14 -0800)]
more downloads