/*
 *******************************************************************************
 * Copyright (C) 2005-2010, International Business Machines Corporation and    *
 * others. All Rights Reserved.                                                *
 *******************************************************************************
 */
package com.ibm.icu.text;

/**
 * Abstract class for recognizing a single charset.
 * Part of the implementation of ICU's CharsetDetector.
 *
 * Each specific charset that can be recognized will have an instance
 * of some subclass of this class. All interaction between the overall
 * CharsetDetector and the logic specific to an individual charset happens
 * via the interface provided here.
 *
 * Instances of CharsetRecognizer DO NOT have or maintain
 * state pertaining to a specific match or detect operation.
 * They WILL be shared by multiple instances of CharsetDetector.
 * They encapsulate constant charset-specific information.
 */
abstract class CharsetRecognizer {
    /**
     * Get the IANA name of this charset.
     * @return the charset name.
     */
    abstract String getName();

    /**
     * Get the ISO language code for this charset.
     * @return the language code, or <code>null</code> if the language cannot be determined.
     */
    public String getLanguage() {
        // Default: the language cannot be determined.
        return null;
    }

    /**
     * Test the match of this charset with the input text data,
     * which is obtained via the CharsetDetector object.
     *
     * @param det the CharsetDetector, which contains the input text
     *            to be checked for being in this charset.
     * @return two values packed into one int (Damn java, anyhow):
     *         bits 0-7: the match confidence, ranging from 0 to 100;
     *         bits 8-15: the match reason, an enum-like value.
     */
    abstract int match(CharsetDetector det);
}
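
// Illustration only, not part of ICU: a minimal sketch of the packed return
// value documented for match() above, assuming the layout is exactly
// "bits 0-7 = confidence (0-100), bits 8-15 = reason". The class name
// MatchResultPacking and its methods pack(), confidence() and reason() are
// hypothetical helpers introduced for this example.

```java
/** Hypothetical helper showing how two small values can share one int. */
final class MatchResultPacking {
    private MatchResultPacking() {
        // Static helpers only; no instances.
    }

    /** Packs a confidence (0-100) into bits 0-7 and a reason code (0-255) into bits 8-15. */
    static int pack(int confidence, int reason) {
        if (confidence < 0 || confidence > 100) {
            throw new IllegalArgumentException("confidence out of range: " + confidence);
        }
        if (reason < 0 || reason > 0xFF) {
            throw new IllegalArgumentException("reason out of range: " + reason);
        }
        return (reason << 8) | confidence;
    }

    /** Extracts the confidence from bits 0-7. */
    static int confidence(int packed) {
        return packed & 0xFF;
    }

    /** Extracts the reason code from bits 8-15. */
    static int reason(int packed) {
        return (packed >>> 8) & 0xFF;
    }
}
```

// A caller holding the int returned by a recognizer could, under the same
// assumption, read the two fields back with
// MatchResultPacking.confidence(result) and MatchResultPacking.reason(result).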