/*
 *******************************************************************************
 * Copyright (C) 1996-2004, International Business Machines Corporation and *
 * others. All Rights Reserved. *
 *******************************************************************************
 */

package com.ibm.icu.text;

/**
 * Interface that defines an API for forward-only iteration
 * on text objects.
 * This is a minimal interface for iteration without random access
 * or backwards iteration. It is especially useful for wrapping
 * streams with converters into an object for collation or
 * normalization.
 *
 * <p>Characters can be accessed in two ways: as code units or as
 * code points.
 * Unicode code points are 21-bit integers and are the scalar values
 * of Unicode characters. ICU uses the type <code>int</code> for them.
 * Unicode code units are the storage units of a given
 * Unicode/UCS Transformation Format (a character encoding scheme).
 * With UTF-16, every code point is represented by either one code unit
 * or two code units (a "surrogate pair").
 * String storage is typically based on code units, while properties
 * of characters are typically determined using code point values.
 * Some processes may be designed to work with sequences of code units,
 * or it may be known that all characters that are important to an
 * algorithm can be represented with single code units.
 * Other processes will need to use the code point access functions.</p>
 *
 * <p>UForwardCharacterIterator provides next() to access
 * a code unit and advance an internal position into the text object,
 * similar to <code>return text[position++]</code>.<br>
 * It provides nextCodePoint() to access a code point and advance the internal
 * position past it.</p>
 *
 * <p>nextCodePoint() assumes that the current position is that of
 * the beginning of a code point, i.e., of its first code unit.
 * After nextCodePoint() returns, this is true again.
 * In general, code unit access and code point access should not be mixed
 * in the same iteration loop. In UTF-16, if the current position
 * is on the second code unit of a surrogate pair (a low surrogate), then
 * nextCodePoint() returns only that code unit.</p>
 *
 * <p>Usage, reading one code unit at a time:</p>
 * <pre>
 * public void function1(UForwardCharacterIterator it) {
 *     int c;
 *     while ((c = it.next()) != UForwardCharacterIterator.DONE) {
 *         // use c
 *     }
 * }
 * </pre>
 */
public interface UForwardCharacterIterator {

    /**
     * Indicator that the iterator has reached the end of the UTF-16 text.
     */
    public static final int DONE = -1;

    /**
     * Returns the UTF-16 code unit at the current index, and increments to
     * the next code unit (post-increment semantics). If the index is out of
     * range, DONE is returned, and the iterator is reset to the limit
     * of the text.
     * @return the next UTF-16 code unit, or DONE if the index is at the limit
     *         of the text.
     */
    public int next();

    /**
     * Returns the code point at the current index, and increments to the next
     * code point (post-increment semantics). If the index does not point to a
     * valid surrogate pair, the behavior is the same as
     * <code>next()</code>. Otherwise the iterator is incremented past
     * the surrogate pair, and the code point represented by the pair
     * is returned.
     * @return the next code point in the text, or DONE if the index is at
     *         the limit of the text.
     */
    public int nextCodePoint();
}
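The class comment's distinction between code units and code points can be observed with the standard JDK alone. The sketch below uses plain `java.lang.String` and `java.lang.Character`, not ICU, and the class name `CodeUnitsVsCodePoints` is made up for illustration. It builds a string holding `A` followed by U+1F600, a supplementary character that UTF-16 stores as a surrogate pair, so the string has three code units but only two code points.

```java
// Illustrative only: uses java.lang.String/Character, not the ICU API.
class CodeUnitsVsCodePoints {

    // 'A' (U+0041) followed by U+1F600, which UTF-16 stores as two code units.
    static final String TEXT = "A" + new String(Character.toChars(0x1F600));

    /** Number of UTF-16 code units: what a next()-style loop would visit. */
    static int codeUnitCount(String s) {
        return s.length();
    }

    /** Number of code points: what a nextCodePoint()-style loop would visit. */
    static int codePointCount(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        System.out.println(codeUnitCount(TEXT));  // 3
        System.out.println(codePointCount(TEXT)); // 2
        // The second and third code units are the high and low surrogate halves.
        System.out.println(Character.isHighSurrogate(TEXT.charAt(1))); // true
        System.out.println(Character.isLowSurrogate(TEXT.charAt(2)));  // true
    }
}
```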
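A minimal sketch of how an implementation might satisfy the `next()` and `nextCodePoint()` contracts over a `java.lang.String`. `StringForwardIterator` is a hypothetical class, not part of ICU; to compile without ICU4J, the sketch declares a local copy of `UForwardCharacterIterator` with the same constant and two methods. `nextCodePoint()` is written in terms of `next()`, so anything other than a valid surrogate pair falls back to plain code-unit behavior, as the documentation describes.

```java
// Local mirror of the interface (same constant and methods), so this sketch
// compiles without ICU4J on the classpath.
interface UForwardCharacterIterator {
    int DONE = -1;
    int next();
    int nextCodePoint();
}

// Hypothetical String-backed implementation; not part of ICU.
class StringForwardIterator implements UForwardCharacterIterator {
    private final String text;
    private int index; // position of the next code unit to return

    StringForwardIterator(String text) {
        this.text = text;
        this.index = 0;
    }

    // Post-increment semantics: return the code unit at index, then advance by one.
    @Override
    public int next() {
        if (index >= text.length()) {
            index = text.length(); // pin the position at the limit and signal the end
            return DONE;
        }
        return text.charAt(index++);
    }

    // Return a whole code point when index starts a valid surrogate pair;
    // otherwise behave exactly like next().
    @Override
    public int nextCodePoint() {
        int first = next();
        if (first != DONE && Character.isHighSurrogate((char) first)
                && index < text.length()) {
            char second = text.charAt(index);
            if (Character.isLowSurrogate(second)) {
                index++; // consume the low surrogate as well
                return Character.toCodePoint((char) first, second);
            }
        }
        return first;
    }

    public static void main(String[] args) {
        // 'A' followed by U+1F600, written as its UTF-16 surrogate pair
        UForwardCharacterIterator it = new StringForwardIterator("A\uD83D\uDE00");
        int c;
        while ((c = it.nextCodePoint()) != UForwardCharacterIterator.DONE) {
            System.out.printf("U+%04X%n", c); // prints U+0041, then U+1F600
        }
    }
}
```

Calling `next()` once on a string that starts with a surrogate pair leaves the position on the low surrogate; a following `nextCodePoint()` then returns only that low surrogate (0xDE00 for U+1F600), which is the mixing pitfall the class comment warns about.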