*******************************************************************************
* Copyright (C) 2001-2013, International Business Machines
* Corporation and others. All Rights Reserved.
*******************************************************************************
/* FOOD FOR THOUGHT: currently the reordering modes are a mixture of
* the algorithm for direct Bidi, the algorithm for inverse Bidi and the bizarre
* concept of RUNS_ONLY, which is a double operation.
* It could be advantageous to divide this into 3 concepts:
* a) Operation: direct / inverse / RUNS_ONLY
* b) Direct algorithm: default / NUMBERS_SPECIAL / GROUP_NUMBERS_WITH_L
* c) Inverse algorithm: default / INVERSE_LIKE_DIRECT / NUMBERS_SPECIAL
* This would allow combinations not possible today like RUNS_ONLY with
* It would also allow setting INSERT_MARKS for the direct step of RUNS_ONLY and
* REMOVE_CONTROLS for the inverse step.
* Not all combinations would be supported, and probably not all of them make sense.
* The documentation would need to state which ones are supported and what the
* fallbacks are for unsupported combinations.
//TODO: make sample program do something simple but real and complete
package com.ibm.icu.text;
import java.awt.font.NumericShaper;
import java.awt.font.TextAttribute;
import java.lang.reflect.Array;
import java.text.AttributedCharacterIterator;
import java.util.Arrays;
import com.ibm.icu.impl.UBiDiProps;
import com.ibm.icu.lang.UCharacter;
import com.ibm.icu.lang.UCharacterDirection;
import com.ibm.icu.lang.UProperty;
* <h2>Bidi algorithm for ICU</h2>
* This is an implementation of the Unicode Bidirectional Algorithm. The
* algorithm is defined in the <a
* href="http://www.unicode.org/unicode/reports/tr9/">Unicode Standard Annex #9</a>.
* Note: Libraries that perform a bidirectional algorithm and reorder strings
* accordingly are sometimes called "Storage Layout Engines". ICU's Bidi and
* shaping (ArabicShaping) classes can be used at the core of such "Storage
* <h3>General remarks about the API:</h3>
* The "limit" of a sequence of characters is the position just after
* their last character, i.e., one more than the index of that last character.
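The same half-open convention is used by the JDK's `String.substring(beginIndex, endIndex)`. A minimal sketch; the `LimitConvention` class and its methods are illustrative only and are not part of the Bidi API:

```java
// Illustrates the "limit" convention: a sequence is the half-open range
// [start, limit), where limit is one past the index of its last character.
class LimitConvention {
    // Returns text[start..limit-1]; String.substring uses the same convention.
    static String sequence(String text, int start, int limit) {
        return text.substring(start, limit);
    }

    // The number of characters in [start, limit) is simply limit - start.
    static int length(int start, int limit) {
        return limit - start;
    }

    public static void main(String[] args) {
        String text = "abcdef";
        // 'b' is at index 1 and 'd' at index 3, so "bcd" has start 1 and limit 4
        System.out.println(sequence(text, 1, 4)); // bcd
        System.out.println(length(1, 4));         // 3
        // start == limit denotes an empty sequence
        System.out.println(sequence(text, 2, 2).isEmpty()); // true
    }
}
```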
* Some of the API methods provide access to "runs". Such a
* "run" is defined as a sequence of characters that are at the same
* embedding level after performing the Bidi algorithm.
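To make the definition of a run concrete, here is a sketch that splits an array of resolved levels into runs with a single scan. The `RunSplitter` helper and its `{start, limit, level}` representation are hypothetical; it reports runs in logical order, whereas `getVisualRun()` in the rendering sample returns them in visual order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Splits resolved embedding levels into runs: maximal sequences of adjacent
// characters with the same level. Each run is reported as {start, limit, level}.
class RunSplitter {
    static List<int[]> runs(byte[] levels) {
        List<int[]> result = new ArrayList<>();
        int start = 0;
        for (int i = 1; i <= levels.length; i++) {
            // close the current run at the end of the text or where the level changes
            if (i == levels.length || levels[i] != levels[start]) {
                result.add(new int[] {start, i, levels[start]});
                start = i;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        byte[] levels = {0, 0, 1, 1, 1, 0}; // e.g. "ab", three RTL characters, then "c"
        for (int[] run : runs(levels)) {
            System.out.println(Arrays.toString(run));
        }
        // [0, 2, 0]
        // [2, 5, 1]
        // [5, 6, 0]
    }
}
```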
* <h3>Basic concept: paragraph</h3>
* A piece of text can be divided into several paragraphs by characters
* with the Bidi class <code>Block Separator</code>. For handling of
* <li>{@link #countParagraphs}
* <li>{@link #getParaLevel}
* <li>{@link #getParagraph}
* <li>{@link #getParagraphByIndex}
* <h3>Basic concept: text direction</h3>
* The direction of a piece of text may be:
* <li>{@link #NEUTRAL}
* <h3>Basic concept: levels</h3>
* Levels in this API represent embedding levels according to the Unicode
* Bidirectional Algorithm.
* Their low-order bit (even/odd value) indicates the visual direction.<p>
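A sketch of reading the direction from a level. `LevelParity` is a hypothetical helper, and its two constants mirror the values of `LTR` and `RTL` defined in this class. The rendering sample applies the same mask in `(byte)(1 & para.getParaLevel())`.

```java
// The low-order bit of a level gives its visual direction: even = LTR, odd = RTL.
class LevelParity {
    static final byte LTR = 0; // same values as the LTR and RTL constants of Bidi
    static final byte RTL = 1;

    static byte directionOf(int level) {
        return (level & 1) == 0 ? LTR : RTL;
    }

    public static void main(String[] args) {
        for (int level = 0; level <= 3; level++) {
            System.out.println(level + " -> " + (directionOf(level) == LTR ? "LTR" : "RTL"));
        }
        // 0 -> LTR, 1 -> RTL, 2 -> LTR, 3 -> RTL
    }
}
```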
* Levels can be abstract values when used for the
* <code>paraLevel</code> and <code>embeddingLevels</code>
* arguments of <code>setPara()</code>; there:
* <li>the high-order bit of an <code>embeddingLevels[]</code>
* value indicates whether the using application is
* specifying the level of a character to <i>override</i> whatever the
* Bidi implementation would resolve it to.</li>
* <li><code>paraLevel</code> can be set to the
* pseudo-level values <code>LEVEL_DEFAULT_LTR</code>
* and <code>LEVEL_DEFAULT_RTL</code>.</li>
* <p>The related constants are not real, valid level values.
* <code>LEVEL_DEFAULT_XXX</code> can be used to specify
* a default for the paragraph level for
* the case when the <code>setPara()</code> method
* is to determine it but there is no
* strongly typed character in the input.<p>
* Note that the value for <code>LEVEL_DEFAULT_LTR</code> is even
* and the one for <code>LEVEL_DEFAULT_RTL</code> is odd,
* just like with normal LTR and RTL level values;
* these special values are designed that way. Also, the implementation
* assumes that MAX_EXPLICIT_LEVEL is odd.
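The encoding of these special values can be illustrated with plain bit operations. In this sketch the constant values are copied from this class, while `overridden()`, `isOverride()` and `levelOf()` are hypothetical helpers. Because `LEVEL_OVERRIDE` sets the high bit, an overridden level is a negative Java `byte`.

```java
// Sketch of the special level values: constants copied from this class,
// helper methods hypothetical. A Java byte with the high bit set is negative.
class SpecialLevels {
    static final byte LEVEL_DEFAULT_LTR = (byte) 0x7e;
    static final byte LEVEL_DEFAULT_RTL = (byte) 0x7f;
    static final byte LEVEL_OVERRIDE = (byte) 0x80;
    static final byte MAX_EXPLICIT_LEVEL = 125;

    // An embeddingLevels[] value that forces a character to the given level.
    static byte overridden(int level) {
        return (byte) (LEVEL_OVERRIDE | level);
    }

    static boolean isOverride(byte value) {
        return (value & LEVEL_OVERRIDE) != 0;
    }

    // Clears the override bit, leaving the level itself.
    static int levelOf(byte value) {
        return value & ~LEVEL_OVERRIDE;
    }

    public static void main(String[] args) {
        byte forcedRtl = overridden(1);
        System.out.println(forcedRtl);              // -127
        System.out.println(isOverride(forcedRtl));  // true
        System.out.println(levelOf(forcedRtl));     // 1
        // The pseudo-levels keep the parity of the direction they default to.
        System.out.println(LEVEL_DEFAULT_LTR & 1);  // 0 (even, like LTR)
        System.out.println(LEVEL_DEFAULT_RTL & 1);  // 1 (odd, like RTL)
        System.out.println(MAX_EXPLICIT_LEVEL & 1); // 1 (odd)
    }
}
```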
* <ul><b>See Also:</b>
* <li>{@link #LEVEL_DEFAULT_LTR}
* <li>{@link #LEVEL_DEFAULT_RTL}
* <li>{@link #LEVEL_OVERRIDE}
* <li>{@link #MAX_EXPLICIT_LEVEL}
* <li>{@link #setPara}
* <h3>Basic concept: Reordering Mode</h3>
* Reordering mode values indicate which variant of the Bidi algorithm to
* <ul><b>See Also:</b>
* <li>{@link #setReorderingMode}
* <li>{@link #REORDER_DEFAULT}
* <li>{@link #REORDER_NUMBERS_SPECIAL}
* <li>{@link #REORDER_GROUP_NUMBERS_WITH_R}
* <li>{@link #REORDER_RUNS_ONLY}
* <li>{@link #REORDER_INVERSE_NUMBERS_AS_L}
* <li>{@link #REORDER_INVERSE_LIKE_DIRECT}
* <li>{@link #REORDER_INVERSE_FOR_NUMBERS_SPECIAL}
* <h3>Basic concept: Reordering Options</h3>
* Reordering options can be applied during Bidi text transformations.
* <ul><b>See Also:</b>
* <li>{@link #setReorderingOptions}
* <li>{@link #OPTION_DEFAULT}
* <li>{@link #OPTION_INSERT_MARKS}
* <li>{@link #OPTION_REMOVE_CONTROLS}
* <li>{@link #OPTION_STREAMING}
* @author Simon Montagu, Matitiahu Allouche (ported from C code written by Markus W. Scherer)
* <h4> Sample code for the ICU Bidi API </h4>
* <h5>Rendering a paragraph with the ICU Bidi API</h5>
* This is (hypothetical) sample code that illustrates how the ICU Bidi API
* could be used to render a paragraph of text. Rendering code depends highly on
* the graphics system; therefore this sample code must make a lot of
* assumptions, which may or may not match any existing graphics system's
* The basic assumptions are:
* <li>Rendering is done from left to right on a horizontal line.</li>
* <li>A run of single-style, unidirectional text can be rendered at once.
* <li>Such a run of text is passed to the graphics system with characters
* (code units) in logical order.</li>
* <li>The line-breaking algorithm is very complicated and locale-dependent,
* and therefore its implementation is omitted from this sample code.</li>
* package com.ibm.icu.dev.test.bidi;
* import com.ibm.icu.text.Bidi;
* import com.ibm.icu.text.BidiRun;
* public class Sample {
*     static final int styleNormal = 0;
*     static final int styleSelected = 1;
*     static final int styleBold = 2;
*     static final int styleItalics = 4;
*     static final int styleSuper = 8;
*     static final int styleSub = 16;
*     static class StyleRun {
*         public StyleRun(int limit, int style) {
*             this.limit = limit;
*             this.style = style;
*     static class Bounds {
*         public Bounds(int start, int limit) {
*             this.start = start;
*             this.limit = limit;
*     static int getTextWidth(String text, int start, int limit,
*                             StyleRun[] styleRuns, int styleRunCount) {
*         // simplistic way to compute the width
*         return limit - start;
*     // set limit and StyleRun limit for a line
*     // from text[start] and from styleRuns[styleRunStart]
*     // using Bidi.getLogicalRun(...)
*     // returns line width
*     static int getLineBreak(String text, Bounds line, Bidi para,
*                             StyleRun styleRuns[], Bounds styleRun) {
*     // render runs on a line sequentially, always from left to right
*     // prepare rendering a new line
*     static void startLine(byte textDirection, int lineWidth) {
*         System.out.println();
*     // render a run of text and advance to the right by the run width
*     // the text[start..limit-1] is always in logical order
*     static void renderRun(String text, int start, int limit,
*                           byte textDirection, int style) {
*     // We could compute a cross-product
*     // from the style runs with the directional runs
*     // and then reorder it.
*     // Instead, here we iterate over each run type
*     // and render the intersections -
*     // with shortcuts in simple (and common) cases.
*     // renderParagraph() is the main function.
*     // render a directional run with
*     // (possibly) multiple style runs intersecting with it
*     static void renderDirectionalRun(String text, int start, int limit,
*                                      byte direction, StyleRun styleRuns[],
*                                      int styleRunCount) {
*         // iterate over style runs
*         if (direction == Bidi.LTR) {
*             for (i = 0; i < styleRunCount; ++i) {
*                 styleLimit = styleRuns[i].limit;
*                 if (start < styleLimit) {
*                     if (styleLimit > limit) {
*                         styleLimit = limit;
*                     renderRun(text, start, styleLimit,
*                               direction, styleRuns[i].style);
*                     if (styleLimit == limit) {
*                     start = styleLimit;
*             for (i = styleRunCount-1; i >= 0; --i) {
*                     styleStart = styleRuns[i-1].limit;
*                 if (limit >= styleStart) {
*                     if (styleStart < start) {
*                         styleStart = start;
*                     renderRun(text, styleStart, limit, direction,
*                               styleRuns[i].style);
*                     if (styleStart == start) {
*                     limit = styleStart;
*     // the line object represents text[start..limit-1]
*     static void renderLine(Bidi line, String text, int start, int limit,
*                            StyleRun styleRuns[], int styleRunCount) {
*         byte direction = line.getDirection();
*         if (direction != Bidi.MIXED) {
*             if (styleRunCount <= 1) {
*                 renderRun(text, start, limit, direction, styleRuns[0].style);
*                 renderDirectionalRun(text, start, limit, direction,
*                                      styleRuns, styleRunCount);
*             // mixed-directional
*                 count = line.countRuns();
*             } catch (IllegalStateException e) {
*                 e.printStackTrace();
*             if (styleRunCount <= 1) {
*                 int style = styleRuns[0].style;
*                 // iterate over directional runs
*                 for (i = 0; i < count; ++i) {
*                     run = line.getVisualRun(i);
*                     renderRun(text, run.getStart(), run.getLimit(),
*                               run.getDirection(), style);
*                 // iterate over both directional and style runs
*                 for (i = 0; i < count; ++i) {
*                     run = line.getVisualRun(i);
*                     renderDirectionalRun(text, run.getStart(),
*                                          run.getLimit(), run.getDirection(),
*                                          styleRuns, styleRunCount);
*     static void renderParagraph(String text, byte textDirection,
*                                 StyleRun styleRuns[], int styleRunCount,
*         int length = text.length();
*         Bidi para = new Bidi();
*                          textDirection != 0 ? Bidi.LEVEL_DEFAULT_RTL
*                                             : Bidi.LEVEL_DEFAULT_LTR,
*         } catch (Exception e) {
*             e.printStackTrace();
*         byte paraLevel = (byte)(1 & para.getParaLevel());
*         StyleRun styleRun = new StyleRun(length, styleNormal);
*         if (styleRuns == null || styleRunCount <= 0) {
*             styleRuns = new StyleRun[1];
*             styleRuns[0] = styleRun;
*         // assume styleRuns[styleRunCount-1].limit>=length
*         int width = getTextWidth(text, 0, length, styleRuns, styleRunCount);
*         if (width <= lineWidth) {
*             // everything fits onto one line
*             // prepare rendering a new line from either left or right
*             startLine(paraLevel, width);
*             renderLine(para, text, 0, length, styleRuns, styleRunCount);
*             // we need to render several lines
*             Bidi line = new Bidi(length, 0);
*             int start = 0, limit;
*             int styleRunStart = 0, styleRunLimit;
*                 styleRunLimit = styleRunCount;
*                 width = getLineBreak(text, new Bounds(start, limit),
*                                      new Bounds(styleRunStart, styleRunLimit));
*                     line = para.setLine(start, limit);
*                 } catch (Exception e) {
*                     e.printStackTrace();
*                 // prepare rendering a new line
*                 // from either left or right
*                 startLine(paraLevel, width);
*                 if (styleRunStart > 0) {
*                     int newRunCount = styleRuns.length - styleRunStart;
*                     StyleRun[] newRuns = new StyleRun[newRunCount];
*                     System.arraycopy(styleRuns, styleRunStart, newRuns, 0,
*                     renderLine(line, text, start, limit, newRuns,
*                                styleRunLimit - styleRunStart);
*                     renderLine(line, text, start, limit, styleRuns,
*                                styleRunLimit - styleRunStart);
*                 if (limit == length) {
*                 styleRunStart = styleRunLimit - 1;
*                 if (start >= styleRuns[styleRunStart].limit) {
*     public static void main(String[] args)
*         renderParagraph("Some Latin text...", Bidi.LTR, null, 0, 80);
*         renderParagraph("Some Hebrew text...", Bidi.RTL, null, 0, 60);
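The intersection logic of `renderDirectionalRun()` can be tried without the Bidi class. This sketch (the `StyleRunIntersection` class and its `int[]` piece representation are hypothetical) returns the pieces instead of rendering them, in the left-to-right order in which they would be drawn. It skips style runs that start at or after the end of the directional run, so no empty pieces are produced.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Standalone version of the loop in renderDirectionalRun(). Style run i covers
// [styleLimits[i-1], styleLimits[i]) (the first one starts at 0), and the last
// limit is assumed to be >= the text length. Pieces of the directional run
// [start, limit) are returned in left-to-right rendering order as
// {pieceStart, pieceLimit, styleIndex}; text inside a piece stays in logical order.
class StyleRunIntersection {
    static List<int[]> pieces(int start, int limit, boolean rtl, int[] styleLimits) {
        List<int[]> out = new ArrayList<>();
        if (!rtl) {
            // LTR: the leftmost piece is the logically first one, so walk forward
            for (int i = 0; i < styleLimits.length && start < limit; i++) {
                int styleLimit = Math.min(styleLimits[i], limit);
                if (start < styleLimit) {
                    out.add(new int[] {start, styleLimit, i});
                    start = styleLimit;
                }
            }
        } else {
            // RTL: the leftmost piece is the logically last one, so walk backward
            for (int i = styleLimits.length - 1; i >= 0 && start < limit; i--) {
                int styleStart = Math.max(i > 0 ? styleLimits[i - 1] : 0, start);
                if (styleStart < limit) { // skip style runs starting at or after limit
                    out.add(new int[] {styleStart, limit, i});
                    limit = styleStart;
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int[] styleLimits = {3, 6, 10}; // styles 0, 1, 2 cover [0,3), [3,6), [6,10)
        for (int[] p : pieces(2, 9, false, styleLimits)) System.out.print(Arrays.toString(p));
        System.out.println(); // [2, 3, 0][3, 6, 1][6, 9, 2]
        for (int[] p : pieces(2, 9, true, styleLimits)) System.out.print(Arrays.toString(p));
        System.out.println(); // [6, 9, 2][3, 6, 1][2, 3, 0]
    }
}
```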
* General implementation notes:
* Throughout the implementation, there are comments like (W2) that refer to
* rules of the BiDi algorithm in its version 5, in this example to the second
* rule of the resolution of weak types.
* For handling surrogate pairs, where two UChars form one "abstract" (or UTF-32)
* character according to UTF-16, the second UChar is assigned the directional
* property of the entire character, while the first one gets BN (boundary
* neutral), a type that is ignored by most of the algorithm according to
* rule (X9) and the implementation suggestions of the BiDi algorithm.
* Later, adjustWSLevels() will set the level for each BN to that of the
* following character (UChar), which results in surrogate pairs getting the
* same level on each of their surrogates.
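A sketch of the surrogate-pair convention under two assumptions: the JDK's `Character.getDirectionality(int)` stands in for ICU's own Bidi class data, and the hypothetical `copyLevelsToBN()` models only the BN handling this note attributes to `adjustWSLevels()`, not the rest of that method.

```java
// Sketch of the surrogate-pair convention, using the JDK's
// Character.getDirectionality(int) in place of ICU's property data.
class SurrogateDirProps {
    static final byte BN = Character.DIRECTIONALITY_BOUNDARY_NEUTRAL;

    // One entry per UTF-16 code unit: the lead surrogate of a pair gets BN and
    // the trail surrogate gets the directional class of the whole code point.
    static byte[] dirProps(String text) {
        byte[] props = new byte[text.length()];
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            int units = Character.charCount(cp);
            if (units == 2) {
                props[i] = BN;
            }
            props[i + units - 1] = Character.getDirectionality(cp);
            i += units;
        }
        return props;
    }

    // Hypothetical helper: every BN takes the level of the following code unit
    // (walking backward so that a sequence of BNs is handled too).
    static void copyLevelsToBN(byte[] props, byte[] levels) {
        for (int i = levels.length - 2; i >= 0; i--) {
            if (props[i] == BN) {
                levels[i] = levels[i + 1];
            }
        }
    }

    public static void main(String[] args) {
        // 'a', U+10900 PHOENICIAN LETTER ALF (class R, a surrogate pair), 'b'
        String text = "a" + new String(Character.toChars(0x10900)) + "b";
        byte[] props = dirProps(text);
        System.out.println(java.util.Arrays.toString(props)); // [0, 18, 1, 0]: L, BN, R, L
        byte[] levels = {0, 0, 1, 0}; // the BN's level is not yet meaningful
        copyLevelsToBN(props, levels);
        System.out.println(java.util.Arrays.toString(levels)); // [0, 1, 1, 0]
    }
}
```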
* In a UTF-8 implementation, the same thing could be done: the last byte of
* a multi-byte sequence would get the "real" property, while all previous
* bytes of that sequence would get BN.
* It is not possible to assign all those parts of a character the same real
* property because this would fail in the resolution of weak types with rules
* that look at immediately surrounding types.
* As a related topic, this implementation does not remove Boundary Neutral
* types from the input, but ignores them wherever this is relevant.
* For example, the loop for the resolution of the weak types reads
* types until it finds a non-BN.
* Also, explicit embedding codes are neither changed into BN nor removed.
* They are only treated the same way real BNs are.
* As stated before, adjustWSLevels() takes care of them at the end.
* For the purpose of conformance, the levels of all these codes
* Note that this implementation never modifies the dirProps
* after the initial setup, except for FSI which is changed to either
* LRI or RLI in getDirProps(), and paired brackets which may be changed
* to L or R according to N0.
* In this implementation, the resolution of weak types (Wn),
* neutrals (Nn), and the assignment of the resolved level (In)
* are all done in one single loop, in resolveImplicitLevels().
* Changes of dirProp values are done on the fly, without writing
* them back to the dirProps array.
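For reference, rules I1 and I2 of UAX #9, which the (In) part of that loop applies, reduce to a small function once each character's type has been resolved to L, R, EN or AN. This is a standalone sketch, not the combined loop in `resolveImplicitLevels()`; the `ImplicitLevels` class is hypothetical and its constants reuse the numeric values of `L`, `R`, `EN` and `AN` in this class.

```java
// Rules I1 and I2 of UAX #9: after weak and neutral resolution every character
// has type L, R, EN or AN, and its final level follows from that type and the
// parity of its embedding level.
class ImplicitLevels {
    static final byte L = 0, R = 1, EN = 2, AN = 5; // same values as in this class

    static int resolvedLevel(int embeddingLevel, byte type) {
        if ((embeddingLevel & 1) == 0) {
            // I1: at an even level, R goes up one level, AN and EN go up two
            if (type == R) return embeddingLevel + 1;
            if (type == AN || type == EN) return embeddingLevel + 2;
        } else {
            // I2: at an odd level, L, EN and AN go up one level
            if (type == L || type == EN || type == AN) return embeddingLevel + 1;
        }
        return embeddingLevel;
    }

    public static void main(String[] args) {
        System.out.println(resolvedLevel(0, R));  // 1
        System.out.println(resolvedLevel(0, EN)); // 2
        System.out.println(resolvedLevel(1, L));  // 2
        System.out.println(resolvedLevel(1, R));  // 1
    }
}
```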
* This implementation contains code that allows bypassing steps of the
* algorithm that are not needed on the specific paragraph
* in order to speed up the most common cases considerably,
* like text that is entirely LTR, or RTL text without numbers.
* Most of this is done by setting a bit for each directional property
* in a flags variable and later checking whether there are
* any LTR characters or any RTL characters, or both, whether
* there are any explicit embedding codes, etc.
* If the (Xn) steps are performed, then the flags are re-evaluated,
* because they will then not contain the embedding codes anymore
* and will be adjusted for override codes, so that subsequently
* more bypassing may be possible than what the initial flags suggested.
* If the text is not mixed-directional, then the
* algorithm steps for the weak type resolution are not performed,
* and all levels are set to the paragraph level.
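The bypass idea can be sketched with one bit per directional class. The masks here are simplified illustrations chosen for this example and are not the masks Bidi uses. The reasoning behind them, from UAX #9: without explicit codes, a paragraph containing none of R, AL and AN stays entirely at an even paragraph level (EN becomes L by W7), and one containing none of L, EN and AN stays entirely at an odd one. `DirPropFlags` and its methods are hypothetical.

```java
// Sketch of the flags idea: one bit per directional class present in the
// paragraph, so that a whole phase can be skipped after a single mask test.
// Class values match the constants in this file; the masks are simplified.
class DirPropFlags {
    static final byte L = 0, R = 1, EN = 2, AN = 5, WS = 9, AL = 13;
    static final byte LRE = 11, LRO = 12, RLE = 14, RLO = 15, PDF = 16;
    static final byte FSI = 19, LRI = 20, RLI = 21, PDI = 22;

    static int flag(byte dirProp) {
        return 1 << dirProp;
    }

    static final int MASK_EXPLICIT = flag(LRE) | flag(LRO) | flag(RLE) | flag(RLO)
            | flag(PDF) | flag(FSI) | flag(LRI) | flag(RLI) | flag(PDI);
    // If none of these occur, every character keeps an even (LTR) paragraph level.
    static final int MASK_ABOVE_EVEN = flag(R) | flag(AL) | flag(AN);
    // If none of these occur, every character keeps an odd (RTL) paragraph level.
    static final int MASK_ABOVE_ODD = flag(L) | flag(EN) | flag(AN);

    static int flagsOf(byte[] dirProps) {
        int flags = 0;
        for (byte p : dirProps) {
            flags |= flag(p);
        }
        return flags;
    }

    // True if the resolution steps can be bypassed and every character gets paraLevel.
    static boolean allAtParaLevel(int flags, int paraLevel) {
        if ((flags & MASK_EXPLICIT) != 0) {
            return false;
        }
        int mask = (paraLevel & 1) == 0 ? MASK_ABOVE_EVEN : MASK_ABOVE_ODD;
        return (flags & mask) == 0;
    }

    public static void main(String[] args) {
        int ltrText = flagsOf(new byte[] {L, WS, EN, L});
        System.out.println(allAtParaLevel(ltrText, 0)); // true
        System.out.println(allAtParaLevel(ltrText, 1)); // false
        int withEmbedding = flagsOf(new byte[] {L, RLE, L, PDF});
        System.out.println(allAtParaLevel(withEmbedding, 0)); // false
    }
}
```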
* If there are no explicit embedding codes, then the (Xn) steps
* If embedding levels are supplied as a parameter, then all
* explicit embedding codes are ignored, and the (Xn) steps
* White Space types could get the level of the run they belong to,
* and are checked with a test of (flags&MASK_EMBEDDING) to
* determine whether the paragraph direction should be included in
* the flags variable.
* If there are no White Space types in the paragraph, then
* (L1) is not necessary in adjustWSLevels().
int pos; /* position in text */
int flag; /* flag for LRM/RLM, before/after */
static class InsertPoints {
Point[] points = new Point[0];
static class Opening {
int position; /* position of opening bracket */
int match; /* matching char or -position of closing bracket */
int contextPos; /* position of last strong char found before opening */
short flags; /* bits for L or R/AL found within the pair */
byte contextDir; /* L or R according to last strong char before opening */
byte filler; /* to complete a nice multiple of 4 bytes */
static class IsoRun {
int lastStrongPos; /* position of last strong char found in this run */
int contextPos; /* position of last char defining context */
short start; /* index of first opening entry for this run */
short limit; /* index after last opening entry for this run */
byte level; /* level of this run */
byte lastStrong; /* bidi class of last strong char found in this run */
byte contextDir; /* L or R to use as context for following openings */
byte filler; /* to complete a nice multiple of 4 bytes */
static class BracketData {
Opening[] openings = new Opening[SIMPLE_OPENINGS_SIZE];
int isoRunLast; /* index of last used entry */
/* array of nested isolated sequence entries; can never exceed MAX_EXPLICIT_LEVEL
+ 1 for index 0, + 1 for before the first isolated sequence */
IsoRun[] isoRuns = new IsoRun[MAX_EXPLICIT_LEVEL+2];
boolean isNumbersSpecial; /* reordering mode for NUMBERS_SPECIAL */
static class Isolate {
/** Paragraph level setting<p>
* Constant indicating that the base direction depends on the first strong
* directional character in the text according to the Unicode Bidirectional
* Algorithm. If no strong directional character is present,
* then the paragraph level is set to 0 (left-to-right).<p>
* If this value is used in conjunction with reordering modes
* <code>REORDER_INVERSE_LIKE_DIRECT</code> or
* <code>REORDER_INVERSE_FOR_NUMBERS_SPECIAL</code>, the text to reorder
* is assumed to be visual LTR, and the text after reordering is required
* to be the corresponding logical string with appropriate contextual
* direction. The direction of the result string will be RTL if either
* the rightmost or leftmost strong character of the source text is RTL
* or Arabic Letter; otherwise the direction will be LTR.<p>
* If reordering option <code>OPTION_INSERT_MARKS</code> is set, an RLM may
* be added at the beginning of the result string to ensure a round trip
* (i.e., that the result string, when reordered back to visual, will produce
* the original source text).
* @see #REORDER_INVERSE_LIKE_DIRECT
* @see #REORDER_INVERSE_FOR_NUMBERS_SPECIAL
public static final byte LEVEL_DEFAULT_LTR = (byte)0x7e;
/** Paragraph level setting<p>
* Constant indicating that the base direction depends on the first strong
* directional character in the text according to the Unicode Bidirectional
* Algorithm. If no strong directional character is present,
* then the paragraph level is set to 1 (right-to-left).<p>
* If this value is used in conjunction with reordering modes
* <code>REORDER_INVERSE_LIKE_DIRECT</code> or
* <code>REORDER_INVERSE_FOR_NUMBERS_SPECIAL</code>, the text to reorder
* is assumed to be visual LTR, and the text after reordering is required
* to be the corresponding logical string with appropriate contextual
* direction. The direction of the result string will be RTL if either
* the rightmost or leftmost strong character of the source text is RTL
* or Arabic Letter, or if the text contains no strong character;
* the direction will be LTR otherwise.<p>
* If reordering option <code>OPTION_INSERT_MARKS</code> is set, an RLM may
* be added at the beginning of the result string to ensure a round trip
* (i.e., that the result string, when reordered back to visual, will produce
* the original source text).
* @see #REORDER_INVERSE_LIKE_DIRECT
* @see #REORDER_INVERSE_FOR_NUMBERS_SPECIAL
public static final byte LEVEL_DEFAULT_RTL = (byte)0x7f;
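How the two default values resolve for a single paragraph can be sketched as follows (rules P2 and P3 of UAX #9, regular logical-to-visual case only). `DefaultParaLevel` is hypothetical, and the JDK's `Character.getDirectionality(int)` replaces ICU's property data. For brevity the sketch does not skip text between isolate initiators and their matching PDI, as P2 requires.

```java
// Sketch of how LEVEL_DEFAULT_LTR and LEVEL_DEFAULT_RTL resolve for one paragraph:
// the first strong character decides (L gives 0, R or AL gives 1); without one,
// the parity of the pseudo-level is the fallback. Isolates are not skipped here.
class DefaultParaLevel {
    static final byte LEVEL_DEFAULT_LTR = (byte) 0x7e; // values from this class
    static final byte LEVEL_DEFAULT_RTL = (byte) 0x7f;

    static int resolveParaLevel(String paragraph, byte paraLevel) {
        if (paraLevel != LEVEL_DEFAULT_LTR && paraLevel != LEVEL_DEFAULT_RTL) {
            return paraLevel; // an explicit level is used unchanged
        }
        for (int i = 0; i < paragraph.length(); ) {
            int cp = paragraph.codePointAt(i);
            byte dir = Character.getDirectionality(cp);
            if (dir == Character.DIRECTIONALITY_LEFT_TO_RIGHT) {
                return 0;
            }
            if (dir == Character.DIRECTIONALITY_RIGHT_TO_LEFT
                    || dir == Character.DIRECTIONALITY_RIGHT_TO_LEFT_ARABIC) {
                return 1;
            }
            i += Character.charCount(cp);
        }
        // no strong character: 0x7e is even (LTR), 0x7f is odd (RTL)
        return paraLevel & 1;
    }

    public static void main(String[] args) {
        System.out.println(resolveParaLevel("123 abc", LEVEL_DEFAULT_RTL));    // 0
        System.out.println(resolveParaLevel("123 \u05D0", LEVEL_DEFAULT_LTR)); // 1
        System.out.println(resolveParaLevel("123", LEVEL_DEFAULT_LTR));        // 0
        System.out.println(resolveParaLevel("123", LEVEL_DEFAULT_RTL));        // 1
    }
}
```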
* Maximum explicit embedding level.
* (The maximum resolved level can be up to <code>MAX_EXPLICIT_LEVEL+1</code>.)
public static final byte MAX_EXPLICIT_LEVEL = 125;
* Bit flag for level input.
* Overrides directional properties.
public static final byte LEVEL_OVERRIDE = (byte)0x80;
* Special value which can be returned by the mapping methods when a
* logical index has no corresponding visual index or vice versa. This may
* happen for the logical-to-visual mapping of a Bidi control when option
* <code>OPTION_REMOVE_CONTROLS</code> is
* specified. This can also happen for the visual-to-logical mapping of a
* Bidi mark (LRM or RLM) inserted by option
* <code>OPTION_INSERT_MARKS</code>.
* @see #getVisualIndex
* @see #getLogicalIndex
* @see #getLogicalMap
* @see #OPTION_INSERT_MARKS
* @see #OPTION_REMOVE_CONTROLS
public static final int MAP_NOWHERE = -1;
* Left-to-right text.
* <li>As a return value for <code>getDirection()</code>, it means
* that the source string contains no right-to-left characters, or
* that the source string is empty and the paragraph level is even.
* <li>As a return value for <code>getBaseDirection()</code>, it
* means that the first strong character of the source string has
* a left-to-right direction.
public static final byte LTR = 0;
* Right-to-left text.
* <li>As a return value for <code>getDirection()</code>, it means
* that the source string contains no left-to-right characters, or
* that the source string is empty and the paragraph level is odd.
* <li>As a return value for <code>getBaseDirection()</code>, it
* means that the first strong character of the source string has
* a right-to-left direction.
public static final byte RTL = 1;
* Mixed-directional text.
* <p>As a return value for <code>getDirection()</code>, it means
* that the source string contains both left-to-right and
* right-to-left characters.
public static final byte MIXED = 2;
* No strongly directional text.
* <p>As a return value for <code>getBaseDirection()</code>, it means
* that the source string is missing or empty, or contains neither
* left-to-right nor right-to-left characters.
public static final byte NEUTRAL = 3;
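The `getDirection()` values can be approximated from resolved levels, as in this sketch. It is not necessarily how Bidi computes the value, and `DirectionFromLevels` is hypothetical; its constants reuse the values of `LTR`, `RTL` and `MIXED` defined in this class.

```java
// Approximates the getDirection() semantics from resolved levels: LTR if every
// level is even, RTL if every level is odd, MIXED otherwise; for empty text the
// parity of the paragraph level decides.
class DirectionFromLevels {
    static final byte LTR = 0, RTL = 1, MIXED = 2; // values as defined in this class

    static byte direction(byte[] levels, int paraLevel) {
        if (levels.length == 0) {
            return (paraLevel & 1) == 0 ? LTR : RTL;
        }
        boolean sawEven = false, sawOdd = false;
        for (byte level : levels) {
            if ((level & 1) == 0) {
                sawEven = true;
            } else {
                sawOdd = true;
            }
        }
        if (sawEven && sawOdd) {
            return MIXED;
        }
        return sawOdd ? RTL : LTR;
    }

    public static void main(String[] args) {
        System.out.println(direction(new byte[] {0, 0, 2}, 0)); // 0 (LTR)
        System.out.println(direction(new byte[] {1, 1}, 1));    // 1 (RTL)
        System.out.println(direction(new byte[] {0, 1, 0}, 0)); // 2 (MIXED)
        System.out.println(direction(new byte[0], 1));          // 1 (RTL)
    }
}
```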
* option bit for writeReordered():
* keep combining characters after their base characters in RTL runs
* @see #writeReordered
public static final short KEEP_BASE_COMBINING = 1;
* option bit for writeReordered():
* replace characters with the "mirrored" property in RTL runs
* by their mirror-image mappings
* @see #writeReordered
public static final short DO_MIRRORING = 2;
* option bit for writeReordered():
* surround the run with LRMs if necessary;
* this is part of the approximate "inverse Bidi" algorithm
* <p>This option does not imply corresponding adjustment of the index
* @see #writeReordered
public static final short INSERT_LRM_FOR_NUMERIC = 4;
* option bit for writeReordered():
* remove Bidi control characters
* (this does not affect INSERT_LRM_FOR_NUMERIC)
* <p>This option does not imply corresponding adjustment of the index
* @see #writeReordered
* @see #INSERT_LRM_FOR_NUMERIC
public static final short REMOVE_BIDI_CONTROLS = 8;
* option bit for writeReordered():
* write the output in reverse order
* <p>This has the same effect as calling <code>writeReordered()</code>
* first without this option, and then calling
* <code>writeReverse()</code> without mirroring.
* Doing this in the same step is faster and avoids a temporary buffer.
* An example of using this option is output to a character terminal that
* is designed for RTL scripts and stores text in reverse order.</p>
* @see #writeReordered
public static final short OUTPUT_REVERSE = 16;
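The reversal step on its own can be sketched with the JDK's `StringBuilder.reverse()`, which treats a surrogate pair as a single character. This hypothetical `ReverseOutput` helper does not model `KEEP_BASE_COMBINING` or mirroring, and it is not the implementation of `writeReverse()`.

```java
// The reversal part of OUTPUT_REVERSE in isolation: text already in visual
// LTR order is written back to front. StringBuilder.reverse() treats a
// surrogate pair as one character, so supplementary characters stay intact.
class ReverseOutput {
    static String reverse(String visualText) {
        return new StringBuilder(visualText).reverse().toString();
    }

    public static void main(String[] args) {
        System.out.println(reverse("abc 123")); // 321 cba
        String withPair = "a" + new String(Character.toChars(0x1D400)) + "b";
        // the pair in the middle is still a valid surrogate pair after reversal
        System.out.println(reverse(withPair).codePointAt(1) == 0x1D400); // true
    }
}
```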
/** Reordering mode: Regular Logical to Visual Bidi algorithm according to Unicode.
* @see #setReorderingMode
public static final short REORDER_DEFAULT = 0;
/** Reordering mode: Logical to Visual algorithm which handles numbers in
* a way which mimics the behavior of Windows XP.
* @see #setReorderingMode
public static final short REORDER_NUMBERS_SPECIAL = 1;
/** Reordering mode: Logical to Visual algorithm grouping numbers with
* adjacent R characters (reversible algorithm).
* @see #setReorderingMode
public static final short REORDER_GROUP_NUMBERS_WITH_R = 2;
/** Reordering mode: Reorder runs only to transform a Logical LTR string
* to the logical RTL string with the same display, or vice versa.<br>
* If this mode is set together with option
* <code>OPTION_INSERT_MARKS</code>, some Bidi controls in the source
* text may be removed and other controls may be added to produce the
* minimum combination which has the required display.
* @see #OPTION_INSERT_MARKS
* @see #setReorderingMode
public static final short REORDER_RUNS_ONLY = 3;
/** Reordering mode: Visual to Logical algorithm which handles numbers
* like L (same algorithm as selected by <code>setInverse(true)</code>).
* @see #setReorderingMode
public static final short REORDER_INVERSE_NUMBERS_AS_L = 4;
/** Reordering mode: Visual to Logical algorithm equivalent to the regular
* Logical to Visual algorithm.
* @see #setReorderingMode
public static final short REORDER_INVERSE_LIKE_DIRECT = 5;
/** Reordering mode: Inverse Bidi (Visual to Logical) algorithm for the
* <code>REORDER_NUMBERS_SPECIAL</code> Bidi algorithm.
* @see #setReorderingMode
public static final short REORDER_INVERSE_FOR_NUMBERS_SPECIAL = 6;
/* Number of values for reordering mode. */
static final short REORDER_COUNT = 7;
/* Reordering mode values must be ordered so that all the regular logical to
* visual modes come first, and all inverse Bidi modes come last.
static final short REORDER_LAST_LOGICAL_TO_VISUAL =
REORDER_NUMBERS_SPECIAL;
* Option value for <code>setReorderingOptions</code>:
* disable all the options which can be set with this method
* @see #setReorderingOptions
public static final int OPTION_DEFAULT = 0;
* Option bit for <code>setReorderingOptions</code>:
* insert Bidi marks (LRM or RLM) when needed to ensure the correct result of
* a reordering to a Logical order
* <p>This option must be set or reset before calling
* <code>setPara</code>.</p>
* <p>This option is significant only with reordering modes which generate
* a result with Logical order, specifically:</p>
* <li><code>REORDER_RUNS_ONLY</code></li>
* <li><code>REORDER_INVERSE_NUMBERS_AS_L</code></li>
* <li><code>REORDER_INVERSE_LIKE_DIRECT</code></li>
* <li><code>REORDER_INVERSE_FOR_NUMBERS_SPECIAL</code></li>
* <p>If this option is set in conjunction with reordering mode
* <code>REORDER_INVERSE_NUMBERS_AS_L</code> or with calling
* <code>setInverse(true)</code>, it implies option
* <code>INSERT_LRM_FOR_NUMERIC</code> in calls to method
* <code>writeReordered()</code>.</p>
* <p>For other reordering modes, a minimum number of LRM or RLM characters
* will be added to the source text after reordering it so as to ensure
* a round trip, i.e., when applying the inverse reordering mode on the
* resulting logical text with removal of Bidi marks
* (option <code>OPTION_REMOVE_CONTROLS</code> set before calling
* <code>setPara()</code> or option
* <code>REMOVE_BIDI_CONTROLS</code> in
* <code>writeReordered</code>), the result will be identical to the
* source text in the first transformation.
* <p>This option will be ignored if specified together with option
* <code>OPTION_REMOVE_CONTROLS</code>. It inhibits option
* <code>REMOVE_BIDI_CONTROLS</code> in calls to method
* <code>writeReordered()</code> and it implies option
* <code>INSERT_LRM_FOR_NUMERIC</code> in calls to method
* <code>writeReordered()</code> if the reordering mode is
* <code>REORDER_INVERSE_NUMBERS_AS_L</code>.</p>
* @see #setReorderingMode
* @see #setReorderingOptions
* @see #INSERT_LRM_FOR_NUMERIC
* @see #REMOVE_BIDI_CONTROLS
* @see #OPTION_REMOVE_CONTROLS
* @see #REORDER_RUNS_ONLY
* @see #REORDER_INVERSE_NUMBERS_AS_L
* @see #REORDER_INVERSE_LIKE_DIRECT
* @see #REORDER_INVERSE_FOR_NUMBERS_SPECIAL
public static final int OPTION_INSERT_MARKS = 1;
* Option bit for <code>setReorderingOptions</code>:
* remove Bidi control characters
* <p>This option must be set or reset before calling
* <code>setPara</code>.</p>
* <p>This option nullifies option
* <code>OPTION_INSERT_MARKS</code>. It inhibits option
* <code>INSERT_LRM_FOR_NUMERIC</code> in calls to method
* <code>writeReordered()</code> and it implies option
* <code>REMOVE_BIDI_CONTROLS</code> in calls to that method.</p>
* @see #setReorderingMode
* @see #setReorderingOptions
* @see #OPTION_INSERT_MARKS
* @see #INSERT_LRM_FOR_NUMERIC
* @see #REMOVE_BIDI_CONTROLS
public static final int OPTION_REMOVE_CONTROLS = 2;
* Option bit for <code>setReorderingOptions</code>:
* process the output as part of a stream to be continued
* <p>This option must be set or reset before calling
* <code>setPara</code>.</p>
* <p>This option specifies that the caller is interested in processing
* a large text object in parts. The results of the successive calls are
* expected to be concatenated by the caller. Only the call for the last
* part will have this option bit off.</p>
* <p>When this option bit is on, <code>setPara()</code> may process
* less than the full source text in order to truncate the text at a
* meaningful boundary. The caller should call
* <code>getProcessedLength()</code> immediately after calling
* <code>setPara()</code> in order to determine how much of the source
* text has been processed. Source text beyond that length should be
* resubmitted in following calls to <code>setPara</code>. The
* processed length may be less than the length of the source text if a
* character preceding the last character of the source text constitutes a
* reasonable boundary (like a block separator) for text to be continued.<br>
* If the last character of the source text constitutes a reasonable
* boundary, the whole text will be processed at once.<br>
* If no such reasonable boundary exists anywhere in the source text,
* the processed length will be zero.<br>
* The caller should check for such an occurrence and do one of the following:
* <ul><li>submit a larger amount of text with a better chance to include
* a reasonable boundary.</li>
* <li>resubmit the same text after turning off option
* <code>OPTION_STREAMING</code>.</li></ul>
* In all cases, this option should be turned off before processing the last
* part of the text.</p>
* <p>When the <code>OPTION_STREAMING</code> option is used, it is
* recommended to call <code>orderParagraphsLTR(true)</code> before calling
* <code>setPara()</code> so that later paragraphs may be concatenated to
* previous paragraphs on the right.
* @see #setReorderingMode
* @see #setReorderingOptions
* @see #getProcessedLength
public static final int OPTION_STREAMING = 4;
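The caller-side protocol described above can be sketched as a loop. `processPart()` is a hypothetical stand-in for calling `setPara()` and then `getProcessedLength()`, modeled only on the rules in this comment, and the block-separator test covers just a few class B characters. When no boundary is found, the loop takes the first recovery option, submitting a larger amount of text.

```java
import java.util.ArrayList;
import java.util.List;

// Caller-side loop for OPTION_STREAMING. processPart() is a hypothetical stand-in
// for setPara() followed by getProcessedLength(): with streaming on, it processes
// up to and including the last block separator of the part (all of it if the part
// ends with one, nothing if it contains none); with streaming off, everything.
class StreamingLoop {
    // A few characters of Bidi class B; not a complete list.
    static boolean isBlockSeparator(char c) {
        return c == '\n' || c == '\r' || c == '\u2029';
    }

    static int processPart(String part, boolean streaming) {
        if (!streaming) {
            return part.length();
        }
        for (int i = part.length() - 1; i >= 0; i--) {
            if (isBlockSeparator(part.charAt(i))) {
                return i + 1;
            }
        }
        return 0;
    }

    // Feeds text in parts of about chunkSize chars; returns what each call processed.
    static List<String> process(String text, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<String> processed = new ArrayList<>();
        int pos = 0;
        int size = chunkSize;
        while (pos < text.length()) {
            int end = Math.min(text.length(), pos + size);
            boolean lastPart = end == text.length();
            String part = text.substring(pos, end);
            int done = processPart(part, !lastPart); // streaming off for the last part
            if (done == 0) {
                size *= 2; // no boundary found: resubmit a larger amount of text
                continue;
            }
            processed.add(part.substring(0, done));
            pos += done; // unprocessed text is resubmitted in the next call
            size = chunkSize;
        }
        return processed;
    }

    public static void main(String[] args) {
        System.out.println(process("ab\ncd\nef", 4).size());  // 3: "ab\n", "cd\n", "ef"
        System.out.println(process("abcdefgh\nx", 3).size()); // 1: the enlarged part reaches the end
    }
}
```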
* Comparing the description of the Bidi algorithm with this implementation
* is easier when the code uses the same names for the Bidi types as the description does.
* See UCharacterDirection
static final byte L = UCharacterDirection.LEFT_TO_RIGHT; /* 0 */
static final byte R = UCharacterDirection.RIGHT_TO_LEFT; /* 1 */
static final byte EN = UCharacterDirection.EUROPEAN_NUMBER; /* 2 */
static final byte ES = UCharacterDirection.EUROPEAN_NUMBER_SEPARATOR; /* 3 */
static final byte ET = UCharacterDirection.EUROPEAN_NUMBER_TERMINATOR; /* 4 */
static final byte AN = UCharacterDirection.ARABIC_NUMBER; /* 5 */
static final byte CS = UCharacterDirection.COMMON_NUMBER_SEPARATOR; /* 6 */
static final byte B = UCharacterDirection.BLOCK_SEPARATOR; /* 7 */
static final byte S = UCharacterDirection.SEGMENT_SEPARATOR; /* 8 */
static final byte WS = UCharacterDirection.WHITE_SPACE_NEUTRAL; /* 9 */
static final byte ON = UCharacterDirection.OTHER_NEUTRAL; /* 10 */
static final byte LRE = UCharacterDirection.LEFT_TO_RIGHT_EMBEDDING; /* 11 */
static final byte LRO = UCharacterDirection.LEFT_TO_RIGHT_OVERRIDE; /* 12 */
static final byte AL = UCharacterDirection.RIGHT_TO_LEFT_ARABIC; /* 13 */
static final byte RLE = UCharacterDirection.RIGHT_TO_LEFT_EMBEDDING; /* 14 */
static final byte RLO = UCharacterDirection.RIGHT_TO_LEFT_OVERRIDE; /* 15 */
static final byte PDF = UCharacterDirection.POP_DIRECTIONAL_FORMAT; /* 16 */
static final byte NSM = UCharacterDirection.DIR_NON_SPACING_MARK; /* 17 */
static final byte BN = UCharacterDirection.BOUNDARY_NEUTRAL; /* 18 */
static final byte FSI = UCharacterDirection.FIRST_STRONG_ISOLATE; /* 19 */
static final byte LRI = UCharacterDirection.LEFT_TO_RIGHT_ISOLATE; /* 20 */
static final byte RLI = UCharacterDirection.RIGHT_TO_LEFT_ISOLATE; /* 21 */
static final byte PDI = UCharacterDirection.POP_DIRECTIONAL_ISOLATE; /* 22 */
static final byte ENL = PDI + 1; /* 23 */
static final byte ENR = ENL + 1; /* 24 */
* Value returned by <code>BidiClassifier</code> when there is no need to
* override the standard Bidi class for a given code point.
* @see BidiClassifier
public static final int CLASS_DEFAULT = UCharacterDirection
986 .CHAR_DIRECTION_COUNT;
988 /* number of paras entries allocated initially */
989 static final int SIMPLE_PARAS_SIZE = 10;
990 /* number of isolate run entries for paired brackets allocated initially */
991 static final int SIMPLE_OPENINGS_SIZE = 20;
993 private static final char CR = '\r';
994 private static final char LF = '\n';
996 static final int LRM_BEFORE = 1;
997 static final int LRM_AFTER = 2;
998 static final int RLM_BEFORE = 4;
999 static final int RLM_AFTER = 8;
1001 /* flags for Opening.flags */
1002 static final byte FOUND_L = (byte)DirPropFlag(L);
1003 static final byte FOUND_R = (byte)DirPropFlag(R);
1006 * The following bit is ORed to the property of directional control
1007 * characters which are ignored: unmatched PDF or PDI; LRx, RLx or FSI
1008 * which would exceed the maximum explicit bidi level.
1010 static final int IGNORE_CC = 0x40;
1013 * The following bit is used for the directional isolate status.
1014 * Stack entries corresponding to isolate sequences are greater than ISOLATE.
1016 static final int ISOLATE = 0x0100;
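/*
Illustrative sketch, not part of ICU: how a marker bit such as IGNORE_CC can
be ORed into a stored directional property and stripped again, as
PureDirProp() below does. This works because every class value used in this
file (0 up to ENR, 24) lies below bit 0x40. The class and method names are
hypothetical; the constant values are copied from this file.

```java
final class IgnoreBitSketch {
    static final int IGNORE_CC = 0x40; // same value as Bidi.IGNORE_CC
    static final byte PDF = 16;        // UCharacterDirection.POP_DIRECTIONAL_FORMAT

    // Tags a stored property as an ignored control (e.g. an unmatched PDF).
    static byte markIgnored(byte prop) {
        return (byte) (prop | IGNORE_CC);
    }

    // Same computation as PureDirProp(): clear the IGNORE_CC bit.
    static byte pure(byte prop) {
        return (byte) (prop & ~IGNORE_CC);
    }

    // True if the property carries the IGNORE_CC marker.
    static boolean isIgnored(byte prop) {
        return (prop & IGNORE_CC) != 0;
    }
}
```

For example, an ignored PDF (16) is stored as 0x50, and stripping the bit
yields 16 again, so code that only needs the Bidi class can call
PureDirProp() without caring whether the control was ignored.
*/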
1020 * reference to parent paragraph object (reference to self if this object is
1021 * a paragraph object); set to null in a newly opened object; set to a
1022 * real value after a successful execution of setPara or setLine
1026 final UBiDiProps bdp;
1028 /* character array representing the current text */
1031 /* length of the current text */
1034 /* if the option OPTION_STREAMING is set, this is the length of
1035 * text actually processed by <code>setPara</code>, which may be shorter
1036 * than the original length. Otherwise, it is identical to the original
1041 /* if option OPTION_REMOVE_CONTROLS is set, and/or Bidi
1042 * marks are allowed to be inserted in one of the reordering modes, the
1043 * length of the result string may be different from the processed length.
1047 /* indicators for whether memory may be allocated after construction */
1048 boolean mayAllocateText;
1049 boolean mayAllocateRuns;
1051 /* arrays with one value per text-character */
1052 byte[] dirPropsMemory = new byte[1];
1053 byte[] levelsMemory = new byte[1];
1057 /* are we performing an approximation of the "inverse Bidi" algorithm? */
1060 /* are we using the basic algorithm or its variation? */
1063 /* bitmask for reordering options */
1064 int reorderingOptions;
1066 /* must block separators receive level 0? */
1067 boolean orderParagraphsLTR;
1069 /* the paragraph level */
1071 /* original paraLevel when contextual */
1072 /* must be one of DEFAULT_xxx or 0 if not contextual */
1073 byte defaultParaLevel;
1079 /* the following is set in setPara, used in processPropertySeq */
1081 ImpTabPair impTabPair; /* reference to levels state table pair */
1082 /* the overall paragraph or line directionality */
1085 /* flags is a bit set for which directional properties are in the text */
1088 /* lastArabicPos is index to the last AL in the text, -1 if none */
1091 /* characters after trailingWSStart are WS and are */
1092 /* implicitly at the paraLevel (rule (L1)) - levels may not reflect that */
1093 int trailingWSStart;
1095 /* fields for paragraph handling, set in getDirProps() */
1097 int[] paras_limit = new int[SIMPLE_PARAS_SIZE];
1098 byte[] paras_level = new byte[SIMPLE_PARAS_SIZE];
1100 /* fields for line reordering */
1101 int runCount; /* ==-1: runs not set up yet */
1102 BidiRun[] runsMemory = new BidiRun[0];
1105 /* for non-mixed text, we only need a tiny array of runs (no allocation) */
1106 BidiRun[] simpleRuns = {new BidiRun()};
1108 /* fields for managing isolate sequences */
1110 /* maximum or current nesting depth of isolate sequences */
1111 /* Within resolveExplicitLevels() and checkExplicitLevels(), this is the maximal
1112 nesting encountered.
1113 Within resolveImplicitLevels(), this is the index of the current isolates
1117 /* mapping of runs in logical order to visual order */
1118 int[] logicalToVisualRunsMap;
1119 /* flag to indicate that the map has been updated */
1120 boolean isGoodLogicalToVisualRunsMap;
1122 /* customized class provider */
1123 BidiClassifier customClassifier = null;
1125 /* for inverse Bidi with insertion of directional marks */
1126 InsertPoints insertPoints = new InsertPoints();
1128 /* for option OPTION_REMOVE_CONTROLS */
1132 * Sometimes, bit values are more appropriate
1133 * to deal with directionality properties.
1134 * Abbreviations in these method names refer to names
1135 * used in the Bidi algorithm.
1137 static int DirPropFlag(byte dir) {
1141 static byte PureDirProp(byte prop) {
1142 return (byte)(prop & ~IGNORE_CC);
1145 boolean testDirPropFlagAt(int flag, int index) {
1146 return ((DirPropFlag(dirProps[index]) & flag) != 0);
1149 static final int DirPropFlagMultiRuns = DirPropFlag((byte)31);
1151 /* to avoid some conditional statements, use tiny constant arrays */
1152 static final int DirPropFlagLR[] = { DirPropFlag(L), DirPropFlag(R) };
1153 static final int DirPropFlagE[] = { DirPropFlag(LRE), DirPropFlag(RLE) };
1154 static final int DirPropFlagO[] = { DirPropFlag(LRO), DirPropFlag(RLO) };
1156 static final int DirPropFlagLR(byte level) { return DirPropFlagLR[level & 1]; }
1157 static final int DirPropFlagE(byte level) { return DirPropFlagE[level & 1]; }
1158 static final int DirPropFlagO(byte level) { return DirPropFlagO[level & 1]; }
1159 static final byte DirFromStrong(byte strong) { return strong == L ? L : R; }
1161 /* are there any characters that are LTR or RTL? */
1162 static final int MASK_LTR =
1163 DirPropFlag(L)|DirPropFlag(EN)|DirPropFlag(AN)|DirPropFlag(LRE)|DirPropFlag(LRO)|DirPropFlag(LRI);
1164 static final int MASK_RTL = DirPropFlag(R)|DirPropFlag(AL)|DirPropFlag(RLE)|DirPropFlag(RLO)|DirPropFlag(RLI);
1166 static final int MASK_R_AL = DirPropFlag(R)|DirPropFlag(AL);
1167 static final int MASK_STRONG_EN_AN = DirPropFlag(L)|DirPropFlag(R)|DirPropFlag(AL)|DirPropFlag(EN)|DirPropFlag(AN);
1168 /* explicit embedding codes */
1169 static final int MASK_EXPLICIT = DirPropFlag(LRE)|DirPropFlag(LRO)|DirPropFlag(RLE)|DirPropFlag(RLO)|DirPropFlag(PDF);
1170 static final int MASK_BN_EXPLICIT = DirPropFlag(BN)|MASK_EXPLICIT;
1172 /* explicit isolate codes */
1173 static final int MASK_ISO = DirPropFlag(LRI)|DirPropFlag(RLI)|DirPropFlag(FSI)|DirPropFlag(PDI);
1175 /* paragraph and segment separators */
1176 static final int MASK_B_S = DirPropFlag(B)|DirPropFlag(S);
1178 /* all types that are counted as White Space or Neutral in some steps */
1179 static final int MASK_WS = MASK_B_S|DirPropFlag(WS)|MASK_BN_EXPLICIT|MASK_ISO;
1181 /* types that are neutrals or could become neutrals in (Wn) */
1182 static final int MASK_POSSIBLE_N = DirPropFlag(ON)|DirPropFlag(CS)|DirPropFlag(ES)|DirPropFlag(ET)|MASK_WS;
1185 * These types may be changed to "e",
1186 * the embedding type (L or R) of the run,
1187 * in the Bidi algorithm (N2)
1189 static final int MASK_EMBEDDING = DirPropFlag(NSM)|MASK_POSSIBLE_N;
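/*
Illustrative sketch, not part of ICU: the bit-set technique behind these
masks. The body of DirPropFlag() is not shown in this excerpt; the sketch
assumes it returns 1 << dir (one bit per Bidi class), which is what lets a
single AND answer "does the text contain any R or AL?". It also shows the
branch-free lookup by level parity used by DirPropFlagLR(level). Names are
hypothetical; class values come from the constants above.

```java
final class DirPropMaskSketch {
    // Class values from the constant declarations above.
    static final byte L = 0, R = 1, EN = 2, AL = 13;

    // Assumed equivalent of DirPropFlag(): one bit per Bidi class.
    static int flag(byte dir) {
        return 1 << dir;
    }

    static final int MASK_R_AL = flag(R) | flag(AL);

    // Index by the low bit of the level instead of branching on it.
    static final int[] FLAG_LR = { flag(L), flag(R) };

    static int flagLR(byte level) {
        return FLAG_LR[level & 1];
    }

    // Collects the classes present into one int, as getDirProps() does with
    // its "flags" field, then tests the whole set with one mask.
    static boolean containsRtl(byte[] dirProps) {
        int flags = 0;
        for (byte d : dirProps) {
            flags |= flag(d);
        }
        return (flags & MASK_R_AL) != 0;
    }
}
```

Because each class owns one bit, adding a class to a mask is a single OR,
and the per-character scan only needs to accumulate flags once.
*/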
1192 * the dirProps L and R are defined as 0 and 1 in UCharacterDirection.java, so level & 1 yields L or R
1194 static byte GetLRFromLevel(byte level)
1196 return (byte)(level & 1);
1199 static boolean IsDefaultLevel(byte level)
1201 return ((level & LEVEL_DEFAULT_LTR) == LEVEL_DEFAULT_LTR);
1204 static boolean IsBidiControlChar(int c)
1206 /* check for range 0x200c to 0x200f (ZWNJ, ZWJ, LRM, RLM), 0x202a to
1207 0x202e (LRE, RLE, PDF, LRO, RLO) or 0x2066 to 0x2069 (LRI, RLI, FSI, PDI) */
1208 return (((c & 0xfffffffc) == 0x200c) || ((c >= 0x202a) && (c <= 0x202e))
1209 || ((c >= 0x2066) && (c <= 0x2069)));
1212 void verifyValidPara()
1214 if (!(this == this.paraBidi)) {
1215 throw new IllegalStateException();
1219 void verifyValidParaOrLine()
1221 Bidi para = this.paraBidi;
1227 if ((para == null) || (para != para.paraBidi)) {
1228 throw new IllegalStateException();
1232 void verifyRange(int index, int start, int limit)
1234 if (index < start || index >= limit) {
1235 throw new IllegalArgumentException("Value " + index +
1236 " is out of range " + start + " to " + limit);
1241 * Allocate a <code>Bidi</code> object.
1242 * Such an object is initially empty. It is assigned
1243 * the Bidi properties of a piece of text containing one or more paragraphs
1244 * by <code>setPara()</code>
1245 * or the Bidi properties of a line within a paragraph by
1246 * <code>setLine()</code>.<p>
1247 * This object can be reused.<p>
1248 * <code>setPara()</code> and <code>setLine()</code> will allocate
1249 * additional memory for internal structures as necessary.
1259 * Allocate a <code>Bidi</code> object with preallocated memory
1260 * for internal structures.
1261 * This method provides a <code>Bidi</code> object like the default constructor
1262 * but it also preallocates memory for internal structures
1263 * according to the sizings supplied by the caller.<p>
1264 * The preallocation can be limited to some of the internal memory
1265 * by setting some values to 0 here. That means that if, e.g.,
1266 * <code>maxRunCount</code> cannot be reasonably predetermined and should not
1267 * be set to <code>maxLength</code> (the only failproof value) to avoid
1268 * wasting memory, then <code>maxRunCount</code> could be set to 0 here
1269 * and the internal structures that are associated with it will be allocated
1270 * on demand, just like with the default constructor.
1272 * @param maxLength is the maximum text or line length that internal memory
1273 * will be preallocated for. An attempt to associate this object with a
1274 * longer text will fail, unless this value is 0, which leaves the allocation
1275 * up to the implementation.
1277 * @param maxRunCount is the maximum anticipated number of same-level runs
1278 * that internal memory will be preallocated for. An attempt to access
1279 * visual runs on an object that was not preallocated for as many runs
1280 * as the text was actually resolved to will fail,
1281 * unless this value is 0, which leaves the allocation up to the implementation.<br><br>
1282 * The number of runs depends on the actual text and may be anywhere between
1283 * 1 and <code>maxLength</code>. It is typically small.
1285 * @throws IllegalArgumentException if maxLength or maxRunCount is less than 0
1288 public Bidi(int maxLength, int maxRunCount)
1290 /* check the argument values */
1291 if (maxLength < 0 || maxRunCount < 0) {
1292 throw new IllegalArgumentException();
1295 /* reset the object, all reference variables null, all flags false,
1297 In fact, we don't need to do anything, since class members are
1298 initialized as zero when an instance is created.
1301 mayAllocateText = false;
1302 mayAllocateRuns = false;
1303 orderParagraphsLTR = false;
1306 trailingWSStart = 0;
1309 defaultParaLevel = 0;
1312 /* get Bidi properties */
1313 bdp = UBiDiProps.INSTANCE;
1315 /* allocate memory for arrays as requested */
1316 if (maxLength > 0) {
1317 getInitialDirPropsMemory(maxLength);
1318 getInitialLevelsMemory(maxLength);
1320 mayAllocateText = true;
1323 if (maxRunCount > 0) {
1324 // if maxRunCount == 1, use simpleRuns[]
1325 if (maxRunCount > 1) {
1326 getInitialRunsMemory(maxRunCount);
1329 mayAllocateRuns = true;
1334 * We are allowed to allocate memory if object==null or
1335 * mayAllocate==true for each array that we need.
1337 * Assume sizeNeeded>0.
1338 * If object != null, then assume size > 0.
1340 private Object getMemory(String label, Object array, Class<?> arrayClass,
1341 boolean mayAllocate, int sizeNeeded)
1343 int len = Array.getLength(array);
1345 /* the existing array already has exactly the needed size */
1346 if (sizeNeeded == len) {
1350 /* we must not allocate */
1351 if (sizeNeeded <= len) {
1354 throw new OutOfMemoryError("Failed to allocate memory for "
1357 /* we may try to grow or shrink */
1358 /* FOOD FOR THOUGHT: when shrinking it should be possible to avoid
1359 the allocation altogether and rely on this.length */
1361 return Array.newInstance(arrayClass, sizeNeeded);
1362 } catch (Exception e) {
1363 throw new OutOfMemoryError("Failed to allocate memory for "
1368 /* helper methods for each allocated array */
1369 private void getDirPropsMemory(boolean mayAllocate, int len)
1371 Object array = getMemory("DirProps", dirPropsMemory, Byte.TYPE, mayAllocate, len);
1372 dirPropsMemory = (byte[]) array;
1375 void getDirPropsMemory(int len)
1377 getDirPropsMemory(mayAllocateText, len);
1380 private void getLevelsMemory(boolean mayAllocate, int len)
1382 Object array = getMemory("Levels", levelsMemory, Byte.TYPE, mayAllocate, len);
1383 levelsMemory = (byte[]) array;
1386 void getLevelsMemory(int len)
1388 getLevelsMemory(mayAllocateText, len);
1391 private void getRunsMemory(boolean mayAllocate, int len)
1393 Object array = getMemory("Runs", runsMemory, BidiRun.class, mayAllocate, len);
1394 runsMemory = (BidiRun[]) array;
1397 void getRunsMemory(int len)
1399 getRunsMemory(mayAllocateRuns, len);
1402 /* additional methods used by constructor - always allow allocation */
1403 private void getInitialDirPropsMemory(int len)
1405 getDirPropsMemory(true, len);
1408 private void getInitialLevelsMemory(int len)
1410 getLevelsMemory(true, len);
1413 private void getInitialRunsMemory(int len)
1415 getRunsMemory(true, len);
1419 * Modify the operation of the Bidi algorithm such that it
1420 * approximates an "inverse Bidi" algorithm. This method
1421 * must be called before <code>setPara()</code>.
1423 * <p>The normal operation of the Bidi algorithm as described
1424 * in Unicode Standard Annex #9 is to take text stored in logical
1425 * (keyboard, typing) order and to determine the reordering of it for visual
1427 * Some legacy systems store text in visual order, and for operations
1428 * with standard, Unicode-based algorithms, the text needs to be transformed
1429 * to logical order. This is effectively the inverse algorithm of the
1430 * described Bidi algorithm. Note that there is no standard algorithm for
1431 * this "inverse Bidi" and that the current implementation provides only an
1432 * approximation of "inverse Bidi".</p>
1434 * <p>With <code>isInverse</code> set to <code>true</code>,
1435 * this method changes the behavior of some of the subsequent methods
1436 * so that they can be used for the inverse Bidi algorithm.
1437 * Specifically, runs of text with numeric characters will be treated in a
1438 * special way and may need to be surrounded with LRM characters when they are
1439 * written in reordered sequence.</p>
1441 * <p>Output runs should be retrieved using <code>getVisualRun()</code>.
1442 * Since the actual input for "inverse Bidi" is visually ordered text and
1443 * <code>getVisualRun()</code> gets the reordered runs, these are actually
1444 * the runs of the logically ordered output.</p>
1446 * <p>Calling this method with argument <code>isInverse</code> set to
1447 * <code>true</code> is equivalent to calling <code>setReorderingMode</code>
1448 * with argument <code>reorderingMode</code>
1449 * set to <code>REORDER_INVERSE_NUMBERS_AS_L</code>.<br>
1450 * Calling this method with argument <code>isInverse</code> set to
1451 * <code>false</code> is equivalent to calling <code>setReorderingMode</code>
1452 * with argument <code>reorderingMode</code>
1453 * set to <code>REORDER_DEFAULT</code>.
1455 * @param isInverse specifies "forward" or "inverse" Bidi operation.
1458 * @see #writeReordered
1459 * @see #setReorderingMode
1460 * @see #REORDER_INVERSE_NUMBERS_AS_L
1461 * @see #REORDER_DEFAULT
1464 public void setInverse(boolean isInverse) {
1465 this.isInverse = (isInverse);
1466 this.reorderingMode = isInverse ? REORDER_INVERSE_NUMBERS_AS_L
1471 * Is this <code>Bidi</code> object set to perform the inverse Bidi
1473 * <p>Note: calling this method after setting the reordering mode with
1474 * <code>setReorderingMode</code> will return <code>true</code> if the
1475 * reordering mode was set to
1476 * <code>REORDER_INVERSE_NUMBERS_AS_L</code>, <code>false</code>
1477 * for all other values.</p>
1479 * @return <code>true</code> if the <code>Bidi</code> object is set to
1480 * perform the inverse Bidi algorithm by handling numbers as L.
1483 * @see #setReorderingMode
1484 * @see #REORDER_INVERSE_NUMBERS_AS_L
1487 public boolean isInverse() {
1492 * Modify the operation of the Bidi algorithm such that it implements some
1493 * variant of the basic Bidi algorithm or approximates an "inverse Bidi"
1494 * algorithm, depending on different values of the "reordering mode".
1495 * This method must be called before <code>setPara()</code>, and stays in
1496 * effect until called again with a different argument.
1498 * <p>The normal operation of the Bidi algorithm as described in the Unicode
1499 * Standard Annex #9 is to take text stored in logical (keyboard, typing)
1500 * order and to determine how to reorder it for visual rendering.</p>
1502 * <p>With the reordering mode set to a value other than
1503 * <code>REORDER_DEFAULT</code>, this method changes the behavior of some of
1504 * the subsequent methods in a way such that they implement an inverse Bidi
1505 * algorithm or some other algorithm variants.</p>
1507 * <p>Some legacy systems store text in visual order, and for operations
1508 * with standard, Unicode-based algorithms, the text needs to be transformed
1509 * into logical order. This is effectively the inverse algorithm of the
1510 * described Bidi algorithm. Note that there is no standard algorithm for
1511 * this "inverse Bidi", so a number of variants are implemented here.</p>
1513 * <p>In other cases, it may be desirable to emulate some variant of the
1514 * Logical to Visual algorithm (e.g. one used in MS Windows), or perform a
1515 * Logical to Logical transformation.</p>
1518 * <li>When the reordering mode is set to
1519 * <code>REORDER_DEFAULT</code>,
1520 * the standard Bidi Logical to Visual algorithm is applied.</li>
1522 * <li>When the reordering mode is set to
1523 * <code>REORDER_NUMBERS_SPECIAL</code>,
1524 * the algorithm used to perform Bidi transformations when calling
1525 * <code>setPara</code> should approximate the algorithm used in Microsoft
1526 * Windows XP rather than strictly conform to the Unicode Bidi algorithm.
1528 * The differences between the basic algorithm and the algorithm addressed
1529 * by this option are as follows:
1531 * <li>Within text at an even embedding level, the sequence "123AB"
1532 * (where AB represent R or AL letters) is transformed to "123BA" by the
1533 * Unicode algorithm and to "BA123" by the Windows algorithm.</li>
1535 * <li>Arabic-Indic numbers (AN) are handled by the Windows algorithm just
1536 * like regular numbers (EN).</li>
1539 * <li>When the reordering mode is set to
1540 * <code>REORDER_GROUP_NUMBERS_WITH_R</code>,
1541 * numbers located between LTR text and RTL text are associated with the RTL
1542 * text. For instance, an LTR paragraph with content "abc 123 DEF" (where
1543 * upper case letters represent RTL characters) will be transformed to
1544 * "abc FED 123" (and not "abc 123 FED"), "DEF 123 abc" will be transformed
1545 * to "123 FED abc" and "123 FED abc" will be transformed to "DEF 123 abc".
1546 * This makes the algorithm reversible and makes it useful when round trip
1547 * (from visual to logical and back to visual) must be achieved without
1548 * adding LRM characters. However, this is a variation from the standard
1549 * Unicode Bidi algorithm.<br>
1550 * The source text should not contain Bidi control characters other than LRM
1553 * <li>When the reordering mode is set to
1554 * <code>REORDER_RUNS_ONLY</code>,
1555 * a "Logical to Logical" transformation must be performed:
1557 * <li>If the default text level of the source text (argument
1558 * <code>paraLevel</code> in <code>setPara</code>) is even, the source text
1559 * will be handled as LTR logical text and will be transformed to the RTL
1560 * logical text which has the same LTR visual display.</li>
1561 * <li>If the default level of the source text is odd, the source text
1562 * will be handled as RTL logical text and will be transformed to the
1563 * LTR logical text which has the same LTR visual display.</li>
1565 * This mode may be needed when logical text that is basically Arabic or
1566 * Hebrew, possibly including numbers or phrases in English, has to be
1567 * displayed as if it had an even embedding level (this can happen if the
1568 * displaying application treats all text as if it were basically LTR).
1570 * This mode may also be needed in the reverse case, when logical text that
1571 * is basically English, possibly including phrases in Arabic or Hebrew,
1572 * has to be displayed as if it had an odd embedding level.
1574 * Both cases could be handled by adding LRE or RLE at the head of the
1575 * text, if the display subsystem supports these formatting controls. If it
1576 * does not, the problem may be handled by transforming the source text in
1577 * this mode before displaying it, so that it will be displayed properly.
1579 * The source text should not contain Bidi control characters other than LRM
1582 * <li>When the reordering mode is set to
1583 * <code>REORDER_INVERSE_NUMBERS_AS_L</code>, an "inverse Bidi"
1584 * algorithm is applied.
1585 * Runs of text with numeric characters will be treated like LTR letters and
1586 * may need to be surrounded with LRM characters when they are written in
1587 * reordered sequence (the option <code>INSERT_LRM_FOR_NUMERIC</code> can
1588 * be used with method <code>writeReordered</code> to this end). This mode
1589 * is equivalent to calling <code>setInverse()</code> with
1590 * argument <code>isInverse</code> set to <code>true</code>.</li>
1592 * <li>When the reordering mode is set to
1593 * <code>REORDER_INVERSE_LIKE_DIRECT</code>, the "direct" Logical to
1594 * Visual Bidi algorithm is used as an approximation of an "inverse Bidi"
1595 * algorithm. This mode is similar to mode
1596 * <code>REORDER_INVERSE_NUMBERS_AS_L</code> but is closer to the
1597 * regular Bidi algorithm.
1599 * For example, an LTR paragraph with the content "FED 123 456 CBA" (where
1600 * upper case represents RTL characters) will be transformed to
1601 * "ABC 456 123 DEF", as opposed to "DEF 123 456 ABC"
1602 * with mode <code>REORDER_INVERSE_NUMBERS_AS_L</code>.<br>
1603 * When used in conjunction with option
1604 * <code>OPTION_INSERT_MARKS</code>, this mode generally
1605 * adds Bidi marks to the output significantly more sparingly than mode
1606 * <code>REORDER_INVERSE_NUMBERS_AS_L</code> with option
1607 * <code>INSERT_LRM_FOR_NUMERIC</code> in calls to
1608 * <code>writeReordered</code>.</li>
1610 * <li>When the reordering mode is set to
1611 * <code>REORDER_INVERSE_FOR_NUMBERS_SPECIAL</code>, the Logical to Visual
1612 * Bidi algorithm used in Windows XP is used as an approximation of an "inverse
1615 * For example, an LTR paragraph with the content "abc FED123" (where
1616 * upper case represents RTL characters) will be transformed to
1620 * <p>In all the reordering modes specifying an "inverse Bidi" algorithm
1621 * (i.e. those with a name starting with <code>REORDER_INVERSE</code>),
1622 * output runs should be retrieved using <code>getVisualRun()</code>, and
1623 * the output text with <code>writeReordered()</code>. The caller should
1624 * keep in mind that in "inverse Bidi" modes the input is actually visually
1625 * ordered text, and that the reordered output returned by <code>getVisualRun()</code>
1626 * or <code>writeReordered()</code> actually consists of runs or a character string
1627 * of logically ordered output.<br>
1628 * For all the "inverse Bidi" modes, the source text should not contain
1629 * Bidi control characters other than LRM or RLM.</p>
1631 * <p>Note that option <code>OUTPUT_REVERSE</code> of
1632 * <code>writeReordered</code> has no useful meaning and should not be used
1633 * in conjunction with any value of the reordering mode specifying "inverse
1634 * Bidi" or with value <code>REORDER_RUNS_ONLY</code>.
1636 * @param reorderingMode specifies the required variant of the Bidi
1641 * @see #writeReordered
1642 * @see #INSERT_LRM_FOR_NUMERIC
1643 * @see #OUTPUT_REVERSE
1644 * @see #REORDER_DEFAULT
1645 * @see #REORDER_NUMBERS_SPECIAL
1646 * @see #REORDER_GROUP_NUMBERS_WITH_R
1647 * @see #REORDER_RUNS_ONLY
1648 * @see #REORDER_INVERSE_NUMBERS_AS_L
1649 * @see #REORDER_INVERSE_LIKE_DIRECT
1650 * @see #REORDER_INVERSE_FOR_NUMBERS_SPECIAL
1653 public void setReorderingMode(int reorderingMode) {
1654 if ((reorderingMode < REORDER_DEFAULT) ||
1655 (reorderingMode >= REORDER_COUNT))
1656 return; /* don't accept a wrong value */
1657 this.reorderingMode = reorderingMode;
1659 reorderingMode == REORDER_INVERSE_NUMBERS_AS_L;
1663 * What is the requested reordering mode for a given Bidi object?
1665 * @return the current reordering mode of the Bidi object
1667 * @see #setReorderingMode
1670 public int getReorderingMode() {
1671 return this.reorderingMode;
1675 * Specify which of the reordering options should be applied during Bidi
1678 * @param options A combination of zero or more of the following
1679 * reordering options:
1680 * <code>OPTION_DEFAULT</code>, <code>OPTION_INSERT_MARKS</code>,
1681 * <code>OPTION_REMOVE_CONTROLS</code>, <code>OPTION_STREAMING</code>.
1683 * @see #getReorderingOptions
1684 * @see #OPTION_DEFAULT
1685 * @see #OPTION_INSERT_MARKS
1686 * @see #OPTION_REMOVE_CONTROLS
1687 * @see #OPTION_STREAMING
1690 public void setReorderingOptions(int options) {
1691 if ((options & OPTION_REMOVE_CONTROLS) != 0) {
1692 this.reorderingOptions = options & ~OPTION_INSERT_MARKS;
1694 this.reorderingOptions = options;
1699 * What are the reordering options applied to a given Bidi object?
1701 * @return the current reordering options of the Bidi object
1703 * @see #setReorderingOptions
1706 public int getReorderingOptions() {
1707 return this.reorderingOptions;
1711 * Get the base direction of the text provided according to the Unicode
1712 * Bidirectional Algorithm. The base direction is derived from the first
1713 * character in the string with bidirectional character type L, R, or AL.
1714 * If the first such character has type L, LTR is returned. If the first
1715 * such character has type R or AL, RTL is returned. If the string does
1716 * not contain any character of these types, then NEUTRAL is returned.
1717 * This is a lightweight function for use when only the base direction is
1718 * needed and no further Bidi processing of the text is required.
1719 * @param paragraph the text whose paragraph level direction is needed.
1720 * @return LTR, RTL, NEUTRAL
1726 public static byte getBaseDirection(CharSequence paragraph) {
1727 if (paragraph == null || paragraph.length() == 0) {
1731 int length = paragraph.length();
1735 for (int i = 0; i < length; ) {
1736 // U16_NEXT(paragraph, i, length, c) for C++
1737 c = UCharacter.codePointAt(paragraph, i);
1738 direction = UCharacter.getDirectionality(c);
1739 if (direction == UCharacterDirection.LEFT_TO_RIGHT) {
1741 } else if (direction == UCharacterDirection.RIGHT_TO_LEFT
1742 || direction == UCharacterDirection.RIGHT_TO_LEFT_ARABIC) {
1746 i = UCharacter.offsetByCodePoints(paragraph, i, 1); // advance i to the start of the next code point
1751 /* perform (P2)..(P3) ------------------------------------------------------- */
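/*
Illustrative sketch, not part of ICU: the first-strong scan that
getBaseDirection() documents, which is also the core of rule P2 as
implemented below (without the skipping of isolate contents that P2
requires). It uses the JDK's java.lang.Character instead of UCharacter, so
results follow the JDK's Unicode data version; the LTR/RTL/NEUTRAL codes are
local placeholders, not this class's constants.

```java
final class BaseDirectionSketch {
    // Hypothetical result codes for this sketch only.
    static final int LTR = 0, RTL = 1, NEUTRAL = 2;

    static int baseDirection(CharSequence text) {
        if (text == null) {
            return NEUTRAL;
        }
        for (int i = 0; i < text.length(); ) {
            int c = Character.codePointAt(text, i);
            byte dir = Character.getDirectionality(c);
            if (dir == Character.DIRECTIONALITY_LEFT_TO_RIGHT) {
                return LTR; // first strong character is L
            }
            if (dir == Character.DIRECTIONALITY_RIGHT_TO_LEFT
                    || dir == Character.DIRECTIONALITY_RIGHT_TO_LEFT_ARABIC) {
                return RTL; // first strong character is R or AL
            }
            i += Character.charCount(c); // advance to the next code point
        }
        return NEUTRAL; // no L, R or AL anywhere
    }
}
```

Digits, spaces and punctuation are skipped, so "123 abc" is LTR while
"123 !" is NEUTRAL; stepping by code point keeps surrogate pairs intact.
*/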
1754 * Returns the directionality of the first strong character
1755 * after the last B in prologue, if any.
1756 * Requires prologue!=null.
1758 private byte firstL_R_AL() {
1760 for (int i = 0; i < prologue.length(); ) {
1761 int uchar = prologue.codePointAt(i);
1762 i += Character.charCount(uchar);
1763 byte dirProp = (byte)getCustomizedClass(uchar);
1765 if (dirProp == L || dirProp == R || dirProp == AL) {
1778 * Check that there are enough entries in the arrays paras_limit and paras_level
1780 private void checkParaCount() {
1783 int count = paraCount;
1784 if (count <= paras_level.length)
1786 int oldLength = paras_level.length;
1787 saveLimits = paras_limit;
1788 saveLevels = paras_level;
1790 paras_limit = new int[count * 2];
1791 paras_level = new byte[count * 2];
1792 } catch (Exception e) {
1793 throw new OutOfMemoryError("Failed to allocate memory for paras");
1795 System.arraycopy(saveLimits, 0, paras_limit, 0, oldLength);
1796 System.arraycopy(saveLevels, 0, paras_level, 0, oldLength);
1800 * Get the directional properties for the text, calculate the flags bit-set, and
1801 * determine the paragraph level if necessary (in paras_level[i]).
1802 * FSI initiators are also resolved and their dirProp replaced with LRI or RLI.
1804 static final int NOT_SEEKING_STRONG = 0; /* 0: not contextual paraLevel, not after FSI */
1805 static final int SEEKING_STRONG_FOR_PARA = 1; /* 1: looking for first strong char in para */
1806 static final int SEEKING_STRONG_FOR_FSI = 2; /* 2: looking for first strong after FSI */
1807 static final int LOOKING_FOR_PDI = 3; /* 3: found strong after FSI, looking for PDI */
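/*
Illustrative sketch, not part of ICU: the FSI resolution that the comment
above attributes to getDirProps(). Each FSI takes the direction of the first
strong character (L, R or AL) before its matching PDI or the end of the
paragraph, skipping nested isolates, and its entry is rewritten to RLI for R
or AL and to LRI otherwise (UAX #9 rule X5c). Unlike getDirProps(), which
does this in a single forward pass with the SEEKING_STRONG_FOR_FSI and
LOOKING_FOR_PDI states and explicit stacks, this sketch rescans from each FSI
for clarity. Class values are copied from the constants above; the names are
hypothetical.

```java
final class FsiResolutionSketch {
    // Class values from the constant declarations above.
    static final byte L = 0, R = 1, B = 7, AL = 13, FSI = 19, LRI = 20, RLI = 21, PDI = 22;

    // Rewrites every FSI in dirProps to LRI or RLI, in place.
    static void resolveFsi(byte[] dirProps) {
        for (int i = 0; i < dirProps.length; i++) {
            if (dirProps[i] == FSI) {
                dirProps[i] = firstStrongIsRtl(dirProps, i + 1) ? RLI : LRI;
            }
        }
    }

    // Scans from 'start' up to the PDI that closes the isolate (or the end of
    // the paragraph), ignoring everything inside nested isolates.
    private static boolean firstStrongIsRtl(byte[] dirProps, int start) {
        int depth = 0; // nesting depth of isolates opened after 'start'
        for (int i = start; i < dirProps.length; i++) {
            byte d = dirProps[i];
            if (d == B) {
                break; // an isolate never extends past the paragraph end
            } else if (d == FSI || d == LRI || d == RLI) {
                depth++;
            } else if (d == PDI) {
                if (depth == 0) {
                    break; // matching PDI of the FSI being resolved
                }
                depth--;
            } else if (depth == 0 && (d == L || d == R || d == AL)) {
                return d != L;
            }
        }
        return false; // no strong character found: resolve to LRI
    }
}
```

For instance, FSI LRI R PDI L PDI resolves its FSI to LRI: the R lies inside
the nested isolate and is skipped, so the first strong character is the L.
*/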
1809 private void getDirProps()
1812 flags = 0; /* collect all directionalities in the text */
1815 byte defaultParaLevel = 0; /* initialize to avoid compiler warnings */
1816 boolean isDefaultLevel = IsDefaultLevel(paraLevel);
1817 /* for inverse Bidi, the default para level is set to RTL if there is a
1818 strong R or AL character at either end of the text */
1819 boolean isDefaultLevelInverse=isDefaultLevel &&
1820 (reorderingMode == REORDER_INVERSE_LIKE_DIRECT ||
1821 reorderingMode == REORDER_INVERSE_FOR_NUMBERS_SPECIAL);
1823 int controlCount = 0;
1824 boolean removeBidiControls = (reorderingOptions & OPTION_REMOVE_CONTROLS) != 0;
1827 byte lastStrong = ON; /* for default level & inverse Bidi */
1828 /* The following stacks are used to manage isolate sequences. Those
1829 sequences may be nested, but obviously never more deeply than the
1830 maximum explicit embedding level.
1831 lastStack is the index of the last used entry in the stack. A value of -1
1832 means that there is no open isolate sequence.
1833 lastStack is reset to -1 on paragraph boundaries. */
1834 /* The following stack contains the position of the initiator of
1835 each open isolate sequence */
1836 int[] isolateStartStack= new int[MAX_EXPLICIT_LEVEL+1];
1837 /* The following stack contains the last known state before
1838 encountering the initiator of an isolate sequence */
1839 byte[] previousStateStack = new byte[MAX_EXPLICIT_LEVEL+1];
1842 if ((reorderingOptions & OPTION_STREAMING) != 0)
1844 defaultParaLevel = (byte)(paraLevel & 1);
1846 if (isDefaultLevel) {
1847 paras_level[0] = defaultParaLevel;
1848 lastStrong = defaultParaLevel;
1849 if (prologue != null && /* there is a prologue */
1850 (dirProp = firstL_R_AL()) != ON) { /* with a strong character */
1852 paras_level[0] = 0; /* set the default para level */
1854 paras_level[0] = 1; /* set the default para level */
1855 state = NOT_SEEKING_STRONG;
1857 state = SEEKING_STRONG_FOR_PARA;
1860 paras_level[0] = paraLevel;
1861 state = NOT_SEEKING_STRONG;
1863 /* count paragraphs and determine the paragraph level (P2..P3) */
1865 * see comment on constant fields:
1866 * the LEVEL_DEFAULT_XXX values are designed so that
1867 * their low-order bit alone yields the intended default
1870 for (i = 0; i < originalLength; /* i is incremented in the loop */) {
1871 i0 = i; /* index of first code unit */
1872 uchar = UTF16.charAt(text, 0, originalLength, i);
1873 i += UTF16.getCharCount(uchar);
1874 i1 = i - 1; /* index of last code unit, gets the directional property */
1876 dirProp = (byte)getCustomizedClass(uchar);
1877 flags |= DirPropFlag(dirProp);
1878 dirProps[i1] = dirProp;
1879 if (i1 > i0) { /* set previous code units' properties to BN */
1880 flags |= DirPropFlag(BN);
1882 dirProps[--i1] = BN;
1885 if (removeBidiControls && IsBidiControlChar(uchar)) {
1889 if (state == SEEKING_STRONG_FOR_PARA) {
1890 paras_level[paraCount - 1] = 0;
1891 state = NOT_SEEKING_STRONG;
1893 else if (state == SEEKING_STRONG_FOR_FSI) {
1894 if (stackLast <= MAX_EXPLICIT_LEVEL) {
1895 dirProps[isolateStartStack[stackLast]] = LRI;
1896 flags |= DirPropFlag(LRI);
1898 state = LOOKING_FOR_PDI;
1903 if (dirProp == R || dirProp == AL) {
1904 if (state == SEEKING_STRONG_FOR_PARA) {
1905 paras_level[paraCount - 1] = 1;
1906 state = NOT_SEEKING_STRONG;
1908 else if (state == SEEKING_STRONG_FOR_FSI) {
1909 if (stackLast <= MAX_EXPLICIT_LEVEL) {
1910 dirProps[isolateStartStack[stackLast]] = RLI;
1911 flags |= DirPropFlag(RLI);
1913 state = LOOKING_FOR_PDI;
1917 lastArabicPos = i - 1;
1920 if (dirProp >= FSI && dirProp <= RLI) { /* FSI, LRI or RLI */
1922 if (stackLast <= MAX_EXPLICIT_LEVEL) {
1923 isolateStartStack[stackLast] = i - 1;
1924 previousStateStack[stackLast] = state;
1927 state = SEEKING_STRONG_FOR_FSI;
1929 state = LOOKING_FOR_PDI;
1932 if (dirProp == PDI) {
1933 if (state == SEEKING_STRONG_FOR_FSI) {
1934 if (stackLast <= MAX_EXPLICIT_LEVEL) {
1935 dirProps[isolateStartStack[stackLast]] = LRI;
1936 flags |= DirPropFlag(LRI);
1939 if (stackLast >= 0) {
1940 if (stackLast <= MAX_EXPLICIT_LEVEL)
1941 state = previousStateStack[stackLast];
1947 if (i < originalLength && uchar == CR && text[i] == LF) /* do nothing on the CR */
1949 paras_limit[paraCount - 1] = i;
1950 if (isDefaultLevelInverse && lastStrong == R)
1951 paras_level[paraCount - 1] = 1;
1952 if ((reorderingOptions & OPTION_STREAMING) != 0) {
1953 /* When streaming, we only process whole paragraphs,
1954 so some updates are only done on paragraph boundaries */
1955 length = i; /* i is index to next character */
1956 this.controlCount = controlCount;
1958 if (i < originalLength) { /* B not last char in text */
1960 checkParaCount(); /* check that there is enough memory for a new para entry */
1961 if (isDefaultLevel) {
1962 paras_level[paraCount - 1] = defaultParaLevel;
1963 state = SEEKING_STRONG_FOR_PARA;
1964 lastStrong = defaultParaLevel;
1966 paras_level[paraCount - 1] = paraLevel;
1967 state = NOT_SEEKING_STRONG;
1974 /* Ignore still open isolate sequences with overflow */
1975 if (stackLast > MAX_EXPLICIT_LEVEL) {
1976 stackLast = MAX_EXPLICIT_LEVEL;
1977 state = (byte)(dirProps[isolateStartStack[MAX_EXPLICIT_LEVEL]] == FSI ?
1978 SEEKING_STRONG_FOR_FSI : LOOKING_FOR_PDI); /* an unresolved FSI defaults to LRI below */
1980 /* Resolve direction of still unresolved open FSI sequences */
1981 while (stackLast >= 0) {
1982 if (state == SEEKING_STRONG_FOR_FSI) {
1983 dirProps[isolateStartStack[stackLast]] = LRI;
1984 flags |= DirPropFlag(LRI);
1986 state = previousStateStack[stackLast];
1989 /* When streaming, ignore text after the last paragraph separator */
1990 if ((reorderingOptions & OPTION_STREAMING) != 0) {
1991 if (length < originalLength)
1994 paras_limit[paraCount - 1] = originalLength;
1995 this.controlCount = controlCount;
1997 /* For inverse bidi, default para direction is RTL if there is
1998 a strong R or AL at either end of the paragraph */
1999 if (isDefaultLevelInverse && lastStrong == R) {
2000 paras_level[paraCount - 1] = 1;
2002 if (isDefaultLevel) {
2003 paraLevel = paras_level[0];
2005 /* The following is needed to resolve the text direction for default level
2006 paragraphs containing no strong character */
2007 for (i = 0; i < paraCount; i++)
2008 flags |= DirPropFlagLR(paras_level[i]);
2010 if (orderParagraphsLTR && (flags & DirPropFlag(B)) != 0) {
2011 flags |= DirPropFlag(L);
2015 /* determine the paragraph level at position index */
2016 byte GetParaLevelAt(int index)
2018 if (defaultParaLevel == 0 || index < paras_limit[0])
2021 for (i = 1; i < paraCount; i++)
2022 if (index < paras_limit[i])
2026 return paras_level[i];
2029 /* Functions for handling paired brackets ----------------------------------- */
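// Illustrative sketch (hypothetical helper, not ICU's incremental
// isoRuns/openings code): bracket pair identification in the style of
// rule BD16 of UAX #9, with a bounded stack of 63 pending openings. Only a
// small hard-coded bracket set is recognized (an assumption of this sketch).
// As in the synonym handling of bracketProcessChar(), U+2329/U+232A and
// U+3008/U+3009 are treated as interchangeable. Pairs are returned as
// {openingIndex, closingIndex}, sorted by opening index.
final class BracketPairSketch {
    static final int MAX_DEPTH = 63;    // stack size prescribed by BD16

    // +1 for an opening bracket, -1 for a closing one, 0 otherwise.
    private int kind(char c) {
        switch (c) {
            case '(': case '[': case '{': case '\u2329': case '\u3008':
                return 1;
            case ')': case ']': case '}': case '\u232A': case '\u3009':
                return -1;
            default:
                return 0;
        }
    }

    // Canonical opening bracket used for matching; folds the
    // U+2329/U+232A pair onto U+3008/U+3009.
    private char canonicalOpening(char c) {
        switch (c) {
            case ')': return '(';
            case ']': return '[';
            case '}': return '{';
            case '\u2329': case '\u232A': case '\u3009': return '\u3008';
            default: return c;
        }
    }

    java.util.List<int[]> findPairs(String text) {
        int[] openPos = new int[MAX_DEPTH];
        char[] openChar = new char[MAX_DEPTH];
        int top = 0;                                // number of pending openings
        java.util.List<int[]> pairs = new java.util.ArrayList<int[]>();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            int k = kind(c);
            if (k > 0) {
                if (top == MAX_DEPTH)
                    break;                          // stack full: stop pairing
                openPos[top] = i;
                openChar[top] = canonicalOpening(c);
                top++;
            } else if (k < 0) {
                char wanted = canonicalOpening(c);
                for (int s = top - 1; s >= 0; s--) {
                    if (openChar[s] == wanted) {
                        pairs.add(new int[] {openPos[s], i});
                        top = s;                    // pop it and all openings above it
                        break;
                    }
                }                                   // no match: ignore this closer
            }
        }
        java.util.Collections.sort(pairs, new java.util.Comparator<int[]>() {
            public int compare(int[] a, int[] b) {
                return a[0] - b[0];
            }
        });
        return pairs;
    }
}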
2031 /* In the isoRuns array, the first entry is used for text outside of any
2032 isolate sequence. Higher entries are used for each more deeply nested
2033 isolate sequence. isoRunLast is the index of the last used entry. The
2034 openings array is used to note the data of opening brackets not yet
2035 matched by a closing bracket, or matched but still susceptible to change
2037 Each isoRun entry contains the index of the first and
2038 one-after-last openings entries for pending opening brackets it
2039 contains. The next openings entry to use is the one-after-last of the
2040 most deeply nested isoRun entry.
2041 isoRun entries also contain their current embedding level and the last
2042 encountered strong character, since these will be needed to resolve
2043 the level of paired brackets. */
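// Illustrative sketch (hypothetical helper, not ICU's API): the decision
// made by rule N0 of UAX #9 for one bracket pair, mirroring the N0b, N0c1,
// N0c2 and N0d branches in bracketProcessChar() further down. Directions
// are encoded as 0 (L) or 1 (R), with -1 meaning "leave the pair neutral"
// (ICU uses BN for that). foundL/foundR report the strong types seen
// between the brackets (UAX #9 counts EN and AN as R here), and contextDir
// is the direction of the preceding strong context.
final class BracketN0Sketch {
    // Returns the resolved direction for both brackets of a pair:
    // 0 (L), 1 (R), or -1 when the pair stays neutral.
    int resolvePair(int embeddingDir, boolean foundL, boolean foundR, int contextDir) {
        boolean foundEmbeddingDir = (embeddingDir == 0) ? foundL : foundR;
        if (foundEmbeddingDir)
            return embeddingDir;            // N0b: strong type matching the embedding
        if (foundL || foundR)               // N0c: only the opposite direction inside
            return (contextDir != embeddingDir)
                    ? contextDir            // N0c1: preceding context is opposite too
                    : embeddingDir;         // N0c2: otherwise the embedding direction
        return -1;                          // N0d: no strong type inside
    }
}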
2045 private void bracketInit(BracketData bd) {
2047 bd.isoRuns[0] = new IsoRun();
2048 bd.isoRuns[0].start = 0;
2049 bd.isoRuns[0].limit = 0;
2050 bd.isoRuns[0].level = GetParaLevelAt(0);
2051 bd.isoRuns[0].lastStrong = bd.isoRuns[0].contextDir = (byte)(GetParaLevelAt(0) & 1);
2052 bd.isoRuns[0].lastStrongPos = bd.isoRuns[0].contextPos = 0;
2053 bd.openings = new Opening[SIMPLE_OPENINGS_SIZE];
2054 bd.isNumbersSpecial = reorderingMode == REORDER_NUMBERS_SPECIAL ||
2055 reorderingMode == REORDER_INVERSE_FOR_NUMBERS_SPECIAL;
2058 /* paragraph boundary */
2059 private void bracketProcessB(BracketData bd, byte level) {
2061 bd.isoRuns[0].limit = 0;
2062 bd.isoRuns[0].level = level;
2063 bd.isoRuns[0].lastStrong = bd.isoRuns[0].contextDir = (byte)(level & 1);
2064 bd.isoRuns[0].lastStrongPos = bd.isoRuns[0].contextPos = 0;
2067 /* LRE, LRO, RLE, RLO, PDF */
2068 private void bracketProcessBoundary(BracketData bd, int lastCcPos,
2069 byte contextLevel, byte embeddingLevel) {
2070 IsoRun pLastIsoRun = bd.isoRuns[bd.isoRunLast];
2071 if ((DirPropFlag(dirProps[lastCcPos]) & MASK_ISO) != 0) /* after an isolate */
2073 if ((embeddingLevel & ~LEVEL_OVERRIDE) >
2074 (contextLevel & ~LEVEL_OVERRIDE)) /* not a PDF */
2075 contextLevel = embeddingLevel;
2076 pLastIsoRun.limit = pLastIsoRun.start;
2077 pLastIsoRun.level = embeddingLevel;
2078 pLastIsoRun.lastStrong = pLastIsoRun.contextDir = (byte)(contextLevel & 1);
2079 pLastIsoRun.lastStrongPos = pLastIsoRun.contextPos = lastCcPos;
2083 private void bracketProcessLRI_RLI(BracketData bd, byte level) {
2084 IsoRun pLastIsoRun = bd.isoRuns[bd.isoRunLast];
2086 lastLimit = pLastIsoRun.limit;
2088 pLastIsoRun = bd.isoRuns[bd.isoRunLast];
2089 if (pLastIsoRun == null)
2090 pLastIsoRun = bd.isoRuns[bd.isoRunLast] = new IsoRun();
2091 pLastIsoRun.start = pLastIsoRun.limit = lastLimit;
2092 pLastIsoRun.level = level;
2093 pLastIsoRun.lastStrong = pLastIsoRun.contextDir = (byte)(level & 1);
2094 pLastIsoRun.lastStrongPos = pLastIsoRun.contextPos = 0;
2098 private void bracketProcessPDI(BracketData bd) {
2102 /* newly found opening bracket: create an openings entry */
2103 private void bracketAddOpening(BracketData bd, char match, int position) {
2104 IsoRun pLastIsoRun = bd.isoRuns[bd.isoRunLast];
2106 if (pLastIsoRun.limit >= bd.openings.length) { /* no available new entry */
2107 Opening[] saveOpenings = bd.openings;
2110 count = bd.openings.length;
2111 bd.openings = new Opening[count * 2];
2112 } catch (Exception e) {
2113 throw new OutOfMemoryError("Failed to allocate memory for openings");
2115 System.arraycopy(saveOpenings, 0, bd.openings, 0, count);
2117 pOpening = bd.openings[pLastIsoRun.limit];
2118 if (pOpening == null)
2119 pOpening = bd.openings[pLastIsoRun.limit]= new Opening();
2120 pOpening.position = position;
2121 pOpening.match = match;
2122 pOpening.contextDir = pLastIsoRun.contextDir;
2123 pOpening.contextPos = pLastIsoRun.contextPos;
2125 pLastIsoRun.limit++;
2128 /* change N0c1 to N0c2 when a preceding bracket is assigned the embedding level */
2129 private void fixN0c(BracketData bd, int openingIndex, int newPropPosition, byte newProp) {
2130 /* This function calls itself recursively */
2131 IsoRun pLastIsoRun = bd.isoRuns[bd.isoRunLast];
2133 int k, openingPosition, closingPosition;
2134 for (k = openingIndex+1; k < pLastIsoRun.limit; k++) {
2135 qOpening = bd.openings[k];
2136 if (qOpening.match >= 0) /* not an N0c match */
2138 if (newPropPosition < qOpening.contextPos)
2140 if (newPropPosition >= qOpening.position)
2142 if (newProp == qOpening.contextDir)
2144 openingPosition = qOpening.position;
2145 dirProps[openingPosition] = dirProps[newPropPosition];
2146 closingPosition = -(qOpening.match);
2147 dirProps[closingPosition] = newProp; /* can never be AL */
2148 qOpening.match = 0; /* prevent further changes */
2149 fixN0c(bd, k, openingPosition, newProp);
2150 fixN0c(bd, k, closingPosition, newProp);
2154 /* handle strong characters, digits and candidates for closing brackets */
2155 private void bracketProcessChar(BracketData bd, int position, byte dirProp) {
2157 Opening pOpening, qOpening;
2164 if ((DirPropFlag(dirProp) & MASK_STRONG_EN_AN) != 0) { /* L, R, AL, EN or AN */
2165 pLastIsoRun = bd.isoRuns[bd.isoRunLast];
2166 /* AN after R or AL becomes R or AL; after L or L+AN, it is kept as-is */
2167 if (dirProp == AN && (pLastIsoRun.lastStrong == R || pLastIsoRun.lastStrong == AL))
2168 dirProp = pLastIsoRun.lastStrong;
2169 /* EN after L or L+AN becomes L; after R or AL, it becomes R or AL */
2170 if (dirProp == EN) {
2171 if (pLastIsoRun.lastStrong == L || pLastIsoRun.lastStrong == AN) {
2173 if (!bd.isNumbersSpecial)
2174 dirProps[position] = ENL;
2177 dirProp = pLastIsoRun.lastStrong; /* may be R or AL */
2178 if (!bd.isNumbersSpecial)
2179 dirProps[position] = dirProp == AL ? AN : ENR;
2182 pLastIsoRun.lastStrong = dirProp;
2183 pLastIsoRun.contextDir = DirFromStrong(dirProp);
2184 pLastIsoRun.lastStrongPos = pLastIsoRun.contextPos = position;
2185 if (dirProp == AL || dirProp == AN)
2187 flag = (byte)DirPropFlag(dirProp);
2188 /* strong characters found after an unmatched opening bracket
2189 must be noted for possibly applying N0b */
2190 for (i = pLastIsoRun.start; i < pLastIsoRun.limit; i++) {
2191 bd.openings[i].flags |= flag;
2197 /* First see if it is a matching closing bracket. Hopefully, this is more
2198 efficient than checking if it is a closing bracket at all */
2200 pLastIsoRun = bd.isoRuns[bd.isoRunLast];
2201 for (i = pLastIsoRun.limit - 1; i >= pLastIsoRun.start; i--) {
2202 if (bd.openings[i].match != c)
2204 /* We have a match */
2205 pOpening = bd.openings[i];
2206 direction = (byte)(pLastIsoRun.level & 1);
2207 stable = true; /* assume stable until proved otherwise */
2209 /* The stable flag is set when brackets are paired and their
2210 level is resolved and cannot be changed by what will be
2211 found later in the source string.
2212 An unstable match can occur only when applying N0c, where
2213 the resolved level depends on the preceding context, and
2214 this context may be affected by text occurring later.
2215 Example: RTL paragraph containing: abc[(latin) HEBREW]
2216 When the closing parenthesis is encountered, it appears
2217 that N0c1 must be applied since 'abc' sets an opposite
2218 direction context and both parentheses receive level 2.
2219 However, when the closing square bracket is processed,
2220 N0b applies because of 'HEBREW' being included within the
2221 brackets, thus the square brackets are treated like R and
2222 receive level 1. However, this changes the preceding
2223 context of the opening parenthesis, and it now appears
2224 that N0c2 must be applied to the parentheses rather than
2227 if ((direction == 0 && (pOpening.flags & FOUND_L) > 0) ||
2228 (direction == 1 && (pOpening.flags & FOUND_R) > 0)) { /* N0b */
2229 newProp = direction;
2231 else if ((pOpening.flags & (FOUND_L | FOUND_R)) != 0) { /* N0c */
2232 if (direction != pOpening.contextDir) {
2233 newProp = pOpening.contextDir; /* N0c1 */
2234 /* it is stable if there is no preceding text or in
2235 conditions too complicated and not worth checking */
2236 stable = (i == pLastIsoRun.start);
2239 newProp = direction; /* N0c2 */
2242 newProp = BN; /* N0d */
2244 if (newProp != BN) {
2245 dirProps[pOpening.position] = newProp;
2246 dirProps[position] = newProp;
2247 pLastIsoRun.contextDir = newProp;
2248 pLastIsoRun.contextPos = position;
2250 /* Update nested N0c pairs that may be affected */
2251 if (newProp == direction)
2252 fixN0c(bd, i, pOpening.position, newProp);
2254 pLastIsoRun.limit = (short)i; /* forget any brackets nested within this pair */
2255 /* remove lower located synonyms if any */
2256 while (pLastIsoRun.limit > pLastIsoRun.start &&
2257 bd.openings[pLastIsoRun.limit - 1].position == pOpening.position)
2258 pLastIsoRun.limit--;
2261 pOpening.match = -position;
2262 /* neutralize lower located synonyms if any */
2264 while (k >= pLastIsoRun.start &&
2265 bd.openings[k].position == pOpening.position)
2266 bd.openings[k--].match = 0;
2267 /* neutralize any unmatched opening between the current pair;
2268 this will also neutralize higher located synonyms if any */
2269 for (k = i + 1; k < pLastIsoRun.limit; k++) {
2270 qOpening = bd.openings[k];
2271 if (qOpening.position >= position)
2273 if (qOpening.match > 0)
2279 /* We get here only if the ON character was not a matching closing bracket */
2280 /* Now see if it is an opening bracket */
2281 match = (char)UCharacter.getBidiPairedBracket(c); /* get the matching char */
2282 if (match == c) /* if no matching char */
2284 if (UCharacter.getIntPropertyValue(c, UProperty.BIDI_PAIRED_BRACKET_TYPE) !=
2285 UCharacter.BidiPairedBracketType.OPEN)
2286 return; /* not an opening bracket */
2287 /* special case: process synonyms;
2288 create an opening entry for each synonym */
2289 if (match == 0x232A) { /* RIGHT-POINTING ANGLE BRACKET */
2290 bracketAddOpening(bd, (char)0x3009, position);
2292 else if (match == 0x3009) { /* RIGHT ANGLE BRACKET */
2293 bracketAddOpening(bd, (char)0x232A, position);
2295 bracketAddOpening(bd, match, position);
2298 /* perform (X1)..(X9) ------------------------------------------------------- */
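// Illustrative sketch (hypothetical helper, not ICU's
// resolveExplicitLevels()): the explicit-level rules X1..X8 of UAX #9 in
// their textbook form, with the overflow isolate count, overflow embedding
// count and valid isolate count described in the Javadoc below. Simplifying
// assumptions: a single paragraph, FSI already resolved to LRI or RLI (as
// getDirProps() does), no override effect on character types, and explicit
// codes simply get the level in effect (ICU copies the last "real" level).
// Class codes are local copies following the groupProp order further down.
final class ExplicitLevelsSketch {
    static final byte B = 7, LRE = 11, LRO = 12, RLE = 14, RLO = 15,
            PDF = 16, LRI = 20, RLI = 21, PDI = 22;
    static final int MAX_DEPTH = 125;               // max_depth of UAX #9

    byte[] resolve(byte[] classes, byte paraLevel) {
        byte[] levels = new byte[classes.length];
        int[] stackLevel = new int[MAX_DEPTH + 2];
        boolean[] stackIsolate = new boolean[MAX_DEPTH + 2];
        int top = 0;                                // X1: base entry
        stackLevel[0] = paraLevel;
        int overflowIsolates = 0, overflowEmbeddings = 0, validIsolates = 0;

        for (int i = 0; i < classes.length; i++) {
            byte c = classes[i];
            int current = stackLevel[top];
            if (c == RLE || c == LRE || c == RLO || c == LRO) {     // X2..X5
                levels[i] = (byte)current;          // removed by X9 anyway
                int next = nextLevel(current, c == RLE || c == RLO);
                if (next <= MAX_DEPTH && overflowIsolates == 0 && overflowEmbeddings == 0) {
                    stackLevel[++top] = next;
                    stackIsolate[top] = false;
                } else if (overflowIsolates == 0) {
                    overflowEmbeddings++;
                }
            } else if (c == RLI || c == LRI) {                      // X5a, X5b
                levels[i] = (byte)current;          // the initiator keeps the outer level
                int next = nextLevel(current, c == RLI);
                if (next <= MAX_DEPTH && overflowIsolates == 0 && overflowEmbeddings == 0) {
                    validIsolates++;
                    stackLevel[++top] = next;
                    stackIsolate[top] = true;
                } else {
                    overflowIsolates++;
                }
            } else if (c == PDI) {                                  // X6a
                if (overflowIsolates > 0) {
                    overflowIsolates--;
                } else if (validIsolates > 0) {
                    overflowEmbeddings = 0;
                    while (!stackIsolate[top])
                        top--;                      // pop embedding entries
                    top--;                          // and the isolate entry itself
                    validIsolates--;
                }
                levels[i] = (byte)stackLevel[top];
            } else if (c == PDF) {                                  // X7
                levels[i] = (byte)current;
                if (overflowIsolates == 0) {
                    if (overflowEmbeddings > 0)
                        overflowEmbeddings--;
                    else if (top > 0 && !stackIsolate[top])
                        top--;
                }
            } else if (c == B) {                                    // X8
                levels[i] = paraLevel;
            } else {                                                // X6
                levels[i] = (byte)current;
            }
        }
        return levels;
    }

    // Least odd (rtl) or least even (!rtl) level greater than current.
    private int nextLevel(int current, boolean rtl) {
        return rtl ? ((current + 1) | 1) : ((current + 2) & ~1);
    }
}
// Usage: new ExplicitLevelsSketch().resolve(new byte[] {0, 14, 0, 16, 0}, (byte)0)
// (L RLE L PDF L) yields the levels 0 0 1 1 0.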
2300 /* determine if the text is mixed-directional or single-directional */
2301 private byte directionFromFlags() {
2302 /* if the text contains AN and neutrals, then some neutrals may become RTL */
2303 if (!((flags & MASK_RTL) != 0 ||
2304 ((flags & DirPropFlag(AN)) != 0 &&
2305 (flags & MASK_POSSIBLE_N) != 0))) {
2307 } else if ((flags & MASK_LTR) == 0) {
2315 * Resolve the explicit levels as specified by explicit embedding codes.
2316 * Recalculate the flags to have them reflect the real properties
2317 * after taking the explicit embeddings into account.
2319 * The BiDi algorithm is designed to result in the same behavior whether embedding
2320 * levels are externally specified (from "styled text", supposedly the preferred
2321 * method) or set by explicit embedding codes (LRx, RLx, PDF, FSI, PDI) in the plain text.
2322 * That is why (X9) instructs to remove all non-isolate explicit codes (and BN).
2323 * However, in a real implementation, the removal of these codes and their index
2324 * positions in the plain text is undesirable since it would result in
2325 * reallocated, reindexed text.
2326 * Instead, this implementation leaves the codes in there and just ignores them
2327 * in the subsequent processing.
2328 * In order to get the same reordering behavior, positions with a BN or a non-isolate
2329 * explicit embedding code just get the same level assigned as the last "real"
2332 * Some implementations, not this one, then overwrite some of these
2333 * directionality properties at "real" same-level-run boundaries by
2334 * L or R codes so that the resolution of weak types can be performed on the
2335 * entire paragraph at once instead of having to parse it once more and
2336 * perform that resolution on same-level-runs.
2337 * This limits the scope of the implicit rules in effectively
2338 * the same way as the run limits.
2340 * Instead, this implementation does not modify these codes, except for
2341 * paired brackets whose properties (ON) may be replaced by L or R.
2342 * On one hand, the paragraph has to be scanned for same-level-runs, but
2343 * on the other hand, this saves another loop to reset these codes,
2344 * or saves making and modifying a copy of dirProps[].
2347 * Note that (Pn) and (Xn) changed significantly from version 4 of the BiDi algorithm.
2350 * Handling the stack of explicit levels (Xn):
2352 * With the BiDi stack of explicit levels, as pushed with each
2353 * LRE, RLE, LRO, RLO, LRI, RLI and FSI and popped with each PDF and PDI,
2354 * the explicit level must never exceed MAX_EXPLICIT_LEVEL.
2356 * In order to have correct push-pop semantics even in the case of overflows,
2357 * overflow counters and a valid isolate counter are used as described in UAX #9
2358 * section 3.3.2 "Explicit Levels and Directions".
2360 * This implementation assumes that MAX_EXPLICIT_LEVEL is odd.
2362 private byte resolveExplicitLevels() {
2365 byte level = GetParaLevelAt(0);
2369 /* determine if the text is mixed-directional or single-directional */
2370 dirct = directionFromFlags();
2372 /* we may not need to resolve any explicit levels */
2373 if (dirct != MIXED) {
2374 /* not mixed directionality: levels don't matter - trailingWSStart will be 0 */
2377 if (reorderingMode > REORDER_LAST_LOGICAL_TO_VISUAL) {
2378 /* inverse BiDi: mixed, but all characters are at the same embedding level */
2379 /* set all levels to the paragraph level */
2380 int paraIndex, start, limit;
2381 for (paraIndex = 0; paraIndex < paraCount; paraIndex++) {
2385 start = paras_limit[paraIndex - 1];
2386 limit = paras_limit[paraIndex];
2387 level = paras_level[paraIndex];
2388 for (i = start; i < limit; i++)
2391 return dirct; /* no bracket matching for inverse BiDi */
2393 if ((flags & (MASK_EXPLICIT | MASK_ISO)) == 0) {
2394 /* no embeddings, set all levels to the paragraph level */
2395 /* we still have to perform bracket matching */
2396 int paraIndex, start, limit;
2397 BracketData bracketData = new BracketData();
2398 bracketInit(bracketData);
2399 for (paraIndex = 0; paraIndex < paraCount; paraIndex++) {
2403 start = paras_limit[paraIndex-1];
2404 limit = paras_limit[paraIndex];
2405 level = paras_level[paraIndex];
2406 for (i = start; i < limit; i++) {
2408 dirProp = dirProps[i];
2410 if ((i + 1) < length) {
2411 if (text[i] == CR && text[i + 1] == LF)
2412 continue; /* skip CR when followed by LF */
2413 bracketProcessB(bracketData, level);
2417 bracketProcessChar(bracketData, i, dirProp);
2422 /* continue to perform (Xn) */
2424 /* (X1) level is set for all codes, embeddingLevel keeps track of the push/pop operations */
2425 /* both variables may carry the LEVEL_OVERRIDE flag to indicate the override status */
2426 byte embeddingLevel = level, newLevel;
2427 byte previousLevel = level; /* previous level for regular (not CC) characters */
2428 int lastCcPos = 0; /* index of last effective LRx, RLx, PDx */
2430 short[] stack = new short[MAX_EXPLICIT_LEVEL + 2]; /* we never push anything >= MAX_EXPLICIT_LEVEL
2431 but we need one more entry as base */
2433 int overflowIsolateCount = 0;
2434 int overflowEmbeddingCount = 0;
2435 int validIsolateCount = 0;
2436 BracketData bracketData = new BracketData();
2437 bracketInit(bracketData);
2438 stack[0] = level; /* initialize base entry to para level, no override, no isolate */
2440 /* recalculate the flags */
2443 for (i = 0; i < length; i++) {
2444 dirProp = dirProps[i];
2450 /* (X2, X3, X4, X5) */
2451 flags |= DirPropFlag(BN);
2452 if (dirProp == LRE || dirProp == LRO)
2453 newLevel = (byte)((embeddingLevel+2) & ~(LEVEL_OVERRIDE | 1)); /* least greater even level */
2455 newLevel = (byte)(((embeddingLevel & ~LEVEL_OVERRIDE) + 1) | 1); /* least greater odd level */
2456 if (newLevel <= MAX_EXPLICIT_LEVEL && overflowIsolateCount == 0 &&
2457 overflowEmbeddingCount == 0) {
2459 embeddingLevel = newLevel;
2460 if (dirProp == LRO || dirProp == RLO)
2461 embeddingLevel |= LEVEL_OVERRIDE;
2463 stack[stackLast] = embeddingLevel;
2464 /* we don't need to set LEVEL_OVERRIDE off for LRE and RLE
2465 since this has already been done for newLevel which is
2466 the source for embeddingLevel.
2469 dirProps[i] |= IGNORE_CC;
2470 if (overflowIsolateCount == 0)
2471 overflowEmbeddingCount++;
2476 flags |= DirPropFlag(BN);
2477 /* handle all the overflow cases first */
2478 if (overflowIsolateCount > 0) {
2479 dirProps[i] |= IGNORE_CC;
2482 if (overflowEmbeddingCount > 0) {
2483 dirProps[i] |= IGNORE_CC;
2484 overflowEmbeddingCount--;
2487 if (stackLast > 0 && stack[stackLast] < ISOLATE) { /* not an isolate entry */
2490 embeddingLevel = (byte)stack[stackLast];
2492 dirProps[i] |= IGNORE_CC;
2496 if (embeddingLevel != previousLevel) {
2497 bracketProcessBoundary(bracketData, lastCcPos,
2498 previousLevel, embeddingLevel);
2499 previousLevel = embeddingLevel;
2502 flags |= DirPropFlag(ON) | DirPropFlag(BN) | DirPropFlagLR(embeddingLevel);
2503 level = embeddingLevel;
2505 newLevel=(byte)((embeddingLevel+2)&~(LEVEL_OVERRIDE|1)); /* least greater even level */
2507 newLevel=(byte)(((embeddingLevel&~LEVEL_OVERRIDE)+1)|1); /* least greater odd level */
2508 if (newLevel <= MAX_EXPLICIT_LEVEL && overflowIsolateCount == 0
2509 && overflowEmbeddingCount == 0) {
2511 previousLevel = embeddingLevel;
2512 validIsolateCount++;
2513 if (validIsolateCount > isolateCount)
2514 isolateCount = validIsolateCount;
2515 embeddingLevel = newLevel;
2517 stack[stackLast] = (short)(embeddingLevel + ISOLATE);
2518 bracketProcessLRI_RLI(bracketData, embeddingLevel);
2520 dirProps[i] |= IGNORE_CC;
2521 overflowIsolateCount++;
2525 if (embeddingLevel != previousLevel) {
2526 bracketProcessBoundary(bracketData, lastCcPos,
2527 previousLevel, embeddingLevel);
2530 if (overflowIsolateCount > 0) {
2531 dirProps[i] |= IGNORE_CC;
2532 overflowIsolateCount--;
2534 else if (validIsolateCount > 0) {
2536 overflowEmbeddingCount = 0;
2537 while (stack[stackLast] < ISOLATE) /* pop embedding entries */
2538 stackLast--; /* until the last isolate entry */
2539 stackLast--; /* pop also the last isolate entry */
2540 validIsolateCount--;
2541 bracketProcessPDI(bracketData);
2543 dirProps[i] |= IGNORE_CC;
2544 embeddingLevel = (byte)(stack[stackLast] & ~ISOLATE);
2545 previousLevel = level = embeddingLevel;
2546 flags |= DirPropFlag(ON) | DirPropFlag(BN) | DirPropFlagLR(embeddingLevel);
2549 level = GetParaLevelAt(i);
2550 if ((i + 1) < length) {
2551 if (text[i] == CR && text[i + 1] == LF)
2552 break; /* skip CR when followed by LF */
2553 overflowEmbeddingCount = overflowIsolateCount = 0;
2554 validIsolateCount = 0;
2556 stack[0] = level; /* initialize base entry to para level, no override, no isolate */
2557 previousLevel = embeddingLevel = GetParaLevelAt(i + 1);
2558 bracketProcessB(bracketData, embeddingLevel);
2560 flags |= DirPropFlag(B);
2563 /* BN, LRE, RLE, and PDF are supposed to be removed (X9) */
2564 /* they will get their levels set correctly in adjustWSLevels() */
2565 flags |= DirPropFlag(BN);
2568 /* all other types get the "real" level */
2569 level = embeddingLevel;
2570 if (embeddingLevel != previousLevel) {
2571 bracketProcessBoundary(bracketData, lastCcPos,
2572 previousLevel, embeddingLevel);
2573 previousLevel = embeddingLevel;
2575 if ((level & LEVEL_OVERRIDE) != 0)
2576 flags |= DirPropFlagLR(level);
2578 flags |= DirPropFlag(dirProp);
2579 bracketProcessChar(bracketData, i, dirProp);
2584 * We need to set reasonable levels even on BN codes and
2585 * explicit codes because we will later look at same-level runs (X10).
2588 if (i > 0 && levels[i - 1] != level) {
2589 flags |= DirPropFlagMultiRuns;
2590 if ((level & LEVEL_OVERRIDE) != 0)
2591 flags |= DirPropFlagO(level);
2593 flags |= DirPropFlagE(level);
2595 if ((DirPropFlag(dirProp) & MASK_ISO) != 0)
2596 level = embeddingLevel;
2598 if ((flags & MASK_EMBEDDING) != 0) {
2599 flags |= DirPropFlagLR(paraLevel);
2601 if (orderParagraphsLTR && (flags & DirPropFlag(B)) != 0) {
2602 flags |= DirPropFlag(L);
2605 /* subsequently, ignore the explicit codes and BN (X9) */
2607 /* again, determine if the text is mixed-directional or single-directional */
2608 dirct = directionFromFlags();
2614 * Use a pre-specified embedding levels array:
2616 * Adjust the directional properties for overrides (->LEVEL_OVERRIDE),
2617 * ignore all explicit codes (X9),
2618 * and check all the preset levels.
2620 * Recalculate the flags to have them reflect the real properties
2621 * after taking the explicit embeddings into account.
2623 private byte checkExplicitLevels() {
2626 int isolateCount = 0;
2628 this.flags = 0; /* collect all directionalities in the text */
2630 this.isolateCount = 0;
2632 for (i = 0; i < length; ++i) {
2634 dirProp = dirProps[i];
2635 if (dirProp == LRI || dirProp == RLI) {
2637 if (isolateCount > this.isolateCount)
2638 this.isolateCount = isolateCount;
2640 else if (dirProp == PDI)
2642 else if (dirProp == B)
2644 if ((level & LEVEL_OVERRIDE) != 0) {
2645 /* keep the override flag in levels[i] but adjust the flags */
2646 level &= ~LEVEL_OVERRIDE; /* make the range check below simpler */
2647 flags |= DirPropFlagO(level);
2650 flags |= DirPropFlagE(level) | DirPropFlag(dirProp);
2652 if ((level < GetParaLevelAt(i) &&
2653 !((0 == level) && (dirProp == B))) ||
2654 (MAX_EXPLICIT_LEVEL < level)) {
2655 /* level out of bounds */
2656 throw new IllegalArgumentException("level " + level +
2657 " out of bounds at " + i);
2660 if ((flags & MASK_EMBEDDING) != 0) {
2661 flags |= DirPropFlagLR(paraLevel);
2664 /* determine if the text is mixed-directional or single-directional */
2665 return directionFromFlags();
2668 /*********************************************************************/
2669 /* The Properties state machine table */
2670 /*********************************************************************/
2672 /* All table cells are 8 bits: */
2673 /* bits 0..4: next state */
2674 /* bits 5..7: action to perform (if > 0) */
2676 /* Cells may be of format "n" where n represents the next state */
2677 /* (except for the rightmost column). */
2678 /* Cells may also be of format "(x << 5) + y", written e.g. as 32+2, */
2679 /* where x represents an action and y represents the next state.     */
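// Illustrative sketch (hypothetical helper, not used by ICU): packing and
// unpacking of a properties-table cell, consistent with GetStateProps()
// and GetActionProps() below. In impTabProps a cell such as "32+2" means
// action 1 with next state 2, and "128+7" means action 4 with next state 7.
final class PropsCellSketch {
    short pack(int action, int nextState) {
        return (short)((action << 5) | nextState);  // bits 5..7: action, bits 0..4: state
    }
    int nextState(short cell) { return cell & 0x1f; }
    int action(short cell)    { return cell >> 5; }
}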
2681 /*********************************************************************/
2682 /* Definitions and type for properties state tables */
2683 /*********************************************************************/
2684 private static final int IMPTABPROPS_COLUMNS = 16;
2685 private static final int IMPTABPROPS_RES = IMPTABPROPS_COLUMNS - 1;
2686 private static short GetStateProps(short cell) {
2687 return (short)(cell & 0x1f);
2689 private static short GetActionProps(short cell) {
2690 return (short)(cell >> 5);
2693 private static final short groupProp[] = /* dirProp regrouped */
2695 /* L R EN ES ET AN CS B S WS ON LRE LRO AL RLE RLO PDF NSM BN FSI LRI RLI PDI ENL ENR */
2696 0, 1, 2, 7, 8, 3, 9, 6, 5, 4, 4, 10, 10, 12, 10, 10, 10, 11, 10, 4, 4, 4, 4, 13, 14
2698 private static final short _L = 0;
2699 private static final short _R = 1;
2700 private static final short _EN = 2;
2701 private static final short _AN = 3;
2702 private static final short _ON = 4;
2703 private static final short _S = 5;
2704 private static final short _B = 6; /* reduced dirProp */
2706 /*********************************************************************/
2708 /* PROPERTIES STATE TABLE */
2710 /* In table impTabProps, */
2711 /* - the ON column regroups ON and WS, FSI, RLI, LRI and PDI */
2712 /* - the BN column regroups BN, LRE, RLE, LRO, RLO, PDF */
2713 /* - the Res column is the reduced property assigned to a run */
2715 /* Action 1: process current run1, init new run1 */
2716 /* 2: init new run2 */
2717 /* 3: process run1, process run2, init new run1 */
2718 /* 4: process run1, set run1=run2, init new run2 */
2721 /* 1) This table is used in resolveImplicitLevels(). */
2722 /* 2) This table triggers actions when there is a change in the Bidi*/
2723 /* property of incoming characters (action 1). */
2724 /* 3) Most such property sequences are processed immediately (in */
2725 /*      fact, passed to processPropertySeq()).                        */
2726 /* 4) However, numbers are assembled as one sequence. This means */
2727 /* that undefined situations (like CS following digits, until */
2728 /* it is known if the next char will be a digit) are held until */
2729 /* following chars define them. */
2730 /* Example: digits followed by CS, then comes another CS or ON; */
2731 /* the digits will be processed, then the CS assigned */
2732 /*              as the start of an ON sequence (action 4).            */
2733 /* 5) There are cases where more than one sequence must be */
2734 /* processed, for instance digits followed by CS followed by L: */
2735 /* the digits must be processed as one sequence, and the CS */
2736 /* must be processed as an ON sequence, all this before starting */
2737 /*     to assemble chars for the new L sequence (action 3).           */
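// Illustrative sketch (hypothetical helper, not ICU's
// resolveImplicitLevels()): runs a sequence of dirProps through a copy of
// groupProp and of three rows of impTabProps (states 0, 4 and 10, without
// the Res column) and records the action each character triggers. Only
// inputs that stay within those rows are supported. For EN CS ON the third
// character triggers action 4 (the digits are processed and the CS becomes
// the start of the ON run); for EN CS L it triggers action 3 (the digits
// and the CS are both processed before the L run starts).
final class PropsTraceSketch {
    // Copy of groupProp (dirProp -> table column).
    private final short[] group = { 0, 1, 2, 7, 8, 3, 9, 6, 5, 4, 4, 10, 10,
                                    12, 10, 10, 10, 11, 10, 4, 4, 4, 4, 13, 14 };
    private final short[][] rows = new short[24][];

    PropsTraceSketch() {
        rows[0]  = new short[] { 1, 2, 4, 5, 7, 15, 17, 7, 9, 7, 0, 7, 3, 18, 21 };
        rows[4]  = new short[] { 32+1, 32+2, 4, 32+5, 32+7, 32+15, 32+17,
                                 64+10, 11, 64+10, 4, 4, 32+3, 18, 21 };
        rows[10] = new short[] { 96+1, 96+2, 4, 96+5, 128+7, 96+15, 96+17,
                                 128+7, 128+14, 128+7, 10, 128+7, 96+3, 18, 21 };
    }

    // Returns the action (0 = none) triggered by each dirProp in turn.
    int[] actions(byte[] dirProps) {
        int[] result = new int[dirProps.length];
        int state = 0;
        for (int i = 0; i < dirProps.length; i++) {
            short[] row = rows[state];
            if (row == null)
                throw new IllegalStateException("state " + state + " not copied into this sketch");
            short cell = row[group[dirProps[i]]];
            result[i] = cell >> 5;          // as GetActionProps()
            state = cell & 0x1f;            // as GetStateProps()
        }
        return result;
    }
}
// Usage: new PropsTraceSketch().actions(new byte[] {2, 6, 10}) returns
// {0, 2, 4} (dirProp codes EN = 2, CS = 6, ON = 10).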
2740 private static final short impTabProps[][] =
2742 /* L, R, EN, AN, ON, S, B, ES, ET, CS, BN, NSM, AL, ENL, ENR , Res */
2743 /* 0 Init */ { 1, 2, 4, 5, 7, 15, 17, 7, 9, 7, 0, 7, 3, 18, 21, _ON },
2744 /* 1 L */ { 1, 32+2, 32+4, 32+5, 32+7, 32+15, 32+17, 32+7, 32+9, 32+7, 1, 1, 32+3, 32+18, 32+21, _L },
2745 /* 2 R */ { 32+1, 2, 32+4, 32+5, 32+7, 32+15, 32+17, 32+7, 32+9, 32+7, 2, 2, 32+3, 32+18, 32+21, _R },
2746 /* 3 AL */ { 32+1, 32+2, 32+6, 32+6, 32+8, 32+16, 32+17, 32+8, 32+8, 32+8, 3, 3, 3, 32+18, 32+21, _R },
2747 /* 4 EN */ { 32+1, 32+2, 4, 32+5, 32+7, 32+15, 32+17, 64+10, 11, 64+10, 4, 4, 32+3, 18, 21, _EN },
2748 /* 5 AN */ { 32+1, 32+2, 32+4, 5, 32+7, 32+15, 32+17, 32+7, 32+9, 64+12, 5, 5, 32+3, 32+18, 32+21, _AN },
2749 /* 6 AL:EN/AN */ { 32+1, 32+2, 6, 6, 32+8, 32+16, 32+17, 32+8, 32+8, 64+13, 6, 6, 32+3, 18, 21, _AN },
2750 /* 7 ON */ { 32+1, 32+2, 32+4, 32+5, 7, 32+15, 32+17, 7, 64+14, 7, 7, 7, 32+3, 32+18, 32+21, _ON },
2751 /* 8 AL:ON */ { 32+1, 32+2, 32+6, 32+6, 8, 32+16, 32+17, 8, 8, 8, 8, 8, 32+3, 32+18, 32+21, _ON },
2752 /* 9 ET */ { 32+1, 32+2, 4, 32+5, 7, 32+15, 32+17, 7, 9, 7, 9, 9, 32+3, 18, 21, _ON },
2753 /*10 EN+ES/CS */ { 96+1, 96+2, 4, 96+5, 128+7, 96+15, 96+17, 128+7,128+14, 128+7, 10, 128+7, 96+3, 18, 21, _EN },
2754 /*11 EN+ET */ { 32+1, 32+2, 4, 32+5, 32+7, 32+15, 32+17, 32+7, 11, 32+7, 11, 11, 32+3, 18, 21, _EN },
2755 /*12 AN+CS */ { 96+1, 96+2, 96+4, 5, 128+7, 96+15, 96+17, 128+7,128+14, 128+7, 12, 128+7, 96+3, 96+18, 96+21, _AN },
2756 /*13 AL:EN/AN+CS */ { 96+1, 96+2, 6, 6, 128+8, 96+16, 96+17, 128+8, 128+8, 128+8, 13, 128+8, 96+3, 18, 21, _AN },
2757 /*14 ON+ET */ { 32+1, 32+2, 128+4, 32+5, 7, 32+15, 32+17, 7, 14, 7, 14, 14, 32+3,128+18,128+21, _ON },
2758 /*15 S */ { 32+1, 32+2, 32+4, 32+5, 32+7, 15, 32+17, 32+7, 32+9, 32+7, 15, 32+7, 32+3, 32+18, 32+21, _S },
2759 /*16 AL:S */ { 32+1, 32+2, 32+6, 32+6, 32+8, 16, 32+17, 32+8, 32+8, 32+8, 16, 32+8, 32+3, 32+18, 32+21, _S },
2760 /*17 B */ { 32+1, 32+2, 32+4, 32+5, 32+7, 32+15, 17, 32+7, 32+9, 32+7, 17, 32+7, 32+3, 32+18, 32+21, _B },
2761 /*18 ENL */ { 32+1, 32+2, 18, 32+5, 32+7, 32+15, 32+17, 64+19, 20, 64+19, 18, 18, 32+3, 18, 21, _L },
2762 /*19 ENL+ES/CS */ { 96+1, 96+2, 18, 96+5, 128+7, 96+15, 96+17, 128+7,128+14, 128+7, 19, 128+7, 96+3, 18, 21, _L },
2763 /*20 ENL+ET */ { 32+1, 32+2, 18, 32+5, 32+7, 32+15, 32+17, 32+7, 20, 32+7, 20, 20, 32+3, 18, 21, _L },
2764 /*21 ENR */ { 32+1, 32+2, 21, 32+5, 32+7, 32+15, 32+17, 64+22, 23, 64+22, 21, 21, 32+3, 18, 21, _AN },
2765 /*22 ENR+ES/CS */ { 96+1, 96+2, 21, 96+5, 128+7, 96+15, 96+17, 128+7,128+14, 128+7, 22, 128+7, 96+3, 18, 21, _AN },
2766 /*23 ENR+ET */ { 32+1, 32+2, 21, 32+5, 32+7, 32+15, 32+17, 32+7, 23, 32+7, 23, 23, 32+3, 18, 21, _AN }
2769 /*********************************************************************/
2770 /* The levels state machine tables */
2771 /*********************************************************************/
2773 /* All table cells are 8 bits: */
2774 /* bits 0..3: next state */
2775 /* bits 4..7: action to perform (if > 0) */
2777 /* Cells may be of format "n" where n represents the next state */
2778 /* (except for the rightmost column, Res, which holds a level increment). */
2779 /* Cells may also be written in hex as 0xXY, where X represents an action */
2780 /* to perform and Y represents the next state. */
2782 /* This format limits each table to 16 states and 15 actions. */
2784 /*********************************************************************/
2785 /* Definitions and type for levels state tables */
2786 /*********************************************************************/
2787 private static final int IMPTABLEVELS_COLUMNS = _B + 2;
2788 private static final int IMPTABLEVELS_RES = IMPTABLEVELS_COLUMNS - 1;
2789 private static short GetState(byte cell) { return (short)(cell & 0x0f); }
2790 private static short GetAction(byte cell) { return (short)(cell >> 4); }
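/* Illustrative sketch, not part of ICU4J: the hypothetical class LevelsCell packs
   and unpacks a levels-table cell in the 0xXY layout that GetState() and
   GetAction() decode. Masking with 0xff before shifting avoids sign extension
   for cells of 0x80 and above. GetAction() shifts the signed byte directly,
   which is safe here because no table in this file uses an action above 6.

```java
// Hypothetical helper; mirrors GetState()/GetAction() for illustration.
class LevelsCell {
    // Pack an action (0..15) and a next state (0..15) into one byte cell.
    static byte pack(int action, int state) {
        if (action < 0 || action > 15 || state < 0 || state > 15) {
            throw new IllegalArgumentException("action and state must fit in 4 bits");
        }
        return (byte) ((action << 4) | state);
    }

    // Low nibble: next state.
    static int stateOf(byte cell) { return cell & 0x0f; }

    // High nibble: action; the 0xff mask keeps cells >= 0x80 from going negative.
    static int actionOf(byte cell) { return (cell & 0xff) >> 4; }

    public static void main(String[] args) {
        byte cell = 0x14; // impTabL_DEFAULT, state 1 (R), column ON
        System.out.println("action=" + actionOf(cell) + " state=" + stateOf(cell));
    }
}
```

   With a plain signed shift, a hypothetical cell 0xF1 would decode to action -1
   instead of 15; the masked form returns 15.
*/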
2792 private static class ImpTabPair {
2796 ImpTabPair(byte[][] table1, byte[][] table2,
2797 short[] act1, short[] act2) {
2798 imptab = new byte[][][] {table1, table2};
2799 impact = new short[][] {act1, act2};
2803 /*********************************************************************/
2805 /* LEVELS STATE TABLES */
2807 /* In all levels state tables, */
2808 /* - state 0 is the initial state */
2809 /* - the Res column is the increment to add to the text level */
2810 /* for this property sequence. */
2812 /* The impact arrays for each table of a pair map the local action */
2813 /* numbers of the table to the total list of actions. For instance, */
2814 /* action 2 in a given table corresponds to the action number which */
2815 /* appears in entry [2] of the impact array for that table. */
2816 /* The first entry of all impact arrays must be 0. */
2818 /* Action 1: init conditional sequence */
2819 /* 2: prepend conditional sequence to current sequence */
2820 /* 3: L or S after possible relevant EN/AN */
2821 /* 4: R/AL after possible relevant EN/AN */
2822 /* 5: EN/AN after R/AL + possible continuation */
2823 /* 6: note location of latest R/AL (7..12: see processPropertySeq()) */
2826 /* 1) These tables are used in processPropertySeq(). The input */
2827 /* is property sequences as determined by resolveImplicitLevels. */
2828 /* 2) Most such property sequences are processed immediately */
2829 /* (levels are assigned). */
2830 /* 3) However, some sequences cannot be assigned a final level until */
2831 /* one or more following sequences are received. For instance, */
2832 /* consider an ON sequence that follows an R sequence within an even-level paragraph: */
2833 /* if the following sequence is R, the ON sequence will be */
2834 /* assigned basic run level+1, and so will the R sequence. */
2835 /* 4) S is generally handled like ON, since its level will be fixed */
2836 /* to paragraph level in adjustWSLevels(). */
2839 private static final byte impTabL_DEFAULT[][] = /* Even paragraph level */
2840 /* In this table, conditional sequences receive the higher possible level
2841 until proven otherwise.
2844 /* L, R, EN, AN, ON, S, B, Res */
2845 /* 0 : init */ { 0, 1, 0, 2, 0, 0, 0, 0 },
2846 /* 1 : R */ { 0, 1, 3, 3, 0x14, 0x14, 0, 1 },
2847 /* 2 : AN */ { 0, 1, 0, 2, 0x15, 0x15, 0, 2 },
2848 /* 3 : R+EN/AN */ { 0, 1, 3, 3, 0x14, 0x14, 0, 2 },
2849 /* 4 : R+ON */ { 0x20, 1, 3, 3, 4, 4, 0x20, 1 },
2850 /* 5 : AN+ON */ { 0x20, 1, 0x20, 2, 5, 5, 0x20, 1 }
2853 private static final byte impTabR_DEFAULT[][] = /* Odd paragraph level */
2854 /* In this table, conditional sequences receive the lower possible level
2855 until proven otherwise.
2858 /* L, R, EN, AN, ON, S, B, Res */
2859 /* 0 : init */ { 1, 0, 2, 2, 0, 0, 0, 0 },
2860 /* 1 : L */ { 1, 0, 1, 3, 0x14, 0x14, 0, 1 },
2861 /* 2 : EN/AN */ { 1, 0, 2, 2, 0, 0, 0, 1 },
2862 /* 3 : L+AN */ { 1, 0, 1, 3, 5, 5, 0, 1 },
2863 /* 4 : L+ON */ { 0x21, 0, 0x21, 3, 4, 4, 0, 0 },
2864 /* 5 : L+AN+ON */ { 1, 0, 1, 3, 5, 5, 0, 0 }
2867 private static final short[] impAct0 = {0,1,2,3,4,5,6};
2869 private static final ImpTabPair impTab_DEFAULT = new ImpTabPair(
2870 impTabL_DEFAULT, impTabR_DEFAULT, impAct0, impAct0);
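/* Illustrative sketch, not part of ICU4J: the hypothetical class LevelsTableSketch
   is a reduced model of processPropertySeq() driven by impTabL_DEFAULT. It shows
   how a conditional ON sequence first receives the higher level and is then
   either kept (R follows) or lowered (L or eor follows). Assumptions: one property
   per character, run level 0, sor = eor = L, no isolates, and only actions 1 and 2,
   which are the only actions this table uses.

```java
import java.util.Arrays;

// Hypothetical standalone class; a simplified model, not the ICU4J code.
class LevelsTableSketch {
    static final int L = 0, R = 1, EN = 2, AN = 3, ON = 4, S = 5, B = 6, RES = 7;

    // Copied from impTabL_DEFAULT (even paragraph level).
    static final byte[][] TABLE = {
        {    0, 1,    0, 2,    0,    0,    0, 0 },
        {    0, 1,    3, 3, 0x14, 0x14,    0, 1 },
        {    0, 1,    0, 2, 0x15, 0x15,    0, 2 },
        {    0, 1,    3, 3, 0x14, 0x14,    0, 2 },
        { 0x20, 1,    3, 3,    4,    4, 0x20, 1 },
        { 0x20, 1, 0x20, 2,    5,    5, 0x20, 1 }
    };

    private final byte[] levels;
    private final byte runLevel;
    private int state = 0;
    private int startON = -1;

    private LevelsTableSketch(int length, byte runLevel) {
        this.levels = new byte[length];
        this.runLevel = runLevel;
        Arrays.fill(levels, runLevel);
    }

    private void processSeq(int prop, int start, int limit) {
        int start0 = start;
        byte cell = TABLE[state][prop];
        state = cell & 0x0f;
        int action = (cell & 0xff) >> 4;
        int addLevel = TABLE[state][RES];
        if (action == 1) {          // init conditional (ON) sequence
            startON = start0;
        } else if (action == 2) {   // prepend conditional sequence to current one
            start = startON;
        }
        if (addLevel != 0 || start < start0) {
            byte level = (byte) (runLevel + addLevel);
            for (int k = start; k < limit; k++) {
                levels[k] = level;
            }
        }
    }

    // One property per character; sor and eor are both taken to be L.
    static byte[] resolve(int... props) {
        LevelsTableSketch s = new LevelsTableSketch(props.length, (byte) 0);
        s.processSeq(L, 0, 0);                        // sor
        for (int i = 0; i < props.length; i++) {
            s.processSeq(props[i], i, i + 1);
        }
        s.processSeq(L, props.length, props.length);  // eor
        return s.levels;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(resolve(R, ON, R)));
        System.out.println(Arrays.toString(resolve(R, ON, L)));
    }
}
```

   resolve(R, ON, R) gives [1, 1, 1] and resolve(R, ON, L) gives [1, 0, 0],
   matching note 3 above. resolve(L, AN) gives [0, 2].
*/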
2872 private static final byte impTabL_NUMBERS_SPECIAL[][] = { /* Even paragraph level */
2873 /* In this table, conditional sequences receive the higher possible
2874 level until proven otherwise.
2876 /* L, R, EN, AN, ON, S, B, Res */
2877 /* 0 : init */ { 0, 2, 1, 1, 0, 0, 0, 0 },
2878 /* 1 : L+EN/AN */ { 0, 2, 1, 1, 0, 0, 0, 2 },
2879 /* 2 : R */ { 0, 2, 4, 4, 0x13, 0, 0, 1 },
2880 /* 3 : R+ON */ { 0x20, 2, 4, 4, 3, 3, 0x20, 1 },
2881 /* 4 : R+EN/AN */ { 0, 2, 4, 4, 0x13, 0x13, 0, 2 }
2883 private static final ImpTabPair impTab_NUMBERS_SPECIAL = new ImpTabPair(
2884 impTabL_NUMBERS_SPECIAL, impTabR_DEFAULT, impAct0, impAct0);
2886 private static final byte impTabL_GROUP_NUMBERS_WITH_R[][] = {
2887 /* In this table, EN/AN+ON sequences receive levels as if associated with R
2888 until proven that there is L or sor/eor on both sides. AN is handled like EN.
2890 /* L, R, EN, AN, ON, S, B, Res */
2891 /* 0 init */ { 0, 3, 0x11, 0x11, 0, 0, 0, 0 },
2892 /* 1 EN/AN */ { 0x20, 3, 1, 1, 2, 0x20, 0x20, 2 },
2893 /* 2 EN/AN+ON */ { 0x20, 3, 1, 1, 2, 0x20, 0x20, 1 },
2894 /* 3 R */ { 0, 3, 5, 5, 0x14, 0, 0, 1 },
2895 /* 4 R+ON */ { 0x20, 3, 5, 5, 4, 0x20, 0x20, 1 },
2896 /* 5 R+EN/AN */ { 0, 3, 5, 5, 0x14, 0, 0, 2 }
2898 private static final byte impTabR_GROUP_NUMBERS_WITH_R[][] = {
2899 /* In this table, EN/AN+ON sequences receive levels as if associated with R
2900 until proven that there is L on both sides. AN is handled like EN.
2902 /* L, R, EN, AN, ON, S, B, Res */
2903 /* 0 init */ { 2, 0, 1, 1, 0, 0, 0, 0 },
2904 /* 1 EN/AN */ { 2, 0, 1, 1, 0, 0, 0, 1 },
2905 /* 2 L */ { 2, 0, 0x14, 0x14, 0x13, 0, 0, 1 },
2906 /* 3 L+ON */ { 0x22, 0, 4, 4, 3, 0, 0, 0 },
2907 /* 4 L+EN/AN */ { 0x22, 0, 4, 4, 3, 0, 0, 1 }
2909 private static final ImpTabPair impTab_GROUP_NUMBERS_WITH_R = new
2910 ImpTabPair(impTabL_GROUP_NUMBERS_WITH_R,
2911 impTabR_GROUP_NUMBERS_WITH_R, impAct0, impAct0);
2913 private static final byte impTabL_INVERSE_NUMBERS_AS_L[][] = {
2914 /* This table is identical to the Default LTR table except that EN and AN
2917 /* L, R, EN, AN, ON, S, B, Res */
2918 /* 0 : init */ { 0, 1, 0, 0, 0, 0, 0, 0 },
2919 /* 1 : R */ { 0, 1, 0, 0, 0x14, 0x14, 0, 1 },
2920 /* 2 : AN */ { 0, 1, 0, 0, 0x15, 0x15, 0, 2 },
2921 /* 3 : R+EN/AN */ { 0, 1, 0, 0, 0x14, 0x14, 0, 2 },
2922 /* 4 : R+ON */ { 0x20, 1, 0x20, 0x20, 4, 4, 0x20, 1 },
2923 /* 5 : AN+ON */ { 0x20, 1, 0x20, 0x20, 5, 5, 0x20, 1 }
2925 private static final byte impTabR_INVERSE_NUMBERS_AS_L[][] = {
2926 /* This table is identical to the Default RTL table except that EN and AN
2929 /* L, R, EN, AN, ON, S, B, Res */
2930 /* 0 : init */ { 1, 0, 1, 1, 0, 0, 0, 0 },
2931 /* 1 : L */ { 1, 0, 1, 1, 0x14, 0x14, 0, 1 },
2932 /* 2 : EN/AN */ { 1, 0, 1, 1, 0, 0, 0, 1 },
2933 /* 3 : L+AN */ { 1, 0, 1, 1, 5, 5, 0, 1 },
2934 /* 4 : L+ON */ { 0x21, 0, 0x21, 0x21, 4, 4, 0, 0 },
2935 /* 5 : L+AN+ON */ { 1, 0, 1, 1, 5, 5, 0, 0 }
2937 private static final ImpTabPair impTab_INVERSE_NUMBERS_AS_L = new ImpTabPair
2938 (impTabL_INVERSE_NUMBERS_AS_L, impTabR_INVERSE_NUMBERS_AS_L,
2941 private static final byte impTabR_INVERSE_LIKE_DIRECT[][] = { /* Odd paragraph level */
2942 /* In this table, conditional sequences receive the lower possible level
2943 until proven otherwise.
2945 /* L, R, EN, AN, ON, S, B, Res */
2946 /* 0 : init */ { 1, 0, 2, 2, 0, 0, 0, 0 },
2947 /* 1 : L */ { 1, 0, 1, 2, 0x13, 0x13, 0, 1 },
2948 /* 2 : EN/AN */ { 1, 0, 2, 2, 0, 0, 0, 1 },
2949 /* 3 : L+ON */ { 0x21, 0x30, 6, 4, 3, 3, 0x30, 0 },
2950 /* 4 : L+ON+AN */ { 0x21, 0x30, 6, 4, 5, 5, 0x30, 3 },
2951 /* 5 : L+AN+ON */ { 0x21, 0x30, 6, 4, 5, 5, 0x30, 2 },
2952 /* 6 : L+ON+EN */ { 0x21, 0x30, 6, 4, 3, 3, 0x30, 1 }
2954 private static final short[] impAct1 = {0,1,11,12};
2955 private static final ImpTabPair impTab_INVERSE_LIKE_DIRECT = new ImpTabPair(
2956 impTabL_DEFAULT, impTabR_INVERSE_LIKE_DIRECT, impAct0, impAct1);
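/* Illustrative sketch, not part of ICU4J: the hypothetical class ImpactSketch shows
   how a local action number stored in a cell is translated through the impact
   array of its table. It also shows how the member of an ImpTabPair is chosen
   from the parity of the run level, as in
   levState.impTab = impTabPair.imptab[levState.runLevel & 1] in resolveImplicitLevels().

```java
// Hypothetical helper for illustration only.
class ImpactSketch {
    // Impact arrays copied from the declarations in this file.
    static final short[] IMP_ACT0 = {0, 1, 2, 3, 4, 5, 6};
    static final short[] IMP_ACT1 = {0, 1, 11, 12};
    static final short[] IMP_ACT2 = {0, 1, 7, 8, 9, 10};

    // Translate the local action in bits 4..7 of a cell into a global action number.
    static int globalAction(byte cell, short[] impact) {
        return impact[(cell & 0xff) >> 4];
    }

    // Pair member used for a run: 0 (the "L" table) for even run levels,
    // 1 (the "R" table) for odd run levels.
    static int pairIndex(byte runLevel) {
        return runLevel & 1;
    }

    public static void main(String[] args) {
        // 0x21 is a cell of impTabR_INVERSE_LIKE_DIRECT, which is paired with impAct1.
        System.out.println(globalAction((byte) 0x21, IMP_ACT1)); // prints 11
    }
}
```

   Cell 0x21 has local action 2, which impAct1 maps to global action 11
   (L after L+ON+EN/AN/ON). Read through impAct0, the same cell would mean action 2.
*/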
2958 private static final byte impTabL_INVERSE_LIKE_DIRECT_WITH_MARKS[][] = {
2959 /* The case handled in this table is (visually): R EN L
2961 /* L, R, EN, AN, ON, S, B, Res */
2962 /* 0 : init */ { 0, 0x63, 0, 1, 0, 0, 0, 0 },
2963 /* 1 : L+AN */ { 0, 0x63, 0, 1, 0x12, 0x30, 0, 4 },
2964 /* 2 : L+AN+ON */ { 0x20, 0x63, 0x20, 1, 2, 0x30, 0x20, 3 },
2965 /* 3 : R */ { 0, 0x63, 0x55, 0x56, 0x14, 0x30, 0, 3 },
2966 /* 4 : R+ON */ { 0x30, 0x43, 0x55, 0x56, 4, 0x30, 0x30, 3 },
2967 /* 5 : R+EN */ { 0x30, 0x43, 5, 0x56, 0x14, 0x30, 0x30, 4 },
2968 /* 6 : R+AN */ { 0x30, 0x43, 0x55, 6, 0x14, 0x30, 0x30, 4 }
2970 private static final byte impTabR_INVERSE_LIKE_DIRECT_WITH_MARKS[][] = {
2971 /* The cases handled in this table are (visually): R EN L
2974 /* L, R, EN, AN, ON, S, B, Res */
2975 /* 0 : init */ { 0x13, 0, 1, 1, 0, 0, 0, 0 },
2976 /* 1 : R+EN/AN */ { 0x23, 0, 1, 1, 2, 0x40, 0, 1 },
2977 /* 2 : R+EN/AN+ON */ { 0x23, 0, 1, 1, 2, 0x40, 0, 0 },
2978 /* 3 : L */ { 3 , 0, 3, 0x36, 0x14, 0x40, 0, 1 },
2979 /* 4 : L+ON */ { 0x53, 0x40, 5, 0x36, 4, 0x40, 0x40, 0 },
2980 /* 5 : L+ON+EN */ { 0x53, 0x40, 5, 0x36, 4, 0x40, 0x40, 1 },
2981 /* 6 : L+AN */ { 0x53, 0x40, 6, 6, 4, 0x40, 0x40, 3 }
2983 private static final short impAct2[] = {0,1,7,8,9,10};
2984 private static final ImpTabPair impTab_INVERSE_LIKE_DIRECT_WITH_MARKS =
2985 new ImpTabPair(impTabL_INVERSE_LIKE_DIRECT_WITH_MARKS,
2986 impTabR_INVERSE_LIKE_DIRECT_WITH_MARKS, impAct0, impAct2);
2988 private static final ImpTabPair impTab_INVERSE_FOR_NUMBERS_SPECIAL = new ImpTabPair(
2989 impTabL_NUMBERS_SPECIAL, impTabR_INVERSE_LIKE_DIRECT, impAct0, impAct1);
2991 private static final byte impTabL_INVERSE_FOR_NUMBERS_SPECIAL_WITH_MARKS[][] = {
2992 /* The case handled in this table is (visually): R EN L
2994 /* L, R, EN, AN, ON, S, B, Res */
2995 /* 0 : init */ { 0, 0x62, 1, 1, 0, 0, 0, 0 },
2996 /* 1 : L+EN/AN */ { 0, 0x62, 1, 1, 0, 0x30, 0, 4 },
2997 /* 2 : R */ { 0, 0x62, 0x54, 0x54, 0x13, 0x30, 0, 3 },
2998 /* 3 : R+ON */ { 0x30, 0x42, 0x54, 0x54, 3, 0x30, 0x30, 3 },
2999 /* 4 : R+EN/AN */ { 0x30, 0x42, 4, 4, 0x13, 0x30, 0x30, 4 }
3001 private static final ImpTabPair impTab_INVERSE_FOR_NUMBERS_SPECIAL_WITH_MARKS = new
3002 ImpTabPair(impTabL_INVERSE_FOR_NUMBERS_SPECIAL_WITH_MARKS,
3003 impTabR_INVERSE_LIKE_DIRECT_WITH_MARKS, impAct0, impAct2);
3005 private static class LevState {
3006 byte[][] impTab; /* level table pointer */
3007 short[] impAct; /* action map array */
3008 int startON; /* start of ON sequence */
3009 int startL2EN; /* start of level 2 sequence */
3010 int lastStrongRTL; /* index of last found R or AL */
3011 int runStart; /* start position of the run */
3012 short state; /* current state */
3013 byte runLevel; /* run level before implicit solving */
3016 /*------------------------------------------------------------------------*/
3018 static final int FIRSTALLOC = 10;
3020 * param pos: position where to insert
3021 * param flag: one of LRM_BEFORE, LRM_AFTER, RLM_BEFORE, RLM_AFTER
3023 private void addPoint(int pos, int flag)
3025 Point point = new Point();
3027 int len = insertPoints.points.length;
3029 insertPoints.points = new Point[FIRSTALLOC];
3032 if (insertPoints.size >= len) { /* no room for new point */
3033 Point[] savePoints = insertPoints.points;
3034 insertPoints.points = new Point[len * 2];
3035 System.arraycopy(savePoints, 0, insertPoints.points, 0, len);
3039 insertPoints.points[insertPoints.size] = point;
3040 insertPoints.size++;
3043 /* perform rules (Wn), (Nn), and (In) on a run of the text ------------------ */
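/* Illustrative sketch, not part of ICU4J: the hypothetical class InsertPointsSketch
   models the tentative-insert pattern used by addPoint() and processPropertySeq().
   Here {pos, flag} int pairs stand in for Point. Points are appended
   speculatively. Setting confirmed = size commits them (e.g. case 10), and
   setting size = confirmed discards the unconfirmed ones (e.g. cases 4 and 9).
   As in addPoint(), the capacity starts at FIRSTALLOC and doubles.

```java
import java.util.Arrays;

// Hypothetical standalone model; not ICU4J's insertPoints class.
class InsertPointsSketch {
    static final int FIRSTALLOC = 10;            // same initial capacity as the source

    private int[][] points = new int[0][];       // each entry: {pos, flag}
    private int size = 0;
    private int confirmed = 0;

    void addPoint(int pos, int flag) {
        if (points.length == 0) {
            points = new int[FIRSTALLOC][];
        } else if (size >= points.length) {      // no room: double the capacity
            points = Arrays.copyOf(points, points.length * 2);
        }
        points[size++] = new int[] {pos, flag};
    }

    void confirm() { confirmed = size; }         // keep every tentative point
    void rollback() { size = confirmed; }        // drop the unconfirmed points

    int size() { return size; }
    int capacity() { return points.length; }
}
```

   This mirrors case 8, which brackets an AN with two tentative LRMs. Case 10
   (L follows) later confirms them, while case 9 (R follows) rolls them back.
*/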
3046 * This implementation of the (Wn) rules applies all rules in one pass.
3047 * In order to do so, it needs a look-ahead of typically 1 character
3048 * (except for W5: sequences of ET) and keeps track of changes
3049 * in a rule Wp that affect a later Wq (p<q).
3051 * The (Nn) and (In) rules are also performed in that same single loop,
3052 * but effectively one iteration behind for white space.
3054 * Since all implicit rules are performed in one step, it is not necessary
3055 * to actually store the intermediate directional properties in dirProps[].
3058 private void processPropertySeq(LevState levState, short _prop,
3059 int start, int limit) {
3061 byte[][] impTab = levState.impTab;
3062 short[] impAct = levState.impAct;
3063 short oldStateSeq,actionSeq;
3064 byte level, addLevel;
3067 start0 = start; /* save original start position */
3068 oldStateSeq = levState.state;
3069 cell = impTab[oldStateSeq][_prop];
3070 levState.state = GetState(cell); /* isolate the new state */
3071 actionSeq = impAct[GetAction(cell)]; /* isolate the action */
3072 addLevel = impTab[levState.state][IMPTABLEVELS_RES];
3074 if (actionSeq != 0) {
3075 switch (actionSeq) {
3076 case 1: /* init ON seq */
3077 levState.startON = start0;
3080 case 2: /* prepend ON seq to current seq */
3081 start = levState.startON;
3084 case 3: /* L or S after possible relevant EN/AN */
3085 /* check if we had EN after R/AL */
3086 if (levState.startL2EN >= 0) {
3087 addPoint(levState.startL2EN, LRM_BEFORE);
3089 levState.startL2EN = -1; /* not within previous if since could also be -2 */
3090 /* check if we had any relevant EN/AN after R/AL */
3091 if ((insertPoints.points.length == 0) ||
3092 (insertPoints.size <= insertPoints.confirmed)) {
3093 /* nothing, just clean up */
3094 levState.lastStrongRTL = -1;
3095 /* check if we have a pending conditional segment */
3096 level = impTab[oldStateSeq][IMPTABLEVELS_RES];
3097 if ((level & 1) != 0 && levState.startON > 0) { /* after ON */
3098 start = levState.startON; /* reset to basic run level */
3100 if (_prop == _S) { /* add LRM before S */
3101 addPoint(start0, LRM_BEFORE);
3102 insertPoints.confirmed = insertPoints.size;
3106 /* reset the previous RTL continuation to the level for LTR text */
3107 for (k = levState.lastStrongRTL + 1; k < start0; k++) {
3108 /* reset odd level, leave runLevel+2 as is */
3109 levels[k] = (byte)((levels[k] - 2) & ~1);
3111 /* mark insert points as confirmed */
3112 insertPoints.confirmed = insertPoints.size;
3113 levState.lastStrongRTL = -1;
3114 if (_prop == _S) { /* add LRM before S */
3115 addPoint(start0, LRM_BEFORE);
3116 insertPoints.confirmed = insertPoints.size;
3120 case 4: /* R/AL after possible relevant EN/AN */
3122 if (insertPoints.points.length > 0)
3123 /* remove all non confirmed insert points */
3124 insertPoints.size = insertPoints.confirmed;
3125 levState.startON = -1;
3126 levState.startL2EN = -1;
3127 levState.lastStrongRTL = limit - 1;
3130 case 5: /* EN/AN after R/AL + possible continuation */
3131 /* check for real AN */
3132 if ((_prop == _AN) && (dirProps[start0] == AN) &&
3133 (reorderingMode != REORDER_INVERSE_FOR_NUMBERS_SPECIAL))
3136 if (levState.startL2EN == -1) { /* if no relevant EN already found */
3137 /* just note the rightmost digit as a strong RTL */
3138 levState.lastStrongRTL = limit - 1;
3141 if (levState.startL2EN >= 0) { /* after EN, no AN */
3142 addPoint(levState.startL2EN, LRM_BEFORE);
3143 levState.startL2EN = -2;
3146 addPoint(start0, LRM_BEFORE);
3149 /* if first EN/AN after R/AL */
3150 if (levState.startL2EN == -1) {
3151 levState.startL2EN = start0;
3155 case 6: /* note location of latest R/AL */
3156 levState.lastStrongRTL = limit - 1;
3157 levState.startON = -1;
3160 case 7: /* L after R+ON/EN/AN */
3161 /* include possible adjacent number on the left */
3162 for (k = start0-1; k >= 0 && ((levels[k] & 1) == 0); k--) {
3165 addPoint(k, RLM_BEFORE); /* add RLM before */
3166 insertPoints.confirmed = insertPoints.size; /* confirm it */
3168 levState.startON = start0;
3171 case 8: /* AN after L */
3172 /* AN numbers between L text on both sides may be trouble. */
3173 /* tentatively bracket with LRMs; will be confirmed if followed by L */
3174 addPoint(start0, LRM_BEFORE); /* add LRM before */
3175 addPoint(start0, LRM_AFTER); /* add LRM after */
3178 case 9: /* R after L+ON/EN/AN */
3179 /* false alarm: cancel the tentative LRMs around the previous AN */
3180 insertPoints.size=insertPoints.confirmed;
3181 if (_prop == _S) { /* add RLM before S */
3182 addPoint(start0, RLM_BEFORE);
3183 insertPoints.confirmed = insertPoints.size;
3187 case 10: /* L after L+ON/AN */
3188 level = (byte)(levState.runLevel + addLevel);
3189 for (k=levState.startON; k < start0; k++) {
3190 if (levels[k] < level) {
3194 insertPoints.confirmed = insertPoints.size; /* confirm inserts */
3195 levState.startON = start0;
3198 case 11: /* L after L+ON+EN/AN/ON */
3199 level = levState.runLevel;
3200 for (k = start0-1; k >= levState.startON; k--) {
3201 if (levels[k] == level+3) {
3202 while (levels[k] == level+3) {
3205 while (levels[k] == level) {
3209 if (levels[k] == level+2) {
3213 levels[k] = (byte)(level+1);
3217 case 12: /* R after L+ON+EN/AN/ON */
3218 level = (byte)(levState.runLevel+1);
3219 for (k = start0-1; k >= levState.startON; k--) {
3220 if (levels[k] > level) {
3226 default: /* we should never get here */
3227 throw new IllegalStateException("Internal ICU error in processPropertySeq");
3230 if ((addLevel) != 0 || (start < start0)) {
3231 level = (byte)(levState.runLevel + addLevel);
3232 if (start >= levState.runStart) {
3233 for (k = start; k < limit; k++) {
3238 int isolateCount = 0;
3239 for (k = start; k < limit; k++) {
3240 dirProp = dirProps[k];
3243 if (isolateCount == 0)
3245 if (dirProp == LRI || dirProp == RLI)
3253 * Returns the directionality of the last strong character at the end of the prologue, if any.
3254 * Requires prologue!=null.
3256 private byte lastL_R_AL() {
3257 for (int i = prologue.length(); i > 0; ) {
3258 int uchar = prologue.codePointBefore(i);
3259 i -= Character.charCount(uchar);
3260 byte dirProp = (byte)getCustomizedClass(uchar);
3264 if (dirProp == R || dirProp == AL) {
3275 * Returns the directionality of the first strong character, or digit, in the epilogue, if any.
3276 * Requires epilogue!=null.
3278 private byte firstL_R_AL_EN_AN() {
3279 for (int i = 0; i < epilogue.length(); ) {
3280 int uchar = epilogue.codePointAt(i);
3281 i += Character.charCount(uchar);
3282 byte dirProp = (byte)getCustomizedClass(uchar);
3286 if (dirProp == R || dirProp == AL) {
3289 if (dirProp == EN) {
3292 if (dirProp == AN) {
3299 private void resolveImplicitLevels(int start, int limit, short sor, short eor)
3302 LevState levState = new LevState();
3303 int i, start1, start2;
3304 short oldStateImp, stateImp, actionImp;
3305 short gprop, resProp, cell;
3307 short nextStrongProp = R;
3308 int nextStrongPos = -1;
3310 /* check for RTL inverse Bidi mode */
3311 /* FOOD FOR THOUGHT: in case of RTL inverse Bidi, it would make sense to
3312 * loop on the text characters from end to start.
3313 * This would need a different properties state table (at least different
3314 * actions) and different levels state tables (maybe very similar to the
3315 * corresponding LTR ones).
3317 inverseRTL=((start<lastArabicPos) && ((GetParaLevelAt(start) & 1)>0) &&
3318 (reorderingMode == REORDER_INVERSE_LIKE_DIRECT ||
3319 reorderingMode == REORDER_INVERSE_FOR_NUMBERS_SPECIAL));
3320 /* initialize for property and levels state table */
3321 levState.startON = -1;
3322 levState.startL2EN = -1; /* used for INVERSE_LIKE_DIRECT_WITH_MARKS */
3323 levState.lastStrongRTL = -1; /* used for INVERSE_LIKE_DIRECT_WITH_MARKS */
3324 levState.runStart = start;
3325 levState.runLevel = levels[start];
3326 levState.impTab = impTabPair.imptab[levState.runLevel & 1];
3327 levState.impAct = impTabPair.impact[levState.runLevel & 1];
3328 if (start == 0 && prologue != null) {
3329 byte lastStrong = lastL_R_AL();
3330 if (lastStrong != _ON) {
3334 /* The isolates[] entries contain enough information to
3335 resume the bidi algorithm in the same state as it was
3336 when it was interrupted by an isolate sequence. */
3337 if (dirProps[start] == PDI) {
3338 start1 = isolates[isolateCount].start1;
3339 stateImp = isolates[isolateCount].stateImp;
3340 levState.state = isolates[isolateCount].state;
3344 if (dirProps[start] == NSM)
3345 stateImp = (short)(1 + sor);
3349 processPropertySeq(levState, sor, start, start);
3353 for (i = start; i <= limit; i++) {
3355 if (limit > start) {
3356 dirProp = dirProps[limit - 1];
3357 if (dirProp == LRI || dirProp == RLI)
3358 break; /* no forced closing for sequence ending with LRI/RLI */
3363 prop = PureDirProp(dirProps[i]);
3366 /* AL before EN does not make it AN */
3368 } else if (prop == EN) {
3369 if (nextStrongPos <= i) {
3370 /* look for next strong char (L/R/AL) */
3372 nextStrongProp = R; /* set default */
3373 nextStrongPos = limit;
3374 for (j = i+1; j < limit; j++) {
3375 prop1 = dirProps[j];
3376 if (prop1 == L || prop1 == R || prop1 == AL) {
3377 nextStrongProp = prop1;
3383 if (nextStrongProp == AL) {
3388 gprop = groupProp[prop];
3390 oldStateImp = stateImp;
3391 cell = impTabProps[oldStateImp][gprop];
3392 stateImp = GetStateProps(cell); /* isolate the new state */
3393 actionImp = GetActionProps(cell); /* isolate the action */
3394 if ((i == limit) && (actionImp == 0)) {
3395 /* there is an unprocessed sequence if its property == eor */
3396 actionImp = 1; /* process the last sequence */
3398 if (actionImp != 0) {
3399 resProp = impTabProps[oldStateImp][IMPTABPROPS_RES];
3400 switch (actionImp) {
3401 case 1: /* process current seq1, init new seq1 */
3402 processPropertySeq(levState, resProp, start1, i);
3405 case 2: /* init new seq2 */
3408 case 3: /* process seq1, process seq2, init new seq1 */
3409 processPropertySeq(levState, resProp, start1, start2);
3410 processPropertySeq(levState, _ON, start2, i);
3413 case 4: /* process seq1, set seq1=seq2, init new seq2 */
3414 processPropertySeq(levState, resProp, start1, start2);
3418 default: /* we should never get here */
3419 throw new IllegalStateException("Internal ICU error in resolveImplicitLevels");
3424 /* flush possible pending sequence, e.g. ON */
3425 if (limit == length && epilogue != null) {
3426 byte firstStrong = firstL_R_AL_EN_AN();
3427 if (firstStrong != _ON) {
3432 dirProp = dirProps[limit - 1];
3433 if ((dirProp == LRI || dirProp == RLI) && limit < length) {
3435 if (isolates[isolateCount] == null)
3436 isolates[isolateCount] = new Isolate();
3437 isolates[isolateCount].stateImp = stateImp;
3438 isolates[isolateCount].state = levState.state;
3439 isolates[isolateCount].start1 = start1;
3442 processPropertySeq(levState, eor, limit, limit);
3445 /* perform (L1) and (X9) ---------------------------------------------------- */
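/* Illustrative sketch, not part of ICU4J: the hypothetical class
   TrailingWhitespaceSketch applies a simplified version of rule L1 as used by
   adjustWSLevels(). Segment and paragraph separators, and any whitespace before
   them or at the end of the text, are reset to the paragraph level. Unlike the
   real method, it ignores BN, explicit embedding codes, isolates, multiple
   paragraphs and supplementary characters. It classifies characters with the
   JDK's Character.getDirectionality().

```java
// Hypothetical, simplified model of rule L1.
class TrailingWhitespaceSketch {
    static byte[] applyL1(String text, byte[] levels, byte paraLevel) {
        byte[] out = levels.clone();
        boolean resetting = true; // true while scanning WS before end of text or S/B
        for (int i = text.length() - 1; i >= 0; i--) {
            byte d = Character.getDirectionality(text.charAt(i));
            boolean isSeparator = d == Character.DIRECTIONALITY_SEGMENT_SEPARATOR
                    || d == Character.DIRECTIONALITY_PARAGRAPH_SEPARATOR;
            if (isSeparator) {
                out[i] = paraLevel;       // S and B always take the paragraph level
                resetting = true;
            } else if (d == Character.DIRECTIONALITY_WHITESPACE && resetting) {
                out[i] = paraLevel;       // WS before S/B or at the end of the text
            } else {
                resetting = false;        // any other character stops the reset
            }
        }
        return out;
    }
}
```

   A space between two RTL letters keeps its resolved level. Trailing spaces and
   a space before a tab drop to the paragraph level.
*/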
3448 * Reset the embedding levels for some non-graphic characters (L1).
3449 * This method also sets appropriate levels for BN, and
3450 * explicit embedding types that are supposed to have been removed
3451 * from the paragraph in (X9).
3453 private void adjustWSLevels() {
3456 if ((flags & MASK_WS) != 0) {
3458 i = trailingWSStart;
3460 /* reset a sequence of WS/BN before eop and B/S to the paragraph paraLevel */
3461 while (i > 0 && ((flag = DirPropFlag(PureDirProp(dirProps[--i]))) & MASK_WS) != 0) {
3462 if (orderParagraphsLTR && (flag & DirPropFlag(B)) != 0) {
3465 levels[i] = GetParaLevelAt(i);
3469 /* reset BN to the next character's paraLevel until B/S, which restarts the loop above */
3470 /* here, i+1 is guaranteed to be <length */
3472 flag = DirPropFlag(PureDirProp(dirProps[--i]));
3473 if ((flag & MASK_BN_EXPLICIT) != 0) {
3474 levels[i] = levels[i + 1];
3475 } else if (orderParagraphsLTR && (flag & DirPropFlag(B)) != 0) {
3478 } else if ((flag & MASK_B_S) != 0){
3479 levels[i] = GetParaLevelAt(i);
3488 * Set the context before a call to setPara().<p>
3490 * setPara() computes the left-right directionality for a given piece
3491 * of text which is supplied as one of its arguments. Sometimes this piece
3492 * of text (the "main text") should be considered in context, because text
3493 * appearing before ("prologue") and/or after ("epilogue") the main text
3494 * may affect the result of this computation.<p>
3496 * This function specifies the prologue and/or the epilogue for the next
3497 * call to setPara(). If successive calls to setPara()
3498 * all need specification of a context, setContext() must be called
3499 * before each call to setPara(). In other words, a context is not
3500 * "remembered" after the following successful call to setPara().<p>
3502 * If a call to setPara() specifies DEFAULT_LTR or
3503 * DEFAULT_RTL as paraLevel and is preceded by a call to
3504 * setContext() which specifies a prologue, the paragraph level will
3505 * be computed taking into consideration the text in the prologue.<p>
3507 * When setPara() is called without a previous call to
3508 * setContext, the main text is handled as if preceded and followed
3509 * by strong directional characters at the current paragraph level.
3510 * Calling setContext() with specification of a prologue will change
3511 * this behavior by handling the main text as if preceded by the last
3512 * strong character appearing in the prologue, if any.
3513 * Calling setContext() with specification of an epilogue will change
3514 * the behavior of setPara() by handling the main text as if followed
3515 * by the first strong character or digit appearing in the epilogue, if any.<p>
3517 * Note 1: if <code>setContext</code> is called repeatedly without
3518 * calling <code>setPara</code>, the earlier calls have no effect,
3519 * only the last call will be remembered for the next call to
3520 * <code>setPara</code>.<p>
3522 * Note 2: calling <code>setContext(null, null)</code>
3523 * cancels any previous setting of non-empty prologue or epilogue.
3524 * The next call to <code>setPara()</code> will process no
3525 * prologue or epilogue.<p>
3527 * Note 3: users must be aware that even after setting the context
3528 * before a call to setPara() to perform e.g. a logical to visual
3529 * transformation, the resulting string may not be identical to what it
3530 * would have been if all the text, including prologue and epilogue, had
3531 * been processed together.<br>
3532 * Example (upper case letters represent RTL characters):<br>
3533 * prologue = "<code>abc DE</code>"<br>
3534 * epilogue = none<br>
3535 * main text = "<code>FGH xyz</code>"<br>
3536 * paraLevel = LTR<br>
3537 * display without prologue = "<code>HGF xyz</code>"
3538 * ("HGF" is adjacent to "xyz")<br>
3539 * display with prologue = "<code>abc HGFED xyz</code>"
3540 * ("HGF" is not adjacent to "xyz")<br>
3542 * @param prologue is the text which precedes the text that
3543 * will be specified in a coming call to setPara().
3544 * If there is no prologue to consider,
3545 * this parameter can be <code>null</code>.
3547 * @param epilogue is the text which follows the text that
3548 * will be specified in a coming call to setPara().
3549 * If there is no epilogue to consider,
3550 * this parameter can be <code>null</code>.
3555 public void setContext(String prologue, String epilogue) {
3556 this.prologue = prologue != null && prologue.length() > 0 ? prologue : null;
3557 this.epilogue = epilogue != null && epilogue.length() > 0 ? epilogue : null;
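/* Illustrative sketch, not part of ICU4J: the hypothetical class ContextSketch
   performs the two context lookups that lastL_R_AL() and firstL_R_AL_EN_AN()
   carry out for setContext(). It substitutes the JDK's Character.getDirectionality()
   for ICU's getCustomizedClass(), so custom classifiers are ignored. It also
   returns chars instead of ICU's internal property constants. Any extra cases
   handled in the lines of the real methods not shown here are not modeled.

```java
// Hypothetical helper for illustration only.
class ContextSketch {
    // Last strong character in the prologue: 'L', 'R', or '?' if there is none.
    static char lastStrong(String prologue) {
        for (int i = prologue.length(); i > 0; ) {
            int c = prologue.codePointBefore(i);
            i -= Character.charCount(c);
            byte d = Character.getDirectionality(c);
            if (d == Character.DIRECTIONALITY_LEFT_TO_RIGHT) return 'L';
            if (d == Character.DIRECTIONALITY_RIGHT_TO_LEFT
                    || d == Character.DIRECTIONALITY_RIGHT_TO_LEFT_ARABIC) return 'R';
        }
        return '?';
    }

    // First strong character or digit in the epilogue:
    // 'L', 'R', 'E' (European number), 'A' (Arabic number), or '?' if none.
    static char firstStrongOrDigit(String epilogue) {
        for (int i = 0; i < epilogue.length(); ) {
            int c = epilogue.codePointAt(i);
            i += Character.charCount(c);
            switch (Character.getDirectionality(c)) {
                case Character.DIRECTIONALITY_LEFT_TO_RIGHT: return 'L';
                case Character.DIRECTIONALITY_RIGHT_TO_LEFT:
                case Character.DIRECTIONALITY_RIGHT_TO_LEFT_ARABIC: return 'R';
                case Character.DIRECTIONALITY_EUROPEAN_NUMBER: return 'E';
                case Character.DIRECTIONALITY_ARABIC_NUMBER: return 'A';
                default: break;   // weak or neutral: keep scanning
            }
        }
        return '?';
    }
}
```

   For the Javadoc example above, with real Hebrew letters in place of the
   upper-case ones, lastStrong() returns 'R' for the prologue "abc DE". The main
   text is therefore resolved as if preceded by an R character.
*/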
3560 private void setParaSuccess() {
3561 prologue = null; /* forget the last context */
3563 paraBidi = this; /* mark successful setPara */
3566 int Bidi_Min(int x, int y) {
3567 return x < y ? x : y;
3570 int Bidi_Abs(int x) {
3571 return x >= 0 ? x : -x;
3574 void setParaRunsOnly(char[] parmText, byte parmParaLevel) {
3577 int saveLength, saveTrailingWSStart;
3580 int i, j, visualStart, logicalStart,
3581 oldRunCount, runLength, addedRuns, insertRemove,
3582 start, limit, step, indexOddBit, logicalPos,
3586 reorderingMode = REORDER_DEFAULT;
3587 int parmLength = parmText.length;
3588 if (parmLength == 0) {
3589 setPara(parmText, parmParaLevel, null);
3590 reorderingMode = REORDER_RUNS_ONLY;
3593 /* obtain memory for mapping table and visual text */
3594 saveOptions = reorderingOptions;
3595 if ((saveOptions & OPTION_INSERT_MARKS) > 0) {
3596 reorderingOptions &= ~OPTION_INSERT_MARKS;
3597 reorderingOptions |= OPTION_REMOVE_CONTROLS;
3599 parmParaLevel &= 1; /* accept only 0 or 1 */
3600 setPara(parmText, parmParaLevel, null);
3601 /* we cannot access levels directly, since it is not yet set if
3602 * the direction is not MIXED
3604 saveLevels = new byte[this.length];
3605 System.arraycopy(getLevels(), 0, saveLevels, 0, this.length);
3606 saveTrailingWSStart = trailingWSStart;
3608 /* FOOD FOR THOUGHT: instead of writing the visual text, we could use
3609 * the visual map and the dirProps array to drive the second call
3610 * to setPara (but must make provision for possible removal of
3611 * Bidi controls). Alternatively, only use the dirProps array via
3612 * a customized classifier callback.
3614 visualText = writeReordered(DO_MIRRORING);
3615 visualMap = getVisualMap();
3616 this.reorderingOptions = saveOptions;
3617 saveLength = this.length;
3618 saveDirection=this.direction;
3620 this.reorderingMode = REORDER_INVERSE_LIKE_DIRECT;
3622 setPara(visualText, parmParaLevel, null);
3623 BidiLine.getRuns(this);
3624 /* check if some runs must be split, count how many splits */
3626 oldRunCount = this.runCount;
3628 for (i = 0; i < oldRunCount; i++, visualStart += runLength) {
3629 runLength = runs[i].limit - visualStart;
3630 if (runLength < 2) {
3633 logicalStart = runs[i].start;
3634 for (j = logicalStart+1; j < logicalStart+runLength; j++) {
3635 index = visualMap[j];
3636 index1 = visualMap[j-1];
3637 if ((Bidi_Abs(index-index1)!=1) || (saveLevels[index]!=saveLevels[index1])) {
3642 if (addedRuns > 0) {
3643 getRunsMemory(oldRunCount + addedRuns);
3644 if (runCount == 1) {
3645 /* because we switch from UBiDi.simpleRuns to UBiDi.runs */
3646 runsMemory[0] = runs[0];
3648 System.arraycopy(runs, 0, runsMemory, 0, runCount);
3651 runCount += addedRuns;
3652 for (i = oldRunCount; i < runCount; i++) {
3653 if (runs[i] == null) {
3654 runs[i] = new BidiRun(0, 0, (byte)0);
3658 /* split runs which are not consecutive in source text */
3660 for (i = oldRunCount-1; i >= 0; i--) {
3661 newI = i + addedRuns;
3662 runLength = i==0 ? runs[0].limit :
3663 runs[i].limit - runs[i-1].limit;
3664 logicalStart = runs[i].start;
3665 indexOddBit = runs[i].level & 1;
3666 if (runLength < 2) {
3667 if (addedRuns > 0) {
3668 runs[newI].copyFrom(runs[i]);
3670 logicalPos = visualMap[logicalStart];
3671 runs[newI].start = logicalPos;
3672 runs[newI].level = (byte)(saveLevels[logicalPos] ^ indexOddBit);
3675 if (indexOddBit > 0) {
3676 start = logicalStart;
3677 limit = logicalStart + runLength - 1;
3680 start = logicalStart + runLength - 1;
3681 limit = logicalStart;
3684 for (j = start; j != limit; j += step) {
3685 index = visualMap[j];
3686 index1 = visualMap[j+step];
3687 if ((Bidi_Abs(index-index1)!=1) || (saveLevels[index]!=saveLevels[index1])) {
3688 logicalPos = Bidi_Min(visualMap[start], index);
3689 runs[newI].start = logicalPos;
3690 runs[newI].level = (byte)(saveLevels[logicalPos] ^ indexOddBit);
3691 runs[newI].limit = runs[i].limit;
3692 runs[i].limit -= Bidi_Abs(j - start) + 1;
3693 insertRemove = runs[i].insertRemove & (LRM_AFTER|RLM_AFTER);
3694 runs[newI].insertRemove = insertRemove;
3695 runs[i].insertRemove &= ~insertRemove;
3701 if (addedRuns > 0) {
3702 runs[newI].copyFrom(runs[i]);
3704 logicalPos = Bidi_Min(visualMap[start], visualMap[limit]);
3705 runs[newI].start = logicalPos;
3706 runs[newI].level = (byte)(saveLevels[logicalPos] ^ indexOddBit);
3710 /* restore initial paraLevel */
3711 this.paraLevel ^= 1;
3713 /* restore real text */
3714 this.text = parmText;
3715 this.length = saveLength;
3716 this.originalLength = parmLength;
3717 this.direction=saveDirection;
3718 this.levels = saveLevels;
3719 this.trailingWSStart = saveTrailingWSStart;
3721 this.direction = MIXED;
3724 this.reorderingMode = REORDER_RUNS_ONLY;
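The first pass of the run-splitting step above decides whether a run must be split by checking that neighboring entries of `visualMap` refer to logically adjacent characters (a difference of exactly 1) at the same level. The following is a minimal, self-contained sketch of that counting pass only; the class and method names are hypothetical, and runs are given as plain `start`/`length` arrays rather than ICU's internal `BidiRun` objects.

```java
/** Hypothetical sketch of the "count how many splits" pass in setParaRunsOnly. */
public class RunSplitSketch {

    /**
     * Counts the extra runs needed so that every run covers entries of
     * map that are consecutive (difference of exactly 1) and share one level.
     *
     * @param runStarts  first position of each run, as an index into map
     * @param runLengths number of positions in each run
     * @param map        position -> character index (plays the role of visualMap)
     * @param levels     character index -> level (plays the role of saveLevels)
     */
    static int countAddedRuns(int[] runStarts, int[] runLengths, int[] map, byte[] levels) {
        int added = 0;
        for (int r = 0; r < runStarts.length; r++) {
            int start = runStarts[r];
            // Runs shorter than 2 never need splitting; the inner loop does not execute for them.
            for (int j = start + 1; j < start + runLengths[r]; j++) {
                int index = map[j];
                int prev = map[j - 1];
                if (Math.abs(index - prev) != 1 || levels[index] != levels[prev]) {
                    added++; // a break in continuity or in level starts a new run here
                }
            }
        }
        return added;
    }

    public static void main(String[] args) {
        int[] map = {0, 1, 2, 5, 4, 3};
        byte[] levels = {0, 0, 0, 1, 1, 1};
        // One run over all six positions: the jump from 2 to 5 forces one split.
        System.out.println(countAddedRuns(new int[] {0}, new int[] {6}, map, levels));
        // Two runs already split at the jump: no further splits are needed.
        System.out.println(countAddedRuns(new int[] {0, 3}, new int[] {3, 3}, map, levels));
    }
}
```

Only the counting is shown; the second pass in the method above additionally rewrites `start`, `level`, `limit` and the insert/remove flags of each new run.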
3728 * Perform the Unicode Bidi algorithm. It is defined in the
3729 * <a href="http://www.unicode.org/unicode/reports/tr9/">Unicode Standard Annex #9</a>,
3731 * also described in The Unicode Standard, Version 4.0.<p>
3733 * This method takes a piece of plain text containing one or more paragraphs,
3734 * with or without externally specified embedding levels from <i>styled</i>
3735 * text, and computes the directionality (left-to-right or right-to-left) of each character.<p>
3737 * If the entire text is all of the same directionality, then
3738 * the method may not perform all the steps described by the algorithm,
3739 * i.e., some levels may not be the same as if all steps were performed.
3740 * This is not relevant for unidirectional text.<br>
3741 * For example, in pure LTR text with numbers, the numbers would get
3742 * a resolved level 2 higher than the surrounding text according to
3743 * the algorithm. This implementation may set all resolved levels to
3744 * the same value in such a case.<p>
3746 * The text can be composed of multiple paragraphs. Occurrence of a block
3747 * separator in the text terminates a paragraph, and whatever comes next starts
3748 * a new paragraph. The exception to this rule is when a Carriage Return (CR)
3749 * is followed by a Line Feed (LF). Both CR and LF are block separators, but
3750 * in that case, the pair of characters is considered as terminating the
3751 * preceding paragraph, and a new paragraph will be started by a character
3752 * coming after the LF.
3754 * Although the text is passed here as a <code>String</code>, it is
3755 * stored internally as an array of characters. Therefore the
3756 * documentation will refer to indexes of the characters in the text.
3758 * @param text contains the text that the Bidi algorithm will be performed
3759 * on. This text can be retrieved with <code>getText()</code> or
3760 * <code>getTextAsString</code>.<br>
3762 * @param paraLevel specifies the default level for the text;
3763 * it is typically 0 (LTR) or 1 (RTL).
3764 * If the paragraph level is to be determined from the text,
3765 * then <code>paraLevel</code> can be set to
3766 * either <code>LEVEL_DEFAULT_LTR</code>
3767 * or <code>LEVEL_DEFAULT_RTL</code>; if the text contains multiple
3768 * paragraphs, the paragraph level is determined separately for
3769 * each paragraph; if a paragraph does not include any strongly typed
3770 * character, then the desired default is used (0 for LTR or 1 for RTL).
3771 * Any other value between 0 and <code>MAX_EXPLICIT_LEVEL</code>
3772 * is also valid, with odd levels indicating RTL.
3774 * @param embeddingLevels (in) may be used to preset the embedding and override levels,
3775 * ignoring characters like LRE and PDF in the text.
3776 * A level overrides the directional property of its corresponding
3777 * (same index) character if the level has the
3778 * <code>LEVEL_OVERRIDE</code> bit set.<br><br>
3779 * Except for that bit, it must be
3780 * <code>paraLevel<=embeddingLevels[]<=MAX_EXPLICIT_LEVEL</code>,
3781 * with one exception: a level of zero may be specified for a
3782 * paragraph separator even if <code>paraLevel>0</code> when multiple
3783 * paragraphs are submitted in the same call to <code>setPara()</code>.<br><br>
3784 * <strong>Caution: </strong>A reference to this array, not a copy
3785 * of the levels, will be stored in the <code>Bidi</code> object;
3786 * the <code>embeddingLevels</code>
3787 * should not be modified to avoid unexpected results on subsequent
3788 * Bidi operations. However, the <code>setPara()</code> and
3789 * <code>setLine()</code> methods may modify some or all of the
3791 * <strong>Note:</strong> the <code>embeddingLevels</code> array must
3792 * have one entry for each character in <code>text</code>.
3794 * @throws IllegalArgumentException if the values in embeddingLevels are
3795 * not within the allowed range
3797 * @see #LEVEL_DEFAULT_LTR
3798 * @see #LEVEL_DEFAULT_RTL
3799 * @see #LEVEL_OVERRIDE
3800 * @see #MAX_EXPLICIT_LEVEL
3803 public void setPara(String text, byte paraLevel, byte[] embeddingLevels)
3806 setPara(new char[0], paraLevel, embeddingLevels);
3808 setPara(text.toCharArray(), paraLevel, embeddingLevels);
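The paragraph rules described above can be sketched with the JDK alone: a block separator ends a paragraph, CR followed by LF counts as a single separator, and with <code>LEVEL_DEFAULT_LTR</code> or <code>LEVEL_DEFAULT_RTL</code> each paragraph gets its own level from its first strong character. This sketch assumes <code>java.lang.Character.getDirectionality</code> as a stand-in for ICU's property data, does not skip isolates or explicit embeddings when looking for the first strong character, and uses hypothetical class and method names.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of paragraph splitting and default paragraph levels. */
public class ParagraphSketch {

    /** Returns the exclusive limit of each paragraph; CR LF counts as one separator. */
    static List<Integer> paragraphLimits(String text) {
        List<Integer> limits = new ArrayList<>();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.getDirectionality(c) != Character.DIRECTIONALITY_PARAGRAPH_SEPARATOR) {
                continue;
            }
            // A CR directly followed by LF does not end the paragraph on its own.
            if (c == '\r' && i + 1 < text.length() && text.charAt(i + 1) == '\n') {
                continue;
            }
            limits.add(i + 1); // the separator belongs to the paragraph it ends
        }
        // Text after the last separator (or text without separators) forms a final paragraph.
        if (limits.isEmpty() || limits.get(limits.size() - 1) < text.length()) {
            limits.add(text.length());
        }
        return limits;
    }

    /**
     * Level of the paragraph [start, limit): 0 for a first strong L character,
     * 1 for a first strong R or AL character, otherwise defaultLevel.
     */
    static int paragraphLevel(String text, int start, int limit, int defaultLevel) {
        for (int i = start; i < limit; ) {
            int cp = text.codePointAt(i);
            byte dir = Character.getDirectionality(cp);
            if (dir == Character.DIRECTIONALITY_LEFT_TO_RIGHT) {
                return 0;
            }
            if (dir == Character.DIRECTIONALITY_RIGHT_TO_LEFT
                    || dir == Character.DIRECTIONALITY_RIGHT_TO_LEFT_ARABIC) {
                return 1;
            }
            i += Character.charCount(cp);
        }
        return defaultLevel;
    }

    public static void main(String[] args) {
        // A Latin paragraph, a Hebrew paragraph, and a digits-only paragraph.
        String text = "abc\r\n\u05D0\u05D1\n123";
        int start = 0;
        for (int limit : paragraphLimits(text)) {
            // defaultLevel 1 plays the role of the LEVEL_DEFAULT_RTL fallback.
            System.out.println(start + ".." + limit + " level " + paragraphLevel(text, start, limit, 1));
            start = limit;
        }
    }
}
```

The digits-only paragraph contains no strong character, so it falls back to the requested default, matching the behavior described for <code>paraLevel</code> above.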
3813 * Perform the Unicode Bidi algorithm. It is defined in the
3814 * <a href="http://www.unicode.org/unicode/reports/tr9/">Unicode Standard Annex #9</a>,
3816 * also described in The Unicode Standard, Version 4.0.<p>
3818 * This method takes a piece of plain text containing one or more paragraphs,
3819 * with or without externally specified embedding levels from <i>styled</i>
3820 * text, and computes the directionality (left-to-right or right-to-left) of each character.<p>
3822 * If the entire text is all of the same directionality, then
3823 * the method may not perform all the steps described by the algorithm,
3824 * i.e., some levels may not be the same as if all steps were performed.
3825 * This is not relevant for unidirectional text.<br>
3826 * For example, in pure LTR text with numbers, the numbers would get
3827 * a resolved level 2 higher than the surrounding text according to
3828 * the algorithm. This implementation may set all resolved levels to
3829 * the same value in such a case.<p>
3831 * The text can be composed of multiple paragraphs. Occurrence of a block
3832 * separator in the text terminates a paragraph, and whatever comes next starts
3833 * a new paragraph. The exception to this rule is when a Carriage Return (CR)
3834 * is followed by a Line Feed (LF). Both CR and LF are block separators, but
3835 * in that case, the pair of characters is considered as terminating the
3836 * preceding paragraph, and a new paragraph will be started by a character
3837 * coming after the LF.
3839 * The text is stored internally as an array of characters. Therefore the
3840 * documentation will refer to indexes of the characters in the text.
3842 * @param chars contains the text that the Bidi algorithm will be performed
3843 * on. This text can be retrieved with <code>getText()</code> or
3844 * <code>getTextAsString</code>.<br>
3846 * @param paraLevel specifies the default level for the text;
3847 * it is typically 0 (LTR) or 1 (RTL).
3848 * If the paragraph level is to be determined from the text,
3849 * then <code>paraLevel</code> can be set to
3850 * either <code>LEVEL_DEFAULT_LTR</code>
3851 * or <code>LEVEL_DEFAULT_RTL</code>; if the text contains multiple
3852 * paragraphs, the paragraph level is determined separately for
3853 * each paragraph; if a paragraph does not include any strongly typed
3854 * character, then the desired default is used (0 for LTR or 1 for RTL).
3855 * Any other value between 0 and <code>MAX_EXPLICIT_LEVEL</code>
3856 * is also valid, with odd levels indicating RTL.
3858 * @param embeddingLevels (in) may be used to preset the embedding and
3859 * override levels, ignoring characters like LRE and PDF in the text.
3860 * A level overrides the directional property of its corresponding
3861 * (same index) character if the level has the
3862 * <code>LEVEL_OVERRIDE</code> bit set.<br><br>
3863 * Except for that bit, it must be
3864 * <code>paraLevel<=embeddingLevels[]<=MAX_EXPLICIT_LEVEL</code>,
3865 * with one exception: a level of zero may be specified for a
3866 * paragraph separator even if <code>paraLevel>0</code> when multiple
3867 * paragraphs are submitted in the same call to <code>setPara()</code>.<br><br>
3868 * <strong>Caution: </strong>A reference to this array, not a copy
3869 * of the levels, will be stored in the <code>Bidi</code> object;
3870 * the <code>embeddingLevels</code>
3871 * should not be modified to avoid unexpected results on subsequent
3872 * Bidi operations. However, the <code>setPara()</code> and
3873 * <code>setLine()</code> methods may modify some or all of the
3875 * <strong>Note:</strong> the <code>embeddingLevels</code> array must
3876 * have one entry for each character in <code>text</code>.
3878 * @throws IllegalArgumentException if the values in embeddingLevels are
3879 * not within the allowed range
3881 * @see #LEVEL_DEFAULT_LTR
3882 * @see #LEVEL_DEFAULT_RTL
3883 * @see #LEVEL_OVERRIDE
3884 * @see #MAX_EXPLICIT_LEVEL
3887 public void setPara(char[] chars, byte paraLevel, byte[] embeddingLevels)
3889 /* check the argument values */
3890 if (paraLevel < LEVEL_DEFAULT_LTR) {
3891 verifyRange(paraLevel, 0, MAX_EXPLICIT_LEVEL + 1);
3893 if (chars == null) {
3894 chars = new char[0];
3897 /* special treatment for RUNS_ONLY mode */
3898 if (reorderingMode == REORDER_RUNS_ONLY) {
3899 setParaRunsOnly(chars, paraLevel);
3903 /* initialize the Bidi object */
3904 this.paraBidi = null; /* mark unfinished setPara */
3906 this.length = this.originalLength = this.resultLength = text.length;
3907 this.paraLevel = paraLevel;
3908 this.direction = (byte)(paraLevel & 1);
3911 /* Allocate zero-length arrays instead of setting to null here; then
3912 * checks for null in various places can be eliminated.
3914 dirProps = new byte[0];
3915 levels = new byte[0];
3916 runs = new BidiRun[0];
3917 isGoodLogicalToVisualRunsMap = false;
3918 insertPoints.size = 0; /* clean up from last call */
3919 insertPoints.confirmed = 0; /* clean up from last call */
3922 * Save the original paraLevel if contextual; otherwise, set to 0.
3924 defaultParaLevel = IsDefaultLevel(paraLevel) ? paraLevel : 0;
3928 * For an empty paragraph, create a Bidi object with the paraLevel and
3929 * the flags and the direction set but without allocating zero-length arrays.
3930 * There is nothing more to do.
3932 if (IsDefaultLevel(paraLevel)) {
3933 this.paraLevel &= 1;
3934 defaultParaLevel = 0;
3936 flags = DirPropFlagLR(paraLevel);
3946 * Get the directional properties,
3947 * the flags bit-set, and
3948 * determine the paragraph level if necessary.
3950 getDirPropsMemory(length);
3951 dirProps = dirPropsMemory;
3953 /* the processed length may have changed if OPTION_STREAMING is set */
3954 trailingWSStart = length; /* the levels[] will reflect the WS run */
3956 /* are explicit levels specified? */
3957 if (embeddingLevels == null) {
3958 /* no: determine explicit levels according to the (Xn) rules */
3959 getLevelsMemory(length);
3960 levels = levelsMemory;
3961 direction = resolveExplicitLevels();
3963 /* set BN for all explicit codes, check that all levels are 0 or paraLevel..MAX_EXPLICIT_LEVEL */
3964 levels = embeddingLevels;
3965 direction = checkExplicitLevels();
3968 /* allocate isolate memory */
3969 if (isolateCount > 0) {
3970 if (isolates == null || isolates.length < isolateCount)
3971 isolates = new Isolate[isolateCount + 3]; /* keep some reserve */
3973 isolateCount = -1; /* current isolates stack entry == none */
3976 * The steps after (X9) in the Bidi algorithm are performed only if
3977 * the paragraph text has mixed directionality!
3979 switch (direction) {
3981 /* make sure paraLevel is even */
3982 paraLevel = (byte)((paraLevel + 1) & ~1);
3984 /* all levels are implicitly at paraLevel (important for getLevels()) */
3985 trailingWSStart = 0;
3988 /* make sure paraLevel is odd */
3991 /* all levels are implicitly at paraLevel (important for getLevels()) */
3992 trailingWSStart = 0;
3996 * Choose the right implicit state table
3998 switch(reorderingMode) {
3999 case REORDER_DEFAULT:
4000 this.impTabPair = impTab_DEFAULT;
4002 case REORDER_NUMBERS_SPECIAL:
4003 this.impTabPair = impTab_NUMBERS_SPECIAL;
4005 case REORDER_GROUP_NUMBERS_WITH_R:
4006 this.impTabPair = impTab_GROUP_NUMBERS_WITH_R;
4008 case REORDER_RUNS_ONLY:
4009 /* we should never get here */
4010 throw new InternalError("Internal ICU error in setPara");
4012 case REORDER_INVERSE_NUMBERS_AS_L:
4013 this.impTabPair = impTab_INVERSE_NUMBERS_AS_L;
4015 case REORDER_INVERSE_LIKE_DIRECT:
4016 if ((reorderingOptions & OPTION_INSERT_MARKS) != 0) {
4017 this.impTabPair = impTab_INVERSE_LIKE_DIRECT_WITH_MARKS;
4019 this.impTabPair = impTab_INVERSE_LIKE_DIRECT;
4022 case REORDER_INVERSE_FOR_NUMBERS_SPECIAL:
4023 if ((reorderingOptions & OPTION_INSERT_MARKS) != 0) {
4024 this.impTabPair = impTab_INVERSE_FOR_NUMBERS_SPECIAL_WITH_MARKS;
4026 this.impTabPair = impTab_INVERSE_FOR_NUMBERS_SPECIAL;
4031 * If there are no external levels specified and there
4032 * are no significant explicit level codes in the text,
4033 * then we can treat the entire paragraph as one run.
4034 * Otherwise, we need to perform the following rules on runs of
4035 * the text with the same embedding levels. (X10)
4036 * "Significant" explicit level codes are ones that actually
4037 * affect non-BN characters.
4038 * Examples for "insignificant" ones are empty embeddings
4039 * LRE-PDF, LRE-RLE-PDF-PDF, etc.
4041 if (embeddingLevels == null && paraCount <= 1 &&
4042 (flags & DirPropFlagMultiRuns) == 0) {
4043 resolveImplicitLevels(0, length,
4044 GetLRFromLevel(GetParaLevelAt(0)),
4045 GetLRFromLevel(GetParaLevelAt(length - 1)));
4047 /* sor, eor: start and end types of same-level-run */
4048 int start, limit = 0;
4049 byte level, nextLevel;
4052 /* determine the first sor and set eor to it because of the loop body (sor=eor there) */
4053 level = GetParaLevelAt(0);
4054 nextLevel = levels[0];
4055 if (level < nextLevel) {
4056 eor = GetLRFromLevel(nextLevel);
4058 eor = GetLRFromLevel(level);
4062 /* determine start and limit of the run (end points just behind the run) */
4064 /* the values for this run's start are the same as for the previous run's end */
4067 if ((start > 0) && (dirProps[start - 1] == B)) {
4068 /* except if this is a new paragraph, then set sor = para level */
4069 sor = GetLRFromLevel(GetParaLevelAt(start));
4074 /* search for the limit of this run */
4075 while (++limit < length && levels[limit] == level) {}
4077 /* get the correct level of the next run */
4078 if (limit < length) {
4079 nextLevel = levels[limit];
4081 nextLevel = GetParaLevelAt(length - 1);
4084 /* determine eor from max(level, nextLevel); sor is last run's eor */
4085 if ((level & ~LEVEL_OVERRIDE) < (nextLevel & ~LEVEL_OVERRIDE)) {
4086 eor = GetLRFromLevel(nextLevel);
4088 eor = GetLRFromLevel(level);
4091 /* if the run consists of overridden directional types, then there
4092 are no implicit types to be resolved */
4093 if ((level & LEVEL_OVERRIDE) == 0) {
4094 resolveImplicitLevels(start, limit, sor, eor);
4096 /* remove the LEVEL_OVERRIDE flags */
4098 levels[start++] &= ~LEVEL_OVERRIDE;
4099 } while (start < limit);
4101 } while (limit < length);
4104 /* reset the embedding levels for some non-graphic characters (L1), (X9) */
4109 /* add RLM for inverse Bidi with contextual orientation resolving
4110 * to RTL which would not round-trip otherwise
4112 if ((defaultParaLevel > 0) &&
4113 ((reorderingOptions & OPTION_INSERT_MARKS) != 0) &&
4114 ((reorderingMode == REORDER_INVERSE_LIKE_DIRECT) ||
4115 (reorderingMode == REORDER_INVERSE_FOR_NUMBERS_SPECIAL))) {
4119 for (int i = 0; i < paraCount; i++) {
4120 last = paras_limit[i] - 1;
4121 level = paras_level[i];
4123 continue; /* LTR paragraph */
4124 start = i == 0 ? 0 : paras_limit[i - 1];
4125 for (int j = last; j >= start; j--) {
4126 dirProp = dirProps[j];
4129 while (dirProps[last] == B) {
4133 addPoint(last, RLM_BEFORE);
4136 if ((DirPropFlag(dirProp) & MASK_R_AL) != 0) {
4143 if ((reorderingOptions & OPTION_REMOVE_CONTROLS) != 0) {
4144 resultLength -= controlCount;
4146 resultLength += insertPoints.size;
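When explicit levels are present, the loop in the method above walks the text one run of equal levels at a time, and takes each run's <code>sor</code> and <code>eor</code> from the higher of the two adjacent levels, with the paragraph level bordering the first and last runs (X10). A minimal sketch of just that loop follows; it assumes a single paragraph and levels without the <code>LEVEL_OVERRIDE</code> bit, returns readable strings instead of calling <code>resolveImplicitLevels</code>, and its names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of splitting one paragraph into level runs with sor/eor (X10). */
public class LevelRunSketch {

    /** Direction type implied by a level: even -> L, odd -> R. */
    static char lr(int level) {
        return (level & 1) == 0 ? 'L' : 'R';
    }

    /** Describes each level run as "[start,limit) level=n sor=X eor=Y". */
    static List<String> levelRuns(byte[] levels, int paraLevel) {
        List<String> runs = new ArrayList<>();
        int start = 0;
        int prevLevel = paraLevel; // the paragraph level borders the first run
        while (start < levels.length) {
            int level = levels[start];
            int limit = start;
            while (++limit < levels.length && levels[limit] == level) {
                // extend the run while the level stays the same
            }
            // After the last run, the paragraph level borders the run.
            int nextLevel = limit < levels.length ? levels[limit] : paraLevel;
            // sor and eor come from the higher of the two adjacent levels.
            char sor = lr(Math.max(prevLevel, level));
            char eor = lr(Math.max(level, nextLevel));
            runs.add("[" + start + "," + limit + ") level=" + level + " sor=" + sor + " eor=" + eor);
            prevLevel = level;
            start = limit;
        }
        return runs;
    }

    public static void main(String[] args) {
        byte[] levels = {0, 0, 1, 1, 2, 0};
        levelRuns(levels, 0).forEach(System.out::println);
    }
}
```

Because the previous run's <code>eor</code> is computed from the same pair of levels, each run's <code>sor</code> equals the preceding run's <code>eor</code>, which is why the loop above can simply reuse <code>eor</code> as the next <code>sor</code>.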
4152 * Perform the Unicode Bidi algorithm on a given paragraph, as defined in the
4153 * <a href="http://www.unicode.org/unicode/reports/tr9/">Unicode Standard Annex #9</a>,
4155 * also described in The Unicode Standard, Version 4.0.<p>
4157 * This method takes a paragraph of text and computes the
4158 * left-right-directionality of each character. The text should not
4159 * contain any Unicode block separators.<p>
4161 * The RUN_DIRECTION attribute in the text, if present, determines the base
4162 * direction (left-to-right or right-to-left). If not present, the base
4163 * direction is computed using the Unicode Bidirectional Algorithm,
4164 * defaulting to left-to-right if there are no strong directional characters
4165 * in the text. This attribute, if present, must be applied to all the text
4166 * in the paragraph.<p>
4168 * The BIDI_EMBEDDING attribute in the text, if present, represents
4169 * embedding level information. Negative values from -1 to -62 indicate
4170 * overrides at the absolute value of the level. Positive values from 1 to
4171 * 62 indicate embeddings. Where values are zero or not defined, the base
4172 * embedding level as determined by the base direction is assumed.<p>
4174 * The NUMERIC_SHAPING attribute in the text, if present, converts European
4175 * digits to other decimal digits before running the bidi algorithm. This
4176 * attribute, if present, must be applied to all the text in the paragraph.
4178 * If the entire text is all of the same directionality, then
4179 * the method may not perform all the steps described by the algorithm,
4180 * i.e., some levels may not be the same as if all steps were performed.
4181 * This is not relevant for unidirectional text.<br>
4182 * For example, in pure LTR text with numbers, the numbers would get
4183 * a resolved level 2 higher than the surrounding text according to
4184 * the algorithm. This implementation may set all resolved levels to
4185 * the same value in such a case.<p>
4187 * @param paragraph a paragraph of text with optional character and
4188 * paragraph attribute information
4191 public void setPara(AttributedCharacterIterator paragraph)
4194 Boolean runDirection = (Boolean) paragraph.getAttribute(TextAttribute.RUN_DIRECTION);
4195 if (runDirection == null) {
4196 paraLvl = LEVEL_DEFAULT_LTR;
4198 paraLvl = (runDirection.equals(TextAttribute.RUN_DIRECTION_LTR)) ?
4203 int len = paragraph.getEndIndex() - paragraph.getBeginIndex();
4204 byte[] embeddingLevels = new byte[len];
4205 char[] txt = new char[len];
4207 char ch = paragraph.first();
4208 while (ch != AttributedCharacterIterator.DONE) {
4210 Integer embedding = (Integer) paragraph.getAttribute(TextAttribute.BIDI_EMBEDDING);
4211 if (embedding != null) {
4212 byte level = embedding.byteValue();
4215 } else if (level < 0) {
4216 lvls = embeddingLevels;
4217 embeddingLevels[i] = (byte)((0 - level) | LEVEL_OVERRIDE);
4219 lvls = embeddingLevels;
4220 embeddingLevels[i] = level;
4223 ch = paragraph.next();
4227 NumericShaper shaper = (NumericShaper) paragraph.getAttribute(TextAttribute.NUMERIC_SHAPING);
4228 if (shaper != null) {
4229 shaper.shape(txt, 0, len);
4231 setPara(txt, paraLvl, lvls);
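Judging from the visible lines of the method above, a negative <code>BIDI_EMBEDDING</code> value becomes an override level (its absolute value with <code>LEVEL_OVERRIDE</code> set), a positive value is used as is, and the levels array is only passed on when some character carried a non-zero value. The sketch below reproduces that conversion with the JDK's <code>java.text</code> and <code>java.awt.font</code> classes; the class name is hypothetical, and the constant <code>0x80</code> is an assumption standing in for <code>Bidi.LEVEL_OVERRIDE</code>.

```java
import java.awt.font.TextAttribute;
import java.text.AttributedCharacterIterator;
import java.text.AttributedString;
import java.util.Arrays;

/** Hypothetical sketch of the BIDI_EMBEDDING handling in setPara(AttributedCharacterIterator). */
public class EmbeddingAttributeSketch {

    /** Assumed to match ICU's Bidi.LEVEL_OVERRIDE flag bit. */
    static final byte LEVEL_OVERRIDE = (byte) 0x80;

    /** Returns per-character levels, or null if no character carries a non-zero value. */
    static byte[] embeddingLevels(AttributedCharacterIterator it) {
        byte[] levels = new byte[it.getEndIndex() - it.getBeginIndex()];
        boolean anySet = false;
        int i = 0;
        for (char ch = it.first(); ch != AttributedCharacterIterator.DONE; ch = it.next(), i++) {
            Object value = it.getAttribute(TextAttribute.BIDI_EMBEDDING);
            if (value instanceof Integer) {
                byte level = ((Integer) value).byteValue();
                if (level < 0) {
                    // Negative value: override at the absolute value of the level.
                    levels[i] = (byte) (-level | LEVEL_OVERRIDE);
                    anySet = true;
                } else if (level > 0) {
                    // Positive value: a plain embedding level.
                    levels[i] = level;
                    anySet = true;
                }
                // Zero: the base level applies; the entry stays 0.
            }
        }
        return anySet ? levels : null;
    }

    public static void main(String[] args) {
        AttributedString s = new AttributedString("abcde");
        s.addAttribute(TextAttribute.BIDI_EMBEDDING, 2, 1, 3);  // embed "bc" at level 2
        s.addAttribute(TextAttribute.BIDI_EMBEDDING, -3, 3, 5); // override "de" at level 3
        System.out.println(Arrays.toString(embeddingLevels(s.getIterator())));
        System.out.println(embeddingLevels(new AttributedString("xyz").getIterator()));
    }
}
```

Printed as signed bytes, the override entries show up as -125 (0x83, that is level 3 with the 0x80 bit set).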
4235 * Specify whether block separators must be allocated level zero,
4236 * so that successive paragraphs will progress from left to right.
4237 * This method must be called before <code>setPara()</code>.
4238 * Paragraph separators (B) may appear in the text. Setting them to level zero
4239 * means that all paragraph separators (including one possibly appearing
4240 * in the last text position) are kept in the reordered text after the text
4241 * that they follow in the source text.
4242 * When this feature is not enabled, a paragraph separator at the last
4243 * position of the text before reordering will go to the first position
4244 * of the reordered text when the paragraph level is odd.
4246 * @param ordarParaLTR specifies whether paragraph separators (B) must
4247 * receive level 0, so that successive paragraphs progress from left to right.
4252 public void orderParagraphsLTR(boolean ordarParaLTR) {
4253 orderParagraphsLTR = ordarParaLTR;
4257 * Is this <code>Bidi</code> object set to allocate level 0 to block
4258 * separators so that successive paragraphs progress from left to right?
4260 * @return <code>true</code> if the <code>Bidi</code> object is set to
4261 * allocate level 0 to block separators.
4265 public boolean isOrderParagraphsLTR() {
4266 return orderParagraphsLTR;
4270 * Get the directionality of the text.
4272 * @return a value of <code>LTR</code>, <code>RTL</code> or <code>MIXED</code>
4273 * that indicates whether the entire text
4274 * represented by this object is unidirectional (and, if so,
4275 * in which direction) or mixed-directional.
4277 * @throws IllegalStateException if this call is not preceded by a successful
4278 * call to <code>setPara</code> or <code>setLine</code>
4285 public byte getDirection()
4287 verifyValidParaOrLine();
4294 * @return A <code>String</code> containing the text that the
4295 * <code>Bidi</code> object was created for.
4297 * @throws IllegalStateException if this call is not preceded by a successful
4298 * call to <code>setPara</code> or <code>setLine</code>
4304 public String getTextAsString()
4306 verifyValidParaOrLine();
4307 return new String(text);
4313 * @return A <code>char</code> array containing the text that the
4314 * <code>Bidi</code> object was created for.
4316 * @throws IllegalStateException if this call is not preceded by a successful
4317 * call to <code>setPara</code> or <code>setLine</code>
4323 public char[] getText()
4325 verifyValidParaOrLine();
4330 * Get the length of the text.
4332 * @return The length of the text that the <code>Bidi</code> object was
4335 * @throws IllegalStateException if this call is not preceded by a successful
4336 * call to <code>setPara</code> or <code>setLine</code>
4339 public int getLength()
4341 verifyValidParaOrLine();
4342 return originalLength;
4346 * Get the length of the source text processed by the last call to
4347 * <code>setPara()</code>. This length may be different from the length of
4348 * the source text if option <code>OPTION_STREAMING</code> has been
4351 * Note that whenever the length of the text affects the execution or the
4352 * result of a method, it is the processed length which must be considered,
4353 * except for <code>setPara</code> (which receives unprocessed source text)
4354 * and <code>getLength</code> (which returns the original length of the
4356 * In particular, the processed length is the one to consider in the
4359 * <li>maximum value of the <code>limit</code> argument of
4360 * <code>setLine</code></li>
4361 * <li>maximum value of the <code>charIndex</code> argument of
4362 * <code>getParagraph</code></li>
4363 * <li>maximum value of the <code>charIndex</code> argument of
4364 * <code>getLevelAt</code></li>
4365 * <li>number of elements in the array returned by <code>getLevels</code>
4367 * <li>maximum value of the <code>logicalStart</code> argument of
4368 * <code>getLogicalRun</code></li>
4369 * <li>maximum value of the <code>logicalIndex</code> argument of
4370 * <code>getVisualIndex</code></li>
4371 * <li>number of elements returned by <code>getLogicalMap</code></li>
4372 * <li>length of text processed by <code>writeReordered</code></li>
4375 * @return The length of the part of the source text processed by
4376 * the last call to <code>setPara</code>.
4378 * @throws IllegalStateException if this call is not preceded by a successful
4379 * call to <code>setPara</code> or <code>setLine</code>
4382 * @see #OPTION_STREAMING
4385 public int getProcessedLength() {
4386 verifyValidParaOrLine();
4391 * Get the length of the reordered text resulting from the last call to
4392 * <code>setPara()</code>. This length may be different from the length
4393 * of the source text if option <code>OPTION_INSERT_MARKS</code>
4394 * or option <code>OPTION_REMOVE_CONTROLS</code> has been set.
4396 * This resulting length is the one to consider in the following cases:
4398 * <li>maximum value of the <code>visualIndex</code> argument of
4399 * <code>getLogicalIndex</code></li>
4400 * <li>number of elements returned by <code>getVisualMap</code></li>
4402 * Note that this length stays identical to the source text length if
4403 * Bidi marks are inserted or removed using option bits of
4404 * <code>writeReordered</code>, or if the reordering mode
4405 * <code>REORDER_INVERSE_NUMBERS_AS_L</code> has been set.
4407 * @return The length of the reordered text resulting from
4408 * the last call to <code>setPara</code>.
4410 * @throws IllegalStateException if this call is not preceded by a successful
4411 * call to <code>setPara</code> or <code>setLine</code>
4414 * @see #OPTION_INSERT_MARKS
4415 * @see #OPTION_REMOVE_CONTROLS
4416 * @see #REORDER_INVERSE_NUMBERS_AS_L
4419 public int getResultLength() {
4420 verifyValidParaOrLine();
4421 return resultLength;
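The tail of <code>setPara(char[], byte, byte[])</code> shown earlier derives <code>resultLength</code> from the processed length: with <code>OPTION_REMOVE_CONTROLS</code> the number of Bidi controls is subtracted, and otherwise (the elided line between the two statements is presumably an <code>else</code>) the number of insert points recorded for <code>OPTION_INSERT_MARKS</code> is added. A sketch of that arithmetic follows; the set of characters counted as controls is an assumption modeled on ICU's internal check, and the names are hypothetical.

```java
/** Hypothetical sketch of how the result length relates to the processed length. */
public class ResultLengthSketch {

    /**
     * Assumed set of Bidi controls: ZWNJ, ZWJ, LRM, RLM (U+200C..U+200F),
     * LRE..RLO (U+202A..U+202E) and LRI..PDI (U+2066..U+2069).
     */
    static boolean isBidiControl(char c) {
        return (c >= '\u200C' && c <= '\u200F')
                || (c >= '\u202A' && c <= '\u202E')
                || (c >= '\u2066' && c <= '\u2069');
    }

    /**
     * With removeControls, every Bidi control is dropped from the count;
     * otherwise each inserted mark adds one character.
     */
    static int resultLength(String processed, boolean removeControls, int insertedMarks) {
        int length = processed.length();
        if (removeControls) {
            for (int i = 0; i < processed.length(); i++) {
                if (isBidiControl(processed.charAt(i))) {
                    length--;
                }
            }
        } else {
            length += insertedMarks;
        }
        return length;
    }

    public static void main(String[] args) {
        String text = "a\u200Fb\u202Ac"; // contains an RLM and an LRE
        System.out.println(resultLength(text, true, 0));   // controls removed
        System.out.println(resultLength("abc", false, 2)); // two marks inserted
    }
}
```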
4424 /* paragraphs API methods ------------------------------------------------- */
4427 * Get the paragraph level of the text.
4429 * @return The paragraph level. If there are multiple paragraphs, their
4430 * level may vary if the required paraLevel is LEVEL_DEFAULT_LTR or
4431 * LEVEL_DEFAULT_RTL. In that case, the level of the first paragraph
4434 * @throws IllegalStateException if this call is not preceded by a successful
4435 * call to <code>setPara</code> or <code>setLine</code>
4437 * @see #LEVEL_DEFAULT_LTR
4438 * @see #LEVEL_DEFAULT_RTL
4439 * @see #getParagraph
4440 * @see #getParagraphByIndex
4443 public byte getParaLevel()
4445 verifyValidParaOrLine();
4450 * Get the number of paragraphs.
4452 * @return The number of paragraphs.
4454 * @throws IllegalStateException if this call is not preceded by a successful
4455 * call to <code>setPara</code> or <code>setLine</code>
4458 public int countParagraphs()
4460 verifyValidParaOrLine();
4465 * Get a paragraph, given the index of this paragraph.
4467 * This method returns information about a paragraph.<p>
4469 * @param paraIndex is the number of the paragraph, in the
4470 * range <code>[0..countParagraphs()-1]</code>.
4472 * @return a BidiRun object with the details of the paragraph:<br>
4473 * <code>getStart()</code> returns the index of the first character
4474 * of the paragraph in the text.<br>
4475 * <code>getLimit()</code> returns the limit of the paragraph.<br>
4476 * <code>getEmbeddingLevel()</code> returns the level of the paragraph.
4478 * @throws IllegalStateException if this call is not preceded by a successful
4479 * call to <code>setPara</code> or <code>setLine</code>
4480 * @throws IllegalArgumentException if paraIndex is not in the range
4481 * <code>[0..countParagraphs()-1]</code>
4483 * @see com.ibm.icu.text.BidiRun
4486 public BidiRun getParagraphByIndex(int paraIndex)
4488 verifyValidParaOrLine();
4489 verifyRange(paraIndex, 0, paraCount);
4491 Bidi bidi = paraBidi; /* get Para object if Line object */
4493 if (paraIndex == 0) {
4496 paraStart = bidi.paras_limit[paraIndex - 1];
4498 BidiRun bidiRun = new BidiRun();
4499 bidiRun.start = paraStart;
4500 bidiRun.limit = bidi.paras_limit[paraIndex];
4501 bidiRun.level = GetParaLevelAt(paraStart);
4506 * Get a paragraph, given a position within the text.
4507 * This method returns information about a paragraph.<br>
4508 * Note: if the paragraph index is known, it is more efficient to
4509 * retrieve the paragraph information using getParagraphByIndex().<p>
4511 * @param charIndex is the index of a character within the text, in the
4512 * range <code>[0..getProcessedLength()-1]</code>.
4514 * @return a BidiRun object with the details of the paragraph:<br>
4515 * <code>getStart()</code> returns the index of the first character
4516 * of the paragraph in the text.<br>
4517 * <code>getLimit()</code> returns the limit of the paragraph.<br>
4518 * <code>getEmbeddingLevel()</code> returns the level of the paragraph.
4520 * @throws IllegalStateException if this call is not preceded by a successful
4521 * call to <code>setPara</code> or <code>setLine</code>
4522 * @throws IllegalArgumentException if charIndex is not within the legal range
4524 * @see com.ibm.icu.text.BidiRun
4525 * @see #getParagraphByIndex
4526 * @see #getProcessedLength
4529 public BidiRun getParagraph(int charIndex)
4531 verifyValidParaOrLine();
4532 Bidi bidi = paraBidi; /* get Para object if Line object */
4533 verifyRange(charIndex, 0, bidi.length);
4535 for (paraIndex = 0; charIndex >= bidi.paras_limit[paraIndex]; paraIndex++) {
4537 return getParagraphByIndex(paraIndex);
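Both lookups above work on the array of paragraph limits: a paragraph starts where the previous one ends, and the paragraph containing a character is found by a linear scan for the first limit greater than the character's index. A minimal sketch over a plain <code>int[]</code> follows; the class and method names are hypothetical, and the array stands in for the internal <code>paras_limit</code>.

```java
/** Hypothetical sketch of paragraph lookup over an array of paragraph limits. */
public class ParagraphLookupSketch {

    /** Index of the paragraph containing charIndex (linear scan, as in getParagraph). */
    static int paragraphIndex(int[] parasLimit, int charIndex) {
        int length = parasLimit[parasLimit.length - 1];
        if (charIndex < 0 || charIndex >= length) {
            throw new IllegalArgumentException("charIndex out of range: " + charIndex);
        }
        int paraIndex = 0;
        while (charIndex >= parasLimit[paraIndex]) {
            paraIndex++;
        }
        return paraIndex;
    }

    /** {start, limit} of a paragraph: it starts where the previous one ends. */
    static int[] paragraphBounds(int[] parasLimit, int paraIndex) {
        if (paraIndex < 0 || paraIndex >= parasLimit.length) {
            throw new IllegalArgumentException("paraIndex out of range: " + paraIndex);
        }
        int start = paraIndex == 0 ? 0 : parasLimit[paraIndex - 1];
        return new int[] {start, parasLimit[paraIndex]};
    }

    public static void main(String[] args) {
        int[] parasLimit = {5, 8, 11}; // three paragraphs: [0,5), [5,8), [8,11)
        int p = paragraphIndex(parasLimit, 6);
        int[] bounds = paragraphBounds(parasLimit, p);
        System.out.println(p + ": " + bounds[0] + ".." + bounds[1]); // 1: 5..8
    }
}
```

When the paragraph index is already known, <code>paragraphBounds</code> avoids the scan, which is the efficiency point made in the <code>getParagraph</code> documentation.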
4541 * Get the index of a paragraph, given a position within the text.<p>
4543 * @param charIndex is the index of a character within the text, in the
4544 * range <code>[0..getProcessedLength()-1]</code>.
4546 * @return The index of the paragraph containing the specified position,
4549 * @throws IllegalStateException if this call is not preceded by a successful
4550 * call to <code>setPara</code> or <code>setLine</code>
4551 * @throws IllegalArgumentException if charIndex is not within the legal range
4553 * @see com.ibm.icu.text.BidiRun
4554 * @see #getProcessedLength
4557 public int getParagraphIndex(int charIndex)
4559 verifyValidParaOrLine();
4560 Bidi bidi = paraBidi; /* get Para object if Line object */
4561 verifyRange(charIndex, 0, bidi.length);
4563 for (paraIndex = 0; charIndex >= bidi.paras_limit[paraIndex]; paraIndex++) {
4569 * Set a custom Bidi classifier used by the UBA implementation for Bidi
4570 * class determination.
4572 * @param classifier A new custom classifier. This can be null.
4574 * @see #getCustomClassifier
4577 public void setCustomClassifier(BidiClassifier classifier) {
4578 this.customClassifier = classifier;
4582 * Gets the current custom classifier used for Bidi class
4585 * @return An instance of class <code>BidiClassifier</code>
4587 * @see #setCustomClassifier
4590 public BidiClassifier getCustomClassifier() {
4591 return this.customClassifier;
4595 * Retrieves the Bidi class for a given code point.
4596 * <p>If a <code>BidiClassifier</code> is defined and returns a value
4597 * other than <code>CLASS_DEFAULT</code>, that value is used; otherwise
4598 * the default class determination mechanism is invoked.</p>
4600 * @param c The code point to get a Bidi class for.
4602 * @return The Bidi class for the character <code>c</code> that is in effect
4603 * for this <code>Bidi</code> instance.
4605 * @see BidiClassifier
4608 public int getCustomizedClass(int c) {
4611 if (customClassifier == null ||
4612 (dir = customClassifier.classify(c)) == Bidi.CLASS_DEFAULT) {
4613 dir = bdp.getClass(c);
4615 if (dir >= UCharacterDirection.CHAR_DIRECTION_COUNT)
4621 * <code>setLine()</code> returns a <code>Bidi</code> object to
4622 * contain the reordering information, especially the resolved levels,
4623 * for all the characters in a line of text. This line of text is
4624 * specified by referring to a <code>Bidi</code> object representing
4625 * this information for a piece of text containing one or more paragraphs,
4626 * and by specifying a range of indexes in this text.<p>
4627 * In the new line object, the indexes will range from 0 to <code>limit-start-1</code>.<p>
4629 * This is used after calling <code>setPara()</code>
4630 * for a piece of text, and after line-breaking on that text.
4631 * It is not necessary if each paragraph is treated as a single line.<p>
4633 * After line-breaking, rules (L1) and (L2) for the treatment of
4634 * trailing WS and for reordering are performed on
4635 * a <code>Bidi</code> object that represents a line.<p>
4637 * <strong>Important: </strong>the line <code>Bidi</code> object may
4638 * reference data within the global text <code>Bidi</code> object.
4639 * You should not alter the content of the global text object until
4640 * you are finished using the line object.
4642 * @param start is the line's first index into the text.
4644 * @param limit is just behind the line's last index into the text
4645 * (its last index +1).
4647 * @return a <code>Bidi</code> object that will now represent a line of the text.
4649 * @throws IllegalStateException if this call is not preceded by a successful
4650 * call to <code>setPara</code>
4651 * @throws IllegalArgumentException if start and limit are not in the range
4652 * <code>0<=start<limit<=getProcessedLength()</code>,
4653 * or if the specified line crosses a paragraph boundary
4656 * @see #getProcessedLength
4659 public Bidi setLine(int start, int limit)
4662 verifyRange(start, 0, limit);
4663 verifyRange(limit, 0, length+1);
4664 if (getParagraphIndex(start) != getParagraphIndex(limit - 1)) {
4665 /* the line crosses a paragraph boundary */
4666 throw new IllegalArgumentException();
4668 return BidiLine.setLine(this, start, limit);
4672 * Get the level for one character.
4674 * @param charIndex the index of a character.
4676 * @return The level for the character at <code>charIndex</code>.
4678 * @throws IllegalStateException if this call is not preceded by a successful
4679 * call to <code>setPara</code> or <code>setLine</code>
4680 * @throws IllegalArgumentException if charIndex is not in the range
4681 * <code>0<=charIndex<getProcessedLength()</code>
4683 * @see #getProcessedLength
4686 public byte getLevelAt(int charIndex)
4688 verifyValidParaOrLine();
4689 verifyRange(charIndex, 0, length);
4690 return BidiLine.getLevelAt(this, charIndex);
4694 * Get an array of levels for each character.<p>
4696 * Note that this method may allocate memory under some
4697 * circumstances, unlike <code>getLevelAt()</code>.
4699 * @return The levels array for the text,
4700 * or <code>null</code> if an error occurs.
4702 * @throws IllegalStateException if this call is not preceded by a successful
4703 * call to <code>setPara</code> or <code>setLine</code>
4706 public byte[] getLevels()
4708 verifyValidParaOrLine();
4712 return BidiLine.getLevels(this);
4716 * Get a logical run.
4717 * This method returns information about a run and is used
4718 * to retrieve runs in logical order.<p>
4719 * This is especially useful for line-breaking on a paragraph.
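The run containing a position can be pictured as the maximal stretch of equal levels around it. The hypothetical <code>logicalRun</code> below works directly on a resolved levels array, such as the one returned by <code>getLevels()</code>, and returns <code>{start, limit, level}</code>. It is a sketch of the idea, not this class's implementation.

```java
final class LogicalRunSketch {
    // Returns {start, limit, level} for the maximal run of equal levels that
    // contains logicalPosition; levels is a resolved, logically ordered array.
    static int[] logicalRun(byte[] levels, int logicalPosition) {
        byte level = levels[logicalPosition];
        int start = logicalPosition;
        while (start > 0 && levels[start - 1] == level) {
            start--;                          // extend the run to the left
        }
        int limit = logicalPosition + 1;
        while (limit < levels.length && levels[limit] == level) {
            limit++;                          // extend the run to the right
        }
        return new int[] {start, limit, level};
    }
}
```

Iterating with <code>start = limit</code> from 0 visits every run in logical order, which is the pattern that suits line breaking.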
4721 * @param logicalPosition is a logical position within the source text.
4723 * @return a BidiRun object filled with <code>start</code> containing
4724 * the index of the first character of the run, <code>limit</code> containing
4725 * the limit of the run, and <code>embeddingLevel</code> containing
4726 * the level of the run.
4728 * @throws IllegalStateException if this call is not preceded by a successful
4729 * call to <code>setPara</code> or <code>setLine</code>
4730 * @throws IllegalArgumentException if logicalPosition is not in the range
4731 * <code>0<=logicalPosition<getProcessedLength()</code>
4733 * @see com.ibm.icu.text.BidiRun
4734 * @see com.ibm.icu.text.BidiRun#getStart()
4735 * @see com.ibm.icu.text.BidiRun#getLimit()
4736 * @see com.ibm.icu.text.BidiRun#getEmbeddingLevel()
4740 public BidiRun getLogicalRun(int logicalPosition)
4742 verifyValidParaOrLine();
4743 verifyRange(logicalPosition, 0, length);
4744 return BidiLine.getLogicalRun(this, logicalPosition);
4748 * Get the number of runs.
4749 * This method may invoke the actual reordering on the
4750 * <code>Bidi</code> object, after <code>setPara()</code>
4751 * may have resolved only the levels of the text. Therefore,
4752 * <code>countRuns()</code> may have to allocate memory,
4753 * and may throw an exception if it fails to do so.
4755 * @return The number of runs.
4757 * @throws IllegalStateException if this call is not preceded by a successful
4758 * call to <code>setPara</code> or <code>setLine</code>
4761 public int countRuns()
4763 verifyValidParaOrLine();
4764 BidiLine.getRuns(this);
4770 * Get a <code>BidiRun</code> object according to its index. BidiRun methods
4771 * may be used to retrieve the run's logical start, length and level,
4772 * which can be even for an LTR run or odd for an RTL run.
4773 * In an RTL run, the character at the logical start is
4774 * visually on the right of the displayed run.
4775 * The length is the number of characters in the run.<p>
4776 * <code>countRuns()</code> is normally called
4777 * before the runs are retrieved.
4782 * Bidi bidi = new Bidi();
4783 * String text = "abc 123 DEFG xyz";
4784 * bidi.setPara(text, Bidi.RTL, null);
4785 * int i, count=bidi.countRuns(), logicalStart, visualIndex=0, length;
4787 * for (i = 0; i < count; ++i) {
4788 * run = bidi.getVisualRun(i);
4789 * logicalStart = run.getStart();
4790 * length = run.getLength();
4791 * if ((run.getEmbeddingLevel() & 1) == 0) {
4793 * show_char(text.charAt(logicalStart++), visualIndex++);
4794 * } while (--length > 0);
4796 * logicalStart += length; // logicalLimit
4798 * show_char(text.charAt(--logicalStart), visualIndex++);
4799 * } while (--length > 0);
4804 * Note that in right-to-left runs, code like this places
4805 * second surrogates before first ones (which is generally a bad idea)
4806 * and combining characters before base characters.
4808 * Use of <code>{@link #writeReordered}</code>, optionally with the
4809 * <code>{@link #KEEP_BASE_COMBINING}</code> option, can be considered in
4810 * order to avoid these issues.
4812 * @param runIndex is the number of the run in visual order, in the
4813 * range <code>[0..countRuns()-1]</code>.
4815 * @return a BidiRun object containing the details of the run. The
4816 * directionality of the run is
4817 * <code>LTR==0</code> or <code>RTL==1</code>,
4818 * never <code>MIXED</code>.
4820 * @throws IllegalStateException if this call is not preceded by a successful
4821 * call to <code>setPara</code> or <code>setLine</code>
4822 * @throws IllegalArgumentException if <code>runIndex</code> is not in
4823 * the range <code>0<=runIndex<countRuns()</code>
4826 * @see com.ibm.icu.text.BidiRun
4827 * @see com.ibm.icu.text.BidiRun#getStart()
4828 * @see com.ibm.icu.text.BidiRun#getLength()
4829 * @see com.ibm.icu.text.BidiRun#getEmbeddingLevel()
4832 public BidiRun getVisualRun(int runIndex)
4834 verifyValidParaOrLine();
4835 BidiLine.getRuns(this);
4836 verifyRange(runIndex, 0, runCount);
4837 return BidiLine.getVisualRun(this, runIndex);
4841 * Get the visual position from a logical text position.
4842 * If such a mapping is used many times on the same
4843 * <code>Bidi</code> object, then calling
4844 * <code>getLogicalMap()</code> is more efficient.
4846 * The value returned may be <code>MAP_NOWHERE</code> if there is no
4847 * visual position because the corresponding text character is a Bidi
4848 * control removed from output by the option
4849 * <code>OPTION_REMOVE_CONTROLS</code>.
4851 * When the visual output is altered by using options of
4852 * <code>writeReordered()</code> such as <code>INSERT_LRM_FOR_NUMERIC</code>,
4853 * <code>KEEP_BASE_COMBINING</code>, <code>OUTPUT_REVERSE</code>,
4854 * <code>REMOVE_BIDI_CONTROLS</code>, the visual position returned may not
4855 * be correct. When possible, use reordering options such as
4856 * {@link #OPTION_INSERT_MARKS} and {@link #OPTION_REMOVE_CONTROLS} instead.
4858 * Note that in right-to-left runs, this mapping places
4859 * second surrogates before first ones (which is generally a bad idea)
4860 * and combining characters before base characters.
4861 * Use of <code>{@link #writeReordered}</code>, optionally with the
4862 * <code>{@link #KEEP_BASE_COMBINING}</code> option can be considered instead
4863 * of using the mapping, in order to avoid these issues.
4865 * @param logicalIndex is the index of a character in the text.
4867 * @return The visual position of this character.
4869 * @throws IllegalStateException if this call is not preceded by a successful
4870 * call to <code>setPara</code> or <code>setLine</code>
4871 * @throws IllegalArgumentException if <code>logicalIndex</code> is not in
4872 * the range <code>0<=logicalIndex<getProcessedLength()</code>
4874 * @see #getLogicalMap
4875 * @see #getLogicalIndex
4876 * @see #getProcessedLength
4878 * @see #OPTION_REMOVE_CONTROLS
4879 * @see #writeReordered
4882 public int getVisualIndex(int logicalIndex)
4884 verifyValidParaOrLine();
4885 verifyRange(logicalIndex, 0, length);
4886 return BidiLine.getVisualIndex(this, logicalIndex);
4891 * Get the logical text position from a visual position.
4892 * If such a mapping is used many times on the same
4893 * <code>Bidi</code> object, then calling
4894 * <code>getVisualMap()</code> is more efficient.
4896 * The value returned may be <code>MAP_NOWHERE</code> if there is no
4897 * logical position because the corresponding text character is a Bidi
4898 * mark inserted in the output by option
4899 * <code>OPTION_INSERT_MARKS</code>.
4901 * This is the inverse method to <code>getVisualIndex()</code>.
4903 * When the visual output is altered by using options of
4904 * <code>writeReordered()</code> such as <code>INSERT_LRM_FOR_NUMERIC</code>,
4905 * <code>KEEP_BASE_COMBINING</code>, <code>OUTPUT_REVERSE</code>,
4906 * <code>REMOVE_BIDI_CONTROLS</code>, the logical position returned may not
4907 * be correct. When possible, use reordering options such as
4908 * {@link #OPTION_INSERT_MARKS} and {@link #OPTION_REMOVE_CONTROLS} instead.
4910 * @param visualIndex is the visual position of a character.
4912 * @return The index of this character in the text.
4914 * @throws IllegalStateException if this call is not preceded by a successful
4915 * call to <code>setPara</code> or <code>setLine</code>
4916 * @throws IllegalArgumentException if <code>visualIndex</code> is not in
4917 * the range <code>0<=visualIndex<getResultLength()</code>
4919 * @see #getVisualMap
4920 * @see #getVisualIndex
4921 * @see #getResultLength
4923 * @see #OPTION_INSERT_MARKS
4924 * @see #writeReordered
4927 public int getLogicalIndex(int visualIndex)
4929 verifyValidParaOrLine();
4930 verifyRange(visualIndex, 0, resultLength);
4931 /* we can do the trivial cases without the runs array */
4932 if (insertPoints.size == 0 && controlCount == 0) {
4933 if (direction == LTR) {
4936 else if (direction == RTL) {
4937 return length - visualIndex - 1;
4940 BidiLine.getRuns(this);
4941 return BidiLine.getLogicalIndex(this, visualIndex);
4945 * Get a logical-to-visual index map (array) for the characters in the
4946 * <code>Bidi</code> (paragraph or line) object.
4948 * Some values in the map may be <code>MAP_NOWHERE</code> if the
4949 * corresponding text characters are Bidi controls removed from the visual
4950 * output by the option <code>OPTION_REMOVE_CONTROLS</code>.
4952 * When the visual output is altered by using options of
4953 * <code>writeReordered()</code> such as <code>INSERT_LRM_FOR_NUMERIC</code>,
4954 * <code>KEEP_BASE_COMBINING</code>, <code>OUTPUT_REVERSE</code>,
4955 * <code>REMOVE_BIDI_CONTROLS</code>, the visual positions returned may not
4956 * be correct. When possible, use reordering options such as
4957 * {@link #OPTION_INSERT_MARKS} and {@link #OPTION_REMOVE_CONTROLS} instead.
4959 * Note that in right-to-left runs, this mapping places
4960 * second surrogates before first ones (which is generally a bad idea)
4961 * and combining characters before base characters.
4962 * Use of <code>{@link #writeReordered}</code>, optionally with the
4963 * <code>{@link #KEEP_BASE_COMBINING}</code> option can be considered instead
4964 * of using the mapping, in order to avoid these issues.
4966 * @return an array of <code>getProcessedLength()</code>
4967 * indexes which will reflect the reordering of the characters.<br><br>
4968 * The index map will result in
4969 * <code>indexMap[logicalIndex]==visualIndex</code>, where
4970 * <code>indexMap</code> represents the returned array.
4972 * @throws IllegalStateException if this call is not preceded by a successful
4973 * call to <code>setPara</code> or <code>setLine</code>
4975 * @see #getVisualMap
4976 * @see #getVisualIndex
4977 * @see #getProcessedLength
4979 * @see #OPTION_REMOVE_CONTROLS
4980 * @see #writeReordered
4983 public int[] getLogicalMap()
4985 /* countRuns() checks successful call to setPara/setLine */
4990 return BidiLine.getLogicalMap(this);
4994 * Get a visual-to-logical index map (array) for the characters in the
4995 * <code>Bidi</code> (paragraph or line) object.
4997 * Some values in the map may be <code>MAP_NOWHERE</code> if the
4998 * corresponding text characters are Bidi marks inserted in the visual
4999 * output by the option <code>OPTION_INSERT_MARKS</code>.
5001 * When the visual output is altered by using options of
5002 * <code>writeReordered()</code> such as <code>INSERT_LRM_FOR_NUMERIC</code>,
5003 * <code>KEEP_BASE_COMBINING</code>, <code>OUTPUT_REVERSE</code>,
5004 * <code>REMOVE_BIDI_CONTROLS</code>, the logical positions returned may not
5005 * be correct. When possible, use reordering options such as
5006 * {@link #OPTION_INSERT_MARKS} and {@link #OPTION_REMOVE_CONTROLS} instead.
5008 * @return an array of <code>getResultLength()</code>
5009 * indexes which will reflect the reordering of the characters.<br><br>
5010 * The index map will result in
5011 * <code>indexMap[visualIndex]==logicalIndex</code>, where
5012 * <code>indexMap</code> represents the returned array.
5014 * @throws IllegalStateException if this call is not preceded by a successful
5015 * call to <code>setPara</code> or <code>setLine</code>
5017 * @see #getLogicalMap
5018 * @see #getLogicalIndex
5019 * @see #getResultLength
5021 * @see #OPTION_INSERT_MARKS
5022 * @see #writeReordered
5025 public int[] getVisualMap()
5027 /* countRuns() checks successful call to setPara/setLine */
5029 if (resultLength <= 0) {
5032 return BidiLine.getVisualMap(this);
5036 * This is a convenience method that does not use a <code>Bidi</code> object.
5037 * It is intended for use when an application has determined the levels
5038 * of objects (character sequences) and just needs to have them reordered (L2).
5039 * This is equivalent to using <code>getLogicalMap()</code> on a
5040 * <code>Bidi</code> object.
5042 * @param levels is an array of levels that have been determined by
5045 * @return an array of <code>levels.length</code>
5046 * indexes which will reflect the reordering of the characters.<p>
5047 * The index map will result in
5048 * <code>indexMap[logicalIndex]==visualIndex</code>, where
5049 * <code>indexMap</code> represents the returned array.
5053 public static int[] reorderLogical(byte[] levels)
5055 return BidiLine.reorderLogical(levels);
5059 * This is a convenience method that does not use a <code>Bidi</code> object.
5060 * It is intended for use when an application has determined the levels
5061 * of objects (character sequences) and just needs to have them reordered (L2).
5062 * This is equivalent to using <code>getVisualMap()</code> on a
5063 * <code>Bidi</code> object.
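Rule L2 itself is short. The sketch below is an illustrative implementation, not ICU's code. Working from the highest level down to the lowest odd level, it reverses every maximal stretch of positions whose level is at least the current one. The result is the visual-to-logical map that <code>reorderVisual</code> describes, and inverting that map gives the logical-to-visual map of <code>reorderLogical</code>. It assumes plain levels without override flags. <code>ReorderLevelsSketch</code> is a hypothetical name.

```java
final class ReorderLevelsSketch {
    // indexMap[visualIndex] == logicalIndex, computed with rule L2.
    static int[] reorderVisual(byte[] levels) {
        int n = levels.length;
        int[] map = new int[n];
        for (int i = 0; i < n; i++) {
            map[i] = i;                      // start in logical order
        }
        int maxLevel = 0;
        int minOddLevel = Integer.MAX_VALUE;
        for (byte lev : levels) {
            maxLevel = Math.max(maxLevel, lev);
            if ((lev & 1) != 0) {
                minOddLevel = Math.min(minOddLevel, lev);
            }
        }
        // From the highest level down to the lowest odd level, reverse each
        // maximal stretch of positions holding characters at that level or higher.
        for (int lev = maxLevel; lev >= minOddLevel; lev--) {
            int i = 0;
            while (i < n) {
                if (levels[map[i]] < lev) {
                    i++;
                    continue;
                }
                int start = i;
                while (i < n && levels[map[i]] >= lev) {
                    i++;
                }
                for (int lo = start, hi = i - 1; lo < hi; lo++, hi--) {
                    int tmp = map[lo];
                    map[lo] = map[hi];
                    map[hi] = tmp;
                }
            }
        }
        return map;
    }

    // indexMap[logicalIndex] == visualIndex: the inverse of reorderVisual.
    static int[] reorderLogical(byte[] levels) {
        int[] visual = reorderVisual(levels);
        int[] logical = new int[visual.length];
        for (int v = 0; v < visual.length; v++) {
            logical[visual[v]] = v;
        }
        return logical;
    }
}
```

For levels <code>{1, 2, 2, 0}</code>, the visual map is <code>{1, 2, 0, 3}</code> and the logical map is <code>{2, 0, 1, 3}</code>.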
5065 * @param levels is an array of levels that have been determined by
5068 * @return an array of <code>levels.length</code>
5069 * indexes which will reflect the reordering of the characters.<p>
5070 * The index map will result in
5071 * <code>indexMap[visualIndex]==logicalIndex</code>, where
5072 * <code>indexMap</code> represents the returned array.
5076 public static int[] reorderVisual(byte[] levels)
5078 return BidiLine.reorderVisual(levels);
5082 * Invert an index map.
5083 * The index mapping of the argument map is inverted and returned as
5084 * an array of indexes that we will call the inverse map.
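A minimal sketch of the inversion rule given in the description of the return value below. It uses a local <code>MAP_NOWHERE</code> stand-in assumed to equal -1. <code>InvertMapSketch</code> is a hypothetical name.

```java
import java.util.Arrays;

final class InvertMapSketch {
    // Assumed stand-in for Bidi.MAP_NOWHERE.
    static final int MAP_NOWHERE = -1;

    static int[] invertMap(int[] srcMap) {
        int max = -1;
        for (int k : srcMap) {
            max = Math.max(max, k);          // MAP_NOWHERE (-1) never raises the maximum
        }
        int[] inverse = new int[max + 1];    // 1 + the highest value in srcMap
        Arrays.fill(inverse, MAP_NOWHERE);   // destinations with no source element
        for (int i = 0; i < srcMap.length; i++) {
            if (srcMap[i] != MAP_NOWHERE) {
                inverse[srcMap[i]] = i;      // source i maps to destination srcMap[i]
            }
        }
        return inverse;
    }
}
```

For example, <code>{2, MAP_NOWHERE, 0}</code> inverts to <code>{2, MAP_NOWHERE, 0}</code>, and <code>{0, 3}</code> inverts to a four-element array with <code>MAP_NOWHERE</code> at indexes 1 and 2.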
5086 * @param srcMap is an array whose elements define the original mapping
5087 * from a source array to a destination array.
5088 * Some elements of the source array may have no mapping in the
5089 * destination array. In that case, their value will be
5090 * the special value <code>MAP_NOWHERE</code>.
5091 * All elements must be >=0 or equal to <code>MAP_NOWHERE</code>.
5092 * Some elements in the source map may have a value of
5093 * <code>srcMap.length</code> or greater if the destination array has more elements than the
5095 * There must be no duplicate indexes (two or more elements with the
5096 * same value except <code>MAP_NOWHERE</code>).
5098 * @return an array representing the inverse map.
5099 * This array has a number of elements equal to 1 + the highest
5100 * value in <code>srcMap</code>.
5101 * For elements of the result array which have no matching elements
5102 * in the source array, the corresponding elements in the inverse
5103 * map will receive a value equal to <code>MAP_NOWHERE</code>.
5104 * If the element with index i in <code>srcMap</code> has a value k different
5105 * from <code>MAP_NOWHERE</code>, this means that element i of
5106 * the source array maps to element k in the destination array.
5107 * The inverse map will have value i in its k-th element.
5108 * For all elements of the destination array which do not map to
5109 * an element in the source array, the corresponding element in the
5110 * inverse map will have a value equal to <code>MAP_NOWHERE</code>.
5115 public static int[] invertMap(int[] srcMap)
5117 if (srcMap == null) {
5120 return BidiLine.invertMap(srcMap);
5125 * Fields and methods for compatibility with java.text.Bidi (Sun implementation)
5129 * Constant indicating base direction is left-to-right.
5132 public static final int DIRECTION_LEFT_TO_RIGHT = LTR;
5135 * Constant indicating base direction is right-to-left.
5138 public static final int DIRECTION_RIGHT_TO_LEFT = RTL;
5141 * Constant indicating that the base direction depends on the first strong
5142 * directional character in the text according to the Unicode Bidirectional
5143 * Algorithm. If no strong directional character is present, the base
5144 * direction is left-to-right.
5147 public static final int DIRECTION_DEFAULT_LEFT_TO_RIGHT = LEVEL_DEFAULT_LTR;
5150 * Constant indicating that the base direction depends on the first strong
5151 * directional character in the text according to the Unicode Bidirectional
5152 * Algorithm. If no strong directional character is present, the base
5153 * direction is right-to-left.
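To see the two default-direction constants at work, the sketch below uses the JDK's <code>java.text.Bidi</code>, which defines constants of the same names. This part of the class exists for compatibility with that class. <code>DefaultDirectionSketch</code> is a hypothetical name.

```java
import java.text.Bidi;

final class DefaultDirectionSketch {
    // Base level resolved for text under one of the DIRECTION_* flags.
    static int baseLevel(String text, int flags) {
        return new Bidi(text, flags).getBaseLevel();
    }
}
```

With <code>DIRECTION_DEFAULT_RIGHT_TO_LEFT</code>, <code>abc</code> still gets base level 0 because its first strong character is L. With <code>DIRECTION_DEFAULT_LEFT_TO_RIGHT</code>, text starting with a Hebrew letter gets base level 1.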
5156 public static final int DIRECTION_DEFAULT_RIGHT_TO_LEFT = LEVEL_DEFAULT_RTL;
5159 * Create Bidi from the given paragraph of text and base direction.
5161 * @param paragraph a paragraph of text
5162 * @param flags a collection of flags that control the algorithm. The
5163 * algorithm understands the flags DIRECTION_LEFT_TO_RIGHT,
5164 * DIRECTION_RIGHT_TO_LEFT, DIRECTION_DEFAULT_LEFT_TO_RIGHT, and
5165 * DIRECTION_DEFAULT_RIGHT_TO_LEFT. Other values are reserved.
5166 * @see #DIRECTION_LEFT_TO_RIGHT
5167 * @see #DIRECTION_RIGHT_TO_LEFT
5168 * @see #DIRECTION_DEFAULT_LEFT_TO_RIGHT
5169 * @see #DIRECTION_DEFAULT_RIGHT_TO_LEFT
5172 public Bidi(String paragraph, int flags)
5174 this(paragraph.toCharArray(), 0, null, 0, paragraph.length(), flags);
5178 * Create Bidi from the given paragraph of text.<p>
5180 * The RUN_DIRECTION attribute in the text, if present, determines the base
5181 * direction (left-to-right or right-to-left). If not present, the base
5182 * direction is computed using the Unicode Bidirectional Algorithm,
5183 * defaulting to left-to-right if there are no strong directional characters
5184 * in the text. This attribute, if present, must be applied to all the text
5185 * in the paragraph.<p>
5187 * The BIDI_EMBEDDING attribute in the text, if present, represents
5188 * embedding level information. Negative values from -1 to -62 indicate
5189 * overrides at the absolute value of the level. Positive values from 1 to
5190 * 62 indicate embeddings. Where values are zero or not defined, the base
5191 * embedding level as determined by the base direction is assumed.<p>
5193 * The NUMERIC_SHAPING attribute in the text, if present, converts European
5194 * digits to other decimal digits before running the bidi algorithm. This
5195 * attribute, if present, must be applied to all the text in the paragraph.<p>
5197 * Note: this constructor calls setPara() internally.
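A sketch of supplying <code>RUN_DIRECTION</code> through an <code>AttributedString</code>. It is shown with the JDK's <code>java.text.Bidi</code>, which accepts the same attributes, and it does not exercise <code>BIDI_EMBEDDING</code> or <code>NUMERIC_SHAPING</code>. <code>AttributedBidiSketch</code> is a hypothetical name.

```java
import java.awt.font.TextAttribute;
import java.text.AttributedString;
import java.text.Bidi;

final class AttributedBidiSketch {
    // runDirection is TextAttribute.RUN_DIRECTION_LTR, RUN_DIRECTION_RTL,
    // or null to leave the attribute off and let the algorithm decide.
    static Bidi analyze(String text, Boolean runDirection) {
        AttributedString as = new AttributedString(text);
        if (runDirection != null && !text.isEmpty()) {
            // RUN_DIRECTION is applied to the whole paragraph, as required above.
            as.addAttribute(TextAttribute.RUN_DIRECTION, runDirection);
        }
        return new Bidi(as.getIterator());
    }
}
```

Passing <code>null</code> leaves the attribute off, so the base direction comes from the first strong character.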
5199 * @param paragraph a paragraph of text with optional character and
5200 * paragraph attribute information
5203 public Bidi(AttributedCharacterIterator paragraph)
5210 * Create Bidi from the given text, embedding, and direction information.
5211 * The embeddings array may be null. If present, the values represent
5212 * embedding level information. Negative values from -1 to -61 indicate
5213 * overrides at the absolute value of the level. Positive values from 1 to
5214 * 61 indicate embeddings. Where values are zero, the base embedding level
5215 * as determined by the base direction is assumed.<p>
5217 * Note: this constructor calls setPara() internally.
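The constructor body translates these embedding values into internal levels and flags overrides. The following is a standalone sketch of that translation. It assumes the override flag is the high bit <code>0x80</code>, taken here as the value of this class's <code>LEVEL_OVERRIDE</code>. It also simplifies the zero case to substituting the base level. <code>EmbeddingSketch</code> is a hypothetical name.

```java
final class EmbeddingSketch {
    // Assumed override flag bit (taken to match LEVEL_OVERRIDE).
    static final int OVERRIDE_FLAG = 0x80;

    // Negative value: override at its absolute level; positive: embedding;
    // zero: simplified here to the base level (0 or 1).
    static byte[] toLevels(byte[] embeddings, int baseLevel) {
        byte[] levels = new byte[embeddings.length];
        for (int i = 0; i < embeddings.length; i++) {
            int v = embeddings[i];
            if (v < 0) {
                levels[i] = (byte) (-v | OVERRIDE_FLAG);
            } else if (v == 0) {
                levels[i] = (byte) baseLevel;
            } else {
                levels[i] = (byte) v;
            }
        }
        return levels;
    }
}
```

So <code>{-3, 0, 2}</code> with base level 1 becomes <code>{0x83, 1, 2}</code>: an override at level 3, the base level, and an embedding at level 2.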
5219 * @param text an array containing the paragraph of text to process.
5220 * @param textStart the index into the text array of the start of the
5222 * @param embeddings an array containing embedding values for each character
5223 * in the paragraph. This can be null, in which case it is assumed
5224 * that there is no external embedding information.
5225 * @param embStart the index into the embedding array of the start of the
5227 * @param paragraphLength the length of the paragraph in the text and
5228 * embeddings arrays.
5229 * @param flags a collection of flags that control the algorithm. The
5230 * algorithm understands the flags DIRECTION_LEFT_TO_RIGHT,
5231 * DIRECTION_RIGHT_TO_LEFT, DIRECTION_DEFAULT_LEFT_TO_RIGHT, and
5232 * DIRECTION_DEFAULT_RIGHT_TO_LEFT. Other values are reserved.
5234 * @throws IllegalArgumentException if the values in embeddings are
5235 * not within the allowed range
5237 * @see #DIRECTION_LEFT_TO_RIGHT
5238 * @see #DIRECTION_RIGHT_TO_LEFT
5239 * @see #DIRECTION_DEFAULT_LEFT_TO_RIGHT
5240 * @see #DIRECTION_DEFAULT_RIGHT_TO_LEFT
5243 public Bidi(char[] text,
5247 int paragraphLength,
5253 case DIRECTION_LEFT_TO_RIGHT:
5257 case DIRECTION_RIGHT_TO_LEFT:
5260 case DIRECTION_DEFAULT_LEFT_TO_RIGHT:
5261 paraLvl = LEVEL_DEFAULT_LTR;
5263 case DIRECTION_DEFAULT_RIGHT_TO_LEFT:
5264 paraLvl = LEVEL_DEFAULT_RTL;
5267 byte[] paraEmbeddings;
5268 if (embeddings == null) {
5269 paraEmbeddings = null;
5271 paraEmbeddings = new byte[paragraphLength];
5273 for (int i = 0; i < paragraphLength; i++) {
5274 lev = embeddings[i + embStart];
5276 lev = (byte)((- lev) | LEVEL_OVERRIDE);
5277 } else if (lev == 0) {
5279 if (paraLvl > MAX_EXPLICIT_LEVEL) {
5283 paraEmbeddings[i] = lev;
5286 if (textStart == 0 && embStart == 0 && paragraphLength == text.length) {
5287 setPara(text, paraLvl, paraEmbeddings);
5289 char[] paraText = new char[paragraphLength];
5290 System.arraycopy(text, textStart, paraText, 0, paragraphLength);
5291 setPara(paraText, paraLvl, paraEmbeddings);
5296 * Create a Bidi object representing the bidi information on a line of text
5297 * within the paragraph represented by the current Bidi. This call is not
5298 * required if the entire paragraph fits on one line.
5300 * @param lineStart the offset from the start of the paragraph to the start
5302 * @param lineLimit the offset from the start of the paragraph to the limit
5305 * @throws IllegalStateException if this call is not preceded by a successful
5306 * call to <code>setPara</code>
5307 * @throws IllegalArgumentException if lineStart and lineLimit are not in the range
5308 * <code>0<=lineStart<lineLimit<=getProcessedLength()</code>,
5309 * or if the specified line crosses a paragraph boundary
5312 public Bidi createLineBidi(int lineStart, int lineLimit)
5314 return setLine(lineStart, lineLimit);
5318 * Return true if the line is not left-to-right or right-to-left. This means
5319 * it either has mixed runs of left-to-right and right-to-left text, or the
5320 * base direction differs from the direction of the only run of text.
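Exactly one of the three direction queries holds for any paragraph or line. The following sketch uses the JDK's <code>java.text.Bidi</code>, which has methods of the same names. <code>DirectionQuerySketch</code> is a hypothetical name.

```java
import java.text.Bidi;

final class DirectionQuerySketch {
    // Exactly one of the three answers applies to any paragraph or line.
    static String classify(String text, int flags) {
        Bidi bidi = new Bidi(text, flags);
        if (bidi.isMixed()) {
            return "MIXED";   // mixed runs, or a single run against the base direction
        }
        return bidi.isLeftToRight() ? "LTR" : "RTL";
    }
}
```

For instance, text that mixes Latin and Hebrew under a left-to-right base is reported as mixed.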
5322 * @return true if the line is not left-to-right or right-to-left.
5324 * @throws IllegalStateException if this call is not preceded by a successful
5325 * call to <code>setPara</code>
5328 public boolean isMixed()
5330 return (!isLeftToRight() && !isRightToLeft());
5334 * Return true if the line is all left-to-right text and the base direction
5337 * @return true if the line is all left-to-right text and the base direction
5340 * @throws IllegalStateException if this call is not preceded by a successful
5341 * call to <code>setPara</code>
5344 public boolean isLeftToRight()
5346 return (getDirection() == LTR && (paraLevel & 1) == 0);
5350 * Return true if the line is all right-to-left text, and the base direction
5353 * @return true if the line is all right-to-left text, and the base
5354 * direction is right-to-left
5356 * @throws IllegalStateException if this call is not preceded by a successful
5357 * call to <code>setPara</code>
5360 public boolean isRightToLeft()
5362 return (getDirection() == RTL && (paraLevel & 1) == 1);
5366 * Return true if the base direction is left-to-right
5368 * @return true if the base direction is left-to-right
5370 * @throws IllegalStateException if this call is not preceded by a successful
5371 * call to <code>setPara</code> or <code>setLine</code>
5375 public boolean baseIsLeftToRight()
5377 return (getParaLevel() == LTR);
5381 * Return the base level (0 if left-to-right, 1 if right-to-left).
5383 * @return the base level
5385 * @throws IllegalStateException if this call is not preceded by a successful
5386 * call to <code>setPara</code> or <code>setLine</code>
5390 public int getBaseLevel()
5392 return getParaLevel();
5396 * Return the number of level runs.
5398 * @return the number of level runs
5400 * @throws IllegalStateException if this call is not preceded by a successful
5401 * call to <code>setPara</code> or <code>setLine</code>
5405 public int getRunCount()
5411 * Compute the logical to visual run mapping
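This method packs each run's logical start into the high 32 bits of a <code>long</code> and the run's visual index into the low 32 bits. Sorting the keys then puts the runs in logical order, and the low bits give each run's visual index. The following is a standalone sketch of the same technique. It assumes non-negative starts and uses a hypothetical <code>visualRunStarts</code> array in place of the internal <code>runs</code>.

```java
import java.util.Arrays;

final class RunOrderSketch {
    // visualRunStarts[v] is the logical start of the v-th run in visual order.
    // Returns map[logicalRunIndex] == visualRunIndex.
    static int[] logicalToVisual(int[] visualRunStarts) {
        int count = visualRunStarts.length;
        long[] keys = new long[count];
        for (int i = 0; i < count; i++) {
            // High 32 bits: logical start (sort key); low 32 bits: visual index.
            keys[i] = ((long) visualRunStarts[i] << 32) | i;
        }
        Arrays.sort(keys);                            // now in logical order
        int[] map = new int[count];
        for (int i = 0; i < count; i++) {
            map[i] = (int) (keys[i] & 0xFFFFFFFFL);   // recover the visual index
        }
        return map;
    }
}
```

For runs displayed with logical starts <code>{6, 4, 0}</code>, as in a right-to-left line, the result is <code>{2, 1, 0}</code>.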
5413 void getLogicalToVisualRunsMap()
5415 if (isGoodLogicalToVisualRunsMap) {
5418 int count = countRuns();
5419 if ((logicalToVisualRunsMap == null) ||
5420 (logicalToVisualRunsMap.length < count)) {
5421 logicalToVisualRunsMap = new int[count];
5424 long[] keys = new long[count];
5425 for (i = 0; i < count; i++) {
5426 keys[i] = ((long)(runs[i].start)<<32) + i;
5429 for (i = 0; i < count; i++) {
5430 logicalToVisualRunsMap[i] = (int)(keys[i] & 0x00000000FFFFFFFFL);
5432 isGoodLogicalToVisualRunsMap = true;
5436 * Return the level of the nth logical run in this line.
5438 * @param run the index of the run, between 0 and <code>countRuns()-1</code>
5440 * @return the level of the run
5442 * @throws IllegalStateException if this call is not preceded by a successful
5443 * call to <code>setPara</code> or <code>setLine</code>
5444 * @throws IllegalArgumentException if <code>run</code> is not in
5445 * the range <code>0<=run<countRuns()</code>
5448 public int getRunLevel(int run)
5450 verifyValidParaOrLine();
5451 BidiLine.getRuns(this);
5452 verifyRange(run, 0, runCount);
5453 getLogicalToVisualRunsMap();
5454 return runs[logicalToVisualRunsMap[run]].level;
5458 * Return the index of the character at the start of the nth logical run in
5459 * this line, as an offset from the start of the line.
5461 * @param run the index of the run, between 0 and <code>countRuns()-1</code>
5463 * @return the start of the run
5465 * @throws IllegalStateException if this call is not preceded by a successful
5466 * call to <code>setPara</code> or <code>setLine</code>
5467 * @throws IllegalArgumentException if <code>run</code> is not in
5468 * the range <code>0<=run<countRuns()</code>
5471 public int getRunStart(int run)
5473 verifyValidParaOrLine();
5474 BidiLine.getRuns(this);
5475 verifyRange(run, 0, runCount);
5476 getLogicalToVisualRunsMap();
5477 return runs[logicalToVisualRunsMap[run]].start;
5481 * Return the index of the character past the end of the nth logical run in
5482 * this line, as an offset from the start of the line. For example, this
5483 * will return the length of the line for the last run on the line.
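The following sketch lists the logical runs of a paragraph with <code>getRunCount</code>, <code>getRunStart</code>, <code>getRunLimit</code>, and <code>getRunLevel</code>. It is shown with the JDK's <code>java.text.Bidi</code>, which has the same methods. <code>LogicalRunsSketch</code> is a hypothetical name.

```java
import java.text.Bidi;

final class LogicalRunsSketch {
    // Describes each logical run as "start-limit@level", separated by spaces.
    static String describeRuns(String text, int flags) {
        Bidi bidi = new Bidi(text, flags);
        StringBuilder sb = new StringBuilder();
        for (int run = 0; run < bidi.getRunCount(); run++) {
            if (run > 0) {
                sb.append(' ');
            }
            sb.append(bidi.getRunStart(run)).append('-')
              .append(bidi.getRunLimit(run)).append('@')
              .append(bidi.getRunLevel(run));
        }
        return sb.toString();
    }
}
```

Consider a 10-character paragraph under a left-to-right base: three Latin letters, a space, two Hebrew letters, a space, and three Latin letters. For that input, this yields <code>0-4@0 4-6@1 6-10@0</code>.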
5485 * @param run the index of the run, between 0 and <code>countRuns()-1</code>
5487 * @return the limit of the run
5489 * @throws IllegalStateException if this call is not preceded by a successful
5490 * call to <code>setPara</code> or <code>setLine</code>
5491 * @throws IllegalArgumentException if <code>run</code> is not in
5492 * the range <code>0<=run<countRuns()</code>
5495 public int getRunLimit(int run)
5497 verifyValidParaOrLine();
5498 BidiLine.getRuns(this);
5499 verifyRange(run, 0, runCount);
5500 getLogicalToVisualRunsMap();
5501 int idx = logicalToVisualRunsMap[run];
5502 int len = idx == 0 ? runs[idx].limit :
5503 runs[idx].limit - runs[idx-1].limit;
5504 return runs[idx].start + len;
5508 * Return true if the specified text requires bidi analysis. If this returns
5509 * false, the text will display left-to-right. Clients can then avoid
5510 * constructing a Bidi object. Text in the Arabic Presentation Forms area of
5511 * Unicode is presumed to already be shaped and ordered for display, and so
5512 * will not cause this method to return true.
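The following sketch writes the same bit-mask test against <code>java.lang.Character</code> alone. The loop in this method examines one <code>char</code> at a time, whereas this sketch walks code points. As a result, a supplementary right-to-left character such as U+10900 is classified as R. That difference is a deliberate variation. <code>RequiresBidiSketch</code> is a hypothetical name.

```java
final class RequiresBidiSketch {
    // Directionality classes that force bidi processing, mirroring the mask
    // used in this method: R, AL, RLE, RLO, and AN.
    private static final int RTL_MASK =
            1 << Character.DIRECTIONALITY_RIGHT_TO_LEFT
          | 1 << Character.DIRECTIONALITY_RIGHT_TO_LEFT_ARABIC
          | 1 << Character.DIRECTIONALITY_RIGHT_TO_LEFT_EMBEDDING
          | 1 << Character.DIRECTIONALITY_RIGHT_TO_LEFT_OVERRIDE
          | 1 << Character.DIRECTIONALITY_ARABIC_NUMBER;

    // Iterates by code point so supplementary characters are seen whole.
    static boolean requiresBidi(CharSequence text) {
        for (int i = 0; i < text.length(); ) {
            int cp = Character.codePointAt(text, i);
            int dir = Character.getDirectionality(cp);
            // DIRECTIONALITY_UNDEFINED (-1) is skipped rather than shifted.
            if (dir >= 0 && ((1 << dir) & RTL_MASK) != 0) {
                return true;
            }
            i += Character.charCount(cp);
        }
        return false;
    }
}
```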
5514 * @param text the text containing the characters to test
5515 * @param start the start of the range of characters to test
5516 * @param limit the limit of the range of characters to test
5518 * @return true if the range of characters requires bidi analysis
5522 public static boolean requiresBidi(char[] text,
5526 final int RTLMask = (1 << UCharacter.DIRECTIONALITY_RIGHT_TO_LEFT |
5527 1 << UCharacter.DIRECTIONALITY_RIGHT_TO_LEFT_ARABIC |
5528 1 << UCharacter.DIRECTIONALITY_RIGHT_TO_LEFT_EMBEDDING |
5529 1 << UCharacter.DIRECTIONALITY_RIGHT_TO_LEFT_OVERRIDE |
5530 1 << UCharacter.DIRECTIONALITY_ARABIC_NUMBER);
5532 for (int i = start; i < limit; ++i) {
5533 if (((1 << UCharacter.getDirection(text[i])) & RTLMask) != 0) {
5541 * Reorder the objects in the array into visual order based on their levels.
5542 * This is a utility method to use when you have a collection of objects
5543 * representing runs of text in logical order, each run containing text at a
5544 * single level. The elements at <code>index</code> from
5545 * <code>objectStart</code> up to <code>objectStart + count</code> in the
5546 * objects array will be reordered into visual order assuming
5547 * each run of text has the level indicated by the corresponding element in
5548 * the levels array (at <code>index - objectStart + levelStart</code>).
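The following sketch uses the JDK's <code>java.text.Bidi.reorderVisually</code>, which takes the same parameters. The strings are placeholders for runs. Uppercase marks the right-to-left ones only as a reading aid. <code>ReorderVisuallySketch</code> is a hypothetical name.

```java
import java.text.Bidi;

final class ReorderVisuallySketch {
    // Returns a visually ordered copy; the input array is left untouched.
    static String[] toVisual(String[] runs, byte[] levels) {
        String[] copy = runs.clone();
        // A String[] is an Object[], so it can be reordered in place.
        Bidi.reorderVisually(levels, 0, copy, 0, copy.length);
        return copy;
    }
}
```

With levels <code>{0, 1, 1, 0}</code>, the runs <code>a B C d</code> come out as <code>a C B d</code>.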
5550 * @param levels an array representing the bidi level of each object
5551 * @param levelStart the start position in the levels array
5552 * @param objects the array of objects to be reordered into visual order
5553 * @param objectStart the start position in the objects array
5554 * @param count the number of objects to reorder
5557 public static void reorderVisually(byte[] levels,
5563 byte[] reorderLevels = new byte[count];
5564 System.arraycopy(levels, levelStart, reorderLevels, 0, count);
5565 int[] indexMap = reorderVisual(reorderLevels);
5566 Object[] temp = new Object[count];
5567 System.arraycopy(objects, objectStart, temp, 0, count);
5568 for (int i = 0; i < count; ++i) {
5569 objects[objectStart + i] = temp[indexMap[i]];
5574 * Take a <code>Bidi</code> object containing the reordering
5575 * information for a piece of text (one or more paragraphs) set by
5576 * <code>setPara()</code> or for a line of text set by <code>setLine()</code>
5577 * and return a string containing the reordered text.
5579 * <p>The text may have been aliased (only a reference was stored
5580 * without copying the contents), thus it must not have been modified
5581 * since the <code>setPara()</code> call.</p>
5583 * This method preserves the integrity of characters with multiple
5584 * code units and (optionally) combining characters.
5585 * Characters in RTL runs can be replaced by mirror-image characters
5586 * in the returned string. Note that "real" mirroring has to be done in a
5587 * rendering engine by glyph selection and that for many "mirrored"
5588 * characters there are no Unicode characters as mirror-image equivalents.
5589 * There are also options to insert or remove Bidi control
5590 * characters; see the descriptions of the return value and the
5591 * <code>options</code> parameter, and of the option bit flags.
5593 * @param options A bit set of options for the reordering that control
5594 * how the reordered text is written.
5595 * The options include mirroring the characters on a code
5596 * point basis and inserting LRM characters, which is used
5597 * especially for transforming visually stored text
5598 * to logically stored text (although this is still an
5599 * imperfect implementation of an "inverse Bidi" algorithm
5600 * because it uses the "forward Bidi" algorithm at its core).
5601 * The available options are:
5602 * <code>DO_MIRRORING</code>,
5603 * <code>INSERT_LRM_FOR_NUMERIC</code>,
5604 * <code>KEEP_BASE_COMBINING</code>,
5605 * <code>OUTPUT_REVERSE</code>,
5606 * <code>REMOVE_BIDI_CONTROLS</code>,
5607 * <code>OPTION_STREAMING</code>
5609 * @return The reordered text.
5610 * If the <code>INSERT_LRM_FOR_NUMERIC</code> option is set, then
5611 * the length of the returned string could be as large as
5612 * <code>getLength()+2*countRuns()</code>.<br>
5613 * If the <code>REMOVE_BIDI_CONTROLS</code> option is set, then the
5614 * length of the returned string may be less than
5615 * <code>getLength()</code>.<br>
5616 * If none of these options is set, then the length of the returned
5617 * string will be exactly <code>getProcessedLength()</code>.
5619 * @throws IllegalStateException if this call is not preceded by a successful
5620 * call to <code>setPara</code> or <code>setLine</code>
5622 * @see #DO_MIRRORING
5623 * @see #INSERT_LRM_FOR_NUMERIC
5624 * @see #KEEP_BASE_COMBINING
5625 * @see #OUTPUT_REVERSE
5626 * @see #REMOVE_BIDI_CONTROLS
5627 * @see #OPTION_STREAMING
5628 * @see #getProcessedLength
5631 public String writeReordered(int options)
5632 {
5633 verifyValidParaOrLine();
5638 return BidiWriter.writeReordered(this, options);
5639 }
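For the simplest case, with no option bits set, the effect of `writeReordered()` on a single line can be approximated with the JDK's `java.text.Bidi` instead of ICU. The approach has two steps:

- Reverse each odd-level (RTL) run code point by code point, so that surrogate pairs are never split.
- Arrange the runs in visual order with the static `Bidi.reorderVisually()`.

This is a hedged sketch, not `BidiWriter`'s implementation. `WriteReorderedSketch`, `reorderLine()` and `reverseByCodePoints()` are hypothetical names. None of the options are modeled: mirroring, combining-mark handling, and inserting or removing Bidi controls are all omitted. `reverseByCodePoints()` corresponds roughly to `writeReverse()`, documented below, called with `options` set to 0.

```java
import java.text.Bidi;

class WriteReorderedSketch {

    // Reverses text by code points, so surrogate pairs stay intact.
    static String reverseByCodePoints(String run) {
        StringBuilder out = new StringBuilder(run.length());
        int i = run.length();
        while (i > 0) {
            int cp = run.codePointBefore(i);
            out.appendCodePoint(cp);
            i -= Character.charCount(cp);
        }
        return out.toString();
    }

    // Hypothetical approximation of writeReordered(0) for one line of text.
    static String reorderLine(String text, int paragraphFlags) {
        Bidi bidi = new Bidi(text, paragraphFlags);
        int runCount = bidi.getRunCount();
        byte[] levels = new byte[runCount];
        Object[] runs = new Object[runCount];
        for (int r = 0; r < runCount; r++) {
            int level = bidi.getRunLevel(r);
            String run = text.substring(bidi.getRunStart(r), bidi.getRunLimit(r));
            levels[r] = (byte) level;
            // Odd levels are RTL: reverse the run's contents.
            runs[r] = (level & 1) != 0 ? reverseByCodePoints(run) : run;
        }
        // Rule L2 at run granularity puts the runs themselves in visual order.
        Bidi.reorderVisually(levels, 0, runs, 0, runCount);
        StringBuilder visual = new StringBuilder(text.length());
        for (Object run : runs) {
            visual.append((String) run);
        }
        return visual.toString();
    }

    public static void main(String[] args) {
        // "abc " followed by Hebrew alef, bet, gimel in an LTR paragraph
        String logical = "abc \u05D0\u05D1\u05D2";
        String visual = reorderLine(logical, Bidi.DIRECTION_LEFT_TO_RIGHT);
        System.out.println(visual.equals("abc \u05D2\u05D1\u05D0")); // prints true
    }
}
```

Reversing runs individually and then reordering the runs by level is equivalent to applying rule L2 per character. A run at level k is reversed once per level from the lowest odd level up to k, so its contents end up reversed exactly when k is odd.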
5642 * Reverse a Right-To-Left run of Unicode text.
5644 * This method preserves the integrity of characters with multiple
5645 * code units and (optionally) combining characters.
5646 * Characters can be replaced by mirror-image characters
5647 * in the destination buffer. Note that "real" mirroring has
5648 * to be done in a rendering engine by glyph selection
5649 * and that for many "mirrored" characters there are no
5650 * Unicode characters as mirror-image equivalents.
5651 * There are also options to insert or remove Bidi control characters.
5654 * This method implements the reversal of RTL runs performed by
5655 * <code>writeReordered()</code>; see that method for detailed
5656 * descriptions of the parameters.
5657 * Since no Bidi controls are inserted here, the output string length
5658 * will never exceed <code>src.length()</code>.
5660 * @see #writeReordered
5662 * @param src The RTL run text.
5664 * @param options A bit set of options for the reordering that control
5665 * how the reordered text is written.
5666 * See the <code>options</code> parameter in <code>writeReordered()</code>.
5668 * @return The reordered text.
5669 * If the <code>REMOVE_BIDI_CONTROLS</code> option
5670 * is set, then the length of the returned string may be less than
5671 * <code>src.length()</code>. If this option is not set,
5672 * then the length of the returned string will be exactly
5673 * <code>src.length()</code>.
5675 * @throws IllegalArgumentException if <code>src</code> is null.
5678 public static String writeReverse(String src, int options)
5679 {
5680 /* error checking */
5681 if (src == null) {
5682 throw new IllegalArgumentException();
5683 }
5685 if (src.length() > 0) {
5686 return BidiWriter.writeReverse(src, options);