3 *******************************************************************************
4 * Copyright (C) 2001-2009, International Business Machines
5 * Corporation and others. All Rights Reserved.
6 *******************************************************************************
9 /* FOOD FOR THOUGHT: currently the reordering modes are a mixture of
10 * algorithm for direct BiDi, algorithm for inverse Bidi and the bizarre
11 * concept of RUNS_ONLY which is a double operation.
12 * It could be advantageous to divide this into 3 concepts:
13 * a) Operation: direct / inverse / RUNS_ONLY
14 * b) Direct algorithm: default / NUMBERS_SPECIAL / GROUP_NUMBERS_WITH_L
15 * c) Inverse algorithm: default / INVERSE_LIKE_DIRECT / NUMBERS_SPECIAL
16 * This would allow combinations not possible today like RUNS_ONLY with
18 * Also allow to set INSERT_MARKS for the direct step of RUNS_ONLY and
19 * REMOVE_CONTROLS for the inverse step.
20 * Not all combinations would be supported, and probably not all do make sense.
21 * This would need to document which ones are supported and what are the
22 * fallbacks for unsupported combinations.
25 //TODO: make sample program do something simple but real and complete
27 package com.ibm.icu.text;
29 //#if defined(FOUNDATION10)
31 import java.awt.font.TextAttribute;
32 import java.text.AttributedCharacterIterator;
34 //#if defined(FOUNDATION10) || defined(J2SE13)
36 import java.awt.font.NumericShaper;
38 import java.io.IOException;
39 import java.lang.reflect.Array;
40 import java.util.MissingResourceException;
41 import java.util.Arrays;
43 import com.ibm.icu.impl.UBiDiProps;
44 import com.ibm.icu.lang.UCharacter;
45 import com.ibm.icu.lang.UCharacterDirection;
49 * <h2>Bidi algorithm for ICU</h2>
51 * This is an implementation of the Unicode Bidirectional algorithm. The
52 * algorithm is defined in the <a
53 * href="http://www.unicode.org/unicode/reports/tr9/">Unicode Standard Annex #9</a>,
54 * version 13, also described in The Unicode Standard, Version 4.0 .
57 * Note: Libraries that perform a bidirectional algorithm and reorder strings
58 * accordingly are sometimes called "Storage Layout Engines". ICU's Bidi and
59 * shaping (ArabicShaping) classes can be used at the core of such "Storage
62 * <h3>General remarks about the API:</h3>
64 * The "limit" of a sequence of characters is the position just after
65 * their last character, i.e., one more than that position.
68 * Some of the API methods provide access to "runs". Such a
69 * "run" is defined as a sequence of characters that are at the same
70 * embedding level after performing the Bidi algorithm.
73 * <h3>Basic concept: paragraph</h3>
74 * A piece of text can be divided into several paragraphs by characters
75 * with the Bidi class <code>Block Separator</code>. For handling of
78 * <li>{@link #countParagraphs}
79 * <li>{@link #getParaLevel}
80 * <li>{@link #getParagraph}
81 * <li>{@link #getParagraphByIndex}
84 * <h3>Basic concept: text direction</h3>
85 * The direction of a piece of text may be:
92 * <h3>Basic concept: levels</h3>
94 * Levels in this API represent embedding levels according to the Unicode
95 * Bidirectional Algorithm.
96 * Their low-order bit (even/odd value) indicates the visual direction.<p>
98 * Levels can be abstract values when used for the
99 * <code>paraLevel</code> and <code>embeddingLevels</code>
100 * arguments of <code>setPara()</code>; there:
102 * <li>the high-order bit of an <code>embeddingLevels[]</code>
103 * value indicates whether the using application is
104 * specifying the level of a character to <i>override</i> whatever the
105 * Bidi implementation would resolve it to.</li>
106 * <li><code>paraLevel</code> can be set to the
107 * pseudo-level values <code>LEVEL_DEFAULT_LTR</code>
108 * and <code>LEVEL_DEFAULT_RTL</code>.</li>
111 * <p>The related constants are not real, valid level values.
112 * <code>DEFAULT_XXX</code> can be used to specify
113 * a default for the paragraph level for
114 * when the <code>setPara()</code> method
115 * shall determine it but there is no
116 * strongly typed character in the input.<p>
118 * Note that the value for <code>LEVEL_DEFAULT_LTR</code> is even
119 * and the one for <code>LEVEL_DEFAULT_RTL</code> is odd,
120 * just like with normal LTR and RTL level values -
121 * these special values are designed that way. Also, the implementation
122 * assumes that MAX_EXPLICIT_LEVEL is odd.
124 * <ul><b>See Also:</b>
125 * <li>{@link #LEVEL_DEFAULT_LTR}
126 * <li>{@link #LEVEL_DEFAULT_RTL}
127 * <li>{@link #LEVEL_OVERRIDE}
128 * <li>{@link #MAX_EXPLICIT_LEVEL}
129 * <li>{@link #setPara}
132 * <h3>Basic concept: Reordering Mode</h3>
133 * Reordering mode values indicate which variant of the Bidi algorithm to
136 * <ul><b>See Also:</b>
137 * <li>{@link #setReorderingMode}
138 * <li>{@link #REORDER_DEFAULT}
139 * <li>{@link #REORDER_NUMBERS_SPECIAL}
140 * <li>{@link #REORDER_GROUP_NUMBERS_WITH_R}
141 * <li>{@link #REORDER_RUNS_ONLY}
142 * <li>{@link #REORDER_INVERSE_NUMBERS_AS_L}
143 * <li>{@link #REORDER_INVERSE_LIKE_DIRECT}
144 * <li>{@link #REORDER_INVERSE_FOR_NUMBERS_SPECIAL}
147 * <h3>Basic concept: Reordering Options</h3>
148 * Reordering options can be applied during Bidi text transformations.
149 * <ul><b>See Also:</b>
150 * <li>{@link #setReorderingOptions}
151 * <li>{@link #OPTION_DEFAULT}
152 * <li>{@link #OPTION_INSERT_MARKS}
153 * <li>{@link #OPTION_REMOVE_CONTROLS}
154 * <li>{@link #OPTION_STREAMING}
158 * @author Simon Montagu, Matitiahu Allouche (ported from C code written by Markus W. Scherer)
162 * <h4> Sample code for the ICU Bidi API </h4>
164 * <h5>Rendering a paragraph with the ICU Bidi API</h5>
166 * This is (hypothetical) sample code that illustrates how the ICU Bidi API
167 * could be used to render a paragraph of text. Rendering code depends highly on
168 * the graphics system, therefore this sample code must make a lot of
169 * assumptions, which may or may not match any existing graphics system's
173 * The basic assumptions are:
176 * <li>Rendering is done from left to right on a horizontal line.</li>
177 * <li>A run of single-style, unidirectional text can be rendered at once.
179 * <li>Such a run of text is passed to the graphics system with characters
180 * (code units) in logical order.</li>
181 * <li>The line-breaking algorithm is very complicated and Locale-dependent -
182 * and therefore its implementation omitted from this sample code.</li>
187 * package com.ibm.icu.dev.test.bidi;
189 * import com.ibm.icu.text.Bidi;
190 * import com.ibm.icu.text.BidiRun;
192 * public class Sample {
194 * static final int styleNormal = 0;
195 * static final int styleSelected = 1;
196 * static final int styleBold = 2;
197 * static final int styleItalics = 4;
198 * static final int styleSuper=8;
199 * static final int styleSub = 16;
201 * static class StyleRun {
205 * public StyleRun(int limit, int style) {
206 * this.limit = limit;
207 * this.style = style;
211 * static class Bounds {
215 * public Bounds(int start, int limit) {
216 * this.start = start;
217 * this.limit = limit;
221 * static int getTextWidth(String text, int start, int limit,
222 * StyleRun[] styleRuns, int styleRunCount) {
223 * // simplistic way to compute the width
224 * return limit - start;
227 * // set limit and StyleRun limit for a line
228 * // from text[start] and from styleRuns[styleRunStart]
229 * // using Bidi.getLogicalRun(...)
230 * // returns line width
231 * static int getLineBreak(String text, Bounds line, Bidi para,
232 * StyleRun styleRuns[], Bounds styleRun) {
237 * // render runs on a line sequentially, always from left to right
239 * // prepare rendering a new line
240 * static void startLine(byte textDirection, int lineWidth) {
241 * System.out.println();
244 * // render a run of text and advance to the right by the run width
245 * // the text[start..limit-1] is always in logical order
246 * static void renderRun(String text, int start, int limit,
247 * byte textDirection, int style) {
250 * // We could compute a cross-product
251 * // from the style runs with the directional runs
252 * // and then reorder it.
253 * // Instead, here we iterate over each run type
254 * // and render the intersections -
255 * // with shortcuts in simple (and common) cases.
256 * // renderParagraph() is the main function.
258 * // render a directional run with
259 * // (possibly) multiple style runs intersecting with it
260 * static void renderDirectionalRun(String text, int start, int limit,
261 * byte direction, StyleRun styleRuns[],
262 * int styleRunCount) {
265 * // iterate over style runs
266 * if (direction == Bidi.LTR) {
268 * for (i = 0; i < styleRunCount; ++i) {
269 * styleLimit = styleRuns[i].limit;
270 * if (start < styleLimit) {
271 * if (styleLimit > limit) {
272 * styleLimit = limit;
274 * renderRun(text, start, styleLimit,
275 * direction, styleRuns[i].style);
276 * if (styleLimit == limit) {
279 * start = styleLimit;
285 * for (i = styleRunCount-1; i >= 0; --i) {
287 * styleStart = styleRuns[i-1].limit;
291 * if (limit >= styleStart) {
292 * if (styleStart < start) {
293 * styleStart = start;
295 * renderRun(text, styleStart, limit, direction,
296 * styleRuns[i].style);
297 * if (styleStart == start) {
300 * limit = styleStart;
306 * // the line object represents text[start..limit-1]
307 * static void renderLine(Bidi line, String text, int start, int limit,
308 * StyleRun styleRuns[], int styleRunCount) {
309 * byte direction = line.getDirection();
310 * if (direction != Bidi.MIXED) {
312 * if (styleRunCount <= 1) {
313 * renderRun(text, start, limit, direction, styleRuns[0].style);
315 * renderDirectionalRun(text, start, limit, direction,
316 * styleRuns, styleRunCount);
319 * // mixed-directional
324 * count = line.countRuns();
325 * } catch (IllegalStateException e) {
326 * e.printStackTrace();
329 * if (styleRunCount <= 1) {
330 * int style = styleRuns[0].style;
332 * // iterate over directional runs
333 * for (i = 0; i < count; ++i) {
334 * run = line.getVisualRun(i);
335 * renderRun(text, run.getStart(), run.getLimit(),
336 * run.getDirection(), style);
339 * // iterate over both directional and style runs
340 * for (i = 0; i < count; ++i) {
341 * run = line.getVisualRun(i);
342 * renderDirectionalRun(text, run.getStart(),
343 * run.getLimit(), run.getDirection(),
344 * styleRuns, styleRunCount);
350 * static void renderParagraph(String text, byte textDirection,
351 * StyleRun styleRuns[], int styleRunCount,
353 * int length = text.length();
354 * Bidi para = new Bidi();
357 * textDirection != 0 ? Bidi.LEVEL_DEFAULT_RTL
358 * : Bidi.LEVEL_DEFAULT_LTR,
360 * } catch (Exception e) {
361 * e.printStackTrace();
364 * byte paraLevel = (byte)(1 & para.getParaLevel());
365 * StyleRun styleRun = new StyleRun(length, styleNormal);
367 * if (styleRuns == null || styleRunCount <= 0) {
368 * styleRuns = new StyleRun[1];
370 * styleRuns[0] = styleRun;
372 * // assume styleRuns[styleRunCount-1].limit>=length
374 * int width = getTextWidth(text, 0, length, styleRuns, styleRunCount);
375 * if (width <= lineWidth) {
376 * // everything fits onto one line
378 * // prepare rendering a new line from either left or right
379 * startLine(paraLevel, width);
381 * renderLine(para, text, 0, length, styleRuns, styleRunCount);
383 * // we need to render several lines
384 * Bidi line = new Bidi(length, 0);
385 * int start = 0, limit;
386 * int styleRunStart = 0, styleRunLimit;
390 * styleRunLimit = styleRunCount;
391 * width = getLineBreak(text, new Bounds(start, limit),
393 * new Bounds(styleRunStart, styleRunLimit));
395 * line = para.setLine(start, limit);
396 * } catch (Exception e) {
397 * e.printStackTrace();
400 * // prepare rendering a new line
401 * // from either left or right
402 * startLine(paraLevel, width);
404 * if (styleRunStart > 0) {
405 * int newRunCount = styleRuns.length - styleRunStart;
406 * StyleRun[] newRuns = new StyleRun[newRunCount];
407 * System.arraycopy(styleRuns, styleRunStart, newRuns, 0,
409 * renderLine(line, text, start, limit, newRuns,
410 * styleRunLimit - styleRunStart);
412 * renderLine(line, text, start, limit, styleRuns,
413 * styleRunLimit - styleRunStart);
415 * if (limit == length) {
419 * styleRunStart = styleRunLimit - 1;
420 * if (start >= styleRuns[styleRunStart].limit) {
427 * public static void main(String[] args)
429 * renderParagraph("Some Latin text...", Bidi.LTR, null, 0, 80);
430 * renderParagraph("Some Hebrew text...", Bidi.RTL, null, 0, 60);
440 int pos; /* position in text */
441 int flag; /* flag for LRM/RLM, before/after */
447 Point[] points = new Point[0];
450 /** Paragraph level setting<p>
452 * Constant indicating that the base direction depends on the first strong
453 * directional character in the text according to the Unicode Bidirectional
454 * Algorithm. If no strong directional character is present,
455 * then set the paragraph level to 0 (left-to-right).<p>
457 * If this value is used in conjunction with reordering modes
458 * <code>REORDER_INVERSE_LIKE_DIRECT</code> or
459 * <code>REORDER_INVERSE_FOR_NUMBERS_SPECIAL</code>, the text to reorder
460 * is assumed to be visual LTR, and the text after reordering is required
461 * to be the corresponding logical string with appropriate contextual
462 * direction. The direction of the result string will be RTL if either
463 * the righmost or leftmost strong character of the source text is RTL
464 * or Arabic Letter, the direction will be LTR otherwise.<p>
466 * If reordering option <code>OPTION_INSERT_MARKS</code> is set, an RLM may
467 * be added at the beginning of the result string to ensure round trip
468 * (that the result string, when reordered back to visual, will produce
469 * the original source text).
470 * @see #REORDER_INVERSE_LIKE_DIRECT
471 * @see #REORDER_INVERSE_FOR_NUMBERS_SPECIAL
474 public static final byte LEVEL_DEFAULT_LTR = (byte)0x7e;
476 /** Paragraph level setting<p>
478 * Constant indicating that the base direction depends on the first strong
479 * directional character in the text according to the Unicode Bidirectional
480 * Algorithm. If no strong directional character is present,
481 * then set the paragraph level to 1 (right-to-left).<p>
483 * If this value is used in conjunction with reordering modes
484 * <code>REORDER_INVERSE_LIKE_DIRECT</code> or
485 * <code>REORDER_INVERSE_FOR_NUMBERS_SPECIAL</code>, the text to reorder
486 * is assumed to be visual LTR, and the text after reordering is required
487 * to be the corresponding logical string with appropriate contextual
488 * direction. The direction of the result string will be RTL if either
489 * the righmost or leftmost strong character of the source text is RTL
490 * or Arabic Letter, or if the text contains no strong character;
491 * the direction will be LTR otherwise.<p>
493 * If reordering option <code>OPTION_INSERT_MARKS</code> is set, an RLM may
494 * be added at the beginning of the result string to ensure round trip
495 * (that the result string, when reordered back to visual, will produce
496 * the original source text).
497 * @see #REORDER_INVERSE_LIKE_DIRECT
498 * @see #REORDER_INVERSE_FOR_NUMBERS_SPECIAL
501 public static final byte LEVEL_DEFAULT_RTL = (byte)0x7f;
504 * Maximum explicit embedding level.
505 * (The maximum resolved level can be up to <code>MAX_EXPLICIT_LEVEL+1</code>).
508 public static final byte MAX_EXPLICIT_LEVEL = 61;
511 * Bit flag for level input.
512 * Overrides directional properties.
515 public static final byte LEVEL_OVERRIDE = (byte)0x80;
518 * Special value which can be returned by the mapping methods when a
519 * logical index has no corresponding visual index or vice-versa. This may
520 * happen for the logical-to-visual mapping of a Bidi control when option
521 * <code>OPTION_REMOVE_CONTROLS</code> is
522 * specified. This can also happen for the visual-to-logical mapping of a
523 * Bidi mark (LRM or RLM) inserted by option
524 * <code>OPTION_INSERT_MARKS</code>.
525 * @see #getVisualIndex
527 * @see #getLogicalIndex
528 * @see #getLogicalMap
529 * @see #OPTION_INSERT_MARKS
530 * @see #OPTION_REMOVE_CONTROLS
533 public static final int MAP_NOWHERE = -1;
536 * All left-to-right text.
539 public static final byte LTR = 0;
542 * All right-to-left text.
545 public static final byte RTL = 1;
548 * Mixed-directional text.
551 public static final byte MIXED = 2;
554 * option bit for writeReordered():
555 * keep combining characters after their base characters in RTL runs
557 * @see #writeReordered
560 public static final short KEEP_BASE_COMBINING = 1;
563 * option bit for writeReordered():
564 * replace characters with the "mirrored" property in RTL runs
565 * by their mirror-image mappings
567 * @see #writeReordered
570 public static final short DO_MIRRORING = 2;
573 * option bit for writeReordered():
574 * surround the run with LRMs if necessary;
575 * this is part of the approximate "inverse Bidi" algorithm
577 * <p>This option does not imply corresponding adjustment of the index
581 * @see #writeReordered
584 public static final short INSERT_LRM_FOR_NUMERIC = 4;
587 * option bit for writeReordered():
588 * remove Bidi control characters
589 * (this does not affect INSERT_LRM_FOR_NUMERIC)
591 * <p>This option does not imply corresponding adjustment of the index
594 * @see #writeReordered
595 * @see #INSERT_LRM_FOR_NUMERIC
598 public static final short REMOVE_BIDI_CONTROLS = 8;
601 * option bit for writeReordered():
602 * write the output in reverse order
604 * <p>This has the same effect as calling <code>writeReordered()</code>
605 * first without this option, and then calling
606 * <code>writeReverse()</code> without mirroring.
607 * Doing this in the same step is faster and avoids a temporary buffer.
608 * An example for using this option is output to a character terminal that
609 * is designed for RTL scripts and stores text in reverse order.</p>
611 * @see #writeReordered
614 public static final short OUTPUT_REVERSE = 16;
616 /** Reordering mode: Regular Logical to Visual Bidi algorithm according to Unicode.
617 * @see #setReorderingMode
620 public static final short REORDER_DEFAULT = 0;
622 /** Reordering mode: Logical to Visual algorithm which handles numbers in
623 * a way which mimicks the behavior of Windows XP.
624 * @see #setReorderingMode
627 public static final short REORDER_NUMBERS_SPECIAL = 1;
629 /** Reordering mode: Logical to Visual algorithm grouping numbers with
630 * adjacent R characters (reversible algorithm).
631 * @see #setReorderingMode
634 public static final short REORDER_GROUP_NUMBERS_WITH_R = 2;
636 /** Reordering mode: Reorder runs only to transform a Logical LTR string
637 * to the logical RTL string with the same display, or vice-versa.<br>
638 * If this mode is set together with option
639 * <code>OPTION_INSERT_MARKS</code>, some Bidi controls in the source
640 * text may be removed and other controls may be added to produce the
641 * minimum combination which has the required display.
642 * @see #OPTION_INSERT_MARKS
643 * @see #setReorderingMode
646 public static final short REORDER_RUNS_ONLY = 3;
648 /** Reordering mode: Visual to Logical algorithm which handles numbers
649 * like L (same algorithm as selected by <code>setInverse(true)</code>.
651 * @see #setReorderingMode
654 public static final short REORDER_INVERSE_NUMBERS_AS_L = 4;
656 /** Reordering mode: Visual to Logical algorithm equivalent to the regular
657 * Logical to Visual algorithm.
658 * @see #setReorderingMode
661 public static final short REORDER_INVERSE_LIKE_DIRECT = 5;
663 /** Reordering mode: Inverse Bidi (Visual to Logical) algorithm for the
664 * <code>REORDER_NUMBERS_SPECIAL</code> Bidi algorithm.
665 * @see #setReorderingMode
668 public static final short REORDER_INVERSE_FOR_NUMBERS_SPECIAL = 6;
670 /* Number of values for reordering mode. */
671 static final short REORDER_COUNT = 7;
673 /* Reordering mode values must be ordered so that all the regular logical to
674 * visual modes come first, and all inverse Bidi modes come last.
676 static final short REORDER_LAST_LOGICAL_TO_VISUAL =
677 REORDER_NUMBERS_SPECIAL;
680 * Option value for <code>setReorderingOptions</code>:
681 * disable all the options which can be set with this method
682 * @see #setReorderingOptions
685 public static final int OPTION_DEFAULT = 0;
688 * Option bit for <code>setReorderingOptions</code>:
689 * insert Bidi marks (LRM or RLM) when needed to ensure correct result of
690 * a reordering to a Logical order
692 * <p>This option must be set or reset before calling
693 * <code>setPara</code>.</p>
695 * <p>This option is significant only with reordering modes which generate
696 * a result with Logical order, specifically.</p>
698 * <li><code>REORDER_RUNS_ONLY</code></li>
699 * <li><code>REORDER_INVERSE_NUMBERS_AS_L</code></li>
700 * <li><code>REORDER_INVERSE_LIKE_DIRECT</code></li>
701 * <li><code>REORDER_INVERSE_FOR_NUMBERS_SPECIAL</code></li>
704 * <p>If this option is set in conjunction with reordering mode
705 * <code>REORDER_INVERSE_NUMBERS_AS_L</code> or with calling
706 * <code>setInverse(true)</code>, it implies option
707 * <code>INSERT_LRM_FOR_NUMERIC</code> in calls to method
708 * <code>writeReordered()</code>.</p>
710 * <p>For other reordering modes, a minimum number of LRM or RLM characters
711 * will be added to the source text after reordering it so as to ensure
712 * round trip, i.e. when applying the inverse reordering mode on the
713 * resulting logical text with removal of Bidi marks
714 * (option <code>OPTION_REMOVE_CONTROLS</code> set before calling
715 * <code>setPara()</code> or option
716 * <code>REMOVE_BIDI_CONTROLS</code> in
717 * <code>writeReordered</code>), the result will be identical to the
718 * source text in the first transformation.
720 * <p>This option will be ignored if specified together with option
721 * <code>OPTION_REMOVE_CONTROLS</code>. It inhibits option
722 * <code>REMOVE_BIDI_CONTROLS</code> in calls to method
723 * <code>writeReordered()</code> and it implies option
724 * <code>INSERT_LRM_FOR_NUMERIC</code> in calls to method
725 * <code>writeReordered()</code> if the reordering mode is
726 * <code>REORDER_INVERSE_NUMBERS_AS_L</code>.</p>
728 * @see #setReorderingMode
729 * @see #setReorderingOptions
730 * @see #INSERT_LRM_FOR_NUMERIC
731 * @see #REMOVE_BIDI_CONTROLS
732 * @see #OPTION_REMOVE_CONTROLS
733 * @see #REORDER_RUNS_ONLY
734 * @see #REORDER_INVERSE_NUMBERS_AS_L
735 * @see #REORDER_INVERSE_LIKE_DIRECT
736 * @see #REORDER_INVERSE_FOR_NUMBERS_SPECIAL
739 public static final int OPTION_INSERT_MARKS = 1;
742 * Option bit for <code>setReorderingOptions</code>:
743 * remove Bidi control characters
745 * <p>This option must be set or reset before calling
746 * <code>setPara</code>.</p>
748 * <p>This option nullifies option
749 * <code>OPTION_INSERT_MARKS</code>. It inhibits option
750 * <code>INSERT_LRM_FOR_NUMERIC</code> in calls to method
751 * <code>writeReordered()</code> and it implies option
752 * <code>REMOVE_BIDI_CONTROLS</code> in calls to that method.</p>
754 * @see #setReorderingMode
755 * @see #setReorderingOptions
756 * @see #OPTION_INSERT_MARKS
757 * @see #INSERT_LRM_FOR_NUMERIC
758 * @see #REMOVE_BIDI_CONTROLS
761 public static final int OPTION_REMOVE_CONTROLS = 2;
764 * Option bit for <code>setReorderingOptions</code>:
765 * process the output as part of a stream to be continued
767 * <p>This option must be set or reset before calling
768 * <code>setPara</code>.</p>
770 * <p>This option specifies that the caller is interested in processing
771 * large text object in parts. The results of the successive calls are
772 * expected to be concatenated by the caller. Only the call for the last
773 * part will have this option bit off.</p>
775 * <p>When this option bit is on, <code>setPara()</code> may process
776 * less than the full source text in order to truncate the text at a
777 * meaningful boundary. The caller should call
778 * <code>getProcessedLength()</code> immediately after calling
779 * <code>setPara()</code> in order to determine how much of the source
780 * text has been processed. Source text beyond that length should be
781 * resubmitted in following calls to <code>setPara</code>. The
782 * processed length may be less than the length of the source text if a
783 * character preceding the last character of the source text constitutes a
784 * reasonable boundary (like a block separator) for text to be continued.<br>
785 * If the last character of the source text constitutes a reasonable
786 * boundary, the whole text will be processed at once.<br>
787 * If nowhere in the source text there exists
788 * such a reasonable boundary, the processed length will be zero.<br>
789 * The caller should check for such an occurrence and do one of the following:
790 * <ul><li>submit a larger amount of text with a better chance to include
791 * a reasonable boundary.</li>
792 * <li>resubmit the same text after turning off option
793 * <code>OPTION_STREAMING</code>.</li></ul>
794 * In all cases, this option should be turned off before processing the last
795 * part of the text.</p>
797 * <p>When the <code>OPTION_STREAMING</code> option is used, it is
798 * recommended to call <code>orderParagraphsLTR()</code> with argument
799 * <code>orderParagraphsLTR</code> set to <code>true</code> before calling
800 * <code>setPara()</code> so that later paragraphs may be concatenated to
801 * previous paragraphs on the right.
804 * @see #setReorderingMode
805 * @see #setReorderingOptions
806 * @see #getProcessedLength
807 * @see #orderParagraphsLTR
810 public static final int OPTION_STREAMING = 4;
813 * Comparing the description of the Bidi algorithm with this implementation
814 * is easier with the same names for the Bidi types in the code as there.
815 * See UCharacterDirection
817 static final byte L = UCharacterDirection.LEFT_TO_RIGHT;
818 static final byte R = UCharacterDirection.RIGHT_TO_LEFT;
819 static final byte EN = UCharacterDirection.EUROPEAN_NUMBER;
820 static final byte ES = UCharacterDirection.EUROPEAN_NUMBER_SEPARATOR;
821 static final byte ET = UCharacterDirection.EUROPEAN_NUMBER_TERMINATOR;
822 static final byte AN = UCharacterDirection.ARABIC_NUMBER;
823 static final byte CS = UCharacterDirection.COMMON_NUMBER_SEPARATOR;
824 static final byte B = UCharacterDirection.BLOCK_SEPARATOR;
825 static final byte S = UCharacterDirection.SEGMENT_SEPARATOR;
826 static final byte WS = UCharacterDirection.WHITE_SPACE_NEUTRAL;
827 static final byte ON = UCharacterDirection.OTHER_NEUTRAL;
828 static final byte LRE = UCharacterDirection.LEFT_TO_RIGHT_EMBEDDING;
829 static final byte LRO = UCharacterDirection.LEFT_TO_RIGHT_OVERRIDE;
830 static final byte AL = UCharacterDirection.RIGHT_TO_LEFT_ARABIC;
831 static final byte RLE = UCharacterDirection.RIGHT_TO_LEFT_EMBEDDING;
832 static final byte RLO = UCharacterDirection.RIGHT_TO_LEFT_OVERRIDE;
833 static final byte PDF = UCharacterDirection.POP_DIRECTIONAL_FORMAT;
834 static final byte NSM = UCharacterDirection.DIR_NON_SPACING_MARK;
835 static final byte BN = UCharacterDirection.BOUNDARY_NEUTRAL;
837 static final int MASK_R_AL = (1 << R | 1 << AL);
840 * Value returned by <code>BidiClassifier</code> when there is no need to
841 * override the standard Bidi class for a given code point.
842 * @see BidiClassifier
845 public static final int CLASS_DEFAULT = UCharacterDirection
846 .CHAR_DIRECTION_COUNT;
848 private static final char CR = '\r';
849 private static final char LF = '\n';
851 static final int LRM_BEFORE = 1;
852 static final int LRM_AFTER = 2;
853 static final int RLM_BEFORE = 4;
854 static final int RLM_AFTER = 8;
857 * reference to parent paragraph object (reference to self if this object is
858 * a paragraph object); set to null in a newly opened object; set to a
859 * real value after a successful execution of setPara or setLine
863 final UBiDiProps bdp;
865 /* character array representing the current text */
868 /* length of the current text */
871 /* if the option OPTION_STREAMING is set, this is the length of
872 * text actually processed by <code>setPara</code>, which may be shorter
873 * than the original length. Otherwise, it is identical to the original
878 /* if option OPTION_REMOVE_CONTROLS is set, and/or Bidi
879 * marks are allowed to be inserted in one of the reordering modes, the
880 * length of the result string may be different from the processed length.
884 /* indicators for whether memory may be allocated after construction */
885 boolean mayAllocateText;
886 boolean mayAllocateRuns;
888 /* arrays with one value per text-character */
889 byte[] dirPropsMemory = new byte[1];
890 byte[] levelsMemory = new byte[1];
894 /* are we performing an approximation of the "inverse Bidi" algorithm? */
897 /* are we using the basic algorithm or its variation? */
900 /* bitmask for reordering options */
901 int reorderingOptions;
903 /* must block separators receive level 0? */
904 boolean orderParagraphsLTR;
906 /* the paragraph level */
908 /* original paraLevel when contextual */
909 /* must be one of DEFAULT_xxx or 0 if not contextual */
910 byte defaultParaLevel;
912 /* the following is set in setPara, used in processPropertySeq */
914 ImpTabPair impTabPair; /* reference to levels state table pair */
915 /* the overall paragraph or line directionality*/
918 /* flags is a bit set for which directional properties are in the text */
921 /* lastArabicPos is index to the last AL in the text, -1 if none */
924 /* characters after trailingWSStart are WS and are */
925 /* implicitly at the paraLevel (rule (L1)) - levels may not reflect that */
928 /* fields for paragraph handling */
929 int paraCount; /* set in getDirProps() */
930 int[] parasMemory = new int[1];
931 int[] paras; /* limits of paragraphs, filled in
932 ResolveExplicitLevels() or CheckExplicitLevels() */
934 /* for single paragraph text, we only need a tiny array of paras (no allocation) */
935 int[] simpleParas = {0};
937 /* fields for line reordering */
938 int runCount; /* ==-1: runs not set up yet */
939 BidiRun[] runsMemory = new BidiRun[0];
942 /* for non-mixed text, we only need a tiny array of runs (no allocation) */
943 BidiRun[] simpleRuns = {new BidiRun()};
945 /* mapping of runs in logical order to visual order */
946 int[] logicalToVisualRunsMap;
947 /* flag to indicate that the map has been updated */
948 boolean isGoodLogicalToVisualRunsMap;
950 /* customized class provider */
951 BidiClassifier customClassifier = null;
953 /* for inverse Bidi with insertion of directional marks */
954 InsertPoints insertPoints = new InsertPoints();
956 /* for option OPTION_REMOVE_CONTROLS */
960 * Sometimes, bit values are more appropriate
961 * to deal with directionality properties.
962 * Abbreviations in these method names refer to names
963 * used in the Bidi algorithm.
965 static int DirPropFlag(byte dir) {
969 boolean testDirPropFlagAt(int flag, int index) {
970 return ((DirPropFlag((byte)(dirProps[index]&~CONTEXT_RTL)) & flag) != 0);
974 * The following bit is ORed to the property of characters in paragraphs
975 * with contextual RTL direction when paraLevel is contextual.
977 static final byte CONTEXT_RTL_SHIFT = 6;
978 static final byte CONTEXT_RTL = (byte)(1<<CONTEXT_RTL_SHIFT); // 0x40
979 static byte NoContextRTL(byte dir)
981 return (byte)(dir & ~CONTEXT_RTL);
985 * The following is a variant of DirProp.DirPropFlag() which ignores the
988 static int DirPropFlagNC(byte dir) {
989 return (1<<(dir & ~CONTEXT_RTL));
992 static final int DirPropFlagMultiRuns = DirPropFlag((byte)31);
994 /* to avoid some conditional statements, use tiny constant arrays */
995 static final int DirPropFlagLR[] = { DirPropFlag(L), DirPropFlag(R) };
996 static final int DirPropFlagE[] = { DirPropFlag(LRE), DirPropFlag(RLE) };
997 static final int DirPropFlagO[] = { DirPropFlag(LRO), DirPropFlag(RLO) };
999 static final int DirPropFlagLR(byte level) { return DirPropFlagLR[level & 1]; }
1000 static final int DirPropFlagE(byte level) { return DirPropFlagE[level & 1]; }
1001 static final int DirPropFlagO(byte level) { return DirPropFlagO[level & 1]; }
1004 * are there any characters that are LTR?
1006 static final int MASK_LTR =
1007 DirPropFlag(L)|DirPropFlag(EN)|DirPropFlag(AN)|DirPropFlag(LRE)|DirPropFlag(LRO);
1010 * are there any characters that are RTL?
1012 static final int MASK_RTL = DirPropFlag(R)|DirPropFlag(AL)|DirPropFlag(RLE)|DirPropFlag(RLO);
1014 /* explicit embedding codes */
1015 static final int MASK_LRX = DirPropFlag(LRE)|DirPropFlag(LRO);
1016 static final int MASK_RLX = DirPropFlag(RLE)|DirPropFlag(RLO);
1017 static final int MASK_OVERRIDE = DirPropFlag(LRO)|DirPropFlag(RLO);
1018 static final int MASK_EXPLICIT = MASK_LRX|MASK_RLX|DirPropFlag(PDF);
1019 static final int MASK_BN_EXPLICIT = DirPropFlag(BN)|MASK_EXPLICIT;
1021 /* paragraph and segment separators */
1022 static final int MASK_B_S = DirPropFlag(B)|DirPropFlag(S);
1024 /* all types that are counted as White Space or Neutral in some steps */
1025 static final int MASK_WS = MASK_B_S|DirPropFlag(WS)|MASK_BN_EXPLICIT;
1026 static final int MASK_N = DirPropFlag(ON)|MASK_WS;
1028 /* all types that are included in a sequence of
1029 * European Terminators for (W5) */
1030 static final int MASK_ET_NSM_BN = DirPropFlag(ET)|DirPropFlag(NSM)|MASK_BN_EXPLICIT;
1032 /* types that are neutrals or could becomes neutrals in (Wn) */
1033 static final int MASK_POSSIBLE_N = DirPropFlag(CS)|DirPropFlag(ES)|DirPropFlag(ET)|MASK_N;
1036 * These types may be changed to "e",
1037 * the embedding type (L or R) of the run,
1038 * in the Bidi algorithm (N2)
1040 static final int MASK_EMBEDDING = DirPropFlag(NSM)|MASK_POSSIBLE_N;
1043 * the dirProp's L and R are defined to 0 and 1 values in UCharacterDirection.java
1045 static byte GetLRFromLevel(byte level)
1047 return (byte)(level & 1);
1050 static boolean IsDefaultLevel(byte level)
1052 return ((level & LEVEL_DEFAULT_LTR) == LEVEL_DEFAULT_LTR);
1055 byte GetParaLevelAt(int index)
1057 return (defaultParaLevel != 0) ?
1058 (byte)(dirProps[index]>>CONTEXT_RTL_SHIFT) : paraLevel;
1061 static boolean IsBidiControlChar(int c)
1063 /* check for range 0x200c to 0x200f (ZWNJ, ZWJ, LRM, RLM) or
1064 0x202a to 0x202e (LRE, RLE, PDF, LRO, RLO) */
1065 return (((c & 0xfffffffc) == 0x200c) || ((c >= 0x202a) && (c <= 0x202e)));
1068 void verifyValidPara()
1070 if (!(this == this.paraBidi)) {
1071 throw new IllegalStateException();
1075 void verifyValidParaOrLine()
1077 Bidi para = this.paraBidi;
1083 if ((para == null) || (para != para.paraBidi)) {
1084 throw new IllegalStateException();
1088 void verifyRange(int index, int start, int limit)
1090 if (index < start || index >= limit) {
1091 throw new IllegalArgumentException("Value " + index +
1092 " is out of range " + start + " to " + limit);
1097 * Allocate a <code>Bidi</code> object.
1098 * Such an object is initially empty. It is assigned
1099 * the Bidi properties of a piece of text containing one or more paragraphs
1100 * by <code>setPara()</code>
1101 * or the Bidi properties of a line within a paragraph by
1102 * <code>setLine()</code>.<p>
1103 * This object can be reused.<p>
1104 * <code>setPara()</code> and <code>setLine()</code> will allocate
1105 * additional memory for internal structures as necessary.
1115 * Allocate a <code>Bidi</code> object with preallocated memory
1116 * for internal structures.
1117 * This method provides a <code>Bidi</code> object like the default constructor
1118 * but it also preallocates memory for internal structures
1119 * according to the sizings supplied by the caller.<p>
1120 * The preallocation can be limited to some of the internal memory
1121 * by setting some values to 0 here. That means that if, e.g.,
1122 * <code>maxRunCount</code> cannot be reasonably predetermined and should not
1123 * be set to <code>maxLength</code> (the only failproof value) to avoid
1124 * wasting memory, then <code>maxRunCount</code> could be set to 0 here
1125 * and the internal structures that are associated with it will be allocated
1126 * on demand, just like with the default constructor.
1128 * @param maxLength is the maximum text or line length that internal memory
1129 * will be preallocated for. An attempt to associate this object with a
1130 * longer text will fail, unless this value is 0, which leaves the allocation
1131 * up to the implementation.
1133 * @param maxRunCount is the maximum anticipated number of same-level runs
1134 * that internal memory will be preallocated for. An attempt to access
1135 * visual runs on an object that was not preallocated for as many runs
1136 * as the text was actually resolved to will fail,
1137 * unless this value is 0, which leaves the allocation up to the implementation.<br><br>
1138 * The number of runs depends on the actual text and maybe anywhere between
1139 * 1 and <code>maxLength</code>. It is typically small.
1141 * @throws IllegalArgumentException if maxLength or maxRunCount is less than 0
1144 public Bidi(int maxLength, int maxRunCount)
1146 /* check the argument values */
1147 if (maxLength < 0 || maxRunCount < 0) {
1148 throw new IllegalArgumentException();
1151 /* reset the object, all reference variables null, all flags false,
1153 In fact, we don't need to do anything, since class members are
1154 initialized as zero when an instance is created.
1157 mayAllocateText = false;
1158 mayAllocateRuns = false;
1159 orderParagraphsLTR = false;
1162 trailingWSStart = 0;
1165 defaultParaLevel = 0;
1168 /* get Bidi properties */
1170 bdp = UBiDiProps.getSingleton();
1172 catch (IOException e) {
1173 throw new MissingResourceException(e.getMessage(), "(BidiProps)", "");
1176 /* allocate memory for arrays as requested */
1177 if (maxLength > 0) {
1178 getInitialDirPropsMemory(maxLength);
1179 getInitialLevelsMemory(maxLength);
1181 mayAllocateText = true;
1184 if (maxRunCount > 0) {
1185 // if maxRunCount == 1, use simpleRuns[]
1186 if (maxRunCount > 1) {
1187 getInitialRunsMemory(maxRunCount);
1190 mayAllocateRuns = true;
1195 * We are allowed to allocate memory if object==null or
1196 * mayAllocate==true for each array that we need.
1198 * Assume sizeNeeded>0.
1199 * If object != null, then assume size > 0.
1201 private Object getMemory(String label, Object array, Class arrayClass,
1202 boolean mayAllocate, int sizeNeeded)
1204 int len = Array.getLength(array);
1206 /* we have at least enough memory and must not allocate */
1207 if (sizeNeeded == len) {
1211 /* we must not allocate */
1212 if (sizeNeeded <= len) {
1215 throw new OutOfMemoryError("Failed to allocate memory for "
1218 /* we may try to grow or shrink */
1219 /* FOOD FOR THOUGHT: when shrinking it should be possible to avoid
1220 the allocation altogether and rely on this.length */
1222 return Array.newInstance(arrayClass, sizeNeeded);
1223 } catch (Exception e) {
1224 throw new OutOfMemoryError("Failed to allocate memory for "
1229 /* helper methods for each allocated array */
1230 private void getDirPropsMemory(boolean mayAllocate, int len)
1232 Object array = getMemory("DirProps", dirPropsMemory, Byte.TYPE, mayAllocate, len);
1233 dirPropsMemory = (byte[]) array;
1236 void getDirPropsMemory(int len)
1238 getDirPropsMemory(mayAllocateText, len);
1241 private void getLevelsMemory(boolean mayAllocate, int len)
1243 Object array = getMemory("Levels", levelsMemory, Byte.TYPE, mayAllocate, len);
1244 levelsMemory = (byte[]) array;
1247 void getLevelsMemory(int len)
1249 getLevelsMemory(mayAllocateText, len);
1252 private void getRunsMemory(boolean mayAllocate, int len)
1254 Object array = getMemory("Runs", runsMemory, BidiRun.class, mayAllocate, len);
1255 runsMemory = (BidiRun[]) array;
1258 void getRunsMemory(int len)
1260 getRunsMemory(mayAllocateRuns, len);
1263 /* additional methods used by constructor - always allow allocation */
1264 private void getInitialDirPropsMemory(int len)
1266 getDirPropsMemory(true, len);
1269 private void getInitialLevelsMemory(int len)
1271 getLevelsMemory(true, len);
1274 private void getInitialParasMemory(int len)
1276 Object array = getMemory("Paras", parasMemory, Integer.TYPE, true, len);
1277 parasMemory = (int[]) array;
1280 private void getInitialRunsMemory(int len)
1282 getRunsMemory(true, len);
1286 * Modify the operation of the Bidi algorithm such that it
1287 * approximates an "inverse Bidi" algorithm. This method
1288 * must be called before <code>setPara()</code>.
1290 * <p>The normal operation of the Bidi algorithm as described
1291 * in the Unicode Technical Report is to take text stored in logical
1292 * (keyboard, typing) order and to determine the reordering of it for visual
1294 * Some legacy systems store text in visual order, and for operations
1295 * with standard, Unicode-based algorithms, the text needs to be transformed
1296 * to logical order. This is effectively the inverse algorithm of the
1297 * described Bidi algorithm. Note that there is no standard algorithm for
1298 * this "inverse Bidi" and that the current implementation provides only an
1299 * approximation of "inverse Bidi".</p>
1301 * <p>With <code>isInversed</code> set to <code>true</code>,
1302 * this method changes the behavior of some of the subsequent methods
1303 * in a way that they can be used for the inverse Bidi algorithm.
1304 * Specifically, runs of text with numeric characters will be treated in a
1305 * special way and may need to be surrounded with LRM characters when they are
1306 * written in reordered sequence.</p>
1308 * <p>Output runs should be retrieved using <code>getVisualRun()</code>.
1309 * Since the actual input for "inverse Bidi" is visually ordered text and
1310 * <code>getVisualRun()</code> gets the reordered runs, these are actually
1311 * the runs of the logically ordered output.</p>
1313 * <p>Calling this method with argument <code>isInverse</code> set to
1314 * <code>true</code> is equivalent to calling <code>setReorderingMode</code>
1315 * with argument <code>reorderingMode</code>
1316 * set to <code>REORDER_INVERSE_NUMBERS_AS_L</code>.<br>
1317 * Calling this method with argument <code>isInverse</code> set to
1318 * <code>false</code> is equivalent to calling <code>setReorderingMode</code>
1319 * with argument <code>reorderingMode</code>
1320 * set to <code>REORDER_DEFAULT</code>.
1322 * @param isInverse specifies "forward" or "inverse" Bidi operation.
1325 * @see #writeReordered
1326 * @see #setReorderingMode
1327 * @see #REORDER_INVERSE_NUMBERS_AS_L
1328 * @see #REORDER_DEFAULT
1331 public void setInverse(boolean isInverse) {
1332 this.isInverse = (isInverse);
1333 this.reorderingMode = isInverse ? REORDER_INVERSE_NUMBERS_AS_L
1338 * Is this <code>Bidi</code> object set to perform the inverse Bidi
1340 * <p>Note: calling this method after setting the reordering mode with
1341 * <code>setReorderingMode</code> will return <code>true</code> if the
1342 * reordering mode was set to
1343 * <code>REORDER_INVERSE_NUMBERS_AS_L<code>, <code>false</code>
1344 * for all other values.</p>
1346 * @return <code>true</code> if the <code>Bidi</code> object is set to
1347 * perform the inverse Bidi algorithm by handling numbers as L.
1350 * @see #setReorderingMode
1351 * @see #REORDER_INVERSE_NUMBERS_AS_L
1354 public boolean isInverse() {
1359 * Modify the operation of the Bidi algorithm such that it implements some
1360 * variant to the basic Bidi algorithm or approximates an "inverse Bidi"
1361 * algorithm, depending on different values of the "reordering mode".
1362 * This method must be called before <code>setPara()</code>, and stays in
1363 * effect until called again with a different argument.
1365 * <p>The normal operation of the Bidi algorithm as described in the Unicode
1366 * Standard Annex #9 is to take text stored in logical (keyboard, typing)
1367 * order and to determine how to reorder it for visual rendering.</p>
1369 * <p>With the reordering mode set to a value other than
1370 * <code>REORDER_DEFAULT</code>, this method changes the behavior of some of
1371 * the subsequent methods in a way such that they implement an inverse Bidi
1372 * algorithm or some other algorithm variants.</p>
1374 * <p>Some legacy systems store text in visual order, and for operations
1375 * with standard, Unicode-based algorithms, the text needs to be transformed
1376 * into logical order. This is effectively the inverse algorithm of the
1377 * described Bidi algorithm. Note that there is no standard algorithm for
1378 * this "inverse Bidi", so a number of variants are implemented here.</p>
1380 * <p>In other cases, it may be desirable to emulate some variant of the
1381 * Logical to Visual algorithm (e.g. one used in MS Windows), or perform a
1382 * Logical to Logical transformation.</p>
1385 * <li>When the Reordering Mode is set to
1386 * <code>REORDER_DEFAULT</code>,
1387 * the standard Bidi Logical to Visual algorithm is applied.</li>
1389 * <li>When the reordering mode is set to
1390 * <code>REORDER_NUMBERS_SPECIAL</code>,
1391 * the algorithm used to perform Bidi transformations when calling
1392 * <code>setPara</code> should approximate the algorithm used in Microsoft
1393 * Windows XP rather than strictly conform to the Unicode Bidi algorithm.
1395 * The differences between the basic algorithm and the algorithm addressed
1396 * by this option are as follows:
1398 * <li>Within text at an even embedding level, the sequence "123AB"
1399 * (where AB represent R or AL letters) is transformed to "123BA" by the
1400 * Unicode algorithm and to "BA123" by the Windows algorithm.</li>
1402 * <li>Arabic-Indic numbers (AN) are handled by the Windows algorithm just
1403 * like regular numbers (EN).</li>
1406 * <li>When the reordering mode is set to
1407 * <code>REORDER_GROUP_NUMBERS_WITH_R</code>,
1408 * numbers located between LTR text and RTL text are associated with the RTL
1409 * text. For instance, an LTR paragraph with content "abc 123 DEF" (where
1410 * upper case letters represent RTL characters) will be transformed to
1411 * "abc FED 123" (and not "abc 123 FED"), "DEF 123 abc" will be transformed
1412 * to "123 FED abc" and "123 FED abc" will be transformed to "DEF 123 abc".
1413 * This makes the algorithm reversible and makes it useful when round trip
1414 * (from visual to logical and back to visual) must be achieved without
1415 * adding LRM characters. However, this is a variation from the standard
1416 * Unicode Bidi algorithm.<br>
1417 * The source text should not contain Bidi control characters other than LRM
1420 * <li>When the reordering mode is set to
1421 * <code>REORDER_RUNS_ONLY</code>,
1422 * a "Logical to Logical" transformation must be performed:
1424 * <li>If the default text level of the source text (argument
1425 * <code>paraLevel</code> in <code>setPara</code>) is even, the source text
1426 * will be handled as LTR logical text and will be transformed to the RTL
1427 * logical text which has the same LTR visual display.</li>
1428 * <li>If the default level of the source text is odd, the source text
1429 * will be handled as RTL logical text and will be transformed to the
1430 * LTR logical text which has the same LTR visual display.</li>
1432 * This mode may be needed when logical text which is basically Arabic or
1433 * Hebrew, with possible included numbers or phrases in English, has to be
1434 * displayed as if it had an even embedding level (this can happen if the
1435 * displaying application treats all text as if it was basically LTR).
1437 * This mode may also be needed in the reverse case, when logical text which
1438 * is basically English, with possible included phrases in Arabic or Hebrew,
1439 * has to be displayed as if it had an odd embedding level.
1441 * Both cases could be handled by adding LRE or RLE at the head of the
1442 * text, if the display subsystem supports these formatting controls. If it
1443 * does not, the problem may be handled by transforming the source text in
1444 * this mode before displaying it, so that it will be displayed properly.
1446 * The source text should not contain Bidi control characters other than LRM
1449 * <li>When the reordering mode is set to
1450 * <code>REORDER_INVERSE_NUMBERS_AS_L</code>, an "inverse Bidi"
1451 * algorithm is applied.
1452 * Runs of text with numeric characters will be treated like LTR letters and
1453 * may need to be surrounded with LRM characters when they are written in
1454 * reordered sequence (the option <code>INSERT_LRM_FOR_NUMERIC</code> can
1455 * be used with method <code>writeReordered</code> to this end. This mode
1456 * is equivalent to calling <code>setInverse()</code> with
1457 * argument <code>isInverse</code> set to <code>true</code>.</li>
1459 * <li>When the reordering mode is set to
1460 * <code>REORDER_INVERSE_LIKE_DIRECT</code>, the "direct" Logical to
1461 * Visual Bidi algorithm is used as an approximation of an "inverse Bidi"
1462 * algorithm. This mode is similar to mode
1463 * <code>REORDER_INVERSE_NUMBERS_AS_L</code> but is closer to the
1464 * regular Bidi algorithm.
1466 * For example, an LTR paragraph with the content "FED 123 456 CBA" (where
1467 * upper case represents RTL characters) will be transformed to
1468 * "ABC 456 123 DEF", as opposed to "DEF 123 456 ABC"
1469 * with mode <code>REORDER_INVERSE_NUMBERS_AS_L</code>.<br>
1470 * When used in conjunction with option
1471 * <code>OPTION_INSERT_MARKS</code>, this mode generally
1472 * adds Bidi marks to the output significantly more sparingly than mode
1473 * <code>REORDER_INVERSE_NUMBERS_AS_L</code>.<br> with option
1474 * <code>INSERT_LRM_FOR_NUMERIC</code> in calls to
1475 * <code>writeReordered</code>.</li>
1477 * <li>When the reordering mode is set to
1478 * <code>REORDER_INVERSE_FOR_NUMBERS_SPECIAL</code>, the Logical to Visual
1479 * Bidi algorithm used in Windows XP is used as an approximation of an "inverse
1482 * For example, an LTR paragraph with the content "abc FED123" (where
1483 * upper case represents RTL characters) will be transformed to
1487 * <p>In all the reordering modes specifying an "inverse Bidi" algorithm
1488 * (i.e. those with a name starting with <code>REORDER_INVERSE</code>),
1489 * output runs should be retrieved using <code>getVisualRun()</code>, and
1490 * the output text with <code>writeReordered()</code>. The caller should
1491 * keep in mind that in "inverse Bidi" modes the input is actually visually
1492 * ordered text and reordered output returned by <code>getVisualRun()</code>
1493 * or <code>writeReordered()</code> are actually runs or character string
1494 * of logically ordered output.<br>
1495 * For all the "inverse Bidi" modes, the source text should not contain
1496 * Bidi control characters other than LRM or RLM.</p>
1498 * <p>Note that option <code>OUTPUT_REVERSE</code> of
1499 * <code>writeReordered</code> has no useful meaning and should not be used
1500 * in conjunction with any value of the reordering mode specifying "inverse
1501 * Bidi" or with value <code>REORDER_RUNS_ONLY</code>.
1503 * @param reorderingMode specifies the required variant of the Bidi
1508 * @see #writeReordered
1509 * @see #INSERT_LRM_FOR_NUMERIC
1510 * @see #OUTPUT_REVERSE
1511 * @see #REORDER_DEFAULT
1512 * @see #REORDER_NUMBERS_SPECIAL
1513 * @see #REORDER_GROUP_NUMBERS_WITH_R
1514 * @see #REORDER_RUNS_ONLY
1515 * @see #REORDER_INVERSE_NUMBERS_AS_L
1516 * @see #REORDER_INVERSE_LIKE_DIRECT
1517 * @see #REORDER_INVERSE_FOR_NUMBERS_SPECIAL
1520 public void setReorderingMode(int reorderingMode) {
1521 if ((reorderingMode < REORDER_DEFAULT) ||
1522 (reorderingMode >= REORDER_COUNT))
1523 return; /* don't accept a wrong value */
1524 this.reorderingMode = reorderingMode;
1526 reorderingMode == REORDER_INVERSE_NUMBERS_AS_L;
1530 * What is the requested reordering mode for a given Bidi object?
1532 * @return the current reordering mode of the Bidi object
1534 * @see #setReorderingMode
1537 public int getReorderingMode() {
1538 return this.reorderingMode;
1542 * Specify which of the reordering options should be applied during Bidi
1545 * @param options A combination of zero or more of the following
1546 * reordering options:
1547 * <code>OPTION_DEFAULT</code>, <code>OPTION_INSERT_MARKS</code>,
1548 * <code>OPTION_REMOVE_CONTROLS</code>, <code>OPTION_STREAMING</code>.
1550 * @see #getReorderingOptions
1551 * @see #OPTION_DEFAULT
1552 * @see #OPTION_INSERT_MARKS
1553 * @see #OPTION_REMOVE_CONTROLS
1554 * @see #OPTION_STREAMING
1557 public void setReorderingOptions(int options) {
1558 if ((options & OPTION_REMOVE_CONTROLS) != 0) {
1559 this.reorderingOptions = options & ~OPTION_INSERT_MARKS;
1561 this.reorderingOptions = options;
1566 * What are the reordering options applied to a given Bidi object?
1568 * @return the current reordering options of the Bidi object
1570 * @see #setReorderingOptions
1573 public int getReorderingOptions() {
1574 return this.reorderingOptions;
1577 /* perform (P2)..(P3) ------------------------------------------------------- */
1579 private void getDirProps()
1582 flags = 0; /* collect all directionalities in the text */
1585 byte paraDirDefault = 0; /* initialize to avoid compiler warnings */
1586 boolean isDefaultLevel = IsDefaultLevel(paraLevel);
1587 /* for inverse Bidi, the default para level is set to RTL if there is a
1588 strong R or AL character at either end of the text */
1589 boolean isDefaultLevelInverse=isDefaultLevel &&
1590 (reorderingMode==REORDER_INVERSE_LIKE_DIRECT ||
1591 reorderingMode==REORDER_INVERSE_FOR_NUMBERS_SPECIAL);
1594 boolean removeBidiControls = (reorderingOptions & OPTION_REMOVE_CONTROLS) != 0;
1596 final int NOT_CONTEXTUAL = 0; /* 0: not contextual paraLevel */
1597 final int LOOKING_FOR_STRONG = 1; /* 1: looking for first strong char */
1598 final int FOUND_STRONG_CHAR = 2; /* 2: found first strong char */
1601 int paraStart = 0; /* index of first char in paragraph */
1602 byte paraDir; /* == CONTEXT_RTL within paragraphs
1603 starting with strong R char */
1604 byte lastStrongDir=0; /* for default level & inverse Bidi */
1605 int lastStrongLTR=0; /* for STREAMING option */
1607 if ((reorderingOptions & OPTION_STREAMING) > 0) {
1611 if (isDefaultLevel) {
1612 paraDirDefault = ((paraLevel & 1) != 0) ? CONTEXT_RTL : 0;
1613 paraDir = paraDirDefault;
1614 lastStrongDir = paraDirDefault;
1615 state = LOOKING_FOR_STRONG;
1617 state = NOT_CONTEXTUAL;
1620 /* count paragraphs and determine the paragraph level (P2..P3) */
1622 * see comment on constant fields:
1623 * the LEVEL_DEFAULT_XXX values are designed so that
1624 * their low-order bit alone yields the intended default
1627 for (i = 0; i < originalLength; /* i is incremented in the loop */) {
1628 i0 = i; /* index of first code unit */
1629 uchar = UTF16.charAt(text, 0, originalLength, i);
1630 i += UTF16.getCharCount(uchar);
1631 i1 = i - 1; /* index of last code unit, gets the directional property */
1633 dirProp = (byte)getCustomizedClass(uchar);
1634 flags |= DirPropFlag(dirProp);
1635 dirProps[i1] = (byte)(dirProp | paraDir);
1636 if (i1 > i0) { /* set previous code units' properties to BN */
1637 flags |= DirPropFlag(BN);
1639 dirProps[--i1] = (byte)(BN | paraDir);
1642 if (state == LOOKING_FOR_STRONG) {
1644 state = FOUND_STRONG_CHAR;
1647 for (i1 = paraStart; i1 < i; i1++) {
1648 dirProps[i1] &= ~CONTEXT_RTL;
1653 if (dirProp == R || dirProp == AL) {
1654 state = FOUND_STRONG_CHAR;
1656 paraDir = CONTEXT_RTL;
1657 for (i1 = paraStart; i1 < i; i1++) {
1658 dirProps[i1] |= CONTEXT_RTL;
1666 lastStrongLTR = i; /* i is index to next character */
1668 else if (dirProp == R) {
1669 lastStrongDir = CONTEXT_RTL;
1671 else if (dirProp == AL) {
1672 lastStrongDir = CONTEXT_RTL;
1673 lastArabicPos = i-1;
1675 else if (dirProp == B) {
1676 if ((reorderingOptions & OPTION_STREAMING) != 0) {
1677 this.length = i; /* i is index to next character */
1679 if (isDefaultLevelInverse && (lastStrongDir==CONTEXT_RTL) &&(paraDir!=lastStrongDir)) {
1680 for ( ; paraStart < i; paraStart++) {
1681 dirProps[paraStart] |= CONTEXT_RTL;
1684 if (i < originalLength) { /* B not last char in text */
1685 if (!((uchar == (int)CR) && (text[i] == (int)LF))) {
1688 if (isDefaultLevel) {
1689 state=LOOKING_FOR_STRONG;
1690 paraStart = i; /* i is index to next character */
1691 paraDir = paraDirDefault;
1692 lastStrongDir = paraDirDefault;
1696 if (removeBidiControls && IsBidiControlChar(uchar)) {
1700 if (isDefaultLevelInverse && (lastStrongDir==CONTEXT_RTL) &&(paraDir!=lastStrongDir)) {
1701 for (i1 = paraStart; i1 < originalLength; i1++) {
1702 dirProps[i1] |= CONTEXT_RTL;
1705 if (isDefaultLevel) {
1706 paraLevel = GetParaLevelAt(0);
1708 if ((reorderingOptions & OPTION_STREAMING) > 0) {
1709 if ((lastStrongLTR > this.length) &&
1710 (GetParaLevelAt(lastStrongLTR) == 0)) {
1711 this.length = lastStrongLTR;
1713 if (this.length < originalLength) {
1717 /* The following line does nothing new for contextual paraLevel, but is
1718 needed for absolute paraLevel. */
1719 flags |= DirPropFlagLR(paraLevel);
1721 if (orderParagraphsLTR && (flags & DirPropFlag(B)) != 0) {
1722 flags |= DirPropFlag(L);
1726 /* perform (X1)..(X9) ------------------------------------------------------- */
1728 /* determine if the text is mixed-directional or single-directional */
1729 private byte directionFromFlags() {
1730 /* if the text contains AN and neutrals, then some neutrals may become RTL */
1731 if (!((flags & MASK_RTL) != 0 ||
1732 ((flags & DirPropFlag(AN)) != 0 &&
1733 (flags & MASK_POSSIBLE_N) != 0))) {
1735 } else if ((flags & MASK_LTR) == 0) {
1743 * Resolve the explicit levels as specified by explicit embedding codes.
1744 * Recalculate the flags to have them reflect the real properties
1745 * after taking the explicit embeddings into account.
1747 * The Bidi algorithm is designed to result in the same behavior whether embedding
1748 * levels are externally specified (from "styled text", supposedly the preferred
1749 * method) or set by explicit embedding codes (LRx, RLx, PDF) in the plain text.
1750 * That is why (X9) instructs to remove all explicit codes (and BN).
1751 * However, in a real implementation, this removal of these codes and their index
1752 * positions in the plain text is undesirable since it would result in
1753 * reallocated, reindexed text.
1754 * Instead, this implementation leaves the codes in there and just ignores them
1755 * in the subsequent processing.
1756 * In order to get the same reordering behavior, positions with a BN or an
1757 * explicit embedding code just get the same level assigned as the last "real"
1760 * Some implementations, not this one, then overwrite some of these
1761 * directionality properties at "real" same-level-run boundaries by
1762 * L or R codes so that the resolution of weak types can be performed on the
1763 * entire paragraph at once instead of having to parse it once more and
1764 * perform that resolution on same-level-runs.
1765 * This limits the scope of the implicit rules in effectively
1766 * the same way as the run limits.
1768 * Instead, this implementation does not modify these codes.
1769 * On one hand, the paragraph has to be scanned for same-level-runs, but
1770 * on the other hand, this saves another loop to reset these codes,
1771 * or saves making and modifying a copy of dirProps[].
1774 * Note that (Pn) and (Xn) changed significantly from version 4 of the Bidi algorithm.
1777 * Handling the stack of explicit levels (Xn):
1779 * With the Bidi stack of explicit levels,
1780 * as pushed with each LRE, RLE, LRO, and RLO and popped with each PDF,
1781 * the explicit level must never exceed MAX_EXPLICIT_LEVEL==61.
1783 * In order to have a correct push-pop semantics even in the case of overflows,
1784 * there are two overflow counters:
1785 * - countOver60 is incremented with each LRx at level 60
1786 * - from level 60, one RLx increases the level to 61
1787 * - countOver61 is incremented with each LRx and RLx at level 61
1789 * Popping levels with PDF must work in the opposite order so that level 61
1790 * is correct at the correct point. Underflows (too many PDFs) must be checked.
1792 * This implementation assumes that MAX_EXPLICIT_LEVEL is odd.
1794 private byte resolveExplicitLevels() {
1797 byte level = GetParaLevelAt(0);
1802 /* determine if the text is mixed-directional or single-directional */
1803 dirct = directionFromFlags();
1805 /* we may not need to resolve any explicit levels, but for multiple
1806 paragraphs we want to loop on all chars to set the para boundaries */
1807 if ((dirct != MIXED) && (paraCount == 1)) {
1808 /* not mixed directionality: levels don't matter - trailingWSStart will be 0 */
1809 } else if ((paraCount == 1) &&
1810 ((flags & MASK_EXPLICIT) == 0 ||
1811 reorderingMode > REORDER_LAST_LOGICAL_TO_VISUAL)) {
1812 /* mixed, but all characters are at the same embedding level */
1813 /* or we are in "inverse Bidi" */
1814 /* and we don't have contextual multiple paragraphs with some B char */
1815 /* set all levels to the paragraph level */
1816 for (i = 0; i < length; ++i) {
1820 /* continue to perform (Xn) */
1822 /* (X1) level is set for all codes, embeddingLevel keeps track of the push/pop operations */
1823 /* both variables may carry the LEVEL_OVERRIDE flag to indicate the override status */
1824 byte embeddingLevel = level;
1828 byte[] stack = new byte[MAX_EXPLICIT_LEVEL]; /* we never push anything >=MAX_EXPLICIT_LEVEL */
1829 int countOver60 = 0;
1830 int countOver61 = 0; /* count overflows of explicit levels */
1832 /* recalculate the flags */
1835 for (i = 0; i < length; ++i) {
1836 dirProp = NoContextRTL(dirProps[i]);
1841 newLevel = (byte)((embeddingLevel+2) & ~(LEVEL_OVERRIDE | 1)); /* least greater even level */
1842 if (newLevel <= MAX_EXPLICIT_LEVEL) {
1843 stack[stackTop] = embeddingLevel;
1845 embeddingLevel = newLevel;
1846 if (dirProp == LRO) {
1847 embeddingLevel |= LEVEL_OVERRIDE;
1849 /* we don't need to set LEVEL_OVERRIDE off for LRE
1850 since this has already been done for newLevel which is
1851 the source for embeddingLevel.
1853 } else if ((embeddingLevel & ~LEVEL_OVERRIDE) == MAX_EXPLICIT_LEVEL) {
1855 } else /* (embeddingLevel & ~LEVEL_OVERRIDE) == MAX_EXPLICIT_LEVEL-1 */ {
1858 flags |= DirPropFlag(BN);
1863 newLevel=(byte)(((embeddingLevel & ~LEVEL_OVERRIDE) + 1) | 1); /* least greater odd level */
1864 if (newLevel<=MAX_EXPLICIT_LEVEL) {
1865 stack[stackTop] = embeddingLevel;
1867 embeddingLevel = newLevel;
1868 if (dirProp == RLO) {
1869 embeddingLevel |= LEVEL_OVERRIDE;
1871 /* we don't need to set LEVEL_OVERRIDE off for RLE
1872 since this has already been done for newLevel which is
1873 the source for embeddingLevel.
1878 flags |= DirPropFlag(BN);
1882 /* handle all the overflow cases first */
1883 if (countOver61 > 0) {
1885 } else if (countOver60 > 0 && (embeddingLevel & ~LEVEL_OVERRIDE) != MAX_EXPLICIT_LEVEL) {
1886 /* handle LRx overflows from level 60 */
1888 } else if (stackTop > 0) {
1889 /* this is the pop operation; it also pops level 61 while countOver60>0 */
1891 embeddingLevel = stack[stackTop];
1892 /* } else { (underflow) */
1894 flags |= DirPropFlag(BN);
1900 level = GetParaLevelAt(i);
1901 if ((i + 1) < length) {
1902 embeddingLevel = GetParaLevelAt(i+1);
1903 if (!((text[i] == CR) && (text[i + 1] == LF))) {
1904 paras[paraIndex++] = i+1;
1907 flags |= DirPropFlag(B);
1910 /* BN, LRE, RLE, and PDF are supposed to be removed (X9) */
1911 /* they will get their levels set correctly in adjustWSLevels() */
1912 flags |= DirPropFlag(BN);
1915 /* all other types get the "real" level */
1916 if (level != embeddingLevel) {
1917 level = embeddingLevel;
1918 if ((level & LEVEL_OVERRIDE) != 0) {
1919 flags |= DirPropFlagO(level) | DirPropFlagMultiRuns;
1921 flags |= DirPropFlagE(level) | DirPropFlagMultiRuns;
1924 if ((level & LEVEL_OVERRIDE) == 0) {
1925 flags |= DirPropFlag(dirProp);
1931 * We need to set reasonable levels even on BN codes and
1932 * explicit codes because we will later look at same-level runs (X10).
1936 if ((flags & MASK_EMBEDDING) != 0) {
1937 flags |= DirPropFlagLR(paraLevel);
1939 if (orderParagraphsLTR && (flags & DirPropFlag(B)) != 0) {
1940 flags |= DirPropFlag(L);
1943 /* subsequently, ignore the explicit codes and BN (X9) */
1945 /* again, determine if the text is mixed-directional or single-directional */
1946 dirct = directionFromFlags();
1953 * Use a pre-specified embedding levels array:
1955 * Adjust the directional properties for overrides (->LEVEL_OVERRIDE),
1956 * ignore all explicit codes (X9),
1957 * and check all the preset levels.
1959 * Recalculate the flags to have them reflect the real properties
1960 * after taking the explicit embeddings into account.
1962 private byte checkExplicitLevels() {
1965 this.flags = 0; /* collect all directionalities in the text */
1969 for (i = 0; i < length; ++i) {
1971 dirProp = NoContextRTL(dirProps[i]);
1972 if ((level & LEVEL_OVERRIDE) != 0) {
1973 /* keep the override flag in levels[i] but adjust the flags */
1974 level &= ~LEVEL_OVERRIDE; /* make the range check below simpler */
1975 flags |= DirPropFlagO(level);
1978 flags |= DirPropFlagE(level) | DirPropFlag(dirProp);
1980 if ((level < GetParaLevelAt(i) &&
1981 !((0 == level) && (dirProp == B))) ||
1982 (MAX_EXPLICIT_LEVEL <level)) {
1983 /* level out of bounds */
1984 throw new IllegalArgumentException("level " + level +
1985 " out of bounds at " + i);
1987 if ((dirProp == B) && ((i + 1) < length)) {
1988 if (!((text[i] == CR) && (text[i + 1] == LF))) {
1989 paras[paraIndex++] = i + 1;
1993 if ((flags&MASK_EMBEDDING) != 0) {
1994 flags |= DirPropFlagLR(paraLevel);
1997 /* determine if the text is mixed-directional or single-directional */
1998 return directionFromFlags();
2001 /*********************************************************************/
2002 /* The Properties state machine table */
2003 /*********************************************************************/
2005 /* All table cells are 8 bits: */
2006 /* bits 0..4: next state */
2007 /* bits 5..7: action to perform (if > 0) */
2009 /* Cells may be of format "n" where n represents the next state */
2010 /* (except for the rightmost column). */
2011 /* Cells may also be of format "_(x,y)" where x represents an action */
2012 /* to perform and y represents the next state. */
2014 /*********************************************************************/
2015 /* Definitions and type for properties state tables */
2016 /*********************************************************************/
2017 private static final int IMPTABPROPS_COLUMNS = 14;
2018 private static final int IMPTABPROPS_RES = IMPTABPROPS_COLUMNS - 1;
2019 private static short GetStateProps(short cell) {
2020 return (short)(cell & 0x1f);
2022 private static short GetActionProps(short cell) {
2023 return (short)(cell >> 5);
2026 private static final short groupProp[] = /* dirProp regrouped */
2028 /* L R EN ES ET AN CS B S WS ON LRE LRO AL RLE RLO PDF NSM BN */
2029 0, 1, 2, 7, 8, 3, 9, 6, 5, 4, 4, 10, 10, 12, 10, 10, 10, 11, 10
2031 private static final short _L = 0;
2032 private static final short _R = 1;
2033 private static final short _EN = 2;
2034 private static final short _AN = 3;
2035 private static final short _ON = 4;
2036 private static final short _S = 5;
2037 private static final short _B = 6; /* reduced dirProp */
2039 /*********************************************************************/
2041 /* PROPERTIES STATE TABLE */
2043 /* In table impTabProps, */
2044 /* - the ON column regroups ON and WS */
2045 /* - the BN column regroups BN, LRE, RLE, LRO, RLO, PDF */
2046 /* - the Res column is the reduced property assigned to a run */
2048 /* Action 1: process current run1, init new run1 */
2049 /* 2: init new run2 */
2050 /* 3: process run1, process run2, init new run1 */
2051 /* 4: process run1, set run1=run2, init new run2 */
2054 /* 1) This table is used in resolveImplicitLevels(). */
2055 /* 2) This table triggers actions when there is a change in the Bidi*/
2056 /* property of incoming characters (action 1). */
2057 /* 3) Most such property sequences are processed immediately (in */
2058 /* fact, passed to processPropertySeq(). */
2059 /* 4) However, numbers are assembled as one sequence. This means */
2060 /* that undefined situations (like CS following digits, until */
2061 /* it is known if the next char will be a digit) are held until */
2062 /* following chars define them. */
2063 /* Example: digits followed by CS, then comes another CS or ON; */
2064 /* the digits will be processed, then the CS assigned */
2065 /* as the start of an ON sequence (action 3). */
2066 /* 5) There are cases where more than one sequence must be */
2067 /* processed, for instance digits followed by CS followed by L: */
2068 /* the digits must be processed as one sequence, and the CS */
2069 /* must be processed as an ON sequence, all this before starting */
2070 /* assembling chars for the opening L sequence. */
2073 private static final short impTabProps[][] =
2075 /* L, R, EN, AN, ON, S, B, ES, ET, CS, BN, NSM, AL, Res */
2076 /* 0 Init */ { 1, 2, 4, 5, 7, 15, 17, 7, 9, 7, 0, 7, 3, _ON },
2077 /* 1 L */ { 1, 32+2, 32+4, 32+5, 32+7, 32+15, 32+17, 32+7, 32+9, 32+7, 1, 1, 32+3, _L },
2078 /* 2 R */ { 32+1, 2, 32+4, 32+5, 32+7, 32+15, 32+17, 32+7, 32+9, 32+7, 2, 2, 32+3, _R },
2079 /* 3 AL */ { 32+1, 32+2, 32+6, 32+6, 32+8, 32+16, 32+17, 32+8, 32+8, 32+8, 3, 3, 3, _R },
2080 /* 4 EN */ { 32+1, 32+2, 4, 32+5, 32+7, 32+15, 32+17, 64+10, 11, 64+10, 4, 4, 32+3, _EN },
2081 /* 5 AN */ { 32+1, 32+2, 32+4, 5, 32+7, 32+15, 32+17, 32+7, 32+9, 64+12, 5, 5, 32+3, _AN },
2082 /* 6 AL:EN/AN */ { 32+1, 32+2, 6, 6, 32+8, 32+16, 32+17, 32+8, 32+8, 64+13, 6, 6, 32+3, _AN },
2083 /* 7 ON */ { 32+1, 32+2, 32+4, 32+5, 7, 32+15, 32+17, 7, 64+14, 7, 7, 7, 32+3, _ON },
2084 /* 8 AL:ON */ { 32+1, 32+2, 32+6, 32+6, 8, 32+16, 32+17, 8, 8, 8, 8, 8, 32+3, _ON },
2085 /* 9 ET */ { 32+1, 32+2, 4, 32+5, 7, 32+15, 32+17, 7, 9, 7, 9, 9, 32+3, _ON },
2086 /*10 EN+ES/CS */ { 96+1, 96+2, 4, 96+5, 128+7, 96+15, 96+17, 128+7,128+14, 128+7, 10, 128+7, 96+3, _EN },
2087 /*11 EN+ET */ { 32+1, 32+2, 4, 32+5, 32+7, 32+15, 32+17, 32+7, 11, 32+7, 11, 11, 32+3, _EN },
2088 /*12 AN+CS */ { 96+1, 96+2, 96+4, 5, 128+7, 96+15, 96+17, 128+7,128+14, 128+7, 12, 128+7, 96+3, _AN },
2089 /*13 AL:EN/AN+CS */ { 96+1, 96+2, 6, 6, 128+8, 96+16, 96+17, 128+8, 128+8, 128+8, 13, 128+8, 96+3, _AN },
2090 /*14 ON+ET */ { 32+1, 32+2, 128+4, 32+5, 7, 32+15, 32+17, 7, 14, 7, 14, 14, 32+3, _ON },
2091 /*15 S */ { 32+1, 32+2, 32+4, 32+5, 32+7, 15, 32+17, 32+7, 32+9, 32+7, 15, 32+7, 32+3, _S },
2092 /*16 AL:S */ { 32+1, 32+2, 32+6, 32+6, 32+8, 16, 32+17, 32+8, 32+8, 32+8, 16, 32+8, 32+3, _S },
2093 /*17 B */ { 32+1, 32+2, 32+4, 32+5, 32+7, 32+15, 17, 32+7, 32+9, 32+7, 17, 32+7, 32+3, _B }
2096 /*********************************************************************/
2097 /* The levels state machine tables */
2098 /*********************************************************************/
2100 /* All table cells are 8 bits: */
2101 /* bits 0..3: next state */
2102 /* bits 4..7: action to perform (if > 0) */
2104 /* Cells may be of format "n" where n represents the next state */
2105 /* (except for the rightmost column). */
2106 /* Cells may also be of format "_(x,y)" where x represents an action */
2107 /* to perform and y represents the next state. */
2109 /* This format limits each table to 16 states each and to 15 actions.*/
2111 /*********************************************************************/
2112 /* Definitions and type for levels state tables */
2113 /*********************************************************************/
2114 private static final int IMPTABLEVELS_COLUMNS = _B + 2;
2115 private static final int IMPTABLEVELS_RES = IMPTABLEVELS_COLUMNS - 1;
2116 private static short GetState(byte cell) { return (short)(cell & 0x0f); }
2117 private static short GetAction(byte cell) { return (short)(cell >> 4); }
2119 private static class ImpTabPair {
2123 ImpTabPair(byte[][] table1, byte[][] table2,
2124 short[] act1, short[] act2) {
2125 imptab = new byte[][][] {table1, table2};
2126 impact = new short[][] {act1, act2};
2130 /*********************************************************************/
2132 /* LEVELS STATE TABLES */
2134 /* In all levels state tables, */
2135 /* - state 0 is the initial state */
2136 /* - the Res column is the increment to add to the text level */
2137 /* for this property sequence. */
2139 /* The impact arrays for each table of a pair map the local action */
2140 /* numbers of the table to the total list of actions. For instance, */
2141 /* action 2 in a given table corresponds to the action number which */
2142 /* appears in entry [2] of the impact array for that table. */
2143 /* The first entry of all impact arrays must be 0. */
2145 /* Action 1: init conditional sequence */
2146 /* 2: prepend conditional sequence to current sequence */
2147 /* 3: set ON sequence to new level - 1 */
2148 /* 4: init EN/AN/ON sequence */
2149 /* 5: fix EN/AN/ON sequence followed by R */
2150 /* 6: set previous level sequence to level 2 */
2153 /* 1) These tables are used in processPropertySeq(). The input */
2154 /* is property sequences as determined by resolveImplicitLevels. */
2155 /* 2) Most such property sequences are processed immediately */
2156 /* (levels are assigned). */
2157 /* 3) However, some sequences cannot be assigned a final level till */
2158 /* one or more following sequences are received. For instance, */
2159 /* ON following an R sequence within an even-level paragraph. */
2160 /* If the following sequence is R, the ON sequence will be */
2161 /* assigned basic run level+1, and so will the R sequence. */
2162 /* 4) S is generally handled like ON, since its level will be fixed */
2163 /* to paragraph level in adjustWSLevels(). */
2166 private static final byte impTabL_DEFAULT[][] = /* Even paragraph level */
2167 /* In this table, conditional sequences receive the higher possible level
2168 until proven otherwise.
2171 /* L, R, EN, AN, ON, S, B, Res */
2172 /* 0 : init */ { 0, 1, 0, 2, 0, 0, 0, 0 },
2173 /* 1 : R */ { 0, 1, 3, 3, 0x14, 0x14, 0, 1 },
2174 /* 2 : AN */ { 0, 1, 0, 2, 0x15, 0x15, 0, 2 },
2175 /* 3 : R+EN/AN */ { 0, 1, 3, 3, 0x14, 0x14, 0, 2 },
2176 /* 4 : R+ON */ { 0x20, 1, 3, 3, 4, 4, 0x20, 1 },
2177 /* 5 : AN+ON */ { 0x20, 1, 0x20, 2, 5, 5, 0x20, 1 }
2180 private static final byte impTabR_DEFAULT[][] = /* Odd paragraph level */
2181 /* In this table, conditional sequences receive the lower possible level
2182 until proven otherwise.
2185 /* L, R, EN, AN, ON, S, B, Res */
2186 /* 0 : init */ { 1, 0, 2, 2, 0, 0, 0, 0 },
2187 /* 1 : L */ { 1, 0, 1, 3, 0x14, 0x14, 0, 1 },
2188 /* 2 : EN/AN */ { 1, 0, 2, 2, 0, 0, 0, 1 },
2189 /* 3 : L+AN */ { 1, 0, 1, 3, 5, 5, 0, 1 },
2190 /* 4 : L+ON */ { 0x21, 0, 0x21, 3, 4, 4, 0, 0 },
2191 /* 5 : L+AN+ON */ { 1, 0, 1, 3, 5, 5, 0, 0 }
2194 private static final short[] impAct0 = {0,1,2,3,4,5,6};
2196 private static final ImpTabPair impTab_DEFAULT = new ImpTabPair(
2197 impTabL_DEFAULT, impTabR_DEFAULT, impAct0, impAct0);
2199 private static final byte impTabL_NUMBERS_SPECIAL[][] = { /* Even paragraph level */
2200 /* In this table, conditional sequences receive the higher possible
2201 level until proven otherwise.
2203 /* L, R, EN, AN, ON, S, B, Res */
2204 /* 0 : init */ { 0, 2, 1, 1, 0, 0, 0, 0 },
2205 /* 1 : L+EN/AN */ { 0, 2, 1, 1, 0, 0, 0, 2 },
2206 /* 2 : R */ { 0, 2, 4, 4, 0x13, 0, 0, 1 },
2207 /* 3 : R+ON */ { 0x20, 2, 4, 4, 3, 3, 0x20, 1 },
2208 /* 4 : R+EN/AN */ { 0, 2, 4, 4, 0x13, 0x13, 0, 2 }
2210 private static final ImpTabPair impTab_NUMBERS_SPECIAL = new ImpTabPair(
2211 impTabL_NUMBERS_SPECIAL, impTabR_DEFAULT, impAct0, impAct0);
2213 private static final byte impTabL_GROUP_NUMBERS_WITH_R[][] = {
2214 /* In this table, EN/AN+ON sequences receive levels as if associated with R
2215 until proven that there is L or sor/eor on both sides. AN is handled like EN.
2217 /* L, R, EN, AN, ON, S, B, Res */
2218 /* 0 init */ { 0, 3, 0x11, 0x11, 0, 0, 0, 0 },
2219 /* 1 EN/AN */ { 0x20, 3, 1, 1, 2, 0x20, 0x20, 2 },
2220 /* 2 EN/AN+ON */ { 0x20, 3, 1, 1, 2, 0x20, 0x20, 1 },
2221 /* 3 R */ { 0, 3, 5, 5, 0x14, 0, 0, 1 },
2222 /* 4 R+ON */ { 0x20, 3, 5, 5, 4, 0x20, 0x20, 1 },
2223 /* 5 R+EN/AN */ { 0, 3, 5, 5, 0x14, 0, 0, 2 }
2225 private static final byte impTabR_GROUP_NUMBERS_WITH_R[][] = {
2226 /* In this table, EN/AN+ON sequences receive levels as if associated with R
2227 until proven that there is L on both sides. AN is handled like EN.
2229 /* L, R, EN, AN, ON, S, B, Res */
2230 /* 0 init */ { 2, 0, 1, 1, 0, 0, 0, 0 },
2231 /* 1 EN/AN */ { 2, 0, 1, 1, 0, 0, 0, 1 },
2232 /* 2 L */ { 2, 0, 0x14, 0x14, 0x13, 0, 0, 1 },
2233 /* 3 L+ON */ { 0x22, 0, 4, 4, 3, 0, 0, 0 },
2234 /* 4 L+EN/AN */ { 0x22, 0, 4, 4, 3, 0, 0, 1 }
2236 private static final ImpTabPair impTab_GROUP_NUMBERS_WITH_R = new
2237 ImpTabPair(impTabL_GROUP_NUMBERS_WITH_R,
2238 impTabR_GROUP_NUMBERS_WITH_R, impAct0, impAct0);
2240 private static final byte impTabL_INVERSE_NUMBERS_AS_L[][] = {
2241 /* This table is identical to the Default LTR table except that EN and AN
2244 /* L, R, EN, AN, ON, S, B, Res */
2245 /* 0 : init */ { 0, 1, 0, 0, 0, 0, 0, 0 },
2246 /* 1 : R */ { 0, 1, 0, 0, 0x14, 0x14, 0, 1 },
2247 /* 2 : AN */ { 0, 1, 0, 0, 0x15, 0x15, 0, 2 },
2248 /* 3 : R+EN/AN */ { 0, 1, 0, 0, 0x14, 0x14, 0, 2 },
2249 /* 4 : R+ON */ { 0x20, 1, 0x20, 0x20, 4, 4, 0x20, 1 },
2250 /* 5 : AN+ON */ { 0x20, 1, 0x20, 0x20, 5, 5, 0x20, 1 }
2252 private static final byte impTabR_INVERSE_NUMBERS_AS_L[][] = {
2253 /* This table is identical to the Default RTL table except that EN and AN
2256 /* L, R, EN, AN, ON, S, B, Res */
2257 /* 0 : init */ { 1, 0, 1, 1, 0, 0, 0, 0 },
2258 /* 1 : L */ { 1, 0, 1, 1, 0x14, 0x14, 0, 1 },
2259 /* 2 : EN/AN */ { 1, 0, 1, 1, 0, 0, 0, 1 },
2260 /* 3 : L+AN */ { 1, 0, 1, 1, 5, 5, 0, 1 },
2261 /* 4 : L+ON */ { 0x21, 0, 0x21, 0x21, 4, 4, 0, 0 },
2262 /* 5 : L+AN+ON */ { 1, 0, 1, 1, 5, 5, 0, 0 }
2264 private static final ImpTabPair impTab_INVERSE_NUMBERS_AS_L = new ImpTabPair
2265 (impTabL_INVERSE_NUMBERS_AS_L, impTabR_INVERSE_NUMBERS_AS_L,
2268 private static final byte impTabR_INVERSE_LIKE_DIRECT[][] = { /* Odd paragraph level */
2269 /* In this table, conditional sequences receive the lower possible level
2270 until proven otherwise.
2272 /* L, R, EN, AN, ON, S, B, Res */
2273 /* 0 : init */ { 1, 0, 2, 2, 0, 0, 0, 0 },
2274 /* 1 : L */ { 1, 0, 1, 2, 0x13, 0x13, 0, 1 },
2275 /* 2 : EN/AN */ { 1, 0, 2, 2, 0, 0, 0, 1 },
2276 /* 3 : L+ON */ { 0x21, 0x30, 6, 4, 3, 3, 0x30, 0 },
2277 /* 4 : L+ON+AN */ { 0x21, 0x30, 6, 4, 5, 5, 0x30, 3 },
2278 /* 5 : L+AN+ON */ { 0x21, 0x30, 6, 4, 5, 5, 0x30, 2 },
2279 /* 6 : L+ON+EN */ { 0x21, 0x30, 6, 4, 3, 3, 0x30, 1 }
2281 private static final short[] impAct1 = {0,1,11,12};
2282 private static final ImpTabPair impTab_INVERSE_LIKE_DIRECT = new ImpTabPair(
2283 impTabL_DEFAULT, impTabR_INVERSE_LIKE_DIRECT, impAct0, impAct1);
2285 private static final byte impTabL_INVERSE_LIKE_DIRECT_WITH_MARKS[][] = {
2286 /* The case handled in this table is (visually): R EN L
2288 /* L, R, EN, AN, ON, S, B, Res */
2289 /* 0 : init */ { 0, 0x63, 0, 1, 0, 0, 0, 0 },
2290 /* 1 : L+AN */ { 0, 0x63, 0, 1, 0x12, 0x30, 0, 4 },
2291 /* 2 : L+AN+ON */ { 0x20, 0x63, 0x20, 1, 2, 0x30, 0x20, 3 },
2292 /* 3 : R */ { 0, 0x63, 0x55, 0x56, 0x14, 0x30, 0, 3 },
2293 /* 4 : R+ON */ { 0x30, 0x43, 0x55, 0x56, 4, 0x30, 0x30, 3 },
2294 /* 5 : R+EN */ { 0x30, 0x43, 5, 0x56, 0x14, 0x30, 0x30, 4 },
2295 /* 6 : R+AN */ { 0x30, 0x43, 0x55, 6, 0x14, 0x30, 0x30, 4 }
2297 private static final byte impTabR_INVERSE_LIKE_DIRECT_WITH_MARKS[][] = {
2298 /* The cases handled in this table are (visually): R EN L
2301 /* L, R, EN, AN, ON, S, B, Res */
2302 /* 0 : init */ { 0x13, 0, 1, 1, 0, 0, 0, 0 },
2303 /* 1 : R+EN/AN */ { 0x23, 0, 1, 1, 2, 0x40, 0, 1 },
2304 /* 2 : R+EN/AN+ON */ { 0x23, 0, 1, 1, 2, 0x40, 0, 0 },
2305 /* 3 : L */ { 3 , 0, 3, 0x36, 0x14, 0x40, 0, 1 },
2306 /* 4 : L+ON */ { 0x53, 0x40, 5, 0x36, 4, 0x40, 0x40, 0 },
2307 /* 5 : L+ON+EN */ { 0x53, 0x40, 5, 0x36, 4, 0x40, 0x40, 1 },
2308 /* 6 : L+AN */ { 0x53, 0x40, 6, 6, 4, 0x40, 0x40, 3 }
2310 private static final short impAct2[] = {0,1,7,8,9,10};
2311 private static final ImpTabPair impTab_INVERSE_LIKE_DIRECT_WITH_MARKS =
2312 new ImpTabPair(impTabL_INVERSE_LIKE_DIRECT_WITH_MARKS,
2313 impTabR_INVERSE_LIKE_DIRECT_WITH_MARKS, impAct0, impAct2);
2315 private static final ImpTabPair impTab_INVERSE_FOR_NUMBERS_SPECIAL = new ImpTabPair(
2316 impTabL_NUMBERS_SPECIAL, impTabR_INVERSE_LIKE_DIRECT, impAct0, impAct1);
2318 private static final byte impTabL_INVERSE_FOR_NUMBERS_SPECIAL_WITH_MARKS[][] = {
2319 /* The case handled in this table is (visually): R EN L
2321 /* L, R, EN, AN, ON, S, B, Res */
2322 /* 0 : init */ { 0, 0x62, 1, 1, 0, 0, 0, 0 },
2323 /* 1 : L+EN/AN */ { 0, 0x62, 1, 1, 0, 0x30, 0, 4 },
2324 /* 2 : R */ { 0, 0x62, 0x54, 0x54, 0x13, 0x30, 0, 3 },
2325 /* 3 : R+ON */ { 0x30, 0x42, 0x54, 0x54, 3, 0x30, 0x30, 3 },
2326 /* 4 : R+EN/AN */ { 0x30, 0x42, 4, 4, 0x13, 0x30, 0x30, 4 }
2328 private static final ImpTabPair impTab_INVERSE_FOR_NUMBERS_SPECIAL_WITH_MARKS = new
2329 ImpTabPair(impTabL_INVERSE_FOR_NUMBERS_SPECIAL_WITH_MARKS,
2330 impTabR_INVERSE_LIKE_DIRECT_WITH_MARKS, impAct0, impAct2);
2332 private class LevState {
2333 byte[][] impTab; /* level table pointer */
2334 short[] impAct; /* action map array */
2335 int startON; /* start of ON sequence */
2336 int startL2EN; /* start of level 2 sequence */
2337 int lastStrongRTL; /* index of last found R or AL */
2338 short state; /* current state */
2339 byte runLevel; /* run level before implicit solving */
2342 /*------------------------------------------------------------------------*/
2344 static final int FIRSTALLOC = 10;
2346 * param pos: position where to insert
2347 * param flag: one of LRM_BEFORE, LRM_AFTER, RLM_BEFORE, RLM_AFTER
2349 private void addPoint(int pos, int flag)
2351 Point point = new Point();
2353 int len = insertPoints.points.length;
2355 insertPoints.points = new Point[FIRSTALLOC];
2358 if (insertPoints.size >= len) { /* no room for new point */
2359 Point[] savePoints = insertPoints.points;
2360 insertPoints.points = new Point[len * 2];
2361 System.arraycopy(savePoints, 0, insertPoints.points, 0, len);
2365 insertPoints.points[insertPoints.size] = point;
2366 insertPoints.size++;
2369 /* perform rules (Wn), (Nn), and (In) on a run of the text ------------------ */
2372 * This implementation of the (Wn) rules applies all rules in one pass.
2373 * In order to do so, it needs a look-ahead of typically 1 character
2374 * (except for W5: sequences of ET) and keeps track of changes
2375 * in a rule Wp that affect a later Wq (p<q).
2377 * The (Nn) and (In) rules are also performed in that same single loop,
2378 * but effectively one iteration behind for white space.
2380 * Since all implicit rules are performed in one step, it is not necessary
2381 * to actually store the intermediate directional properties in dirProps[].
2384 private void processPropertySeq(LevState levState, short _prop,
2385 int start, int limit) {
2387 byte[][] impTab = levState.impTab;
2388 short[] impAct = levState.impAct;
2389 short oldStateSeq,actionSeq;
2390 byte level, addLevel;
2393 start0 = start; /* save original start position */
2394 oldStateSeq = levState.state;
2395 cell = impTab[oldStateSeq][_prop];
2396 levState.state = GetState(cell); /* isolate the new state */
2397 actionSeq = impAct[GetAction(cell)]; /* isolate the action */
2398 addLevel = (byte)impTab[levState.state][IMPTABLEVELS_RES];
2400 if (actionSeq != 0) {
2401 switch (actionSeq) {
2402 case 1: /* init ON seq */
2403 levState.startON = start0;
2406 case 2: /* prepend ON seq to current seq */
2407 start = levState.startON;
2410 case 3: /* L or S after possible relevant EN/AN */
2411 /* check if we had EN after R/AL */
2412 if (levState.startL2EN >= 0) {
2413 addPoint(levState.startL2EN, LRM_BEFORE);
2415 levState.startL2EN = -1; /* not within previous if since could also be -2 */
2416 /* check if we had any relevant EN/AN after R/AL */
2417 if ((insertPoints.points.length == 0) ||
2418 (insertPoints.size <= insertPoints.confirmed)) {
2419 /* nothing, just clean up */
2420 levState.lastStrongRTL = -1;
2421 /* check if we have a pending conditional segment */
2422 level = (byte)impTab[oldStateSeq][IMPTABLEVELS_RES];
2423 if ((level & 1) != 0 && levState.startON > 0) { /* after ON */
2424 start = levState.startON; /* reset to basic run level */
2426 if (_prop == _S) { /* add LRM before S */
2427 addPoint(start0, LRM_BEFORE);
2428 insertPoints.confirmed = insertPoints.size;
2432 /* reset previous RTL cont to level for LTR text */
2433 for (k = levState.lastStrongRTL + 1; k < start0; k++) {
2434 /* reset odd level, leave runLevel+2 as is */
2435 levels[k] = (byte)((levels[k] - 2) & ~1);
2437 /* mark insert points as confirmed */
2438 insertPoints.confirmed = insertPoints.size;
2439 levState.lastStrongRTL = -1;
2440 if (_prop == _S) { /* add LRM before S */
2441 addPoint(start0, LRM_BEFORE);
2442 insertPoints.confirmed = insertPoints.size;
2446 case 4: /* R/AL after possible relevant EN/AN */
2448 if (insertPoints.points.length > 0)
2449 /* remove all non confirmed insert points */
2450 insertPoints.size = insertPoints.confirmed;
2451 levState.startON = -1;
2452 levState.startL2EN = -1;
2453 levState.lastStrongRTL = limit - 1;
2456 case 5: /* EN/AN after R/AL + possible cont */
2457 /* check for real AN */
2458 if ((_prop == _AN) && (NoContextRTL(dirProps[start0]) == AN) &&
2459 (reorderingMode!=REORDER_INVERSE_FOR_NUMBERS_SPECIAL))
2462 if (levState.startL2EN == -1) { /* if no relevant EN already found */
2463 /* just note the righmost digit as a strong RTL */
2464 levState.lastStrongRTL = limit - 1;
2467 if (levState.startL2EN >= 0) { /* after EN, no AN */
2468 addPoint(levState.startL2EN, LRM_BEFORE);
2469 levState.startL2EN = -2;
2472 addPoint(start0, LRM_BEFORE);
2475 /* if first EN/AN after R/AL */
2476 if (levState.startL2EN == -1) {
2477 levState.startL2EN = start0;
2481 case 6: /* note location of latest R/AL */
2482 levState.lastStrongRTL = limit - 1;
2483 levState.startON = -1;
2486 case 7: /* L after R+ON/EN/AN */
2487 /* include possible adjacent number on the left */
2488 for (k = start0-1; k >= 0 && ((levels[k] & 1) == 0); k--) {
2491 addPoint(k, RLM_BEFORE); /* add RLM before */
2492 insertPoints.confirmed = insertPoints.size; /* confirm it */
2494 levState.startON = start0;
2497 case 8: /* AN after L */
2498 /* AN numbers between L text on both sides may be trouble. */
2499 /* tentatively bracket with LRMs; will be confirmed if followed by L */
2500 addPoint(start0, LRM_BEFORE); /* add LRM before */
2501 addPoint(start0, LRM_AFTER); /* add LRM after */
2504 case 9: /* R after L+ON/EN/AN */
2505 /* false alert, infirm LRMs around previous AN */
2506 insertPoints.size=insertPoints.confirmed;
2507 if (_prop == _S) { /* add RLM before S */
2508 addPoint(start0, RLM_BEFORE);
2509 insertPoints.confirmed = insertPoints.size;
2513 case 10: /* L after L+ON/AN */
2514 level = (byte)(levState.runLevel + addLevel);
2515 for (k=levState.startON; k < start0; k++) {
2516 if (levels[k] < level) {
2520 insertPoints.confirmed = insertPoints.size; /* confirm inserts */
2521 levState.startON = start0;
2524 case 11: /* L after L+ON+EN/AN/ON */
2525 level = (byte)levState.runLevel;
2526 for (k = start0-1; k >= levState.startON; k--) {
2527 if (levels[k] == level+3) {
2528 while (levels[k] == level+3) {
2531 while (levels[k] == level) {
2535 if (levels[k] == level+2) {
2539 levels[k] = (byte)(level+1);
2543 case 12: /* R after L+ON+EN/AN/ON */
2544 level = (byte)(levState.runLevel+1);
2545 for (k = start0-1; k >= levState.startON; k--) {
2546 if (levels[k] > level) {
2552 default: /* we should never get here */
2553 throw new IllegalStateException("Internal ICU error in processPropertySeq");
2556 if ((addLevel) != 0 || (start < start0)) {
2557 level = (byte)(levState.runLevel + addLevel);
2558 for (k = start; k < limit; k++) {
2564 private void resolveImplicitLevels(int start, int limit, short sor, short eor)
2566 LevState levState = new LevState();
2567 int i, start1, start2;
2568 short oldStateImp, stateImp, actionImp;
2569 short gprop, resProp, cell;
2571 short nextStrongProp = R;
2572 int nextStrongPos = -1;
2575 /* check for RTL inverse Bidi mode */
2576 /* FOOD FOR THOUGHT: in case of RTL inverse Bidi, it would make sense to
2577 * loop on the text characters from end to start.
2578 * This would need a different properties state table (at least different
2579 * actions) and different levels state tables (maybe very similar to the
2580 * LTR corresponding ones.
2582 inverseRTL=((start<lastArabicPos) && ((GetParaLevelAt(start) & 1)>0) &&
2583 (reorderingMode==REORDER_INVERSE_LIKE_DIRECT ||
2584 reorderingMode==REORDER_INVERSE_FOR_NUMBERS_SPECIAL));
2585 /* initialize for levels state table */
2586 levState.startL2EN = -1; /* used for INVERSE_LIKE_DIRECT_WITH_MARKS */
2587 levState.lastStrongRTL = -1; /* used for INVERSE_LIKE_DIRECT_WITH_MARKS */
2589 levState.runLevel = levels[start];
2590 levState.impTab = impTabPair.imptab[levState.runLevel & 1];
2591 levState.impAct = impTabPair.impact[levState.runLevel & 1];
2592 processPropertySeq(levState, (short)sor, start, start);
2593 /* initialize for property state table */
2594 if (dirProps[start] == NSM) {
2595 stateImp = (short)(1 + sor);
2602 for (i = start; i <= limit; i++) {
2607 prop = NoContextRTL(dirProps[i]);
2610 /* AL before EN does not make it AN */
2612 } else if (prop == EN) {
2613 if (nextStrongPos <= i) {
2614 /* look for next strong char (L/R/AL) */
2616 nextStrongProp = R; /* set default */
2617 nextStrongPos = limit;
2618 for (j = i+1; j < limit; j++) {
2619 prop1 = NoContextRTL(dirProps[j]);
2620 if (prop1 == L || prop1 == R || prop1 == AL) {
2621 nextStrongProp = prop1;
2627 if (nextStrongProp == AL) {
2632 gprop = groupProp[prop];
2634 oldStateImp = stateImp;
2635 cell = impTabProps[oldStateImp][gprop];
2636 stateImp = GetStateProps(cell); /* isolate the new state */
2637 actionImp = GetActionProps(cell); /* isolate the action */
2638 if ((i == limit) && (actionImp == 0)) {
2639 /* there is an unprocessed sequence if its property == eor */
2640 actionImp = 1; /* process the last sequence */
2642 if (actionImp != 0) {
2643 resProp = impTabProps[oldStateImp][IMPTABPROPS_RES];
2644 switch (actionImp) {
2645 case 1: /* process current seq1, init new seq1 */
2646 processPropertySeq(levState, resProp, start1, i);
2649 case 2: /* init new seq2 */
2652 case 3: /* process seq1, process seq2, init new seq1 */
2653 processPropertySeq(levState, resProp, start1, start2);
2654 processPropertySeq(levState, _ON, start2, i);
2657 case 4: /* process seq1, set seq1=seq2, init new seq2 */
2658 processPropertySeq(levState, resProp, start1, start2);
2662 default: /* we should never get here */
2663 throw new IllegalStateException("Internal ICU error in resolveImplicitLevels");
2667 /* flush possible pending sequence, e.g. ON */
2668 processPropertySeq(levState, (short)eor, limit, limit);
2671 /* perform (L1) and (X9) ---------------------------------------------------- */
2674 * Reset the embedding levels for some non-graphic characters (L1).
2675 * This method also sets appropriate levels for BN, and
2676 * explicit embedding types that are supposed to have been removed
2677 * from the paragraph in (X9).
2679 private void adjustWSLevels() {
2682 if ((flags & MASK_WS) != 0) {
2684 i = trailingWSStart;
2686 /* reset a sequence of WS/BN before eop and B/S to the paragraph paraLevel */
2687 while (i > 0 && ((flag = DirPropFlagNC(dirProps[--i])) & MASK_WS) != 0) {
2688 if (orderParagraphsLTR && (flag & DirPropFlag(B)) != 0) {
2691 levels[i] = GetParaLevelAt(i);
2695 /* reset BN to the next character's paraLevel until B/S, which restarts above loop */
2696 /* here, i+1 is guaranteed to be <length */
2698 flag = DirPropFlagNC(dirProps[--i]);
2699 if ((flag & MASK_BN_EXPLICIT) != 0) {
2700 levels[i] = levels[i + 1];
2701 } else if (orderParagraphsLTR && (flag & DirPropFlag(B)) != 0) {
2704 } else if ((flag & MASK_B_S) != 0){
2705 levels[i] = GetParaLevelAt(i);
2713 int Bidi_Min(int x, int y) {
2714 return x < y ? x : y;
2717 int Bidi_Abs(int x) {
2718 return x >= 0 ? x : -x;
2721 void setParaRunsOnly(char[] parmText, byte parmParaLevel) {
2724 int saveLength, saveTrailingWSStart;
2727 int i, j, visualStart, logicalStart,
2728 oldRunCount, runLength, addedRuns, insertRemove,
2729 start, limit, step, indexOddBit, logicalPos,
2733 reorderingMode = REORDER_DEFAULT;
2734 int parmLength = parmText.length;
2735 if (parmLength == 0) {
2736 setPara(parmText, parmParaLevel, null);
2737 reorderingMode = REORDER_RUNS_ONLY;
2740 /* obtain memory for mapping table and visual text */
2741 saveOptions = reorderingOptions;
2742 if ((saveOptions & OPTION_INSERT_MARKS) > 0) {
2743 reorderingOptions &= ~OPTION_INSERT_MARKS;
2744 reorderingOptions |= OPTION_REMOVE_CONTROLS;
2746 parmParaLevel &= 1; /* accept only 0 or 1 */
2747 setPara(parmText, parmParaLevel, null);
2748 /* we cannot access directly pBiDi->levels since it is not yet set if
2749 * direction is not MIXED
2751 saveLevels = new byte[this.length];
2752 System.arraycopy(getLevels(), 0, saveLevels, 0, this.length);
2753 saveTrailingWSStart = trailingWSStart;
2755 /* FOOD FOR THOUGHT: instead of writing the visual text, we could use
2756 * the visual map and the dirProps array to drive the second call
2757 * to setPara (but must make provision for possible removal of
2758 * Bidi controls. Alternatively, only use the dirProps array via
2759 * customized classifier callback.
2761 visualText = writeReordered(DO_MIRRORING);
2762 visualMap = getVisualMap();
2763 this.reorderingOptions = saveOptions;
2764 saveLength = this.length;
2765 saveDirection=this.direction;
2767 this.reorderingMode = REORDER_INVERSE_LIKE_DIRECT;
2769 setPara(visualText, parmParaLevel, null);
2770 BidiLine.getRuns(this);
2771 /* check if some runs must be split, count how many splits */
2773 oldRunCount = this.runCount;
2775 for (i = 0; i < oldRunCount; i++, visualStart += runLength) {
2776 runLength = runs[i].limit - visualStart;
2777 if (runLength < 2) {
2780 logicalStart = runs[i].start;
2781 for (j = logicalStart+1; j < logicalStart+runLength; j++) {
2782 index = visualMap[j];
2783 index1 = visualMap[j-1];
2784 if ((Bidi_Abs(index-index1)!=1) || (saveLevels[index]!=saveLevels[index1])) {
2789 if (addedRuns > 0) {
2790 getRunsMemory(oldRunCount + addedRuns);
2791 if (runCount == 1) {
2792 /* because we switch from UBiDi.simpleRuns to UBiDi.runs */
2793 runsMemory[0] = runs[0];
2795 System.arraycopy(runs, 0, runsMemory, 0, runCount);
2798 runCount += addedRuns;
2799 for (i = oldRunCount; i < runCount; i++) {
2800 if (runs[i] == null) {
2801 runs[i] = new BidiRun(0, 0, (byte)0);
2805 /* split runs which are not consecutive in source text */
2807 for (i = oldRunCount-1; i >= 0; i--) {
2808 newI = i + addedRuns;
2809 runLength = i==0 ? runs[0].limit :
2810 runs[i].limit - runs[i-1].limit;
2811 logicalStart = runs[i].start;
2812 indexOddBit = runs[i].level & 1;
2813 if (runLength < 2) {
2814 if (addedRuns > 0) {
2815 runs[newI].copyFrom(runs[i]);
2817 logicalPos = visualMap[logicalStart];
2818 runs[newI].start = logicalPos;
2819 runs[newI].level = (byte)(saveLevels[logicalPos] ^ indexOddBit);
2822 if (indexOddBit > 0) {
2823 start = logicalStart;
2824 limit = logicalStart + runLength - 1;
2827 start = logicalStart + runLength - 1;
2828 limit = logicalStart;
2831 for (j = start; j != limit; j += step) {
2832 index = visualMap[j];
2833 index1 = visualMap[j+step];
2834 if ((Bidi_Abs(index-index1)!=1) || (saveLevels[index]!=saveLevels[index1])) {
2835 logicalPos = Bidi_Min(visualMap[start], index);
2836 runs[newI].start = logicalPos;
2837 runs[newI].level = (byte)(saveLevels[logicalPos] ^ indexOddBit);
2838 runs[newI].limit = runs[i].limit;
2839 runs[i].limit -= Bidi_Abs(j - start) + 1;
2840 insertRemove = runs[i].insertRemove & (LRM_AFTER|RLM_AFTER);
2841 runs[newI].insertRemove = insertRemove;
2842 runs[i].insertRemove &= ~insertRemove;
2848 if (addedRuns > 0) {
2849 runs[newI].copyFrom(runs[i]);
2851 logicalPos = Bidi_Min(visualMap[start], visualMap[limit]);
2852 runs[newI].start = logicalPos;
2853 runs[newI].level = (byte)(saveLevels[logicalPos] ^ indexOddBit);
2857 /* restore initial paraLevel */
2858 this.paraLevel ^= 1;
2860 /* restore real text */
2861 this.text = parmText;
2862 this.length = saveLength;
2863 this.originalLength = parmLength;
2864 this.direction=saveDirection;
2865 this.levels = saveLevels;
2866 this.trailingWSStart = saveTrailingWSStart;
2867 /* free memory for mapping table and visual text */
2871 this.direction = MIXED;
2874 this.reorderingMode = REORDER_RUNS_ONLY;
2878 * Perform the Unicode Bidi algorithm. It is defined in the
2879 * <a href="http://www.unicode.org/unicode/reports/tr9/">Unicode Standard Annex #9</a>,
2881 * also described in The Unicode Standard, Version 4.0 .<p>
2883 * This method takes a piece of plain text containing one or more paragraphs,
2884 * with or without externally specified embedding levels from <i>styled</i>
2885 * text and computes the left-right-directionality of each character.<p>
2887 * If the entire text is all of the same directionality, then
2888 * the method may not perform all the steps described by the algorithm,
2889 * i.e., some levels may not be the same as if all steps were performed.
2890 * This is not relevant for unidirectional text.<br>
2891 * For example, in pure LTR text with numbers the numbers would get
2892 * a resolved level of 2 higher than the surrounding text according to
2893 * the algorithm. This implementation may set all resolved levels to
2894 * the same value in such a case.<p>
2896 * The text can be composed of multiple paragraphs. Occurrence of a block
2897 * separator in the text terminates a paragraph, and whatever comes next starts
2898 * a new paragraph. The exception to this rule is when a Carriage Return (CR)
2899 * is followed by a Line Feed (LF). Both CR and LF are block separators, but
2900 * in that case, the pair of characters is considered as terminating the
2901 * preceding paragraph, and a new paragraph will be started by a character
2902 * coming after the LF.
2904 * Although the text is passed here as a <code>String</code>, it is
2905 * stored internally as an array of characters. Therefore the
2906 * documentation will refer to indexes of the characters in the text.
2908 * @param text contains the text that the Bidi algorithm will be performed
2909 * on. This text can be retrieved with <code>getText()</code> or
2910 * <code>getTextAsString</code>.<br>
2912 * @param paraLevel specifies the default level for the text;
2913 * it is typically 0 (LTR) or 1 (RTL).
2914 * If the method shall determine the paragraph level from the text,
2915 * then <code>paraLevel</code> can be set to
2916 * either <code>LEVEL_DEFAULT_LTR</code>
2917 * or <code>LEVEL_DEFAULT_RTL</code>; if the text contains multiple
2918 * paragraphs, the paragraph level shall be determined separately for
2919 * each paragraph; if a paragraph does not include any strongly typed
2920 * character, then the desired default is used (0 for LTR or 1 for RTL).
2921 * Any other value between 0 and <code>MAX_EXPLICIT_LEVEL</code>
2922 * is also valid, with odd levels indicating RTL.
2924 * @param embeddingLevels (in) may be used to preset the embedding and override levels,
2925 * ignoring characters like LRE and PDF in the text.
2926 * A level overrides the directional property of its corresponding
2927 * (same index) character if the level has the
2928 * <code>LEVEL_OVERRIDE</code> bit set.<br><br>
2929 * Except for that bit, it must be
2930 * <code>paraLevel<=embeddingLevels[]<=MAX_EXPLICIT_LEVEL</code>,
2931 * with one exception: a level of zero may be specified for a
2932 * paragraph separator even if <code>paraLevel>0</code> when multiple
2933 * paragraphs are submitted in the same call to <code>setPara()</code>.<br><br>
2934 * <strong>Caution: </strong>A reference to this array, not a copy
2935 * of the levels, will be stored in the <code>Bidi</code> object;
2936 * the <code>embeddingLevels</code>
2937 * should not be modified to avoid unexpected results on subsequent
2938 * Bidi operations. However, the <code>setPara()</code> and
2939 * <code>setLine()</code> methods may modify some or all of the
2941 * <strong>Note:</strong> the <code>embeddingLevels</code> array must
2942 * have one entry for each character in <code>text</code>.
2944 * @throws IllegalArgumentException if the values in embeddingLevels are
2945 * not within the allowed range
2947 * @see #LEVEL_DEFAULT_LTR
2948 * @see #LEVEL_DEFAULT_RTL
2949 * @see #LEVEL_OVERRIDE
2950 * @see #MAX_EXPLICIT_LEVEL
2953 public void setPara(String text, byte paraLevel, byte[] embeddingLevels)
2956 setPara(new char[0], paraLevel, embeddingLevels);
2958 setPara(text.toCharArray(), paraLevel, embeddingLevels);
2963 * Perform the Unicode Bidi algorithm. It is defined in the
2964 * <a href="http://www.unicode.org/unicode/reports/tr9/">Unicode Standard Annex #9</a>,
2966 * also described in The Unicode Standard, Version 4.0 .<p>
2968 * This method takes a piece of plain text containing one or more paragraphs,
2969 * with or without externally specified embedding levels from <i>styled</i>
2970 * text and computes the left-right-directionality of each character.<p>
2972 * If the entire text is all of the same directionality, then
2973 * the method may not perform all the steps described by the algorithm,
2974 * i.e., some levels may not be the same as if all steps were performed.
2975 * This is not relevant for unidirectional text.<br>
2976 * For example, in pure LTR text with numbers the numbers would get
2977 * a resolved level of 2 higher than the surrounding text according to
2978 * the algorithm. This implementation may set all resolved levels to
2979 * the same value in such a case.<p>
2981 * The text can be composed of multiple paragraphs. Occurrence of a block
2982 * separator in the text terminates a paragraph, and whatever comes next starts
2983 * a new paragraph. The exception to this rule is when a Carriage Return (CR)
2984 * is followed by a Line Feed (LF). Both CR and LF are block separators, but
2985 * in that case, the pair of characters is considered as terminating the
2986 * preceding paragraph, and a new paragraph will be started by a character
2987 * coming after the LF.
2989 * The text is stored internally as an array of characters. Therefore the
2990 * documentation will refer to indexes of the characters in the text.
2992 * @param chars contains the text that the Bidi algorithm will be performed
2993 * on. This text can be retrieved with <code>getText()</code> or
2994 * <code>getTextAsString</code>.<br>
2996 * @param paraLevel specifies the default level for the text;
2997 * it is typically 0 (LTR) or 1 (RTL).
2998 * If the method shall determine the paragraph level from the text,
2999 * then <code>paraLevel</code> can be set to
3000 * either <code>LEVEL_DEFAULT_LTR</code>
3001 * or <code>LEVEL_DEFAULT_RTL</code>; if the text contains multiple
3002 * paragraphs, the paragraph level shall be determined separately for
3003 * each paragraph; if a paragraph does not include any strongly typed
3004 * character, then the desired default is used (0 for LTR or 1 for RTL).
3005 * Any other value between 0 and <code>MAX_EXPLICIT_LEVEL</code>
3006 * is also valid, with odd levels indicating RTL.
3008 * @param embeddingLevels (in) may be used to preset the embedding and
3009 * override levels, ignoring characters like LRE and PDF in the text.
3010 * A level overrides the directional property of its corresponding
3011 * (same index) character if the level has the
3012 * <code>LEVEL_OVERRIDE</code> bit set.<br><br>
3013 * Except for that bit, it must be
3014 * <code>paraLevel<=embeddingLevels[]<=MAX_EXPLICIT_LEVEL</code>,
3015 * with one exception: a level of zero may be specified for a
3016 * paragraph separator even if <code>paraLevel>0</code> when multiple
3017 * paragraphs are submitted in the same call to <code>setPara()</code>.<br><br>
3018 * <strong>Caution: </strong>A reference to this array, not a copy
3019 * of the levels, will be stored in the <code>Bidi</code> object;
3020 * the <code>embeddingLevels</code>
3021 * should not be modified to avoid unexpected results on subsequent
3022 * Bidi operations. However, the <code>setPara()</code> and
3023 * <code>setLine()</code> methods may modify some or all of the
3025 * <strong>Note:</strong> the <code>embeddingLevels</code> array must
3026 * have one entry for each character in <code>text</code>.
3028 * @throws IllegalArgumentException if the values in embeddingLevels are
3029 * not within the allowed range
3031 * @see #LEVEL_DEFAULT_LTR
3032 * @see #LEVEL_DEFAULT_RTL
3033 * @see #LEVEL_OVERRIDE
3034 * @see #MAX_EXPLICIT_LEVEL
3037 public void setPara(char[] chars, byte paraLevel, byte[] embeddingLevels)
3039 /* check the argument values */
3040 if (paraLevel < LEVEL_DEFAULT_LTR) {
3041 verifyRange(paraLevel, 0, MAX_EXPLICIT_LEVEL + 1);
3043 if (chars == null) {
3044 chars = new char[0];
3047 /* special treatment for RUNS_ONLY mode */
3048 if (reorderingMode == REORDER_RUNS_ONLY) {
3049 setParaRunsOnly(chars, paraLevel);
3053 /* initialize the Bidi object */
3054 this.paraBidi = null; /* mark unfinished setPara */
3056 this.length = this.originalLength = this.resultLength = text.length;
3057 this.paraLevel = paraLevel;
3058 this.direction = LTR;
3061 /* Allocate zero-length arrays instead of setting to null here; then
3062 * checks for null in various places can be eliminated.
3064 dirProps = new byte[0];
3065 levels = new byte[0];
3066 runs = new BidiRun[0];
3067 isGoodLogicalToVisualRunsMap = false;
3068 insertPoints.size = 0; /* clean up from last call */
3069 insertPoints.confirmed = 0; /* clean up from last call */
3072 * Save the original paraLevel if contextual; otherwise, set to 0.
3074 if (IsDefaultLevel(paraLevel)) {
3075 defaultParaLevel = paraLevel;
3077 defaultParaLevel = 0;
3082 * For an empty paragraph, create a Bidi object with the paraLevel and
3083 * the flags and the direction set but without allocating zero-length arrays.
3084 * There is nothing more to do.
3086 if (IsDefaultLevel(paraLevel)) {
3087 this.paraLevel &= 1;
3088 defaultParaLevel = 0;
3090 if ((this.paraLevel & 1) != 0) {
3091 flags = DirPropFlag(R);
3094 flags = DirPropFlag(L);
3100 paraBidi = this; /* mark successful setPara */
3107 * Get the directional properties,
3108 * the flags bit-set, and
3109 * determine the paragraph level if necessary.
3111 getDirPropsMemory(length);
3112 dirProps = dirPropsMemory;
3114 /* the processed length may have changed if OPTION_STREAMING is set */
3115 trailingWSStart = length; /* the levels[] will reflect the WS run */
3117 /* allocate paras memory */
3118 if (paraCount > 1) {
3119 getInitialParasMemory(paraCount);
3120 paras = parasMemory;
3121 paras[paraCount - 1] = length;
3123 /* initialize paras for single paragraph */
3124 paras = simpleParas;
3125 simpleParas[0] = length;
3128 /* are explicit levels specified? */
3129 if (embeddingLevels == null) {
3130 /* no: determine explicit levels according to the (Xn) rules */
3131 getLevelsMemory(length);
3132 levels = levelsMemory;
3133 direction = resolveExplicitLevels();
3135 /* set BN for all explicit codes, check that all levels are 0 or paraLevel..MAX_EXPLICIT_LEVEL */
3136 levels = embeddingLevels;
3137 direction = checkExplicitLevels();
3141 * The steps after (X9) in the Bidi algorithm are performed only if
3142 * the paragraph text has mixed directionality!
3144 switch (direction) {
3146 /* make sure paraLevel is even */
3147 paraLevel = (byte)((paraLevel + 1) & ~1);
3149 /* all levels are implicitly at paraLevel (important for getLevels()) */
3150 trailingWSStart = 0;
3153 /* make sure paraLevel is odd */
3156 /* all levels are implicitly at paraLevel (important for getLevels()) */
3157 trailingWSStart = 0;
3161 * Choose the right implicit state table
3163 switch(reorderingMode) {
3164 case REORDER_DEFAULT:
3165 this.impTabPair = impTab_DEFAULT;
3167 case REORDER_NUMBERS_SPECIAL:
3168 this.impTabPair = impTab_NUMBERS_SPECIAL;
3170 case REORDER_GROUP_NUMBERS_WITH_R:
3171 this.impTabPair = impTab_GROUP_NUMBERS_WITH_R;
3173 case REORDER_RUNS_ONLY:
3174 /* we should never get here */
3175 throw new InternalError("Internal ICU error in setPara");
3177 case REORDER_INVERSE_NUMBERS_AS_L:
3178 this.impTabPair = impTab_INVERSE_NUMBERS_AS_L;
3180 case REORDER_INVERSE_LIKE_DIRECT:
3181 if ((reorderingOptions & OPTION_INSERT_MARKS) != 0) {
3182 this.impTabPair = impTab_INVERSE_LIKE_DIRECT_WITH_MARKS;
3184 this.impTabPair = impTab_INVERSE_LIKE_DIRECT;
3187 case REORDER_INVERSE_FOR_NUMBERS_SPECIAL:
3188 if ((reorderingOptions & OPTION_INSERT_MARKS) != 0) {
3189 this.impTabPair = impTab_INVERSE_FOR_NUMBERS_SPECIAL_WITH_MARKS;
3191 this.impTabPair = impTab_INVERSE_FOR_NUMBERS_SPECIAL;
3196 * If there are no external levels specified and there
3197 * are no significant explicit level codes in the text,
3198 * then we can treat the entire paragraph as one run.
3199 * Otherwise, we need to perform the following rules on runs of
3200 * the text with the same embedding levels. (X10)
3201 * "Significant" explicit level codes are ones that actually
3202 * affect non-BN characters.
3203 * Examples for "insignificant" ones are empty embeddings
3204 * LRE-PDF, LRE-RLE-PDF-PDF, etc.
3206 if (embeddingLevels == null && paraCount <= 1 &&
3207 (flags & DirPropFlagMultiRuns) == 0) {
3208 resolveImplicitLevels(0, length,
3209 GetLRFromLevel(GetParaLevelAt(0)),
3210 GetLRFromLevel(GetParaLevelAt(length - 1)));
3212 /* sor, eor: start and end types of same-level-run */
3213 int start, limit = 0;
3214 byte level, nextLevel;
3217 /* determine the first sor and set eor to it because of the loop body (sor=eor there) */
3218 level = GetParaLevelAt(0);
3219 nextLevel = levels[0];
3220 if (level < nextLevel) {
3221 eor = GetLRFromLevel(nextLevel);
3223 eor = GetLRFromLevel(level);
3227 /* determine start and limit of the run (end points just behind the run) */
3229 /* the values for this run's start are the same as for the previous run's end */
3232 if ((start > 0) && (NoContextRTL(dirProps[start - 1]) == B)) {
3233 /* except if this is a new paragraph, then set sor = para level */
3234 sor = GetLRFromLevel(GetParaLevelAt(start));
3239 /* search for the limit of this run */
3240 while (++limit < length && levels[limit] == level) {}
3242 /* get the correct level of the next run */
3243 if (limit < length) {
3244 nextLevel = levels[limit];
3246 nextLevel = GetParaLevelAt(length - 1);
3249 /* determine eor from max(level, nextLevel); sor is last run's eor */
3250 if ((level & ~LEVEL_OVERRIDE) < (nextLevel & ~LEVEL_OVERRIDE)) {
3251 eor = GetLRFromLevel(nextLevel);
3253 eor = GetLRFromLevel(level);
3256 /* if the run consists of overridden directional types, then there
3257 are no implicit types to be resolved */
3258 if ((level & LEVEL_OVERRIDE) == 0) {
3259 resolveImplicitLevels(start, limit, sor, eor);
3261 /* remove the LEVEL_OVERRIDE flags */
3263 levels[start++] &= ~LEVEL_OVERRIDE;
3264 } while (start < limit);
3266 } while (limit < length);
3269 /* reset the embedding levels for some non-graphic characters (L1), (X9) */
3274 /* add RLM for inverse Bidi with contextual orientation resolving
3275 * to RTL which would not round-trip otherwise
3277 if ((defaultParaLevel > 0) &&
3278 ((reorderingOptions & OPTION_INSERT_MARKS) != 0) &&
3279 ((reorderingMode == REORDER_INVERSE_LIKE_DIRECT) ||
3280 (reorderingMode == REORDER_INVERSE_FOR_NUMBERS_SPECIAL))) {
3283 for (int i = 0; i < paraCount; i++) {
3284 last = paras[i] - 1;
3285 if ((dirProps[last] & CONTEXT_RTL) == 0) {
3286 continue; /* LTR paragraph */
3288 start= i == 0 ? 0 : paras[i - 1];
3289 for (int j = last; j >= start; j--) {
3290 dirProp = NoContextRTL(dirProps[j]);
3293 while (NoContextRTL(dirProps[last]) == B) {
3297 addPoint(last, RLM_BEFORE);
3300 if ((DirPropFlag(dirProp) & MASK_R_AL) != 0) {
3307 if ((reorderingOptions & OPTION_REMOVE_CONTROLS) != 0) {
3308 resultLength -= controlCount;
3310 resultLength += insertPoints.size;
3312 paraBidi = this; /* mark successful setPara */
3315 //#if defined(FOUNDATION10)
3318 * Perform the Unicode Bidi algorithm on a given paragraph, as defined in the
3319 * <a href="http://www.unicode.org/unicode/reports/tr9/">Unicode Standard Annex #9</a>,
3321 * also described in The Unicode Standard, Version 4.0 .<p>
3323 * This method takes a paragraph of text and computes the
3324 * left-right-directionality of each character. The text should not
3325 * contain any Unicode block separators.<p>
3327 * The RUN_DIRECTION attribute in the text, if present, determines the base
3328 * direction (left-to-right or right-to-left). If not present, the base
3329 * direction is computed using the Unicode Bidirectional Algorithm,
3330 * defaulting to left-to-right if there are no strong directional characters
3331 * in the text. This attribute, if present, must be applied to all the text
3332 * in the paragraph.<p>
3334 * The BIDI_EMBEDDING attribute in the text, if present, represents
3335 * embedding level information. Negative values from -1 to -62 indicate
3336 * overrides at the absolute value of the level. Positive values from 1 to
3337 * 62 indicate embeddings. Where values are zero or not defined, the base
3338 * embedding level as determined by the base direction is assumed.<p>
3340 * The NUMERIC_SHAPING attribute in the text, if present, converts European
3341 * digits to other decimal digits before running the bidi algorithm. This
3342 * attribute, if present, must be applied to all the text in the paragraph.
3344 * If the entire text is all of the same directionality, then
3345 * the method may not perform all the steps described by the algorithm,
3346 * i.e., some levels may not be the same as if all steps were performed.
3347 * This is not relevant for unidirectional text.<br>
3348 * For example, in pure LTR text with numbers the numbers would get
3349 * a resolved level of 2 higher than the surrounding text according to
3350 * the algorithm. This implementation may set all resolved levels to
3351 * the same value in such a case.<p>
3353 * @param paragraph a paragraph of text with optional character and
3354 * paragraph attribute information
3357 public void setPara(AttributedCharacterIterator paragraph)
3360 Boolean runDirection = (Boolean) paragraph.getAttribute(TextAttribute.RUN_DIRECTION);
3361 if (runDirection == null) {
3362 paraLvl = LEVEL_DEFAULT_LTR;
3364 paraLvl = (runDirection.equals(TextAttribute.RUN_DIRECTION_LTR)) ?
3369 int len = paragraph.getEndIndex() - paragraph.getBeginIndex();
3370 byte[] embeddingLevels = new byte[len];
3371 char[] txt = new char[len];
3373 char ch = paragraph.first();
3374 while (ch != AttributedCharacterIterator.DONE) {
3376 Integer embedding = (Integer) paragraph.getAttribute(TextAttribute.BIDI_EMBEDDING);
3377 if (embedding != null) {
3378 byte level = embedding.byteValue();
3381 } else if (level < 0) {
3382 lvls = embeddingLevels;
3383 embeddingLevels[i] = (byte)((0 - level) | LEVEL_OVERRIDE);
3385 lvls = embeddingLevels;
3386 embeddingLevels[i] = level;
3389 ch = paragraph.next();
3393 //#if defined(J2SE13)
3395 NumericShaper shaper = (NumericShaper) paragraph.getAttribute(TextAttribute.NUMERIC_SHAPING);
3396 if (shaper != null) {
3397 shaper.shape(txt, 0, len);
3400 setPara(txt, paraLvl, lvls);
3405 * Specify whether block separators must be allocated level zero,
3406 * so that successive paragraphs will progress from left to right.
3407 * This method must be called before <code>setPara()</code>.
3408 * Paragraph separators (B) may appear in the text. Setting them to level zero
3409 * means that all paragraph separators (including one possibly appearing
3410 * in the last text position) are kept in the reordered text after the text
3411 * that they follow in the source text.
3412 * When this feature is not enabled, a paragraph separator at the last
3413 * position of the text before reordering will go to the first position
3414 * of the reordered text when the paragraph level is odd.
3416 * @param ordarParaLTR specifies whether paragraph separators (B) must
3417 * receive level 0, so that successive paragraphs progress from left to right.
3422 public void orderParagraphsLTR(boolean ordarParaLTR) {
3423 orderParagraphsLTR = ordarParaLTR;
3427 * Is this <code>Bidi</code> object set to allocate level 0 to block
3428 * separators so that successive paragraphs progress from left to right?
3430 * @return <code>true</code> if the <code>Bidi</code> object is set to
3431 * allocate level 0 to block separators.
3433 * @see #orderParagraphsLTR
3436 public boolean isOrderParagraphsLTR() {
3437 return orderParagraphsLTR;
3441 * Get the directionality of the text.
3443 * @return a value of <code>LTR</code>, <code>RTL</code> or <code>MIXED</code>
3444 * that indicates if the entire text
3445 * represented by this object is unidirectional,
3446 * and which direction, or if it is mixed-directional.
3448 * @throws IllegalStateException if this call is not preceded by a successful
3449 * call to <code>setPara</code> or <code>setLine</code>
3456 public byte getDirection()
3458 verifyValidParaOrLine();
3465 * @return A <code>String</code> containing the text that the
3466 * <code>Bidi</code> object was created for.
3468 * @throws IllegalStateException if this call is not preceded by a successful
3469 * call to <code>setPara</code> or <code>setLine</code>
3475 public String getTextAsString()
3477 verifyValidParaOrLine();
3478 return new String(text);
3484 * @return A <code>char</code> array containing the text that the
3485 * <code>Bidi</code> object was created for.
3487 * @throws IllegalStateException if this call is not preceded by a successful
3488 * call to <code>setPara</code> or <code>setLine</code>
3494 public char[] getText()
3496 verifyValidParaOrLine();
3501 * Get the length of the text.
3503 * @return The length of the text that the <code>Bidi</code> object was
3506 * @throws IllegalStateException if this call is not preceded by a successful
3507 * call to <code>setPara</code> or <code>setLine</code>
3510 public int getLength()
3512 verifyValidParaOrLine();
3513 return originalLength;
3517 * Get the length of the source text processed by the last call to
3518 * <code>setPara()</code>. This length may be different from the length of
3519 * the source text if option <code>OPTION_STREAMING</code> has been
3522 * Note that whenever the length of the text affects the execution or the
3523 * result of a method, it is the processed length which must be considered,
3524 * except for <code>setPara</code> (which receives unprocessed source text)
3525 * and <code>getLength</code> (which returns the original length of the
3527 * In particular, the processed length is the one to consider in the
3530 * <li>maximum value of the <code>limit</code> argument of
3531 * <code>setLine</code></li>
3532 * <li>maximum value of the <code>charIndex</code> argument of
3533 * <code>getParagraph</code></li>
3534 * <li>maximum value of the <code>charIndex</code> argument of
3535 * <code>getLevelAt</code></li>
3536 * <li>number of elements in the array returned by <code>getLevels</code>
3538 * <li>maximum value of the <code>logicalStart</code> argument of
3539 * <code>getLogicalRun</code></li>
3540 * <li>maximum value of the <code>logicalIndex</code> argument of
3541 * <code>getVisualIndex</code></li>
3542 * <li>number of elements returned by <code>getLogicalMap</code></li>
3543 * <li>length of text processed by <code>writeReordered</code></li>
3546 * @return The length of the part of the source text processed by
3547 * the last call to <code>setPara</code>.
3549 * @throws IllegalStateException if this call is not preceded by a successful
3550 * call to <code>setPara</code> or <code>setLine</code>
3553 * @see #OPTION_STREAMING
3556 public int getProcessedLength() {
3557 verifyValidParaOrLine();
3562 * Get the length of the reordered text resulting from the last call to
3563 * <code>setPara()</code>. This length may be different from the length
3564 * of the source text if option <code>OPTION_INSERT_MARKS</code>
3565 * or option <code>OPTION_REMOVE_CONTROLS</code> has been set.
3567 * This resulting length is the one to consider in the following cases:
3569 * <li>maximum value of the <code>visualIndex</code> argument of
3570 * <code>getLogicalIndex</code></li>
3571 * <li>number of elements returned by <code>getVisualMap</code></li>
3573 * Note that this length stays identical to the source text length if
3574 * Bidi marks are inserted or removed using option bits of
3575 * <code>writeReordered</code>, or if option
3576 * <code>REORDER_INVERSE_NUMBERS_AS_L</code> has been set.
3578 * @return The length of the reordered text resulting from
3579 * the last call to <code>setPara</code>.
3581 * @throws IllegalStateException if this call is not preceded by a successful
3582 * call to <code>setPara</code> or <code>setLine</code>
3585 * @see #OPTION_INSERT_MARKS
3586 * @see #OPTION_REMOVE_CONTROLS
3587 * @see #REORDER_INVERSE_NUMBERS_AS_L
3590 public int getResultLength() {
3591 verifyValidParaOrLine();
3592 return resultLength;
3595 /* paragraphs API methods ------------------------------------------------- */
3598 * Get the paragraph level of the text.
3600 * @return The paragraph level. If there are multiple paragraphs, their
3601 * level may vary if the required paraLevel is LEVEL_DEFAULT_LTR or
3602 * LEVEL_DEFAULT_RTL. In that case, the level of the first paragraph
3605 * @throws IllegalStateException if this call is not preceded by a successful
3606 * call to <code>setPara</code> or <code>setLine</code>
3608 * @see #LEVEL_DEFAULT_LTR
3609 * @see #LEVEL_DEFAULT_RTL
3610 * @see #getParagraph
3611 * @see #getParagraphByIndex
3614 public byte getParaLevel()
3616 verifyValidParaOrLine();
3621 * Get the number of paragraphs.
3623 * @return The number of paragraphs.
3625 * @throws IllegalStateException if this call is not preceded by a successful
3626 * call to <code>setPara</code> or <code>setLine</code>
3629 public int countParagraphs()
3631 verifyValidParaOrLine();
3636 * Get a paragraph, given the index of this paragraph.
3638 * This method returns information about a paragraph.<p>
3640 * @param paraIndex is the number of the paragraph, in the
3641 * range <code>[0..countParagraphs()-1]</code>.
3643 * @return a BidiRun object with the details of the paragraph:<br>
3644 * <code>start</code> will receive the index of the first character
3645 * of the paragraph in the text.<br>
3646 * <code>limit</code> will receive the limit of the paragraph.<br>
3647 * <code>embeddingLevel</code> will receive the level of the paragraph.
3649 * @throws IllegalStateException if this call is not preceded by a successful
3650 * call to <code>setPara</code> or <code>setLine</code>
3651 * @throws IllegalArgumentException if paraIndex is not in the range
3652 * <code>[0..countParagraphs()-1]</code>
3654 * @see com.ibm.icu.text.BidiRun
3657 public BidiRun getParagraphByIndex(int paraIndex)
3659 verifyValidParaOrLine();
3660 verifyRange(paraIndex, 0, paraCount);
3662 Bidi bidi = paraBidi; /* get Para object if Line object */
3664 if (paraIndex == 0) {
3667 paraStart = bidi.paras[paraIndex - 1];
3669 BidiRun bidiRun = new BidiRun();
3670 bidiRun.start = paraStart;
3671 bidiRun.limit = bidi.paras[paraIndex];
3672 bidiRun.level = GetParaLevelAt(paraStart);
3677 * Get a paragraph, given a position within the text.
3678 * This method returns information about a paragraph.<br>
3679 * Note: if the paragraph index is known, it is more efficient to
3680 * retrieve the paragraph information using getParagraphByIndex().<p>
3682 * @param charIndex is the index of a character within the text, in the
3683 * range <code>[0..getProcessedLength()-1]</code>.
3685 * @return a BidiRun object with the details of the paragraph:<br>
3686 * <code>start</code> will receive the index of the first character
3687 * of the paragraph in the text.<br>
3688 * <code>limit</code> will receive the limit of the paragraph.<br>
3689 * <code>embeddingLevel</code> will receive the level of the paragraph.
3691 * @throws IllegalStateException if this call is not preceded by a successful
3692 * call to <code>setPara</code> or <code>setLine</code>
3693 * @throws IllegalArgumentException if charIndex is not within the legal range
3695 * @see com.ibm.icu.text.BidiRun
3696 * @see #getParagraphByIndex
3697 * @see #getProcessedLength
3700 public BidiRun getParagraph(int charIndex)
3702 verifyValidParaOrLine();
3703 Bidi bidi = paraBidi; /* get Para object if Line object */
3704 verifyRange(charIndex, 0, bidi.length);
3706 for (paraIndex = 0; charIndex >= bidi.paras[paraIndex]; paraIndex++) {
3708 return getParagraphByIndex(paraIndex);
3712 * Get the index of a paragraph, given a position within the text.<p>
3714 * @param charIndex is the index of a character within the text, in the
3715 * range <code>[0..getProcessedLength()-1]</code>.
3717 * @return The index of the paragraph containing the specified position,
3720 * @throws IllegalStateException if this call is not preceded by a successful
3721 * call to <code>setPara</code> or <code>setLine</code>
3722 * @throws IllegalArgumentException if charIndex is not within the legal range
3724 * @see com.ibm.icu.text.BidiRun
3725 * @see #getProcessedLength
3728 public int getParagraphIndex(int charIndex)
3730 verifyValidParaOrLine();
3731 Bidi bidi = paraBidi; /* get Para object if Line object */
3732 verifyRange(charIndex, 0, bidi.length);
3734 for (paraIndex = 0; charIndex >= bidi.paras[paraIndex]; paraIndex++) {
3740 * Set a custom Bidi classifier used by the UBA implementation for Bidi
3741 * class determination.
3743 * @param classifier A new custom classifier. This can be null.
3745 * @see #getCustomClassifier
3748 public void setCustomClassifier(BidiClassifier classifier) {
3749 this.customClassifier = classifier;
3753 * Gets the current custom class classifier used for Bidi class
3756 * @return An instance of class <code>BidiClassifier</code>
3758 * @see #setCustomClassifier
3761 public BidiClassifier getCustomClassifier() {
3762 return this.customClassifier;
3766 * Retrieves the Bidi class for a given code point.
3767 * <p>If a <code>BidiClassifier</code> is defined and returns a value
3768 * other than <code>CLASS_DEFAULT</code>, that value is used; otherwise
3769 * the default class determination mechanism is invoked.</p>
3771 * @param c The code point to get a Bidi class for.
3773 * @return The Bidi class for the character <code>c</code> that is in effect
3774 * for this <code>Bidi</code> instance.
3776 * @see BidiClassifier
3779 public int getCustomizedClass(int c) {
3782 if (customClassifier == null ||
3783 (dir = customClassifier.classify(c)) == Bidi.CLASS_DEFAULT) {
3784 return bdp.getClass(c);
3791 * <code>setLine()</code> returns a <code>Bidi</code> object to
3792 * contain the reordering information, especially the resolved levels,
3793 * for all the characters in a line of text. This line of text is
3794 * specified by referring to a <code>Bidi</code> object representing
3795 * this information for a piece of text containing one or more paragraphs,
3796 * and by specifying a range of indexes in this text.<p>
3797 * In the new line object, the indexes will range from 0 to <code>limit-start-1</code>.<p>
3799 * This is used after calling <code>setPara()</code>
3800 * for a piece of text, and after line-breaking on that text.
3801 * It is not necessary if each paragraph is treated as a single line.<p>
3803 * After line-breaking, rules (L1) and (L2) for the treatment of
3804 * trailing WS and for reordering are performed on
3805 * a <code>Bidi</code> object that represents a line.<p>
3807 * <strong>Important: </strong>the line <code>Bidi</code> object may
3808 * reference data within the global text <code>Bidi</code> object.
3809 * You should not alter the content of the global text object until
3810 * you are finished using the line object.
3812 * @param start is the line's first index into the text.
3814 * @param limit is just behind the line's last index into the text
3815 * (its last index +1).
3817 * @return a <code>Bidi</code> object that will now represent a line of the text.
3819 * @throws IllegalStateException if this call is not preceded by a successful
3820 * call to <code>setPara</code>
3821 * @throws IllegalArgumentException if start and limit are not in the range
3822 * <code>0<=start<limit<=getProcessedLength()</code>,
3823 * or if the specified line crosses a paragraph boundary
3826 * @see #getProcessedLength
3829 public Bidi setLine(int start, int limit)
3832 verifyRange(start, 0, limit);
3833 verifyRange(limit, 0, length+1);
3834 if (getParagraphIndex(start) != getParagraphIndex(limit - 1)) {
3835 /* the line crosses a paragraph boundary */
3836 throw new IllegalArgumentException();
3838 return BidiLine.setLine(this, start, limit);
3842 * Get the level for one character.
3844 * @param charIndex the index of a character.
3846 * @return The level for the character at <code>charIndex</code>.
3848 * @throws IllegalStateException if this call is not preceded by a successful
3849 * call to <code>setPara</code> or <code>setLine</code>
3850 * @throws IllegalArgumentException if charIndex is not in the range
3851 * <code>0<=charIndex<getProcessedLength()</code>
3853 * @see #getProcessedLength
3856 public byte getLevelAt(int charIndex)
3858 verifyValidParaOrLine();
3859 verifyRange(charIndex, 0, length);
3860 return BidiLine.getLevelAt(this, charIndex);
3864 * Get an array of levels for each character.<p>
3866 * Note that this method may allocate memory under some
3867 * circumstances, unlike <code>getLevelAt()</code>.
3869 * @return The levels array for the text,
3870 * or <code>null</code> if an error occurs.
3872 * @throws IllegalStateException if this call is not preceded by a successful
3873 * call to <code>setPara</code> or <code>setLine</code>
3876 public byte[] getLevels()
3878 verifyValidParaOrLine();
3882 return BidiLine.getLevels(this);
3886 * Get a logical run.
3887 * This method returns information about a run and is used
3888 * to retrieve runs in logical order.<p>
3889 * This is especially useful for line-breaking on a paragraph.
3891 * @param logicalPosition is a logical position within the source text.
3893 * @return a BidiRun object filled with <code>start</code> containing
3894 * the first character of the run, <code>limit</code> containing
3895 * the limit of the run, and <code>embeddingLevel</code> containing
3896 * the level of the run.
3898 * @throws IllegalStateException if this call is not preceded by a successful
3899 * call to <code>setPara</code> or <code>setLine</code>
3900 * @throws IllegalArgumentException if logicalPosition is not in the range
3901 * <code>0<=logicalPosition<getProcessedLength()</code>
3903 * @see com.ibm.icu.text.BidiRun
3904 * @see com.ibm.icu.text.BidiRun#getStart()
3905 * @see com.ibm.icu.text.BidiRun#getLimit()
3906 * @see com.ibm.icu.text.BidiRun#getEmbeddingLevel()
3910 public BidiRun getLogicalRun(int logicalPosition)
3912 verifyValidParaOrLine();
3913 verifyRange(logicalPosition, 0, length);
3914 return BidiLine.getLogicalRun(this, logicalPosition);
3918 * Get the number of runs.
3919 * This method may invoke the actual reordering on the
3920 * <code>Bidi</code> object, after <code>setPara()</code>
3921 * may have resolved only the levels of the text. Therefore,
3922 * <code>countRuns()</code> may have to allocate memory,
3923 * and may throw an exception if it fails to do so.
3925 * @return The number of runs.
3927 * @throws IllegalStateException if this call is not preceded by a successful
3928 * call to <code>setPara</code> or <code>setLine</code>
3931 public int countRuns()
3933 verifyValidParaOrLine();
3934 BidiLine.getRuns(this);
3940 * Get a <code>BidiRun</code> object according to its index. BidiRun methods
3941 * may be used to retrieve the run's logical start, length and level,
3942 * which can be even for an LTR run or odd for an RTL run.
3943 * In an RTL run, the character at the logical start is
3944 * visually on the right of the displayed run.
3945 * The length is the number of characters in the run.<p>
3946 * <code>countRuns()</code> is normally called
3947 * before the runs are retrieved.
3952 * Bidi bidi = new Bidi();
3953 * String text = "abc 123 DEFG xyz";
3954 * bidi.setPara(text, Bidi.RTL, null);
3955 * int i, count=bidi.countRuns(), logicalStart, visualIndex=0, length;
3957 * for (i = 0; i < count; ++i) {
3958 * run = bidi.getVisualRun(i);
3959 * logicalStart = run.getStart();
3960 * length = run.getLength();
3961 * if (Bidi.LTR == run.getEmbeddingLevel()) {
3963 * show_char(text.charAt(logicalStart++), visualIndex++);
3964 * } while (--length > 0);
3966 * logicalStart += length; // logicalLimit
3968 * show_char(text.charAt(--logicalStart), visualIndex++);
3969 * } while (--length > 0);
3974 * Note that in right-to-left runs, code like this places
3975 * second surrogates before first ones (which is generally a bad idea)
3976 * and combining characters before base characters.
3978 * Use of <code>{@link #writeReordered}</code>, optionally with the
3979 * <code>{@link #KEEP_BASE_COMBINING}</code> option, can be considered in
3980 * order to avoid these issues.
3982 * @param runIndex is the number of the run in visual order, in the
3983 * range <code>[0..countRuns()-1]</code>.
3985 * @return a BidiRun object containing the details of the run. The
3986 * directionality of the run is
3987 * <code>LTR==0</code> or <code>RTL==1</code>,
3988 * never <code>MIXED</code>.
3990 * @throws IllegalStateException if this call is not preceded by a successful
3991 * call to <code>setPara</code> or <code>setLine</code>
3992 * @throws IllegalArgumentException if <code>runIndex</code> is not in
3993 * the range <code>0<=runIndex<countRuns()</code>
3996 * @see com.ibm.icu.text.BidiRun
3997 * @see com.ibm.icu.text.BidiRun#getStart()
3998 * @see com.ibm.icu.text.BidiRun#getLength()
3999 * @see com.ibm.icu.text.BidiRun#getEmbeddingLevel()
4002 public BidiRun getVisualRun(int runIndex)
4004 verifyValidParaOrLine();
4005 BidiLine.getRuns(this);
4006 verifyRange(runIndex, 0, runCount);
4007 return BidiLine.getVisualRun(this, runIndex);
4011 * Get the visual position from a logical text position.
4012 * If such a mapping is used many times on the same
4013 * <code>Bidi</code> object, then calling
4014 * <code>getLogicalMap()</code> is more efficient.
4016 * The value returned may be <code>MAP_NOWHERE</code> if there is no
4017 * visual position because the corresponding text character is a Bidi
4018 * control removed from output by the option
4019 * <code>OPTION_REMOVE_CONTROLS</code>.
4021 * When the visual output is altered by using options of
4022 * <code>writeReordered()</code> such as <code>INSERT_LRM_FOR_NUMERIC</code>,
4023 * <code>KEEP_BASE_COMBINING</code>, <code>OUTPUT_REVERSE</code>,
4024 * <code>REMOVE_BIDI_CONTROLS</code>, the visual position returned may not
4025 * be correct. It is advised to use, when possible, reordering options
4026 * such as {@link #OPTION_INSERT_MARKS} and {@link #OPTION_REMOVE_CONTROLS}.
4028 * Note that in right-to-left runs, this mapping places
4029 * second surrogates before first ones (which is generally a bad idea)
4030 * and combining characters before base characters.
4031 * Use of <code>{@link #writeReordered}</code>, optionally with the
4032 * <code>{@link #KEEP_BASE_COMBINING}</code> option can be considered instead
4033 * of using the mapping, in order to avoid these issues.
4035 * @param logicalIndex is the index of a character in the text.
4037 * @return The visual position of this character.
4039 * @throws IllegalStateException if this call is not preceded by a successful
4040 * call to <code>setPara</code> or <code>setLine</code>
4041 * @throws IllegalArgumentException if <code>logicalIndex</code> is not in
4042 * the range <code>0<=logicalIndex<getProcessedLength()</code>
4044 * @see #getLogicalMap
4045 * @see #getLogicalIndex
4046 * @see #getProcessedLength
4048 * @see #OPTION_REMOVE_CONTROLS
4049 * @see #writeReordered
4052 public int getVisualIndex(int logicalIndex)
4054 verifyValidParaOrLine();
4055 verifyRange(logicalIndex, 0, length);
4056 return BidiLine.getVisualIndex(this, logicalIndex);
4061 * Get the logical text position from a visual position.
4062 * If such a mapping is used many times on the same
4063 * <code>Bidi</code> object, then calling
4064 * <code>getVisualMap()</code> is more efficient.
4066 * The value returned may be <code>MAP_NOWHERE</code> if there is no
4067 * logical position because the corresponding text character is a Bidi
4068 * mark inserted in the output by option
4069 * <code>OPTION_INSERT_MARKS</code>.
4071 * This is the inverse method to <code>getVisualIndex()</code>.
4073 * When the visual output is altered by using options of
4074 * <code>writeReordered()</code> such as <code>INSERT_LRM_FOR_NUMERIC</code>,
4075 * <code>KEEP_BASE_COMBINING</code>, <code>OUTPUT_REVERSE</code>,
4076 * <code>REMOVE_BIDI_CONTROLS</code>, the logical position returned may not
4077 * be correct. It is advised to use, when possible, reordering options
4078 * such as {@link #OPTION_INSERT_MARKS} and {@link #OPTION_REMOVE_CONTROLS}.
4080 * @param visualIndex is the visual position of a character.
4082 * @return The index of this character in the text.
4084 * @throws IllegalStateException if this call is not preceded by a successful
4085 * call to <code>setPara</code> or <code>setLine</code>
4086 * @throws IllegalArgumentException if <code>visualIndex</code> is not in
4087 * the range <code>0<=visualIndex<getResultLength()</code>
4089 * @see #getVisualMap
4090 * @see #getVisualIndex
4091 * @see #getResultLength
4093 * @see #OPTION_INSERT_MARKS
4094 * @see #writeReordered
4097 public int getLogicalIndex(int visualIndex)
4099 verifyValidParaOrLine();
4100 verifyRange(visualIndex, 0, resultLength);
4101 /* we can do the trivial cases without the runs array */
4102 if (insertPoints.size == 0 && controlCount == 0) {
4103 if (direction == LTR) {
4106 else if (direction == RTL) {
4107 return length - visualIndex - 1;
4110 BidiLine.getRuns(this);
4111 return BidiLine.getLogicalIndex(this, visualIndex);
4115 * Get a logical-to-visual index map (array) for the characters in the
4116 * <code>Bidi</code> (paragraph or line) object.
4118 * Some values in the map may be <code>MAP_NOWHERE</code> if the
4119 * corresponding text characters are Bidi controls removed from the visual
4120 * output by the option <code>OPTION_REMOVE_CONTROLS</code>.
4122 * When the visual output is altered by using options of
4123 * <code>writeReordered()</code> such as <code>INSERT_LRM_FOR_NUMERIC</code>,
4124 * <code>KEEP_BASE_COMBINING</code>, <code>OUTPUT_REVERSE</code>,
4125 * <code>REMOVE_BIDI_CONTROLS</code>, the visual positions returned may not
4126 * be correct. It is advised to use, when possible, reordering options
4127 * such as {@link #OPTION_INSERT_MARKS} and {@link #OPTION_REMOVE_CONTROLS}.
4129 * Note that in right-to-left runs, this mapping places
4130 * second surrogates before first ones (which is generally a bad idea)
4131 * and combining characters before base characters.
4132 * Use of <code>{@link #writeReordered}</code>, optionally with the
4133 * <code>{@link #KEEP_BASE_COMBINING}</code> option can be considered instead
4134 * of using the mapping, in order to avoid these issues.
4136 * @return an array of <code>getProcessedLength()</code>
4137 * indexes which will reflect the reordering of the characters.<br><br>
4138 * The index map will result in
4139 * <code>indexMap[logicalIndex]==visualIndex</code>, where
4140 * <code>indexMap</code> represents the returned array.
4142 * @throws IllegalStateException if this call is not preceded by a successful
4143 * call to <code>setPara</code> or <code>setLine</code>
4145 * @see #getVisualMap
4146 * @see #getVisualIndex
4147 * @see #getProcessedLength
4149 * @see #OPTION_REMOVE_CONTROLS
4150 * @see #writeReordered
4153 public int[] getLogicalMap()
4155 /* countRuns() checks successful call to setPara/setLine */
4160 return BidiLine.getLogicalMap(this);
4164 * Get a visual-to-logical index map (array) for the characters in the
4165 * <code>Bidi</code> (paragraph or line) object.
4167 * Some values in the map may be <code>MAP_NOWHERE</code> if the
4168 * corresponding text characters are Bidi marks inserted in the visual
4169 * output by the option <code>OPTION_INSERT_MARKS</code>.
4171 * When the visual output is altered by using options of
4172 * <code>writeReordered()</code> such as <code>INSERT_LRM_FOR_NUMERIC</code>,
4173 * <code>KEEP_BASE_COMBINING</code>, <code>OUTPUT_REVERSE</code>,
4174 * <code>REMOVE_BIDI_CONTROLS</code>, the logical positions returned may not
4175 * be correct. It is advised to use, when possible, reordering options
4176 * such as {@link #OPTION_INSERT_MARKS} and {@link #OPTION_REMOVE_CONTROLS}.
4178 * @return an array of <code>getResultLength()</code>
4179 * indexes which will reflect the reordering of the characters.<br><br>
4180 * The index map will result in
4181 * <code>indexMap[visualIndex]==logicalIndex</code>, where
4182 * <code>indexMap</code> represents the returned array.
4184 * @throws IllegalStateException if this call is not preceded by a successful
4185 * call to <code>setPara</code> or <code>setLine</code>
4187 * @see #getLogicalMap
4188 * @see #getLogicalIndex
4189 * @see #getResultLength
4191 * @see #OPTION_INSERT_MARKS
4192 * @see #writeReordered
4195 public int[] getVisualMap()
4197 /* countRuns() checks successful call to setPara/setLine */
4199 if (resultLength <= 0) {
4202 return BidiLine.getVisualMap(this);
4206 * This is a convenience method that does not use a <code>Bidi</code> object.
4207 * It is intended to be used for when an application has determined the levels
4208 * of objects (character sequences) and just needs to have them reordered (L2).
4209 * This is equivalent to using <code>getLogicalMap()</code> on a
4210 * <code>Bidi</code> object.
4212 * @param levels is an array of levels that have been determined by
4215 * @return an array of <code>levels.length</code>
4216 * indexes which will reflect the reordering of the characters.<p>
4217 * The index map will result in
4218 * <code>indexMap[logicalIndex]==visualIndex</code>, where
4219 * <code>indexMap</code> represents the returned array.
4223 public static int[] reorderLogical(byte[] levels)
4225 return BidiLine.reorderLogical(levels);
4229 * This is a convenience method that does not use a <code>Bidi</code> object.
4230 * It is intended to be used for when an application has determined the levels
4231 * of objects (character sequences) and just needs to have them reordered (L2).
4232 * This is equivalent to using <code>getVisualMap()</code> on a
4233 * <code>Bidi</code> object.
4235 * @param levels is an array of levels that have been determined by
4238 * @return an array of <code>levels.length</code>
4239 * indexes which will reflect the reordering of the characters.<p>
4240 * The index map will result in
4241 * <code>indexMap[visualIndex]==logicalIndex</code>, where
4242 * <code>indexMap</code> represents the returned array.
4246 public static int[] reorderVisual(byte[] levels)
4248 return BidiLine.reorderVisual(levels);
4252 * Invert an index map.
4253 * The index mapping of the argument map is inverted and returned as
4254 * an array of indexes that we will call the inverse map.
4256 * @param srcMap is an array whose elements define the original mapping
4257 * from a source array to a destination array.
4258 * Some elements of the source array may have no mapping in the
4259 * destination array. In that case, their value will be
4260 * the special value <code>MAP_NOWHERE</code>.
4261 * All elements must be >=0 or equal to <code>MAP_NOWHERE</code>.
4262 * Some elements in the source map may have a value greater than the
4263 * srcMap.length if the destination array has more elements than the
4265 * There must be no duplicate indexes (two or more elements with the
4266 * same value except <code>MAP_NOWHERE</code>).
4268 * @return an array representing the inverse map.
4269 * This array has a number of elements equal to 1 + the highest
4270 * value in <code>srcMap</code>.
4271 * For elements of the result array which have no matching elements
4272 * in the source array, the corresponding elements in the inverse
4273 * map will receive a value equal to <code>MAP_NOWHERE</code>.
4274 * If element with index i in <code>srcMap</code> has a value k different
4275 * from <code>MAP_NOWHERE</code>, this means that element i of
4276 * the source array maps to element k in the destination array.
4277 * The inverse map will have value i in its k-th element.
4278 * For all elements of the destination array which do not map to
4279 * an element in the source array, the corresponding element in the
4280 * inverse map will have a value equal to <code>MAP_NOWHERE</code>.
4285 public static int[] invertMap(int[] srcMap)
4287 if (srcMap == null) {
4290 return BidiLine.invertMap(srcMap);
4295 * Fields and methods for compatibility with java.text.bidi (Sun implementation)
4299 * Constant indicating base direction is left-to-right.
4302 public static final int DIRECTION_LEFT_TO_RIGHT = LTR;
4305 * Constant indicating base direction is right-to-left.
4308 public static final int DIRECTION_RIGHT_TO_LEFT = RTL;
4311 * Constant indicating that the base direction depends on the first strong
4312 * directional character in the text according to the Unicode Bidirectional
4313 * Algorithm. If no strong directional character is present, the base
4314 * direction is left-to-right.
4317 public static final int DIRECTION_DEFAULT_LEFT_TO_RIGHT = LEVEL_DEFAULT_LTR;
4320 * Constant indicating that the base direction depends on the first strong
4321 * directional character in the text according to the Unicode Bidirectional
4322 * Algorithm. If no strong directional character is present, the base
4323 * direction is right-to-left.
4326 public static final int DIRECTION_DEFAULT_RIGHT_TO_LEFT = LEVEL_DEFAULT_RTL;
4329 * Create Bidi from the given paragraph of text and base direction.
4331 * @param paragraph a paragraph of text
4332 * @param flags a collection of flags that control the algorithm. The
4333 * algorithm understands the flags DIRECTION_LEFT_TO_RIGHT,
4334 * DIRECTION_RIGHT_TO_LEFT, DIRECTION_DEFAULT_LEFT_TO_RIGHT, and
4335 * DIRECTION_DEFAULT_RIGHT_TO_LEFT. Other values are reserved.
4336 * @see #DIRECTION_LEFT_TO_RIGHT
4337 * @see #DIRECTION_RIGHT_TO_LEFT
4338 * @see #DIRECTION_DEFAULT_LEFT_TO_RIGHT
4339 * @see #DIRECTION_DEFAULT_RIGHT_TO_LEFT
4342 public Bidi(String paragraph, int flags)
4344 this(paragraph.toCharArray(), 0, null, 0, paragraph.length(), flags);
4347 //#if defined(FOUNDATION10)
4350 * Create Bidi from the given paragraph of text.<p>
4352 * The RUN_DIRECTION attribute in the text, if present, determines the base
4353 * direction (left-to-right or right-to-left). If not present, the base
4354 * direction is computed using the Unicode Bidirectional Algorithm,
4355 * defaulting to left-to-right if there are no strong directional characters
4356 * in the text. This attribute, if present, must be applied to all the text
4357 * in the paragraph.<p>
4359 * The BIDI_EMBEDDING attribute in the text, if present, represents
4360 * embedding level information. Negative values from -1 to -62 indicate
4361 * overrides at the absolute value of the level. Positive values from 1 to
4362 * 62 indicate embeddings. Where values are zero or not defined, the base
4363 * embedding level as determined by the base direction is assumed.<p>
4365 * The NUMERIC_SHAPING attribute in the text, if present, converts European
4366 * digits to other decimal digits before running the bidi algorithm. This
4367 * attribute, if present, must be applied to all the text in the paragraph.<p>
4369 * Note: this constructor calls setPara() internally.
4371 * @param paragraph a paragraph of text with optional character and
4372 * paragraph attribute information
4375 public Bidi(AttributedCharacterIterator paragraph)
4383 * Create Bidi from the given text, embedding, and direction information.
4384 * The embeddings array may be null. If present, the values represent
4385 * embedding level information. Negative values from -1 to -61 indicate
4386 * overrides at the absolute value of the level. Positive values from 1 to
4387 * 61 indicate embeddings. Where values are zero, the base embedding level
4388 * as determined by the base direction is assumed.<p>
4390 * Note: this constructor calls setPara() internally.
4392 * @param text an array containing the paragraph of text to process.
4393 * @param textStart the index into the text array of the start of the
4395 * @param embeddings an array containing embedding values for each character
4396 * in the paragraph. This can be null, in which case it is assumed
4397 * that there is no external embedding information.
4398 * @param embStart the index into the embedding array of the start of the
4400 * @param paragraphLength the length of the paragraph in the text and
4401 * embeddings arrays.
4402 * @param flags a collection of flags that control the algorithm. The
4403 * algorithm understands the flags DIRECTION_LEFT_TO_RIGHT,
4404 * DIRECTION_RIGHT_TO_LEFT, DIRECTION_DEFAULT_LEFT_TO_RIGHT, and
4405 * DIRECTION_DEFAULT_RIGHT_TO_LEFT. Other values are reserved.
4407 * @throws IllegalArgumentException if the values in embeddings are
4408 * not within the allowed range
4410 * @see #DIRECTION_LEFT_TO_RIGHT
4411 * @see #DIRECTION_RIGHT_TO_LEFT
4412 * @see #DIRECTION_DEFAULT_LEFT_TO_RIGHT
4413 * @see #DIRECTION_DEFAULT_RIGHT_TO_LEFT
4416 public Bidi(char[] text,
4420 int paragraphLength,
4426 case DIRECTION_LEFT_TO_RIGHT:
4430 case DIRECTION_RIGHT_TO_LEFT:
4433 case DIRECTION_DEFAULT_LEFT_TO_RIGHT:
4434 paraLvl = LEVEL_DEFAULT_LTR;
4436 case DIRECTION_DEFAULT_RIGHT_TO_LEFT:
4437 paraLvl = LEVEL_DEFAULT_RTL;
4440 byte[] paraEmbeddings;
4441 if (embeddings == null) {
4442 paraEmbeddings = null;
4444 paraEmbeddings = new byte[paragraphLength];
4446 for (int i = 0; i < paragraphLength; i++) {
4447 lev = embeddings[i + embStart];
4449 lev = (byte)((- lev) | LEVEL_OVERRIDE);
4450 } else if (lev == 0) {
4452 if (paraLvl > MAX_EXPLICIT_LEVEL) {
4456 paraEmbeddings[i] = lev;
4459 if (textStart == 0 && embStart == 0 && paragraphLength == text.length) {
4460 setPara(text, paraLvl, paraEmbeddings);
4462 char[] paraText = new char[paragraphLength];
4463 System.arraycopy(text, textStart, paraText, 0, paragraphLength);
4464 setPara(paraText, paraLvl, paraEmbeddings);
4469 * Create a Bidi object representing the bidi information on a line of text
4470 * within the paragraph represented by the current Bidi. This call is not
4471 * required if the entire paragraph fits on one line.
4473 * @param lineStart the offset from the start of the paragraph to the start
4475 * @param lineLimit the offset from the start of the paragraph to the limit
4478 * @throws IllegalStateException if this call is not preceded by a successful
4479 * call to <code>setPara</code>
4480 * @throws IllegalArgumentException if lineStart and lineLimit are not in the range
4481 * <code>0<=lineStart<lineLimit<=getProcessedLength()</code>,
4482 * or if the specified line crosses a paragraph boundary
4485 public Bidi createLineBidi(int lineStart, int lineLimit)
4487 return setLine(lineStart, lineLimit);
4491 * Return true if the line is not left-to-right or right-to-left. This means
4492 * it either has mixed runs of left-to-right and right-to-left text, or the
4493 * base direction differs from the direction of the only run of text.
4495 * @return true if the line is not left-to-right or right-to-left.
4497 * @throws IllegalStateException if this call is not preceded by a successful
4498 * call to <code>setPara</code>
4501 public boolean isMixed()
4503 return (!isLeftToRight() && !isRightToLeft());
4507 * Return true if the line is all left-to-right text and the base direction
4510 * @return true if the line is all left-to-right text and the base direction
4513 * @throws IllegalStateException if this call is not preceded by a successful
4514 * call to <code>setPara</code>
4517 public boolean isLeftToRight()
4519 return (getDirection() == LTR && (paraLevel & 1) == 0);
4523 * Return true if the line is all right-to-left text, and the base direction
4526 * @return true if the line is all right-to-left text, and the base
4527 * direction is right-to-left
4529 * @throws IllegalStateException if this call is not preceded by a successful
4530 * call to <code>setPara</code>
4533 public boolean isRightToLeft()
4535 return (getDirection() == RTL && (paraLevel & 1) == 1);
4539 * Return true if the base direction is left-to-right
4541 * @return true if the base direction is left-to-right
4543 * @throws IllegalStateException if this call is not preceded by a successful
4544 * call to <code>setPara</code> or <code>setLine</code>
4548 public boolean baseIsLeftToRight()
4550 return (getParaLevel() == LTR);
4554 * Return the base level (0 if left-to-right, 1 if right-to-left).
4556 * @return the base level
4558 * @throws IllegalStateException if this call is not preceded by a successful
4559 * call to <code>setPara</code> or <code>setLine</code>
4563 public int getBaseLevel()
4565 return getParaLevel();
4569 * Return the number of level runs.
4571 * @return the number of level runs
4573 * @throws IllegalStateException if this call is not preceded by a successful
4574 * call to <code>setPara</code> or <code>setLine</code>
4578 public int getRunCount()
4584 * Compute the logical to visual run mapping
4586 void getLogicalToVisualRunsMap()
4588 if (isGoodLogicalToVisualRunsMap) {
4591 int count = countRuns();
4592 if ((logicalToVisualRunsMap == null) ||
4593 (logicalToVisualRunsMap.length < count)) {
4594 logicalToVisualRunsMap = new int[count];
4597 long[] keys = new long[count];
4598 for (i = 0; i < count; i++) {
4599 keys[i] = ((long)(runs[i].start)<<32) + i;
4602 for (i = 0; i < count; i++) {
4603 logicalToVisualRunsMap[i] = (int)(keys[i] & 0x00000000FFFFFFFF);
4606 isGoodLogicalToVisualRunsMap = true;
4610 * Return the level of the nth logical run in this line.
4612 * @param run the index of the run, between 0 and <code>countRuns()-1</code>
4614 * @return the level of the run
4616 * @throws IllegalStateException if this call is not preceded by a successful
4617 * call to <code>setPara</code> or <code>setLine</code>
4618 * @throws IllegalArgumentException if <code>run</code> is not in
4619 * the range <code>0<=run<countRuns()</code>
4622 public int getRunLevel(int run)
4624 verifyValidParaOrLine();
4625 BidiLine.getRuns(this);
4626 verifyRange(run, 0, runCount);
4627 getLogicalToVisualRunsMap();
4628 return runs[logicalToVisualRunsMap[run]].level;
4632 * Return the index of the character at the start of the nth logical run in
4633 * this line, as an offset from the start of the line.
4635 * @param run the index of the run, between 0 and <code>countRuns()</code>
4637 * @return the start of the run
4639 * @throws IllegalStateException if this call is not preceded by a successful
4640 * call to <code>setPara</code> or <code>setLine</code>
4641 * @throws IllegalArgumentException if <code>run</code> is not in
4642 * the range <code>0<=run<countRuns()</code>
4645 public int getRunStart(int run)
4647 verifyValidParaOrLine();
4648 BidiLine.getRuns(this);
4649 verifyRange(run, 0, runCount);
4650 getLogicalToVisualRunsMap();
4651 return runs[logicalToVisualRunsMap[run]].start;
4655 * Return the index of the character past the end of the nth logical run in
4656 * this line, as an offset from the start of the line. For example, this
4657 * will return the length of the line for the last run on the line.
4659 * @param run the index of the run, between 0 and <code>countRuns()</code>
4661 * @return the limit of the run
4663 * @throws IllegalStateException if this call is not preceded by a successful
4664 * call to <code>setPara</code> or <code>setLine</code>
4665 * @throws IllegalArgumentException if <code>run</code> is not in
4666 * the range <code>0<=run<countRuns()</code>
4669 public int getRunLimit(int run)
4671 verifyValidParaOrLine();
4672 BidiLine.getRuns(this);
4673 verifyRange(run, 0, runCount);
4674 getLogicalToVisualRunsMap();
4675 int idx = logicalToVisualRunsMap[run];
4676 int len = idx == 0 ? runs[idx].limit :
4677 runs[idx].limit - runs[idx-1].limit;
4678 return runs[idx].start + len;
4682 * Return true if the specified text requires bidi analysis. If this returns
4683 * false, the text will display left-to-right. Clients can then avoid
4684 * constructing a Bidi object. Text in the Arabic Presentation Forms area of
4685 * Unicode is presumed to already be shaped and ordered for display, and so
4686 * will not cause this method to return true.
4688 * @param text the text containing the characters to test
4689 * @param start the start of the range of characters to test
4690 * @param limit the limit of the range of characters to test
4692 * @return true if the range of characters requires bidi analysis
4696 public static boolean requiresBidi(char[] text,
4700 final int RTLMask = (1 << UCharacter.DIRECTIONALITY_RIGHT_TO_LEFT |
4701 1 << UCharacter.DIRECTIONALITY_RIGHT_TO_LEFT_ARABIC |
4702 1 << UCharacter.DIRECTIONALITY_RIGHT_TO_LEFT_EMBEDDING |
4703 1 << UCharacter.DIRECTIONALITY_RIGHT_TO_LEFT_OVERRIDE |
4704 1 << UCharacter.DIRECTIONALITY_ARABIC_NUMBER);
4706 for (int i = start; i < limit; ++i) {
4707 if (((1 << UCharacter.getDirection(text[i])) & RTLMask) != 0) {
4715 * Reorder the objects in the array into visual order based on their levels.
4716 * This is a utility method to use when you have a collection of objects
4717 * representing runs of text in logical order, each run containing text at a
4718 * single level. The elements at <code>index</code> from
4719 * <code>objectStart</code> up to <code>objectStart + count</code> in the
4720 * objects array will be reordered into visual order assuming
4721 * each run of text has the level indicated by the corresponding element in
4722 * the levels array (at <code>index - objectStart + levelStart</code>).
4724 * @param levels an array representing the bidi level of each object
4725 * @param levelStart the start position in the levels array
4726 * @param objects the array of objects to be reordered into visual order
4727 * @param objectStart the start position in the objects array
4728 * @param count the number of objects to reorder
4731 public static void reorderVisually(byte[] levels,
4737 byte[] reorderLevels = new byte[count];
4738 System.arraycopy(levels, levelStart, reorderLevels, 0, count);
4739 int[] indexMap = reorderVisual(reorderLevels);
4740 Object[] temp = new Object[count];
4741 System.arraycopy(objects, objectStart, temp, 0, count);
4742 for (int i = 0; i < count; ++i) {
4743 objects[objectStart + i] = temp[indexMap[i]];
4748 * Take a <code>Bidi</code> object containing the reordering
4749 * information for a piece of text (one or more paragraphs) set by
4750 * <code>setPara()</code> or for a line of text set by <code>setLine()</code>
4751 * and return a string containing the reordered text.
4753 * <p>The text may have been aliased (only a reference was stored
4754 * without copying the contents), thus it must not have been modified
4755 * since the <code>setPara()</code> call.</p>
4757 * This method preserves the integrity of characters with multiple
4758 * code units and (optionally) combining characters.
4759 * Characters in RTL runs can be replaced by mirror-image characters
4760 * in the returned string. Note that "real" mirroring has to be done in a
4761 * rendering engine by glyph selection and that for many "mirrored"
4762 * characters there are no Unicode characters as mirror-image equivalents.
4763 * There are also options to insert or remove Bidi control
4764 * characters; see the descriptions of the return value and the
4765 * <code>options</code> parameter, and of the option bit flags.
4767 * @param options A bit set of options for the reordering that control
4768 * how the reordered text is written.
4769 * The options include mirroring the characters on a code
4770 * point basis and inserting LRM characters, which is used
4771 * especially for transforming visually stored text
4772 * to logically stored text (although this is still an
4773 * imperfect implementation of an "inverse Bidi" algorithm
4774 * because it uses the "forward Bidi" algorithm at its core).
4775 * The available options are:
4776 * <code>DO_MIRRORING</code>,
4777 * <code>INSERT_LRM_FOR_NUMERIC</code>,
4778 * <code>KEEP_BASE_COMBINING</code>,
4779 * <code>OUTPUT_REVERSE</code>,
4780 * <code>REMOVE_BIDI_CONTROLS</code>,
4781 * <code>STREAMING</code>
4783 * @return The reordered text.
4784 * If the <code>INSERT_LRM_FOR_NUMERIC</code> option is set, then
4785 * the length of the returned string could be as large as
4786 * <code>getLength()+2*countRuns()</code>.<br>
4787 * If the <code>REMOVE_BIDI_CONTROLS</code> option is set, then the
4788 * length of the returned string may be less than
4789 * <code>getLength()</code>.<br>
4790 * If none of these options is set, then the length of the returned
4791 * string will be exactly <code>getProcessedLength()</code>.
4793 * @throws IllegalStateException if this call is not preceded by a successful
4794 * call to <code>setPara</code> or <code>setLine</code>
4796 * @see #DO_MIRRORING
4797 * @see #INSERT_LRM_FOR_NUMERIC
4798 * @see #KEEP_BASE_COMBINING
4799 * @see #OUTPUT_REVERSE
4800 * @see #REMOVE_BIDI_CONTROLS
4801 * @see #OPTION_STREAMING
4802 * @see #getProcessedLength
4805 public String writeReordered(int options)
4807 verifyValidParaOrLine();
4810 return new String("");
4813 return BidiWriter.writeReordered(this, options);
4817 * Reverse a Right-To-Left run of Unicode text.
4819 * This method preserves the integrity of characters with multiple
4820 * code units and (optionally) combining characters.
4821 * Characters can be replaced by mirror-image characters
4822 * in the destination buffer. Note that "real" mirroring has
4823 * to be done in a rendering engine by glyph selection
4824 * and that for many "mirrored" characters there are no
4825 * Unicode characters as mirror-image equivalents.
4826 * There are also options to insert or remove Bidi control
4829 * This method is the implementation for reversing RTL runs as part
4830 * of <code>writeReordered()</code>. For detailed descriptions
4831 * of the parameters, see there.
4832 * Since no Bidi controls are inserted here, the output string length
4833 * will never exceed <code>src.length()</code>.
4835 * @see #writeReordered
4837 * @param src The RTL run text.
4839 * @param options A bit set of options for the reordering that control
4840 * how the reordered text is written.
4841 * See the <code>options</code> parameter in <code>writeReordered()</code>.
4843 * @return The reordered text.
4844 * If the <code>REMOVE_BIDI_CONTROLS</code> option
4845 * is set, then the length of the returned string may be less than
4846 * <code>src.length()</code>. If this option is not set,
4847 * then the length of the returned string will be exactly
4848 * <code>src.length()</code>.
4850 * @throws IllegalArgumentException if <code>src</code> is null.
4853 public static String writeReverse(String src, int options)
4855 /* error checking */
4857 throw new IllegalArgumentException();
4860 if (src.length() > 0) {
4861 return BidiWriter.writeReverse(src, options);
4864 return new String("");