Only read this section if you want the gory details about why this happens.
The expected behavior when the text is displayed is not described in detail in the XHTML/HTML specifications, but is described in
recent CSS specifications. Although the examples on this page do not use CSS, the same principles apply. The following is taken from the CSS 2.1
Working Draft:
- If 'white-space' is set to 'normal', 'nowrap', or 'pre-line',
  - every tab (U+0009) is converted to a space (U+0020)
  - any space (U+0020) following another space (U+0020) — even a space before the inline, if that space also has 'white-space' set to 'normal', 'nowrap' or 'pre-line' — is removed.
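As a rough illustration, the two rules quoted above can be sketched in Python. The function name `collapse_across_inlines` and the idea of passing each inline's text as a separate string are assumptions made for this example. The sketch also assumes that every inline has 'white-space' set to one of the three values listed, and it leaves out the rest of the CSS white space processing, such as the handling of line breaks.

```python
def collapse_across_inlines(runs):
    """Apply the two quoted rules to the text of consecutive inlines.

    `runs` is a list of strings, one per inline, in source order.
    Hypothetical helper for illustration only, not a CSS implementation.
    """
    result = []
    previous_was_space = False  # carried across inline boundaries
    for run in runs:
        kept = []
        for ch in run:
            if ch == "\t":
                ch = " "  # rule 1: every tab becomes a space
            if ch == " " and previous_was_space:
                continue  # rule 2: a space following another space is removed
            kept.append(ch)
            previous_was_space = (ch == " ")
        result.append("".join(kept))
    return result
```

For example, `collapse_across_inlines(["one\t", "  two ", " three"])` returns `["one ", "two ", "three"]`. Because the "previous character was a space" state is carried from one string to the next, a space at the start of an inline is removed when the preceding inline ended with a space.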
Given a scenario as follows (every space shown is a U+0020 character):
<ltr>A <rtl> B </rtl> C</ltr>
the spec says that the space after A is kept, the space before B is removed, the space after B is kept, and the space before C is removed.
This is then rendered according to the Unicode bidirectional algorithm, and the end result is:
A  BC
Note that there are two spaces between A and B! The embedding levels can be expressed as follows:
11221
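The reordering step can be sketched in a few lines of Python to show where the second space comes from. This is a simplification, not the Unicode bidirectional algorithm itself: it assumes, as the digits above suggest, that 1 marks characters in the left-to-right base text and 2 marks characters inside the right-to-left embedding (the Unicode algorithm uses even levels for left-to-right and odd levels for right-to-left). The function name `visual_order` and its `rtl_level` parameter are hypothetical.

```python
from itertools import groupby


def visual_order(chars, levels, rtl_level="2"):
    """Reverse each run of characters marked with the right-to-left level.

    `chars` holds the characters in logical (source) order and `levels`
    holds one level digit per character, in the article's numbering.
    """
    if len(chars) != len(levels):
        raise ValueError("need exactly one level per character")
    out = []
    # Group adjacent characters that share a level, then flip the RTL runs.
    for level, group in groupby(zip(chars, levels), key=lambda pair: pair[1]):
        run = "".join(ch for ch, _ in group)
        out.append(run[::-1] if level == rtl_level else run)
    return "".join(out)
```

After collapsing, the characters in logical order are A, space, B, space, C, so `visual_order("A B C", "11221")` returns the string A, space, space, B, C. Replacing each space with an underscore for visibility gives `A__BC`: the space that follows B in the source is at level 2, so it is displayed to the left of B, next to the space that follows A.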