FAQ: Bidi space loss

Question

Why does my browser collapse spaces between Latin and Arabic/Hebrew text?

Background

In some browsers, spaces between Latin and Arabic/Hebrew text may appear to collapse when the text inside an inline element that carries a dir attribute is followed by white space before the element's end tag.

For example, in such browsers the code:

<p dir="rtl"> العالمية <span dir="ltr">(W3C) </span> تخلق قواعد </p>

would produce a result that looks as follows, where the arrow indicates the location of the missing space:

Picture of the result, showing no space to left of Latin text.

Note that this effect also occurs when right-to-left text is embedded in a left-to-right passage.

Answer

If your page shows the behavior described in the previous section, the solution is to remove all white space before the end tag of the inline element, or to remove the dir attribute (if appropriate).

For example, removing the space between (W3C) and </span>:

<p dir="rtl"> العالمية <span dir="ltr">(W3C)</span> تخلق قواعد </p>

would produce a result that looks like:

Picture of the result, showing space on both sides of Latin text.

Note also that in this example the dir="ltr" attribute in the <span> element around the text (W3C) is not actually needed to produce the correct ordering. Leaving out the attribute or the whole span element will also solve the problem.
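For a page with many such spans, the manual fix can be approximated with a small script. The following is only a sketch under stated assumptions: the function name is hypothetical, it handles lowercase </span> end tags only, and it relies on a regular expression rather than a real HTML parser. Instead of deleting the white space, it moves it to just after the end tag, so a word boundary is never lost.

```python
import re

# White space immediately before a closing </span> tag.
# Assumption: end tags are written in lowercase, as in the samples on this page.
SPACE_BEFORE_END_TAG = re.compile(r"[ \t\r\n]+</span>")


def move_space_out_of_span(markup):
    """Move white space at the end of a span to just after its end tag."""
    return SPACE_BEFORE_END_TAG.sub("</span> ", markup)


# Latin placeholders X and Y stand in for the right-to-left text.
before = '<p dir="rtl">X <span dir="ltr">(W3C) </span> Y</p>'
print(move_space_out_of_span(before))
```

Any doubled space that results after the end tag is harmless, since the browser collapses it when the page is displayed.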

How does it look for me?

The following boxes show code samples followed by an implementation of that code on this page, so that you can test the behavior of your current user agent.

Code: <p dir="rtl"> العالمية <span dir="ltr">(W3C) </span> تخلق قواعد </p>

العالمية (W3C) تخلق قواعد

Code: <p dir="rtl"> العالمية <span dir="ltr">(W3C)</span> تخلق قواعد </p>

العالمية (W3C) تخلق قواعد

Code: <p dir="rtl"> العالمية <span>(W3C) </span> تخلق قواعد </p>

العالمية (W3C) تخلق قواعد

Code: <p dir="rtl"> العالمية <span>(W3C)</span> تخلق قواعد </p>

العالمية (W3C) تخلق قواعد

Code: <p dir="rtl"> العالمية (W3C) تخلق قواعد </p>

العالمية (W3C) تخلق قواعد

Technical detail

Only read this section if you want the gory details about why this happens.

The expected behavior when the text is displayed is not described in detail in the XHTML/HTML specifications, but is described in recent CSS specifications. Although the examples on this page do not use CSS, the same principles apply. The following is taken from the CSS 2.1 Working Draft:

  1. If 'white-space' is set to 'normal', 'nowrap', or 'pre-line',
    1. every tab (U+0009) is converted to a space (U+0020)
    2. any space (U+0020) following another space (U+0020) — even a space before the inline, if that space also has 'white-space' set to 'normal', 'nowrap' or 'pre-line' — is removed.
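The two steps quoted above can be sketched in a few lines of Python. This is an illustration rather than a browser's actual implementation: the function name is hypothetical, the paragraph is assumed to be already split into its text runs in document order, and the other white-space steps of the CSS specification (such as the handling of spaces at the start or end of a line) are not modeled.

```python
def collapse_spaces(runs):
    """Apply the two quoted steps to a paragraph's text runs, in document order."""
    collapsed = []
    previous_was_space = False  # carried across inline element boundaries
    for run in runs:
        kept = []
        for ch in run.replace("\t", " "):  # step 1: tab becomes space
            if ch == " " and previous_was_space:
                continue  # step 2: drop a space that follows another space
            kept.append(ch)
            previous_was_space = ch == " "
        collapsed.append("".join(kept))
    return collapsed


# Text runs of the first sample on this page: before, inside and after the span.
runs = [" العالمية ", "(W3C) ", " تخلق قواعد "]
for original, result in zip(runs, collapse_spaces(runs)):
    print(repr(original), "->", repr(result))
```

In this sketch the space that survives between (W3C) and the following Arabic word is the one inside the span; the leading space of the last run is the one that is removed.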

Given a scenario as follows (where every space shown is a U+0020 character):

<ltr>A <rtl> B </rtl> C</ltr>

the spec says that the space after A is kept, the space before B is removed, the space after B is kept, and the space before C is removed. The text is then rendered according to the Unicode bidirectional algorithm, and the end result is:

A  BC

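Note that there are two spaces between A and B, and none between B and C! The space kept after B belongs to the right-to-left run, so it is displayed to the left of B, next to the space kept after A. The embedding levels can be expressed as follows: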

11221
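The reordering step for this scenario can be sketched as follows. This is a deliberately simplified model, not the full Unicode bidirectional algorithm: the function name and the (text, direction) representation are assumptions made for illustration, only one level of embedding inside a left-to-right paragraph is handled, and the resolution of neutral characters and mirrored glyphs is ignored.

```python
def display_order(runs):
    """Lay out (text, direction) runs in a left-to-right paragraph.

    Characters of a right-to-left run are shown in reverse order;
    left-to-right runs are shown as stored.
    """
    visual = []
    for text, direction in runs:
        visual.append(text[::-1] if direction == "rtl" else text)
    return "".join(visual)


# The runs of the scenario above after white-space collapsing.
collapsed = [("A ", "ltr"), ("B ", "rtl"), ("C", "ltr")]
print(repr(display_order(collapsed)))  # 'A  BC'
```

Because the trailing space is stored inside the right-to-left run, reversing that run places the space before B on screen, which is why both remaining spaces end up on the same side.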

Contributed by Richard Ishida, W3C.

Content created 4 July, 2003.
Version: $Id: qa-setting-encoding-in-applications.html,v 1.3 2003/11/06 09:10:35 rishida Exp $