I18N of Debconf - A Foundation for Templates Translation

Download of the newest version

debconf-1.3.0 (or a newer version) is included in Debian Sid (unstable). You will be asked to install either debconf-i18n or debconf-english. Please install debconf-i18n.

The libtext-charwidth-perl and libtext-wrapi18n-perl packages are included in Debian Sid (unstable). Since the libtext-wrapi18n-perl package depends on the libtext-charwidth-perl package, the following procedure is enough to install both packages: apt-get update; apt-get install libtext-wrapi18n-perl.

My patch for debconf is no longer needed. If you are interested, the last version (1.2.42.i18n.1, 2003-06-20) is available here.


Introduction - Why Is This Improvement Needed?

Here I will explain why this improvement is needed.

Debconf is a standard configuration tool for Debian packages. It supports several "frontends". Some of them have problems with line folding around multibyte characters (as found in Unicode and East Asian legacy encodings). Some of them simply cannot display Japanese characters at all.

Roughly, the line-folding problem occurs for the following reasons:

  1. Debconf uses Text::Wrap, a simple standard Perl module for line folding, which doesn't support multibyte characters. As a result, a line may be folded between the 1st and the 2nd bytes of a multibyte character. It also causes Debconf to miscount the number of characters and to fold lines at inappropriate points.
  2. Text::Wrap assumes that every character occupies one column on screen. This is not true for East Asian multicolumn (doublewidth, fullwidth) characters, which occupy two columns, or for combining characters, which occupy zero columns. The visual width of characters and lines, and therefore the folding points, may be miscalculated.
  3. Text::Wrap doesn't support languages which don't use whitespace between words (Chinese and Japanese). Since Debconf cannot find whitespace to use as a folding point at an appropriate position, the result is ugly line folding.
Note that this is probably not a problem for languages which use only a small number of non-ASCII characters (all of which are multibyte in UTF-8) and put whitespace between words (such as German and French), because (1) the miscalculation of visible width and character count is almost negligible, and (2) since lines are almost always folded at whitespace, multibyte characters are rarely split by a newline character.
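
The three points above can be illustrated with a small sketch. It is written in Python rather than Perl and is not the actual Debconf or Text::Wrap code. The names char_width, text_width, split_units and fold are hypothetical, the width rule (two columns for wide and fullwidth characters, zero for combining characters) is an approximation taken from Python's unicodedata module, and no Japanese line-breaking prohibition rules are applied.

```python
import unicodedata


def char_width(ch):
    """Approximate number of terminal columns occupied by one character."""
    if unicodedata.combining(ch):
        return 0  # combining marks are drawn over the previous character
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2  # wide and fullwidth East Asian characters
    return 1


def text_width(text):
    """Approximate number of terminal columns occupied by a string."""
    return sum(char_width(ch) for ch in text)


def split_units(text):
    """Split text into units that must not be broken internally.

    A unit is a single space, a single wide character, or a run of
    other characters (a "word"). A break is allowed between any two units,
    so text without spaces between words can still be folded.
    """
    units, word = [], ""
    for ch in text:
        if ch == " " or char_width(ch) == 2:
            if word:
                units.append(word)
                word = ""
            units.append(ch)
        else:
            word += ch
    if word:
        units.append(word)
    return units


def fold(text, columns):
    """Greedily fold text into lines of at most `columns` columns.

    A single word wider than `columns` is left on its own overlong line.
    """
    lines, line = [], ""
    for unit in split_units(text):
        if line and text_width(line) + text_width(unit) > columns:
            lines.append(line.rstrip(" "))
            line = ""
            if unit == " ":
                continue  # do not start a line with the space we broke at
        line += unit
    if line:
        lines.append(line.rstrip(" "))
    return lines


for folded in fold("日本語のテキスト", 6):
    print(folded)
```

Because the sketch works on characters and columns instead of bytes, a line is never folded inside a multibyte character, and a line of wide characters is filled to the real terminal width.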

The following sections show examples of what goes wrong when this improvement is not applied. I used debconf 1.2.36 to take these screenshots.


Screenshots of Buggy Situations - Dialog Frontend

dialog frontend in ja_JP.eucJP locale
In "ja_JP.eucJP" locale. It works well. Note that Japanese translators of Debconf templates have to insert whitespace at appropriate positions in Japanese text so that Debconf can fold lines properly, and this is exactly what Japanese translators have done. Note that this workaround doesn't work on non-80-column (non-standard) terminals.

dialog frontend in ja_JP.UTF-8 locale
In "ja_JP.UTF-8" locale. Visible width calculation doesn't work well. Since most Japanese characters are 3 bytes in UTF-8 and occupy 2 columns, most lines are folded at about 2/3 of the true line width. A similar problem would occur not only for East Asian languages but also for many non-Latin-script languages such as Russian and Greek.
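
The 2/3 figure can be checked with a little arithmetic. The following Python sketch is only an illustration, not Debconf code: fill_by_bytes is a hypothetical stand-in for a folder that mistakes the byte count for the column count, the 80-column terminal is an assumption, and columns uses the same approximate width rule as above (two columns for wide and fullwidth characters, one otherwise).

```python
import unicodedata


def columns(text):
    """Columns on screen: 2 for wide/fullwidth characters, otherwise 1."""
    return sum(2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
               for ch in text)


def fill_by_bytes(text, limit):
    """Longest prefix of `text` whose UTF-8 length is at most `limit` bytes.

    This mimics a folder that treats every byte as one column.
    """
    line = ""
    for ch in text:
        if len((line + ch).encode("utf-8")) > limit:
            break
        line += ch
    return line


TERMINAL_COLUMNS = 80  # assumed terminal width

japanese = fill_by_bytes("あ" * 100, TERMINAL_COLUMNS)  # 3 bytes, 2 columns each
russian = fill_by_bytes("я" * 100, TERMINAL_COLUMNS)    # 2 bytes, 1 column each

print(len(japanese), columns(japanese))  # 26 characters, 52 columns
print(len(russian), columns(russian))    # 40 characters, 40 columns
```

With these assumptions a Japanese line stops at 52 of 80 columns (about 2/3), and a Russian line stops at 40 of 80 columns (about 1/2).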

Note that some lines are folded at a much shorter position. This occurs when whitespace appears in the text. Since Japanese sentences rarely use whitespace, Debconf (or Text::Wrap) tends to fold lines at these rare whitespace positions.

Moreover, some multibyte characters are broken by being divided between the end of one line and the beginning of the next. Such characters are simply lost. In this case, the fragment bytes of the broken multibyte characters are shown as question marks.
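
The loss of a divided character can be reproduced outside Debconf. The Python sketch below cuts a UTF-8 byte string in the middle of a character, the way a byte-oriented folder might. The cut position is chosen by hand for illustration, and the replacement character U+FFFD printed by Python plays the role of the question marks in the screenshot.

```python
data = "日本語".encode("utf-8")  # 9 bytes, 3 bytes per character

# Cut after the 4th byte, that is, inside the second character.
head, tail = data[:4], data[4:]

# Neither half is valid UTF-8 any more.
try:
    head.decode("utf-8")
    head_is_valid = True
except UnicodeDecodeError:
    head_is_valid = False

# Decode leniently, as a terminal has to: broken fragments become U+FFFD.
head_text = head.decode("utf-8", errors="replace")
tail_text = tail.decode("utf-8", errors="replace")

print(head_is_valid)
print(head_text)
print(tail_text)
```

The second character does not appear in either half. Only its fragments remain, and how they are displayed depends on the program that renders them.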


Screenshots of Buggy Situations - Readline Frontend

readline frontend in ja_JP.eucJP locale
In "ja_JP.eucJP" locale. It works well. The notes for the Dialog frontend in "ja_JP.eucJP" locale apply here as well.

readline frontend in ja_JP.UTF-8 locale
In "ja_JP.UTF-8" locale. It doesn't work well, just like the Dialog frontend. Note that fragments of divided multibyte characters are not displayed even as question marks (see the positions of the red arrows). This behavior (the reaction to malformed byte sequences) depends on the terminal emulator.


Screenshots of Buggy Situations - Gnome Frontend

gnome frontend in ja_JP.eucJP locale
In "ja_JP.eucJP" locale. It works well.

gnome frontend in ja_JP.UTF-8 locale
In "ja_JP.UTF-8" locale. It is clear that this is not usable at all. This is because of improper font selection. Though GTK is capable of displaying Japanese, the default font setting for the dialog panel drops Japanese. This is probably because GTK developers don't test their default settings well.

error when gnome frontend is invoked in ja_JP.UTF-8 locale
These are the error messages which appear when the Gnome frontend is invoked in "ja_JP.UTF-8" locale. These messages are why I guessed the cause of the Gnome frontend problem as described above.


Solution

I wrote the Debconf::Wrap module, a substitute for Text::Wrap, which:

Perl 5.8's standard 'Encode' module should not be used, so that Debconf works well even during installation, when the machine doesn't have the 'perl' package. In other words, Debconf should work minimally well with only the 'perl-base' package. Also, since wcwidth() is not available from Perl, it must be implemented.

Also, it should work for various user encodings. One possible solution is to convert every text from/to UTF-8 so that all internal text handling is done in UTF-8. However, in this case, we have to be sure that Debconf doesn't discriminate between languages. It sometimes happens that "support for East Asian languages is eliminated because it needs a large character mapping table", but Debconf must not discriminate in such a way. (Note that my patch doesn't contain any large character mapping table.)
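
The "convert from/to UTF-8" approach mentioned above can be sketched as follows. This is a Python illustration, not what the patch does: handle_in_unicode is a hypothetical helper, Python's str type stands in for the internal UTF-8 representation, and the euc_jp codec shipped with Python supplies exactly the kind of character mapping table that the paragraph above is concerned about.

```python
def handle_in_unicode(raw, user_encoding, transform):
    """Decode bytes from the user's encoding, transform the text, encode back."""
    text = raw.decode(user_encoding)      # user encoding -> internal text
    result = transform(text)              # all text handling happens here
    return result.encode(user_encoding)   # internal text -> user encoding


raw_eucjp = "日本語".encode("euc_jp")      # 6 bytes: 2 bytes per character

# Take the first two characters. Slicing the raw bytes at offset 2 would
# give only one character, and at an odd offset it would break a character.
first_two = handle_in_unicode(raw_eucjp, "euc_jp", lambda t: t[:2])

print(len(raw_eucjp), first_two.decode("euc_jp"))
```

The design cost is visible here: every supported user encoding needs a conversion table somewhere, which is why this approach must not become a reason to drop East Asian encodings.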

The above description is exactly what my patch does. You can test it.

For the Gnome frontend, I expect that using Gnome2 instead of Gnome1 would solve this problem, because Gnome2 is based on Pango. However, my patch doesn't support this yet. I need a Gnome2 module for Perl.


久保田 智広 Tomohiro KUBOTA <debian at tmail dot plala dot or dot jp>