I18N of Debconf - A Foundation for Templates Translation
Index
- 2003-07-13: debconf version 1.3.0 (and then 1.3.1) was installed
into Debian Sid (unstable). The problem is therefore solved, except for
the Dialog frontend (when used with the dialog package) and the Gnome frontend.
- 2003-07-03: Joey Hess uploaded debconf version 1.3.0 to Debian
Sid (unstable). Because the structure of the package has changed
and it has to go through the procedure for new packages, it may take
about a week for the package to be installed into Debian Sid.
- 2003-07-03: libtext-wrapi18n-perl and libtext-charwidth-perl
were accepted into Debian. From now on you can install these packages
on a Debian Sid (unstable) system with
"apt-get update; apt-get install libtext-wrapi18n-perl".
- 2003-06-22: New versions of libtext-charwidth-perl (0.03-1) and
libtext-wrapi18n-perl (0.04) were released.
Download.
- Added test scripts, as other Perl modules have.
- Fixed the behavior when wrapping a string that ends with LF
character(s); this affected the Editor frontend of debconf.
- 2003-06-21: Joey prepared a modified debconf-1.3.0 that works
with the renamed Text::CharWidth and Text::WrapI18N modules. Read
this mail
for details.
Download.
- 2003-06-21: New versions of libtext-wrapi18n-perl and
libtext-charwidth-perl were released.
- Module names were changed from Text::Charwidth and Text::Wrapi18n
to Text::CharWidth and Text::WrapI18N, respectively.
As a result, debconf-1.3.0 needs modification.
- Fixed a bug where Text::WrapI18N ignored LF characters.
- Many other small modifications.
- 2003-06-20: Joey released a new version, debconf-1.3.0.
Please read
this mail
for details. This version makes my debconf patches obsolete.
- 2003-06-20: A new version 1.2.42.i18n.1
(DEBCONF-I18N-11.diff.gz) was released.
- Debconf::Wrap was renamed to Text::Wrapi18n and moved into a
separate package, libtext-wrapi18n-perl, following a
suggestion from Joey.
- A new package, libtext-charwidth-perl, was also released.
- 2003-06-15:
A new version 1.2.41.i18n.1 (DEBCONF-I18N-10.diff.gz) was released.
- Updated the base version.
- More comments in the source code.
- 2003-06-05:
A new version 1.2.39.i18n.1 (DEBCONF-I18N-9.diff.gz) was released.
- Updated the base version.
- Fixed a bug where the patch did not work well without the
libtext-iconv-perl package.
- The length of the underline for "Configuring <package>"
in the Readline frontend
is now calculated properly, taking multibyte/combining/fullwidth
characters into account.
- 2003-06-05:
I previously said that the whiptail package supports neither multibyte
characters nor combining/fullwidth characters; that was wrong.
The "whiptail" package can be used for the Dialog frontend.
- 2003-06-03:
If you want to use EUC-JP, EUC-KR, GB2312, or Big5 encoding with the
Dialog frontend, please don't use xterm+luit: it stops displaying
non-ASCII characters (here, Chinese, Japanese, and Korean)
after it has displayed line-drawing characters.
This is caused by an inappropriate terminfo definition. Please read
a mail from Juliusz Chroboczek for details.
- 2003-06-02: The new version of debconf cannot be used.
Since debconf versions 1.2.38 and above don't support whiptail-utf8,
you cannot use these versions of debconf to test the Dialog frontend.
We are waiting for the whiptail-utf8 package to be improved.
Read Bug#195818 and
Bug#195836
for details.
- 2003-06-02: A bug report for dialog.
There is a whiptail-utf8 package, a substitute for the whiptail package
with multibyte support enabled, but the dialog package has no such
substitute. So I filed a bug report asking for the dialog package to
become multibyte-compliant.
Read Bug#195674
for details.
- 2003-06-02: A bug.
In the Readline frontend, line folding doesn't work well on
lines where user input is required (prompt lines), because
for prompt lines the folding is done not by debconf but by
libreadline4. libreadline4 must be improved.
Read Bug#195678
for details.
- 2003-06-01: Notes.
When using the Dialog frontend in East Asian languages, please use the
whiptail-utf8 package instead of the dialog package. If both are
installed, whiptail-utf8 will be used. You will also
need libtext-iconv-perl for translated templates to be displayed.
The liblocale-gettext-perl package is not required.
Example screenshots with the dialog package:
Japanese-EUC-JP,
Japanese-UTF-8,
Russian-KOI8-R, and
Russian-UTF-8.
It seems to work well with 8-bit character encodings.
- 2003-06-01:
A patch for debconf-1.2.37 (DEBCONF-I18n-8.diff.gz)
(a mail to the Debian-JP Development Mailing List, in Japanese)
- The line-folding algorithm was slightly changed: each Japanese or
Chinese character is now treated as one word. This fixes a problem
that occurred when an English word came at the end of a Japanese
(or Chinese) line.
- The character extraction algorithm was extended. The JIS X 0201
and JIS X 0212 character sets in EUC-JP and
ISO-8859-11/TIS-620 (Thai) are now supported. Also, double-byte
characters in EUC-KR are now excluded from line-foldable characters.
Japanese/Chinese punctuation such as "、" is now line-foldable.
Malformed UTF-8 sequences can now be ignored.
- Debconf::Wrap is now also used instead of Text::Wrap for the Editor
frontend.
- 2003-06-01:
A patch for debconf-1.2.37 (bugfix version)
(a mail to the Debian-JP Development Mailing List, in Japanese)
It fixes a bug where Debconf stopped when a non-ASCII,
non-fullwidth character was used in UTF-8.
- 2003-06-01: Sample screenshots of the patch:
Dialog-Japanese-UTF-8 and
Dialog-Russian-UTF-8
- 2003-05-29:
A patch for debconf-1.2.37
(a mail to the Debian-JP Development Mailing List, in Japanese)
A patch that replaces the Text::Wrap module with a new Debconf::Wrap module.
Unlike Text::Wrap, this module supports (1) multibyte characters, including
UTF-8, EUC-JP, EUC-KR, GB2312, and Big5, (2) characters whose on-screen
width is not 1, i.e., combining characters (width 0) and fullwidth
characters (width 2), and (3) languages that don't use whitespace
between words (Chinese and Japanese).
debconf-1.3.0 (and newer versions) is available in Debian Sid (unstable).
You will be asked to install either debconf-i18n or debconf-english;
please install debconf-i18n.
The libtext-charwidth-perl and libtext-wrapi18n-perl packages are also
available in Debian Sid (unstable). Since the libtext-wrapi18n-perl package
Depends: on the libtext-charwidth-perl package, the following commands
are enough to install both packages:
apt-get update; apt-get install libtext-wrapi18n-perl
My patch for debconf is no longer needed.
If you are interested, the last version
(1.2.42.i18n.1, 2003-06-20) is available here.
Here I will explain why this improvement is needed.
Debconf is the standard configuration tool for Debian packages.
It supports several "frontends". Some of them have problems
with line folding around multibyte characters (as in UTF-8
and East Asian legacy encodings). Some of them simply cannot
display Japanese characters at all.
Roughly, the line-folding problems occur for the following reasons:
- Debconf uses Text::Wrap, a simple standard Perl module for line
folding, which doesn't support multibyte characters.
As a result, a line may be folded between the 1st and the
2nd bytes of a multibyte character. It also makes Debconf
miscount the number of characters and fold lines at
inappropriate points.
- Text::Wrap assumes that every character occupies one column
on screen, which is not true for East Asian multicolumn
(doublewidth, fullwidth) characters (which occupy two columns on screen)
or combining characters (which occupy zero columns on screen).
This leads to miscalculated visual widths of characters and lines,
and therefore to wrong folding points.
- Text::Wrap doesn't support languages that don't use
whitespace between words (Chinese and Japanese). Since Debconf
cannot find whitespace at an appropriate position to fold the
line, the result is ugly line folding.
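To make the first two points concrete, here is a small sketch in Python (used only for illustration; Debconf itself is Perl). It cuts UTF-8 bytes at a fixed byte count, which is effectively what a multibyte-unaware wrapper does, and compares byte, character, and column counts. `naive_byte_wrap` and `display_width` are hypothetical helpers, not part of Text::Wrap; `display_width` approximates wcwidth() with Python's `unicodedata` module.

```python
import unicodedata

def naive_byte_wrap(text, width, encoding="utf-8"):
    """Cut the encoded text every `width` bytes, ignoring character boundaries."""
    data = text.encode(encoding)
    chunks = [data[i:i + width] for i in range(0, len(data), width)]
    # errors="replace" turns the split fragments into U+FFFD so they stay visible
    return [chunk.decode(encoding, errors="replace") for chunk in chunks]

def display_width(text):
    """Approximate on-screen columns: 0 for combining marks, 2 for wide/fullwidth, else 1."""
    total = 0
    for ch in text:
        if unicodedata.combining(ch):
            continue
        total += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return total

text = "日本語のテキスト"
print(len(text.encode("utf-8")), len(text), display_width(text))  # 24 8 16
for line in naive_byte_wrap(text, 10):
    print(repr(line))  # the first line ends with U+FFFD: の was cut after its 1st byte
```

The same 8 characters count as 24 bytes but occupy 16 columns, so neither the byte count nor the character count gives the width that matters for folding.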
Note that this is probably not a problem for languages that
use few non-ASCII characters (all of which are multibyte
in UTF-8) and put whitespace between
words (such as German and French),
because (1) the miscalculation of visible width and number
of characters is almost negligible, and (2) since lines are almost
always folded at whitespace, multibyte characters (all non-ASCII
characters in UTF-8) are rarely split by a newline.
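Point (1) can be checked numerically. This small Python snippet (the helper `byte_overhead` and both sample sentences are made up for illustration) compares character count with UTF-8 byte count. For the German sentence, a byte-counting wrapper overestimates the width by only a few columns; for Russian, whose Cyrillic letters take 2 bytes each, the error approaches a factor of two.

```python
def byte_overhead(text):
    """Return (characters, UTF-8 bytes) for a string."""
    return len(text), len(text.encode("utf-8"))

samples = {
    "German": "Möchten Sie die Größe ändern?",
    "Russian": "Изменить размер файла?",
}
for language, text in samples.items():
    chars, nbytes = byte_overhead(text)
    # A byte-counting wrapper believes the text is `nbytes` columns wide,
    # while it really occupies `chars` columns (no wide or combining characters here).
    print(f"{language}: {chars} characters, {nbytes} bytes, ratio {nbytes / chars:.2f}")
```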
The following sections show examples of what breaks
when this improvement is not applied. I used debconf 1.2.36 to
take these screenshots.
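In the "ja_JP.eucJP" locale, it works well, because in EUC-JP a
Japanese character is (apart from the rarely used halfwidth katakana)
2 bytes long and occupies 2 columns, so counting bytes happens to give
the right width. Note, however, that Japanese translators of Debconf
templates have to insert whitespace at appropriate positions in
Japanese text so that Debconf can fold lines properly,
and this is exactly what the Japanese translators have done.
This workaround doesn't work on terminals that are not
80 columns wide (non-standard terminals).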
In the "ja_JP.UTF-8" locale, the visible width calculation doesn't
work. Since most Japanese characters are 3 bytes long
in UTF-8 but occupy 2 columns, most lines are folded at
about 2/3 of the true line width. A similar problem
would occur not only with East Asian languages but also with many
languages written in non-Latin scripts, such as Russian and Greek,
whose letters are 2 bytes long in UTF-8 but occupy 1 column.
Note that some lines are folded at a much shorter position.
This happens where whitespace appears in the text: since
Japanese sentences rarely contain whitespace, Debconf (or Text::Wrap)
tends to fold lines at those rare whitespace positions.
Moreover, some multibyte characters are broken
by being split between the end of one line and the start of the next.
Such characters are simply lost. In this case, the
fragment bytes of the broken multibyte characters are shown as
question marks.
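The "about 2/3" figure follows from simple arithmetic. This hypothetical Python helper (`fold_position` is an illustrative name) assumes a line made entirely of same-sized characters and an 80-column limit counted in bytes; the actual width limit differs between frontends.

```python
def fold_position(terminal_columns, bytes_per_char, columns_per_char):
    """Where a byte-counting wrapper folds a line of uniform characters.

    Returns (characters per line, columns actually used on screen).
    """
    chars_per_line = terminal_columns // bytes_per_char  # characters that fit the byte budget
    return chars_per_line, chars_per_line * columns_per_char

# Japanese in UTF-8: 3 bytes per character, 2 columns on screen.
chars, used = fold_position(80, 3, 2)
print(chars, used, used / 80)  # 26 52 0.65, i.e. roughly 2/3 of the line
```

With EUC-JP (2 bytes, 2 columns) the same call gives (40, 80), a full line, which is why the "ja_JP.eucJP" case works; with Cyrillic in UTF-8 (2 bytes, 1 column) it gives (40, 40), about half the line.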
In the "ja_JP.eucJP" locale, it works well. The notes above for the Dialog
frontend in the "ja_JP.eucJP" locale apply here as well.
In the "ja_JP.UTF-8" locale, it doesn't work well, just like
the Dialog frontend. Note that the fragments of split multibyte characters
are not displayed at all, not even as question marks (at the positions
marked by red arrows). This behavior
(the reaction to malformed byte sequences) depends
on the terminal emulator.
In the "ja_JP.eucJP" locale, it works well.
In the "ja_JP.UTF-8" locale, it is clearly not
usable at all. This is caused by improper font selection:
although GTK is capable of displaying Japanese,
the default font setting for the dialog panel cannot display Japanese.
This is probably because the GTK developers don't test their
default settings well.
These are the error messages that appear when the Gnome frontend
is invoked in the "ja_JP.UTF-8" locale. These messages are the
basis for my guess, given above, about the cause of the Gnome
frontend problem.
I wrote the Debconf::Wrap module, a substitute for Text::Wrap, which:
- can handle multibyte characters:
this is the basis for the following points.
- can handle the visible width (number of columns) of each character:
so that Debconf can calculate appropriate positions for line folding.
- can handle languages that use little whitespace:
so that Debconf doesn't have to fold lines at the rare whitespace
that may happen to appear at positions inappropriate for
line folding in Chinese and Japanese text.
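The three properties above can be illustrated with a short Python sketch. It is not the Debconf::Wrap code (which is Perl), and `char_width`, `tokenize`, and `wrap` are hypothetical names. Widths come from Python's `unicodedata` module as a stand-in for wcwidth(). In line with the 2003-06-01 change noted earlier, each wide (CJK) character is its own breakable unit, while runs of narrow characters stay together as words. Words longer than a line are not split, and Japanese line-start/line-end punctuation rules are ignored.

```python
import unicodedata

def char_width(ch):
    """Approximate wcwidth(): 0 for combining marks, 2 for wide/fullwidth, else 1."""
    if unicodedata.combining(ch):
        return 0
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1

def tokenize(text):
    """Split text into breakable units.

    Each space and each wide (CJK) character is a unit of its own, so a line
    may be folded between any two CJK characters; runs of other characters
    form one word, as in English. Zero-width marks stay with the unit before them.
    """
    tokens, word = [], ""
    for ch in text:
        if char_width(ch) == 0 and not word and tokens:
            tokens[-1] += ch
        elif ch == " " or char_width(ch) == 2:
            if word:
                tokens.append(word)
                word = ""
            tokens.append(ch)
        else:
            word += ch
    if word:
        tokens.append(word)
    return tokens

def wrap(text, columns):
    """Greedy line folding by display width (columns), not by bytes or characters."""
    lines, line, used = [], "", 0
    for tok in tokenize(text):
        w = sum(char_width(c) for c in tok)
        if used + w > columns and line:
            lines.append(line.rstrip(" "))
            line, used = "", 0
            if tok == " ":
                continue  # a space at the fold point is dropped
        line += tok
        used += w
    if line:
        lines.append(line.rstrip(" "))
    return lines

print(wrap("Debconfは設定ツール", 10))  # ['Debconfは', '設定ツール']
```

Because the English word "Debconf" is a single unit while each Japanese character is its own unit, the fold lands right after "は" even though the text contains no whitespace.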
Perl 5.8's standard 'Encode' module should not be used, so that
Debconf works even during the installation process, when the machine
doesn't yet have the 'perl' package. In other words, Debconf should work
at least minimally with only the 'perl-base' package. Also, since wcwidth()
is not available from Perl, it must be implemented.
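Such a wcwidth() can be written with a handful of Unicode ranges instead of a per-character table. The Python sketch below shows the shape of it; it is not code from Debconf::Wrap. The wide ranges roughly follow those in Markus Kuhn's wcwidth.c (supplementary planes omitted), and the combining list is a tiny illustrative subset that would need to be completed for real use.

```python
# Hand-picked range lists (an assumption for illustration).
COMBINING = [
    (0x0300, 0x036F),  # Combining Diacritical Marks
    (0x3099, 0x309A),  # combining kana voiced / semi-voiced sound marks
]
WIDE = [
    (0x1100, 0x115F),  # Hangul Jamo initial consonants
    (0x2E80, 0x303E),  # CJK radicals ... CJK symbols and punctuation
    (0x3040, 0xA4CF),  # kana, CJK ideographs, Yi
    (0xAC00, 0xD7A3),  # Hangul syllables
    (0xF900, 0xFAFF),  # CJK compatibility ideographs
    (0xFE30, 0xFE6F),  # CJK compatibility forms
    (0xFF00, 0xFF60),  # fullwidth forms
    (0xFFE0, 0xFFE6),  # fullwidth signs
]

def in_ranges(cp, ranges):
    return any(lo <= cp <= hi for lo, hi in ranges)

def wcwidth(ch):
    """Columns occupied by one character; -1 for control characters."""
    cp = ord(ch)
    if cp == 0:
        return 0
    if cp < 0x20 or 0x7F <= cp < 0xA0:
        return -1
    if in_ranges(cp, COMBINING):  # checked first: U+3099/U+309A lie inside a wide range
        return 0
    if in_ranges(cp, WIDE):
        return 2
    return 1

def wcswidth(text):
    """Total columns for a string, or -1 if it contains a control character."""
    total = 0
    for ch in text:
        w = wcwidth(ch)
        if w < 0:
            return -1
        total += w
    return total

print(wcswidth("日本語"), wcswidth("e\u0301"), wcwidth("\uff71"))  # 6 1 1
```

Halfwidth katakana such as U+FF71 fall outside the fullwidth ranges and get width 1, which matches how terminals display them.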
It should also work with various user encodings. One possible solution
is to convert all text from/to UTF-8 so that all internal text handling
is done in UTF-8. In that case, however, we have to make sure that
Debconf doesn't discriminate against any language. It sometimes happens that
"support for East Asian languages is dropped because it needs
large character mapping tables", but Debconf must not discriminate
in that way. (Note that my patch doesn't contain any large
character mapping table.)
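One way to handle a legacy encoding without any mapping table, in the spirit of the patch's "character extraction" for EUC-JP mentioned in the 2003-06-01 notes, is to derive character boundaries and column widths from the byte structure alone. The Python sketch below (hypothetical function `split_eucjp`, not code from the patch) does this for EUC-JP, including the JIS X 0201 and JIS X 0212 sets; Python's `euc_jp` codec appears only in the usage lines, to build sample bytes.

```python
def split_eucjp(data):
    """Yield (character_bytes, columns) for each character of an EUC-JP byte string.

    The EUC-JP byte structure alone determines character boundaries and widths:
      0x00-0x7F             : ASCII, 1 byte, 1 column
      0x8E (SS2) + 1 byte   : JIS X 0201 halfwidth katakana, 2 bytes, 1 column
      0x8F (SS3) + 2 bytes  : JIS X 0212, 3 bytes, 2 columns
      0xA1-0xFE + 1 byte    : JIS X 0208, 2 bytes, 2 columns
    Trail bytes must be in 0xA1-0xFE; a malformed or truncated sequence is
    yielded one byte at a time with an assumed width of 1.
    """
    i, n = 0, len(data)
    while i < n:
        lead = data[i]
        if lead < 0x80:
            size, width = 1, 1
        elif lead == 0x8E:
            size, width = 2, 1
        elif lead == 0x8F:
            size, width = 3, 2
        elif 0xA1 <= lead <= 0xFE:
            size, width = 2, 2
        else:
            size, width = 1, 1  # stray byte outside the EUC-JP structure
        trail = data[i + 1:i + size]
        if len(trail) != size - 1 or any(not 0xA1 <= b <= 0xFE for b in trail):
            size, width = 1, 1  # truncated or malformed: emit the lead byte alone
        yield data[i:i + size], width
        i += size

# Usage: the euc_jp codec is used here only to build sample bytes.
sample = "日本語ｱA".encode("euc_jp")
print([(chunk.hex(), width) for chunk, width in split_eucjp(sample)])
print(sum(width for _, width in split_eucjp(sample)))  # 8
```

Since no table is involved, the same approach treats all languages sharing an encoding alike; only the byte layout of each encoding has to be known.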
The above is exactly what my patches do. You can test them.
For the Gnome frontend, I expect that using Gnome2 instead of Gnome1
would solve the problem, because Gnome2 is based on Pango.
However, my patch doesn't support this yet; that needs a Gnome2 module
for Perl.
久保田 智広 Tomohiro KUBOTA
<debian at tmail dot plala dot or dot jp>