
XTerm

I am working on internationalization (i18n) improvements to XTerm, which is included in the XFree86 distribution and is the most widely used terminal emulator for the X Window System.

"Locale" is a mechanism of ISO C and UNIX for consistently specifying the behavior of software that depends on the languages, customs, and cultures of the world. For example, error messages should be shown in the language that the locale specifies. There are several locale categories, each of which relates to a particular cultural convention or software feature. The LC_CTYPE category is the one that specifies the encoding to be used for all stream I/O. Thus, software must output messages in ISO-8859-1 in an ISO-8859-1 locale, in EUC-JP in an EUC-JP locale, in UTF-8 in a UTF-8 locale, in KOI8-R in a KOI8-R locale, and so on. Note that UTF-8 support should also be implemented within the locale framework; that is, software should use UTF-8 if the LC_CTYPE locale calls for it. In other words, all that users who want UTF-8 should have to do is set the LANG variable to *.UTF-8.

Software that uses an encoding different from the one specified by the current locale should be regarded as buggy. Thus, my work here should be regarded as bug fixes rather than improvements.

There is a lot of properly internationalized software that obeys the LC_CTYPE locale for all stream I/O. For such well-behaved software to work properly, terminal emulators have to display correctly the messages that the software sends to the terminal. This is why terminal emulators have to obey the LC_CTYPE locale to determine the encoding to be used.
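As an illustration only, the Python sketch below shows how the encoding requested by the user can be read off the environment. It follows the usual POSIX precedence (LC_ALL, then LC_CTYPE, then LANG) and the conventional locale name layout language[_territory][.codeset][@modifier]. The function names are hypothetical, and a real program should call setlocale(LC_CTYPE, "") and nl_langinfo(CODESET) rather than parse names itself, since many locale names carry no explicit codeset.

```python
def ctype_locale_name(env):
    """Return the locale name that governs LC_CTYPE.

    `env` is a mapping such as os.environ. Unset or empty variables are
    skipped; the fallback is the "C" locale.
    """
    for var in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(var)
        if value:
            return value
    return "C"


def codeset_from_locale_name(name):
    """Extract the codeset from a name like 'ja_JP.EUC-JP' or 'de_DE.ISO-8859-15@euro'.

    Returns None when the name has no explicit codeset (for example 'C'
    or 'ja_JP'); in that case only the system's locale database knows it.
    """
    base = name.split("@", 1)[0]      # drop an optional @modifier
    if "." not in base:
        return None
    return base.split(".", 1)[1]
```

For example, with LANG set to ja_JP.EUC-JP and nothing else set, the first function returns "ja_JP.EUC-JP" and the second extracts "EUC-JP", which is the encoding a terminal emulator would be expected to display.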

The following work mainly concerns this point.

(2002-09-15) Though internationalization (i.e. LC_CTYPE locale sensitivity) was almost finished with the 2002-08-17 patch, automatic font selection was not implemented. That is, when XTerm automatically uses UTF-8 mode (the locale-sensitive mode that uses luit also uses UTF-8 mode internally), *-iso10646-1 fonts should be used automatically instead of 8-bit fonts. However, since XTerm could not hold separate configurations for the conventional 8-bit mode and UTF-8 mode, such automation was impossible.
Thus I wrote this patch. It lets XTerm have separate font settings for 8-bit mode and UTF-8 mode. The fonts for UTF-8 mode are used automatically when XTerm runs in UTF-8 mode.
download: XTerm (cvs 20020817), font patch.
(2002-08-17) My 2002-07-18 patch was integrated into the XFree86 CVS repository. Now you can use locale sensitivity without any of my patches.
(2002-07-18) I submitted a patch for xterm to patch@xfree86 and received sequence number 5328. Now all I have to do is wait for this patch to be processed. Then we will be able to use various encodings in XTerm! As luit improves, XTerm will support more encodings. (For example, TCVN, GBK, and Shift_JIS will be supported with the 2002-07-04 patch.)
To try this patch, you also need to pass --enable-wide-chars to ./configure. This is not needed if you use xmkmf.
Please use the *-iso10646 fonts distributed with XFree86. If you use -misc-fixed-medium-r-semicondensed--13-120-75-75-c-60-iso10646-1, XTerm will automatically detect -misc-fixed-medium-r-normal-ja-13-120-75-75-c-120-iso10646-1 for East Asian doublewidth characters.
download: XTerm (cvs 20020513), patch.
(2002-07-04) A new patch for luit was posted to the XFree86 i18n mailing list. It supports GBK, Shift_JIS, and UTF-8.
download: luit (cvs 20020701) (required), UTF-8 patch (required), fontenc patch (for XFree86 4.1 or before).

Old Information

(2002-06-07) Based on the 2002-06-05 improvement to luit, I sent a mail with an XTerm patch to the XFree86 i18n mailing list and the linux-utf8 mailing list. I will send the patch to the formal XFree86 patch submission address (patch@xfree86) after discussion on these mailing lists.
download: the patch.
(2002-05-14) Updated the patch for XTerm to call luit to support various encodings (such as ISO-8859-2,3,11,15, KOI8-R, EUC-JP, BIG5, etc). This patch is an update of my 2002-02-03 patch to catch up with upstream XTerm.
In an EUC-JP locale (and other East Asian locales), it has the problem that many characters (mainly symbols) which should be doublewidth are displayed as singlewidth. To try locale sensitivity, you will need the resource setting "XTerm*Locale: true". You will also need to pass --enable-wide-chars to ./configure.
download: XTerm (cvs 20020513), patch to invoke luit.
(2002-06-05) luit in the XFree86 CVS tree was improved, and the "misc bugfix/improvement patch" from 2002-05-13 is no longer needed.
download: luit (cvs 20020606) (required), fontenc patch (for XFree86 4.1 or before).
(2002-05-13) luit in the XFree86 CVS tree is newer than version 0.8.1, which can be downloaded from Juliusz's page. However, you will need the "misc bugfix/improvement patch". This patch is needed to use luit from XTerm, because it solves a problem with the "--" command-line option.
This patch is the result of a discussion on the XFree86 i18n mailing list in Feb 2002, and here is a summary. (I resubmitted the patch with sequence number #5279.)
Note that the CVS version of luit requires XFree86 4.2 or later to compile. If you have an earlier version of XFree86, you can use the "fontenc patch".
download: luit (cvs 20020513) (required), misc bugfix/improvement patch (required), fontenc patch (for XFree86 4.1 or before).
I submitted a patch that makes xterm obey the LC_CTYPE locale to determine the encoding, by calling luit internally. To try this feature, you have to configure a Unicode font and set the "XTerm*Locale: true" resource. You also have to pass the --enable-wide-chars option to ./configure.
XFree86 4.2 was released on 20 Jan 2002. It includes luit, a small program that converts between UTF-8 and various other encodings. Note: you need XFree86 4.2 to compile the luit that is included in XFree86 4.2 (or CVS). On the other hand, you can download luit from the above page written by Juliusz Chroboczek, and that version can be compiled on its own. XFree86 4.2 also includes many bug fixes related to multibyte encodings.
XTerm-158 was released on 8 Sep 2001. This version supports the OverTheSpot preedit type of XIM. The XFree86 CVS version fixed a bug in the XIM handling of XTerm-158. You will need this fix to input Korean using Ami, a Korean XIM server. OverTheSpot is a convenient preedit type for Chinese and Japanese as well.
I submitted the xterm-156-overthespot2 patch on 14 June 2001. This patch adds OverTheSpot preedit type support to XTerm and uses Xutf8LookupString() to receive the XIM string as UTF-8, the internal encoding of XTerm. Note that this patch is unrelated to the earlier work by Robert Brady and me. Though this patch has far fewer features than my previous patches, I think this approach will help get minimal XIM and CJK language support integrated into the official XTerm sooner. In particular, the previous patches include fribidi, which is distributed under the LGPL and thus causes a license problem. You will need XFree86 4.x with the cstomb fix patch. [Download xterm-156.tar.gz from this site]
Robert Brady released xterm-152-27 on 23 March 2001. This includes all my patches for xterm-150-23 (see below). [Download xterm-152.tar.gz from this site] [Download xterm-152-27.diff.gz from this site]

Much Older Information

My work is based on:

the original XTerm by Thomas Dickey [download xterm-150.tar.gz from this site] and
the fine patch by Robert Brady [download xterm-150-23.diff.gz from this site].
You will also need an unofficial (development) version (2.13.20000819 or later) of autoconf if you would like to rebuild the configure script.
For OSes without iconv() or nl_langinfo(), you will need libiconv by Bruno Haible.

To try internationalization, invoke ./configure with --enable-wide-chars.

You can use the fonts -misc-fixed-medium-r-semicondensed--13-120-75-75-c-60-iso10646-1 (for normalwidth characters) and -misc-fixed-medium-r-normal-ja-13-120-75-75-c-120-iso10646-1 (for doublewidth characters), which are included in XFree86 4.0. If you run xterm -fn -misc-fixed-medium-r-semicondensed--13-120-75-75-c-60-iso10646-1, XTerm will automatically detect the -misc-fixed-medium-r-normal-ja-13-120-75-75-c-120-iso10646-1 font for doublewidth characters.
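The relation between the two font names above can be seen by splitting them into their XLFD fields: they differ only in the setwidth name (semicondensed vs. normal), the additional style (empty vs. ja), and the average width (60 vs. 120). The Python sketch below derives a wildcard pattern for the doublewidth font from the normalwidth name. It is only one plausible approach, written for illustration; it is not taken from XTerm's source, and the function name is hypothetical.

```python
import fnmatch


def doublewidth_pattern(xlfd):
    """Build a pattern for a font twice as wide as the given XLFD name.

    Assumption for this sketch: the doublewidth font shares every field
    except setwidth name and additional style (left as wildcards) and
    average width (doubled).
    """
    fields = xlfd.split("-")
    # A fully specified XLFD is a leading '-' followed by 14 fields.
    if len(fields) != 15 or fields[0] != "":
        raise ValueError("not a fully specified XLFD name: %r" % xlfd)
    fields[5] = "*"                          # setwidth name
    fields[6] = "*"                          # additional style
    fields[12] = str(int(fields[12]) * 2)    # average width, in tenths of a pixel
    return "-".join(fields)


normal = "-misc-fixed-medium-r-semicondensed--13-120-75-75-c-60-iso10646-1"
wide = "-misc-fixed-medium-r-normal-ja-13-120-75-75-c-120-iso10646-1"

pattern = doublewidth_pattern(normal)
# The doublewidth font named in the text matches the derived pattern.
found = fnmatch.fnmatchcase(wide, pattern)
```

In a real X client the pattern would be handed to the X server's font listing facility rather than matched locally; fnmatch is used here only to keep the sketch self-contained.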

Please test. Test reports, bug reports, and patches are welcome. In particular, since I use only Debian GNU/Linux, I can hardly test the configure script. If you are interested in the development and would like to discuss it with me (or other developers), please join the XFree86 Internationalization mailing list.

Known bugs are:

XIM input doesn't work in UTF-8 mode on OSes that don't support nl_langinfo(3).
XIM input doesn't work on FreeBSD-4.2-RELEASE with Bruno's libiconv.

The following are my patches, which you can download and test.

xterm-150-23-k10 (announcement)
Based on Thomas Dickey's XTerm #150 and Robert Brady's patch #23.
- Check BOM addition by iconv() (for BSD)
- Check libxpg4 in ./configure (for BSD)
- Use Bruno's iconv() check in ./configure. However, I added --with-libiconv[=DIR] and check for locale_charset()/nl_langinfo().
xterm-150-23-k9 (announcement)
Based on Thomas Dickey's XTerm #150 and Robert Brady's patch #23.
- Removed strndup() which is not a standard function.
- Removed WCHAR_T in iconv_open() (wcwidth.c).
- Added setlocale(LC_ALL,"") following a report from a FreeBSD user.
- Completely rewrote the ./configure macros for checking iconv() and nl_langinfo().
- Names for UTF-8 and UCS-4 in iconv_open() are determined by ./configure.
- Added --with-libiconv and --with-libcharset for ./configure.
xterm-150-23-k8 (announcement)
Based on Thomas Dickey's XTerm #150 and Robert Brady's patch #23.
- XIM input caused XTerm to exit abnormally in UTF-8 mode. This bug is fixed.
- Manual page is rewritten. (-8, -en, -lc, -u8, encodingMode)
xterm-150-23-k7 (announcement)
Based on Thomas Dickey's XTerm #150 and Robert Brady's patch #23.
- Added a command option '-en' to specify the encoding directly. This is a stopgap for OSes which don't have nl_langinfo() but have iconv().
- The name of the iconv checker in aclocal.m4 is changed from AM_ICONV to CF_ICONV to follow the naming policy of XTerm.
- Wrote a checker for nl_langinfo() for aclocal.m4. If nl_langinfo() is not available, it tries to use locale_charset() in libcharset by Bruno. (not tested).
- Wrote a checker for wcwidth() for aclocal.m4.
- Fixed a bug in the k6 patch where nl_langinfo() was never used.
xterm-150-23-k6 (announcement)
Based on Thomas Dickey's XTerm #150 and Robert Brady's patch #23.
- Changed the name of the command option for bidi from -/+b to -/+bi because '-b' is already used for a different purpose.
- Changed the algorithm to determine default width mode when system's wcwidth() is not available.
- Modified manual:
  added descriptions for command options (-/+bi, -fx, -8, -u8, -lc, -wcs, -wcu, and -wcc) and resources (bidi, ximFont, encodingMode, and widthMode).
- Removed descriptions for command option (+u8) and resource (utf8).
- Rewrote the source code assuming the following macros are available: HAVE_ICONV, HAVE_LANGINFO, and HAVE_WCWIDTH.
- Rewrote UXTerm.ad ("*VT100*utf8: 1" --> "*VT100*encodingMode: utf8").
- Added Bruno's AM_ICONV to aclocal.m4 . Am I right?
- Added basic check for nl_langinfo() and wcwidth() (just like for iconv() in the previous configure.in). Am I right?
xterm-150-23-k5 (announcement)
Based on Thomas Dickey's XTerm #150 and Robert Brady's patch #23.
- Compilation problem with GNU libc 2.1 was fixed.
xterm-150-23-k4 (announcement)
Based on Thomas Dickey's XTerm #150 and Robert Brady's patch #23.
- Fixed XTerm's column problem when ./configure is run without --enable-wide-chars. This was my mistake in the previous patch.

xterm-150-23-k3 (announcement)
Based on Thomas Dickey's XTerm #150 and Robert Brady's patch #23.

- XIM and Unicode keysym can co-exist.
- wc* are defined as resources.
- Renamed the -wc* options. The corresponding resource is 'widthMode' (class 'WidthMode').

   ------------------------------------------------------------
   option  parameter for 'widthMode'            note
   ------------------------------------------------------------
   -wcs    'system' or 'locale'
   -wcu    'unicode', 'standard', or 'markus'   previous '-wcm'
   -wcc    'cjk', 'eastasia', or 'doublewidth'  previous '-wcl'
   ------------------------------------------------------------

- Restructured the encoding-related options. The corresponding resource is 'encodingMode' (class 'EncodingMode').

   ---------------------------------------------------------------------
   option  parameter for 'encodingMode'  default for...
   ---------------------------------------------------------------------
   -8      '8bit'                        'C' and 'POSIX' locales,
                                         ISO-8859-* (except for 6 and 8)
   -u8     'utf8' or 'utf-8'             UTF-8 locale
   -lc     'locale' or 'lc_ctype'        other locales
   ---------------------------------------------------------------------

- Thus the '+u8' option and the 'utf8' resource are abolished.
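
The two tables above can be read as lookup rules. The Python sketch below restates them: the first part maps each option and each accepted parameter spelling to one canonical 'widthMode' value, and the second part picks the default 'encodingMode' for a given locale. It only illustrates the tables and is not the patch's C code. The function and variable names are hypothetical, and the tolerance for codeset spellings such as "ISO8859-1" or "utf8" is an assumption of this sketch.

```python
import re

# Options from the first table and the widthMode each one selects.
WIDTH_OPTIONS = {"-wcs": "system", "-wcu": "unicode", "-wcc": "cjk"}

# Every accepted 'widthMode' parameter, mapped to the first name in its row.
WIDTH_MODE_ALIASES = {
    "system": "system", "locale": "system",
    "unicode": "unicode", "standard": "unicode", "markus": "unicode",
    "cjk": "cjk", "eastasia": "cjk", "doublewidth": "cjk",
}


def canonical_width_mode(value):
    """Return the canonical widthMode name; raises KeyError if unknown."""
    return WIDTH_MODE_ALIASES[value]


def default_encoding_mode(locale_name, codeset):
    """Pick the default encodingMode according to the second table.

    `codeset` is the encoding name of the locale, e.g. as reported by
    nl_langinfo(CODESET).
    """
    if locale_name in ("C", "POSIX"):
        return "8bit"
    normalized = codeset.upper().replace("_", "-")   # assumed spelling tolerance
    if normalized in ("UTF-8", "UTF8"):
        return "utf8"
    match = re.fullmatch(r"ISO-?8859-(\d+)", normalized)
    if match and match.group(1) not in ("6", "8"):
        return "8bit"
    return "locale"
```

With these rules, an EUC-JP locale falls through to 'locale' mode, while ISO-8859-6 and ISO-8859-8 are deliberately excluded from the '8bit' default, as the table states.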

xterm-150-23-k2 (announcement)
Based on Thomas Dickey's XTerm #150 and Robert Brady's patch #23.
- Fixed a compilation error when ./configure is run without options. (Replaced PAIRED_CHARS with TRI_CHARS, and so on.)
- Fixed a compilation error when ./configure is run with --enable-trace. (Replaced PAIRED_CHARS with TRI_CHARS, and so on.)
- Confusion of wchar_t and UCS-4 is fixed in LocalEncodingToUnicode().
- Endian problem of LocalEncodingToUnicode() is fixed.
- Confusion of wchar_t and UCS-4 is fixed in my_wcwidth(). (Code for conversion from UCS-4 to wchar_t is newly written.)
- The limitation of do_precomposition() to 16-bit Unicode is fixed.
- Possible bug fix in ScreenWrite(). (str3 should be shifted 16, not 8).
- Integrated an XIM patch from a Debian user to enable the XMODIFIERS variable. (VTInitI18N())
xterm-150-23-k1 (announcement)
Based on Thomas Dickey's XTerm #150 and Robert Brady's patch #23.
- wcwidth_cjk() depended on system's wcwidth().
xterm-150-22-xim2 (announcement)
Based on Thomas Dickey's XTerm #150 and Robert Brady's patch #22.
- OverTheSpot is supported.
- New command option '-fx' and a resource 'ximFont' are introduced.
xterm-150-22-xim (announcement)
Based on Thomas Dickey's XTerm #150 and Robert Brady's patch #22.
- XIM works well.

Tomohiro KUBOTA <debian at tmail dot plala dot or dot jp>