EUC-JP round-trip compatibility (2001-04-30)

This is the easiest problem. I mean it is easy to see that a problem exists, not that the problem is easy to solve.

In the CJK world, a CES (Character Encoding Scheme) and a CCS (Coded Character Set) are distinct concepts: one CES may contain multiple CCSs. For example, EUC-JP is a CES that includes the CCSs ASCII and JIS X 0208 (and optionally JIS X 0201 Kana and JIS X 0212).
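To make the CES/CCS distinction concrete, here is a minimal sketch that splits an EUC-JP byte string into (CCS, code) pairs. It assumes the usual EUC-JP byte layout (bytes below 0x80 for ASCII, 0x8E as the prefix for JIS X 0201 Kana, 0x8F as the prefix for JIS X 0212, and a pair of bytes in 0xA1-0xFE for JIS X 0208). The function name classify_euc_jp is made up for this illustration.

```python
def classify_euc_jp(data: bytes):
    """Split an EUC-JP byte string into (ccs_name, code) pairs.

    Hypothetical helper for illustration; it does no table lookup and
    only checks that the byte string is not truncated.
    """
    result = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            # One byte: ASCII
            width, name = 1, "ASCII"
        elif b == 0x8E:
            # Prefix byte + one byte: JIS X 0201 Kana
            width, name = 2, "JIS X 0201 Kana"
        elif b == 0x8F:
            # Prefix byte + two bytes: JIS X 0212
            width, name = 3, "JIS X 0212"
        elif 0xA1 <= b <= 0xFE:
            # Two bytes: JIS X 0208
            width, name = 2, "JIS X 0208"
        else:
            raise ValueError(f"unexpected byte 0x{b:02X} at offset {i}")
        if i + width > len(data):
            raise ValueError(f"truncated sequence at offset {i}")
        chunk = data[i:i + width]
        if name == "ASCII":
            code = chunk[0]
        elif name == "JIS X 0201 Kana":
            code = chunk[1]
        else:
            # Strip the high bit of the last two bytes to get the 7-bit code
            code = ((chunk[-2] & 0x7F) << 8) | (chunk[-1] & 0x7F)
        result.append((name, code))
        i += width
    return result


if __name__ == "__main__":
    for name, code in classify_euc_jp(b"\\\xa1\xc0\x8f\xa2\xb7"):
        print(f"{name}: 0x{code:X}")
```

The sample bytes contain one character from each of three CCSs: ASCII 0x5C, JIS X 0208 0x2140, and JIS X 0212 0x2237.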

The Unicode Consortium provides a conversion table from JIS X 0208 to Unicode (http://www.unicode.org/Public/MAPPINGS/OBSOLETE/EASTASIA/JIS/JIS0208.TXT). This table (version 0.9, 1994-03-08) maps 0x2140 in JIS X 0208 to U+005C (REVERSE SOLIDUS). This is fine when JIS X 0208 is used on its own, but it causes a code point conflict when JIS X 0208 is combined with ASCII in EUC-JP: ASCII 0x5C also maps to U+005C, so two distinct EUC-JP characters become the same Unicode character and cannot be told apart when converting back.
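The loss of round-trip compatibility can be sketched with a tiny hand-built table. This is only an illustration: the dictionary holds three entries picked for the example (a real converter would load the full tables), and the names TO_UNICODE, build_reverse, and round_trip are hypothetical. The tie-breaking rule (first entry wins) is also an assumption; any rule has to drop one of the two sources.

```python
# Forward table keyed by (CCS name, code). Only the entries needed for
# this example are listed; this is not a complete mapping.
TO_UNICODE = {
    ("ASCII", 0x5C): 0x005C,
    ("JIS X 0208", 0x2140): 0x005C,  # as mapped by JIS0208.TXT
    ("JIS X 0208", 0x2121): 0x3000,  # IDEOGRAPHIC SPACE, no conflict
}


def build_reverse(forward):
    """Invert the forward table.

    When two sources share a Unicode target, the first one listed wins
    (an arbitrary choice made for this sketch).
    """
    reverse = {}
    for source, ucs in forward.items():
        reverse.setdefault(ucs, source)
    return reverse


FROM_UNICODE = build_reverse(TO_UNICODE)


def round_trip(source):
    """Convert (CCS, code) to Unicode and back again."""
    return FROM_UNICODE[TO_UNICODE[source]]


if __name__ == "__main__":
    for source in TO_UNICODE:
        back = round_trip(source)
        status = "ok" if back == source else "LOST"
        print(source, "->", back, status)
```

With these entries, JIS X 0208 0x2140 comes back as ASCII 0x5C, while the other two characters survive the round trip.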

Implementing EUC-JP with JIS X 0212 introduces one more conflict: 0x2237 in JIS X 0212 is mapped to U+007E by http://www.unicode.org/Public/MAPPINGS/OBSOLETE/EASTASIA/JIS/JIS0212.TXT, and U+007E is also the code point of ASCII 0x7E.
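Since ASCII maps to Unicode by identity, any entry in another CCS's table whose target is below U+0080 collides with ASCII. A minimal sketch of such a check follows. The function name find_ascii_conflicts is hypothetical, and the dictionaries are small hand-picked excerpts rather than the full tables.

```python
def find_ascii_conflicts(table):
    """Return the entries of a CCS-to-Unicode table that land in the
    ASCII range (U+0000..U+007F) and so collide with ASCII in EUC-JP."""
    return {code: ucs for code, ucs in table.items() if ucs < 0x80}


# Hand-picked excerpts for illustration only.
JISX0208_EXCERPT = {
    0x2121: 0x3000,  # IDEOGRAPHIC SPACE
    0x2140: 0x005C,  # REVERSE SOLIDUS, per JIS0208.TXT
    0x3021: 0x4E9C,  # a kanji, no conflict
}
JISX0212_EXCERPT = {
    0x2237: 0x007E,  # per JIS0212.TXT
}

if __name__ == "__main__":
    for name, table in [("JIS X 0208", JISX0208_EXCERPT),
                        ("JIS X 0212", JISX0212_EXCERPT)]:
        for code, ucs in find_ascii_conflicts(table).items():
            print(f"{name} 0x{code:04X} -> U+{ucs:04X} conflicts with ASCII 0x{ucs:02X}")
```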


Tomohiro KUBOTA <debian at tmail dot plala dot or dot jp>