Many CESs (character encoding schemes) share a common CCS (coded character set). For example, the CESs EUC-JP, Shift_JIS, and CP932 all include JIS X 0208 as a CCS.
For such CESs, a character from the same CCS should be mapped to the same UCS character. However, this does not hold for dozens of characters.
The following table lists characters for which the same character in JIS X 0208 (and related character sets) is mapped to different code points by different conversion tables.
----------------------------------------------------------------------------------------------
ORIGINAL                       Converted** to U+????/EastAsianWidth
CCS     Shift_JIS* EUC-JP*     0208    SJIS    CP932   APPLE   0221A   0221B   JAVAA   JAVAB
----------------------------------------------------------------------------------------------
[ASCII]
0x5C    ----       0x5C        ----    ----    ----    ----    ----    005C/Na ----    005C/Na
0x7E    ----       0x7E        ----    ----    ----    ----    ----    007E/Na ----    007E/Na
[JISX0201 Roman]
0x5C    0x5C       ----        ----    00A5/Na 005C/Na 00A5/Na 00A5/Na ----    005C/Na 00A5/Na
0x7E    0x7E       ----        ----    203E/N  007E/Na 007E/Na 203E/N  ----    007E/Na 203E/N
[JISX0208]
0x2131  0x81 0x50  0xA1 0xB1   FFE3/F  FFE3/F  FFE3/F  FFE3/F  FFE3/F  203E/N  FFE3/F  FFE3/F
0x213D  0x81 0x5C  0xA1 0xBD   2015/A  2015/A  2015/A  2014/A  2014/A  2014/A  2015/A  2015/A
0x2140  0x81 0x5F  0xA1 0xC0   005C/Na 005C/Na FF3C/F  FF3C/F  005C/Na FF3C/F  FF3C/F  FF3C/F
0x2141  0x81 0x60  0xA1 0xC1   301C/W  301C/W  FF5E/F  301C/W  301C/W  301C/W  301C/W  301C/W
0x2142  0x81 0x61  0xA1 0xC2   2016/A  2016/A  2225/A  2016/A  2016/A  2016/A  2016/A  2016/A
0x215D  0x81 0x7C  0xA1 0xDD   2212/N  2212/N  FF0D/F  2212/N  2212/N  2212/N  2212/N  2212/N
0x216F  0x81 0x8F  0xA1 0xEF   FFE5/F  FFE5/F  FFE5/F  FFE5/F  FFE5/F  00A5/Na FFE5/F  FFE5/F
0x2171  0x81 0x91  0xA1 0xF1   00A2/Na 00A2/Na FFE0/F  00A2/Na 00A2/Na 00A2/Na 00A2/Na 00A2/Na
0x2172  0x81 0x92  0xA1 0xF2   00A3/Na 00A3/Na FFE1/F  00A3/Na 00A3/Na 00A3/Na 00A3/Na 00A3/Na
0x224C  0x81 0xCA  0xA2 0xCC   00AC/Na 00AC/Na FFE2/F  00AC/Na 00AC/Na 00AC/Na 00AC/Na 00AC/Na
[JISX0212]
0x2217  ----       0x8F,A2,97  ----    ----    ----    ----    007E/Na FF5E/F  ----    ----
----------------------------------------------------------------------------------------------
Note 1 This table mentions Japanese encodings only.
Note 2 This table does not include vendor-extended characters (characters that are invalid in standard EUC-JP and Shift_JIS).
Note * Converted from ASCII, JISX0201 Roman, and JISX0208 algorithmically. The algorithm for EUC-JP is described in http://www.unicode.org/Public/MAPPINGS/EASTASIA/JIS/JIS0208.TXT. The algorithm to convert from JIS X 0208 to Shift_JIS is:

    out1 = ((in1 - 1) >> 1) + ((in1 <= 0x5e) ? 0x71 : 0xb1);
    out2 = in2 + ((in1 & 1) ? ((in2 < 0x60) ? 0x1f : 0x20) : 0x7e);

where in1 and in2 are the 1st and 2nd bytes of JIS X 0208 respectively, and out1 and out2 are the 1st and 2nd bytes of Shift_JIS. The Shift_JIS value is used as the original code for the "SJIS", "CP932", "Win98", and "Apple" conversions, because all of them (other than Shift_JIS itself) are supersets of Shift_JIS.
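The conversion in Note * can be sketched as a pair of small C functions. This is a minimal illustration, not code from any particular converter: the function names (is_jis_byte, jis0208_to_sjis, jis0208_to_eucjp) and the range check are assumptions added here. In C the conditional in the out1 expression needs its own parentheses, because + binds tighter than ?:. The EUC-JP function assumes the usual rule that each JIS X 0208 byte has its high bit set, which agrees with the EUC-JP column of the table.

```c
#include <stdint.h>

/* Returns 1 if b is a valid JIS X 0208 byte (0x21..0x7E), else 0. */
static int is_jis_byte(uint8_t b)
{
    return b >= 0x21 && b <= 0x7e;
}

/* JIS X 0208 -> Shift_JIS, following the formula in Note *.
 * Returns 0 on success, -1 if either input byte is out of range. */
static int jis0208_to_sjis(uint8_t in1, uint8_t in2,
                           uint8_t *out1, uint8_t *out2)
{
    if (!is_jis_byte(in1) || !is_jis_byte(in2))
        return -1;
    /* The conditional is parenthesised: + binds tighter than ?: in C. */
    *out1 = (uint8_t)(((in1 - 1) >> 1) + ((in1 <= 0x5e) ? 0x71 : 0xb1));
    *out2 = (uint8_t)(in2 + ((in1 & 1) ? ((in2 < 0x60) ? 0x1f : 0x20)
                                       : 0x7e));
    return 0;
}

/* JIS X 0208 -> EUC-JP (assumed rule: set the high bit of each byte).
 * Returns 0 on success, -1 if either input byte is out of range. */
static int jis0208_to_eucjp(uint8_t in1, uint8_t in2,
                            uint8_t *out1, uint8_t *out2)
{
    if (!is_jis_byte(in1) || !is_jis_byte(in2))
        return -1;
    *out1 = (uint8_t)(in1 | 0x80);
    *out2 = (uint8_t)(in2 | 0x80);
    return 0;
}
```

For example, passing 0x21 and 0x41 (JIS X 0208 code 0x2141) to jis0208_to_sjis gives 0x81 0x60, and jis0208_to_eucjp gives 0xA1 0xC1, matching the table row for 0x2141.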
Note **
Thus, the same character in Japanese encodings is mapped to different Unicode characters depending on the conversion table. In particular, CP932 (which has relatively many differences) is called Shift_JIS on Microsoft operating systems and is very widely used. This will cause serious problems as Unicode becomes more popular in Japan.
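One way to work with these differences in code is a small lookup of the JIS X 0208 rows where the "0208" and "CP932" columns of the table above disagree. The sketch below copies those seven rows from the table; the struct and function names (jis_divergence, find_divergence) are hypothetical and chosen only for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* One JIS X 0208 code whose Unicode mapping differs between the
 * "0208" and "CP932" columns of the table. */
struct jis_divergence {
    uint16_t jis;       /* JIS X 0208 code, e.g. 0x2141 */
    uint16_t ucs_0208;  /* code point in the "0208" column */
    uint16_t ucs_cp932; /* code point in the "CP932" column */
};

/* Rows copied from the table; rows where both columns agree
 * (0x2131, 0x213D, 0x216F) are intentionally left out. */
static const struct jis_divergence divergences[] = {
    {0x2140, 0x005C, 0xFF3C},
    {0x2141, 0x301C, 0xFF5E},
    {0x2142, 0x2016, 0x2225},
    {0x215D, 0x2212, 0xFF0D},
    {0x2171, 0x00A2, 0xFFE0},
    {0x2172, 0x00A3, 0xFFE1},
    {0x224C, 0x00AC, 0xFFE2},
};

/* Returns the table entry for a JIS X 0208 code, or NULL if the
 * "0208" and "CP932" columns agree (or the code is not listed). */
static const struct jis_divergence *find_divergence(uint16_t jis)
{
    size_t n = sizeof divergences / sizeof divergences[0];
    for (size_t i = 0; i < n; i++) {
        if (divergences[i].jis == jis)
            return &divergences[i];
    }
    return NULL;
}
```

A converter could consult such a lookup to warn when a round trip through two different tables would change a character, for instance when 0x2141 becomes U+301C under one table and U+FF5E under the other.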