Computers have been used for many years in the CJK world, as in the Euro-American world. Ideographs have occupied two columns in terminal-based software and hardware ever since CJK people began using ideographs on computers. Thus, there are singlewidth or narrow ("Hankaku" or 半角 in Japanese) characters and doublewidth or wide ("Zenkaku" or 全角 in Japanese) characters. Though no official standard mentions the width of characters (at least in Japan), the concept of width is a very strong de-facto standard in the CJK world.
In CJK local encodings, it is very easy to tell whether a character is singlewidth or doublewidth. Characters from ISO 646 (ASCII, JIS X 0201 Roman, and so on) and JIS X 0201 Kana (i.e., 1-byte characters) are singlewidth, and all others are doublewidth. CJK people have relied widely on this de-facto standard for a long time (tens of years), and IMO this proves that the de-facto standard has no fatal problems. Thus, Unicode and its conversion tables are responsible for the problem I am going to explain below.
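The following sketch shows how simple the rule is. It is written for Shift_JIS, where the byte length of each character happens to equal its column width. It assumes the usual Shift_JIS byte layout: lead bytes 0x81-0x9F and 0xE0-0xFC start a 2-byte character, while 0x00-0x7F and the JIS X 0201 Kana range 0xA1-0xDF are single bytes. It also ignores invalid sequences. `sjis_width` is a hypothetical helper name.

The rule has to be stated per character set rather than per byte. In EUC-JP, for example, JIS X 0201 Kana is encoded as 0x8E plus one byte but still occupies one column.

```perl
use strict;
use warnings;

# Column width of a Shift_JIS byte string under the CJK de-facto rule:
# 1-byte character = one column, 2-byte character = two columns.
# Assumes the standard Shift_JIS lead-byte ranges; invalid input is not checked.
sub sjis_width {
    my ($bytes) = @_;
    my ($width, $i) = (0, 0);
    while ($i < length($bytes)) {
        my $byte = ord(substr($bytes, $i, 1));
        if (($byte >= 0x81 && $byte <= 0x9F) || ($byte >= 0xE0 && $byte <= 0xFC)) {
            $width += 2;    # lead byte of a 2-byte (doublewidth) character
            $i += 2;
        } else {
            $width += 1;    # ASCII / JIS X 0201 Roman or JIS X 0201 Kana
            $i += 1;
        }
    }
    return $width;
}

# "A", HIRAGANA LETTER A (0x82A0), HALFWIDTH KATAKANA LETTER A (0xB1)
print sjis_width("A\x82\xA0\xB1"), "\n";    # prints 4
```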
The Unicode Consortium supplies Unicode Standard Annex #11 EAST ASIAN WIDTH (UAX#11, formerly UTR#11) in order to keep compatibility with the CJK de-facto standard. It classifies UCS characters into a few categories: "N", "A", "H", "W", "F", and "Na".
Note: Na is "narrow" and H is "halfwidth"; both should occupy one column in column-based display. W is "wide" and F is "fullwidth"; both should occupy two columns in column-based display. N is "neutral" (not East Asian) and is meant for characters that do not appear in CJK coded character sets. A is "ambiguous" and depends on context: in many cases, it should occupy one column in a non-CJK context and two columns in a CJK context.
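The sketch below shows how a column-based program might turn these classes into column counts. It assumes the caller already knows whether it is running in a CJK context. How that is decided (for example, from the locale) is outside the sketch. `eaw_columns` is a hypothetical name.

```perl
use strict;
use warnings;

# Columns occupied by a character of a given UAX#11 class.
# $cjk_context is an assumption of this sketch: true when the program
# runs in a CJK context, which only matters for "A" (ambiguous).
sub eaw_columns {
    my ($class, $cjk_context) = @_;
    return 2 if $class eq 'W' || $class eq 'F';
    return 1 if $class eq 'Na' || $class eq 'H' || $class eq 'N';
    if ($class eq 'A') {
        return $cjk_context ? 2 : 1;
    }
    die "unknown East Asian Width class: $class\n";
}

printf "%d %d\n", eaw_columns('A', 1), eaw_columns('A', 0);    # prints "2 1"
```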
To keep compatibility with the CJK de-facto standard, characters from ISO 646 (ASCII, JIS X 0201 Roman, and so on) and JIS X 0201 Kana have to be "Na" or "H". All other characters in CJK encodings have to be "W", "F", or "A". In addition, any appearance of "N" among characters of CJK encodings should be regarded as a bug of UAX#11.
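This requirement can be written as a small predicate. `class_agrees` is a hypothetical helper. It assumes the caller already knows whether the character is a 1-byte character in its CJK character set.

```perl
use strict;
use warnings;

# True if UAX#11 class $class fits the CJK de-facto width of a character
# that is ($is_single_byte true) or is not a 1-byte character in its CJK
# character set. "N" never fits.
sub class_agrees {
    my ($is_single_byte, $class) = @_;
    if ($is_single_byte) {
        return $class eq 'Na' || $class eq 'H';    # ISO 646, JIS X 0201 Kana
    }
    return $class eq 'W' || $class eq 'F' || $class eq 'A';
}

print class_agrees(0, 'N') ? "ok\n" : "bug\n";    # prints "bug"
```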
Note: "A" is acceptable for two-column characters because "A" is applied to characters that are originally one-column but become two-column in a CJK context. Examples are Cyrillic letters like д and non-letter symbols like ●.
I previously checked EastAsianWidth.txt using a script. However, that research is now obsolete because EastAsianWidth.txt has since been revised. Thus, I checked it again.
The object of this research is ftp://ftp.unicode.org/Public/UNIDATA/EastAsianWidth.txt for Unicode 3.2. The top line of the file reads:
# EastAsianWidth-3.2.0.txt
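The sketch below reads such a file into a table. It follows the style of the script below, but adds two assumptions of mine:

- Comments after `#` are discarded.
- Lines of the form `START..END;CLASS` are expanded, since later revisions of EastAsianWidth.txt list ranges that way.

The match result is tested directly because, in Perl, a failed match leaves `$1` holding the value from the previous successful match. `parse_eaw` is a hypothetical name.

```perl
use strict;
use warnings;

# Parse the text of EastAsianWidth.txt into { "XXXX" => class }.
# Assumption: besides single entries such as "00A2;Na # CENT SIGN", the file
# may contain ranges such as "3400..4DB5;W", which are expanded here.
sub parse_eaw {
    my ($text) = @_;
    my %width;
    for my $line (split /\n/, $text) {
        (my $entry = $line) =~ s/\s*#.*//;    # drop the comment part
        # Test the match itself; $1 etc. are not reset by a failed match.
        my ($first, $last, $class) =
            $entry =~ /^([0-9A-F]+)(?:\.\.([0-9A-F]+))?\s*;\s*([A-Za-z]+)/
            or next;
        $last = $first unless defined $last;
        for my $cp (hex($first) .. hex($last)) {
            $width{ sprintf("%04X", $cp) } = $class;
        }
    }
    return \%width;
}

my $w = parse_eaw("# EastAsianWidth-3.2.0.txt\n00A2;Na # CENT SIGN\n3400..3402;W\n");
print join(' ', map { "$_=$w->{$_}" } sort keys %$w), "\n";
# prints "00A2=Na 3400=W 3401=W 3402=W"
```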
The research needs mapping tables from Unicode to various CJK encodings. I used the following mapping tables, downloaded from ftp://ftp.unicode.org/Public/MAPPINGS .
Here is the script I used for the check. It reads the mapping tables for the CJK encodings and checks whether 1-byte characters are "Na" or "H" and 2-byte characters are "A", "W", or "F". It reports every character for which this does not hold. It also reports 1-byte characters in "A", because they will require exceptional treatment in software.
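#!/usr/bin/perl
use strict;
use warnings;

# Read EastAsianWidth.txt into %width, keyed by the hex code point ("005C").
my %width;
open(my $eaw, '<', "EastAsianWidth.txt") || die "Cannot open width file.";
while (my $line = <$eaw>) {
    # Skip comments and blank lines. Test the match itself: after a failed
    # match, $1 and $2 would still hold the values of the previous line.
    next unless $line =~ /^([0-9A-F]+);([A-Za-z]+)/;
    $width{$1} = $2;
}
close($eaw);

# Report characters whose class does not fit the CJK de-facto width.
sub checkfile {
    my ($file, $localcolumn, $ucscolumn, $commentcolumn) = @_;
    open(my $fh, '<', $file) || die "Cannot open $file";
    print "FILE $file------\n";
    while (my $line = <$fh>) {
        next if $line =~ /^\#/;
        chomp($line);
        my @list = split(/\t/, $line);
        my $loc = $list[$localcolumn];
        my $ucs = $list[$ucscolumn];
        # Skip blank lines and entries that are not a single "0x...." value.
        next unless defined $loc && defined $ucs && $ucs =~ /^0x[0-9A-Fa-f]+$/;
        # The tables write codes as "0x....", which must go through hex();
        # a plain numeric comparison would treat "0x005C" as 0.
        my $code = hex($ucs);
        next if $code < 0x20 || ($code >= 0x7f && $code <= 0x9f);    # controls
        $ucs = sprintf("%04X", $code);
        my $w = $width{$ucs} // "";
        my $com = $list[$commentcolumn] // "";
        if (hex($loc) < 0x100 && $w =~ /^(?:W|F|A|N)$/) {
            # 1-byte character that is not "Na" or "H"
            print "$loc U+$ucs $w $com\n";
        } elsif (hex($loc) >= 0x100 && $w =~ /^(?:N|H|Na)$/) {
            # 2-byte character that is not "A", "W", or "F"
            print "$loc U+$ucs $w $com\n";
        }
    }
    close($fh);
}

checkfile("JIS0208.TXT", 1, 2, 3);
checkfile("JIS0212.TXT", 0, 1, 2);
checkfile("SHIFTJIS.TXT", 0, 1, 2);
checkfile("CP932.TXT", 0, 1, 2);
checkfile("JAPANESE.TXT", 0, 1, 2);
checkfile("GB2312.TXT", 0, 1, 2);
checkfile("CHINSIMP.TXT", 0, 1, 2);
checkfile("BIG5.TXT", 0, 1, 2);
checkfile("CHINTRAD.TXT", 0, 1, 2);
checkfile("KSX1001.TXT", 0, 1, 2);
checkfile("KOREAN.TXT", 0, 1, 2);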
This is the result.
FILE JIS0208.TXT------
0x2140 U+005C Na # REVERSE SOLIDUS
0x215D U+2212 N # MINUS SIGN
0x2171 U+00A2 Na # CENT SIGN
0x2172 U+00A3 Na # POUND SIGN
0x224C U+00AC Na # NOT SIGN
FILE JIS0212.TXT------
0x2234 U+00AF Na # MACRON
0x2237 U+007E Na # TILDE
0x2238 U+0384 N # GREEK TONOS
0x2239 U+0385 N # GREEK DIALYTIKA TONOS
0x2243 U+00A6 Na # BROKEN BAR
0x226D U+00A9 N # COPYRIGHT SIGN
0x2661 U+0386 N # GREEK CAPITAL LETTER ALPHA WITH TONOS
0x2662 U+0388 N # GREEK CAPITAL LETTER EPSILON WITH TONOS
0x2663 U+0389 N # GREEK CAPITAL LETTER ETA WITH TONOS
0x2664 U+038A N # GREEK CAPITAL LETTER IOTA WITH TONOS
0x2665 U+03AA N # GREEK CAPITAL LETTER IOTA WITH DIALYTIKA
0x2667 U+038C N # GREEK CAPITAL LETTER OMICRON WITH TONOS
0x2669 U+038E N # GREEK CAPITAL LETTER UPSILON WITH TONOS
0x266A U+03AB N # GREEK CAPITAL LETTER UPSILON WITH DIALYTIKA
0x266C U+038F N # GREEK CAPITAL LETTER OMEGA WITH TONOS
0x2671 U+03AC N # GREEK SMALL LETTER ALPHA WITH TONOS
0x2672 U+03AD N # GREEK SMALL LETTER EPSILON WITH TONOS
0x2673 U+03AE N # GREEK SMALL LETTER ETA WITH TONOS
0x2674 U+03AF N # GREEK SMALL LETTER IOTA WITH TONOS
0x2675 U+03CA N # GREEK SMALL LETTER IOTA WITH DIALYTIKA
0x2676 U+0390 N # GREEK SMALL LETTER IOTA WITH DIALYTIKA AND TONOS
0x2677 U+03CC N # GREEK SMALL LETTER OMICRON WITH TONOS
0x2678 U+03C2 N # GREEK SMALL LETTER FINAL SIGMA
0x2679 U+03CD N # GREEK SMALL LETTER UPSILON WITH TONOS
0x267A U+03CB N # GREEK SMALL LETTER UPSILON WITH DIALYTIKA
0x267B U+03B0 N # GREEK SMALL LETTER UPSILON WITH DIALYTIKA AND TONOS
0x267C U+03CE N # GREEK SMALL LETTER OMEGA WITH TONOS
0x2742 U+0402 N # CYRILLIC CAPITAL LETTER DJE
0x2743 U+0403 N # CYRILLIC CAPITAL LETTER GJE
0x2744 U+0404 N # CYRILLIC CAPITAL LETTER UKRAINIAN IE
0x2745 U+0405 N # CYRILLIC CAPITAL LETTER DZE
0x2746 U+0406 N # CYRILLIC CAPITAL LETTER BYELORUSSIAN-UKRAINIAN I
0x2747 U+0407 N # CYRILLIC CAPITAL LETTER YI
0x2748 U+0408 N # CYRILLIC CAPITAL LETTER JE
0x2749 U+0409 N # CYRILLIC CAPITAL LETTER LJE
0x274A U+040A N # CYRILLIC CAPITAL LETTER NJE
0x274B U+040B N # CYRILLIC CAPITAL LETTER TSHE
0x274C U+040C N # CYRILLIC CAPITAL LETTER KJE
0x274D U+040E N # CYRILLIC CAPITAL LETTER SHORT U
0x274E U+040F N # CYRILLIC CAPITAL LETTER DZHE
0x2772 U+0452 N # CYRILLIC SMALL LETTER DJE
0x2773 U+0453 N # CYRILLIC SMALL LETTER GJE
0x2774 U+0454 N # CYRILLIC SMALL LETTER UKRAINIAN IE
0x2775 U+0455 N # CYRILLIC SMALL LETTER DZE
0x2776 U+0456 N # CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I
0x2777 U+0457 N # CYRILLIC SMALL LETTER YI
0x2778 U+0458 N # CYRILLIC SMALL LETTER JE
0x2779 U+0459 N # CYRILLIC SMALL LETTER LJE
0x277A U+045A N # CYRILLIC SMALL LETTER NJE
0x277B U+045B N # CYRILLIC SMALL LETTER TSHE
0x277C U+045C N # CYRILLIC SMALL LETTER KJE
0x277D U+045E N # CYRILLIC SMALL LETTER SHORT U
0x277E U+045F N # CYRILLIC SMALL LETTER DZHE
0x2922 U+0110 N # LATIN CAPITAL LETTER D WITH STROKE
0x2A21 U+00C1 N # LATIN CAPITAL LETTER A WITH ACUTE
0x2A22 U+00C0 N # LATIN CAPITAL LETTER A WITH GRAVE
0x2A23 U+00C4 N # LATIN CAPITAL LETTER A WITH DIAERESIS
0x2A24 U+00C2 N # LATIN CAPITAL LETTER A WITH CIRCUMFLEX
0x2A25 U+0102 N # LATIN CAPITAL LETTER A WITH BREVE
0x2A26 U+01CD N # LATIN CAPITAL LETTER A WITH CARON
0x2A27 U+0100 N # LATIN CAPITAL LETTER A WITH MACRON
0x2A28 U+0104 N # LATIN CAPITAL LETTER A WITH OGONEK
0x2A29 U+00C5 N # LATIN CAPITAL LETTER A WITH RING ABOVE
0x2A2A U+00C3 N # LATIN CAPITAL LETTER A WITH TILDE
0x2A2B U+0106 N # LATIN CAPITAL LETTER C WITH ACUTE
0x2A2C U+0108 N # LATIN CAPITAL LETTER C WITH CIRCUMFLEX
0x2A2D U+010C N # LATIN CAPITAL LETTER C WITH CARON
0x2A2E U+00C7 N # LATIN CAPITAL LETTER C WITH CEDILLA
0x2A2F U+010A N # LATIN CAPITAL LETTER C WITH DOT ABOVE
0x2A30 U+010E N # LATIN CAPITAL LETTER D WITH CARON
0x2A31 U+00C9 N # LATIN CAPITAL LETTER E WITH ACUTE
0x2A32 U+00C8 N # LATIN CAPITAL LETTER E WITH GRAVE
0x2A33 U+00CB N # LATIN CAPITAL LETTER E WITH DIAERESIS
0x2A34 U+00CA N # LATIN CAPITAL LETTER E WITH CIRCUMFLEX
0x2A35 U+011A N # LATIN CAPITAL LETTER E WITH CARON
0x2A36 U+0116 N # LATIN CAPITAL LETTER E WITH DOT ABOVE
0x2A37 U+0112 N # LATIN CAPITAL LETTER E WITH MACRON
0x2A38 U+0118 N # LATIN CAPITAL LETTER E WITH OGONEK
0x2A3A U+011C N # LATIN CAPITAL LETTER G WITH CIRCUMFLEX
0x2A3B U+011E N # LATIN CAPITAL LETTER G WITH BREVE
0x2A3C U+0122 N # LATIN CAPITAL LETTER G WITH CEDILLA
0x2A3D U+0120 N # LATIN CAPITAL LETTER G WITH DOT ABOVE
0x2A3E U+0124 N # LATIN CAPITAL LETTER H WITH CIRCUMFLEX
0x2A3F U+00CD N # LATIN CAPITAL LETTER I WITH ACUTE
0x2A40 U+00CC N # LATIN CAPITAL LETTER I WITH GRAVE
0x2A41 U+00CF N # LATIN CAPITAL LETTER I WITH DIAERESIS
0x2A42 U+00CE N # LATIN CAPITAL LETTER I WITH CIRCUMFLEX
0x2A43 U+01CF N # LATIN CAPITAL LETTER I WITH CARON
0x2A44 U+0130 N # LATIN CAPITAL LETTER I WITH DOT ABOVE
0x2A45 U+012A N # LATIN CAPITAL LETTER I WITH MACRON
0x2A46 U+012E N # LATIN CAPITAL LETTER I WITH OGONEK
0x2A47 U+0128 N # LATIN CAPITAL LETTER I WITH TILDE
0x2A48 U+0134 N # LATIN CAPITAL LETTER J WITH CIRCUMFLEX
0x2A49 U+0136 N # LATIN CAPITAL LETTER K WITH CEDILLA
0x2A4A U+0139 N # LATIN CAPITAL LETTER L WITH ACUTE
0x2A4B U+013D N # LATIN CAPITAL LETTER L WITH CARON
0x2A4C U+013B N # LATIN CAPITAL LETTER L WITH CEDILLA
0x2A4D U+0143 N # LATIN CAPITAL LETTER N WITH ACUTE
0x2A4E U+0147 N # LATIN CAPITAL LETTER N WITH CARON
0x2A4F U+0145 N # LATIN CAPITAL LETTER N WITH CEDILLA
0x2A50 U+00D1 N # LATIN CAPITAL LETTER N WITH TILDE
0x2A51 U+00D3 N # LATIN CAPITAL LETTER O WITH ACUTE
0x2A52 U+00D2 N # LATIN CAPITAL LETTER O WITH GRAVE
0x2A53 U+00D6 N # LATIN CAPITAL LETTER O WITH DIAERESIS
0x2A54 U+00D4 N # LATIN CAPITAL LETTER O WITH CIRCUMFLEX
0x2A55 U+01D1 N # LATIN CAPITAL LETTER O WITH CARON
0x2A56 U+0150 N # LATIN CAPITAL LETTER O WITH DOUBLE ACUTE
0x2A57 U+014C N # LATIN CAPITAL LETTER O WITH MACRON
0x2A58 U+00D5 N # LATIN CAPITAL LETTER O WITH TILDE
0x2A59 U+0154 N # LATIN CAPITAL LETTER R WITH ACUTE
0x2A5A U+0158 N # LATIN CAPITAL LETTER R WITH CARON
0x2A5B U+0156 N # LATIN CAPITAL LETTER R WITH CEDILLA
0x2A5C U+015A N # LATIN CAPITAL LETTER S WITH ACUTE
0x2A5D U+015C N # LATIN CAPITAL LETTER S WITH CIRCUMFLEX
0x2A5E U+0160 N # LATIN CAPITAL LETTER S WITH CARON
0x2A5F U+015E N # LATIN CAPITAL LETTER S WITH CEDILLA
0x2A60 U+0164 N # LATIN CAPITAL LETTER T WITH CARON
0x2A61 U+0162 N # LATIN CAPITAL LETTER T WITH CEDILLA
0x2A62 U+00DA N # LATIN CAPITAL LETTER U WITH ACUTE
0x2A63 U+00D9 N # LATIN CAPITAL LETTER U WITH GRAVE
0x2A64 U+00DC N # LATIN CAPITAL LETTER U WITH DIAERESIS
0x2A65 U+00DB N # LATIN CAPITAL LETTER U WITH CIRCUMFLEX
0x2A66 U+016C N # LATIN CAPITAL LETTER U WITH BREVE
0x2A67 U+01D3 N # LATIN CAPITAL LETTER U WITH CARON
0x2A68 U+0170 N # LATIN CAPITAL LETTER U WITH DOUBLE ACUTE
0x2A69 U+016A N # LATIN CAPITAL LETTER U WITH MACRON
0x2A6A U+0172 N # LATIN CAPITAL LETTER U WITH OGONEK
0x2A6B U+016E N # LATIN CAPITAL LETTER U WITH RING ABOVE
0x2A6C U+0168 N # LATIN CAPITAL LETTER U WITH TILDE
0x2A6D U+01D7 N # LATIN CAPITAL LETTER U WITH DIAERESIS AND ACUTE
0x2A6E U+01DB N # LATIN CAPITAL LETTER U WITH DIAERESIS AND GRAVE
0x2A6F U+01D9 N # LATIN CAPITAL LETTER U WITH DIAERESIS AND CARON
0x2A70 U+01D5 N # LATIN CAPITAL LETTER U WITH DIAERESIS AND MACRON
0x2A71 U+0174 N # LATIN CAPITAL LETTER W WITH CIRCUMFLEX
0x2A72 U+00DD N # LATIN CAPITAL LETTER Y WITH ACUTE
0x2A73 U+0178 N # LATIN CAPITAL LETTER Y WITH DIAERESIS
0x2A74 U+0176 N # LATIN CAPITAL LETTER Y WITH CIRCUMFLEX
0x2A75 U+0179 N # LATIN CAPITAL LETTER Z WITH ACUTE
0x2A76 U+017D N # LATIN CAPITAL LETTER Z WITH CARON
0x2A77 U+017B N # LATIN CAPITAL LETTER Z WITH DOT ABOVE
0x2B23 U+00E4 N # LATIN SMALL LETTER A WITH DIAERESIS
0x2B24 U+00E2 N # LATIN SMALL LETTER A WITH CIRCUMFLEX
0x2B25 U+0103 N # LATIN SMALL LETTER A WITH BREVE
0x2B28 U+0105 N # LATIN SMALL LETTER A WITH OGONEK
0x2B29 U+00E5 N # LATIN SMALL LETTER A WITH RING ABOVE
0x2B2A U+00E3 N # LATIN SMALL LETTER A WITH TILDE
0x2B2B U+0107 N # LATIN SMALL LETTER C WITH ACUTE
0x2B2C U+0109 N # LATIN SMALL LETTER C WITH CIRCUMFLEX
0x2B2D U+010D N # LATIN SMALL LETTER C WITH CARON
0x2B2E U+00E7 N # LATIN SMALL LETTER C WITH CEDILLA
0x2B2F U+010B N # LATIN SMALL LETTER C WITH DOT ABOVE
0x2B30 U+010F N # LATIN SMALL LETTER D WITH CARON
0x2B33 U+00EB N # LATIN SMALL LETTER E WITH DIAERESIS
0x2B36 U+0117 N # LATIN SMALL LETTER E WITH DOT ABOVE
0x2B38 U+0119 N # LATIN SMALL LETTER E WITH OGONEK
0x2B39 U+01F5 N # LATIN SMALL LETTER G WITH ACUTE
0x2B3A U+011D N # LATIN SMALL LETTER G WITH CIRCUMFLEX
0x2B3B U+011F N # LATIN SMALL LETTER G WITH BREVE
0x2B3D U+0121 N # LATIN SMALL LETTER G WITH DOT ABOVE
0x2B3E U+0125 N # LATIN SMALL LETTER H WITH CIRCUMFLEX
0x2B41 U+00EF N # LATIN SMALL LETTER I WITH DIAERESIS
0x2B42 U+00EE N # LATIN SMALL LETTER I WITH CIRCUMFLEX
0x2B46 U+012F N # LATIN SMALL LETTER I WITH OGONEK
0x2B47 U+0129 N # LATIN SMALL LETTER I WITH TILDE
0x2B48 U+0135 N # LATIN SMALL LETTER J WITH CIRCUMFLEX
0x2B49 U+0137 N # LATIN SMALL LETTER K WITH CEDILLA
0x2B4A U+013A N # LATIN SMALL LETTER L WITH ACUTE
0x2B4B U+013E N # LATIN SMALL LETTER L WITH CARON
0x2B4C U+013C N # LATIN SMALL LETTER L WITH CEDILLA
0x2B4F U+0146 N # LATIN SMALL LETTER N WITH CEDILLA
0x2B50 U+00F1 N # LATIN SMALL LETTER N WITH TILDE
0x2B53 U+00F6 N # LATIN SMALL LETTER O WITH DIAERESIS
0x2B54 U+00F4 N # LATIN SMALL LETTER O WITH CIRCUMFLEX
0x2B56 U+0151 N # LATIN SMALL LETTER O WITH DOUBLE ACUTE
0x2B58 U+00F5 N # LATIN SMALL LETTER O WITH TILDE
0x2B59 U+0155 N # LATIN SMALL LETTER R WITH ACUTE
0x2B5A U+0159 N # LATIN SMALL LETTER R WITH CARON
0x2B5B U+0157 N # LATIN SMALL LETTER R WITH CEDILLA
0x2B5C U+015B N # LATIN SMALL LETTER S WITH ACUTE
0x2B5D U+015D N # LATIN SMALL LETTER S WITH CIRCUMFLEX
0x2B5E U+0161 N # LATIN SMALL LETTER S WITH CARON
0x2B5F U+015F N # LATIN SMALL LETTER S WITH CEDILLA
0x2B60 U+0165 N # LATIN SMALL LETTER T WITH CARON
0x2B61 U+0163 N # LATIN SMALL LETTER T WITH CEDILLA
0x2B65 U+00FB N # LATIN SMALL LETTER U WITH CIRCUMFLEX
0x2B66 U+016D N # LATIN SMALL LETTER U WITH BREVE
0x2B68 U+0171 N # LATIN SMALL LETTER U WITH DOUBLE ACUTE
0x2B6A U+0173 N # LATIN SMALL LETTER U WITH OGONEK
0x2B6B U+016F N # LATIN SMALL LETTER U WITH RING ABOVE
0x2B6C U+0169 N # LATIN SMALL LETTER U WITH TILDE
0x2B71 U+0175 N # LATIN SMALL LETTER W WITH CIRCUMFLEX
0x2B72 U+00FD N # LATIN SMALL LETTER Y WITH ACUTE
0x2B73 U+00FF N # LATIN SMALL LETTER Y WITH DIAERESIS
0x2B74 U+0177 N # LATIN SMALL LETTER Y WITH CIRCUMFLEX
0x2B75 U+017A N # LATIN SMALL LETTER Z WITH ACUTE
0x2B76 U+017E N # LATIN SMALL LETTER Z WITH CARON
0x2B77 U+017C N # LATIN SMALL LETTER Z WITH DOT ABOVE
FILE SHIFTJIS.TXT------
0x7E U+203E A # OVERLINE
0x815F U+005C Na # REVERSE SOLIDUS
0x817C U+2212 N # MINUS SIGN
0x8191 U+00A2 Na # CENT SIGN
0x8192 U+00A3 Na # POUND SIGN
0x81CA U+00AC Na # NOT SIGN
FILE CP932.TXT------
FILE JAPANESE.TXT------
FILE GB2312.TXT------
FILE CHINSIMP.TXT------
FILE BIG5.TXT------
0xA14E U+FF64 H # HALFWIDTH IDEOGRAPHIC COMMA
0xA1F2 U+2641 N # EARTH
0xA244 U+00A5 Na # YEN SIGN
0xA246 U+00A2 Na # CENT SIGN
0xA247 U+00A3 Na # POUND SIGN
FILE CHINTRAD.TXT------
FILE KSX1001.TXT------
FILE KOREAN.TXT------

Note: This research only checks the mapping tables from the Unicode Consortium.
These results should be regarded as bugs of either UAX#11 or the mapping tables.
The Unicode Consortium commented on the previous version of this research. According to that comment, many characters in the result (which should be two-column but are neither "W" nor "F" in my research) are correctly one-column, because they have corresponding FULLWIDTH FORMS in the U+FFxx region. In such cases, the comment says, the problem should be regarded as a bug of the mapping tables, not of UAX#11. For example,
It should be noted that JIS standards like JIS X 0201, JIS X 0208, JIS X 0212, and JIS X 0213 don't specify the widths of characters. Until now, the distinction between one-column and two-column characters has been only a de-facto standard. Thus, it is natural that JIS maps the CENT SIGN in JIS X 0208 to U+00A2 CENT SIGN, not to U+FFE0 FULLWIDTH CENT SIGN.
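Under the Consortium's position, a converter would map such characters to the Halfwidth and Fullwidth Forms block instead. For example, JIS X 0208 0x2171 would map to U+FFE0 rather than U+00A2.

The minimal lookup sketch below covers only two cases:

- the ASCII range, where U+0021..U+007E correspond to U+FF01..U+FF5E;
- six Latin-1 signs.

`fullwidth_of` is a hypothetical helper, not part of any mapping table. It finds no counterpart for U+2212 MINUS SIGN.

```perl
use strict;
use warnings;

# Fullwidth counterparts in the Halfwidth and Fullwidth Forms block.
# Only the ASCII range and these Latin-1 signs are covered (sketch only).
my %latin1_fullwidth = (
    0x00A2 => 0xFFE0,    # CENT SIGN  -> FULLWIDTH CENT SIGN
    0x00A3 => 0xFFE1,    # POUND SIGN -> FULLWIDTH POUND SIGN
    0x00AC => 0xFFE2,    # NOT SIGN   -> FULLWIDTH NOT SIGN
    0x00AF => 0xFFE3,    # MACRON     -> FULLWIDTH MACRON
    0x00A6 => 0xFFE4,    # BROKEN BAR -> FULLWIDTH BROKEN BAR
    0x00A5 => 0xFFE5,    # YEN SIGN   -> FULLWIDTH YEN SIGN
);

# Return the fullwidth counterpart of code point $cp, or undef if none.
sub fullwidth_of {
    my ($cp) = @_;
    return $cp + 0xFEE0 if $cp >= 0x21 && $cp <= 0x7E;    # U+FF01..U+FF5E
    return $latin1_fullwidth{$cp};
}

printf "U+%04X\n", fullwidth_of(0x00A2);    # prints U+FFE0
```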
Unicode and JIS, or UAX#11 and the mapping tables, seem to blame each other. (I don't know JIS's viewpoint because I didn't ask JIS.) However, users' convenience is much more important than the consistency and perfection of any one standard. I think one side (or both sides) should compromise to achieve users' convenience. Since it is too late to modify the mapping tables, I think UAX#11 should be modified.
Let's discuss other problems.
FILE JIS0208.TXT------
0x2140 U+005C Na # REVERSE SOLIDUS

JIS X 0208 is used together with ISO 646 IRV in EUC-JP, so using U+005C for JIS X 0208 should be discouraged. (See EUC-JP round-trip compatibility for details.) In other words, this mapping has a much more severe problem than the width problem and should be modified. If that modification is made, U+005C can be left as Na.
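The round-trip problem can be detected mechanically. This minimal sketch assumes that all character sets used together in one encoding (here ISO 646 IRV and JIS X 0208 in EUC-JP) are merged into one list of pairs. `duplicate_targets` is a hypothetical helper, and the sample entries come from the results above.

```perl
use strict;
use warnings;

# Given [local code, Unicode] pairs for all character sets combined in one
# encoding, return the Unicode code points that more than one local code maps
# to. Such code points cannot be converted back unambiguously.
sub duplicate_targets {
    my (@pairs) = @_;
    my %sources;    # Unicode => list of local codes mapping to it
    for my $pair (@pairs) {
        my ($local, $ucs) = @$pair;
        push @{ $sources{$ucs} }, $local;
    }
    return grep { @{ $sources{$_} } > 1 } sort keys %sources;
}

my @eucjp = (
    [ 'ISO 646 IRV 0x5C',   'U+005C' ],
    [ 'JIS X 0208 0x2140',  'U+005C' ],    # as mapped by JIS0208.TXT
    [ 'JIS X 0208 0x2171',  'U+00A2' ],
);
print join(' ', duplicate_targets(@eucjp)), "\n";    # prints U+005C
```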
Next,
FILE JIS0208.TXT------
0x215D U+2212 N # MINUS SIGN

The comment from the Unicode Consortium insists that corresponding FULLWIDTH characters are available in the U+FFxx range, but I could not find a FULLWIDTH MINUS SIGN in the Unicode Standard. Thus, U+2212 should be changed from N to A.
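Until EastAsianWidth.txt itself changes, software can only apply such a correction locally. The minimal sketch below keeps a local override table that is consulted before the data read from EastAsianWidth.txt. Its single entry encodes the change proposed above; it is my illustration, not an official change. `corrected_class` is a hypothetical name.

```perl
use strict;
use warnings;

# Local corrections applied on top of the classes read from EastAsianWidth.txt.
# Illustration only: U+2212 MINUS SIGN treated as "A" instead of "N".
my %override = ('2212' => 'A');

# $width: hashref { "XXXX" => class } read from EastAsianWidth.txt
sub corrected_class {
    my ($width, $cp) = @_;
    my $key = sprintf("%04X", $cp);
    return $override{$key} // $width->{$key};
}

my %from_file = ('2212' => 'N', '00A2' => 'Na');
print corrected_class(\%from_file, 0x2212), "\n";    # prints A
```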
Next,
FILE SHIFTJIS.TXT------
0x7E U+203E A # OVERLINE

If we consider this character alone, this seems to be the result of a bad modification of EastAsianWidth.txt in Revision 9. Shift_JIS 0x7E should be one-column and therefore "Na" or "H". However, U+203E is also used for
Next,
FILE BIG5.TXT------
0xA244 U+00A5 Na # YEN SIGN

If we were to modify EastAsianWidth.txt to make this character two-column, it would conflict with Shift_JIS, which needs this character to be one-column. Thus, the mapping table should be changed.
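Such conflicts can be found by comparing byte lengths across mapping tables. This minimal sketch assumes each table has been loaded into a hash from `U+XXXX` to the local code. The sample entries are the two YEN SIGN mappings: Shift_JIS 0x5C (as SHIFTJIS.TXT maps it) and Big5 0xA244. `width_conflicts` is a hypothetical helper.

```perl
use strict;
use warnings;

# Return the code points that are 1-byte in one table and 2-byte in another;
# no single UAX#11 class can fit both. Each table is assumed to be a
# hashref { 'U+XXXX' => local code number }.
sub width_conflicts {
    my (%tables) = @_;
    my %lengths;    # 'U+XXXX' => { byte length => 1 } for each length seen
    for my $name (sort keys %tables) {
        my $table = $tables{$name};
        for my $ucs (keys %$table) {
            my $bytes = $table->{$ucs} < 0x100 ? 1 : 2;
            $lengths{$ucs}{$bytes} = 1;
        }
    }
    my @conflicts = grep { scalar(keys %{ $lengths{$_} }) > 1 } keys %lengths;
    return sort @conflicts;
}

my %sample = (
    'SHIFTJIS.TXT' => { 'U+00A5' => 0x5C },      # YEN SIGN as a 1-byte character
    'BIG5.TXT'     => { 'U+00A5' => 0xA244 },    # YEN SIGN as a 2-byte character
);
print join(' ', width_conflicts(%sample)), "\n";    # prints U+00A5
```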
Next,
FILE BIG5.TXT------
0xA14E U+FF64 H # HALFWIDTH IDEOGRAPHIC COMMA

This is a bug of the mapping table. HALFWIDTH IDEOGRAPHIC COMMA must obviously be "H", while Big5 0xA14E is a 2-byte character.
Next, JIS X 0212 characters. The comment from the Unicode Consortium says that JIS X 0212 is not widely supported and will be supported even less in the future. However, since EastAsianWidth exists to keep compatibility with East Asian conventions, I think it should take JIS X 0212 into account.
In conclusion, EastAsianWidth should be modified as follows:
JIS X 0213 was released in 2000. I imagine many of its characters will have to be changed to "A". Further research is needed.