Appendix C The ASCII and Unicode Character Sets

Java uses the Unicode character set to represent character data. A Java char holds a 16-bit unsigned integer, so it can take \(2^{16} = 65{,}536\) different values. This enables Java to represent characters not only from English but also from a wide range of international languages. Current versions of Unicode define more characters than fit in 16 bits; Java stores each of those additional characters as a pair of char values, known as a surrogate pair.
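The following is a minimal sketch of how a char behaves as a 16-bit unsigned integer. The class and method names (CharAsInteger, codeOf, charsNeeded) are illustrative and not part of the text; Character.MAX_VALUE and Character.charCount are standard library members. The last call assumes a code point above 65,535, which needs two char values.

```java
class CharAsInteger {
    // Returns the integer code stored in a char.
    // The widening conversion always yields a value in 0..65535.
    static int codeOf(char c) {
        return c;
    }

    // Number of char values needed to store a Unicode code point:
    // 1 if it fits in 16 bits, 2 otherwise.
    static int charsNeeded(int codePoint) {
        return Character.charCount(codePoint);
    }

    public static void main(String[] args) {
        System.out.println(codeOf('A'));                // 65
        System.out.println((int) Character.MAX_VALUE);  // 65535
        System.out.println(charsNeeded('A'));           // 1
        System.out.println(charsNeeded(0x1F600));       // 2
    }
}
```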

Unicode supersedes the ASCII character set (American Standard Code for Information Interchange). ASCII is a 7-bit code, although each character is usually stored in an 8-bit byte. A 7-bit code can represent only \(2^7 = 128\) characters. To make Unicode backward compatible with ASCII, the first 128 characters of Unicode have the same integer representation as the corresponding ASCII characters.
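The backward compatibility described above can be checked directly. The sketch below encodes a string as US-ASCII bytes and compares each byte with the integer value of the corresponding char. The class and method names (AsciiCompat, isAscii, sameCodesAsAscii) are made up for this illustration; StandardCharsets.US_ASCII and String.getBytes are standard library members.

```java
import java.nio.charset.StandardCharsets;

class AsciiCompat {
    // True if the char lies in the 7-bit ASCII range 0..127.
    static boolean isAscii(char c) {
        return c < 128;
    }

    // True if every char in s has the same integer value as its US-ASCII byte.
    // Characters outside ASCII are encoded as '?' (63), so they fail the comparison.
    static boolean sameCodesAsAscii(String s) {
        byte[] ascii = s.getBytes(StandardCharsets.US_ASCII);
        if (ascii.length != s.length()) {
            return false;
        }
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) != ascii[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(sameCodesAsAscii("Hello, Java!"));  // true
        System.out.println(sameCodesAsAscii("caf\u00e9"));     // false
    }
}
```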

The following table shows the integer representations for the printable subset of ASCII characters, with the space character (code 32) shown as SP. The characters with codes 0 through 31 and code 127 are nonprintable control characters, several of which correspond to keys on a standard keyboard. For example, the delete key is represented by 127, the backspace key by 8, and the return key by 13.

Code   32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47
Char   SP !  "  #  $  %  &  '  (  )  *  +  ,  -  .  /
Code   48 49 50 51 52 53 54 55 56 57
Char   0  1  2  3  4  5  6  7  8  9
Code   58 59 60 61 62 63 64
Char   :  ;  <  =  >  ?  @
Code   65 66 67 68 69 70 71 72 73 74 75 76 77
Char   A  B  C  D  E  F  G  H  I  J  K  L  M
Code   78 79 80 81 82 83 84 85 86 87 88 89 90
Char   N  O  P  Q  R  S  T  U  V  W  X  Y  Z
Code   91 92 93 94 95 96
Char   [  \  ]  ^  _  `
Code   97 98 99 100 101 102 103 104 105 106 107 108 109
Char   a  b  c  d   e   f   g   h   i   j   k   l   m
Code   110 111 112 113 114 115 116 117 118 119 120 121 122
Char   n   o   p   q   r   s   t   u   v   w   x   y   z
Code   123 124 125 126
Char   {   |   }   ~

Figure C.0.1. ASCII codes for selected characters
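
The figure can be reproduced by casting integer codes to char. The sketch below is one possible way to do so, not the method used to produce the figure; the names AsciiTable, isPrintableAscii, and row are hypothetical. It also prints the codes of the backspace and return characters mentioned above, using Java's '\b' and '\r' escapes.

```java
class AsciiTable {
    // True for codes 32..126, the printable subset shown in the figure.
    static boolean isPrintableAscii(int code) {
        return code >= 32 && code <= 126;
    }

    // Builds "code char" pairs for the printable codes in [from, to],
    // separated by two spaces, e.g. "65 A  66 B". Space is shown as SP.
    static String row(int from, int to) {
        StringBuilder sb = new StringBuilder();
        for (int code = from; code <= to; code++) {
            if (!isPrintableAscii(code)) {
                continue;  // skip control characters
            }
            if (sb.length() > 0) {
                sb.append("  ");
            }
            String shown = (code == 32) ? "SP" : String.valueOf((char) code);
            sb.append(code).append(' ').append(shown);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(row(65, 67));   // 65 A  66 B  67 C
        System.out.println(row(30, 33));   // 32 SP  33 !
        System.out.println((int) '\b');    // 8
        System.out.println((int) '\r');    // 13
    }
}
```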
