05-20-2008, 01:30 AM
glen herrmannsfeldt
Re: encoding="utf-8" and formatted direct access

James Giles wrote:
(snip)

> In fact there's probably not *any* language with built-in support for
> direct I/O style access to records made up of utf-8 characters. At
> least none that will do any better than merely accommodating 4*N
> byte lengths where N is the max number of characters your records
> might have.


Java uses Unicode for its native character set. The char data
type is an unsigned 16-bit UTF-16 code unit. The library will convert
between 16-bit Unicode and the UTF-8 representation of those characters.
I believe that means only 1, 2, or 3 bytes per char, but I haven't looked
at UTF-8 recently. (Characters beyond the 16-bit range take two chars,
a surrogate pair, and 4 bytes in UTF-8.)
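
As a rough check of those byte counts, here is a minimal sketch that asks the standard library how many UTF-8 bytes a string needs. The class name Utf8Lengths and the helper utf8Length are made up for illustration; the only library call is String.getBytes(Charset).

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of any library.
class Utf8Lengths {
    // Number of bytes the string occupies once encoded as UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // Unicode escapes keep the source file plain ASCII.
        System.out.println(utf8Length("A"));            // U+0041, 1 byte
        System.out.println(utf8Length("\u00e9"));       // U+00E9, 2 bytes
        System.out.println(utf8Length("\u20ac"));       // U+20AC, 3 bytes
        System.out.println(utf8Length("\uD83D\uDE00")); // U+1F600 as a surrogate pair, 4 bytes
    }
}
```

So a fixed-length record of N chars needs at most 3*N bytes of UTF-8, since a surrogate pair is two chars for 4 bytes. Counting N full characters instead gives the 4*N bound quoted above.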

Also, Java source is Unicode, and Java identifiers can use any
Unicode letters or digits (a digit just can't come first). There are
some Unicode letters that look exactly like ASCII letters but are at
different code points.
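
To illustrate the look-alike problem, here is a small sketch with two variables whose names display the same in many fonts but are distinct identifiers: Latin a (U+0061) and Cyrillic a (U+0430). The class name Homoglyphs and the method distinctVariables are hypothetical, and the Cyrillic letter is written as a Unicode escape so the source stays ASCII.

```java
// Hypothetical demo class, not part of any library.
class Homoglyphs {
    // Declares two different local variables with look-alike names.
    static int distinctVariables() {
        int a = 1;       // Latin small letter a, U+0061
        int \u0430 = 2;  // Cyrillic small letter a, U+0430: a separate variable
        return a + \u0430;
    }

    public static void main(String[] args) {
        // Both code points are legal at the start of a Java identifier.
        System.out.println(Character.isJavaIdentifierStart('a'));
        System.out.println(Character.isJavaIdentifierStart('\u0430'));
        // The two chars compare unequal even though they look alike.
        System.out.println('a' == '\u0430');
        System.out.println(distinctVariables());
    }
}
```

The compiler accepts both declarations in the same scope because it compares code points, not glyphs.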

-- glen
