Agenda   
 

Unicode, Multilingual Databases and Asian Character Sets
Establishing The Proper Foundation For Your Global Product
 
09:00 - 09:20
Welcome and Introduction
09:20 - 10:00

Coded Character Sets: Past, Present and Future

 

A brief history of character sets covering ASCII, ISO 646 & 2022, ISO Latin-1, DOS & Windows code pages, Shift-JIS, Big Five, etc. The history serves to introduce, step by step, the concepts related to character sets: graphic sets, control sets, serialization forms, designation, invocation, transparency, etc.

The versions of Unicode from 1.0 to the current version 4.0 are presented.

10:00 - 10:45
Unicode: Character Set & Standard
 

The Unicode 21-bit character set: Why Unicode? The 10 design goals of Unicode and how real they are. The 16-bit history and confusion; characters and surrogates. Combining characters, character properties, special ideographic characters. Characters to avoid and deprecated characters.
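As a small taste of the "characters and surrogates" topic, here is a minimal Python sketch. It assumes the standard UTF-16 surrogate arithmetic and uses Python's `unicodedata` module; the helper name `to_surrogate_pair` and the sample characters are illustrative choices, not workshop material.

```python
import unicodedata
from typing import Tuple


def to_surrogate_pair(code_point: int) -> Tuple[int, int]:
    """Split a supplementary code point into its UTF-16 surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only U+10000..U+10FFFF need surrogates")
    offset = code_point - 0x10000      # 20 significant bits remain
    high = 0xD800 + (offset >> 10)     # top 10 bits go in the high surrogate
    low = 0xDC00 + (offset & 0x3FF)    # bottom 10 bits go in the low surrogate
    return high, low


G_CLEF = 0x1D11E  # MUSICAL SYMBOL G CLEF, beyond the original 16-bit range
high, low = to_surrogate_pair(G_CLEF)

# Cross-check the arithmetic against Python's own UTF-16 encoder.
utf16_units = chr(G_CLEF).encode("utf-16-be")
pair_matches_codec = utf16_units == high.to_bytes(2, "big") + low.to_bytes(2, "big")

# Combining characters carry a non-zero canonical combining class;
# ordinary base letters have class 0.
acute_class = unicodedata.combining("\u0301")  # COMBINING ACUTE ACCENT
letter_class = unicodedata.combining("e")
```

The point of the sketch is that one character may occupy two 16-bit code units, which is the source of much of the "16-bit" confusion covered in this session.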

The Unicode standard: the book, the technical reports and the CD. Relationship between Unicode and ISO 10646. Conformance requirements and levels. Conformance (and non-conformance) to Unicode & ISO 10646.

10:45 - 11:00
Coffee Break
11:00 - 12:15
Representing Unicode: Choosing the Proper Form
 

Serialization forms: The pros and cons of the basic encodings: UTF-8, UTF-16, UTF-32. The precise meaning of UTF, UCS-2 and UCS-4. UTF-8 security issues and confusion with CESU-8. BOM: the byte order mark and the ZWNBSP (Zero Width No-Break Space) problem. SCSU: the compression format.
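To make the trade-offs between the serialization forms concrete, the following Python sketch encodes one short string in each basic form and looks at the byte order mark (U+FEFF). The sample string and variable names are assumptions made for illustration; the CESU-8 bytes are simulated with Python's `surrogatepass` error handler, since Python ships no CESU-8 codec.

```python
import codecs

# One character from each UTF-8 length class: A, é, 中, and the G clef.
text = "A\u00e9\u4e2d\U0001d11e"

# Byte counts for the same four characters in each encoding form.
sizes = {enc: len(text.encode(enc)) for enc in ("utf-8", "utf-16-le", "utf-32-le")}

# The BOM is simply U+FEFF serialized in the chosen form.
bom_is_feff = "\ufeff".encode("utf-8") == codecs.BOM_UTF8

# A plain UTF-8 decode keeps the BOM as a character; "utf-8-sig" strips it.
raw = codecs.BOM_UTF8 + b"hi"
kept = raw.decode("utf-8")
stripped = raw.decode("utf-8-sig")

# CESU-8 encodes each UTF-16 surrogate separately (3 bytes each),
# whereas real UTF-8 encodes the code point directly (4 bytes).
utf8_clef = "\U0001d11e".encode("utf-8")
cesu8_clef = "\ud834\udd1e".encode("utf-8", "surrogatepass")
```

For this sample, UTF-8 and UTF-16 happen to need the same number of bytes while UTF-32 needs more; the balance shifts with the script mix of the text, which is why no single form is always the right choice.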

Normalization forms: The four normalization forms of Unicode. The advantages of each form and when to use them. The W3C recommended normalization form and why you should normalize early. Rules that guarantee the stability of normalized forms (at the expense of intuitive behaviour).
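The four normalization forms can be compared side by side with a few lines of Python. This is only a sketch using the standard `unicodedata.normalize` function; the helper `normalize_all` and the three sample strings are illustrative assumptions.

```python
import unicodedata

FORMS = ("NFC", "NFD", "NFKC", "NFKD")


def normalize_all(text: str) -> dict:
    """Return the given text in each of the four normalization forms."""
    return {form: unicodedata.normalize(form, text) for form in FORMS}


# "e" followed by COMBINING ACUTE ACCENT: composed by NFC/NFKC only.
e_forms = normalize_all("e\u0301")

# LATIN SMALL LIGATURE FI: only the compatibility forms expand it.
fi_forms = normalize_all("\ufb01")

# ANGSTROM SIGN: even NFC replaces it with LATIN CAPITAL LETTER A WITH RING ABOVE,
# an example of normalization changing a character a user might expect to survive.
angstrom_nfc = unicodedata.normalize("NFC", "\u212b")

# Normalizing early makes visually identical strings compare equal.
equal_after_nfc = (
    unicodedata.normalize("NFC", "e\u0301") == unicodedata.normalize("NFC", "\u00e9")
)
```

Comparing `e_forms` and `fi_forms` shows the two axes along which the forms differ: composed versus decomposed, and canonical versus compatibility.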

12:15 - 13:30
Lunch
13:30 - 14:45
Unicode Implementation
 

Working with text: Case conversions, word and sentence boundaries, line wrapping, sorting and searching. Transcoding and its complexity for contextual languages. Unicode tools and libraries.
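Two of the text-processing pitfalls listed above show up even in a tiny Python sketch: case conversion can change string length, and sorting by code point is not linguistic sorting. The sample words are illustrative assumptions; only Python built-ins are used, and proper collation would need a dedicated library (for example an ICU binding), which is not shown here.

```python
# Case conversion is not always one-to-one: the German sharp s
# has no single-character uppercase in the default mapping.
word = "stra\u00dfe"            # "straße"
upper = word.upper()            # length grows from 6 to 7
folded = "Stra\u00dfe".casefold()  # case folding, intended for caseless matching

# lower() alone does not make the two spellings compare equal; casefold() does.
lower_matches = "STRASSE".lower() == word.lower()
fold_matches = "STRASSE".casefold() == word.casefold()

# Python's default sort compares code points, so the accented word
# lands after "zebra" instead of next to "eclair".
words = ["zebra", "\u00e9clair", "eclair"]
code_point_order = sorted(words)
```

The last line is the reason the session treats sorting and searching as a topic of its own rather than a by-product of the encoding.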

Platforms: Available Unicode support on operating systems, programming languages, desktop applications, browsers, markup languages, etc.

14:45 - 15:15
Database Issues
 

The design of multilingual data schemas. Oracle and SQL Server support. Oracle 8i UTF8 vs. Oracle 9i AL32UTF8. UTF-16 access problems with JDBC. Distributed multi-locale databases: dynamic transcoding vs. database migration to Unicode.

15:15 - 15:30
Coffee Break
15:30 - 17:00

Asian Character Sets: GB 18030-2000, Hong Kong SCS, JIS X 0213:2000

 

The older Asian character sets still in widespread use: GB 2312, Big Five, GCCS, JIS X 0208. The new Asian character sets, their structure, and their legal conformance requirements and levels. Implementation issues and conversion to/from Unicode. The consequences of Han unification.
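Conversion to and from Unicode can be tried out with the codecs bundled with Python. The sketch below is an illustration under that assumption (codec names `gb2312`, `big5`, `shift_jis` and `gb18030` as shipped in the Python standard library); the sample characters and the helper `fits` are hypothetical choices.

```python
# One Han character, encoded in several legacy sets and in GB 18030.
han = "\u4e2d"  # 中
legacy = {
    codec: han.encode(codec)
    for codec in ("gb2312", "big5", "shift_jis", "gb18030")
}

# Every codec must round-trip back to the same Unicode character.
round_trips = all(data.decode(codec) == han for codec, data in legacy.items())

# GB 18030 covers all of Unicode; characters outside the BMP take four bytes.
clef = "\U0001d11e"  # MUSICAL SYMBOL G CLEF
clef_gb18030 = clef.encode("gb18030")


def fits(text: str, codec: str) -> bool:
    """Report whether the text can be represented in the given codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


# The older GB 2312 set has no code for this character at all.
clef_in_gb2312 = fits(clef, "gb2312")
```

The same character receives unrelated byte values in each legacy set, while GB 18030 can represent everything Unicode can; that contrast is the practical background for the conversion issues discussed in this session.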

i18N Inc. © 2001-2010  |   Email: info@i18n.ca