Published on June 19, 2007
Unicode from a distance…
Mark Davis
Chief Software Globalization Architect, IBM
President, Unicode Consortium

Starting back a bit before Unicode…

1850: Where? When?
Longitude was non-standard: the Paris meridian, the Greenwich meridian, the Berlin meridian.
Time was non-standard: local times differed from city to city, e.g. 7:16 in Boston, 6:52 in DC, 4:06 in LA, 3:51 in SF.

That had to change…
Telegraph → exact longitudes
Railway → time zones
Shipping → a Prime Meridian: Washington, 1884 (France delays until 1914…)

Uniformity Winning
Of course, the French gave us all the metric system. (Compare the Portuguese mile, the Roman mile, the Hamburg mile, the US mile…)
But we didn’t get metric time: it is still Babylonian…
Why one and not the other?

Fast forward a few years

1985: Characters Not Standardized, Data Exchange Limited

That had to change…

No longer data "islands"
Customers could be from any country.
Companies have heterogeneous systems.
People can’t tolerate it when text is lost or corrupted in transmission, or when lookups fail.
English and European languages are only part of the world market…

GDP-PPP, 1975..2002 (chart)

GDP-PPP, 2003..2010 (chart)

Silicon Valley, 1991: Unicode
The Unicode Standard provides:
a unique code for every character in the world;
a model and architecture for every script;
properties and behavior, isolating programmers from the details.
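To make "a unique code for every character" and the problem of text "lost or corrupted in transmission" concrete, here is a minimal Python sketch; it is an illustration added here, not material from the talk. The sample characters and the helper names (`describe`, `utf8_round_trip`, `latin1_survivors`) are hypothetical choices, and Latin-1 (ISO-8859-1) merely stands in for a typical legacy 8-bit character set. Characters from different scripts each get one code point, UTF-8 carries all of them intact, and a conversion to the legacy set silently keeps only a few.

```python
import unicodedata

# Hypothetical sample: one character from each of several scripts.
samples = [
    "A",           # LATIN CAPITAL LETTER A
    "\u00e9",      # LATIN SMALL LETTER E WITH ACUTE
    "\u0416",      # CYRILLIC CAPITAL LETTER ZHE
    "\u05e9",      # HEBREW LETTER SHIN
    "\u0639",      # ARABIC LETTER AIN
    "\u0905",      # DEVANAGARI LETTER A
    "\u4e2d",      # CJK UNIFIED IDEOGRAPH-4E2D
    "\U0001d11e",  # MUSICAL SYMBOL G CLEF (beyond the 16-bit range)
]


def describe(ch: str) -> str:
    """Return the character's code point in U+XXXX notation."""
    return f"U+{ord(ch):04X}"


def utf8_round_trip(text: str) -> bool:
    """Encode to UTF-8 bytes and decode back; True if nothing was lost."""
    return text.encode("utf-8").decode("utf-8") == text


def latin1_survivors(text: str) -> str:
    """Keep only the characters a Latin-1 (legacy 8-bit) conversion can represent."""
    kept = []
    for ch in text:
        try:
            ch.encode("latin-1")
            kept.append(ch)
        except UnicodeEncodeError:
            pass  # this character has no code in the legacy set and is dropped
    return "".join(kept)


if __name__ == "__main__":
    text = "".join(samples)
    for ch in samples:
        # Code point, UTF-8 length in bytes, and the character's Unicode name
        print(describe(ch), len(ch.encode("utf-8")), unicodedata.name(ch))
    print("UTF-8 round trip intact:", utf8_round_trip(text))
    print("Characters surviving Latin-1:", len(latin1_survivors(text)), "of", len(text))
```

Running it lists each code point alongside its UTF-8 length (1 to 4 bytes), confirms the UTF-8 round trip, and shows that only 2 of the 8 characters survive the Latin-1 conversion.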
2004: Unicode, the "Prime Meridian" of computing
96,000+ characters (V4.0)
Wide-ranging specifications for uniform cross-product behavior
Used in every major operating system, in all major office software, as the core definition of text in XML, HTML, …, and as the core of Java, C#, C (with ICU), …

Website Globalization
Websites present both static and composed data, the latter frequently backed by one or more databases.
Unicode makes the entire architecture vastly simpler, from back-end databases to the pages served to the client.
People used to convert to legacy character sets on output, but that is less needed now, except in special circumstances.

Unicode Consortium
Development of key software globalization standards:
Unicode Standard
Other specs: sorting, international regular expressions, matching (case-insensitive), line-breaking, identifiers, …
New projects: Common Locale Data Repository (uniform date/time/number formatting, sorting, … across programs and platforms)
Open to new members: Corporate, Associate, Specialist
http://www.unicode.org/consortium/why_join.html

References
ICU
Longitude
The Unicode Standard
UTN #13: GDP by Language
Einstein’s Clocks, Poincaré’s Maps

More about Unicode: March 31 - April 2!