January 29, 2010

Unicode nearing 50% of the web

According to a recent post on the Google Blog, Unicode is nearing 50% uptake on the web. The graph in that post is rather steep as well.

This is pretty good news. I've had the 'pleasure' of working on a number of integration projects where the 3rd party was still using ISO-8859-1 (aka latin-1). Usually when this is the case, it's not by choice but because of their software's default settings (browsers, MySQL, etc.). I for one hope non-Unicode charsets will soon be a thing of the past.
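For anyone who hasn't run into this, here is a minimal sketch of what typically goes wrong when one side sends UTF-8 and the other assumes latin-1. It assumes Python 3, and the sample string is made up for illustration; it isn't taken from the Google post.

```python
# Hypothetical sample text; any string with a non-ASCII character behaves the same.
text = "café"

# A UTF-8 system sends these bytes...
utf8_bytes = text.encode("utf-8")

# ...and a system that assumes ISO-8859-1 decodes them byte by byte.
misread = utf8_bytes.decode("iso-8859-1")
print(misread)  # the single 'é' arrives as two unrelated characters

# The other direction tends to fail outright, because a lone latin-1
# byte such as 0xE9 is not a valid UTF-8 sequence.
latin1_bytes = text.encode("iso-8859-1")
try:
    latin1_bytes.decode("utf-8")
    reverse_ok = True
except UnicodeDecodeError:
    reverse_ok = False
print(reverse_ok)
```

The first case is the nasty one in practice: nothing raises an error, so the garbled text quietly ends up in the database.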

One other note in the post was about ligatures, such as ﬁ and the Dutch ĳ. If this is the first time you've heard about these, you might be surprised to see that you can (likely) only copy-paste ĳ as a whole, and not just the i or the j. It's one Unicode character, not two. It just made me wonder: what kind of software would generate these, and more importantly, why?
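The "one character, not two" part is easy to check. The sketch below assumes Python 3 and its standard unicodedata module; using NFKC normalization to get the plain letters back is one possible approach, not something the Google post prescribes.

```python
import unicodedata

ligature_ij = "\u0133"  # ĳ
ligature_fi = "\ufb01"  # ﬁ

# Each ligature is a single code point with its own name in the Unicode database.
print(len(ligature_ij), unicodedata.name(ligature_ij))
print(len(ligature_fi), unicodedata.name(ligature_fi))

# NFKC (compatibility) normalization maps each one back to two plain letters.
plain_ij = unicodedata.normalize("NFKC", ligature_ij)
plain_fi = unicodedata.normalize("NFKC", ligature_fi)
print(plain_ij, len(plain_ij))
print(plain_fi, len(plain_fi))
```

Something like this can help when you want a search for "ij" to also match text that contains the ligature, although normalizing is lossy, so it's probably best applied to a search index rather than to the stored text.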

Comments

Dave • Jan 29, 2010
"It just made me wonder: what kind of software would generate these, and more importantly why?"

Well, the answer is right there in the post you referenced: it just looks better in documents intended for printing, "[...] especially generated PDF documents."
Jordan Walker • Jan 29, 2010
Let the battle and competition rage.
Evert • Jan 29, 2010
@Dave,

Maybe I'm crazy, but shouldn't it be a job of the font to make a combination of 2 characters look better?
Lars Gunther • Jan 29, 2010
And of course this means that PHP 6 is becoming more important with each day. But is it in sight?
Jay Pipes • Jan 29, 2010
Drizzle got rid of all non-UTF-8 character sets a long time ago. The web is UTF8 and so should be the data behind it.

One minor thing, though. UTF-8 != Unicode :) UTF-8 is technically just a mapping of Unicode code points to a range of values.

I would argue that the web has standardized on UTF-8, not UCS4, UTF-32, UTF-16 or other Unicode transformation mappings...

Cheers!

jay
Nelson Menezes • Jan 30, 2010
As mentioned above, ligatures simply look better on print or large font sizes on-screen.

If you are getting situations where ligatures are being copied-pasted then someone screwed up -- the ligatures are meant to be applied on rendering only, not on source material. So, it would be the job of a browser to introduce ligatures on screen, but still allow copy/paste of individual characters.

BTW, great things are coming... http://hacks.mozilla.org/2009/10/font-control-for-designers/
Joost • Feb 03, 2010
Ligatures like IJ are also important because of capitalization rules. I know Bing Maps only uppercases the first letter, which is wrong in Dutch.

http://www.bing.com/maps/#JnE9eXAuaGV0K2lqJTdlc3N0LjAlN2VwZy4xJmJiPTUzLjAxOTQzMDQyMDYxODIlN2U1LjYzOTk5NTU2MDA1MDAxJTdlNTMuMDAzNzU3NTgxOTI4JTdlNS42MDAwODQyODk5MDg0MQ==

http://maps.google.nl/maps?f=q&source=s_q&hl=nl&geocode=&q=het+ij&sll=52.469397,5.509644&sspn=3.935848,9.876709&ie=UTF8&hq=&hnear=Het+IJ&ll=52.369992,4.997234&spn=0.030814,0.077162&z=14