
Unicode nearing 50% of the web

According to a recent post on the Google Blog, Unicode is nearing 50% uptake on the web. The graph is rather steep, too:

(Graph: Unicode uptake on the web)

This is pretty good news. I've had the 'pleasure' of working on a number of integration projects where the third party was still using ISO-8859-1 (aka Latin-1). Usually when this is the case, it's not by choice but because of their software's default settings (browsers, MySQL, etc.). I, for one, hope non-Unicode charsets will soon be a thing of the past.

One other note in the post was about ligatures, such as ﬁ and the Dutch ĳ. If this is the first time you've heard of these, you might be surprised to find that you can (likely) only copy-paste ĳ as a whole, not just the i or the j. It's one Unicode character, not two. It just made me wonder: what kind of software would generate these, and more importantly why?
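If you want to check this yourself, Python's standard `unicodedata` module shows that both ligatures really are single code points, and NFKC normalization splits them back into plain letters (handy if such text ends up in a search index). The helper names below are my own, and the sketch assumes NFKC is acceptable for your use case.

```python
import unicodedata


def describe(ch: str) -> str:
    """Return the code point and official Unicode name of a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


def unligature(text: str) -> str:
    """Split compatibility ligatures such as U+FB01 and U+0133 into separate letters.

    NFKC also folds other compatibility characters (full-width forms,
    superscripts, ...), so this is not a ligature-only operation.
    """
    return unicodedata.normalize("NFKC", text)


for lig in ("\ufb01", "\u0133"):
    print(describe(lig), "| length:", len(lig), "| NFKC:", unligature(lig))
# U+FB01 LATIN SMALL LIGATURE FI | length: 1 | NFKC: fi
# U+0133 LATIN SMALL LIGATURE IJ | length: 1 | NFKC: ij
```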

Comments

  • Dave

    "It just made me wonder: what kind of software would generate these, and more importantly why?"

    Well, the answer is right there in the post you referenced. It just looks better in documents intended for printing: "[...] especially generated PDF documents."
  • Jordan Walker

    Let the battle and competition rage.
  • Evert

    @Dave,

    Maybe I'm crazy, but shouldn't it be the font's job to make a combination of 2 characters look better?

  • Lars Gunther

    And of course this means that PHP 6 is becoming more important with each day. But is it in sight?
  • Jay Pipes

    Drizzle got rid of all non-UTF-8 character sets a long time ago. The web is UTF-8, and the data behind it should be too.

    One minor thing, though: UTF-8 != Unicode :) UTF-8 is technically just a mapping of Unicode code points to sequences of bytes.

    I would argue that the web has standardized on UTF-8, not UCS-4, UTF-32, UTF-16 or other Unicode transformation formats...

    Cheers!

    jay
  • Nelson Menezes

    As mentioned above, ligatures simply look better on print or large font sizes on-screen.

    If you are getting situations where ligatures are being copy-pasted, then someone screwed up: ligatures are meant to be applied at render time only, not in the source material. So it would be the browser's job to introduce ligatures on screen while still allowing copy/paste of the individual characters.

    BTW, great things are coming... http://hacks.mozilla.org/2009/10/font-control-for-designers/
  • Joost

    Ligatures like IJ are also important because of capitalization rules. Bing Maps, for one, only uppercases the first letter (Ij instead of IJ), which is wrong in Dutch.

    http://www.bing.com/maps/#JnE9eXAuaGV0K2lqJTdlc3N0LjAlN2VwZy4xJmJiPTUzLjAxOTQzMDQyMDYxODIlN2U1LjYzOTk5NTU2MDA1MDAxJTdlNTMuMDAzNzU3NTgxOTI4JTdlNS42MDAwODQyODk5MDg0MQ==

    http://maps.google.nl/maps?f=q&source=s_q&hl=nl&geocode=&q=het+ij&sll=52.469397,5.509644&sspn=3.935848,9.876709&ie=UTF8&hq=&hnear=Het+IJ&ll=52.369992,4.997234&spn=0.030814,0.077162&z=14
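Joost's capitalization point can be shown in a few lines of Python. The naive `str.capitalize()` turns "ijssel" into "Ijssel", while Dutch writes "IJssel". Below is a simplified, hypothetical `dutch_capitalize` helper that treats a leading "ij" as one letter; it assumes the first two letters always form the digraph and ignores any other spelling rules. Interestingly, the single-code-point ligature gets this right for free, because U+0133 uppercases to U+0132.

```python
def dutch_capitalize(word: str) -> str:
    """Capitalize a Dutch word, treating a leading "ij" digraph as one letter.

    Simplified sketch: assumes a leading "ij" is always the digraph.
    """
    if word[:2].lower() == "ij":
        return "IJ" + word[2:].lower()
    # Also covers the ligature code point, since U+0133 uppercases to U+0132
    return word[:1].upper() + word[1:].lower()


def dutch_title(phrase: str) -> str:
    """Apply dutch_capitalize to every space-separated word."""
    return " ".join(dutch_capitalize(w) for w in phrase.split(" "))


print("ijssel".capitalize())           # Ijssel (naive, wrong in Dutch)
print(dutch_capitalize("ijssel"))      # IJssel
print(dutch_capitalize("\u0133ssel"))  # Ĳssel, via U+0132 LATIN CAPITAL LIGATURE IJ
print(dutch_title("het ij"))           # Het IJ
```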
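Jay's distinction between Unicode and UTF-8 is easy to see in code: a code point is just a number, and each encoding form turns it into a different byte sequence. A short sketch; the `encoded_sizes` helper is hypothetical, and the little-endian codec names are used so Python does not prepend a byte-order mark.

```python
def encoded_sizes(text: str) -> dict[str, int]:
    """Return how many bytes `text` takes in several Unicode encoding forms."""
    # "-le" variants: fixed byte order, so no byte-order mark is added
    return {enc: len(text.encode(enc)) for enc in ("utf-8", "utf-16-le", "utf-32-le")}


# "A", "é", the ĳ ligature, the euro sign and an emoji outside the BMP
for sample in ("A", "\u00e9", "\u0133", "\u20ac", "\U0001F600"):
    print(f"U+{ord(sample):04X}", encoded_sizes(sample))
```

Plain ASCII takes a single byte in UTF-8 but two or four in UTF-16 and UTF-32; that ASCII compatibility is one commonly cited reason UTF-8 caught on for the web.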