Reminds me a lot of +Matt Giuca's post (see comments) on the subject. Didn't even know there were standard ways of normalizing Unicode input (such as "Normalization Form C"). Seems like a pretty sensible compromise between just saying "size in bytes" and trying to universally define what a "character" is.
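
For anyone curious what that compromise looks like in practice, here is a rough sketch in Python using the standard library's `unicodedata.normalize`. It assumes "length" means the number of code points after normalizing to NFC; the helper name `nfc_length` is made up for illustration and is not taken from the linked post or from Twitter's article.

```python
import unicodedata


def nfc_length(text: str) -> int:
    """Count code points after normalizing to Normalization Form C (NFC).

    Hypothetical helper for illustration only.
    """
    return len(unicodedata.normalize("NFC", text))


# Two spellings of the same visible word "café":
decomposed = "cafe\u0301"  # 'e' followed by U+0301 COMBINING ACUTE ACCENT
composed = "caf\u00e9"     # single precomposed U+00E9

# Raw code point counts differ between the two spellings.
print(len(decomposed), len(composed))

# UTF-8 byte counts differ as well.
print(len(decomposed.encode("utf-8")), len(composed.encode("utf-8")))

# After NFC normalization both spellings give the same count.
print(nfc_length(decomposed), nfc_length(composed))
```

The byte count depends on both the encoding and the spelling, while the NFC count is the same for either spelling, so the limit behaves the same way however the text was typed. It still isn't a true "character" count: sequences with no precomposed form remain more than one code point after NFC.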

The post in question: http://unspecified.wordpress.com/2012/04/19/the-importance-of-language-level-abstract-unicode-strings/

(And yes, I did link to Twitter's Counting Characters article in my article, and this is why I used Twitter so much as a positive example. If you scroll down to the bottom of my comments, you'll see the apparent author of Counting Characters wrote a reply.)