detecting improperly encoded text (in perl)

i need a way to detect when a string has been double-encoded into utf-8. that is, a string of utf-8 bytes that was mistaken for iso-8859-1 and then converted to utf-8 a second time.
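here's a rough sketch of one way to check, assuming the input is a raw byte string and that the bad conversion really was iso-8859-1 (not windows-1252). the function name `looks_double_encoded` is made up for illustration. the idea: decode the bytes as utf-8, squash the resulting characters back down to single bytes, and see whether those bytes are themselves valid utf-8 with at least one multi-byte sequence.

```perl
use strict;
use warnings;
use Encode qw(decode encode FB_CROAK LEAVE_SRC);

# hypothetical helper: returns 1 if $bytes (a utf-8 byte string)
# looks like it was encoded to utf-8 twice, 0 otherwise.
sub looks_double_encoded {
    my ($bytes) = @_;

    # step 1: the input has to be valid utf-8 to begin with.
    my $chars = eval { decode('UTF-8', $bytes, FB_CROAK | LEAVE_SRC) };
    return 0 unless defined $chars;

    # step 2: pure ascii can't be double-encoded, and any character
    # above U+00FF can't have come from an iso-8859-1 byte.
    return 0 unless $chars =~ /[^\x00-\x7f]/;
    return 0 if $chars =~ /[^\x00-\xff]/;

    # step 3: turn each character back into a single byte, then check
    # whether those bytes form valid utf-8 on their own.
    my $inner = encode('iso-8859-1', $chars, FB_CROAK | LEAVE_SRC);
    my $ok = eval { decode('UTF-8', $inner, FB_CROAK | LEAVE_SRC); 1 };
    return $ok ? 1 : 0;
}

# "\xC3\xA9" is e-acute in utf-8; "\xC3\x83\xC2\xA9" is the same
# character after an extra iso-8859-1 to utf-8 round.
print looks_double_encoded("\xC3\x83\xC2\xA9") ? "double\n" : "ok\n";
print looks_double_encoded("\xC3\xA9")         ? "double\n" : "ok\n";
```

this is only a heuristic: a string that legitimately contains something like "Ã©" would be flagged too, so it's probably best used as a hint rather than a guarantee.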

this will help deal with the encoding bugs in blogger's changes feed. (which, unfortunately, is not consistently broken: sometimes the encoding is right, sometimes the encoding is wrong. at least, i think sometimes the encoding is right, although i can’t find any examples right now.)

what would be even better, of course, would be for blogger to fix the bug. i reported it, and got a “we know, we hope to resolve the problem soon” response.

looks like they could take a lesson from joel spolsky's mini-tutorial on unicode. (i’ll admit to being surprised that blogger gets it wrong: i was under the impression that they used java, which i believe has pretty solid unicode support.)
