
trainedmonkey

by Jim Winstead Jr.

June 21, 2004 archives

well, that took longer than expected

mysql> alter table blogs type=innodb;
Query OK, 2027382 rows affected (1 hour 32 min 41.49 sec)
Records: 2027382  Duplicates: 0  Warnings: 0

i did catch an out-of-control blog notification bot that may have been chewing up memory and otherwise getting in the way for most of that period.

» Monday, June 21, 2004 @ 8:59pm » blo.gs

php’s dumb xml parsing behavior

steve minutillo, author of feed on feeds, runs headlong into the execrable character encoding behavior of php’s xml parsing functions. hey, i was complaining about that just last year... (via phil ringnalda.)

and a related link: this article from the w3c explains how to deal with encoding issues in forms and has a nice regex that verifies whether a string is valid utf-8.
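
to make the idea concrete, here is a minimal python sketch of that kind of check. it is written in the spirit of the w3c regex, not copied from the article: the byte ranges follow the utf-8 definition (no overlong forms, no surrogates, nothing above U+10FFFF), and the names `UTF8_RE` and `is_valid_utf8` are made up for illustration. unlike a form-input check, this version accepts every ascii byte, control characters included.

```python
import re

# One alternative per legal UTF-8 sequence shape. The alternatives are
# mutually exclusive on their first byte, so matching never backtracks.
UTF8_RE = re.compile(
    rb"""
    (?:
        [\x00-\x7F]                        # ASCII
      | [\xC2-\xDF][\x80-\xBF]             # 2-byte, no overlongs
      | \xE0[\xA0-\xBF][\x80-\xBF]         # 3-byte, excluding overlongs
      | [\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}  # straight 3-byte
      | \xED[\x80-\x9F][\x80-\xBF]         # 3-byte, excluding surrogates
      | \xF0[\x90-\xBF][\x80-\xBF]{2}      # 4-byte, planes 1-3
      | [\xF1-\xF3][\x80-\xBF]{3}          # 4-byte, planes 4-15
      | \xF4[\x80-\x8F][\x80-\xBF]{2}      # 4-byte, plane 16
    )*
    """,
    re.VERBOSE,
)


def is_valid_utf8(data: bytes) -> bool:
    """Return True if every byte of data belongs to a legal UTF-8 sequence."""
    return UTF8_RE.fullmatch(data) is not None
```

in real python code, `data.decode("utf-8")` inside a try/except does the same job; the regex form is mostly useful because it ports to languages whose built-in decoders are less strict.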

here are some links culled from an i18n discussion on the twiki site:

Now that I've looked a bit more, there are many algorithms out there for charset detection, but most are aimed at HTML page auto-detection, and may well not work well for URLs:
  • Frank Tang's charset detection links - includes simple Perl UTF-8 detector based on legal codings
  • Excellent paper on Mozilla's 3-part algorithm using coding legality, character frequencies and two-character frequencies - detects the language as well as the encoding. Too complex for use on URLs, but looks very good.
  • Discussion on IRC auto-detection of charsets
  • Simple UTF-8 detector in C
  • CPAN:Unicode::Japanese - includes auto-detection for various Japanese charsets
  • CPAN:Encode::Guess - auto-detection from suitably dissimilar charsets (needs Perl 5.8)
  • Browser detection for forms input datatypes including useful undocumented JavaScript to check IE's current charset (try this out now if you are using IE - see Sandbox.TestCharset).
  • TextCat, tool for language detection - in Perl, OpenSource
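
the simplest idea in that list, detection by coding legality, can be sketched in a few lines of python: try each candidate charset and keep the ones that decode the bytes without error. this is only an illustration of the principle, not how Encode::Guess or the mozilla detector is actually implemented; the function name `plausible_charsets` and the default candidate list are assumptions. single-byte charsets such as latin-1 accept any byte sequence, so they are left out of the candidates on purpose.

```python
from typing import List, Sequence


def plausible_charsets(
    data: bytes,
    candidates: Sequence[str] = ("ascii", "utf-8", "shift_jis", "euc_jp"),
) -> List[str]:
    """Return the candidate charsets under which data is a legal byte sequence."""
    matches = []
    for name in candidates:
        try:
            data.decode(name)  # raises UnicodeDecodeError on an illegal sequence
        except UnicodeDecodeError:
            continue
        matches.append(name)
    return matches
```

more than one match means legality alone cannot decide (pure ascii is legal in all four defaults), which is where the frequency-based steps of the more complex algorithms come in.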

i really need to write the slides for my talk at oscon, which will cover exactly this sort of thing.

» Tuesday, June 22, 2004 @ 8:44pm » code

perspective

it popped into my head to check something recently. the number of blogs added to blo.gs, per day, since june 15:

+------------+-----------+
| added      | new blogs |
+------------+-----------+
| 2004-06-15 |      8118 |
| 2004-06-16 |      8170 |
| 2004-06-17 |      7362 |
| 2004-06-18 |      2512 |
| 2004-06-19 |      4299 |
| 2004-06-20 |      7802 |
| 2004-06-21 |      9264 |
+------------+-----------+

» Tuesday, June 22, 2004 @ 8:52pm » blo.gs

Dedicated to the public domain by Jim Winstead Jr.