with 'code' tag

thought i missed one: oscommerce

i ran across a reference to oscommerce in the slides of a tutorial i presented at o’reilly oscon in 2002(!) where i ran through a survey of major php applications, and i thought that meant i had missed one in my round-up of open-source php point-of-sale applications.

but it’s an ecommerce platform, not a point-of-sale system and it doesn’t look like it has a module or add-on to provide a point-of-sale interface.

speaking of that, there are some point-of-sale add-ons for woocommerce, which is itself the ecommerce add-on to wordpress. it looks like the only open-source/free ones are built specifically for use with square or paypal terminals.

titi, a simple database toolkit

at some point in my life i got tired of writing all my SQL queries by hand, and was casting about for a database abstraction that simplified things. but i didn’t care for anything that required that i specify my actual SQL tables in code or another format. i wanted something that would just work on top of whatever tables i already had.

i don’t know what i considered at the time, but where i landed was Idiorm and Paris, which bills itself as a “minimalist database toolkit for PHP5”, and that gives you a sense of its age. it was long ago put into maintenance-only mode by its developers, and eventually i ran across something that i wanted to fix or otherwise do that i knew would never be accepted upstream.

so i took the code that was in two distinct repositories, merged it together, tossed it in a new namespace, and renamed it Titi. i haven’t really done much with it beyond that, but i know there is code that i should be pulling back in from scat. an advantage to being a solo developer is you can kind of punch through abstraction layers to get things done, but that also leaves cleanup work to be tackled eventually.

should anybody else use this? maybe not. but it has been useful for me in my projects, and it’s also been a good playground to learn more about new php language features and tools.

(like most of my open source projects, this is named for a type of monkey, the titi monkey.)

scat is scatter-brained

while i folded all of the website/ecommerce parts of scat into the same repository as the point-of-sale system itself, it doesn’t really work out of the box and it is because of the odd way in which we run it for our store. the website used to be a separate application that was called ordure, so there’s a little legacy of that in some class names. i still think of the point-of-sale side as “scat” and the website side as “ordure”.

the point-of-sale system itself runs on a server here at the store (a dell poweredge t30), but our website runs on a virtual server hosted by linode. they run semi-independently, and they’re on a shared tailscale network.

ordure calls back to scat for user and gift card information, to send SMS messages, and to get shipment tracking information. so if the store is off-line, it mostly works and customers can still place orders. (but things will go wrong if they try to log in or use gift cards.)

there are scheduled jobs on the scat side that:

  • push a file of the current inventory and pricing (every minute)
  • pull new user signups (every minute)
  • check for new completed orders and pull them over (every minute)
  • push the product catalog and web content if a flag was set (checked every minute)
  • push updated google/facebook/pinterest data feeds (daily)
  • send out abandoned cart emails (daily)

so ordure has a copy of scat’s catalog data that only gets updated on demand but does get a slightly-delayed update of pricing and inventory levels. the catalog data gets transferred using ssh and mysqldump. (basically: it gets dumped, copied over, loaded into a staging database, and a generated 'rename table' query swaps the tables with the current database, and the old tables get dropped so the staging area is clear for next time.)
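a minimal sketch of what generating that 'rename table' query could look like, in python rather than the php the real thing uses. the database names (`ordure`, `staging`), the `_old` suffix, and both function names are assumptions made up for illustration. it leans on the fact that mysql applies all the renames in a single RENAME TABLE statement as one atomic operation, and that a rename can move a table between databases.

```python
def build_swap_query(tables, live_db="ordure", staging_db="staging", old_suffix="_old"):
    """Build one RENAME TABLE statement that swaps every staged table into place."""
    if not tables:
        raise ValueError("nothing to swap")
    moves = []
    for table in tables:
        # move the live table out of the way, into staging under a temporary name...
        moves.append(f"`{live_db}`.`{table}` TO `{staging_db}`.`{table}{old_suffix}`")
        # ...then move the freshly loaded table into its place
        moves.append(f"`{staging_db}`.`{table}` TO `{live_db}`.`{table}`")
    return "RENAME TABLE " + ", ".join(moves)


def build_cleanup_query(tables, staging_db="staging", old_suffix="_old"):
    """Drop the retired tables so the staging database is empty for the next run."""
    names = ", ".join(f"`{staging_db}`.`{t}{old_suffix}`" for t in tables)
    return f"DROP TABLE {names}"


swap_sql = build_swap_query(["item", "brand"])
cleanup_sql = build_cleanup_query(["item", "brand"])
```

because the swap is a single statement, the website never sees a half-updated catalog; the cleanup can fail or be delayed without affecting what customers see.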

not all of this is reflected within the scat code repository, and this post is just sort of my thinking through out loud where it has ended up. part of the reason for this setup is that the store used to have a janky DSL connection so i was minimizing any dependencies on both sides being available for the other to work.

as a side note, all of the images used in the catalog are stored in a backblaze b2 bucket and we use gumlet to do image optimizing, resizing, etc. when we add images to our catalog, it can be done by pulling from an external URL and the scat side actually calls out to the ordure side to do that work because when we were on that crappy DSL connection, pulling and pushing large images through that pipe was painful.

php pieces of what?

back in july 2010 i wrote about how i was frustrated with our point of sale system (Checkout, a Mac application which changed hands once or twice and is no longer being developed) and had taken a quick survey around to see what open source solutions there were.

the one that i mentioned there (PHP Point of Sale) is still around, but is no longer open source. here is a very early fork of it that still survives. i know at least one art supply store out there is using it (the closed-source version, not that early fork), but i haven’t really looked at it since 2010.

there are a few more php point of sale systems now.

the biggest is called Open Source Point of Sale and appears to be undergoing an upgrade from CodeIgniter 3 to CodeIgniter 4 right now. i spent a few minutes poking around the demo online, and i don’t think i would be happy using it. it is under an MIT license.

another big one is NexoPOS, which is GPL-licensed. i have not played around with the demo, but the supporting website looks pretty slick.

most of the others look like they are just experimental projects or not being actively used or developed.

something i think about a lot is whether i should be trying to take Scat POS beyond just using it ourselves. part of me feels like i am a seasoned enough developer to know that the work that would be required to give it the level of polish and durability to survive usage outside of our own doors could be substantial.

sidekiq for php?

it is a little strange still developing in php and having done it for so long, because you look at how other systems are built today and it isn’t always clear how that translates to php.

mastodon (the server software) is built primarily with ruby-on-rails, and uses a system called sidekiq to handle job processing. when you post to your mastodon server, it queues up a bunch of jobs that push it out to your subscribers, create thumbnails of web pages, and do all sorts of other stuff that may take a while so it makes no sense to make the web request hang around for it.

for scat pos, there are a few queue-like tasks that just get processed by special handlers that i use cron jobs to trigger. for example, when a transaction is completed it reports the tax information to our tax compliance service, but if that fails (because of connectivity issues or whatever) there’s a cron job that runs every night to re-try.
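the cron-driven retry pattern can be sketched roughly like this, in python with sqlite standing in for the real database. the `txn` table, its columns, and the `report` callback are all hypothetical; this is not how scat actually stores or reports anything.

```python
import sqlite3


def retry_pending(conn, report):
    """Re-send every transaction not yet reported; return how many succeeded."""
    rows = conn.execute(
        "SELECT id, tax FROM txn WHERE reported = 0 ORDER BY id"
    ).fetchall()
    sent = 0
    for txn_id, tax in rows:
        try:
            report(txn_id, tax)
        except OSError:
            # connectivity trouble: leave the row alone so the next run retries it
            continue
        conn.execute("UPDATE txn SET reported = 1 WHERE id = ?", (txn_id,))
        sent += 1
    conn.commit()
    return sent


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE txn (id INTEGER PRIMARY KEY, tax REAL, reported INTEGER DEFAULT 0)"
)
conn.executemany("INSERT INTO txn (id, tax) VALUES (?, ?)", [(1, 0.95), (2, 1.43)])


def flaky_report(txn_id, tax):
    # stand-in for the tax service call: transaction 2 always fails here
    if txn_id == 2:
        raise ConnectionError("service unreachable")


first_run = retry_pending(conn, flaky_report)
```

the "queue" is just a flag on the row, which is why a nightly cron job is enough to drive it: anything still unflagged gets another attempt on the next run.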

as best i can tell, the state of the art for php applications that want to have some sort of job queue system like sidekiq is Gearman and GearmanManager and it is wild to me that projects i remember starting up in 2008 are still just chugging along like that.

stable and well-understood technologies

AddyOsmani.com - Stick to boring architecture for as long as possible

Prioritize delivering value by initially leaning on stable and well-understood technologies.

i appreciate this sentiment. it is a little funny to me that what i can claim the most expertise in would probably be considered some of the most stable and well-understood technologies out there right now, but i have been working with them since they were neither. perhaps i have crossed from where as long as possible becomes too long, at least as far as employability is concerned.

a related tweet from maciej ceglowski.

logging with context

i have had this blog post about moving from logs to metrics open for a while, since i know one of the weak points in our systems right now is some pretty basic stuff like logging and monitoring. and then jwz ran into a problem with logging errors from php-fpm and what it reminded me about is how logs need to carry enough context so you can pull the threads together from something like a single request.
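one way to get that kind of context, sketched in python rather than php: stash a per-request id somewhere the logger can find it and stamp it on every line. the logger name, the log format, and `handle_request` are made up for the example; only the standard library `logging` and `contextvars` calls are real.

```python
import contextvars
import io
import logging
import uuid

request_id = contextvars.ContextVar("request_id", default="-")


class RequestContextFilter(logging.Filter):
    """Copy the current request id onto every record passing through."""

    def filter(self, record):
        record.request_id = request_id.get()
        return True


stream = io.StringIO()  # stands in for a log file so the output can be inspected
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(request_id)s %(levelname)s %(message)s"))
handler.addFilter(RequestContextFilter())

log = logging.getLogger("shop")
log.setLevel(logging.INFO)
log.addHandler(handler)
log.propagate = False


def handle_request(path):
    """Pretend request handler: set the id once, then log as usual."""
    rid = uuid.uuid4().hex[:8]
    token = request_id.set(rid)
    try:
        log.info("start %s", path)
        log.warning("slow query while rendering %s", path)
    finally:
        request_id.reset(token)
    return rid


rid = handle_request("/cart")
lines = stream.getvalue().splitlines()
```

every line written while handling the request starts with the same id, so a search for that id pulls the whole thread back together.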

i have not wrapped my head around the idea of just using metrics, because that sounds like rather a lot of data to be storing. maybe i’m just an on-prem brain in a cloud world.

scat pos proof of life (screencasts)

i recorded a couple of quick screencasts to show cloning it from github and starting it up with docker-compose and going through the initial database setup and processing a sale with sample data.

like the website says, the system is a work in progress and not suitable for use by anyone, but we have been using it for more than ten years.

i am not sure if it is something that anyone else would want to use, but i figure one way to find that out is to at least start pushing it towards where that would even be feasible.

pinging blo.gs again

i guess it only makes sense that i should ping blo.gs when i post things here.

hard to believe that thing has been running for over twenty years now.

now with new comments

not that i think that there is anyone reading this, but you can now comment on entries for seven days after they have been posted.

you can throw html in your comment, but it will get filtered by html purifier.

spam will be deleted promptly if it wasn’t already blocked or sequestered by akismet.

another php akismet api implementation

in poking around with adding support for comments here, i looked at integrating with the akismet anti-spam service, and the existing php libraries for using it didn’t work how i wanted or brought in dependencies that i wanted to avoid. so i made a simple akismet-api package that just uses guzzlehttp under the hood.

i haven’t made a test suite or added real documentation yet, so you should consider it pre-production, but it seems to work okay.

cleantalk is another anti-spam service that we use for registrations and comments on our store website and their php library is kinda non-idiomatic and strange, too. a reimplementation of that might be in the cards.

don’t be too clever

what is it about websites for government entities that results in login systems that try to do something clever that just falls down in the real world? treasurydirect used to have a password entry system that relied on a virtual keyboard, which was an accessibility nightmare and of course did not play nicely with a password manager. calsavers, the state of california’s retirement savings program, does something fancy when submitting passwords that results in apple’s built-in password management wanting to save the transformed password on every log in, which means the saved password no longer works.

one small project i have in mind is to explore passkeys and how to implement them, and i sure wish the folks at calsavers had spent time on that rather than whatever janky client-side password chicanery they have going on now.

tooting while blogging

i figured that i should do something clever like automatically post to my mastodon account when i posted here, but i was surprised to find that the state of mastodon api clients for php is pretty sad. php-mastodon was what i used to get it working, but it's really an incomplete implementation and the error handling is pretty much non-existent so it took way longer than it should have to get going.

(and put me down as someone who is glad that “tooting” is being pushed into the background as the term of art for posting on mastodon, but couldn’t resist using it this time.)

one person's technical debt

My 20 Year Career is Technical Debt or Deprecated

Everything eventually becomes tech debt, or the projects get sunsetted. If you are lucky, your code survives long enough to be technical debt to someone else.

i rather liked this piece on the ever-changing nature of software tools and how entropy catches up with us all, but what a focus on technical debt doesn't quite capture is the underlying value. old code has accumulated a lot of knowledge and value. it's why you don't just rewrite from scratch.

scat pos has been a one-person project for over a decade. at this point, it has literally encoded my experience in how to manage our retail store. you could throw it all away; start from scratch, or just switch to an off-the-shelf solution. but you would be throwing away a lot of accumulated knowledge and value.

made a new saddle

while being able to write entries and send them via email seemed like fun, the reality is that the setup was fragile. so it was enough of a hurdle to writing anything here that i rarely wanted to deal with it.

i do want to write more here, so i knocked together a basic web interface that will allow me to do that.

the biggest thing that i still haven't figured out is how i want to handle images. i could go back to using flickr and embedding from there, or i could implement a basic media library. i think the long-term solution is probably doing it myself because that's kind of the reason for this place.

migrated to slim framework 4

a couple of weeks ago i finally took some time to upgrade the code for this blog to the latest major version of the slim framework. it is still a strange mash-up of framework and hand-coded sql queries and old php code but this should make it easier for me to tinker with going forward. the main thing i need to do is add a way to post images again.

server-side tracking

i gave up on server-side event tracking on our website for now. segment was promising but the ecommerce functionality wasn’t at the level i needed for all of the platforms we integrate with, and the cost was just too steep. it’s based on what they call “monthly tracked users” and even our modest needs looked like it was going to be way more expensive than i could justify.

so i just migrated (back) to google tag manager loading everything and a simple javascript wrapper to generate all of the events for each service. since then, i ran across rudderstack, which seems very similar to segment but with an open-source implementation and what appears to be a more sensible pricing structure for their cloud service. it will be the top of my list of things to investigate whenever i want to revisit this again.

flipping switches

cloudflare zaraz is a great concept: manage the third-party code for your website sort of like google tag manager, but run as much of the code as possible in the cloud instead of the browser. but the execution is still rough around the edges, especially when it comes to the ecommerce functionality.

each of the platforms where we publish our catalog (and can use that to advertise) has its own way of collecting performance metrics. the way i had hacked support for each into our old website was messy and fragile. zaraz intervenes here with a simple zaraz.ecommerce(event, data) call that pushes out the data to each of those third-party tools.

the problem is that how zaraz maps their simplified interface to those various systems is undocumented, and as near as the community can figure out, not always correct. i also found that if i enabled the ecommerce integration for facebook, it broke all of the ecommerce reporting everywhere.

i am still hopeful that they can work through the bugs and issues, add support for some of the other platforms that would be useful for us (like pinterest), and we can collect the data we need with a minimized impact on site performance.

the worst case is that i can just drop in my own implementation to turn those zaraz.ecommerce() calls into the old browser-side integration and it will still be more streamlined than it used to be.

dipping my toes in go

one of the very first things i noticed when i migrated our website to a new server is that someone was running a vulnerability scanner against us, which was annoying. i cranked up the bot-fighting tools on cloudflare, but i also got fail2ban running pretty quickly so it would add the IP addresses for obviously bad requests to an IP list on cloudflare that would lock those addresses out of the site for a while. not a foolproof measure, of course, but maybe it just makes us a slightly harder target so they move on to someone else.

but fail2ban is a very old system with a pretty gross configuration system. i was poking around for a more modern take on the problem, and i found a simple application written in go called silencer that i decided to try and work with. i forked it so i could integrate it with cloudflare, and it was very straightforward. i also had to update one of the dependencies so it actually handled log file rotation. when i get time to hack on it some more, i’ll add handling for ipv6 as well as ipv4 addresses.
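the core of a tool like this is small. here is a rough sketch in python (not go, and not silencer's actual code) of picking offenders out of an access log while handling both address families. the list of suspicious paths, the threshold, and the choice to block a whole /64 for ipv6 are assumptions for illustration, as are the function names.

```python
import ipaddress
from collections import Counter

SUSPICIOUS = ("/wp-login.php", "/.env", "/phpmyadmin")  # made-up examples of scanner bait


def ban_target(raw):
    """Return what to block: the address itself for ipv4, its /64 network for ipv6."""
    addr = ipaddress.ip_address(raw)
    if addr.version == 6:
        return str(ipaddress.ip_network(f"{addr}/64", strict=False))
    return str(addr)


def find_offenders(lines, threshold=2):
    """Count suspicious requests per client and return those at or over the threshold."""
    hits = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) < 7:
            continue  # not a common/combined-format access log line
        client, path = fields[0], fields[6]
        if path.startswith(SUSPICIOUS):
            try:
                hits[ban_target(client)] += 1
            except ValueError:
                continue  # first field was not an ip address
    return sorted(target for target, count in hits.items() if count >= threshold)


sample = [
    '203.0.113.9 - - [10/Oct/2023:13:55:36 +0000] "GET /.env HTTP/1.1" 404 153',
    '203.0.113.9 - - [10/Oct/2023:13:55:37 +0000] "GET /wp-login.php HTTP/1.1" 404 153',
    '2001:db8::1 - - [10/Oct/2023:13:55:38 +0000] "GET /.env HTTP/1.1" 404 153',
    '2001:db8::2 - - [10/Oct/2023:13:55:39 +0000] "GET /phpmyadmin/ HTTP/1.1" 404 153',
    '198.51.100.7 - - [10/Oct/2023:13:55:40 +0000] "GET /art/paint HTTP/1.1" 200 5120',
]
offenders = find_offenders(sample)
```

grouping ipv6 clients by network is a judgment call: a single host can usually hop between many addresses in its own /64, so counting individual addresses would rarely trip the threshold.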

go is an interesting language. obviously i don’t have my head wrapped around the customs and community, so it seems a little rough to me, but it’s also not so different that i couldn’t feel my way around pretty quickly to solve my problem at hand.

another three years

another three years between entries. some stuff has happened. the store is still going, and i am still finding excuses to code and learn new things.

i wrote before about how i was converting scat from a frankenstein monster to a more modern php application built on a framework, which has more or less happened. there’s just a little bit of the monster left in there that i just need to work up the proper motivation to finish rooting out.

i also took what was a separate online store application built on a different php framework and made it a different face of scat. it is still evolving and there’s bits that make it work that aren’t really reflected in the repository, but it’s in production and seems to sort of work, which has been gratifying to get accomplished. the interface for the online store doesn’t use any javascript or css frameworks. between that and running everything behind cloudflare, it’s much faster than it used to be.

big, heavy, and wood

justin mason flagged this article about "The log/event processing pipeline you can't have" a while back, and it has been on my mind ever since. our digital infrastructure is split across a few machines (virtual and not) and i often wish that i had a more cohesive way of collecting logs and doing even minimally interesting things with them.

i think the setup there is probably overkill for what i want, but i love the philosophy behind it. small, simple tools that fit together in an old-school unix way.

i set up an instance of graylog to play with a state-of-the-art log management tool, and it is actually pretty nice. the documentation around it is kind of terrible right now because the latest big release broke a lot of the recipes for processing logs.

right now, the path i am using for getting logs from nginx in a docker container to graylog involves nginx outputting JSON that gets double-encoded. it’s all very gross.
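to make "double-encoded" concrete: the JSON object ends up as a JSON string inside the message, so the receiving end has to decode twice. a small python sketch of unwrapping it (the field names are invented, and this is not the graylog pipeline itself):

```python
import json


def unwrap(message):
    """Decode JSON repeatedly until the result is no longer a JSON-encoded string."""
    value = message
    while isinstance(value, str):
        try:
            value = json.loads(value)
        except json.JSONDecodeError:
            break  # a plain string, nothing left to decode
    return value


access = {"status": 404, "request": "GET /.env HTTP/1.1"}
double_encoded = json.dumps(json.dumps(access))  # what arrives at the other end
record = unwrap(double_encoded)
```

the loop is deliberately forgiving, which has a cost: a message that is just a string like "123" would come back as a number, so a real pipeline would want to decode a fixed number of times instead.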

i think i am having a hard time finding the correct tooling for the gap between “i run everything on a single box” and “i have a lot of VC money to throw at an exponentially scalable system”. (while also avoiding AWS.)

(the very first post to this blog was the same ren & stimpy reference as the title of this post.)

the state of things

just over seven years ago, i mentioned that i had decided to switch over to using scat, the point of sale software that i had been knocking together in my spare time. it happened, and we have been using it while i continue to work on it in that copious spare time. the project page says “it is currently a very rough work-in-progress and not suitable for use by anyone.” and that's still true. perhaps even more true. (and absolutely true if you include the online store component.)

it is currently a frankenstein monster as i am (slowly) transforming it from an old-school php application to being built on the slim framework. i am using twig for templating, and using idiorm and paris as a database abstraction thing.

i am using docker containers for some things, but i have very mixed emotions about it. i started doing that because i was doing development on my macbook pro (15-inch early 2008) which is stuck on el capitan, but found it convenient to keep using them once i transitioned to doing development on the same local server where i'm running our production instance.

the way that docker gets stdout and stderr wrong constantly vexes me. (there may be reasonable technical reasons for it.)

i have been reading jamie zawinski’s blog for a very long time. all the way back to when it was on live journal, and also the blog for dna lounge, the nightclub he owns. the most recent post about writing his own user account system for the club website sounded very familiar.

character encoding is still hard?

email2webhook is nice in theory, but fell flat in handling basic character encoding.

that i still have to fight the same sort of issues that i was dealing with about 16(!) years ago is somehow not at all surprising.

i’ve switched to using zapier’s email to webhook features. it will probably bring a different set of challenges.

😎

how to fix eleven bugs in mysql 5.1

my “mysql client fixes” branch on launchpad contains fixes for eleven bugs (nine of them reported on bugs.mysql.com).

don’t get too excited — these are all the lowest priority-level bugs, mostly typos in comments and documentation.

now i have to figure out the latest process for actually getting these changes into the official tree. there are different policies around how and when to push to trees since i was last doing any server development. from someone who is partially outside, it all seems very tedious and designed to make it impossible to fix anything. process gone bad.

the mysql server isn’t going to get the benefits of using a good, open-source distributed revision control system unless it stops

bug tracking and code review

i was going to write some reactions to an observation that postgresql has no bug tracker and its discussion last week, but lost the spark and abandoned the post after a few days. but today i ran across a quote from linus torvalds that neatly sums up my thoughts:

We’ve always had some pending/unresolved issues, and I think that as our tracking gets better, there’s likely to be more of them. A number of bug-reports are either hard to reproduce (often including from the reporter) or end up without updates etc.

before there was a bug tracking system for mysql, there was a claim that all bugs were fixed in each release (or documented), and there has been a lot of pain in seeing how well that sort of claim stacks up against an actual growing repository of bug reports. if the postgresql project were to adopt a bug-tracking system, i am certain that they would find the same issue. before long, they would be planning bug triage days just like every other project with a bug-tracking system seems destined to do.

another good email from linus about these issues was pointed out by a coworker, but this part in particular caught my eye:

Same goes for “we should all just spend time looking at each others patches and trying to find bugs in them.” That’s not a solution, that’s a drug-induced dream you’re living in. And again, if I want to discuss dreams, I’d rather talk about my purple guy, and the bad things he does to the hedgehog that lives next door.

the procedure at mysql for code reviews is that either two other developers must review the patch, or one of an elite group of developers who are trusted to make single reviews. then the developer can push their changes into their team trees, taking care to have merged the changes correctly in anywhere from one to four versions (4.1 and up).

this is a huge amount of friction, and is one of the most significant problems causing pain for mysql development. two reviewers is just too high of a bar for most patches, and having the rule makes the reviews rote and less useful. there is also an unreasonable amount of distrust being displayed by this procedure, that says that developers can’t be trusted to ask for help when they are unsure, but should feel free to make the occasional mistake by pushing something that isn’t quite right.

i wonder if we could be taking better lessons from linux’s hierarchical development model, with the pulling of code up through lieutenants to a single main repository, rather than the existing model that involves every developer moving their own stones up the pyramid. it would require some developers (more senior ones, presumably) to spend more of their time doing code management as opposed to actual coding.

monty is not particularly happy with the state of development of his brainchild now. would he be happier if he were in a linus-like role of rolling up patches and managing releases?

i wish i had the patience to write at less length and greater coherence about this.

don’t ask too many questions

chyrp is a nice looking piece of blog software. individual posts can have different styles, something it borrowed from the hosted tumblr service. i was interested to read about “the sql query massacre of january 19th, 2008” but the numbers gave me pause — 21 queries to generate the index page? that is down from an astounding 116, but that still seems ridiculous to me.

the number of queries to generate the index of this site? two. one of them is SET NAMES utf8. i could see boosting that to three or four if i moved some things like the list of links in the sidebar into the database, or added archive links. call it five if i had user accounts.

but right now, the number of queries used to load the index page on a chyrp site grows with the number of posts displayed on the front page. not only that, it grows by two times the number of posts on the front page.
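the pattern is easy to reproduce. this python and sqlite sketch is not chyrp's schema or code; the tables and the two per-post lookups (author name, comment count) are guesses picked only to show a query count that grows by two per post next to one that stays flat.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE post (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    CREATE TABLE comment (id INTEGER PRIMARY KEY, post_id INTEGER);
    INSERT INTO author VALUES (1, 'jim');
    INSERT INTO post VALUES (1, 1, 'first'), (2, 1, 'second'), (3, 1, 'third');
    INSERT INTO comment VALUES (1, 1), (2, 1), (3, 3);
""")

queries = []


def query(sql, args=()):
    """Run a query and remember that we did, so the totals can be compared."""
    queries.append(sql)
    return conn.execute(sql, args).fetchall()


def index_per_post():
    """One query for the posts, then two more for every post shown."""
    page = []
    for post_id, author_id, title in query("SELECT id, author_id, title FROM post ORDER BY id"):
        (name,) = query("SELECT name FROM author WHERE id = ?", (author_id,))[0]
        (count,) = query("SELECT COUNT(*) FROM comment WHERE post_id = ?", (post_id,))[0]
        page.append((title, name, count))
    return page


def index_joined():
    """The same page from a single query, however many posts are shown."""
    return query("""
        SELECT post.title, author.name, COUNT(comment.id)
          FROM post
          JOIN author ON author.id = post.author_id
          LEFT JOIN comment ON comment.post_id = post.id
         GROUP BY post.id
         ORDER BY post.id
    """)


queries.clear()
slow = index_per_post()
per_post_count = len(queries)  # 1 + 2 * (number of posts)

queries.clear()
fast = index_joined()
joined_count = len(queries)
```

both functions build the same page; the difference is that the first one's query count is tied to how many posts are on it.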

chyrp could use a security audit, too.

what is 10% of php worth?

i am listed as one of the ten members of the php group. most of the php source code says it is copyright “the php group” (except for the zend engine stuff). the much-debated contributor license agreement for PDO2 involves the php group.

could i assign whatever rights (and responsibilities) my membership in the php group represents to someone else? how much should i try to get for it? i mean, if mysql was worth $1 billion....

i am still disappointed that a way of evolving the membership of the php group was never established.

connector/odbc 5.1.1 (beta!)

mysql connector/odbc 5.1.1-beta is available.

we didn’t implement all of the features in our original plan, but we decided to close out 5.1 to new features so that we could work on getting it to a production (GA) release as soon as possible.

5.1 has its share of bugs still, but we have tackled the most serious ones, and now that we are done with features (for the time being) we can focus on making the GA release shine.

now the race is on to see who gets out a 5.1 GA release first — the server or connector/odbc!

mac os x programming help needed

one of the features we had planned for mysql connector/odbc 5.1 is native setup libraries for the major platforms. we have the microsoft windows version going, and some code to get us going on linux/unix (using gtk instead of qt), but our gui team is too busy to get us started on a native mac os x version.

anyone want to pitch in by showing us how to get a basic dialog window to pop up based on a c library call? i think we will be able to customize it from there, but i am just unfamiliar enough with mac os x gui programming that i have a feeling it would take a long time for me to get that going.

connector/odbc 3.51.20 and 5.1.0

another month, another release of connector/odbc 3.51. there are not a lot of bug fixes in this one, but we did manage to get the bug count under 80 bugs.

the reason there were fewer bug fixes in the release of 3.51.20 (other than there being fewer bugs to fix) was that we have been hard at work on connector/odbc 5.1.0, which builds on the 3.51 foundation to bring new functionality like unicode and descriptor support. there are more features planned, and you can see the release announcement for details. i hope that we’ll be able to keep on releasing new versions of 3.51 and 5.1 on a monthly basis.

connector/odbc 5.0 has met the same fate as the aborted 3.52 and 3.53 releases. it was an ambitious ground-up rewrite of the driver, but once we had put renewed efforts into getting the 3.51 code into better shape, it became clear that doing the same for a completely different code-base made little sense. we are going to be cherry-picking some of the 5.0 code for some of the new features.

i am sorry that we have been secretive about what was up with the future of 5.0, but we decided it was better to not talk about what was happening until we were confident about the decision to kill it.

connector/odbc 3.51.19

we managed to let a pretty significant regression sneak through in 3.51.18, so we’ve turned out a quick release of mysql connector/odbc 3.51.19. sorry for the hassle.

independence day for code

as i’ve been threatening to do for quite some time, i’ve finally made the source code for bugs.mysql.com available. it is not the prettiest code, and there’s still all sorts of hard-coded company-specific stuff in there. but it is free code, so stop complaining.

it is available as a bazaar repository at http://bugs.mysql.com/bzr/. i have not yet set up any sort of fancy web view, or mirrored it to launchpad.

i plan to do the same for the lists.mysql.com code some day. one limiting factor now is that machine only has python 2.3 on it, and bazaar needs python 2.4.

my five mysql wishes

jay pipes started with his five mysql wishes, and others have chimed in. i guess i may as well take a whack at it.

  1. connect by. yeah, yeah. it’s not standard. i don’t care.
  2. expose character-set conversions in the client library. all the code to convert between all of the character sets understood by the server is there, there’s just no public interface to it.
  3. online backup. it’s in progress, but this will make things so much better in so many ways. we could actually have reliable backups of bugs.mysql.com. and it’s going to make starting up new slaves so much easier in replication.
  4. re-learn how to ship software. the long release cycles of 5.0 and 5.1 have been pretty ridiculous, and i’m sure we can find a better way to add features without having to slog through months of bug-fixing to get a release to production quality. it is frustrating to ask for new features and have the fear that there won’t be a production release that includes them for another couple of years.
  5. fix planet mysql to handle utf-8. seriously, guys, it’s not that hard.

style="design: prettier"

i finally got tired of the index pages on the mysql mailing lists looking like ezmlm-cgi, so i cribbed some design from the perl mailing lists and now the by-thread index pages include who participated in a thread. i didn’t steal the pagination of busy months. yet.

i need to package up more of the bits of code driving the mysql mailing lists. there are some quirks, but i like the way it all fits together.

i also need to put in the few hours it would take to make it possible to post to the lists from the web interface.

angry programming

mysql doesn’t have quite the number of fancy internal applications that you might suspect, and i got frustrated when the company started to roll out a system of monthly time-off reports based on emailing around an excel spreadsheet. (to add icing to that cake, they kept sending out the excel sheet with password protection!)

last friday, i spent an afternoon cooking up this little proof-of-concept application that tracked the same information as the spreadsheet, but in tasty web format, with some ajax goodness (courtesy of prototype).

as it turns out, there was an official company tool for doing this that was in the works, but they hadn’t bothered to let anyone know it was imminent. i’m told it is sox-compliant and configurable six ways to sunday. i haven’t seen it yet.

so my meager efforts did not go to waste, i just spent another half hour to make this a standalone demo (rather than tying into our internal personnel database). perhaps someone else can find some use for it, or take some inspiration from it.

here’s the simple workflow for the application:

  1. employee clicks on days they took off in a month.
  2. employee clicks button to get month approved, which sends email to boss.
  3. boss reads email and follows link to view the report online.
  4. boss clicks the button to approve the report, which sends mail to the employee and the finance department.
  5. the finance department does whatever it does with the data. the employee can no longer change it.

obviously that’s not quite all you would want for a fully-functional application, but it is most of the way there. i think it’s already better than the system that involved emailing an excel spreadsheet around.
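the workflow above is small enough to sketch as a report with three states. this is not the code from the demo (that was a web page with some prototype-driven ajax); it is a minimal python sketch, and the class name, the state names, the `outbox` list standing in for real email, and the choice to lock edits only after approval are all assumptions.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class MonthReport:
    """hypothetical model of one employee's time-off report for a month."""
    employee: str
    boss: str
    finance: str = "finance"
    days_off: set = field(default_factory=set)
    state: str = "draft"  # draft -> submitted -> approved
    # (recipient, subject) pairs; a stand-in for actually sending mail
    outbox: list = field(default_factory=list)

    def toggle_day(self, day: date) -> None:
        # step 1: clicking a day marks or unmarks it
        if self.state == "approved":
            raise PermissionError("approved reports can no longer be changed")
        self.days_off ^= {day}

    def submit(self) -> None:
        # step 2: ask for approval, which mails the boss
        if self.state != "draft":
            raise ValueError("only a draft can be submitted")
        self.state = "submitted"
        self.outbox.append((self.boss, "time-off report awaiting approval"))

    def approve(self) -> None:
        # step 4: approval mails the employee and finance, and locks the report
        if self.state != "submitted":
            raise ValueError("only a submitted report can be approved")
        self.state = "approved"
        self.outbox.append((self.employee, "time-off report approved"))
        self.outbox.append((self.finance, "time-off report approved"))
```

whether an employee should be able to change days between submitting and approval is a guess on my part; the list above only says the report is frozen once finance has it.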

jobs at mysql

mysql has quite a few open job listings. some positions of note: web developer, support engineer, maintenance developer, qa engineer, and performance architect. all of these positions are available world-wide, so you get to work from home. some of the other jobs from the full list are location-specific.

if you mention that i referred you for some of these positions and are then hired, i get some sort of referral bonus.

backcountry programming

i’m back to doing some work on connector/odbc, fixing bugs in the “stable” version (otherwise known as 3.51.x). we have another project going that is a ground-up rewrite of the driver, but i’m not really involved with that.

the state of the stable version of the driver is pretty sad. i keep running into pockets of code that are truly frightening. my first big foray was a bug i reported when i was trying to get the test suite to run, and it basically entailed throwing out a function that made no sense and replacing it with code that has such niceties as comments.

as i’ve started into the catalog functions, starting from this ancient bug, i’m finding even more frightening (and untested) code.

my general goal is to leave things cleaner than i’ve found them, doing things as incrementally as i can. we’re going to be building out a richer test suite, which will be a tremendous help, both in getting the “stable” version of the driver into better shape, and proving the capabilities of the rewrite.

i know it has been a long time since the last connector/odbc 3.51 release — kent, one of the build team members, is working on scripting the building, testing, and packaging so that we can crank out builds more consistently and reliably. unfortunately, a lot of the magic incantations were lost as people moved on to other work or other companies. the days of connector/odbc being the neglected stepchild of mysql products may be coming to an end.

colobus on google code

i created a google code project for colobus and imported the old releases into the repository. i’m not a huge subversion fan, but it didn’t make much sense to stick with bitkeeper with no free version available, and making system administration google’s problem seemed like a good thing.

i’ve folded in a couple of patches that were sitting in my inbox, and have another bigger one to put the group information into the database.

being known for being you

mike kruckenberg shared his observations from watching mysql source code commits, and jay pipes commented about this commit from antony curtis which had him excited. now that’s how open source is supposed to work, at least in part.

i replied to a later version of that commit to our internal developer list (and antony), pointing out that with just a little effort the comment would be more useful to people outside of the development team. “plugin server variables” doesn’t really do it justice, and “WL 2936” is useful to people who can access our internal task tracking tool, but does no good to people like mike.

the other reason it is good to engage the community like this is because it is very healthy for your own future. being able to point to the work i had done on open source and the networking that came from that have both been key factors in getting jobs for me. i’m sure it will be useful next time i am looking, too.

joining activerecord with mysql 5

dhh committed a patch for activerecord to make it work with mysql 5 that was subsequently reverted because it broke things on postgres and sqlite.

obviously we’d like ruby on rails to work with mysql 5, but because there was no test case committed along with either of these changes, i don’t really know the root cause of the problem. dhh claims it is the changes that made mysql conform to the standard sql join syntax, but i can’t evaluate that because i can’t reproduce the problem.

any activerecord gurus want to point me in the right direction?

better out-of-the-box mysql support for ruby on rails

activerecord now supports mysql 4.1 (and later) out of the box whether you are using new or old-style passwords, because they applied my patch for handling the related protocol changes correctly.

(it’s not quite out-of-the-box yet — the fix will appear in the next major release of rails, i guess. it’s fixed in their repository.)

now if only the upstream developer would show signs of life, and get that fixed. i’d complain about that more, but there’s a lot of windows around here.

don’t bother paddling upstream

so turns out that the ruby on rails developers had already added 4.1 authentication support for their bundled version of ruby/mysql, but they’ve found the upstream maintainer as unresponsive as i have. their implementation wasn’t quite complete, so i’ve submitted a patch to round it out.

the version included with ruby on rails doesn’t include the test suite, though.

more ruby/mysql love

i’ve updated my patch for new-style mysql authentication for ruby/mysql, with a new test case for the change_user method (and support for same with new authentication).

i’ve even tested this against a 4.0 server, so i’m pretty sure i didn’t break anything.

new-style mysql authentication in pure ruby

ruby has two modules for connecting to mysql. one is called mysql/ruby and is built on top of the standard libmysqlclient c library. the other is called ruby/mysql and is pure ruby. the problem with the latter is that it is a from-scratch implementation of the mysql network protocol, and the authentication handshake changed in mysql 4.1.

but here is a patch to add support for new-style mysql authentication to ruby/mysql. it should also deal with the other protocol changes that came along at the same time. it doesn’t do anything to expose server-side prepared statements.

it is only lightly tested. in particular, i haven’t tried to connect to a pre-4.1 version of the server. it should still work, but it is entirely possible i screwed it up. i’m also still just learning ruby, so there are some ugly bits.

i think having more from-scratch implementations of the protocol is a good thing. there’s at least five — the server code itself (and client library), connector/j, connector/net, ruby/mysql, and Net::MySQL (perl). once we have these all collected into an integration test suite, the server developers will get much better feedback when they go off the reservation with the protocol.
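for anyone writing yet another from-scratch client, the heart of the 4.1 handshake is small. the sketch below is python rather than ruby and is not taken from the patch; the function names are made up for illustration. the client sends sha1(password) xor-ed with sha1(salt + sha1(sha1(password))), and the server, which only stores the double hash, can undo the xor and check the result.

```python
import hashlib


def scramble_411(password: bytes, salt: bytes) -> bytes:
    """client side: build the 20-byte auth token for the 4.1+ handshake."""
    if not password:
        return b""  # an empty password is sent as an empty token
    stage1 = hashlib.sha1(password).digest()
    stage2 = hashlib.sha1(stage1).digest()  # this is what the server stores
    mask = hashlib.sha1(salt + stage2).digest()
    return bytes(a ^ b for a, b in zip(stage1, mask))


def server_accepts(token: bytes, salt: bytes, stored_stage2: bytes) -> bool:
    """server side: recover sha1(password) from the token and re-hash it."""
    mask = hashlib.sha1(salt + stored_stage2).digest()
    stage1 = bytes(a ^ b for a, b in zip(token, mask))
    return hashlib.sha1(stage1).digest() == stored_stage2
```

the salt is the random scramble the server sends in its greeting, so the password never crosses the wire and a captured token is useless against a different salt.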

where should i be lurking?

trying to find places where people talk about using python, ruby, and php with mysql has been a bit of a challenge.

the problem on the php side is that php forum on forums.mysql.com is so filled with pre-beginner-level questions that it’s barely worth it for me to spend my time digging through it.

for python, the python forum on forums.mysql.com is nearly a ghost town. the forums for the mysql-python project seem slightly active, but the sourceforge forum interface is just bad. (not that any web-based forum isn’t starting from a bad place.) the db-sig mail archives also have some interesting discussions.

for ruby, the ruby forum on forums.mysql.com is even quieter than the python one, and i haven’t found anywhere else.

another thing i’ll take a look at is apr_dbd_mysql, which is not part of the main apr-util repository because of licensing issues (ugh).

where else should i be looking?

acronyms

tim bray coined MARS: it stands for “mysql + apache + ruby + solaris.” (get the shirt.)

bill de hóra proposed MADD: “mysql + apache + django + debian.”

when forwarding the above to an internal mailing list at mysql, i proposed MAUDE: “mysql + apache + ubuntu + django + eclipse.” the logo would be a picture of bea arthur, of course.

but mårten mickos, ceo of mysql, came up with MARTEN: “mysql + apache + ruby + tomcat + eclipse + nagios.”

or would that be åpache?

one month to go

the mysql users conference 2006 is only a month away. i’m just going to be dropping in for one day to give two talks: “embedding mysql” and “practical i18n with php and mysql.”

there is also a great lineup of other speakers, tutorials, and keynotes. i’m going to miss the keynote by mark shuttleworth, but i am looking forward to the keynote by the founder of rightnow.

the truth is out there

i talked at scale 4x, and you can download the exciting slides. the picture is of future oracle employee and zend co-founder andi gutmans, and there are a few more pictures from the first day.

(neither andi nor dave from sleepycat admitted to the imminent acquisitions of their companies by oracle.)

numbers singly

the “generally available” (or production-ready) version of mysql 5.0 was officially released today, and kai voigt, one of the mysql trainers, has posted a sudoku solver written as an sql stored procedure. the solver was actually written by per-erik marten, the main implementor of mysql’s stored procedure support.

it’s probably not the best showcase of stored procedures, but it is a nifty little hack.

matt sergeant’s article about using qpsmtpd is noteworthy for reasons other than it has my name in it.

i’m still running a pretty minimal set of qpsmtpd plugins since i upgraded my server to ubuntu. my main source of spam is my old college address, which is so ancient that it is deeply embedded in the mailing lists that spammers swap. and apparently they aren’t running very good antispam software at hmc. (i think they expect each user to set up spamassassin on their own.)

here’s an interesting tidbit i picked up from the hmc cs department site: a co-inventor of sql, don chamberlin, is a fellow alum.

stefan esser has dug up and fixed more php xml-rpc vulnerabilities, and best of all, has worked with the package maintainers to purge them of their use of eval().

stefan can be a bit of a blowhard, but it’s excellent work like this that makes that easier to swallow.

more jobs at mysql

it occurred to me that i mentioned the product engineer position, but there are a number of other jobs at mysql that are open, including web developer.

the race is on

stefan esser dissects one tip in a bad article about php, but is merciful in leaving the others alone. one thing you’ll note if you line up this second article with the first article is that not only are the tips not very good, the author can’t count to ten.

and on the useful-php-news front, andrei’s unicode work* has landed in the php development tree, and rasmus sparked a long discussion of other php6 features. the perennially lost cause of trying to rename functions and change their argument order resurfaces, of course, but it doesn’t look like anyone is taking it all that seriously.

the race is now on between perl6 and php6.

* other people have been involved, i’m sure. i just don’t know who they are.

damian conway’s “ten essential development practices” article (via daring fireball) may appear on perl.com, but the basics are applicable to any software project.

i would put “use a revision control system” way at the top of the list, and i would also add “use a bug-tracking system.”

there was a hole in the pear xml-rpc package, and as a result many php-based applications, such as the many php blogging apps, had a security hole.

the thing is, this came about because the xml-rpc library builds up some code and calls eval(). whoever wrote code to parse xml-rpc by building code and calling eval() should have their computer taken away. and then possibly be beaten with it.

the pear code is actually a fork of edd dumbill’s php xml-rpc code, and this is not the first security hole that has been discovered in that code as a result of this positively shameful architecture. i will not be at all surprised if it is not the last.

and for those keeping score at home, i pointed out how dumb this was almost four years ago.
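to make the complaint concrete: decoding xml-rpc values never needs eval(). here is a minimal sketch in python (not php, and not how either library was eventually fixed) that walks the parsed tree and converts each value by type. the function names are hypothetical, and it only handles a handful of the xml-rpc types.

```python
import xml.etree.ElementTree as ET


def decode_value(node):
    """turn one <value> element into a python object; no text is ever run as code."""
    if len(node) == 0:
        return node.text or ""  # a bare <value> is a string
    child = node[0]
    tag, text = child.tag, (child.text or "")
    if tag in ("int", "i4"):
        return int(text)
    if tag == "boolean":
        return text.strip() == "1"
    if tag == "double":
        return float(text)
    if tag == "string":
        return text
    if tag == "array":
        return [decode_value(v) for v in child.findall("data/value")]
    if tag == "struct":
        return {m.findtext("name"): decode_value(m.find("value"))
                for m in child.findall("member")}
    raise ValueError(f"unsupported xml-rpc type: {tag}")


def decode_params(xml_text):
    """return the parameters of a <methodCall> document as a list."""
    root = ET.fromstring(xml_text)
    return [decode_value(v) for v in root.findall("params/param/value")]
```

a string parameter full of quotes and semicolons comes out the other side as exactly that string, because nothing ever treats it as code.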

a few resources

here’s a few resources that someone may find helpful:

and don’t forget that in php, variables like $_SERVER['REQUEST_URI'] and $_SERVER['HTTP_REFERER'] are user input.

happy birthday, php

just ten short years ago, php appeared on the scene.

the first time i wrote any php code was about eight or nine years ago. the most recent was about eight or nine minutes ago.

thanks to everyone who has made that all possible. especially, of course, rasmus, who we blame it all on. (well, most of us do.)

here’s what others have had to say about it.

short tags and other php coding things

i like php’s short tags. i feel sad for people who feel they need to use the ‘<?php’ construct all the time. or worse, ‘<?php echo’ where a ‘<?=’ will do.

one part of my always-evolving personal php coding style is how i embed sql statements into my code. i used to generally do it like:

  $query = "SELECT id,name,url,rss,md5sum,method,updated AS up,"
         . "       UNIX_TIMESTAMP(lastchecked) AS lastchecked,"
         . "       UNIX_TIMESTAMP(updated) AS updated"
         . "  FROM blogs "
         . " WHERE updated > NOW() - INTERVAL 10 MINUTE AND method = 0"
         . " ORDER BY up DESC"
         . " LIMIT 10";

but lately i’ve been doing:

  $query= "SELECT id,name,url,rss,md5sum,method,updated AS up,
                  UNIX_TIMESTAMP(lastchecked) AS lastchecked,
                  UNIX_TIMESTAMP(updated) AS updated
             FROM blogs
            WHERE updated > NOW() - INTERVAL 10 MINUTE AND method = 0
            ORDER BY up DESC
            LIMIT 10
          ";

it makes it easier to cut-and-paste into the mysql client for testing.

O_NONBLOCK broken on darwin and irix?

i’ve been dealing with a mysql bug on irix and mac os x that turned up in our test suite once i fixed the kill test to actually do the test properly. after much digging in code and staring at debug traces, i noticed on irix that in the thread that is being killed, it was stuck in a blocking read that the calling code believed would be non-blocking.

by changing our setting of the non-blocking mode to use both O_NDELAY and O_NONBLOCK instead of just O_NONBLOCK, i was able to get the code to work. but i’m not sure why it is necessary.

on the bright side, this may also fix this bug about wait_timeout not working on mac os x.
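the change amounts to something like the following, shown here as a python sketch rather than the actual c code in the server. the helper name is made up, and on many systems the two flags are the same bit, so setting both is harmless there.

```python
import fcntl
import os
import socket


def set_nonblocking(fd: int) -> int:
    """set both O_NONBLOCK and O_NDELAY (where it exists) on a descriptor."""
    flags = fcntl.fcntl(fd, fcntl.F_GETFL)
    # getattr() covers platforms where python does not expose O_NDELAY
    flags |= os.O_NONBLOCK | getattr(os, "O_NDELAY", 0)
    fcntl.fcntl(fd, fcntl.F_SETFL, flags)
    return flags
```

after this, a read on an empty socket should fail right away with EAGAIN/EWOULDBLOCK instead of sitting there, which is the behavior the calling code was counting on.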

i may not be doing web development for my day job any more, but i put a little more elbow grease into the mysql bugs database to add two new features that people have asked for at various times: subscribing to updates on bugs, and making private comments. i also cleaned up the database structure a bit. for example, instead of storing email addresses for the assigned developer and reviewer, it actually has a proper link to the user database.

it’s not a particularly pretty code base (although i clean it up as i go), but i’m rather fond of this little bugs system.

cnet’s coverage of the bitkeeper kerfuffle revealed the osdl employee who drove the wedge: andrew tridgell, of samba fame.

bitmover is dropping the free version of bitkeeper, which is a shame. i’m not sure that we have decided what to do. i wish bazaar ng was a little further along. it looks like it is shaping up to be the best-of-breed of the new generation of open-source version control systems.

i got suckered into agreeing to pick up some slack at this year’s mysql users conference, and will be giving a talk on “embedded mysql.”

the conference should be a lot of fun this year — from what i’ve heard, the number of signups has been huge, and we’re still four weeks away. one of the tutorials, advanced mysql performance optimization, is already sold out. being back in the bay area is definitely a good thing for this sort of conference.

wait, did someone just say four weeks away? i guess i need to figure out what this embedded mysql stuff is all about.

i’ll also be doing what is called guru best practices: php with andi gutmans.

URI::Fetch is a new perl module from ben trott (of movable type renown) that does compression, ETag, and last-modified handling when retrieving web resources. the lazyweb delivers again.

speaking of that, i found i had to do one additional thing to my php code that fetches pages because of a non-existent workaround for server bugs in the version of curl i’m using. so when blo.gs fetches a page to verify a ping and gets a particular compression-related error, it goes back out and requests the page again without compression.
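the retry logic looks roughly like this. it is a python sketch rather than the php/curl code blo.gs actually runs, the function name is made up, and the `opener` parameter exists only so the behavior can be exercised without a network.

```python
import gzip
import urllib.request
import zlib


def fetch(url, opener=urllib.request.urlopen):
    """fetch a page asking for gzip; if the compressed body is broken, ask again without."""
    for accept in ("gzip", "identity"):
        req = urllib.request.Request(url, headers={"Accept-Encoding": accept})
        with opener(req) as resp:
            body = resp.read()
            encoding = resp.headers.get("Content-Encoding", "")
        if encoding != "gzip":
            return body
        try:
            return gzip.decompress(body)
        except (OSError, EOFError, zlib.error):
            continue  # compression-related error: go around again uncompressed
    raise OSError(f"could not get a usable response from {url}")
```

it costs a second request for the misbehaving servers, but everyone else still gets the bandwidth savings.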

city of angels to adopt open source?

a few los angeles city councilmembers have introduced a measure to have the city study using open-source software, and putting the possible money saved towards hiring new police officers. it sounds like a great plan, and i hope to get around to writing my city councilmember soon to encourage her to support the motion.

speaking of my city councilmember, i have gotten four calls from her campaign in the last few days. one of them was actually from the councilmember herself (before this open-source motion came up) due to some sort of mix-up by her campaign staff that led her to believe i had some issue i wanted to discuss. as i was sucking on the world of warcrack pipe at the time, i was in no mood to talk to her. then today was call number four, and i pointed out to the caller that if they called me again, i would almost certainly not vote for her in the upcoming primary. (the only other call i’ve gotten is from the bernard parks mayoral campaign.)

repeating myself

for the blo.gs cloud service, i had written a little server in php that took input from connections on a unix domain socket and repeated it out to a number of external tcp connections. but it would keep getting stuck when someone wasn’t reading data fast enough, and i couldn’t figure out how to get the php code to handle that.

so i rewrote it in c. it’s only 274 lines of code, and it seems to work just fine. it was actually a pretty straightforward port from php to c, although i had to spend some quality time refreshing my memory on how all the various socket functions work.

there’s a visible bump in the graph of my outgoing bandwidth from after this was fixed.
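the c code isn’t shown here, but one way to keep a repeater from getting stuck on a slow reader is to make every outgoing socket non-blocking and drop anyone who can’t take the whole message right now. this python sketch shows that idea only; the function name and the drop-on-partial-write policy are assumptions, not a description of what the rewrite does.

```python
import socket


def broadcast(data: bytes, clients: list) -> list:
    """send data to every client socket; return the ones that kept up.

    the sockets are assumed to already be in non-blocking mode.
    """
    alive = []
    for sock in clients:
        try:
            sent = sock.send(data)
        except (BlockingIOError, ConnectionError):
            sent = -1  # buffer full or peer gone
        if sent == len(data):
            alive.append(sock)
        else:
            sock.close()  # too slow (or dead): cut them loose rather than wait
    return alive
```

a kinder version would keep a small per-client queue and only give up once that fills, at the cost of more bookkeeping.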

new job

after the first of the year, i’ll be starting my new job. but as seems to be the trend these days, it’s not a new job with a new company, just a new job with the same company.

i’m getting out of doing web development, at least for my day job. that’s why mysql ab is hiring a webmaster (which isn’t exactly the job i have now, but it is basically the person who will take on the biggest chunk of what i was doing).

what i’m going to be doing is joining the development team, with my initial focus being maintenance programming for the server. i’m going back to my roots, and getting my hands dirty with “real” programming again. and i don’t think there’s any better way to learn the ins-and-outs of a system than chasing down bugs. just fixing the bug in how CREATE TABLE ... SELECT statements were logged for replication gave me a good reason to get up-to-speed on several aspects of how things work under the hood.

this article by rands about the type of employee who has gotten locked into a role goes part of the way in explaining why i’m moving on from my current position. even if trying to become irreplaceable by being the only one who knows how to do something is not your goal, it is easy for that to happen by default if you’re in the same position for too long. so i hope that shaking things up will be good for the company as a whole, and not just for my own mental health.

one thing i’ll likely do early in the new year is get a new machine for doing development. i’m thinking of an athlon64 shuttle system, which i can get pretty loaded within my annual work computer budget. i may also upgrade my desktop (which is a personal machine) so that i can use the monitor with the development box when necessary (although it would run headless most of the time, and i doubt i’ll spring for a kvm or anything fancy like that). instead of actually getting a new desktop machine, one possibility is just selling the 17" imac and getting an apple cinema display and using that with my laptop (and the development machine).

(the fact that said development machine would likely be powerful enough to run world of warcraft well is entirely coincidental.)

enemies of carlotta is a mailing list manager in the style of ezmlm, but written in python by lars wirzenius. one problem with it is that it is written as a pretty monolithic application, as opposed to ezmlm’s series of commands that are run for a few different addresses. but there’s some interesting design decisions made. it doesn’t implement digests yet.

one of my biggest annoyances with ezmlm these days is that the digest generation is not character-encoding aware. so for a list like the mysql japanese list, the digests, particularly the plain-text one, look like garbage. this is more frustrating because i spent a fair amount of time making sure the web-based archive got the encoding issues right.
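what an encoding-aware plain-text digest would need to do is not complicated: decode each message body according to its own declared charset, then write the whole digest out in one encoding. here is a minimal python sketch of that idea. it is not ezmlm code, the function names and the separator line are made up, and it assumes simple single-part text messages.

```python
from email import message_from_bytes


def body_as_text(raw: bytes) -> str:
    """decode one message body using the charset its headers declare."""
    msg = message_from_bytes(raw)
    payload = msg.get_payload(decode=True)  # undoes base64/quoted-printable too
    charset = msg.get_content_charset() or "us-ascii"
    try:
        return payload.decode(charset, errors="replace")
    except LookupError:
        # unknown charset name: fall back to something that cannot fail
        return payload.decode("latin-1")


def plain_digest(raw_messages) -> bytes:
    """join message bodies into one digest, encoded consistently as utf-8."""
    separator = "\n" + "-" * 30 + "\n"
    return separator.join(body_as_text(raw) for raw in raw_messages).encode("utf-8")
```

the digest itself would then need to declare `charset=utf-8` in its own headers, which is the part ezmlm has no notion of.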

the mysql lists are set up so that both mime-encoded and plain-text digests are generated, using a dummy list and some judicious symlinks. when we took over the maxdb lists from sap, the existing lists only had a plain-text format, and the subscribers clamored for that when we only had the mime-encoded versions available.

three out of four ain’t bad

mysql 4.1.8 is out and it includes a fix for the bug that had been plaguing blo.gs. it also contains a fix i made for another bug.

i now have code in linux, mysql, and php. if only my patch to apache had been accepted, i’d have code in the whole LAMP stack. (the CookieDomain configuration setting was finally added about two years later, but not using my patch.)

jobs at mysql ab: webmaster.

more later.

another planet

planet mysql collects the blogs of various mysql employees and community members. (well, the only community members included right now are ex-employees.)

“X-Mailer: RLSP Mailer” appears to be a highly reliable indicator for spam, at least judging by the 250 or so messages i’ve gotten with that header in the last several months, which appear to all be variants of lottery and 419 spam. one place it comes up in a google search is the source for myphpnuke. i wonder if there’s a connection.

that reminds me: i should start using the spamassassin backport, to join the world of spamassassin 3.0. something to add to the list of things to play with over the long holiday weekend.

stealing an idea (and four lines of code) from shelley powers, i’ve implemented a very basic live comment preview. i need to read up on this xml http request object thing (which this does not use) to try doing other and more clever things. (christian stocker’s livesearch is a good example of clever xml http request object usage.)

that was easy. i bolted on the basic livesearch here. the integration could (and maybe someday will) be improved, but it was quite easy to get going.

layers

BEA’s Apache Beehive Hits Milestone

Officials at BEA Systems announced the first milestone release of their open source project, Apache Beehive.

i’m shocked, shocked to find that gambling is going on in here!

the aside on john lim’s php weblog about the “wintermute” nickname makes me laugh. i always treat it as shorthand for “someone who can be safely ignored because they thought they were clever for using a nickname from a classic cyberpunk book.”

sxip is looking for a lead web developer.

mysql users conference call for papers extended

the conference website doesn’t reflect the new date yet, but the call for papers for the mysql users conference is being extended until november 15. so you have a second chance if you missed the initial call for papers, and are just hearing about it now. the speaker notification date isn’t changing. we just came to our senses and realized we didn’t need three whole weeks to make decisions on which sessions and tutorials to accept.

with the way things are going, we may have so many great talks that i don’t need to do one.

more on the uc2005 call for papers

i’m the chair for the lamp track, and would gladly accept bribes for submitted proposals. the best bribe is the submission of a good tutorial or session proposal. here’s the summary of the lamp track:

LAMP

This track is for anyone who is already using or considering making the open source LAMP (Linux, Apache, MySQL, PHP/Perl/Python) stack a core component of their software infrastructure. How is the LAMP stack being used to power massive websites while maintaining a cost-effective TCO?

Topics:

  • New LAMP Releases and Features
  • Application Architectures
  • Best Practices
  • Case Studies
  • LAMP related components (FreeBSD, NetWare, Windows)

there’s a new site opposing software patents in the european union called, appropriately enough, no software patents! the very clever might notice that the domain is registered in my name (along with the .org and .net variants), which is just a side effect of how the project got initiated within mysql (whose position on patents is online, incidentally). my involvement basically ended at registering the domain names, and they will be getting transferred to someone else soon.

eventum

i have been terribly remiss in not mentioning eventum before. it’s the issue-tracking tool that the mysql support team uses, and it is also used for task-tracking by a growing number of groups within the company. we liked it so much, we hired the author, bought out the software, and got it released under the gpl.

post chaining

powerblogs, the closed-source, hosted blog application that a few well-known political blogs use (among others, presumably) introduced a feature recently called “post chaining”. it’s sort of an on-the-fly categorization, where the posts in the chain are automatically linked to by all of the other entries in the chain. so if i provided a link to this posting in the middle of a chain about jimmy swaggart’s stupid kill-gay-people remark, you can easily find what earlier and later posts on the same issue had to say.

theoretically, saving a document as a pdf file is easy on mac os x. as a practical matter, the option is grayed out in the print box when i try to do it. and if i go into the output options, it won't allow me to save as pdf from there, either (but it will allow me to save as postscript).

i do remember being able to do this once upon a time. i think it stopped working sometime after i reinstalled with panther.

finished!

i couldn’t come up with an excuse to include an omnigraffle graphic.

flow

i wrote nearly 1200 words this afternoon. it is one of those things where getting started is the hard part. and, i guess, going back and editing and rewriting to make what is written actually coherent. but i’ll experience more of that tomorrow.

is there anything about unicode and character set support in mysql 4.1 that you want explained? now would be a good time to tell me.

yummy

after meeting joshua schachter at foo camp, i decided that i should take another look at del.icio.us, which he created. i’d consider it a bit of a spiritual cousin of blo.gs, in that it is basically not-for-profit, and a way for him to blow off creative steam.

here’s my del.icio.us page, which i may at some point figure out a way to incorporate here.

i am definitely planning on revamping the crude category system i wrote for this blog to be tag-based. tags are what all the cool kids are doing these days.

php vs. perl for web development

joe johnston explains why php is more popular than mod_perl for web development. the short answer is that they solve different problems.

i've been thinking about this with regard to python recently. i'd love to learn more python, and use it in the web space, but mod_python is more like mod_perl than php, and when i'm developing web stuff, my thinking matches the php model closer than the mod_(python|perl) model.

more on the cloud

ben hyde writes very smart things about “collaborative model synchronization” based on my earlier post about decentralized notifications and content distribution.

the privacy issue is something i forgot to mention, but is definitely another factor to consider. (i’m not sure it is a critical issue, but my perspective is likely skewed by how public i am with my list of subscribed blogs.)

here’s another: how does the publisher know how many people are following what they write? (again, not something i personally feel is critical, but i also rarely even look at the stats or logs of my sites.)

decentralized web(site|log) update notifications and content distribution

this is something that has been on my mind lately, and hope to talk about with smart people this weekend. (“the first rule is...”)

in a bit of interesting timing, this little software company in redmond recently hit the wall in dealing with feeding rss to zillions of clients on one of their sites.

in preparation, i’ve been digging into info on some of the p2p frameworks out there. the most promising thing i’ve come across is scribe. the disappointing thing (for me) is that it is built with java, which limits my ability to play with it.

while it would be tempting to think merely about update notifications, that just doesn’t go far enough. even if you eliminated all of the polling that rss and atom aggregation clients did, you would have just traded it for a thundering-herd problem when a notification was sent out. (this is the problem that shrook’s distributed checking has, aside from the distribution of notifications not being distributed.)

the atom-syntax list has a long thread on the issue of bandwidth consumption of rss/atom feeds, and bob wyman is clearly exploring some of the same edges of the space as me.

maybe it’s useful to sketch out a scenario of how i envision this working: i decide to track a site like boing boing, so i subscribe to it using my aggregation client. when it subscribes, it gets a public key (probably something i fetch from their server, perhaps embedded in the rss/atom feed). my client then hooks into the notification-and-content-distribution-network-in-the-sky, and says “hey, give me updates about boingboing”. later, the fine folks at boing boing (or xeni) post something, and because they’re using fancy new software that supports this mythical decentralized distribution system, it pushes the entry into the cloud. the update circulates through the cloud, reaching me in a nice ln(n) sort of way. my client then checks that the signature actually matches the public key i got earlier, and goes ahead and displays the content to me, fresh from the oven.

another scenario: now when i subscribe to jeremy zawodny’s blog, who has been slow to update his weblog software (in my hypothetical scenario) because he’s too busy learning how to fly airplanes, i don’t get updates whenever he publishes. but there’s enough other readers running this cloud-enabled aggregation software that when they decide they haven’t seen an update recently, they go ahead and poll his site. but when they notice an update, they inject it into the cloud. or they even notify the cloud that there hasn’t been an update.

obviously that second situation is much less ideal: there’s no signature, so some bozo could start injecting “postgresql is great!” entries into the jeremy zawodny feed space. or someone could just feed “nothing changed” messages, resulting in updates not getting noticed. the latter is fairly easy to deal with (add a bit of fuzzy logic there, where clients sometimes decide to check for themselves even when they’ve been told nothing is new), but i’m not so sure about the forgery problem in the absence of some sort of signing mechanism.
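
the “fuzzy logic” for distrusting “nothing changed” messages could be as small as the sketch below. the function name, the 5% default, and the injectable random source are all assumptions made for illustration, not part of any real aggregator.

```php
<?php
// hypothetical helper: decide whether to poll the origin site ourselves.
// $recheckChance and the injectable $rand callable are assumptions
// made for this sketch.
function should_poll_origin(bool $cloudSaysUnchanged, float $recheckChance = 0.05, ?callable $rand = null): bool
{
    // the cloud has made no "nothing changed" claim: poll as usual
    if (!$cloudSaysUnchanged) {
        return true;
    }
    // draw a number in [0, 1); a caller can inject its own source for testing
    $roll = $rand ? $rand() : mt_rand() / (mt_getrandmax() + 1);
    // occasionally distrust the cloud and check for ourselves
    return $roll < $recheckChance;
}
```

a higher chance means forged “nothing changed” messages get caught sooner, at the cost of more polling of the original site.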

in addition to notification, a nice feature for this cloud to have would be caching. that way when i wake up my machine in the morning, the updates i’ve missed can stream in from the network of peers who have been awake, and i don’t have to bother the original sites.

i don’t think there is going to be a quick and easy solution to this, but i hope to aid in the bootstrapping. if nothing else, blo.gs can certainly gateway what it knows about blog updates into whatever system materializes. (it certainly can’t scale any worse than the existing cloud interface, which is pretty inefficient given the rate that pings are coming in.)

a footnote on the signing mechanism: there’s the xml-signature syntax and processing specification that covers this. i haven’t really looked at it in enough detail to know what parts of the problem it solves or does not solve.

(anybody who suggests bittorrent as a key component of the solution will have to work much harder to get a passing grade.)

mailing list wishlist

justin mason has a mailing list wishlist. the ezmlm-based system for the mysql mailing lists does the archive-permalink thing. it is added to the message as the List-Archive header. (maybe that is an abuse of that header, but it seems more relevant than just putting the link to the main archive in the header.)

there are a number of things i don’t like about ezmlm, but the biggest advantage is that it is decomposed into enough distinct little bits that it is not difficult to rip out and replace specific bits. for example, you can replace the subscription confirmation (and make it web-based, and not vulnerable to stupid autoresponders subscribing themselves) by just adding a program into the manager that handles them before they get to ezmlm-request and ezmlm-manage.

i haven’t spent a lot of time futzing with mailman, but i’ve never really cared for it as a mailing list user.

but i’m not sure it really matters. all the kids are crazy about web-based forums these days. people who recognize the superiority of mailing lists are dinosaurs.

xplanet desktop background for mac os x

justin mason provides desktop background images generated using xplanet and satellite cloud data.

i couldn’t get the recommended mac os x tool (geektool) to work, so i came up with a lower-tech solution. i created a folder ~/Pictures/Backgrounds/, and set up a cron job to pull down the latest image every hour (using curl, and limited to the hours i’m likely to be awake to avoid some unnecessary traffic). and then in the system preferences, i set up the background to change picture every 5 minutes, with the folder i created selected. since there is only ever one picture in there, it just reloads that image.

it’s not quite the ideal solution (it would be nice to be able to just signal that the image should be reloaded after it is updated, rather than having it do it every five minutes), but what i did was easy to set up.

the image is fascinating. you can see all sorts of other tropical storms that you don’t hear about in the news, and right now you can see a cloud front moving across the midwest.

new colobus release

i popped out a new colobus release. nothing exciting, just some performance tweaking to take better advantage of the database back-end.

ballistic

in the race of who would snap first and rewrite all of ezmlm in perl, it looks like ask has jumped to the head of the pack. now i just have to find some time to play around with it (and pitch in — i’d particularly like to implement flexible digest generation that wasn’t oblivious to character sets).

failed password for root from ...

what is with the recent uptick in failed ssh logins everywhere? a few weeks ago, i almost never got emails from the automatic log watchers about these, now i get at least one or two a day, all from different ip addresses. usually they’re attempted root logins, but sometimes they’re attempts to log in as other role accounts (like bin).

for the record

rasmus first implemented the handling of urls like http://php.net/base64_encode on october 6, 2000. (i did something similar for urls like http://mysql.com/select on march 11, 2003.)

i’m thinking that this might be a good topic for an article for the mysql developer zone.

generating a last-modified time from php

while the getlastmod() function can tell you when the main file was last modified, it would be cool if php kept track of the most recent last modification time of all included files, assuming that it is already doing a stat() on each file as it includes it.

the value may not always be directly applicable (sometimes you are pulling in other data, or database information), but it would be useful. i guess you would want another function to inject possible timestamps into the mix.

the alternative is to iterate over the results of get_included_files() and stat() all of them.

or i guess you could just live with getlastmod(), and ignore the fact that it isn’t accurate when you do something like change the header that you’re including via some other mechanism.
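
a minimal sketch of the get_included_files() alternative, with a way to inject extra timestamps into the mix. latest_modified() and its $extra parameter are made-up names; only get_included_files(), filemtime(), and gmdate() are real php functions.

```php
<?php
// sketch: newest modification time across every file php has included so far,
// optionally mixed with extra timestamps (say, from a database row).
// latest_modified() and $extra are hypothetical names.
function latest_modified(array $extra = []): int
{
    $latest = 0;
    foreach (get_included_files() as $file) {
        $mtime = @filemtime($file);   // does a stat(); false on failure
        if ($mtime !== false && $mtime > $latest) {
            $latest = $mtime;
        }
    }
    foreach ($extra as $timestamp) {
        $latest = max($latest, (int) $timestamp);
    }
    return $latest;
}

// usage: call after all includes are done, before any output is sent
// header('Last-Modified: ' . gmdate('D, d M Y H:i:s', latest_modified()) . ' GMT');
```

this costs one extra stat() per included file on every request, which is the overhead that built-in tracking would avoid.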

uppsala calling

i picked up a grandstream budgetone 101 to use with the sip (voip) server we have set up at work. i just plugged in the server, username, and password info, and now i can talk with my colleagues all over the world, without long distance charges. pretty nifty. (well, it will be more nifty once more of said colleagues also get their hands on voip phones, or headsets with a softphone.)

i do wish it was a dual-line phone that handled pots in addition to voip.

when i’m bored, i can dial the echo server number and talk to myself via a server in sweden, which has a pleasant perversity to it.

reading php 5: a sign that php could soon be owned by sun requires the installation of a tin-foil hat, and buying in to the premise that zend == php. (via harry fuecks.)

i don’t even know where to start with a statement like this:

“Some very useful functions have been added to PHP5. It’s been nine years in the making, but PHP5 now includes two functions to uuencode and uudecode. Combining those functions with the new socket and stream functions, developers can create a lots of "kewl" applications. An application to automatically encode and decode files to and from news servers comes to mind as an example of how to incorporate these new functions.”

java does not appear to have built in uuencode and uudecode functions. clearly php is superior! (you see, i’m being sarcastic....)

on a slight tangent, is it just me, or could the migrating to php 5 section of the php manual use a once-over by someone with a firmer grasp of english grammar? (no disrespect to the authors meant, it just has a surprising number of clunky statements.)

the funny bit about danny o’brien’s notes on andy oram’s talk at oscon is that he’s clearly the person who was entering just as i was leaving. andy had just pulled up the slide about trackback when i stepped out. (i had a flight to catch.)

validating utf-8 by regex

in actually using the regex in this w3c faq, i noticed that it has a few typos: the first three escapes are missing the 'x' to put them into hex. i’ve let the author know. the corrected example:

$field =~
  m/^(
     [\x09\x0A\x0D\x20-\x7E]            # ASCII
   | [\xC2-\xDF][\x80-\xBF]             # non-overlong 2-byte
   |  \xE0[\xA0-\xBF][\x80-\xBF]        # excluding overlongs
   | [\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}  # straight 3-byte
   |  \xED[\x80-\x9F][\x80-\xBF]        # excluding surrogates
   |  \xF0[\x90-\xBF][\x80-\xBF]{2}     # planes 1-3
   | [\xF1-\xF3][\x80-\xBF]{3}          # planes 4-15
   |  \xF4[\x80-\x8F][\x80-\xBF]{2}     # plane 16
  )*$/x;
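
the same pattern carries over to php’s preg functions more or less unchanged. this is a sketch, and is_valid_utf8() is a made-up name. note that the pattern is stricter than plain utf-8 validity (it rejects most control characters), and on very long strings preg_match() can hit pcre’s backtracking limits and fail, in which case mb_check_encoding() is the more practical check.

```php
<?php
// php port of the perl pattern above; is_valid_utf8() is a hypothetical name.
function is_valid_utf8(string $bytes): bool
{
    // single quotes keep the \x escapes intact so pcre sees them
    $pattern = '/^(?:
         [\x09\x0A\x0D\x20-\x7E]            # ASCII
       | [\xC2-\xDF][\x80-\xBF]             # non-overlong 2-byte
       |  \xE0[\xA0-\xBF][\x80-\xBF]        # excluding overlongs
       | [\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}  # straight 3-byte
       |  \xED[\x80-\x9F][\x80-\xBF]        # excluding surrogates
       |  \xF0[\x90-\xBF][\x80-\xBF]{2}     # planes 1-3
       | [\xF1-\xF3][\x80-\xBF]{3}          # planes 4-15
       |  \xF4[\x80-\x8F][\x80-\xBF]{2}     # plane 16
      )*$/x';
    // preg_match() returns 1 on a match, 0 on no match, false on error
    return preg_match($pattern, $bytes) === 1;
}
```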

getting the right abstraction from your database abstraction layer

jeremy zawodny rails against database abstraction layers, particularly those that aim to provide database independence. the abstraction layer that i use is really just aimed at making it so i can do more with less code. it’s inspired a bit by the PEAR DB layer, but without the database independence cruft. it means i can write $db->get_assoc("SELECT foo, bar FROM users WHERE id = ?", $id) instead of having the same four lines of query building and row fetching all over the place.

this is actually an object-oriented version of the procedural layer that i used to use. i needed to be able to use it in situations where i may have multiple connections to different databases open, so the old abstraction layer got painted with a thin OO veneer. the mysql extension’s handling of a default connection is useful for the lazy programmer, but doesn’t mix well with functions that take a variable number of arguments.
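
to give a feel for how thin such a layer can be, here is a sketch. it is a guess at the shape, not the actual code: it sits on PDO rather than the mysql extension, the class name Db is made up, and get_assoc() here is assumed to return the first matching row as an associative array (or null).

```php
<?php
// a thin wrapper in the spirit described above. assumptions: PDO underneath,
// and get_assoc() returning the first row as an associative array or null.
class Db
{
    private PDO $pdo;

    public function __construct(PDO $pdo)
    {
        $this->pdo = $pdo;
        $this->pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
    }

    // variable number of arguments: the query, then one value per ? placeholder
    public function get_assoc(string $sql, ...$params): ?array
    {
        $stmt = $this->pdo->prepare($sql);
        $stmt->execute($params);
        $row = $stmt->fetch(PDO::FETCH_ASSOC);
        return $row === false ? null : $row;
    }
}

// usage, against an in-memory sqlite database so the sketch runs on its own
$pdo = new PDO('sqlite::memory:');
$pdo->exec("CREATE TABLE users (id INTEGER PRIMARY KEY, foo TEXT, bar TEXT)");
$pdo->exec("INSERT INTO users VALUES (1, 'a', 'b')");
$db = new Db($pdo);
$row = $db->get_assoc("SELECT foo, bar FROM users WHERE id = ?", 1);
```

because each Db object wraps its own connection, several of them can be open against different databases at once, which is the situation that motivated the OO veneer.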

“hello shoe, meet the other foot.”

the guys who founded zend used to get hot under the collar because they believed they weren’t getting the recognition they deserved for php, whereas rasmus was more frequently quoted, interviewed, and credited. “Andi and I created the language in 1997, when we needed a solution to implement a shopping cart project for University. Some ideas were borrowed from PHP/FI, a tool that we tried to use beforehand, but that proved to be far too limited and unreliable for our purposes.” (via linux today.)

apachecon cfp

the apachecon 2004 call for participation is out. submission deadline is 23 july 2004, right before oscon. perhaps i’ll submit the follow-up to my oscon talk. (which would be talking about the i18n features of mysql 4.1 and php5, which i plan on only mentioning briefly in my oscon talk.)

php’s dumb xml parsing behavior

steve minutillo, author of feed on feeds, runs headlong into the execrable character encoding behavior of php’s xml parsing functions. hey, i was complaining about that just last year... (via phil ringnalda.)

and a related link, this article from the w3c explains how to deal with encoding issues in forms and has a nice regex that verifies whether a string is valid utf-8.

here’s some links culled from an i18n discussion on the twiki site:

Now that I've looked a bit more, there are many algorithms out there for charset detection, but most are aimed at HTML page auto-detection, and may well not work well for URLs:

i really need to write the slides for my talk at oscon, which will cover exactly this sort of thing.

msn.co.kr gets header encoding wrong?

someone posted a question to one of the mysql mailing lists, and the archives weren’t displaying the korean characters correctly.

the encoded bit of the From header looks like =?ks_c_5601-1987?B?7JygIOywve2YuA==?=, but if i treat the content as utf-8, it is displayed correctly (at least identically to how my mail program displays it). but for the body, i need to recode the content from mscp949 (another name for ks_c_5601-1987, according to this email to ietf-charsets) to utf-8 to get it to display correctly (or at least something resembling correct).

so i can get the message to display correctly (i think), but only by cheating: treating ks_c_5601-1987 as utf-8 when recoding the headers, and as mscp949 when recoding the body. that’s just a little gross.
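
the header half of that cheat can be sketched like this: decode the base64 encoded-word, and if the bytes already happen to be valid utf-8, ignore the declared charset. decode_encoded_word() is a made-up helper that only handles the “B” encoding and this one charset. the check is a heuristic, since some legitimately cp949-encoded byte sequences could also pass as utf-8.

```php
<?php
// sketch of the "cheat": decode one base64 ("B") encoded-word, then ignore the
// declared charset if the bytes already happen to be valid utf-8.
// decode_encoded_word() is a hypothetical helper, not a library function.
function decode_encoded_word(string $word): ?string
{
    // expected shape: =?charset?B?payload?=
    if (!preg_match('/^=\?([^?]+)\?B\?([^?]*)\?=$/i', $word, $m)) {
        return null;
    }
    $bytes = base64_decode($m[2], true);
    if ($bytes === false) {
        return null;
    }
    if (mb_check_encoding($bytes, 'UTF-8')) {
        return $bytes;              // mislabeled, but already utf-8
    }
    if (strcasecmp($m[1], 'ks_c_5601-1987') === 0) {
        // mbstring's name for cp949 is "UHC"
        return mb_convert_encoding($bytes, 'UTF-8', 'UHC');
    }
    return null;                    // other charsets are out of scope here
}
```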

this will surely cause problems in the face of another mail client that actually uses that character set correctly. it appears to be unique to microsoft mailers, though, so perhaps they got it wrong consistently.

and, of course, it is entirely possible that the results i’m displaying now are completely wrong, and it says rude things about elmo where the MSN advertisement is supposed to be. (although judging from the babelfish translation of the text, i think it is correct.)

colobus 2.0 released

i’ve released version 2.0 of colobus, my nntp server that runs on top of ezmlm archives. there’s more that could be improved, but this at least gets an initial DBI-driven version out there.

i tried to announce the release on freshmeat, but it kept timing out. we’ll see if it takes.

blogging tools and code quality

rayg takes a look at some of the php-based alternatives to movable type. i have to agree with his assessment of wordpress and serendipity. the code is just messy, like most open-source php projects. my impression of textpattern was that it is even messier.

of course, this is mostly an aesthetic judgement. one thing they all have in common is good, and especially good-looking, user interfaces. i suspect that’s more important than the code quality for most bloggers.

(and to be fair, i’ve only looked at version 1.0.2 of wordpress, an older version of s9y, and an early beta of textpattern. they may have all improved.)

i’m still of the roll-my-own blogging tool mentality. it isn’t often that i miss a feature that one of the off-the-shelf packages would give me, and it’s not a large amount of code to hack on for fun. the code for the blogging bits of this site is less than 1000 lines. the code for another blog of mine is less than 100 lines, plus the 936 lines for textile. (it doesn’t support comments, though, and does post-by-email instead of having a web form.)

another colobus release?

i was all ready to write a blog entry about an upcoming release of colobus (my nntp server for ezmlm mailing lists), when i happened to look at one of my terminal windows and notice a very bad rm -rf colobus*. uh, oops.

i’ve recreated the changes (the hard part was writing it the first time), now i just need to do more testing to make sure i really did recreate the changes. putting the 165,478 messages from the mysql general mailing list into the database takes a while.

i also have (and did not accidentally delete, at least not yet) a replacement for most of the rest of the bits of the web-based archives that are currently served up by ezmlm-cgi. for lists.mysql.com, that is just the listings, since i already replaced the message view with code based on the lists.php.net code.

it handles the encoding of the posts to the japanese users list (unlike the current ezmlm-cgi listings), which is cool, even if i can’t understand it. it also handles that wacky Antwort prefix the germans love so much.

i should really package up the web frontend stuff someday, too. there’s really not much of anything specific about the lists.mysql.com setup to it. it just mimics ezmlm-cgi right now — i need to think more about how i really want it to look and work.

slides from “mysql and php: best practices”

the slides from my talk at the mysql users conference are available online now.

i’m most proud of the image on slide six when it comes to matching the image to the text, although it’s not my favorite of the images i used. maybe for my next talk, i’ll find an illustrator to do custom images. (i should have it written well in advance this time, since i’ll be writing an article for the mysql developer zone based on the topic, and possibly presenting it elsewhere before the o’reilly conference.)

omnigraffle is cool

i needed to remake some images for this article about storage engines in mysql, so after futzing around in illustrator for a bit, i remembered omnigraffle, and had registered it by the time i was making the third image. with some practice, i could be dangerous with this tool.

speaking of articles on the mysql developer zone, we’re working on publishing a new article there every week. if you’re interested in writing an article, drop me a line. i can’t promise fortune, but there may be a little fame. it may be a good way to find an open source programming job. (or support job, or documentation job, or ...)

someone put peanut butter in my шоколад

i’m a little undecided as to whether i really, really hate trying to track down problems with character encodings, or really enjoy it. there’s something about groveling through hex dumps trying to figure out which bytes are missing, incorrect, or shouldn’t be there in some EUC-JP encoded text, causing it to render funny little chinese characters instead of the correct funny little japanese characters.

i think it is a little surprising that there are only two talks at the o’reilly open source conference that touch on internationalization and localization.

at least i’m getting some practical experience getting stuff like this to render correctly. or so i’m told. i may actually know what i’m talking about by the time i have to give the talk.

sam ruby has been writing various interesting things on this topic recently.

it’s a shame in particular that there’s no perl talk dealing with unicode issues. i’m still foggy on what magic it is that perl does under the hood with regards to that.

gzip vs. bzip2 vs. rzip for log files

with my curiosity piqued by jeremy’s tests of gzip vs. bzip2 vs. rzip using a bunch of mail as the test data, i tried compressing an apache log file with the three tools, plus lzop:

program     cpu time (s)   size (bytes)
gzip              19.210     28,362,079
gzip -9           32.400     27,036,433
bzip2 -9         849.489     15,496,248
rzip             147.460     18,823,330
lzop               3.240     48,719,254
lzop -9           80.810     32,531,485

the original file size is 295,927,205.

it’s too bad rzip can’t decompress to a stream. that makes it much less attractive as a log compression solution.

my two ₰s

(image: screenshot of character palette showing tibetan characters)

i was going to point out unicode font info as a useful tool for looking at the unicode character space, and point out how it would be nice if you could navigate the space regardless of font, and then have it tell you which fonts included which characters. but while i was fiddling with it, i stumbled on the character palette built into mac os x, which does exactly that. the unicode font info tool is still handy for being able to see the nitty-gritty of the font details, but the built-in character palette is super nifty. (via lordpixel’s advogato diary.)

mark pilgrim’s article on determining the character encoding of a feed touches on more stuff related to my practical i18n talk.

here’s another essay on UTF from tim bray.

(don’t mind me, i’m just making sure i’ve marked some of these articles for future reference when i actually start to write said talk.)

i ♣ encoding problems

some notes on character encoding issues. this is the sort of stuff i plan on covering in my talk at the o’reilly open source conference, with a focus on the practical issues of dealing with it from php and mysql. (via simon willison.)

colobus 1.2

it has been over two years since the last release of colobus, my nntp server written in perl that runs on top of ezmlm archives. this new release just incorporates a couple of accumulated bug fixes and tiny features.

i have a proof-of-concept version that uses a mysql backend. i’ll get that code folded in and cleaned up and make a 2.0 release some day.

tips on building networked applications

once upon a time, someone wrote an essay about things to keep in mind when developing or designing networked applications. (one point, or maybe the main point, being that you shouldn’t treat remote procedure calls just like local procedure calls.)

ring any bells for anyone?

oscon 2004 and mysql users conference 2004

my talk for the 2004 o’reilly open source convention was accepted: “practical i18n with php and mysql”.

before that, i’ll be speaking at the 2004 mysql users conference on “mysql and php: best practices”. (part of my rough schedule for traveling to orlando and cancun in april.)

for both talks, i’ll actually be pushing the boundaries of my own experience a bit. it’s a good way to force myself to learn more. (they’ll also both be all-new, or mostly all-new, talks.)

who wants to place bets on whether i end up in the last speaking slot for both conferences? i always seem to end up there.

shh

one of the secrets of doing things on the web is how little hardware and clever coding you really need. www.mysql.com is a dual p3/850 serving over 16 million page views per month.

the coding is quite ham-handed for the most part, honestly.

here’s how ham-handed: except for the documentation pages, all of the .html pages are handled via a php script that creates an output buffer, includes the file, captures the output buffer into a variable, outputs the header (which gets things like the page title from variables that were set when the file was included), outputs the data, and then outputs the footer. (the documentation pages are actually php files that include calls to generate the header and footer.) require_once all over the place. a three-element include_path. no Last-Modified header (or conditional GET handling, obviously). no php compiler cache. 96 RewriteRule directives. the news on the front page? pulled from the database on every hit. (the query cache is turned on, however.)
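
for anyone who hasn’t seen the output-buffer trick, a rough reconstruction of the pattern follows. render_page(), the $title variable, and the markup are stand-ins invented for the sketch, not the site’s real code.

```php
<?php
// rough reconstruction of the pattern described above; render_page() and
// $title are hypothetical stand-ins for whatever the real script used.
function render_page(string $file): string
{
    $title = 'untitled';          // the included page may overwrite this

    ob_start();                   // create an output buffer
    include $file;                // the page echoes its body and sets $title
    $body = ob_get_clean();       // capture the buffer into a variable

    // the header can use $title because the include ran in this scope
    $header = "<html><head><title>" . htmlspecialchars($title) . "</title></head><body>\n";
    $footer = "</body></html>\n";
    return $header . $body . $footer;
}
```

the handler script would then just echo render_page() for whichever .html file was requested. the whole page has to be buffered in memory before the first byte goes out, which is part of what makes it ham-handed.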

things do gradually improve. i finally nuked the include file that had gems like this one:

 function open_tr()
 {
  echo("<tr>");
 }

i’ve also finally pared things down so the list of country names is only in one common include file. (actually, that’s not quite true. there are some old forms that define it on their own. one of them has three copies. and the geoip code has its own copy.)

maybe this isn’t the best time to mention i’ll be giving a best practices talk at the 2004 mysql users conference.

in defense of connect by

something on the near-term todo list in the mysql manual is “oracle-like connect by prior ...”. every once in a while, someone drops a comment there to say that there’s no way that connect by should be implemented, because the sql standard specifies another syntax for recursive queries, known as with.

as it turns out, ibm’s db2 implements the with syntax, and here’s a nice article on the difference between the two syntaxes.

i can’t see how anyone can look at that article and clamor for the with syntax instead of connect by. i look at the statement using connect by and the results it gives, and can think of several ways i could apply it in applications i’ve built or want to build. i look at the with syntax and get dizzy. the syntax of with just looks incredibly un-natural, even for sql syntax, in a way that connect by does not.

there are undoubtedly things you can do using the with syntax that you couldn’t with connect by, but nobody has been able to point them out to me. and as far as i can tell from the article on ibm’s site, getting the type of results i’m interested in requires a stored procedure and query that is at least four times as verbose as oracle’s syntax.
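
for readers who haven’t seen the two side by side, here is a sketch using a made-up emp(id, name, manager_id) table. the oracle form is only held in a string. the with form is run against an in-memory sqlite database through PDO (an assumption of convenience; db2 spells it without the RECURSIVE keyword). it covers only the simplest parent/child walk, not the ordered output discussed in the ibm article.

```php
<?php
// side-by-side sketch; the emp table and its rows are invented for the example.
$connectBy = <<<SQL
SELECT name, LEVEL
  FROM emp
 START WITH manager_id IS NULL
CONNECT BY PRIOR id = manager_id
SQL;

$with = <<<SQL
WITH RECURSIVE tree (id, name, lvl) AS (
    SELECT id, name, 1 FROM emp WHERE manager_id IS NULL
    UNION ALL
    SELECT e.id, e.name, t.lvl + 1
      FROM emp e JOIN tree t ON e.manager_id = t.id
)
SELECT name, lvl FROM tree ORDER BY lvl, name
SQL;

// only the WITH form is executed here; sqlite does not understand CONNECT BY
$pdo = new PDO('sqlite::memory:');
$pdo->exec("CREATE TABLE emp (id INTEGER PRIMARY KEY, name TEXT, manager_id INTEGER)");
$pdo->exec("INSERT INTO emp VALUES (1, 'ann', NULL), (2, 'bob', 1), (3, 'cy', 2)");
$rows = $pdo->query($with)->fetchAll(PDO::FETCH_NUM);
```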

(disclaimer: i work for mysql ab, but am not part of the development team, and have no special insights into when either syntax will get implemented. i suspect that both will eventually be implemented: connect by as an aid to people transitioning away from oracle and because it is something a lot of people ask for, and with as part of our commitment to supporting the sql standards.)

a µ problem

planet apache does not handle utf-8-encoded content correctly. maybe it isn’t planet apache’s fault. it is trying to set the encoding in a <meta http-equiv="Content-Type" ... > header, but camino, safari, ie5/mac, and ie5.5/win all ignore it. i’m not sure what the rules are with regard to the content type being specified with different charsets in the response headers and in a <meta> element.

i’m not surprised it doesn’t work, there’s still a lot of gaps in being able to use utf-8 pervasively. i’m actually generating curly quotes by typing them instead of using something fancy like textile. (and since i’m always having to search to find this: source code for textile 2.)

protected by spf

i’ve set up the dns entries for making the domains under my immediate control protected by spf.

this means that mail transfer agents that pay attention to spf data will know that the mail is bogus if it claims to come from one of my domains but is not actually sent from my machine. (or any machine, for some of the domains that never send mail.)

the lists.mysql.com server has been checking spf info for a while, and it blocks a dozen or so messages a day. that’s a really tiny percentage of the 150,000 incoming messages per day, but it does show that the system works when people publish the data.

i guess the next thing to do will be to get entries set up for other domains not under my direct control, but under my influence.

there’s all sorts of interesting data i’m logging on both my own mail server and lists.mysql.com. some day i should really write some tools to help analyze it. part of the problem is that there’s just too much stuff making it through the front-line filters. the lists.mysql.com smtp server still accepts about 25,000 messages a day, and even my own mail server accepts about 500 a day.

i’m still seeing about 20 spam messages get through a day. about two-thirds of that comes via work addresses (like the webmaster address), another one-sixth to my address here, and the rest via various other addresses. (that doesn’t include worms or worm-related bounces.) i could eliminate some of that by refusing mail sent via my alumni.hmc.edu address that is spam-tagged but still forwarded.

i’m still holding the line on doing any actual delivery-time filtering. once mail is accepted by my mail server, it goes into a regular mailbox, not something that fills up with piles of crap that i only check every three months. so when you send a mail and i don’t reply, it probably means i’m ignoring you. (don’t be offended, i do that to everyone.)

(disclaimer: spf is not the ultimate solution to kill all spam. but it would serve to eliminate some classes of spam, and helps out on the “joe job” front.)

déjà vu.

return HTML::br();

jeremy’s thoughts on developers who build from scratch vs. those that bring their own toolkits miss what i think is one big issue: how does each type of programmer fit into a larger team?

i would have serious qualms about hiring someone who had their pet framework that they used for building things. i think part of that comes from seeing so many half-witted frameworks. (anyone who has written a library or class to generate HTML with function, method, or object names like p and br deserves to be shunned.)

but perhaps i’m seeing three categorizations where jeremy sees two. i see people who build from scratch, those who seek out and use common frameworks and libraries from resources like PEAR and CPAN, and those who have built their own toolkits and frameworks. it’s the people at both ends that i worry about. of those three categorizations, i tend to wobble between the first and second.

here’s a cool referral to tarsier, a little freeware program i released a long time ago. the page is in russian. i wonder what it says.