Entries tagged 'bugs'
Maintenance engineer, slightly used
A popular response to the attempted backdooring of XZ Utils has been people like Tim Bray talking about the maintenance of open source projects and how to pay for them.
When I transitioned from leading the web development team at MySQL to an engineering position in the server team, I spent the first year as a maintenance engineer. I blogged a little about the results of that one year and calculated that I had fixed approximately one reported bug per working day.
But you’ll also notice that I had to heap some praise on Sergei Golubchik who reviewed fixes for even more bugs than I had fixed. (He also was responsible for working on new features. He is extremely talented, and I’m not surprised to see he’s the chief architect at MariaDB.)
That sort of reviewing and pulling in patches is a critical component of maintaining an open source project, and a big problem is that it is not all that fun. Writing code? Fun. Fixing bugs? Often fun. Reviewing changes, merging them in, and making releases? A lot less fun. (Building tools to do that? More fun, and can sidetrack people from doing the less-fun part.)
It is also a lot different for projects with a lot of developers, a small crowd of developers, and just a few developers. The process that a patch goes through to make it into the Linux kernel doesn’t necessarily scale down to a project with just a few part-time developers, and vice versa. A long time ago, I made some noise about how MySQL might want to adopt something that looked more like the Linux kernel system of pulling up changes rather than what was the existing system of many developers pushing into the main tree, and nobody seemed very interested.
Anyway, as people think about creating ways of paying people to maintain open source software, I think it is very important to make sure they don’t inadvertently create a system that bullies existing open source project maintainers to make them focus on the less-fun aspects of developing software, because that’s kind of how we got into this latest mess.
You already see that happening with supposed-to-be-helpful supply chain tools demanding that projects jump through hoops to be certified, or packaging tools trying to push their build configuration into projects (with an extra layer of crypto nonsense), or a $3 trillion company demanding a “high priority” bug fix from volunteers.
I am curious to see where these discussions lead, because there is certainly not one easy solution that is going to work everywhere. It will also be interesting to see how quickly they lose steam as we get some distance from the XZ Utils backdoor experience.
(Also, I’m still looking for work, and I’m willing to do the less-fun stuff if the pay is right.)
flipping switches
cloudflare zaraz is a great concept: manage the third-party code for your website sort of like google tag manager, but run as much of the code as possible in the cloud instead of the browser. but the execution is still rough around the edges, especially when it comes to the ecommerce functionality.
each of the platforms where we publish our catalog (and can use that to advertise) has its own way of collecting performance metrics. the way i had hacked support for each into our old website was messy and fragile. zaraz intervenes here with a simple zaraz.ecommerce(event, data) call that pushes out the data to each of those third-party tools.
the problem is that how zaraz maps their simplified interface to those various systems is undocumented, and as near as the community can figure out, not always correct. i also found that if i enabled the ecommerce integration for facebook, it broke all of the ecommerce reporting everywhere.
i am still hopeful that they can work through the bugs and issues, add support for some of the other platforms that would be useful for us (like pinterest), and we can collect the data we need with a minimized impact on site performance.
the worst case is that i can just drop in my own implementation to turn those zaraz.ecommerce() calls into the old browser-side integration and it will still be more streamlined than it used to be.
how to fix eleven bugs in mysql 5.1
my “mysql client fixes” branch on launchpad contains fixes for eleven bugs (nine of them reported on bugs.mysql.com).
don’t get too excited — these are all the lowest priority-level bugs, mostly typos in comments and documentation.
now i have to figure out the latest process for actually getting these changes into the official tree. there are different policies around how and when to push to trees since i was last doing any server development. from someone who is partially outside, it all seems very tedious and designed to make it impossible to fix anything. process gone bad.
the mysql server isn’t going to get the benefits of using a good, open-source distributed revision control system unless it stops
bug tracking and code review
i was going to write some reactions to an observation that postgresql has no bug tracker and its discussion last week, but lost the spark and abandoned the post after a few days. but today i ran across a quote from linus torvalds that neatly sums up my thoughts:
We’ve always had some pending/unresolved issues, and I think that as our tracking gets better, there’s likely to be more of them. A number of bug-reports are either hard to reproduce (often including from the reporter) or end up without updates etc.
before there was a bug tracking system for mysql, there was a claim that all bugs were fixed in each release (or documented), and there has been a lot of pain in seeing how well that sort of claim stacks up against an actual growing repository of bug reports. if the postgresql project were to adopt a bug-tracking system, i am certain that they would find the same issue. before long, they would be planning bug triage days just like every other project with a bug-tracking system seems destined to do.
another good email from linus about these issues was pointed out by a coworker, but this part in particular caught my eye:
Same goes for “we should all just spend time looking at each others patches and trying to find bugs in them.” That’s not a solution, that’s a drug-induced dream you’re living in. And again, if I want to discuss dreams, I’d rather talk about my purple guy, and the bad things he does to the hedgehog that lives next door.
the procedure at mysql for code reviews is that either two other developers must review the patch, or one of an elite group of developers who are trusted to make single reviews. then the developer can push their changes into their team trees, taking care to have merged the changes correctly in anywhere from one to four versions (4.1 and up).
this is a huge amount of friction, and is one of the most significant problems causing pain for mysql development. two reviewers is just too high of a bar for most patches, and having the rule makes the reviews rote and less useful. there is also an unreasonable amount of distrust being displayed by this procedure, that says that developers can’t be trusted to ask for help when they are unsure, but should feel free to make the occasional mistake by pushing something that isn’t quite right.
i wonder if we could be taking better lessons from linux’s hierarchical development model, with the pulling of code up through lieutenants to a single main repository, rather than the existing model that involves every developer moving their own stones up the pyramid. it would require some developers (more senior ones, presumably) to spend more of their time doing code management as opposed to actual coding.
monty is not particularly happy with the state of development of his brainchild now. would he be happier if he were in a linus-like role of rolling up patches and managing releases?
i wish i had the patience to write at less length and greater coherence about this.
connector/odbc 5.1.3 (release candidate!)
yeah, it is all odbc, all the time here, it seems. that is just because i can’t write about the really exciting stuff. soon!
that is not to say that releasing mysql connector/odbc 5.1.3-rc is not a huge milestone! it took us a while to get there, but we finally have a unicode-aware odbc driver that is, in our opinions, production-ready. now we just need some community feedback to find out if we are right. there are a few minor issues we know about already, but the impact of those is generally small enough that the majority of folks should not have any problems.
iodbc and mac os x problems
working with the iodbc driver manager on mac os x has been a frustration on two fronts.
first, the installer api functions provided by iodbc constantly set the configuration mode to ODBC_BOTH_DSN, which means you have to keep resetting it to the correct value after nearly every installer api call. this problem is platform-agnostic: the iodbc code is just plain wrong.
second, when called from the odbc administrator application on mac os x, any failures that the driver reports or passes through from the installer api in registering the driver are ignored, and the application instead uses a generic prompt for dsn configuration.
so even with the first problem fixed, the second problem has led to a lot of tail-chasing until i discovered that the odbc administrator application only obtains enough privileges to write to /Library/ODBC as a member of the admin group, not as the root user. because the connector/odbc installer was trying to be helpful in only creating the /Library/ODBC/*.ini files with root-writable permissions, it was running straight into the second problem.
this is all related to bug #31495 filed against mysql connector/odbc.
connector/odbc 3.51.21
after eight releases, we have gone from over 150 open bugs to under 70 bugs.
one of the really old bugs we are still looking at is how identifiers that are reserved words (or have non-alphanumeric characters) are handled from ado. as far as we can tell, the driver is doing everything correctly, and it is ado that is failing to properly quote the identifiers, but we have gotten some developers at microsoft involved in tracking the problem from that end.
just today there was a new bug filed about using the driver with visual basic 6, which was itself released in 1998. i am going to have to build a vm image with that installed so i can do some testing.
the next release of the new 5.1 branch should be out later this week. we will probably limit the scope of new features we are going to implement in 5.1 so that we can get unicode support and the other already-implemented features out there as a beta (and then production/ga) release sooner.
connector/odbc 3.51.18
we were able to get out this month’s connector/odbc release a little earlier in the month than usual. one reason we made the release earlier was to get a replacement for last month’s 3.51.17 out there, because that release had an unfortunate bug that caused problems when working with many odbc applications, like microsoft access.
we were also able to get under 90 bugs by fixing a number of other bugs, and working through more of the old bugs and figuring out that they were either already solved or otherwise no longer relevant.
the other reason to get this out earlier in the month has to do with a project that should see some more daylight by the end of the month. more on that when the time comes.
connector/odbc 3.51.17
another month, another mysql connector/odbc release. it has almost become a trend. we only chipped it down to about 124 bugs this time, about a half-dozen fewer than last time. but we’re going back and re-evaluating some of the open bugs now.
we didn’t manage to get windows x64 packaged up this time, but we might slip out a 3.51.17 package for that platform before the next full release. part of the problem in getting it together in time for this release was that odbc on win64 appears rather half-baked, and we couldn’t find much in the way of applications to test with it.
now i’m hip-deep in making sure that the various column lengths that you can retrieve from odbc are calculated correctly. in many cases they are not, but the msdn odbc documentation is wonderfully imprecise on what lengths are meant to be returned for many of these. and it sometimes appears to contradict some of the ibm db2 odbc documentation.
independence day for code
as i’ve been threatening to do for quite some time, i’ve finally made the source code for bugs.mysql.com available. it is not the prettiest code, and there’s still all sorts of hard-coded company-specific stuff in there. but it is free code, so stop complaining.
it is available as a bazaar repository at http://bugs.mysql.com/bzr/. i have not yet set up any sort of fancy web view, or mirrored it to launchpad.
i plan to do the same for the lists.mysql.com code some day. one limiting factor now is that machine only has python 2.3 on it, and bazaar needs python 2.4.
connector/odbc 3.51.16
it’s another month, so time for another connector/odbc release.
there are already three bug fixes that have been committed to the repository for the next release, and the changes to support building on windows x64 should land soon.
we’re down to about 130 open bugs, about 20 less than the last release. some of those were newly fixed, and some were closed because they duplicated earlier problems that had already been fixed. this release does close another bug that is nearly three years old.
one of the things i hope to get fixed for the next release is being able to specify the default character set for the connection. you can’t do this now, so when developers try to use a different default character set like big5, problems show up in how parameters are escaped. this shouldn’t be hard to do, but it will involve adding another widget to our gui configuration, which i haven’t really had to do very much with up until now.
rambling about work
sorry for things being so boring around here. i’ve been grinding away at bugs at work. after gaining some ground on bugs in connector/odbc, i’m being reassigned to help out with some server bugs again, at least part-time.
fixing bugs in c/odbc is an adventure. the code base bears the scars of several different developers of varying levels of cleverness and somewhat conflicting coding styles. but now that the test suite is in shape, it is easier (and safer) to do some more mechanical transformations to undo some of the damage.
one problem with tackling bugs in an odbc driver is that a lot of the reports involve third-party applications like microsoft access or crystal reports, or development tools like delphi that we don’t have as much expertise in. the initial reports often don’t include all of the information we need to be able to reproduce the bug.
this can be frustrating both for us and the reporters — we just don’t have enough people looking at c/odbc bugs to play around with every application to figure out exactly how to reproduce bugs that are reported, and often the problem seems blindingly obvious when you’re the one who runs across it. i think that most of the time we get it under control, but there have been a few times when this frustration has taken things in the wrong direction.
a del.icio.us bug
i got a stupid robo-reply to reporting this through the official channel, so i’ll just blog it instead: http://del.icio.us/jm is in my network, but now i am also seeing the bookmarks from http://del.icio.us/jm. (note the trailing .) in my network.
i should probably use del.icio.us more than i do, but i’m always torn between it being a sort of lightweight blogging and bookmarks that i almost never actually care about later.
…and they never check out
right on schedule, i’m done with the pressing changes we wanted to make to the mysql bugs system. the most visible things (to non-mysql employees) are probably just the cleanup of the layout of the bug pages themselves, and the new public tagging interface. (with the requisite ajax-y goodness.)
under the hood, i’ve taken a machete to some of the more egregious bits of code. that’s not to say there isn’t a lot more that could be cleaned up, but it’s a start. now that i’ve cleaned up the bug reporting and editing forms, they’re ripe for merging.
based on the priorities set by the development management team, i did less of the cleanup of the main bugs schema than i had originally planned, but things are in a state now that it should be easier to tackle those in the future.
my plan is to release this code publicly, but one of the things i need to do first is transition it out of bitkeeper and into another revision control system. probably bzr, but i really wish it supported per-file commit messages.
tasty dogfood
part of my focus for the next couple of weeks will be on rolling out some improvements to the mysql bugs system. the first step in doing that was to upgrade from mysql 4.1 to the latest mysql 5.1 beta, which turned out to be entirely painless.
the next step is going to be some database normalization and code refactoring. but because there are some other people who have written ad-hoc tools against the existing schema, i’ll be hiding the schema changes behind some views.
the first big schema change will be moving the categories from a bunch of hard-coded strings in the source code (and a varchar(32) field) to a table organized using the nested set model. that’s something i’ve been wanting to do for years.
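as a rough illustration of why the nested set model is appealing here: each category stores a left and a right bound, and a subtree is simply everything whose bounds fall inside those of its parent. the sketch below is in c rather than sql, and the struct, the function names, and the sample categories in the note after it are all made up for the example. none of it is the actual bugs schema.

```c
#include <assert.h>
#include <stddef.h>

/* One row of a hypothetical nested-set category table. */
struct category
{
  const char *name;
  int lft;   /* left bound of this node's interval */
  int rgt;   /* right bound of this node's interval */
};

/* A node is a descendant if its interval lies strictly inside the other's. */
int is_descendant(const struct category *child,
                  const struct category *parent)
{
  assert(child != NULL && parent != NULL);
  return child->lft > parent->lft && child->rgt < parent->rgt;
}

/* Each descendant uses two numbers inside the interval, so the width
   of the interval alone gives the size of the subtree. */
int descendant_count(const struct category *node)
{
  assert(node != NULL);
  return (node->rgt - node->lft - 1) / 2;
}
```

with hypothetical rows server (1, 8), command-line clients (2, 5), mysqldump (3, 4) and optimizer (6, 7), mysqldump is a descendant of both the clients category and server, and server has three descendants. in sql the same test becomes a range comparison in the where clause, so a whole subtree can be fetched without walking the tree.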
2005 in review: work edition
there are only a few hours left in my work year, so i did a little crunching to see what i accomplished this year. since i started the year with a new position on the maintenance team, there is really one major metric — how many bugs that were assigned to me are now closed. as of this instant, that’s 224. it will go up by another few when some additional fixes are documented. here is the search to see all of the server bugs i have closed. “server bugs” includes bugs in the command-line clients.
that works out very close to one bug per working day. whew!
of course, i’m just one step in the process. sergei golubchik is listed as the reviewer on over 300 of the bugs that were closed this year!
except all the others that have been tried
in an o’reilly network article, matthew b. doar asks, “bug trackers: do they all really suck?”
my answer would be yes, but i love tinkering with them anyway. we’re still using a hacked up version of the bugs.php.net code at bugs.mysql.com, despite periodic threats to move us over to bugzilla. some things that block the migration are that we’ve added various bits of workflow and bitkeeper integration into our bug tracker that someone will have to re-do for bugzilla, and someone will also have to figure out how to integrate it into the login infrastructure (and user database) for our websites.
meanwhile, i hack new features and fields into the existing bugs system whenever the need is strong enough.
funny characters are not ☢
sam ruby pulled a good quote on building in support for internationalization in web applications, which i agree is really important.
it is very annoying that i can’t use my flickr recent comments feed because the atom feed is broken due to bad utf-8 handling.
i’m thinking of doing another talk at the mysql conference next year about handling this sort of thing. there’s really no excuse for it. which makes it a little hard to do a 45-minute talk on — it’s so easy to get right!
damian conway’s “ten essential development practices” article (via daring fireball) may appear on perl.com, but the basics are applicable to any software project.
i would put “use a revision control system” way at the top of the list, and i would also add “use a bug-tracking system.”
play the home game
you can see all of the active bugs currently assigned to me in the mysql server. it’s currently thick with patches pending but not yet approved.
so far this year, i’ve fixed 122 verified bugs. that’s on pace for one bug per working day. whew.
now if only the rate at which bugs were introduced was less than one per working day, we’d be set.
pthread_rwlock_wrlock bug on amd64 with hoary hedgehog
my otherwise-painless upgrade to ubuntu’s hoary hedgehog release was marred by a bug in pthread_rwlock_wrlock() on amd64 that was fixed in the upstream glibc more than a year ago. ugh.
i wonder what the policy of ubuntu is with regard to fixing things like this. i really hope i don’t have to create a patched glibc myself.
on the bright side, the upgrade fixed the xserver configuration, so now it starts up and shows the pretty login screen. i logged in and it looked and sounded pretty.
O_NONBLOCK broken on darwin and irix?
i’ve been dealing with a mysql bug on irix and mac os x that turned up in our test suite once i fixed the kill test to actually do the test properly. after much digging in code and staring at debug traces, i noticed on irix that in the thread that is being killed, it was stuck in a blocking read that the calling code believed would be non-blocking.
by changing our setting of the non-blocking mode to use both O_NDELAY and O_NONBLOCK instead of just O_NONBLOCK, i was able to get the code to work. but i’m not sure why it is necessary.
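for reference, setting both flags with fcntl() looks roughly like the sketch below. set_nonblocking() is a hypothetical helper name, not the function in the mysql source, and the fallback define for platforms that lack O_NDELAY is an assumption added for portability. on some systems the two flags are the same bit, so the extra flag changes nothing there.

```c
#include <assert.h>
#include <fcntl.h>

/* Assumption: if a platform has no O_NDELAY, treat it as a no-op. */
#ifndef O_NDELAY
#define O_NDELAY 0
#endif

/*
  Put a descriptor into non-blocking mode using both O_NONBLOCK and
  O_NDELAY. Hypothetical helper, for illustration only.
  Returns 0 on success, -1 on failure (errno is set by fcntl).
*/
int set_nonblocking(int fd)
{
  int flags;

  assert(fd >= 0);

  /* Fetch the current status flags so the existing ones are kept. */
  flags= fcntl(fd, F_GETFL, 0);
  if (flags == -1)
    return -1;

  flags|= O_NONBLOCK | O_NDELAY;

  return fcntl(fd, F_SETFL, flags) == -1 ? -1 : 0;
}
```

reading the flags back with F_GETFL after the call is a cheap way to confirm that the mode actually stuck on a given platform.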
on the bright side, this may also fix this bug about wait_timeout not working on mac os x.
i may not be doing web development for my day job any more, but i put a little more elbow grease into the mysql bugs database to add two new features that people have asked for at various times: subscribing to updates on bugs, and making private comments. i also cleaned up the database structure a bit. for example, instead of storing email addresses for the assigned developer and reviewer, it actually has a proper link to the user database.
it’s not a particularly pretty code base (although i clean it up as i go), but i’m rather fond of this little bugs system.
e= mc²
obviously my furious pace of work-related blogging tapered off pretty quickly. i’ve still been fighting the good fight against bugs (two swatted today).
something i’ve always been very flexible on is coding style, so i haven’t had much trouble adapting to the coding style for the mysql server, although my brace-placement reflexes need some re-training. one rule it has that i haven’t run across before is how assignment is handled, with the equal sign next to the variable name, like so:
lower_case_table_names= arg;
this makes it easy to use basic search tools (like grep or vim’s / command) to find where assignments to a particular variable happen, without also getting hits at equality tests, which should always have a space between the variable name and ==.
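to make the search trick concrete, here is a small sketch that does what a grep for the variable name followed by "= " would do: it counts assignments in a chunk of source text and skips equality tests. count_assignments() is a made-up helper for this example, and like a plain grep it would also match a longer variable name that merely ends in the one you asked for.

```c
#include <assert.h>
#include <string.h>

/*
  Count occurrences of "<name>= " in text, a crude stand-in for
  grepping for an assignment written in the mysql server style.
  Hypothetical helper, for illustration only.
*/
int count_assignments(const char *text, const char *name)
{
  char pattern[128];
  size_t name_len;
  const char *pos;
  int count= 0;

  assert(text != NULL && name != NULL);
  name_len= strlen(name);
  assert(name_len + 3 <= sizeof(pattern));

  /* Build the search pattern: the name, then "= " and the final NUL. */
  memcpy(pattern, name, name_len);
  memcpy(pattern + name_len, "= ", 3);

  /* Step past each hit so every match is counted exactly once. */
  for (pos= strstr(text, pattern);
       pos != NULL;
       pos= strstr(pos + name_len + 2, pattern))
    count++;

  return count;
}
```

an equality test written as "name == value" never contains "name= ", which is the whole point of the rule.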
compiling mysql 5.0 from the development tree on ubuntu linux
i started setting up my new development machine today. it’s an amd64, and so far i’ve installed ubuntu linux on it. the installation was totally painless. to be able to compile mysql 5.0 from the bitkeeper source tree, i had to install automake1.8, autoconf, libtool, gcc, g++, ccache (not a requirement, but nice), bison, gawk (mawk doesn’t work), libncurses5-dev, and libssl-dev. and those pulled in various dependencies thanks to the magic of apt-get.
one new bug fixed today, and two old bugs re-fixed. (mostly — still waiting for the build and test for one of them.)
i fought the bugs, and the bugs won
no bugs fixed today. i spent almost the entire day chasing a single bug that has turned out to be like one of those loose threads on a sweater that you shouldn’t pull.
on the bright side, my new machine arrived so i have something to play with tomorrow.
(and i’m so close to ordering a mac mini. i think apple is going to do very well with that machine.)
just another manic monday
today was another of those one-bug days (but i also spent a fair amount of time on web-related tasks). that one bug took a long time to solve because it only showed up on qnx, and i had to do battle with our qnx build machine to get it to build at all for me. i eventually tracked it down to a shell issue where libtool would have one of its variables get corrupted with the default shell (ksh). i’m not sure why our regular builds on that platform don’t encounter the same problem.
but dealing with the bug did remind me that i should probably ask for an okay to purchase vmware for my new box (arrives tomorrow!), so i can run multiple operating systems on it without having to reboot all the time. that would let me run platforms like qnx and solaris locally, which would probably be handy.
i spent a lot of time today waiting for compiles and tests to run. my new machine is scheduled to be delivered on tuesday, and i hope it will be (a lot) faster than the machine i’m currently using. i’m also not using ccache yet, which should be a big help. i don’t know why i haven’t just installed ccache on my current development machine. i guess i’m dumb.
six bugs today, which gives a total of nineteen for the week. i don’t mean to over-sell that number, though — lots of the bugs have been very minor, basically cosmetic, problems. i think my largest patch has been about ten lines of new code. but the best patch is the one that replaced about sixteen lines of code with two.