Entries tagged 'code'
Just make it better
This access hatch on a sidewalk on Main St. in downtown Los Angeles used to have chipped concrete around the edges and the doors had a lot of flex to them when you walked over them. A few weeks ago, it was finally fixed up and now it looks clean, the doors don’t have any flex to them, and my near-daily experience of walking on that stretch of sidewalk feels a little bit better and safer.
Today, the website for the PHP Documentation Team was finally moved to a new host. Everything (or nearly so) related to the installation is on the appropriate repositories, it’s being served up over TLS, some of the code has been cleaned up, and the contribution guide has gotten more focused attention than it has had in several years.
None of this is perfect. None of it is done. But making things incrementally better is the kind of good trouble that I want to continue.
Into the blue again after the money’s gone
One reason I finally implemented better thread navigation for the PHP mailing list archives is that it was a bit of unfinished business — I had implemented it for the MySQL mailing lists (RIP), but never brought it back over to the PHP mailing lists. There, it accessed the MySQL database used by the Colobus server directly, but this time I exposed what I needed through NNTP.
An advantage to doing it this way is that anyone can still clone the site and run it against the NNTP server during development without needing any access to the database server. There may be future features that require coming up with ways of exposing more via NNTP, but I suspect a lot of ideas will not.
Another reason to implement thread navigation was that a hobby of mine is poking at the history of the PHP project, and I wanted to make it easier to dive into old threads like this thread from 2013 when Anthony Ferrara, a prominent PHP internals developer, left the list. (The tweet mentioned in the post is gone now, but you can find it and more context from this post on his blog.)
Reading this very long thread about the 2016 RFC to adopt a Code of Conduct (which never came to a vote) was another of those bits of history that I knew was out there but hadn’t been able to read quite so easily.
Which just leads me to tap the sign and point out that there is a de facto Code of Conduct and a group administering it.
I think implementing a search engine for the mailing list archives may be an upcoming project because it is still kind of a hassle to dig threads up. I’m thinking of using Manticore Search. Probably by building it into Colobus and exposing it via another NNTP extension.
Surprise, it’s a new release of Colobus!
Because there’s nothing that quite says “hire me” like polishing your Perl bona fides, I have finally made a new release of Colobus, the NNTP server that runs on top of ezmlm and Mlmmj mail archives. (It was actually three new releases; I had to work out some kinks. More to come as I work out more in testing.)
The Mlmmj support and some of the other tweaks are really just pulling in and polishing changes that had been made to the install used for the PHP.net mailing lists. There are also a few bug fixes I pulled in from Ask’s fork of colobus that the perl.org project uses.
I did add a significant new feature, which is a non-standard XTHREAD id_or_msgid command that returns an XOVER-style result for all of the messages in the same “thread”. The code to take advantage of this new feature for the PHP mailing list archives is on the way.
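To give a sense of what that looks like from a client, here is a rough Python sketch of poking at the extension over a raw socket. The host, group, and message-id are placeholders, and I am only assuming the usual NNTP conventions that success responses start with a 2 and that multi-line responses end with a lone dot; it is an illustration of the idea, not the code the archive site uses.

# Hypothetical client for the non-standard XTHREAD extension.
import socket

HOST, PORT = "news.example.org", 119     # placeholder server
GROUP = "php.internals"                  # placeholder group
MSGID = "<some-message-id@example.org>"  # placeholder message-id

def read_line(f):
    return f.readline().decode("utf-8", "replace").rstrip("\r\n")

with socket.create_connection((HOST, PORT)) as sock:
    f = sock.makefile("rb")
    print(read_line(f))                        # server greeting
    sock.sendall(f"GROUP {GROUP}\r\n".encode())
    print(read_line(f))                        # e.g. 211 count low high group
    sock.sendall(f"XTHREAD {MSGID}\r\n".encode())
    status = read_line(f)
    print(status)
    if status.startswith("2"):                 # success is followed by a multi-line response
        while True:
            line = read_line(f)
            if line == ".":                    # lone dot terminates the response
                break
            print(line)                        # one XOVER-style line per message in the thread
    sock.sendall(b"QUIT\r\n")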
Writing documentation in anger
As I continue to slog through my job search, I also continue to contribute in various ways to the PHP project. Taking some inspiration from the notion of “good trouble” so wonderfully modeled by John Lewis, I have been pushing against some of the boundaries to move and expand the project.
In a recent email to the Internals mailing list from Larry Garfield, he said:
And that's before we even run into the long-standing Internals aversion to even recognizing the existence of 3rd party tools for fear of "endorsing" anything. (With the inexplicable exception of Docuwiki.)
I can guess about a lot of the history there, but I think it is time to recognize that the state of the PHP ecosystem in 2024 has come a long way since the more rough-and-tumble days when related projects like Composer were experimental.
So I took the small step of submitting a couple of pull requests to add a chapter about Composer and an example of using its autoloader to the documentation.
The PHP documentation should be more inclusive, and I think the best way to make that happen is for me and others to just start making the contributions. We need to shake off the notion that this is somehow unusual, stop choosing to say nothing about third-party tools for fear of “favoring” one over the other, and help support the whole PHP ecosystem through its primary documentation.
I would love to add a chapter on static analysis tools. And another one about linting and refactoring tools. Maybe a chapter on frameworks.
None of these have to be long or exhaustive. They only need to introduce the main concepts and give the reader a better sense of what is possible and the grounding to do more research on their own.
A big benefit of putting this sort of information in the documentation is that there are teams of people working on translating the documentation to other languages.
And yes, contributing to the PHP documentation can be kind of tedious because the tooling is pretty baroque. I am happy to help hammer any text that someone writes into the right shape to make it into the documentation, just send me what you have. If you want to do more of the heavy lifting, join the PHP documentation team email list and let’s make more good trouble together.
The rules can matter
I have written a variation of this in a couple of spots now and wanted to put it here, this weird place where I keep writing things:
Organizations with vague rules get captured by people who just fill in the gaps with rules they make up on their own to their own advantage, and then they will continuously find reasons that the “official” rules can’t be fixed because the proposed change is somehow imperfect, forcing you to accept the rules they have made up.
The first time I ran into this sort of problem was probably in college while involved in student government, but I haven’t been able to recall the specifics of what happened that make me think so.
A later instance where I came across it that I remember more vividly was when I was involved with the Downtown Los Angeles Neighborhood Council and how one of the executive board members would squash efforts by citing “standing rules” that nobody could ever substantiate.
More recently, I came across it in looking at discussions of why certain people are or are not allowed to vote on PHP RFCs where the lead developers of one or more popular PHP packages have been shut out because of what I would argue is a misreading of the “Who can vote” section of the Voting Process RFC.
Is this Twig or Jinja? Maybe both!
A project I have been playing around with the last couple of weekends has been making a Python version of this site. The code, which is very rough because I barely know what I’m doing and I’m in the hacking-it-together phase as opposed to trying to make it pretty, is in this GitHub repository.
I am using the Flask framework with SQLAlchemy and Jinja.
I was interested to see if I could just use the same templates as my PHP version, which uses Twig, but there have been a few sticking points:
- The Twig escape filter takes an argument to more finely control the context it is being used in, so it knows how to escape within HTML, a URI, or an HTML attribute. Jinja’s escape doesn’t take an argument. I was able to override it to take an extra argument, but mostly ignore it for now (a sketch of that shim follows this list).
- Jinja doesn’t have Twig’s ternary ?: operator. Not surprising, Python doesn’t either. I rewrote those bits of templates to use slightly more verbose if blocks.
- Jinja doesn’t have Twig’s string comparators like matches and starts with. It looks like I can get rid of the need for them, but I just punted on those for now.
- Jinja doesn’t have a block() function. I think I can also avoid needing it.
- Jinja’s url_for() method expects a more Python-ic argument list, like url_for('route', var = 'value'), but Twig uses a dictionary, like url_for('route', { 'var' : 'value' }). I was able to override Jinja’s version to handle this, too.
- I’ll need to implement versions of Twig’s date() function and filter.
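For what it’s worth, the escape shim is only a few lines. This is a simplified sketch of the approach rather than my exact code; it assumes a Flask app and leans on markupsafe, and every Twig escaping strategy just falls through to plain HTML escaping for now.

from flask import Flask
from markupsafe import escape as html_escape

app = Flask(__name__)

def twigish_escape(value, strategy="html"):
    # Twig distinguishes strategies like 'html', 'html_attr', and 'url'; for
    # now the argument is accepted so Twig-style templates parse, but every
    # strategy falls back to plain HTML escaping.
    return html_escape(value)

# Jinja registers the filter under both names.
app.jinja_env.filters["escape"] = twigish_escape
app.jinja_env.filters["e"] = twigish_escape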
I had cobbled together a way on the Twig side to let me store some templates (side navigation, the “Hire me!” message on the front page) in the database, so my next trick is going to be implementing template loaders for both the PHP and Python versions so that is more cleanly abstracted. I have the Python side of that done already.
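For the Python side, the sketch below shows the shape of it: a Jinja loader that looks templates up in a database table and falls back to the filesystem for everything else. I am using sqlite3 and made-up table and column names here just to keep the example self-contained; the real version goes through the same database as the rest of the site.

from jinja2 import BaseLoader, ChoiceLoader, Environment, FileSystemLoader, TemplateNotFound
import sqlite3

class DatabaseLoader(BaseLoader):
    def __init__(self, db_path):
        self.db_path = db_path

    def _fetch(self, name):
        con = sqlite3.connect(self.db_path)
        try:
            return con.execute(
                "SELECT content, updated_at FROM template WHERE name = ?", (name,)
            ).fetchone()
        finally:
            con.close()

    def get_source(self, environment, template):
        row = self._fetch(template)
        if row is None:
            raise TemplateNotFound(template)
        source, version = row
        # The third element lets Jinja check whether its cached compile is stale.
        uptodate = lambda: (self._fetch(template) or (None, None))[1] == version
        return source, None, uptodate

# Database-backed templates win, everything else comes from the template directory.
env = Environment(
    loader=ChoiceLoader([DatabaseLoader("site.db"), FileSystemLoader("templates")])
)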
I hope to eventually create a Rust version of this, too, and it will be interesting to see what new complications using Tera will bring.
Awesomplete is... awesome
An earlier iteration of my blog software had an autocomplete widget for adding tags when I was writing a post, but somewhere along the line I lost track of that version of the software, and I’ve just been winging it when adding tags to posts since then. Yesterday I ran across a reference to Awesomplete, which is a lightweight autocomplete widget by Lea Verou, and now I’ve plugged it into the couple of places where I add tags.
I need to do some more behind-the-scenes work to clean up the tags in the system now, but this should help keep me on track in terms of knowing when I’m actually creating a new tag instead of using an existing one. (Like I could never remember if I have been using the “book” or “books” tag and would always have to look it up.)
Taste the rainbow
When I added the support for dark mode, I had to add a little support to the syntax highlighting styles so they looked okay in both light and dark modes.
Today I went in and worked on the JavaScript code a bit to make it a little more modern and refined the styles a little bit more. I also added the language file for YAML, and built it out so it does a better job of highlighting some of YAML’s more special syntax.
I am sticking with this JavaScript-based syntax highlighting for now, mostly because it works.
Here is a YAML sample pulled from the 1.2.2 spec plus a couple of minor additions to show off additional syntax that is handled.
%YAML 1.2
--- !<tag:clarkevans.com,2002:invoice>
invoice: 34843
date : 2001-01-23
bill-to: &id001
    given : Chris
    family : Dumars
    address:
        lines: |
            458 Walkman Dr.
            Suite #292
        city : Royal Oak
        state : MI
        postal : 48046
ship-to: *id001
product:
    - sku : BL394D
      quantity : 4
      description : Basketball
      price : 450.00
    - sku : BL4438H
      quantity : 1
      description : Super Hoop
      price : 2392.00
tax : 251.42
total: 4443.52
shipped:
    - false
comments:
    Late afternoon is best.
    Backup contact is Nancy
    Billsmer @ 338-4338.
Acomplish accomplished
I was perusing some old messages in the archives of the PHP mailing lists, and I stumbled across an email from me where I shared a sample of code as an example of a “natural” language I had used. That’s some real Acomplish code!
[ Declaration of a function to add an item to a list, keeping the list sorted. ]
To insert an item 'the_item' sorted into an item 'the_container':
For each item 'the_place' in the_container,
If the_item < the_place,
Add the_item before the_place.
Return.
Append the_item to the_container.
Make a list named myList of { 2, 3, 7, 14}.
Insert 10 sorted into myList.
Print myList.
[ Prints 2, 3, 7, 10, 14, more or less. ]
And another historical bit of trivia in that email: I proposed a for ... in syntax for PHP, although that eventually became foreach ... as.
Looking for photos
I realized the other day that I hadn’t actually wired up the search box in my photo library.
I don’t like how it is a separate search from the rest of search on this site, but that’s a hill to climb another day.
Dr. Brain Thinking Games: IQ Adventure
Dr. Brain Thinking Games: IQ Adventure was one of the first two games produced by Knowledge Adventure based on the earlier Dr. Brain games from Sierra Entertainment. KA and Sierra had wound up under the same corporate umbrella, and the “games group” at Knowledge Adventure that I was part of developed them. Our group handled three projects at the time: IQ Adventure, codename “Dime,” Dr. Brain Thinking Games: Puzzle Madness, codename “Nickel,” and the corporate website, codename “Penny.”
IQ Adventure is a third-person isometric puzzle/adventure game which was written in C++, and we used the networking and graphics library from Blizzard Entertainment (another corporate sibling). I was the lead programmer. We did some strangely ambitious things, one of which is that the game levels weren’t just laid out by hand, but we had a map specification language that was used to generate variations of the levels. Here is an example map that I was able to extract from the files on the CD. I couldn’t tell you how it works, really.
During our early prototyping, we did have a way of building environments by hand to test out artwork and the interface. It was just a mode in the game engine that let you “draw” with terrain tiles or place others into the environment. I remember before the team doing the artwork had created our main character, I prototyped with just a little whirling tornado that moved around so I could work on things like the path-finding algorithm.
The whole game was very data-driven. There was a text dialog system that let you interact with the NPC characters that was HTML-inspired. The animations of Dr. Brain giving you instructions were lip-synced using a tool that Knowledge Adventure had developed for their whole line of titles, which meant it was just audio files and frame timings that drove an eight-frame animation set. All of the puzzles and in-game quizzes were rule-based so they would be different on every playthrough. (Sorry to our QA team!)
Here’s a video I found of someone playing through one of the levels (or more, I didn’t watch the whole thing).
The multiplayer was pretty simple but I also don’t remember much of the specifics. You could chat with other players, and because this was aimed at younger users there was a basic attempt at filtering out bad words, and I believe all of the chat was logged and someone from the customer service team was assigned to review it regularly, or maybe only when someone complained.
I wish that I still had the source code for the game and even the original media asset sources. In the released game, they were all rendered down to a 256 color palette because that was how things were at the time. I think it would be fairly straightforward to bring the game up on current platforms. You could probably even do it on WebAssembly or something else cross-platform. Unfortunately all of the filenames get lost when extracting the assets from the CD, so even just sorting them out to build something else with them would be pretty tedious. (Then again, the original game may still just work on a more current version of Windows that can run 32-bit apps, since I don’t think there was anything particularly fancy about it.)
I believe that Puzzle Madness was developed in Acomplish, the in-house proprietary multimedia scripting language that I blogged about earlier.
The third game in the series (from KA), Dr. Brain Thinking Games: Action Reaction, was a first-person puzzle/shooter, and it was partly funded by Intel in their effort to drive adoption of the Pentium processor. It was developed using the Unreal Engine. The bad guys in that game worked for S.P.O.R.E.: Sinister People Organized Really Efficiently, which still makes me laugh. (I am pretty sure the codename for this one was “Quarter” but I didn’t work on it and left the company while it was being developed.)
Kill It with Fire
Kill It with Fire: Manage Aging Computer Systems (and Future Proof Modern Ones) by Marianne Bellotti was recommended to me by a college classmate on a post I made on LinkedIn a while back. (Part of a series of observations about how terrible ZipRecruiter is.)
This book is great, and I would highly recommend it to any software engineer. It’s not only about modernizing software applications, but has a lot of insight into how being a long-lived project is a reasonably likely outcome for any project, and you can save someone in the future from a lot of trouble by making some better decisions up front.
It also leans into the human factors and business realities of how software is developed, something I feel like I have been complaining about here and elsewhere.
Maintenance engineer, slightly used
A popular response to the attempted backdooring of the XZ Utils has been people like Tim Bray talking about the maintenance of open source projects and how to pay for them.
When I transitioned from leading the web development team at MySQL to an engineering position in the server team, I spent the first year as a maintenance engineer. I blogged a little about the results of that one year and calculated that I had fixed approximately one reported bug per working day.
But you’ll also notice that I had to heap some praise on Sergei Golubchik who reviewed fixes for even more bugs than I had fixed. (He also was responsible for working on new features. He is extremely talented, and I’m not surprised to see he’s the chief architect at MariaDB.)
That sort of reviewing and pulling in patches is a critical component of maintaining an open source project, and a big problem is that it is not all that fun. Writing code? Fun. Fixing bugs? Often fun. Reviewing changes, merging them in, and making releases? A lot less fun. (Building tools to do that? More fun, and it can sidetrack people from doing the less-fun part.)
It is also a lot different for projects with a lot of developers, a small crowd of developers, and just a few developers. The process that a patch goes through to make it into the Linux kernel doesn’t necessarily scale down to a project with just a few part-time developers, and vice versa. A long time ago, I made some noise about how MySQL might want to adopt something that looked more like the Linux kernel system of pulling up changes rather than what was the existing system of many developers pushing into the main tree, and nobody seemed very interested.
Anyway, as people think about creating ways of paying people to maintain open source software, I think it is very important to make sure they don’t inadvertently create a system that bullies existing open source project maintainers into focusing on the less-fun aspects of developing software, because that’s kind of how we got into this latest mess.
You already see that happening with supposed-to-be-helpful supply chain tools demanding that projects jump through hoops to be certified, or packaging tools trying to push their build configuration into projects (with an extra layer of crypto nonsense), or a $3 trillion company demanding a “high priority” bug fix from volunteers.
I am curious to see where these discussions lead, because there is certainly not one easy solution that is going to work everywhere. It will also be interesting to see how quickly they lose steam as we get some distance from the XZ Utils backdoor experience.
(Also, I’m still looking for work, and I’m willing to do the less-fun stuff if the pay is right.)
But I still haven’t found what I’m looking for
I’m still looking for a job.
It is a new month, so I thought it was a good time to raise this flag again, despite it being a bad day to try and be honest and earnest on the internet.
I wish I were organized enough to be able to run down statistics on how many jobs I have applied to and how many interviews I have gone through; all I can really say is that it has been a lot and very few.
Last month I decided to start (re)developing my Python skills because that seems to be much more in demand than the PHP skills I can more obviously lay claim to. I made some contributions to an open source project, ArchiveBox: improving the importing tools, writing tests, and updating it to the latest LTS version of Django from the very old version it was stuck on. I also started putting together a Python library/tool to create a single-file version of an HTML file by pulling in required external resources and in-lining them; my way of learning more about the Python culture and ecosystem.
That and attending SCALE 21x really did help me realize how much I want to be back in the open source development space. I am certainly not dogmatic about it, but I believe to my bones that operating in a community is the best way to develop software.
I think my focus this month has to be on preparing for the “technical interview” exercises that are such a big part of the tech hiring process these days, as much as I hate it. I think what makes me a valuable senior engineer is not that I can whip up code on demand for data structures and algorithms, but that I know how to put systems together, have broader business experience that gives me a deeper understanding of what matters, and can communicate well. But these tests seem to be an accepted and expected component of the interview process now, so it only makes sense to polish those skills.
(Every day this drags on, I regret my detour into opening a small business more. That debt is going to be a drag on the rest of my life, compounded by the huge weird hole it puts in my résumé.)
Is GitHub becoming SourceForget v2.0?
Back in the day, open source packages used SourceForge for distribution, issue tracking, and other bits of managing the community around projects, but it eventually became a wasteland of neglected and abandoned projects and was referred to as SourceForget.
As I have been poking around at adding Markdown parsing and syntax highlighting to my PHP project, I can’t help but feel like GitHub is taking on some of those qualities.
Parsedown is (was?) a popular PHP package for parsing Markdown, but the main branch hasn’t seen any development in at least five years, and the “2.0” branch appears to have stalled out a couple of years ago. Good luck figuring out if any of the 1,100 forks is where active development has moved.
I think it would be good if more community norms and best practices were developed around the idea of the community of a project being able to take over maintenance when the developer steps away. What’s the solution to the thousands of open issues on GitHub that ask if a project is abandoned?
Here is an issue I found on one project where the developer is trying to hand over more access to community members, and I wonder if a guide to taking your project through that transition would have been valuable to move it along.
Another way this comes up that is very relevant is the assertion put forth in “Redis Renamed to Redict”, which really asks the question of what moral rights the community has to a project.
(SourceForge also came to be loaded down with advertising and I remember it being kind of a miserable website to use, and as GitHub loads up with “AI” features and feels increasingly clunky to use, it’s just another way I wonder if we are seeing history repeat itself.)
Writing software is fun
Writing software is fun. (For me. Your mileage may vary. But I am not alone in feeling this way.)
This means it is a particularly fraught field for exploitation.
A comparison I would make is to making music. Practically every musical biopic (or fictional version) features the part of the story where the artist (Ray, The One-ders, Elvis, The Dreams, Queen, The Pussycats, etc.) who is creating and/or performing music for their love of creating and performing comes under the influence of someone who sees the potential for money to be made. They have more experience in the business related to the craft, and they use that information asymmetry to exploit the artist.
The business of music has been around quite a bit longer than the business of writing software, and it is still messy and there are constant struggles and upheavals over the rights of artists, how to distribute the money when it gets made, and what sort of gatekeeping goes on within the business.
Seven years ago I pointed out that the games industry was having the same discussions about “crunch time” as 20 years before that. It’s always been a segment of the industry fed on the enthusiasm of people who think writing games is fun.
All of this to say, that as we enter another cycle of software licensing shenanigans in the open source world, I am interested, invested, and extremely tired.
Sometimes I just want to bang on the drums keyboard all day, share that with others, and forget that it is part of this complex ecosystem of people who are coming at it from different angles.
Time to modernize PHP’s syntax highlighting?
This blog post about “A syntax highlighter that doesn't suck” was timely because recently I had been kicking at the code for the syntax highlighter that I use on this blog. It’s a very old JavaScript package called SHJS based on GNU Source-highlight.
I created a Git repository where I imported all of the released versions of SHJS and then tried to update the included language files to the ones from the latest GNU Source-highlight release (which was four years ago), but ran into some trouble. There are some new features to the syntax files that the old Perl code in the SHJS package can’t handle. And as you might imagine, the pile of code involved is really, really old.
That new PHP package seems like a great idea and all, but I really like the idea of leveraging work that other people have done to create syntax highlighting for other languages rather than inventing another one.
On Mastodon, Ben Ramsey brought up a start he had made at trying to port Pygments, a Python syntax highlighter, to PHP.
I ran across Chroma, which is a Go package that is built on top of the Pygments language definitions. They’ve converted the Pygments language definitions into an XML format. The conversion doesn’t completely handle 100% of the languages, but it covers most of them.
At the end of the day, GNU Source-highlight, Pygments, and their variants are built on what are likely to remain imprecise parsers, because they are mostly regex-based and not the same lexing and parsing code actually being used to handle these languages.
PHP has long had its own built-in syntax highlighting functions (highlight_string() and highlight_file()) but it looks like the generation code hasn’t been updated in a meaningful way in about 25 years. It just has five colors that can be configured that it uses for <span style="color: #...;"> tags. There are many tokens that it simply outputs using the same color where it could make more distinctions. If it were to instead (or also) use CSS classes to mark every token with the exact type, you could do much finer-grained syntax highlighting.
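As a point of comparison, this is roughly what class-based output looks like from Pygments, the Python highlighter mentioned above: every token gets a short CSS class and the colors live entirely in a separate stylesheet, which is the kind of output I would like to see from PHP’s built-in highlighter. The output shown in the comments is approximate.

from pygments import highlight
from pygments.lexers import PhpLexer
from pygments.formatters import HtmlFormatter

code = "<?php echo strtoupper('hello'); ?>"
formatter = HtmlFormatter(nowrap=True)

# Each token is wrapped in a span with a short class name,
# e.g. <span class="cp">&lt;?php</span> <span class="k">echo</span> ...
print(highlight(code, PhpLexer(), formatter))

# The matching CSS is generated separately and can be themed for light and dark modes.
print(HtmlFormatter().get_style_defs(".highlight"))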
Looks like an area ready for some experimentation.
Thoughts from SCALE 21x, day 4
Today was the last day of SCALE 21x. Again I didn’t make it out for the opening keynote, and I just took a quick spin around the expo floor to see it looking sort of quiet and winding down.
The first talk I attended was Jonathan Haddad on “Distributed System Performance Troubleshooting Like You’ve Been Doing it for Twenty Years” where he shared some of his insights from doing what the title said for companies like Apple and Netflix. His recommendation for greenfield deployments was to have OpenTelemetry set up to collect traces and logs, and he was also a big fan of the BPF Compiler Collection (aka bcc-tools) for getting a realtime look into system issues. He was not a fan of running databases in containers, and even less of a fan of running them within Kubernetes. (You could almost see his eye twitch.)
The last talk that I attended (there were just two slots today) was Jen Diamond on “The Git-tastic Power of Conventional Commits.” It was a good talk that used a little light lexical analysis to explain the basic concepts of working with Git (and the revelation that it stands for “global information tracker”, although now a little more research shows that’s only sort-of true). This all led into talking about Conventional Commits, which is a way of structuring commit messages, and how you could use that in automations and in driving semantic versioning in the release process.
The final session was a closing keynote from Bill Cheswick titled “I Love Living in the Future: Half a Century of Computers, Software, and Security” but really could have just been “give the old guy the microphone and let him go!” I left a little over two hours ago, and I wouldn’t be surprised to hear that he’s still going. I hope they let him take a bathroom break.
Thoughts from SCALE 21x, day 3
Another day, another set of thoughts on the experience. It was a busy day at the 21st edition of the Southern California Linux Expo, and the site was more crowded because an episode of America’s Got Talent was being filmed at the Civic Auditorium that is between the two buildings that the conference was held in. If I’d been on the ball, I would have taken a picture of Howie Mandel standing outside his limo.
I will admit that I took my time in the morning and didn’t make it over to Pasadena until after the keynote that kicked off the day.
The first talk that I attended was “Contribution is not only a code.” by Tatiana Krupenya, the CEO of DBeaver. She did a great job of breaking down the many ways that people can contribute to open source development aside from writing code, and I appreciated that her final point was that the simplest contribution anyone can make, and one that will always be well-received, is just a heart-felt thank you to the maintainers of tools that you find valuable.
She also brought up what I am sure is a great talk by Zak Greant from Eclipsecon 2019 titled “When Your Happy Dreams Are About Dying” about burnout in the open source developer community, which I’m looking forward to catching up on.
After that, it was off to Brian Proffitt’s “Measuring the Impact of Community Events” where he provided his perspective from his roles at the Red Hat OSPO, Apache Software Foundation, and other places. It was a great companion to the first session, but more from the perspective of why companies and projects may want to think about measuring how they engage with the community.
I took another spin through the expo during what was supposed to be the lunch break, picked up my conference T-shirt and a free bucket hat from AWS.
After lunch, Tyler Menezes from CodeDay spoke about “Nurturing the Next Generation of Open Source Contributors” and how the non-profit he founded works to connect high school and college students from underprivileged backgrounds with resources to help them thrive in tech. One of the programs pairs small teams of students with a mentor to help them make a contribution to an open source project, and it sounds amazing. I plan to find a way to get involved once I have my employment situation sorted out.
The next talk was Heather Osborn on “Organic isn't always good for you,” which was sort of a case study of her experience as a DevOps leader tackling the complicated environment that had taken root at the startup she was working at, and how they figured out a strategy to straighten that out. It was really interesting to hear the language she used about convincing the company management to buy into the plan, which seemed more adversarial and dismissive than the working environments that I’ve been in.
“Solving ‘secret zero’, why you should care about SPIFFE!” by Mattias Gees was by far the most technical talk that I attended today. Like the presentation on Presto yesterday, it seemed a bit like the sort of system that is very impressive and I will probably never need.
The last talk I attended was Michael Gat on “Anti-Patterns in Tech Cost Management” which was pretty true to the title. It was a little light on the open source aspect, but there were definitely insights there on the importance of laying the groundwork early for being able to do cost analytics on systems you’ll be scaling. There were three or so questions from people that started with “I’m an engineer, and ...” which I thought was great. I think what bothered me about Heather Osborn’s talk was how it implied a certain distaste for connecting the engineering to the business realities, and I think it is very important for engineers to understand, and have respect for, business decision-making.
One more day to go. I am surprised how heavy the program is on cloud computing and DevOps, but I guess that’s a huge chunk of what people are working on these days. What I have been missing so far is programming-focused talks.
Thoughts from SCALE 21x, day 2
The second day of the Southern California Linux Expo meant the start of the expo, and more talks.
I started the day with “Best Practices for Running Databases on Kubernetes” with Peter Zaitsev, who was a coworker at MySQL and went on to found Percona. While I am getting a better sense of what Kubernetes is all about and already had some idea of how databases might exist in that world, his talk was a great overview and the “best practices” seemed to cover a lot of bases.
That was followed by “Kubernetes and Distributed SQL Databases: Same Consistency With Better Availability and Scalability” which showed off using Kine as a way to plug in different systems as the data store for Kubernetes instead of etcd. I wish the speaker had spent a little more time giving some practical examples of why this is something you would even want to do. It was a good reminder that k3s exists and I should play around with it. And the speaker just using an outline in an open text editor (Pico!) as his slides reminded me of when I gave a talk on MySQL and PHP using plain-text slides. (Looks like my talk has been disappeared, though.)
After that, it was back over to the other side of the expo for a talk on “Leveraging PrestoDB for data success” which was an overview of the Presto project, which provides an ANSI SQL query interface to a collection of other data sources (my paraphrase). Kiersten Stokes, the presenter who works at IBM, called MySQL a “traditional database” which struck me as funny. Presto is a very slick and powerful system that I will probably never need. I appreciate that everyone I have seen talk about the concept of a “data lakehouse” is appropriately embarrassed about the name.
Before the next round of talks started, the expo floor finally opened, so I took a quick spin through that. It was pretty busy, and seemed like a good crowd of projects and companies. I think the largest footprint was maybe a couple of 10' × 40' booths from companies like AWS and Meta, but otherwise it was a lot of 10' × 10' booths with a couple of people handing out stickers or other promotional items from behind a table (and talking about their projects/companies).
After that I went back to the MySQL track (four talks!) to see “Design and Modeling for MySQL” which was really more of a speed-run of database history and concepts. The presenter made the classic mistake of white text on a dark background so it was pretty tough to see what he was showing until someone dimmed the lights.
That was followed by “Beyond MySQL: Advancing into the New Era of Distributed SQL with TiDB” from Sunny Bains, whose time on the MySQL/InnoDB team overlapped with my time working at MySQL, but I don’t think we ever met. TiDB seems like a very impressive cloud-native distributed database which doesn’t actually derive from MySQL, but instead has chosen to be protocol and query-language compatible.
The last session I attended was a panel from the Open Government track on “The OSPO POV.” OSPO stands for “Open Source Program Office” and can act as kind of the interface between companies or organizations and the open source world. There were a bunch of projects and communities mentioned that I want to look into further: TODO Group, Fintech Open Source Foundation, CHAOSS (Community Health Analytics in Open Source Software), Sustain, The Open Source Way, Inner Source Commons, and OSPO++.
Things got busier today, which was nice to see. I wasn’t in a great headspace most of the day, which pretty much sucked, but I think I came away with a lot of things to dig into on my own, which is one of the reasons I wanted to attend.
How I use Docker and Deployer together
I thought I’d write about this because I’m using Deployer in a way that doesn’t really seem to be supported.
After the work I’ve been doing with Python lately, I can see how the way I have been using Docker with PHP is sort of comparable to how venv is used there.
On my production host, my docker-compose setup all lives in a directory called tmky. There are four containers: caddy, talapoin (PHP-FPM), db (the database server), and search (the search engine, currently Meilisearch).
There is no installation of PHP aside from that talapoin container. There is no MySQL client software on the server outside of the db container.
I guess the usual way of deploying in this situation would be to rebuild the PHP-FPM container, but what I do is just treat that container as a runtime environment and the PHP code that it runs is mounted from a directory on the server outside the container.
It’s in ${HOME}/tmky/deploy/talapoin (which I’ll call ${DEPLOY_PATH} from now on). ${DEPLOY_PATH}/current is a symlink to something like ${DEPLOY_PATH}/release/5.
The important bits from the docker-compose.yml look like:
services:
  talapoin:
    image: jimwins/talapoin
    volumes:
      - ./deploy/talapoin:${DEPLOY_PATH}
This means that within the container, the files still live within a path that looks like ${HOME}/tmky/deploy/talapoin. (It’s running under a different UID/GID so it can’t even write into any directories there.) The caddy container has the same volume setup, so the relevant Caddyfile config looks like:
trainedmonkey.com {
    log
    # compress stuff
    encode zstd gzip
    # our root is a couple of levels down
    root * {$DEPLOY_PATH}/current/site
    # pass everything else to php
    php_fastcgi talapoin:9000 {
        resolve_root_symlink
    }
    file_server
}
(I like how compact this is, Caddy has a very it-just-works spirit to it that I dig.)
So when a request hits Caddy, it sees a URL like /2024/03/09, figures out there is no static file for it and throws it over to the talapoin container to handle, giving it a SCRIPT_FILENAME of ${DEPLOY_PATH} and a REQUEST_URI of /2024/03/09.
When I do a new deployment, ${DEPLOY_PATH}/current will get relinked to the new release directory, the resolve_root_symlink from the Caddyfile will pick up the change, and new requests will seamlessly roll right over to the new deployment. (Requests already being processed will complete unmolested, which I guess is kind of my rationale for avoiding deployment via updated Docker container.)
Here is what my deploy.php file looks like:
<?php
namespace Deployer;
require 'recipe/composer.php';
require 'contrib/phinx.php';
// Project name
set('application', 'talapoin');
// Project repository
set('repository', 'https://github.com/jimwins/talapoin.git');
// Host(s)
import('hosts.yml');
// Copy previous vendor directory
set('copy_dirs', [ 'vendor' ]);
before('deploy:vendors', 'deploy:copy_dirs');
// Tasks
after('deploy:cleanup', 'phinx:migrate');
// If deploy fails automatically unlock.
after('deploy:failed', 'deploy:unlock');
Pretty normal for a PHP application; the only real additions here are using Phinx for the data migrations and using deploy:copy_dirs to copy the vendor directory from the previous release so we are less likely to have to download stuff.
That hosts.yml is where it gets tricky, because when we are running PHP tools like composer and phinx, we have to run them inside the talapoin container.
hosts:
  hanuman:
    bin/php: docker-compose -f "${HOME}/tmky/docker-compose.yml" exec --user="${UID}" -T --workdir="${PWD}" talapoin
    bin/composer: docker-compose -f "${HOME}/tmky/docker-compose.yml" exec --user="${UID}" -T --workdir="${PWD}" talapoin composer
    bin/phinx: docker-compose -f "${HOME}/tmky/docker-compose.yml" exec --user="${UID}" -T --workdir="${PWD}" talapoin ./vendor/bin/phinx
    deploy_path: ${HOME}/tmky/deploy/{{application}}
    phinx:
      configuration: ./phinx.yml
Now when it’s not being pushed to an OCI host that likes to fall flat on its face, I can just run dep deploy and out goes the code.
I’m also running Deployer in a Docker container on my development machine, thanks to my fork of docker-deployer. Here’s my dep script:
#!/bin/sh
exec \
  docker run --rm -it \
    --volume $(pwd):/project \
    --volume ${SSH_AUTH_SOCK}:/ssh_agent \
    --user $(id -u):$(id -g) \
    --volume /etc/passwd:/etc/passwd:ro \
    --volume /etc/group:/etc/group:ro \
    --volume ${HOME}:${HOME} \
    -e SSH_AUTH_SOCK=/ssh_agent \
    jimwins/docker-deployer "$@"
Anyway, I’m sure there are different and maybe better ways I could be doing this. I wanted to write this down because I had to fight with some of these tools a lot to figure out how to make them work how I envisioned, and just going through the process of writing this has led me to refine it a little more. It’s one of those classic cases of putting in a lot of hours to end up with relatively few lines of code.
I’m also just deploying to a single host; deployment to a real cluster of machines would require more thought and tinkering.
Release early, release often
One of the benefits of starting Frozen Soup from a project template is that someone very smart (Simon) has done all the heavy lifting to make publishing it into the Python ecosystem really easy to do. So after I added a new feature today (pulling in external url(...) references in CSS inline as data: URLs), I went ahead and registered the project on PyPI, tagged the release on GitHub, and let the GitHub Actions that were part of the project template do the work of publishing the release. It worked on the first try, which is lovely.
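The core of that new feature is a small transformation. Here is a stripped-down sketch of the general idea (not the code as it exists in Frozen Soup): find external url(...) references in a stylesheet and replace them with base64 data: URLs. Real CSS needs more care than this regex gives it, so treat it as a toy.

import base64
import re
from urllib.parse import urljoin

import requests

# Matches url(...) references that are not already data: URLs; deliberately naive.
URL_RE = re.compile(r"url\(\s*['\"]?(?!data:)([^'\")]+)['\"]?\s*\)")

def inline_css_urls(css_text: str, base_url: str, timeout: float = 30.0) -> str:
    def replace(match):
        resource_url = urljoin(base_url, match.group(1))
        resp = requests.get(resource_url, timeout=timeout)
        content_type = resp.headers.get("Content-Type", "application/octet-stream")
        encoded = base64.b64encode(resp.content).decode("ascii")
        return f"url(data:{content_type};base64,{encoded})"
    return URL_RE.sub(replace, css_text)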
I pushed more changes after I did that release, adding a way to set timeouts and fixing the first issue (that I also filed) about pre-existing data: URLs getting mangled. I also added a quick-and-dirty server version which allows for getting the single-file HTML version of a page, and makes it a little easier to play around with the single-file version of live URLs without having to deal with saving and opening the files.
So I did a second release.
Introducing Frozen Soup
I made a new thing, which I decided to call Frozen Soup. It creates a single-file version of an HTML page by in-lining all of the images using data: URLs, and pulling in any CSS and JavaScript files.
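The core trick is simple enough to sketch in a few lines. This is not Frozen Soup’s actual code, just a bare-bones illustration using requests and BeautifulSoup: fetch each image, base64-encode it, and swap the src attribute for a data: URL.

import base64
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

def inline_images(page_url: str) -> str:
    html = requests.get(page_url, timeout=30).text
    soup = BeautifulSoup(html, "html.parser")
    for img in soup.find_all("img", src=True):
        # Resolve relative references against the page, then embed the bytes.
        resp = requests.get(urljoin(page_url, img["src"]), timeout=30)
        content_type = resp.headers.get("Content-Type", "application/octet-stream")
        encoded = base64.b64encode(resp.content).decode("ascii")
        img["src"] = f"data:{content_type};base64,{encoded}"
    return str(soup)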
It is loosely inspired by SingleFile, which is a browser extension that does a similar thing. There are also tools built on top of that which let you automate it, but then you’re spinning up a headless browser, and it all felt very heavyweight. The venerable wget will also pull down a page and its prerequisites and rewrite the URLs to be relative, but I don’t think it has a comparable single-file output.
This may also exist in other incarnations; this is mostly an excuse for me to practice with Python. As such, it is a very crude first draft right now, but I hope to keep tinkering with it for at least a little while longer.
I have also been contributing some changes and test cases to ArchiveBox, but this is different yet also a little related.
Grinding the ArchiveBox
I have been playing around with setting up ArchiveBox so I could use it to archive pages that I bookmark.
I am a long-time, but infrequent, user of Pinboard and have been trying to get in the habit of bookmarking more things. And although my current paid subscription doesn’t run out until 2027, I’m not paying for the archiving feature. So as I thought about how to integrate my bookmarks into this site, I started looking at how I might add that functionality. Pinboard uses wget, which seems simple enough to mimic, and I also found other tools like SingleFile.
That’s when I ran across mention of ArchiveBox and decided that would be a way to have the archiving feature I want and don’t really need/want to expose to the public. So I spun it up on my in-home server, downloaded my bookmarks from Pinboard, and that’s when the coding began.
ArchiveBox was having trouble parsing the RSS feed from Pinboard, and as I started to dig into the code I found that instead of using an actual RSS parser, it was either parsing it using regexes (the generic_rss parser) or an XML parser (the pinboard_rss parser). Both of those seemed insane to me for a Python application to be doing when feedparser has practically been the gold standard of RSS/Atom parsers for 20 years.
After sleeping on it, I decided to roll up my sleeves, bang on some Python code, and produced a pull request that switches to using feedparser. (The big thing I didn’t tackle is adding test cases because I haven’t yet wrapped my head around how to run those for the project when running it within Docker.)
Later, I realized that the RSS feed of my bookmarks that I was pulling would be good for pulling on a schedule to keep archiving new bookmarks, but I actually needed to export my full list of bookmarks in JSON format and use that to get everything in the system from the start.
But that importer is broken, too. And again it’s because instead of just using the json parser in the intended way, there was a hack to work around what appears to have been a poor design decision (ArchiveBox would prepend the filename to the file it read the JSON data from when storing it for later reading) that then got another hack piled on top of it when that decision was changed. The generic_json parser used to just always skip the first line of the file, but when that stopped being necessary, that line-skipping wasn’t just removed, it was replaced with some code that suddenly expected the JSON file to look a certain way.
Now I’ve been reading more Python code and writing a little bit, and starting to get more comfortable with some of the idioms. I didn’t make a full pull request for it, but my comment on the issue shows a different strategy of trying to parse the file as-is, and if that fails, skipping the first line and trying again. That should handle any JSON files with garbage in the first line, such as what ArchiveBox used to store them as. And maybe there is some system out there that exports bookmarks in a format it calls JSON that actually has garbage on the first line. (I hope not.)
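In code, the strategy I suggested amounts to something like this (a sketch of the idea, not the exact patch):

import json

def load_bookmarks(path):
    with open(path, "r", encoding="utf-8") as handle:
        text = handle.read()
    try:
        # The happy path: the export is plain JSON.
        return json.loads(text)
    except json.JSONDecodeError:
        # Fall back for files with one line of garbage before the JSON,
        # like the prefixed files ArchiveBox used to write.
        first_newline = text.find("\n")
        if first_newline == -1:
            raise
        return json.loads(text[first_newline + 1:])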
So with that workaround applied locally, my Pinboard bookmarks still don’t load, because ArchiveBox uses the timestamp of the bookmark as a unique primary key and I have at least a couple of bookmarks that happen to have the same timestamp. I am glad to see that fixing that is on the project roadmap, but I feel like every time I dig deeper into trying to use ArchiveBox it has me wondering why I didn’t start from scratch and put together what I wanted from more discrete components.
I still like the idea of using ArchiveBox, and it is a good excuse to work on a Python-based project, but sometimes I find myself wondering if I should pay more attention to my sense of code smell and just back away slowly.
(My current idea to work around the timestamp collision problem is to add some fake milliseconds to the timestamp as they are all added. That should avoid collisions from a single import. Or I could just edit my Pinboard export and cheat the times to duck the problem.)
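Roughly, the nudging idea looks like this; a toy sketch, not code that exists in ArchiveBox:

def deduplicate_timestamps(timestamps):
    """Nudge duplicate timestamps by a millisecond so each gets a unique key."""
    seen = set()
    result = []
    for ts in timestamps:
        while ts in seen:
            ts += 0.001  # add a fake millisecond until it is unique
        seen.add(ts)
        result.append(ts)
    return result

# e.g. [1700000000.0, 1700000000.0] becomes [1700000000.0, 1700000000.001]
print(deduplicate_timestamps([1700000000.0, 1700000000.0]))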
Oracle Cloud Agent considered harmful?
Playing around with my OCI instances some more, I looked more closely at what was going on when I was able to trigger the load to go out of control, which seemed to be anything that did a fair amount of disk I/O. What quickly stuck out thanks to htop is that there were a lot of Oracle Cloud Agent processes that were blocking on I/O.
So in the time-honored tradition of troubleshooting by shooting suspected trouble, I removed Oracle Cloud Agent.
After doing that, I can now do the things that seemed to bring these instances to their knees without them falling over, so I may have found the culprit.
I also enabled PHP’s OPcache, and some rough-and-dirty testing with good ol’ ab says I took the homepage from 6 r/s to about 20 r/s just by doing that. I am sure there’s more tuning that I could be doing. (Requesting a static file gets about 200 r/s.)
By the way, the documentation for how to remove Oracle Cloud Agent on Ubuntu systems is out of date. It is now a Snap package, so it has to be removed with sudo snap remove oracle-cloud-agent. And then I also removed snapd because I’m not using it and I’m petty like that.
Fall down, go boom
I am either really good at making Oracle Cloud Infrastructure instances fall over, or the VM.Standard.E2.1.Micro shape is even more under-powered than I expected. I had been using the Ubuntu “minimal” image as my base, so I thought I would try the Oracle Linux 8 image and I couldn’t even get it to run yum check-update without that process getting killed. That seems like a less-than-ideal experience out of the box.
What seems to happen on the instances (with Ubuntu) that I am using to host this site is that if something does too much I/O, the load average spikes, and things slowly grind through before recovering. The problem is that something like “running composer” seems to be too much I/O, which makes it awkward to deploy code.
Another thing that seems to get out of control quickly is when I reindex the site with Meilisearch. Considering there is very little data being indexed, that obviously shouldn’t be causing any sort of trouble. I have two instances spun up now, so I can play with the settings on one without temporarily choking off the live site. It’s probably just a matter of setting the maximum indexing memory in Meilisearch’s configuration or constraining the memory on that container.
I also added an OCI Flexible Network Load Balancer in front of my instance so I can quickly switch things over to another without waiting on any DNS propagation. Maybe if Ampere instances ever become available in my region I will play around with splitting the deployment across multiple instances.
Coming to you from OCI
After some fights with Deployer and Docker, this should be coming to you from a server in Oracle Cloud Infrastructure. There are still no Ampere instances available, so it is what they call a VM.Standard.E2.1.Micro. It seems to be underpowered relative to the Linode Nanode that it was running on before, or maybe I have just set things up poorly.
But having gone through this, I have the setup for the “production” version of my blog streamlined so it should be easy to pop up somewhere else as I continue to tinker.
Docker, Tailscale, and Caddy, oh my
I do my web development on a server under my desk, and the way I had it set up was with a wildcard entry for *.muck.rawm.us so requests would hit nginx on that server, which was configured to handle various incarnations of whatever I was working on. The IP address was originally just a private-network one, and eventually I migrated that to a Tailscale tailnet address. Still published to public DNS, but not a big deal since those weren’t routable.
A reason I liked this is that I find it easier to deal with hostnames like talapoin.muck.rawm.us and scat.muck.rawm.us rather than running things on different ports and trying to keep those straight.
One annoyance was that I had to maintain an active SSL certificate for the wildcard. Not a big deal, and I had that nearly automated, but a bigger hassle was that whenever I wanted to set up another service it required mucking about in the nginx configuration.
Something I have wanted to play around with for a while was using Tailscale with Docker to make each container (or docker-compose setup, really) its own host on my tailnet.
So I finally buckled down, watched this video deep dive into using Tailscale with Docker, and got it all working.
I even took on the additional complication of throwing Caddy into the mix. That ended up being really straightforward once I finally wrapped my head around how to set up the file paths so Caddy could serve up the static files and pass the PHP off to the php-fpm container. Almost too easy, which is probably why it took me so long.
Now I can just start this up, it’s accessible at talapoin.{tailnet}.ts.net, and I can keep on tinkering.
While it works the way I have it set up for development, it will need tweaking for “production” use since I won’t need Tailscale.
My history with programming languages
I think I have always had an interest in playing around with different programming languages. My first probably would have been Applesoft BASIC. In elementary school, I remember writing a bouncing-lines graphic toy more than once, and D&D character generators.
I am really not sure what came next, but it was probably mostly things like DOS batch files (and eventually 4DOS batch files).
I wrote some little utilities for DESQview in x86 assembly and even released them as freeware, but I haven’t been able to track them down again.
The first substantial project I did was probably the billing and reporting system for my dad’s company that was written in FoxPro before it was even acquired by Microsoft. This would have been where I first encountered SQL, too.
All of this would have been essentially self-taught. I vaguely remember participating in some computer classes or clubs, but nothing terribly structured. I’m sure that I was exposed to Pascal, probably Turbo Pascal, at some point in those.
One memory from my freshman year at college is that I used the (* multi-line syntax *) of Pascal comments on an assignment instead of the { single-line syntax } and the grader for the assignment had some sort of reaction about that.
I was exposed to a lot of programming languages in college, especially because I had to take the “Programming Languages” class twice after failing it the first time. (I had a rough time in my sophomore year.) It was taught by different professors who used different languages to teach the concepts. Some languages I remember doing at least an assignment or two in: C, Fortran, ML, Scheme, Lisp, Perl, COBOL, Ada, and APL. Part of Harvey Mudd College’s program is a capstone project in your senior year where you work on a project for an outside company or organization. The project I was part of wrote a tool for doing internationalization of voicemail prompts for Octel Communications using Visual Basic.
My first job after college was for an educational/games software company named Knowledge Adventure (KA) on a project that was released as “Steven Spielberg’s Director’s Chair.” At the time, all of the projects at the company were done using an in-house programming language named “Acomplish” (or maybe “Accomplish”) which was short for “A Computer English.” I have been trying to dig up some reference to or examples of it and have come up empty so far, but it was a natural-language-ish syntax that had originally been intended for producers to write in. By the time I worked there it was mostly being used by programmers who graduated from Mudd, CalTech, and UCLA, which is kind of funny.
It was while working at KA on their website that I got involved in the early days of PHP. I guess here is where someone could make jokes about how obviously someone who had failed “Programming Languages” had a hand in PHP, but I don’t recall that I had very much, if any, involvement in the actual design of the language. (Although I remember having to write way too many emails to make the case that casting a string to an integer should not interpret it the same way as numeric literals, because people would have brought out pitchforks when leading zeroes caused them to be interpreted as octal values.)
At KA, the second title I worked on (what was released as “Dr. Brain’s Thinking Games: IQ Adventure”) was coded in C/C++ after we had done some initial prototyping in Acomplish. The game had a multiplayer component and we ended up borrowing the graphics and networking library from our (at the time) sister company, Blizzard Entertainment. (I later went on to have the worst programming interview experience of my life, so far, there.)
When I left KA they were in the process of adopting a new engine for all of their projects that was Java-based, or maybe just evaluating it. In any case, I never ended up working with that engine but I did a small contract project around this time that was a Java applet. (It was for the website for a show on NBC.)
My next job was at HomePage.com, which was an idealab startup that was basically a second-generation GeoCities. (A number of the folks in the management team were actually ex-GeoCities people who cashed out when that was acquired by Yahoo.) We built our system in Perl (with mod_perl), and built a sort of primitive HTML templating system called Gear. The original parser was regex-based until one of the other senior engineers wrote a proper lexer and parser for it.
I’m not sure when I first started working in JavaScript, but it was probably somewhere in this period.
After HomePage.com, I ended up working for MySQL Ab leading the web team, which meant back to writing a bunch of PHP code. And I am sure I had encountered Python before this, but recently I dug up something I wrote during this time in Python that connected to our bugs database, announced new bugs, and could be queried in a few ways. During the rest of my time at MySQL, I did more C/C++ programming, probably more Perl, and even did a tiny bit of work with Ruby.
Other programming languages I have played with at some time or another: Logo, MATLAB, OCaml, Tcl, Dylan, Oberon, Modula, and Delphi (Object Pascal).
Most recently I have been playing around with Rust and Go, and done some reading on Swift.
This whole train of thought was actually triggered by seeing Delphi mentioned in a job description, and it reminded me of how intrigued I was when I first encountered it. I have a soft spot for Pascal and its successors.
Poking around with Rust
A long time ago, I implemented a quick-and-dirty daemon in C that used the vendor’s support library to display messages on an LCD pole display at the store. The state of California requires that electronic point-of-sale systems have a customer-facing display, and this fulfilled that.
It was very simple, it just listened for TCP connections and displayed the text it was sent. Every 15 seconds it would reset the display to a hardcoded default message. When an item was added to an invoice in Scat POS, it would push the name and price to the daemon. It also pushed the total when payment was initiated.
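The client side of that protocol is about as small as it sounds. Here is a rough sketch of what the Scat POS side of pushing a line to the daemon could look like; the host, port, and message formatting are made up for the example, not the real configuration:

```php
<?php
// Hypothetical sketch: push a line of text to the display daemon over TCP.
// The host, port, and message format are assumptions, not the real protocol.
function display_message(string $text, string $host = 'pos-display.local', int $port = 4321): void
{
    $socket = @fsockopen($host, $port, $errno, $errstr, 1.0);
    if ($socket === false) {
        // The display is a nicety; don't let a failure break the sale.
        error_log("display: could not connect: $errstr ($errno)");
        return;
    }
    fwrite($socket, $text);
    fclose($socket);
}

// e.g. when an item is rung up, format it for a 2x20 character display:
display_message(sprintf('%-20.20s%20.20s', 'Gouache, Titanium', '$12.99'));
```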
Like I said, it was quick and dirty, and we used it for a decade and I never really got around to doing any of the basic improvements that I wanted to do, like being smarter about when to go back to the default display.
The LCD display was one of the things we didn’t manage to sell off when we closed down the store, so I took it home and now it’s on my desk, hooked up to the Raspberry Pi 4 that used to be our print server. I decided to use it as an excuse to start learning Rust.
I pulled out one of the examples from the code for a crate that wraps libusb to provide access to USB devices, hit it with a hammer until I got it to push text to the display, and now I have the basis to re-implement what I had before and then give it the polish that I never did. Maybe implement a more user-friendly way of sending the various control codes from the user manual for doing things like clearing the screen.
That’s the theory, at least. The reality is that first I had to migrate all of my photos from Flickr to my own service, implement a way to add new photos to the collection, and then upload the photo I took of the display showing a simple message so I could blog about it.
And I am not sure if doing more with this is actually the next thing I’ll tackle, but writing about what I had done so far is at least something to check off the to-do list that I don’t have.
The code lives in the lcdpoled repository on GitHub. (The old C code is now off on a different branch.)
More syndication
Now that Bluesky is available without a waitlist and has a web interface, I have been playing around with it a little more. So in that spirit, posts here will get posted there just like I’ve been doing for Mastodon.
I’m just using this PHP interface for Bluesky because it was what I ran across first, but I probably should use Ben Ramsey’s socialweb/atproto which looks like a more rigorous implementation of the underlying AT protocol.
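Under the hood it is just a couple of XRPC calls, which is roughly what any of these libraries wrap. A minimal sketch of posting with Guzzle directly (the handle, app password, and post text are placeholders, and link facets and error handling are omitted):

```php
<?php
require 'vendor/autoload.php';

use GuzzleHttp\Client;

// Placeholders; in a real setup these would come from configuration.
$handle = 'someone.bsky.social';
$appPassword = 'xxxx-xxxx-xxxx-xxxx';

$client = new Client(['base_uri' => 'https://bsky.social/xrpc/']);

// 1. Create a session with an app password to get an access token and DID.
$session = json_decode((string) $client->post('com.atproto.server.createSession', [
    'json' => ['identifier' => $handle, 'password' => $appPassword],
])->getBody(), true);

// 2. Create the post record.
$client->post('com.atproto.repo.createRecord', [
    'headers' => ['Authorization' => 'Bearer ' . $session['accessJwt']],
    'json' => [
        'repo' => $session['did'],
        'collection' => 'app.bsky.feed.post',
        'record' => [
            '$type' => 'app.bsky.feed.post',
            'text' => 'New blog post: https://example.com/entry',
            'createdAt' => gmdate('Y-m-d\TH:i:s\Z'),
        ],
    ],
]);
```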
Anyway, it’s hacked together for now, and the very few followers I have over on Bluesky will now perhaps find these things in their feed.
You can even find links to the posts on Mastodon and Bluesky in the details of the entries here. One of my plans is to eventually pull in replies on either of those as comments here based on jwz’s hack for doing that with WordPress.
Sometimes the hammer is too big
Elasticsearch is something I have seen pop up in a lot of job listings, so I decided to play around with it to see if I could use it for the search on this site. I was able to get it set up fairly easily on my development server and shift over to using it there, but when I tried to bring it up on my production server, I ran into the problem that it is more resource-hungry than that server can handle. This all runs on a Nanode, which is Linode’s tiniest virtual server that just has 1GB of RAM.
Right now I am using Sphinx which has been fine, but it hasn’t really been open source for quite a while.
I was going to try playing around with ZincSearch next. I was digging into it and it certainly sounds similar to what I want (a “lightweight alternative to Elasticsearch that requires minimal resources”), but it isn’t clear how active the project is or what sort of future it has. The documentation for ZincSearch is pretty, but kind of scant. Looking into the search types it supports, I was left with questions about what syntax the querystring type actually supported. So I looked at the code (which is slow reading since I am still fairly unfamiliar with Go), which I followed to the underlying bluge indexing library, which has even less documentation. I finally figured out that bluge was forked from bleve (which seems like something that would be nice to mention in the README for bluge, but whatever). Bleve has the query string format documented. (Whew.)
But after all of that digging, I’m less certain about how much time I want to put into playing with ZincSearch since the underpinnings and their future seem shaky.
Typesense or Meilisearch is on my list to give a try next.
But in case it isn’t obvious, the things I have been digging into are sort of scattered right now.
Now the photos live here
I’d call it maybe an alpha release, but a very basic version of my photo library is now up and running. There is one last picture I need to migrate over from my old Flickr presence, but otherwise they should all have made it over. I should look at pulling in the photos from my Instagram account. The real test of things would be to load my iCloud photo library, but that is about 25,000 pictures and I’d certainly have to go through to see what could be public. It would probably be better to figure out how I want to get photos into the library going forward, and then I can back-fill photos from before this year.
Just implementing this has made me think a lot about how my system here is structured, how the data is structured, and how I want to structure it. I am trying to move towards adopting more of the IndieWeb principles and standards.
Enabling GD’s JPEG support in Docker for PHP 8.3
I am generating a ThumbHash for each photo in my new photo library using this PHP library, and it needs to use either the GD Graphics Library extension or ImageMagick to decode the image data to feed it into the hash algorithm.
The PHP library recommends the ImageMagick extension (Imagick) because GD still doesn’t support 8-bit alpha values, but I ran into the bug that prevents Imagick from building, which is fixed by this patch that hasn’t been pulled into a released version yet. Then I realized that none (or close to none) of the images I’d be dealing with use any sort of transparency, so GD would be fine. And it was already enabled in my Dockerfile, so I should have been good to go.
But it turns out that although I thought I had included GD, I hadn’t actually properly enabled JPEG support in GD, so the ThumbHash library’s helper function to extract the image data it needed just failed on a call to ImageSX() after ImageCreateFromString had failed. (Here is a pull request to SRWieZ/thumbhash to throw an exception on that failure, which would have saved me a few steps of debugging.)
Looking at the code for the GD extension, that should not have been a silent failure, so some digging may be required to figure out what happened with that. I may have just missed that particular error message in the logs.
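In the meantime, a cheap guard along these lines would have surfaced the problem immediately. This is just a sketch, not the ThumbHash library’s actual helper:

```php
<?php
// Sketch: decode image data with GD and fail loudly, rather than letting a
// later imagesx() call blow up on a bad value.
$info = gd_info();
if (empty($info['JPEG Support'])) {
    throw new RuntimeException('GD was built without JPEG support');
}

$image = imagecreatefromstring($data);
if ($image === false) {
    throw new RuntimeException('GD could not decode the image data');
}

$width  = imagesx($image);
$height = imagesy($image);
```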
Enabling JPEG support is fairly simple, although a lot of the instructions I found online were a little out of date. The important thing was that I needed to add this to my Dockerfile between installing the development packages and building the PHP extensions: docker-php-ext-configure gd --with-freetype --with-jpeg.
So now I can successfully generate a ThumbHash for all of my photos, except for another bug I haven’t tracked down yet where it sometimes produces a hash that is longer than expected. The ThumbHash for this photo is 2/cFDYJdhgl3l2eEVMZ3RoOkD1na, which can be turned directly into this image:
Some notes on Flickr data migration
I decided to stop renewing my subscription with Flickr recently to create some incentive for me to self-host my photos and integrate them more closely here. Before my subscription lapsed, I requested an archive of all of my Flickr data, and now I am finally getting around to working with the data.
When you download your Flickr data, it includes JSON files named like photo_50626142.json and JPEG files named like young-mimes_50626142_o.jpg, and the name of the JPEG is not in the JSON data.
You can generate it, probably, using the name and ID but I’m not sure what the rules are for turning the name field into the snake-case form.
Except that images without a name have JPEG files named like 17483805680_f57f81feb5_o.jpg. The id is at the beginning, the other bit is just random or something. (Looks like this is the same filename used for the original URL in the JSON.)
The way to go seems to be just matching on the ID embedded in the filename. (That’s what the one other tool I’ve seen that uses the export data does.)
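The matching itself only takes a few lines. Here is a sketch of that approach (the paths are examples): build a map from the photo_<id>.json files, then look for a known ID as an underscore-delimited run of digits in each JPEG name.

```php
<?php
// Sketch: match Flickr export JPEGs to their photo_<id>.json metadata by the
// numeric ID embedded in the JPEG filename. Paths here are examples.
$metadata = [];
foreach (glob('flickr-export/photo_*.json') as $file) {
    if (preg_match('/photo_(\d+)\.json$/', $file, $m)) {
        $metadata[$m[1]] = json_decode(file_get_contents($file), true);
    }
}

$matched = [];
foreach (glob('flickr-export/*_o.jpg') as $jpeg) {
    // Both "young-mimes_50626142_o.jpg" and "17483805680_f57f81feb5_o.jpg"
    // have the photo ID as one underscore-delimited run of digits.
    foreach (preg_split('/[_.]/', basename($jpeg)) as $part) {
        if (ctype_digit($part) && isset($metadata[$part])) {
            $matched[$part] = $jpeg;
            break;
        }
    }
}
```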
And when working through all of this, I found that I must have not downloaded one of the archive files from Flickr, because I was missing 83 JPEG files. I was able to use the JSON files to rescue them.
Now that I know that I actually have all of the data and all of the images are in a Backblaze B2 bucket fronted by Gumlet, the next step will be loading all of the relevant metadata into a database table and then wiring up some ways to browse the images here.
Where to put routing code
I won’t make another Two Hard Things joke, but an annoying thing in programming is organizing code. Something that bothered me over the years as I was developing Scat POS is that adding a feature that exposed new URL endpoints required making changes in what felt like scattered locations.
In the way that Slim Framework applications seem to be typically laid out, you have your route configuration in one place and then the controllers (or equivalent) live off with your other classes. Slim Skeleton puts the routes in app/routes.php and your controllers live somewhere down in src. Scat POS started without using a framework and then moved to Slim Framework 3, so the layout isn’t quite the same, but it’s pretty close. The routes are mostly in one of two applications, app/pos.php or app/web.php, and then the controllers are in lib.
So as an example, when I added a way to send a text message to a specific customer, I had to add a couple of routes in app/pos.php as well as the actual handlers in lib.
(This was an improvement over my pre-framework days, where setting up new routes could involve monkeying with Apache mod_redirect configuration.)
Finally, for one of the controllers, I decided to move the route configuration into a static method on the controller class and just called that from within a group. Here is a commit adding a report that way, which didn’t have to touch app/pos.php.
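The shape of it is roughly this. It is a simplified sketch with hypothetical report routes, not the actual Scat POS code:

```php
<?php
require 'vendor/autoload.php';

use Psr\Http\Message\ResponseInterface as Response;
use Psr\Http\Message\ServerRequestInterface as Request;
use Slim\Factory\AppFactory;
use Slim\Routing\RouteCollectorProxy;

class ReportController
{
    // The controller declares its own routes; the application just asks for them.
    public static function registerRoutes(RouteCollectorProxy $group): void
    {
        $group->get('/sales', [self::class, 'sales']);
        $group->get('/shipments', [self::class, 'shipments']);
    }

    public function sales(Request $request, Response $response): Response
    {
        $response->getBody()->write('sales report goes here');
        return $response;
    }

    public function shipments(Request $request, Response $response): Response
    {
        $response->getBody()->write('shipments report goes here');
        return $response;
    }
}

$app = AppFactory::create();

// Adding the whole controller is one line here instead of one line per route.
$app->group('/report', function (RouteCollectorProxy $group) {
    ReportController::registerRoutes($group);
});

$app->run();
```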
Just at a quick glance, it looks like Laravel projects are set up more like the typical Slim Framework project with routes off in one place that call out to controllers. Symfony can use PHP attributes for configuring routes in the controller classes, which seems more like where I would want to go with this thinking.
I am not sure what initially inspired me to start using Slim Framework but if it seems like I am doing things the hard way sometimes, that is sort of intentional. On a project like this where I was the only developer, it was a chance to explore ideas and learn new concepts in pretty small chunks without having to buy in to a large framework. If I were to start the project fresh now, I might just use Symfony and find other new things to learn (like HTMX). If I had needed to hand off development of Scat POS to someone else, I would have needed to spend some time making things more consistent so there weren’t multiple places to look for routes, for example.
And as a side note, going back to that commit adding SMS sending to customers, you can see a bit of the interface I had for popping up dialogs. It used Bootstrap modal components because it pre-dates browser support for <dialog>. The web side of Scat POS actually evolved that to use browser-native dialogs (as a progressive enhancement) because I had rebuilt it all more recently and that side no longer used Bootstrap.
The joy of open source
As I mentioned, one of the reasons I was trying to get organized in setting up my development environment under chezmoi was because I wanted to start using Atuin, which does what it claims in making the shell magical. It stores and syncs the shell history (in an SQLite database behind the scenes), and while I have only just started using it, it seems pretty great. Before starting with it, I just had the bash history set up on my primary development machine to never expire, so there was six years of history there (now imported into Atuin).
But Atuin is fairly new, and I ran into some rough edges. One is that because I’m still using the bundled bash on macOS, and it is a very old version of bash, not everything works correctly (accessing the history by up-arrow just errors out). This has been fixed in the Atuin repo, so it should no longer be a problem after they roll out their next release.
As I finished integrating Atuin into my chezmoi setup, I noticed that it seemed to no longer be updating the history on my Linux box. When I went to debug it, I finally found that I was loading two different versions of bash-preexec. One was being loaded by my chezmoi setup (where it is set up to download to ~/.bash-preexec.sh in my .chezmoiexternal.toml), and the other was being loaded by /etc/profile.d/wezterm. My version was 0.5.0; the version that WezTerm was loading was 0.4.1.
Between those two versions, bash-preexec decided to change the variable that they use to prevent double-inclusion, but the implementation was sort of one-way: the new version would set both the old and new variables, but only check the new one. So if you loaded the old one first, the new one would still load and (apparently) not work correctly.
I let Wez know that he was bundling an old version, which he promptly updated (and so it’s fixed in the nightly builds now). I also submitted a patch to bash-preexec to pay attention to the old guard variable, so whenever they do a new release that particular problem won’t bite anyone else. (They will face a new one, where having an old version of bash-preexec loaded may prevent the newer version from loading, but that should be relatively straightforward to figure out.)
This journey brought to mind an experience I had at HomePage.com about 25 years ago. We had a fleet of front-end web servers that all accessed user storage on an NFS server (an F5 box), and we were running into trouble where the Linux machines would periodically get stuck or die. It was an all-hands-on-deck situation trying to figure it out, and I eventually hit enough things with hammers to figure out that it was a bug in Linux NFS, and that there was a fix in a later kernel version that I could just back-port, which suddenly made our Linux machines stable. I pointed out the back-portable fix to Alan Cox, then the maintainer of the stable Linux kernel. It’s possible they were pulling my leg about it, but the marketing/PR people at the company talked about putting out a press release about how I had fixed a bug in NFS on Linux. I found the whole idea embarrassing and felt this was all just part of the normal open source process.
(Small side note: The company ended up migrating the front-end servers to FreeBSD, which had been a parallel investigation to me debugging the NFS problem.)
This is the sort of mess I enjoy digging my way out of, and it is generally more fun to do this in the open source world than in some company’s proprietary codebase.
Another tool I plan to explore is ble.sh, which is recommended by Atuin. The entire idea of line-editing and syntax highlighting built entirely in shell code sounds ludicrous.
Setting up house with chezmoi
It has been a long time since I was the sort of developer who fine-tuned his setup to be just so.
The .vimrc on my development machine is less than 50 lines long. And a big chunk of that is a function that I don’t remember adding and I’m pretty sure I never used.
I made the small change to using WezTerm instead of the stock Mac OS X terminal, but not for any particularly exciting reason. My configuration is very barebones, just changing the font and color scheme and setting a very simple window title.
I figured it was time to get more organized, so one of the first things I’ve done is set up chezmoi to manage my configuration files. I want to start playing around with at least a slightly more sophisticated and consistent setup, and add things like Atuin into the mix.
Titles are where I can be abstruse
When I added auto-posting of entries here to my Mastodon account, it just took the title of the entry and posted that along with the link. But those end up being kind of cryptic, so now I made it possible to actually write the Mastodon post alongside the blog entry.
Just a small step in building out this online presence more fully and making sure it’s connected to other places more thoughtfully.
The next thing I want to do is build a new home for my photos. I stopped renewing my Flickr Pro subscription. I’ve thought about setting up a Pixelfed instance but that seems like overkill. I may use it as an excuse to build something in a language other than PHP because my resume could use that. Nearly all of my non-PHP work has been lost to me because it wasn’t open source.
thought i missed one: oscommerce
i ran across a reference to oscommerce in the slides of a tutorial i presented at o’really oscon in 2002(!) where i ran through a survey of major php applications, and i thought that meant i had missed one in my round-up of open-source php point-of-sale applications.
but it’s an ecommerce platform, not a point-of-sale system and it doesn’t look like it has a module or add-on to provide a point-of-sale interface.
speaking of that, there are some point-of-sale add-ons for woocommerce, which is itself the ecommerce add-on to wordpress. it looks like the only open-source/free ones are built specifically for use with square or paypal terminals.
titi, a simple database toolkit
at some point in my life i got tired of writing all my SQL queries by hand, and was casting about for a database abstraction that simplified things. but i didn’t care for anything that required that i specify my actual SQL tables in code or another format. i wanted something that would just work on top of whatever tables i already had.
i don’t know what i considered at the time, but where i landed was using Idiorm and Paris, which bills itself as a “minimalist database toolkit for PHP5” which gives you a sense of its age. it was long ago put into maintenance-only mode by its developers, and eventually i ran across something that i wanted to fix or otherwise do that i knew would never be accepted upstream.
so i took the code that was in two distinct repositories, merged it together, tossed it in a new namespace, and renamed it Titi. i haven’t really done much with it beyond that, but i know there is code that i should be pulling back in from scat. an advantage to being a solo developer is you can kind of punch through abstraction layers to get things done, but that also leaves cleanup work to be tackled eventually.
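a quick sketch of the idiorm/paris style that titi inherits, with the caveat that the titi namespace and configuration details here are my assumptions rather than its documented api:

```php
<?php
// sketch of idiorm/paris-style usage; the Titi namespace and configuration
// details are assumptions, not the package's documented api.
use Titi\ORM;
use Titi\Model;

ORM::configure('mysql:host=localhost;dbname=scat');
ORM::configure('username', 'scat');
ORM::configure('password', 'secret');

// paris-style model: no schema declaration, it just works against whatever
// "item" table already exists.
class Item extends Model
{
}

// find an item and tweak it
$item = Model::factory('Item')->where('code', 'ABC-123')->find_one();
if ($item) {
    $item->name = 'a better name';
    $item->save();
}

// or punch through the abstraction with raw sql when that is easier
$items = ORM::for_table('item')
    ->raw_query('SELECT * FROM item WHERE stock < ?', [0])
    ->find_many();
```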
should anybody else use this? maybe not. but it has been useful for me in my projects, and it’s also been a good playground to learn more about new php language features and tools.
(like most of my open source projects, this is named for a type of monkey, the titi monkey.)
scat is scatter-brained
while i folded all of the website/ecommerce parts of scat into the same repository as the point-of-sale system itself, it doesn’t really work out of the box, and that is because of the odd way in which we run it for our store. the website used to be a separate application that was called ordure, so there’s a little legacy of that in some class names. i still think of the point-of-sale side as “scat” and the website side as “ordure”.
the point-of-sale system itself runs on a server here at the store (a dell poweredge t30), but our website runs on a virtual server hosted by linode. they run semi-independently, and they’re on a shared tailscale network.
ordure calls back to scat for user and gift card information, to send SMS messages, and to get shipment tracking information. so if the store is off-line, it mostly works and customers can still place orders. (but things will go wrong if they try to log in or use gift cards.)
there are scheduled jobs on the scat side that:
- push a file of the current inventory and pricing (every minute)
- pull new user signups (every minute)
- check for new completed orders and pull them over (every minute)
- push the product catalog and web content if a flag was set (checked every minute)
- push updated google/facebook/pinterest data feeds (daily)
- send out abandoned cart emails (daily)
so ordure has a copy of scat’s catalog data that only gets updated on demand, but does get a slightly-delayed update of pricing and inventory levels. the catalog data gets transferred using ssh and mysqldump. (basically: it gets dumped, copied over, loaded into a staging database, and a generated 'rename table' query swaps the tables with the current database, and the old tables get dropped so the staging area is clear for next time.)
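the table swap at the end of that pipeline is the only mildly clever bit. a rough sketch of generating it (the database and table names are made up, not the real scat/ordure schema):

```php
<?php
// sketch: build the single RENAME TABLE statement that atomically swaps the
// freshly-loaded staging tables in for the live ones.
$tables = ['item', 'item_price', 'department'];

$clauses = [];
foreach ($tables as $table) {
    $clauses[] = "live.$table TO staging.{$table}_old";  // current table moves aside
    $clauses[] = "staging.$table TO live.$table";        // staging table takes its place
}

$swap = 'RENAME TABLE ' . implode(', ', $clauses);
// RENAME TABLE live.item TO staging.item_old, staging.item TO live.item, ...
// after running $swap, dropping the *_old tables clears staging for next time.
```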
not all of this is reflected within the scat code repository, and this post is just sort of me thinking out loud about where it has ended up. part of the reason for this setup is that the store used to have a janky DSL connection, so i was minimizing any dependencies on both sides being available for the other to work.
as a side note, all of the images used in the catalog are stored in a backblaze b2 bucket and we use gumlet to do image optimizing, resizing, etc. when we add images to our catalog, it can be done by pulling from an external URL and the scat side actually calls out to the ordure side to do that work because when we were on that crappy DSL connection, pulling and pushing large images through that pipe was painful.
php pieces of what?
back in july 2010 i wrote about how i was frustrated with our point of sale system (Checkout, a Mac application which changed hands once or twice and is no longer being developed) and had taken a quick survey around to see what open source solutions there were.
the one that i mentioned there (PHP Point of Sale) is still around, but is no longer open source. here is a very early fork of it that still survives. i know at least one art supply store out there is using it (the closed-source version, not that early fork), but i haven’t really looked at it since 2010.
there are a few more php point of sale systems now.
the biggest is called Open Source Point of Sale and appears to be undergoing an upgrade from CodeIgniter 3 to CodeIgniter 4 right now. i spent a few minutes poking around the demo online, and i don’t think i would be happy using it. it is under an MIT license.
another big one is NexoPOS, which is GPL-licensed. i have not played around with the demo, but the supporting website looks pretty slick.
most of the others look like they are just experimental projects or not being actively used or developed.
something i think about a lot is whether i should be trying to take Scat POS beyond just using it ourselves. part of me feels like i am a seasoned enough developer to know that the work that would be required to give it the level of polish and durability to survive usage outside of our own doors could be substantial.
sidekiq for php?
it is a little strange still developing in php and having done it for so long, because you look at how other systems are built today and it isn’t always clear how that translates to php.
mastodon (the server software) is built primarily with ruby-on-rails, and uses a system called sidekiq to handle job processing. when you post to your mastodon server, it queues up a bunch of jobs that push it out to your subscribers, creates thumbnails of web pages, and all sorts of other stuff that may take a while so it makes no sense to make the web request hang around for it.
for scat pos, there are a few queue-like tasks that just get processed by special handlers that i use cron jobs to trigger. for example, when a transaction is completed it reports the tax information to our tax compliance service, but if that fails (because of connectivity issues or whatever) there’s a cron job that runs every night to re-try.
as best i can tell, the state of the art for php applications that want to have some sort of job queue system like sidekiq is Gearman and GearmanManager and it is wild to me that projects i remember starting up in 2008 are still just chugging along like that.
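for reference, the basic shape of pushing a background job with the pecl gearman extension looks something like this; the function name, payload, and server address are made up for the example:

```php
<?php
// client side: queue a background job (e.g. after completing a transaction)
$client = new GearmanClient();
$client->addServer('127.0.0.1', 4730);
$client->doBackground('report_tax', json_encode(['txn_id' => 12345]));

// worker side: typically run under something like GearmanManager
$worker = new GearmanWorker();
$worker->addServer('127.0.0.1', 4730);
$worker->addFunction('report_tax', function (GearmanJob $job) {
    $data = json_decode($job->workload(), true);
    // ... call the tax compliance service here ...
});
while ($worker->work());
```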
stable and well-understood technologies
AddyOsmani.com - Stick to boring architecture for as long as possible
Prioritize delivering value by initially leaning on stable and well-understood technologies.
i appreciate this sentiment. it is a little funny to me that what i can claim the most expertise in would probably be considered some of the most stable and well-understood technologies out there right now, but i have been working with them since they were neither. perhaps i have crossed from where as long as possible becomes too long, at least as far as employability is concerned.
logging with context
i have had this blog post about moving from logs to metrics open for a while, since i know one of the weak points in our systems right now is some pretty basic stuff like logging and monitoring. and then jwz ran into a problem with logging errors from php-fpm, and what it reminded me about is how logs need to carry enough context so you can pull the threads together from something like a single request.
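the simplest version of what i mean is just stamping every log line from a request with the same identifier. a minimal sketch, with nothing fancier than error_log:

```php
<?php
// sketch: give every log line from one request a shared identifier so the
// separate lines can be pulled back together later.
define('REQUEST_ID', bin2hex(random_bytes(8)));

function log_with_context(string $message): void
{
    error_log(sprintf('[req %s] %s', REQUEST_ID, $message));
}

log_with_context('starting checkout');
log_with_context('tax service timed out, will retry from cron');
```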
i have not wrapped my head around the idea of just using metrics, because that sounds like rather a lot of data to be storing. maybe i’m just an on-prem brain in a cloud world.
scat pos proof of life (screencasts)
i recorded a couple of quick screencasts to show cloning it from github and starting it up with docker-compose, and then going through the initial database setup and processing a sale with sample data.
like the website says, the system is a work in progress and not suitable for use by anyone, but we have been using it for more than ten years.
i am not sure if it is something that anyone else would want to use, but i figure one way to find that out is to at least start pushing it towards where that is even feasible.
pinging blo.gs again
i guess it only makes sense that i should ping blo.gs when i post things here.
hard to believe that thing has been running for over twenty years now.
now with new comments
not that i think that there is anyone reading this, but you can now comment on entries for seven days after they have been posted.
you can throw html in your comment, but it will get filtered by html purifier.
spam will be deleted promptly if it wasn’t already blocked or sequestered by akismet.
another php akismet api implementation
in poking around with adding support for comments here, i looked at integrating with the akismet anti-spam service, and the existing php libraries for using it didn’t work how i wanted or brought in dependencies that i wanted to avoid. so i made a simple akismet-api package that just uses guzzlehttp under the hood.
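the underlying rest api is simple enough. a rough sketch of a comment-check call with guzzle (this is not the interface of my package, and the key and blog url are placeholders):

```php
<?php
require 'vendor/autoload.php';

use GuzzleHttp\Client;

// sketch of calling the akismet comment-check endpoint directly with guzzle
$key = 'YOUR_AKISMET_KEY';
$client = new Client(['base_uri' => "https://$key.rest.akismet.com/1.1/"]);

$response = $client->post('comment-check', [
    'form_params' => [
        'blog' => 'https://example.com/',
        'user_ip' => $_SERVER['REMOTE_ADDR'] ?? '127.0.0.1',
        'user_agent' => $_SERVER['HTTP_USER_AGENT'] ?? '',
        'comment_type' => 'comment',
        'comment_author' => 'A. Commenter',
        'comment_content' => 'nice post!',
    ],
]);

// the api returns the literal strings "true" (spam) or "false" (not spam)
$isSpam = (string) $response->getBody() === 'true';
```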
i haven’t made a test suite or added real documentation yet, so you should consider it pre-production, but it seems to work okay.
cleantalk is another anti-spam service that we use for registrations and comments on our store website and their php library is kinda non-idiomatic and strange, too. a reimplementation of that might be in the cards.
don’t be too clever
what is it about websites for government entities that results in login systems that try to do something clever that just falls down in the real world? treasurydirect used to have a password entry system that relied on a virtual keyboard, which was an accessibility nightmare and of course did not play nicely with a password manager. calsavers, the state of california’s retirement savings program, does something fancy when submitting passwords that results in apple’s built-in password management wanting to save the transformed password on every log in, which means the saved password no longer works.
one small project i have in mind is to explore passkeys and how to implement them, and i sure wish the folks at calsavers had spent time on that rather than whatever janky client-side password chicanery they have going on now.
tooting while blogging
i figured that i should do something clever like automatically post to my mastodon account when i posted here, but i was surprised to find that the state of mastodon api clients for php is pretty sad. php-mastodon was what i used to get it working, but it's really an incomplete implementation and the error handling is pretty much non-existent so it took way longer than it should have to get going.
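the api call itself is trivial, which makes the state of the clients more surprising. a sketch of posting a status with guzzle directly (the instance url and access token are placeholders):

```php
<?php
require 'vendor/autoload.php';

use GuzzleHttp\Client;

// sketch: post a status via the mastodon rest api directly.
$accessToken = 'YOUR_ACCESS_TOKEN';
$client = new Client(['base_uri' => 'https://mastodon.example/']);

$client->post('api/v1/statuses', [
    'headers' => ['Authorization' => 'Bearer ' . $accessToken],
    'form_params' => [
        'status' => 'new blog entry: https://example.com/entry',
    ],
]);
```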
(and put me down as someone who is glad that “tooting” is being pushed into the background as the term of art for posting on mastodon, but couldn’t resist using it this time.)
one person's technical debt
My 20 Year Career is Technical Debt or Deprecated
Everything eventually becomes tech debt, or the projects get sunsetted. If you are lucky, your code survives long enough to be technical debt to someone else.
i rather liked this piece on the ever-changing nature of software tools and how entropy catches up with us all, but what a focus on technical debt doesn't quite capture is the underlying value. old code has accumulated a lot of knowledge and value. it's why you don't just rewrite from scratch.
scat pos has been a one-person project for over a decade. at this point, it has literally encoded my experience in how to manage our retail store. you could throw it all away and start from scratch, or just switch to an off-the-shelf solution. but you would be throwing away a lot of accumulated knowledge and value.
made a new saddle
while being able to write entries and send them via email seemed like fun, the reality is that the setup was fragile. so it was enough of a hurdle to writing anything here that i rarely wanted to deal with it.
i do want to write more here, so i knocked together a basic web interface that will allow me to do that.
the biggest thing that i still haven't figured out is how i want to handle images. i could go back to using flickr and embedding from there, or i could implement a basic media library. i think the long-term solution is probably doing it myself because that's kind of the reason for this place.
migrated to slim framework 4
a couple of weeks ago i finally took some time to upgrade the code for this blog to the latest major version of the slim framework. it is still a strange mash-up of framework, hand-coded sql queries, and old php code, but this should make it easier for me to tinker with going forward. the main thing i need to do is add a way to post images again.
server-side tracking
i gave up on server-side event tracking on our website for now. segment was promising but the ecommerce functionality wasn’t at the level i needed for all of the platforms we integrate with, and the cost was just too steep. it’s based on what they call “monthly tracked users” and even our modest needs looked like it was going to be way more expensive than i could justify.
so i just migrated (back) to google tag manager loading everything and a simple javascript wrapper to generate all of the events for each service. since then, i ran across rudderstack, which seems very similar to segment but with an open-source implementation and what appears to be a more sensible pricing structure for their cloud service. it will be the top of my list of things to investigate whenever i want to revisit this again.
flipping switches
cloudflare zaraz is a great concept: manage the third-party code for your website sort of like google tag manager, but run as much of the code as possible in the cloud instead of the browser. but the execution is still rough around the edges, especially when it comes to the ecommerce functionality.
each of the platforms where we publish our catalog (and can use that to advertise) has its own way of collecting performance metrics. the way i had hacked support for each into our old website was messy and fragile. zaraz intervenes here with a simple zaraz.ecommerce(event, data) call that pushes out the data to each of those third-party tools.
the problem is that how zaraz maps their simplified interface to those various systems is undocumented, and as near as the community can figure out, not always correct. i also found that if i enabled the ecommerce integration for facebook, it broke all of the ecommerce reporting everywhere.
i am still hopeful that they can work through the bugs and issues, add support for some of the other platforms that would be useful for us (like pinterest), and we can collect the data we need with a minimized impact on site performance.
the worst case is that i can just drop in my own implementation to turn those zaraz.ecommerce() calls into the old browser-side integration, and it will still be more streamlined than it used to be.
dipping my toes in go
one of the very first things i noticed when i migrated our website to a new server is that someone was running a vulnerability scanner against us, which was annoying. i cranked up the bot-fighting tools on cloudflare, but i also got fail2ban running pretty quickly so it would add the IP addresses for obviously bad requests to an IP list on cloudflare that would lock those addresses out of the site for a while. not a foolproof measure, of course, but maybe it just makes us a slightly harder target so they move on to someone else.
but fail2ban is a very old system with a pretty gross configuration system. i was poking around for a more modern take on the problem, and i found a simple application written in go called silencer that i decided to try and work with. i forked it so i could integrate it with cloudflare, and it was very straightforward. i also had to update one of the dependencies so it actually handled log file rotation. when i get time to hack on it some more, i’ll add handling for ipv6 as well as ipv4 addresses.
go is an interesting language. obviously i don’t have my head wrapped around the customs and community, so it seems a little rough to me, but it’s also not so different that i couldn’t feel my way around pretty quickly to solve my problem at hand.
another three years
another three years between entries. some stuff has happened. the store is still going, and i am still finding excuses to code and learn new things.
i wrote before about how i was converting scat from a frankenstein monster to a more modern php application built on a framework, which has more or less happened. there’s just a little bit of the monster left in there, and i need to work up the proper motivation to finish rooting it out.
i also took what was a separate online store application built on a different php framework and made it a different face of scat. it is still evolving and there’s bits that make it work that aren’t really reflected in the repository, but it’s in production and seems to sort of work, which has been gratifying to get accomplished. the interface for the online store doesn’t use any javascript or css frameworks. between that and running everything behind cloudflare, it’s much faster than it used to be.
big, heavy, and wood
justin mason flagged this article about "The log/event processing pipeline you can't have" a while back, and it has been on my mind ever since. our digital infrastructure is split across a few machines (virtual and not) and i often wish that i had a more cohesive way of collecting logs and doing even minimally interesting things with them.
i think the setup there is probably overkill for what i want, but i love the philosophy behind it. small, simple tools that fit together in an old-school unix way.
i set up an instance of graylog to play with a state-of-the-art log management tool, and it is actually pretty nice. the documentation around it is kind of terrible right now because the latest big release broke a lot of the recipes for processing logs.
right now, the path i am using for getting logs from nginx in a docker container to graylog involves nginx outputting JSON that gets double-encoded. it’s all very gross.
i think i am having a hard time finding the correct tooling for the gap between “i run everything on a single box” and “i have a lot of VC money to throw at an exponentially scalable system”. (while also avoiding AWS.)
(the very first post to this blog was the same ren & stimpy reference as the title of this post.)
the state of things
just over seven years ago, i mentioned that i had decided to switch over to using scat, the point of sale software that i had been knocking together in my spare time. it happened, and we have been using it while i continue to work on it in that copious spare time. the project page says “it is currently a very rough work-in-progress and not suitable for use by anyone.” and that's still true. perhaps even more true. (and absolutely true if you include the online store component.)
it is currently a frankenstein monster as i am (slowly) transforming it from an old-school php application to being built on the slim framework. i am using twig for templating, and using idiorm and paris as a database abstraction thing.
i am using docker containers for some things, but i have very mixed emotions about it. i started doing that because i was doing development on my macbook pro (15-inch early 2008) which is stuck on el capitan, but found it convenient to keep using them once i transitioned to doing development on the same local server where i'm running our production instance.
the way that docker gets stdout and stderr wrong constantly vexes me. (there may be reasonable technical reasons for it.)
i have been reading jamie zawinski’s blog for a very long time, all the way back to when it was on live journal, and also the blog for dna lounge, the nightclub he owns. the most recent post, about writing his own user account system for the club website, sounded very familiar.
character encoding is still hard?
email2webhook is nice in theory, but fell flat in handling basic character encoding.
that i still have to fight the same sort of issues that i was dealing with about 16(!) years ago is somehow not at all surprising.
i’ve switched to using zapier’s email to webhook features. it will probably bring a different set of challenges.
😎
how to fix eleven bugs in mysql 5.1
my “mysql client fixes” branch on launchpad contains fixes for eleven bugs (nine of them reported on bugs.mysql.com).
don’t get too excited — these are all the lowest priority-level bugs, mostly typos in comments and documentation.
now i have to figure out the latest process for actually getting these changes into the official tree. there are different policies around how and when to push to trees since i was last doing any server development. from someone who is partially outside, it all seems very tedious and designed to make it impossible to fix anything. process gone bad.
the mysql server isn’t going to get the benefits of using a good, open-source distributed revision control system unless it stops
bug tracking and code review
i was going to write some reactions to an observation that postgresql has no bug tracker and its discussion last week, but lost the spark and abandoned the post after a few days. but today i ran across a quote from linus torvalds that neatly sums up my thoughts:
We’ve always had some pending/unresolved issues, and I think that as our tracking gets better, there’s likely to be more of them. A number of bug-reports are either hard to reproduce (often including from the reporter) or end up without updates etc.
before there was a bug tracking system for mysql, there was a claim that all bugs were fixed in each release (or documented), and there has been a lot of pain in seeing how well that sort of claim stacks up against an actual growing repository of bug reports. if the postgresql project were to adopt a bug-tracking system, i am certain that they would find the same issue. before long, they would be planning bug triage days just like every other project with a bug-tracking system seems destined to do.
another good email from linus about these issues was pointed out by a coworker, but this part in particular caught my eye:
Same goes for “we should all just spend time looking at each others patches and trying to find bugs in them.” That’s not a solution, that’s a drug-induced dream you’re living in. And again, if I want to discuss dreams, I’d rather talk about my purple guy, and the bad things he does to the hedgehog that lives next door.
the procedure at mysql for code reviews is that either two other developers must review the patch, or one of an elite group of developers who are trusted to make single reviews. then the developer can push their changes into their team trees, taking care to have merged the changes correctly in anywhere from one to four versions (4.1 and up).
this is a huge amount of friction, and is one of the most significant problems causing pain for mysql development. two reviewers is just too high of a bar for most patches, and having the rule makes the reviews rote and less useful. there is also an unreasonable amount of distrust being displayed by this procedure, that says that developers can’t be trusted to ask for help when they are unsure, but should feel free to make the occasional mistake by pushing something that isn’t quite right.
i wonder if we could be taking better lessons from linux’s hierarchical development model, with the pulling of code up through lieutenants to a single main repository, rather than the existing model that involves every developer moving their own stones up the pyramid. it would require some developers (more senior ones, presumably) to spend more of their time doing code management as opposed to actual coding.
monty is not particularly happy with the state of development of his brainchild now. would he be happier if he were in a linus-like role of rolling up patches and managing releases?
i wish i had the patience to write about this at less length and with greater coherence.
don’t ask too many questions
chyrp is a nice looking piece of blog software. individual posts can have different styles, something it borrowed from the hosted tumblr service. i was interested to read about “the sql query massacre of january 19th, 2008” but the numbers gave me pause — 21 queries to generate the index page? that is down from an astounding 116, but that still seems ridiculous to me.
the number of queries to generate the index of this site? two. one of them is SET NAMES utf8. i could see boosting that to three or four if i moved some things like the list of links in the sidebar into the database, or added archive links. call it five if i had user accounts.
but right now, the number of queries used to load the index page on a chyrp site grows with the number of posts displayed on the front page. not only that, it grows by two times the number of posts on the front page.
chyrp could use a security audit, too.
what is 10% of php worth?
i am listed as one of the ten members of the php group. most of the php source code says it is copyright “the php group” (except for the zend engine stuff). the much-debated contributor license agreement for PDO2 involves the php group.
could i assign whatever rights (and responsibilities) my membership in the php group represents to someone else? how much should i try to get for it? i mean, if mysql was worth $1 billion....
i am still disappointed that a way of evolving the membership of the php group was never established.
connector/odbc 5.1.1 (beta!)
mysql connector/odbc 5.1.1-beta is available.
we didn’t implement all of the features in our original plan, but we decided to close out 5.1 to new features so that we could work on getting it to a production (GA) release as soon as possible.
5.1 still has its share of bugs, but we have tackled the most serious ones, and now that we are done with features (for the time being) we can focus on making the GA release shine.
now the race is on to see who gets out a 5.1 GA release first — the server or connector/odbc!
mac os x programming help needed
one of the features we had planned for mysql connector/odbc 5.1 is native setup libraries for the major platforms. we have the microsoft windows version going, and some code to get us going on linux/unix (using gtk instead of qt), but our gui team is too busy to get us started on a native mac os x version.
anyone want to pitch in by showing us how to get a basic dialog window to pop up based on a c library call? i think we will be able to customize it from there, but i am just unfamiliar enough with mac os x gui programming that i have a feeling it would take a long time for me to get that going.
connector/odbc 3.51.20 and 5.1.0
another month, another release of connector/odbc 3.51. there’s not a lot of bug fixes in this one, but we did manage to get the bug count under 80 bugs.
the reason there were fewer bug fixes in the release of 3.51.20 (other than there being fewer bugs to fix) was that we have been hard at work on connector/odbc 5.1.0, which builds on the 3.51 foundation to bring new functionality like unicode and descriptor support. there are more features planned, and you can see the release announcement for details. i hope that we’ll be able to keep on releasing new versions of 3.51 and 5.1 on a monthly basis.
connector/odbc 5.0 has met the same fate as the aborted 3.52 and 3.53 releases. it was an ambitious ground-up rewrite of the driver, but once we had put renewed efforts into getting the 3.51 code into better shape, it became clear that doing the same for a completely different code-base made little sense. we are going to be cherry-picking some of the 5.0 code for some of the new features.
i am sorry that we have been secretive about what was up with the future of 5.0, but we decided it was better to not talk about what was happening until we were confident about the decision to kill it.
connector/odbc 3.51.19
we managed to let a pretty significant regression sneak through in 3.51.18, so we’ve turned out a quick release of mysql connector/odbc 3.51.19. sorry for the hassle.
independence day for code
as i’ve been threatening to do for quite some time, i’ve finally made the source code for bugs.mysql.com available. it is not the prettiest code, and there’s still all sorts of hard-coded company-specific stuff in there. but it is free code, so stop complaining.
it is available as a bazaar repository at http://bugs.mysql.com/bzr/. i have not yet set up any sort of fancy web view, or mirrored it to launchpad.
i plan to do the same for the lists.mysql.com code some day. one limiting factor now is that machine only has python 2.3 on it, and bazaar needs python 2.4.
my five mysql wishes
jay pipes started with his five mysql wishes, and others have chimed in. i guess i may as well take a whack at it.
- connect by. yeah, yeah. it’s not standard. i don’t care.
- expose character-set conversions in the client library. all the code to convert between all of the character sets understood by the server is there, there’s just no public interface to it.
- online backup. it’s in progress, but this will make things so much better in so many ways. we could actually have reliable backups of bugs.mysql.com. and it’s going to make starting up new slaves so much easier in replication.
- re-learn how to ship software. the long release cycles of 5.0 and 5.1 have been pretty ridiculous, and i’m sure we can find a better way to add features without having to slog through months of bug-fixing to get a release to production quality. it is frustrating to ask for new features and have the fear that there won’t be a production release that includes them for another couple of years.
- fix planet mysql to handle utf-8. seriously, guys, it’s not that hard.
style="design: prettier"
i finally got tired of the index pages on the mysql mailing lists looking like ezmlm-cgi, so i cribbed some design from the perl mailing lists and now the by-thread index pages include who participated in a thread. i didn’t steal the pagination of busy months. yet.
i need to package up more of the bits of code driving the mysql mailing lists. there are some quirks, but i like the way it all fits together.
i also need to put in the few hours it would take to make it possible to post to the lists from the web interface.
angry programming
mysql doesn’t have quite the number of fancy internal applications that you might suspect, and i got frustrated when the company started to roll out a system of monthly time-off reports based on emailing around an excel spreadsheet. (to add icing to that cake, they kept sending out the excel sheet with password protection!)
last friday, i spent an afternoon cooking up this little proof-of-concept application that tracked the same information as the spreadsheet, but in tasty web format, with some ajax goodness (courtesy of prototype).
as it turns out, there was an official company tool for doing this that was in the works, but they hadn’t bothered to let anyone know it was imminent. i’m told it is sox-compliant and configurable six ways to sunday. i haven’t seen it yet.
so that my meager efforts did not go to waste, i spent another half hour to make this a standalone demo (rather than tying it into our internal personnel database). perhaps someone else can find some use for it, or take some inspiration from it.
here’s the simple workflow for the application:
- employee clicks on days they took off in a month.
- employee clicks button to get month approved, which sends email to boss.
- boss reads email and follows link to view the report online.
- boss clicks the button to approve the report, which sends mail to the employee and the finance department.
- the finance department does whatever it does with the data. the employee can no longer change it.
obviously that’s not quite all you would want for a fully-functional application, but it is most of the way there. i think it’s already better than the system that involved emailing an excel spreadsheet around.
jobs at mysql
mysql has quite a few open job listings. some positions of note: web developer, support engineer, maintenance developer, qa engineer, and performance architect. all of these positions are available world-wide, so you get to work from home. some of the other jobs from the full list are location-specific.
if you mention that i referred you for some of these positions and are then hired, i get some sort of referral bonus.
backcountry programming
i’m back to doing some work on connector/odbc, fixing bugs in the “stable” version (otherwise known as 3.51.x). we have another project going that is a ground-up rewrite of the driver, but i’m not really involved with that.
the state of the stable version of the driver is pretty sad. i keep running into pockets of code that are truly frightening. my first big foray was a bug i reported when i was trying to get the test suite to run, and it basically entailed throwing out a function that made no sense and replacing it with code that has such niceties as comments.
as i’ve started into the catalog functions, starting from this ancient bug, i’m finding even more frightening (and untested) code.
my general goal is to leave things cleaner than i’ve found them, doing things as incrementally as i can. we’re going to be building out a richer test suite, which will be a tremendous help, both in getting the “stable” version of the driver into better shape, and proving the capabilities of the rewrite.
i know it has been a long time since the last connector/odbc 3.51 release — kent, one of the build team members, is working on scripting the building, testing, and packaging so that we can crank out builds more consistently and reliably. unfortunately, a lot of the magic incantations were lost as people moved on to other work or other companies. the days of connector/odbc being the neglected stepchild of mysql products may be coming to an end.
colobus on google code
i created a google code project for colobus and imported the old releases into the repository. i’m not a huge subversion fan, but it didn’t make much sense to stick with bitkeeper with no free version available, and making system administration google’s problem seemed like a good thing.
i’ve folded in a couple of patches that were sitting in my inbox, and have another bigger one to put the group information into the database.
being known for being you
mike kruckenberg shared his observations from watching mysql source code commits, and jay pipes commented about this commit from antony curtis which had him excited. now that’s how open source is supposed to work, at least in part.
i replied to a later version of that commit to our internal developer list (and antony), pointing out that with just a little effort the comment would be more useful to people outside of the development team. “plugin server variables” doesn’t really do it justice, and “WL 2936” is useful to people who can access our internal task tracking tool, but does no good to people like mike.
the other reason it is good to engage the community like this is because it is very healthy for your own future. being able to point to the work i had done on open source and the networking that came from that have both been key factors in getting jobs for me. i’m sure it will be useful next time i am looking, too.
joining activerecord with mysql 5
dhh committed a patch for activerecord to make it work with mysql 5 that was subsequently reverted because it broke things on postgres and sqlite.
obviously we’d like ruby on rails to work with mysql 5, but because there was no test case committed along with either of these changes, i don’t really know the root cause of the problem. dhh claims it is the changes that made mysql conform to the standard sql join syntax, but i can’t evaluate that because i can’t reproduce the problem.
any activerecord gurus want to point me in the right direction?
better out-of-the-box mysql support for ruby on rails
activerecord now supports mysql 4.1 (and later) out of the box whether you are using new or old-style passwords, because they applied my patch for handling the related protocol changes correctly.
(it’s not quite out-of-the-box yet — the fix will appear in the next major release of rails, i guess. it’s fixed in their repository.)
now if only the upstream developer would show signs of life, and get that fixed. i’d complain about that more, but there’s a lot of windows around here.
don’t bother paddling upstream
so it turns out that the ruby on rails developers had already added 4.1 authentication support for their bundled version of ruby/mysql, but they’ve found the upstream maintainer as unresponsive as i have. their implementation wasn’t quite complete, so i’ve submitted a patch to round it out.
the version included with ruby on rails doesn’t include the test suite, though.
more ruby/mysql love
i’ve updated my patch for new-style mysql authentication for ruby/mysql, with a new test case for the change_user method (and support for same with new authentication).
i’ve even tested this against a 4.0 server, so i’m pretty sure i didn’t break anything.
new-style mysql authentication in pure ruby
ruby has two modules for connecting to mysql. one is called mysql/ruby and is built on top of the standard libmysqlclient c library. the other is called ruby/mysql and is pure ruby. the problem with the latter is that it is a from-scratch implementation of the mysql network protocol, and the authentication handshake changed in mysql 4.1.
but here is a patch to add support for new-style mysql authentication to ruby/mysql. it should also deal with the other protocol changes that came along at the same time. it doesn’t do anything to expose server-side prepared statements.
it is only lightly tested. in particular, i haven’t tried to connect to a pre-4.1 version of the server. it should still work, but it is entirely possible i screwed it up. i’m also still just learning ruby, so there are some ugly bits.
i think having more from-scratch implementations of the protocol is a good thing. there are at least five — the server code itself (and client library), connector/j, connector/net, ruby/mysql, and Net::MySQL (perl). once we have these all collected into an integration test suite, the server developers will get much better feedback when they play fast and loose with the protocol.
where should i be lurking?
trying to find places where people talk about using python, ruby, and php with mysql has been a bit of a challenge.
the problem on the php side is that the php forum on forums.mysql.com is so filled with pre-beginner-level questions that it’s barely worth my time to dig through it.
for python, the python forum on forums.mysql.com is nearly a ghost town. the forums for the mysql-python project seem slightly active, but the sourceforge forum interface is just bad. (not that any web-based forum isn’t starting from a bad place.) the db-sig mail archives also have some interesting discussions.
for ruby, the ruby forum on forums.mysql.com is even quieter than the python one, and i haven’t found anywhere else.
another thing i’ll take a look at is apr_dbd_mysql, which is not part of the main apr-util repository because of licensing issues (ugh).
where else should i be looking?
acronyms
tim bray coined MARS: it stands for “mysql + apache + ruby + solaris.” (get the shirt.)
bill de hóra proposed MADD: “mysql + apache + django + debian.”
when forwarding the above to an internal mailing list at mysql, i proposed MAUDE: “mysql + apache + ubuntu + django + eclipse.” the logo would be a picture of bea arthur, of course.
but mårten mickos, ceo of mysql, came up with MARTEN: “mysql + apache + ruby + tomcat + eclipse + nagios.”
or would that be åpache?
one month to go
the mysql users conference 2006 is only a month away. i’m just going to be dropping in for one day to give two talks — “embedding mysql” and “practical i18n with php and mysql.”
there is also a great lineup of other speakers, tutorials, and keynotes. i’m going to miss the keynote by mark shuttleworth, but i am looking forward to the keynote by the founder of rightnow.
the truth is out there
i talked at scale 4x, and you can download the exciting slides. the picture is of future oracle employee and zend co-founder andi gutmans, and there are a few more pictures from the first day.
(neither andi nor dave from sleepycat admitted to the imminent acquisitions of their companies by oracle.)
numbers singly
the “generally available” (or production-ready) version of mysql 5.0 was officially released today, and kai voigt, one of the mysql trainers, has posted a sudoku solver written as an sql stored procedure. the solver was actually written by per-erik marten, the main implementor of mysql’s stored procedure support.
it’s probably not the best showcase of stored procedures, but it is a nifty little hack.
matt sergeant’s article about using qpsmtpd is noteworthy for reasons other than it has my name in it.
i’m still running a pretty minimal set of qpsmtpd plugins since i upgraded my server to ubuntu. my main source of spam is my old college address, which is so ancient that it is deeply embedded in the mailing lists that spammers swap. and apparently they aren’t running very good antispam software at hmc. (i think they expect each user to set up spamassassin on their own.)
here’s an interesting tidbit i picked up from the hmc cs department site: a co-inventor of sql, don chamberlin, is a fellow alum.
stefan esser has dug up and fixed more php xml-rpc vulnerabilities, and best of all, has worked with the package maintainers to purge them of their use of eval().
stefan can be a bit of a blowhard, but it’s excellent work like this that makes that easier to swallow.
more jobs at mysql
it occurred to me that i mentioned the product engineer position, but there are a number of other jobs at mysql that are open, including web developer.
the race is on
stefan esser dissects one tip in a bad article about php, but is merciful in leaving the others alone. one thing you’ll note if you line up this second article with the first article is that not only are the tips not very good, the author can’t count to ten.
and on the useful-php-news front, andrei’s unicode work* has landed in the php development tree, and rasmus sparked a long discussion of other php6 features. the perennially lost cause of trying to rename functions and change their argument order resurfaces, of course, but it doesn’t look like anyone is taking it all that seriously.
the race is now on between perl6 and php6.
* other people have been involved, i’m sure. i just don’t know who they are.
damian conway’s “ten essential development practices” article (via daring fireball) may appear on perl.com, but the basics are applicable to any software project.
i would put “use a revision control system” way at the top of the list, and i would also add “use a bug-tracking system.”
there was a hole in the pear xml-rpc package, and as a result many php-based applications had a security hole, such as the many php blogging apps.
the thing is, this came about because the xml-rpc library builds up some code and calls eval(). whoever wrote code to parse xml-rpc by building code and calling eval() should have their computer taken away. and then possibly be beaten with it.
the pear code is actually a fork of edd dumbill’s php xml-rpc code, and this is not the first security hole that has been discovered in that code as a result of this positively shameful architecture. i will not be at all surprised if it is not the last.
and for those keeping score at home, i pointed out how dumb this was almost four years ago.
a few resources
here are a few resources that someone may find helpful:
- php’s htmlspecialchars() function, useful for encoding user input that may contain characters like <
- php’s addslashes() function, useful for escaping user input for putting into an sql query (even better is to use a parameter-based query api)
- a list of the top ten php security vulnerabilities
and don’t forget that in php, variables like $_SERVER['REQUEST_URI'] and $_SERVER['HTTP_REFERER'] are user input.
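a couple of minimal examples of how those fit together (made-up variable and table names, just a sketch):

// encode user input before echoing it into html
$name = $_GET['name'];
echo "<p>hello, " . htmlspecialchars($name) . "</p>";

// escape user input before splicing it into sql
// (a parameter-based query api is still the better option)
$query = "SELECT * FROM users WHERE name = '" . addslashes($name) . "'";

// these are user input too
$uri     = htmlspecialchars($_SERVER['REQUEST_URI']);
$referer = htmlspecialchars($_SERVER['HTTP_REFERER']);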
happy birthday, php
just ten short years ago, php appeared on the scene.
the first time i wrote any php code was about eight or nine years ago. the most recent was about eight or nine minutes ago.
thanks to everyone who has made that all possible. especially, of course, rasmus, who we blame it all on. (well, most of us do.)
short tags and other php coding things
i like php’s short tags. i feel sad for people who feel they need to use the ‘<?php’ construct all the time. or worse, ‘<?php echo’ where a ‘<?=’ will do.
one part of my always-evolving personal php coding style is how i embed sql statements into my code. i used to generally do it like:
$query = "SELECT id,name,url,rss,md5sum,method,updated AS up," . " UNIX_TIMESTAMP(lastchecked) AS lastchecked," . " UNIX_TIMESTAMP(updated) AS updated" . " FROM blogs " . " WHERE updated > NOW() - INTERVAL 10 MINUTE AND method = 0" . " ORDER BY up DESC" . " LIMIT 10";
but lately i’ve been doing:
$query= "SELECT id,name,url,rss,md5sum,method,updated AS up, UNIX_TIMESTAMP(lastchecked) AS lastchecked, UNIX_TIMESTAMP(updated) AS updated FROM blogs WHERE updated > NOW() - INTERVAL 10 MINUTE AND method = 0 ORDER BY up DESC LIMIT 10 ";
it makes it easier to cut-and-paste into the mysql client for testing.
O_NONBLOCK broken on darwin and irix?
i’ve been dealing with a mysql bug on irix and mac os x that turned up in our test suite once i fixed the kill test to actually do the test properly. after much digging in code and staring at debug traces, i noticed on irix that in the thread that is being killed, it was stuck in a blocking read that the calling code believed would be non-blocking.
by changing our setting of the non-blocking mode to use both O_NDELAY and O_NONBLOCK instead of just O_NONBLOCK, i was able to get the code to work. but i’m not sure why it is necessary.
on the bright side, this may also fix this bug about wait_timeout not working on mac os x.
i may not be doing web development for my day job any more, but i put a little more elbow grease into the mysql bugs database to add two new features that people have asked for at various times: subscribing to updates on bugs, and making private comments. i also cleaned up the database structure a bit. for example, instead of storing email addresses for the assigned developer and reviewer, it actually has a proper link to the user database.
it’s not a particularly pretty code base (although i clean it up as i go), but i’m rather fond of this little bugs system.
cnet’s coverage of the bitkeeper kerfuffle revealed the osdl employee who drove the wedge: andrew tridgell, of samba fame.
bitmover is dropping the free version of bitkeeper, which is a shame. i’m not sure that we have decided what to do. i wish bazaar ng was a little further along. it looks like it is shaping up to be the best-of-breed of the new generation of open-source version control systems.
i got suckered into agreeing to pick up some slack at this year’s mysql user conference, and will be giving a talk on “embedded mysql.”
the conference should be a lot of fun this year — from what i’ve heard, the number of signups has been huge, and we’re still four weeks away. one of the tutorials, advanced mysql performance optimization, is already sold out. being back in the bay area is definitely a good thing for this sort of conference.
wait, did someone just say four weeks away? i guess i need to figure out what this embedded mysql stuff is all about.
i’ll also be doing what is called guru best practices: php with andi gutmans.
URI::Fetch is a new perl module from ben trott (of movable type renown) that does compression, ETag, and last-modified handling when retrieving web resources. the lazyweb delivers again.
speaking of that, i found i had to do one additional thing to my php code that fetches pages, because the version of curl i’m using lacks a workaround for certain server bugs. so when blo.gs fetches a page to verify a ping and gets a particular compression-related error, it goes back out and requests the page again without compression.
city of angels to adopt open source?
a few los angeles city councilmembers have introduced a measure to have the city study using open-source software, and putting the possible money saved towards hiring new police officers. it sounds like a great plan, and i hope to get around to writing my city councilmember soon to encourage her to support the motion.
speaking of my city councilmember, i have gotten four calls from her campaign in the last few days. one of them was actually from the councilmember herself (before this open-source motion came up) due to some sort of mix-up by her campaign staff that led her to believe i had some issue i wanted to discuss. as i was sucking on the world of warcrack pipe at the time, i was in no mood to talk to her. then today was call number four, and i pointed out to the caller that if they called me again, i would almost certainly not vote for her in the upcoming primary. (the only other call i’ve gotten is from the bernard parks mayoral campaign.)
repeating myself
for the blo.gs cloud service, i had written a little server in php that took input from connections on a unix domain socket and repeated it out to a number of external tcp connections. but it would keep getting stuck when someone wasn’t reading data fast enough, and i couldn’t figure out how to get the php code to handle that.
so i rewrote it in c. it’s only 274 lines of code, and it seems to work just fine. it was actually a pretty straightforward port from php to c, although i had to spend some quality time refreshing my memory on how all the various socket functions work.
there’s a visible bump in the graph of my outgoing bandwidth from after this was fixed.
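for the curious, here is a rough sketch (not the original script, and with made-up socket paths and ports) of how the same fan-out could be structured in php using stream_select() and per-client write buffers, so that one slow reader can’t stall everyone else:

// listen on a unix socket for messages to repeat, and on a tcp port for subscribers
$input = stream_socket_server('unix:///tmp/repeater.sock', $errno, $errstr);
$tcp   = stream_socket_server('tcp://0.0.0.0:8042', $errno, $errstr);

$clients = array();   // resource id => subscriber stream
$buffers = array();   // resource id => bytes still owed to that subscriber

while (true) {
    $read   = array_merge(array($input, $tcp), array_values($clients));
    $write  = array();
    $except = null;
    foreach ($buffers as $id => $buf) {
        if ($buf !== '') { $write[] = $clients[$id]; }   // only poll writability when there is something to send
    }
    if (stream_select($read, $write, $except, null) === false) { break; }

    foreach ($read as $stream) {
        if ($stream === $tcp) {                          // new subscriber
            $conn = stream_socket_accept($tcp);
            stream_set_blocking($conn, false);
            $clients[(int)$conn] = $conn;
            $buffers[(int)$conn] = '';
        } elseif ($stream === $input) {                  // new message (assumes the writer closes after each one)
            $conn = stream_socket_accept($input);
            $msg  = stream_get_contents($conn);
            fclose($conn);
            foreach ($buffers as $id => $buf) { $buffers[$id] = $buf . $msg; }
        } else {                                         // a subscriber hung up (or sent junk)
            $id = (int)$stream;
            if (fread($stream, 8192) === '' && feof($stream)) {
                fclose($stream);
                unset($clients[$id], $buffers[$id]);
            }
        }
    }
    foreach ($write as $stream) {                        // send as much as each subscriber will take
        $id = (int)$stream;
        if (!isset($clients[$id])) { continue; }
        $n = @fwrite($stream, $buffers[$id]);
        if ($n === false) {
            fclose($stream);
            unset($clients[$id], $buffers[$id]);
        } else {
            $buffers[$id] = (string)substr($buffers[$id], $n);
        }
    }
}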
new job
after the first of the year, i’ll be starting my new job. but as seems to be the trend these days, it’s not a new job with a new company, just a new job with the same company.
i’m getting out of doing web development, at least for my day job. that’s why mysql ab is hiring a webmaster (which isn’t exactly the job i have now, but it is basically the person who will take on the biggest chunk of what i was doing).
what i’m going to be doing is joining the development team, with my initial focus being maintenance programming for the server. i’m going back to my roots, and getting my hands dirty with “real” programming again. and i don’t think there’s any better way to learn the ins-and-outs of a system than chasing down bugs. just fixing the bug in how CREATE TABLE ... SELECT statements were logged for replication gave me a good reason to get up-to-speed on several aspects of how things work under the hood.
this article by rands about the type of employee who has gotten locked into a role goes part of the way in explaining why i’m moving on from my current position. even if trying to become irreplaceable by being the only one who knows how to do something is not your goal, it is easy for that to happen by default if you’re in the same position for too long. so i hope that shaking things up will be good for the company as a whole, and not just for my own mental health.
one thing i’ll likely do early in the new year is get a new machine for doing development. i’m thinking of an athlon64 shuttle system, which i can get pretty loaded within my annual work computer budget. i may also upgrade my desktop (which is a personal machine) so that i can use the monitor with the development box when necessary (although it would run headless most of the time, and i doubt i’ll spring for a kvm or anything fancy like that). instead of actually getting a new desktop machine, one possibility is just selling the 17" imac and getting an apple cinema display and using that with my laptop (and the development machine).
(the fact that said development machine would likely be powerful enough to run world of warcraft well is entirely coincidental.)
enemies of carlotta is a mailing list manager in the style of ezmlm, but written in python by lars wirzenius. one problem with it is that it is written as a pretty monolithic application, as opposed to ezmlm’s series of commands that are run for a few different addresses. but there’s some interesting design decisions made. it doesn’t implement digests yet.
one of my biggest annoyances with ezmlm these days is that the digest generation is not character-encoding aware. so for a list like the mysql japanese list, the digests, particularly the plain-text one, look like garbage. this is more frustrating because i spent a fair amount of time making sure the web-based archive got the encoding issues right.
the mysql lists are set up so that both mime-encoded and plain-text digests are generated, using a dummy list and some judicious symlinks. when we took over the maxdb lists from sap, the existing lists only had a plain-text format, and the subscribers clamored for that when we only had the mime-encoded versions available.
three out of four ain’t bad
mysql 4.1.8 is out and it includes a fix for the bug that had been plaguing blo.gs. it also contains a fix i made for another bug.
i now have code in linux, mysql, and php. if only my patch to apache had been accepted, i’d have code in the whole LAMP stack. (the CookieDomain configuration setting was finally added about two years later, but not using my patch.)
more later.
another planet
planet mysql collects the blogs of various mysql employees and community members. (well, the only community members included right now are ex-employees.)
“X-Mailer: RLSP Mailer” appears to be a highly reliable indicator for spam, at least judging by the 250 or so messages i’ve gotten with that header in the last several months, which appear to all be variants of lottery and 419 spam. one place it comes up in a google search is the source for myphpnuke. i wonder if there’s a connection.
that reminds me: i should start using the spamassassin backport, to join the world of spamassassin 3.0. something to add to the list of things to play with over the long holiday weekend.
stealing an idea (and four lines of code) from shelley powers, i’ve implemented a very basic live comment preview. i need to read up on this xml http request object thing (which this does not use) to try doing other and more clever things. (christian stocker’s livesearch is a good example of clever xml http request object usage.)
that was easy. i bolted on the basic livesearch here. the integration could (and maybe someday will) be improved, but it was quite easy to get going.
layers
BEA’s Apache Beehive Hits Milestone
Officials at BEA Systems announced the first milestone release of their open source project, Apache Beehive.
i’m shocked, shocked to find that gambling is going on in here!
the aside on john lim’s php weblog about the “wintermute” nickname makes me laugh. i always treat it as shorthand for “someone who can be safely ignored because they thought they were clever for using a nickname from a classic cyberpunk book.”
mysql users conference call for papers extended
the conference website doesn’t reflect the new date yet, but the call for papers for the mysql users conference is being extended until november 15. so you have a second chance if you missed the initial call for papers, and are just hearing about it now. the speaker notification date isn’t changing. we just came to our senses and realized we didn’t need three whole weeks to make decisions on which sessions and tutorials to accept.
with the way things are going, we may have so many great talks that i don’t need to do one.
more on the uc2005 call for papers
i’m the chair for the lamp track, and would gladly accept bribes for submitted proposals. the best bribe is the submission of a good tutorial or session proposal. here’s the summary of the lamp track:
LAMP
This track is for anyone who is already using or considering making the open source LAMP (Linux, Apache, MySQL, PHP/Perl/Python) stack a core component of their software infrastructure. How is the LAMP stack being used to power massive websites while maintaining a cost-effective TCO?
Topics:
- New LAMP Releases and Features
- Application Architectures
- Best Practices
- Case Studies
- LAMP related components (FreeBSD, NetWare, Windows)
there’s a new site opposing software patents in the european union called, appropriately enough, no software patents! the very clever might notice that the domain is registered in my name (along with the .org and .net variants), which is just a side effect of how the project got initiated within mysql (whose position on patents is online, incidentally). my involvement basically ended at registering the domain names, and they will be getting transferred to someone else soon.
eventum
i have been terribly remiss in not mentioning eventum before. it’s the issue-tracking tool that the mysql support team uses, and it is also used for task-tracking by a growing number of groups within the company. we liked it so much, we hired the author, bought out the software, and got it released under the gpl.
post chaining
powerblogs, the closed-source, hosted blog application that a few well-known political blogs use (among others, presumably) introduced a feature recently called “post chaining”. it’s sort of an on-the-fly categorization, where the posts in the chain are automatically linked to by all of the other entries in the chain. so if i provided a link to this posting in the middle of a chain about jimmy swaggart’s stupid kill-gay-people remark, you can easily find what earlier and later posts on the same issue had to say.
theoretically, saving a document as a pdf file is easy on mac os x. as a practical matter, the option is grayed out in the print box when i try to do it. and if i go into the output options, it won't allow me to save as pdf from there, either (but it will allow me to save as postscript).
i do remember being able to do this once upon a time. i think it stopped working sometime after i reinstalled with panther.
i couldn’t come up with an excuse to include an omnigraffle graphic.
flow
i wrote nearly 1200 words this afternoon. it is one of those things where getting started is the hard part. and, i guess, going back and editing and rewriting to make what is written actually coherent. but i’ll experience more of that tomorrow.
is there anything about unicode and character set support in mysql 4.1 that you want explained? now would be a good time to tell me.
yummy
after meeting joshua schachter at foo camp, i decided that i should take another look at del.icio.us, which he created. i’d consider it a bit of a spiritual cousin of blo.gs, in that it is basically not-for-profit, and a way for him to blow off creative steam.
here’s my del.icio.us page, which i may at some point figure out a way to incorporate here.
i am definitely planning on revamping the crude category system i wrote for this blog to be tag-based. tags are what all the cool kids are doing these days.
php vs. perl for web development
joe johnston explains why php is more popular than mod_perl for web development. the short answer is that they solve different problems.
i've been thinking about this with regard to python recently. i'd love to learn more python, and use it in the web space, but mod_python is more like mod_perl than php, and when i'm developing web stuff, my thinking matches the php model more closely than the mod_(python|perl) model.
more on the cloud
ben hyde writes very smart things about collaborative model synchronization based on my earlier post about decentralized notifications and content distribution.
the privacy issue is something i forgot to mention, but is definitely another factor to consider. (i’m not sure it is a critical issue, but my perspective is likely skewed by how public i am with my list of subscribed blogs.)
here’s another: how does the publisher know how many people are following what they write? (again, not something i personally feel is critical, but i also rarely even look at the stats or logs of my sites.)
decentralized web(site|log) update notifications and content distribution
this is something that has been on my mind lately, and hope to talk about with smart people this weekend. (“the first rule is...”)
in a bit of interesting timing, this little software company in redmond recently hit the wall in dealing with feeding rss to zillions of clients on one of their sites.
in preparation, i’ve been digging into info on some of the p2p frameworks out there. the most promising thing i’ve come across is scribe. the disappointing thing (for me) is that it is built with java, which limits my ability to play with it.
while it would be tempting to think merely about update notifications, that just doesn’t go far enough. even if you eliminated all of the polling that rss and atom aggregation clients did, you would have just traded it for a thundering-herd problem when a notification was sent out. (this is the problem that shrook’s distributed checking has, aside from the distribution of notifications not being distributed.)
the atom-syntax list has a long thread on the issue of bandwidth consumption of rss/atom feeds, and bob wyman is clearly exploring some of the same edges of the space as me.
maybe it’s useful to sketch out a scenario of how i envision this working: i decide to track a site like boing boing, so i subscribe to it using my aggregation client. when it subscribes, it gets a public key (probably something i fetch from their server, perhaps embedded in the rss/atom feed). my client then hooks into the notification-and-content-distribution-network-in-the-sky, and says “hey, give me updates about boingboing”. later, the fine folks at boing boing (or xeni) post something, and because they’re using fancy new software that supports this mythical decentralized distribution system, it pushes the entry into the cloud. the update circulates through the cloud, reaching me in a nice ln(n) sort of way. my client then checks that the signature actually matches the public key i got earlier, and goes ahead and displays the content to me, fresh from the oven.
another scenario: now when i subscribe to jeremy zawodny’s blog (he has been slow to update his weblog software in my hypothetical scenario, because he’s too busy learning how to fly airplanes), i don’t get updates whenever he publishes. but there are enough other readers running this cloud-enabled aggregation software that when they decide they haven’t seen an update recently, they go ahead and poll his site. and when they notice an update, they inject it into the cloud. or they even notify the cloud that there hasn’t been an update.
obviously that second situation is much less ideal: there’s no signature, so some bozo could start injecting “postgresql is great!” entries into the jeremy zawodny feed space. or someone could just feed in “nothing changed” messages, resulting in updates not getting noticed. the latter is fairly easy to deal with (add a bit of fuzzy logic, where clients sometimes decide to check for themselves even when they’ve been told nothing is new), but i’m not so sure about the forgery problem in the absence of some sort of signing mechanism.
in addition to notification, a nice feature for this cloud to have would be caching. that way when i wake up my machine in the morning, the updates i’ve missed can stream in from the network of peers who have been awake, and i don’t have to bother the original sites.
i don’t think there is going to be a quick and easy solution to this, but i hope to aid in the bootstrapping. if nothing else, blo.gs can certainly gateway what it knows about blog updates into whatever system materializes. (it certainly can’t scale any worse than the existing cloud interface, which is pretty inefficient given the rate that pings are coming in.)
a footnote on the signing mechanism: there’s the xml-signature syntax and processing specification that covers this. i haven’t really looked at it in detail to know what parts of the problem it solves or does not solve.
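the client-side check in the first scenario is not much code either; here is a sketch using php’s openssl extension (made-up variable names):

// $public_key_pem was fetched from the publisher when we subscribed;
// $entry and $signature arrived together through the cloud
$key = openssl_pkey_get_public($public_key_pem);
if (openssl_verify($entry, $signature, $key, OPENSSL_ALGO_SHA1) === 1) {
    // signed by the key we already trust: display it
} else {
    // forged or mangled: ignore it, or fall back to polling the site directly
}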
(anybody who suggests bittorrent as a key component of the solution will have to work much harder to get a passing grade.)
mailing list wishlist
justin mason has a mailing list wishlist. the ezmlm-based system for the mysql mailing lists does the archive-permalink thing. it is added to the message as the List-Archive header. (maybe that is an abuse of that header, but it seems more relevant than just putting the link to the main archive in the header.)
there are a number of things i don’t like about ezmlm, but the biggest advantage is that it is decomposed into enough distinct little bits that it is not difficult to rip out and replace specific bits. for example, you can replace the subscription confirmation (and make it web-based, and not vulnerable to stupid autoresponders subscribing themselves) by just adding a program into the manager that handles them before they get to ezmlm-request and ezmlm-manage.
i haven’t spent a lot of time futzing with mailman, but i’ve never really cared for it as a mailing list user.
but i’m not sure it really matters. all the kids are crazy about web-based forums these days. people who recognize the superiority of mailing lists are dinosaurs.
xplanet desktop background for mac os x
justin mason provides desktop background images generated using xplanet and satellite cloud data.
i couldn’t get the recommended mac os x tool (geektool) to work, so i came up with a lower-tech solution. i created a folder ~/Pictures/Backgrounds/, and set up a cron job to pull down the latest image every hour (using curl, and limited to the hours i’m likely to be awake to avoid some unnecessary traffic). and then in the system preferences, i set up the background to change picture every 5 minutes, with the folder i created selected. since there is only ever one picture in there, it just reloads that image.
it’s not quite the ideal solution (it would be nice to be able to just signal that the image should be reloaded after it is updated, rather than having it do it every five minutes), but what i did was easy to set up.
the image is fascinating. you can see all sorts of other tropical storms that you don’t hear about in the news, and right now you can see a cloud front moving across the midwest.
new colobus release
i popped out a new colobus release. nothing exciting, just some performance tweaking to take better advantage of the database back-end.
ballistic
in the race of who would snap first and rewrite all of ezmlm in perl, it looks like ask has jumped to the head of the pack. now i just have to find some time to play around with it (and pitch in — i’d particularly like to implement flexible digest generation that wasn’t oblivious to character sets).
failed password for root from ...
what is with the recent uptick in failed ssh logins everywhere? a few weeks ago, i almost never got emails from the automatic log watchers about these; now i get at least one or two a day, all from different ip addresses. usually they’re attempted root logins, but sometimes they’re attempts to log in as other role accounts (like bin).
for the record
rasmus first implemented the handling of urls like http://php.net/base64_encode on october 6, 2000. (i did something similar for urls like http://mysql.com/select on march 11, 2003.)
i’m thinking that this might be a good topic for an article for the mysql developer zone.
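the trick is pretty small; here is a sketch (with made-up paths and urls) of the sort of thing you would wire up as the 404 handler:

// if /foo looks like a function name and the manual has a page for it, go there
$func = trim(parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH), '/');
$page = 'function.' . str_replace('_', '-', strtolower($func)) . '.php';
if (preg_match('/^[a-z0-9_]+$/i', $func) && file_exists('/var/www/manual/en/' . $page)) {
    header('Location: http://www.example.com/manual/en/' . $page);
    exit;
}
header('HTTP/1.0 404 Not Found');
echo 'no such page';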
generating a last-modified time from php
while the getlastmod() function can tell you when the main file was last modified, it would be cool if php kept track of the most recent last modification time of all included files, assuming that it is already doing a stat() on each file as it includes it.
the value may not always be directly applicable (sometimes you are pulling in other data, or database information), but it would be useful. i guess you would want another function to inject possible timestamps into the mix.
the alternative is to iterate over the results of get_included_files() and stat() all of them.
or i guess you could just live with getlastmod(), and ignore the fact that it isn’t accurate when you do something like change the header that you’re including via some other mechanism.
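the iterate-and-stat() version is only a handful of lines (a sketch, using filemtime()):

function latest_modification() {
    $latest = getlastmod();
    foreach (get_included_files() as $file) {
        $mtime = filemtime($file);
        if ($mtime !== false && $mtime > $latest) {
            $latest = $mtime;
        }
    }
    return $latest;
}

header('Last-Modified: ' . gmdate('D, d M Y H:i:s', latest_modification()) . ' GMT');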
uppsala calling
i picked up a grandstream budgetone 101 to use with the sip (voip) server we have set up at work. i just plugged in the server, username, and password info, and now i can talk with my colleagues all over the world, without long distance charges. pretty nifty. (well, it will be more nifty once more of said colleagues also get their hands on voip phones, or headsets with a softphone.)
i do wish it were a dual-line phone that handled pots in addition to voip.
when i’m bored, i can dial the echo server number and talk to myself via a server in sweden, which has a pleasant perversity to it.
错
reading php 5: a sign that php could soon be owned by sun requires the installation of a tin-foil hat, and buying in to the premise that zend == php. (via harry fuecks.)
i don’t even know where to start with a statement like this:
Some very useful functions have been added to PHP5. It’s been nine years in the making, but PHP5 now includes two functions to uuencode and uudecode. Combining those functions with the new socket and stream functions, developers can create a lots of "kewl" applications. An application to automatically encode and decode files to and from news servers comes to mind as an example of how to incorporate these new functions.
java does not appear to have built in uuencode and uudecode functions. clearly php is superior! (you see, i’m being sarcastic....)
on a slight tangent, is it just me, or could the migrating to php 5 section of the php manual use a once-over by someone with a firmer grasp of english grammar? (no disrespect to the authors meant, it just has a surprising number of clunky statements.)
the funny bit about danny o’brien's notes on andy oram’s talk at oscon is that he’s clearly the person who was entering just as i was leaving. andy had just pulled up the slide about trackback when i stepped out. (i had a flight to catch.)
validating utf-8 by regex
in actually using the regex in this w3c faq, i noticed that it has a few typos: the first three escapes are missing the 'x' to put them into hex. i’ve let the author know. the corrected example:
$field =~
m/^(
[\x09\x0A\x0D\x20-\x7E] # ASCII
| [\xC2-\xDF][\x80-\xBF] # non-overlong 2-byte
| \xE0[\xA0-\xBF][\x80-\xBF] # excluding overlongs
| [\xE1-\xEC\xEE\xEF][\x80-\xBF]{2} # straight 3-byte
| \xED[\x80-\x9F][\x80-\xBF] # excluding surrogates
| \xF0[\x90-\xBF][\x80-\xBF]{2} # planes 1-3
| [\xF1-\xF3][\x80-\xBF]{3} # planes 4-15
| \xF4[\x80-\x8F][\x80-\xBF]{2} # plane 16
)*$/x;
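since most of what i write is php, here is the same check ported to preg_match() (a sketch; the byte ranges are identical to the perl version above):

function is_valid_utf8($field) {
    return preg_match('/^(?:
        [\x09\x0A\x0D\x20-\x7E]            # ASCII
      | [\xC2-\xDF][\x80-\xBF]             # non-overlong 2-byte
      | \xE0[\xA0-\xBF][\x80-\xBF]         # excluding overlongs
      | [\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}  # straight 3-byte
      | \xED[\x80-\x9F][\x80-\xBF]         # excluding surrogates
      | \xF0[\x90-\xBF][\x80-\xBF]{2}      # planes 1-3
      | [\xF1-\xF3][\x80-\xBF]{3}          # planes 4-15
      | \xF4[\x80-\x8F][\x80-\xBF]{2}      # plane 16
    )*$/x', $field) === 1;
}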
getting the right abstraction from your database abstraction layer
jeremy zawodny rails against database abstraction layers, particularly those that aim to provide database independence. the abstraction layer that i use is really just aimed at making it so i can do more with less code. it’s inspired a bit by the PEAR DB layer, but without the database independence cruft. it means i can write $db->get_assoc("SELECT foo, bar FROM users WHERE id = ?", $id) instead of having the same four lines of query building and row fetching all over the place.
this is actually the object-oriented version of a procedural layer that i used to use. i needed to be able to use it in situations where i may have multiple connections to different databases open, so the old abstraction layer got painted with a thin OO veneer. the mysql extension’s handling of a default connection is useful for the lazy programmer, but doesn’t mix well with functions that take a variable number of arguments.
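for illustration only (this is not the actual class, just the shape of it), something like this gets you most of the way there on top of the plain mysql extension:

class db {
    private $conn;
    function __construct($conn) { $this->conn = $conn; }

    // run a query with ?-placeholders and return all rows as associative arrays
    function get_assoc($query) {
        $args  = array_slice(func_get_args(), 1);
        $parts = explode('?', $query);
        $sql   = array_shift($parts);
        foreach ($parts as $i => $part) {
            $sql .= "'" . mysql_real_escape_string($args[$i], $this->conn) . "'" . $part;
        }
        $result = mysql_query($sql, $this->conn);
        $rows = array();
        while ($row = mysql_fetch_assoc($result)) {
            $rows[] = $row;
        }
        return $rows;
    }
}

$db    = new db(mysql_connect('localhost', 'user', 'pass'));
$users = $db->get_assoc("SELECT foo, bar FROM users WHERE id = ?", $id);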
“hello shoe, meet the other foot.”
the guys who founded zend used to get hot under the collar because they believed they weren’t getting the recognition they deserved for php, whereas rasmus was more frequently quoted, interviewed, and credited.
Andi and I created the language in 1997, when we needed a solution to implement a shopping cart project for University. Some ideas were borrowed from PHP/FI, a tool that we tried to use beforehand, but that proved to be far too limited and unreliable for our purposes.
(via linux today.)
apachecon cfp
the apachecon 2004 call for participation is out. submission deadline is 23 july 2004, right before oscon. perhaps i’ll submit the follow-up to my oscon talk. (which would be talking about the i18n features of mysql 4.1 and php5, which i plan on only mentioning briefly in my oscon talk.)
php’s dumb xml parsing behavior
steve minutillo, author of feed on feeds, runs headlong into the execrable character encoding behavior of php’s xml parsing functions. hey, i was complaining about that just last year... (via phil ringnalda.)
and a related link, this article from the w3c explains how to deal with encoding issues in forms and has a nice regex that verifies whether a string is valid utf-8.
here’s some links culled from an i18n discussion on the twiki site:
Now that I've looked a bit more, there are many algorithms out there for charset detection, but most are aimed at HTML page auto-detection, and may well not work well for URLs:
- Frank Tang's charset detection links - includes simple Perl UTF-8 detector based on legal codings
- Excellent paper on Mozilla's 3-part algorithm using coding legality, character frequencies and two-character frequencies - detects the language as well as the encoding. Too complex for use on URLs, but looks very good.
- Discussion on IRC auto-detection of charsets
- Simple UTF-8 detector in C
- CPAN:Unicode::Japanese - includes auto-detection for various Japanese charsets
- CPAN:Encode::Guess - auto-detection from suitably dissimilar charsets (needs Perl 5.8)
- Browser detection for forms input datatypes including useful undocumented JavaScript to check IE's current charset (try this out now if you are using IE - see Sandbox.TestCharset).
- TextCat, tool for language detection - in Perl, OpenSource
i really need to write the slides for my talk at oscon, which will cover exactly this sort of thing.
msn.co.kr gets header encoding wrong?
someone posted a question to one of the mysql mailing lists, and the archives weren’t displaying the korean characters correctly.
the encoded bit of the From header looks like =?ks_c_5601-1987?B?7JygIOywve2YuA==?=, but if i treat the content as utf-8, it is displayed correctly (at least identically to how my mail program displays it). but for the body, i need to recode the content from mscp949 (another name for ks_c_5601-1987, according to this email to ietf-charsets) to utf-8 to get it to display correctly (or at least something resembling correct).
so i can get the message to display correctly (i think), but only by cheating: treating ks_c_5601-1987 as utf-8 when recoding the headers, and as mscp949 when recoding the body. that’s just a little gross.
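in code, the cheat looks roughly like this (made-up variable names; assumes the local iconv knows cp949):

$header_bytes = base64_decode('7JygIOywve2YuA==');
$from_utf8    = $header_bytes;   // labeled ks_c_5601-1987, but the bytes are already utf-8
$body_utf8    = iconv('CP949', 'UTF-8//TRANSLIT', $body_bytes);   // the body really is cp949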
this will surely cause problems in the face of another mail client that actually uses that character set correctly. it appears to be unique to microsoft mailers, though, so perhaps they got it wrong consistently.
and, of course, it is entirely possible that the results i’m displaying now are completely wrong, and it says rude things about elmo where the MSN advertisement is supposed to be. (although judging from the babelfish translation of the text, i think it is correct.)
colobus 2.0 released
blogging tools and code quality
rayg takes a look at some of the php-based alternatives to movable type. i have to agree with his assessment of wordpress and serendipity. the code is just messy, like most open-source php projects. my impression of textpattern was that it is even messier.
of course, this is mostly an aesthetic judgement. one thing they all have in common is good, and especially good-looking, user interfaces. i suspect that’s more important than the code quality for most bloggers.
(and to be fair, i’ve only looked at version 1.0.2 of wordpress, an older version of s9y, and an early beta of textpattern. they may have all improved.)
i’m still of the roll-my-own blogging tool mentality. it isn’t often that i miss a feature that one of the off-the-shelf packages would give me, and it’s not a large amount of code to hack on for fun. the code for the blogging bits of this site is less than 1000 lines. the code for another blog of mine is less than 100 lines, plus the 936 lines for textile. (it doesn’t support comments, though, and does post-by-email instead of having a web form.)
another colobus release?
i was all ready to write a blog entry about an upcoming release of colobus (my nntp server for ezmlm mailing lists), when i happened to look at one of my terminal windows and notice a very bad rm -rf colobus*. uh, oops.
i’ve recreated the changes (the hard part was writing it the first time), now i just need to do more testing to make sure i really did recreate the changes. putting the 165,478 messages from the mysql general mailing list into the database takes a while.
i also have (and did not accidentally delete, at least not yet) a replacement for most of the rest of the bits of the web-based archives that are currently served up by ezmlm-cgi. for lists.mysql.com, that is just the listings — i already replaced the message view with code based on the lists.php.net code.
it handles the encoding of the posts to the japanese users list (unlike the current ezmlm-cgi listings), which is cool, even if i can’t understand it. it also handles that wacky Antwort prefix the germans love so much.
i should really package up the web frontend stuff someday, too. there’s really not much of anything specific about the lists.mysql.com setup to it. it just mimics ezmlm-cgi right now — i need to think more about how i really want it to look and work.
slides from “mysql and php: best practices”
the slides from my talk at the mysql users conference are available online now.
i’m most proud of the image on slide six when it comes to matching the image to the text, although it’s not my favorite of the images i used. maybe for my next talk, i’ll find an illustrator to do custom images. (i should have it written well in advance this time, since i’ll be writing an article for the mysql developer zone based on the topic, and possibly presenting it elsewhere before the o’reilly conference.)
omnigraffle is cool
i needed to remake some images for this article about storage engines in mysql, so after futzing around in illustrator for a bit, i remembered omnigraffle, and had registered it by the time i was making the third image. with some practice, i could be dangerous with this tool.
speaking of articles on the mysql developer zone, we’re working on publishing a new article there every week. if you’re interested in writing an article, drop me a line. i can’t promise fortune, but there may be a little fame. it may be a good way to find an open source programming job. (or support job, or documentation job, or ...)
someone put peanut butter in my шоколад
i’m a little undecided as to whether i really, really hate trying to track down problems with character encodings, or really enjoy it. there’s something about groveling through hex dumps trying to figure out which bytes are missing, incorrect, or shouldn’t be there in some EUC-JP encoded text, causing it to render funny little chinese characters instead of the correct funny little japanese characters.
i think it is a little surprising that there are only two talks at the o’reilly open source conference that touch on internationalization and localization.
at least i’m getting some practical experience getting stuff like this to render correctly. or so i’m told. i may actually know what i’m talking about by the time i have to give the talk.
sam ruby has been writing various interesting things on this topic recently.
it’s a shame in particular that there’s no perl talk dealing with unicode issues. i’m still foggy on what magic it is that perl does under the hood with regards to that.
gzip vs. bzip2 vs. rzip for log files
with my curiosity piqued by jeremy’s tests of gzip vs. bzip2 vs. rzip using a bunch of mail as the test data, i tried compressing an apache log file with the three tools, plus lzop:
program  | cpu time (s) | size
---------|--------------|------------
gzip     | 19.210       | 28,362,079
gzip -9  | 32.400       | 27,036,433
bzip2 -9 | 849.489      | 15,496,248
rzip     | 147.460      | 18,823,330
lzop     | 3.240        | 48,719,254
lzop -9  | 80.810       | 32,531,485
the original file size is 295,927,205.
it’s too bad rzip can’t decompress to a stream. that makes it much less attractive as a log compression solution.
my two ₰s
i was going to point out unicode font info as a useful tool for looking at the unicode character space, and point out how it would be nice if you could navigate the space regardless of font, and then have it tell you which fonts included which characters. but while i was fiddling with it, i stumbled on the character palette built into mac os x, which does exactly that. the unicode font info tool is still handy for being able to see the nitty-gritty of the font details, but the built-in character palette is super nifty. (via lordpixel’s advogato diary.)
mark pilgrim’s article on determining the character encoding of a feed touches on more stuff related to my practical i18n talk.
here’s another essay on UTF from tim bray.
(don’t mind me, i’m just making sure i’ve marked some of these articles for future reference when i actually start to write said talk.)
i ♣ encoding problems
some notes on character encoding issues. this is the sort of stuff i plan on covering in my talk at the o’reilly open source conference, with a focus on the practical issues of dealing with it from php and mysql. (via simon willison.)
colobus 1.2
it has been over two years since the last release of colobus, my nntp server written in perl that runs on top of ezmlm archives. this new release just incorporates a couple of accumulated bug fixes and tiny features.
i have a proof-of-concept version that uses a mysql backend. i’ll get that code folded in and cleaned up and make a 2.0 release some day.
tips on building networked applications
once upon a time, someone wrote an essay about things to keep in mind when developing or designing networked applications. (one point, or maybe the main point, being that you shouldn’t treat remote procedure calls just like local procedure calls.)
ring any bells for anyone?
oscon 2004 and mysql users conference 2004
my talk for the 2004 o’reilly open source convention was accepted: practical i18n with php and mysql.
before that, i’ll be speaking at the 2004 mysql users conference on mysql and php: best practices. (part of my rough schedule for traveling to orlando and cancun in april.)
for both talks, i’ll actually be pushing the boundaries of my own experience a bit. it’s a good way to force myself to learn more. (they’ll also both be all-new, or mostly all-new, talks.)
who wants to place bets on whether i end up in the last speaking slot for both conferences? i always seem to end up there.
shh
one of the secrets of doing things on the web is how little hardware and clever coding you really need. www.mysql.com is a dual p3/850 serving over 16 million page views per month.
the coding is quite ham-handed for the most part, honestly.
here’s how ham-handed: except for the documentation pages, all of the .html pages are handled via a php script that creates an output buffer, includes the file, captures the output buffer into a variable, outputs the header (which gets things like the page title from variables that were set when the file was included), outputs the data, and then outputs the footer. (the documentation pages are actually php files that include calls to generate the header and footer.) require_once all over the place. a three-element include_path. no Last-Modified header (or conditional GET handling, obviously). no php compiler cache. 96 RewriteRule directives. the news on the front page? pulled from the database on every hit. (the query cache is turned on, however.)
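the output-buffer dance, in other words, is roughly this (made-up helper and variable names):

ob_start();
include $requested_file;        // the .html page; sets $page_title and friends as a side effect
$body = ob_get_contents();
ob_end_clean();

page_header($page_title);       // hypothetical helpers that emit the shared chrome
echo $body;
page_footer();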
things do gradually improve. i finally nuked the include file that had gems like this one:
function open_tr() { echo("<tr>"); }
i’ve also finally pared things down so the list of country names is only in one common include file. (actually, that’s not quite true. there are some old forms that define it on their own. one of them has three copies. and the geoip code has its own copy.)
maybe this isn’t the best time to mention i’ll be giving a best practices talk at the 2004 mysql users conference.
in defense of connect by
something on the near-term todo list in the mysql manual is oracle-like connect by prior .... every once in a while, someone drops a comment there to say that there’s no way that connect by should be implemented, because the sql standard specifies another syntax for recursive queries, known as with.
as it turns out, ibm’s db2 implements the with syntax, and here’s a nice article on the difference between the two syntaxes.
i can’t see how anyone can look at that article and clamor for the with syntax instead of connect by. i look at the statement using connect by and the results it gives, and can think of several ways i could apply it in applications i’ve built or want to build. i look at the with syntax and get dizzy. the syntax of with just looks incredibly un-natural, even for sql syntax, in a way that connect by does not.
there are undoubtedly things you can do using the with syntax that you couldn’t with connect by, but nobody has been able to point them out to me. and as far as i can tell from the article on ibm’s site, getting the type of results i’m interested in requires a stored procedure and query that is at least four times as verbose as oracle’s syntax.
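to make the comparison a little more concrete, here is roughly what the two syntaxes look like for walking a hypothetical categories table (sketched as query strings in the same style as elsewhere here; neither one runs on mysql today, and db2 spells the second one without the RECURSIVE keyword):

// oracle-style
$connect_by = "SELECT id, name, LEVEL AS depth
  FROM categories
  START WITH parent_id IS NULL
  CONNECT BY PRIOR id = parent_id";

// the sql-standard recursive form of (roughly) the same query
$with = "WITH RECURSIVE tree (id, name, depth) AS (
    SELECT id, name, 1 FROM categories WHERE parent_id IS NULL
    UNION ALL
    SELECT c.id, c.name, t.depth + 1
      FROM categories c JOIN tree t ON c.parent_id = t.id
  )
  SELECT id, name, depth FROM tree";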
(disclaimer: i work for mysql ab, but am not part of the development team, and have no special insight into when either syntax will get implemented. i suspect that both will eventually be implemented: connect by as an aid to people transitioning away from oracle and because it is something a lot of people ask for, and with as part of our commitment to supporting the sql standards.)
a µ problem
planet apache does not handle utf-8-encoded content correctly. maybe it isn’t planet apache’s fault. it is trying to set the encoding in a <meta http-equiv="Content-Type" ... > header, but camino, safari, ie5/mac, and ie5.5/win all ignore it. i’m not sure what the rules are with regard to the content type being specified with different charsets in the response headers and in a <meta> element.
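as far as i can tell, the charset in the real http response header is supposed to win over the <meta> element, so the fix on the generating side is a one-liner in php:

header('Content-Type: text/html; charset=utf-8');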
i’m not surprised it doesn’t work, there’s still a lot of gaps in being able to use utf-8 pervasively. i’m actually generating curly quotes by typing them instead of using something fancy like textile. (and since i’m always having to search to find this: source code for textile 2.)
protected by spf
i’ve set up the dns entries for making the domains under my immediate control protected by spf.
this means that mail transfer agents that pay attention to spf data will know that the mail is bogus if it claims to come from one of my domains but is not actually sent from my machine. (or any machine, for some of the domains that never send mail.)
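for example (hypothetical domains), a domain that only sends mail from its own a and mx hosts, and one that never sends mail at all, would publish records along these lines:

example.com.     IN TXT "v=spf1 a mx -all"
never-mails.com. IN TXT "v=spf1 -all"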
the lists.mysql.com server has been checking spf info for a while, and it blocks a dozen or so messages a day. that’s a really tiny percentage of the 150,000 incoming messages per day, but it does show that the system works when people publish the data.
i guess the next thing to do will be to get entries set up for other domains not under my direct control, but under my influence.
there’s all sorts of interesting data i’m logging on both my own mail server and lists.mysql.com. some day i should really write some tools to help analyze it. part of the problem is that there’s just too much stuff making it through the front-line filters. the lists.mysql.com smtp server still accepts about 25,000 messages a day, and even my own mail server accepts about 500 a day.
i’m still seeing about 20 spam messages get through a day. about two-thirds of that comes via work addresses (like the webmaster address), another one-sixth to my address here, and the rest via various other addresses. (that doesn’t include worms or worm-related bounces.) i could eliminate some of that by refusing mails sent via my alumni.hmc.edu address that is spam-tagged but still forwarded.
i’m still holding the line on doing any actual delivery-time filtering. once mail is accepted by my mail server, it goes into a regular mailbox, not something that fills up with piles of crap that i only check every three months. so when you send a mail and i don’t reply, it probably means i’m ignoring you. (don’t be offended, i do that to everyone.)
(disclaimer: spf is not the ultimate solution to kill all spam. but it would serve to eliminate some classes of spam, and helps out on the joe job front.)
return HTML::br();
jeremy’s thoughts on developers who build from scratch vs. those that bring their own toolkits miss what i think is one big issue: how does each type of programmer fit into a larger team?
i would have serious qualms about hiring someone who had their pet framework that they used for building things. i think part of that comes from seeing so many half-witted frameworks. (anyone who has written a library or class to generate HTML with function, method, or object names like p and br deserves to be shunned.)
but perhaps i’m seeing three categorizations where jeremy sees two. i see people who build from scratch, those who seek out and use common frameworks and libraries from resources like PEAR and CPAN, and those who have built their own toolkits and frameworks. it’s the people at both ends that i worry about. of those three categorizations, i tend to wobble between the first and second.