March 17, 2024 archives
Thoughts from SCALE 21x, day 4
Today was the last day of SCALE 21x. Again I didn’t make it out for the opening keynote, and I just took a quick spin around the expo floor to see it looking sort of quiet and winding down.
The first talk I attended was Jonathan Haddad on “Distributed System Performance Troubleshooting Like You’ve Been Doing it for Twenty Years” where he shared some of his insights from doing what the title said for companies like Apple and Netflix. His recommendation for greenfield deployments was to have OpenTelemetry set up to collect traces and logs, and he was also a big fan of the BPF Compiler Collection (aka bcc-tools) for getting a realtime look into system issues. He was not a fan of running databases in containers, and even less of a fan of running them within Kubernetes. (You could almost see his eye twitch.)
The last talk that I attended (there were just two slots today) was Jen Diamond on “The Git-tastic Power of Conventional Commits.” It was a good talk that used a little light lexical analysis to explain the basic concepts of working with Git (and included the revelation that it stands for “global information tracker,” although a little more research shows that’s only sort-of true). This all led into Conventional Commits, which is a way of structuring commit messages, and how you could use that structure in automation and to drive semantic versioning in the release process.
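To make the automation angle concrete, here is a minimal sketch (not anything shown in the talk) of how a release script might turn Conventional Commits messages into a semantic-version bump. It assumes the usual mapping from the Conventional Commits spec: `fix` is a patch, `feat` is a minor, and a `!` or a `BREAKING CHANGE:` footer is a major. The function names `bump_for` and `next_version` are made up for illustration.

```python
import re
from typing import Iterable, Optional

# Header shape: type, optional (scope), optional "!", then ": description".
HEADER = re.compile(
    r"^(?P<type>\w+)(?:\((?P<scope>[^)]+)\))?(?P<breaking>!)?: (?P<description>.+)$"
)
BREAKING_FOOTERS = ("BREAKING CHANGE:", "BREAKING-CHANGE:")


def bump_for(message: str) -> Optional[str]:
    """Return "major", "minor", "patch", or None for one commit message."""
    lines = message.splitlines()
    match = HEADER.match(lines[0]) if lines else None
    if match is None:
        return None  # not a Conventional Commits header; ignore it
    if match["breaking"] or any(l.startswith(BREAKING_FOOTERS) for l in lines[1:]):
        return "major"
    return {"feat": "minor", "fix": "patch"}.get(match["type"])


def next_version(version: str, messages: Iterable[str]) -> str:
    """Apply the largest bump found in the messages to a MAJOR.MINOR.PATCH string."""
    major, minor, patch = (int(part) for part in version.split("."))
    bumps = {bump_for(m) for m in messages}
    if "major" in bumps:
        return f"{major + 1}.0.0"
    if "minor" in bumps:
        return f"{major}.{minor + 1}.0"
    if "patch" in bumps:
        return f"{major}.{minor}.{patch + 1}"
    return version  # only docs, chore, etc.: nothing to release


print(next_version("1.2.3", ["fix: handle empty input", "feat(api): add export"]))
```

Types like `docs` or `chore` fall through without changing the version, which is the part that makes the convention useful for deciding whether a release is needed at all.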
The final session was a closing keynote from Bill Cheswick titled “I Love Living in the Future: Half a Century of Computers, Software, and Security” but it really could have just been “give the old guy the microphone and let him go!” I left a little over two hours ago, and I wouldn’t be surprised to hear that he’s still going. I hope they let him take a bathroom break.
Time to modernize PHP’s syntax highlighting?
This blog post about “A syntax highlighter that doesn't suck” was timely because recently I had been kicking at the code for the syntax highlighter that I use on this blog. It’s a very old JavaScript package called SHJS based on GNU Source-highlight.
I created a Git repository where I imported all of the released versions of SHJS and then tried to update the included language files to the ones from the latest GNU Source-highlight release (which was four years ago), but ran into some trouble. There are some new features in the syntax files that the old Perl code in the SHJS package can’t handle. And as you might imagine, the pile of code involved is really, really old.
That new PHP package seems like a great idea and all, but I really like the idea of leveraging work that other people have done to create syntax highlighting for other languages rather than inventing another one.
On Mastodon, Ben Ramsey brought up a start he had made at trying to port Pygments, a Python syntax highlighter, to PHP.
I ran across Chroma, which is a Go package that is built on top of the Pygments language definitions. They’ve converted the Pygments language definitions into an XML format. The converted definitions don’t handle every language, but they cover most of them.
At the end of the day, GNU Source-highlight, Pygments, and their variants are all built on parsers that are likely to remain imprecise, because they are mostly regex-based rather than using the same lexing and parsing code that actually handles these languages.
PHP has long had its own built-in syntax highlighting functions (highlight_string() and highlight_file()), but it looks like the generation code hasn’t been updated in a meaningful way in about 25 years. It just has five configurable colors that it uses for <span style="color: #...;"> tags. There are many tokens that it simply outputs in the same color where it could make more distinctions. If it were to instead (or also) use CSS classes to mark every token with its exact type, you could do much finer-grained syntax highlighting.
Looks like an area ready for some experimentation.