<p><em>code @ trainedmonkey</em>, jim winstead jr.</p>

<h2>Is GitHub becoming SourceForget v2.0? (2024-03-23)</h2>

<p>Back in the day, open source packages used <a href="https://sourceforge.net">SourceForge</a> for distribution, issue tracking, and other bits of managing the community around projects, but it eventually became a wasteland of neglected and abandoned projects and was referred to as <em>SourceForget</em>.</p>
<p>As I have been poking around at adding Markdown parsing and syntax highlighting to my PHP project, I can’t help but feel like GitHub is taking on some of those qualities.</p>
<p><a href="https://github.com/erusev/parsedown">Parsedown</a> is (was?) a popular PHP package for parsing Markdown, but the main branch hasn’t seen any development in at least five years, and the “2.0” branch appears to have stalled out a couple of years ago. Good luck figuring out if any of the <a href="https://github.com/erusev/parsedown/forks">1,100 forks</a> is where active development has moved.</p>
<p>I think it would be good if more community norms and best practices were developed around a project’s community being able to take over maintenance when the developer steps away. What’s the solution to the <a href="https://github.com/search?q=is%3Aissue%20abandoned&type=issues">thousands of open issues on GitHub that ask if a project is abandoned</a>?</p>
<p>Here is <a href="https://github.com/zyedidia/micro/issues/2956">an issue I found on one project</a> where the developer is trying to hand over more access to community members, and I wonder if a guide to taking your project through that transition would have been valuable to move it along.</p>
<p>Another very relevant way this comes up is the assertion put forth in <a href="https://andrewkelley.me/post/redis-renamed-to-redict.html">“Redis Renamed to Redict”</a>, which really asks what moral rights the community has to a project.</p>
<p>(SourceForge also came to be loaded down with advertising, and I remember it being kind of a miserable website to use. As GitHub loads up with “AI” features and feels increasingly clunky to use, it’s just another way I wonder if we are seeing history repeat itself.)</p>

<h2>Writing software is fun (2024-03-23)</h2>

<p>Writing software is fun. (For me. Your mileage may vary. But I am not alone in feeling this way.)</p>
<p>This means it is a particularly fraught field for exploitation.</p>
<p>A comparison I would make is to making music. Practically every musical biopic (or fictional version) features the part of the story where the artist (<a href="https://en.wikipedia.org/wiki/Ray_(film)">Ray</a>, <a href="https://en.wikipedia.org/wiki/That_Thing_You_Do!">The One-ders</a>, <a href="https://en.wikipedia.org/wiki/Elvis_(2022_film)">Elvis</a>, <a href="https://en.wikipedia.org/wiki/Dreamgirls_(film)">The Dreams</a>, <a href="https://en.wikipedia.org/wiki/Bohemian_Rhapsody_(film)">Queen</a>, <a href="https://en.wikipedia.org/wiki/Josie_and_the_Pussycats_(film)">The Pussycats</a>, etc.) who is creating and/or performing music for their love of creating and performing comes under the influence of someone who sees the potential for money to be made. They have more experience in the business related to the craft, and they use that information asymmetry to exploit the artist.</p>
<p>The business of music has been around quite a bit longer than the business of writing software, and it is still messy and there are constant struggles and upheavals over the rights of artists, how to distribute the money when it gets made, and what sort of gatekeeping goes on within the business.</p>
<p>Seven years ago I <a href="https://www.metafilter.com/170213/Crunch-trades-short-term-gains-for-long-term-suffering#7208954">pointed out that the games industry was having the same discussions about “crunch time”</a> as <a href="https://web.archive.org/web/20120513013711/https://www.gamasutra.com/view/feature/131600/recovery_mode_taking_control_of_.php">20 years before that</a>. It’s always been a segment of the industry fed on the enthusiasm of people who think writing games is fun.</p>
<p>All of this is to say that as we enter <a href="https://www.theregister.com/2024/03/22/redis_changes_license/">another cycle of software licensing shenanigans</a> in the open source world, I am interested, invested, and extremely tired.</p>
<p>Sometimes I just want to bang on the <s>drums</s> keyboard all day, share that with others, and forget that it is part of this complex ecosystem of people who are coming at it from different angles.</p>

<h2>Time to modernize PHP’s syntax highlighting? (2024-03-18)</h2>

<p>This blog post about <a href="https://stitcher.io/blog/a-syntax-highlighter-that-doesnt-suck">“A syntax highlighter that doesn't suck”</a> was timely, because I had recently been kicking at the code for the syntax highlighter that I use on this blog. It’s a <a href="http://shjs.sourceforge.net">very old JavaScript package called SHJS</a> based on <a href="https://www.gnu.org/software/src-highlite/">GNU Source-highlight</a>.</p>
<p>I created a Git repository where I imported all of the released versions of SHJS and then tried to update the included language files to the ones from the latest GNU Source-highlight release (which was four years ago), but ran into some trouble. There are some new features to the syntax files that the old Perl code in the SHJS package can’t handle. And as you might imagine, the pile of code involved is really, really old.</p>
<p>That new PHP package seems like a great idea and all, but I really like the idea of leveraging work that other people have done to create syntax highlighting for other languages rather than inventing another one.</p>
<p>On Mastodon, Ben Ramsey brought up <a href="https://phpc.social/@ramsey/112118688673952027">a start he had made at trying to port Pygments, a Python syntax highlighter, to PHP</a>.</p>
<p>I ran across <a href="https://github.com/alecthomas/chroma?tab=readme-ov-file">Chroma</a>, which is a Go package built on top of the Pygments language definitions. They’ve converted the Pygments language definitions into an XML format. The conversion doesn’t cover 100% of the languages, but it handles most of them.</p>
<p>At the end of the day, GNU Source-highlight, Pygments, and their variants are all built on what are likely to remain imprecise parsers, because they are mostly regex-based and not the same lexing and parsing code actually used to handle these languages.</p>
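<p>To make that concrete, here is a toy regex-based “highlighter” in Python (not any of these projects’ actual code). It looks fine on simple input, but with no notion of comments it cheerfully tags a keyword inside one, exactly the kind of imprecision a real lexer avoids:</p>

```python
import re

# A toy regex "highlighter": quoted strings and a few keywords, first match wins.
TOKEN = re.compile(r'(?P<string>"[^"]*")|(?P<keyword>\b(?:function|return|class)\b)')

def classify(source):
    """Return (token_type, text) pairs for everything the patterns match."""
    return [(m.lastgroup, m.group()) for m in TOKEN.finditer(source)]

# Looks plausible on simple input:
print(classify('function f() { return "ok"; }'))
# → [('keyword', 'function'), ('keyword', 'return'), ('string', '"ok"')]

# But a real lexer would know this whole line is a comment:
print(classify('// the function below is disabled'))
# → [('keyword', 'function')]
```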
<p>PHP has long had its own built-in syntax highlighting functions (<code>highlight_string()</code> and <code>highlight_file()</code>), but it looks like the generation code hasn’t been updated in a meaningful way in about 25 years. It has just five configurable colors, which it uses in <code><span style="color: #...;"></code> tags. There are many tokens that it simply outputs in the same color where it could make more distinctions. If it were to instead (or also) use CSS classes to mark every token with its exact type, you could do much finer-grained syntax highlighting.</p>
<p>Looks like an area ready for some experimentation.</p>

<h2>Thoughts from SCALE 21x, day 4 (2024-03-17)</h2>

<p><a href="https://trainedmonkey.com/photo/01hs7hb8ktn4pzh0ev1p1wezv4"><img id="photo_01hs7hb8ktn4pzh0ev1p1wezv4" src="https://tmky.gumlet.io/upload/IMG_0618.jpeg?width=800&height=800&mode=fit&s=3b7e69c0fbc95b405425be8c400e1f80" width="800" height="202" alt="Bill Cheswick being introduced for his closing keynote titled “I Love Living in the Future! Half a Century of Computers, Software, and Security” at the Southern California Linux Expo 21."></a></p>
<p>Today was the last day of <a href="https://www.socallinuxexpo.org/scale/21x">SCALE 21x</a>. Again I didn’t make it out for the opening keynote, and I just took a quick spin around the expo floor to see it looking sort of quiet and winding down.</p>
<p>The first talk I attended was Jonathan Haddad on “Distributed System Performance Troubleshooting Like You’ve Been Doing it for Twenty Years,” where he shared some of his insights from doing what the title said for companies like Apple and Netflix. His recommendation for greenfield deployments was to have <a href="https://opentelemetry.io">OpenTelemetry</a> set up to collect traces and logs, and he was also a big fan of the <a href="https://github.com/iovisor/bcc">BPF Compiler Collection (aka bcc-tools)</a> for getting a realtime look into system issues. He was not a fan of running databases in containers, and even less of a fan of running them within Kubernetes. (You could almost see his eye twitch.)</p>
<p>The last talk that I attended (there were just two slots today) was Jen Diamond on “The Git-tastic Power of Conventional Commits.” It was a good talk that used a little light lexical analysis to explain the basic concepts of working with Git (including the revelation that it stands for “global information tracker,” although a little more research shows <a href="https://initialcommit.com/blog/How-Did-Git-Get-Its-Name">that’s only sort-of true</a>). This all led into talking about <a href="https://www.conventionalcommits.org/en/v1.0.0/">Conventional Commits</a>, which is a way of structuring commit messages, and how you could use that in automations and in driving semantic versioning in the release process.</p>
<p>The final session was a closing keynote from Bill Cheswick titled “I Love Living in the Future: Half a Century of Computers, Software, and Security,” but it really could have just been “give the old guy the microphone and let him go!” I left a little over two hours ago, and I wouldn’t be surprised to hear that he’s still going. I hope they let him take a bathroom break.</p>

<h2>Thoughts from SCALE 21x, day 3 (2024-03-16)</h2>

<p>Another day, another set of thoughts on the experience. It was a busy day at the 21st edition of the <a href="https://www.socallinuxexpo.org/scale/21x">Southern California Linux Expo</a>, and the site was more crowded because an episode of <i>America’s Got Talent</i> was being filmed at the Civic Auditorium between the two buildings the conference was held in. If I’d been on the ball, I would have taken a picture of Howie Mandel standing outside his limo.</p>
<p>I will admit that I took my time in the morning and didn’t make it over to Pasadena until after the keynote that kicked off the day.</p>
<p>The first talk that I attended was “Contribution is not only a code.” by Tatiana Krupenya, the CEO of <a href="https://dbeaver.com">DBeaver</a>. She did a great job of breaking down the many ways that people can contribute to open source development aside from writing code, and I appreciated that her final point was that the simplest well-received contribution anyone can make is just a heartfelt thank-you to the maintainers of tools that you find valuable.</p>
<p>She also brought up what I am sure is <a href="https://www.youtube.com/watch?v=TheWCAIH-gA">a great talk by Zak Greant from Eclipsecon 2019 titled “When Your Happy Dreams Are About Dying”</a> about burnout in the open source developer community, which I’m looking forward to catching up on.</p>
<p>After that, it was off to Brian Proffitt’s “Measuring the Impact of Community Events” where he provided his perspective from his roles at the Red Hat OSPO, Apache Software Foundation, and other places. It was a great companion to the first session, but more from the perspective of why companies and projects may want to think about measuring how they engage with the community.</p>
<p>I took another spin through the expo during what was supposed to be the lunch break, and picked up my conference T-shirt and a free bucket hat from AWS.</p>
<p>After lunch, Tyler Menezes from <a href="https://www.codeday.org/">CodeDay</a> spoke about “Nurturing the Next Generation of Open Source Contributors” and how the non-profit he founded works to connect high school and college students from underprivileged backgrounds with resources to help them thrive in tech. One of the programs pairs small teams of students with a mentor to help them make a contribution to an open source project, and it sounds amazing. I plan to find a way to get involved once I have my employment situation sorted out.</p>
<p>The next talk was Heather Osborn on “Organic isn't always good for you,” which was sort of a case study of her experience as a DevOps leader tackling the complicated environment that had taken root at the startup she was working at, and how they figured out a strategy to straighten it out. It was really interesting to hear the language she used about convincing the company management to buy into the plan, which seemed more adversarial and dismissive than the working environments that I’ve been in.</p>
<p>“Solving ‘secret zero’, why you should care about SPIFFE!” by Mattias Gees was by far the most technical talk that I attended today. Like the presentation on Presto yesterday, it seemed a bit like the sort of system that is very impressive and I will probably never need.</p>
<p>The last talk I attended was Michael Gat on “Anti-Patterns in Tech Cost Management” which was pretty true to the title. It was a little light on the open source aspect, but there were definitely insights there on the importance of laying the groundwork early for being able to do cost analytics on systems you’ll be scaling. There were three or so questions from people that started with “I’m an engineer, and ...” which I thought was great. I think what bothered me about Heather Osborn’s talk was how it implied a certain distaste for connecting the engineering to the business realities, and I think it is very important for engineers to understand, and have respect for, business decision-making.</p>
<p>One more day to go. I am surprised how heavy the program is on cloud computing and DevOps, but I guess that’s a huge chunk of what people are working on these days. What I have been missing so far is programming-focused talks.</p>

<h2>Thoughts from SCALE 21x, day 2 (2024-03-15)</h2>

<p><a href="https://trainedmonkey.com/photo/01hs2s44cewhtxr6zp5rrpvea1"><img id="photo_01hs2s44cewhtxr6zp5rrpvea1" src="https://tmky.gumlet.io/upload/IMG_0614.jpeg?width=800&height=800&mode=fit&s=01e9c2c7021a0429c7b8a5dbafd65557" width="800" height="583" alt="Peter Zaitsev, on the right, speaking at SCALE 21x. The slide reads: “#4 Keep Data per Pod Small / 50TB of data connected to a single POD is not a good idea.”"></a></p>
<p>The second day of the <a href="https://www.socallinuxexpo.org/scale/21x">Southern California Linux Expo</a> meant the start of the expo, and more talks.</p>
<p>I started the day with “Best Practices for Running Databases on Kubernetes” with Peter Zaitsev, who was a coworker at MySQL and went on to found <a href="https://www.percona.com">Percona</a>. While I am getting a better sense of what Kubernetes is all about and already had some idea of how databases might exist in that world, his talk was a great overview and the “best practices” seemed to cover a lot of bases.</p>
<p>That was followed by “Kubernetes and Distributed SQL Databases: Same Consistency With Better Availability and Scalability,” which showed off using <a href="https://github.com/k3s-io/kine">Kine</a> as a way to plug different systems into Kubernetes as its data store instead of <code>etcd</code>. I wish the speaker had spent a little more time giving some practical examples of why this is something you would even want to do. It was a good reminder that <a href="https://k3s.io">k3s</a> exists and I should play around with it. And the speaker just using an outline in an open text editor (Pico!) as his slides reminded me of when I gave <a href="https://trainedmonkey.com/2003/07/17/mysql_and_php">a talk on MySQL and PHP using plain-text slides</a>. (Looks like my talk has been disappeared, though.)</p>
<p>After that, it was back over to the other side of the expo for a talk on “Leveraging PrestoDB for data success” which was an overview of the <a href="https://prestodb.io">Presto</a> project, which provides an ANSI SQL query interface to a collection of other data sources (my paraphrase). Kiersten Stokes, the presenter who works at IBM, called MySQL a “traditional database” which struck me as funny. Presto is a very slick and powerful system that I will probably never need. I appreciate that everyone I have seen talk about the concept of a “data lakehouse” is appropriately embarrassed about the name.</p>
<p>Before the next round of talks started, the expo floor finally opened, so I took a quick spin through that. It was pretty busy, and seemed like a good crowd of projects and companies. I think the largest footprint was maybe a couple of 10' × 40' booths from companies like AWS and Meta, but otherwise it was a lot of 10' × 10' booths with a couple of people handing out stickers or other promotional items from behind a table (and talking about their projects/companies).</p>
<p>After that I went back to the MySQL track (four talks!) to see “Design and Modeling for MySQL” which was really more of a speed-run of database history and concepts. The presenter made the classic mistake of white text on a dark background so it was pretty tough to see what he was showing until someone dimmed the lights.</p>
<p>That was followed by “Beyond MySQL: Advancing into the New Era of Distributed SQL with TiDB” from Sunny Bains, whose time on the MySQL/InnoDB team overlapped with my time working at MySQL, though I don’t think we ever met. <a href="https://www.pingcap.com/tidb/">TiDB</a> seems like a very impressive cloud-native distributed database which doesn’t actually derive from MySQL, but instead has chosen to be protocol- and query-language-compatible.</p>
<p>The last session I attended was a panel from the Open Government track on “The OSPO POV.” OSPO stands for “Open Source Program Office” and can act as kind of the interface between companies or organizations and the open source world. There were a bunch of projects and communities mentioned that I want to look into further: <a href="https://todogroup.org/">TODO Group</a>, <a href="https://www.finos.org/">Fintech Open Source Foundation</a>, <a href="https://chaoss.community/">CHAOSS (Community Health Analytics in Open Source Software)</a>, <a href="https://www.sustainoss.org/">Sustain</a>, <a href="https://www.theopensourceway.org/">The Open Source Way</a>, <a href="https://innersourcecommons.org/">Inner Source Commons</a>, and <a href="https://www.ospoplusplus.org/">OSPO++</a>.</p>
<p>Things got busier today, which was nice to see. I wasn’t in a great headspace most of the day, which pretty much sucked, but I think I came away with a lot of things to dig into on my own, which is one of the reasons I wanted to attend.</p>

<h2>How I use Docker and Deployer together (2024-03-09)</h2>

<p>I thought I’d write about this because I’m using <a href="https://deployer.org/">Deployer</a> in a way that <a href="https://github.com/deployphp/deployer/issues/3362">doesn’t really seem to be supported</a>.</p>
<p>After the work I’ve been doing with Python lately, I can see that how I have been using Docker with PHP is sort of comparable to how <a href="https://docs.python.org/3/library/venv.html"><code>venv</code></a> is used there.</p>
<p>On my production host, my <code>docker-compose</code> setup all lives in a directory called <code>tmky</code>. There are four containers: <code>caddy</code>, <code>talapoin</code> (PHP-FPM), <code>db</code> (the database server), and <code>search</code> (the search engine, currently Meilisearch).</p>
<p>There is no installation of PHP aside from that <code>talapoin</code> container. There is no MySQL client software on the server outside of the <code>db</code> container.</p>
<p>I guess the usual way of deploying in this situation would be to rebuild the PHP-FPM container, but what I do is just treat that container as a runtime environment and the PHP code that it runs is mounted from a directory on the server outside the container.</p>
<p>It’s in <code class="sh_sh">${HOME}/tmky/deploy/talapoin</code> (which I’ll call <code class="sh_sh">${DEPLOY_PATH}</code> from now on). <code class="sh_sh">${DEPLOY_PATH}/current</code> is a symlink to something like <code class="sh_sh">${DEPLOY_PATH}/release/5</code>.</p>
<p>The important bits from the <code>docker-compose.yml</code> look like:</p>
<pre class="sh_sh">services:
talapoin:
image: jimwins/talapoin
volumes:
- ./deploy/talapoin:${DEPLOY_PATH}
</pre>
<p>This means that within the container, the files still live within a path that looks like <code>${HOME}/tmky/deploy/talapoin</code>. (It’s running under a different UID/GID so it can’t even write into any directories there.) The <code>caddy</code> container has the same volume setup, so the relevant <code>Caddyfile</code> config looks like:</p>
<pre class="sh_sh">trainedmonkey.com {
log
# compress stuff
encode zstd gzip
# our root is a couple of levels down
root * {$DEPLOY_PATH}/current/site
# pass everything else to php
php_fastcgi talapoin:9000 {
resolve_root_symlink
}
file_server
}</pre>
<p>(I like how compact this is, Caddy has a very it-just-works spirit to it that I dig.)</p>
<p>So when a request hits Caddy, it sees a URL like <code>/2024/03/09<wbr>/how_i_use_docker_and_deployer_together</code>, figures out there is no static file for it and throws it over to the <code>talapoin</code> container to handle, giving it a <code>SCRIPT_FILENAME</code> of <code>${DEPLOY_PATH}<wbr>/release/5/site/index.php</code> and a <code>REQUEST_URI</code> of <code>/2024/03/09<wbr>/how_i_use_docker_and_deployer_together</code>.</p>
<p>When I do a new deployment, <code>${DEPLOY_PATH}/current</code> will get relinked to the new release directory, the <code>resolve_root_symlink</code> from the <code>Caddyfile</code> will pick up the change, and new requests will seamlessly roll right over to the new deployment. (Requests already being processed will complete unmolested, which I guess is kind of my rationale for avoiding deployment via updated Docker container.)</p>
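<p>Deployer handles that relinking, but the underlying trick is worth spelling out: build the new symlink under a temporary name and rename it over <code>current</code>, since the rename is atomic and readers never see a half-updated link. A minimal Python sketch of the same idea (hypothetical paths, not Deployer’s actual code):</p>

```python
import os
import tempfile

def relink_current(deploy_path, release):
    """Atomically repoint deploy_path/current at deploy_path/release/<n>."""
    target = os.path.join(deploy_path, "release", release)
    current = os.path.join(deploy_path, "current")
    # Create the new symlink under a temporary name, then rename it over the
    # old one; the rename is atomic, so any reader sees one release or the other.
    tmp = tempfile.mktemp(dir=deploy_path)
    os.symlink(target, tmp)
    os.replace(tmp, current)
```

Because only the symlink changes, requests already being handled out of the old release directory finish undisturbed.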
<p>Here is what my <code>deploy.php</code> file looks like:</p>
<pre class="sh_php"><?php
namespace Deployer;
require 'recipe/composer.php';
require 'contrib/phinx.php';
// Project name
set('application', 'talapoin');
// Project repository
set('repository', 'https://github.com/jimwins/talapoin.git');
// Host(s)
import('hosts.yml');
// Copy previous vendor directory
set('copy_dirs', [ 'vendor' ]);
before('deploy:vendors', 'deploy:copy_dirs');
// Tasks
after('deploy:cleanup', 'phinx:migrate');
// If deploy fails automatically unlock.
after('deploy:failed', 'deploy:unlock');
</pre>
<p>Pretty normal for a PHP application; the only real additions here are using Phinx for the data migrations and using <code>deploy:copy_dirs</code> to copy the <code>vendor</code> directory from the previous release so we are less likely to have to download stuff.</p>
<p>That <code>hosts.yml</code> is where it gets tricky, because when we are running PHP tools like <code>composer</code> and <code>phinx</code>, we have to run them inside the <code>talapoin</code> container.</p>
<pre class="sh_sh">hosts:
hanuman:
bin/php: docker-compose -f "${HOME}/tmky/docker-compose.yml" exec --user="${UID}" -T --workdir="${PWD}" talapoin
bin/composer: docker-compose -f "${HOME}/tmky/docker-compose.yml" exec --user="${UID}" -T --workdir="${PWD}" talapoin composer
bin/phinx: docker-compose -f "${HOME}/tmky/docker-compose.yml" exec --user="${UID}" -T --workdir="${PWD}" talapoin ./vendor/bin/phinx
deploy_path: ${HOME}/tmky/deploy/{{application}}
phinx:
configuration: ./phinx.yml
</pre>
<p>Now when it’s not being pushed to an OCI host that likes to fall flat on its face, I can just run <code>dep deploy</code> and out goes the code.</p>
<p>I’m also actually running Deployer in a Docker container on my development machine, too, thanks to my fork of <a href="https://github.com/jimwins/docker-deployer"><code>docker-deployer</code></a>. Here’s my <code>dep</code> script:</p>
<pre class="sh_sh">#!/bin/sh
exec \
docker run --rm -it \
--volume $(pwd):/project \
--volume ${SSH_AUTH_SOCK}:/ssh_agent \
--user $(id -u):$(id -g) \
--volume /etc/passwd:/etc/passwd:ro \
--volume /etc/group:/etc/group:ro \
--volume ${HOME}:${HOME} \
-e SSH_AUTH_SOCK=/ssh_agent \
jimwins/docker-deployer "$@"</pre>
<p>Anyway, I’m sure there are different and maybe better ways I could be doing this. I wanted to write this down because I had to fight with some of these tools a lot to figure out how to make them work how I envisioned, and just going through the process of writing this has led me to refine it a little more. It’s one of those classic cases of putting in a lot of hours to end up with relatively few lines of code.</p>
<p>I’m also just deploying to a single host; deployment to a real cluster of machines would require more thought and tinkering.</p>

<h2>Release early, release often (2024-03-07)</h2>

<p>One of the benefits of starting <a href="https://github.com/jimwins/frozen-soup">Frozen Soup</a> from a project template is that someone very smart (<a href="https://simonwillison.net">Simon</a>) has done all the heavy lifting to make publishing it into the Python ecosystem really easy to do. So after I added a new feature today (inlining external <code>url(...)</code> references in CSS as <code>data:</code> URLs), I went ahead and registered <a href="https://pypi.org/project/frozen-soup/">the project on PyPI</a>, tagged the release on GitHub, and let the GitHub Actions that were part of the project template do the work of publishing the release. It worked on the first try, which is lovely.</p>
<p>I pushed more changes after I did that release, adding a way to set timeouts and fixing the first issue (that I also filed) about pre-existing <code>data:</code> URLs getting mangled. I also added a quick-and-dirty server version which allows for getting the single-file HTML version of a page, and makes it a little easier to play around with the single-file version of live URLs without having to deal with saving and opening the files.</p>
<p>So I did a second release.</p>

<h2>Introducing Frozen Soup (2024-03-06)</h2>

<p>I made a new thing, which I decided to call <a href="https://github.com/jimwins/frozen-soup">Frozen Soup</a>. It creates a single-file version of an HTML page by inlining all of the images using <code>data:</code> URLs, and pulling in any CSS and JavaScript files.</p>
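<p>The core trick behind that inlining is small: base64-encode the resource and wrap it in a <code>data:</code> URL with a guessed MIME type. A minimal Python sketch of that step (not Frozen Soup’s actual code):</p>

```python
import base64
import mimetypes

def to_data_url(path, content):
    """Build a data: URL for raw bytes, guessing the MIME type from the path."""
    mime = mimetypes.guess_type(path)[0] or "application/octet-stream"
    b64 = base64.b64encode(content).decode("ascii")
    return f"data:{mime};base64,{b64}"

print(to_data_url("pixel.gif", b"GIF89a"))
# → data:image/gif;base64,R0lGODlh
```

The resulting string can be dropped straight into an <code>&lt;img src="..."&gt;</code> attribute in place of the original URL.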
<p>It is loosely inspired by <a href="https://github.com/gildas-lormeau/SingleFile">SingleFile</a> which is a browser extension that does a similar thing. There are also tools built on top of that which let you automate it, but then you’re spinning up a headless browser, and it all felt very heavyweight. The venerable <code>wget</code> will also pull down a page and its prerequisites and rewrite the URLs to be relative, but I don’t think it has a comparable single-file output.</p>
<p>This may also exist in other incarnations; this is mostly an excuse for me to practice with Python. As such, it is a very crude first draft right now, but I hope to keep tinkering with it for at least a little while longer.</p>
<p>I have also been contributing some changes and test cases to <a href="https://ArchiveBox.io/">ArchiveBox</a>, which is a different but related effort.</p>

<h2>Grinding the ArchiveBox (2024-02-25)</h2>

<p>I have been playing around with setting up <a href="https://archivebox.io/">ArchiveBox</a> so I could use it to archive pages that I bookmark.</p>
<p>I am a long-time, but infrequent, user of <a href="https://pinboard.in/">Pinboard</a> and have been trying to get in the habit of bookmarking more things. And although my current paid subscription doesn’t run out until 2027, I’m not paying for the archiving feature. So as I thought about how to integrate my bookmarks into this site, I started looking at how I might add that functionality. Pinboard uses <code>wget</code>, which seems simple enough to mimic, and I also found other tools like <a href="https://github.com/gildas-lormeau/SingleFile">SingleFile</a>.</p>
<p>That’s when I ran across mention of ArchiveBox and decided that would be a way to have the archiving feature I want and don’t really need/want to expose to the public. So I spun it up on my in-home server, downloaded my bookmarks from Pinboard, and that’s when the coding began.</p>
<p>ArchiveBox was having trouble parsing the RSS feed from Pinboard, and as I started to dig into the code I found that instead of using an actual RSS parser, it was parsing feeds either with regexes (the <code>generic_rss</code> parser) or with a bare XML parser (the <code>pinboard_rss</code> parser). Both of those seemed insane for a Python application to be doing when <a href="https://github.com/kurtmckee/feedparser">feedparser</a> has practically been the gold standard of RSS/Atom parsers for 20 years.</p>
<p>After sleeping on it, I decided to roll up my sleeves, bang on some Python code, and produced a <a href="https://github.com/ArchiveBox/ArchiveBox/pull/1362">pull request that switches to using <code>feedparser</code></a>. (The big thing I didn’t tackle is adding test cases because I haven’t yet wrapped my head around how to run those for the project when running it within Docker.)</p>
<p>Later, I realized that the RSS feed I was pulling of my bookmarks would be good for pulling on a schedule to keep archiving new bookmarks, but I actually needed to export my full list of bookmarks in JSON format and use that to get everything in the system from the start.</p>
<p>But <a href="https://github.com/ArchiveBox/ArchiveBox/issues/1347">that importer is broken, too</a>. Again, it’s because instead of just using the <code>json</code> parser in the intended way, there was a hack to work around what appears to have been a poor design decision (ArchiveBox used to prepend the filename to the JSON data when storing the file for later reading), and then another hack got piled on top when that decision was changed. The <code>generic_json</code> parser used to always skip the first line of the file, but when that stopped being necessary, the line-skipping wasn’t just removed; it was replaced with code that suddenly expected the JSON file to look a certain way.</p>
<p>Now I’ve been reading more Python code and writing a little bit, and starting to get more comfortable with some of the idioms. I didn’t make a full pull request for it, but <a href="https://github.com/ArchiveBox/ArchiveBox/issues/1347#issuecomment-1963185874">my comment on the issue</a> shows a different strategy of trying to parse the file as-is, and if that fails, skipping the first line and trying again. That should handle any JSON files with garbage in the first line, such as what ArchiveBox used to store. And maybe there is some system out there that exports bookmarks in a format it calls JSON that actually has garbage on the first line. (I hope not.)</p>
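<p>That parse-then-retry strategy only needs the standard library; a sketch (not the exact code from the comment):</p>

```python
import json

def load_json_forgivingly(text):
    """Parse JSON as-is; if that fails, skip the first line and retry.

    Tolerates a line of garbage prepended to the file (as old versions of
    ArchiveBox used to write) while leaving well-formed JSON alone.
    """
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        _, _, rest = text.partition("\n")
        return json.loads(rest)

print(load_json_forgivingly('[{"url": "https://example.com/"}]'))
print(load_json_forgivingly('bookmarks.json\n[{"url": "https://example.com/"}]'))
```

Both calls return the same parsed list; the second one just sheds the junk first line before retrying.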
<p>So with that workaround applied locally, my Pinboard bookmarks still don’t load, because ArchiveBox <em>uses the timestamp of the bookmark as a unique primary key</em> and I have at least a couple of bookmarks that happen to have the same timestamp. I am glad to see that fixing that is on the <a href="https://github.com/ArchiveBox/ArchiveBox/wiki/Roadmap">project roadmap</a>, but I feel like every time I dig deeper into trying to use ArchiveBox it has me wondering why I didn’t start from scratch and put together what I wanted from more discrete components.</p>
<p>I still like the idea of using ArchiveBox, and it is a good excuse to work on a Python-based project, but sometimes I find myself wondering if I should pay more attention to my sense of code smell and just back away slowly.</p>
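<p>One way to dodge the collision at import time is to nudge duplicate timestamps forward by fake milliseconds; a minimal sketch (the bookmark shape and function name are illustrative, not Pinboard’s or ArchiveBox’s actual schema):</p>

```python
from datetime import datetime, timedelta

def dedupe_timestamps(bookmarks):
    """Nudge colliding timestamps forward by fake milliseconds.

    `bookmarks` is a list of dicts with a datetime under "time"; that
    shape is illustrative, not any real export format.
    """
    seen = set()
    for bookmark in bookmarks:
        ts = bookmark["time"]
        # Bump by 1 ms until this timestamp is unique within the import.
        while ts in seen:
            ts += timedelta(milliseconds=1)
        seen.add(ts)
        bookmark["time"] = ts
    return bookmarks
```

<p>Bumping by a millisecond keeps the original ordering of the batch while making each key unique.</p>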
<p>(My current idea to work around the timestamp collision problem is to add some fake milliseconds to the timestamp as they are all added. That should avoid collisions from a single import. Or I could just edit my Pinboard export and cheat the times to duck the problem.)</p>Oracle Cloud Agent considered harmful?tag:trainedmonkey.com,2024-02-23:33182024-02-24T01:35:41Z2024-02-23T17:35:39-08:00<p>Playing around with my OCI instances some more, I looked more closely at what was going on when I was able to trigger the load to go out of control, which seemed to be anything that did a fair amount of disk I/O. What quickly stuck out thanks to <code>htop</code> is that there were a lot of <a href="https://docs.oracle.com/en-us/iaas/Content/Compute/Tasks/manage-plugins.htm">Oracle Cloud Agent</a> processes that were blocking on I/O.</p>
<p>So in the time-honored tradition of troubleshooting by shooting suspected trouble, I removed Oracle Cloud Agent.</p>
<p>After doing that, I can now do the things that seemed to bring these instances to their knees without them falling over, so I may have found the culprit.</p>
<p>I also enabled PHP’s <a href="https://www.php.net/manual/en/book.opcache.php">OPcache</a>, and some rough-and-dirty testing with good ol’ <a href="https://httpd.apache.org/docs/2.4/programs/ab.html"><code>ab</code></a> says I took the homepage from 6 r/s to about 20 r/s just by doing that. I am sure there’s more tuning I could be doing. (Requesting a static file gets about 200 r/s.)</p>
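<p>For reference, the kind of quick check I mean; a sketch only, since the PHP version in the service name and the URL are placeholders for whatever your setup uses:</p>

```shell
# OPcache ships with PHP but may need enabling for FPM (Debian/Ubuntu helper):
sudo phpenmod opcache
sudo systemctl reload php8.3-fpm

# Rough-and-dirty benchmark with ApacheBench: 500 requests, 10 concurrent.
ab -n 500 -c 10 https://example.com/
```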
<p>By the way, <a href="https://docs.oracle.com/en-us/iaas/Content/Compute/Tasks/manage-plugins.htm#uninstall-oracle-cloud-agent">the documentation for how to remove Oracle Cloud Agent</a> on Ubuntu systems is out of date. It is now a <a href="https://snapcraft.io">Snap</a> package, so it has to be removed with <code>sudo snap remove oracle-cloud-agent</code>. And then I also removed <code>snapd</code> because I’m not using it and I’m petty like that.</p>Fall down, go boomtag:trainedmonkey.com,2024-02-22:33172024-02-23T03:35:59Z2024-02-22T19:35:57-08:00<p>I am either really good at making Oracle Cloud Infrastructure instances fall over, or the <code>VM.Standard.E2.1.Micro</code> shape is even more underpowered than I expected. I had been using the Ubuntu “minimal” image as my base, so I thought I would try the Oracle Linux 8 image and I couldn’t even get it to run <code>yum check-update</code> without that process getting killed. That seems like a less-than-ideal experience out of the box.</p>
<p>What seems to happen on the instances (with Ubuntu) that I am using to host this site is that if something does too much I/O, the load average spikes, and things slowly grind through before recovering. The problem is that something like “running composer” seems to be too much I/O, which makes it awkward to deploy code.</p>
<p>Another thing that seems to get out of control quickly is when I reindex the site with Meilisearch. Considering there is very little data being indexed, that obviously shouldn’t be causing any sort of trouble. I have two instances spun up now, so I can play with the settings on one without temporarily choking off the live site. It’s probably just a matter of setting the maximum indexing memory in Meilisearch’s configuration or constraining the memory on that container.</p>
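<p>Capping the indexing memory might look like this with the official Docker image (a sketch: the image tag and limits are placeholder values to tune, and I’m assuming the <code>MEILI_MAX_INDEXING_MEMORY</code> setting that recent Meilisearch releases support):</p>

```shell
docker run -d --name meilisearch \
  -p 7700:7700 \
  -e MEILI_MAX_INDEXING_MEMORY="256 MiB" \
  --memory=512m \
  getmeili/meilisearch:v1.6
```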
<p>I also added an <a href="https://www.oracle.com/cloud/networking/load-balancing/">OCI Flexible Network Load Balancer</a> in front of my instance so I can quickly switch things over to another without waiting on any DNS propagation. Maybe if Ampere instances ever become available in my region I will play around with splitting the deployment across multiple instances.</p>Coming to you from OCItag:trainedmonkey.com,2024-02-21:33162024-02-22T04:00:26Z2024-02-21T20:00:23-08:00<p>After some fights with <a href="http://deployer.org">Deployer</a> and Docker, this should be coming to you from a server in <a href="https://www.oracle.com/cloud/">Oracle Cloud Infrastructure</a>. There are still no Ampere instances available, so it is what they call a <code>VM.Standard.E2.1.Micro</code>. It seems to be underpowered relative to the Linode Nanode that it was running on before, or maybe I have just set things up poorly.</p>
<p>But having gone through this, I have the setup for the “production” version of my blog streamlined so it should be easy to pop up somewhere else as I continue to tinker.</p>Docker, Tailscale, and Caddy, oh mytag:trainedmonkey.com,2024-02-20:33142024-02-20T22:14:00Z2024-02-20T14:13:59-08:00<p>I do my web development on a server under my desk, and the way I had it set up is with a wildcard entry set up for <code>*.muck.rawm.us</code> so requests would hit <code>nginx</code> on that server which was configured to handle various incarnations of whatever I was working on. The IP address was originally just a private-network one, and eventually I migrated that to a Tailscale tailnet address. Still published to public DNS, but not a big deal since those weren’t routable.</p>
<p>A reason I liked this is because I find it easier to deal with hostnames like <code>talapoin.muck.rawm.us</code> and <code>scat.muck.rawm.us</code> rather than running things on different ports and trying to keep those straight.</p>
<p>One annoyance was that I had to maintain an active SSL certificate for the wildcard. Not a big deal, and I had that nearly automated, but a bigger hassle was that whenever I wanted to set up another service it required mucking about in the <code>nginx</code> configuration.</p>
<p>Something I have wanted to play around with for a while was using <a href="https://www.tailscale.com/">Tailscale</a> with Docker to make each container (or <code>docker-compose</code> setup, really) its own host on my tailnet.</p>
<p>So I finally buckled down, watched this <a href="https://www.youtube.com/watch?v=tqvvZhGrciQ">video deep dive into using Tailscale with Docker</a>, and <a href="https://github.com/jimwins/talapoin/commit/3cb2f0d6fcc88f6ac420e52a25a424b78014cfdf">got it all working</a>.</p>
<p>I even took on the additional complication of throwing <a href="https://caddyserver.com">Caddy</a> into the mix. That ended up being really straightforward once I finally wrapped my head around how to set up the file paths so Caddy could serve up the static files and pass the PHP off to the <code>php-fpm</code> container. Almost too easy, which is probably why it took me so long.</p>
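<p>The Caddyfile for that kind of split ends up being pleasantly short. A sketch, where the hostname, document root, and <code>php-fpm</code> service name are placeholders for my actual setup:</p>

```
talapoin.example.ts.net {
	root * /var/www/html
	php_fastcgi php-fpm:9000
	file_server
}
```

<p>The <code>php_fastcgi</code> directive hands <code>.php</code> requests to the FastCGI backend, and <code>file_server</code> serves everything else straight off disk.</p>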
<p>Now I can just start this up, it’s accessible at <code>talapoin.{tailnet}.ts.net</code>, and I can keep on tinkering.</p>
<p>While it works the way I have it set up for development, it will need tweaking for “production” use since I won’t need Tailscale.</p>My history with programming languagestag:trainedmonkey.com,2024-02-16:33122024-02-17T02:32:14Z2024-02-16T18:32:13-08:00<p>I think I have always had an interest in playing around with different programming languages. My first probably would have been Applesoft BASIC. In elementary school, I remember writing a bouncing-lines graphic toy more than once, and D&D character generators.</p>
<p>I am really not sure what came next, but it was probably mostly things like DOS batch files (and eventually <a href="https://en.wikipedia.org/wiki/4DOS">4DOS</a> batch files).</p>
<p>I wrote some little utilities for <a href="https://en.wikipedia.org/wiki/DESQview">DESQview</a> in assembly and even released them as freeware, but I haven’t been able to track them down again.</p>
<p>The first substantial project I did was probably the billing and reporting system for my dad’s company that was written in <a href="https://en.wikipedia.org/wiki/FoxPro">FoxPro</a> before it was even acquired by Microsoft. This would have been where I first encountered SQL, too.</p>
<p>All of this would have been essentially self-taught. I vaguely remember participating in some computer classes or clubs, but nothing terribly structured. I’m sure that I was exposed to Pascal, probably <a href="https://en.wikipedia.org/wiki/Turbo_Pascal">Turbo Pascal</a>, at some point in those.</p>
<p>One memory from my freshman year at college is that I used the <code>(<em> multi-line syntax </em>)</code> of Pascal comments on an assignment instead of the <code>{ single-line syntax }</code> and the grader for the assignment had some sort of reaction about that.</p>
<p>I was exposed to a lot of programming languages in college, especially because I had to take the “Programming Languages” class twice after failing it the first time. (I had a rough time in my sophomore year.) It was taught by different professors who used different languages to teach the concepts. Some languages I remember doing at least an assignment or two in: C, Fortran, ML, Scheme, Lisp, Perl, COBOL, Ada, and APL. Part of <a href="https://www.hmc.edu/">Harvey Mudd College</a>’s program is a capstone project in your senior year where you work on a project for an outside company or organization. The project I was part of wrote a tool for internationalizing voicemail prompts for Octel Communications, using Visual Basic.</p>
<p>My first job after college was for an educational/games software company named Knowledge Adventure (KA) on a project that was released as “Steven Spielberg’s Director’s Chair.” At the time, all of the projects at the company were done using an in-house programming language named “Acomplish” (or maybe “Accomplish”) which was short for “<u>A</u> <u>Com</u>puter Eng<u>lish</u>.” I have been trying to dig up some reference to or examples of it and have come up empty so far, but it was a natural-language-ish syntax that had originally been intended for producers to write in. By the time I worked there it was mostly being used by programmers who graduated from Mudd, Caltech, and UCLA, which is kind of funny.</p>
<p>It was while working on KA’s website that I got involved in the early days of <a href="https://www.php.net/">PHP</a>. I guess here is where someone could make jokes about how obviously someone who had failed “Programming Languages” had a hand in PHP, but I don’t recall that I had very much, if any, involvement in the actual design of the language. (Although I remember having to write way too many emails to make the case that casting a string to an integer should not interpret it the same way as numeric literals, because people would have brought out pitchforks when leading zeroes caused them to be interpreted as octal values.)</p>
<p>At KA, the second title I worked on (what was released as “Dr. Brain’s Thinking Games: IQ Adventure”) was coded in C/C++ after we had done some initial prototyping in Acomplish. The game had a multiplayer component and we ended up borrowing the graphics and networking library from our (at the time) sister company, Blizzard Entertainment. (I later went on to have the worst programming interview experience of my life, so far, there.)</p>
<p>When I left KA they were in the process of adopting a new engine for all of their projects that was Java-based, or maybe just evaluating it. In any case, I never ended up working with that engine but I did a small contract project around this time that was a Java applet. (It was for the website for a show on NBC.)</p>
<p>My next job was at HomePage.com, which was an idealab startup that was basically a second-generation GeoCities. (A number of the folks in the management team were actually ex-GeoCities people who cashed out when that was acquired by Yahoo.) We built our system in Perl (with mod_perl), and built a sort of primitive HTML templating system called Gear. The original parser was regex-based until one of the other senior engineers wrote a proper lexer and parser for it.</p>
<p>I’m not sure when I first started working in JavaScript, but it was probably somewhere in this period.</p>
<p>After HomePage.com, I ended up working for MySQL AB leading the web team, which meant back to writing a bunch of PHP code. And I am sure I had encountered Python before this, but recently I dug up something I wrote during this time in Python that connected to our bugs database, announced new bugs, and could be queried in a few ways. During the rest of my time at MySQL, I did more C/C++ programming, probably more Perl, and even did a tiny bit of work with Ruby.</p>
<p>Other programming languages I have played with at some time or another: Logo, MATLAB, OCaml, Tcl, Dylan, Oberon, Modula, and Delphi (Object Pascal).</p>
<p>Most recently I have been playing around with Rust and Go, and done some reading on Swift.</p>
<p>This whole train of thought was actually triggered by seeing Delphi mentioned in a job description, which reminded me of how intrigued I was when I first encountered it. I have a soft spot for Pascal and its successors.</p>