Entries tagged 'python'
Is this Twig or Jinja? Maybe both!
A project I have been playing around with the last couple of weekends has been making a Python version of this site. The code, which is very rough because I barely know what I’m doing and I’m in the hacking-it-together phase as opposed to trying to make it pretty, is in this GitHub repository.
I am using the Flask framework with SQLAlchemy and Jinja.
I was interested to see if I could just use the same templates as my PHP version, which uses Twig, but there have been a few sticking points:
- The Twig escape filter takes an argument to more finely control the context it is being used in, so it knows how to escape within HTML, or a URI, or an HTML attribute. Jinja’s escape doesn’t take an argument. I was able to override it to take an extra argument, but mostly ignore it for now.
- Jinja doesn’t have Twig’s ternary ?: operator. Not surprising, Python doesn’t either. I rewrote those bits of templates to use slightly more verbose if blocks.
- Jinja doesn’t have Twig’s string comparators like matches and starts with. Looks like I can get rid of the need for them, but I just punted on those for now.
- Jinja doesn’t have a block() function. I think I can also avoid needing it.
- Jinja’s url_for() method expects a more Pythonic argument list, like url_for('route', var = 'value'), but Twig uses a dictionary like url_for('route', { 'var' : 'value' }). I was able to override Jinja’s version to handle this, too.
- I’ll need to implement versions of Twig’s date() function and filter.
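The two overrides in the list above could look roughly like the following. This is a minimal sketch using only the standard library, not the code from the repository: the names twig_escape and twig_url_for are made up for illustration, and passing the real url_for in as the first argument is just a way to keep the example self-contained.

```python
from html import escape as html_escape
from urllib.parse import quote


def twig_escape(value, strategy="html"):
    """Escape `value` for the context named by `strategy`, Twig-style."""
    text = str(value)
    if strategy == "url":
        # Percent-encode everything that is not safe in a URI component.
        return quote(text, safe="")
    # "html", "html_attr", and anything unrecognized fall back to plain
    # HTML escaping, in the spirit of "mostly ignore it for now".
    return html_escape(text, quote=True)


def twig_url_for(url_for, endpoint, params=None, **values):
    """Call `url_for` with either a Twig-style dict or Python keywords."""
    merged = dict(params or {})
    merged.update(values)
    return url_for(endpoint, **merged)
```

In a Flask app these would presumably be hooked up through app.jinja_env.filters and app.jinja_env.globals. With autoescaping turned on, the filter would also need to return a markupsafe.Markup value so the result is not escaped a second time.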
I had cobbled together a way on the Twig side to let me store some templates (side navigation, the “Hire me!” message on the front page) in the database, so my next trick is going to be implementing template loaders for both the PHP and Python versions so that it is more cleanly abstracted. I have the Python side of that done already.
I hope to eventually create a Rust version of this, too, and it will be interesting to see what new complications using Tera will bring.
But I still haven’t found what I’m looking for
I’m still looking for a job.
It is a new month, so I thought it was a good time to raise this flag again, despite it being a bad day to try and be honest and earnest on the internet.
I wish I was the sort of organized that allowed me to run down statistics of how many jobs I have applied to and how many interviews I have gone through other than to say it has been a lot and very few.
Last month I decided to start (re)developing my Python skills because that seems to be much more in demand than the PHP skills I can more obviously lay claim to. I made some contributions to an open source project, ArchiveBox: improving the importing tools, writing tests, and updating it to the latest LTS version of Django from the very old version it was stuck on. I also started putting together a Python library/tool to create a single-file version of an HTML file by pulling in required external resources and in-lining them; my way of learning more about the Python culture and ecosystem.
That and attending SCALE 21x really did help me realize how much I want to be back in the open source development space. I am certainly not dogmatic about it, but I believe to my bones that operating in a community is the best way to develop software.
I think my focus this month has to be on preparing for the “technical interview” exercises that are such a big part of the tech hiring process these days, as much as I hate it. I think what makes me a valuable senior engineer is not that I can whip up code on demand for data structures and algorithms, but that I know how to put systems together, have broader business experience that gives me a deeper understanding of what matters, and can communicate well. But these tests seem to be an accepted and expected component of the interview process now, so it only makes sense to polish those skills.
(Every day this drags on, I regret my detour into opening a small business more. That debt is going to be a drag on the rest of my life, compounded by the huge weird hole it puts in my résumé.)
Frozen Soup is now based
Frozen Soup, my Python library/tool for creating a single-file version of an HTML page, got another release that adds handling of <base> and lets you specify selectors to knock out.
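For anyone unfamiliar with what <base> changes, here is a small sketch of the URL resolution involved, using only urllib.parse. It is not Frozen Soup’s actual code, and resolve_reference is a name invented for the example.

```python
from urllib.parse import urljoin


def resolve_reference(page_url, base_href, reference):
    """Resolve `reference` the way a browser would for a page with <base>."""
    # The <base href> may itself be relative, so resolve it against the
    # page URL first; with no <base>, the page URL is the base.
    base_url = urljoin(page_url, base_href) if base_href else page_url
    return urljoin(base_url, reference)
```

So a page at https://example.com/blog/post.html with a base of /static/ would load img/a.png from https://example.com/static/img/a.png instead of from under /blog/.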
This is another one where I had to re-do the release because of something dumb. This time it was forgetting to bump the version in pyproject.toml. I should look into how to have it automatically figure that out from the tag during release.
Monolith is another project like this that is written in Rust.
Time to modernize PHP’s syntax highlighting?
This blog post about “A syntax highlighter that doesn't suck” was timely because recently I had been kicking at the code for the syntax highlighter that I use on this blog. It’s a very old JavaScript package called SHJS based on GNU Source-highlight.
I created a Git repository where I imported all of the released versions of SHJS and then tried to update the included language files to the ones from the latest GNU Source-highlight release (which was four years ago), but ran into some trouble. There are some new features to the syntax files that the old Perl code in the SHJS package can’t handle. And as you might imagine, the pile of code involved is really, really old.
That new PHP package seems like a great idea and all, but I really like the idea of leveraging work that other people have done to create syntax highlighting for other languages rather than inventing another one.
On Mastodon, Ben Ramsey brought up a start he had made at trying to port Pygments, a Python syntax highlighter, to PHP.
I ran across Chroma, which is a Go package that is built on top of the Pygments language definitions. They’ve converted the Pygments language definitions into an XML format. Those don’t handle 100% of the languages, but they cover most of them.
At the end of the day, GNU Source-highlight, Pygments, and their variants are all built on what are likely to remain imprecise parsers, because they are mostly regex-based and are not the same lexing and parsing code that is actually used to handle these languages.
PHP has long had its own built-in syntax highlighting functions (highlight_string() and highlight_file()) but it looks like the generation code hasn’t been updated in a meaningful way in about 25 years. It just has five colors that can be configured that it uses for <span style="color: #...;"> tags. There are many tokens that it simply outputs using the same color where it could make more distinctions. If it were to instead (or also) use CSS classes to mark every token with the exact type, you could do much finer-grained syntax highlighting.
Looks like an area ready for some experimentation.
Release early, release often
One of the benefits of starting Frozen Soup from a project template is that someone very smart (Simon) has done all the heavy lifting to make publishing it into the Python ecosystem really easy to do. So after I added a new feature today (pulling in external url(...) references in CSS inline as data: URLs), I went ahead and registered the project on PyPI, tagged the release on GitHub, and let the GitHub Actions that were part of the project template do the work of publishing the release. It worked on the first try, which is lovely.
I pushed more changes after I did that release, adding a way to set timeouts and fixing the first issue (that I also filed) about pre-existing data: URLs getting mangled. I also added a quick-and-dirty server version which allows for getting the single-file HTML version of a page, and makes it a little easier to play around with the single-file version of live URLs without having to deal with saving and opening the files.
So I did a second release.
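To give a flavor of the url(...) feature and the data: URL fix described above, here is a rough sketch. It is not the Frozen Soup implementation: the regex is deliberately simple, and fetch is a hypothetical callable that takes a URL and returns (content_bytes, mime_type), standing in for a real HTTP request.

```python
import base64
import re

# Matches url(...) with optional single or double quotes around the target.
CSS_URL = re.compile(r"""url\(\s*(['"]?)(.*?)\1\s*\)""")


def to_data_url(content, mime_type):
    """Encode bytes as a base64 data: URL."""
    encoded = base64.b64encode(content).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def inline_css_urls(css, fetch):
    """Replace url(...) references in `css` with data: URLs.

    `fetch` is a hypothetical callable: URL in, (bytes, mime type) out.
    """
    def replace(match):
        target = match.group(2)
        # Leave pre-existing data: URLs alone instead of re-fetching them.
        if target.lower().startswith("data:"):
            return match.group(0)
        content, mime_type = fetch(target)
        return f'url("{to_data_url(content, mime_type)}")'

    return CSS_URL.sub(replace, css)
```

A real version would also need to resolve relative targets against the stylesheet’s own URL before fetching, which this sketch skips.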
Introducing Frozen Soup
I made a new thing, which I decided to call Frozen Soup. It creates a single-file version of an HTML page by in-lining all of the images using data: URLs, and pulling in any CSS and JavaScript files.
It is loosely inspired by SingleFile which is a browser extension that does a similar thing. There are also tools built on top of that which let you automate it, but then you’re spinning up a headless browser, and it all felt very heavyweight. The venerable wget will also pull down a page and its prerequisites and rewrite the URLs to be relative, but I don’t think it has a comparable single-file output.
This may also exist in other incarnations; this is mostly an excuse for me to practice with Python. As such, it is a very crude first draft right now, but I hope to keep tinkering with it for at least a little while longer.
I have also been contributing some changes and test cases to ArchiveBox, but this is different yet also a little related.
Grinding the ArchiveBox
I have been playing around with setting up ArchiveBox so I could use it to archive pages that I bookmark.
I am a long-time, but infrequent, user of Pinboard and have been trying to get in the habit of bookmarking more things. And although my current paid subscription doesn’t run out until 2027, I’m not paying for the archiving feature. So as I thought about how to integrate my bookmarks into this site, I started looking at how I might add that functionality. Pinboard uses wget, which seems simple enough to mimic, and I also found other tools like SingleFile.
That’s when I ran across mention of ArchiveBox and decided that would be a way to have the archiving feature I want and don’t really need/want to expose to the public. So I spun it up on my in-home server, downloaded my bookmarks from Pinboard, and that’s when the coding began.
ArchiveBox was having trouble parsing the RSS feed from Pinboard, and as I started to dig into the code I found that instead of using an actual RSS parser, it was either parsing it using regexes (the generic_rss parser) or an XML parser (the pinboard_rss parser). Both of those seemed insane to me for a Python application to be doing when feedparser has practically been the gold standard of RSS/Atom parsers for 20 years.
After sleeping on it, I decided to roll up my sleeves, bang on some Python code, and produce a pull request that switches to using feedparser. (The big thing I didn’t tackle is adding test cases because I haven’t yet wrapped my head around how to run those for the project when running it within Docker.)
Later, I realized that the RSS feed I was pulling of my bookmarks would be good for pulling on a schedule to keep archiving new bookmarks, but I actually needed to export my full list of bookmarks in JSON format and use that to get everything in the system from the start.
But that importer is broken, too. And again it’s because instead of just using the json parser in the intended way, there was a hack to work around what appears to have been a poor design decision (ArchiveBox would prepend the filename to the file it read the JSON data from when storing it for later reading) that then got another hack piled on top of it when that decision was changed. The generic_json parser used to just always skip the first line of the file, but when that stopped being necessary, that line-skipping wasn’t just removed, it was replaced with some code that suddenly expected the JSON file to look a certain way.
Now I’ve been reading more Python code and writing a little bit, and starting to get more comfortable with some of the idioms. I didn’t make a full pull request for it, but my comment on the issue shows a different strategy of trying to parse the file as-is, and if that fails, skip the first line and try it again. That should handle any JSON files with garbage in the first line, such as what ArchiveBox used to store them as. And maybe there is some system out there that exports bookmarks in a format it calls JSON that actually has garbage on the first line. (I hope not.)
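The strategy is small enough to sketch in a few lines. This is an illustration of the idea rather than the code from my comment or from ArchiveBox, and load_json_lenient is a made-up name.

```python
import json


def load_json_lenient(text):
    """Parse JSON, retrying without the first line if the first try fails."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        # Drop everything up to and including the first newline and retry;
        # a second failure propagates to the caller.
        _, _, rest = text.partition("\n")
        return json.loads(rest)
```

A file that is valid JSON never gets touched, and a file that is still broken after dropping the first line fails loudly instead of being half-parsed.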
So with that workaround applied locally, my Pinboard bookmarks still don’t load because ArchiveBox uses the timestamp of the bookmark as a unique primary key and I have at least a couple of bookmarks that happen to have the same timestamp. I am glad to see that fixing that is on the project roadmap, but I feel like every time I dig deeper into trying to use ArchiveBox it has me wondering why I didn’t start from scratch and put together what I wanted from more discrete components.
I still like the idea of using ArchiveBox, and it is a good excuse to work on a Python-based project, but sometimes I find myself wondering if I should pay more attention to my sense of code smell and just back away slowly.
(My current idea to work around the timestamp collision problem is to add some fake milliseconds to the timestamp as they are all added. That should avoid collisions from a single import. Or I could just edit my Pinboard export and cheat the times to duck the problem.)
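A sketch of the fake-milliseconds idea, assuming the imported timestamps are whole seconds (so nudging one by a few milliseconds can’t land on another real timestamp) and that no single timestamp repeats a thousand times. The function name is invented for the example, and nothing here is ArchiveBox code.

```python
from collections import Counter
from datetime import datetime, timedelta


def spread_timestamps(timestamps):
    """Return timestamps with duplicates nudged apart by one millisecond."""
    seen = Counter()
    result = []
    for ts in timestamps:
        # The nth repeat of a timestamp gets n milliseconds added.
        result.append(ts + timedelta(milliseconds=seen[ts]))
        seen[ts] += 1
    return result
```

This only avoids collisions within a single import; a later import of the same bookmarks would still need to be checked against what is already stored.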
where should i be lurking?
trying to find places where people talk about using python, ruby, and php with mysql has been a bit of a challenge.
the problem on the php side is that the php forum on forums.mysql.com is so filled with pre-beginner-level questions that it’s barely worth it for me to spend my time digging through it.
for python, the python forum on forums.mysql.com is nearly a ghost town. the forums for the mysql-python project seem slightly active, but the sourceforge forum interface is just bad. (not that any web-based forum isn’t starting from a bad place.) the db-sig mail archives also have some interesting discussions.
for ruby, the ruby forum on forums.mysql.com is even quieter than the python one, and i haven’t found anywhere else.
another thing i’ll take a look at is apr_dbd_mysql, which is not part of the main apr-util repository because of licensing issues (ugh).
where else should i be looking?
more technobabble
working on the mysql bugs system filled the transition from me working on falcon to joining the connectors team, where i’ll be focusing on the connectivity for scripting languages.
my initial focus will be on python, ruby, and php. i haven’t figured out exactly what it is that i’ll be doing, but a likely candidate for my first big task will be building out the test suites for these so that they can eventually become part of our build verification process.