January 30, 2024 archives
Some notes on Flickr data migration
I decided to stop renewing my subscription with Flickr recently to create some incentive for me to self-host my photos and integrate them more closely here. Before my subscription lapsed, I requested an archive of all of my Flickr data, and now I am finally getting around to working with the data.
When you download your Flickr data, it includes JSON files named like photo_50626142.json and JPEG files named like young-mimes_50626142_o.jpg, and the name of the JPEG is not in the JSON data.
You can probably generate it using the name and ID, but I’m not sure what the rules are for turning the name field into the lowercase, hyphenated form used in the filename.
Except that images without a name have JPEG files named like 17483805680_f57f81feb5_o.jpg. The ID is at the beginning, and the other part looks random. (It looks like this is the same filename used for the original URL in the JSON.)
The way to go seems to be just matching on the ID embedded in the filename. (That’s what the one other tool I’ve seen that uses the export data does.)
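As a rough sketch of that ID-based matching in Python: collect the IDs from the JSON filenames first, then look for one of those known IDs in each JPEG filename. The function names here are hypothetical, the two filename layouts are assumed from the examples above, and this is not necessarily how the other tool does it.

```python
import re
from pathlib import Path

JSON_NAME = re.compile(r"^photo_(\d+)\.json$")


def photo_ids(json_names):
    """Collect photo IDs from metadata filenames like photo_50626142.json."""
    ids = set()
    for name in json_names:
        match = JSON_NAME.match(name)
        if match:
            ids.add(match.group(1))
    return ids


def match_jpeg(jpeg_name, known_ids):
    """Return the known photo ID embedded in a JPEG filename, or None."""
    parts = Path(jpeg_name).stem.split("_")
    # Assumed layouts: <name>_<id>_o for named photos,
    # <id>_<other bit>_o for photos without a name.
    candidates = []
    if len(parts) >= 2:
        candidates.append(parts[-2])
    candidates.append(parts[0])
    for candidate in candidates:
        # Only accept a token that is a real ID from the JSON files.
        if candidate in known_ids:
            return candidate
    return None
```

Checking candidates against the set of known IDs avoids having to guess how the name was turned into the filename prefix.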
While working through all of this, I found that I must not have downloaded one of the archive files from Flickr, because I was missing 83 JPEG files. I was able to use the JSON files to rescue them.
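One way to spot gaps like this, sketched in Python under some assumptions: the metadata files are named photo_<id>.json, each one stores the original URL under an "original" key (the key name is a guess and may differ in a real export), and the ID appears as an underscore-separated token in the JPEG filename. The function name and directory arguments are hypothetical.

```python
import json
import re
from pathlib import Path


def missing_photos(json_dir, jpeg_dir):
    """List (photo_id, original_url) for metadata files with no matching JPEG."""
    # Gather every underscore-separated token from the JPEG filenames;
    # the photo IDs should be among them.
    tokens = set()
    for jpeg in Path(jpeg_dir).glob("*.jpg"):
        tokens.update(jpeg.stem.split("_"))

    missing = []
    for meta in sorted(Path(json_dir).glob("photo_*.json")):
        match = re.fullmatch(r"photo_(\d+)", meta.stem)
        if not match:
            continue
        photo_id = match.group(1)
        if photo_id not in tokens:
            data = json.loads(meta.read_text(encoding="utf-8"))
            # "original" is an assumed key; None is returned if it is absent.
            missing.append((photo_id, data.get("original")))
    return missing
```

The resulting list could then feed whatever download step is used to fetch the missing originals.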
Now that I know that I actually have all of the data and all of the images are in a Backblaze B2 bucket fronted by Gumlet, the next step will be loading all of the relevant metadata into a database table and then wiring up some ways to browse the images here.
Enabling GD’s JPEG support in Docker for PHP 8.3
I am generating a ThumbHash for each photo in my new photo library using this PHP library, and it needs to use either the GD Graphics Library extension or ImageMagick to decode the image data to feed it into the hash algorithm.
The PHP library recommends the ImageMagick extension (Imagick) because GD still doesn’t support 8-bit alpha values, but I ran into a bug that prevents Imagick from building. It is fixed by this patch, which hasn’t been pulled into a released version yet. Then I realized that none (or close to none) of the images I’d be dealing with use any sort of transparency, so GD would be fine. And it was already enabled in my Dockerfile, so I should have been good to go.
But it turns out that although I thought I had included GD, I hadn’t actually enabled JPEG support in GD properly, so the ThumbHash library’s helper function to extract the image data it needed just failed on a call to ImageSX() after ImageCreateFromString had failed. (Here is a pull request to SRWieZ/thumbhash to throw an exception on that failure, which would have saved me a few steps of debugging.)
Looking at the code for the GD extension, that should not have been a silent failure, so some digging may be required to figure out what happened with that. I may have just missed that particular error message in the logs.
Enabling JPEG support is fairly simple, although a lot of the instructions I found online were a little out of date. The important thing was that I needed to add this to my Dockerfile between installing the development packages and building the PHP extensions: docker-php-ext-configure gd --with-freetype --with-jpeg.
So now I can successfully generate a ThumbHash for all of my photos, except for another bug I haven’t tracked down yet where it sometimes produces a hash that is longer than expected. The ThumbHash for this photo is 2/cFDYJdhgl3l2eEVMZ3RoOkD1na, which can be turned directly into this image: