James Hare

Ditching a self-hosted Wikibase and migrating the data to Wikibase.Cloud: the case study of Wikibase Registry

On April 25, 2018, participants at a workshop in Antwerp created the Wikibase Registry. It was created to serve as a central wiki in a broader network of federated Wikibase instances, and itself was a Wikibase.

At the time this wiki was set up, there were no convenient options for Wikibase hosting available. Wikibase.Cloud did not yet exist. The Registry was therefore set up using a Docker-based deployment on a virtual machine in Wikimedia Cloud Services, which provides virtual machines free of charge for projects like these. The resulting product was fairly basic. The lack of effective anti-spam measures led to the wiki being locked down. This meant that if anyone wanted to make edits, they first needed to get their account approved by an administrator. Over time, fewer new editors were approved, and the number of contributions to the wiki declined. Wikibase Registry fell behind on software updates, including critical security updates. The query service, which helped make the Registry useful to begin with, also went down, with apparently no one noticing.

Setting up a custom Wikibase has been a very difficult task. When I first started experimenting with Wikibase in 2015, with the first-generation Librarybase, your only option was to install the Wikibase extension(s) manually on your own MediaWiki instance. There were hardcoded values specific to Wikidata that needed to be fixed manually. And this did not automatically get you accessory services like Blazegraph, which is crucial for a useful Wikibase. The release of standard Docker containers has helped simplify the process greatly, but the work involved is nonetheless significant.

Wikibase.Cloud effectively eliminates this challenge altogether. Wikibase.Cloud is operated as a free service of Wikimedia Deutschland, a German charitable organization recognized as an official affiliate of the Wikimedia Foundation. Anyone who is getting started with Wikibase has the option to get a custom Wikibase with no development work and no financial cost. Predictably, this has led to the number of Wikibases in existence increasing. With the usage of Wikibase.Cloud growing, and participation on Wikibase Registry declining, I decided to set up a new registry of Wikibases on Wikibase.Cloud called Wikibase World. I then shut down the old Registry, with the permission of the maintainers, and ensured that links to the old wiki would forward to their new counterparts.

This post documents my process for carrying out this migration, including the various decisions I made. The goal is not to document the process of migrating a Wikibase, whole-cloth, from one server to another. That is a straightforward affair better covered by other articles. This, rather, is a messier process, where a new Wikibase with distinct P-IDs and Q-IDs takes over for a predecessor. This post may be able to help you whether you are migrating a self-hosted Wikibase to a new wiki on Wikibase.Cloud or to another pre-existing Wikibase.

Creating the new Wikibase #

When I created Wikibase World, my goal was not specifically to replace the Wikibase Registry. What I actually wanted to do instead was to migrate my own list of Wikibases I had been keeping in Airtable to a Wikibase. I had been using Airtable to keep track of interesting Wikibases, including several that were not documented in the Wikibase Registry as they were newer. Airtable was also convenient because it allowed me to express relationships between the Wikibases in my database and current/future projects of mine. Ultimately I decided the data would be more valuable in a Wikibase, where others could create new entries and improve existing ones.

I created a new wiki at Wikibase.Cloud using wikibase.world as a custom domain. (You also have the option to set up a wiki with a wikibase.cloud subdomain.) I chose the domain because it was available, I liked the name, and I had long felt that a Wikibase that sat at the center of the Wikibase federation should have a memorable URI. (Before, the domain was wikibase-registry.wmflabs.org.)

With Wikibase World created, I began filling out the wiki with entries based on those in my Airtable. First was Wikidata, at Q1. I worked through my table of wikis, referencing the Wikibase Registry as an additional data source. At this stage, my goal was not to swallow the entire Wikibase Registry. From my point of view, I had no authority to shut down or replace the Registry. However, I could use it (particularly its rich pre-existing vocabulary of properties) to help build out this new project.

With a sufficient body of data I announced the new Wikibase World in early June of this year. One of the original developers of the Wikibase Registry saw my announcement and suggested that the time had come to retire the Registry. At that point I pursued merging the two sites in earnest.

Mapping the old Wikibase with the new #

Rather than export the Wikibase Registry’s database and have it imported into Wikibase.Cloud, I instead decided to pursue Wikibase World as a new project. The most significant consequence of this decision is that an item that held one identifier in the old system could no longer be guaranteed that identifier in the new system. This is because Wikibase assigns identifiers sequentially starting from Q1, and once an identifier is assigned, it cannot be reassigned. Wikibase World assigning Q6 to Wikimedia Cloud Services means it is no longer available for DroidWiki. This break in numbering meant that I would need to map each identifier on the old wiki to its counterpart on the new wiki.

On any Wikibase, you can get a list of all items at Special:AllPages, with the Item namespace selected. (On Wikidata, items are located in the main namespace.) You can also get a list of all properties at Special:ListProperties. Wikibase Registry only had around 150 items, so it was easy to go through each record by hand. After passing through each item on Wikibase World that also had an entry in Wikibase Registry, I created new items in passes until I had worked through almost every record on Wikibase Registry. Most records had a clear 1:1 relationship between the two wikis. Others required more research to determine exactly what they described. My goal was to ensure each item on Wikibase World was unambiguously identifiable, and for that reason I chose not to transfer seven items and four properties. In effect, I deleted those records.
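If you have more records than you would want to review by hand, the same list is available through the MediaWiki API. The following is a minimal sketch, not something I used for this migration. It assumes the Item and Property namespaces use the IDs 120 and 122, which is the usual default on a standalone Wikibase but worth confirming on your own wiki; the function names are my own.

```python
from urllib.parse import urlencode

# Assumption: default namespace IDs on a standalone Wikibase install.
ITEM_NAMESPACE = 120
PROPERTY_NAMESPACE = 122


def allpages_url(api_url, namespace, continue_from=None):
    """Build an API URL that lists every page in one namespace."""
    params = {
        "action": "query",
        "list": "allpages",
        "apnamespace": namespace,
        "aplimit": "max",
        "format": "json",
    }
    if continue_from:
        # Value taken from the "continue" block of the previous response.
        params["apcontinue"] = continue_from
    return f"{api_url}?{urlencode(params)}"


def entity_ids(response):
    """Extract entity IDs such as 'Q2' from a decoded allpages response."""
    pages = response.get("query", {}).get("allpages", [])
    # Titles look like 'Item:Q2'; keep the part after the namespace prefix.
    return [page["title"].split(":", 1)[-1] for page in pages]
```

Fetch the URL with any HTTP client, decode the JSON, and pass the result to entity_ids. Repeat with the apcontinue value from each response until no "continue" block is returned.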

At first I tracked the mapping between the two wikis in a Google Sheet, but when it came time to actually enforce these redirects, I turned to Flask. Flask is a framework for writing simple web applications in Python. The app I wrote is quite simple: it takes web requests to the old domain, translates the entity number from Registry to World, and redirects the user to the new record. The app responds to the many different ways you can access a page through MediaWiki, including short-form paths like /wiki/Item:Q2, longer-form paths like index.php?title=Item:Q2, Wikibase-specific access routes like Special:EntityData, and others. It does not support api.php, so any links to there will break, or more technically, redirect to Wikibase World’s api.php with no change in parameters. Within Wikimedia Cloud VPS, I configured the wikibase-registry.wmflabs.org domain to target the VM containing my Flask app instead of the Wikibase. Once I did this, all traffic to the Wikibase Registry started forwarding to Wikibase World. I have posted a GitHub Gist of the app.py file that you can copy for your own project.
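To give a sense of the translation step, here is a simplified sketch of the idea. It is not the contents of the Gist: the mapping entries are made-up examples, the function names are hypothetical, and it assumes the new wiki uses the same path layout as the old one. It is written as plain Python so the logic can be tested without a web server; in a Flask app you would call it from a catch-all route and pass the result to a redirect.

```python
import re

NEW_BASE = "https://wikibase.world"

# Made-up sample entries; the real mapping was compiled by hand.
MAPPING = {"Q2": "Q1", "P1": "P3"}
DROPPED = {"Q99"}

# Simplification: the first Q or P number in the request is treated
# as the entity being asked for.
ENTITY_ID = re.compile(r"[QP][1-9]\d*")


def translate_id(old_id):
    """Return the new ID, or None if the entity was dropped or is unknown."""
    if old_id in DROPPED:
        return None
    return MAPPING.get(old_id)


def redirect_target(path, query_string=""):
    """Return the URL to redirect to, or None if there is no successor."""
    target = path + ("?" + query_string if query_string else "")
    if path.endswith("api.php"):
        # API calls are passed through with their parameters unchanged.
        return NEW_BASE + target
    match = ENTITY_ID.search(target)
    if match is None:
        # Nothing to translate, for example the main page.
        return NEW_BASE + target
    new_id = translate_id(match.group())
    if new_id is None:
        return None
    # Swap the old ID for the new one and keep the rest of the request.
    return NEW_BASE + target[:match.start()] + new_id + target[match.end():]
```

Because the ID is swapped in place, one function covers /wiki/Item:Q2, index.php?title=Item:Q2, and Special:EntityData/Q2.json alike. When the function returns None, the caller decides how to respond, for example with a "not found" page that links to the archive.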

Here are some easter eggs: If you visit the /mapping path on the redirect service, you will see a JSON-formatted mapping between Wikibase Registry and Wikibase World. /dropped includes the list of dropped entities.

I also posted the World mappings to the Registry items themselves, shortly before locking the database of that wiki altogether, using the “exact match” property. This kept a record, close to the original, of the relationship between a given item and its successor on the new wiki. This is particularly useful for anyone using a dump of the wiki, as I will elaborate on below. I chose to record the new wiki’s identifiers on the old wiki, instead of the other way around, because from my point of view, it does not make sense to add identifiers from a defunct service to a brand-new one. It would be useful to only a very niche group of people, while anyone using the Wikibase Registry archive benefits from knowing the Wikibase World ID.

I cannot stress enough the importance of setting up a forwarding service when changing the URL paths of your content. The web is built on links, especially the semantic web. Every broken link is a tragedy, and each unresolvable link undermines the usefulness of the web. If your Wikibase was never used by anyone other than yourself, you may get away with simply discarding the old wiki. However, doing this work ensures there is a link between the old and new projects so that you are not starting entirely from scratch. Once you put a link or resource out on the Internet, you don’t know who might be using it.

Archiving the old Wikibase #

Although I had migrated any and all meaningful content to the new wiki, I still wanted to make the original available as an archive. One reason is that I chose to make Wikibase World as a new site, rather than as a direct continuation of its predecessor. If I chose not to migrate something to Wikibase World, it would at least be available in the archive.

Before shuttering the wiki for good, I posted a message on the wiki’s primary discussion page announcing this impending change while helping users create new accounts. I also added a notice to the site’s MediaWiki:Sitenotice page announcing the change. I waited around two weeks before enabling the redirect service. I had set the expectation that the wiki would go offline permanently after those two weeks, but I then figured out a way to restart the Wikibase’s Docker container with a new LocalSettings.php file that added a $wgReadOnly value, setting the wiki as read-only. Disabling the wiki at this level ensured that no one at any access level could edit it. Since most security concerns are mitigated with a locked database, this archived version of the wiki can stay online until mid-2025, when the Debian 12 operating system is discontinued. At that point, Wikibase Registry will go offline permanently. The redirect service will be maintained indefinitely; it’s thankfully low-maintenance.

After retargeting the Registry’s old domain to the redirect service, I set up a new subdomain for the archive: wikibase-registry-archive.wmcloud.org. This was the same Wikibase Registry as before, now frozen in time with a large notice on the top (incorrectly) announcing that it would all go offline July 1, 2023. With the wiki now in its final state, I got to work dumping the wiki for preservation. Using MediaWiki’s dumpBackup.php script and the Wikibase extension’s dumpRdf.php script, I prepared XML and RDF dumps of the wiki. All wikis running the MediaWiki software can produce XML dumps, which include page contents at all revisions as well as revision metadata, but not private information like passwords or email addresses. The RDF dump is particular to Wikibase, and comprises the knowledge graph of that wiki as expressed through its entities. I uploaded these dumps to the Internet Archive, ensuring that they will stay online indefinitely at no cost to myself.
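For reference, the two dump scripts can be invoked roughly as follows. This is a sketch under assumptions rather than the exact commands I ran: the MediaWiki root and output file names are placeholders, the script paths follow the standard MediaWiki and Wikibase layout, and it relies on both scripts writing to standard output, with dumpRdf.php producing Turtle unless told otherwise. Check each script's --help on your own installation before relying on it.

```python
import subprocess


def dump_commands(mediawiki_root, out_dir):
    """Return (argv, output_path) pairs for the XML and RDF dumps."""
    xml_cmd = [
        "php",
        f"{mediawiki_root}/maintenance/dumpBackup.php",
        "--full",  # every revision of every page, not just the latest
    ]
    rdf_cmd = [
        "php",
        f"{mediawiki_root}/extensions/Wikibase/repo/maintenance/dumpRdf.php",
    ]
    return [
        (xml_cmd, f"{out_dir}/pages-full.xml"),  # placeholder file name
        (rdf_cmd, f"{out_dir}/entities.ttl"),    # placeholder file name
    ]


def run_dumps(mediawiki_root, out_dir):
    """Run both dumps, saving each script's standard output to a file."""
    for argv, out_path in dump_commands(mediawiki_root, out_dir):
        with open(out_path, "wb") as out:
            subprocess.run(argv, stdout=out, check=True)
```

With a Docker-based deployment, run_dumps would need to run inside the Wikibase container, where PHP and the MediaWiki files are available, for example with /var/www/html as the root.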

Announcing the new Wikibase #

Finally, I announced Wikibase World to the Wikibase User Group mailing list. I had not previously announced in a very public or conspicuous forum that I was working on this project, so in this one email I had to bring people up to speed on why the Wikibase Registry was going away, what it was being replaced with, how I was handling the potential for broken links, and where archived copies of the wiki could be found. In my opinion this email is a succinct guide to carrying out such a migration for any wiki. Now that the project is complete, I have written this post as a more detailed complement to that email.

Migrating your own Wikibase #

If you have a self-hosted Wikibase that you would like to migrate to a cloud provider, Wikibase.Cloud is a good option that is free of cost. However, setting up an entire Wikibase for your project is not the only option. If you would instead like to incorporate your data into a pre-existing Wikibase, I operate a couple of Wikibases that you may be interested in:

If you go the route of creating a Wikibase, I highly encourage you to register it in Wikibase World. Anyone can create an account and participate.

I am happy to help with any Wikibase migration projects. Feel free to schedule some time to talk, or send over an email.