James Hare

Three new concepts for organizing work on Wikipedia: Workspaces, Buckets, and Sprints

How do you suppose Wikipedia keeps itself up to date across its several million articles?

The short answer is that the community works together to make its work smoother. It documents years of debates and precedents so that the answer to a perennial editorial question is just a link away. It comes up with processes for managing conflicts within the community. And it has built a large library of open-source software tools to make its work more efficient; the Toolhub catalog tracks over 3,000.

Editors also form working groups, often known as WikiProjects. Although Wikipedia has a rich library of software tools that help you improve articles, there are not many tools to help you build WikiProjects. While I have built tools in the past to try to do this, as I will describe below, the results were mixed.

After several years of gaining experience and perspective, I think I know how I would approach this problem again. My proposed solution is to build new data services revolving around outstanding article tasks, enabling a new generation of bots that use standardized wiki templates. Or, to summarize: workspaces, buckets, and sprints.

Background #

Wikipedia’s user experience is not very sophisticated. Much of the Wikimedia Foundation’s investment in software development has focused on features very close to the core use case of Wikipedia, including article editing and translating. I can tell you as someone who started editing Wikipedia in 2004 that the experience of editing Wikipedia articles has improved drastically since then. However, there are fewer features that go beyond the core editing experience and more into social interaction. I attribute this largely to their failed attempt at building a new discussion system; this caused them to mostly shy away from social functionality, only recently revisiting the subject with Discussion Tools in 2019. This has overall left the user experience feeling fairly basic by the standards of modern web applications.

What Wikipedia lacks in built-in features, it more than makes up for in its capacity for customization. Users build the features they need. MediaWiki has its own templating language and eventually added support for both the Lua programming language and CSS within templates as well. Wiki pages can be automatically updated by bots written in Python (or almost any other programming language), subject to approval and regulation by the community. Toolforge provides a platform for operating wiki-adjacent web apps, with free-of-charge virtual machines for larger projects. And if those aren’t good enough, and you want to run custom JavaScript on a certain page or on all pages, you have options to do that too. These radical customization options are one of Wikipedia’s secret weapons for sustainability. A relatively small number of extremely dedicated volunteers can keep up with a text the size of English Wikipedia, one that is constantly being updated by anonymous strangers, with the help of an ever-expanding suite of self-developed tools.

WikiProjects emerged from this ecosystem. They are not a software feature of Wikipedia so much as a social feature. You create a WikiProject by creating another wiki page, like you would create a new article. You can combine together pre-existing templates, bot-updated reports, and external software tools to make your project more sophisticated, but this is entirely on you.

I have found WikiProjects interesting for a while. In 2007 I started WikiProject Pharmacology and in 2009 I re-started WikiProject Elections and Referendums. Both are projects that continue to operate to this day, serving as a focus point for contributors interested in those subjects.

In 2015 and 2016 I was awarded two grants from the Wikimedia Foundation to research how to make it easier for volunteer contributors on Wikipedia to organize groups on-wiki. This initiative was called “WikiProject X”.

WikiProject X leveraged most of the options Wikipedia gives you to customize pages and the user experience in general. WikiProject X built several tools within Wikipedia’s ecosystem, including a set of templates for building out a project page and a Python bot that provided automated reports. An example of these tools in action is WikiProject Women’s Health, which has mostly held up over the years despite the lack of maintenance. WikiProject Women in Red and WikiProject Medicine are two very successful projects using heavily customized variants of the WikiProject X templates. With limited funding for two people, part-time, for months at a time, I would say WikiProject X managed to put forward a vision of what WikiProjects could be.

The benefit of writing templates and scripts, rather than building a full software feature in MediaWiki’s native PHP, was that we could build and deploy a prototype in a matter of weeks. We set up the templates on-wiki and used them on the pages where we were cleared to use them. Contrast this with the process of getting a new software extension deployed to Wikipedia’s production environment, which includes extensive security and performance review and can take over a year. Compared to that, on-wiki templating and tool development is relatively permissionless and drastically faster, and surprisingly powerful when combined with external web applications (or better yet, built-in interface gadgets).

I did not fully realize this until after I attempted to codify the prototypes of WikiProject X into a more formal MediaWiki extension, for deployment to Wikipedia’s production environment, called CollaborationKit. Even if we were able to get a prototype deployed to the beta cluster, the work needed to make it perform to Wikipedia’s standards and integrate with the rest of production ultimately stalled the project. Meanwhile, WikiProject Women in Red, originally a WikiProject X pilot project, had blossomed to become one of the largest WikiProjects on the entire site. WikiProject Medicine, another large project, unilaterally adopted the template framework with some customizations. CollaborationKit was eventually un-deployed from the beta cluster, while the template system managed to increase in adoption over time, through no marketing on my part. Tragically, I needed to move on to other things, and I have not been able to pay significant attention to these tools since 2017.

After working on WikiProject X and other freelance and grant projects, I spent a couple years working at the Wikimedia Foundation, after which I started consulting on Wikipedia, Wikidata, and product development through my own business. Over time I found my need for these kinds of tools returning. I had previously been hopeful that if I had not succeeded in revamping WikiProjects, someone else would; perhaps the Foundation would re-visit the subject. However, as far as I can tell, WikiProjects work the same way they have always worked.

The opportunity #

In 2011 I and several other Wikipedia editors in the Washington, DC area founded Wikimedia DC to build partnerships with organizations in the area. While doing this work I had come to view WikiProjects as key to making outreach to potential partners smoother. When we onboarded a new Wikipedia editor, there were no clear and obvious prompts on how to get started. Wikipedia does have to-do lists, but to understand how to take action based on them requires you to read through multiple pages of documentation. There are dauntingly long maintenance backlogs to work through, and pretty much any random article will have some kind of complaint-box at the top suggesting how the article could be better, but rarely is any of this presented in an organized, easy to sort and access way, directly to the user. Recommendation algorithms that help editors exist, such as SuggestBot, but they are not built directly into Wikipedia’s interface. WikiProjects come pretty close to approximating this desired experience.

One common feature of Wikipedia institutional partnerships is “edit-a-thons,” or group editing events. People would gather, typically in person, to work in real time on a particular subject. These events are typically paired with training, so at the beginning of an event, you would learn how to edit Wikipedia, and then you would begin editing. Without automated recommendations, the organizers need to produce their own lists of tasks, and this requires additional work. None of this is to say that using software tools to semi-automatically generate task lists is impossible, but that it requires additional know-how of Wikipedia’s ecosystem. Any lists produced this way have limited visibility across Wikipedia, and to the users who could most benefit from them. Wikipedia’s idea of task recommendation is not a frictionless experience.

The opportunity is fairly straightforward: the contributors who know most about their subject area on Wikipedia could build a backlog of tasks to do, and the partnering organization could work through that. The WikiProject provides contextualized on-boarding based on that editor’s interest without relying on that user’s personal data. A partnering organization could expand their participation by offering resources and making other announcements to relevant Wikipedians through the WikiProject. Individual contributors, not connected to any particular organization, benefit from WikiProjects as a small, human-cognizable subset of Wikipedia’s community, a group that is easier to approach and interact with. The more accessible Wikipedia is to someone, the easier it will be for them to stay on, especially when it gets stressful, and it can get stressful.

Three concepts #

Last fall, I began working on what I called “focus tools.” The goal of focus tools was to abstract WikiProject functionality from the specific use case of WikiProjects. This would allow “WikiProjects” of different kinds to be built without all the work that is needed to build a WikiProject currently. These tools would allow you to focus on a given area of Wikipedia.

My first opportunity to begin implementing these focus tools came in early 2023. I worked under contract with the organization Hacks/Hackers to build new features for the Vaccine Safety project. Vaccine Safety seeks to ensure that articles on Wikipedia that discuss vaccines cite reliable sources. For Vaccine Safety I set up automated reports of sources used in vaccine-related articles. I created new bots and templates to support this work.

The work I did for the Vaccine Safety project was the first phase in support of a larger product vision. This vision is organized around three new pieces of technical infrastructure. “Workspaces” is a system of templates that can be used to build WikiProjects or similar project areas, on-wiki. “Buckets” are quick projects with auto-updating project scopes, supported by a new Pageset API. “Sprints” organize outstanding tasks related to articles for coordinated editing, built on a new Taskset API. Below I will describe how these facets work together.

On-wiki workspaces #

A “workspace” is any wiki page being used to plan wiki editing, be it a WikiProject, an institutional partnership planning page, or one’s own sandbox. Wikipedia:Workspaces is a new set of templates that can be used to build WikiProjects or any other page used for collaboration or organizing work. Workspaces comprises mobile-responsive templates for building out project navigation headers, page/task lists, responsive-width content columns, and displaying headline metrics. For an example of a page that uses all these templates, see the Sources subpage for the Vaccine Safety project.

My experience developing templates for WikiProject X informed my approach to Workspaces. With what I had previously built, if you wanted to build a new WikiProject, you would use a meta-template called Load WikiProject Modules, and define each feature of the project as a parameter to this one template. The goal of this approach was to make setting up a WikiProject a turnkey experience—just add the template to the page and you’re done. In practice, this made the project templates very inflexible. WikiProjects, even ones using standard layouts, are diverse in their content and how they present it. Taking away flexibility from the user is not consistent with Wikipedia’s overall user experience.

The design goal for Workspaces is thus to create a system of templates that work well together and do not clash, but critically, can also be used independently of one another. Each template produced as part of the Workspaces project is fully self-sufficient and can be used in totally different contexts. This is a departure from the WikiProject X strategy and more in line with how Wikipedia templates are usually built. With adjustments to the underlying Lua modules to support template parameter name localization, these templates could also be exported to Wikimedia projects in other languages or other wikis using MediaWiki software.

My other goal for Workspaces is to provide a standard way to invoke bots to run on wiki pages. For example, say you wanted a metrics dashboard for your WikiProject, and you wanted your metrics to be provided by “XYZ bot”. At the moment, each bot developer needs to devise their own way to process requests and act on them. For example, ListeriaBot, which produces lists of pages based on Wikidata queries, uses Template:Wikidata list to format its output, but also screens which pages call that template to ascertain where to post its reports. However, if “XYZ bot” used the standard metrics dashboard template, invoking the bot could be as simple as adding {{metrics dashboard|bot=XYZ bot}} to your project page. While the alert list template might not be suitable for something like ListeriaBot, a future template could be. Bot developers should remain free to implement bots how they wish, but they should not need to re-invent the wheel either.
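As a rough sketch of what this could look like from the bot’s side, the Python below scans wikitext for the metrics dashboard template and picks out the pages that name a given bot. The regular expression, the function names, and the sample page titles are all assumptions made for illustration; a real bot would fetch wikitext through the MediaWiki API and use a proper wikitext parser rather than a regex.

```python
import re

# Assumed convention: a page opts in with {{metrics dashboard|bot=XYZ bot}}.
# This simple pattern does not handle templates nested inside parameters.
TEMPLATE_RE = re.compile(
    r"\{\{\s*metrics dashboard\s*\|([^{}]*)\}\}", re.IGNORECASE
)


def parse_params(raw):
    """Split 'a=1|b=2' into a dict of named template parameters."""
    params = {}
    for part in raw.split("|"):
        if "=" in part:
            key, value = part.split("=", 1)
            params[key.strip()] = value.strip()
    return params


def pages_requesting(bot_name, pages):
    """Return titles whose wikitext invokes the template for bot_name."""
    requested = []
    for title, wikitext in pages.items():
        for match in TEMPLATE_RE.finditer(wikitext):
            if parse_params(match.group(1)).get("bot") == bot_name:
                requested.append(title)
                break  # one invocation per page is enough
    return requested


# Hypothetical pages, standing in for wikitext fetched from the wiki.
pages = {
    "Wikipedia:WikiProject Example": "Intro\n{{metrics dashboard|bot=XYZ bot}}",
    "Wikipedia:WikiProject Other": "{{metrics dashboard|bot=Another bot}}",
    "Wikipedia:Sandbox": "No dashboard here.",
}
print(pages_requesting("XYZ bot", pages))
```

The point of the shared template is that every bot could rely on the same lookup instead of inventing its own request mechanism.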

Article buckets #

While some WikiProjects are task-oriented and include the entire encyclopedia in their scope, most WikiProjects are scoped around a subject. This could be as broad as biographies, i.e., all people with Wikipedia articles, or as narrow as just one very notable person.

To operationalize this scope, you need to create a flat list of articles. Wikipedia categories can help you build this list, but they are not themselves flat lists. Categories can arbitrarily become parent- or subcategories of one another. This almost guarantees that any automated traversal of the category tree will spiral out of control and include pages very far afield of the initial scope. A WikiProject would typically get around this by building its own parallel categorization system; see this example from WikiProject Military History. This is very labor intensive and can leave behind a lot of clutter, especially as these projects become inactive.
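To make the traversal problem concrete, here is a small Python sketch over a toy category graph. The graph is invented for illustration and is not taken from Wikipedia; it includes a cycle and a branch that wanders off topic. A breadth-first walk with no depth limit pulls in everything reachable, while a depth limit keeps the result close to the starting category at the cost of possibly missing relevant pages.

```python
from collections import deque

# Toy graph (invented): category -> list of subcategories.
SUBCATS = {
    "Vaccines": ["Vaccination", "Vaccine researchers"],
    "Vaccination": ["Vaccines"],  # categories can loop back on each other
    "Vaccine researchers": ["Scientists by field"],
    "Scientists by field": ["Physicists", "Chemists"],  # far afield
    "Physicists": [],
    "Chemists": [],
}


def collect_categories(root, max_depth=None):
    """Breadth-first walk of the subcategory graph starting at root.

    The 'seen' set guards against cycles; max_depth (if given) stops the
    walk from descending more than that many levels below root.
    """
    seen = {root}
    queue = deque([(root, 0)])
    while queue:
        category, depth = queue.popleft()
        if max_depth is not None and depth >= max_depth:
            continue
        for sub in SUBCATS.get(category, []):
            if sub not in seen:
                seen.add(sub)
                queue.append((sub, depth + 1))
    return seen


print(sorted(collect_categories("Vaccines")))               # all six categories
print(sorted(collect_categories("Vaccines", max_depth=1)))  # root plus direct children
```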

Newer projects can use the aforementioned ListeriaBot to build these lists through queries to Wikidata, but many existing tools expect you to have a category system. Listeria is just one of many pieces of software that use the Wikidata Query Service to put together lists of articles or topics based on arbitrary combinations of criteria such as “women writers born in the 19th century”. This work linking topics to each other, and to their respective Wikipedia articles, happens on Wikidata, and so no work is duplicated. PagePile allows similar static reports based on a broader set of criteria than just categories or Wikidata queries, but these are not posted directly to the wiki.
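For readers who have not seen one of these queries, the Python below assembles a SPARQL query for the “women writers born in the 19th century” example. The helper function and its parameters are my own illustration, not part of Listeria or any other tool. The identifiers are Wikidata’s: P21 (sex or gender), Q6581072 (female), P106 (occupation), Q36180 (writer), and P569 (date of birth). Note that this simple form matches only items whose occupation is exactly “writer”, not more specific occupations such as novelist, and I count the 19th century as 1801 through 1900.

```python
def build_query(gender_qid, occupation_qid, first_year, last_year,
                wiki="https://en.wikipedia.org/"):
    """Build a SPARQL query string for the Wikidata Query Service.

    Selects items with the given gender and occupation, born within the
    year range, that have an article on the given wiki.
    """
    return f"""
SELECT ?item ?article WHERE {{
  ?item wdt:P21 wd:{gender_qid} ;
        wdt:P106 wd:{occupation_qid} ;
        wdt:P569 ?birth .
  ?article schema:about ?item ;
           schema:isPartOf <{wiki}> .
  FILTER(YEAR(?birth) >= {first_year} && YEAR(?birth) <= {last_year})
}}
""".strip()


# Women (Q6581072) who are writers (Q36180), born 1801 to 1900.
query = build_query("Q6581072", "Q36180", 1801, 1900)
print(query)
```

The script only builds the query text; running it against the query service, and handling timeouts on broad queries, is left out.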

Listeria is a great tool for producing a list of pages on-demand that you could then build a work list around. Nonetheless it is limited in its application. Listeria produces a wiki table as its output, which is useful for humans but not for any bots that could use these page lists. What if a query written by an editor for use in Listeria could be re-used by other bots? And what if these Wikidata-generated lists could be updated the way categories on Wikipedia are?

Earlier this year I began work on a codebase called “Pageset” that, humbly, puts together lists of Wikipedia articles (in any language) based on either category membership or Wikidata queries. I built it to support a single project, the Vaccine Safety project, which uses a combination of category membership and Wikidata queries to define its scope.

Pageset’s development has only started. Future plans include allowing anyone to define sets of pages through an easy-to-use interface, and to then form composed page sets through unions, intersections, and set differences of existing page sets. While I suppose I can’t force anyone to build page sets in a modular way and then compose them together, I would strongly encourage it. If you were building a project around the aforementioned example of women writers in the 19th century, imagine how much easier it would be if you could define a new project by mashing together existing lists of women, writers, and 19th century biographies, instead of having to define this complex scope yourself.
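The composition idea maps directly onto ordinary set operations. The sketch below uses Python’s built-in sets with a handful of hand-picked titles standing in for real page sets; it does not reflect how Pageset stores or exposes its data, which is still being designed.

```python
# Small stand-ins for existing, independently maintained page sets.
women = {"Emily Dickinson", "George Eliot", "Marie Curie", "Jane Austen"}
writers = {"Emily Dickinson", "George Eliot", "Charles Dickens", "Jane Austen"}
born_19th_century = {
    "Emily Dickinson", "George Eliot", "Charles Dickens", "Marie Curie",
}

# Intersection: pages that belong to all three sets.
scope = women & writers & born_19th_century

# Set difference: drop pages someone has already worked through.
already_reviewed = {"George Eliot"}
todo = scope - already_reviewed

# Union: merge two scopes into one broader project.
women_or_writers = women | writers

print(sorted(scope))          # ['Emily Dickinson', 'George Eliot']
print(sorted(todo))           # ['Emily Dickinson']
print(len(women_or_writers))  # 5
```

Because each input set is maintained on its own, a fix to the list of writers would flow into every project composed from it.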

Where this will go beyond existing tools like PagePile is support for access over an API. This will allow bots to use Pageset as a source of page titles, meaning that a list built in one place is reusable in others, just like how bots can navigate Wikipedia categories through Wikipedia’s API. Pageset’s API will be used by bots like Credibility bot, which updates the Vaccine Safety project with reports on what sources are being used in articles. Credibility bot currently uses Pageset internally to build the list of ~800 pages that are analyzed by the bot. All bots should be able to benefit from this, and that is my plan.

Pageset will also form the backbone for Buckets, which you can think of as like mini-WikiProjects. You fill out a form specifying what you want your bucket to be about, you specify where on-wiki you want to set up the bucket, and upon submitting the form a project page is built using the Workspaces templates. You can then customize this page like any other wiki page. The definition of the bucket is updated as the scope is refined and as articles are created and deleted within scope. You can create this page as a subpage of Wikipedia:Bucket, or you could title it like a WikiProject and build a WikiProject this way, or you could make it part of your personal sandbox.

Task-based sprints #

You have the Wikipedia articles that comprise the scope of your project, and you have a page on-wiki with automatically updated reports. These reports may give you signals about sections of Wikipedia that need work. They may provide metrics about what Wikipedia’s contributors have achieved there to date. What is missing from this picture is explicit prompts of what work needs to be done within the subject area.

There is no shortage of work to be done on Wikipedia, and there is no shortage of lists or other systems tracking these opportunities. Arguably, this is part of the problem. Wikipedia is ultimately a collection of documents, rather than a structured database. This means that each to-do list created on-wiki is going to be its own silo, and Wikipedia is littered with these silos.

As an arbitrary example, if you go to WikiProject Missing Encyclopedic Articles, you then need to navigate through many different pages to find things to work on. This is not a searchable database, but many different text documents you need to sort through manually. If you were building a new WikiProject, you could spend weeks scouring Wikipedia for all these different lists. Not just the tasks identified at the WikiProject Missing Encyclopedic Articles hub, but any tasks individual WikiProjects or even individual edit-a-thons have surfaced. These lists also have a maintenance burden; once articles are written, they need to be manually removed from the to-do list.

For existing articles, the situation is more straightforward, as templates and categories can be used to directly associate a given task (copyediting, adding references, etc.) with its applicable articles. If you want a list of all articles with unsourced statements, simply visit Category:All articles with unsourced statements. There is nonetheless a maintenance burden here. If a tag appears on an article saying it needs to have pictures added, and then someone subsequently adds pictures, that person may or may not remove the tag. The hesitation could come from not knowing if the task is sufficiently complete. They may also feel that the judgment necessarily needs to come from another person serving as a reviewer. This ambiguity can result in maintenance tags persisting on articles for several years.

Coordinated task management was pursued during the second round of WikiProject X development. I approached this general problem with a Django app called Wikipedia Requests, meant to be a new database of requests to create new Wikipedia articles or improve existing ones. Tasks in this database could include comments and you could mark a task as resolved at the push of a button. The idea was that, incrementally, manually developed lists would be migrated to this new database, with the original lists being replaced with bot-updated ones. Wikipedia Requests is no longer operational, but I have preserved the database and I plan on incorporating the data into a new product.

Ultimately, I think I had the wrong idea with how I built Wikipedia Requests. Trying to migrate the wealth of pre-existing (if poorly organized) task items on Wikipedia to a new database was going to be a losing battle, like bailing water out of the ocean. This is why my upcoming data service Taskset will take a slightly different approach. Rather than exist as a new destination for requests, Taskset will be built by screening existing backlogs on-wiki, with tasks marking themselves as resolved based on automatic criteria. For example, if the task is to create a new article, the task is resolved when the article is created. This way, the way you keep Taskset up to date is by keeping Wikipedia up to date – no need to update two different things.
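Here is a minimal Python sketch of what “tasks marking themselves as resolved” could mean. Taskset does not exist yet, so the task kinds, the Task class, and the snapshot format below are all hypothetical. The idea it illustrates is that the resolution check reads only the state of the wiki, never a separate status field that someone has to update.

```python
from dataclasses import dataclass


@dataclass
class Task:
    kind: str      # hypothetical kinds: "create" or "remove_tag"
    title: str     # the article the task is about
    tag: str = ""  # maintenance tag, used only by "remove_tag" tasks


def is_resolved(task, wiki):
    """Decide whether a task is done by looking only at the wiki's state.

    'wiki' maps each existing page title to the set of maintenance tags
    currently on that page.
    """
    if task.kind == "create":
        # Resolved as soon as the article exists.
        return task.title in wiki
    if task.kind == "remove_tag":
        # Resolved once the tag is gone (or the page itself is gone).
        return task.tag not in wiki.get(task.title, set())
    raise ValueError(f"unknown task kind: {task.kind}")


# Placeholder snapshot of the wiki: title -> tags on that page.
wiki = {
    "Example article A": {"citation needed"},
    "Example article B": set(),
}
tasks = [
    Task("create", "Example article A"),
    Task("create", "Example article C"),
    Task("remove_tag", "Example article A", "citation needed"),
    Task("remove_tag", "Example article B", "images needed"),
]
open_tasks = [t for t in tasks if not is_resolved(t, wiki)]
print([(t.kind, t.title) for t in open_tasks])
```

In this snapshot two tasks remain open: creating “Example article C” and clearing the tag on “Example article A”. Editing the wiki, then re-running the check, is the only way to close them.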

The Taskset API would enable the creation of new bots and new tools. To address the backlog of pages with maintenance templates, I would build an accessory app that pulls outstanding backlog items from the Taskset API, shows page diffs between when the tag was added and current day, and then queues up diffs for review, allowing you to quickly remove maintenance tags from pages that no longer need them. As maintenance tags are removed from Wikipedia articles, Taskset updates its database. Where this needs to be given more thought is how to manage requests involving pages that do not yet exist; while a link to their Wikidata ID would be unambiguous, there is no guarantee that a given “missing article” would have a Wikidata ID or ever will.

While you could produce task lists within buckets and WikiProjects, you could also create a task-driven workspace called a Sprint. Sprints are inspired by the online edit-a-thons that WikiProject Women in Red holds. During these month-long campaigns, a theme and selection of article subjects are chosen for the group to work on, and all progress is tracked and reported back through bot-updated reports. A sprint could be built around a one-time event, or as an ongoing campaign. If a bucket is used to generally keep track of the progress of articles in a given area, a sprint keeps track of relevant tasks. A sprint could be a subpage of Wikipedia:Sprint, or of a WikiProject, or its own standalone wiki page.

Unlike Workspaces and Buckets, work on Sprints has not yet started.

My vision #

Whether it was as a volunteer organizer for Wikimedia DC or as a paid contractor for organizations like Hacks/Hackers and the National Institute for Occupational Safety and Health, I have had certain recurring needs in my work, as have the people I have worked with. They included setting up project pages, coming up with metrics for a relevant set of articles, figuring out how to identify what those articles are, and finding tasks for people to work on at editing events.

None of these things are impossible, provided you have sufficient knowledge of how to navigate Wikipedia’s maintenance workflows and community processes. If you have knowledge of Wikipedia-specific software tools, you can use those too.

Consider the situation where you do not have knowledge of these things. To you, Wikipedia is an online general reference that you sometimes read and consult. It teaches you about current events, physics, exploding whales, and everything that happened in 1848. You may have already known that anyone can edit it, or you could have just realized it for the first time. But you have only ever interacted with what we call “namespace zero” – the main encyclopedic content of Wikipedia. You have yet to interact with talk pages, or project pages, or user pages, or templates, or anything like that. This is how the vast majority of people interact with Wikipedia.

There are two ways you bridge this gap from reader to editor. One option is to just figure it out. More likely, you are going to ask someone to help you. Perhaps you’ll leave a message on a talk page asking for help, or you reach out to a local Wikimedia chapter, or you hire a consultant to help you. I would consider this a high-friction experience. It makes organizations depend on either Wikipedians who are not being compensated for their time, or on paid specialists. I do not think this problem can be eliminated entirely, but I believe I have figured out key pieces of infrastructure that help make this work easier for all of us. If this work is easier, we can support more partners, and in an ideal world, partners could onboard themselves.

Beyond partnerships, I want the experience of editing Wikipedia itself to be more organized. Since Wikidata launched a decade ago, Wikipedia and the Wikimedia ecosystem broadly have been moving strongly in the direction of structured, semantic databases. In Wikidata we have a strong way of associating Wikipedia articles across languages with their underlying concepts, but we do not have anything comparable for our backlogs. Starting a WikiProject often involves setting up an elaborate system of categories. And all of our to-do lists are siloed text documents.

Wikipedia editors deserve, and require, a convenient user experience. Wikipedia’s long term sustainability as a project requires more efficient coordination. And if the user experience is more accessible, more people can become Wikipedians. There is no reason why it can’t be as easy to start a WikiProject as it is to start a Facebook group.

Making this happen #

The first phase of this project with Vaccine Safety is complete, and I am currently working on obtaining funding to allow further development.

If you are a Wikipedia editor, the easiest way for you to lend your support is by posting on the user page for Credibility bot, the bot I developed for the Vaccine Safety project. All feedback is useful, whether you are interested in signing up your project as a pilot or you have detailed criticism.

Everything built for this project is free and open source software, meaning that once I have written it and it is out there, it belongs to the community. This project will be built by and for the community that makes Wikipedia possible.

If you have any other questions and prefer to contact me privately, you can reach me at [email protected].