How blocklists prevent the internet from being decentralized – and safe.

TL;DR: We are currently experiencing recurring waves of malware uploads, which cause pain for our volunteers and disruptions for users that we can do very little about. Here are some thoughts to start a discussion.

Safety on the Internet is crucial. At Codeberg, we are working towards this goal every day. Moderating a platform can become a great headache; you don't want to imagine what kind of s...tuff we need to review day to day.

But do you know what is currently our biggest headache? Blocklists. Yes. The well-intentioned, self-appointed, unaudited and unaccountable policing "services", both non-commercial and commercial.

This post is a rant from our team about the "strike first, collaborate later" attitude of many blocklists on the web, which has led to a lot of (in our opinion unnecessary) effort on our side, and a repeatedly bad experience for end-users.

We think that the current blocklist ecosystem is not helping small and independent contributors on the web, but reinforces and widens the gap between free and independent sites and large proprietary platforms. How often do the big boys get blocked? Who gets onto a list without being checked twice? Instance operators like Codeberg face a lot of headaches if the situation does not improve.

Sharing information about bad actors was a great idea, but ...

The idea of sharing information about website safety is a great one, or let's say at least a well-intentioned one. Before blocklists were widely available, you had no option but to inform the website owner or operator about a threat, then hope that they would take action.

This turns out to be a problem when websites are unmaintained (and thus compromised, because no one cares), or if the website operator spreads malware on purpose.

Blocklists extend the options for defending end-users on the web: they make it possible to mark websites as unsafe when those websites don't moderate their own content.

Today, however, blocklists have to a great extent replaced the effort of website operators informing each other about threats on their systems. Blocklists are casually used (without double-checking who is getting blocked) even by the largest infrastructure operators like GMX (a popular email service in Germany) and Quad9 (a broadly used non-profit DNS service with a mission we very much welcome). The policy is "strike first, maybe adjust later".

Once a domain is blocked by infrastructure providers, it is extremely hard to figure out the root cause, and the actual blocklist maintainers are more often than not unreachable and unresponsive (this includes commercial services). Lucky you if you find a contact for delisting; permanent allowlisting for platforms like Codeberg is just not part of the intended service.

The internet ought to be a distributed world, yet it is more centralized than anything before ...

The internet was designed as a reliable, decentralized network, with everybody invited to join and contribute. In theory, everyone can still go ahead and host their own public service: email, a web server, or a collaboration platform such as an instance of Gitea/Forgejo. Codeberg does. Yet, in reality, whoever seriously dares to try faces all kinds of problems that are hardly manageable for an individual or a small group of people – shoutout to everyone who is hosting a service as a single person.

Caring for the service and fighting abuse is hard enough in day-to-day business. On top of that, instead of being helped, these instance admins are fought – and their domains blocked.

Big proprietary websites also host malware. At scale. Enter the name of your favorite search engine or shared storage provider into a news search site, add keywords like "malware" and "hash", and you will find plenty of examples.

In the past, we reported a lot of content to Microsoft GitHub. Their reaction time was much slower than ours, and they appear to be one of the best places for spreading malware. Yet, they get a free pass. Google Drive and Dropbox have even been in the news for their slow response times. Would any blocklist provider dare to lock them out? Who would still use that blocklist? The current ecosystem of RBLs and other blocklists is an exclusive filter on the independent internet. It harms us, not the ones who actually spread malware at scale.

What if ... we want to regain control over content? What if the Internet is again formed by many small public instances?

Dealing with malware in practice

We have observed two ways in which we learn about malware:

In an estimated 1% of cases, we receive an email to abuse@codeberg.org, and process it within a few hours. Total downloads of malware: Usually a few dozen at most. We usually act before the storm.

In the other 99% of cases where our support team becomes active, we investigate user reports of connection issues, realizing after the fact that independent or commercial blocklist distributors just disconnected part of the internet from Codeberg.org, without ever notifying us. Listening to feedback from users takes time and effort until we realize that there is a mention of PiHole, browser add-ons or similar tools, and it takes further time to identify where our domain is listed.

We learn about yet-another-domain-blocklist #0815 that some well-meaning small company or independent individual started to create, maintain and advertise (for a while, until it goes stale like so many others). We then need even more research to find the actual offending file and to try to contact the owner or maintainer. Lucky again if we receive a response.

The latter case involves a lot of headache and investigation effort, because lists are more often than not aggregated from many sources, incorporating input without validating or questioning it. Often the actual source files have gone offline, the forms for delisting requests don't work, or contact emails get no response.

The result of the latter case: Instead of effectively fighting malware, we spend a lot of time playing detective in the big network of aggregators and metalists until we find the actual problem. By the time we take the malware down, it has been downloaded several thousand times, and we go to bed with a headache.
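
To give an idea of what that detective work looks like, here is a minimal sketch of the lookup step. It assumes the candidate lists have already been downloaded as text, either in hosts format ("0.0.0.0 domain") or as plain one-domain-per-line files. The function names, list names and sample domains are made up for illustration; real lists come in many more formats.

```python
def parse_blocklist(text):
    """Return the set of domains found in hosts-format or plain-domain text."""
    domains = set()
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        fields = line.split()
        # hosts format: "0.0.0.0 name [name ...]"; plain format: "name"
        names = fields[1:] if len(fields) > 1 else fields
        for name in names:
            domains.add(name.lower().rstrip("."))
    return domains


def find_listings(domain, blocklists):
    """Map list name -> matching entry, for lists naming the domain or a parent."""
    labels = domain.lower().rstrip(".").split(".")
    # "pages.forge.example" -> ["pages.forge.example", "forge.example"]
    # (the bare top-level label is skipped on purpose)
    candidates = [".".join(labels[i:]) for i in range(len(labels) - 1)]
    hits = {}
    for list_name, text in blocklists.items():
        listed = parse_blocklist(text)
        for candidate in candidates:
            if candidate in listed:
                hits[list_name] = candidate
                break
    return hits


# Hypothetical sample data, standing in for downloaded list files.
sample_lists = {
    "list-a": "# demo list\n0.0.0.0 tracker.example\n0.0.0.0 forge.example\n",
    "list-b": "ads.example\nmalware.example # reported by a user\n",
}

print(find_listings("pages.forge.example", sample_lists))
```

Checking parent domains matters here, because a single entry for the main domain takes every subdomain down with it. The sketch only answers "which list names us?"; finding out which upstream source an aggregated list copied the entry from, and who maintains it, remains manual work.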

But can't you ...

Yes, if we were a big for-profit company, we could likely do a lot of things differently. But we aren't. We are a small team of volunteers, maintaining a platform that by definition enables users to create and maintain every kind of content imaginable. This is literally our mission and a feature, not a bug.

We do our best to process abuse reports within a few hours at most. We actively watch out for suspicious access patterns and attachments.

Running virus detection software did not yield good results in the past. Often we deal with first publications of malware. All too often they aren't even detected by the largest commercial malware detection services. Running the libre ClamAV on our attachments has not shown even the slightest trace of effectiveness, and is a great waste of computing power: We have literally never had a positive match. For us, all the trusted tools turned out to be as effective as snake oil.

Doing better

Since we were struck by another malware wave in recent days, and saw another wave of blocklists reporting us to downstream consumers without ever notifying us, we have invested more effort in improving our internal tooling. In that regard, the blocklists also had a positive effect. But we could have done more if the time hadn't been wasted on research and end-user support: explaining to users that their PiHole or a similar tool was likely the problem, or helping users with lost email because their mail servers refused to talk to us.

Strike first, collaborate later. But the collaboration part is exactly what we need to efficiently fight malware and abuse on the Internet. What can we do to receive reports from security researchers first, before we are cut off from the web? We are very open to suggestions!

Accountability and trust

Lastly, we want to thank everyone who makes the web a safer place, explicitly including maintainers of threat lists. For example, the tools at abuse.ch have been helpful in fighting malware on Codeberg, although we'd love to see them use Mastodon instead of requiring Twitter accounts as a login option :-)

Blocklists allow what, for example, Mastodon moderation does: They hold website owners accountable for their actions. Everyone can decide whom to trust to moderate for them if instance or website admins fail. But please, delegate this trust with care!

We acknowledge that this power is a great tool. But with great power comes great responsibility. Every individual providing a blocklist to downstream users and customers must be accountable for the correctness and fairness of its content.

Is the collateral damage we incur by disrupting essential service for a lot of legit people (just because we observe a few bad actors) really what helps us to make this world a better place?

Who holds creators of blocklists accountable for adding false positive domains? Whom do we trust to moderate our communication infrastructure, whom do we trust to enable and disable access to the web?

Is Strike-first-adjust-later fair and legal?

Are we asking too much if we think that the purpose of a website and the content it provides should be reviewed prior to blocking a whole domain? If we think that the operator should be contacted and given a reasonable period to react? Every law concerned with illegal content, from DMCA violations, to personal information rights, to even the worst imaginable offenses, requires a notification of the operator and provision of reasonable time for takedown.

Calling out actors who repeatedly wait until the deadline passes to act might be fair. Asking for better response times than local law might be fair, too. But is a strike-first-maybe-correct-later approach fair, when there was no information sent to the website operator? Is it even compatible with today's law?

How can we make the web a collaborative place again, if we hand over policing of access to opaque and unaccountable blocklist providers? How many blocklists do we need? Would it be better to sustain some well-moderated blocklists, just like Codeberg members grouped together to provide a smooth software forge experience with combined forces?

We hope some of these questions allow for a discourse about this matter. We think that the current approach to blocklists needs a drastic change if we want to stop sustaining the monopoly of proprietary platforms.

Kind regards, and with a slight headache
Your Codeberg Team!

PS: If you want to help our fight against malware, feel free to contact us at moderation@codeberg.org. We are ready to discuss insights, make new malware available for review and more. If it helps anyone, we will continue to share the malware we find on VirusTotal or other channels.

We're looking forward to your feedback, and to making our experiences available to smooth the way for more public services operated by non-profits instead of mega-corps.