
How to delete personal data from Wayback Machine

ca. 2200 words

I’ve been maintaining a persistent online identity for more than two decades. As a side effect, if you know where and what to search for, you will come across traces of my online activity going back to the early 2000s, including juicy things the 2024 me may not want you to see. A lot of that is possible thanks to Wayback Machine, operated by Internet Archive.

I love Internet Archive. I deeply appreciate their work. Wayback Machine is an excellent research tool and powerful weaponry against various malicious attempts to rewrite history. However, Wayback Machine is also a valuable source of data for open-source intelligence and I can imagine many stressful situations that could emerge from that.

The internet keeps evolving, sometimes in very hostile ways. At some point in my life I had to sit down and think about my personal threat model. Without getting into too much detail, I decided that I still enjoy putting my real name next to things I produce online, but I’m not particularly fond of sharing too much data at once. Searching through old snapshots of my defunct websites in Wayback Machine, I found a lot of cases where the past me had violated that principle in particularly scary ways. So I decided to act on it.

Internet Archive is particularly meticulous at archiving the internet, and I value that. But they also allow deleting saved snapshots, provided certain conditions are met. Their support article provides a good introduction to initiating the process.

I managed to successfully exercise that freedom and get a few old websites of mine removed: both defunct and still existing, both owned by me and controlled by a third party. While the process may not look particularly clear or straightforward from the outside, I assure you it’s not as daunting as it seems.

Prerequisites

Some important stuff to consider before we progress further:

One important thing

Before we get to the technical part, let me state this clearly:

Be. Nice.

In gloomy times of people’s data being fair game for “AI” companies, ad brokers and all kinds of shady actors making a business model out of theft, Internet Archive is one of the few entities I consider a good actor in this world.

It’s a non-profit digital library that does titanic work providing free access to numerous resources, including websites, print materials, multimedia, and god-knows-what-else. We need (even if we don’t deserve) such a powerful advocate for a free and open Internet, whether for ourselves or for future generations.

Again: be nice to those folks.

I’ve seen online articles suggesting a plain ol’ GDPR request for data removal. It’s a valid way of accomplishing the goal, but it feels a bit like shooting sparrows with a cannon. There’s no need to escalate things so quickly this early. I’m pretty sure a polite request with sufficient justification is a good start (and in most cases it should lead to a positive resolution).

Seriously, my European friends. If you aren’t in a hurry, just ask nicely and see what happens next.

Scope

This article is based on my direct experience getting the following resources deleted, all of them belonging to me throughout their entire life cycle:

All of these requests were successfully completed within a few working days.

Note for my Polish friends: the email to Internet Archive should probably be written in English, but there’s no problem with reporting Polish-language websites. I was never asked to explain anything because of a language barrier. So if you want to delete those old blogs of yours from Onet, go for it. 😄

Anyway, let’s do this!

To get our data removed from Wayback Machine, we should start by compiling the list of URLs to be removed, as well as answers to the following questions:

Depending on your answers to the questions above, follow one of the scenarios discussed below.
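Before writing the email, it may help to check what Wayback Machine actually holds for each URL, so you can state precise date ranges. Below is a minimal Python sketch built on the Wayback Machine’s public CDX query endpoint. It assumes that endpoint accepts the `url`, `output=json`, `fl`, `from` and `to` parameters and returns a header row first. The function names (`build_cdx_query`, `parse_cdx_rows`, `snapshot_link`, `list_snapshots`) are hypothetical helpers, not part of any official client.

```python
import json
from typing import List, Optional, Tuple
from urllib.parse import urlencode
from urllib.request import urlopen

# Public CDX endpoint of the Wayback Machine (assumed stable).
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_query(url: str, year_from: Optional[str] = None,
                    year_to: Optional[str] = None) -> str:
    """Build a CDX query listing captures of `url`, optionally limited to a year range."""
    params = {"url": url, "output": "json", "fl": "timestamp,original"}
    if year_from:
        params["from"] = year_from  # timestamp prefix, e.g. "2005"
    if year_to:
        params["to"] = year_to
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def parse_cdx_rows(payload: str) -> List[Tuple[str, str]]:
    """Turn a JSON CDX response into (timestamp, original URL) pairs.

    With output=json the first row holds the field names, so it is skipped.
    """
    rows = json.loads(payload) if payload.strip() else []
    if not rows:
        return []
    header, *data = rows
    ts_idx, url_idx = header.index("timestamp"), header.index("original")
    return [(row[ts_idx], row[url_idx]) for row in data]


def snapshot_link(timestamp: str, original: str) -> str:
    """Link to a single capture, handy for pasting into the deletion request."""
    return f"https://web.archive.org/web/{timestamp}/{original}"


def list_snapshots(url: str, year_from: Optional[str] = None,
                   year_to: Optional[str] = None) -> List[Tuple[str, str]]:
    """Fetch and parse the capture list (performs a network request)."""
    with urlopen(build_cdx_query(url, year_from, year_to), timeout=30) as resp:
        return parse_cdx_rows(resp.read().decode("utf-8"))
```

For instance, `list_snapshots("example.org", "2005", "2012")` should return the captures of example.org’s front page in that period, and `snapshot_link` turns each one into a clickable URL. The query above matches a single URL only. The CDX documentation also describes a `matchType` parameter for covering subpages, so check it before relying on this sketch for a whole site.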

Starter kit

Draft an email that should be sent to info@archive.org:

Hello Web Archive,

I kindly request erasure of the following resources available through Wayback Machine:

  • http://example.org - all website snapshots saved until January 2014
  • http://example.com - all website snapshots between 2005 and 2012

I request deletion of those resources because they contain personal data I am no longer comfortable sharing publicly.

Yours sincerely,
John Doe
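If your list of URLs is long, a small script can assemble the same draft for you. Here is a minimal Python sketch that mirrors the template above and appends the proof-of-ownership items the email will eventually need. `Resource`, `draft_request` and the "Proof of ownership" section are hypothetical illustrations, not a format Internet Archive requires.

```python
from dataclasses import dataclass
from typing import Iterable, Sequence


@dataclass
class Resource:
    url: str    # e.g. "http://example.org"
    scope: str  # e.g. "all website snapshots saved until January 2014"


def draft_request(resources: Iterable[Resource], reason: str, name: str,
                  proof: Sequence[str] = ()) -> str:
    """Assemble the plain-text email, optionally followed by proof-of-ownership items."""
    lines = [
        "Hello Internet Archive,",
        "",
        "I kindly request erasure of the following resources "
        "available through Wayback Machine:",
        "",
    ]
    lines += [f"  • {r.url} - {r.scope}" for r in resources]
    lines += ["", f"I request deletion of those resources because {reason}.", ""]
    if proof:
        # Whatever validates the request: a page listing your address, a public post, etc.
        lines += ["Proof of ownership:", ""]
        lines += [f"  • {item}" for item in proof]
        lines.append("")
    lines += ["Yours sincerely,", name]
    return "\n".join(lines)
```

Calling `draft_request([Resource("http://example.org", "all website snapshots saved until January 2014")], "they contain personal data I am no longer comfortable sharing publicly", "John Doe")` reproduces the template, and passing `proof=[...]` adds the links you gather in the next steps.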

But don’t send it just yet!

You have to provide proof you’re authorized to submit the request. The proof should make it clear you are (or were) the legitimate owner of websites you listed in your request.

How do you validate the request? That’s what we’ll discuss next. Each scenario can be resolved by answering ‘yes’ to at least one of the questions listed below.

Once you identify your type of request and gather enough information to legitimize your request, add it to your email and send it.

Deleting snapshots of an existing website (that you still control)

This scenario is straightforward.

Is there an email address published in the website snapshots? Can you still access it? If the answer is ‘yes’, send your request to Internet Archive from that address. In your deletion request, include the URL of the page that lists it.

Can you publish something under that website’s URL? If the answer is ‘yes’, publish a subpage with a copy of the request you’re going to send via email, and include the URL of that subpage in your email.
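For a static website, the verification subpage can be as simple as the request text wrapped in HTML. Below is a minimal Python sketch that escapes the text so it renders verbatim. The file name `archive-request.html` and the helper names are hypothetical choices, and uploading the file to your server is left to whatever deployment you already use.

```python
import html
from pathlib import Path


def verification_page(request_text: str,
                      title: str = "Request to Internet Archive") -> str:
    """Wrap the request text in a minimal HTML page, escaping it so it renders verbatim."""
    return (
        "<!DOCTYPE html>\n"
        '<html lang="en">\n'
        "<head>\n"
        '  <meta charset="utf-8">\n'
        f"  <title>{html.escape(title)}</title>\n"
        "</head>\n"
        "<body>\n"
        # <pre> keeps the email's line breaks and bullet indentation.
        f"<pre>{html.escape(request_text)}</pre>\n"
        "</body>\n"
        "</html>\n"
    )


def write_verification_page(request_text: str, out_dir: str,
                            filename: str = "archive-request.html") -> Path:
    """Save the page into a local copy of the site, ready to be uploaded."""
    path = Path(out_dir) / filename
    path.write_text(verification_page(request_text), encoding="utf-8")
    return path
```

Using `<pre>` rather than converting the email to paragraphs keeps the published copy textually identical to what you send, which is the point of the exercise.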

These are just a few ways of verifying website ownership. Others may involve adding specific DNS records or using the email address visible in a WHOIS lookup, but I didn’t try these.

Deleting snapshots of existing social media accounts (Twitter / X, LinkedIn…)

Still no big deal.

Can you post something publicly and provide a link to it? If so, post the content of your Internet Archive request (something along the lines of “Hello Internet Archive, I'd like this profile to be excluded from Wayback Machine, thank you!”) so that it’s publicly available, and include that link in your deletion request email. In the case of Twitter / X, a tweet posted from a publicly visible account will do. In the case of LinkedIn or Facebook, a publicly visible status update should work.

Is your email address listed on your profile? If so, use that address to contact Internet Archive and include a link to the part of your profile where it’s listed. If not, maybe add it first and then use it?

Be careful: some social media platforms are particularly hostile to non-registered visitors, hiding user profiles (or all data in general) behind a login form. I don’t know whether Internet Archive uses their own user accounts to verify profile ownership.

If your social media account doesn’t exist any more, head to the ‘ultimate’ case below.

Deleting snapshots of websites you don’t control any more

Here’s where things may get tricky, but the case isn’t hopeless.

Is there an email address published in the website snapshots? Can you still access it? If the answer is ‘yes’, send your request to Internet Archive from that address. In your deletion request, include the URL of the page that lists it.

Note that some hosting platforms recycle usernames, and expired internet domains return to the pool, where they can be registered again if nobody else picks them up. Therefore, a potential yet unproven idea could be to obtain the hosting package / domain in question just so you can publish your request to Internet Archive on it. Try it at your own risk, though.

None of that works? Well, it’s time for…

The ‘ultimate’ case: deleting data from third-party sites you don’t control any more

TL;DR: prepare a copy of your identity document containing personal data that can be validated against the resource you want to delete.

This applies to:

In all of these cases, I proceeded as follows. I sent a deletion request including a thorough explanation of why I couldn’t prove my ownership of the URLs in question (stating the truth usually helps: if the platform doesn’t exist any more, just say so). Internet Archive responded with a link to a deletion request form. The form made it possible to request deletion of a few different types of resources (domains / subdomains, account platforms, etc.) and to prove my ownership in a few different ways.

If every single method fails, you can upload and submit a photo of your identity card. The photo doesn’t have to be complete, i.e. irrelevant personal data can be covered.

Whether Internet Archive warrants this much trust is up to your discretion.

What else can be helpful?

There are a few extra things that may have made my requests look more credible, but don’t quote me on that. I’m just enumerating practices I followed without much thought.

Summing up

Internet Archive doesn’t bite.

I didn’t intend this article to be a guide on privacy or threat modelling. However, if you consider records of your past activity a significant attack vector, I strongly encourage you to search for them and request that the most intimidating artifacts be deleted.

Also, consider donating to Internet Archive. If they don’t deserve it, nobody does.

Originally published on by Łukasz Wójcik