How to delete personal data from Wayback Machine
I’ve been maintaining a persistent online identity for more than two decades. The side effect is that if you know where and what to search for, you will come across traces of my online activity going back to the early 2000s, including juicy things the 2024 me may not want you to see. A lot of that is possible thanks to Wayback Machine, operated by Internet Archive.
I love Internet Archive. I deeply appreciate their work. Wayback Machine is an excellent research tool and a powerful weapon against various malicious attempts to rewrite history. However, Wayback Machine is also a valuable source of data for open-source intelligence and I can imagine many stressful situations that could emerge from that.
The internet keeps evolving, sometimes in very hostile ways. At some point in my life I had to sit down and think about my personal threat model. Without getting into too much detail, I decided that I still enjoy putting my real name next to things I produce online, but I’m not particularly fond of sharing too much data at once. Searching through old snapshots of my defunct websites in Wayback Machine, I found a lot of cases where the past me violated that principle in particularly scary ways. So I decided to act on it.
Internet Archive is particularly meticulous at archiving the internet and I deeply appreciate that. But they also allow for deleting saved snapshots assuming certain conditions are met. Their support article provides a good introduction on how to initiate the process.
I managed to successfully exercise that freedom to get a few old websites of mine removed, both defunct and still existing, both owned by me and controlled by a third party. While the process may not look particularly clear and straightforward from the outside, I assure you it’s not as daunting as it seems.
Prerequisites
Some important stuff to consider before we progress further:
- I’m a European, which means I have access to this notorious atomic device called GDPR. I don’t know if that affected the way my requests were fulfilled in any way, and to be honest, I hope that wasn’t the case. Privacy and security shouldn’t be region-locked to one specific continent.
- Personal data was involved. Okay, revealing my real name is not a concern within my threat model. However, revealing my real name next to certain specific data points I no longer want to share becomes an issue. Internet Archive themselves state that the sole fact of mentioning one’s name may not be a sufficient reason for the resource to be deleted, which means I could have failed if I tried to delete any mentions of me from third-party websites.
- I wasn’t in a rush. Nothing of value was under immediate threat due to my data being visible in Wayback Machine. My requests were processed within a few working days (usually less than 5) and I didn’t mind waiting. I’m not sure if there’s any way of speeding things up in case of emergency. I just wanted to proactively resolve a minor opsec issue.
- I’m an individual, not a company. Some of my requests involved a site that serves as my ‘business’ website, but that doesn’t change the fact I operate as an individual person. I don’t know what happens in case of requests made on behalf of companies or institutions.
- There are no guarantees, especially in ambiguous cases. Internet Archive makes it clear the outcome isn’t always guaranteed to be positive. Therefore, it is our responsibility to prove we’re authorized to submit the request and that valid reasons exist for that request to be fulfilled.
- A lot of interesting records of my online past were deleted in the process. I may not be particularly proud of what I used to think and write as a teenager, but everyone has their share of brainless things done in the past. Losing my old writing makes me sad, but certain things I have access to these days are worth protecting at the cost of a few nostalgia trips in the 2030s.
One important thing
Before we get to the technical part, let me state this clearly:
Be. Nice.
In gloomy times of people’s data being fair game for “AI” companies, ad brokers and all kinds of shady actors making a business model out of theft, Internet Archive is one of the few entities I consider a good actor in this world.
It’s a non-profit digital library that does titanic work providing free access to numerous resources, including websites, print materials, multimedia, and god-knows-what-else. We need (even if we don’t deserve) such a powerful advocate for a free and open Internet, either for ourselves or for future generations.
Again: be nice to those folks.
I saw online articles suggesting that you request data removal with a plain ol’ GDPR request. It’s a valid method of accomplishing the goal, but it feels a bit like shooting sparrows with cannons. There’s no need to escalate things this quickly and this early. I’m pretty sure a polite request with sufficient justification is a good start (and in most cases it should lead to a positive resolution).
Seriously, my European friends. If you aren’t in a hurry, just ask nicely and see what happens next.
Scope
This article is based on my direct experience getting the following resources deleted, all of them belonging to me throughout their entire life cycle:
- one active Twitter / X account,
- one blog hosted on a popular platform,
- selected old snapshots of existing domains (still under my control),
- a few defunct websites hosted on subdomains provided by third parties,
- some other resources I forgot to mention.
All of these requests were successfully completed within a few working days.
Note for my Polish friends: you should probably write the email to Internet Archive in English, but there’s no problem with reporting Polish-language websites. I was never asked to explain anything that resulted from the language barrier. So if you want to delete your old little blogs hosted on Onet, go ahead. 😄
Anyway, let’s do this!
To get our data removed from Wayback Machine, we should start by compiling a list of URLs to be removed, as well as answers to the following questions:
- is that URL still active? If I lost access to it, can I regain it, even for a brief moment?
- does that domain / subdomain belong to me or somebody else?
- do I want the entire website to be excluded, or just specific snapshots from the past?
Depending on answers to the questions above, you want to follow one of the scenarios discussed below.
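If it helps to think about it in code, the questions above can be sketched as a tiny decision helper. This is only a minimal sketch of how the scenarios in this article relate to each other: the names `Resource`, `Scenario` and `pick_scenario` are hypothetical, and the mapping is a simplified reading of the text, not a rule stated by Internet Archive.

```python
from dataclasses import dataclass
from enum import Enum


class Scenario(Enum):
    EXISTING_WEBSITE = "existing website (that you still control)"
    SOCIAL_MEDIA = "existing social media account"
    LOST_WEBSITE = "website you don't control any more"
    ULTIMATE = "the 'ultimate' case"


@dataclass
class Resource:
    url: str
    still_active: bool                     # is that URL still active?
    under_my_control: bool                 # can I publish something under it?
    is_social_media: bool = False          # a profile on Twitter / X, LinkedIn...
    listed_email_accessible: bool = False  # email shown in snapshots that I can still use


def pick_scenario(resource: Resource) -> Scenario:
    """Map the answers for one URL to a scenario (simplified assumption)."""
    if resource.still_active and resource.under_my_control:
        if resource.is_social_media:
            return Scenario.SOCIAL_MEDIA
        return Scenario.EXISTING_WEBSITE
    # No control any more: an email address visible in old snapshots can still help.
    if resource.listed_email_accessible and not resource.is_social_media:
        return Scenario.LOST_WEBSITE
    # Deleted profiles, defunct platforms, anything that can't be regained.
    return Scenario.ULTIMATE


if __name__ == "__main__":
    old_blog = Resource("http://example.com", still_active=False, under_my_control=False)
    print(pick_scenario(old_blog).value)
```

Whether you want the entire website excluded or just specific snapshots doesn’t change the scenario, so it’s left out here. It only changes how you word each entry in the request.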
Starter kit
Draft an email that should be sent to info@archive.org:
Hello Internet Archive,
I kindly request erasure of the following resources available through Wayback Machine:
- http://example.org - all website snapshots saved until January 2014
- http://example.com - all website snapshots between 2005 and 2012
I request deletion of those resources due to the fact they contain personal data I am no longer comfortable sharing publicly.
Yours sincerely,
John Doe
But don’t send it just yet!
You have to provide proof you’re authorized to submit the request. The proof should make it clear you are (or were) the legitimate owner of websites you listed in your request.
How to validate the request? That’s what we’ll discuss next. Each scenario can be resolved by answering at least one of the questions I list below.
Once you identify your type of request and gather enough information to legitimize your request, add it to your email and send it.
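If you have many URLs to list, it may be easier to generate the email text than to type it by hand. Here’s a minimal sketch that assembles a request like the starter kit’s from a list of URLs plus a proof of ownership, and refuses to produce anything while the proof is missing. `RemovalItem` and `compose_request` are hypothetical names, the `Proof of ownership:` line is an assumed wording, and nothing is sent anywhere: the function only returns text to paste into your email client.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class RemovalItem:
    url: str
    scope: str  # e.g. "all website snapshots saved until January 2014"


def compose_request(items: list[RemovalItem], proof: str, sender: str) -> str:
    """Build the body of a deletion request; raises if the proof is missing."""
    if not items:
        raise ValueError("list at least one URL to remove")
    if not proof.strip():
        raise ValueError("add proof of ownership before sending")
    lines = [
        "Hello Internet Archive,",
        "I kindly request erasure of the following resources "
        "available through Wayback Machine:",
    ]
    # One bullet per resource, in the same shape as the template.
    lines += [f"- {item.url} - {item.scope}" for item in items]
    lines += [
        "I request deletion of those resources due to the fact they contain "
        "personal data I am no longer comfortable sharing publicly.",
        f"Proof of ownership: {proof.strip()}",
        "Yours sincerely,",
        sender,
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    body = compose_request(
        [RemovalItem("http://example.org", "all website snapshots saved until January 2014")],
        proof="a copy of this request is published at http://example.org/removal-request",
        sender="John Doe",
    )
    print(body)
```

The proof text is free-form on purpose, because what counts as proof depends on which scenario applies to you.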
Deleting snapshots of an existing website (that you still control)
This scenario is straightforward.
Is there any email published in the website snapshots? Can you still access it? If the answer is ‘yes’, send your request to Internet Archive from that email. In your deletion request, include the URL listing that email.
Can you publish something under that website URL? If the answer is ‘yes’, publish a subpage with a copy of the request you’re going to send via email, and include the URL of that subpage in your email.
These are just a few ways of verifying website ownership. Others may involve adding specific DNS records or using the email address visible in a WHOIS lookup, but I didn’t try these.
Deleting snapshots of existing social media accounts (Twitter / X, LinkedIn…)
Still no big deal.
Can you post something publicly and provide a link to it? If so, post the content of your Internet Archive request (something along the lines of “Hello Internet Archive, I'd like this profile to be excluded from Wayback Machine, thank you!”) so that it’s publicly available, and include that link in your deletion request email. In the case of Twitter / X, a tweet posted from a publicly visible account will do. In the case of LinkedIn or Facebook, a publicly visible status update should work.
Is your email address listed on your profile? If so, use that email to contact Internet Archive and include a link to the part of your profile where that email is listed. If not, maybe add it and then use it?
Be careful - some social media platforms are particularly hostile to non-registered visitors by hiding user profiles (or all data in general) behind a login form. I don’t know if Internet Archive utilizes their own user accounts to verify profile ownership.
If your social media account doesn’t exist any more, head to the ‘ultimate’ case below.
Deleting snapshots of websites you don’t control any more
Here’s where things may get tricky, but the case isn’t hopeless.
Is there any email published in the website snapshots? Can you still access it? If the answer is ‘yes’, send your request to Internet Archive from that email. In your deletion request, include the URL listing that email.
Note that some hosting platforms recycle usernames, and expired internet domains return to the pool, where they stay available unless somebody else purchases them. Therefore, a potential yet unproven idea could be to obtain the hosting package / domain in question just so you can publish your request to Internet Archive on it. But try it at your own risk.
None of that works? Well, it’s time for…
The ‘ultimate’ case: deleting data from third-party sites you don’t control any more
Tl;dr: prepare a copy of your identity document containing your personal data that can be validated against the resource you want to delete.
This applies to:
- internet domains or subdomains you no longer control and can’t get back,
- deleted social media profiles, or social media profiles on defunct platforms,
- any other resources you produced, but can’t get back for any reason.
In all of these cases, I proceeded in the following way: I sent a deletion request including a thorough explanation of why I couldn’t prove my ownership of the URLs in question (stating the truth usually helps - if the platform doesn’t exist any more, just say so). Internet Archive responded with a link to a deletion request form. The form made it possible to request deletion of a few different types of resources (domains / subdomains, account platforms, etc.) as well as to prove my ownership in a few different ways.
If every single method fails, you can upload and submit a photo of your identity card. The photo doesn’t have to be complete, i.e. irrelevant personal data can be covered.
Whether Internet Archive warrants this much trust is up to your discretion.
What else can be helpful?
There are a few extra things that may have made my requests look more credible, but don’t quote me on that. I’m just enumerating practices I followed without much thought.
- I use an established online identity I’ve been maintaining consistently for a very, very long time, and that is easily verifiable with any search engine. Also, I use an email address on my own domain to send requests.
- I used my full name and surname. Since my requests were about personal data, one could effortlessly find my name among a bunch of foreign words.
- I explained all of my cases to the best of my ability. Platforms like Twitter or WordPress.com are probably well-known enough. A few old websites of mine, however, were published at subdomains provided by commercial entities. I opted for transparency and hoped for the best.
Summing up
Internet Archive doesn’t bite.
I didn’t intend this article to be a guide on privacy or threat modelling. However, if you consider records of your past activity a significant attack vector, I strongly encourage you to search for the most intimidating artifacts and request their deletion.
Also, consider donating to Internet Archive. If they don’t deserve it, nobody does.