“Internet History Is Fragile”: Archiving and Preserving The Web

“Once it’s on the internet, it’s there forever.” That is the popular sentiment. But when web data has been altered or deleted, how do we access the original? Recently, there was public outcry in response to the removal of several pages from the official White House website by the Trump administration. Pages on LGBTQ rights, civil rights, and climate change were removed within moments of President Trump’s inauguration. This erasure was particularly alarming for many people because it signaled the new administration’s attitude towards minorities and the environment. Many people also believed these pages were permanently deleted and that their data could never be accessed again. However, these web pages had in fact been migrated to an archived version of the Obama administration’s website. Even though the web data was migrated, its swift removal from the White House website reminded me that valuable information can easily be removed from public access. Because users have the ability to alter and delete web data, the data itself is rather fragile and transient.

Many of the open datasets removed from https://t.co/ESq0PmWmZG can be found in our Wayback Machine at https://t.co/bNwlONwfGa pic.twitter.com/lAXoLAG1Wi

— Internet Archive (@internetarchive) February 10, 2017

While I was on Twitter today, I came across an organization that is trying to address this issue. The Wayback Machine, which was created by the Internet Archive, is a database that seeks to archive, preserve, and provide access to digitized and born-digital materials. The Wayback Machine acts as an “Internet Library” by preserving vast amounts of web content to provide digital artifacts of our political and cultural lives. Web data is captured by automated crawlers and arranged into timelines to safeguard against modification and deletion. As you can see above, the Internet Archive recently announced on Twitter that datasets deleted from the official White House website can be accessed through the Wayback Machine.
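For readers who would rather look up an archived copy from a script than through the website, here is a minimal sketch in Python. It assumes the Internet Archive's public availability endpoint at https://archive.org/wayback/available, which, as far as I know, returns JSON describing the snapshot closest to a requested date. The function names (build_query, closest_snapshot, lookup) are my own and are not part of any official library.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint: the Internet Archive's availability lookup.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def build_query(page_url, timestamp=None):
    """Build the lookup URL for a page, optionally near a YYYYMMDD timestamp."""
    params = {"url": page_url}
    if timestamp:
        params["timestamp"] = timestamp
    return AVAILABILITY_ENDPOINT + "?" + urlencode(params)


def closest_snapshot(response):
    """Return the archived URL from a decoded JSON reply, or None if absent."""
    closest = response.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def lookup(page_url, timestamp=None):
    """Ask the endpoint for the snapshot closest to the given timestamp."""
    with urlopen(build_query(page_url, timestamp), timeout=10) as reply:
        return closest_snapshot(json.load(reply))
```

Calling lookup("whitehouse.gov", "20170119") would ask for the capture nearest to the day before the inauguration. The function returns None when no snapshot is reported, so the caller can tell "never archived" apart from a successful hit.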

“Most societies place importance on preserving artifacts of their culture and heritage. Without such artifacts, civilization has no memory and no mechanism to learn from its successes and failures. Our culture now produces more and more artifacts in digital form. The Archive’s mission is to help preserve those artifacts and create an Internet library for researchers, historians, and scholars.”¹

For the past 20 years, the Wayback Machine has recorded and preserved billions of web pages for the benefit of researchers, historians, students, and the general public. To mitigate broken pages and link rot, the Internet Archive has also created a Wayback Machine Chrome extension that detects dead web pages and offers users the archived version. This provides a quick and simple solution for anyone trying to reach Internet content that has disappeared. Because web data is impermanent, the Wayback Machine offers an important service for journalists, researchers, and digital humanists who are interested in collecting public datasets or tracking changes in them. I am excited to search through this “Internet Library” to see what kind of content it holds and what kind of research I can do with it. If you are interested in learning more about the Wayback Machine, watch the video below.

¹ From the Internet Archive’s Frequently Asked Questions page, under “The Wayback Machine,” in response to the question: “Why is the Internet Archive collecting sites from the Internet? What makes the information useful?”
