FAQ

WAYBACK MACHINE

Learn more about the Luxembourg Web Archive’s search engine and playback tool.
Although our web archive is only accessible from within the National Library, you can check out the “Help and FAQ” section right here:

Wayback Machine – Help (EN)
Wayback Machine – Aide (FR)
Wayback Machine FAQ

Where can I consult the web archive and why is it not accessible online?

Archiving a website essentially means downloading all of its contents and layout and saving a copy. Making our web archive available online would infringe the copyright of every site in the archive with every archival copy. Access to the Luxembourg Web Archive is therefore only possible from within the National Library of Luxembourg (terminals in the reading rooms). However, since we collaborate with the Internet Archive, the world’s largest web archiving initiative, archived versions of Luxembourgish sites may be available in the Internet Archive’s Wayback Machine.

How to get to the BnL
Why do you archive the Internet?

There are two main reasons for archiving the Internet:
– To prevent the loss of information from the Luxembourg web and maintain access to its contents.
– To enrich the heritage collections of the National Library of Luxembourg.
To learn more about the urgency and benefits of web archiving, continue here:

What we do
How do you select new websites?

The BnL conducts two types of web harvests:
– Twice-yearly, large-scale crawls of all “.lu” domains.
– Targeted, event- or topic-based crawls.

Both methods have advantages and disadvantages:
– The domain crawls cover the entirety of “.lu” addresses, but they are slow to capture sites that change rapidly or that disappear between two harvests.
– The targeted crawls try to harvest as much information as possible about a certain event or topic from a selected number of addresses over a limited time frame. This approach takes more time, because the seed list has to be determined and the depth and frequency of harvests have to be set.

Both methods will inevitably fall short of archiving all the information that makes up the Luxembourgish web, because there are always technical or organizational limits to how much can be captured and processed. Combining both methods therefore allows for a more adaptive archiving strategy.

How do I know if my website has been archived?

Website owners can check their server’s access logs for requests with the user agent “NLUX_IAHarvester”. The log entry also contains a URL where website owners can find more information.
We are happy to provide any information on our collection policy and technical details of the archiving process. Of course, you are always welcome to consult the web archive in the National Library.
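
As an illustration of the log check described above, here is a minimal sketch in Python. It assumes the server writes the common Apache/nginx “combined” log format; the two sample log lines, the IP addresses and the example.org URL are invented for the example, and only the token “NLUX_IAHarvester” comes from this page. The full user agent string sent by the crawler may look different.

```python
import re

# Token named on this page; the rest of the real user agent string is unknown.
HARVESTER_AGENT = "NLUX_IAHarvester"

# Assumption: the server uses the "combined" access log format.
COMBINED_LOG = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def harvester_requests(lines, agent=HARVESTER_AGENT):
    """Return (timestamp, request) pairs for log lines sent by the given agent."""
    hits = []
    for line in lines:
        match = COMBINED_LOG.match(line)
        if match and agent in match.group("agent"):
            hits.append((match.group("time"), match.group("request")))
    return hits


# Invented sample data, for illustration only.
sample_log = [
    '203.0.113.7 - - [12/Mar/2024:10:15:32 +0100] "GET /index.html HTTP/1.1" '
    '200 5120 "-" "Mozilla/5.0 (compatible; NLUX_IAHarvester; +https://example.org/info)"',
    '198.51.100.4 - - [12/Mar/2024:10:16:01 +0100] "GET /about.html HTTP/1.1" '
    '200 2048 "-" "Mozilla/5.0 (X11; Linux x86_64)"',
]

for timestamp, request in harvester_requests(sample_log):
    print(timestamp, request)
```

To use it on a real server, the sample list could be replaced by the lines of the actual access log file; the path of that file depends on the server configuration.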

Contact
Can I send in suggestions?

We need the support of website owners and anyone interested in keeping the web alive and available for future generations. Since the BnL is also committed to providing a valuable service by safeguarding copies of pertinent websites, we hope that everyone who owns or creates a website in Luxembourg will nominate their site for the BnL web archive.
Generally speaking, we try to be as inclusive as possible in adding new seeds and expanding the web archive. However, for technical and organizational reasons, we have to set a collection policy directed by certain priorities, fields of interest and, in some cases, exclusion criteria.

Contact
What if I don’t want my website to be downloaded?

It is important to note that archiving a website is different from making an archival copy accessible in the web archive. As stated in the Règlement grand-ducal of November 6th 2009 concerning the legal deposit, the BnL has the duty to archive and preserve all publications from, and relating to, Luxembourg. For instance, we keep copies of websites that were deemed illegal and had to be taken down from the web. These sites might be a valuable source of information for historical, sociological or other scientific research, but they will not be made accessible to the general public in the web archive.

Our mission
Are you archiving private information about me?

Our web crawler only examines openly visible parts of the Internet, and we generally respect robots.txt settings. The objective of the web archive is therefore not to “pirate” or “hack” sites or to compromise private information, but to generate a copy of information that is already freely available. In certain cases, within a highly selective approach, the BnL will also archive social media pages of public figures, important events or pertinent discussions on topics linked to event-based crawls. Again, our web crawler cannot get past the privacy settings on social media, and we will only archive contents that were meant for public display in the first place.
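
To show how robots.txt settings are typically evaluated, here is a minimal sketch using Python’s standard `urllib.robotparser` module. The rules, the example.lu addresses and the idea of addressing a rule group to “NLUX_IAHarvester” are assumptions made for the illustration; this is not a description of how the BnL crawler is actually configured.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents for an imaginary site.
ROBOTS_TXT = """\
User-agent: NLUX_IAHarvester
Disallow: /private/

User-agent: *
Disallow:
"""

parser = RobotFileParser()
# parse() accepts the file's lines directly, so no network access is needed.
parser.parse(ROBOTS_TXT.splitlines())

for url in ("https://example.lu/index.html",
            "https://example.lu/private/page.html"):
    allowed = parser.can_fetch("NLUX_IAHarvester", url)
    print(url, "allowed" if allowed else "disallowed")
```

Under these sample rules, a crawler that honours robots.txt would fetch the index page and skip everything under /private/.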
We invite you to learn more about the exclusion criteria for the Luxembourg Web Archive:

Restrictions
For how long will you keep copies of archived websites?

In accordance with the heritage objectives of the BnL, the Luxembourg Web Archive is set up to ensure the long-term preservation of, and access to, its contents. The National Library’s archives have no expiration date or cut-off, so the legal deposit of online publications is also meant to be a valuable service for site owners. Over time, with regular crawls of different scales, the Web Archive will continue to grow and allow users to browse through all of the captured versions in the Wayback Machine’s timeline.

Important terms and expressions

The Internet

is an electronic communications network that connects computers around the world.

merriam-webster
The web

is the part of the Internet that can be accessed by a browser. For instance, email and apps are also part of the Internet, but not the web.

merriam-webster
Website

is a form of online publication, the ensemble of several pages linked together and browsable on the Internet.

merriam-webster
Web archiving

is the practice of downloading and archiving parts of the web in order to preserve its contents and ensure long-term access to information.

Domain

is a subdivision of the Internet denoted in an address with a unique abbreviation (such as .lu, or .com).

merriam-webster
URL

is the address of a resource (such as a document or website) on the Internet.

merriam-webster
Seed

is a URL used as a starting point for web crawls. One seed can lead to a number of different pages, so the more seeds are “sown”, the more extensive the results of a web harvest will be.

Seed list

comprises all the seeds that were used to build a collection. This list gives you an idea of which websites can be found in the collection. However, it does not necessarily mean that every page of every website was captured.

Harvest

describes the process of crawling and downloading parts of the Internet, often used as a synonym for web crawl in the context of web archiving.

Web crawler

also called a spider, scans every element of a website, following every link and tracing every component on every page. Search engines also use crawlers for web indexing, where frequent crawls allow for faster and more efficient search results.
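
The relationship between seeds, links and crawl depth can be sketched in a few lines of Python. This is a simplified illustration, not the crawler used by the BnL: the function name `crawl`, the `max_depth` parameter and the example.lu link graph are hypothetical, and the “web” is an in-memory dictionary rather than real pages.

```python
from collections import deque


def crawl(seeds, links, max_depth):
    """Breadth-first walk from the seeds, following links up to max_depth hops.

    `links` maps each URL to the URLs it links to (a stand-in for real pages).
    """
    seen = set()
    queue = deque()
    for seed in seeds:
        if seed not in seen:
            seen.add(seed)
            queue.append((seed, 0))

    visited = []
    while queue:
        url, depth = queue.popleft()
        visited.append(url)  # a real crawler would download the page here
        if depth == max_depth:
            continue  # depth limit reached, do not follow further links
        for target in links.get(url, []):
            if target not in seen:
                seen.add(target)
                queue.append((target, depth + 1))
    return visited


# Hypothetical link graph for an imaginary site.
toy_web = {
    "https://example.lu/": ["https://example.lu/news", "https://example.lu/about"],
    "https://example.lu/news": ["https://example.lu/news/2024", "https://example.lu/"],
    "https://example.lu/news/2024": ["https://example.lu/news/2024/march"],
}

for page in crawl(["https://example.lu/"], toy_web, max_depth=2):
    print(page)
```

With a single seed and a depth of 2, the deepest page of the toy site is not reached, which illustrates why a seed list does not guarantee that every page of every website was captured.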

Collection policy

is the description of standards and procedures followed while building a collection. A detailed policy helps in understanding the contents and limitations of a collection and informs the user about the web archive’s operating principles.

Domain crawls, thematic and event collections

Domain crawls capture a snapshot of a large number of seeds, in our case all .lu domains, which we capture twice a year.
Thematic or event collections focus on a specific topic or event, potentially with a higher frequency of captures of a smaller number of seeds.

Missing something?

What terms and expressions do you think are missing from this page?
Help us in expanding the Luxembourg Web Archive dictionary by sending in your questions and suggestions.
Also remember that we are looking for contributions of new and noteworthy websites to be included in the archive. Simply contact us, or use the submission form under “participate and contribute” below: