Luxembourg News Media – webarchive.lu

Aim

This collection aims to capture as much information as possible from Luxembourg news media websites.
The National Library of Luxembourg has been archiving the websites in this collection since 1 June 2020. The collection includes the websites of newspapers, online news outlets, magazines, and radio and TV stations.

Seed list

Seeds serve as the starting points for web crawls. A single seed can lead to many different pages, so the more seeds are used, the more extensive the results of a web harvest will be. Not all seeds in this list are currently active: websites change, news outlets close, and we constantly have to adapt to technical developments.
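To illustrate how one seed can lead to several pages, here is a minimal sketch of a breadth-first crawl. It is not the harvesting software used for this collection. The link graph `EXAMPLE_LINKS`, the `news.example` and `radio.example` addresses, and the `crawl` function with its `max_depth` parameter are all hypothetical and exist only for this illustration. A real crawler would fetch pages over the network and extract links from them.

```python
from collections import deque

# Hypothetical link graph: each page maps to the pages it links to.
EXAMPLE_LINKS = {
    "https://news.example/": [
        "https://news.example/politics",
        "https://news.example/sport",
    ],
    "https://news.example/politics": [
        "https://news.example/politics/article-1",
    ],
    "https://radio.example/": [
        "https://radio.example/programme",
    ],
}


def crawl(seeds, links, max_depth=1):
    """Walk the link graph breadth-first, starting from the seeds.

    Returns the discovered pages in the order they were found.
    Pages deeper than max_depth links away from a seed are not followed.
    """
    seen = set()
    order = []
    queue = deque()

    # Every seed is itself a captured page at depth 0.
    for seed in seeds:
        if seed not in seen:
            seen.add(seed)
            order.append(seed)
            queue.append((seed, 0))

    while queue:
        url, depth = queue.popleft()
        if depth >= max_depth:
            continue  # do not follow links beyond the depth limit
        for target in links.get(url, []):
            if target not in seen:
                seen.add(target)
                order.append(target)
                queue.append((target, depth + 1))

    return order


one_seed = crawl(["https://news.example/"], EXAMPLE_LINKS)
two_seeds = crawl(
    ["https://news.example/", "https://radio.example/"], EXAMPLE_LINKS
)
print(len(one_seed), len(two_seeds))
```

In this toy graph, adding the second seed brings in pages that the first seed never links to, which is why a longer seed list produces a more extensive harvest.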

Download seed list

Coverage

Social media channels are not included in this collection and are currently not archived on a regular basis. Because of the long list of websites to cover, the large volume of data and the daily updates, it is in some cases impossible to archive every site every day. We try to find the best balance between what is technically possible and completeness of coverage.

Foreign websites

Since the topic of this collection is news media from Luxembourg, no foreign websites are included.

What we captured

More than 70 websites are currently being harvested on a daily, weekly, monthly or quarterly basis.
Many adjustments still need to be made, and the media landscape can change from one day to the next.
This will be an ongoing thematic collection, and we will keep you posted with more detailed information once the initial pilot project is complete.

About the collection

Our domain crawls form the basis of the web archive. A large number of websites are harvested all at once, creating a “snapshot” of the Luxembourg web at a given moment. However, these crawls take around one month to complete, and we are only able to run two domain crawls per year.

Naturally, there are many areas of the web where we miss changes that happen between domain crawls. To complete the picture formed by the large-scale crawls, we are also implementing thematic collections, which concentrate on types of websites and topics that warrant more attention and more frequent captures.

The Internet is all about the latest buzz: topics and events that dominate the flow of new information and mark a specific moment in Internet history. These events are captured in event collections, which add to the domain crawls and thematic collections. Whatever the method or collection, all captures of all websites are integrated into the same web archive.

News media play an important role in all event collections. In those collections, however, the scope is always time-bound and the coverage of news media is limited to the duration of the project. This collection aims to offer ongoing coverage of Luxembourg news media, with an evolving seed list and an adaptive harvesting strategy.