Datasets

Development

Canadian
There are large bodies of content on the web related to Canadian culture and history, from smaller sets of materials created by researchers to large digitized collections held by memory institutions like Canadiana and Library and Archives Canada. We need better ways of accessing this content.

Cultural
There are also millions of books, periodicals, and other materials that have been digitized by groups such as the Internet Archive, HathiTrust, and Project Gutenberg, as well as much native web content relevant to cultural research. LINCS will provide new ways of discovering and using these kinds of materials.

Linked
LINCS builds on extensive existing work towards an open, semantically structured web, from W3C standards and established ontologies to major community projects such as DBpedia and Wikidata. We aim to strengthen the Linked Open Data ecology through high-quality open content and open-source tools.

Research
Datasets carefully curated by Canadian researchers are at the core of LINCS, which will mobilize this material and interlink it with related content. The Source Datasets are rich and diverse, as are the Research Themes that will be developed by linking them.

These source datasets, with their various formats and origins, will pass through a range of conversion processes in LINCS.

LINCS will incorporate datasets from a diverse array of researchers, institutions, and areas of focus. These datasets will include Canadian, Cultural, and Research data, with about 80 TB of media and millions of objects and texts linked to billions of semantic web assertions, or triples.

A cross-section of the datasets included in LINCS

The three types of data conversion undertaken by LINCS, ranging from the most human-labour-intensive (human-expert validation) to the most computer-driven (automated NLP conversion)

LINCS source data will include researcher datasets from sources such as GitHub, Scholars Portal, and individual institutional repositories; linked platforms such as the Canadian Writing Research Collaboratory; partner datastores from institutions such as Library and Archives Canada and Canadiana; and protected data sources from organizations such as HathiTrust and the Internet Archive. All of the source data will be stored in VM containers on Compute Canada. These will act as data capsules for protected data.

LINCS source data will include researcher datasets, linked platforms, partner datastores, and protected data sources, all stored in VM containers that act as data capsules for protected data
