OPEN SCIENCE

A non-exhaustive list of open science & corpus resources is presented here. Drop a comment below if you know of other resources not currently listed. Thanks in advance for your contribution! [last updated: April 2024]


OPEN SCIENCE INITIATIVES

BITSS: Berkeley Initiative for Transparency in the Social Sciences

RIOT: Reproducible, Interpretable, Open, Transparent Science Club

ReproducibiliTea: network of journal clubs (also check the reading lists!)

FORRT: Framework for Open and Reproducible Research Training

CREP: Collaborative Replications and Education Project

IRIS: digital repository of instruments and materials for research into second languages

OASIS: Open Accessible Summaries in Language Studies (subscribe to their monthly newsletter for the latest SSCI & AHCI publications in language learning and teaching, and multilingualism)

OPEN SCIENCE GUIDING DOCUMENTS

The Open Handbook of Linguistic Data Management (MIT Press, 2022)

The Turing Way: A handbook for reproducible, ethical, and collaborative research by The Alan Turing Institute (2022)

BITSS MOOC: a 5-week open online course based on a graduate course by T. Miguel at UC Berkeley

FORRT Glossary: definitions of terminology, related terms, references

Data Management Expert Guide for the social sciences by CESSDA

Open and Reproducible Science Literature: community-curated summaries of open science literature

[TUR] Dilbilim Araştırmalarında Açık Bilim (Open Science in Linguistics Research) by Ataman, Çağlar, & Kırkıcı (2021)

[BONUS] List of articles with a critical view of open science by M. Rubin (2014-2022)

Pownall, M., Azevedo, F., Aldoh, A., Elsherif, M., Vasilev, M., Pennington, C. R., Robertson, O., Tromp, M. V., Liu, M., Makel, M. C., Tonge, N., Moreau, D., Horry, R., Shaw, J., Tzavella, L., McGarrigle, R., Talbot, C., Parsons, S., & FORRT. (2021). Embedding open and reproducible science into teaching: A bank of lesson plans and resources. Scholarship of Teaching and Learning in Psychology. Advance online publication. https://doi.org/10.1037/stl0000307

CORPUS TOOLS

List of corpus tools: a collection of links to tools for corpus compilation and analysis

Lancaster Stats Tools online: a toolbox for calculating frequency, dispersion, collocations, keywords and correlations

Association Measure calculator: Phil Durrant’s spreadsheet for computing association measures (AMs) for collocations

Log-likelihood (LL) and effect size calculator: P. Rayson’s online calculator and customisable spreadsheet for keyness analysis

[BONUS] Corpus Finder: information about available English-language corpora, curated by VARIENG at the University of Helsinki
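For readers curious about what the keyness and collocation calculators above compute, here is a minimal Python sketch of three common statistics: log-likelihood (G2) keyness for comparing a word's frequency across two corpora, Log Ratio (the binary log of the ratio of relative frequencies) as an effect size, and the MI score for a node-collocate pair. It is an illustration under simplifying assumptions, not the exact implementation used by any tool listed here: the function names are hypothetical, Log Ratio simply rejects zero frequencies instead of smoothing them, and the collocation window `span` for MI is an assumed parameter.

```python
import math


def log_likelihood(freq1: int, freq2: int, size1: int, size2: int) -> float:
    """Log-likelihood (G2) keyness of one word in corpus 1 vs corpus 2.

    freq1/freq2: word frequency in each corpus; size1/size2: corpus sizes in tokens.
    """
    total_freq = freq1 + freq2
    total_size = size1 + size2
    # Expected frequencies if the word were equally common in both corpora.
    expected1 = size1 * total_freq / total_size
    expected2 = size2 * total_freq / total_size
    ll = 0.0
    # A zero observed frequency contributes nothing (O * ln(O/E) -> 0 as O -> 0).
    if freq1 > 0:
        ll += freq1 * math.log(freq1 / expected1)
    if freq2 > 0:
        ll += freq2 * math.log(freq2 / expected2)
    return 2 * ll


def log_ratio(freq1: int, freq2: int, size1: int, size2: int) -> float:
    """Effect size: binary log of the ratio of relative frequencies.

    Simplifying assumption: zero frequencies are rejected rather than smoothed.
    """
    if freq1 == 0 or freq2 == 0:
        raise ValueError("this sketch does not handle zero frequencies")
    return math.log2((freq1 / size1) / (freq2 / size2))


def mi_score(observed: int, node_freq: int, collocate_freq: int,
             corpus_size: int, span: int = 1) -> float:
    """MI score for a node-collocate pair: log2(observed / expected).

    `span` (hypothetical default of 1) is the number of window positions
    around the node in which the collocate is counted.
    """
    expected = node_freq * collocate_freq * span / corpus_size
    return math.log2(observed / expected)


# A word occurring 100 times in corpus A and 50 times in corpus B (1M tokens each).
print(round(log_likelihood(100, 50, 1_000_000, 1_000_000), 2))  # 16.99
print(log_ratio(100, 50, 1_000_000, 1_000_000))  # 1.0
```

An LL value above 3.84 is conventionally read as significant at p < 0.05 (one degree of freedom), and a Log Ratio of 1.0 means the word is twice as frequent, relative to corpus size, in corpus A as in corpus B.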

CORPORA DATABASES

The databases below either provide open access or grant researchers access upon request.

CLARIN Resource Families: corpora in various languages, grouped into 13 corpus types and available through the CLARIN infrastructure

English-Corpora Database: free online interface for English corpora created by Prof. M. Davies

CQPWeb: a corpus database covering present-day English, historical English and learner English corpora, as well as corpora of European, South Asian and East Asian languages; maintained by Prof. A. Hardie

The BNC2014: access granted through CQPWeb

Learner corpora around the world

International Corpus of Learner English (ICLE): a web-based trial version provides concordancing with metadata filtering for a limited number of texts.

HKBU Corpus of Political Speeches: web-based concordance of Chinese and English texts in political discourse

TED Corpus Search Engine: POS-tagged corpus of English TED Talk data

The Turkish National Corpus (TNC): a balanced and representative 50-million-word corpus of contemporary Turkish covering a period of 24 years (1990-2013); 98% of the data is written and 2% is spoken.

Also:

The corpus construction & query tool #LancsBox provides preloaded corpora including BROWN, LOB and the BNC2014, and Sketch Engine offers 30-day free trial access to a range of other corpora.