A non-exhaustive list of open science and corpus resources is presented here. Drop a comment below if you know of other resources not currently listed. Thanks in advance for your contribution! [last updated: April 2024]
▸OPEN SCIENCE INITIATIVES
BITSS: Berkeley Initiative for Transparency in the Social Sciences
RIOT: Reproducible, Interpretable, Open, Transparent Science Club
ReproducibiliTea: network of journal clubs (also check the reading lists!)
FORRT: Framework for Open and Reproducible Research Training
CREP: Collaborative Replications and Education Project
IRIS: digital repository of instruments and materials for research into second languages
OASIS: Open Accessible Summaries in Language Studies (subscribe to their monthly newsletter for the latest SSCI & AHCI publications in language learning & teaching and multilingualism)
▸OPEN SCIENCE GUIDING DOCUMENTS
The Open Handbook of Linguistic Data Management by MIT Press (2022)
The Turing Way: A handbook for reproducible, ethical, and collaborative research by Alan Turing Institute (2022)
BITSS MOOC: 5-week open online course based on a graduate course by T. Miguel at UC Berkeley
FORRT Glossary: definitions of terminology, related terms, references
Data Management Expert Guide for social sciences by CESSDA
Open and Reproducible Science Literature: community-curated summaries of OS literature
[TUR] Dilbilim Araştırmalarında Açık Bilim by Ataman, Çağlar, & Kırkıcı (2021)
[BONUS] List of articles with a critical view on Open Science by M. Rubin (2014-2022)
Pownall, M., Azevedo, F., Aldoh, A., Elsherif, M., Vasilev, M., Pennington, C. R., Robertson, O., Tromp, M. V., Liu, M., Makel, M. C., Tonge, N., Moreau, D., Horry, R., Shaw, J., Tzavella, L., McGarrigle, R., Talbot, C., Parsons, S., & FORRT. (2021). Embedding open and reproducible science into teaching: A bank of lesson plans and resources. Scholarship of Teaching and Learning in Psychology. Advance online publication. https://doi.org/10.1037/stl0000307
▸CORPUS TOOLS
List of corpus tools: collection of links to tools for corpus compilation and analysis
Lancaster Stats Tools: online toolbox for calculating frequency, dispersion, collocations, keywords, and correlations
Association Measure calculator: Phil Durrant’s spreadsheet for calculating association measures (AM) for collocations
LL and effect size calculator: P. Rayson’s online calculator and customisable spreadsheet for keyness analysis
[BONUS] Corpus Finder: information about available English-language corpora, curated by Varieng at the University of Helsinki
▸CORPORA DATABASES
Databases below provide either open access or grant researcher access upon request.
CLARIN Resource Families: corpora of various languages, grouped into 13 corpus types, available in the CLARIN infrastructure
English-Corpora Database: free online interface for English corpora created by Prof. M. Davies
CQPWeb: corpora database covering present-day English corpora, historical English corpora, learner English corpora, and corpora of European, South Asian, and East Asian languages; maintained by Prof. A. Hardie
The BNC2014: access granted through CQPWeb
Learner corpora around the world
International Corpus of Learner English (ICLE): the web-based trial version provides concordancing with metadata filtering options for a limited number of texts
HKBU Corpus of Political Speeches: web-based concordance of Chinese and English texts in political discourse
TED Corpus Search Engine: POS-tagged corpus of English TED Talk data
The Turkish National Corpus (TNC): a balanced and representative 50-million-word corpus of contemporary Turkish covering a period of 24 years (1990-2013); 98% is written data and 2% is spoken data
Also:
The corpus construction & query tool #LancsBox provides preloaded corpora including BROWN, LOB, and the BNC2014. Sketch Engine provides 30-day free access to a range of other corpora.