openlexicon

Access to lexical databases

SUBTLEX-us

SubtlexUS is a database of word frequencies based on subtitles from English and American movies and TV series (51 million words in total). Two measures are provided:

  1. The frequency per million words, called SUBTLEXWF (Subtitle frequency: word form frequency)
  2. The percentage of films in which a word occurs, called SUBTLEXCD (Subtitle frequency: contextual diversity; see Adelman, Brown, & Quesada (2006) for the qualities of this measure).
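As a rough illustration of how these two measures are defined, the sketch below computes them from raw counts. The function names and all the numbers are hypothetical and chosen only to make the arithmetic easy to follow; they are not taken from the database.

```python
def per_million(word_count, corpus_size):
    """Occurrences of a word per million tokens of the corpus."""
    return word_count * 1_000_000 / corpus_size


def contextual_diversity(films_with_word, total_films):
    """Percentage of films in which the word occurs at least once."""
    return 100.0 * films_with_word / total_films


# Hypothetical toy values (not real SUBTLEX-US entries).
corpus_size = 51_000_000   # corpus size in words, as stated above
total_films = 8_000        # assumed number of films, for illustration only

wf = per_million(5_100, corpus_size)            # word seen 5,100 times
cd = contextual_diversity(2_000, total_films)   # word seen in 2,000 films

print(wf, cd)
```

A word can have a high frequency per million but a low contextual diversity if most of its occurrences are concentrated in a few films.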

Table: SUBTLEXus74286wordstextversion.tsv
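A minimal sketch of reading the table with Python's standard `csv` module follows. It assumes the file is tab-separated with a header row, and that the columns are named `Word`, `SUBTLWF` and `SUBTLCD`; check the header of your copy, since these names are an assumption here. The inline sample rows are made up so that the example runs without the file.

```python
import csv
import io


def load_frequencies(handle, word_col="Word", wf_col="SUBTLWF", cd_col="SUBTLCD"):
    """Map each word to (frequency per million, contextual diversity).

    The default column names are assumptions; pass others if the header differs.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    table = {}
    for row in reader:
        table[row[word_col]] = (float(row[wf_col]), float(row[cd_col]))
    return table


# Made-up rows standing in for the real file.
sample = io.StringIO(
    "Word\tSUBTLWF\tSUBTLCD\n"
    "alpha\t12.5\t3.2\n"
    "Beta\t0.04\t0.02\n"
)
freqs = load_frequencies(sample)
print(freqs["alpha"])
```

With the real table, the same function could be called on a file opened with `open("SUBTLEXus74286wordstextversion.tsv", newline="", encoding="utf-8")`; the encoding is also an assumption.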

Website: https://www.ugent.be/pp/experimentele-psychologie/en/research/documents/subtlexus

Authors: Boris New and Marc Brysbaert

Publication:

Brysbaert, Marc, and Boris New. 2009. “Moving beyond Kučera and Francis: A Critical Evaluation of Current Word Frequency Norms and the Introduction of a New and Improved Word Frequency Measure for American English.” Behavior Research Methods 41 (4): 977–990. (pdf)

Brysbaert, Marc, Boris New, and Emmanuel Keuleers. 2012. “Adding Part-of-Speech Information to the SUBTLEX-US Word Frequencies.” Behavior Research Methods 44 (4): 991–997. (pdf)

LICENSE: CC-BY-SA

Online access: Openlexicon