

Access to lexical databases


SubtlexUS is a database of word frequencies based on subtitles from English and American movies and TV series (51 million words in total). It provides two measures:

  1. The frequency per million words, called SUBTLEXWF (Subtitle frequency: word form frequency).
  2. The percentage of films in which a word occurs, called SUBTLEXCD (Subtitle frequency: contextual diversity; see Adelman, Brown, & Quesada (2006) for the properties of this measure).
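To make the two definitions concrete, here is a minimal sketch of how such measures could be computed from a tokenized corpus. The function name `subtitle_measures` and the toy corpus are hypothetical and for illustration only; the sketch does not reproduce the authors' actual processing pipeline.

```python
from collections import Counter


def subtitle_measures(films):
    """Return {word: (frequency per million words, percentage of films)}.

    `films` is a list of token lists, one list per film (hypothetical input format).
    """
    total_tokens = sum(len(tokens) for tokens in films)
    word_counts = Counter()   # total occurrences of each word
    film_counts = Counter()   # number of films containing each word
    for tokens in films:
        word_counts.update(tokens)
        film_counts.update(set(tokens))  # count each word once per film
    return {
        word: (
            word_counts[word] / total_tokens * 1_000_000,  # word form frequency
            film_counts[word] / len(films) * 100,          # contextual diversity
        )
        for word in word_counts
    }


# Toy corpus of three "films" (made-up tokens, not real subtitle data).
toy_films = [
    ["the", "cat", "sat"],
    ["the", "dog", "ran"],
    ["a", "cat", "ran", "the", "race"],
]
measures = subtitle_measures(toy_films)
```

In this toy corpus, "the" appears in all three films, so its contextual diversity is 100%, while "cat" appears in two of three films. The two measures can diverge: a word repeated many times in a single film has a high frequency but a low contextual diversity.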

Table: SUBTLEXus74286wordstextversion.tsv
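As a rough sketch of how the table might be read, the snippet below parses tab-separated text into a dictionary keyed by word. The column names `Word`, `SUBTLWF`, and `SUBTLCD` are assumptions about the file's header and should be checked against the first line of the actual file; the function name `load_subtlex` and the sample rows are hypothetical placeholders, not real values from the database.

```python
import csv
import io


def load_subtlex(handle, word_col="Word", wf_col="SUBTLWF", cd_col="SUBTLCD"):
    """Map each word to (frequency per million, percentage of films).

    The default column names are assumptions; pass the real header names if they differ.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    return {
        row[word_col]: (float(row[wf_col]), float(row[cd_col]))
        for row in reader
    }


# Placeholder rows with invented values, used only to show the expected layout.
sample = io.StringIO(
    "Word\tSUBTLWF\tSUBTLCD\n"
    "alpha\t12.5\t3.2\n"
    "beta\t0.4\t0.1\n"
)
table = load_subtlex(sample)
print(table["alpha"])  # (12.5, 3.2)
```

With a local copy of the table, the same function could be called on a file opened with `open("SUBTLEXus74286wordstextversion.tsv", newline="", encoding="utf-8")`; the encoding here is also an assumption.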


Authors: Boris New and Marc Brysbaert


Brysbaert, Marc, and Boris New. 2009. “Moving beyond Kučera and Francis: A Critical Evaluation of Current Word Frequency Norms and the Introduction of a New and Improved Word Frequency Measure for American English.” Behavior Research Methods 41 (4): 977–990.

Brysbaert, Marc, Boris New, and Emmanuel Keuleers. 2012. “Adding Part-of-Speech Information to the SUBTLEX-US Word Frequencies.” Behavior Research Methods 44 (4): 991–997.


Online access: Openlexicon