This article presents a lexical analysis of utterances produced by Polish speakers from two age groups: 18–25 and 65–80 years. The aim of the study is to identify and compare words characteristic of the two generations using corpus-based methods. The research material consists of a corpus of contemporary spoken Polish containing over 2.5 million tokens and including texts from several sources: the Spokes conversational corpus, subtitles from YouTube videos, biographical narratives from the Oral History Archive, and parliamentary speeches. A total of 25 texts were selected for each age group. Characteristic lexical items were identified using the TF-IDF (Term Frequency–Inverse Document Frequency) measure calculated in the R environment with the tidytext and dplyr packages. The results reveal clear lexical differences between the two groups. In the speech of younger speakers, the most prominent items include colloquial expressions, filled pauses, phatic markers and functional vulgarisms. In contrast, the speech of older speakers is characterised by words referring to the past, biographical experiences and family relations. The findings suggest that lexical differences between the groups are influenced not only by speakers’ age but also by the communicative contexts represented in the corpus.
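The TF-IDF measure mentioned above can be illustrated with a minimal sketch. It follows the definition used by tidytext's `bind_tf_idf()`: term frequency is the count of a term divided by the total number of tokens in a document, and inverse document frequency is the natural logarithm of the number of documents divided by the number of documents containing the term. The abstract does not state whether each age group or each individual text counts as a "document". The example assumes the first option. The tokens in the example are hypothetical placeholders, not data from the corpus, and the function name `tf_idf` is illustrative.

```python
import math
from collections import Counter


def tf_idf(docs: dict[str, list[str]]) -> dict[tuple[str, str], float]:
    """Return a TF-IDF score for every (document, term) pair.

    Mirrors the definition in tidytext::bind_tf_idf:
      tf  = count of term in document / total tokens in that document
      idf = ln(number of documents / number of documents containing term)
    """
    counts = {doc: Counter(tokens) for doc, tokens in docs.items()}
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs at least once.
    df = Counter(term for c in counts.values() for term in c)
    scores: dict[tuple[str, str], float] = {}
    for doc, c in counts.items():
        total = sum(c.values())
        for term, n in c.items():
            tf = n / total
            idf = math.log(n_docs / df[term])
            scores[(doc, term)] = tf * idf
    return scores


# Hypothetical toy input: assumes each age group is pooled into one document.
docs = {
    "young": ["x", "x", "y", "z"],
    "old": ["y", "w", "w", "w"],
}
scores = tf_idf(docs)

# Print terms per group, highest TF-IDF first.
for group in docs:
    ranked = sorted(
        ((t, s) for (d, t), s in scores.items() if d == group),
        key=lambda pair: pair[1],
        reverse=True,
    )
    print(group, ranked)
```

In this setup a term shared by both groups (here `y`) gets an IDF of ln(2/2) = 0, so it scores zero no matter how frequent it is. Terms used by only one group are weighted by ln 2 times their relative frequency. This is why TF-IDF brings out words that are characteristic of one group rather than words that are merely common.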