This article describes the process of obtaining text data, the methodology of building text corpora, and the selection and definition of individual lexical units for a lexicon of crime vocabulary in Polish. The language material was developed for an IT system that supports Polish uniformed services in searching for crimes committed or planned on the Internet. The following crime categories were considered: smuggling and trafficking of drugs, cigarettes, alcohol, vehicles and machinery, weapons and explosives, trafficking in human beings and organs, trafficking in and falsification of documents, sexual crimes and paedophilia. The work produced a collection of over three thousand words and phrases, together with a linguistic dataset of 3337 full texts from online sources. The lexicon has been adapted to the requirements of computer processing for three system modules: Definition, Context, and Translator. The linguistic material was collected from various types of anonymous forums and online advertising sites that have no content control, moderation, or administration. The linguistic material has been tested and implemented in the AISearcher Border Guard System.