Text Classification on the Basis of “False / True” Using Methods of Automatic Text Processing

Litvinova Tatyana Aleksandrovna
Voronezh State Pedagogical University
Seredin Pavel Vladimirovich
Voronezh State Pedagogical University
Litvinova Olga Aleksandrovna
National Research Center «Kurchatov Institute»
Lyell John Robert
National Research University Higher School of Economics

Journal: Научный диалог (Scientific Dialogue)

Issue: 10(58)    Year: 2016    Pages: 70-83

Keywords

text corpus, lie recognition in speech, computational linguistics, corpus of texts, LIWC


The work addresses the problem of classifying Russian-language texts by the “false / true” parameter. People recognize lies in speech at roughly chance level, so tools that help them detect intentionally distorted information are needed. Lie recognition in speech has long been studied, but only in the last 10-15 years have methods of corpus and computational linguistics been applied to it. Most such studies deal with English, while Russian remains on the periphery of this research. The authors built a special corpus of false and true narratives on the theme “How did I spend yesterday?”, with both a false and a true narrative from each respondent (N = 173). The texts were processed using the Russian version of the LIWC program with user dictionaries. The paper presents a method based on the coefficient of variation and on analysis of the distribution of the text parameter values. The proposed method classified texts as false or true with an accuracy of 68.3%. The model's accuracy differs for texts written by men and by women, which indicates that future models of this kind should be built with the authors' characteristics, including gender, taken into account.

The study was supported by the Russian Foundation for Humanities (RFH), grant No. 15-34-01221 “Lie Detection in Written Text: A Corpus Study”.
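The abstract does not say exactly how the coefficient of variation enters the classification. As a hedged illustration only, the coefficient of variation of a text parameter is its standard deviation divided by its mean (CV = σ / μ), and classification accuracy is the share of texts whose predicted label matches the true one. The sketch below computes both for one parameter. The feature values, the threshold of 5.5, and the one-parameter rule in `classify` are all hypothetical assumptions, not the authors' method or data.

```python
import statistics


def coefficient_of_variation(values):
    """Return CV = sample standard deviation / mean (the mean must be non-zero)."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return statistics.stdev(values) / mean


def classify(value, threshold):
    """Hypothetical one-parameter rule: values below the threshold are labelled 'false'."""
    return "false" if value < threshold else "true"


def accuracy(predicted, actual):
    """Share of texts whose predicted label matches the true label."""
    if len(predicted) != len(actual):
        raise ValueError("label lists must have the same length")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


# Hypothetical per-text values of one LIWC-style parameter (in percent).
true_texts = [6.1, 5.8, 6.4, 6.0]
false_texts = [4.2, 5.9, 3.1, 4.8]

# A larger CV means the parameter varies more within that group of texts.
print(coefficient_of_variation(true_texts))
print(coefficient_of_variation(false_texts))

values = true_texts + false_texts
labels = ["true"] * len(true_texts) + ["false"] * len(false_texts)
predictions = [classify(v, threshold=5.5) for v in values]
print(accuracy(predictions, labels))  # → 0.875
```

In this toy example the parameter is far more dispersed in the false texts than in the true ones, which is the kind of distributional difference such a method might exploit; any real threshold would have to be estimated from the corpus.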


