Text Classification on Basis of “False / True” using Methods of Automatic Text Processing

Litvinova Tatyana Aleksandrovna
Voronezh State Pedagogical University
Seredin Pavel Vladimirovich
Voronezh State Pedagogical University
Litvinova Olga Aleksandrovna
National Research Center «Kurchatov Institute»
Lyell John Robert
National Research University Higher School of Economics

Journal: Научный диалог (Nauchnyi Dialog)

Issue: 10(58)    Year: 2016    Pages: 70-83


Keywords

text corpus  lie recognition in speech  computational linguistics  corpus of texts  LIWC


The paper addresses the classification of Russian-language texts as false or true. People detect lies in speech at roughly chance level, so tools that help recognize intentionally distorted information are needed. Lie detection in speech has been studied for a long time, but only in the last 10-15 years have the methods of corpus and computational linguistics been applied to it. Most of this work has been done on English, while Russian remains on the periphery of such studies. The authors built a dedicated corpus of false and true narratives on the topic “How did I spend yesterday?”, collected from each respondent (N = 173). The texts were processed with the Russian version of the LIWC program using user-defined dictionaries. The paper presents the development of a method based on the coefficient of variation and on an analysis of how the values of the text parameters are distributed. The proposed method classified texts as false or true with an accuracy of 68.3%. The model's accuracy differs for texts written by men and by women, which suggests that future models should take the characteristics of the authors, including gender, into account. The study was supported by the Russian Foundation for the Humanities (RGNF), grant No. 15-34-01221, “Lie Detection in Written Text: A Corpus Study”.
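The abstract names the coefficient of variation but does not spell out the procedure, so the following is only a minimal sketch of the statistic itself (standard deviation divided by the mean), not the authors' method. The feature names, the sample values, the threshold `max_cv`, and the helper `stable_features` are all hypothetical and serve purely as illustration.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """Return the population standard deviation divided by the absolute mean."""
    m = mean(values)
    if m == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return pstdev(values) / abs(m)


def stable_features(rows, max_cv):
    """Return, sorted, the feature names whose coefficient of variation is at most max_cv.

    rows is a list of dicts, each mapping a feature name to its value in one text.
    """
    names = rows[0].keys()
    return sorted(
        name
        for name in names
        if coefficient_of_variation([row[name] for row in rows]) <= max_cv
    )


# Hypothetical per-text values (share of words in a category, in percent);
# these are NOT data from the paper.
texts = [
    {"pronouns": 10.0, "negations": 1.0},
    {"pronouns": 12.0, "negations": 4.0},
    {"pronouns": 11.0, "negations": 0.5},
]

print(stable_features(texts, max_cv=0.3))
```

A low coefficient of variation means a parameter varies little relative to its mean across texts; how such parameters are then used to separate false from true texts is described in the full paper, not here.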


