Background:
Deception is the intentional falsification of truth, and it comes in many shades, from outright lies to 'spin'. More precisely, deception is interpreted to be the manipulation of a message to cause a false impression or conclusion. Candidates for deception include content on social networking sites such as Craigslist and Facebook, blogs, emails, witness testimony in a court of law, answers to job interview questions, personal essays in online dating services, and insurance claim forms. The Internet is evolving into a medium that goes beyond web searching: text-based applications such as email, social networks, chat rooms, and blogs are already popular or gaining popularity, and email is one of the most commonly used communication media today. Clearly, this presents opportunities for deceptive or fraudulent activities.
Summary:
An apparatus and method for determining whether a text is deceptive may comprise analyzing a body of textual content known to contain either true content or deceptive content; identifying psycho-linguistic cues that are indicative of a text being deceptive; and statistically analyzing, via a computing device, a given text based upon the psycho-linguistic cues to determine if the text is deceptive. The apparatus and method may further comprise weighting the psycho-linguistic cues and statistically analyzing the text based on the weighted psycho-linguistic cues. The statistical analysis may be performed using one of a cue matching analysis, a weighted cue matching analysis, a Markov chain analysis, or a sequential probability ratio testing binary hypothesis analysis. The psycho-linguistic cues may be separated into categories, including increasing trend cues and decreasing trend cues, and analyzed according to the category in which each cue falls.
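As one illustration of the last of these analyses, the sketch below applies Wald's sequential probability ratio test to a sequence of cue values. It is a minimal sketch under stated assumptions, not the patented detector: the Gaussian model for each cue, the per-cue (mean, standard deviation) parameters, and the names `gaussian_log_pdf` and `sprt` are hypothetical and do not come from the source.

```python
import math


def gaussian_log_pdf(x, mean, std):
    """Log-density of a normal distribution (assumed cue model, not from the source)."""
    return -0.5 * math.log(2 * math.pi * std ** 2) - (x - mean) ** 2 / (2 * std ** 2)


def sprt(cue_values, deceptive_params, truthful_params, alpha=0.05, beta=0.05):
    """Sequential probability ratio test over cue values.

    cue_values       -- observed value of each cue, examined in order
    deceptive_params -- hypothetical (mean, std) of each cue for deceptive text
    truthful_params  -- hypothetical (mean, std) of each cue for truthful text
    alpha, beta      -- target false alarm and miss probabilities
    Returns (decision, number_of_cues_examined).
    """
    upper = math.log((1 - beta) / alpha)   # crossing this accepts "deceptive"
    lower = math.log(beta / (1 - alpha))   # crossing this accepts "truthful"
    llr = 0.0                              # cumulative log-likelihood ratio
    triples = zip(cue_values, deceptive_params, truthful_params)
    for used, (x, (m1, s1), (m0, s0)) in enumerate(triples, start=1):
        llr += gaussian_log_pdf(x, m1, s1) - gaussian_log_pdf(x, m0, s0)
        if llr >= upper:
            return "deceptive", used
        if llr <= lower:
            return "truthful", used
    # All cues examined without crossing either threshold.
    return "undecided", len(cue_values)
```

Because the test stops as soon as either threshold is crossed, it may reach a decision after examining only some of the cues; `alpha` and `beta` trade the number of cues needed against the tolerated error rates.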
The disclosed subject matter overcomes the disadvantages and shortcomings of the prior art by providing a method for the detection of deception in electronic communication texts, such as email, based on novel psycho-linguistic modeling and statistical analysis methods. More particularly, the present invention utilizes novel psycho-linguistic modeling, which has identified 16 psycho-linguistic cues that are strong indicators of deception in email, and employs four deception detectors. Experiments using real-life data show that the four deception detectors can achieve high rates of accuracy with low false alarm probabilities. A data compression-based model building and classification method to classify a target text document as deceptive or not is also described. The model building process is based on prediction by partial matching, as discussed in Cleary, et al., “Data compression using adaptive coding and partial string matching,” IEEE Transactions on Communications, Vol. 32, pp. 396–402 (April 1984), the disclosure of which is incorporated by reference. The classification is obtained by solving a cross-entropy optimization problem.
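The compression-based idea can be sketched as follows: build one character model from deceptive text and one from truthful text, then assign a target document to the model under which it has the lower cross-entropy (that is, the model that would compress it better). This is a simplified illustration, not the method of the patent: a fixed-order context model with add-one smoothing stands in for prediction by partial matching, which blends several context orders through escape probabilities. The names `CharContextModel` and `classify`, the order of 2, and the alphabet size of 256 are assumptions made for the example.

```python
import math
from collections import Counter, defaultdict


class CharContextModel:
    """Fixed-order character model with add-one smoothing (a stand-in for PPM)."""

    def __init__(self, order=2, alphabet_size=256):
        self.order = order
        self.alphabet_size = alphabet_size
        self.counts = defaultdict(Counter)  # context string -> next-char counts

    def train(self, text):
        """Count which character follows each context of length `order`."""
        padded = " " * self.order + text
        for i in range(self.order, len(padded)):
            self.counts[padded[i - self.order:i]][padded[i]] += 1

    def cross_entropy(self, text):
        """Average bits per character needed to code `text` under this model."""
        padded = " " * self.order + text
        total_bits = 0.0
        for i in range(self.order, len(padded)):
            ctx = self.counts.get(padded[i - self.order:i], Counter())
            prob = (ctx[padded[i]] + 1) / (sum(ctx.values()) + self.alphabet_size)
            total_bits -= math.log2(prob)
        return total_bits / max(len(text), 1)


def classify(text, deceptive_model, truthful_model):
    """Pick the class whose model gives the lower cross-entropy (ties: truthful)."""
    h_deceptive = deceptive_model.cross_entropy(text)
    h_truthful = truthful_model.cross_entropy(text)
    return "deceptive" if h_deceptive < h_truthful else "truthful"
```

In use, each model would be trained on a labeled corpus of the corresponding class before `classify` is called on a target document; the quality of the decision depends on how representative those corpora are.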
Benefits:
- Psycho-linguistic modeling and statistical analysis for detecting deception.
Applications:
- Helps reduce the rate of online fraud.
- Helps narrow down suspects in online fraud cases.
Link to full patent: Psycho-Linguistic Statistical Deception Detection From Text Content
FY10-041
Information Assurance/Cyber Security
Rajarathnam Chandramouli
David Zimmerman Director of Technology Commercialization Stevens Institute of Technology dzimmer3@stevens.edu