Computer scientists at Queen's University in Kingston have developed a new approach to tracking e-mails that could help nab terrorists and detect white-collar crime.
Realizing that people leave a "signature" when using carefully chosen words while being deceptive, Prof. David Skillicorn and student Parambir Keila have developed a surveillance program that detects deceptive word patterns in suspicious e-mails.
"The bottomline is that people who are trying to do something that's unusual for them have a very hard time doing it naturally," Skillicorn told Business Edge.
Unlike traditional intelligence surveillance software, which searches e-mails for words drawn from a library of keywords, Skillicorn's technology looks at the patterns of word use across a large set of messages.
(Photo: Michael Lea, Business Edge) Queen's University computer scientist David Skillicorn has found a myriad of uses for his program.
People who are trying to hide something leave unique signatures because they write differently from those who aren't trying to hide anything, said Skillicorn.
"If you're an Al Qaida terrorist, you can be pretty sure that you shouldn't talk about bombs and dynamite and nuclear and things like that ... so you're constantly trying to do a substitution of words in your mind you don't want to say with words you think are OK," he said.
"The problem is that when you're choosing those words, especially choosing them on the fly, you're going to choose words that are quite different from the words you're trying to replace. And, in particular, their natural frequency in English, say, is going to be very different from the natural frequency of the word you were going to use," he said.
"That's actually how our programs start to look for those sorts of things."
Although Skillicorn began developing the technology in the aftermath of Sept. 11, 2001, with the tracking of terrorist e-mails in mind, it has a number of spinoff applications.
It could, for example, detect employee fraud or uncover a group of employees running a child-pornography ring inside a company.
By analysing word patterns in e-mails with Skillicorn's method, an organization could reveal the key players in a group communicating over the Internet without having to read a single message.
Skillicorn and Keila had the chance to test their approach about a year ago, when federal energy regulators in the U.S. posted to the web 1.5 million e-mails from employees of disgraced U.S. energy conglomerate Enron.
"Until this was available, nobody knew what e-mail in general looked like," Skillicorn said.
He and Keila were able to track changes in the correspondence because the Enron e-mails had been sent before, during and after the accounting crisis.
As the crisis escalated, the researchers discovered the e-mails contained simpler sentences, fewer personal pronouns and more negatives as the writers tried to distance themselves from what they were saying.
"When people are lying they try to distance themselves from what they're saying. They tell a story that's a little bit simpler than a normal, off-the-cuff story would be and they do tend to reflect some of their self-negativity in the language they use," Skillicorn said. "So there's a kind of signature for deceptiveness which you can also apply in the context of word frequency patterns."
The fact that the content of the e-mails doesn't have to be read is significant, because an intelligence agency or other organization may deal with millions of e-mails.
This new approach, which Skillicorn admits could already be in use by the intelligence community unbeknownst to the public, ranks the e-mails into groups so that only the most interesting one per cent would have to be read by human eyes.
"If we can find a model for what we're looking for - i.e. deception - or for normality, we can rank the set of messages from most to least interesting," he said.
"Some part of the top of the list can then be picked out for further analysis."
University researchers aren't the only ones developing techniques for interpreting the contents of e-mail.
Suhayya Abu-Hakima, vice-president of content technology at Ottawa-based digital security firm Entrust, said her company began selling its own content-analysis solution to commercial clients in June 2004.
Abu-Hakima estimated that up to 25 other companies worldwide are working on their own content-analysis solutions, but said Entrust is probably one of the largest.
"Essentially, we see a myriad of applications for content-analysis technology," she said.
Entrust is using its own version as a server that checks outbound and inbound e-mail to make sure that the private information of its clients' customers, such as health-care information and credit card numbers, doesn't inadvertently get out.
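A common way for content-scanning tools to spot card numbers in outbound mail is to look for runs of digits of typical card length (13 to 19 digits) and keep only those that pass the Luhn checksum that most payment card numbers carry. The article does not say how Entrust's product works; this is a generic sketch, and the function names are hypothetical.

```python
import re

# Runs of 13-19 digits, optionally separated by single spaces or hyphens.
CANDIDATE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_card_numbers(body: str) -> list[str]:
    """Return digit strings in a message body that look like card numbers."""
    hits = []
    for match in CANDIDATE.finditer(body):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            hits.append(digits)
    return hits
```

The checksum step filters out most random digit runs, such as order or phone numbers, that merely happen to be long.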
There's demand for the product.
Credit card companies levy fines against organizations that irresponsibly allow credit card numbers to slip out. In some cases, a single such mistake can cost up to $100,000.
Such errors add up, Abu-Hakima pointed out. "If I were to accidentally let out 10, well that's $1 million," she said.
The technology can also be used for corporate compliance to monitor for harassment or for abusive or offensive behaviour among staff or with customers in e-mails.
According to IDC, a global market-intelligence firm, up to 31 per cent of organizations have fired employees because of something that happened via e-mail.
"The e-mail could have been to do with harassment, letting out information that was sensitive to customers or any sort of misbehaviour, if you will, over e-mail," Abu-Hakima said, adding that 60 per cent of organizations are now scanning outbound e-mail.
If Entrust's technology flags an e-mail that it determines should not go out, it bounces the message back to the sender with a prompt to reconsider sending it.
"We can actually give them what I call the sober second thought," she said.
Entrust has sold the technology to banks, telecommunications and insurance companies, and government departments that have to adhere to health-care privacy constraints.
"Traditionally, the companies who have bought Entrust technologies in the past have been those that want to secure their information and they want to make sure that it's protected ... so it's very much the same kind of companies," she said.
Some organizations still use the primitive keyword-based method of e-mail scanning, while others use higher-level methods such as linguistic analysis, which examines words and phrases.
Entrust uses statistical and linguistic analysis in its platform.
"The statistical approach allows you to be very fast and the linguistic approach allows you to be very accurate," she said.
Statistical analysis includes methods such as counting words, measuring how often a word is used, calculating probabilities associated with a word, and noting where words are placed.
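The statistics Abu-Hakima lists can be illustrated on a single message: word counts, relative frequencies (used here as estimates of each word's probability), the positions where each word appears, and the probability of one word following another. This is an illustrative sketch only, not Entrust's platform; both function names are hypothetical.

```python
import re
from collections import Counter, defaultdict


def word_statistics(text: str) -> dict[str, dict]:
    """Per-word count, relative frequency (an estimate of the word's
    probability in this text) and the positions where it occurs."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    positions: dict[str, list[int]] = defaultdict(list)
    for index, word in enumerate(words):
        positions[word].append(index)
    total = len(words)
    return {
        word: {"count": count, "frequency": count / total, "positions": positions[word]}
        for word, count in counts.items()
    }


def next_word_probability(text: str, word: str, following: str) -> float:
    """Estimate P(following | word): the share of occurrences of `word`
    immediately followed by `following`. Both arguments are expected in
    lower case; sentence boundaries are ignored in this simplified version."""
    words = re.findall(r"[a-z']+", text.lower())
    after = [b for a, b in zip(words, words[1:]) if a == word]
    return after.count(following) / len(after) if after else 0.0
```

Counts and frequencies are cheap to compute over large volumes of mail, which fits the speed advantage she attributes to the statistical approach.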
"I think it's definitely related to what he [Skillicorn] is doing, but the underlying technology is probably different," she said. "We're potentially attacking the same types of problems, but from different angles."
(Frank Armstrong can be reached at armstrong@businessedge.ca)