October 04, 2007
Technology and Counterterrorism: Machines are Still Stupid
My job demands a lot of research and analysis. Most of the time, I can't keep up with the sheer volume of incoming information and have to cherry-pick items that seem significant. This is a common problem among analysts, particularly ones tasked with monitoring and evaluating trends (e.g. financial markets, politics). Data mining has been used extensively in a variety of sectors to parse information so that analysts can deal with it more effectively. Of course, figuring out what to mine is key because one can easily succumb to the GIGO problem without realizing it.
Recently, I came across a press release about the University of Arizona's Dark Web Portal, a resource for counterterrorism analysts.
Funded by the National Science Foundation and other federal agencies, Hsinchun Chen and his Artificial Intelligence Lab at the University of Arizona have created the Dark Web project, which aims to systematically collect and analyze all terrorist-generated content on the Web.
Ha. The hyperbole begins.
Dark Web also uses complex tracking software called Web spiders to search discussion threads and other content to find the corners of the Internet where terrorist activities are taking place. But according to Chen [the project lead], sometimes the terrorists fight back.
"They can put booby-traps in their Web forums," Chen explains, "and the spider can bring back viruses to our machines." This online cat-and-mouse game means Dark Web must be constantly vigilant against these and other counter-measures deployed by the terrorists.
My first thought (after laughing at the "cat-and-mouse" bit) was that the crawlers were getting confused by dynamic elements or downloading infected files linked to from those sites (this is a common problem on sites where pirated material is shared). In fact they admitted this in one of their papers:
We set up our spider program to not only download textual messages, but also download multimedia documents, archive documents (e.g., ZIP files, etc.), and other non-standard files...
Some forums may contain hyperlinks that trap a spider program in a loop (e.g. calendars, forum internal search engines, etc.). If the spidering process does not finish in a reasonable time, we need to examine the spidering log and exclude the vicious links from future spidering process.
Forum calendars as "counter-measures deployed by terrorists". Way to sex up these dry studies, NSF.
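For the technically curious, here is a minimal sketch of the kind of link exclusion the paper excerpt describes. It is not the Dark Web team's code; the exclusion patterns, the page budget, and the toy forum are all assumptions made up for illustration, and the crawl runs over an in-memory link map rather than the real Web.

```python
import re
from collections import deque

# Hypothetical patterns for links that tend to generate endless URLs.
EXCLUDE_PATTERNS = [r"calendar", r"search\.php"]


def is_excluded(url, patterns=EXCLUDE_PATTERNS):
    """Return True if the URL matches any exclusion pattern."""
    return any(re.search(p, url) for p in patterns)


def crawl(start, get_links, max_pages=1000, patterns=EXCLUDE_PATTERNS):
    """Breadth-first crawl that skips excluded and already-seen URLs.

    get_links is any callable mapping a URL to the URLs it links to.
    max_pages is a hard budget so a trap cannot run forever.
    Returns the visited URLs in visit order.
    """
    seen = {start}
    queue = deque([start])
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)
        for link in get_links(url):
            if link not in seen and not is_excluded(link, patterns):
                seen.add(link)
                queue.append(link)
    return visited


def toy_links(url):
    """Toy forum (assumed): the calendar links to an endless chain of months."""
    if url == "/forum":
        return ["/forum/thread1", "/forum/calendar?m=1"]
    if url.startswith("/forum/calendar?m="):
        month = int(url.split("=")[1])
        return [f"/forum/calendar?m={month + 1}"]
    if url == "/forum/thread1":
        return ["/forum", "/forum/thread2"]
    return []


print(crawl("/forum", toy_links))
```

With the calendar pattern in place the crawl finishes after three pages. With an empty pattern list it would keep following "next month" links until the page budget runs out, which is the whole "counter-measure".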
In any case, I read a bunch of published papers on the Dark Web Portal (really they should have called it the Evil Web Portal for maximum effect) to try and figure out what these scientists were actually doing. The details are probably very boring for non-technical people, but essentially they are trying to build an authoritative "collection" of sites run by terrorist orgs. This is not a bad idea in principle, because if you have a good list of sites and get a bunch of computers to crunch them, a lot of useful things might turn up. However, per the GIGO rule, the value of the data is dependent on the value of the collection.
My first concern was the manner in which they created a "seed" list to build their collection. Terrorist groups were identified using a variety of sources, including the State Department and the UK and Australian governments. This graph describes the number of groups supplied by each source:
Terrorist groups identified by the State Department are marked (US). However, the majority of groups were supplied by the United States Committee for a Free Lebanon (USCFAFL). Who are these guys and why are they considered "reliable"? That and the fact that MEMRI kept being cited as a source for media and jihad-website monitoring made me a bit skeptical about the collection's utility. Perhaps they just got the colors wrong in Excel.
Still, there are some potentially interesting apps being developed, such as Writeprint, which can analyze writing style to determine if different forum postings are written by the same author. Of course, it only works with extremely small samples and can't personally identify anyone unless they happen to be posting under a real name. Even a 95% success rate is a lot of false positives if your sample size is over 100,000. Recall that 7-year-old boy who was stopped three times in US airports because of an unfortunate name match. Customs officials believed a computer even as they stood face to face with a grade-school kid on his way to Disney World (see also: Garbage In, Gospel Out).
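To put a rough number on that, here is a back-of-the-envelope sketch. It assumes "95% success rate" means 5% of all items get the wrong label, and it treats every error as a false positive, which is a simplification; the function name is mine, not anything from the Writeprint papers.

```python
def expected_errors(sample_size, accuracy_percent):
    """Number of misclassified items at a given accuracy (integer percent)."""
    return sample_size * (100 - accuracy_percent) // 100


print(expected_errors(100_000, 95))  # 5000
```

Under those assumptions, that is five thousand wrong calls for someone to chase down by hand.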
Regardless, the point I'm trying to make here is that machines are stupid. They are incapable of making simple distinctions that even a human toddler could make. They can chew through mountains of data and give you a specific (and possibly dumb) answer to a query, but they can't tell the difference between a guy who posts on a jihadi forum because he's angry about not getting laid and a guy who's actually planning to blow up a train station.
If I were a terrorist, would I recruit people or talk about my plans in a public forum or Yahoo group? Or would I lurk on discussion sites, watching for certain profiles, and then maybe use a PM to invite people to some innocuous event (like a university iftar) where I can assess them further? Machines are not a replacement for good old-fashioned HUMINT.
Posted by eerie at October 4, 2007 07:19 AM
Filed Under: Terrorism
Glad to see we have a new author, perhaps she can introduce herself.
Posted by: The Lounsbury at October 4, 2007 02:06 PM
Ha. I missed you too.
Posted by: eerie at October 4, 2007 02:08 PM
Great stuff from the new gal! I wonder if she'll write a review of A.H. Ali's Infidel.
Posted by: matthew hogan at October 4, 2007 02:54 PM
Well, I wouldn't want to frighten away new authors, you know we once had a Site Owner that was frightened away by that task.
However, as to substance: freelebanon seems to be quite the Loony Right Bolshy plus Right Maronite Fascist take on Lebanon: this list of Syrian collaborators with its editorial additions is amusing.
Anyone taking this seriously as a source... well, another failed American solution obsessed with technology and process, and empty of understanding.
Posted by: The Lounsbury at October 4, 2007 03:04 PM
Oh oh, the founder of USCFAFL also co-chaired a study with none other than Daniel Pipes.
Posted by: eerie at October 4, 2007 03:13 PM
I don't know about machines being stupid...
case in point: http://www.aqoul.com/insult.html
You know, I really think we need to start thinking beyond the dark portal, to Draenor where the Orcs live.
In other news, perhaps the terrists can hack ALL of our internets. Or webs.
Posted by: Ilan at October 4, 2007 05:17 PM
Heh, and the New York Times and CNN are listed as "pro-Arab media". Also, I love how they quote Etienne Saqr as an authoritative source. He's probably the closest thing that the lunatic right-wing fringe among Lebanon's Christians ever got to a pure Nazi, who urged every Lebanese to kill a Palestinian, wanted to ban all parties with a "non-Lebanese" ideology, and bring back the Phoenician alphabet...
(Nowadays, he's living in Israel, awaiting another 1982, hoping to return gloriously on top of a Merkava. Sort of puts the railing against neighbouring states for harboring extremists in perspective.)
Posted by: alle at October 4, 2007 09:00 PM
free-lebanon.org comes out through a whois (which of course is an extremely basic sort of search) as:
Messianic Vision on Park Ave in NYC
Abdelnour, Ziad 212-452-2680"
Which is odd. I am not clear on the connection between a registrant and an administrator, but why would it be Messianic Vision (which seems to me to mean that it's a messianic judaism group)? Is messianic judaism that popular in Lebanon?
I wonder if they have any connection to freelebanon.com. I don't see any outside of a similarity in the name (which, if anything, makes me think that they're NOT connected in this case), but freelebanon.com was claiming that there was a coup d'état going on in Syria back in 2005.
As far as the usefulness of such a program, even if garbage is going into it ALONG WITH useful information, then it could be useful. I don't think that the idea is to use this system to absolutely ID individual terrorists who are plotting attacks, but rather to track the entire spectrum of jihadi sympathetic people, and then produce a list for actual humans to investigate further. So they might get a list with a lot of garbage, which they can then exclude as they investigate and get to any real information.
Posted by: nygdan at October 5, 2007 09:22 AM
the 'Dark Web' used to be used just for those sites not easily found by search engines or generally accessible, but this is a new use of the term - and the USACFL has been around for a long time, with a wonderful webpage going back to 2000 that had a flag of Lebanon with an 'all-seeing eye-on-pyramid' that gave lots of comfort to paranoids and Lebanese who were given no reason to think that this implied a plot by American neo-conservatives to bring totalitarian American one-worldism to the Middle East.
Hezbollah (not to mention the even more benighted bearded-ones-in-caves) keep winning the propaganda battles for Arab minds, and some morons in the White House can't figure out why...
Posted by: dawud at October 5, 2007 11:53 AM
and asking for intelligence per se, given that the administration doesn't reward intelligence but loyalty, and that critical thinking has long been on the wane - reference Colin Powell's critical thinking shutting off before he endorsed the UN speech in 2003, waving a vial of baking powder he pretended was Anthrax and speaking of a 'slam dunk' case for WMDs; or the legion of critics, including Michael Scheuer ('Anonymous' author of 'Imperial Hubris' and 'Through our enemies' eyes') - down to 'Against all Enemies' or former intelligence officer Robert Steele at the Hacker conference ( http://www.hopenumbersix.net/speakers.html ) - the critical thinking you're asking for is in short demand, and not something that gets people advanced in public office these days...
Posted by: dawud at October 5, 2007 12:05 PM
Effort to pinpoint who is behind a website is pretty insignificant. Something that kids can do in seconds in most cases, or in case of proxies, ISPs, etc., hiding that information, something that you can trace back through their employees who have access to that information anyway.
So if terrorists really used the web as a medium to spread their propaganda, then some buddies at the DoD are not doing their job or want them there. Otherwise, it's a farce for the technology illiterate.