The story of the War in Afghanistan is one that has been told many times from many different perspectives. The release of the data to Wikileaks opens a new window to the events that have unfolded, but so far it does not actually seem to contain any new revelations, merely finer detail to an already known picture.
Being a data scientist, I naturally took to the opportunity, downloading and immediately munging it. Several others have done the same (notably Drew Conway), and some of their results are much prettier and possibly more instructive than what I was able to extract on my own in the past couple of weeks. In fact, it was my inability to come up with any brilliant new insights that made me hesitate to post this blog. However, there is one event in the data that persuaded me to go ahead, despite the fact that I was unable to construct brilliant visualizations or discover hidden agents manipulating the action behind a shadowy curtain. I was nearly a week into my analysis before I realized this particular event was in the dataset. I quickly narrowed down the entries to those reporting “friendly KIA” on 1/24/09, and read the title:
(EXPLOSIVE HAZARD) IED EXPLOSION RPT (RCIED) 3/8 USMC WEAPONS COY IVO (ROUTE 515): 1 CF KIA 2 CF WIA
and the description:
At 1007Z, RC West reported IED strike. While conducting a framework patrol, FF reported they struck an IED. BDA: 1x USMC KIA, 1x USMC WIA (CAT B), 1x USMC WIA (CAT C), MEDEVACed to BSN R2E, 1x UAHMMWV vehicle damaged. FF established cordon. Vehicle is on fire and is unable to be recovered att. FF did not report type of IED. NFI att.
At 1824Z, RC South reported:
At 1817Z, EOD exploited the site and will report the device type through the CIED chain. Destroyed vehicle was recovered to COP Puller. Updated BDA: 1x USMC KIA, 2x USMC WIA (1x CAT B, 1x CAT C) and 1x HMMWV destroyed. NFTR. Event closed at 1817Z. ISAF # 01-1001
That “1x USMC KIA” is my friend, Julian Brennan.
To be honest, I was never that close to Julian; to me he was mostly my friend’s younger brother. But the ties between him and me are numerous and enduring. He and both of his siblings went to the same very small private school that my sister and I attended from 1st to 12th grade. His dad sang songs at my summer camp and his mom taught me art for a couple of years. When I heard he had joined the Marines I was surprised, especially given his family’s left-leaning tendencies and his aspirations to become a Broadway stage actor. But of course, that surprise was nothing compared to the shock and sadness I felt when I found out he had been killed by an IED in Afghanistan.
When the war started in 2001, it was a knee-jerk reaction to the attacks on the WTC, and there probably wasn’t an American alive who wasn’t happy to see the Taliban bombed back to the stone age. By the time the Wikileaks data begins (January 1, 2004), enthusiasm for the war had waned and focus had shifted to Iraq. Probably because of this shift in focus, events at the beginning of the data set are relatively sparse (see Figure 1). The hope of finding bin Laden had all but disappeared, so there were relatively few actions on the ground. However, by mid-2006, the Iraq war had passed a turning point (al-Maliki became the new Prime Minister in May, Abu Musab al-Zarqawi was killed in June, and by December Saddam Hussein had been executed), and focus may have shifted back to Afghanistan. This explanation for the uptick in deaths post-2006 is pure speculation and, considering my ignorance in the matter, is probably wrong. It would be interesting to compare the timeline of deaths in Iraq with those in Afghanistan, to see if they are correlated or anti-correlated.
As you can see, the number of enemy killed is significantly above that of civilians, Afghan troops or U.S. troops, and the numbers of civilians and Afghan troops killed are both greater than the number of U.S. troops killed. It’s also clear that 2009 was a bad year for everyone.
In Figure 2, you can also see which regions were worst. It shows where in Afghanistan all of the deaths were and who was killed. The arc of violence from southern Afghanistan to western Afghanistan follows the main road (A01) connecting Qandahar and Kabul, with the very worst located in and around Qandahar.
The figures should be taken with a grain of salt as well. It appears that there is a large number of civilians killed in the north-central region (technically the Bamyan province), but reading the summary you see that “Initial report of 67 fatalilties was in fact ‘6 or 7 fatalities’ misunderstood and turned out to be 2 confirmed at this time.” On the other hand, the deaths of 55 civilians in the Qandahar region are real, the result of a suicide bomber at a “dog fight related picnic”. Moreover, the distribution of event sizes (by size I mean the number killed) follows the pattern observed for other terrorist events (see Figure 3), which suggests that most of the numbers are accurate.
Perhaps the best way to see the increase in violence and where it is concentrated is to turn it into a movie. This is exactly what I have done, and it is definitely what took the most time to create. In the movie, I mark some key events in the timeline, including the day Julian was killed. I think seeing his face associated with this one small point, which lasts for just a couple of seconds, turns the abstract visualization of dots on a map into something real and tangible. I know personally the pain and damage that losing Julian caused, and every single point on that map represents countless families and friends likewise affected.
I’ve looked at this dataset now, and believe there may be some insights hidden in its depths, though I am doubtful any amount of digging will lead to serious revelations. And down the road I may write some more about other analyses I have done. But this post is dedicated to Julian, his family, and all those who lost someone in this war in Afghanistan.
[Update: You can find a full manuscript on how to do behavioral research on AMT by Sid Suri and me here.]
This blog is intended to help researchers utilize Amazon’s Mechanical Turk to recruit and pay participants for online experiments. Feel free to email me if you have any questions or comments about this blog or AMT in general.
[Note: Right now this guide is specifically directed to Mac users. Most of this should be easy to adapt to PC and especially easy to adapt to Linux/Unix OSs, but the minutiae are for Macs. I may update this later with variations for other OSs.]
- Create accounts on Mechanical Turk
  - Create a Requester account on AMT
    - You want to “Get Results,” so click on “Get Started” on the right side of the page
    - Enter your email and password
    - Enter your company and address, read the AMT agreement, check the box, and click the button.
  - Create an Amazon Payments Account
    - Log in using the same email and password as before
    - Enter your desired payment information (e.g., credit card).
  - Create an Amazon Web Services Account
    - Log in with the same email and password. It will pre-fill your address for you
    - Accept the terms, and your account is created!
  - Get Amazon’s Command Line Tools.
    - Don’t just click download!
    - Scroll down the window and look for the link to get the Unix CLTs (without JRE).
    - The unzipped folder will be called aws-mturk-clt-1.3.0.
    - Rename the folder to something easy and move it to an easy place.
  - Set your environment variables
    - Start a Terminal session
    - Edit “.profile”
    - Add the lines:
      - “export JAVA_HOME=/usr”
      - “export MTURK_CMD_HOME=[CLTdir]/[AMTools]”
      - “export HIT_HOME=[TaskDir]/[TaskFolder]”
    - Type “. ./.profile” to load the environment variables you just created.
  - Insert your account keys and direct the scripts to Amazon’s Sandbox
    - Get your Access Key and Secret Key from Amazon.
    - Open the file [CLTdir]/[AMTools]/bin/mturk.properties with a text editor
    - Copy the Access Key & Secret Key to the appropriate lines in mturk.properties
    - Insert a ‘#’ to comment out the line directing the scripts to the live Mechanical Turk site and remove the ‘#’ to uncomment the line directing the scripts to Amazon’s Sandbox.
- Modify the programs for your experiment
  - Download these scripts to use & modify for your task
  - Put the files in your [TaskFolder].
  - Rename all of the “yourtask.*” files to a name of your choice. I will reference these as “[yourtask].” from here on.
  - Modify [yourtask].input for your task
    - The first row contains the column headers. A single column labeled “HITid” or “Condition” is sufficient
    - Every row is a HIT, labeled by the columns; so # rows = # HITs to create. Each row can contain information about that HIT, such as the condition assignment if you are allocating Turkers into different experimental conditions.
  - Modify [yourtask].properties for your task
    - title: When viewing HITs, this is what Turkers will see first.
    - description: If Turkers click on your HIT, they can see this description. It should be short, less than 50 words.
    - keywords: Turkers can search for tasks using these keywords. They are also visible after clicking to see the description.
    - reward: This is how much you will pay a Turker for completing the HIT, usually on the order of $0.01 – $0.10.
    - assignments: This is how many Turkers can work on the same HIT at once. You can either have one HIT and multiple assignments, or multiple HITs with one assignment. If you do the latter and need to be sure each person only does the task once, you must keep track of the Turker IDs.
    - annotation: A value used to uniquely identify the HIT, drawn from the input file, using $[field name] where [field name] is a column in the input file.
    - assignmentduration: This is the amount of time (in seconds) a Turker has to complete the HIT. If they don’t complete it before the time is up, their work is voided and the HIT is returned to the list for someone else to work on it. Use this as a way of cleaning up abandoned HITs, not as a way of making Turkers finish the task on time; use your own mechanisms to make that happen.
    - hitlifetime: This is how long a HIT will be listed without being accepted. If no one has chosen to work on the task in hitlifetime seconds, it is removed from the list.
    - autoapprovaldelay: Once a HIT has been completed, you need to review the work and accept or reject it. If you do nothing with a completed HIT, it will automatically be approved (and the Turker will be paid) in this amount of time.
  - Modify [yourtask].question for your task
    - External HIT: Put in the destination for the Turkers here. Note: This page has to be a portal into your experiment, not the first page of your experiment. I explain this in more detail at step H
    - Frame Height: This defines the minimum height of the frame that will appear in the Amazon window. Adjust it to fit your needs
  - Modify run.sh for your task
    - At the end of the long line you’ll see “-maxhits” followed by a number. This is the maximum number of HITs (out of the number of rows in the input file) you will load when you run this script. Make it small for testing purposes; you can always go back and change it when actually running the experiment.
  - Modify introexample1.php for your experiment
    - If directing the participants off site, you need a way to link what they do on your site with their Turker ID and assignment ID. There are two ways to do this:
      - Include their Turk ID and Assignment ID in the URL directing them to your site.
      - Give them something that identifies them uniquely which they can input on the AMT site.
    - Instructions on how to modify the file for both of these cases are included in the comments in the file itself.
  - Modify introexample2.php for your experiment
    - Put information about your HIT before the HTML form
    - Change the action of the form to the first page of your experiment (e.g., "http://www.myuniversity.edu/myexperiment")
    - Note: Make sure your first page of the experiment records the user’s assignment ID (it is passed as $_REQUEST['assignmentId']). You will need it to let them submit the HIT.
  - Modify exitexample.php for your experiment
    - Modify the first line (enclosed in PHP comments) to pass the worker’s assignment ID. It is currently written so you can pass it as a hidden variable in a form (the same way it was passed from introexample2.php to the first page of your experiment), but if you are keeping records in a database you can modify it so the PHP variable takes the assignment ID from the database.
- Test your experiment on Sandbox
  - Double-check your files are pointed to Sandbox
    - Open the file [CLTdir]/[AMTools]/bin/mturk.properties with a text editor
    - Insert a ‘#’ to comment out the line directing the scripts to the live Mechanical Turk site and remove the ‘#’ to uncomment the line directing the scripts to Amazon’s Sandbox.
    - Edit your submit button (in introexample1.php or exitexample.php) to point to workersandbox.mturk.com instead of www.mturk.com
  - From [TaskDir] type ./run.sh
  - Run through your experiment as a Turker
    - Try to mess it up. Some suggestions:
      - Get past the first page without accepting the HIT
      - Use the browser navigation buttons: back, refresh, stop.
      - Let the time expire after accepting a HIT
      - Others??
    - Make sure all experimental conditions are working and assignment to experimental conditions is working correctly.
  - Get list of HITs
    - From [TaskDir] type ./getResults.sh
    - This will output a file called [yourtask].results.
  - Review the HITs
    - Open [yourtask].results with a spreadsheet editor
    - There will be columns with information about who completed what HIT, and columns with any information you passed to AMT with the final submit button
    - There is also a column labeled “reject”. If there are any HITs you want to reject, put a ‘1’ in this column in the appropriate row. If you wish to accept all of them (i.e., not reject any) you can skip to the next step.
  - Send your review of the HITs to Amazon
    - From [TaskDir] type ./reviewResults.sh
  - Accept the HITs and delete them
    - From [TaskDir] type ./acceptAndDeleteResults.sh
- Load and run your experiment
  - Convert your files to run on the live site
    - Open the file [CLTdir]/[AMTools]/bin/mturk.properties with a text editor
    - Insert a ‘#’ to comment out the line directing the scripts to Amazon’s Sandbox and remove the ‘#’ to uncomment the line directing the scripts to the live Mechanical Turk site.
    - Edit your submit button (in introexample1.php or exitexample.php) to point to www.mturk.com instead of workersandbox.mturk.com
  - Load your HITs to run your experiment
    - From [TaskDir] type ./run.sh
  - Verify all of the data has been collected
    - From [TaskDir] type ./getResults.sh
    - Check the output to stdout or in [yourtask].results to verify all HITs have been submitted
  - Review the HITs
    - Open [yourtask].results with a spreadsheet editor
    - There will be columns with information about who completed what HIT, and columns with any information you passed to AMT with the final submit button
    - There is also a column labeled “reject”. If there are any HITs you want to reject, put a ‘1’ in this column in the appropriate row. If you wish to accept all of them (i.e., not reject any) you can skip to the next step.
  - Send your review of the HITs to Amazon
    - From [TaskDir] type ./reviewResults.sh
  - Accept the HITs and delete them
    - From [TaskDir] type ./acceptAndDeleteResults.sh
  - Pay bonuses to Turkers if necessary
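The switch between Sandbox and the live site described in the steps above is done by hand in a text editor, but it could also be scripted. The following is only a rough sketch. It assumes, and this is my assumption rather than something this guide states, that both endpoint lines in mturk.properties begin with "service_url=" and that only the Sandbox line contains the word "sandbox". Check your own copy of the file before relying on it. The function names use_sandbox and use_live are made up for illustration.

```shell
#!/bin/sh
# Sketch: switch mturk.properties between Sandbox and the live site.
# ASSUMPTION (not from the guide): both endpoint lines begin with
# "service_url=" and only the Sandbox line contains the word "sandbox".

# Uncomment the Sandbox line and comment out the live line.
use_sandbox() {
    props="$1"
    sed -e '/sandbox/s/^#*service_url=/service_url=/' \
        -e '/sandbox/!s/^service_url=/#service_url=/' \
        "$props" > "$props.tmp" && mv "$props.tmp" "$props"
}

# Uncomment the live line and comment out the Sandbox line.
use_live() {
    props="$1"
    sed -e '/sandbox/!s/^#*service_url=/service_url=/' \
        -e '/sandbox/s/^service_url=/#service_url=/' \
        "$props" > "$props.tmp" && mv "$props.tmp" "$props"
}
```

With the environment variables from this guide set, you would call it as use_sandbox "$MTURK_CMD_HOME/bin/mturk.properties". It writes to a temporary file and then moves it into place, which avoids depending on any one version of sed's in-place option. You still need to edit the submit button in the PHP files by hand.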
I recommend creating or designating an email address and Amazon account specifically for this purpose, rather than using a personal email address. For instance, one could create a lab email account that all lab members have access to and use it to create the AMT accounts (although the payments will have to come from an account in someone’s name). Inevitably there is feedback from Turkers as well as account update information from Amazon itself, and the quantity of emails can be irritating. A good filter could work, too. I also recommend using the same account throughout Steps 1-3, as it makes things easier in the creation of the accounts and for navigating between them later.
From this point, you can use their pre-made templates. I have never done it, but it seems fairly straightforward (see their User Guide PDF). One advantage is that it lets you use their servers for hosting the work for free. One disadvantage, as far as I can tell, is that it only allows you to create HITs that exist on a single page using simple HTML, so dynamic content is limited to inserting variables in a form. To be sure, there is much you can do with this, but it also limits the range of things that can be done.
My preference is to use their Command Line Tools (CLTs), which is what I will be describing here, although support for them seems to be declining and I worry that at some point Amazon will discontinue them entirely. For instance, the CLT documentation is now hidden behind the “Resource Center” tab, rather than being an option out front on the “Design” page, and it’s even harder to find the actual download location.
Next you’ll create a payments account. This account lets you put money into it using a credit card or money transfer from a bank. The money you put into this account will be used to pay the Turkers, and Amazon makes adding money and making payments very simple. In many ways it is the primary reason to use AMT over other possible crowd-sourcing tools.
Next you’ll create an Amazon Web Services Account. This account is required for interacting with AMT programmatically or using the CLTs.
You now have all of the accounts you will be using to set up tasks on AMT. Next you’ll actually get the CLTs you use to interact with AMT. The CLTs amount to a set of scripts that can be easily modified for task creation. There is a set of videos designed for PC users to help you with the CLTs, so if that is your OS of choice, I recommend watching them. They’re somewhat helpful for everyone else, too, but I hope this walkthrough makes them unnecessary.
You’ll be interacting with the tools using the command line (via Terminal or X11 on a Mac), and since they reference the directories containing the tools and your task-specific scripts, it’s a good idea to think about where you want these files to go. To facilitate this, you’ll make some environment variables that point to these paths, so that if you ever move the files you can just change the environment variables and everything will continue to run smoothly.
I will refer to the directory you put the folder in as [CLTdir] and the folder name as [AMTools], so in actuality on my computer the path is /Users/Winteram/AMTools/, but I will refer to it in this document as [CLTdir]/[AMTools]/. Similarly, I’ll refer to the directory you keep your AMT tasks in as [TaskDir] and the folder that contains the task-specific scripts as [TaskFolder].
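As a concrete illustration, here is roughly what the three lines in .profile might look like once the placeholders are filled in. The two directory paths are hypothetical examples, not required locations. The check_amt_env function is a helper I made up for this sketch; it is not part of the CLTs.

```shell
# Sketch of the lines that would go in ~/.profile.
# The two directory paths are hypothetical examples; substitute your own
# [CLTdir]/[AMTools] and [TaskDir]/[TaskFolder].
export JAVA_HOME=/usr
export MTURK_CMD_HOME="$HOME/AMTools"
export HIT_HOME="$HOME/AMTtasks/mytask"

# Hypothetical helper (not part of the CLTs): report any variable that
# points at a directory that does not exist.
check_amt_env() {
    rc=0
    for dir in "$MTURK_CMD_HOME" "$HIT_HOME"; do
        if [ ! -d "$dir" ]; then
            echo "missing directory: $dir" >&2
            rc=1
        fi
    done
    return "$rc"
}
```

Running check_amt_env after loading .profile is a quick way to catch a mistyped path, or a folder you moved, before any of the task scripts fail with a less obvious error.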
The next two steps are important, and vary depending on how you want to run the experiment. In one version, participants do the experiment at a completely separate site, outside of the AMT HIT frame. The portal page just tells them the URL and lets them submit the HIT right away. You record their Turker ID, and approve the HIT (and pay whatever bonus) based on their performance on the web site.
In another version, the portal page and the experiment are kept in Amazon’s frame, and all of the navigation between pages is done inside the frame. My sense is that participants are somewhat more wary of HITs that send them off-site, so keep that in consideration when deciding how to proceed.
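Version 1 (experiment at a separate site): modify introexample1.php.
Version 2 (experiment inside Amazon’s frame): modify introexample2.php and exitexample.php.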
You will see an output listing the created HITs and their respective HITids, and at the end, the url where you can find your HITs listed on Amazon’s Sandbox. The HITids are stored in a file called [yourtask].success. Note: don’t run ./run.sh again until you have deleted the HITs. Running it again overwrites the [yourtask].success file, and the script to delete the HITs relies on the HITids in this file. Running the file twice could leave some HITs listed with no easy way to approve or delete them.
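One way to guard against running the script twice by accident is a small wrapper that refuses to run while an old .success file is still around. This is only a sketch. The function name safe_run is hypothetical, and it assumes the .success file is written to the directory you run the script from.

```shell
# Hypothetical wrapper (not part of the CLTs): refuse to load new HITs while
# a .success file from an earlier run still exists in the current directory.
# The argument is your task name, i.e. the [yourtask] part of [yourtask].success.
safe_run() {
    task="$1"
    if [ -e "$task.success" ]; then
        echo "$task.success already exists; delete the old HITs first." >&2
        return 1
    fi
    ./run.sh
}
```

From [TaskDir] you would then type safe_run followed by your task name instead of ./run.sh. The guard is only useful if the old .success file is moved or removed once its HITs have been deleted.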
Once you feel you’ve got your experiment running smoothly, or you need to load some new HITs to continue testing, you’ll need to delete the HITs. You can only delete HITs that have not been accepted or have already been completed–you will get an error on HITs that are currently being worked on.
Once you have finished this last step, you can repeat the cycle again until you feel you have the hang of it and are confident your experiment is running smoothly.
That covers it! You should now have all the data you want at a very cheap price.
This will probably be a short one, since it’s late and I’m already fading, but…
The question I am considering is how human cognition has changed over the course of history. Julian Jaynes’ book, “The Origin of Consciousness in the Breakdown of the Bicameral Mind,” has a theory that I would consider outlandish. When he argued at the beginning of the book that historically we were writing before we had self-awareness, and used the argument that we can write without paying attention as evidence, I basically shut down on it. But I felt like he was onto something, like he had the right collection of facts but the wrong conclusion, and I have mulled over possible alternatives since then.
The theoretical / philosophical foundation I am working from is that of the extended mind. Several philosophers, notably Andy Clark, have suggested that we structure our external environment in order to supplement and complement our internal cognitions, and that the appropriate view is that the cognitive system includes these external artifacts. A favorite example is the hand-held calculator: with it, a person can multiply or divide large numbers; without it, the person cannot. Thus, on the basis of a math test, the person with the calculator is smarter than the person without.
In fact, the extent to which these epistemic technologies exist is, well, mind-boggling. Some are real technologies, like a calculator or the internet (or older technologies, like the abacus and the printing press). These are physical items that improve our ability to do mental operations. Others are more techniques than technologies, like the quick-sort algorithm, or long multiplication. These are things that can be taught that improve our abilities to solve problems. What is fantastic is that these methods can be internalized, learned ways of thinking about things, that actually make us more capable of solving problems.
Probably the most dramatic epistemic technology was writing itself. Once written down, one’s thoughts were there to be reckoned with later, reflected on from a more objective viewpoint. One could hear one’s words as though they were coming from another person. Anyone who has written a paper knows how useful that can be. Suddenly, things that used to require large amounts of effort and time to rehearse and remember could be written down once and forgotten, provided the physical copy was kept secure. Writing in and of itself probably increased our cognitive capacities by several orders of magnitude.
I also think of Helen Keller’s description of how she felt learning what language was. Knowing that there was a symbol for water opened the possibility of symbols for everything, symbols that could be used to conjure an awareness of something in absentia. Suddenly, the potential to interact with the world grew. Just think of how powerful the simple ability to request a glass of water can be.
My point is that these technologies, from language, to writing, to calculators, have changed the way people think and are able to think about the world. By adapting cognitively to these epistemic technologies, we must have changed the way we think in some very fundamental ways. And crucially, like all technology, these are spread culturally. Therefore, I think that there must be some record in history of how these epistemic technologies spread (following the clash of nations and/or tribes), and therefore the potential for observing what effect they had on how people thought (or at least, how they wrote). I’m not sure what kind of cognitive tests you can do with writing samples, but it’s probably worth a look as well.
I think an assistant professor’s job should be to prepare a graduate student for later collaboration with a full professor. The assistant professor would have the new graduate students, and would help them learn the important things, both about their field and how to be a graduate student. The graduate students who are already qualified for their Ph.D. would get to collaborate with the full professors. The assistant professors tend to be closer in age to the graduate students anyway, provide a good role model, and will have fresher memories about how to learn to be a graduate student. The assistant professors who are better role models will prepare better Ph.D. students, and most likely be better full professors. The new graduate students probably take more time to instruct than older ones, but tend to do the research the professor is doing rather than their own thing. In this way, assistant professors get to establish their line of research (and themselves) so they can continue with their own research as full professors, while devoting less time to the more advanced graduate students. The collaborations that occur between the advanced graduate students and the full professors also give assistant professors the chance to extend their research and connect with already established lines of research by being part of the collaboration. And the graduate student gets a more guided path in their graduate career. Basically, everybody wins.
The way we transform the three-color information from the retina into the opponent-process signal in the lateral geniculate nucleus (LGN) has to happen electrochemically.
1. We can use this pathway to show how neurons electrochemically transform the signal, something that has implications for more complicated types of signal processing.
2. It’s possible the final signal that is processed in higher levels of visual processing uses both kinds of information. (or is it? I don’t know enough visual neuroscience to know.)
The same points, of course, apply to auditory processing if there are two differentiable signals that are transforms of each other. The ultimate point is that the signals in audition and vision can be approximated by continuous functions that have well-defined transforms, and that this kind of processing happens in the brain.
On the other hand, it could be a coincidence, like that calculator that seems to add 4 + 4, until you type in 5 + 5 and it still says 8. But I guess that would also be testable by changing the input signal… hmm….
There are many brilliant people right now in the field of vision science and visual neuroscience, and I know very little about it, so this is almost certainly either an obvious idea that was discovered long ago, or one that is invalid for reasons of which I am unaware.
The way blogs are feeding information into the network (and cable) news programs is just like it was before: word of mouth. The only difference is that there is more of a paper trail and it is accessible to more people. The genuine news services (i.e., those with fact-checkers and accountability) need to take that into account.
---
The copper bosses killed you, Joe.
“I never died,” said he.
“I never die,” said he.











