Casualties in the War in Afghanistan

On August 14, 2010, in Politics, Research, by Winter

The story of the War in Afghanistan is one that has been told many times from many different perspectives. The release of the data to Wikileaks opens a new window to the events that have unfolded, but so far it does not actually seem to contain any new revelations, merely finer detail to an already known picture.

Being a data scientist, I naturally took to the opportunity, downloading and immediately munging it. Several others have done the same (notably Drew Conway), and some of their results are much prettier and possibly more instructive than what I was able to extract on my own in the past couple of weeks. In fact, it was my inability to come up with any brilliant new insights that made me hesitate to post this blog. However, there is one event in the data that persuaded me to go ahead, despite the fact that I was unable to construct brilliant visualizations or discover hidden agents manipulating the action behind a shadowy curtain. I was nearly a week into my analysis before I realized this particular event was in the dataset. I quickly narrowed down the entries to those reporting “friendly KIA” on 1/24/09, and read the title:

(EXPLOSIVE HAZARD) IED EXPLOSION RPT (RCIED) 3/8 USMC WEAPONS COY IVO (ROUTE 515): 1 CF KIA 2 CF WIA

and the description:

At 1007Z, RC West reported IED strike. While conducting a framework patrol, FF reported they struck an IED. BDA: 1x USMC KIA, 1x USMC WIA (CAT B), 1x USMC WIA (CAT C), MEDEVACed to BSN R2E, 1x UAHMMWV vehicle damaged. FF established cordon. Vehicle is on fire and is unable to be recovered att. FF did not report type of IED. NFI att.

At 1824Z, RC South reported:
At 1817Z, EOD exploited the site and will report the device type through the CIED chain. Destroyed vehicle was recovered to COP Puller. Updated BDA: 1x USMC KIA, 2x USMC WIA (1x CAT B, 1x CAT C) and 1x HMMWV destroyed. NFTR. Event closed at 1817Z.

ISAF # 01-1001

That “1x USMC KIA” is my friend, Julian Brennan.

Julian Brennan

To be honest, I was never that close to Julian; to me he was mostly my friend’s younger brother. But the ties between him and me are numerous and enduring. He and both of his siblings went to the same very small private school that my sister and I attended from 1st to 12th grade. His dad sang songs at my summer camp and his mom taught me art for a couple of years. When I heard he had joined the Marines I was surprised, especially given his family’s left-leaning tendencies and his aspirations to become a Broadway stage actor. But of course, that surprise was nothing compared to the shock and sadness I felt when I found out he had been killed by an IED in Afghanistan.

When the war started in 2001, it was a knee-jerk reaction to the WTC attacks, and there probably wasn’t an American alive who wasn’t happy to see the Taliban bombed back to the stone age. By the time the Wikileaks data begins (January 1, 2004), enthusiasm for the war had waned and focus had shifted to Iraq. Probably because of this shift in focus, events at the beginning of the data set are relatively sparse (see Figure 1). The hope of finding bin Laden had all but disappeared, so there were relatively few actions on the ground. However, by mid-2006, the Iraq war had passed a turning point (al-Maliki became the new Prime Minister in May, Abu Musab al Zarqawi was killed in June, and by December Saddam Hussein had been executed), and focus may have shifted back to Afghanistan. This explanation for the uptick in deaths post-2006 is pure speculation, and considering my ignorance in the matter, is probably wrong. It would be interesting to compare the timeline of deaths in Iraq with those in Afghanistan, to see if they are correlated or anti-correlated.

As you can see, the number of enemy killed is significantly above that of civilians, Afghan troops or U.S. troops, and the numbers of civilians and Afghan troops killed are both greater than the number of U.S. troops killed. It’s also clear that 2009 was a bad year for everyone.

[Figure panels: Enemy Deaths; Friendly Deaths; Host Nation Deaths; Civilian Deaths]

In Figure 2, you can also see which regions were worst. It shows where in Afghanistan all of the deaths were and who was killed. The arc of violence from southern Afghanistan to western Afghanistan follows the main road (A01) connecting Qandahar and Kabul, with the very worst located in and around Qandahar.

The figures should be taken with a grain of salt as well. It appears that there is a large number of civilians killed in the north-central region (technically the Bamyan province), but reading the summary you see that “Initial report of 67 fatalilties was in fact ’6 or 7 fatalities’ misunderstood and turned out to be 2 confirmed at this time.” On the other hand, the deaths of 55 civilians in the Qandahar region are real, the result of a suicide bomber at a “dog fight related picnic”. Moreover, the distribution of event sizes (by size I mean the number killed) follows the pattern observed for other terrorist events (see Figure 3), which suggests that most of the numbers are accurate.

Frequency of incident sizes
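As a rough illustration of how an incident-size distribution like the one above could be tabulated, here is a minimal Python sketch. The list of death counts is made-up example data, not values from the Wikileaks dataset, and the function name is hypothetical; it is not necessarily how the figure was produced.

```python
from collections import Counter

def incident_size_frequencies(deaths_per_event):
    """Count how many events had each size (number killed).

    Events with zero deaths are skipped, since they have no "size"
    in the sense used in the text.
    """
    counts = Counter(size for size in deaths_per_event if size > 0)
    # Return (size, number_of_events) pairs ordered by size
    return sorted(counts.items())

# Made-up example: one entry per reported event
example_events = [1, 1, 2, 1, 0, 3, 1, 2, 55, 0, 1]
for size, freq in incident_size_frequencies(example_events):
    print(size, freq)
```

Plotting these pairs on log-log axes is a common way to inspect a heavy-tailed distribution, since many small events and a few very large ones then spread out along a roughly straight line.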

Perhaps the best way to see the increase in violence and focus of violence is to turn it into a movie. This is exactly what I have done, and it definitely took the most time to create. In the movie, I mark some key events in the timeline, including the day Julian was killed. I think seeing his face associated with this one small point that lasts for just a couple of seconds turns the abstract visualization of dots on a map into something real and tangible. I know personally the pain and damage that losing Julian caused, and every single point on that map represents countless families and friends likewise affected.

I’ve looked at this dataset now, and believe there may be some insights hidden in its depths, though I am doubtful any amount of digging will lead to serious revelations. And down the road I may write some more about other analyses I have done. But this post is dedicated to Julian, his family, and all those who lost someone in this war in Afghanistan.

 

Elena Kagan email analysis

On July 1, 2010, in Politics, Research, by Winter

So I heard through the grapevine (i.e., Twitter) that the Clinton Library was releasing all emails associated with Elena Kagan in light of her Supreme Court nomination.  My first thought was, “Hey, I can analyze that as a social network!”

Of course, as with nearly all data, there was a bit of cleaning to do, and this was a tough one. Unfortunately when I started I didn’t realize that Sunlight Labs had already converted the emails into a Gmail-like format, so I probably duplicated some effort. Ah, well. At any rate, all of the emails started out in PDF format with the (sometimes OCR’d) text embedded. So my first step was to extract the text from the PDFs, which I did using an Automator script on my Mac. Unfortunately, I wasn’t able to get the text for all of the PDFs, but I got what I could. I then wrote a Python script that went through these incredibly messy text files, pulled out the creation date, recipient lists, subject lines, and text for each email, and output it all in a clean format. (I later wrote another script that turned that file into a tab-delimited file for easy database consumption.) Finally, I wrote one last script to turn the clean email data into a co-recipient graph that I could analyze and draw (see below).

The largest connected component of Kagan's co-recipient graph

First, let me explain this picture.  Each circle represents a recipient of an email from Elena Kagan from 1995-1999 during her tenure in the Clinton Administration.  Two circles are connected by a line if they were co-recipients (“cc’d”) on a single email.  The bigger the circle is, the more email they received over the four years, and the thicker the line, the more often the two were co-recipients on emails from Kagan.
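To make the construction concrete, here is a minimal Python sketch of building such a co-recipient graph. It is an illustration of the idea described above, not the original script: the function name and the placeholder recipients "A", "B", and "C" are assumptions.

```python
from collections import Counter
from itertools import combinations

def build_corecipient_graph(emails):
    """Build node and edge weights from a list of recipient lists.

    emails: iterable of lists, one list of recipient names per email.
    Returns (node_weights, edge_weights), where node_weights counts
    emails received per person (circle size) and edge_weights counts
    how often each pair appeared on the same email (line thickness).
    """
    node_weights = Counter()
    edge_weights = Counter()
    for recipients in emails:
        # De-duplicate and sort so each pair has one canonical order
        unique = sorted(set(recipients))
        node_weights.update(unique)
        for a, b in combinations(unique, 2):
            edge_weights[(a, b)] += 1
    return node_weights, edge_weights

# Placeholder data: three emails with hypothetical recipients
emails = [["A", "B", "C"], ["A", "B"], ["C"]]
nodes, edges = build_corecipient_graph(emails)
print(dict(nodes))
print(dict(edges))
```

Keying each edge on a sorted pair means ("A", "B") and ("B", "A") are counted as the same undirected link.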

Of course, without labels it’s all just a bunch of balls and sticks, but with them the picture is so covered in ink you couldn’t make sense of it anyways.  So instead I’ll just review some numbers, point out some key features, and show some more understandable pictures shortly.

Across her whole tenure, the most frequently emailed recipient was Laura Emmett (158 emails), who was Policy Assistant in the Office of Domestic Policy with Kagan, followed by Bruce Reed (124 emails), who was Chief Domestic Policy Adviser and, more importantly, the Director of the Domestic Policy Council (DPC), which I believe makes him Kagan’s boss at the time. The next most emailed, Cynthia Rice (then Special Assistant to the President for Domestic Policy), received fewer than half as many as Kagan’s boss (58 emails).

One interesting feature of the graph is the clique sticking out of the top. The bridge node is Jon Schnur, then Associate Director for Educational Policy, and the clique consists of Barbara Chow, Charles Konigsberg, Kate Donovan, Sandra Yamin, Broderick Johnson, and Tanya Martin. I could only track down a couple of those names, and I’m not sure how they’re related.

One method social network researchers use to find important people in a graph is to look at centrality. I consider three forms: closeness centrality, which measures the average number of steps from the person to all other people; betweenness centrality, which measures how many shortest connecting paths pass through the person; and eigenvector centrality, which is loosely how “well-connected” a person is. Of course, in a co-recipient graph they take on slightly different flavors, as the edges represent co-affiliations based on emails received from Kagan.
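As an illustration of the first of these measures, here is a minimal Python sketch of closeness centrality on an unweighted, undirected graph, computed with a breadth-first search from each node. The three-node graph is a placeholder, and this is not necessarily the tool or exact normalization used for the results in this post; graph libraries such as NetworkX provide all three measures.

```python
from collections import deque

def closeness_centrality(adjacency):
    """Closeness for each node of an unweighted, undirected graph.

    adjacency: dict mapping each node to the set of its neighbours.
    Closeness is the number of reachable nodes divided by the sum of
    shortest-path distances to them (the inverse of the average distance).
    """
    scores = {}
    for source in adjacency:
        # Breadth-first search gives shortest path lengths in steps
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbour in adjacency[node]:
                if neighbour not in dist:
                    dist[neighbour] = dist[node] + 1
                    queue.append(neighbour)
        total = sum(dist.values())
        reachable = len(dist) - 1
        scores[source] = reachable / total if total > 0 else 0.0
    return scores

# Placeholder graph: a simple chain A - B - C
graph = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}}
scores = closeness_centrality(graph)
print(scores)
```

In the chain, the middle node is one step from everyone else and so gets the highest score, which matches the intuition that a high-closeness recipient sits near the center of the email traffic.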

Closeness in a co-recipient graph can be thought of as how central the actor was to Kagan. The fact that Bruce Reed (her boss) appears as the most central figure in her emails by this measure is therefore reassuring. Interestingly, the next most central person by this measure is Maria Echaveste, who was Director of Public Liaison (i.e., in charge of interacting with special interest groups) in 1997 and in 1998 became Clinton’s Deputy Chief of Staff.

The betweenness measure also places Bruce Reed at the center, but second to him is Kathleen Wallman, who was associated with Economic Policy and the office of White House Counsel. Eigenvector centrality also finds Diana Fortuna as a central player, which makes sense as she was Associate Director of the DPC under Bruce Reed.

1996

I did promise more pretty pictures, and here’s the first one. Now that I’ve introduced some of the key players and the methods I used, I’ll break it down by year, which helps tell the story a bit more. In 1996, Elena Kagan was Deputy Assistant to the President for Domestic Policy. The most central player at the time for Kagan was James Jukes, but unfortunately I have not been able to determine who this person is (although I believe he is currently associated with the OMB). I do know that he is connected to Ellen Seidman, who was Special Assistant for Economic Policy. The other two folks who make up the central triangle are Todd Stern, who dealt with environmental issues such as the Kyoto Protocol, and Carol Rasco, who went on to be the Senior Adviser to the Secretary of Education. This suggests those three issues (finance, environment, and education) were the focus of Kagan’s time as Deputy Assistant. The eigenvector centrality measure turns up another player: Jeremy Ben-Ami, who was the Deputy Domestic Policy Adviser, clearly another key figure in the Domestic Policy group in the Clinton Administration.

1997

In 1997, Kagan became Deputy Assistant to the President for Domestic Policy and Deputy Director of the DPC, essentially moving up in the Administration’s Domestic Policy group.  Although you can’t quite see it, Bruce Reed (58 emails) is again the most central, sitting in between several frequently emailed people, including Laura Emmett (66 emails), Sylvia Mathews (22 emails), who was Chief of Staff to the Treasury Secretary and then Deputy Chief of Staff to Clinton, Nicole Rabner (29 emails), who was Special Assistant to the President for Domestic Policy, and Diana Fortuna (19 emails).  Maria Echaveste shows up again as a highly central person in Kagan’s emails, and as can be seen in the lower middle left of the picture above, Tracey Thornton also seems to be a key middle-person—she was Special Assistant to the President for Legislative Affairs.  As a potential point of interest, you can also see Rahm Emanuel out to the right, connected to Sylvia Mathews.

1998

In 1998, Elena Kagan continued her work on Domestic Policy.  Here we see a somewhat sparser graph with seemingly distant clusters.  The large nodes in the middle are the usual suspects: Laura Emmett, Bruce Reed, and Cynthia Rice.  There are some interesting bridges, though, like Julie Fernandes on the left (another Special Assistant to the President for the DPC) and Martha Foley on the right (Deputy Assistant to the President and Senior Legislative Counsel).  Another new, fairly central email recipient is Pulitzer prize winner Thomas L. Freedman, who was the White House correspondent for the New York Times, and is now pretty well known for his books such as, “Hot, Flat, and Crowded.”

1999

There are many fewer emails from 1999, so the resulting graph is very sparse.  You can see that the clique I mentioned at the beginning, connected by Jonathan Schnur, shows up this year.  It turns out the key player connecting him to other recipients is Irene Bueno, another Special Assistant to the President for the DPC.

So, that’s about it.  I don’t think there are any huge insights here, but it was fun looking through the graphs and seeing who comes out as important, as well as learning about Kagan’s inner circle(s) during those days.  I also did some text mining, but that’s not my bread-and-butter, and I wasn’t able to extract much meaningful content out of it, so I decided not to include it in this post.  Maybe I’ll do more if there’s enough interest.


[Update: You can find a full manuscript on how to do behavioral research on AMT by Sid Suri and me here.]

This blog is intended to help researchers utilize Amazon’s Mechanical Turk to recruit and pay participants for online experiments. Feel free to email me if you have any questions or comments about this blog or AMT in general.

[Note: Right now this guide is specifically directed to Mac users. Most of this should be easy to adapt to PC and especially easy to adapt to Linux/Unix, but the minutiae are for Macs. I may update this later with variations for other operating systems.]

  1. Create accounts on Mechanical Turk
    1. Create a Requester account on AMT
      1. You want to “Get Results,” so click on “Get Started” on the right side of the page
      2. Enter your email and password
      3. Enter your company & address, read the AMT agreement, check the box and click the button.
    2. Create an Amazon Payments Account
      1. Log in using the same email and password as before
      2. Enter your desired payment information (e.g., credit card).
    3. Create an Amazon Web Services Account
      1. Log in with the same email and password. It will pre-fill your address for you
      2. Accept the terms, and your account is created!
    4. Get Amazon’s Command Line Tools
      1. Don’t just click download!
      2. Scroll down the window and look for the link to get the Unix CLTs (without JRE).
      3. The unzipped folder will be called aws-mturk-clt-1.3.0.
      4. Rename the folder to something easy and move it to an easy place.
    5. Set your environment variables
      1. Start a Terminal session
      2. Edit “.profile”
      3. Add the lines:
        • export JAVA_HOME=/usr
        • export MTURK_CMD_HOME=[CLTdir]/[AMTools]
        • export HIT_HOME=[TaskDir]/[TaskFolder]
      4. Type “. ./.profile” to load the environment variables you just created.
    6. Insert your account keys and direct the scripts to Amazon’s Sandbox
      1. Get your Access Key and Secret Key from Amazon.
      2. Open the file [CLTdir]/[AMTools]/bin/mturk.properties with a text editor
      3. Copy the Access Key & Secret Key to the appropriate lines in mturk.properties
      4. Insert a ‘#’ to comment out the line directing the scripts to the live Mechanical Turk site and remove the ‘#’ to uncomment the line directing the scripts to Amazon’s Sandbox.
  2. I recommend creating or designating an email address and Amazon account specifically for this purpose, rather than using a personal email address. For instance, one could create a lab email account that all lab members have access to and use it to create the AMT accounts (although the payments will have to come from an account in someone’s name). Inevitably there is feedback from Turkers as well as account update information from Amazon itself, and the quantity of emails can be irritating. A good filter could work, too. I also recommend using the same account throughout Steps 1-3, as it makes things easier in the creation of the accounts and for navigating between them later.

    From this point, you can use their pre-made templates. I have never done it, but it seems fairly straightforward (see their User Guide PDF). One advantage is that it lets you use their servers for hosting the work for free. One disadvantage, as far as I can tell, is that it only allows you to create HITs that exist on a single page using simple HTML, so dynamic content is limited to inserting variables in a form. To be sure, there is much you can do with this, but it also limits the range of things that can be done.

    My preference is to use their Command Line Tools (CLTs), which is what I will be describing here, although the support for them seems to be declining and I worry that at some point they will be discontinued entirely. For instance, the CLT documentation is now hidden behind the “Resource Center” tab, rather than an option out front on the “Design” page, and it’s even harder to find the actual download location.

    Next you’ll create a payments account. This account lets you put money into it using a credit card or money transfer from a bank. The money you put into this account will be used to pay the Turkers, and Amazon makes adding money and making payments very simple. In many ways it is the primary reason to use AMT over other possible crowd-sourcing tools.

    Next you’ll create an Amazon Web Services Account. This account is required for interacting with AMT programmatically or using the CLTs.

    You now have all of the accounts you will be using to set up tasks on AMT. Next you’ll actually get the CLTs you use to interact with AMT. The CLTs amount to a set of scripts that can be easily modified for task creation. There is a set of videos designed for PC users to help you with the CLTs, so if that is your OS of choice, I recommend watching them. They’re somewhat helpful for everyone else, too, but I hope this walkthrough makes them unnecessary.

    You’ll be interacting with the tools using the command line (via Terminal or X11 on a Mac), and since they reference the directories that contain the tools and your task-specific scripts, it’s a good idea to think about where you want these files to go. To facilitate this, you’ll make some environment variables that point to these paths, so that if you ever move the files you can just change the environment variables and everything will continue to run smoothly.

    I will refer to the directory you put the folder in as [CLTdir] and the folder name as [AMTools], so in actuality on my computer the path is /Users/Winteram/AMTools/, but I will refer to it in this document as [CLTdir]/[AMTools]/. Similarly, I’ll refer to the directory you keep your AMT tasks in as [TaskDir] and the folder that contains the task-specific scripts as [TaskFolder].
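Once the three environment variables are set, a quick sanity check can catch a mistyped path before any of the scripts are run. The following is a minimal Python sketch, not part of Amazon's tools; the function name is hypothetical, and the only assumption is that each variable should point to an existing directory.

```python
import os

# The three variables exported in .profile in this guide
REQUIRED_VARS = ("JAVA_HOME", "MTURK_CMD_HOME", "HIT_HOME")

def check_amt_environment(environ=None):
    """Return a list of problems found with the AMT environment variables."""
    environ = os.environ if environ is None else environ
    problems = []
    for name in REQUIRED_VARS:
        value = environ.get(name)
        if not value:
            problems.append(f"{name} is not set")
        elif not os.path.isdir(value):
            problems.append(f"{name} points to a missing directory: {value}")
    return problems

if __name__ == "__main__":
    # Prints nothing when all three variables look fine
    for problem in check_amt_environment():
        print(problem)
```

Run it from a new Terminal session after loading .profile; an empty output means all three variables are set and point to directories that exist.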

  3. Modify the programs for your experiment
    1. Download these scripts to use & modify for your task
    2. Put the files in your [TaskFolder].
    3. Rename all of the “yourtask.*” files to a name of your choice. I will reference these as “[yourtask].*” from here on.
    4. Modify [yourtask].input for your task
      1. The first row contains the column headers. A single column labeled “HITid” or “Condition” is sufficient
      2. Every row is a HIT, labeled by the columns; so # rows = # HITs to create. Each row can contain information about that HIT, such as the condition assignment if you are allocating Turkers into different experimental conditions.
    5. Modify [yourtask].properties for your task
      1. title: When viewing HITs, this is what Turkers will see first.
      2. description: If Turkers click on your HIT, they can see this description. It should be short, less than 50 words.
      3. keywords: Turkers can search for tasks using these keywords. They are also visible after clicking to see the description.
      4. reward: This is how much you will pay a Turker for completing the HIT, usually on the order of $0.01 – $0.10.
      5. assignments: This is how many Turkers can work on the same HIT at once. You can either have one HIT and multiple assignments, or multiple HITs with one assignment. If you do the latter and need to be sure each person only does the task once, you must keep track of the Turker IDs
      6. annotation: A value used to uniquely identify the HIT, drawn from the input file, using $[field name] where [field name] is a column in the input file.
      7. assignmentduration: This is the amount of time (in seconds) a Turker has to complete the HIT. If they don’t complete it before the time is up, their work is voided and the HIT is returned to the list for someone else to work on it. Use this as a way of cleaning up abandoned HITs, not as a way of making Turkers finish the task on time — use your own mechanisms to make that happen.
      8. hitlifetime: This is how long a HIT will be listed without being accepted. If no one has chosen to work on the task in hitlifetime seconds, it is removed from the list.
      9. autoapprovaldelay: Once a HIT has been completed, you need to review the work and accept or reject it. If you do nothing with a completed HIT, it will automatically be approved (and the Turker will be paid) in this amount of time.
    6. Modify [yourtask].question for your task
      1. External HIT: Put in the destination for the Turkers here. Note: This page has to be a portal into your experiment, not the first page of your experiment. I explain this in more detail in the steps on introexample1.php and introexample2.php below.
      2. Frame Height: This defines the minimum height of the frame that will appear in the Amazon window. Adjust it to fit your needs
    7. Modify run.sh for your task
      1. At the end of the long line you’ll see “-maxhits” followed by a number. This is the maximum number of HITs (out of the number of rows in the input file) you will load when you run this script. Make it small for testing purposes–you can always go back and change it when actually running the experiment.
    8. Modify introexample1.php for your experiment (Version 1, off-site experiment)
      1. If directing the participants off site, you need a way to link what they do on your site with their Turker ID and assignment ID. There are two ways to do this:
        1. Include their Turk ID and Assignment ID in the URL directing them to your site.
        2. Give them something that identifies them uniquely which they can input on the AMT site.
      2. Instructions on how to modify the file for both of these cases are included in the comments in the file itself.
    9. Modify introexample2.php for your experiment (Version 2, experiment inside Amazon’s frame)
      1. Put information about your HIT before the HTML form
      2. Change the action of the form to the first page of your experiment (e.g., "http://www.myuniversity.edu/myexperiment")
      3. Note: Make sure your first page of the experiment records the user’s assignment ID (it is passed as $_REQUEST['assignmentId']). You will need it to let them submit the HIT.
    10. Modify exitexample.php for your experiment (Version 2)
      1. Modify the first line (enclosed in php comments) to pass the worker’s assignment ID. It is currently written so you can pass it as a hidden variable in a form (the same way it was passed from introexample2.php to the first page of your experiment), but if you are keeping records in a database you can modify it so the php variable takes the assignment ID from the database.
  4. The next two steps are important, and vary depending on how you want to run the experiment. In one version, participants do the experiment at a completely separate site, outside of the AMT HIT frame. The portal page just tells them the URL and lets them submit the HIT right away. You record their Turker ID, and approve the HIT (and pay whatever bonus) based on their performance on the web site.

    In another version, the portal page and the experiment are kept in Amazon’s frame, and all of the navigation between pages is done inside the frame. My sense is that participants are somewhat more wary of HITs that send them off-site, so keep that in consideration when deciding how to proceed.

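    Version 1 (off-site experiment): modify introexample1.php as described in step 3.

    Version 2 (experiment inside Amazon’s frame): modify introexample2.php and exitexample.php as described in step 3.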
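Returning to the [yourtask].input file from step 3: when there are many HITs or several experimental conditions, it can be easier to generate the file than to type it. Here is a minimal Python sketch that writes a header row and one row per HIT. The tab delimiter, the condition names, and the function name are assumptions for illustration; check the example input file that ships with the scripts for the exact format.

```python
import csv
import io

def write_input_file(handle, conditions, hits_per_condition):
    """Write a header row plus one row per HIT to an open text handle.

    Columns are "HITid" and "Condition", as suggested in this guide.
    The tab delimiter is an assumption. Returns the number of HIT rows.
    """
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["HITid", "Condition"])
    hit_id = 1
    for condition in conditions:
        for _ in range(hits_per_condition):
            writer.writerow([hit_id, condition])
            hit_id += 1
    return hit_id - 1

# Hypothetical conditions; written to memory here instead of a real file
buffer = io.StringIO()
total = write_input_file(buffer, ["control", "treatment"], 2)
print(buffer.getvalue())
```

To produce a real file, pass an open file handle instead of the in-memory buffer, e.g. `open("yourtask.input", "w", newline="")`.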

  5. Test your experiment on Sandbox
    1. Double-check your files are pointed to Sandbox
      1. Open the file [CLTdir]/[AMTools]/bin/mturk.properties with a text editor
      2. Insert a ‘#’ to comment out the line directing the scripts to the live Mechanical Turk site and remove the ‘#’ to uncomment the line directing the scripts to Amazon’s Sandbox.
      3. Edit your submit button (in introexample1.php or exitexample.php) to point to workersandbox.mturk.com instead of www.mturk.com
    2. From [TaskDir] type ./run.sh
    3. Run through your experiment as a Turker
    4. Try to mess it up. Some suggestions:
      1. Get past the first page without accepting the HIT
      2. Use the browser navigation buttons: back, refresh, stop.
      3. Let the time expire after accepting a HIT
      4. Others??
    5. Make sure all experimental conditions are working and assignment to experimental conditions is working correctly.
    6. Get list of HITs
      1. From [TaskDir] type ./getResults.sh
      2. This will output a file called [yourtask].results.
    7. Review the HITs
      1. Open [yourtask].results with a spreadsheet editor
      2. There will be columns with information about who completed what HIT, and columns with any information you passed to AMT with the final submit button
      3. There is also a column labeled “reject”. If there are any HITs you want to reject, put a ’1′ in this column in the appropriate row. If you wish to accept all of them (i.e., not reject any) you can skip to the next step.
    8. Send your review of the HITs to Amazon
      1. From [TaskDir] type ./reviewResults.sh
    9. Accept the HITs and delete them
      1. From [TaskDir] type ./acceptAndDeleteResults.sh
  6. You will see an output listing the created HITs and their respective HITids, and at the end, the url where you can find your HITs listed on Amazon’s Sandbox. The HITids are stored in a file called [yourtask].success. Note: don’t run ./run.sh again until you have deleted the HITs. Running it again overwrites the [yourtask].success file, and the script to delete the HITs relies on the HITids in this file. Running the file twice could leave some HITs listed with no easy way to approve or delete them.

    Once you feel you’ve got your experiment running smoothly, or you need to load some new HITs to continue testing, you’ll need to delete the HITs. You can only delete HITs that have not been accepted or have already been completed–you will get an error on HITs that are currently being worked on.

    Once you have finished this last step, you can repeat the cycle again until you feel you have the hang of it and are confident your experiment is running smoothly.
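If there are too many HITs to review by hand in a spreadsheet, the “reject” column can also be filled in by a script. The sketch below is a hedged Python illustration: the only detail taken from this guide is that a ‘1’ in the “reject” column marks a HIT for rejection. The tab-delimited layout, the other column names, and the rejection rule are placeholders, so inspect your own [yourtask].results file before adapting it.

```python
import csv
import io

def mark_rejections(rows, should_reject):
    """Return copies of the rows with the "reject" field filled in.

    rows: list of dicts, one per completed assignment.
    should_reject: function taking a row and returning True to reject it.
    Rejected rows get "1"; all others get an empty string.
    """
    marked = []
    for row in rows:
        updated = dict(row)
        updated["reject"] = "1" if should_reject(row) else ""
        marked.append(updated)
    return marked

# Hypothetical results: "workerid" and "Answer.score" are placeholder columns
sample = "workerid\tAnswer.score\treject\nW1\t12\t\nW2\t\t\n"
rows = list(csv.DictReader(io.StringIO(sample), delimiter="\t"))

# Example rule (an assumption): reject submissions with no recorded score
reviewed = mark_rejections(rows, lambda r: r["Answer.score"] == "")
for row in reviewed:
    print(row["workerid"], repr(row["reject"]))
```

Test any such rule on Sandbox results first, since a rejected HIT means the Turker is not paid.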

  7. Load and run your experiment
    1. Convert your files to run on the live site
      1. Open the file [CLTdir]/[AMTools]/bin/mturk.properties with a text editor
      2. Insert a ‘#’ to comment out the line directing the scripts to Amazon’s Sandbox and remove the ‘#’ to uncomment the line directing the scripts to the live Mechanical Turk site.
      3. Edit your submit button (in introexample1.php or exitexample.php) to point to www.mturk.com instead of workersandbox.mturk.com
    2. Load your HITs to run your experiment
      1. From [TaskDir] type ./run.sh
    3. Verify all of the data has been collected
      1. From [TaskDir] type ./getResults.sh
      2. Check the output to stdout or in [yourtask].results to verify all HITs have been submitted
    4. Review the HITs
      1. Open [yourtask].results with a spreadsheet editor
      2. There will be columns with information about who completed what HIT, and columns with any information you passed to AMT with the final submit button
      3. There is also a column labeled “reject”. If there are any HITs you want to reject, put a ’1′ in this column in the appropriate row. If you wish to accept all of them (i.e., not reject any) you can skip to the next step.
    5. Send your review of the HITs to Amazon
      1. From [TaskDir] type ./reviewResults.sh
    6. Accept the HITs and delete them
      1. From [TaskDir] type ./acceptAndDeleteResults.sh
    7. Pay bonuses to Turkers if necessary

That covers it! You should now have all the data you want at a very cheap price.

 

How science is it?

On October 26, 2008, in Research, by Winter

When I tell people that I study psychology, I am often suddenly forced to explain (sometimes to complete strangers) that I do not know whether their mother’s infidelity led to their teenage bedwetting, nor can I scientifically interpret their dream about hamsters seizing control of Dublin. I fully understand why many people have this opinion of psychology; there are countless mountebanks perpetuating this vision of psychology as a means of lining their pockets. But it is doubly frustrating because this quackery is one reason psychology and other social sciences are condescended to as “soft” sciences, particularly by those who are involved in the “hard” sciences.

So what is this distinction between “hard” and “soft” science? The funny thing is, I’ve never gotten an adequate, operational definition of the distinction, even from people who assert that soft sciences shouldn’t even be called science. It seems to me that’s exactly the sort of thing “hard” scientists should be able to provide. (I know, I know, you can taste the bitter from there.) Most simply say that hard sciences include the natural and physical sciences (that is, physics, chemistry, and depending on who you ask, biology) and everything else is soft. Others prefer to just say that social sciences are soft, and everything else is hard. You can get a sense of the controversy this stirs in the labcoats just by reading the Wikipedia entry on hard science.

In my mind, science is defined by the scientific method. If you go through the process of generating hypotheses, carefully testing those hypotheses through observation and experimentation, and evaluating your hypotheses on the basis of your observations, then what you are doing is science. If you use experimental, objective data to draw logical conclusions, then you are doing science. If the methodologies are identical, and only the subject matter is different, I would argue, how can you distinguish between “hard” and “soft”?

Or at least, that is how I would have argued before a week or two ago. The confluence of two events forced a realization on me. The first event was my immersion in Asimov’s Foundation Series (which if you haven’t read, you should). The second event was a news report that scientists had successfully predicted the collision of an asteroid with the Earth’s atmosphere. That was when it hit me: social sciences lack prediction.

Now, it’s not true that there is no prediction in social science. If you give me some carefully controlled laboratory conditions, I can give you predictable results. Or if you take a large sample of people, I can tell you some things about how they will react; take supply and demand curves as an example. And for a small set of results in social science, I can actually predict how you will react. [A trivial example: Picture a coffee cup in your head. Is the handle on the right?]

However, the subject matter of the “soft” sciences makes prediction inherently more difficult. People are a funny bunch. Whenever a rule in social science is discovered, there seem to be exceptions immediately available. Even the “laws” of economics break down, sometimes in unexpected ways. And if the messiness of the data isn’t enough, add to it a moral proscription against truly accurate findings! If I can really predict what you are going to do, then I must be tampering with forces that ought not be tampered with. I am treading into the bailiwick of psychics and fortune-tellers. I am stepping on the very toes of free will itself!! It is no wonder that people want to dismiss the social sciences as “soft”: as long as they are soft, they are comforting, because they lend helpful advice but do not threaten to rob us of our individuality.

However, I would argue that despite the difficulties in the data and the moral barriers against it, it is of the highest imperative that we (i.e., humanity) do our very best to treat social science as a “hard” science. If physics or chemistry solves the problem of generating cheap energy, it will be on the shoulders of economists and political scientists to work out the distribution of wealth. If physics or chemistry or biology develops a genocidal weapon, an understanding of the human condition will be the only prevention to its use. The cure for cancer may lie in genetic engineering, but the cure for conflict can only lie in the social sciences.

So my answer is simply this… the social sciences, as true scientific disciplines (rather than simply philosophy and religion), are young sciences. The natural sciences have enjoyed the scientific method for over a millennium—its application to human behavior is younger than a century. But human behavior as a subject of scientific scrutiny is not just “hard,” it is essential for our future.


Historical Cognition

On August 23, 2008, in Research, by Winter

This will probably be a short one, since it’s late and I’m already fading, but…

The question I am considering is how human cognition has changed over the course of history. Julian Jaynes’ book, “The Origin of Consciousness in the Breakdown of the Bicameral Mind,” has a theory that I would consider outlandish. When he argued at the beginning of the book that historically we were writing before we had self-awareness, and used the argument that we can write without paying attention as evidence, I basically shut down on it. But I felt like he was onto something, like he had the right collection of facts but the wrong conclusion, and I have mulled over possible alternatives since then.

The theoretical / philosophical foundation I am working from is that of the extended mind. Several philosophers, notably Andy Clark, have suggested that we structure our external environment in order to supplement and complement our internal cognitions, and that the appropriate view is that the cognitive system includes these external artifacts. A favorite example is the hand-held calculator: with it, a person can multiply or divide large numbers; without it, the person cannot. Thus, on the basis of a math test, the person with the calculator is smarter than the person without.

In fact, the extent to which these epistemic technologies exist is, well, mind-boggling. Some are real technologies, like a calculator or the internet (or older technologies, like the abacus and the printing press). These are physical items that improve our ability to do mental operations. Others are more techniques than technologies, like the quick-sort algorithm, or long multiplication. These are things that can be taught that improve our abilities to solve problems. What is fantastic is that these methods can be internalized, learned ways of thinking about things, that actually make us more capable of solving problems.

Probably the most dramatic epistemic technology was writing itself. By writing down one’s thoughts, they were there to be reckoned with later, reflected on from a more objective viewpoint. One could hear one’s words as though they were coming from another person. Anyone who has written a paper knows how useful that can be. Suddenly, things that used to require large amounts of effort and time to rehearse and remember could be written down once and forgotten, provided the physical copy was kept secure. Writing in and of itself probably increased our cognitive capacities by several orders of magnitude.

I also think of Helen Keller’s description of how she felt learning what language was. Knowing that there was a symbol for water opened the possibility of symbols for everything, symbols that could be used to conjure an awareness of something in absentia. Suddenly, the potential to interact with the world grew. Just think of how powerful the simple ability to request a glass of water can be.

My point is that these technologies, from language, to writing, to calculators, have changed the way people think and are able to think about the world. By adapting cognitively to these epistemic technologies, we must have changed the way we think in some very fundamental ways. And crucially, like all technology, these are spread culturally. Therefore, I think that there must be some record in history of how these epistemic technologies spread (following the clash of nations and/or tribes), and therefore the potential for observing what effect they had on how people thought (or at least, how they wrote). I’m not sure what kind of cognitive tests you can do with writing samples, but it’s probably worth a look as well.

 

tenure track

On May 2, 2005, in Research, by Winter

I think an assistant professor’s job should be to prepare a graduate student for later collaboration with a full professor. The assistant professor would have the new graduate students, and would help them learn the important things, both about their field and how to be a graduate student. The graduate students who are already qualified for their Ph.D. would get to collaborate with the full professors. The assistant professors tend to be closer in age to the graduate students anyway, provide a good role model, and will have fresher memories about how to learn to be a graduate student. The assistant professors who are better role models will prepare better Ph.D. students, and most likely be better full professors. The new graduate students probably take more time to instruct than older ones, but tend to do the research the professor is doing rather than their own thing. In this way, assistant professors get to establish their line of research (and themselves) so they can continue with their own research as full professors, while devoting less time to the more advanced graduate students. The collaborations that occur between the advanced graduate students and the full professors also give assistant professors the chance to extend their research and connect with already established lines of research by being part of the collaboration. And the graduate student gets a more guided path in their graduate career. Basically, everybody wins.


vision SPT

On February 16, 2005, in Research, by Winter

The way we transform the three-color information from the retina into the opponent-process signal in the lateral geniculate nucleus (LGN) has to happen electrochemically.

1. We can use this pathway to show how neurons electrochemically transform the signal, something that has implications for more complicated types of signal processing.

2. It’s possible the final signal that is processed in higher levels of visual processing uses both kinds of information. (or is it? I don’t know enough visual neuroscience to know.)

The same points, of course, apply to auditory processing if there are two differentiable signals that are transforms of each other. The ultimate point is that the signals in audition and vision can be approximated by continuous functions that have well-defined transforms, and that this kind of processing happens in the brain.

On the other hand, it could be a coincidence, like that calculator that seems to add 4 + 4, until you type in 5 + 5 and it still says 8. But I guess that would also be testable by changing the input signal… hmm….
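To make the idea of a “well-defined transform” a little more concrete, here is a toy sketch in Python. Everything in it is an assumption for illustration: the channel weights are made up rather than taken from physiology, and the function names are hypothetical. It recombines three cone-like signals into one brightness channel and two opponent channels, and because that recombination is an invertible linear map, the original three values can be recovered.

```python
def to_opponent(l, m, s):
    """Recombine three cone-like signals into three opponent-style channels.

    The weights are illustrative assumptions, not physiological measurements.
    """
    luminance = l + m              # overall brightness
    red_green = l - m              # "red vs. green" opponent channel
    blue_yellow = s - (l + m) / 2  # "blue vs. yellow" opponent channel
    return luminance, red_green, blue_yellow


def from_opponent(luminance, red_green, blue_yellow):
    """Invert to_opponent, recovering the original three signals."""
    l = (luminance + red_green) / 2
    m = (luminance - red_green) / 2
    s = blue_yellow + luminance / 2
    return l, m, s


cones = (0.8, 0.5, 0.2)
channels = to_opponent(*cones)
recovered = from_opponent(*channels)
```

Feeding in several different `cones` values and checking that the round trip still works is the toy version of the “change the input signal” test: a broken calculator that always answers 8 would fail it right away.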

There are many brilliant people right now in the field of vision science and visual neuroscience, and I know very little about it, so this is almost certainly either an obvious idea that was discovered long ago, or one that is invalid for reasons of which I am unaware.


The Bloggernet

On February 16, 2005, in Research, by Winter

The way blogs are feeding information into the network (and cable) news programs is just like it was before – word of mouth – the only difference being that there is more of a paper trail and it is more accessible to more people. The genuine news services (i.e., those with fact-checkers and accountability) need to take that into account.

—-
The copper bosses killed you, Joe.
“I never died,” said he.
“I never die,” said he.


I heart I heart huckabees

On February 10, 2005, in Research, by Winter

Ok, here’s where I begin: I heart I heart huckabees.

Now: Would we change who we are by changing our friends? We know that mere presence can increase liking. By “hanging out” with friends, we are engaging in the same activities, making us have “common fate.” However, it is plain and simple that we can’t choose who our friends are, any more than we can choose who we are.

The universe is a figure/ground problem. Too often we focus on the figure without realizing it is also a ground. For instance, it is just as relevant who we don’t hang out with, and why. Has the question of broken links been addressed in the literature?

Spending time when you have to. When you have to cooperate, even compete against others. Forced admission into a group. Voluntary admission into a group. Who do you spend time with?

What are the processes that lead to liking as well as disliking? Sometimes we talk about growing out of (touch, liking) with a group of people. Is it because we are changing, as we think? We change, not grow.. well, grow and change, become different by changing our environment by changing ourselves by changing…

There are micro-causes there, chaos that dictates. Stochastic noise. Error. A miscommunication leads to a breakdown. A chance encounter on the bus. But also common trends, things that can be followed.

Do people change groups of friends at regular times in their lives? We move, too, lose touch because of effort. Sometimes just a schedule change can rearrange who you are spending time with. Some are planned, common to everyone: grade school, middle school, high school. Others are voluntary. Are there patterns in the voluntary change?

Conformity. Adopting the beliefs of those around you. Social norms. What do I mean by voluntary? I’m different, I’ve grown since we last saw each other. You haven’t changed a bit. You’re the spitting image of a young man I once knew. Time is forward for us.

Listening, observing, very important. Someone must share to hear. Figure/ground, baby.


Random ideas

On July 27, 2004, in Research, by Winter

Ok, so here’s a question:

Could you prove that different people and/or governments define “terrorist” differently? And that what gets defined as a terrorist dictates the groups and individuals that are targeted? I wouldn’t be surprised if there was high variability in the characteristics that add up to a “terrorist” between different countries. I also wouldn’t be surprised if the definition targets groups and individuals that threaten the government more so than it targets groups and individuals that threaten the people. This is a possible study, right?

Next:

It is an easy analogy between a person’s moods and a place’s weather. Perhaps this is so because the processes that underlie both of them share important features. Consider: weather has broad patterns that can be identified but are hard to predict specifically because they are influenced by very small fluctuations in the environment. Moods are similar, especially if it ends up that they are dictated by chaotic chemical changes in the brain. Unfortunately, the analogy falls apart when you try to figure out what a person’s environment translates to for weather patterns. My best guess would be the earth, since I’m not counting earthquakes as weather, :) but I don’t think that works very well.
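For what “influenced by very small fluctuations” looks like in practice, here is a minimal sketch in Python. It uses the logistic map, a standard textbook example of chaos, as a stand-in; it is not a model of weather or of moods, and the function name and starting values are my own assumptions. Two runs that start almost identically stay close for a few steps and then drift completely apart.

```python
def logistic_trajectory(x0, steps, r=4.0):
    """Iterate x -> r * x * (1 - x), returning the list of visited values."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1 - xs[-1]))
    return xs


a = logistic_trajectory(0.2, 60)
b = logistic_trajectory(0.2 + 1e-9, 60)  # almost identical starting point
gaps = [abs(x - y) for x, y in zip(a, b)]

print("gap after 5 steps:", gaps[5])
print("largest gap over 60 steps:", max(gaps))
```

The broad pattern (values bouncing around between 0 and 1) is the same for both runs; it is only the specific value at a specific step that becomes unpredictable.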

Finally:

The current state of social network analysis focuses on a vertex / edge description. This makes sense for the data that has been dealt with most — web page links, citation networks, social network websites like Friendster or MySpace, etc. However, I think that real social network behavior is probably closer to chemical reactions, with important qualitative features of the vertices, the people, and the edges, their interactions. It’s like the field is trying to do chemistry with a single element at a time. Fil’s work is at least a step in the right direction, because (using the vertex / edge description and web pages) it adds the content of the web pages as a predictor for a link being made from one page to the other.
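Here is a rough sketch, in Python, of what a vertex / edge description with qualitative features might look like. All of the names (`Person`, `Tie`, `link_score`) and the sample data are hypothetical, and the overlap score is a simple Jaccard similarity that I am using as a stand-in for “content as a predictor of a link”; it is not anyone’s published method.

```python
from dataclasses import dataclass, field


@dataclass
class Person:
    name: str
    interests: set = field(default_factory=set)  # qualitative vertex feature


@dataclass
class Tie:
    a: str
    b: str
    kind: str  # qualitative edge feature, e.g. "friend" or "coworker"


def jaccard(x, y):
    """Overlap between two sets: |x & y| / |x | y| (0.0 if both are empty)."""
    union = x | y
    return len(x & y) / len(union) if union else 0.0


people = {
    p.name: p
    for p in [
        Person("Ann", {"chess", "jazz", "hiking"}),
        Person("Bo", {"jazz", "hiking", "film"}),
        Person("Cy", {"soccer"}),
    ]
}
ties = [Tie("Ann", "Bo", "friend")]


def link_score(p, q):
    """Guess how likely a tie is, using only the overlap in vertex content."""
    return jaccard(people[p].interests, people[q].interests)


print(link_score("Ann", "Bo"), link_score("Ann", "Cy"))
```

A plain vertex / edge model would keep only the names and the fact that a tie exists; the `interests` and `kind` fields are where the “chemistry” would have to come in.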
