Lecture Notes for Research Methods

I’m teaching a new class this semester, a master’s-level class on research methods. It could be taught as simply the second semester of an econometrics sequence, but I’m taking a different approach, trying to think about what will help students do effective empirical work in policy/political settings. We’ll see how it works.

For anyone interested, here are the slides I will use on the first day. I’m not sure it’s all right; in fact, I’m sure some of it is wrong. But that is how you figure out what you really think, and what you do and don’t know about something: by teaching it.

After we’ve talked through this, we will discuss this old VoxEU piece as an example of effective use of simple scatterplots to make an economic argument.

I gave a somewhat complementary talk on methodology and heterodox macroeconomics at the Eastern Economic Association meetings last year. I’ve been meaning to transcribe it into a blog post, but in the meantime you can listen to a recording, if you’re interested.

7 thoughts on “Lecture Notes for Research Methods”

Chris Pepin says:

September 1, 2018 at 1:29 pm

These slides are truly informative. I particularly liked the points on cause and effect, as well as this:
“How much” questions are more interesting than “whether” questions
Keshav says:

September 4, 2018 at 7:00 pm

I was going to quibble that regression could be used either to estimate causal effects, *or* to approximate a conditional expectations function/predict one variable given some others. Even if your question is descriptive rather than causal, you might be interested in using regressions to answer that descriptive question.

But Cosma Shalizi has just posted his lecture notes on linear regression, so you should ignore my quibble and read those instead:
http://www.stat.cmu.edu/~cshalizi/TALR
1. JW Mason says:
  
  September 4, 2018 at 8:31 pm
  
  That’s an interesting point. My initial reaction is that I’m not sure I agree. Would conditional expectations normally be thought of as ceteris paribus? I.e. if we think of a regression as just a way of calculating E(X|Y), why would we worry about controls? Your expected wage conditional on being a woman reflects any differences in education, occupation, etc. as well as “pure” gender discrimination. I don’t understand why you would use a conventional regression methodology if you weren’t asking a causal question. Certainly you’d agree that that’s what the vast majority of people doing regressions think they are doing, at least?
  
  But Cosma Shalizi knows a thousand times more about this than I ever will so yes, I will read his notes.
  1. JW Mason says:
    
    September 4, 2018 at 8:43 pm
    
    Just started reading. Very nice. I can’t imagine ever teaching econometrics at a level where I would use something like this, but I’m sure I will learn a lot myself from reading it.
    
    My favorite part so far is footnote 5: “Even the idea that the variables we see are randomly generated from a probability distribution is a usually-untestable assumption.” In most presentations of statistics and econometrics I’ve seen, the idea that there might not be an underlying distribution is completely glossed over. In my experience, it’s a hard point to get someone with a conventional economics training to even understand.
  2. Keshav says:
    
    September 5, 2018 at 1:06 am
    
    I agree that probably most people running regressions (in economics at least, I don’t know about other disciplines) are either explicitly or implicitly trying to estimate a causal effect, and this usually seems to be the motivation for including controls. The exceptions would include prediction; summarizing the data; estimating a non-recursive system where the coefficients on endogenous variables don’t have a causal interpretation; rational expectations econometrics (e.g. Euler equation estimation) where the conditional expectation being approximated is that of the agents in the model and the parameters being estimated (e.g. the elasticity of substitution) have at best a rather complicated causal interpretation. And yes, it’s true that if you were using regression to approximate E(Y|X1) you wouldn’t need controls, you would only need controls if you want to approximate E(Y|X1,X2,…) (in which case, the coefficient on X1 is the change in the expected value of Y as we move from one observation to another observation with a higher value of X1 and the same value of X2, etc.).
    
    Judea Pearl argues that we should emphasize as strongly as possible the difference between these two interpretations of linear equations (conditional expectations and causal effects) – I am neither a statistician nor an econometrician but I find his arguments persuasive. Of course, economists do already worry a lot about endogeneity, but this doesn’t mean that they are always clear about what ‘causal effects’ they are estimating (i.e. precisely what hypothetical interventions they have in mind, what is being varied and what is being fixed), at least, not in macro. One way to fix this might be to emphasize that what regression gives you, in the first instance, is an approximation to E(Y|X). A causal effect is something conceptually completely distinct from that, and you might or might not get it from a regression. If memory serves, my textbooks were clear on the first part (regression gives you the CEF) but vague on what a causal effect was.
    
    Anyway, my main point was that it might be useful to keep the conditional expectation interpretation of regressions in mind *especially* if you think many interesting questions are descriptive rather than causal (though to be fair, if you want to approximate the CEF, there might be better ways than linear regression).
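The distinction discussed in this exchange can be made concrete with a small simulation. What follows is only an illustrative sketch, not something from the slides or from any commenter: the data-generating process, the coefficient values, and the helper name `ols` are all assumptions chosen for the example. It regresses Y on X1 alone (approximating E(Y|X1)) and then on X1 and X2 together (approximating E(Y|X1,X2)), in a setting where X2 drives both X1 and Y.

```python
import numpy as np


def ols(y, *regressors):
    """Least-squares coefficients of y on a constant plus the given regressors."""
    X = np.column_stack([np.ones_like(y), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


# Hypothetical data-generating process (all numbers are assumptions):
# X2 affects both X1 and Y, and X1 has a structural coefficient of 1.0 on Y.
rng = np.random.default_rng(0)
n = 200_000
x2 = rng.normal(size=n)
x1 = 0.8 * x2 + rng.normal(size=n)
y = 1.0 * x1 + 2.0 * x2 + rng.normal(size=n)

short_coef = ols(y, x1)      # approximates E(Y | X1)
long_coef = ols(y, x1, x2)   # approximates E(Y | X1, X2)

# Population slope of the short regression: cov(X1, Y) / var(X1),
# with var(X1) = 0.8**2 + 1 and cov(X1, X2) = 0.8.
expected_short = 1.0 + 2.0 * 0.8 / (0.8**2 + 1.0)

print("slope on X1, no controls:   ", round(short_coef[1], 3))
print("population value of that:   ", round(expected_short, 3))
print("slope on X1, controlling X2:", round(long_coef[1], 3))
print("slope on X2:                ", round(long_coef[2], 3))
```

Both regressions are correct as descriptions: the first says how the expected value of Y changes across observations with different X1, the second how it changes across observations with different X1 and the same X2. Only the second recovers the coefficient of 1.0 that was built into this simulated process, and it does so only because the one variable that matters here was observed and included.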
JW Mason says:

September 5, 2018 at 1:30 am

Thanks, this is super helpful. As I said in the post, I’ve never taught this before and one of my goals is to clarify my own thinking about this stuff, which is definitely a work in progress. Tho in practice the class is not going to get into these conceptual issues much at all. It’s going to be much more hands-on practice working with real data in R, for students with mostly very limited math and economics backgrounds.

I will look for Judea Pearl’s stuff, I’m not familiar. Are there other people you’d suggest who have interesting things to say about the conceptual/methodological basis of econometrics?
Keshav says:

September 5, 2018 at 2:30 am

Yes! But these suggestions largely reflect my idiosyncratic interest in the narrow question of whether and how one can interpret simultaneous equations/equilibrium models causally, what it could possibly mean to talk about ‘mechanisms’ in equilibrium models, etc. Definitely not a comprehensive overview of econometric methodology.

Besides Pearl’s recent book (which is excellent), Woodward (2003), Making Things Happen, is a good presentation of the manipulationist view of causality.

Strotz and Wold (1960), Recursive vs. nonrecursive systems
Wold (1954), Causality and econometrics
Bentzel and Hansen (1954), On recursiveness and interdependency in economic models
– one fascinating thing about Wold’s view which I would like to understand better is its connection with the Swedish disequilibrium/sequence analysis school.

Simon (1953), Causal ordering and identifiability
Marschak (1950), Statistical inference in economics: An introduction (or any description of the Cowles program)
Heckman and Vytlacil (2007) Econometric evaluation of social programs, part I
– for me, Simon’s notion of block- or vector-causality was the key to understanding the Cowles program and how it differs from the approach Wold advocates

Imbens and Rubin (2015), Causal Inference for Statistics, Social, and Biomedical Sciences, chapter 2 (has a brief history of the potential outcomes framework)
Deaton and Cartwright’s papers on RCTs and mechanisms – I found these very useful when I first read them, less useful now.

Spanos (2010), Statistical adequacy and the trustworthiness of empirical evidence – this (and his other work) makes the point that models might be either statistically inadequate and/or substantively misspecified – these are both bad, but they are *different* problems

And anything Andrew Gelman has ever written.
