21 October 2021

This is what happens when you tell women they're bad at math

Boy, have I found a sexy economics paper to share with you all.

And by sexy I mean interesting, provocative, useful and upsetting. With a splash of Bayesian statistics. (Why, what do you mean by sexy?)

As is typical in economics, the authors have tried to hide its sex appeal with the title: A (Dynamic) Investigation of Stereotypes, Belief-Updating, and Behavior, by Katherine B. Coffman, Paola Ugalde Araya & Basit Zafar. But do not be fooled! 

What's the paper about?

This paper looks at how men and women update their beliefs and choices after receiving feedback on their verbal and math skills. It addresses the gender gap in beliefs about skills, the gender gap in appetite for competition based on those skills, and gender differences in how people change their beliefs and choices when they get feedback.

What's the big finding?

Here's the clanger because I don't believe in saving the conclusion until last: when women receive bad feedback, especially in math, they update their beliefs and choices more negatively than men do. They are also more likely to remember negative feedback in the future and let it guide their decisions.

As a grown woman with a complicated relationship with math, this hit hard. Math and I carry a lot of baggage. 

But let's go back a bit...how is the paper different from what we know already?

Here's the stuff that's already been established in previous literature:

  • There's a gender gap in labour market outcomes, even amongst the highly skilled (and yeah, there's a whole pool of literature on what explains that gender gap which I won't get into).
  • Women are less likely to major in high-earning STEM or business fields.
  • Women are significantly less likely to opt into competitive tournaments than men, and growing evidence suggests men tend to be more over-confident than women on average.
  • Providing feedback about performance can reduce gender gaps in competitive tournament entry in laboratory settings. But lab settings have tended to be one-off situations.
  • When participants take an IQ test and receive feedback on their performance, one month later, beliefs are more responsive to positive than negative feedback, and positive feedback is more likely to be accurately recalled.

What makes this study different from similar literature on gender gaps in beliefs and competitive choices:

  • The authors investigate how the effects of feedback dissipate over time, and how the fade-out (if any) may depend on the stereotype of the domain (verbal abilities as a typically female domain and math as a typically male domain) and gender. 

So what did the study actually do?

I think I'm going to have to quote at length here, because I don't think my rewording can do this experiment justice. Here's a simple (I mean...simple is a relative concept here) explanation of what the study is trying to do:

Our experiment is inspired by many educational settings, where students take introductory courses in different domains, receive noisy feedback, and then decide what to specialize or compete in. We use a stylized, controlled environment to mimic important features of this setting, producing several advantages. 

First, we observe individual measures of ability in both domains. Second, we observe exogenous changes in the individual’s information set (which are quite hard to isolate in non-experimental settings), allowing us to cleanly study belief updating. Third, we have precise measures of beliefs. And, finally, we have well-defined measures of payoffs for the chosen domain as well as for the counterfactual domain - this offers us an advantage since counterfactual payoffs are, by definition, not observed in the field. 

Our design allows us to collect detailed information about beliefs, choices, and recall at different points in time in both a female and a male-typed domain. This allows us to ask whether there are differences across men and women and/or differences across the associated stereotype of the task. Thus, we present results in terms of two gender gaps: the male − female gap (average differences between men and women) and the gender-congruence gap (average differences between individuals in the gender-congruent domain and individuals in the gender-incongruent domain). Both gaps are potentially important for understanding gender disparities in educational and career settings of interest.

And in plain English? 

The experiment took place in two sessions, one week apart.

In the first session, participants (university age) take two assessment quizzes: one in math and one in verbal skills. Next, participants report their beliefs about their absolute (do they think they did well?) and relative performance (do they think they did well relative to their peers?) in each domain. The experimenters next inform participants that they will take a second round of quizzes one week later. 

Participants are then given a series of choices about how they would like to be compensated for their future performance in round 2. This is meant to measure choices around appetite for competition and pay-for-performance:

  • First, they choose between being paid for math performance under a piece-rate scheme or for verbal performance under a piece-rate scheme ($1 per correct answer). This indicates their preferred domain.
  • Next, the experimenters elicit participants' willingness-to-accept competition in each domain using price lists. Participants make a series of choices between receiving either $1 per correct answer in verbal (math) or entering a competitive pay scheme in math (verbal). The competitive option pays $X per correct answer in math (verbal) if they place in the top 40% of performers in the Round 2 math (verbal) quiz, but 0 otherwise. For each domain, the experimenters vary X from $1.5 to $4 across the rows. This indicates their willingness to compete with their peers.
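
To see what the price list is getting at, here's a minimal Python sketch (mine, not the authors'). It assumes a risk-neutral participant who expects the same number of correct answers under either option, which is a simplification the paper doesn't make. `p_top` is a hypothetical name for their belief that they'll place in the top 40%, and the $0.5 step between rows is also my assumption. Under those assumptions, competing beats the $1 piece rate whenever p_top × X > 1, so the row where someone switches tells you roughly how confident they are.

```python
# Toy sketch of a willingness-to-compete price list.
# Assumptions (mine, not the paper's): a risk-neutral participant who expects
# the same number of correct answers under both payment options.

PIECE_RATE = 1.0  # $ per correct answer, as in the design described above
# Rows run from $1.5 to $4; the $0.5 step size is assumed for illustration.
X_ROWS = [1.5, 2.0, 2.5, 3.0, 3.5, 4.0]


def prefers_competition(p_top: float, x: float) -> bool:
    """True if expected competitive pay per correct answer beats the piece rate."""
    return p_top * x > PIECE_RATE


def switch_point(p_top: float):
    """Smallest X in the list at which the participant competes, or None."""
    for x in X_ROWS:
        if prefers_competition(p_top, x):
            return x
    return None


for belief in (0.2, 0.45, 0.7):
    print(belief, switch_point(belief))
```

In this toy version, someone who is 70% sure of making the top 40% competes from the very first row, someone at 45% holds out until X reaches $2.5, and someone at 20% never competes at all. Real participants may also dislike risk or competition itself, which this sketch ignores.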

Some participants then receive feedback about their relative performance (i.e. their performance relative to their peers). The peer reference group is randomly computer-generated, so the participant learns whether they performed better or worse than a random peer. Randomness is key here because the participant has no idea about average performance: they don't know if they got lucky or unlucky with the computer-drawn peer. So the information is useful, but noisy.
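
Since I promised a splash of Bayesian statistics, here's a rough Python sketch of why that feedback is informative but noisy. It's my own toy benchmark, not the model in the paper. It assumes that before feedback your percentile rank is equally likely to be anywhere between 0 and 1 (a uniform prior, approximated on a grid), and that your chance of beating a randomly drawn peer equals your percentile. `posterior_top40` is a hypothetical helper name.

```python
# Toy Bayesian benchmark for one noisy "better/worse than a random peer" signal.
# Assumptions (mine): uniform prior over percentile rank, and
# P(beat a random peer) = your percentile rank.

N = 1000
# Midpoints of N equal-width percentile bins between 0 and 1.
GRID = [(i + 0.5) / N for i in range(N)]


def posterior_top40(beat_peer: bool) -> float:
    """Posterior probability of being in the top 40% after one signal."""
    # Likelihood of the signal at each percentile; the uniform prior cancels out.
    weights = [r if beat_peer else 1.0 - r for r in GRID]
    total = sum(weights)
    top = sum(w for r, w in zip(GRID, weights) if r > 0.6)
    return top / total


print(round(posterior_top40(True), 2))   # after good news
print(round(posterior_top40(False), 2))  # after bad news
```

Under these assumptions, a 40% prior of being in the top 40% moves to about 64% after good news and about 16% after bad news. So one comparison should shift beliefs a fair bit, but it leaves plenty of room for doubt. Think of this as the yardstick for a perfectly even-handed updater; the study is about how real people's updates compare after good versus bad news.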

Half of the participants who receive feedback are asked to report their beliefs about performance and their choices (see above) again immediately after receiving it.

The other half of the sample that receives feedback leaves the first session without providing updated beliefs or choices. 

All participants return for the second session one week later. In the second session, participants share their beliefs and choices again, including those that did not receive feedback in the first session.

All participants then take the two Round 2 quizzes. Participants are also asked to recall the feedback they received in each domain at the end of the second session.

What were the results?

Apart from the headline finding, here are some other interesting results:

  • Men and women who have the same starting point update their beliefs and choices similarly in response to positive feedback.
  • The impact of bad news fades less over time than the impact of good news, particularly for women (relative to men) and for individuals who receive bad news in incongruent domains (women who receive bad news about their math skills, men who receive bad news about their verbal skills) compared with congruent domains.
  • Women's beliefs about their performance are more pessimistic than men's, and women are less willing to compete.
  • Immediately after feedback, gender gaps are somewhat reduced, particularly for beliefs. However, in the week following feedback, gaps grow back toward their starting point. In particular, gender gaps in choices one week later are indistinguishable from gender gaps at baseline.
  • The results are not driven by simply forgetting the feedback: 88% of feedback is accurately recalled one week later.

So to round it all up:
If we take a man and a woman with the same performance and the same initial beliefs, then provide the same bad news, the woman holds more pessimistic beliefs about herself one week later compared to the man. Similarly, even if we hold fixed performance and initial choices, women (compared to men) are less willing to compete one week after bad news. 

Now what?

As the authors allude to, these findings have implications for understanding how men and women might choose their fields of study, and subsequently, the gender ratios in certain domains and overall differences in labour market outcomes. 

I mean, this is just one study. It doesn't tell us the state of the universe. You'd need more studies looking into the same thing to know if we're really onto something here. 

But at the very least, I think this study helps us think about the multiple points where gender gaps can take place, emphasises the need to investigate how prior beliefs develop in the first place, and urges better thinking on how to cushion the long-lasting effects of negative feedback.