Very few days go by without a new article describing the limits of published scientific research. The headline cases are about scientists who plagiarize or completely fabricate data. Yet, in my experience, most scientists are quite ethical, meticulous, hard-working, and genuinely concerned with finding the truth. Still, non-scientists would likely be surprised to learn that a large number of scientific studies are actually false. An Amgen study found that 46 out of 53 studies with ‘landmark’ findings could not be replicated. A team at Bayer found a slightly more optimistic picture, where 43 out of 65 studies revealed inconsistencies when tested independently. Scientific journals continue to accept articles based on the novelty and projected impact of the submission, yet simulations illustrate how the bias of journals toward publishing novel results likely leads to an environment where most published results are actually false. My home discipline of psychology is currently doing some soul searching, as it’s a relatively open secret that many results are difficult to reproduce, so much so that a systematic reproducibility project is underway.
Crowdsourcing is, and always has been, the solution. Indeed, the phrase at the bottom of Google Scholar, “standing on the shoulders of giants”, acknowledges that science has always been about crowdsourcing, as every scholar is collaborating with the scholars who came before them. Findings are not produced in a vacuum; they build upon (or challenge) previous findings. Replication by others, which effectively crowdsources the verification of results, is at the heart of the scientific method. It is perhaps a sign of the narcissism of our age that scientists convince themselves that they discover things largely independently, and so feel compelled to attack when their findings are challenged. Yet a willingness to be wrong is essential to learning, just as we can’t learn to walk without falling or learn about relationships without heartbreak. When science becomes more about ego, career, and grant money, it naturally becomes less accurate. Insisting that findings be crowdsourced solves this. No single study, paper, or research group can prove anything by itself.
Crowdsourcing is not simply averaging the opinions of the masses, as those who argue against that straw man would have you believe. Mathematically, crowdsourcing is about reducing the influence of sources of error, and there is a great deal of academic research on this topic. A good crowdsourcing algorithm does not weight all inputs equally, but instead seeks to identify clustered sources of error, which explains why aggregating across people with diverse personalities, perspectives, or job functions produces better results. Inputs need to carry some signal relative to noise, and their errors need to be uncorrelated. The unfortunate assumption in most research is that error is uncorrelated statistical noise that can be dealt with using statistical tests. Yet error also arises from the unconscious biases of researchers, the sheer number of researchers trying to find novel findings, the degrees of freedom a researcher has in trying to prove a hypothesis, the non-randomness of sampling, and the volume of available statistical tests a researcher can choose from. Given all these other sources of error, it is no wonder that many findings are false. A good crowdsourcing algorithm would be weighted such that true results would have to be shown by multiple researchers using multiple methods, multiple samples, multiple statistical tests, and multiple paradigms. This requires crowdsourcing, as no single person can do all this, and even if they could, they would still represent a single source of error.
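The distinction between uncorrelated and clustered error can be made concrete with a toy simulation (all numbers hypothetical): independent noise shrinks as estimates are averaged, but a bias shared by every study never washes out, no matter how many studies you aggregate.

```python
import random
import statistics

random.seed(42)

TRUE_VALUE = 10.0  # the quantity every study is trying to estimate

def run_study(shared_bias: float) -> float:
    """One noisy 'study': truth + a bias shared by all studies + independent noise."""
    return TRUE_VALUE + shared_bias + random.gauss(0, 2.0)

def crowd_error(n_studies: int, shared_bias: float) -> float:
    """Absolute error of the aggregated (averaged) estimate."""
    estimates = [run_study(shared_bias) for _ in range(n_studies)]
    return abs(statistics.mean(estimates) - TRUE_VALUE)

# Independent (uncorrelated) error shrinks as studies accumulate...
print(crowd_error(1000, shared_bias=0.0))
# ...but error that every researcher shares never averages out.
print(crowd_error(1000, shared_bias=1.5))
```

This is why diversity of methods and samples matters more than sheer volume: a thousand studies that share the same bias converge confidently on the wrong answer.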
Technology enables crowdsourcing to be conducted far more efficiently, as demonstrated by successful science crowdsourcing projects like GalaxyZoo, FoldIt, SETI@home, and psychology’s reproducibility project. Trends like citizen science, the quantified self, open access publishing, and interdisciplinarity improve the diversity of perspectives, which mathematically improves the ability to find truth. Every meta-analysis, and Nate Silver’s success in aggregating polls in the last election, takes advantage of the mathematical principle that underlies crowdsourcing: aggregating across sources of error produces more truth. In our daily lives, we all crowdsource knowledge that we are uncertain about, looking for confirmation from multiple independent sources when we are skeptical. This same skepticism serves scientists well, and scientists should embrace being wrong, confident that the broader truth will be revealed when all data is aggregated intelligently and all perspectives are valued. Crowdsourcing is not some new technique that threatens to fundamentally change scientific research. Rather, it is an extension of the collective effort of knowledge aggregation that is the heart of science, and scientists should embrace it as such.
A while ago, I read about a survey given to Harvard Medical School students about whether they would prefer to live in a world where they had a higher absolute amount of some beneficial good or a higher relative amount. For example, participants had a choice of living in a world where they make $100,000 and everyone else makes $200,000 (absolutely better) or one where they make $50,000 and everyone else makes $25,000 (relatively better), explicitly assuming buying power remains the same. The same types of choices were offered for IQ, education, vacation time, attractiveness, and other goods, with the choice always being between having more of something (absolute) or having more than other people (relative). The survey results often generate a lot of discussion, in my experience, as people are intrigued by the idea that lots of people would give up money just to be better off than others. In truth, other studies have shown that almost everyone cares about relative concerns, just perhaps in different circumstances.
I ran the same survey at yourmorals.org, and the results are similar to the original study, with some important differences (see graph below). Importantly, the % of people who chose a world of relative income was smaller than in the original study, where 50% of participants chose relative position. Perhaps people at Harvard are simply more competitive? Mean scores are quite variable in different non-representative samples, so I wouldn’t put much stock in them, but perhaps more interesting is that the relationship between variables replicates. Our results converge with the idea that some goods are more positional than others. Specifically, the same things that people thought were more appropriate to think of in relative terms in the original study (praise and attractiveness) were thought to be relative in our sample, with vacation time being the least relative good. The graph below shows questions in rough decreasing order of concern about relative position.
Our data suggest that some people generally think of goods in more relative terms than other people do. Cronbach’s alpha for the items in the graph was .80, meaning that answers positively correlate and it is reasonable to think of answers to these diverse questions as all reflecting some general underlying preference for relative or absolute position.
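For readers unfamiliar with Cronbach’s alpha, here is a minimal sketch of what it computes (the scores below are made up for illustration): it compares the variance of respondents’ total scores to the sum of the individual item variances, so items that move together yield a high alpha.

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an (n_respondents, n_items) score matrix."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1)       # variance of each item
    total_variance = items.sum(axis=1).var(ddof=1)   # variance of total scores
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

# Toy data: three respondents answering four positional-concern items.
scores = np.array([
    [5, 4, 5, 4],
    [2, 2, 3, 2],
    [4, 3, 4, 5],
])
print(round(cronbach_alpha(scores), 2))
```

When respondents who score high on one item score high on the others (as in the toy data), total-score variance is large relative to the item variances and alpha approaches 1; an alpha of .80 indicates the items plausibly tap one underlying preference.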
Interestingly, it appears that conservatives care more about relative position compared to both liberals and libertarians. Perhaps this converges with the idea that conservatives have a more competitive orientation, leading to positive beliefs about competitive markets and competitive sports, both of which are found in our data as well.
The current data is based on 5,795 participants (3,559 liberals, 632 conservatives, 569 libertarians, and 1,035 others) who took this survey. A sample this size means that, aside from political orientation, we can look at other factors associated with a preference for relative or absolute goods. For example, concern for positional goods is negatively correlated with Big 5 Agreeableness (r=-.13, p<.001) and Openness to Experience (r=-.09, p<.001), and positively correlated with Neuroticism (r=.07, p<.001). These are very modest correlations, made significant by the size of the sample that took both measures (3,844). If you have ideas for personality variables that may explain why some people prefer relative vs. absolute goods, please leave a comment.
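The point about modest correlations becoming significant at large n is easy to verify. A rough sketch, using the standard t statistic for a Pearson correlation and a normal approximation to its distribution (accurate at sample sizes like these):

```python
import math

def correlation_p_value(r: float, n: int) -> float:
    """Approximate two-sided p-value for a Pearson r, via the t statistic
    t = r * sqrt(n-2) / sqrt(1-r^2) and a normal approximation."""
    t = r * math.sqrt(n - 2) / math.sqrt(1 - r * r)
    return 2 * (1 - 0.5 * (1 + math.erf(abs(t) / math.sqrt(2))))

# r = .07 looks negligible, but with n = 3,844 it clears p < .001...
print(correlation_p_value(0.07, 3844))
# ...while the same r in a 100-person lab sample would not be significant.
print(correlation_p_value(0.07, 100))
```

The lesson: with thousands of respondents, statistical significance says little about practical importance, which is why the effect sizes themselves are worth reporting.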
Wikipedia defines a moral hazard as occurring “when a party insulated from risk behaves differently than it would behave if it were fully exposed to the risk.” By this definition, the financial crisis is a classic tale of moral hazard. I recently stayed up till 3am finishing Michael Lewis’ book, The Big Short, which explains the financial crisis in character-driven terms that are accessible to non-experts. The quick summary of the crisis is that people and companies made big bets on the real estate market not falling (since it hadn’t fallen recently), and did not understand the risks they were taking. However, what people did is nowhere near as interesting as why they did it.
The most classic case of perverse motivation and moral hazard is that of Wing Chau, who “was making $140,000 a year managing a portfolio for the New York Life Insurance Company. In one year as a CDO manager, he’d taken home $26 million.” (p.142) For what was he paid? CDOs are the instruments that allowed people to bet on the housing market. Wing Chau’s clients, pension funds that only looked at the AAA ratings these instruments got from rating agencies (more on this moral hazard later), lost a ton of money, but Chau himself was “paid a fee of .01 percent off the top, before any of his investors saw a dime, and another, similar fee, off the bottom…His goal, he explained, was to maximize the dollars in his care.” Simply put, he was paid on volume, not on performance. This may seem odd, but other such situations exist. Real estate agents also get paid largely on volume, as they don’t get you a much higher price, but do make more money the more homes they can sell quickly. Loan originators, such as New Century (p.169) or Countrywide, had similar incentives, as they made loans and then sold them off, making them indifferent as to whether the loans could actually be paid back.
The ratings agencies themselves get paid on the volume of bonds they rate. “Moody’s…revenues had boomed, from $800 million in 2001 to $2.03 billion in 2006. Some huge percentage of the increase…flowed from the arcane end of the home finance sector, known as structured finance. The surest way to attract structured finance business was to accept the assumptions of the structured finance industry.” Pension funds often have rules stating that they can only invest in bonds with high enough ratings, but how useful are those ratings likely to be, given that the companies that create the bonds pay the agencies to rate them? It’s the same practice that incentivized Arthur Andersen to “audit” Enron, with the fees paid by Enron, with similarly disastrous consequences for those who believed in such audits.
Still, some CEOs are paid based on the performance of their companies. Are those incentives enough to eliminate moral hazard? The book gives many instances where much moral hazard remains, as individuals have lots of upside but very little risk. If the company makes money, they make millions. If the company loses money, then maybe they find a new job, but they lose nothing. Consider the tale of Howie Hubler, whose group was at one time responsible for 20 percent of Morgan Stanley’s profits. He was paid $25 million a year, but was “no longer happy working as an ordinary bond trader. The best and the brightest Wall Street traders are quitting their big firms to work at hedge funds, where they can make not tens but hundreds of millions.” Morgan Stanley made a deal with Hubler to pay him a lot more money, whereupon he lost $9 billion. Hubler appears to have been honest, but mistaken, and now runs a company whose slogan is “100% of the shots you don’t take don’t go in”. That makes perfect rational sense. If you go to a casino and keep 10% of the winnings but bear 0% of the losses, you can make a lot of money just by making bigger and bigger bets.
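The casino analogy is easy to simulate. Here is a toy sketch (all numbers hypothetical) of a trader who keeps 10% of gains and bears none of the losses on a bet that is, for the firm, a fair coin flip:

```python
import random

random.seed(7)

def trader_payout(n_years: int, stake: float) -> float:
    """Cumulative pay for a trader who keeps 10% of gains and bears
    0% of losses on a fair coin-flip bet each year."""
    total = 0.0
    for _ in range(n_years):
        outcome = stake if random.random() < 0.5 else -stake  # fair bet: EV = 0
        total += 0.10 * max(outcome, 0.0)  # upside is shared, downside is not
    return total

# The bet is worthless on average to the firm, but the trader's expected
# pay is 0.10 * stake / 2 per year -- so bigger bets always mean more pay.
print(trader_payout(20, stake=1_000_000))
print(trader_payout(20, stake=10_000_000))
```

The trader’s pay can never go below zero and scales linearly with the stake, so the rational move under this contract is exactly what Hubler did: bet as big as possible.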
Having limited risk but huge potential gain means that even the dumbest individual can make money. Judging by subsequent performance, Hubler’s previous gains weren’t necessarily due to skill, but rather to circumstance. Steve Eisman, a central character in the book who foresaw the collapse, “got himself invited to a meeting with the CEO of Bank of America, Ken Lewis. ‘I was sitting there listening to him. I had an epiphany. I said to myself, “Oh my God, he’s dumb!”’ They shorted Bank of America, along with UBS, Citigroup, Lehman Brothers, and a few others.” (p. 174) Dumb is perhaps too strong a word, but it seems self-evident that money managers are rewarded as if they are better at money management than they actually are. There is a psychological dimension to this. Both liberals and conservatives attribute their success in work life to ability and effort more than to luck or circumstance. Conservatives and libertarians (likely a majority of those who read the Wall St. Journal) are slightly more likely to attribute success to effort and less likely to attribute it to context. Below is a graph of our YourMorals data, which mirrors previous research.
Is this true? One way to examine this is to compare the table from the previous post with the below chart of moral psychology differences between women and men. Below are the same constructs, sorted by effect size, with constructs at the top being more associated with men and constructs toward the bottom being more associated with women. I did the same thing for just liberal women/men and just conservative women/men and found the same result, so I feel fairly confident that these differences between men and women are somewhat robust.
The conclusion? First, in comparing the previous liberal-conservative differences to the differences here, it is pretty clear that male-female differences are far lower in magnitude than liberal-conservative differences. The effect sizes are much smaller, meaning that scores of women and men overlap much more than scores of liberals and conservatives. It is clear that male-female differences cannot account for a great deal of the variance in political attitudes.
Second, there are constructs associated with being female that are indicative of liberalism (valuing universalism, empathizing) as well as ones indicative of conservatism (higher disgust scores, belief in a just world, and being collectivistic). Similarly, there are male traits associated with liberalism (individualism, utilitarianism) and with conservatism (attitudes toward war, belief in proportionality).
I was recently forwarded a question about the differences that exist between Democrats and Republicans amongst white men. The question was framed by the fact that white men appear to be leaving the Democratic party at fairly high rates and it would be useful to pinpoint the variables that lead some white men to desert the Democratic party while others remain.
From that perspective, there is no one answer to what causes some white men to gravitate toward the Republican party and not others. Rather, it might be useful to look at the bigger picture.
To do this, I created the below table of effect sizes (the mean difference between liberals and conservatives, divided by the standard deviation), using only US white male respondents, sorted from the characteristics most characteristic of liberals to those most characteristic of conservatives. We have better data on liberal-conservative identification than on party identification, so we have to use the former as a proxy, but we will have analyses concerning party identification specifically in the future.
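The effect size described above (a standardized mean difference, i.e., Cohen’s d with a pooled standard deviation) is simple to compute; here is a minimal sketch with made-up 1-7 scale scores:

```python
import math
import statistics

def cohens_d(group_a: list, group_b: list) -> float:
    """Effect size: mean difference divided by the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)  # sample variance (ddof=1)
    var_b = statistics.variance(group_b)
    pooled_sd = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd

# Hypothetical 1-7 scale scores on some construct for two ideological groups.
liberals = [5.5, 6.0, 4.5, 5.0, 6.5, 5.5]
conservatives = [4.0, 4.5, 5.0, 3.5, 4.5, 5.0]
print(round(cohens_d(liberals, conservatives), 2))
```

Because d is expressed in standard-deviation units, it is comparable across constructs measured on different scales, which is what makes a sorted table like the one here meaningful.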
There is too much here to really address in one post. I did the same thing for women and the pattern is very similar, so it doesn’t appear there are many gender interactions, though maybe someone will point something out. My main reaction is that it confirms my initial idea that all researchers are finding very real differences, but that no line of research has a monopoly on explaining them: there is replication and support for a number of lines of research on ideological differences. Ideology, rather, is a network of ideas, beliefs, and dispositions that encompasses all these findings.
Finding out what made white male liberals vote for McCain might be an even more interesting question, and perhaps I’ll do that analysis next as we do have some of that data. I did this previously to examine supporters of Obama vs. Clinton within the Democratic party and feel that examining within party psychological (as opposed to demographic) differences is a vast untapped area for political psychologists. Indeed, if I had to point out one interesting thing in the above graph, it would be the relatively small effect sizes of demographics like age compared to personality variables like neuroticism. It might make just as much sense for Obama to target the “empathic” vote as it does to target the “youth” vote.
Whenever I bring up the concept of maximizing (“never settling for less than the best”), the discussion inevitably evolves into thinking about what domains a given person maximizes in. For example, I definitely don’t maximize in terms of my clothing choices, but am more of a maximizer in my career choice. Actually, even within my career choice, I maximize for some characteristics (sense of purpose, geography, autonomy) more than others (stability, income).
Still, even as this distinction has been pointed out in Barry Schwartz’s original book and in subsequent papers, I am not aware of anyone who has attempted to measure maximizing in specific domains (please comment/email me if you know of such research, as I’m guessing that it’s out there). Here is a quote from a recent paper:
Although content-free items have several advantages, specific examples may be needed to measure domain specific maximizing tendency, i.e., individual maximizing tendency within particular domains such as consumer purchase. Future research needs to address whether there are systematic variations between individuals’ global maximizing tendency and their propensity for maximizing within given decision making domains, based on for example the degree of involvement.
To answer this question, I modified the original maximizer-satisficer scale and gave the resulting questionnaire to both a sample at yourmorals.org and to a sample of USC students. Below are the reliability coefficients, which won’t mean a lot to many people who read this, but are useful in determining if it really is possible to measure domain specific maximizing, simply by taking the original scale’s questions and tweaking them to be specific to a domain (e.g. instead of “I never settle for 2nd best”, change the question to “In picking a place to live, one should never settle for 2nd best”). More interesting are the domain specific correlations with the satisfaction with life scale, a measure of “happiness”.
The reliabilities are fair, meaning that the domain specific scales measure their constructs decently, but not extremely well. Better measures usually have reliabilities around .8. Still, the domain specific measures are comparable to the original scale’s reliabilities, and the test-retest reliability (asking people the same question a month later) is also similar. I think the fair reliabilities result from the fact that maximizing has since been shown to have multiple dimensions: the search for alternatives, having high standards, and having difficulty making decisions (see this paper by Nenkov et al.).
Beyond reliabilities, I think the best argument for domain specific maximizing is its predictive validity, meaning whether maximizing in different domains predicts different outcomes. From the correlations above, you can see that maximizing in the material/physical domain (shopping, work, a place to live) has negative consequences for life satisfaction, while maximizing in the moral and political decision making domains does not (bold values are significant; click on the graph to zoom in). This is consistent across both samples. In addition, I asked the USC students how much they liked where they live, and the “place to live” subscale had the strongest negative relationship (r=-.33, p<.001) to liking where they lived, followed by shopping (r=-.22) and work (r=-.22). Maximizing in relationships, political decision making, and moral decision making was unrelated. At the very least, I think this is good evidence that maximizing in moral/political decision making differs from maximizing in consumer decision making. Incidentally, maximizing had a long history in moral philosophy before it became popular in psychology to think of it in terms of consumption.
One issue with my original scale construction is that I did it before Nenkov’s paper that deconstructed maximizing came out, so I did not evenly pick items across subscales. To make sure that the findings above aren’t just because of item selection, I ran some analyses for specific matched items that existed in all domain specific scales.
Again, bold values are significant, and we see negative correlations for the alternative-search questions only in the material domain. This replicates Nenkov’s finding that having high standards does not relate to lower life satisfaction, but constantly searching for alternatives, no matter how satisfied one is, does. However, it appears that this is true only in the material domain (shopping, career, a place to live) and not in moral and political decision making.
Lastly, the case of maximizing in relationships is interesting. The above data isn’t conclusive, but it converges with another pattern I’ve seen when comparing USC students to our YourMorals.org sample. Specifically, relationships appear to play a greater role in happiness in the general population rather than in our student samples. Perhaps loneliness is a bigger issue in the real world than it is within the college campus environment. Or perhaps paying attention to alternatives in relationships is less adaptive as you get older.
Watching baseball can be a frivolous pursuit and a distraction from psychology research, but last night something happened which demonstrated a psychological finding far more effectively than any study or paper.
Armando Galarraga, a pitcher for the Detroit Tigers, was one out away from pitching a perfect game. For non-baseball fans, it’s a very rare occurrence, comparable to other rare, unpredictable events that take some amount of skill and luck, like bowling a 300 or climbing Mount Everest and seeing the perfect sunset. It’s something you can work hard for, but even the best of pitchers may never achieve the feat. Galarraga lost his on the final out, when the umpire mistakenly called the runner safe.
Galarraga’s remarkably calm and forgiving reaction has led to a series of articles talking about him, probably a lot more than if he had completed his perfect game. He plans to shake hands publicly with Jim Joyce, the umpire who missed the call, and present him with the lineup card in the next game, in a public show of forgiveness in front of thousands of fans who might otherwise be irate at Joyce the entire next game.
Personally, I learned something from Galarraga’s reaction that I’ll take with me the next time I am wronged. It’s something subtle and true about the power of forgiveness…something that I always know, but often don’t have the strength or awareness to practice. Galarraga is not just reducing the amount of animosity in the world, but also ensuring his own happiness.
Studies confirm the relationship between being a forgiving person and being a happier person (Maltby, Day, & Barber, 2005). Below is a graph of our yourmorals.org data showing the relationship between forgiveness of others (using the Heartland Forgiveness Scale – e.g., “I continue to punish a person who has done something that I think is wrong.”) and satisfaction with life (“The conditions of my life are excellent.”). As in the Maltby et al. study, forgiving people are indeed happier.
It may not have been a perfect game….but it was as close to a perfect reaction as we generally see and I’m hopeful this story will be remembered far more than if an actual perfect game had occurred. It’s a stark contrast to the ugliness we often see in most news and politics. As Galarraga put it himself, everything happens for a reason.
Thanks to the publicity which moral psychology (and specifically Jon Haidt’s work) has begun to receive, along with the average person’s insatiable appetite for knowledge about themselves, facilitated by the internet, we have collected a truly unique dataset at yourmorals.org. It is a large community sample and includes some reaction time data. It is non-representative (skewed liberal and educated), but includes individuals from diverse trackable sources such that some robustness analysis is possible. However, even if we wanted to (an open question), it would be impossible for those of us who collected this data to formally publish all the results. Hence, we would like to potentially solicit your help.
Academic publishing is not easy. In psychology (though we’d be happy to publish outside of psychology), it’s not enough just to have valid results; the results often have to be novel as well. Therefore, many replication studies may not be publishable, or may only be publishable in lesser known journals or just on this blog. That doesn’t necessarily make the endeavor unworthwhile, as replication, or the failure to replicate, is an essential part of the scientific method, but we want people to know what they are getting into. We’re open to anyone who is motivated to publish in peer reviewed journals, and there is no inherent reason that limits this to academics. However, it’s a labor intensive process with no monetary reward, so it’s quite possible that only those with an eye toward building an academic CV will be interested.
Here is a running list of potentially publishable results which are in our publication queue, but there are many more possibilities. We are open to proposals on a variety of topics. Some of you might be interested in a specific topic and might find this list of measures useful in determining if we have data on that topic. Data might potentially serve as the 1st study in a 3 study package where a community sample reinforces the results of a lab experiment, or as convergent evidence in something you already are working on. In rare cases, we may even be willing to collect new data using additional measures, even including experimental methods, if your ideas are compelling enough. However, there are only so many resources we have and the degree of effort required is definitely a consideration, balanced against the contribution which could be made. Also bear in mind that some number of papers are already in progress, and it may be possible that your idea is already being worked on.
At the same time, I’ve been working with colleagues on a paper about experiential vs. material purchasing styles, for which we have found convergent correlations all suggesting that experiential purchasers are dispositionally motivated towards seeking new, stimulating experiences to promote positive emotion, while material purchasers often seek to avoid negative emotions. This is supported by the fact that, in the YourMorals.org dataset, experiential purchasers report higher levels of openness to experience, lower levels of neuroticism (both measured by the Big Five Personality Inventory), and lower levels of disgust (as measured by the Disgust Scale). The disgust finding does not necessarily fit with the idea that experiential purchasing is related to seeking new experiences, unless one looks at the literature on disgust. In particular, this study theorized about such a relationship and confirmed it by reporting correlations between disgust and big five personality dimensions.
It occurred to me that I could contribute to the original studies’ findings, by examining the same correlations in our dataset, using a more diverse and far larger sample, and perhaps even including some internal cross-validation. The results are summarized in the table below.
Disgust Scale Correlations with Big Five Personality Traits
The main hypothesis of the original study actually dealt with the two robust relationships found in our dataset, specifically that disgust is negatively related to openness to experience and positively related to neuroticism. In all, these two relationships stand out as robust across groups and in both studies. Interestingly, the correlation between openness to experience and disgust is weaker in the two most ‘rational’ groups, edge.org and libertarians, which might be worth pursuing later. Given the smaller sample size and restricted diversity of the original study, I’d be inclined to say that conscientiousness and agreeableness are not robust correlates of disgust, though this could be an effect of the fact that yourmorals.org uses a different measure of Big Five personality traits from the original study.
Can I publish this finding? It’s only correlational and says nothing about causality. It really doesn’t say much that is new, but rather confirms the original study, more or less. Still, the 26 papers that cited the original study would each be slightly improved if they could cite this finding as well, since it’s the same basic study with a different (larger and more diverse) sample. This is where the discussion of the peer review system converges with this analysis. According to this paper, “many natural science fields operate on a norm that submissions should be accepted unless they are patently wrong.” In contrast, psychology papers are often rejected, not because they are wrong, but because they are not interesting or novel enough.
The paper and the listserv discussion bring up many related points, but one relevant to this finding is that it is hard to build a cumulative science when you reward novelty rather than replication. The end result is a series of slightly different perspectives on the same subjects, all named differently, where authors are constantly trying to come up with something new rather than building on something existing. This may help academics, but it makes it very difficult for these theories to be used in the real world. Any research on humans is likely flawed in some way. Can anybody do double-blind experiments on representative samples of people with behavioral measures? The public is wisely skeptical of any social science finding, as are academics…but the solution might lie in publishing more replications rather than in restricting the publication process toward the mythical goal of the perfect, novel study. No single study proves anything when dealing with research on people. It’s the convergence of lots of studies that might be convincing enough to outsiders.
- Ravi Iyer
ps. if anyone wants to write this up and publish it traditionally, feel free to contact me