I want to invest in "big data stocks". After all, everyone is saying that big data is the future of health care, education, government, and business, and that it will literally change the world. As someone who works with data both as an academic at USC and as the principal data scientist at Ranker, I am the type of person who is likely to make and believe such hyperbolic claims. I recently put money into my IRA and needed to invest it, and as someone who believes in investing in what I know, I naturally wanted to invest in our data-driven future.
Where should I invest? If you look around the internet, you'll find a number of recommendations from places like Forbes or The Street. The general consensus appears to be to take the "picks and shovels" approach to investing in big data, where you invest in the companies that make the tools that enable people to use data, rather than in the data itself. I'm writing this post because I think this is absolutely the wrong approach. I believe in investing in data, not in tools. Why do I believe that?
- My experience in academia has taught me that simple statistics and tools are often the most reliable. If there is a signal to be detected, almost any reasonable analysis or tool should be able to find it. Many people turn to more complex statistics when simple statistics fail to show the relationship they expected. In psychology, researchers are finding that the use of more complex models (e.g. adding covariates) is often a sign that a study's results are less likely to be reliable. Given the size of the datasets we typically have in data science, we rarely need special statistical techniques to find relationships, because we have so much statistical power that most tools and techniques should give convergent results. Put simply, the tools matter less than the data.
- The most popular tools and techniques are often open source. You can do a lot with R, Python, Gephi, Mahout, etc.
- Yes, there are advantages to using particular distributions of open source tools (e.g. Hadoop distributions that come with particular features), but so many companies offer different flavors of products that do essentially the same thing that I can't see how any particular company is going to be the next Apple or Google in terms of stock growth. There are no barriers to entry in the tools market. Perhaps a company will be the next Red Hat, which may be a fine business to be in, but I don't believe that is the revolutionary wave that investors in big data stocks are looking for.
So what should you do if you want to invest in big data? Buy stock in the companies that have the best, biggest, and most unique datasets and/or the most defensible ways of collecting that data. I invested my IRA money in Facebook, which has the biggest and best dataset of human behavior that has ever existed. I invest my academic time in scalable data collection projects such as YourMorals, BeyondThePurchase, and ExploringMyReligion, confident that they will lead to the most long-term knowledge. And I invest my professional time in Ranker, which has a scalable process for collecting an opinion graph that will be essential for the kinds of intelligent applications that big data futurists have been promising us.
Do you want to invest in big data? Generally, you'll get better returns if you invest your money, time, and energy in data, rather than in tools.
- Ravi Iyer
We live in a world where we often have to make categorical decisions. We date someone or we don't. We marry them or we don't. We hire someone or we don't. We pick either the Democrat or the Republican. There is no middle ground.
Unfortunately, the world isn't necessarily organized that way, and few people would claim that such clean categorical distinctions exist. Prospective dates have some degree of positive and negative qualities, rather than attributes that are merely present or absent. Are people either qualified or unqualified for a job? Most people instead fall along a continuum of professional ability, with some being very qualified (way above merely adequate) and some being just below or just above the border of qualification. Politicians aren't uniformly liberal or conservative, and we routinely see partisans on both sides upset at those who aren't extreme enough and who don't toe the partisan line.
This may seem obvious, but I bring it up now because, although almost everyone would agree with it on reflection, many people still argue as if things were categorical. There are two recent examples on the YourMorals blog.
First, the comment section of this post has become, for many commenters, a debate over whether psychology is objective (science) or subjective (art). Allow me to quote Gene from this thread:
there is SOME objective knowledge that comes from psych research (anything that can be experimentally shown, is predictive, even if only statistically, it has value).
If you want to get really nitty gritty, even physics is not completely “objective”…it’s merely instrumental to understanding objectivity (see here: http://en.wikipedia.org/wiki/Instrumentalism)
Most things are not completely objective or completely subjective, especially where human affect, behavior, and cognition are concerned. Yes, psychology is less objective than physics...but it's more objective than sculpture. If I think that Paul McCartney sings better than I do, is that an objective or a subjective fact? It's objective insofar as a survey of people would detect a very large, statistically significant difference between perceptions of our singing. But it's subjective insofar as it may not be true for a particular person (e.g. my wife and my mom).
What complicates things further is that many people who read psychology don't really care about what happens to most people, but rather about how the research applies to them. Consider this very useful overview of how changing our consumption patterns can make people happier. One of its recommendations is something I tell people often: experiences lead to more happiness than material things, an opinion shared by 57% of a national sample (and shown to be true for most people in experimental research). Yet 34% of that sample disagree (and some people don't benefit in experiments). So is the statement that "buying experiences leads to more happiness than buying things" an objective or a subjective fact? It's true for a majority of people, but not for a significant minority. It's likely true for many groups, but certainly not all groups. Yet many people still think we can definitively decide whether psychology is objective or subjective, even though humans, unlike inanimate objects, don't react predictably to situations, except perhaps in aggregate (arguably because we have free will, or at least the illusion of it). I can find truths that apply to all rocks or all electrons, but not to all humans. But I can find truths that apply to many or most humans, and that might give someone insight into themselves, which is a valuable thing.
A second instance of categorical thinking on the YourMorals blog of late is Pigliucci's critique of Haidt's recent SPSP speech. Haidt pointed out that conservatives are underrepresented in social psychology compared to the population, and he cited both self-selection and discrimination as contributing to varying degrees. Many people (understandably) focus on the sexier charge of discrimination, and Pigliucci answered that he "suspect(s) the obvious reason for the “imbalance” of political views in academia is that the low pay, long time before one gets to tenure (if ever), frequent rejection rates from journals and funding agencies, and the necessity to constantly engage one’s critical thinking skills naturally select against conservatives." But what if causality is continuous rather than categorical? Pigliucci may be entirely right about his obvious reason, yet there could still be some amount of discrimination. Indeed, if there is even one student somewhere whose ideas are suppressed (and there was at least one in Haidt's talk), then there is at least some degree of both self-selection and discrimination, meaning that a debate over what statistically causes underrepresentation misses the point. Bear in mind that these are not just data points, but actual human beings. One human being discriminated against is one human being we could serve better, even if the vast majority of underrepresentation is due to self-selection.
I'm obviously biased in the above debate, but these thoughts are less a response to that debate than to almost every debate and decision I see in psychology. Some other things that are continuous, not categorical:
Journal Publication - Editors have to make categorical decisions to accept or reject papers, yet many accepted papers never get cited, while other papers get published only through sheer persistence as their authors work down the chain of journal prominence.
Statistical Significance - A p-value of .051 is not meaningfully different from a p-value of .049, yet only the second earns the categorical label of "significance," because we feel we need to say whether something is true or not. In reality, all we have is some evidence toward the truth, and the strength of that evidence varies by degree. Even the best paper does not definitively prove anything, and even the worst paper is some evidence toward something.
Authorship - Many people (often undergraduate research assistants) work on papers without becoming authors, while others do fairly little and receive authorship. Sometimes the first author does 90% of the work and sometimes 51%, yet both receive the same categorical distinction of first author.
Psychological conditions - Few psychological clinical conditions are categorical. In reality, people have some degree of anxiety, rather than having or not having an anxiety disorder. Yet, for insurance reasons, people have to be diagnosed categorically as having a particular condition.
Psychological constructs - Is shame the same as guilt or different? Is shame the same as sadness? Is shame the same as happiness? The truth is that shame is somewhat like some of these constructs and less like others. Categorical distinctions between such constructs are useful for publications, but they don't really reflect the continuous nature of the real world.
I am sure that if I thought more, I could come up with many more examples of things that are continuous but treated as categorical. In academia, perhaps we can eventually use technology to change our systems so that they acknowledge the continuous nature of things. My real-world hope, as someone who believes that a world with less conflict is better than a world with more conflict, is that seeing things as continuous, rather than categorical, will make people less likely to judge others harshly on the assumption that their beliefs are the categorical caricatures we make them out to be.
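The statistical significance item is easy to see numerically. The sketch below, in Python, assumes a two-sided z-test (a common case, chosen here only for illustration) and converts two nearly identical hypothetical test statistics, z = 1.95 and z = 1.97, into p-values using the standard normal distribution. The evidence is almost the same, yet the conventional .05 cutoff puts one on each side of the line.

```python
import math


def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic z:
    P(|Z| >= |z|) = erfc(|z| / sqrt(2))."""
    return math.erfc(abs(z) / math.sqrt(2))


def significance_label(p, alpha=0.05):
    """The conventional categorical verdict applied to a continuous p-value."""
    return "significant" if p < alpha else "not significant"


if __name__ == "__main__":
    for z in (1.95, 1.97):  # hypothetical test statistics
        p = two_sided_p(z)
        print(f"z = {z:.2f}, p = {p:.4f}, {significance_label(p)}")
```

Running it prints a p-value just above .05 for the first statistic and just below .05 for the second, even though the two p-values differ by only a couple of thousandths.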
- Ravi Iyer
Recently, Jon Haidt gave a talk at the main social psychology conference about the statistically impossible lack of political diversity in social psychology: the vast majority of social psychologists are liberal, with a smattering of libertarians or moderates and close to zero self-identified conservatives. This talk was covered in this New York Times article by John Tierney, and it has inspired many social psychologists I know to some degree of introspection about our discipline. It has also led many readers of the article to wonder why there are so many liberals in academia. Is it a question of discrimination? Self-selection?
As someone who studies political psychology, I have two main self-serving thoughts. First, findings in political psychology suggest that most of this is due to self-selection. We know that liberals score higher on measures of openness to experience, challenging the status quo, enjoying effortful thinking, existential angst (searching for meaning), and valuing stimulation. All of these findings have been published and replicated in our YourMorals dataset. These traits can be framed as positive (enjoying new things, wanting to be an agent of change) or negative (disrespecting tradition, being narcissistic) in the 'real world', but they are useful in academia. Personally, I could be earning more money and likely doing something more objectively useful, but I like the stimulation of working in the world of ideas, and it helps ease my existential angst. This cluster of traits describes some part of most academics I know.
If you watch the actual talk, you'll notice that Haidt presumes a fair degree of self-selection and, perhaps for this reason, does not set representativeness (e.g. 40% conservatives in the US means we should have 40% in psychology) as a goal.
Still, much of the talk is about discrimination (e.g. the analogy of the closeted homosexual), so I see why many bloggers picked up on the discrimination angle. I am not saying that there is no peer pressure, exacerbated by the assumption that everyone in the room is liberal...but my experience is that self-selection causes that environment more than the reverse. That does not mean it isn't a problem. It is, and we should do something about it.
The main problem, from the perspective of someone who wants to understand political attitudes and ideology, is that it's really hard to study something you have no experience with. Imagine what a collective of non-parents would think of parenting from a completely outside perspective. Giving up sleep, friends, leisure, and money for an infant that cannot even smile might seem delusional, which is exactly how some psychologists see conservative ideology...as a product of some kind of mental fault. Sometimes things only make sense from the inside.
Those of us who study ideology often have nobody on the inside of conservative movements to help us make sense of them. That is why I'd love to see more research conducted by conservatives, who have different perspectives not just on politics, but on all sorts of other domains. Until then, I'll have to settle for befriending them wherever I can and plying them with liquor to get their inner thoughts. As a liberal who wants to persuade conservatives, I find such understanding essential, unless I simply want to cheerlead among people who already agree with me.
In some ways, it's part of a larger problem in psychology, where we ask individuals with relatively little experience outside academia to theorize about the nature of human experience. Business school students are expected to have business experience to get into business school, yet social psychologists often have very limited experience with human social life before investigating it. Given that, is it any wonder that many people feel memoirs offer as much insight into the human condition as psychology journals? Having a diverse set of experiences and perspectives within political psychology can only make our work that much more interesting.
- Ravi Iyer