Ranker Predicts Spurs to beat Cavaliers for 2015 NBA Championship

Reposted from this post on the Ranker Data Blog

The NBA season starts tonight, and building on the proven success of our World Cup and movie box office predictions, as well as the preliminary success of our NFL predictions, Ranker is happy to announce our 2015 NBA Championship Predictions, based upon aggregated data from basketball fans who have weighed in on our NBA and basketball lists.

[Chart: Ranker’s 2015 NBA Championship Predictions as Compared to ESPN and FiveThirtyEight]

For comparison’s sake, I included the current ESPN power rankings as well as the teams FiveThirtyEight gives the highest probability of winning the championship.  As with any sporting event, chance will play a large role in the outcome, but the premise of producing our predictions regularly is to validate our belief that the aggregated opinions of many will generally outperform expert opinions (ESPN) or models based on non-opinion data (e.g. player performance data plays a large role in FiveThirtyEight’s predictions).  Our ultimate goal is to prove the utility of crowdsourced data: while NBA predictions are a crowded space where many people attempt to answer the same question, Ranker produces the world’s only significant data model for equally important questions, such as determining the world’s best DJs, everyone’s biggest turn-ons, or the best cheeses for a grilled cheese sandwich.

– Ravi Iyer


Characteristics of People who are less Afraid of Ebola

Reposted from this post on the Ranker Data Blog

Ebola is everywhere in the news these days, even as it trails other causes of death by wide margins.  The risks are real, so some amount of fear is certainly justified, but many have taken it to levels that do not make sense scientifically, making back-of-the-envelope projections for its spread based on anecdotal evidence and/or positing that it’s only a matter of time before the virus evolves into an airborne disease, as diseases regularly mutate to kill more efficiently in the movies.  Regardless of whether Ebola warrants fear or outright panic, the consensus is that it is scary, as evidenced by its clear #1 ranking on Ranker‘s Scariest Diseases of All Time list.  Yet, among those who are fearful, I couldn’t help but wonder: what are the characteristics of people who tend to be less afraid than others?  Using the metadata associated with users who voted on and reranked this list, in combination with their other activity on the site, here are a few things I found.

– Ebola fear appears to be slightly less prevalent in the Northeast, as compared to other regions of the US.

– Older people tend to be slightly less afraid of Ebola, often expressing more fear of Alzheimer’s.

– International visitors to this list are half as likely to vote for Ebola, as compared to Americans.

– People who are afraid of Ebola are 4.4x as likely to be afraid of Dengue Fever.

– People who are afraid of Strokes, Parkinson’s Disease, Muscular Dystrophy, Influenza, and/or Depression are about half as likely to believe that Ebola is one of the world’s scariest diseases.

Bear in mind that these results are based on degree of fear, and all of these groups are afraid of Ebola; the fear in some groups is simply less pronounced, and only the last three results are statistically significant based on classical statistical methods.  There are plausible explanations for all of the above, ranging from the fact that conservative areas of the country are likely more responsive to potential threats, to the fact that losing one’s mind over time to Alzheimer’s really may be much scarier for older people than a quick death, to the fact that people who fear foreign diseases prevalent in tropical areas likely fear other such diseases as well.
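
To make the mechanics concrete, here is a minimal sketch of the kind of two-by-two comparison behind a figure like “4.4x as likely,” along with a classical significance test; the counts below are invented for illustration, not Ranker’s actual data.

```python
# Hypothetical illustration of the kind of analysis described above.
# The counts are invented; the real analysis used Ranker's voter metadata.
import numpy as np
from scipy.stats import chi2_contingency

# Rows: voted Ebola scary / did not; columns: voted Dengue scary / did not
table = np.array([[44, 156],    # Ebola-fearful voters
                  [10, 190]])   # other voters

# Relative likelihood -- the style of figure behind "4.4x as likely"
p_dengue_given_ebola = table[0, 0] / table[0].sum()
p_dengue_given_other = table[1, 0] / table[1].sum()
print(f"risk ratio: {p_dengue_given_ebola / p_dengue_given_other:.1f}x")

# Classical test of whether the association is statistically significant
chi2, p_value, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, p = {p_value:.4f}")
```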

To me the most interesting fact is that people who are afraid of more common everyday diseases, including Influenza, which kills thousands every year, appear to be less afraid of Ebola than others.  Human beings are wired to be more afraid of the new and spectacular, as much psychological research has shown.  That fear kept many of our ancestors alive, so I wouldn’t dismiss it as wrong.  But it is interesting to observe that perhaps some of us are less wired in this way than others.

– Ravi Iyer


Send us Your Academic Papers on Civil Intergroup Relations

Reposted from this post on the Civil Politics Blog

Civil Politics exists to help educate the public on evidence-based methods to improve inter-group relations, especially those intractable conflicts that have a moral dimension to them, such as the partisanship that paralyzes US politics.  Part of this effort involves compiling all of the evidence that exists in this domain, so that we can more authoritatively bring it to those who are doing the work on the ground.

Evidence can include many things.  It certainly includes empirical research, both published and unpublished.  It includes examples from the news that echo this research, where people talk about what does or does not lead them toward more or less cooperation vs. animosity across groups.  It includes both the empirical study of the effects of programs that focus on improving inter-group dialogue, and the lessons that those who run those programs have learned through years of practice.  The basis of psychometrics and crowdsourcing is the aggregation of results across methods, each of which has its own sources of error, with the hope that convergent evidence emerges across methods.  It is the same reason that we want to ask multiple people, ideally with diverse tastes, before passing judgment on a new restaurant or movie, and we hope to bring the same thoughtfulness to the evidence that we present on improving intergroup relations.  The links in this paragraph are examples of how we support the collection and dissemination of each of these types of evidence.

We are currently working on projects that aim to be more systematic about evidence in each of these categories, and we could use your help.  Specifically, if you know of academic research that provides evidence for the role of specific variables in increasing or decreasing inter-group civility, please use this form to provide us with details.  Questions and comments are welcome (email me at ravi at civilpolitics dt org), and feel free to provide as much information as you have, even just filling in the first part about specific papers, as we can have others fill in the rest.


A group of USC students working on CivilPolitics’ academic database.

Please do feel free to forward this blog post to anyone who does research bearing on this question or who knows of such research.  We are also happy to acknowledge your contribution publicly and/or to provide rewards to students who contribute to this project (e.g. travel support to academic conferences) to both incentivize participation and hopefully encourage their interest in this domain.  Thank you for your interest and consideration.

– Ravi Iyer


Ranky Goes to Washington?

Reposted from this post on the Ranker Data Blog

Something pretty cool happened last week here at Ranker, and it had nothing to do with the season premiere of “The Big Bang Theory”, which we’re also really excited about. Cincinnati’s number one digital paper used our widget to create a votable list of ideas mentioned in Cincinnati Mayor John Cranley’s first State of the City address. As of right now, 1,958 voters have cast 5,586 votes on the list of proposals from Mayor Cranley (not surprisingly, “fixing streets” ranks higher than the “German-style beer garden” that’s apparently also an option).

Now, our widget is used by thousands of websites to either take one of our votable lists or create their own and embed it on their site, but this was the very first time Ranker was used to directly poll people on public policy initiatives.

Here’s why we’re loving this idea: we feel confident that Ranker lists are the most fun and reliable way to poll people at scale about a list of items within a specific context. That’s what we’ve been obsessing about for the past 6 years. But we also think this could lead to a whole new way for people to weigh in, in fairly large numbers, on complex public policy issues on an ongoing basis, from municipal budgets to foreign policy. That’s because Ranker is very good at getting a large number of people to cast their opinion about complex issues in ways that can’t be achieved at this scale through regular polling methods (nobody’s going to call you at dinner time to ask you to rank 10 or 20 municipal budget items … and what is “dinner time” these days, anyway?).  It may not be a representative sample, but it may be the only sample that matters, given that the average citizen of Cincinnati will have no idea about the details of the Mayor’s speech and would likely give any opinion just to move a phone survey along about a topic they know little about.

Of course, the democratic process is the best way to get the best sample (there’s little bias when it’s the whole friggin voting population!) to weigh in on public policy as a whole. But elections are very expensive and infrequent, and their policy debates focus on the broadest possible issues for their geographical units, meaning that micro-issues like these will often get lost in the same tired partisan debates.

Meanwhile, society, technology, and the economy no longer operate on cycles consistent with election cycles: the rate and breadth of societal change is such that the public policy environment specific to an election quickly becomes obsolete, and new issues quickly need sorting out as they emerge, something our increasingly polarized legislative processes have a hard time doing.

Online polls are an imperfect, but necessary, way to evaluate public policy choices on an ongoing basis. Yes, they are susceptible to bias, but good statistical models can overcome a lot of that bias, and in a world where response rates for telephone polls continue to drop, there simply isn’t an alternative.  All polling is becoming a function of statistical modeling applied to imperfect datasets.  Offline polls are also expensive, and that cost is climbing as rapidly as response rates are dropping. A poll with a sample size of 800 can cost anywhere between $25,000 and $50,000, depending on the type of sample and the response rate.  Social media is, well, very approximate. As we’ve covered elsewhere on this blog, social media sentiment is noisy, biased, and overall very difficult to measure accurately.

In comes Ranker. The cost of that Cincinnati.com Ranker widget? $0. Its sample size? Nearly 2,000 people, or anywhere from 2 to 4x the average sample size of current political polls. Ranker is also the best way to get people to quickly and efficiently express a meaningful opinion about a complex set of issues, and we have collected thousands of precise opinions about conceptually complex topics like the scariest diseases and the most important life goals, by making the act of providing opinions entertaining within a context that makes simple actions meaningful.

Politics is the art of the possible, and we shouldn’t let the impossibility of perfect survey precision preclude the possibility of using technology to improve civic engagement at scale.  If you are an organization seeking to poll public opinion about a particular set of issues that may work well in a list format, we’d invite you to contact us.

– Ravi Iyer


What Abraham Lincoln Would’ve Said About Ranker Comics If He Were Still Alive (And, You Know, a Comics Fan)

Reposted from this post on the Ranker Data Blog

by Ranky (said in the voice of Abraham Lincoln)

Four score and seven days ago, our Ranker fathers brought forth on this continent Ranker Comics, a new and mighty crowdsourced opinion subsite on the entire comic book universe, conceived in fandom, and dedicated to the ludicrous proposition that all opinions on Batman are created equal.

Now we are engaged in a great civil war between DC Comics and Marvel, testing whether that Ranker Comics nation, or any Ranker.com subsite so conceived and so dedicated, can long endure. We are met on a great battlefield of that war: our parents’ basements. We have come to dedicate a portion of the Ranker site as a final resting place for all 17 fans of Aquaman who here gave their opinions, so that Ranker Comics might live to finally find out who would win in a fight between The Hulk and Superman. It is altogether fitting and proper that we should do this, considering that Ranker.com has 17 million uniques and 6 million votes per month.

But, in a larger sense, we cannot dedicate, we can not consecrate, we can not hallow this ground without the explicit authorization of Mom (it’s her house). The brave Cheetos, Queso and Flaming Hot, who were eaten here, have consecrated it, far above our poor power to add or detract the 317 irritating times Jean Grey died. The world will little note, nor long remember what other comic book websites say, but it will never forget what Ranker Comics did here.

It is for us the stay-at-home fans, rather, to be dedicated here to the unfinished work of deciding which superhero had the most “emo moment,” a critical issue which they who fought here have thus far so nobly advanced. It is rather for us fans to be here dedicated to the great task remaining before us (besides moving out): that from these honored geek battles we take increased devotion to the DC v. Marvel debates for which they gave the last full measure of devotion—that we here highly resolve that these Ranker Comics voters shall not have voted in vain—that this Ranker Comics nation, under The Living Room, shall have a new birth of fandom—and that comic book opinions of the people, by the people, for the people, shall not perish from the earth.


Ranker Predicts Jacksonville Jaguars to have NFL’s worst record in 2014

Reposted from this post on the Ranker Data Blog

Today is the start of the NFL season, and building on our success in using crowdsourcing to predict the World Cup, we’d like to release our predictions for the upcoming season.  Using data from our “Which NFL Team Will Have the Worst Record in 2014?” list, which was largely voted on by the community at WalterFootball.com (using a Ranker widget), we would predict the following order of finish, from worst to first.  Unfortunately for fans in Florida, the wisdom of crowds predicts that the Jacksonville Jaguars will finish last this year.

As a point of comparison, I’ll also include predictions from WalterFootball’s Walter Cherepinsky, ESPN (based on power rankings), and Betfair (based on betting odds for winning the Super Bowl).  Since we are attempting to predict the teams with the worst records in 2014, the worst teams are listed first and the best teams are listed last.


[Chart: Ranker NFL Worst Team Predictions 2014]

The value proposition of Ranker is that we believe the combined judgments of many individuals are smarter than even the most informed individual experts.  Our predictions were based on over 27,000 votes from 2,900+ fans, taking into account both positive and negative sentiment by combining the raw magnitude of positive votes with the ratio of positive to negative votes.  As research on the wisdom of crowds predicts, the crowdsourced judgments from Ranker should outperform those from the experts.  Of course, there is a lot of luck and randomness throughout the NFL season, so our results, good or bad, should be taken with a grain of salt.  What is perhaps more interesting is the proposition that crowdsourced data can approximate the results of a betting market like Betfair, for the real value of Ranker data is in predicting things where there is no betting market (e.g. what content should Netflix pursue?).
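
As an illustration only, here is one plausible way to blend raw vote magnitude with the positive-to-negative ratio; Ranker’s actual scoring formula is not published, and the log damping, the 0.5 weight, and the vote counts below are all invented.

```python
# A sketch of one plausible scoring scheme; Ranker's actual formula is not
# published. The log-damped magnitude, the 0.5 weight, and the vote counts
# are all invented for illustration.
import math

def rank_score(up: int, down: int, w: float = 0.5) -> float:
    """Blend raw positive-vote magnitude with the positive-vote ratio."""
    total = up + down
    if total == 0:
        return 0.0
    ratio = up / total            # share of positive sentiment
    magnitude = math.log1p(up)    # damped raw volume of positive votes
    return w * magnitude + (1 - w) * ratio

votes = {"Team A": (900, 150), "Team B": (700, 200), "Team C": (120, 300)}
for team, (up, down) in sorted(votes.items(),
                               key=lambda kv: rank_score(*kv[1]),
                               reverse=True):
    print(f"{team}: {rank_score(up, down):.3f}")
```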

Stay tuned until the end of the season for results.

– Ravi Iyer



Why Ranker Data is Better than Facebook’s and Twitter’s

Reposted from this post on the Ranker Data Blog

By Clark Benson (CEO, Ranker)

It’s unlikely you’ll be pouring freezing water over your head for it, but the marketing world is experiencing its own Peak Oil crisis.

Yes, you read correctly: we don’t have enough data. At least not enough good data.

Pull up any marketing RSS feed and you’ll read the same story: the world is awash in golden insights, companies are able to “know” their customers in real time and predict more and better about their own market … blah blah blah.

Here’s what you won’t read: it’s really, really hard. And it’s getting harder, for the simple reason that we are all positively drenched in … overwhelmingly bad data. Noisy, incomplete, out-of-context, approximate, downright misleading data. “Big Data” = (mostly) Bad Data, as it tends to infer explicit opinions from implicit and noisy signals like social media activity or web visits.

Traditional market research methods are getting less reliable due to dropping response rates, especially among young, tech-savvy consumers. To counteract this trend, marketing research firms have hired hundreds of PhDs to refine the math in their models and try to build a better picture of the zeitgeist, leveraging social media and implicit web behavior. This has proven to be a dangerous proposition, as modeling and research firms have fallen prey to statistics’ number one rule: garbage in, garbage out.

No amount of genius mathematical skill can fix Bad Data, and simple statistical models on well-measured data will trump elaborate algorithms on badly measured data every single time. Sophisticated statistical models might help in political polling, where people are far more predictable based on party and demographics, but they won’t do much for traditional marketing research, where people’s tastes and positions are less entrenched and evolve more rapidly.

Parsing the exact sentiment behind a “like”, a follow, or a natural-language tweet is extremely difficult, as analysts often lack control over the sample population they are covering, as well as any context about why the action occurred and what behavior or opinion triggered it. Since there is no negative sentiment to use as a control, there is no ability to unconfound “good” with “popular”. Natural language processing algorithms can’t sort out sarcasm, which reigns supreme on social media, and even the best algorithms can’t reliably categorize the sentiment of more than 50% of Twitter’s volume of posts. Others have pointed out the issues with developing a more than razor-thin understanding of consumer mindsets and preferences based on social media data. What does a Facebook “Like” mean, exactly? If you “like” Coca-Cola on Facebook, does it mean that you like the product or the company? And does it necessarily mean you don’t like Pepsi? And what is a “like” worth? Nobody knows.

This is where we come in. We at Ranker have developed a very good answer to this issue: the “opinion graph”, which is a more precise version of the “interest graph” that advertisers are currently using.

Ranker is a popular website (top 200, with 18 million unique visitors and 300 million pageviews per month) that crowdsources answers to questions using the popular list format.  Visitors to Ranker can view, rank, and vote on items across around 400,000 lists. Unlike more ambiguous data points based on Facebook likes or Twitter tweets, Ranker solicits precise and explicit opinions from users about questions like the most annoying celebrities, the best guilty pleasure movies, the most memorable ad slogans, the top dream colleges, or the best men’s watch brands.

It’s very simple: instead of the vaguely positive act of “liking” a popular actor on Facebook, Ranker visitors cast 8 million votes every month and thus directly express whether they think someone is “hot”, “cool”, one of the “best actors of all-time”, or just one of the “best action stars”. Not only that, they also vote on other lists of items seemingly unrelated to their initial interest: best cars, best beers, most annoying TV shows, etc.

As a result, Ranker has been building, since 2008, the world’s largest opinion graph, with 50,000 nodes (topics) and 20 million edges (statistically significant connections between 2 items). Thanks to our massive sample and our rich database of correlations, we can tell you that people who like “Modern Family” are 5x more likely to dine at Chipotle than non-fans, or that people who like the Nissan 370Z also like oddball comedy movies such as “Napoleon Dynamite” and “The Big Lebowski”, and TV shows such as “Dexter” and “Weeds”.
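
To make the “5x more likely” kind of claim concrete, here is a sketch of how one edge in an opinion graph can be scored as lift over a set of per-user votes; the users and votes below are invented for illustration.

```python
# Sketch: scoring one opinion-graph "edge" as lift -- how much more likely
# fans of item A are to also vote for item B than non-fans are.
# The users and votes below are invented.
from collections import defaultdict

votes = [  # (user_id, item voted up)
    (1, "Modern Family"), (1, "Chipotle"),
    (2, "Modern Family"), (2, "Chipotle"),
    (3, "Modern Family"),
    (4, "Chipotle"),
    (5, "Dexter"),
]

fans = defaultdict(set)
for user, item in votes:
    fans[item].add(user)

def lift(a: str, b: str, all_users: set) -> float:
    """P(likes b | likes a) / P(likes b | doesn't like a)."""
    a_fans = fans[a]
    non_a = all_users - a_fans
    p_given_a = len(fans[b] & a_fans) / len(a_fans)
    p_given_not_a = len(fans[b] & non_a) / max(len(non_a), 1)
    return p_given_a / max(p_given_not_a, 1e-9)

all_users = {u for u, _ in votes}
print(f'lift: {lift("Modern Family", "Chipotle", all_users):.2f}x')
```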

Our exclusive Ranker “FanScope” about the show “Mad Men” lays out this capability in more detail below:

[Image: Mad Men FanScope data]

How good is it? Pretty good. Like “we predicted the outcome of the World Cup better than Nate Silver’s FiveThirtyEight and Betfair” good.

Our opinion data is also much more precise than Facebook’s: we not only know that someone who likes Coke is very likely to rank “Jaws” as one of his/her top movies of all time, but we’re also able to differentiate between those who like to drink Coke and those who like Coca-Cola as a company:

[Chart: “Jaws” rankings among Coke fans]

We’re also able to differentiate between people who simply like Pepsi better than Coke overall, and those who like to drink Coke just at the movie theater:

  • 47% of Pepsi fans on Ranker vote for (vs. against) Coke on Best Sodas of All Time
  • 65% of Pepsi fans on Ranker vote for (vs. against) Coke on Best Movie Snacks

That’s the kind of specific relationship you can’t get using Facebook data or Twitter messages.
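
As a rough illustration of how such context-conditional shares can be computed, here is a sketch over invented raw vote records; the 47% and 65% figures above came from Ranker’s real data, not from this toy example.

```python
# Sketch: the same fan segment can prefer an item differently per list
# context. The raw vote records are invented; the 47%/65% figures above
# came from Ranker's real data.
import pandas as pd

votes = pd.DataFrame([
    # (user, list, item, direction: +1 = vote for, -1 = vote against)
    ("u1", "Best Sodas of All Time", "Coke", -1),
    ("u1", "Best Movie Snacks",      "Coke", +1),
    ("u2", "Best Sodas of All Time", "Coke", +1),
    ("u2", "Best Movie Snacks",      "Coke", +1),
    ("u3", "Best Sodas of All Time", "Coke", -1),
    ("u3", "Best Movie Snacks",      "Coke", -1),
], columns=["user", "list", "item", "direction"])

# Share voting for (vs. against) Coke, split by list context
coke = votes[votes["item"] == "Coke"]
share_for = coke.groupby("list")["direction"].apply(lambda d: (d > 0).mean())
print(share_for)
```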

By collecting millions of discrete opinions each month on thousands of diverse topics, Ranker is the only company able to combine internet-level scale (hundreds of thousands surveyed on millions of opinions each month) with market research-level precision (e.g. adjective-specific opinions about specific objects in a specific context).

We can poll questions that are too specific (e.g. most memorable slogans) or not lucrative enough (most annoying celebrities) for other pollsters. And we use the same types of mathematical models to address sampling challenges that all pollsters (internet-based or not) currently face, working with some of the world’s leading academics who study crowdsourcing, such as our Chief Data Scientist Ravi Iyer and UC Irvine Cognitive Sciences professor Michael Lee.

Our data suggests you won’t be dropping gallons of iced water on your face over it. But if you’re a marketer or an advertiser, we predict it’s likely you will want to pay close attention.


Living Room Conversations Builds Trust Across Differences Concerning CA Prison Policy

Reposted from this post on the Civil Politics Blog

At CivilPolitics, one of our service offerings is to help groups that are doing work connecting individuals who may disagree about political and moral issues.  These disagreements do not necessarily have to be about partisanship.  One organization that we work with is Living Room Conversations, a California-based non-profit that holds small gatherings co-hosted by individuals who may disagree about a particular issue, in order to consciously foster non-judgmental sharing about potentially contentious issues.  Below is a description from their website, in addition to a short video.

Living Room Conversations are designed to revitalize the art of conversation among people with diverse views and remind us all of the power and beauty of civil discourse. Living Room Conversations enable people to come together through their social networks, as friends and friends of friends, to engage in a self-guided conversation about any chosen issue. Typically conversations have self-identified co-hosts who hold differing views. They may be from different ethnic groups, socio-economic backgrounds or political parties. Each co-host invites two of their friends to join the conversation. Participants follow an easy-to-use format offering a structure and a set of questions for getting acquainted with each other and with each other’s viewpoints on the topic of the conversation.

Living Room Conversations is currently holding conversations around the issue of “realignment” in California, a policy designed to alleviate prison overcrowding, where many would like to develop alternatives to jail for non-violent offenders.  Living Room Conversations wanted help understanding the effects of their program, so we worked with them to develop a survey appropriate for their audience, asking people about their attitudes before and after conversations.  Informed by work in psychology, we looked at how reasonable, intelligent, well-intentioned, and trustworthy people felt others were after these meetings, especially people on the opposite side of the issue, compared to how they felt before the meeting.  Results, based on a 7-point scale, are plotted below.

[Chart: Change in ratings of those who disagree, before vs. after a Living Room Conversation]

The fact that all scores are greater than zero means that people felt that individuals who disagreed with them on these issues were more reasonable, intelligent, well-intentioned, and trustworthy compared to how they felt before the conversation (though with a sample size of only 23 individuals so far, only the increase in trustworthiness is statistically significant).
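
For those curious about the underlying test, here is a minimal sketch of the kind of pre/post comparison described above; the change scores are randomly simulated here, not the real survey data, though the sample size matches the n = 23 reported.

```python
# Simulated sketch of the pre/post comparison described above: with a small
# sample, test whether the mean change in each rating differs from zero.
# The change scores are randomly generated, not the real survey data.
import numpy as np
from scipy.stats import ttest_1samp

rng = np.random.default_rng(0)
n = 23  # matches the sample size reported above
changes = {  # hypothetical (post - pre) scores on a 7-point scale
    "reasonable":  rng.normal(0.2, 1.0, n),
    "trustworthy": rng.normal(0.7, 1.0, n),
}
for trait, delta in changes.items():
    t, p = ttest_1samp(delta, 0.0)
    print(f"{trait}: mean change {delta.mean():+.2f}, t = {t:.2f}, p = {p:.3f}")
```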

There was still a stark difference between how people felt about those who disagreed with them on these issues and how they felt about those they agreed with: both before and after the event, respondents rated those they agreed with as more reasonable, intelligent, well-intentioned, and trustworthy.  We also asked people about their attitudes toward realignment policy itself, and those attitudes didn’t change.  However, civility, as we define it, is not the absence of disagreement, but rather being able to disagree in a civil way that respects the intentions of others.

Moreover, even where people’s minds hadn’t changed, individuals felt strongly (8+ on a 10-point scale) that talking with others who hold different views is valuable.  Research on the effects of such positive contact would indicate that if these individuals do follow through on this course, they will likely end up building on these attitudinal gains toward those who disagree.  Given that, these conversations appear to be a step in the right direction.

– Ravi Iyer


The Value of Opinion Datasets – Twitter vs. Facebook vs. Ranker vs. Market Research vs. ?

As Ranker’s Chief Data Scientist, I’ve been doing a lot of thinking of late about how much a given opinion dataset is worth.  I’m not talking about the value of a specific dataset for answering a specific question, as that varies wildly depending on the question; rather, I’d like to consider broad datasets/categories of data that promise to satisfy the world’s thirst for opinion answers.  The existence of sites like Quora and Yahoo Answers, as well as the shift by many search engines from providing links to pages toward directly answering questions, highlights the need for such data, as does the growing demand for opinion queries.  The future of services like Siri, Cortana, and Google Now is one where one’s questions about what to buy for one’s wife, where to eat, and what to watch on TV are answered directly, and to do that well, one needs the data to answer those questions.  Are the world’s data collection methodologies up to the task?

One reason I ask is that there seems to be a misconception that large amounts of data can answer anything.  I’m a huge believer in reform in academia, but one thing my traditional, peer-review-oriented academic training has given me is an appreciation for why that isn’t true.  Knowing which universities have more follows, likes, or mentions isn’t going to tell you which one has the best reputation.  Still, there certainly are advantages to scale, as well as depth.  The math behind both psychometrics and crowdsourcing tells me that no one dataset is likely to have the ultimate answers, as all data has error, and aggregating across that error, which is Nate Silver’s claim to fame, almost always produces the best answer.  So as I consider the datasets below, the true answer as to which you should use is “all of the above”.  That being said, I think it is helpful (at least for organizing my thinking) to consider the specific dimensions that each dataset does best.
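
As a quick numerical illustration of that aggregation point, here is a toy sketch: averaging many independent noisy estimates yields a smaller error than a typical individual estimate carries. The numbers are simulated, not drawn from any real dataset.

```python
# Toy demonstration: averaging many noisy, independent estimates yields a
# smaller error than a typical individual estimate carries.
import numpy as np

rng = np.random.default_rng(42)
truth = 10.0
opinions = truth + rng.normal(0.0, 3.0, size=1000)  # 1,000 noisy estimates

typical_individual_error = np.sqrt(np.mean((opinions - truth) ** 2))
crowd_average_error = abs(opinions.mean() - truth)
print(f"typical individual error: {typical_individual_error:.2f}")
print(f"crowd-average error:      {crowd_average_error:.2f}")
```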

Below I consider prominent datasets along four dimensions: sampling, scale, breadth, and measurement.  Sampling refers to how diverse and representative the set of users answering a question is.  Note that this isn’t essential in all cases (drug research with life-or-death implications is almost always done on samples that are extremely limited/biased), and perfect sampling is almost impossible these days, such that even the best political polls rely on mathematical models to “fix” the results.  Such modeling requires Scale, which is important in that it helps one find non-linear patterns in data and prevents spurious conclusions from being reached.  Related to that is Breadth, as large datasets also tend to answer a larger number of questions.  Anyone can fund the definitive study of a single question at great expense, but that won’t help us with the many niche questions that exist (e.g. what is the best Barry Manilow song that I can play to woo my date?  What new TV show would my daughter like, given that she loves Elmo?).  Measurement might be the most important dimension of them all, as one can’t answer questions that one doesn’t ask well.

How do the most prominent datasets in the world fare along these dimensions?

Twitter – Sampling: C, Scale: B+, Breadth: A, Measurement: C

Twitter is great for breadth, which can be thought of not only in terms of the things talked about, which are infinite on Twitter, but also in terms of the range of emotions expressed (e.g. the difference between awesome and cool can potentially be parsed).  There is also a lot of scale.  Unfortunately, Twitter users are hardly representative: people who tweet are a specific group.  Measurement is very hard on Twitter as well, as there is very little context to a tweet, so one can’t tell if something is really popular or just highly promoted.  As well, natural language will always have ambiguity, especially in 140 characters (e.g. consider how many interpretations there are for a sentence like “we saw her duck”).

Facebook – Sampling: B, Scale: A, Breadth: B, Measurement: D

Facebook is ubiquitous and reaches a far more diverse audience than Twitter.  People provide data on all sorts of things about all sorts of topics, too.  I bought their stock because I think their data is absolutely great, and I still do.  Still, the ambiguity of a “like” (combined with the haphazard and ambiguous nature of relying on status updates) means that there will always be questions (e.g. how hated is something?  what do I think of a company’s individual products?  is this movie overrated?) that can’t be answered with Facebook.

Behavioral Targeting – Sampling: B-, Scale: A, Breadth: C, Measurement: D

Companies like DoubleClick (owned by Google) and Gravity track your web behavior and attempt to infer information about you based on what you do online.  They can therefore infer relationships between almost anything on the web (e.g. Mad Men interests) based on web pages having common visitors.  Yet the use of vague terms like “interest” highlights the fact that these relationships are highly suspect.  Anyone who has looked up what these companies think they know about them can clearly see that the error rates are fairly high, which makes sense when you consider the diverse reasons we all have for visiting any website.  This type of data has proven utility for improving ad response across large groups, where the law of large numbers means that some benefit will occur in using it.  But I wouldn’t want to rely on it to truly understand public opinion.

Market Research – Sampling: B, Scale: D, Breadth: D, Measurement: A

Market research companies like Nielsen and GfK spend a lot of money to ask the right questions of the right people.  Measurement is clearly a strength, as market research companies can provide context and nuance to responses as needed, asking about specific opinions about specific items in the context of other similar items.  Yet, given that only ~10% of people will answer surveys when called, even the best sampling that money can buy will be imperfect.  Moreover, these methods do not scale, given the cost, and can only cover questions that clients will pay for, such that there is no way they can power the diverse queries that will go to tomorrow’s answer engines.

Ranker – Sampling: B-, Scale: B-, Breadth: B+, Measurement: A

I work at Ranker largely because I believe in the power of our platform to uniquely answer questions, even if we don’t have the scale of larger sites like Twitter and Facebook…yet (we are growing and are among the top 200 websites now, per Quantcast).  Our sample is imperfect, as are all samples, including those of pollsters like Gallup, but it is generally representative of the internet, given that we get lots of traffic from search, so we can model our audience in the same ways that companies like YouGov and Google Consumer Surveys do.  The strength of our platform is in our ability to answer a broad number of specific questions explicitly, and with the context of alternative choices, using the list format.  Users can specifically agree (or disagree) that Breaking Bad is great for binge watching, that Kanye West is a douchebag, that being smart is an important life goal, or that intelligence is a turn-on, while also considering other options that they may not have considered.
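
To illustrate the kind of audience modeling mentioned above, here is a minimal post-stratification sketch: respondents are reweighted so the sample’s demographic mix matches known population targets. The groups, targets, and votes are all invented, and this is not Ranker’s actual model.

```python
# Sketch of audience modeling via simple post-stratification: reweight
# respondents so the sample's demographic mix matches population targets.
# Groups, targets, and votes are invented; not Ranker's actual model.
from collections import Counter

sample = [  # (age_group, voted_for_item)
    ("18-29", True), ("18-29", True), ("18-29", False),
    ("30+",   True), ("30+",   False),
]
population_share = {"18-29": 0.4, "30+": 0.6}  # assumed census-style targets

counts = Counter(group for group, _ in sample)
weights = {g: population_share[g] / (counts[g] / len(sample)) for g in counts}

weighted_yes = sum(weights[g] for g, voted in sample if voted)
weighted_total = sum(weights[g] for g, _ in sample)
print(f"weighted support: {weighted_yes / weighted_total:.1%}")  # vs. raw 60%
```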

In summary, no dataset gets “A”s across the board, and if I were running a company like Procter & Gamble and needed to understand public opinion, I would use all of these methods and triangulate among them, as there is something to be uniquely learned from each.  That being said, I agree with Nate Silver’s suggestion to Put Data Fidelity First, and am excited that Ranker continues to collect precise, explicit answers to increasingly diverse questions (e.g. the best Doritos flavors).  We are the only company that combines the precision of market research with the scale of internet polling methods, and so I’m hopeful, as our traffic continues to grow, that the value of our opinion data will continue to grow with it.

– Ravi Iyer

P.S. I welcome disagreement and thoughtful discussion, as I’m certain I have something to learn from others here and that there are things I could be missing.
