The Value of Opinion Datasets – Twitter vs. Facebook vs. Ranker vs. Market Research vs. ?

As Ranker’s Chief Data Scientist, I’ve been thinking a lot lately about how much a given opinion dataset is worth.  I’m not talking about the value of a specific dataset for answering a specific question, as that varies wildly depending on the question.  Rather, I’d like to consider broad datasets/categories of data that promise to satisfy the world’s thirst for opinion answers.  The existence of sites like Quora and Yahoo Answers, as well as the shift by many search engines from providing links to pages to directly answering questions, highlights the need for such data, as does the growing demand for opinion queries.  The future of services like Siri, Cortana, and Google Now is one where one’s questions about what to buy for one’s wife, where to eat, and what to watch on TV are answered directly, and to do that well, one needs the data to answer those questions.  Are the world’s data collection methodologies up to the task?

One reason I ask this is that there seems to be a misconception that large amounts of data can answer anything.  I’m a huge believer in reform in academia, but one thing my traditional, peer-review-oriented academic training has given me is an appreciation that this is not true.  Knowing which universities have more follows, likes, or mentions isn’t going to tell you which one has the best reputation.  Still, there certainly are advantages to scale, as well as depth.  The math behind both psychometrics and crowdsourcing tells me that no one dataset is likely to have the ultimate answers, as all data has error, and aggregating across that error, which is Nate Silver’s claim to fame, almost always produces the best answer.  So as I consider the below datasets, the true answer as to which you should use is “all of the above”.  That being said, I think it is helpful (at least for organizing my thinking) to consider the specific dimensions that each dataset does best.
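The point about aggregating across error can be illustrated with a small simulation. This is only a sketch under strong assumptions that are not from the post: each source reports the true value plus independent, unbiased Gaussian noise, and the function name `simulate` and all of its parameters are made up for the illustration. Real opinion datasets share biases, so the gain in practice is usually smaller than this.

```python
import random
import statistics


def simulate(true_value=50.0, n_sources=5, noise_sd=10.0, trials=2000, seed=0):
    """Compare the error of one noisy source with the error of the average.

    Assumption (for illustration only): every source equals the truth plus
    independent, zero-mean Gaussian noise with the same spread.
    """
    rng = random.Random(seed)
    single_errors, pooled_errors = [], []
    for _ in range(trials):
        estimates = [rng.gauss(true_value, noise_sd) for _ in range(n_sources)]
        # Error if we trusted only the first source.
        single_errors.append(abs(estimates[0] - true_value))
        # Error if we averaged all sources together.
        pooled_errors.append(abs(statistics.fmean(estimates) - true_value))
    return statistics.fmean(single_errors), statistics.fmean(pooled_errors)


single, pooled = simulate()
print(f"mean absolute error, one source:     {single:.2f}")
print(f"mean absolute error, average of all: {pooled:.2f}")
```

Under these assumptions the averaged estimate has a noticeably smaller error than any single source, which is the intuition behind using “all of the above”.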

Below I consider prominent datasets along four dimensions: sampling, scale, breadth, and measurement.  Sampling refers to how diverse and representative the set of users answering a question is.  Note that this isn’t essential in all cases (drug research that has life/death implications is almost always done on samples that are extremely limited/biased), and perfect sampling is almost impossible these days, such that even the best political polls rely on mathematical models to “fix” the results.  Such modeling requires Scale, which is important in that it helps one find non-linear patterns in data and prevents spurious conclusions from being reached.  Related to that is Breadth, as large datasets also tend to answer a larger number of questions.  Anyone can commission the definitive study of a single question at great expense, but that won’t help us with the many niche questions that exist (e.g. what is the best Barry Manilow song that I can play to woo my date?  What new TV show would my daughter like, given that she loves Elmo?).  Measurement might be the most important dimension of them all, as one can’t answer questions that one doesn’t ask well.

How do the most prominent datasets in the world fare along these dimensions?

Twitter – Sampling: C, Scale: B+, Breadth: A, Measurement: C

Twitter is great for breadth, which can be thought of not only in terms of the things talked about, which are infinite on Twitter, but also in terms of the range of emotions (e.g. the difference between awesome and cool can potentially be parsed).  There is also a lot of scale.  Unfortunately, Twitter users are hardly representative and people who tweet represent a specific group.  Measurement is very hard on Twitter as well, as there is very little context to a tweet, so one can’t tell if something is really popular or just highly promoted.  As well, natural language will always have ambiguity, especially in 140 characters (e.g. consider how many interpretations there are for a sentence like “we saw her duck”).

Facebook – Sampling: B, Scale: A, Breadth: B, Measurement: D

Facebook is ubiquitous and reaches a far more diverse audience than Twitter.  People provide data on all sorts of things about all sorts of topics too.  I bought their stock because I think their data is absolutely great, and I still think so.  Still, the ambiguity of a “like” (combined with the haphazard and ambiguous nature of relying on status updates) will mean that there will always be questions (e.g. how hated is something?  what do I think of a company’s individual products?  is this movie overrated?) that can’t be answered with Facebook.

Behavioral Targeting – Sampling: B-, Scale: A, Breadth: C, Measurement: D

Companies like Doubleclick (owned by Google) and Gravity track your web behavior and attempt to interpret information about you based on what you do online.  They can therefore infer relationships between almost anything on the web (e.g. Mad Men interests) based on web pages having common visitors.  Yet, the use of vague terms like “interest” highlights the fact that these relationships are highly suspect.  Anyone who has looked up what these companies think they know about them can clearly see that the error rates are fairly high, which makes sense when you consider the diverse reasons we all have for visiting any website.  This type of data has proven utility for improving ad response across large groups, where the law of large numbers means that some benefit will occur in using this data.  But I wouldn’t want to rely on it to truly understand public opinion.

Market Research – Sampling: B, Scale: D, Breadth: D, Measurement: A

Market research companies like Nielsen and GfK spend a lot of money to ask the right questions of the right people.  Measurement is clearly a strength, as market research companies can provide context and nuance to responses as needed, asking about specific opinions about specific items in the context of other similar items.  Yet, given that only ~10% of people will answer surveys when called, even the best sampling that money can buy will be imperfect.  Moreover, these methods do not scale, given the cost, and can only cover questions that clients will pay for, such that there is no way that such methods can power the diverse queries that will go to tomorrow’s answer engines.

Ranker – Sampling: B-, Scale: B-, Breadth: B+, Measurement: A

I work at Ranker largely because I believe in the power of our platform to uniquely answer questions, even if we don’t have the scale of larger sites like Twitter and Facebook…yet (we are growing and are among the top 200 websites now, per Quantcast).  Our sample is imperfect, as are all samples, including those of pollsters like Gallup, but our sample is generally representative of the internet given that we get lots of traffic from search, so we can model our audience in the same ways that companies like YouGov and Google Consumer Surveys do.  The strength of our platform is in our ability to answer a broad number of specific questions explicitly and with the context of alternative choices using the list format.  Users can specifically say whether they agree (or disagree) that Breaking Bad is great for binge watching, that Kanye West is a douchebag, that being smart is an important life goal, or that intelligence is a turn-on, while also considering other options that they may not have considered.

In summary, no dataset gets “A”s across the board, and if I were running a company like Procter & Gamble and needed to understand public opinion, I would use all of these methods and triangulate amongst them, as there is something to be uniquely learned from each.  That being said, I agree with Nate Silver’s suggestion to Put Data Fidelity First, and am excited that Ranker continues to collect precise, explicit answers to increasingly diverse questions (e.g. the best Doritos flavors).  We are the only company that combines the precision of market research with the scale of internet polling methods, and so I’m hopeful, as our traffic continues to grow, that the value of our opinion data will continue to grow with it.

- Ravi Iyer

P.S. I welcome disagreement and thoughtful discussion, as I’m certain I have something to learn from others here and that there are things I could be missing.

 

 

Happy Inanimate Objects, Dramatic Animals + Gamer Survival Kits

Reposted from this post on the Ranker Data Blog

60+ Everyday Objects That Look Really Happy
What could be nicer than finding a smiley face at the bottom of your coffee mug? How about living in a house that looks super excited to see you every time you come home? This is a simple gallery of pics of everyday things that look like they are happy.

Essential Products For a Gamer Survival Kit
No matter what, gamers gonna game! But for a gamer to play at peak performance levels, there are certain survival essentials for quests, campaigns, and candy crushes. Behold: everything you need to survive an intense gaming sesh.

The Worst Qualities in a Person
Let’s face it: not everyone is perfect. Even the most charming people are guilty of at least a few negative personality traits. But which ones are the worst?

Being a Student Is . . .
Being a student isn’t always as easy as it sounds. Sure, there are the parties, the booze, the moving away from home and living without your parents – but there’s also a whole lot of stress, worry, studying, and exams.

The 36 Most Dramatic Animals on the Internet
Drama isn’t just a human thing. As this list of GIFs proves, the animal kingdom is full of feisty creatures who have a flair for the dramatic.

Pretentious Words You Secretly Don’t Know How to Pronounce
What are some of the most overused, mispronounced words people use to try to sound smarter?

The Internet Remembers Robin Williams
And finally, we lost a beloved comedy legend this month. Robin Williams was the face, voice, and talent of a lot of our childhoods and many people took to social media to share touching memories of him. We’ve rounded up the best tributes and would like to invite fans to vote on their favorite Robin Williams movies. R.I.P. Genie.

That’s it! Stay in touch and we hope you’re having a great month!

The post Happy Inanimate Objects, Dramatic Animals + Gamer Survival Kits appeared first on The Ranker.com Blog.

Political Discrimination as Normative as Racial Divisions once were

Reposted from this post on the Civil Politics Blog

Once upon a time, it was socially normative for society to divide itself along racial lines.  Thankfully, that time has passed and while racism still exists, it is generally considered to be a bad thing by most people in society.  The same trajectory is occurring with respect to attitudes toward homosexuals, with increased acceptance being not only encouraged, but mandated as the right thing to do.  However, in many circles, it remains normative for individuals to discriminate against those with the opposite political views.  Recent research indicates that this occurs amongst both parties.

Despite ample research linking conservatism to discrimination and liberalism to tolerance, both groups may discriminate. In two studies, we investigated whether conservatives and liberals support discrimination against value violators, and whether liberals’ and conservatives’ values distinctly affect discrimination. Results demonstrated that liberals and conservatives supported discrimination against ideologically dissimilar groups, an effect mediated by perceptions of value violations. Liberals were more likely than conservatives to espouse egalitarianism and universalism, which attenuated their discrimination; whereas the conservatives’ value of traditionalism predicted more discrimination, and their value of self-reliance predicted less discrimination. This suggests liberals and conservatives are equally likely to discriminate against value violators, but liberal values may ameliorate discrimination more than conservative values.

In addition, recent research out of Stanford University indicates that “hostile feelings for the opposing party are ingrained or automatic in voters’ minds, and that affective polarization based on party is just as strong as polarization based on race.”  Tackling this at the societal level is a daunting task for anyone, but there are things that one can do at the individual level.  Both research and practice indicate that positive relationships between individuals across such divides are likely to ameliorate such feelings.  Mixing group boundaries is likely to make competition less salient as well, perhaps allowing superordinate goals that we all share to come to the fore, as often happens when national emergencies strike.  Just as with discrimination based on race and sexual orientation, discrimination against opposing ideologies can be combated with similar techniques.

- Ravi Iyer

 

Ranker World Cup Predictions Outperform Betfair & FiveThirtyEight

Reposted from this post on the Ranker Data Blog

Former England international player turned broadcaster Gary Lineker famously said “Football is a simple game; 22 men chase a ball for 90 minutes and at the end, the Germans always win.” That proved true for the 2014 World Cup, with a late German goal securing a 1-0 win over Argentina.

Towards the end of March, we posted predictions for the final ordering of teams in the World Cup, based on Ranker’s re-ranks and voting data. During the tournament, we posted an update, including comparisons with predictions made by FiveThirtyEight and Betfair. With the dust settled in Brazil (and the fireworks in Berlin shelved), it is time to do a final evaluation.

Our prediction was a little different from many others, in that we tried to predict the entire final ordering of all 32 teams. This is different from sites like Betfair, which provided an ordering in terms of the predicted probability each team would be the overall winner. In order to assess our order against the true final result, we used a standard statistical measure called partial tau. It is basically an error measure (0 would be a perfect prediction, and the larger the value grows the worse the prediction) based on how many “swaps” of a predicted order need to be made to arrive at the true order. The “partial” part of partial tau allows for the fact that the final result of the tournament is not a strict ordering. While the final and 3rd place play-off determined the order of the first four teams (Germany, Argentina, the Netherlands, and Brazil), other groups of teams are effectively tied from then on.  All of the teams eliminated in the quarter finals can be regarded as having finished in equal fifth place. All of the teams eliminated in the first game past the group stage finished equal sixth. And all 16 of the teams eliminated in group play finished equal last.
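The swap-counting idea can be sketched in a few lines. The post does not give the exact formula used, so the version below is an assumption: it counts every pair of teams that the prediction orders one way while the true result strictly orders the other way, and it charges nothing for pairs that are tied in the true result. The function name `partial_tau_distance`, the tier numbers, and the placeholder teams "Team E" and "Team F" are hypothetical.

```python
from itertools import combinations


def partial_tau_distance(predicted, true_tier):
    """Count team pairs that the prediction orders against the true result.

    predicted: list of teams, best first (a strict predicted order).
    true_tier: dict mapping team -> finishing tier (1 = best); teams that
               are effectively tied share the same tier number.
    Pairs tied in the true result cost nothing, whichever way they are predicted.
    """
    swaps = 0
    # combinations() keeps list order, so `a` is always predicted ahead of `b`.
    for a, b in combinations(predicted, 2):
        if true_tier[a] > true_tier[b]:  # ...but b actually finished in a better tier
            swaps += 1
    return swaps


# True result: the top four are strictly ordered; two placeholder teams tie for fifth.
true_tier = {
    "Germany": 1, "Argentina": 2, "Netherlands": 3, "Brazil": 4,
    "Team E": 5, "Team F": 5,
}

predicted = ["Brazil", "Germany", "Argentina", "Team F", "Netherlands", "Team E"]
print(partial_tau_distance(predicted, true_tier))  # 4 misordered pairs
```

Here Brazil is wrongly placed ahead of three teams and "Team F" ahead of the Netherlands, giving a distance of 4, while the relative order of the two tied teams does not matter. A prediction consistent with every tier would score 0.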

The model we used to make our predictions involved three sources of information. The first was the ranks and re-ranks provided by users. The second was the up and down votes provided by users. The third was the bracket structure of the tournament itself. As we emphasized in our original post, the initial group stage structure of the World Cup provides strong constraints on where teams can and cannot finish in the final order. Thus, we were interested to test how our model predictions depended on each source of information. This led to a total of 8 separate models:

  • Random: Using no information, but just placing all 32 teams in a random order.
  • Bracket: Using no information beyond the bracket structure, placing all the teams in an order that was a possible finish, but treating each game as a coin toss.
  • Rank: Using just the ranking data.
  • Vote: Using just the voting data.
  • Rank+Vote: Using the ranking and voting data, but not the bracket structure.
  • Bracket+Vote: Using the voting data and bracket structure, but not the ranking data.
  • Bracket+Rank: Using the ranking data and bracket structure, but not the voting data.
  • Rank+Vote+Bracket: Using all of the information, as per the predictions made in our March blog post.
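The eight models listed above correspond to every possible subset of the three information sources (2 × 2 × 2 = 8), with the empty subset being the Random baseline. A minimal sketch of that enumeration follows; the function name `model_names` is made up for the illustration, and the order in which sources are joined in a label is arbitrary, so the full model prints as "Bracket+Rank+Vote" rather than "Rank+Vote+Bracket".

```python
from itertools import combinations

SOURCES = ("Bracket", "Rank", "Vote")


def model_names(sources=SOURCES):
    """Return one label per subset of the information sources."""
    names = []
    for size in range(len(sources) + 1):
        for subset in combinations(sources, size):
            # The empty subset uses no information at all: the Random model.
            names.append("+".join(subset) if subset else "Random")
    return names


names = model_names()
print(len(names))
print(names)
```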

We also considered the Betfair and FiveThirtyEight rankings, as well as the Ranker Ultimate List at the start of the tournament, as interesting (but maybe slightly unfair, given their different goals) comparisons. The partial taus for all these predictions, with those based on less information on the left, and those based on more information on the right, are shown in the graph below. Remember, lower is better.

The prediction we made using the votes, ranks, and bracket structure out-performed Betfair, FiveThirtyEight, and the Ranker Ultimate List. This is almost certainly because of the use of the bracket information. Interestingly, just using the ranking and bracket structure information, but not the votes, resulted in a slightly better prediction. It seems as if our modeling needs to improve how it benefits from using both ranking and voting data. The Rank+Vote prediction was worse than either source alone. It is also interesting to note that the Bracket information by itself is not useful (it performs almost as poorly as a random order), but it is powerful when combined with people’s opinions, as the improvements from Rank to Bracket+Rank and from Vote to Bracket+Vote show.

The post Ranker World Cup Predictions Outperform Betfair & FiveThirtyEight appeared first on The Ranker.com Blog.

Selfies at Funerals, Genius Shower Thoughts + Inadvertently Hilarious Kids

Reposted from this post on the Ranker Data Blog

We’ve been having a blast reading about all of the ridiculous and thought provoking things you’ve been ranking this summer. Some of these are too good not to share. Check ‘em out!

49 Stellar Cosplay Costumes From This Year’s Comic Con
Comic Con was awesome this year! The best costumes we saw showcased some serious attention to detail (and blatant disregard for heat and comfort). Check ‘em out and rank accordingly!

50 Incredible Pictures That Just Might Teach You Something
This photo gallery includes pictures of natural phenomena, manmade things, the goings-on inside our own bodies, and tons of other cool sh*t that might even teach you a thing or two. The universe is pretty amazing. Let’s look at it together!

50+ Signs That Will Definitely Make You Giggle
These funny signs range from the whimsical, to the witty, to the downright stupid. In an age where image is everything, you would think people would be more careful with their signage.

34 Things Every Man Should Know. Seriously, Take Notes.
Dudes. Guys. MEN. When it comes to the male gender, there are certain things all guys simply must know. Whether it’s for your own personal safety or to not look like an idiot in public, take a moment to learn these things.

Kids Who Answered Wrong, But Deserve An “A” For Effort
Kids these days, can’t live with ‘em, can’t teach ‘em anything because the Internet shows them that they can be a smartass instead of submitting real quiz answers.

This is Real: Selfies at Funerals Are Officially a Thing
Of all the occasions to commemorate with a selfie, it would seem funerals aren’t exactly the most appropriate. But these selfie lovers don’t seem to mind. On many occasions, their dearly departed is even in the photo with them!

The Greatest Shower Thoughts Ever Thought
We (most of us) bathe in quiet solitude, with neither friends nor social media to entertain us lest we get our devices wet and ruin them. Amidst all that lathering and rinsing, the mind wanders, and for the duration of each shower, anything is possible.

Yesterday’s Technology With Today’s Prices
Ever wonder how much it would cost to buy a Gameboy if it was released today? Or what people were paying for the privilege of having a cell phone when they first came out? This may stop you from complaining for a while.

The post Selfies at Funerals, Genius Shower Thoughts + Inadvertently Hilarious Kids appeared first on The Ranker.com Blog.

Overcoming The Psychological Barriers to Combining Realism with Idealism

Reposted from this post on the Civil Politics Blog

I was recently forwarded this thoughtful article by Peter Wehner, from Commentary Magazine, that talks about the need for people to appreciate the importance of idealism in striving for policy goals as well as the realism of compromise with others who also have valid parts of the truth.  From the article:

Politics is an inherently messy business. Moreover, the American founders–who developed the concepts of checks and balances, separation of powers, and all the rest–wanted politics to be messy. …

Too often these days, zealous people who are in a hurry don’t appreciate that the process and methods of politics–the “messy,” muddling through side of politics–is a moral achievement of sorts. But this, too, is only part of the story.

The other part of the story is that justice is often advanced by people who are seized with a moral vision. They don’t much care about the prosaic side of governing; they simply want society to be better, more decent, and more respectful of human dignity. So yes, it’s important not to make the perfect the enemy of the good. But it’s also the case that politics requires us to strive for certain (unattainable) ideals….

What happens all too often in our politics is that people who are drawn to one tend to look with disdain on those who are drawn to the other. What we need, I think, is greater recognition that both are necessary, that each one alone is insufficient. Visionaries have to find a way to give their vision concrete expression, which requires deal-making, compromise, and accepting something less than the ideal. Legislators need to govern with some commitment to philosophical and moral ideals; otherwise, they’re just passing laws and cutting deals for their own sake.

Unfortunately, moral conviction is often negatively correlated with appreciating the need for compromise.  How then can we combine realism with idealism?  We here at CivilPolitics are actively supporting research to help understand how to remove these barriers to groups coming together despite moral disagreements and welcome contributions from academics who have good ideas.  Some ideas that have support in the research include improving the personal relationships between groups and introducing super-ordinate goals where moral agreement can occur.  In future months, we’ll be highlighting other recommendations along these lines to help combine realism with idealism.

- Ravi Iyer

 

 

 

CivilPolitics.org comments on Hollande’s Political Strategy for BBC World

Reposted from this post on the Civil Politics Blog

Earlier today, I appeared on BBC World’s Business Edition to comment on Francois Hollande’s efforts to unite union and business interests in working to improve the lagging French economy.  I provided the same advice that I often do to groups that are looking to leverage the more robust findings from social science in conflict resolution, specifically that rational arguments only get you so far and that real progress is often made when our emotions are pushing us toward progress, as opposed to working against us.  Accordingly, it often is better to try to get the relationships working first, in the hopes that that opens minds for agreement on factual issues.  As well, it is often helpful to emphasize super-ordinate goals, such as improving the economy as a whole in this case, as opposed to competitive goals such as hiring mandates.  Lastly, hopefully Hollande, as a socialist who is fighting for business interests, can help muddy the group boundaries that can make conflicts more intractable, providing an example of someone who is indeed focused on shared goals.

Below is the segment, and my appearance is about 2 minutes into the video.

- Ravi Iyer

Ridiculously Good Looking Celebs Are Licking Popsicles, It Must Be Summer!

Reposted from this post on the Ranker Data Blog

The Best Internet Reactions to Jeremy Meeks’s Sexy Mug Shot
Sexy mugshot photos reached a new high with the booking of the ridiculously hot felon Jeremy Meeks. If you somehow missed this gem of a story last week, you need to check these out.

Who Will Win The 2014 World Cup?
It’s official: the Ranker office has World Cup fever. And so do all of you apparently! Thousands have voted on who they think will win. You may be surprised at who’s currently on top.

24 Crazy Sexy Photos of Celebrities Eating Popsicles
It’s finally summer! When it’s hot, here’s the best possible way to cool off: with pictures of ridiculously great looking celebrities sucking down on cold, delicious, refreshing popsicles.

What Guys REALLY Talk About On Boys’ Night Out
Hint: it’s sex. Also, sex.

73 Rare Photos From Behind the Scenes of Star Wars
These leaked photos capture images of Yoda before he was finished, the building of the actual, on-set Millennium Falcon and, of course, the entire cast flirting with Princess Leia.

11 Wedding Themes That Are Just a Bad Idea
You’d assume that if someone is that big of a Game of Thrones fan, they’d know what happens at weddings, right?

The 47 Greatest Pun-tastic Restaurant Names
It doesn’t make any sense, but food that comes from restaurants with funny names always tastes better. Pulled pork sandwich from KFC? Gross. Pulled pork sandwich from Forrest Rump? Awesome.

The 26 Craziest 2014 World Cup Hair Cuts
Bonus: here is the single most important World Cup ranking you will see all day.

The post Ridiculously Good Looking Celebs Are Licking Popsicles, It Must Be Summer! appeared first on The Ranker.com Blog.

Pew Research highlights Social, Political and Moral Polarization among Partisans, but more people are still Moderates

Reposted from this post on the Civil Politics Blog

A recent research study by Pew highlights societal trends that have a lot of people worried about the future of our country.  While many people have highlighted the political polarization that exists and others have pointed to the social and psychological trends underlying that polarization, Pew’s research report is unique for the scope of findings across political, social, and moral attitudes.  Some of the highlights of the report include:

  • Based on a scale of 10 political attitude questions, such as a binary choice between the statements “Government is almost always wasteful and inefficient” and “Government often does a better job than people give it credit for”, the median Democrat’s and median Republican’s attitudes are further apart than in 2004 and 1994.
  • On the above ideological survey, fewer people, whether Democrat, Republican, or independent, are in the middle compared to 1994 and 2004.  Though it is still worth noting that a plurality, 39%, is in the middle fifth of the survey.
  • More people on each side see the opposing group as a “threat to the nation’s well being”.
  • Those on the extreme left or the extreme right of the ideological survey are more likely to have close friends who agree with them and to live in a community with people who agree with them.

 

The study is an important snapshot of current society and clearly illustrates that polarization is getting worse, with the social and moral consequences that moral psychology research would predict when attitudes become moralized.  That being said, I think it is important not to lose sight of the below graph from their study.

 

Pew Survey Shows a Shrinking Plurality holds Moderate Views

 

Specifically, while there certainly is a trend toward moralization and partisanship, the majority of people are in the middle of the above distributions of political attitudes and hold  mixed opinions about political attitudes.  It is important that those of us who study polarization don’t exacerbate perceived differences, as research has shown that perceptions of differences can become reality.  Most Americans (79%!) still fall somewhere between having consistently liberal and consistently conservative attitudes on political issues, according to Pew’s research.  And even amongst those on the ends of this spectrum, 37% of conservatives and 51% of liberals have close friends who disagree with them.  Compromise between parties is still the preference of most of the electorate.  If those of us who hold a mixed set of attitudes can indeed make our views more prominent, thereby reducing the salience of group boundaries, research would suggest that this would indeed mitigate this alarming trend toward social, moral, and political polarization.

- Ravi Iyer

Comparing World Cup Prediction Algorithms – Ranker vs. FiveThirtyEight

Reposted from this post on the Ranker Data Blog

Like most Americans, I pay attention to soccer/football once every four years.  But I think about prediction almost daily and so this year’s World Cup will be especially interesting to me as I have a dog in this fight.  Specifically, UC-Irvine Professor Michael Lee put together a prediction model based on the combined wisdom of Ranker users who voted on our Who will win the 2014 World Cup list, plus the structure of the tournament itself.  The methodology runs in contrast to the FiveThirtyEight model, which uses entirely different data (national team results plus the results of players who will be playing for the national team in league play) to make predictions.  As such, the battle lines are clearly drawn.  Will the Wisdom of Crowds outperform algorithmic analyses based on match results?  Or a better way of putting it might be that this is a test of whether human beings notice things that aren’t picked up in the box scores and statistics that form the core of FiveThirtyEight’s predictions or sabermetrics.

So who will I be rooting for?  Both methodologies agree that Brazil, Germany, Argentina, and Spain are the teams to beat.  But the crowds believe that those four teams are relatively evenly matched while the FiveThirtyEight statistical model puts Brazil as having a 45% chance to win.  After those first four, the models diverge quite a bit with the crowd picking the Netherlands, Italy, and Portugal amongst the next few (both models agree on Colombia), while the FiveThirtyEight model picks Chile, France, and Uruguay.  Accordingly, I’ll be rooting for the Netherlands, Italy, and Portugal and against Chile, France, and Uruguay.

In truth, the best model would combine the signal from both methodologies, similar to how the Netflix prize was won or how baseball teams combine scout and sabermetric opinions.  I’m pretty sure that Nate Silver would agree that his model would be improved by adding our data (or similar data from betting markets that similarly think that FiveThirtyEight is underrating Italy and Portugal) and vice versa.  Still, even as I know that chance will play a big part in the outcome, I’m hoping Ranker data wins in this year’s World Cup.
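One simple way to combine two forecasts is a weighted average of their win probabilities. The sketch below is only an illustration of that idea, not how either model or the Netflix prize winners actually blended predictions. The function name `blend` and the equal weighting are assumptions, and apart from the 45% figure for Brazil quoted above, every probability is a made-up placeholder.

```python
def blend(forecast_a, forecast_b, weight_a=0.5):
    """Linearly pool two win-probability forecasts and renormalize to sum to 1."""
    teams = set(forecast_a) | set(forecast_b)
    mixed = {
        team: weight_a * forecast_a.get(team, 0.0)
        + (1.0 - weight_a) * forecast_b.get(team, 0.0)
        for team in teams
    }
    total = sum(mixed.values())
    return {team: p / total for team, p in mixed.items()}


# Hypothetical inputs: only Brazil's 0.45 in the statistical model comes from the post.
statistical = {"Brazil": 0.45, "Argentina": 0.13, "Germany": 0.11, "Spain": 0.08, "Other": 0.23}
crowd = {"Brazil": 0.20, "Germany": 0.19, "Argentina": 0.18, "Spain": 0.18, "Other": 0.25}

blended = blend(statistical, crowd)
for team, p in sorted(blended.items(), key=lambda item: -item[1]):
    print(f"{team:10s} {p:.3f}")
```

With equal weights the blended forecast sits halfway between the two inputs, so a team that one model favors heavily and the other does not ends up with a more moderate probability. In practice the weight would be tuned against past results rather than fixed at 0.5.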

– Ravi Iyer

Ranker’s Pre-Tournament Predictions:

FiveThirtyEight’s Pre-Tournament Predictions:

The post Comparing World Cup Prediction Algorithms – Ranker vs. FiveThirtyEight appeared first on The Ranker.com Blog.
