What Psychologists Could Learn From Data Science About Exploratory Research

I recently attended the main conference for social psychologists, even as I’m slowly transitioning to thinking of myself less as an academic and more as a data scientist.  Of course, “data science” is a pretty poor term, since all science has to do with data, but I think it serves a purpose: there are methods for answering questions with data that work across domains, regardless of where the data was collected.  There is no real reason why a person well trained in understanding and analyzing data can’t apply their techniques to medical data, sports data, psychological data, and online data.  In fact, research on the wisdom of crowds suggests that any discipline would benefit from analyzing data in different ways, since colleagues within a field are likely to make correlated errors in understanding anything.  This is certainly true in social psychology, where a common error has been the undervaluing of exploratory research.

To our credit, social psychologists are beginning to understand this.  Many years after Paul Rozin formally published a great article concerning the need for more diverse ways of researching questions, psychologists are starting to accept the idea that exploratory research has value alongside the experimental methods that are so popular.  Below is a picture from one of several such talks given.

photo

It’s great that psychologists are willing to consider exploratory approaches.  However, I don’t think we need to pretend that we are starting from scratch.  It seems like many psychologists want to simply let people fiddle with data in the haphazard ways they have been doing, label it exploratory, and then get on with “real” (confirmatory) research.  This is an area where data science, with its emphasis on how to automatically and efficiently extract well-supported insights from large datasets, has a big head start.  What can data science offer psychologists?

- More efficient exploration.  Running haphazard regressions until you find a good model is inefficient for a number of reasons.  It takes a lot of human effort, and when you do find something, you have no real way to reproduce the procedure that led you to that result on a subsequent dataset.  To put it in more practical terms, every psychologist who wants to run exploratory regressions should at least understand GLMnet (details of which I’ll put in a future post).

- Cross-validated exploration.  Data scientists have given a lot of thought to how to be more confident that a result is real when one is testing so many hypotheses that one is bound to find something by chance.  Cross-validation is not a cure-all, but then again, neither are relatively artificial lab studies.  Certainly a cross-validated exploratory finding is more likely to be true than a non-cross-validated one.  Broadly, just as well-designed experiments provide stronger evidence than poorly designed ones, some explorations provide stronger evidence than others.  Of course, this last sentence will completely confound those who insist that journals can only publish “true” findings supported by p<.05 statistics, which leads me to my last point.

- Bayesian models of findings.  There was a ton of talk about the problem of false positives, but the entrenched interests of the journal system (IMHO) inhibit the paradigm shift that is needed, which is to think of findings and papers as evidence rather than truth.  Good publications are not true…they are merely stronger evidence.  And rejected publications are rarely worthless; they may be weaker evidence, or may simply shift prior beliefs less.  Setting a high bar for publication is great for creating a tournament for job seekers.  But it’s a terrible way to find truth in an age where data and research are ubiquitous.  If you want to read a more detailed argument about this, I’d recommend Nate Silver’s book, The Signal and the Noise.
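
The first two points can be made concrete.  glmnet is an R package that fits lasso and elastic-net regressions by coordinate descent, and its cv.glmnet function chooses the penalty by cross-validation.  The sketch below is not glmnet; it is a minimal pure-Python illustration of the same two ingredients, assuming centered, roughly standardized predictors and no intercept.  Because the whole search is one function call with a fixed seed and a fixed grid of penalties, it can be rerun unchanged on a new dataset, which a haphazard sequence of hand-picked regressions cannot offer.  The function names and the simulated data are illustrative assumptions.

```python
import random


def soft_threshold(z, gamma):
    """Shrink z toward zero by gamma (the lasso's L1 soft-thresholding operator)."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_fit(X, y, lam, max_iter=500, tol=1e-8):
    """Lasso by cyclic coordinate descent, minimizing
    (1 / 2n) * sum((y - X b)^2) + lam * sum(|b_j|).
    Assumes X columns and y are roughly centered, so no intercept is fit."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # y - X @ beta, with beta starting at zero
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(max_iter):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of feature j with the residual that ignores feature j.
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new_bj
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return beta


def cv_lasso(X, y, lambdas, k=5, seed=0):
    """Choose lam by k-fold cross-validated squared error, then refit on all data."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    cv_error = {}
    for lam in lambdas:
        sse = 0.0
        for fold in folds:
            held_out = set(fold)
            train = [i for i in idx if i not in held_out]
            beta = lasso_fit([X[i] for i in train], [y[i] for i in train], lam)
            for i in fold:  # score only on data the fit never saw
                pred = sum(b * x for b, x in zip(beta, X[i]))
                sse += (y[i] - pred) ** 2
        cv_error[lam] = sse / len(y)
    best = min(cv_error, key=cv_error.get)
    return best, lasso_fit(X, y, best), cv_error


if __name__ == "__main__":
    # Simulated data: only feature 0 matters; features 1-4 are pure noise.
    rng = random.Random(1)
    X = [[rng.gauss(0, 1) for _ in range(5)] for _ in range(100)]
    y = [2 * row[0] + rng.gauss(0, 0.5) for row in X]
    best, beta, _ = cv_lasso(X, y, [0.01, 0.05, 0.1, 0.2, 0.5, 1.0])
    print("chosen lambda:", best)
    print("coefficients:", [round(b, 2) for b in beta])
```

In practice one would use glmnet itself (or an equivalent library) rather than this loop; the point of the sketch is that the exploration procedure, including how the penalty was chosen, is written down and repeatable.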
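
The “findings as evidence” view can be expressed with Bayes’ rule in odds form: posterior odds = prior odds × likelihood ratio.  A minimal sketch, assuming a study can be summarized as a binary significant/non-significant outcome with known power and alpha (real studies carry more information than that), and using a hypothetical prior of 10%:

```python
def posterior_probability(prior, power, alpha, significant):
    """Update P(effect is real) after one study, treating the study as a
    binary signal (significant / not significant).

    Likelihood ratio for a significant result:     power / alpha
    Likelihood ratio for a non-significant result: (1 - power) / (1 - alpha)
    """
    if not (0 < prior < 1 and 0 < power < 1 and 0 < alpha < 1):
        raise ValueError("prior, power and alpha must be strictly between 0 and 1")
    prior_odds = prior / (1 - prior)
    if significant:
        likelihood_ratio = power / alpha
    else:
        likelihood_ratio = (1 - power) / (1 - alpha)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)


if __name__ == "__main__":
    prior = 0.10  # hypothetical prior belief that the effect is real
    after_one = posterior_probability(prior, power=0.8, alpha=0.05, significant=True)
    after_two = posterior_probability(after_one, power=0.8, alpha=0.05, significant=True)
    after_null = posterior_probability(prior, power=0.8, alpha=0.05, significant=False)
    print(f"one significant study:   {after_one:.3f}")   # 0.640
    print(f"two significant studies: {after_two:.3f}")   # 0.966
    print(f"one null result:         {after_null:.3f}")  # 0.023
```

Under these assumptions, one significant study moves a 10% prior to 64%, a replication moves it to about 97%, and a null result moves it down to about 2%.  No single paper settles the question, and the null result is not worthless: it is evidence too.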

There are some things that social psychologists are really good at.  They understand experimental methods and can critique them really well.  They understand measurement much better than most disciplines.  But other disciplines do some things with data much better, such as exploration.  The banner of data science presents the opportunity to break down these barriers.  The social psychologist can help the Google engineer design the perfect study to validate the results of their latest machine learning algorithm; the political scientist can help the social psychologist with representative sampling; and the Google engineer can help the political scientist explore the latest national survey far more efficiently and then mash up that data with more ecologically valid social media behavior.  In the end, there really isn’t a huge need for disciplinarity in an age of big data (a theme of Jamie Pennebaker’s presidential address at SPSP).  It actually gets in the way of us all being data scientists.

- Ravi Iyer

Ranker’s Rankings API Now in Beta

Reposted from this post on the Ranker Data Blog

Increasingly, people are looking for specific answers to questions, as opposed to webpages that happen to match the text they type into a search engine.  For example, if you search for the capital of France or the birthdate of Leonardo Da Vinci, you get a specific answer.  However, the questions people ask are increasingly about opinions rather than facts; people are understandably more interested in what the best movie of 2013 was than in who produced Star Trek: Into Darkness.

Enter Ranker’s Rankings API, which is currently in beta, as we’d love input from potential users of our API to help improve it.  Our API returns aggregated opinions about specific movies, people, TV shows, places, etc.  As input, it can take a Wikipedia, Freebase, or Ranker ID.  Requests go to http://api.ranker.com/rankings/ with two URL parameters: “type” (FREEBASE, WIKIPEDIA, or RANKER, depending on the type of ID sent) and “id” (the specific Wikipedia, Freebase, or Ranker ID).  The API returns JSON by default.  For example, below are requests for information about Tom Cruise, using each of these IDs.

http://api.ranker.com/rankings/?id=/m/07r1h&type=FREEBASE
http://api.ranker.com/rankings/?id=2257588&type=RANKER
http://api.ranker.com/rankings/?id=31460&type=WIKIPEDIA (look for wgArticleId in the source of any Wikipedia page to get a Wikipedia ID)
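
A minimal Python sketch of building these requests from an ID and an ID type, assuming only what is described above (the endpoint plus the “id” and “type” URL parameters).  The helper names are hypothetical, and since this describes a 2013 beta, the endpoint may no longer respond; the fetch function is included for completeness but is not needed to build the URLs.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint and parameter names as described in this (beta) post; the
# service may have changed or been retired since.
RANKINGS_ENDPOINT = "http://api.ranker.com/rankings/"
ID_TYPES = {"FREEBASE", "WIKIPEDIA", "RANKER"}


def build_rankings_url(entity_id, id_type):
    """Build a Rankings API request URL for a Freebase, Wikipedia, or Ranker ID."""
    id_type = id_type.upper()
    if id_type not in ID_TYPES:
        raise ValueError(f"type must be one of {sorted(ID_TYPES)}, got {id_type!r}")
    # safe="/" keeps Freebase IDs such as /m/07r1h unescaped, as in the examples above.
    query = urlencode({"id": str(entity_id), "type": id_type}, safe="/")
    return f"{RANKINGS_ENDPOINT}?{query}"


def fetch_rankings(entity_id, id_type, timeout=10):
    """Request the URL and decode the JSON list of rankings (needs network access)."""
    with urlopen(build_rankings_url(entity_id, id_type), timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    print(build_rankings_url("/m/07r1h", "FREEBASE"))
    print(build_rankings_url(2257588, "RANKER"))
    print(build_rankings_url(31460, "WIKIPEDIA"))
```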

The response to this request contains a set of rankings for the requested object.  Each ranking includes:

- the list name (e.g. "listName": "The Greatest 80s Teen Stars")
- the list URL (e.g. "listUrl": "http://www.ranker.com/crowdranked-list/45-greatest-80_s-teen-stars"; note that the domain, www.ranker.com, is implied)
- the item name (e.g. "itemName": "Tom Cruise")
- the position of the item on the list (e.g. "position": 21)
- the number of items on the list (e.g. "numItemsOnList": 70)
- the number of people who have voted on the list (e.g. "numVoters": 1149)
- the number of positive votes for the item (e.g. "numUpVotes": 245) vs. the number of negative votes (e.g. "numDownVotes": 169)
- the Ranker list ID (e.g. "listId": 584305)

Note that results are cached, so they may not match the current page exactly.

Here is a snippet of the response for Tom Cruise.

[ { "itemName" : "Tom Cruise",
"listId" : 346881,
"listName" : "The Greatest Film Actors & Actresses of All Time",
"listUrl" : "http://www.ranker.com/crowdranked-list/the-greatest-film-actors-and-actresses-of-all-time",
"numDownVotes" : 306,
"numItemsOnList" : 524,
"numUpVotes" : 285,
"numVoters" : 5305,
"position" : 85
},
{ "itemName" : "Tom Cruise",
"listId" : 542455,
"listName" : "The Hottest Male Celebrities",
"listUrl" : "http://www.ranker.com/crowdranked-list/hottest-male-celebrities",
"numDownVotes" : 175,
"numItemsOnList" : 171,
"numUpVotes" : 86,
"numVoters" : 1937,
"position" : 63
},
{ "itemName" : "Tom Cruise",
"listId" : 679173,
"listName" : "The Best Actors in Film History",
"listUrl" : "http://www.ranker.com/crowdranked-list/best-actors",
"numDownVotes" : 151,
"numItemsOnList" : 272,
"numUpVotes" : 124,
"numVoters" : 1507,
"position" : 102
}

...CLIPPED....
]

What can you do with this API?  Consider this page about Tom Cruise from Google’s Knowledge Graph.  It tells you his children, his spouse(s), and his movies.  But our API will tell you that he is one of the hottest male celebrities, an annoying A-List actor, an action star, a short actor, and an 80s teen star.  His name comes up in discussions of great actors, but he tends to get more downvotes than upvotes on such lists, and even shows up on lists of “overrated” actors.
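
One way to turn these records into the kind of summary described above is to normalize each list position by list length and compute the share of up-votes.  The sketch below uses the three records from the clipped response shown earlier; relativePosition and upVoteShare are derived quantities introduced here for illustration, not fields returned by the API.

```python
import json

# The three records from the (clipped) Tom Cruise response shown above.
SAMPLE_RESPONSE = """
[
  {"itemName": "Tom Cruise", "listId": 346881,
   "listName": "The Greatest Film Actors & Actresses of All Time",
   "listUrl": "http://www.ranker.com/crowdranked-list/the-greatest-film-actors-and-actresses-of-all-time",
   "numDownVotes": 306, "numItemsOnList": 524, "numUpVotes": 285,
   "numVoters": 5305, "position": 85},
  {"itemName": "Tom Cruise", "listId": 542455,
   "listName": "The Hottest Male Celebrities",
   "listUrl": "http://www.ranker.com/crowdranked-list/hottest-male-celebrities",
   "numDownVotes": 175, "numItemsOnList": 171, "numUpVotes": 86,
   "numVoters": 1937, "position": 63},
  {"itemName": "Tom Cruise", "listId": 679173,
   "listName": "The Best Actors in Film History",
   "listUrl": "http://www.ranker.com/crowdranked-list/best-actors",
   "numDownVotes": 151, "numItemsOnList": 272, "numUpVotes": 124,
   "numVoters": 1507, "position": 102}
]
"""


def summarize_rankings(rankings):
    """Turn raw ranking records into comparable per-list summaries, sorted by
    relative position (near 0 = top of the list, 1 = bottom)."""
    summaries = []
    for r in rankings:
        votes = r["numUpVotes"] + r["numDownVotes"]
        summaries.append({
            "listName": r["listName"],
            "relativePosition": r["position"] / r["numItemsOnList"],
            "upVoteShare": r["numUpVotes"] / votes if votes else None,
        })
    return sorted(summaries, key=lambda s: s["relativePosition"])


if __name__ == "__main__":
    for s in summarize_rankings(json.loads(SAMPLE_RESPONSE)):
        share = "n/a" if s["upVoteShare"] is None else f"{s['upVoteShare']:.0%}"
        print(f"{s['listName']}: top {s['relativePosition']:.0%} of list, {share} up-votes")
```

On these three lists, every up-vote share is below 50%, which is the “more downvotes than upvotes” pattern described above.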

We can provide this information not just about actors, but also about politicians, books, places, movies, TV shows, bands, athletes, colleges, brands, food, beer, and more.  For now, we tend to have more information about entertainment-related categories, but as the domains of our lists grow, so too will the breadth of opinion-related information available from our API.

Our API is free and no registration is required, though we would ask that you provide links and attributions to the Ranker lists that provide this data.  We will likely add some free registration at some point.  There are currently no formal rate limits, though there are obviously practical limits, so please contact us if you plan to use the API heavily, as we may need to make changes to accommodate such usage.  Please do let me know (ravi a t ranker) about your experiences with our API and any suggestions for improvements, as we are definitely looking to improve upon our beta offering.

- Ravi Iyer

Creating Shared Goals Using The Asteroids Club Paradigm

Reposted from this post on the Civil Politics Blog

One of the most general and robust findings in social psychology is the power of situations to shape behavior.  For example, if you are in a situation where you are competing with others, you will tend to dislike them, whereas when you are cooperating with them, you will tend to like them.  This is relatively intuitive, yet we often fail to appreciate this in practice, and then we end up amazed when arbitrary groups put in competition end up in deep conflict.  If artificially created competitions can inflame divisions (e.g. sports fandom usually pits very similar people against each other), perhaps we can also manufacture cooperation to reduce division.
 
Jonathan Haidt (a director of CivilPolitics) conceived of the idea of The Asteroids Club with this in mind and the idea is currently being incubated by To The Village Square, a non-profit dedicated to improving political dialogue.  Below is an excerpt from an op-ed by Haidt in The Tallahassee Democrat:

Partisanship is not a bad thing. We need multiple teams developing multiple competing visions for the voters to choose among. But when our political system loses the ability for national interest to come before party interest, we’ve crossed over into hyper-partisanship. And that’s a very bad thing, because it paralyzes us in the face of so many impending threats.

What can we do about this? How can we free ourselves and our leaders from hyper-partisanship, and return to plain old partisanship? By joining the Asteroids Club! It’s a club for all Americans who are willing to grant that the other side sees some real threats more acutely than their own side does. It’s a concept developed with Tallahassee’s Village Square, which is hosting a series of Asteroids Club Dinner at the Square programs this year.

Asteroids Clubs would never hold debates. Debates often increase polarization. Rather, a local Asteroids Club would hold telescope parties in which members help each other to see approaching asteroids — one from each side — that they hadn’t really noticed before. Telescope parties would harness the awesome power of reciprocity. If we acknowledge your asteroid, will you acknowledge ours?

So come on, people! Dozens of asteroids are closer to impact than they were yesterday. Don’t wait for Washington to fix itself. Let’s just start working together, and if we can do it, it will be easier for Washington to follow our example. The alternative is for us to follow theirs.

If you are in the Tallahassee area, consider joining the event on Tuesday, January 14, 2014 from 5:30 to 7:30pm (more info at www.tothevillagesquare.org).  At Civil Politics, we plan to support the work of such groups by giving them access to academic research, and to support the work of academics by giving them access to the findings generated by such real-world events.

- Ravi Iyer

 

How Netflix’s AltGenre Movie Grammar Illustrates the Future of Search Personalization

Reposted from this post on the Ranker Data Blog

A few contacts recently sent me this Atlantic article on how Netflix reverse-engineered Hollywood, and it happens to mirror my long-term vision for how Ranker’s data fits into the future of search personalization.  Netflix’s goal, to put “the right title in front of the right person at the right time,” is very similar to what Apple, Bing, Google, and Facebook are attempting to do with regard to personalized contextual search.  Rather than having you type in “best kitchen gadgets for mothers”, applications like Google Now and Cue (bought by Apple) hope to eventually surface this information in real time, knowing not only when your mother’s birthday is, but also that you tend to buy kitchen gadgets for her, and which well-rated kitchen gadgets are not too complex and fall within your price range.  If the application were good enough, many of us would trust it to simply charge our credit card and send the right gift.  But obviously we are a long way from that reality.

Netflix’s altgenre movie grammar (e.g. Irreverent Werewolf Movies Of The 1960s) gives us a glimpse of the level of specificity that would be required to get us there.  Consider what you need to know to buy the right gift for your mom.  You aren’t just looking for a kitchen gadget, but one with specific attributes.  In altgenre terminology, you might be looking for “best simple, beautifully designed kitchen gadgets of 2014 that cost between $25 and $100” or “best kitchen gadgets for vegetarian technophobes”.  Google knows that simple text matching will not provide the precision needed for such answers, which is why semantic search, where the precise meaning of pages is mapped, has become a strategic priority.

However, the universe of altgenre equivalents in the non-movie world is nearly endless (Netflix alone has thousands of ways to classify movies), which is where Ranker comes in, as one of the world’s largest sources of explicit, cross-domain, altgenre-like opinions.  Semantic data from sources like Wikipedia, DBpedia, and Freebase can help you put together factual altgenres like “of the 60s” or “that starred Brad Pitt”, but you need opinion ratings to put together subtler categories like “guilty pleasures” or “toughest movie badasses”.  Netflix’s success demonstrates the power of this level of specificity in personalizing movie recommendations, and it is worth considering how they produced this knowledge.  It was not by running machine learning algorithms on their endless stream of user behavior data, but by soliciting explicit ratings along these dimensions: paying “people to watch films and tag them with all kinds of metadata” using a “36-page training document that teaches them how to rate movies on their suggestive content, goriness, romance levels, and even narrative elements like plot conclusiveness.”  Some people may think that with enough data, TripAdvisor should be able to tell you which cities are “cool”, but big data is not always better data.  Most data scientists will stress the importance of defining the features in any recommendation task (see this article for technical detail on this), rather than assuming that a large amount of data will reveal all of the right dimensions.  The wrong level of abstraction can make prediction akin to trying to predict who will win the Super Bowl from the precise position and status of every cell in every player on every NFL team.  Netflix’s system allows them to make predictions at the right level of abstraction.
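
To illustrate why the factual and opinion halves of an altgenre need different data, here is a hypothetical sketch that composes a label like the kitchen-gadget example above and answers it by filtering on factual attributes (category, year, price) and on crowd opinion tags (the share of voters who applied a descriptor).  It does not reproduce Netflix’s or Ranker’s actual systems; the Item fields, the 0.5 threshold, and the sample items are all made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    name: str
    category: str
    year: int     # factual attribute (could come from a semantic source)
    price: float  # factual attribute
    # Opinion tags: share of voters who applied each descriptor (hypothetical crowd data).
    opinion_tags: dict = field(default_factory=dict)


def altgenre_label(descriptors, category, year=None, price_range=None):
    """Compose a human-readable altgenre-style label from its parts."""
    label = f"best {', '.join(descriptors)} {category}"
    if year is not None:
        label += f" of {year}"
    if price_range is not None:
        low, high = price_range
        label += f" that cost between ${low} and ${high}"
    return label


def altgenre_query(items, descriptors, category, year=None, price_range=None, min_share=0.5):
    """Filter on factual attributes, require each opinion descriptor to be endorsed
    by at least min_share of voters, and rank matches by average endorsement."""
    results = []
    for item in items:
        if item.category != category:
            continue
        if year is not None and item.year != year:
            continue
        if price_range is not None and not (price_range[0] <= item.price <= price_range[1]):
            continue
        shares = [item.opinion_tags.get(d, 0.0) for d in descriptors]
        if all(s >= min_share for s in shares):
            score = sum(shares) / len(shares) if shares else 0.0
            results.append((score, item.name))
    return [name for _, name in sorted(results, reverse=True)]


if __name__ == "__main__":
    descriptors = ["simple", "beautifully designed"]
    items = [  # hypothetical items
        Item("Gadget A", "kitchen gadgets", 2014, 40.0, {"simple": 0.8, "beautifully designed": 0.7}),
        Item("Gadget B", "kitchen gadgets", 2014, 150.0, {"simple": 0.9, "beautifully designed": 0.9}),
        Item("Gadget C", "kitchen gadgets", 2014, 30.0, {"simple": 0.3, "beautifully designed": 0.9}),
        Item("Gadget D", "kitchen gadgets", 2014, 60.0, {"simple": 0.9, "beautifully designed": 0.85}),
    ]
    print(altgenre_label(descriptors, "kitchen gadgets", 2014, (25, 100)))
    print(altgenre_query(items, descriptors, "kitchen gadgets", 2014, (25, 100)))
```

The price and year filters could be answered from factual sources alone; “simple” and “beautifully designed” can only be answered with opinion data, which is the gap the paragraph above describes.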

The future of search needs a Netflix-like grammar that goes beyond movies.  It needs to be able to understand not only which movies are dark versus gritty, but also which cities are better babymoon destinations versus party cities, and which rock singers are great vocalists versus great frontmen.  Ranker lists actually have a grammar similar to Netflix’s, except that we apply it beyond the movie domain.  In a subsequent post, I’ll go into more detail about this, but suffice it to say for now that I’m hopeful our data will eventually play a role in personalizing non-movie content similar to the one Netflix’s microtagging plays in film recommendations.

- Ravi Iyer

 

Murray-Ryan Budget Deal Illustrates the Importance of Good Personal Relationships

Reposted from this post on the Civil Politics Blog

One of the reasons we feel that politics has become more uncivil is that the relationships that used to bind partisans across parties have frayed.  Partisans of the past seemed to know how to compete for their policy priorities while remaining cordial to each other.  Now it is no longer enough to question a politician's policies; we question their motives and character.  Social psychology research shows that it is much harder to cooperate with others when we do not have positive contact with them.

Of course, research in a lab may not map onto real-world situations, so it is important to note when real-world examples confirm what research suggests.  Recently, Patty Murray and Paul Ryan, leaders of their respective parties, were able to put together a bipartisan budget deal that will ostensibly remove the threat of government shutdowns for two full years.  According to this Politico article, some of the credit for this deal can be given to the relatively warm personal relationship between Murray and Ryan.

Fresh off the campaign trail last year, Ryan and Murray sat down for breakfast in the Senate dining room last December, talking about their upbringings, their churches (both are Roman Catholic), two families and two states. They found more in common than they thought, Murray said.

“I had no idea what to know about this guy,” Murray said. “He ran for vice president, he was a political figure, he walked in, and we had a really good conversation about it, about his family, my family — about who we are. Honestly, his state was kind of compatible with mine — unless you talk about football.”

Ryan praised Murray on Thursday evening, calling her a “delight” and saying the talks were “very tough, very honest … but we kept our emotions in check and we kept working at it.”

 

Given the convergence of evidence from both social science research and real world examples, groups and individuals who wish to reduce inter-group conflict would be well served to consider how to increase positive relationships across groups.  

- Ravi Iyer

Evidence Based Techniques for Transcending Political Divisions: Newt Gingrich Praising Nelson Mandela

Reposted from this post on the Civil Politics Blog

Human beings are the only ultra-social species (i.e. we gather and cooperate in groups of thousands and millions) without a common reproductive source (e.g. a queen bee or queen ant).  The trick that allows human beings to form such large-scale groups lies in our moral motivations, which enable us to suppress individualistic goals in service of the group.  This trick is powerful but has a dark side: we can demonize and reflexively oppose anything that benefits the other group.

This phenomenon was evident following the recent passing of Nelson Mandela, who generally is more likely to be cited as a role model by liberals and minorities.   For example, some members of the conservative base reacted negatively to praise of Mandela by conservatives like Ted Cruz.  The motivations to deny moral credentials to members of an opposing group are strong, yet psychological research suggests that one can mitigate the effect by positing larger super-ordinate groups with common goals and by demonstrating positive relationships between members of different groups.

Newt Gingrich demonstrated both of these tactics in a recent statement, entitled "What would you have done?"

Some of the people who are most opposed to oppression from Washington attack Mandela when he was opposed to oppression in his own country. [Freedom as a super-ordinate goal across groups]

When he visited the Congress I was deeply impressed with the charisma and the calmness with which he could dominate a room. It was as if the rest of us grew smaller and he grew stronger and more dominant the longer the meeting continued. [Demonstrating personal attachment]
 

Many of the ways to reduce inter-group division that we at Civil Politics wish to highlight are used regularly by politicians with good intuitions who understand moral psychology at an implicit level, without necessarily knowing the social science that supports what they do.  We hope to make these techniques more explicit so that any interested group or individual can use these methods to break down group divisions consciously as well.

- Ravi Iyer

  If you want to hear more on hive psychology, consider watching this video:

Why Topsy/Twitter Data may never predict what matters to the rest of us

Reposted from this post on the Ranker Data Blog

Recently, Apple paid a reported $200 million for Topsy, and some speculate that the reason for this purchase is to improve recommendations for products consumed on Apple devices, leveraging the data that Topsy has from Twitter.  This makes perfect sense to me, but the utility of Twitter data in predicting what people want is easy to overstate, largely because people often confuse bigger data with better data.  There are at least two reasons why there is a fairly hard ceiling on how much Twitter data will ever allow one to predict about what regular people want.

1. Sampling – Twitter has a ton of data, with daily usage of around 10%.  Sample size isn’t the issue here, as there is plenty of data; rather, the people who use Twitter are a very specific set of people.  Even if you correct for demographics, the psychographic profile of people who want to share their opinions publicly and regularly (far more people have heard of Twitter than actually use it) is too distinctive to generalize to the average person, in the same way that surveys of landline users cannot be used to predict what psychographically distinct cellphone users think.

2. Domain Comprehensiveness – The opinions that people share on Twitter are biased by the medium, so they do not represent the full spectrum of things people care about.  There are tons of opinions on entertainment, pop culture, and links that people want to promote, since these are easy to share quickly, but very little information on people’s important life goals, the qualities we admire most in a person, or anything else where opinions are likely to be more nuanced.  Even where opinions in those domains exist, they are likely to be skewed by the 140-character limit.
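
The point about correcting for demographics can be illustrated with post-stratification, the standard technique of reweighting sample cells to their known population shares.  In this sketch every number is hypothetical: two age cells, each containing people who post opinions publicly and people who don’t, with different mean opinions.  A Twitter-like sample contains only the posters.

```python
# Hypothetical population, split into two demographic cells. For each cell:
# (population share, share who post opinions publicly,
#  mean opinion among public posters, mean opinion among everyone else)
POPULATION = {
    "under_35": (0.5, 0.4, 0.7, 0.5),
    "35_plus": (0.5, 0.1, 0.6, 0.3),
}


def true_mean(pop):
    """Opinion averaged over everyone in the population."""
    return sum(share * (p_post * m_post + (1 - p_post) * m_other)
               for share, p_post, m_post, m_other in pop.values())


def sample_composition(pop):
    """Demographic mix of a sample that contains only public posters."""
    posters = {cell: share * p_post for cell, (share, p_post, _, _) in pop.items()}
    total = sum(posters.values())
    return {cell: n / total for cell, n in posters.items()}


def raw_sample_mean(pop):
    """Unweighted mean of the poster-only sample."""
    mix = sample_composition(pop)
    return sum(mix[cell] * pop[cell][2] for cell in pop)


def poststratified_mean(pop):
    """Reweight each cell's poster mean to that cell's population share.
    This fixes the demographic mix, but still only observes posters."""
    return sum(share * m_post for share, _, m_post, _ in pop.values())


if __name__ == "__main__":
    print(f"true population mean:     {true_mean(POPULATION):.3f}")
    print(f"raw poster-sample mean:   {raw_sample_mean(POPULATION):.3f}")
    print(f"demographically weighted: {poststratified_mean(POPULATION):.3f}")
```

With these made-up numbers, the raw sample mean is 0.68, demographic weighting brings it to 0.65, and the true population mean is 0.455.  Weighting fixes the age mix but cannot recover the opinions of people who never post, which is the psychographic gap described in the first point.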

Twitter (and by extension, companies that use its data, like Topsy and DataSift) has a treasure trove of information, but people working on next-generation recommendations and semantic search should realize that, given the above limitations, it is a small part of the overall puzzle.  The volume of information gives you a very precise measure of a very specific group of people’s opinions about very specific things, leaving out the vast majority of people’s opinions about the vast majority of things.  Add in the bias introduced by analyzing 140-character natural language, and a great deal of the variance in what people want will likely have to be captured by other sources.

At Ranker, we have similar sampling issues, in that we collect much of our data at Ranker.com, but we are actively broadening our reach through our widget program, which now collects data on thousands of partner sites.  Our ranked-list methodology certainly has bias too, which we attempt to mitigate by combining voting and ranking data.  The key is not the volume of data but the diversity of data, which helps mitigate the bias inherent in any particular sampling or data collection method.

Similarly, people using Twitter data would do well to consider issues of data diversity and not be blinded by large numbers of users and data points.  Certainly Twitter is bound to be a part of understanding consumer opinions, but the size of the dataset alone will not guarantee that it will be a central part.  Given these issues, either Twitter will start to diversify the ways that it collects consumer sentiment data or the best semantic search algorithms will eventually use Twitter data as but one narrowly targeted input of many.

- Ravi Iyer
