22 March 2012

Credibility

'Tweeting is Believing? Understanding Microblog Credibility Perceptions' [PDF] by Meredith Morris, Scott Counts, Asta Roseway, Aaron Hoff and Julia Schwarz reports on survey results regarding user perceptions of tweet credibility, concluding that there is a disparity between features users consider relevant to credibility assessment and those currently revealed by search engines.

The authors comment [citations and figure references deleted] that
Our survey showed that users are concerned about the credibility of content when that content does not come from people the user follows. In contexts like search, users are thus forced to make credibility judgments based on available information, typically features of the immediate user interface. Our survey results indicated features currently underutilized, such as the author bio and number of mentions received, that could help users judge tweet credibility.

It is sensible that traditional microblog interfaces hide some of these interface features because they aren’t necessary when only consuming content from known authors. Without these established relationships, errors in determining credibility may be commonplace. Participants were poor at determining whether a tweet was true or false, regardless of experience with Twitter. In fact, those higher in previous Twitter usage rated both content and authors as more credible. This mirrors findings with internet use generally, and may be due to a difficulty in switching from the heavily practiced task of reading content from authors a person follows to the relatively novel task of reading content from unknown authors. Even topical expertise may not support reliable content validity assessments. We did find that for politics, those higher in self-reported expertise (by a median split) gave higher credibility ratings to the true political tweets and their authors, yet these effects disappeared for the science topic and for entertainment, where those low in expertise actually gave slightly (though non-significantly) higher ratings to the true content.
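The median split mentioned in the passage above is a generic statistical device for dividing participants into "high" and "low" groups around the median of a self-rating. A minimal sketch (my illustration, not code from the paper):

```python
# Generic median split: partition self-reported expertise ratings into
# a "high" group (above the median) and a "low" group (at or below it).
from statistics import median

def median_split(ratings: list[float]) -> tuple[list[float], list[float]]:
    m = median(ratings)
    high = [r for r in ratings if r > m]
    low = [r for r in ratings if r <= m]
    return high, low
```

For example, `median_split([1, 2, 3, 4, 5])` places 4 and 5 in the high-expertise group and the rest in the low group; ties at the median conventionally go to the low group, though studies vary on this choice.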

In the absence of the ability to distinguish truthfulness from the content alone, people must use other cues. Given that Twitter users spend only 3 seconds reading any given tweet, users may be more likely to make systematic errors in judgment due to minimal “processing” time. Indeed, participants rated tweets about science significantly more credible than tweets on politics or entertainment, presumably because science is a more serious topic area than entertainment. Other types of systematic errors, such as gender stereotyping based on user image, did not appear to play a role. Although our survey respondents reported finding non-photographic user images less credible, our experiment found that in practice image choice (other than the detrimental default image) had little effect on credibility judgments. It is possible that image types we did not study (such as culturally diverse photographs) might create a larger effect.

The user name of the author showed a large effect, biasing judgment of both content and authors. Cha et al. discuss the role of topically consistent content production in the accumulation of followers. We see a similar phenomenon reflected here in users incorporating the degree of topical similarity in an author’s user name and tweets as another heuristic for determining credibility.

What are the implications of these difficulties in judging credibility, and how can they be mitigated? Our experimental findings suggest that individual users who want to increase their credibility in the eyes of readers should start by avoiding the default Twitter icon. As for user names, those who plan to tweet exclusively on a specific topic (an advisable strategy for building a large follower base) should adopt a topically aligned user name, since those generated high levels of credibility. If the user does not want a topical user name, she should choose a traditional user name rather than one that employs “internet”-style spelling.

Other advice for individual tweet authors stems from our survey findings. For instance, use of non-standard grammar damaged credibility more than any other factor in our survey. Thus, if credibility is a goal, users are encouraged to use standard grammar and spelling despite the space challenges of the short microblog format, though we note that in some user communities non-standard grammar may increase credibility. Maintaining a topical focus also increases credibility, as does geographic closeness between the author and tweet topic, so users tweeting on geographically-specific events should enable location-stamping on their mobile devices and/or update their bio to accurately identify location, which is often not done.

Tweet consumers should keep in mind that many of these metrics can be faked to varying extents. Selecting a topical user name is trivial for a spam account. Manufacturing a high follower-to-following ratio or a high number of retweets is more difficult, but not impossible. User-interface changes that highlight harder-to-fake factors, such as showing any available relationship between a user’s network and the content in question, should help. The Twitter website, for instance, highlights those in a user’s network who have retweeted a selected item. Search interfaces could do something similar if the user were willing to provide her Twitter credentials. Generally speaking, consumers should also maintain awareness of subtle biases that affect judgment, such as science-oriented content being perceived as more credible.
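The distinction drawn above, between easily faked signals and harder-to-fake ones, can be sketched as follows. This is a toy illustration with hypothetical field names, not the paper's method and not an actual Twitter API schema:

```python
# Toy illustration: two credibility signals of differing robustness.
# "Account" and its fields are hypothetical, not a real API schema.
from dataclasses import dataclass

@dataclass
class Account:
    followers: int
    following: int
    retweets_by_contacts: int  # retweets from the reader's own network

def follower_ratio(acct: Account) -> float:
    """Follower-to-following ratio: hard, but not impossible, to
    manufacture (e.g. via purchased followers)."""
    return acct.followers / max(acct.following, 1)

def network_endorsed(acct: Account) -> bool:
    """Endorsement by the reader's own network is among the hardest
    signals to fake, which is why interfaces should surface it."""
    return acct.retweets_by_contacts > 0
```

For instance, an account with 1000 followers and 100 followees has a ratio of 10.0, but only `network_endorsed` ties the judgment to relationships a spammer cannot easily manufacture.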

In terms of interface design, we highlight the issue that users are dependent on what is prominent in the user interface when making credibility judgments. To promote easier credibility assessment, we recommend that search engines for microblog updates make several UI changes. First, author credentials should be accessible at a glance, since these add value but users rarely take the time to click through to them. Ideally this will include metrics that convey consistency (number of tweets on topic) and legitimization by other users (number of mentions or retweets), as well as details from the author’s Twitter page (bio, location, follower/following counts). Second, for content assessment, metrics on the number of retweets or the number of times a link has been shared, along with who is retweeting and sharing, will provide consumers with context for assessing credibility. In our pilot and survey, seeing clusters of tweets that conveyed similar messages was reassuring to users; displaying such similar clusters runs counter to the current tendency for search engines to strive for high recall by showing a diverse array of retrieved items rather than many similar ones – exploring how to resolve this tension is an interesting area for future work.
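The at-a-glance author summary the quoted passage recommends could be assembled roughly as follows. All field names here are illustrative assumptions of mine, not an actual Twitter API schema or anything specified in the paper:

```python
# Hypothetical sketch: gather the consistency and legitimization
# metrics a search UI might surface next to a result. The dict keys
# ("bio", "topics", "mentions", ...) are invented for illustration.

def author_summary(profile: dict, tweets: list[dict], topic: str) -> dict:
    # Consistency metric: how many of the author's tweets are on topic.
    on_topic = sum(1 for t in tweets if topic in t.get("topics", []))
    return {
        "bio": profile.get("bio", ""),
        "location": profile.get("location", ""),
        "follower_following": (profile.get("followers", 0),
                               profile.get("following", 0)),
        "tweets_on_topic": on_topic,             # consistency
        "mentions": profile.get("mentions", 0),  # legitimization by others
    }
```

A search interface could render this summary inline beside each result, sparing the reader the click-through that, per the survey, almost never happens.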
Useful hints for verification experts and identity criminals alike.