Research Associate Waqas Khawaja (left) in the Semantic Web group and PhD student Mohan Timilsina in the Machine Learning & Statistics group at the Insight Centre for Data Analytics, National University of Ireland, Galway.
Increasingly, policymakers expect scientists to demonstrate the value of their research. That hasn’t always been easy. In the past, scientometrics focused almost exclusively on citations in journal articles, which gives a limited picture. But researchers at the Insight Centre for Data Analytics at the National University of Ireland, Galway (NUIG) are discovering ways to move scientometrics far beyond the standard citation and into the complex, noisy and often messy world of social media.
The key is altmetrics.
Dr. Brian Davis, Adjunct Lecturer, Research Fellow and leader of the Knowledge Discovery Unit at the Insight Centre NUIG
“Alternative metrics, or altmetrics,” said Dr. Brian Davis, Adjunct Lecturer, Research Fellow and leader of the Knowledge Discovery Unit at the Insight Centre NUIG, “is the application of scientometrics and bibliometrics to non-traditional data sources, and the measuring of scientific impact from those sources.”
Traditional scientometrics measures the impact of science based on citations of traditional, peer-reviewed sources such as journal articles, conference papers and workshop papers. Anything that passes through the peer-review process is fair game for scientometric analysis.
Non-traditional sources, on the other hand, might include anything from citations in unpublished pre-prints of journal articles to an entry in a data set. Other non-traditional citations could include mentions in government white papers, policy papers or technical reports. Altmetrics might even capture downloads of a paper based on a tweet of its Digital Object Identifier (DOI) – or the mention of a scientist or scientific organization on social media.
As Dr. Davis explained:
The data sources could be non-traditional from the point of view of the knowledge sources the articles might come from, but they could also be on social media. If you, as a scientist, were mentioned on social media, then you being tweeted about implies some kind of scientific dissemination on your side; that is, that you are of interest as a scientist. It could also be the scientific organization. And then there are scientific blogs: A BBC science blog may discuss your paper and reference its DOI. Or it might just discuss you as a scientist and reference your work somehow indirectly, or mention the scientific organization you're affiliated with, and its website as well.
As such, these metrics help tell the story of research and researchers by revealing interest and usage beyond traditional measures. Andrea Michalek, Managing Director of Plum Analytics, a leading altmetrics provider acquired by Elsevier last year, explained:
“As funders require more narrative input on their applications and more recent and relevant information on the impact of research, PlumX Metrics can help researchers in understanding what to emphasize in their grant applications. Publishers and editors also have a more complete understanding of what has been happening with articles they have published. Librarians can support faculty with tenure and promotion or funding applications by providing article-level metrics.”
Andrea Michalek, co-founder and Managing Director of Plum Analytics, speaks at a roundtable at the Harvard Data Science Initiative in November. (Photo by Alison Bert)
To test the potential of altmetrics, Dr. Davis and his team at the Insight Centre analyzed billions of social media posts to discover scientometric trends in a topic that affects nearly everyone: the flu. Dubbed SOPHIA, the project was a targeted effort co-funded by Science Foundation Ireland, Ireland’s national scientific agency, and Elsevier Labs, Elsevier’s advanced technology group.
As Dr. Davis explained:
The best way to look at it was as business analytics for a university: Which departments are publishing? Which scientists are publishing? Who are they collaborating with? And so on. Universities were just starting to look at things like altmetrics, but they weren't very sure how to measure it or how to define what could be measured. Our idea was to build an experimental prototype dashboard with a focused use case of altmetrics data sources, so it would have had analytics at the structural level.
This is a chart from the SOPHIA project, which stands for Social Phrases Having Impact in Altmetrics. SOPHIA is a custom analysis prototype built by Insight Galway and Elsevier. It aims to help users understand research dissemination in sources other than traditional scientific publications, such as in social media, government documents and news sources.
The team’s use case was the scientific impact of research about the antiviral drug Tamiflu, using a massive data set called Spinn3r, a crawl of social media from November 2010 through July 2011, during an avian flu epidemic.
“We thought,” said Mohan Timilsina, a PhD candidate and research associate at the Insight Centre, “that if we picked a topic like highly public health stories, then we might see the scientific activities around those particular entities.”
“There was an awful lot of buzz around the topic,” said Dr. Davis, “and scientific organizations would be mentioned on social media, but that would be as much of a mention as you got. You would never get a direct quote or a citation.”
To make up for the lack of direct citations, the team examined the links among heterogeneous sources. For instance, a paper might be linked to an author, an author to a scientific organization, and that organization to a government policy document. By analyzing all these sources as a connected whole, they were able to draw heretofore unseen connections.
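The idea of chaining heterogeneous sources can be illustrated with a small typed graph. This is not the SOPHIA implementation: the entities (`paper:P1`, `org:O1` and so on) and the relation names are hypothetical, and a plain breadth-first search is used here simply to recover the shortest chain linking a paper to a policy document.

```python
from collections import defaultdict, deque

# Hypothetical entities and typed relations, invented for illustration only.
EDGES = [
    ("paper:P1", "authored_by", "author:A1"),
    ("author:A1", "affiliated_with", "org:O1"),
    ("org:O1", "mentioned_in", "policy:D1"),
    ("paper:P2", "authored_by", "author:A2"),
]

def build_graph(edges):
    """Store each typed edge in both directions so chains can be followed either way."""
    graph = defaultdict(list)
    for src, relation, dst in edges:
        graph[src].append((relation, dst))
        graph[dst].append((relation, src))
    return graph

def find_path(graph, start, goal):
    """Breadth-first search.

    Returns the shortest path as an alternating list of nodes and relation
    labels, or None if the two entities are not connected.
    """
    queue = deque([(start, [start])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for relation, neighbour in graph[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, path + [relation, neighbour]))
    return None

graph = build_graph(EDGES)
print(find_path(graph, "paper:P1", "policy:D1"))
```

Storing edges in both directions is a design choice for exploration; a real system would likely keep direction and relation type separate so that, for example, "authored_by" and "authors" can be distinguished.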
A multi-relational graph of altmetric entities from Insight Galway and Elsevier.
“We started with the development of a web graph,” explained Timilsina, “a graph of blogs and mainstream news, which was billions of pages. From there, we found which identifiers were used in social media and from the avian influenza publications from mainstream news and blogs, and used those identifiers to link to the primary sources which are stored in Elsevier’s Scopus database.”
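Before any lookup in Scopus, DOIs have to be pulled out of raw post text. The following is a minimal sketch, assuming posts are plain strings. The regular expression is adapted from Crossref's published guidance for matching modern DOIs and will miss some legacy identifiers. The example DOIs are made up, and the Scopus API lookup that followed in the team's pipeline is deliberately left out.

```python
import re

# Adapted from Crossref's suggested pattern for modern DOIs; a sketch, not exhaustive.
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/[-._;()/:A-Z0-9]+", re.IGNORECASE)

def extract_dois(text):
    """Return DOIs found in a post, lower-cased, with trailing punctuation trimmed."""
    found = []
    for match in DOI_PATTERN.findall(text):
        doi = match.rstrip(".,;:")
        # Drop closing parentheses that belong to the surrounding sentence, not the DOI.
        while doi.endswith(")") and doi.count(")") > doi.count("("):
            doi = doi[:-1].rstrip(".,;:")
        # DOIs are case-insensitive, so lower-casing makes them easy to deduplicate.
        found.append(doi.lower())
    return found

post = "New flu paper (doi:10.1234/flu.2011.001). See also https://doi.org/10.5678/ABC-99, worth a read."
print(extract_dois(post))
```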
Scopus is the world’s largest abstract and citation database of peer-reviewed literature, including scientific journals, books and conference proceedings.
Timilsina said their team used the Scopus API to extract metadata from publications, including citations, authors, publication venue and organizations:
In this way, we had connected information, and we used different influence-based metrics in networks, such as PageRank and other algorithms, and tried to measure the influence of scientific entities in social media. Traditionally, the impact of scientific publications (is) measured in a citation graph. But in our case, we measured the influence of social media; we also used some machine learning approaches to validate those scores.
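Timilsina does not spell out the exact graph or the "other algorithms", so the following is only a generic illustration of PageRank computed by power iteration on a small, hypothetical mention graph in which blog posts and news items link to papers, scientists and organizations. A node ranks highly when other well-ranked nodes link to it.

```python
def pagerank(graph, damping=0.85, iterations=100, tol=1e-10):
    """Power-iteration PageRank over a dict mapping each node to its out-links.

    Dangling nodes (no out-links) spread their score evenly over all nodes,
    so the scores always sum to 1.
    """
    nodes = set(graph) | {t for targets in graph.values() for t in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        dangling = sum(rank[node] for node in nodes if not graph.get(node))
        # Teleport share plus the redistributed score of dangling nodes.
        new_rank = {node: (1.0 - damping) / n + damping * dangling / n for node in nodes}
        for node, targets in graph.items():
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
        if sum(abs(new_rank[v] - rank[v]) for v in nodes) < tol:
            return new_rank
        rank = new_rank
    return rank

# Hypothetical mention graph: who links to whom.
links = {
    "blog:B1": ["paper:P1", "scientist:S1"],
    "blog:B2": ["paper:P1"],
    "news:N1": ["paper:P1", "org:O1"],
    "paper:P1": ["scientist:S1"],
}
scores = pagerank(links)
for node, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{node:15} {score:.3f}")
```

In practice a library routine such as NetworkX's `pagerank` would typically replace this hand-rolled loop; it is written out here only to make the iteration visible.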
By approaching the data heterogeneously, a clearer scientometric picture began to emerge. “We were able to see, for instance, co-mentions of scientists together,” said Waqas Khawaja, also a PhD student and research associate at the Insight Centre. “They would not necessarily be co-authors of any papers, but their names were appearing together in particular news items or blog posts. We were able to also see the co-mentions of different research organizations as well.”
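Counting co-mentions reduces to counting unordered pairs of names that occur in the same document. The sketch below assumes that a named-entity recognition step (not shown) has already reduced each post to a list of scientist names; the names themselves are hypothetical.

```python
from collections import Counter
from itertools import combinations

def co_mentions(documents):
    """Count how often each unordered pair of names appears in the same document."""
    counts = Counter()
    for names in documents:
        # Deduplicate and sort so (A, B) and (B, A) are counted as the same pair.
        for pair in combinations(sorted(set(names)), 2):
            counts[pair] += 1
    return counts

# Hypothetical posts, each reduced to the scientist names recognised in it.
posts = [
    ["Dr. Alpha", "Dr. Beta"],
    ["Dr. Beta", "Dr. Alpha", "Dr. Gamma"],
    ["Dr. Gamma"],
]
print(co_mentions(posts).most_common(1))  # [(('Dr. Alpha', 'Dr. Beta'), 2)]
```

The same function works for research organizations: feed it lists of organization names instead of scientist names.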
The team also plotted the data against time, which allowed them to see when individual scientists began collaborating with a particular organization, how long such a collaboration lasted, or when a scientist or organization was popular in news items.
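Plotting these relationships over time needs, at minimum, the first and last date on which a scientist and an organization were mentioned together. The sketch below treats that first-to-last span as a rough proxy for how long a collaboration lasted; the input tuples are hypothetical, not SOPHIA data.

```python
from datetime import date

def collaboration_spans(mentions):
    """Map each (scientist, organization) pair to its first and last co-mention dates.

    `mentions` is an iterable of (date, scientist, organization) tuples in any order.
    """
    spans = {}
    for day, scientist, org in mentions:
        key = (scientist, org)
        if key in spans:
            first, last = spans[key]
            spans[key] = (min(first, day), max(last, day))
        else:
            spans[key] = (day, day)
    return spans

# Hypothetical co-mentions drawn from dated news items and blog posts.
mentions = [
    (date(2011, 3, 2), "Dr. Alpha", "Org X"),
    (date(2010, 11, 20), "Dr. Alpha", "Org X"),
    (date(2011, 6, 15), "Dr. Alpha", "Org X"),
    (date(2011, 1, 5), "Dr. Beta", "Org Y"),
]
for (scientist, org), (first, last) in collaboration_spans(mentions).items():
    print(scientist, org, first, last, (last - first).days, "days")
```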
“There were new visualization designs we had to come up with,” said Dr. Davis. “It wasn't just visualizing paper citations; we had this whole new graph of different types of knowledge that we had to find ways to visualize effectively.”
The results of the project are speculative but tantalizing. The sheer volume of data and heterogeneity of information made the task a daunting one, Timilsina said:
Basically this kind of data is really noisy, and it's very hard to find a signal in this data because it's like looking for a needle in a haystack. Because social media is full of junk information, finding the signal is really hard. I was expecting we would get at least twenty or thirty thousand scientific publications around the topic but we found only a thousand publications that were actually talking about avian influenza.
With that caveat in place, the team was able to make significant strides in understanding what altmetrics is capable of. Their results could have lasting value, both for individual researchers and for funding organizations.
“If you're an early career researcher I think you can benefit from engaging with tools like Mendeley and scientific dissemination on social media,” said Dr. Davis.
Mendeley Data, Elsevier’s cloud-based data repository, allows researchers to share, access and cite research data, and it features Plum Analytics in its dashboard.
If you're a young scientist or you're a young PhD student (and) you promote your science in social media, then you kind of accelerate your h-index, but the higher the h-index gets, the harder it is to make it increase. For example, if you have an index of 3, then the change to an h-index of 4 requires that you have four publications cited at least four times each, which is difficult. Social media can help to influence that, but there may be other scientific factors to boost that kind of effect. Social media is a way to disseminate your information; it’s very (unlikely) that it will reflect that influence in the real academic world.
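The h-index arithmetic in this quote can be checked directly: a researcher has index h when h of their papers have at least h citations each. In the hypothetical citation counts below, moving from an index of 3 to 4 requires the fourth-ranked paper to climb from one citation to four.

```python
def h_index(citations):
    """Largest h such that at least h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for position, cites in enumerate(ranked, start=1):
        if cites >= position:
            h = position
        else:
            break
    return h

print(h_index([10, 5, 4, 1]))  # 3: only three papers have at least 3 citations
print(h_index([10, 5, 4, 4]))  # 4: four papers now have at least 4 citations
```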
“A lot of funding agencies are looking at altmetrics, but they’re not quite sure how to measure it,” Dr. Davis said. Compared to traditional bibliometrics, the measures provided by altmetrics are less concrete; the peer-reviewed nature of traditional bibliometric subjects allows for a precision that is impossible with altmetrics.
“But that doesn’t mean that altmetrics isn’t of value,” Dr. Davis added:
It’s just that new measures and metrics need to be defined and new tools are needed in order to capture them. And they don't really exist yet. It's still very much a new field, unexplored and uncharted territory. I also think funding agencies will be more and more under pressure to justify to their paymasters how funding for educational research is benefiting the public, for instance. Altmetrics may be the way of capturing how science is diffused into the public domain.
According to John Lonican, an intern and master’s student at the Insight Centre, the results of the current project may not be as important as what’s done with them. “The data set was one of the most interesting things in the long run,” he explained, “because it means we've started on something; we’ve created a dataset which is basically an introduction, and a large amount of research can be done with it. The release of that, hopefully in the near future, will open up so many doors – not just for us but for a lot of other researchers.”