Research profiling systems have achieved notable adoption by research institutions. A consistent public-facing view of research expertise not only improves the visibility of an institution's researchers but also provides a significant foundation for team science by helping researchers discover potential collaborators.
One challenge in deploying these systems is the resulting 'stovepipe effect': each institution publishes data only about its own researchers, even when those researchers collaborate with many others at many other institutions. In the same way that search engines provide a single point of discovery for users of the World Wide Web, CTSAsearch provides a single point of discovery and visualization targeted specifically at scholarly researchers, their research, and their collaborators. CTSAsearch can therefore provide a number of features that build upon the data in research profiles but take a user's overall experience to a higher level:
- A common conceptual terminology (currently the Unified Medical Language System, UMLS, from NIH's National Library of Medicine) supports matching of queries to profiles even when a query uses a synonym of a term found in a profile. Users can zoom in and out conceptually when a given query proves too generic or too specific.
- A search hit in CTSAsearch points directly back to that researcher's page at their home institution, so the information presented is as fresh as possible.
- Visualization of a research community spans multiple institutions, allowing information seekers to see key players in a field without the need to manually search and correlate publication data.
- State-of-the-art social network analysis techniques reveal community relationships that span inter-institutional boundaries.
- Inter-institutional analytics emerge easily through aggregation of profile data.
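The concept-based matching described in the first feature above can be illustrated with a minimal sketch. It is not CTSAsearch's implementation: the term table, the concept IDs (`K1`, `K2`), the hierarchy, the researcher names, and the function names are all made-up placeholders standing in for a real terminology such as UMLS.

```python
# Toy terminology: every term maps to a concept ID.
# The IDs are made-up placeholders, not real UMLS identifiers.
TERM_TO_CONCEPT = {
    "heart attack": "K2",
    "myocardial infarction": "K2",
    "heart disease": "K1",
}

# Toy hierarchy: each concept points to its broader parent, if any.
BROADER = {"K2": "K1"}

# Hypothetical profiles, listed with the terms that appear in them.
PROFILE_TERMS = {
    "Researcher A": ["myocardial infarction"],
    "Researcher B": ["heart disease"],
}


def index_profiles(profile_terms):
    """Map each concept ID to the set of profiles mentioning any of its terms."""
    index = {}
    for name, terms in profile_terms.items():
        for term in terms:
            concept = TERM_TO_CONCEPT.get(term.lower())
            if concept is not None:
                index.setdefault(concept, set()).add(name)
    return index


def search(term, index, zoom_out=False):
    """Return profiles matching the query's concept, or its broader concept."""
    concept = TERM_TO_CONCEPT.get(term.lower())
    if concept is not None and zoom_out:
        # Step up one level in the hierarchy; stay put if there is no parent.
        concept = BROADER.get(concept, concept)
    return sorted(index.get(concept, set()))


INDEX = index_profiles(PROFILE_TERMS)
```

In this toy setup, `search("heart attack", INDEX)` finds Researcher A even though that profile only mentions "myocardial infarction", because both terms share one concept. Passing `zoom_out=True` repeats the lookup at the broader concept instead.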
How does it work?
CTSAsearch is a federated search engine built on VIVO-ontology-compliant Linked Open Data published by 80 institutions using a number of open-source and commercial research profiling systems. These data are harvested periodically using multiple methods:
- Querying a SPARQL endpoint, when available;
- Crawling publicly visible RDF pages, when supported (e.g., by VIVO);
- Querying an API, when available (e.g., by Elsevier Pure); and
- Crawling publicly visible HTML pages, and extracting data, when necessary.
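One way to picture the choice among these methods is a small dispatcher that picks a harvesting strategy per institution. This is a sketch under stated assumptions, not the actual harvester: the `Site` record, its field names, and the method labels are hypothetical, and the fallback order (structured sources first, HTML scraping last) is inferred from the "when available" and "when necessary" wording rather than documented.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Site:
    """Hypothetical description of one institution's profiling system."""
    name: str
    sparql_endpoint: Optional[str] = None  # URL of a SPARQL endpoint, if any
    serves_rdf: bool = False               # profile pages also published as RDF
    api_url: Optional[str] = None          # base URL of a vendor API, if any


def choose_harvest_method(site: Site) -> str:
    """Pick a harvesting method, preferring structured sources (assumed order)."""
    if site.sparql_endpoint:
        return "sparql"
    if site.serves_rdf:
        return "rdf_crawl"
    if site.api_url:
        return "api"
    # Last resort: crawl public HTML pages and extract data from them.
    return "html_crawl"


# Example sites; the names and URLs are placeholders.
SITES = [
    Site("Institution A", sparql_endpoint="https://example.org/sparql"),
    Site("Institution B", serves_rdf=True),
    Site("Institution C", api_url="https://example.org/api"),
    Site("Institution D"),
]

PLAN = {site.name: choose_harvest_method(site) for site in SITES}
```

Keeping the decision in one function means a periodic harvest job can recompute the plan each run, so a site that later exposes a SPARQL endpoint or API is picked up without changes elsewhere.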