
What is CTSAsearch?

CTSAsearch is a federated search engine built on Linked Open Data published by members of the CTSA Consortium and other interested parties. We harvest this data through four approaches:

  1. SPARQL endpoints using queries based on the VIVO ontology;
  2. standardized RDF crawlers interrogating VIVO and Profiles sites;
  3. custom content crawlers for other sites; and
  4. site-specific data dumps for sites not amenable to the above approaches.

If you would like to participate, contact Dave Eichmann with details on how to access your SPARQL endpoint or the URL for your research networking system.
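
The choice among the four approaches can be pictured as a simple fallback chain. The sketch below is illustrative only: the HarvestPlanner class, the Site record and its three flags are hypothetical names, and it assumes the approaches are tried in the order listed above, which the list suggests but does not state. It also assumes Java 16 or later for records.

```java
/** Illustrative sketch: pick a harvesting approach for a site (all names are hypothetical). */
final class HarvestPlanner {
    private HarvestPlanner() {}

    /** The four approaches, in the order they are listed above. */
    enum Approach { SPARQL_ENDPOINT, RDF_CRAWLER, CUSTOM_CRAWLER, DATA_DUMP }

    /** Hypothetical description of what a participating site offers. */
    record Site(String name, boolean hasSparqlEndpoint, boolean isVivoOrProfiles, boolean isCrawlable) {}

    /** Assumption: prefer the earliest approach in the list that the site supports. */
    static Approach choose(Site site) {
        if (site.hasSparqlEndpoint()) return Approach.SPARQL_ENDPOINT;
        if (site.isVivoOrProfiles()) return Approach.RDF_CRAWLER;
        if (site.isCrawlable()) return Approach.CUSTOM_CRAWLER;
        return Approach.DATA_DUMP; // last resort: a site-specific data dump
    }

    public static void main(String[] args) {
        Site site = new Site("Example University", false, true, true);
        System.out.println(site.name() + " -> " + choose(site));
    }
}
```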

Indexed Content

We generate a federated information view over the following classes of interest in the VIVO ontology and their various properties. The main properties of interest to our current indexing scheme appear with each class.
  1. Person and associated properties.
    1. URI
    2. URL pattern (if different from the URI; strictly speaking this is not part of the ontology, but we have already encountered one site that does not resolve a URI request to an HTML page when accessed manually from a web browser)
    3. Last name
    4. First name
    5. Title (optional)
    6. Email address (optional)
    7. Phone number (optional)
    8. Research overview (optional)
    9. Research area(s) (optional)
    10. Keywords (optional)
  2. Academic Article with associated properties.
    1. URI
    2. Label (optional)
    3. PMID (optional)
    4. DOI (optional)
    5. PMCID (optional)
  3. Authorship with associated properties.
    1. Person URI
    2. Article URI
    3. rank (optional)
    4. isCorrespondingAuthor (optional)
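
For readers who prefer code to outlines, the three classes above can be mirrored as plain value types. This is a minimal sketch, not our actual index schema: the IndexedContent class, the record and field names, the Java types (for example an integer rank) and the pageAddress helper are all assumptions made for illustration. Optional properties are modelled with Optional, multi-valued ones with lists, and Java 16 or later is assumed for records.

```java
import java.util.List;
import java.util.Optional;

/** Illustrative value types for the indexed classes (hypothetical names and types). */
final class IndexedContent {
    private IndexedContent() {}

    /** Person: uri, lastName and firstName are required; everything else is optional. */
    record Person(String uri, Optional<String> urlPattern, String lastName, String firstName,
                  Optional<String> title, Optional<String> email, Optional<String> phone,
                  Optional<String> researchOverview, List<String> researchAreas,
                  List<String> keywords) {
        /** Assumption: link to the URL pattern when one is given, otherwise to the URI. */
        String pageAddress() { return urlPattern.orElse(uri); }
    }

    /** Academic article: only the URI is required. */
    record AcademicArticle(String uri, Optional<String> label, Optional<String> pmid,
                           Optional<String> doi, Optional<String> pmcid) {}

    /** Authorship: ties one person URI to one article URI. */
    record Authorship(String personUri, String articleUri, Optional<Integer> rank,
                      Optional<Boolean> isCorrespondingAuthor) {}

    public static void main(String[] args) {
        // All URIs and names below are made-up example values.
        Person person = new Person("http://example.org/individual/n1", Optional.empty(),
                "Doe", "Jane", Optional.of("Professor"), Optional.empty(), Optional.empty(),
                Optional.empty(), List.of("informatics"), List.of());
        AcademicArticle article = new AcademicArticle("http://example.org/individual/a1",
                Optional.of("An example article"), Optional.empty(), Optional.empty(),
                Optional.empty());
        Authorship authorship = new Authorship(person.uri(), article.uri(),
                Optional.of(1), Optional.of(true));
        System.out.println(person.pageAddress());
        System.out.println(authorship);
    }
}
```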

Sample Query for SPARQL-accessible Sites

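We use Apache Jena to interrogate participating SPARQL endpoints. Here's a code fragment demonstrating our standard query pattern. Note that 'endpoint' is a string containing the SPARQL URL, and 'Prefix' is a string that must supply the PREFIX declarations for the rdf: and foaf: namespaces used in the query.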

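import org.apache.jena.query.*;  // package name as of Jena 3 and later

// Select every foaf:Person together with its first and last name.
String query = Prefix + "SELECT ?p ?fn ?ln WHERE { ?p rdf:type foaf:Person. ?p foaf:firstName ?fn. ?p foaf:lastName ?ln }";
Query theClassQuery = QueryFactory.create(query, Syntax.syntaxARQ);
// Run the query remotely against the site's SPARQL endpoint.
QueryExecution theClassExecution = QueryExecutionFactory.sparqlService(endpoint, theClassQuery);
ResultSet crs = theClassExecution.execSelect();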
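
The fragment above relies on a prefix string defined elsewhere. As a rough sketch of what such a string could look like, the following builds the PREFIX declarations and the full query text using only the Java standard library. The PersonQueryBuilder class and its method names are hypothetical; the two namespace URIs are the standard ones for RDF and FOAF, and a real VIVO query would likely declare further prefixes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative helper that assembles the person query text (hypothetical names). */
final class PersonQueryBuilder {
    private PersonQueryBuilder() {}

    /** Render one "PREFIX name: <uri>" line per namespace, in insertion order. */
    static String prefixes(Map<String, String> namespaces) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> ns : namespaces.entrySet()) {
            sb.append("PREFIX ").append(ns.getKey())
              .append(": <").append(ns.getValue()).append(">\n");
        }
        return sb.toString();
    }

    /** Full SPARQL text for the person query shown in the fragment above. */
    static String personQuery() {
        Map<String, String> namespaces = new LinkedHashMap<>();
        namespaces.put("rdf", "http://www.w3.org/1999/02/22-rdf-syntax-ns#");
        namespaces.put("foaf", "http://xmlns.com/foaf/0.1/");
        return prefixes(namespaces)
                + "SELECT ?p ?fn ?ln WHERE { ?p rdf:type foaf:Person. "
                + "?p foaf:firstName ?fn. ?p foaf:lastName ?ln }";
    }

    public static void main(String[] args) {
        // Print the query so it can be pasted into a SPARQL client for testing.
        System.out.println(personQuery());
    }
}
```

Printing the query this way makes it easy to try it against an endpoint by hand before handing the same string to QueryFactory.create.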