OpenURL: next steps
During the project, several promising areas of research emerged that we were unable to pursue within the time available.
Making Further Data Reliably Available
- The Level 2 data released during the project covers only ‘redirect to resolver’ requests. The Level 0 logs contain further information that may be useful to others. We identified a possible Level 1 data set that would include as much of this data as possible without identifying institutions; making it available would require a brief analysis and a small amount of development.
- Several institutions indicated that they would be willing to share data from their own resolvers. Enabling this would require further legal work, as well as a way to de-duplicate requests that appear in both the OpenURL Router logs and the institutional resolver logs. Releasing this expanded data set in the same way as the files produced during this project would likely be very useful to others.
- Producing the data files currently involves manual steps. It would be worth investing effort to fully automate monthly production of the files and to make this part of the OpenURL Router service, so that others could rely on the data being produced regularly.
- Not all UK Higher Education Institutions (HEIs) that have an OpenURL resolver are registered with the OpenURL Router service. Working with these institutions to facilitate their registration and encouraging them to participate in the data aggregation would expand the data set available.
Develop Services Based on the Data
- Building on the data from 2) above, develop a reporting service that gives each institution statistics on which journals are sought via the OpenURL Router for that institution, compared with a) all UK institutions and b) a peer grouping, e.g. the Russell Group. Its usefulness depends on whether an institution passes all of its requests via the OpenURL Router; for those that do, it would let them compare publishers' usage statistics with an independent source and assess the value for money of the packages that publishers sell.
- Develop a method for an institution to take the data set and integrate it with existing recommenders (possibly working with the RISE Project). This is likely to involve further work with vendors.
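To make the proposed comparison concrete, the sketch below computes, for each journal an institution requests, that journal's share of the institution's requests alongside its share within a peer group and across all institutions. It is a minimal illustration under stated assumptions, not a design for the service: it assumes the request data has been reduced to hypothetical `(institution, issn)` pairs, and the function names `journal_shares` and `compare_institution` and the sample identifiers are invented for illustration.

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable, Optional


def journal_shares(
    requests: Iterable[tuple[str, str]],
    institutions: Optional[set[str]] = None,
) -> dict[str, float]:
    """Fraction of requests going to each journal (ISSN).

    Hypothetical input: (institution, issn) pairs. If `institutions` is
    given, only requests from those institutions are counted.
    """
    counts = Counter(
        issn
        for inst, issn in requests
        if institutions is None or inst in institutions
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    return {issn: n / total for issn, n in counts.items()}


def compare_institution(
    requests: Iterable[tuple[str, str]],
    institution: str,
    group: set[str],
) -> dict[str, tuple[float, float, float]]:
    """For each journal the institution requested, return
    (share at institution, share within peer group, share across all)."""
    requests = list(requests)  # consumed three times below
    own = journal_shares(requests, {institution})
    peers = journal_shares(requests, group)
    everyone = journal_shares(requests)
    return {
        issn: (share, peers.get(issn, 0.0), everyone.get(issn, 0.0))
        for issn, share in own.items()
    }


# Illustrative toy data only; not drawn from the released files.
sample = [
    ("inst_a", "1234-5678"), ("inst_a", "1234-5678"), ("inst_a", "8765-4321"),
    ("inst_b", "1234-5678"),
    ("inst_c", "8765-4321"), ("inst_c", "8765-4321"),
]
report = compare_institution(sample, "inst_a", {"inst_a", "inst_b"})
```

Here `report["1234-5678"]` is `(2/3, 0.75, 0.5)`: the journal accounts for two thirds of `inst_a`'s requests, three quarters of the peer group's, and half of all requests. Shares rather than raw counts are used so that institutions of different sizes can be compared; a real service would also need to account for institutions that route only some of their requests via the OpenURL Router.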
Develop the Prototype Recommender
- Develop the prototype recommender into a service that delivers relevant recommendations via a machine interface, so that it could be integrated into any of the existing library systems.
- One of the OpenURL resolver vendors indicated that they would be willing to share a large volume of data; inclusion of this in the data set for the prototype recommender would enable a significantly greater number of links to be made, thus improving the recommendations for end users.
- Develop a journal-level recommender for undergraduates, ideally with a visual interface to allow them to explore the network of journals that may interest them.
- Explore combining content filtering with collaborative filtering (or adding session identifiers) so that the whole data set can be used, not just the proxy-free data set.
Investigate the Data Files Made Available During this Project
- Determine whether the data held by the OpenURL Router simply duplicates citation data, or whether it captures links that are not apparent from citation data.
- Enable visualisation of the data set, for example with a force-directed graph showing links between articles based on the recommendation data, or by adapting technology from the MOSAIC Book Galaxy project.
Explore Potential Uses for the Data
As indicated elsewhere, we expect the data to be used in ways we cannot anticipate. Nevertheless, some ideas have emerged within the community, which are listed here:
- Analysis to identify patterns of requests for journals or articles;
- Use by students learning to analyse large data sets;
- Use by researchers as the basis for a thesis;
- Use by publishers to compare their listings of journals and articles with those sought by users;
- Linking with other data sets for analysis of wider activity.