One of the key purposes of the activity data programme was that other universities and colleges should be able to learn from the work of the projects and build on it. Here we draw together some of the key lessons, either reported by the projects themselves or observed by us as part of our synthesis work.
The lessons learnt can be divided into the following areas:
Legal issues (data protection, privacy, anonymisation)
All the projects addressed data protection and how to ensure the privacy of the users of their systems. While there is no simple solution, one of the key questions was whether the data could be traced back to an individual. Some information makes this easy: names and user IDs, for instance, identify an individual directly. There was, however, considerable discussion about whether IP addresses are personal information that can be used to identify people. In many cases they cannot (for instance because users share a computer, or requests come via a proxy so a single IP address represents a whole institution), but in other cases identification may be easy because a user has a fixed IP address.
- LIDP suggest that "you should ensure you discuss privacy issues with your institute's legal advisor". For further details see here.
- SALT, on the other hand, found that concerns over anonymisation and data privacy were not shared by the users they spoke to. While we might question this response as potentially naive, it does indicate that users trust libraries to handle their data in a way that both protects and benefits them.
- STAR-Trak's basic position is that they are simply consuming data that already exists in their IT ecosystem – nothing new is created – and they suggest that there is nothing in this data that requires it to be treated differently from similar data elsewhere. For further details see here.
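Where identifiers such as IP addresses do need to be retained for session analysis, one common mitigation is to pseudonymise them before storage, so that sessions can still be linked without keeping the original address. The sketch below is a minimal illustration of this idea, not any project's actual approach; the log line format and the salt value are assumptions.

```python
import hashlib
import re

# Hypothetical secret value; in practice this should be kept out of
# the codebase and discussed with your institution's legal advisor.
SALT = "replace-with-a-secret-institutional-salt"

IP_RE = re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b")

def pseudonymise_ip(ip: str) -> str:
    """Replace an IP address with a salted hash: the same address
    always maps to the same token, but the token cannot easily be
    traced back to an individual."""
    return hashlib.sha256((SALT + ip).encode()).hexdigest()[:12]

def anonymise_line(line: str) -> str:
    """Rewrite every IP address found in a raw log line."""
    return IP_RE.sub(lambda m: pseudonymise_ip(m.group()), line)
```

Because the mapping is deterministic, analyses that count distinct users or link events into sessions still work on the pseudonymised logs.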
Data (scale, quality)
Several of the projects commented on the importance of data quality. This has several aspects, including the completeness of the data, how up to date it is, the signal-to-noise ratio and the actual information itself.
- AEIOU commented that the data can contain a lot of noise, which can affect performance, and that the data will grow rapidly, so you need to think big from the start. For further details see here.
- AGtivity suggest that you should log as much data as possible, for as long as possible, in order to capture long-term trends, and note that scripting and regular expressions are key to extracting useful information from the large quantity of unrelated data. Future work could and should consider further recipes and advice.
- LIDP warn that retaining data for later analysis is important and needs to be thought about early. They also warn that you need to consider what the data means, particularly around the use of e-resources. For further details see here.
- The OpenURL project commented that releasing data facilitates unanticipated use: the effort expended in sharing OpenURL knowledge proved worthwhile because it expanded the data fields made available and enabled others to analyse the quality of the data set in ways not originally anticipated. They also found large volumes of data necessary in order to identify sessions. For further details see here.
- RISE were also concerned with retention, but they also had to consider the licensing of some of the other data (such as cataloguing data) that they needed to link to and expose in order to create recommendations.
- Whilst SALT had 10 years of data available to them, they found that a good recommendation service could be based on as little as five weeks' worth of data, so it may not be necessary to have huge volumes. For further details see here.
- STAR-Trak, which differed from the other projects in that it brought together data from multiple sources rather than just log files, noted the importance of retaining data for use and of having a suitable data model. For further details see here.
- Ensure that the data you need is actually being collected. See the Which events are significant guide.
- Ensure that the data you need is retained for as long as you need it.
- Ensure that the data is of suitable quality and sufficiently complete.
- Consider the amount of data that will be generated and how you will process it.
- Consider filtering out data that you do not need in order to improve performance.
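AGtivity's point about scripting and regular expressions, together with the advice above on filtering out unneeded data, can be sketched as a simple log filter that keeps only significant events. The log line format and the list of noise prefixes below are assumptions; both would need adjusting to your institution's actual logs.

```python
import re

# Hypothetical pattern for an access-log line such as:
#   1.2.3.4 - alice [10/Oct/2011:13:55:36 +0000] "GET /record/42 HTTP/1.1" 200
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ (?P<user>\S+) \[(?P<when>[^\]]+)\] "GET (?P<url>\S+)'
)

def significant_events(lines, noise_prefixes=("/images/", "/css/", "/favicon")):
    """Yield parsed events, dropping requests for static assets that
    add noise and volume but say nothing about user activity."""
    for line in lines:
        m = LOG_RE.match(line)
        if m and not m.group("url").startswith(noise_prefixes):
            yield m.groupdict()
```

Filtering at this stage both improves processing performance and reduces the volume of data that needs to be retained.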
Understanding and presenting the data
Several of the projects discussed the importance of presenting the data in ways that are meaningful to users. AGtivity commented that one should consider visualisations and statistical results for users at an early stage, and then be adaptable and agile: user requirements change, and one should change with them.
- Consider using visualisations to present the data. See the Bringing activity data to life guide.
Recommendations based on user activity can be very powerful, as demonstrated by Amazon amongst others. The number of recommendations offered has a strong bearing on their usefulness: too few and nothing useful may turn up; too many and the list can be overwhelming. AEIOU suggest that 5 or 6 is about right.
- AEIOU, besides looking at the number of items to present, also discuss what constitutes a session and where to put recommendations. For further details see here.
- RISE found that there are several types of recommendation that can be based on data in proxy log files, but that you need access to bibliographic data as well. For further details see here.
- SALT noted that if you allow very rare events to drive recommendations, the results may have little to do with the current item, though they might still be of interest. For further details see here.
- Ensure that you can relate log file data to bibliographic data.
- Consider using personal data to determine what course users are on to focus recommendations.
- Consider the most appropriate number of items to recommend.
- See the What’s behind a recommendation? guide.