Data protection issues

It is essential that you consider data protection, as this is a legal issue governed by the Data Protection Act 1998 and various codes produced by the Information Commissioner. There are two key areas that you will have to consider when using activity data:
  • Data can only be used for the purposes for which it was collected (as notified to the user).
  • Where information is being made public (released as open data) the information needs to be suitably anonymised.
Data protection
Whenever data is collected, it is a requirement of the Data Protection Act that the user gives their permission. Typically students and staff do this when they join the university, through the various policies that they sign up to. However, you need to ensure that the permission they are giving is appropriate to whatever you are trying to do.
The Surfacing the Long Academic Tail (SALT) project found that the users that they consulted expressed no concern about the collection of activity data and its use in the recommender. In fact, they found, most assumed this data was collected anyway and encouraged its use in this way, as ultimately it is being used to develop a tool which helps users to research more effectively and efficiently.
The following advice is available:
Consent management
The Data Protection Act 1998 requires organisations collecting data to gain permission from people before collecting that information. It is therefore important to ensure that your users (staff, students, prospective students and any other visitors) give appropriate consent for the information that is being collected.
The OpenURL Activity Data project sought legal advice on consent management, which led them to ask universities using the OpenURL router to seek consent from their users, while the Student Tracking And Retention (STAR-Trak: NG) project enables users to take explicit control of what their users can see.
The following further information and advice is available:
  • Guide to Informing users about data collection
  • JISC Legal has produced a briefing, Consent Management: Handling Personalisation Data Lawfully, from which we have abstracted some of the key details.
Anonymisation
Personal data cannot be shared except for the purposes for which the user originally agreed that it could be collected. However, if the data is anonymised such that it cannot be traced back to an individual, then it is no longer personal data and can be published. Therefore, if data is going to be opened up or shared, perhaps to create a better recommender system, one approach is to anonymise the data. This can be done in a number of different ways depending on the nature of the data and the purpose it will be used for.
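One commonly used step is to replace user identifiers with tokens and to drop fields that identify a person directly. The following is a minimal sketch of that step, not a method taken from any of the projects. The field names (user_id, name, email, ip_address) and the function names are hypothetical, and it assumes a secret key that is stored separately from the released data. Replacing identifiers in this way does not by itself guarantee that the data is anonymous; that still depends on what other information remains in each record.

```python
import hashlib
import hmac

# Hypothetical field names: adjust to match your own activity data.
DIRECT_IDENTIFIERS = {"user_id", "name", "email", "ip_address"}


def pseudonymise(user_id: str, secret_key: bytes) -> str:
    """Return a stable token for user_id that cannot be recomputed without the key."""
    digest = hmac.new(secret_key, user_id.encode("utf-8"), hashlib.sha256)
    # Truncated for readability; keep the full digest if collisions are a concern.
    return digest.hexdigest()[:16]


def strip_identifiers(record: dict, secret_key: bytes) -> dict:
    """Drop directly identifying fields and add a token in place of the user id."""
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    cleaned["user_token"] = pseudonymise(record["user_id"], secret_key)
    return cleaned
```

Because the same user always maps to the same token, activity can still be grouped per user (for example, to feed a recommender) without the original identifier being released.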
It should be noted that a recent appeal court judgement suggested that the critical issue is "does the 'other information' (if provided to the hypothetical member of the public) add anything to the statistics which would enable them to identify the underlying individuals? If the answer is no, the statistics are not personal data. The underlined words are important: if identification can be achieved from the 'other information' in isolation (rather than when added to the statistics) then the statistics themselves are truly anonymous, and are not personal data."
Within this programme, projects have taken a number of different approaches to anonymisation. For instance, LIDP wrote:
"Our data involves tying up student degree results with their borrowing history (i.e. the number of books borrowed), the number of times they entered the library building, and the number of times they logged into electronic resources. In retrieving data we have ensured that any identifying information is excluded before it is handled for analysis. We have also excluded any small courses to prevent identification of individuals eg where a course has less than 35 students and/or fewer than 5 of a specific degree level."
Note that they have not only excluded any information that might identify students but have also excluded small courses, setting the lower limit on course size at 35 students (and at 5 students for any specific degree level).
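Excluding small groups in this way can be sketched as a simple filter. The example below is illustrative only and is not LIDP's actual code. The thresholds of 35 and 5 come from the quotation above, while the record layout (a list of dictionaries with hypothetical course and degree_level keys) is an assumption. It also assumes one reading of the "and/or" rule: a record is dropped if its course is too small, or if too few students on that course share its degree level.

```python
from collections import Counter

MIN_COURSE_SIZE = 35   # lower limit on course size quoted by LIDP
MIN_DEGREE_GROUP = 5   # lower limit per degree level quoted by LIDP


def suppress_small_groups(records):
    """Keep only records whose course and degree-level group are large enough."""
    # Count students per course and per (course, degree level) pair.
    course_sizes = Counter(r["course"] for r in records)
    group_sizes = Counter((r["course"], r["degree_level"]) for r in records)
    return [
        r for r in records
        if course_sizes[r["course"]] >= MIN_COURSE_SIZE
        and group_sizes[(r["course"], r["degree_level"])] >= MIN_DEGREE_GROUP
    ]
```

The filter should be run before any analysis or release, so that small groups never appear in intermediate outputs either.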
Projects have also taken their own advice as summarised below:
  • AEIOU were concerned with the ability to identify users from IP addresses.
  • AGtivity have so far analysed only sites that agreed to receive their data. For more public documents, specific names have been removed, for example within the case studies.
  • EVAD took a cautious approach to data protection, with the ability to release more data to partners at a later date if this seemed appropriate.
  • LIDP have produced summary statistical information from which individuals cannot be identified.
  • OpenURL
  • RISE have altered the Open University's privacy policy and anonymise the data.
  • STAR-Trak: NG
You should also look at the section on licensing and sharing of activity data.