This guide discusses the anonymisation of activity data, which is required before activity data can be distributed to others.
The Data Protection Act legislates that personal data cannot be released to other people without the data subjects' permission. Much of the activity data that is collected and used contains information that can identify the person responsible for creating it. Examples of personal information include user names, library numbers, the IP address of the computer a person is using, and other information, such as patterns of behaviour, that can identify individual users.
Therefore, where information is to be released as open data, consideration needs to be given to anonymising it. Anonymisation may also be required when sharing data privately with partners, depending on the reasons for sharing, the nature of the data and any consent the user has given.
Two main options exist if you want to share data.
The first is to share only statistical data. As the Information Commissioner wrote in the 2011 data sharing code of practice, "Some data sharing doesn't involve personal data, for example where only statistics that cannot identify anyone are being shared. Neither the Data Protection Act (DPA), nor this code of practice, apply to that type of sharing."
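As a minimal sketch of statistics-only sharing, the Python below aggregates hypothetical loan records into counts per course and item, discarding the user name in the process. Because even a count can point to a person when it is very small, it also withholds counts below an assumed threshold (`MIN_COUNT = 5`). The record layout, the names `loans`, `MIN_COUNT` and `loan_statistics`, and the threshold value are illustrative assumptions, not part of any system described in this guide.

```python
from collections import Counter

# Hypothetical loan records as (username, course, item_id) tuples.
# This layout is an illustrative assumption, not a real system's schema.
loans = [(f"user{i}", "HIST101", "book-17") for i in range(6)]
loans += [("user90", "HIST101", "rare-book-3"), ("user91", "HIST101", "rare-book-3")]

# Assumed minimum count: any figure below this is withheld, not released.
MIN_COUNT = 5


def loan_statistics(records, min_count=MIN_COUNT):
    """Aggregate loans per (course, item) and drop small counts.

    The username field is ignored during aggregation, so only counts
    leave this function.
    """
    counts = Counter((course, item) for _user, course, item in records)
    return {key: n for key, n in counts.items() if n >= min_count}


if __name__ == "__main__":
    # Only the book-17 figure survives; the two rare-book loans are withheld.
    print(loan_statistics(loans))
```

Whether figures such as these truly "cannot identify anyone" depends on the data and on what else a recipient already knows, so the threshold is a starting point rather than a guarantee.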
The second is to anonymise the personal data so that it cannot be traced back to an individual. This can take a number of forms. For instance, some log files store user names, while others store IP addresses; where a user has a fixed IP address, the address could be traced back to them. Anonymising the user name or IP address with a purpose-specific algorithm would prevent this.
A further problem arises where rare occurrences could be used to identify an individual. For instance, a pattern of accessing some rare books could be linked to someone with a particular research interest. For this reason, small counts are often omitted, for example loans made by people attending a course with a small class size.
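One possible purpose-specific algorithm for the user name and IP address case is a keyed hash. The sketch below uses HMAC-SHA256 from Python's standard library; this is a swapped-in technique for illustration, not a method the guide prescribes. A plain unkeyed hash would not be enough for IP addresses, because the IPv4 address space is small enough that anyone could hash every address and reverse the mapping; a secret key held only by the data owner prevents that. The key handling, the log layout and the names `SECRET_KEY`, `pseudonymise`, `log` and `anonymised` are assumptions.

```python
import hashlib
import hmac
import secrets

# Assumption: the key is generated by the data owner, kept secret and never
# distributed with the data. Without it, recipients cannot recompute tokens,
# e.g. by hashing every possible IPv4 address.
SECRET_KEY = secrets.token_bytes(32)


def pseudonymise(identifier: str, key: bytes = SECRET_KEY) -> str:
    """Replace a user name or IP address with a keyed HMAC-SHA256 token.

    Under one key the same identifier always yields the same token, so
    records can still be grouped by user without revealing who the user is.
    """
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical log lines: (user_name, ip_address, resource).
log = [
    ("jsmith", "192.0.2.10", "/catalogue/rare-book-3"),
    ("jsmith", "192.0.2.10", "/catalogue/book-17"),
    ("akhan", "198.51.100.7", "/catalogue/book-17"),
]

# Replace both identifying fields; the resource column is left as-is.
anonymised = [(pseudonymise(user), pseudonymise(ip), res) for user, ip, res in log]

if __name__ == "__main__":
    for row in anonymised:
        print(row)
```

Consistent tokens still reveal that one (unnamed) person accessed a rare book, so the pattern problem above is not solved by this step alone. Generating a fresh key for each release stops tokens being linked across releases; reusing a key trades that protection for the ability to follow users over time.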
Taking it further
If you want to take this further, you will need to consider the following as a starting point:
- Does the data you are considering releasing contain any personal information?
- Are the people that you are sharing the data with already covered by the purpose the data was collected for (e.g. a student’s tutor)?
- Is the personal information directly held in the data (user name, IP address)?
- Does the data enable one to deduce a user identity (only x could have borrowed those two rare books – so what else have they borrowed)?
- See also Sharing activity data: Legal issues.
- Data sharing code of practice, Information Commissioner's Office, 2011, http://www.ico.gov.uk/~/media/documents/library/Data_Protection/Detailed_specialist_guides/data_sharing_code_of_practice.pdf
- Three of the projects have also been addressing this issue: