Exposing VLE data (EVAD)
This cautious approach meant hiding any data that would identify an individual user, site names and anything that might link a session back to a particular user. We settled on a hashing algorithm to use to obscure any such items of data yielding a string that can be determined uniquely from the value; we also used a salt to prevent inversion of the hash through exhaustive search of short inputs.
At this stage, we also looked at some sample data to reinforce our decisions.
The decision on what to hash was straightforward in many cases such as concealing any field with Site, User Name, URL or Content in it. Some things were less clear cut. For instance, the skin around a site could be used to identify a Department. The Session Ids that we looked at appeared to be numeric and we decided there was little risk in leaving this in its raw state. However, later testing revealed that, in some instances and points of time, this has included a user identifier so we agreed to hash this. It is worth remembering that the hashing algorithm is consistent so even though the value of the Session Id has been changed, it can still be used to link the Event and Session tables.