JUNE 18–22, 2017

Presentation Details

Name: (RP14) Event Pattern Identification in Anonymized System Logs
Time: Tuesday, June 20, 2017
08:35 am - 09:45 am
Room: Substanz 1+2
Breaks: 07:30 am - 10:00 am Welcome Coffee
Presenter: Siavash Ghiasvand, TU Dresden/ZIH
The size of computing systems and the number of their components are steadily increasing, and the volume of system logs they generate grows in proportion. Storing system logs for analyzing and diagnosing system behavior in large computing systems therefore requires a large amount of storage capacity. Sensitive data in system logs also raise significant concerns about sharing and publishing them. Anonymization methods that cleanse sensitive data from system logs before publication reduce the usability of the anonymized logs for further analysis: beyond a certain level of anonymization, the cleansed logs lose their semantics and remain useful only for certain statistical analyses. In this work, we address this tradeoff between anonymization and the usefulness of anonymized system logs. Our approach guarantees full anonymization of system logs, requires minimal storage space, and keeps the cleansed logs usable for general statistical analyses. The method works in four steps: (1) all variables in every log entry are replaced with defined constant values; (2) each log entry is mapped to a hash-key via a collision-resistant hash function; (3) the frequency of each hash-key is calculated; and (4) the hash-keys are optimized based on their frequency of appearance. Additionally, non-informative hash-keys are eliminated based on their frequency. Preliminary results from applying the proposed method to system logs from a production system show up to a 95% reduction in required storage capacity, while the precision of the statistical analysis remains unchanged and full anonymity is guaranteed.
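The four steps can be illustrated with a short Python sketch. This is a minimal illustration under stated assumptions, not the authors' implementation: the regular expressions that decide what counts as a "variable", the choice of SHA-256 as the collision-resistant hash, the rank-based short IDs used as the frequency-based optimization, and the `min_count` threshold for eliminating non-informative hash-keys are all hypothetical.

```python
import hashlib
import re
from collections import Counter

# Hypothetical variable patterns (an assumption, not the authors' rule set),
# applied in order; each match is replaced by a fixed placeholder token.
VARIABLE_PATTERNS = [
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"), "<IP>"),
    (re.compile(r"\b0x[0-9a-fA-F]+\b"), "<HEX>"),
    (re.compile(r"(?:/[\w.-]+)+"), "<PATH>"),
    (re.compile(r"\b\d+\b"), "<NUM>"),
]


def normalize(entry: str) -> str:
    """Step 1: replace every variable field with a constant placeholder."""
    for pattern, placeholder in VARIABLE_PATTERNS:
        entry = pattern.sub(placeholder, entry)
    return entry


def hash_key(entry: str) -> str:
    """Step 2: map a normalized entry to a SHA-256 hex digest."""
    return hashlib.sha256(entry.encode("utf-8")).hexdigest()


def compress(entries, min_count=1):
    """Steps 3-4: count hash-keys, drop rare ones, assign short IDs by rank."""
    keys = [hash_key(normalize(e)) for e in entries]
    freq = Counter(keys)  # step 3: frequency of each hash-key
    # Hypothetical elimination rule: keys seen fewer than min_count times
    # are treated as non-informative and removed.
    kept = [k for k, c in freq.most_common() if c >= min_count]
    # Step 4 (assumed interpretation): the most frequent key gets ID 0,
    # the next one gets 1, and so on, so common events get the smallest IDs.
    short_id = {k: i for i, k in enumerate(kept)}
    stream = [short_id[k] for k in keys if k in short_id]
    return stream, short_id
```

Only the integer ID stream and the ID-to-hash table need to be stored, and neither contains the replaced variable values. Ranking IDs by frequency is just one plausible reading of "optimized based on their frequency of appearance"; a variable-length code such as Huffman coding would be another.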

Siavash Ghiasvand, TU Dresden / ZIH
Florina M. Ciorba, University of Basel
