Statistical Reports and Data Analytics with Distributed Computing

Date published: Tuesday, 1 September 2015
Document type: Summer student report
Author(s): G. Azzopardi

The control systems needed to run the Large Hadron Collider (LHC), its injector accelerators and their infrastructure generate massive amounts of data. This data can be used to optimize the control systems and to provide meaningful information to the machine operators and experts. At present, algorithms perform data analytics on over 3000 signals each day, and this number continues to grow. Performing such a large number of computations is time-consuming, especially on a single machine. This project therefore investigates a means of parallelizing the execution of these algorithms. The proposed solution uses Docker, which allows for straightforward scalability. Results show that scaling up the system does decrease the required execution time.
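
As a rough illustration of the idea of spreading per-signal analytics over several workers, the minimal Python sketch below splits a list of signal identifiers into chunks and analyses each chunk in a separate process. It is not the report's implementation: local processes stand in for Docker containers, the signal data is synthetic, and the names `analyse_signal`, `partition`, `analyse_chunk` and `run` are hypothetical.

```python
from concurrent.futures import ProcessPoolExecutor
from statistics import mean


def analyse_signal(samples):
    """Hypothetical per-signal analysis: here, simply the mean of the samples."""
    return mean(samples)


def partition(signal_ids, n_workers):
    """Split the signal ids into n_workers round-robin chunks."""
    return [signal_ids[i::n_workers] for i in range(n_workers)]


def analyse_chunk(chunk):
    """Work done by one worker: analyse every signal in its chunk."""
    # Assumption: synthetic readings stand in for real control-system data.
    return {sid: analyse_signal([sid, sid + 1, sid + 2]) for sid in chunk}


def run(signal_ids, n_workers):
    """Analyse all signals using n_workers parallel processes."""
    results = {}
    with ProcessPoolExecutor(max_workers=n_workers) as pool:
        # Each chunk is handled by one worker; partial results are merged here.
        for partial in pool.map(analyse_chunk, partition(signal_ids, n_workers)):
            results.update(partial)
    return results


if __name__ == "__main__":
    # 3000 signals, as in the daily workload mentioned above; 4 workers is arbitrary.
    out = run(list(range(3000)), n_workers=4)
    print(len(out))
```

Increasing `n_workers` in this sketch plays the role that adding containers would play in a Docker-based setup; whether it reduces the run time in practice depends on the cost of each analysis and on the available cores.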