Oracle R technologies for data analytics and machine learning in hybrid data systems

Date published: Monday, 10 October, 2016
Document type: Summer student report
Author(s): R. Bisht

I present an evaluation of Oracle R Advanced Analytics for Hadoop (ORAAH) as a Big Data analysis platform for advanced analytics and machine learning. I used R as the basic modelling tool, as it is one of the most powerful statistical computing languages and offers many predefined functions that make it easy to analyse and test data. To provide a comparison against which to judge the performance of ORAAH, the same approaches were also modelled in Apache Spark. Performance was measured by the time taken to build each model and by the accuracy of the resulting model. The project aimed to study the potential applicability of these technologies using real CERN analytics use cases: (a) degradation analysis of cryogenic valves in LHCb and (b) prediction of faulty cryogenic valves. Both use cases were run and modelled with these technologies, and the results are very promising for the future use of scalable services as a CERN Big Data analytics platform.