Random Decision Forests on Apache Spark

Date:

Tuesday, 12 April, 2016 - 15:00 to 16:00

Location:

Apache Spark continues to gain momentum as the new processing paradigm for Apache Hadoop, and it offers the data scientist a lot to like: it is natively distributed, provides an interactive REPL, exposes Python APIs in addition to its native Scala API, and ships with MLlib, a library of machine learning algorithms.

Spark includes an implementation of random decision forests, an important and popular ensemble algorithm for classification and regression. This talk will introduce Spark and random decision forests to the curious, and demonstrate the process of analyzing a real-world data set with them. The session will cover loading and understanding the data set, and introduce ideas like evaluation on separate training and test sets, ensemble methods, and feature types, along with supporting concepts like impurity and entropy.
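As a rough illustration of the impurity and entropy concepts mentioned above, the following is a minimal sketch in plain Python. It is not MLlib's implementation and does not use Spark; the function names (`entropy`, `gini`, `information_gain`) and the toy labels are assumptions made for this example. It shows how a decision tree might score a candidate split by how much it reduces the impurity of the class labels.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def information_gain(parent, left, right, impurity=entropy):
    """Impurity of the parent minus the size-weighted impurity of the children."""
    n = len(parent)
    weighted = (len(left) / n) * impurity(left) + (len(right) / n) * impurity(right)
    return impurity(parent) - weighted


# Hypothetical toy labels: a perfectly mixed node and a split that separates it.
parent = ["a", "a", "b", "b"]
left, right = ["a", "a"], ["b", "b"]

mixed_entropy = entropy(parent)                      # evenly mixed two-class node
mixed_gini = gini(parent)
perfect_gain = information_gain(parent, left, right)  # children are both pure
```

A tree learner would typically evaluate many candidate splits this way and keep the one with the largest gain; a random forest repeats this over many trees, each built on a random sample of the rows and features.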


CERN Accelerating science