Organizations have been on the "collect troves of any and all data" bandwagon for over a decade. Yet data scientists remain scarce within these same organizations. We will present easy-to-implement case studies of diverse statistical analyses and machine learning algorithms that allow practitioners to draw actionable inferences from their data, painlessly, so that businesses across disciplines and domains can begin to actually make use of these data.
The open-source R and Python communities have developed powerful statistical libraries that are unprecedented in their reliability, transparency, and potential, and now in their accessibility and ease of use. This tutorial will cover the following big-data and statistical learning capabilities in R and Python:
- visualizations of summary statistics
- classification and regression problems
- dimensionality reduction
- how to implement the above in Hadoop/MapReduce ecosystems and with NoSQL capabilities in our case studies
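As a rough illustration of the first and last topics together, the sketch below computes per-group summary statistics in a map/reduce style using only the Python standard library. It is not taken from the tutorial's case studies: the `records` data and the function names `map_record`, `combine`, and `summarize` are hypothetical, and a real Hadoop job would distribute the map and reduce steps across machines rather than run them in one process.

```python
from math import sqrt

# Hypothetical (group, value) records standing in for rows of a large dataset.
records = [("a", 1.0), ("a", 3.0), ("b", 2.0), ("b", 4.0), ("b", 6.0)]


def map_record(record):
    """Map step: emit (key, (count, sum, sum of squares)) for one record."""
    key, x = record
    return key, (1, x, x * x)


def combine(left, right):
    """Reduce step: partial statistics add element-wise, so they can be
    merged in any order and on any machine."""
    return tuple(l + r for l, r in zip(left, right))


def summarize(records):
    """Run the map and reduce steps locally and derive mean and std per key."""
    partials = {}
    for key, stats in map(map_record, records):
        partials[key] = combine(partials[key], stats) if key in partials else stats

    summary = {}
    for key, (n, total, total_sq) in partials.items():
        mean = total / n
        variance = total_sq / n - mean * mean  # population variance
        summary[key] = {"count": n, "mean": mean, "std": sqrt(max(variance, 0.0))}
    return summary


summary = summarize(records)
for key in sorted(summary):
    print(key, summary[key])
```

The point of the design is that count, sum, and sum of squares are additive, so each worker can summarize its own slice of the data and only the small partial tuples need to be merged.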
Attendees will leave with an understanding of a breadth of statistical learning capabilities, along with a concrete roadmap for putting these mathematical and computational advances into practice across diverse domains, data types, and disciplines. They will also leave with a sense of the ease with which these techniques can be applied to their particular problems and organizations, along with the know-how to actually implement these powerful tools.