
Hours: 18

Language: English

Summary

In this course you will learn how to get the most from your data by combining statistical analysis, data mining and machine learning on your Big Data resources. You will gain hands-on lab experience that will let you start exploring your own data as soon as you finish the course.

We’ll start by learning about the overall process of working with Big Data and the main challenges in collecting, storing and processing such data. You will learn how open-source solutions from the Apache Hadoop ecosystem can be used in your organization. Next, we will dive into solving practical problems with those tools, starting from scratch and building a cutting-edge solution. First, we will prepare an environment for collecting Big Data from multiple sources and storing it. You will learn how to use ZooKeeper to centralize management of your Hadoop solution and to manage a large number of nodes from the command shell. Next, we will focus on managing data storage and working with the data itself: how to migrate, transform and filter it.
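To give a flavour of the transform-and-filter work described above, here is a miniature version in plain Python. This is illustrative only: the record layout is invented for the example, and in the course the same logic is expressed with cluster tools such as Pig or Hive rather than run locally.

```python
# Illustrative only: a local miniature of the migrate/transform/filter step.
# The sample records (date, channel, amount) are invented for this sketch.
raw_records = [
    "2024-01-05,web,137",
    "2024-01-05,mobile,",      # missing amount -> should be filtered out
    "2024-01-06,web,212",
]

def parse(line):
    """Transform one raw CSV line into a structured record."""
    date, channel, amount = line.split(",")
    return {"date": date, "channel": channel,
            "amount": int(amount) if amount else None}

# Transform every record, then filter out incomplete ones.
clean = [r for r in map(parse, raw_records) if r["amount"] is not None]

for r in clean:
    print(r["date"], r["channel"], r["amount"])
```

At cluster scale the same map-then-filter shape is what a Pig script or a Hive query expresses; only the execution engine changes.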

After that we will focus on processing data with workflows, and especially on how to use MapReduce to get results from our distributed systems. Then you will learn how to analyze data in a distributed environment with Python, Impala and Hive.
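The MapReduce idea itself can be previewed in a few lines of plain Python: a map phase emits key/value pairs, a shuffle groups them by key, and a reduce phase aggregates each group. This is a local sketch of what Hadoop performs across many nodes; the word-count data is invented for the example.

```python
from collections import defaultdict

documents = ["big data big insight", "data mining and big data"]

# Map: emit a (word, 1) pair for every word in every document.
mapped = [(word, 1) for doc in documents for word in doc.split()]

# Shuffle: group the emitted values by key, as the framework
# does between the map and reduce phases.
groups = defaultdict(list)
for word, count in mapped:
    groups[word].append(count)

# Reduce: aggregate each group to a single value per key.
counts = {word: sum(vals) for word, vals in groups.items()}
print(counts)  # {'big': 3, 'data': 3, 'insight': 1, 'mining': 1, 'and': 1}
```

On a real cluster each phase runs in parallel on different nodes, but the three-step shape is exactly the one above.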

The next module will focus on ETL, warehousing and data mining in a Big Data environment. At the end we will go beyond the classic approach to data analysis and use machine learning and data science techniques to get the most out of your data.
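As a small preview of the machine-learning material, the sketch below hand-rolls one-dimensional k-means clustering in pure Python. The data points are invented; in the course the equivalent clustering is run at scale with tools such as Spark or Apache Mahout.

```python
# Illustrative only: one-dimensional k-means with two clusters,
# run locally on invented data.
points = [1.0, 1.5, 2.0, 10.0, 11.0, 12.5]
centroids = [points[0], points[-1]]          # naive initialisation

for _ in range(10):                          # a fixed number of iterations
    # Assignment step: attach each point to its nearest centroid.
    clusters = [[], []]
    for p in points:
        nearest = min((0, 1), key=lambda i: abs(p - centroids[i]))
        clusters[nearest].append(p)
    # Update step: move each centroid to the mean of its cluster.
    centroids = [sum(c) / len(c) for c in clusters]

print(centroids)  # two cluster centres, roughly 1.5 and 11.2
```

The assign/update loop is the whole algorithm; distributed implementations parallelise the assignment step over partitions of the data.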

Our goal is to teach you how to handle Big Data with different solutions and how to gain additional insight into your business.

This course is intended for data analysts, data scientists, big data analysts, developers and IT professionals who want to gain deep knowledge and skills in processing big data with the Hadoop ecosystem.

Target Audience

Data analysts, database analysts, big data analysts, data scientists and IT professionals who want to master big data management and analysis.

Prerequisites

To attend this training you should have experience with basic statistical analysis. It is also recommended that participants understand the basic concepts of object-oriented programming languages: control flow statements such as if, for and foreach, variables, data types, collections and datasets.

Topics Covered

Module 1: Introduction to Big Data
  • Defining Big Data
  • Problems arising with Big Data
  • Overview of the Hadoop ecosystem
  • Hadoop architecture concepts and practical implementation

Module 2: Environment Setup and Management
  • Hadoop stack and environment management
  • Setup and management of the cluster and nodes
  • Using ZooKeeper for centralized management
  • The Hadoop command shell

Module 3: Data Collection and Storage
  • HBase installation and management
  • Nutch and Solr configuration
  • Working with Nutch crawlers
  • Bulk transfer and streaming of data
  • Monitoring the cluster and data

Module 4: Processing Data with Workflows
  • The MapReduce concept, with examples
  • Working with Pig
  • MapReduce in Hive
  • Scheduling in Hadoop
  • Working with Oozie workflows

Module 5: Distributed Data Analysis
  • Data analysis with Python
  • Working with data in Hive and Impala
  • In-memory computing with Spark
  • Distributed analysis and design patterns

Module 6: Reporting and Dashboards
  • Configuration and management of Hunk
  • Creating reports and dashboards
  • Working with Talend

Module 7: ETL and Warehousing
  • Working with Pentaho Data Integration
  • Creating and managing ETL processes
  • Data ingestion
  • Structured data queries with Hive
  • Flume data flows

Module 8: Machine Learning and Data Science
  • ML and data science overview and lifecycle
  • Analytics with higher-level APIs
  • Machine learning with Spark
  • Working with Apache Mahout
  • Using data lakes