Advanced Big Data Analytics Expert (ABDE)

Classroom Session: 9:30am to 5:30pm (SGT / UTC +8), 4 Lessons
Course Information
  • Duration: 5-Day / 40 Hours
  • Who Should Attend: Advanced Big Data Analytics Expert (ABDE) is designed for anyone interested in gaining advanced knowledge and skills in Big Data.

Course Objective

Advanced Big Data Analytics Expert aims to provide participants with advanced knowledge of Big Data Analytics. Through real-time demonstrations and scenario-based hands-on exercises, participants will experience first-hand how Advanced Big Data Analytics can be applied in real life.


It is preferred that participants have some knowledge of Big Data or Data Analytics, or have successfully received a Certificate of Competency in Advanced Big Data Professional (ABDP).


Participants are required to attempt an examination upon completion of the course. This exam tests a candidate’s knowledge and skills related to Big Data based on the syllabus covered.


Participants will be awarded a Certificate of Competency and recognized as an Advanced Big Data Analytics Expert (ABDE) upon meeting the requirements and passing the examination.

Module 1 Overview on Big Data Hadoop Ecosystem
Topics Covered
  • Python Refresher
  • Standard Toolkit for Hadoop and Analytics
  • Understanding Relational, NoSQL, and Graph Databases
  • Construction of Data Pipelines
  • Data Modeling in Hadoop
  • HDFS Schema Design
  • HBase Schema Design
  • Working on Metadata

Module 2 Advanced Hadoop Techniques
Topics Covered
  • What is Data Ingestion?
  • Different Ways to Perform Data Ingestion
  • Data Extraction
  • Data Processing in Hadoop
  • Overview on MapReduce
  • Working on Spark components
  • Pig and How it is Being Used
  • Overview on Hive
  • Impala Speed-Oriented Design

Module 3 Introduction to Orchestration
Topics Covered
  • Orchestration Frameworks in Hadoop
  • Oozie Terminology and Workflow
  • Windowing Analysis using Spark
  • Parameterizing Workflows
  • Scheduling Patterns
  • Execution of Workflows

Module 4 Real-Time Processing with Hadoop
Topics Covered
  • Stream Processing
  • Integration of Apache Storm with HDFS and HBase
  • Trident Overview
  • Spark Streaming Overview
  • Flume Interceptors
  • Low-Latency Enrichment, Validation, Alerting, and Ingestion
  • NRT Counting, Rolling Averages, and Iterative Processing
  • Complex Data Pipelines
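
As a rough illustration of the rolling-average topic above, the following is a minimal sketch in plain Python rather than Storm or Spark Streaming. The function name `rolling_averages` and the sample readings are hypothetical and are not taken from the course material.

```python
from collections import deque


def rolling_averages(values, window):
    """Yield the mean of the most recent `window` values as each value arrives."""
    if window <= 0:
        raise ValueError("window must be positive")
    recent = deque()  # holds at most `window` values
    total = 0.0       # running sum of the values currently in `recent`
    for value in values:
        recent.append(value)
        total += value
        if len(recent) > window:
            # Drop the oldest value so the sum covers only the window.
            total -= recent.popleft()
        yield total / len(recent)


# Hypothetical sensor readings arriving one at a time.
readings = [10, 20, 30, 40]
averages = list(rolling_averages(readings, window=2))
print(averages)
```

Keeping a running sum means each new event costs constant work, which is the general idea behind near-real-time aggregation; a streaming framework would apply the same logic per key across a cluster.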

Module 5 Working with Big Data Framework using Python and Spark
Topics Covered
  • Hadoop and Spark Refresher
  • Spark SQL and Python Pandas DataFrame
  • Improving Analysis Performance with Parquet and Partitions
  • Working with Unstructured Data
  • Working on Spark DataFrames
  • Writing Output from Spark DataFrames
  • Data Manipulation with Spark DataFrames
  • Plotting Graphs in Spark

Module 6 Exploratory Data Analysis
Topics Covered
  • Handling of Missing Values using Spark DataFrame
  • Correlation Analysis with Python PySpark DataFrame
  • Improving Analysis Performance with Parquet and Partitions
  • Understanding Exploratory Data Analysis
  • Identify Target Variable and Related KPIs
  • Feature Importance of Target Variable
  • Different Phases of an Analytics Project Life Cycle
  • Gaussian Distribution of Numeric Features

Module 7 Advanced Big Data Analysis
Topics Covered
  • Reproducible Approach to Gathering Data
  • Understanding the Standards and Code Practices
  • Segmentation of Workflow
  • Missing Value Preprocessing with High Reproducibility
  • Use of Functions / Loops to Optimize Coding
  • Utilization of Libraries / Packages / Algorithms
  • Normalization of Data
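
The preprocessing topics above (missing values, reusable functions, normalization) can be sketched in a few lines of plain Python. This is only an illustrative example under stated assumptions: the helper names `fill_missing_with_mean` and `min_max_normalize`, the choice of mean imputation and min-max scaling, and the sample data are all hypothetical and not prescribed by the syllabus.

```python
def fill_missing_with_mean(values):
    """Replace None entries with the mean of the values that are present."""
    present = [v for v in values if v is not None]
    if not present:
        raise ValueError("cannot impute: every value is missing")
    mean = sum(present) / len(present)
    return [mean if v is None else v for v in values]


def min_max_normalize(values):
    """Rescale values linearly so the minimum maps to 0.0 and the maximum to 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no spread; map everything to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical column with one missing entry.
raw = [2.0, None, 6.0, 10.0]
filled = fill_missing_with_mean(raw)
scaled = min_max_normalize(filled)
print(filled, scaled)
```

Wrapping each step in a small function makes the same preprocessing repeatable across datasets, which is one way to read the reproducibility theme of this module.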

Module 8 Putting Everything Together
Topics Covered
  • Reading Data from a CSV File with Python PySpark Object
  • Reading JSON Data with Python PySpark Object
  • Using Python PySpark Objects for SQL Operations
  • Generating Statistical Measurements
  • Visualisation Using Plotly

Module 9 Big Data and Machine Learning using Spark
Topics Covered
  • Resilient Distributed Datasets with Spark
  • Introduction to Spark MLlib
  • Decision Tree with Spark MLlib
  • K-Means Clustering with Spark
  • Term Frequency – Inverse Document Frequency (TF-IDF)
  • DataFrame API with Spark MLlib
  • Understanding A/B Testing
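
To give a feel for the TF-IDF topic listed above, here is a minimal plain-Python sketch using the textbook definition: term frequency multiplied by log(N / document frequency). The function name `tf_idf` and the toy documents are hypothetical, and libraries such as Spark MLlib may use a smoothed variant of the formula, so their numbers can differ from this sketch.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: score} dict per document using tf * log(N / df)."""
    n_docs = len(documents)
    doc_freq = Counter()
    for doc in documents:
        # Count each term once per document it appears in.
        doc_freq.update(set(doc))
    scores = []
    for doc in documents:
        counts = Counter(doc)
        scores.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


# Hypothetical tokenized documents.
docs = [
    ["big", "data", "spark"],
    ["big", "data", "hadoop"],
    ["spark", "streaming"],
]
scores = tf_idf(docs)
print(scores[1])
```

In the second document, "hadoop" scores higher than "big" because it appears in only one document, which is the intuition behind weighting terms by how rare they are across the collection.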

Advanced Big Data Analytics Expert (ABDE) involves rigorous use of real-time case studies, hands-on exercises and group discussions.