Advanced Data Science Professional with Python (ADSP)

Data Scientist has become one of the most in-demand job positions, and the demand for professionals with data science expertise is booming. According to a study by IBM, experts have predicted a 28% increase in demand for data scientists by the year 2020.

To embark on a career as a Data Scientist, Python may be the programming language of choice. Based on recent studies, 66% of data scientists reported using Python daily, making it the number one tool for analytics professionals. Data science experts also expect this usage to increase as the Python ecosystem continues to develop.

Course Information

  • Duration: 5 Days / 40 Hours
  • Who Should Attend: Professionals or anyone interested in pursuing a career as a data scientist and using data to understand the world, uncover insights, and make better decisions

Course Objective

Advanced Data Science Professional with Python (ADSP) is designed for participants interested in pursuing a career as a data scientist and acquiring knowledge of how to use Python to uncover insights and make better decisions.

Pre-Requisite

NA

Examination

Participants are required to attempt an examination upon completion of the course. This exam tests a candidate’s knowledge and skills related to Data Science and Python based on the syllabus covered.

Certification

Participants will receive a Certificate of Competency upon successfully completing the course and fulfilling all course requirements.

Module 1 Introduction to Data Science

  • What is Data Science
  • Data Science vs. Analytics vs. Data Warehousing, OLAP, MIS Reporting
  • Relevance in industry and need of the hour
  • Types of problems and objectives in various industries
  • How leading companies are harnessing the power of Data Science
  • Different phases of a typical Analytics/Data Science project

Module 2 Introduction to Python Programming Essentials

  • Introduction to Python Editors & IDEs (Canopy, PyCharm, Jupyter, Rodeo, IPython, etc.)
  • Custom Environment Settings
  • Basic Rules in Python
  • Concept of Packages/Libraries – Important packages (NumPy, SciPy, scikit-learn, Pandas, Matplotlib, etc.)
  • Installing & Loading Packages & Namespaces
  • Data Types & Data objects/structures (Tuples, Lists, Dictionaries)
  • List and Dictionary Comprehensions
  • Variable & Value Labels – Date & Time Values
  • Basic Operations – Mathematical – string – date
  • Reading and writing data
  • Simple plotting/Control flow/Debugging/Code profiling
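
As a taste of the data structures and comprehensions listed above, here is a minimal sketch. The sample names and scores are made up for illustration only and are not part of the course material.

```python
# Hypothetical sample data: (participant name, score) tuples stored in a list.
scores = [("alice", 82), ("bob", 67), ("carol", 91)]

# List comprehension: names of participants who scored at least 80.
high_scorers = [name for name, score in scores if score >= 80]

# Dictionary comprehension: map each name to its score for quick lookup.
score_by_name = {name: score for name, score in scores}

print(high_scorers)
print(score_by_name["bob"])
```

Both comprehensions build a new collection in a single expression, which is usually shorter than the equivalent for-loop with append or key assignment.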

Module 3 Importing and Exporting Data with Python

  • Starting Python
  • Importing Data from various sources (CSV, TXT, Excel, Access, etc.)
  • Database Input (Connecting to database)
  • Viewing Data objects – subsetting, methods
  • Exporting Data to various formats

Module 4 Data Cleansing with Python

  • Cleansing Data with Python
  • Data Manipulation steps (Sorting, filtering, duplicates, merging, appending, subsetting, derived variables, sampling, data type conversions, renaming, formatting, etc.)
  • Data manipulation tools (Operators, Functions, Packages, control structures, Loops, arrays, etc.)
  • Python Built-in Functions (Text, numeric, date, utility functions)
  • User Defined Functions in Python
  • Stripping out extraneous information
  • Normalizing data and Formatting data
  • Important Python Packages for data manipulation (Pandas, NumPy, etc.)

Module 5 Data Visualization with Python

  • Introduction to exploratory data analysis
  • Descriptive statistics, Frequency Tables and summarization
  • Univariate Analysis (Distribution of data & Graphical Analysis)
  • Bivariate Analysis (Cross Tabs, Distributions & Relationships, Graphical Analysis)
  • Creating Graphs
  • Important Packages for Exploratory Analysis (NumPy Arrays, Matplotlib, Pandas, scipy.stats, etc.)

Module 6 Basic Statistics

  • Basic Statistics – Measures of Central Tendency and Variance
  • Building blocks (Probability Distributions, Normal distribution, Central Limit Theorem)
  • Inferential Statistics – Sampling – Concept of Hypothesis Testing
  • Statistical Methods – Z/t-tests (One sample, independent, paired), ANOVA, Correlation and Chi-square
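
To illustrate the measures and the one-sample t-test named above, the sketch below uses only Python's standard library. The sample values and the hypothesised mean `mu0` are assumptions chosen for illustration, and the t statistic is computed by hand as t = (mean − mu0) / (s / √n) rather than through any particular statistics package.

```python
import math
import statistics

# Hypothetical sample measurements and hypothesised population mean.
sample = [5.1, 4.9, 5.6, 5.8, 5.0, 5.4]
mu0 = 5.0

# Measures of central tendency.
mean = statistics.mean(sample)
median = statistics.median(sample)

# Measures of spread: sample variance and standard deviation (n - 1 denominator).
variance = statistics.variance(sample)
std_dev = statistics.stdev(sample)

# One-sample t statistic: how many standard errors the sample mean is from mu0.
t_stat = (mean - mu0) / (std_dev / math.sqrt(len(sample)))

print(f"mean={mean:.3f} median={median:.3f} variance={variance:.3f} t={t_stat:.3f}")
```

Turning the t statistic into a p-value requires the t distribution with n − 1 degrees of freedom, which is where a package such as scipy.stats would normally come in.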

Module 7 Introduction to Machine Learning

  • Statistical learning vs Machine learning
  • Iteration and evaluation
  • Major Classes of Learning Algorithms – Supervised vs Unsupervised Learning
  • Different Phases of Predictive Modelling (Data Pre-processing, Sampling, Model Building, Validation)
  • Concept of Overfitting and Underfitting (Bias-Variance Trade-off) & Performance Metrics
  • Types of Cross validation (Train & Test, Bootstrapping, K-Fold validation, etc.)
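
The K-Fold validation idea mentioned above can be sketched in plain Python as follows. The helper name `k_fold_indices` is hypothetical and written only for this illustration. It splits the row indices in order without shuffling, whereas real projects usually shuffle first or rely on a library implementation such as scikit-learn's `KFold`.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) pairs, one pair per fold."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so fold sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]                 # held-out fold
        train = indices[:start] + indices[start + size:]   # all remaining rows
        yield train, test
        start += size


# Example: 10 rows split into 3 folds.
folds = list(k_fold_indices(10, 3))
for train, test in folds:
    print("train:", train, "test:", test)
```

Each row appears in exactly one test fold, so every observation is used for validation once and for training k − 1 times.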

Module 8 Basic Predictive Modelling

  • Introduction to Predictive Modelling
  • Types of Business problems – Mapping of Techniques
  • Linear Regression
  • Logistic Regression
  • Segmentation – Cluster Analysis (K-Means / DBSCAN)
  • Decision Trees (CHAID/CART/C5.0)
  • Time Series Forecasting