Diploma in Data Science and Business Analytics in collaboration with MAKAUT, WB

COURSE NAME: One Year Diploma in Data Science and Analytics

COURSE CODE:

CONTACT HOURS: 180 Hours (60 Hours Theory, 120 Hours Laboratory)

Prerequisite:

Bachelor's degree (B.Sc/B.E/B.Tech) or Diploma in Computer Science, Information Technology, or allied streams.

Course Objective:

· Expose the students to the basic statistical techniques that provide the foundation of data science.

· Illustrate the various steps of the data science process, viz. cleaning, visualisation, modeling, and presentation.

· Provide practice in different software tools used for data science: Python, R, and Tableau.

Course Outcome: At the end of the course, students will be able to:

1: Clean and prepare data for analysis

2: Perform basic visualisation of data

3: Model and curve-fit the data

4: Present findings of the analysis to stakeholders

#    Topic                                        Theory  Lab
1.   Introduction to Data Science and Analytics   2       -
2.   Overview of Statistics                       6       -
3.   Statistical computing in Python – I          2       4
4.   Data visualizations in Python                2       4
5.   Statistical computing in Python – II         2       4
6.   Data cleaning and Preparation in Python      2       4
7.   Statistical computing in R – I               2       4
8.   Data visualizations in R                     2       4
9.   Statistical computing in R – II              2       4
10.  Data cleaning and preparation using R        2       4
11.  Creating data visualisations in Tableau      4       8
12.  Predictive Analytics                         4       6
13.  Time Series Forecasting                      4       4
14.  Introduction to Machine Learning             4       8
15.  Analytics for Business Domains               8       8
16.  Industry Project                             12      54
     TOTAL                                        60      120

Course Content:

Introduction to Data Science [2Theory]:

What is data science? – Applications of data science – Skills required – Tools required – Models and methods – The data science process – Types of data – Nominal data – Ordinal data – Interval data – Ratio data – Relationship between different types of data – Use of graphs to see characteristics of data

Overview of Statistics [6Theory]:

Descriptive Statistics – Central tendency – Spread – Distributions – Inferential Statistics – Hypothesis testing – Chi-Square – Correlation – Regression

Statistical computing in Python – I [2Theory, 4Laboratory]:

Using Jupyter Notebooks – Statements and comments – Data types and Variables – Introduction to NumPy and Pandas – Descriptive statistics in Python
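
A lab exercise for this unit might look like the following sketch, which computes descriptive statistics with NumPy and Pandas. The data set and the column names (`region`, `sales`) are hypothetical, chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data set, invented for illustration only.
df = pd.DataFrame({
    "region": ["East", "West", "East", "South"],
    "sales": [120.0, 95.0, 130.0, 110.0],
})

# NumPy works on the underlying array of values.
sales = df["sales"].to_numpy()
mean_sales = np.mean(sales)
median_sales = np.median(sales)
std_sales = np.std(sales, ddof=1)  # ddof=1 gives the sample standard deviation

# Pandas summarises a column in one call and supports grouped statistics.
summary = df["sales"].describe()
by_region = df.groupby("region")["sales"].mean()

print(summary)
print(by_region)
```

Note that `np.std` defaults to the population formula (`ddof=0`), while Pandas uses the sample formula by default, so `ddof=1` is passed here to make the two agree.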

Data visualizations in Python [2Theory, 4Laboratory]:

Perceptions of visual cues – Bar chart – dot plot – scatter plot – histogram – plotting in Python – numerical – categorical – time series – Matplotlib

Statistical computing in Python – II [2Theory, 4Laboratory]:

Inferential statistics using Pandas and the scipy.stats library – Chi-Square Test – Correlation – T-test – ANOVA
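
The four tests named in this unit can be sketched with `scipy.stats` as follows. All numbers are hypothetical and invented for illustration; the sketch assumes SciPy and NumPy are installed.

```python
import numpy as np
from scipy import stats

# Chi-square test of independence on a hypothetical 2x2 contingency table.
observed = np.array([[30, 10], [20, 40]])
chi2, p_chi2, dof, expected = stats.chi2_contingency(observed)

# Pearson correlation between two hypothetical numeric variables.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 8.1, 9.8]
r, p_r = stats.pearsonr(x, y)

# Independent two-sample t-test (assumes equal variances by default).
group_a = [5.1, 4.9, 5.6, 5.8, 5.0]
group_b = [6.2, 6.0, 6.8, 6.5, 6.1]
t_stat, p_t = stats.ttest_ind(group_a, group_b)

# One-way ANOVA across three hypothetical groups.
group_c = [7.0, 7.4, 6.9, 7.2, 7.1]
f_stat, p_anova = stats.f_oneway(group_a, group_b, group_c)

print(f"chi-square: statistic={chi2:.2f}, dof={dof}, p={p_chi2:.4f}")
print(f"correlation: r={r:.3f}, p={p_r:.4f}")
print(f"t-test: t={t_stat:.2f}, p={p_t:.4f}")
print(f"ANOVA: F={f_stat:.2f}, p={p_anova:.6f}")
```

In each case a small p-value is evidence against the null hypothesis (independence, zero correlation, or equal group means); the significance threshold is a choice the analyst makes before looking at the data.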

Data cleaning and Preparation in Python [2Theory, 4Laboratory]:

Missing values – outliers – sorting – merging – Dropping Columns in a DataFrame – Changing the Index of a DataFrame – Tidying up Fields in the Data – Combining str Methods with NumPy to Clean Columns – Cleaning the Entire Dataset Using the applymap Function
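
The cleaning steps listed above can be sketched on a small hypothetical table. The column names, the values, and the outlier threshold of 100 are assumptions made for this example. Recent Pandas releases deprecate `applymap` in favour of `DataFrame.map`, so the sketch uses whichever of the two is available.

```python
import numpy as np
import pandas as pd

# Hypothetical raw records, invented for illustration only.
raw = pd.DataFrame({
    "id": [103, 101, 102, 104],
    "city": [" kolkata", "Delhi ", "MUMBAI", "pune"],
    "year": ["2019", "[2020]", "2021?", "2018"],
    "age": [25.0, np.nan, 31.0, 120.0],
    "notes": ["a", "b", "c", "d"],
})

# Drop an unneeded column, change the index, and sort by it.
df = raw.drop(columns=["notes"]).set_index("id").sort_index()

# Clean the entire data set element by element: strip stray whitespace.
def tidy(value):
    return value.strip() if isinstance(value, str) else value

elementwise = df.map if hasattr(df, "map") else df.applymap
df = elementwise(tidy)

# Tidy up fields with str methods: consistent casing, 4-digit years only.
df["city"] = df["city"].str.title()
df["year"] = pd.to_numeric(df["year"].str.extract(r"(\d{4})", expand=False))

# Combine a str method with NumPy to derive a cleaned category column.
df["tier"] = np.where(df["city"].str.contains("Kolkata|Delhi|Mumbai"), "metro", "other")

# Missing values: fill with the median. Outliers: flag with an assumed threshold.
df["age"] = df["age"].fillna(df["age"].median())
df["age_outlier"] = df["age"] > 100

print(df)
```

Flagging outliers rather than deleting them keeps the decision visible; whether to drop, cap, or keep a flagged row depends on the analysis.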

Statistical computing in R – I [2Theory, 4Laboratory]:

Basic data types – variables – vectors – matrices – control structures – functions – Factors – Data frames – lists – Useful R packages – Basic statistics in R – Reading in data – Descriptive statistics in R

Data visualizations in R [2Theory, 4Laboratory]:

Basic plotting in R – Using ggplot2 – Aesthetics – Faceting – Geoms – Position Adjustments – Saving Graphs

Statistical computing in R – II [2Theory, 4Laboratory]:

Inferential statistics using R – Chi-Square Test – Covariance – Correlation – T-test – Wilcoxon test – ANOVA

Data cleaning and preparation using R [2Theory, 4Laboratory]:

Reshaping – melt – dcast – rbind – cbind – Treating Missing values – Using dplyr – Using tidyr – Working with Continuous and Categorical Variables – Joining Data Sets – Grouping Data

Creating data visualisations in Tableau [4Theory, 8Laboratory]:

What is Tableau – Features of Tableau – Applications of Tableau – The Tableau products – Install Tableau Public – Tableau Workspace – Build views – Connect to data source – Creating dashboards – Data blending

Predictive Analytics [4Theory, 6Laboratory]:

Multiple Linear Regression – Classification – Logistic Regression – Linear Discriminant Analysis – Dimensionality reduction – RapidMiner tool
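
The syllabus names RapidMiner as the tool for this unit; purely as an illustration of what multiple linear regression computes, the sketch below fits the model with NumPy least squares instead. The data are hypothetical and generated exactly from y = 1 + 2·x1 + 3·x2, so the fit should recover those coefficients.

```python
import numpy as np

# Hypothetical predictors (two columns: x1, x2), invented for illustration.
X = np.array([
    [1.0, 2.0],
    [2.0, 1.0],
    [3.0, 4.0],
    [4.0, 3.0],
    [5.0, 5.0],
])

# Response generated without noise from y = 1 + 2*x1 + 3*x2.
y = 1.0 + 2.0 * X[:, 0] + 3.0 * X[:, 1]

# Add a column of ones so the model can estimate an intercept.
design = np.column_stack([np.ones(len(X)), X])

# Ordinary least squares: minimise the sum of squared residuals.
coef, residuals, rank, _ = np.linalg.lstsq(design, y, rcond=None)
intercept, b1, b2 = coef

# Predict for a new hypothetical observation with x1 = 6, x2 = 2.
new_point = np.array([1.0, 6.0, 2.0])
prediction = new_point @ coef

print(f"intercept={intercept:.2f}, b1={b1:.2f}, b2={b2:.2f}")
print(f"prediction={prediction:.2f}")
```

With real data the response contains noise, so the estimated coefficients only approximate the underlying relationship and should be reported with their uncertainty.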

Time Series Forecasting [4Theory, 4Laboratory]:

Examples of time series – Forecasting – ETS models – Auto-regressive models – ARIMA – KNIME tool
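
To make the idea of forecasting concrete, the sketch below implements simple exponential smoothing, the most basic member of the ETS family, in plain Python. It is an illustration only and does not use the tool named above; the function name, the demand figures, and the smoothing weight are hypothetical.

```python
def simple_exponential_smoothing(series, alpha):
    """Return the one-step-ahead forecast for a series with no trend or seasonality.

    The level is a weighted average of the newest observation and the
    previous level; alpha controls how quickly old observations are forgotten.
    """
    if not series:
        raise ValueError("series must contain at least one observation")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in the interval (0, 1]")

    level = series[0]  # initialise the level with the first observation
    for value in series[1:]:
        level = alpha * value + (1 - alpha) * level
    return level


# Hypothetical monthly demand figures, invented for illustration.
demand = [10.0, 12.0, 14.0]
forecast = simple_exponential_smoothing(demand, alpha=0.5)
print(forecast)  # 12.5
```

A larger `alpha` makes the forecast react faster to recent changes; a smaller one smooths more heavily. Series with trend or seasonality need the richer ETS or ARIMA models listed in this unit.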

Introduction to Machine Learning [4Theory, 8Laboratory]:

What is machine learning? – Applications of machine learning – How does ML work? – Training data – model/algorithm – testing data – evaluation – prediction – tools required – Orange tool – Types of ML – Supervised – Unsupervised – Common techniques – Deep Learning
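
The sequence of training data, model, testing data, evaluation, and prediction can be sketched in a few lines of plain Python. The example uses a 1-nearest-neighbour classifier, a simple supervised technique chosen here for illustration rather than taken from the syllabus; it does not use the Orange tool, and the data points and labels are hypothetical.

```python
import math

def predict_1nn(train_X, train_y, point):
    """Predict the label of `point` as the label of its closest training example."""
    distances = [math.dist(point, x) for x in train_X]
    nearest = distances.index(min(distances))
    return train_y[nearest]


# Training data: hypothetical (feature_1, feature_2) points with known labels.
train_X = [(1.0, 1.0), (1.5, 2.0), (5.0, 5.0), (6.0, 5.5)]
train_y = ["small", "small", "large", "large"]

# Testing data: held back from training and used only for evaluation.
test_X = [(1.2, 1.1), (5.5, 5.0)]
test_y = ["small", "large"]

# Prediction on the test set, then evaluation by accuracy.
predictions = [predict_1nn(train_X, train_y, p) for p in test_X]
accuracy = sum(p == t for p, t in zip(predictions, test_y)) / len(test_y)

print(predictions)
print(f"accuracy={accuracy:.2f}")
```

Keeping the test set separate from the training set is what makes the accuracy figure an honest estimate of performance on unseen data.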

Analytics for Business Domains [8Theory, 8Laboratory]:

Marketing and Retail – Web and Social Media – Banking and Finance – Supply chain and Logistics

Industry Project [12Theory, 54Laboratory]:

Each student will be required to work on a Data Science project relevant to the industry. This will involve performing business requirements analysis, solutions design, and implementation.

Reference Books:

  1. Think Stats, Allen B. Downey, O’Reilly Media.
  2. R for Data Science, Garrett Grolemund and Hadley Wickham, O’Reilly Media.
  3. Python Data Science Handbook, Jake VanderPlas, O’Reilly Media.