
Prerequisite:
Bachelor's degree (B.Sc./B.E./B.Tech.) or Diploma in Computer Science, Information Technology, or allied streams.
Course Objective:
· Expose the students to the basic statistical techniques that provide the foundation of data science.
· Illustrate the various steps of the data science process, viz. cleaning, visualisation, modeling, and presentation.
· Provide practice with the software tools used for data science: Python, R, and Tableau.
Course Outcome: At the end of the course, students will be able to:
1: Clean and prepare data for analysis
2: Perform basic visualisation of data
3: Model the data and fit curves to it
4: Present findings of the analysis to stakeholders
#    Topic                                       Theory  Lab
1.   Introduction to Data Science and Analytics  2       -
2.   Overview of Statistics                      6       -
3.   Statistical computing in Python – I         2       4
4.   Data visualizations in Python               2       4
5.   Statistical computing in Python – II        2       4
6.   Data cleaning and preparation in Python     2       4
7.   Statistical computing in R – I              2       4
8.   Data visualizations in R                    2       4
9.   Statistical computing in R – II             2       4
10.  Data cleaning and preparation using R       2       4
11.  Creating data visualisations in Tableau     4       8
12.  Predictive Analytics                        4       6
13.  Time Series Forecasting                     4       4
14.  Introduction to Machine Learning            4       8
15.  Analytics for Business Domains              8       8
16.  Industry Project                            12      54
     TOTAL                                       60      120
Course Content:
Introduction to Data Science [2 Theory]:
What is data science? – Applications of data science – Skills required – Tools required – Models and methods – The data science process – Types of data – Nominal data – Ordinal data – Interval data – Ratio data – Relationship between different types of data – Use of graphs to see characteristics of data
Overview of Statistics [6 Theory]:
Descriptive statistics – Central tendency – Spread – Distributions – Inferential statistics – Hypothesis testing – Chi-square test – Correlation – Regression
Statistical computing in Python – I [2 Theory, 4 Laboratory]:
Using Jupyter Notebooks – Statements and comments – Data types and variables – Introduction to NumPy and pandas – Descriptive statistics in Python
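As an illustration of descriptive statistics with NumPy and pandas, the sketch below summarises a small made-up data set. The column names and values are assumptions chosen for the example, not course data.

```python
import numpy as np
import pandas as pd

# Hypothetical data set: heights (cm) of six people in two groups.
df = pd.DataFrame({
    "group": ["a", "a", "a", "b", "b", "b"],
    "height": [150.0, 160.0, 170.0, 165.0, 175.0, 185.0],
})

# NumPy works on plain arrays.
heights = df["height"].to_numpy()
overall_mean = np.mean(heights)
overall_median = np.median(heights)

# pandas computes the sample standard deviation (ddof=1) by default.
sample_std = df["height"].std()

# describe() reports count, mean, std, min, quartiles, and max in one call.
summary = df["height"].describe()

# Descriptive statistics per category.
group_means = df.groupby("group")["height"].mean()

print(summary)
print(group_means)
```

In a Jupyter notebook, leaving summary or group_means as the last expression in a cell displays the result without print.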
Data visualizations in Python [2 Theory, 4 Laboratory]:
Perception of visual cues – Bar chart – Dot plot – Scatter plot – Histogram – Plotting in Python – Numerical – Categorical – Time series – Matplotlib
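The chart types above can be sketched with Matplotlib as follows. The data, titles, and output file name are made up for illustration, and the Agg backend is selected only so the script runs without a display; in a Jupyter notebook that line can be dropped.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, assumed here so no display is needed
import matplotlib.pyplot as plt

# Hypothetical data for each chart type.
categories = ["A", "B", "C"]
counts = [5, 3, 8]
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
values = [1, 2, 2, 3, 3, 3, 4, 4, 5]

fig, axes = plt.subplots(1, 3, figsize=(12, 3.5))

# Bar chart: one bar per category.
bars = axes[0].bar(categories, counts)
axes[0].set_title("Bar chart (categorical)")

# Scatter plot: relationship between two numerical variables.
axes[1].scatter(x, y)
axes[1].set_title("Scatter plot (numerical)")

# Histogram: distribution of one numerical variable in five bins.
n, bin_edges, patches = axes[2].hist(values, bins=5)
axes[2].set_title("Histogram")

fig.tight_layout()
fig.savefig("charts.png")  # hypothetical output file name
```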
Statistical computing in Python – II [2 Theory, 4 Laboratory]:
Inferential statistics using pandas and the scipy.stats library – Chi-square test – Correlation – t-test – ANOVA
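A minimal sketch of the four tests named above, using scipy.stats on made-up data held in pandas objects. The group names, counts, and measurements are assumptions for illustration only.

```python
import pandas as pd
from scipy import stats

# Hypothetical measurements for three groups of five observations each.
df = pd.DataFrame({
    "a": [5.1, 4.9, 5.0, 5.2, 4.8],
    "b": [5.9, 6.1, 6.0, 6.2, 5.8],
    "c": [7.0, 6.9, 7.1, 7.2, 6.8],
})

# Chi-square test of independence on a hypothetical 2x2 table of counts.
counts = pd.DataFrame(
    {"likes": [30, 10], "dislikes": [10, 30]},
    index=["group_1", "group_2"],
)
chi2, p_chi2, dof, expected = stats.chi2_contingency(counts)

# Pearson correlation between two columns.
r, p_r = stats.pearsonr(df["a"], df["b"])

# Independent two-sample t-test: do groups a and b share a mean?
t_stat, p_t = stats.ttest_ind(df["a"], df["b"])

# One-way ANOVA: do all three groups share a mean?
f_stat, p_f = stats.f_oneway(df["a"], df["b"], df["c"])

print(f"chi-square: chi2={chi2:.2f}, dof={dof}, p={p_chi2:.4g}")
print(f"correlation: r={r:.2f}")
print(f"t-test: t={t_stat:.2f}, p={p_t:.4g}")
print(f"ANOVA: F={f_stat:.2f}, p={p_f:.4g}")
```

Each test returns a statistic and a p-value; a small p-value is evidence against the null hypothesis of independence, zero correlation, or equal means.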
Data cleaning and preparation in Python [2 Theory, 4 Laboratory]:
Missing values – Outliers – Sorting – Merging – Dropping columns in a DataFrame – Changing the index of a DataFrame – Tidying up fields in the data – Combining str methods with NumPy to clean columns – Cleaning the entire dataset using the applymap function
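The cleaning steps above can be sketched on a small made-up table. The column names, the median imputation, and the 1.5 × IQR outlier rule are illustrative assumptions, not prescribed course methods.

```python
import pandas as pd

# Hypothetical raw data: untidy text, a non-numeric price, and an unused column.
raw = pd.DataFrame({
    "id": [3, 1, 2, 4],
    "city": ["  london ", "Paris", "paris  ", "LONDON"],
    "price": ["10", "n/a", "12", "500"],
    "notes": ["x", "y", "z", "w"],
})

# Drop a column that is not needed.
clean = raw.drop(columns=["notes"])

# Change the index to the id column and sort by it.
clean = clean.set_index("id").sort_index()

# Tidy a text field with vectorised str methods.
clean["city"] = clean["city"].str.strip().str.title()

# Convert to numbers; unparseable entries such as "n/a" become NaN.
clean["price"] = pd.to_numeric(clean["price"], errors="coerce")

# Treat missing values: fill with the median of the observed prices.
clean["price"] = clean["price"].fillna(clean["price"].median())

# Flag outliers lying more than 1.5 * IQR outside the quartiles.
q1, q3 = clean["price"].quantile([0.25, 0.75])
iqr = q3 - q1
clean["is_outlier"] = ~clean["price"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)

# Merge with a lookup table on the cleaned city name.
countries = pd.DataFrame({"city": ["London", "Paris"], "country": ["UK", "France"]})
merged = clean.reset_index().merge(countries, on="city", how="left")

print(merged)
```

DataFrame.applymap, named in the topic list, is deprecated from pandas 2.1 in favour of DataFrame.map; the sketch uses column-wise string methods instead, which behave the same across versions.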
Statistical computing in R – I [2 Theory, 4 Laboratory]:
Basic data types – variables – vectors – matrices – control structures – functions – Factors – Data frames – lists – Useful R packages – Basic statistics in R – Reading in data – Descriptive statistics in R
Data visualizations in R [2 Theory, 4 Laboratory]:
Basic plotting in R – Using ggplot2 – Aesthetics – Faceting – Geoms – Position adjustments – Saving graphs
Statistical computing in R – II [2 Theory, 4 Laboratory]:
Inferential statistics using R – Chi-square test – Covariance – Correlation – t-test – Wilcoxon test – ANOVA
Data cleaning and preparation using R [2 Theory, 4 Laboratory]:
Reshaping – melt – dcast – rbind – cbind – Treating missing values – Using dplyr – Using tidyr – Working with continuous and categorical variables – Joining data sets – Grouping data
Creating data visualisations in Tableau [4 Theory, 8 Laboratory]:
What is Tableau – Features of Tableau – Applications of Tableau – The Tableau products – Install Tableau Public – Tableau Workspace – Build views – Connect to data source – Creating dashboards – Data blending
Predictive Analytics [4 Theory, 6 Laboratory]:
Multiple linear regression – Classification – Logistic regression – Linear discriminant analysis – Dimensionality reduction – RapidMiner tool
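The topic list names RapidMiner as the tool for this unit. As a language-level illustration only, multiple linear regression can be sketched in Python with NumPy least squares. The data are synthetic, generated without noise from known coefficients (an assumption made so the fit can be checked by eye).

```python
import numpy as np

# Synthetic data generated from y = 1 + 2*x1 + 3*x2 with no noise.
X = np.array([
    [1.0, 2.0],
    [2.0, 1.0],
    [3.0, 4.0],
    [4.0, 3.0],
    [5.0, 5.0],
])
y = 1.0 + 2.0 * X[:, 0] + 3.0 * X[:, 1]

# Add a column of ones so the model includes an intercept.
design = np.column_stack([np.ones(len(X)), X])

# Ordinary least squares: minimise ||design @ coef - y||^2.
coef = np.linalg.lstsq(design, y, rcond=None)[0]
intercept, b1, b2 = coef

# Predict for a new hypothetical observation (x1=6, x2=7).
new_point = np.array([1.0, 6.0, 7.0])
prediction = new_point @ coef

print(f"intercept={intercept:.2f}, b1={b1:.2f}, b2={b2:.2f}")
print(f"prediction={prediction:.2f}")
```

With real, noisy data the recovered coefficients would only approximate the underlying relationship, and residual diagnostics would be needed before trusting predictions.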
Time Series Forecasting [4 Theory, 4 Laboratory]:
Examples of time series – Forecasting – ETS models – Autoregressive models – ARIMA – KNIME tool
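The topic list names KNIME as the tool for this unit. To illustrate the idea of an autoregressive model in code, the sketch below fits a first-order model, x_t = c + phi * x_(t-1), by least squares with NumPy. The series is synthetic and noise-free (an assumption for clarity), and this is a plain AR(1) fit, not a full ETS or ARIMA procedure.

```python
import numpy as np

# Synthetic series that follows x_t = 2 + 0.5 * x_(t-1) exactly.
values = [10.0]
for _ in range(9):
    values.append(2.0 + 0.5 * values[-1])
series = np.array(values)

# Pair each observation with the one before it.
lagged = series[:-1]
current = series[1:]

# Least-squares fit of current = c + phi * lagged.
design = np.column_stack([np.ones(len(lagged)), lagged])
c, phi = np.linalg.lstsq(design, current, rcond=None)[0]

# One-step-ahead forecast from the last observed value.
forecast = c + phi * series[-1]

print(f"c={c:.3f}, phi={phi:.3f}, forecast={forecast:.3f}")
```

A value of phi between -1 and 1 means the series is pulled back towards its long-run mean c / (1 - phi), which is 4 for the coefficients used to generate this series.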
Introduction to Machine Learning [4 Theory, 8 Laboratory]:
What is machine learning? – Applications of machine learning – How does ML work? – Training data – Model/algorithm – Testing data – Evaluation – Prediction – Tools required – Orange tool – Types of ML – Supervised – Unsupervised – Common techniques – Deep learning
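The topic list names Orange as the tool for this unit. The training data, model, testing data, evaluation, and prediction workflow above can also be sketched in a few lines of Python. The nearest-centroid classifier and the synthetic two-cluster data are stand-ins chosen for brevity, not the course's prescribed method.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic supervised data: two well-separated clusters with labels 0 and 1.
class0 = rng.normal(loc=0.0, scale=0.5, size=(20, 2))
class1 = rng.normal(loc=5.0, scale=0.5, size=(20, 2))
X = np.vstack([class0, class1])
y = np.array([0] * 20 + [1] * 20)

# Shuffle, then split into training data (30 rows) and testing data (10 rows).
order = rng.permutation(len(X))
train_idx, test_idx = order[:30], order[30:]
X_train, y_train = X[train_idx], y[train_idx]
X_test, y_test = X[test_idx], y[test_idx]

# Training: the model is simply the mean point (centroid) of each class.
centroids = np.array([X_train[y_train == k].mean(axis=0) for k in (0, 1)])

def predict(points):
    """Assign each point the label of its nearest class centroid."""
    distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    return distances.argmin(axis=1)

# Evaluation: fraction of held-out points classified correctly.
accuracy = (predict(X_test) == y_test).mean()

# Prediction for a new, unseen point.
new_label = predict(np.array([[4.8, 5.1]]))[0]

print(f"test accuracy={accuracy:.2f}, new point label={new_label}")
```

Evaluating on data the model has not seen during training is what distinguishes a realistic accuracy estimate from one that only measures memorisation.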
Analytics for Business Domains [8 Theory, 8 Laboratory]:
Marketing and Retail – Web and Social Media – Banking and Finance – Supply chain and Logistics
Industry Project [12 Theory, 54 Laboratory]:
Each student will be required to work on a data science project relevant to the industry. This will involve business requirements analysis, solution design, and implementation.
Reference Books:
· Think Stats, Allen B. Downey, O’Reilly Media.
· R for Data Science, Garrett Grolemund and Hadley Wickham, O’Reilly Media.
· Python Data Science Handbook, Jake VanderPlas, O’Reilly Media.