
Prerequisite:
Bachelor's degree (B.Sc./B.E./B.Tech.) or diploma in Computer Science, Information Technology, or allied streams.
Course Objective:
· Expose the students to the basic statistical techniques that provide the foundation of data science.
· Illustrate the various steps of the data science process, viz. cleaning, visualisation, modeling, and presentation.
· Provide practice in different software tools used for data science: Python, R, and Tableau.
Course Outcome: At the end of the course, students will be able to:
1: Clean and prepare data for analysis
2: Perform basic visualisation of data
3: Model and curve-fit the data
4: Present findings of the analysis to stakeholders
#    Topic                                       Theory  Lab
1.   Introduction to Data Science and Analytics  2       -
2.   Overview of Statistics                      6       -
3.   Statistical computing in Python – I         2       4
4.   Data visualizations in Python               2       4
5.   Statistical computing in Python – II        2       4
6.   Data cleaning and Preparation in Python     2       4
7.   Statistical computing in R – I              2       4
8.   Data visualizations in R                    2       4
9.   Statistical computing in R – II             2       4
10.  Data cleaning and preparation using R       2       4
11.  Creating data visualisations in Tableau     4       8
12.  Predictive Analytics                        4       6
13.  Time Series Forecasting                     4       4
14.  Introduction to Machine Learning            4       8
15.  Analytics for Business Domains              8       8
16.  Industry Project                            12      54
     TOTAL                                       60      120
Course Content:
Introduction to Data Science [2Theory]:
What is data science? – Applications of data science – Skills required – Tools required – Models and methods – The data science process – Types of data – Nominal data – Ordinal data – Interval data – Ratio data – Relationship between different types of data – Use of graphs to see characteristics of data
Overview of Statistics [6Theory]:
Descriptive Statistics – Central tendency – Spread – Distributions – Inferential Statistics – Hypothesis testing – Chi-square test – Correlation – Regression
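As an illustration of the descriptive measures listed above, the following minimal sketch computes central tendency, spread, correlation, and a least-squares regression line using only the Python standard library. The data set (hours studied and exam scores) and the helper names pearson_r and least_squares are hypothetical and are not part of the course material.

```python
import statistics

# Hypothetical sample: hours studied and exam scores for six students.
hours = [2, 4, 5, 7, 8, 10]
scores = [50, 58, 65, 74, 79, 90]

# Central tendency
mean_score = statistics.mean(scores)
median_score = statistics.median(scores)

# Spread: sample standard deviation (divides by n - 1)
stdev_score = statistics.stdev(scores)


def pearson_r(x, y):
    """Pearson correlation coefficient computed from its definition."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def least_squares(x, y):
    """Slope and intercept of the least-squares line y = slope * x + intercept."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx


r = pearson_r(hours, scores)
slope, intercept = least_squares(hours, scores)

print(f"mean={mean_score:.2f} median={median_score} stdev={stdev_score:.2f}")
print(f"r={r:.3f} slope={slope:.3f} intercept={intercept:.3f}")
```

A correlation close to 1 here only indicates a strong linear association in this made-up sample; hypothesis testing is what decides whether such a pattern is statistically significant.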
Statistical computing in Python – I [2Theory, 4Laboratory]:
Using Jupyter Notebooks – Statements and comments – Data types and Variables – Introduction to NumPy and Pandas – Descriptive statistics in Python
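The kind of exercise this unit points to might look like the sketch below, which can be run in a Jupyter notebook cell. It assumes NumPy and Pandas are installed; the store and sales figures are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical data set: daily sales for two stores.
df = pd.DataFrame({
    "store": ["A", "A", "A", "B", "B", "B"],
    "sales": [120.0, 135.0, 150.0, 80.0, 95.0, 110.0],
})

# NumPy operates on plain arrays.
sales = df["sales"].to_numpy()
overall_mean = np.mean(sales)
overall_std = np.std(sales, ddof=1)  # ddof=1 gives the sample standard deviation

# Pandas summarises a column in one call (count, mean, std, min, quartiles, max).
summary = df["sales"].describe()

# Group-wise descriptive statistics: mean sales per store.
by_store = df.groupby("store")["sales"].mean()

print(summary)
print(by_store)
```

Note that np.std defaults to the population standard deviation (ddof=0), while Pandas defaults to the sample version, which is why ddof=1 is passed explicitly above.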
Data visualizations in Python [2Theory, 4Laboratory]:
Perceptions of visual cues – Bar chart – dot plot – scatter plot – histogram – plotting in Python – numerical – categorical – time series – Matplotlib
Statistical computing in Python – II [2Theory, 4Laboratory]:
Inferential statistics using Pandas and the scipy.stats library – Chi-square test – Correlation – t-test – ANOVA
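The four tests named above can each be run with a single scipy.stats call. The sketch below is only an illustration, assuming SciPy and NumPy are installed; all data values are hypothetical.

```python
import numpy as np
from scipy import stats

# Chi-square test of independence on a hypothetical 2x2 contingency table
# (rows: two customer groups, columns: bought / did not buy).
observed = np.array([[30, 10],
                     [20, 40]])
chi2, p_chi2, dof, expected = stats.chi2_contingency(observed)

# Pearson correlation between two hypothetical numeric variables.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
r, p_r = stats.pearsonr(x, y)

# Independent two-sample t-test: do groups A and B have the same mean?
group_a = [5.1, 4.9, 5.6, 5.8, 6.0]
group_b = [6.9, 7.1, 6.5, 7.4, 7.0]
t_result = stats.ttest_ind(group_a, group_b)

# One-way ANOVA: do three groups share a common mean?
group_c = [8.2, 8.0, 8.5, 7.9, 8.4]
f_result = stats.f_oneway(group_a, group_b, group_c)

print(f"chi2={chi2:.2f} p={p_chi2:.4f} dof={dof}")
print(f"r={r:.3f} p={p_r:.4f}")
print(f"t={t_result.statistic:.2f} p={t_result.pvalue:.4f}")
print(f"F={f_result.statistic:.2f} p={f_result.pvalue:.4f}")
```

In each case a small p-value (conventionally below 0.05) is taken as evidence against the null hypothesis of independence, zero correlation, or equal means.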
Data cleaning and Preparation in Python [2Theory, 4Laboratory]:
Missing values – outliers – sorting – merging – Dropping Columns in a DataFrame – Changing the Index of a DataFrame – Tidying up Fields in the Data – Combining str Methods with NumPy to Clean Columns – Cleaning the Entire Dataset Using the applymap Function
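Several of these cleaning steps are sketched below on a small, hypothetical DataFrame, assuming Pandas and NumPy are installed. Recent Pandas releases (2.1 and later) deprecate applymap in favour of DataFrame.map, so the sketch uses whichever of the two is available; the column names and the strip_text helper are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical raw data with untidy text, a missing value, and an unused column.
raw = pd.DataFrame({
    "id": [101, 102, 103, 104],
    "city": [" kolkata", "Delhi ", "MUMBAI", None],
    "year": ["1998", "[2001]", "c. 2005", "2010"],
    "notes": ["x", "y", "z", "w"],
})

# Drop a column that is not needed for the analysis.
df = raw.drop(columns=["notes"])

# Use a meaningful column as the index.
df = df.set_index("id")

# Treat missing values.
df["city"] = df["city"].fillna("Unknown")


def strip_text(value):
    """Remove surrounding whitespace from strings; leave other values alone."""
    return value.strip() if isinstance(value, str) else value


# Clean every cell: DataFrame.map in newer Pandas, applymap in older versions.
elementwise = df.map if hasattr(df, "map") else df.applymap
df = elementwise(strip_text)

# Tidy up fields: normalise capitalisation and extract the four-digit year.
df["city"] = df["city"].str.title()
df["year"] = pd.to_numeric(df["year"].str.extract(r"(\d{4})", expand=False))

# Combine a str method with NumPy to derive a cleaned flag column.
df["is_metro"] = np.where(
    df["city"].str.contains("Kolkata|Delhi|Mumbai"), "yes", "no"
)

print(df)
```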
Statistical computing in R – I [2Theory, 4Laboratory]:
Basic data types – variables – vectors – matrices – control structures – functions – Factors – Data frames – lists – Useful R packages – Basic statistics in R – Reading in data – Descriptive statistics in R
Data visualizations in R [2Theory, 4Laboratory]:
Basic plotting in R – Using ggplot2 – Aesthetics – Faceting – Geoms – Position Adjustments – Saving Graphs
Statistical computing in R – II [2Theory, 4Laboratory]:
Inferential statistics using R – Chi-square test – Covariance – Correlation – t-test – Wilcoxon test – ANOVA
Data cleaning and preparation using R [2Theory, 4Laboratory]:
Reshaping – melt – dcast – rbind – cbind – Treating Missing values – Using dplyr – Using tidyr – Working with Continuous and Categorical Variables – Joining Data Sets – Grouping Data
Creating data visualisations in Tableau [4Theory, 8Laboratory]:
What is Tableau? – Features of Tableau – Applications of Tableau – The Tableau products – Install Tableau Public – Tableau Workspace – Build views – Connect to data source – Creating dashboards – Data blending
Predictive Analytics [4Theory, 6Laboratory]:
Multiple Linear Regression – Classification – Logistic Regression – Linear Discriminant Analysis – Dimensionality reduction – RapidMiner tool
Time Series Forecasting [4Theory, 4Laboratory]:
Examples of time series – Forecasting – ETS models – Autoregressive models – ARIMA – KNIME tool
Introduction to Machine Learning [4Theory, 8Laboratory]:
What is machine learning? – Applications of machine learning – How does ML work? – Training data – model/algorithm – testing data – evaluation – prediction – tools required – Orange tool – Types of ML – Supervised – Unsupervised – Common techniques – Deep Learning
Analytics for Business Domains [8Theory, 8Laboratory]:
Marketing and Retail – Web and Social Media – Banking and Finance – Supply chain and Logistics
Industry Project [12Theory, 54Laboratory]:
Each student will be required to work on a Data Science project relevant to the industry. This will involve performing business requirements analysis, solutions design, and implementation.
Reference Books:
· Think Stats, Allen B. Downey, O’Reilly Media.
· R for Data Science, Hadley Wickham and Garrett Grolemund, O’Reilly Media.
· Python Data Science Handbook, Jake VanderPlas, O’Reilly Media.
For more information and registration, please contact Aunwesha Academy, Learning and Development partner of Aunwesha Knowledge Technologies Pvt. Ltd., 120A Linton Street, Kolkata 700014. Email: enq@aunweshaacademy.com. Call/WhatsApp: 9088998585/9051952573/9830379592/6290622433