Prerequisite:
Bachelor’s degree (B.Sc./B.E./B.Tech.) or diploma in Computer Science, Information Technology, or an allied stream.
Course Objectives:
· Expose the students to the basic statistical techniques that provide the foundation of data science.
· Illustrate the various steps of the data science process, viz. cleaning, visualisation, modeling, and presentation.
· Provide practice with software tools used in data science, such as Python, R, and Tableau.
Course Outcomes: At the end of the course, students will be able to:
1: Clean and prepare data for analysis
2: Perform basic visualisation of data
3: Model and curve-fit the data
4: Present findings of the analysis to stakeholders
| # | Topic | Theory | Lab |
|---|---|---|---|
| 1 | Introduction to Data Science and Analytics | 2 | – |
| 2 | Overview of Statistics | 6 | – |
| 3 | Statistical computing in Python – I | 2 | 4 |
| 4 | Data visualizations in Python | 2 | 4 |
| 5 | Statistical computing in Python – II | 2 | 4 |
| 6 | Data cleaning and Preparation in Python | 2 | 4 |
| 7 | Statistical computing in R – I | 2 | 4 |
| 8 | Data visualizations in R | 2 | 4 |
| 9 | Statistical computing in R – II | 2 | 4 |
| 10 | Data cleaning and preparation using R | 2 | 4 |
| 11 | Creating data visualisations in Tableau | 4 | 8 |
| 12 | Predictive Analytics | 4 | 6 |
| 13 | Time Series Forecasting | 4 | 4 |
| 14 | Introduction to Machine Learning | 4 | 8 |
| 15 | Analytics for Business Domains | 8 | 8 |
| 16 | Industry Project | 12 | 54 |
| | TOTAL | 60 | 120 |
Course Content:
Introduction to Data Science [2 Theory]:
What is data science? – Applications of data science – Skills required – Tools required – Models and methods – The data science process – Types of data – Nominal data – Ordinal data – Interval data – Ratio data – Relationship between different types of data – Use of graphs to see the characteristics of data
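A minimal Python sketch of why the four levels of measurement matter: each level permits a different kind of summary. The sample values are made up, and the mapping used here (mode for nominal, median for ordinal, mean for interval, ratios only for ratio data) follows the usual textbook convention rather than anything specific to this course.

```python
from statistics import mean, median_low, mode

# Hypothetical sample values, one list per level of measurement.
nominal = ["red", "blue", "red", "green"]             # categories with no order
ordinal = ["low", "high", "medium", "medium", "low"]  # ordered categories
interval = [10.0, 20.0, 30.0]                         # °C: differences meaningful, zero arbitrary
ratio = [2.0, 4.0, 8.0]                               # kg: true zero, ratios meaningful

# Nominal: only counting makes sense, so summarise with the mode.
nominal_summary = mode(nominal)

# Ordinal: convert to ranks, take the median rank; median_low keeps it an actual level.
levels = ["low", "medium", "high"]
ranks = [levels.index(v) for v in ordinal]
ordinal_summary = levels[median_low(ranks)]

# Interval: means and differences are meaningful, but "30 °C is three times
# as hot as 10 °C" is not, because 0 °C does not mean "no temperature".
interval_summary = mean(interval)

# Ratio: a true zero makes statements like "three times as heavy" meaningful.
ratio_summary = ratio[2] / ratio[0]

print(nominal_summary, ordinal_summary, interval_summary, ratio_summary)
```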
Overview of Statistics [6 Theory]:
Descriptive Statistics – Central tendency – Spread – Distributions – Inferential Statistics – Hypothesis testing – Chi-Square – Correlation – Regression
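The descriptive measures, Pearson correlation, and simple least-squares regression listed above can be worked through by hand in a few lines. This sketch uses only Python’s standard `statistics` and `math` modules so that the formulas stay visible; the five data points are invented for illustration.

```python
import math
from statistics import mean, median, stdev

x = [1, 2, 3, 4, 5]   # e.g. hours studied (made-up data)
y = [2, 4, 5, 4, 5]   # e.g. test score (made-up data)

# Central tendency and spread (stdev is the sample standard deviation, n - 1 denominator).
y_mean, y_median, y_sd = mean(y), median(y), stdev(y)

# Sums of squared deviations and of cross-products about the means.
mx, my = mean(x), mean(y)
sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
sxx = sum((xi - mx) ** 2 for xi in x)
syy = sum((yi - my) ** 2 for yi in y)

# Pearson correlation coefficient r = Sxy / sqrt(Sxx * Syy).
r = sxy / math.sqrt(sxx * syy)

# Least-squares regression line y = a + b * x.
b = sxy / sxx
a = my - b * mx

print(f"mean={y_mean}, median={y_median}, sd={y_sd:.3f}, r={r:.3f}, y = {a:.1f} + {b:.1f}x")
```

For these points the sample standard deviation of y is √1.5 ≈ 1.22, r ≈ 0.775, and the fitted line is y = 2.2 + 0.6x.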
Statistical computing in Python – I [2 Theory, 4 Laboratory]:
Using Jupyter Notebooks – Statements and comments – Data types and variables – Introduction to NumPy and pandas – Descriptive statistics in Python
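A short illustration of descriptive statistics with pandas, roughly as it might look in a Jupyter Notebook cell. The DataFrame and its column names (`city`, `sales`) are hypothetical; in practice the data would usually be loaded with a reader such as `pd.read_csv`.

```python
import pandas as pd

# Hypothetical data; normally loaded from a file, e.g. pd.read_csv("sales.csv").
df = pd.DataFrame({
    "city": ["A", "B", "A", "C", "B", "A"],
    "sales": [120, 95, 130, 80, 105, 125],
})

# count, mean, std, min, quartiles (25%, 50%, 75%) and max in one call.
summary = df["sales"].describe()

# Range as a simple measure of spread.
sales_range = df["sales"].max() - df["sales"].min()

# Mean sales per city, and how often each city appears.
by_city = df.groupby("city")["sales"].mean()
counts = df["city"].value_counts()

print(summary, sales_range, by_city, counts, sep="\n\n")
```

Here the median (`summary["50%"]`) is 112.5 and the range is 50.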
Data visualizations in Python [2 Theory, 4 Laboratory]:
Perception of visual cues – Bar chart – Dot plot – Scatter plot – Histogram – Plotting numerical, categorical, and time-series data in Python – Matplotlib
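A minimal Matplotlib sketch of the chart types listed above: a bar chart for categorical data, a histogram and a scatter plot for numerical data, and a line plot for a time series. The data and labels are made up, and the non-interactive `Agg` backend plus saving to an in-memory buffer are assumptions that let the script run outside a notebook; in Jupyter the figure would simply be displayed.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend (assumption: running as a plain script)
import matplotlib.pyplot as plt

# Made-up data for each kind of chart.
regions = ["North", "South", "East", "West"]
orders = [23, 17, 35, 29]
heights = [158, 162, 165, 167, 168, 170, 171, 173, 176, 181]
ad_spend = [1, 2, 3, 4, 5, 6]
revenue = [3, 5, 6, 8, 9, 12]
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
visitors = [120, 135, 128, 150, 162, 171]

fig, ((ax_bar, ax_hist), (ax_scatter, ax_line)) = plt.subplots(2, 2, figsize=(10, 7))

# Categorical data: bar chart of a count per category.
ax_bar.bar(regions, orders)
ax_bar.set_title("Orders by region (bar chart)")

# One numerical variable: histogram of its distribution.
ax_hist.hist(heights, bins=5)
ax_hist.set_title("Heights in cm (histogram)")

# Two numerical variables: scatter plot of their relationship.
ax_scatter.scatter(ad_spend, revenue)
ax_scatter.set_xlabel("Ad spend")
ax_scatter.set_ylabel("Revenue")
ax_scatter.set_title("Ad spend vs revenue (scatter plot)")

# Time series: values plotted in time order.
ax_line.plot(months, visitors, marker="o")
ax_line.set_title("Monthly visitors (time series)")

fig.tight_layout()
buffer = io.BytesIO()
fig.savefig(buffer, format="png")  # or fig.savefig("charts.png") to write a file
plt.close(fig)
png_bytes = buffer.getvalue()
```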
Statistical computing in Python – II [2 Theory, 4 Laboratory]:
Inferential statistics using pandas and the scipy.stats library – Chi-Square Test – Correlation – T-test – ANOVA
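Each test listed above maps onto a function in `scipy.stats`. This is a sketch with invented data, not a recommended analysis workflow; in particular, it uses Welch’s t-test (`equal_var=False`) as one reasonable default.

```python
from scipy import stats

# Chi-square test of independence on a made-up 2x2 contingency table
# (rows: customer group A / B; columns: bought / did not buy).
table = [[30, 10],
         [20, 20]]
chi2, chi2_p, dof, expected = stats.chi2_contingency(table)

# Pearson correlation between two numerical variables.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r, r_p = stats.pearsonr(x, y)

# Independent two-sample t-test; equal_var=False selects Welch's t-test.
group_a = [5.1, 4.9, 5.6, 5.8, 6.0, 5.4]
group_b = [6.2, 6.8, 6.4, 7.0, 6.6, 6.9]
t_stat, t_p = stats.ttest_ind(group_a, group_b, equal_var=False)

# One-way ANOVA: do the means of three groups differ?
group_c = [5.9, 6.1, 5.7, 6.3, 6.0, 5.8]
f_stat, f_p = stats.f_oneway(group_a, group_b, group_c)

for name, p in [("chi-square", chi2_p), ("correlation", r_p),
                ("t-test", t_p), ("ANOVA", f_p)]:
    print(f"{name}: p = {p:.4f}")
```

For this 2×2 table the statistic is 4.32 with 1 degree of freedom; SciPy applies Yates’ continuity correction to 2×2 tables unless `correction=False` is passed. Each p-value is then compared against the chosen significance level.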
Data cleaning and Preparation in Python [2 Theory, 4 Laboratory]:
Missing values – outliers – sorting – merging – Dropping Columns in a DataFrame – Changing the Index of a DataFrame – Tidying up Fields in the Data – Combining str Methods with NumPy to Clean Columns – Cleaning the Entire Dataset Using the applymap Function (deprecated in pandas 2.1 in favour of DataFrame.map)
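A compact pandas sketch that walks through several of the cleaning steps above on a made-up table: dropping a column, changing and sorting the index, tidying text with `str` methods, using `np.where` to fill gaps, treating outliers and missing values, and merging. The specific choices are assumptions for illustration: outliers are flagged with the common 1.5 × IQR rule and then imputed, like other missing values, with the median of the remaining values.

```python
import numpy as np
import pandas as pd

# Made-up raw data with typical problems: missing values, an outlier,
# inconsistent text fields and a column we do not need.
raw = pd.DataFrame({
    "id": [3, 1, 2, 4, 5],
    "city": ["  Pune", "mumbai ", "PUNE", None, "Delhi"],
    "amount": [250.0, np.nan, 240.0, 9999.0, 260.0],
    "notes": ["", "call back", "", "", "vip"],
})

df = raw.drop(columns=["notes"])        # dropping a column
df = df.set_index("id").sort_index()    # changing the index, then sorting by it

# Tidy a text field with str methods, then use NumPy to fill missing cities.
df["city"] = df["city"].str.strip().str.title()
df["city"] = np.where(df["city"].isna(), "Unknown", df["city"])

# Outliers: flag values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR] and mark them missing.
q1, q3 = df["amount"].quantile([0.25, 0.75])
iqr = q3 - q1
is_outlier = (df["amount"] < q1 - 1.5 * iqr) | (df["amount"] > q3 + 1.5 * iqr)
df.loc[is_outlier, "amount"] = np.nan

# Missing values: impute with the median of the remaining amounts.
df["amount"] = df["amount"].fillna(df["amount"].median())

# Merging with a second (made-up) lookup table on the cleaned city name.
regions = pd.DataFrame({"city": ["Pune", "Mumbai", "Delhi"],
                        "region": ["West", "West", "North"]})
clean = df.reset_index().merge(regions, on="city", how="left")
print(clean)
```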
Statistical computing in R – I [2 Theory, 4 Laboratory]:
Basic data types – variables – vectors – matrices – control structures – functions – Factors – Data frames – lists – Useful R packages – Basic statistics in R – Reading in data – Descriptive statistics in R
Data visualizations in R [2 Theory, 4 Laboratory]:
Basic plotting in R – Using ggplot2 – Aesthetics – Faceting – Geoms – Position Adjustments – Saving Graphs
Statistical computing in R – II [2 Theory, 4 Laboratory]:
Inferential statistics using R – Chi-Square Test – Covariance – Correlation – T-test – Wilcoxon test – ANOVA
Data cleaning and preparation using R [2 Theory, 4 Laboratory]:
Reshaping – melt – dcast – rbind – cbind – Treating Missing values – Using dplyr – Using tidyr – Working with Continuous and Categorical Variables – Joining Data Sets – Grouping Data
Creating data visualisations in Tableau [4 Theory, 8 Laboratory]:
What is Tableau – Features of Tableau – Applications of Tableau – The Tableau products – Installing Tableau Public – The Tableau workspace – Building views – Connecting to a data source – Creating dashboards – Data blending
Predictive Analytics [4 Theory, 6 Laboratory]:
Multiple Linear Regression – Classification – Logistic Regression – Linear Discriminant Analysis – Dimensionality reduction – RapidMiner tool
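Multiple linear regression fits y = b0 + b1·x1 + b2·x2 + … by ordinary least squares. A minimal NumPy sketch, assuming noise-free synthetic data generated from y = 3 + 2·x1 − x2 so that the recovered coefficients are easy to check:

```python
import numpy as np

# Synthetic data generated exactly from y = 3 + 2*x1 - 1*x2 (no noise, for checkability).
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
y = 3 + 2 * x1 - 1 * x2

# Design matrix: a leading column of ones gives the intercept term b0.
X = np.column_stack([np.ones_like(x1), x1, x2])

# Ordinary least squares: choose beta to minimise ||X @ beta - y||^2.
beta, residuals, rank, _ = np.linalg.lstsq(X, y, rcond=None)
b0, b1, b2 = beta

# Prediction for a new observation with x1 = 7, x2 = 2.
y_new = np.array([1.0, 7.0, 2.0]) @ beta
print(f"y = {b0:.2f} + {b1:.2f}*x1 + {b2:.2f}*x2; prediction = {y_new:.2f}")
```

Tools such as RapidMiner wrap the same computation in a visual workflow; in Python, scikit-learn offers `LinearRegression`, `LogisticRegression`, `LinearDiscriminantAnalysis`, and `PCA` for the other topics in this unit.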
Time Series Forecasting [4 Theory, 4 Laboratory]:
Examples of time series – Forecasting – ETS models – Auto-regressive models – ARIMA – KNIME tool
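Two of the simplest forecasting ideas behind this unit can be sketched in plain Python: simple exponential smoothing (the most basic member of the ETS family, with no trend or seasonal component) and an AR(1) auto-regressive model fitted by least squares. The series, the smoothing constant `alpha = 0.5`, and initialising the level with the first observation are all assumptions for illustration.

```python
def simple_exponential_smoothing(series, alpha):
    """Return the final smoothed level, used as the flat forecast for all horizons.

    Level update: level_t = alpha * y_t + (1 - alpha) * level_(t-1),
    initialised (an assumption) with the first observation.
    """
    level = series[0]
    for y in series[1:]:
        level = alpha * y + (1 - alpha) * level
    return level


def fit_ar1(series):
    """Least-squares fit of the AR(1) model y_t = c + phi * y_(t-1)."""
    prev, curr = series[:-1], series[1:]
    n = len(prev)
    mean_prev, mean_curr = sum(prev) / n, sum(curr) / n
    cov = sum((p - mean_prev) * (q - mean_curr) for p, q in zip(prev, curr))
    var = sum((p - mean_prev) ** 2 for p in prev)
    phi = cov / var
    c = mean_curr - phi * mean_prev
    return c, phi


sales = [10.0, 12.0, 11.0, 13.0, 12.0, 14.0]  # made-up monthly values

ses_forecast = simple_exponential_smoothing(sales, alpha=0.5)
c, phi = fit_ar1(sales)
ar_forecast = c + phi * sales[-1]  # one-step-ahead AR(1) forecast

print(f"SES forecast: {ses_forecast:.2f}, AR(1) forecast: {ar_forecast:.2f}")
```

Here the SES forecast is 13.0. Full ETS and ARIMA models add trend, seasonality, differencing, and moving-average terms; in Python, `statsmodels` provides them (for example `statsmodels.tsa.arima.model.ARIMA`), while this course uses KNIME.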
Introduction to Machine Learning [4 Theory, 8 Laboratory]:
What is machine learning? – Applications of machine learning – How does ML work? – Training data – model/algorithm – testing data – evaluation – prediction – tools required – Orange tool – Types of ML – Supervised – Unsupervised – Common techniques – Deep Learning
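The workflow listed above (training data, model/algorithm, testing data, evaluation, prediction) can be sketched end to end in plain Python. The toy dataset and the choice of k-nearest neighbours (k = 3) as the supervised algorithm are assumptions for illustration; a two-point test set is far too small for a meaningful evaluation and is used only to keep the example short.

```python
import math
import random

# Made-up labelled data: (feature1, feature2) -> class label.
data = [
    ((1.0, 1.2), "A"), ((1.5, 1.8), "A"), ((2.0, 1.0), "A"), ((1.2, 0.8), "A"),
    ((6.0, 6.5), "B"), ((6.5, 5.8), "B"), ((7.0, 6.0), "B"), ((5.8, 6.2), "B"),
]

# 1. Split into training and testing data (fixed seed for a reproducible split).
rng = random.Random(42)
shuffled = data[:]
rng.shuffle(shuffled)
train, test = shuffled[:6], shuffled[6:]


# 2. Model/algorithm: k-nearest neighbours, a simple supervised learner.
def predict(point, training, k=3):
    """Return the majority label among the k training points closest to `point`."""
    nearest = sorted(training, key=lambda item: math.dist(point, item[0]))[:k]
    labels = [label for _, label in nearest]
    return max(set(labels), key=labels.count)


# 3. Evaluation: accuracy on the held-out test data.
correct = sum(predict(features, train) == label for features, label in test)
accuracy = correct / len(test)

# 4. Prediction for a new, unlabelled point.
new_label = predict((6.2, 6.1), train)
print(f"accuracy = {accuracy:.2f}, new point -> {new_label}")
```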
Analytics for Business Domains [8 Theory, 8 Laboratory]:
Marketing and Retail – Web and Social Media – Banking and Finance – Supply chain and Logistics
Industry Project [12 Theory, 54 Laboratory]:
Each student will work on an industry-relevant data science project involving business requirements analysis, solution design, and implementation.
Reference Books:
- Think Stats, Allen B. Downey, O’Reilly Media.
- R for Data Science, Hadley Wickham and Garrett Grolemund, O’Reilly Media.
- Python Data Science Handbook, Jake VanderPlas, O’Reilly Media.