Data Presentation and Automation - Presenting findings and building applications on top of them

After a successful analysis and a well-performing model, present the findings to stakeholders.
Automate models when the predictions or insights need to be produced repeatedly.
Automation can mean implementing model scoring only, or building an application that automatically updates reports, Excel spreadsheets, or PowerPoint presentations.
Soft skills matter most in this final stage of the data science process.
Recommendation: Look for dedicated books and other resources on the subject to strengthen these skills.
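The automation described above can be sketched as a small script that scores new records with a fixed model and rebuilds a report each time it runs. This is only a minimal sketch: the coefficients, the field names (`customer_id`, `visits`, `predicted_spend`), and the helpers `score_record` and `build_report` are hypothetical, and the report is written to an in-memory buffer instead of a real file.

```python
import csv
import io

# Hypothetical coefficients of an already-fitted linear model (illustration only).
INTERCEPT = 10.0
COEF_VISITS = 2.5


def score_record(record):
    """Return the model's prediction for one input record (a dict)."""
    return INTERCEPT + COEF_VISITS * float(record["visits"])


def build_report(records):
    """Score every record and return the report as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["customer_id", "visits", "predicted_spend"])
    for record in records:
        writer.writerow(
            [record["customer_id"], record["visits"], f"{score_record(record):.2f}"]
        )
    return buffer.getvalue()


# Made-up input; in practice this would come from a database or a file.
new_records = [
    {"customer_id": "A1", "visits": 4},
    {"customer_id": "B2", "visits": 10},
]

report = build_report(new_records)
print(report)
```

Running such a script on a schedule is one simple way to keep a report current without manual work; writing to Excel or PowerPoint would need additional libraries that are not shown here.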

Modeling - Build the models

Model Building Process

Clean data and a good understanding of its content are prerequisites.
Goals include making better predictions, classifying objects, and understanding the system being modeled.
This phase is more focused than exploratory analysis.
The work is driven by the outcome you want to achieve.
The figure below illustrates the components of model building.

Building a model is an iterative process. The way you build your model depends on whether you go with classic statistics or the somewhat more recent machine learning school, and the type of technique you want to use. Either way, most models consist of the following main steps:

1. Model and variable selection

Select the variables and the modeling technique based on the findings of the exploratory analysis.
Choosing the right model for a problem requires judgment.
Consider both model performance and the project requirements.
Factors to consider: whether the model suits the production environment, how hard it will be to maintain, and how easy it is to explain.
Once these questions are settled, it is time to act and build the model.
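One simple way to turn exploratory findings into a first variable shortlist is to rank candidate variables by the strength of their linear relationship with the target. The sketch below assumes made-up data and hypothetical variable names (`ad_spend`, `store_size`, `discount`, `sales`); ranking by absolute Pearson correlation is just one screening heuristic and does not replace the judgment described above.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Made-up candidate variables and target, for illustration only.
candidates = {
    "ad_spend": [1, 2, 3, 4, 5],
    "store_size": [3, 1, 4, 1, 5],
    "discount": [5, 4, 4, 2, 1],
}
sales = [2, 4, 6, 8, 10]

# Strongest linear relationship first, regardless of sign.
ranked = sorted(
    candidates,
    key=lambda name: abs(pearson(candidates[name], sales)),
    reverse=True,
)

for name in ranked:
    print(f"{name}: r = {pearson(candidates[name], sales):+.3f}")
```

A high correlation only suggests a candidate; maintainability and explainability still have to be weighed separately.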

2. Model execution

Once you’ve chosen a model, you’ll need to implement it in code. Here are two examples.

Example 1:

The first example shows how a linear regression model is executed.
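As an illustration of executing a linear regression, here is a minimal sketch that fits a straight line by ordinary least squares using only the Python standard library. It is not the original listing: the data are made up, and `fit_line` is a hypothetical helper. In practice a library such as statsmodels or scikit-learn would typically be used instead.

```python
def fit_line(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-deviations and sum of squared deviations of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up observations that roughly follow y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]

slope, intercept = fit_line(xs, ys)

# Use the fitted model to predict the target for a new input.
prediction = intercept + slope * 6.0
print(f"slope={slope:.2f} intercept={intercept:.2f} prediction={prediction:.2f}")
```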

Example 2:

3. Model diagnostics and model comparison

Several models are built, and the best one is chosen based on multiple criteria.
A holdout sample is used to evaluate the model after it is built.
The model should work on unseen data.
Only a fraction of the data is used to estimate the model; the rest is set aside as the holdout sample.
The model is then unleashed on the unseen data and error measures are calculated.
Several error measures are available; mean square error is a common one.
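The holdout procedure above can be sketched as follows: estimate the model on part of the data, then compute the mean square error, the average of the squared differences between actual and predicted values, on the part that was held out. The synthetic data, the 80/20 split, the fixed random seed, and the helper names are all assumptions made for illustration.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for one predictor; returns (slope, intercept)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_squared_error(actual, predicted):
    """Average of the squared differences between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Synthetic data (an assumption for illustration): y = 2x + 1 plus noise.
rng = random.Random(42)
data = [(float(x), 2.0 * x + 1.0 + rng.gauss(0.0, 0.5)) for x in range(20)]

# Shuffle, then keep 80% for estimation and hold out the remaining 20%.
rng.shuffle(data)
split = len(data) * 8 // 10
train, holdout = data[:split], data[split:]

# Estimate the model on the training part only.
slope, intercept = fit_line([x for x, _ in train], [y for _, y in train])

# Measure the error on data the model has never seen.
predictions = [intercept + slope * x for x, _ in holdout]
holdout_mse = mean_squared_error([y for _, y in holdout], predictions)
print(f"slope={slope:.3f} intercept={intercept:.3f} holdout MSE={holdout_mse:.3f}")
```

When several candidate models are compared, the same holdout sample and the same error measure should be used for each, so that the numbers are comparable.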

Exploration - Exploratory data analysis

2.2.24 – by dodda venkatareddy

Exploratory Data Analysis Overview

Dive deeper into the data, mainly with graphical techniques.
Keep an open mind and your eyes open to understand how the variables interact.
Aim to discover anomalies that were missed earlier.
When anomalies turn up, take a step back and fix them before moving on.

Visualization Techniques in Data Analysis

Techniques range from simple line graphs or histograms to complex diagrams such as Sankey and network graphs.
Simple graphs can be combined into composite graphs for deeper insight into the data.
Graphs can also be animated or made interactive, which makes exploration easier and more enjoyable.

Interactive Data Exploration Techniques

Combining plots for deeper insights.
Overlaying several plots for better understanding.
Using Pareto diagrams, also called 80-20 diagrams.
Brushing and linking: graphs are linked so that a change or selection in one is automatically transferred to the others.
Example: linked plots of the average score per country can indicate a high correlation between the answers.
Selecting points on one subplot highlights the corresponding points on the other graphs.
Histogram: Categorizes variables into discrete categories, summarizing occurrences in each category.
Boxplot: Provides distribution within categories, showing maximum, minimum, median, and other characterizing measures.
Techniques include visualization, tabulation, clustering, and other modeling techniques.
Building simple models can also be part of exploratory analysis.
After data exploration, move on to building models.
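The numbers behind a Pareto (80-20) diagram can be computed without any plotting: sort the categories by frequency and accumulate their share of the total. The sketch below uses made-up complaint categories; the category names and counts are assumptions for illustration.

```python
from collections import Counter

# Made-up observations: one entry per recorded complaint.
complaints = (
    ["late delivery"] * 45
    + ["damaged item"] * 30
    + ["wrong item"] * 15
    + ["billing"] * 7
    + ["other"] * 3
)

# Categories sorted from most to least frequent.
counts = Counter(complaints).most_common()
total = sum(count for _, count in counts)

# Build (category, count, cumulative percentage) rows.
pareto = []
running = 0
for category, count in counts:
    running += count
    pareto.append((category, count, 100.0 * running / total))

for category, count, cumulative in pareto:
    print(f"{category:15s} {count:3d} {cumulative:6.1f}%")
```

In a plotted Pareto diagram the counts would be drawn as bars and the cumulative percentages as a line over them.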

Key objectives of EDA:

Gaining familiarity with the data: This involves understanding the structure of the dataset, the data types of each variable, and any missing values present.

Identifying patterns and trends: EDA helps uncover relationships between variables, outliers, and potential errors within the data.

Formulating hypotheses: Based on the observations and insights gained, you can start forming hypotheses that you can later test through modeling or analysis.

Guiding further analysis: EDA lays the groundwork for choosing the appropriate techniques for modeling, feature engineering, and data cleaning.

Common steps involved in EDA:

Data import and cleaning: This involves loading the data into your chosen environment and addressing any missing values, inconsistencies, or formatting issues.
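A minimal sketch of the import and cleaning step, assuming a small made-up CSV with a missing age, a missing city, and inconsistent formatting of city names. Filling missing ages with the median and labeling missing cities as "Unknown" are illustrative choices, not the only reasonable ones.

```python
import csv
import io
import statistics

# Made-up raw data with typical problems: missing values and stray formatting.
RAW_CSV = """name,age,city
Asha,34,Hyderabad
Ravi,,Chennai
Meena,29, chennai
Kiran,41,
"""

# Import: read every row into a dict keyed by column name.
rows = list(csv.DictReader(io.StringIO(RAW_CSV)))

# Use the median of the known ages to fill in the missing ones.
known_ages = [int(row["age"]) for row in rows if row["age"].strip()]
median_age = statistics.median(known_ages)

cleaned = []
for row in rows:
    age = int(row["age"]) if row["age"].strip() else median_age
    # Normalize whitespace and capitalization; label missing cities explicitly.
    city = row["city"].strip().title() or "Unknown"
    cleaned.append({"name": row["name"].strip(), "age": age, "city": city})

for record in cleaned:
    print(record)
```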

Univariate analysis: This step examines each variable individually, using summary statistics like mean, median, and standard deviation for numerical variables and frequency distributions for categorical variables. Visualizations like histograms, boxplots, and bar charts are helpful in understanding the distribution of each variable.
Bivariate analysis: This step explores the relationships between two variables. Scatter plots, heatmaps, and correlation matrices are commonly used to visualize these relationships.
Multivariate analysis: This step involves exploring the relationships between multiple variables simultaneously. Dimensionality reduction techniques such as principal component analysis (PCA) can be used for this purpose.
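The univariate and bivariate steps above can be sketched with the Python standard library alone. The data below are made up, the variable names are hypothetical, and the multivariate step is left out because it usually relies on a numerical library.

```python
import math
import statistics
from collections import Counter

# Made-up sample data, for illustration only.
heights_cm = [150, 160, 165, 170, 180]
weights_kg = [50, 58, 63, 68, 80]
cities = ["Hyderabad", "Chennai", "Hyderabad", "Pune", "Hyderabad"]

# Univariate analysis of a numerical variable: summary statistics.
height_summary = {
    "mean": statistics.mean(heights_cm),
    "median": statistics.median(heights_cm),
    "stdev": statistics.stdev(heights_cm),  # sample standard deviation
}

# Univariate analysis of a categorical variable: frequency distribution.
city_counts = Counter(cities)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Bivariate analysis: strength of the linear relationship between two variables.
height_weight_corr = pearson(heights_cm, weights_kg)

print(height_summary)
print(city_counts.most_common())
print(f"height/weight correlation: {height_weight_corr:.3f}")
```

The same summaries are what histograms, bar charts, and scatter plots display graphically.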

Benefits of EDA:

Improved data understanding: A thorough EDA provides a deep understanding of the data, its strengths, and weaknesses, allowing you to make informed decisions about further analysis.

Enhanced data quality: By identifying and addressing data quality issues early on, you can ensure the reliability and accuracy of your results.

More effective modeling: Understanding the data's characteristics helps you choose the most appropriate modeling techniques and avoid common pitfalls.

Clearer communication: EDA findings can be effectively communicated to stakeholders through data visualizations and reports, fostering better collaboration and project understanding.

Friday, 2 February 2024
