Monday 29 January 2024

Data Science Process

Data science is mostly applied in the context of an organization. When the business asks you to carry out a data science project, you first prepare a project charter. The charter states what you are going to research, how the company benefits from it, what data and resources you need, a timetable, and the deliverables.

The process itself typically consists of six steps:

1. Setting the research goal: This initial step involves defining the specific problem or question you want to answer using data. It's crucial to have a clear and well-defined goal to guide the rest of the process.

2. Retrieving data: Once you know what you're looking for, you need to gather the relevant data. This can involve accessing existing data sources, designing and conducting surveys or experiments, or scraping data from the web.

3. Data preparation: Raw data is rarely ready for analysis, so this step involves cleaning, organizing, and formatting the data to make it suitable for modeling. This might include tasks like:

  • Data cleaning: Fixing errors, inconsistencies, and missing values.
  • Data integration: Combining data from multiple sources.
  • Data transformation: Converting data into a format compatible with your chosen analysis tools.
  • Feature engineering: Creating new features from existing data to improve the performance of your models.
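
The preparation tasks above can be sketched with nothing but the Python standard library. This is a minimal illustration, not a prescribed method: the two CSV sources, the column names, and the helper functions (load_rows, clean_orders, integrate, add_features) are all hypothetical and invented for this example. Dropping rows with missing values is only one possible cleaning choice; imputing them is another.

```python
import csv
import io

# Hypothetical raw data: two small CSV sources invented for illustration.
ORDERS_CSV = """customer_id,amount,order_date
1,120.50,2024-01-03
2,,2024-01-04
3,80.00,2024-01-05
1,40.00,2024-01-09
"""

CUSTOMERS_CSV = """customer_id,region
1,north
2,south
3,north
"""


def load_rows(text):
    """Parse CSV text into a list of dictionaries (all values are strings)."""
    return list(csv.DictReader(io.StringIO(text)))


def clean_orders(rows):
    """Data cleaning and transformation: drop rows with a missing amount, convert types."""
    cleaned = []
    for row in rows:
        if not row["amount"].strip():
            continue  # assumption: rows with a missing amount are simply discarded
        cleaned.append({
            "customer_id": int(row["customer_id"]),
            "amount": float(row["amount"]),
            "order_date": row["order_date"],
        })
    return cleaned


def integrate(orders, customers):
    """Data integration: attach each customer's region to their orders."""
    region_by_id = {int(c["customer_id"]): c["region"] for c in customers}
    return [
        {**order, "region": region_by_id.get(order["customer_id"], "unknown")}
        for order in orders
    ]


def add_features(rows):
    """Feature engineering: derive the order month and a large-order flag."""
    for row in rows:
        row["order_month"] = int(row["order_date"].split("-")[1])
        row["is_large_order"] = row["amount"] >= 100.0  # threshold is arbitrary
    return rows


orders = clean_orders(load_rows(ORDERS_CSV))
prepared = add_features(integrate(orders, load_rows(CUSTOMERS_CSV)))
for row in prepared:
    print(row)
```

In a real project a library such as pandas would usually handle these steps, but the order of operations stays the same: clean, combine, convert, then derive new features.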

4. Data exploration: This is where you start to get a feel for the data by analyzing its properties and identifying patterns, trends, and relationships. Exploratory data analysis (EDA) can involve techniques like:

  • Descriptive statistics: Summarizing the data using measures like mean, median, and standard deviation.
  • Data visualization: Creating charts and graphs to represent the data visually.
  • Correlation analysis: Identifying relationships between different variables.
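
As a rough sketch of the three techniques above, the following example computes descriptive statistics, prints a crude text bar chart, and calculates a Pearson correlation coefficient. The ad_spend and sales figures are made-up numbers, and the helpers describe and pearson are hypothetical names chosen for this illustration.

```python
import math
import statistics

# Hypothetical sample: advertising spend and resulting sales (made-up numbers).
ad_spend = [10.0, 20.0, 30.0, 40.0, 50.0]
sales = [24.0, 47.0, 63.0, 88.0, 103.0]


def describe(values):
    """Descriptive statistics: mean, median, and sample standard deviation."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),
    }


def pearson(xs, ys):
    """Correlation analysis: Pearson correlation coefficient of two samples."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(spread_x * spread_y)


spend_summary = describe(ad_spend)
sales_summary = describe(sales)
r = pearson(ad_spend, sales)

print("ad spend:", spend_summary)
print("sales:   ", sales_summary)
print("correlation:", round(r, 3))

# Data visualization, in its crudest form: one '#' per 10 units of sales.
for x, y in zip(ad_spend, sales):
    print(f"{x:>5} | " + "#" * round(y / 10))
```

A coefficient close to 1 suggests a strong positive linear relationship between the two variables, which is the kind of pattern this step is meant to surface before any modeling begins.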

5. Data modeling: This step involves using the prepared data to build a model that can answer your research question or make predictions. There are many different types of data models, such as:

  • Regression models: Used to predict a continuous outcome variable based on one or more predictor variables.
  • Classification models: Used to predict a categorical outcome variable.
  • Clustering algorithms: Used to group similar data points together.
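
To make the first two model types concrete, here is a minimal sketch of a regression model (a straight line fitted by ordinary least squares) and a classification model (a one-nearest-neighbor classifier). Both are deliberately simple stand-ins chosen for illustration, not the only or best options. The data is made up, and the names fit_line, predict, and nearest_neighbor_label are hypothetical.

```python
import math
import statistics

# Hypothetical training data (made-up numbers).
ad_spend = [10.0, 20.0, 30.0, 40.0, 50.0]
sales = [24.0, 47.0, 63.0, 88.0, 103.0]


def fit_line(xs, ys):
    """Regression: fit y = slope * x + intercept by ordinary least squares."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Predict a continuous outcome for a new value of the predictor."""
    slope, intercept = model
    return slope * x + intercept


def nearest_neighbor_label(training_points, point):
    """Classification: return the label of the closest training point."""
    closest = min(training_points, key=lambda item: math.dist(item[0], point))
    return closest[1]


model = fit_line(ad_spend, sales)
forecast = predict(model, 60.0)
print("slope, intercept:", model)
print("predicted sales at spend 60:", forecast)

# Hypothetical labeled points: (feature vector, category).
labeled = [
    ((1.0, 1.0), "low"),
    ((1.5, 2.0), "low"),
    ((8.0, 8.0), "high"),
    ((9.0, 7.5), "high"),
]
print(nearest_neighbor_label(labeled, (2.0, 1.5)))
print(nearest_neighbor_label(labeled, (7.0, 9.0)))
```

The regression returns a number and the classifier returns a category, which is exactly the distinction the list above draws between the two model types.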

6. Presentation and automation: Finally, you need to communicate your findings to others and, if applicable, deploy your model into production. This might involve:

  • Creating reports and presentations: Summarizing your results and insights in a clear and concise way.
  • Developing dashboards and visualizations: Making your results more accessible and interactive.
  • Deploying the model: Integrating your model into a production environment to make predictions on new data.
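
Deployment can take many forms. One very small sketch, assuming the model is just a fitted line, is to save its parameters to a file and reload them wherever predictions are needed. The parameter values, the file name model.json, and the functions save_model, load_model, and predict are hypothetical; real deployments typically add versioning, validation, and monitoring.

```python
import json
import os
import tempfile

# Hypothetical parameters of an already fitted line (values are made up).
model = {"slope": 1.99, "intercept": 5.3}


def save_model(parameters, path):
    """Persist the model parameters as JSON."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(parameters, handle)


def load_model(path):
    """Load the model parameters back from JSON."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def predict(parameters, x):
    """Apply the stored line to a new input value."""
    return parameters["slope"] * x + parameters["intercept"]


# A temporary directory stands in for the production environment here.
with tempfile.TemporaryDirectory() as workdir:
    model_path = os.path.join(workdir, "model.json")
    save_model(model, model_path)
    loaded = load_model(model_path)

prediction = predict(loaded, 60.0)
print("prediction for new input 60.0:", prediction)
```

Keeping the saved parameters separate from the code that trained them lets the prediction step run on new data without repeating the earlier steps.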

Reference: Davy Cielen, Arno D. B. Meysman, and Mohamed Ali, “Introducing Data Science,” Manning Publications, 2016.

