If you've been curious about the world of data analysis, today is your lucky day! We're granting you exclusive access to a portion of our best-selling Data Analysis Level 1 course.
Normally reserved for registered students, we're leaking this module to give you a taste of the powerful skills you can gain. Get ready to have a first-hand look at the core principles of descriptive statistics and the magic of pivot tables.
⌨️ The goal of this module is to learn about descriptive statistics and using pivot tables to answer the questions we set out when framing the problem.
Before you begin conducting in-depth analysis, it's essential to first become familiar with your dataset—to get a "feel" for it, if you will. This is where exploratory data analysis comes in.
You can think of it as looking at a large puzzle and trying to figure out what it represents.
The goal is to get a better understanding of the data and what it tells you, so you can make informed decisions and ask more focused questions.
A key part of the exploratory analysis is using Descriptive Statistics.
Descriptive statistics are a set of techniques used to summarize and describe the main features of a dataset.
Imagine you have a dataset with hundreds or even thousands of values. It would be impossible to make sense of it just by looking at the raw data.
For example, if you had customer feedback scores from 500 customers, you wouldn't be able to determine the overall rating just by looking at a spreadsheet with all the raw data. However, you could calculate the average (or mean) score to get an idea of your company’s performance. This is an example of descriptive statistics.
It provides a quick and simple way to get a general idea of what the data looks like and also helps with spotting errors.
Let’s look at another example, when calculating the minimum and maximum values for a certain variable, you might notice one of the values falls outside a reasonable range. For instance, if you've collected product prices for a restaurant menu and calculated a minimum value of -$20, this is not realistic. Investigating further can help you identify and fix the dataset.
The three main types of descriptive statistics are:
Pivot tables are a data tool to summarize and organize large amounts of data in a meaningful way. It allows you to quickly look up the information you need.
They are especially useful when you have a lot of data, because they allow you to quickly see the patterns and relationships in the data that might not be immediately obvious.
💡 Pivot Tables Example
Let’s say you have a worksheet containing monthly sales data for three products. To find out which has the highest revenue. You can manually add the corresponding sales figure to a running total every time the product appears. However, if the worksheet has thousands of rows, it could literally take a lifetime for three products. Pivot tables can quickly aggregate sales figures and calculate their sums in less than a minute.
This was a brief explanation of statistics and pivot tables. Now let’s actually practice with our data set in the next mission and understand it better.
💼 Now that you're familiar with the theory behind descriptive statistics and pivot tables, let's explore how they can be used to answer some of the key research questions we posed at the start of the course.
Q: Calculate descriptive statistics for “Content Duration”, “Price”, and “Subscribers”
For these variables, we’ll calculate the mean and median values (measures of central tendency), as well as the minimum and maximum values.
We’ll be using the following formulas:
=AVERAGE
=MEDIAN
=MIN
=MAX
📹 Video: Descriptive Statistics
As we mentioned earlier, descriptive statistics can help you to spot if something’s “off” with your dataset, or with a particular value.
In the video, we noticed that the minimum value was zero for the content duration and subscribers and we deleted those entries as they were outliers and can give us misleading results. Generally, an analyst would decide what values to keep and remove. For example, it could be that they will perform analysis on all courses with more than 10 subscribers. In that case, we would need to delete the lower value.
Now, looking at our calculations, try answering the following questions:
We can’t, right? Our insights have given us a good starting point, but they are not enough to answer the questions we asked at the beginning of the course.
We have established the basics; now let's use another powerful tool to answer those key questions: the pivot table.
This module you've explored is just the tip of the iceberg. There's a whole world of data insights waiting for you in our full course, with in-depth tutorials, practical exercises, and real-world applications.
If you're ready to unlock the full potential of your data analysis capabilities, the next step is clear.
Enroll now and start a transformational journey to become a data analysis expert: EntryLevel: Data Analysis using Excel and Tableau