Descriptive statistics summarize the characteristics of a data set or population and present them in a meaningful way. Below is an overview of common descriptive measures and the formulas used to calculate them from raw data.
The mean is a measure of central tendency and is calculated as the sum of all data points divided by the total number of data points.
Population mean: $\mu = \frac{1}{N} \sum_{i=1}^{N} x_i$
Sample mean: $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$
Example: If the data set is 42, 54, 65, 47, 59, 40, 53, the mean is:
$\mu = (42 + 54 + 65 + 47 + 59 + 40 + 53) / 7 = 360 / 7 \approx 51.43$
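As a quick check of the arithmetic above, this minimal Python sketch computes the mean of the example data by hand and with the standard library's `statistics.mean`. The variable names are illustrative only.

```python
from statistics import mean

data = [42, 54, 65, 47, 59, 40, 53]

# Sum of all data points divided by the number of data points.
manual_mean = sum(data) / len(data)

# The standard library computes the same quantity.
library_mean = mean(data)

print(round(manual_mean, 2))  # 51.43
```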
The median is the middle value in an ordered data set. If the number of data points is odd, it is the center value. If even, it is the average of the two middle values.
Odd $n$: $k = (n + 1) / 2$, Median $= x_k$
Even $n$: $k = n / 2$, Median $= (x_k + x_{k+1}) / 2$
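The odd and even cases can be sketched in Python as follows. The function name `median_of` is hypothetical. The formulas count positions from 1, whereas Python lists are indexed from 0, so the indices shift by one.

```python
from statistics import median


def median_of(values):
    """Return the middle value of the data once it has been sorted."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd count: the single center value.
        return ordered[mid]
    # Even count: the average of the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2


data = [42, 54, 65, 47, 59, 40, 53]
print(median_of(data))       # 53
print(median_of(data[:-1]))  # 50.5
```

The standard library's `statistics.median` gives the same results and is the more practical choice outside of illustration.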
The mode represents the value(s) that occur most frequently in the data set. A dataset can have more than one mode or no mode at all.
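A minimal sketch of finding the mode(s) in Python is shown below. The function name `modes` is hypothetical, and the code assumes one common convention: when no value repeats, the data set is reported as having no mode (an empty list).

```python
from collections import Counter


def modes(values):
    """Return every value that occurs most often, or [] if nothing repeats."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    if highest == 1:
        # Assumed convention: all values are unique, so there is no mode.
        return []
    return sorted(v for v, c in counts.items() if c == highest)


print(modes([1, 2, 2, 3]))     # [2]
print(modes([1, 1, 2, 2, 3]))  # [1, 2]
print(modes([42, 54, 65]))     # []
```

Note that `statistics.multimode` in the standard library follows a different convention and returns every value when all of them occur once.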
The range is the difference between the maximum and minimum values in a dataset.
Range $= x_{\max} - x_{\min}$
Standard deviation measures the spread or dispersion of data values around the mean. A smaller standard deviation indicates data points are closer to the mean, while a larger standard deviation indicates a wider spread.
Population: $\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$
Sample: $s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$
Variance quantifies the spread of the data set and is the square of the standard deviation.
Population: $\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$
Sample: $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
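The population and sample versions of variance and standard deviation differ only in the divisor. The sketch below applies both to the example data from the mean calculation and cross-checks them against Python's `statistics` module. The variable names are illustrative.

```python
import math
import statistics

data = [42, 54, 65, 47, 59, 40, 53]
n = len(data)
mean = sum(data) / n

# Sum of squared deviations from the mean, shared by both versions.
squared_deviations = sum((x - mean) ** 2 for x in data)

population_variance = squared_deviations / n    # divide by N
sample_variance = squared_deviations / (n - 1)  # divide by n - 1

# Standard deviation is the square root of the variance.
population_sd = math.sqrt(population_variance)
sample_sd = math.sqrt(sample_variance)

# Standard library equivalents, for comparison.
print(population_variance, statistics.pvariance(data))
print(sample_variance, statistics.variance(data))
print(population_sd, statistics.pstdev(data))
print(sample_sd, statistics.stdev(data))
```

The sample versions are always slightly larger than the population versions, because dividing by n - 1 compensates for estimating the mean from the same data.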
The midrange is the average of the maximum and minimum values of the dataset.
Midrange $= (x_{\min} + x_{\max}) / 2$
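Both the range and the midrange depend only on the two extreme values. A short Python sketch using the example data from the mean calculation, with illustrative variable names:

```python
data = [42, 54, 65, 47, 59, 40, 53]

smallest = min(data)
largest = max(data)

# Range: distance between the extremes.
data_range = largest - smallest

# Midrange: the point halfway between the extremes.
midrange = (smallest + largest) / 2

print(data_range)  # 25
print(midrange)    # 52.5
```

Because both measures use only the minimum and maximum, a single extreme value can change them substantially.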
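Quartiles divide the ordered dataset into four equal parts: the first quartile (Q1) is the 25th percentile, the second quartile (Q2) is the median, and the third quartile (Q3) is the 75th percentile.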
The Interquartile Range (IQR) is the range between Q1 and Q3.
$\mathrm{IQR} = Q_3 - Q_1$
Outliers are values that fall far outside the typical range of data. They are often detected using the upper and lower fences, based on the IQR:
Upper fence $= Q_3 + 1.5 \times \mathrm{IQR}$
Lower fence $= Q_1 - 1.5 \times \mathrm{IQR}$
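The fence rule can be sketched in Python as below. The function names `iqr_fences` and `find_outliers` are hypothetical. Quartile conventions differ between textbooks and tools, so this sketch assumes the default "exclusive" method of `statistics.quantiles` (Python 3.8 or later); other methods may give slightly different fences.

```python
from statistics import quantiles


def iqr_fences(values):
    """Return (lower fence, upper fence) based on Q1, Q3 and the IQR."""
    # Assumption: quartiles use the default method="exclusive".
    q1, _q2, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def find_outliers(values):
    """Return the values lying below the lower or above the upper fence."""
    lower, upper = iqr_fences(values)
    return [x for x in values if x < lower or x > upper]


data = [42, 54, 65, 47, 59, 40, 53]
print(iqr_fences(data))             # (16.5, 84.5)
print(find_outliers(data))          # []
print(find_outliers(data + [120]))  # [120]
```

The value 120 is added here purely for illustration; it is not part of the original example data.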
The sum of squares measures the total squared differences between each data point and the mean.
Population: $SS = \sum_{i=1}^{N} (x_i - \mu)^2$
Sample: $SS = \sum_{i=1}^{n} (x_i - \bar{x})^2$
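A minimal Python sketch of the sum of squares follows, using the hypothetical function name `sum_of_squares`. It also checks the equivalent shortcut form, the sum of the squared values minus n times the squared mean.

```python
def sum_of_squares(values):
    """Total of squared differences between each value and the mean."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values)


data = [42, 54, 65, 47, 59, 40, 53]
ss = sum_of_squares(data)

# Shortcut form: sum of x^2 minus n * mean^2 gives the same total.
n = len(data)
mean = sum(data) / n
shortcut = sum(x * x for x in data) - n * mean ** 2

print(ss, shortcut)
```

Dividing this total by N or by n - 1 yields the population or sample variance, respectively.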
The mean absolute deviation (MAD) is the average of the absolute differences between each data point and the mean.
Population: $\mathrm{MAD} = \frac{\sum_{i=1}^{N} |x_i - \mu|}{N}$
Sample: $\mathrm{MAD} = \frac{\sum_{i=1}^{n} |x_i - \bar{x}|}{n}$
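The calculation can be sketched in Python as follows. The function name `mean_absolute_deviation` is hypothetical, and the data are the example values from the mean calculation.

```python
def mean_absolute_deviation(values):
    """Average absolute distance of the values from their mean."""
    n = len(values)
    mean = sum(values) / n
    return sum(abs(x - mean) for x in values) / n


data = [42, 54, 65, 47, 59, 40, 53]
mad = mean_absolute_deviation(data)
print(round(mad, 4))  # 7.2245
```

Unlike the standard deviation, this measure does not square the deviations, so large deviations are not weighted more heavily than small ones.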
The root mean square (RMS) describes the magnitude of the data set and is calculated as the square root of the average of the squared data values.
$\mathrm{RMS} = \sqrt{\frac{\sum_{i=1}^{n} x_i^2}{n}}$
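A short Python sketch of the RMS calculation, with the hypothetical function name `root_mean_square`:

```python
import math


def root_mean_square(values):
    """Square root of the average of the squared values."""
    return math.sqrt(sum(x * x for x in values) / len(values))


data = [42, 54, 65, 47, 59, 40, 53]
rms = root_mean_square(data)
print(rms)
```

The RMS is never smaller than the absolute value of the mean, and it equals it only when all values are identical.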
Skewness measures the asymmetry of the data distribution. Positive skewness indicates a distribution with a long right tail, while negative skewness indicates a long left tail.
Population: $\gamma_1 = \frac{\sum_{i=1}^{N} (x_i - \mu)^3}{N \sigma^3}$
Sample: $\gamma_1 = \frac{n}{(n-1)(n-2)} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s}\right)^3$
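The sketch below computes skewness in Python. The function names are hypothetical. For the sample version it assumes the adjusted Fisher-Pearson form, n / ((n - 1)(n - 2)) times the sum of cubed standardized deviations, which is the convention used by common spreadsheet and statistics packages. The data must not be all identical, or the standard deviation is zero.

```python
import math


def population_skewness(values):
    """Average cubed deviation divided by the cubed population SD."""
    n = len(values)
    mean = sum(values) / n
    sigma = math.sqrt(sum((x - mean) ** 2 for x in values) / n)
    return sum((x - mean) ** 3 for x in values) / (n * sigma ** 3)


def sample_skewness(values):
    """Adjusted Fisher-Pearson sample skewness (assumed convention)."""
    n = len(values)
    if n < 3:
        raise ValueError("sample skewness needs at least 3 values")
    mean = sum(values) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    scale = n / ((n - 1) * (n - 2))
    return scale * sum(((x - mean) / s) ** 3 for x in values)


print(population_skewness([1, 2, 3, 4, 5]))   # symmetric data: 0.0
print(population_skewness([1, 2, 3, 4, 10]))  # long right tail: positive
print(population_skewness([1, 7, 8, 9, 10]))  # long left tail: negative
```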
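Kurtosis measures the 'tailedness' of a distribution. High kurtosis means heavier tails and more extreme outliers, while low kurtosis means lighter tails and fewer extreme values. The formulas below subtract 3, giving excess kurtosis, which is 0 for a normal distribution.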
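Population: $\kappa = \frac{\sum_{i=1}^{N} (x_i - \mu)^4}{N \sigma^4} - 3$
Sample: $\kappa = \frac{n(n+1)}{(n-1)(n-2)(n-3)} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s}\right)^4 - \frac{3(n-1)^2}{(n-2)(n-3)}$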
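A minimal Python sketch of excess kurtosis follows. The function names are hypothetical. For the sample version it assumes the bias-adjusted form used by common spreadsheet and statistics packages, in which the correction term 3(n - 1)^2 / ((n - 2)(n - 3)) replaces the plain 3. The data must not be all identical.

```python
import math


def population_excess_kurtosis(values):
    """Average fourth-power deviation over sigma^4, minus 3."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((x - mean) ** 2 for x in values) / n
    fourth = sum((x - mean) ** 4 for x in values)
    return fourth / (n * variance ** 2) - 3


def sample_excess_kurtosis(values):
    """Bias-adjusted sample excess kurtosis (assumed convention)."""
    n = len(values)
    if n < 4:
        raise ValueError("sample kurtosis needs at least 4 values")
    mean = sum(values) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    scale = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return scale * sum(((x - mean) / s) ** 4 for x in values) - correction


values = [1, 2, 3, 4, 5]
print(population_excess_kurtosis(values))  # negative: lighter tails than normal
print(sample_excess_kurtosis(values))
```

Evenly spaced values such as these have lighter tails than a normal distribution, so both results are negative.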
Note: These statistics are crucial for summarizing and making sense of large datasets. Mastery of descriptive statistics is essential for anyone working with data, as it forms the foundation for further statistical analysis.