Describing data

The most common way people use statistics is to describe data. What is the average value from a set of measurements? How much variability is there? How different are the groups, relative to the variability within groups?

 
When we collect data in a statistical sample, we need to think about the types of variables that we are going to record. Will these be categorical or numeric?
 

Types of Data

What is a variable and what types of data could a variable take on? What is the difference between numeric and categorical data? How do nominal and ordinal data differ?

In addition to classifying variables by their type of data, we can also classify variables as being explanatory or response variables in our study.
 

Explanatory and Response Variables

We can classify variables based on their data type (e.g. categorical versus numerical), but we can also classify variables based on their role within our study. Response variables are the measures we are interested in predicting, and explanatory variables are the things we test to see whether they predict our response variable(s).

What are frequency distributions and how can we use them to visualise data in a statistical sample?
 

Visualizing Data in a Sample

Once we have collected our data, how can we visualise these data using plots like frequency distributions?
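As a rough illustration of what a frequency distribution records, the sketch below counts how often each value (or bin of values) occurs and prints a simple text bar chart. The data, variable names and bin width are all made up for this example; they do not come from the course material.

```python
from collections import Counter

# Hypothetical data, made up for illustration only.
eye_colours = ["brown", "blue", "brown", "green", "blue", "brown"]
wing_lengths_mm = [12, 15, 11, 19, 14, 22, 13]

# Categorical variable: count how often each category occurs.
colour_freq = Counter(eye_colours)

# Numeric variable: group values into bins of equal width, then count.
# Each bin is labelled by its lower edge (10, 15, 20, ...).
BIN_WIDTH = 5
length_freq = Counter((x // BIN_WIDTH) * BIN_WIDTH for x in wing_lengths_mm)

# Print one row of '#' marks per category or bin.
for category, count in colour_freq.most_common():
    print(f"{category:<8}{'#' * count}")
for lower in sorted(length_freq):
    # Labels such as 10-14 assume whole-number measurements.
    print(f"{lower:>2}-{lower + BIN_WIDTH - 1:<4}{'#' * length_freq[lower]}")
```

The bin width is a choice: wider bins smooth the distribution, while narrower bins show more detail but can look noisy in small samples.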

How can we describe a typical value in a statistical sample using the mode, median and mean?
 

Describing a Typical Value in a Sample

What are the various ways that we can describe a typical value in a statistical sample? What is the difference between a mean and a median and when should I use each of these?
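One way to see the difference between the mean and the median is to compute both on a sample that contains an extreme value. The sketch below uses Python's standard `statistics` module; the sample is hypothetical and chosen only so that the arithmetic is easy to follow.

```python
import statistics

# Hypothetical sample with one extreme value (40), for illustration only.
sample = [2, 3, 3, 4, 5, 6, 40]

mean = statistics.mean(sample)      # sum of values / number of values = 63 / 7
median = statistics.median(sample)  # middle value once the data are sorted
mode = statistics.mode(sample)      # most frequently occurring value

print(f"mean={mean}, median={median}, mode={mode}")
```

Here the single extreme value pulls the mean well above most of the observations, while the median stays close to the bulk of the data. That is the usual reason to prefer the median for strongly skewed samples.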

There are five ways we can describe variability within a statistical sample: range, interquartile range, variance, standard deviation and coefficient of variation.
 

Describing Variability in a Sample

What are the ways I can describe variability in a statistical sample? What is variance and how do I calculate it?
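The five measures of variability can be sketched in a few lines. The sample below is hypothetical, and the variance is written out step by step (sum of squared deviations from the mean, divided by n - 1) so the calculation is visible. Note that quartiles can be computed by several conventions; `statistics.quantiles` with its default settings is only one of them, so a textbook method may give a slightly different interquartile range.

```python
import statistics

# Hypothetical sample, made up for illustration only.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(sample)
mean = statistics.mean(sample)

# Range: largest value minus smallest value.
value_range = max(sample) - min(sample)

# Interquartile range: third quartile minus first quartile
# (using the default quartile convention of statistics.quantiles).
q1, _, q3 = statistics.quantiles(sample, n=4)
iqr = q3 - q1

# Sample variance: sum of squared deviations from the mean, divided by n - 1.
variance = sum((x - mean) ** 2 for x in sample) / (n - 1)

# Standard deviation: square root of the variance (same units as the data).
sd = variance ** 0.5

# Coefficient of variation: standard deviation relative to the mean.
cv = sd / mean

print(f"range={value_range}, IQR={iqr}, variance={variance:.3f}, "
      f"SD={sd:.3f}, CV={cv:.1%}")
```

Dividing by n - 1 rather than n gives the sample variance, which is what `statistics.variance` also returns.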

What's the difference between standard deviation and standard error, and when should I use one and not the other?
 

What's the difference between standard deviation and standard error?

Standard deviation and standard error are easy to confuse. Here is a brief explainer of what they are, how they are different and when to use standard deviation versus standard error.
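In short, the standard deviation describes the spread of individual observations, while the standard error of the mean describes how precisely the sample mean estimates the population mean. A minimal sketch, assuming the usual formula SE = SD / sqrt(n) and a hypothetical sample (the function name `standard_error` is ours, not from the course):

```python
import math
import statistics

def standard_error(values):
    """Standard error of the mean: sample SD divided by sqrt(n)."""
    return statistics.stdev(values) / math.sqrt(len(values))

# Hypothetical sample, made up for illustration only.
sample = [2, 4, 4, 4, 5, 5, 7, 9]

sd = statistics.stdev(sample)
se = standard_error(sample)

print(f"SD={sd:.3f} (spread of the observations)")
print(f"SE={se:.3f} (uncertainty in the sample mean)")
```

Because of the division by sqrt(n), the standard error shrinks as the sample size grows, whereas the standard deviation does not systematically shrink with more data.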

What are confidence intervals, what affects their width and what does confidence mean?
 

Confidence Intervals

Confidence intervals are based on sampling error and provide a range of plausible values for the true population parameter. For 95% confidence intervals, about 95% of the intervals calculated from repeated samples would contain the true parameter. There is a great tutorial that will help you understand this concept here:

https://www.zoology.ubc.ca/~whitlock/Kingfisher/CIMean.htm
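The meaning of "confidence" can also be checked with a small simulation: draw many samples from a population whose mean is known, build an interval from each, and count how often the interval contains the true mean. The sketch below is an illustration under stated assumptions, not the tutorial's method: the population values, sample size and function name `mean_ci` are hypothetical, and the interval uses a normal (z) critical value, which is only a large-sample approximation (a t critical value is more appropriate for small samples).

```python
import random
import statistics
from statistics import NormalDist

def mean_ci(values, confidence=0.95):
    """Approximate confidence interval for the mean: mean +/- z * SE."""
    n = len(values)
    mean = statistics.mean(values)
    se = statistics.stdev(values) / n ** 0.5
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return mean - z * se, mean + z * se

# Hypothetical population, chosen only for the simulation.
TRUE_MEAN, TRUE_SD = 10.0, 2.0
SAMPLE_SIZE, TRIALS = 50, 2000

rng = random.Random(1)  # fixed seed so the run is repeatable
hits = 0
for _ in range(TRIALS):
    sample = [rng.gauss(TRUE_MEAN, TRUE_SD) for _ in range(SAMPLE_SIZE)]
    low, high = mean_ci(sample)
    hits += low <= TRUE_MEAN <= high  # True counts as 1

coverage = hits / TRIALS
print(f"Intervals containing the true mean: {coverage:.1%}")
```

The proportion should come out close to 95%. Because the standard error appears in the interval, larger samples or less variable data give narrower intervals, and a higher confidence level gives wider ones.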

Additional Resources


Whitlock & Schluter - The Analysis of Biological Data

Chapter 1: pages 11-17, and Chapter 3: 65-83 [Sapling Ch1, Sapling Ch3]

Chapter 4: pages 97-109 [Sapling]

 

Why do we need the median?

Intro: Why do we have various ways to measure Central Tendency?

Range, variance, and standard deviation as measures of dispersion

Intro: Helping us understand how spread apart the data is.

 

Standard error and confidence intervals

Intro: Definition of standard error, and how it relates to confidence intervals.

 

What are confidence intervals?

Intermediate: Several real-world examples where confidence intervals are used, followed by a discussion of Frequentist versus Bayesian confidence intervals.


Review Questions

 

Using the following leg-length measurements (mm) from deer ticks, calculate the values below by hand.

0.36 2.73 2.64 3.03 3.63 2.29

  1. What is the mean of the sample?

  2. What is the median value of the sample?

  3. What is the standard deviation of the sample?

The Next Steps


Confused?

Let’s move down the tree and review these concepts.

Ready to Move Forward?

Let’s move up the tree to the next topic.