Scatter Plots

A Scatter (XY) Plot has points that show the relationship between two sets of data.

In this example, each dot shows one person's weight versus their height.

(The data is plotted on the graph as "Cartesian (x,y) Coordinates".)

Example: The local ice cream shop keeps track of how much ice cream they sell versus the noon temperature on that day. Here are their figures for the last 12 days:

Ice Cream Sales vs Temperature

Temperature (°C)   Ice Cream Sales
14.2               $215
16.4               $325
11.9               $185
15.2               $332
18.5               $406
22.1               $522
19.4               $412
25.1               $614
23.4               $544
18.1               $421
22.6               $445
17.2               $408

And here is the same data as a Scatter Plot:

It is now easy to see that warmer weather leads to more sales, but the relationship is not perfect.

Line of Best Fit

We can also draw a "Line of Best Fit" (also called a "Trend Line") on our scatter plot:

Try to have the line as close as possible to all points, and as many points above the line as below.

Example: Sea Level Rise

A Scatter Plot of Sea Level Rise:

And here I have drawn on a "Line of Best Fit".

Correlation

When the two sets of data are strongly linked together we say they have a High Correlation.

The word Correlation is made of Co- (meaning "together") and Relation.

Correlation is Positive when the values increase together, and Correlation is Negative when one value decreases as the other increases. Like this:

Outliers

"Outliers" are values that "lie outside" the other values.

When we collect data, sometimes there are values that are "far away" from the main group of data ... what do we do with them?

Example: Long Jump

A new coach has been working with the Long Jump team this month, and the athletes' performance has changed.

Augustus can now jump 0.15m further, and June and Carol can each jump 0.06m further.

Here are all the results:

Augustus: +0.15m
Tom: +0.11m
June: +0.06m
Carol: +0.06m
Bob: +0.12m
Sam: -0.56m

Oh no! Sam got worse.
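The averages worked out by hand in the next part of the example, first with all six results and then without Sam's, can be checked with a short Python sketch. The dictionary `results` and the helper `mean` are illustrative names chosen here, not part of the original example. Note that removing Sam leaves five results, so the second mean divides by 5.

```python
# Long jump changes in metres (positive = jumping further), from the list above.
# The names `results` and `mean` are illustrative, not from the original text.
results = {
    "Augustus": 0.15,
    "Tom": 0.11,
    "June": 0.06,
    "Carol": 0.06,
    "Bob": 0.12,
    "Sam": -0.56,
}


def mean(values):
    """Arithmetic mean: the sum of the values divided by how many values there are."""
    values = list(values)
    return sum(values) / len(values)


# Mean of all six results (divide by 6)
mean_all = mean(results.values())

# Mean with Sam's result left out (only five values remain, so divide by 5)
mean_without_sam = mean(v for name, v in results.items() if name != "Sam")

print(f"Mean of all 6 results: {mean_all:+.2f} m")
print(f"Mean without Sam (5 results): {mean_without_sam:+.2f} m")
```

Passing the values through `len` rather than hard-coding 6 or 5 means the divisor always matches the number of results actually averaged.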
Here are the results on the number line:

The mean is:

(0.15 + 0.11 + 0.06 + 0.06 + 0.12 - 0.56) / 6 = -0.06 / 6 = -0.01m

So, on average the performance went DOWN. The coach is obviously useless ... right?

Sam's result is an "Outlier" ... what if we remove Sam's result?

Example: Long Jump (continued)

Let us try the results WITHOUT Sam (that leaves five results):

Mean = (0.15 + 0.11 + 0.06 + 0.06 + 0.12) / 5 = 0.50 / 5 = 0.10m

Hey, the coach looks much better now! But is that fair? Can we just get rid of values we don't like?

What To Do?

You need to think "why is that value over there?"

It may be quite normal to have high or low values:

- People can be short or tall
- Some days there is no rain, other days there can be a downpour
- Athletes can perform better or worse on different days

Or there may be an unusual reason for extreme data.

Example: Long Jump (continued)

We find out that Sam was feeling sick that day. Not the coach's fault at all. So in this case it is a good idea to remove Sam's result.

When we remove outliers we are changing the data, so it is no longer "pure". We shouldn't just get rid of outliers without a good reason! And when we do get rid of them, we should explain what we are doing and why.
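The Line of Best Fit and Correlation ideas can be made concrete with the ice cream figures from the table. The article draws its line by eye; the Python sketch below instead uses ordinary least squares, one common way to make "as close as possible to all points" precise, and measures correlation with Pearson's r, a standard measure the article itself does not name. The function names are illustrative choices, not from the original.

```python
import math

# Ice cream figures from the table: (noon temperature in °C, sales in $)
data = [
    (14.2, 215), (16.4, 325), (11.9, 185), (15.2, 332),
    (18.5, 406), (22.1, 522), (19.4, 412), (25.1, 614),
    (23.4, 544), (18.1, 421), (22.6, 445), (17.2, 408),
]


def sums_of_deviations(points):
    """Return (Sxx, Syy, Sxy, mean_x, mean_y): sums of squared and
    cross-multiplied deviations from the means."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxx, syy, sxy, mean_x, mean_y


def least_squares_line(points):
    """Slope and intercept of the line y = slope * x + intercept that
    minimises the sum of squared vertical distances to the points."""
    sxx, _, sxy, mean_x, mean_y = sums_of_deviations(points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x  # the line passes through (mean_x, mean_y)
    return slope, intercept


def pearson_r(points):
    """Pearson correlation coefficient: near +1 for a strong positive
    correlation, near -1 for a strong negative one, near 0 for none."""
    sxx, syy, sxy, _, _ = sums_of_deviations(points)
    return sxy / math.sqrt(sxx * syy)


slope, intercept = least_squares_line(data)
r = pearson_r(data)

# Residual = actual sales minus the sales the line predicts for that temperature
residuals = [y - (slope * x + intercept) for x, y in data]
above = sum(1 for e in residuals if e > 0)
below = sum(1 for e in residuals if e < 0)

print(f"Slope: {slope:.1f} dollars per °C, intercept: {intercept:.1f} dollars")
print(f"Correlation r = {r:.3f}")
print(f"Points above the line: {above}, below: {below}")
```

With this data the slope comes out positive and r is close to +1, matching the "warmer weather, more sales" pattern of a positive correlation. Least squares does not force equal numbers of points above and below the line, so the "as many above as below" advice is best read as a guide for drawing a line by eye.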
© Copyright 2026 Paperzz