Graphing

Reading and creating graphs will be fundamental to succeeding in physics. In nearly every experiment we perform, we will graph data to find general trends and underlying relationships. In our practice work, graphs will frequently be the clearest way to express relationships between variables. While several different types of graphs could be used, we will most often use line graphs and scatter plots. In either case, there are a few key ideas to consider when trying to make meaning of a graph.

Below is a video that shows the basics of setting up a graph and plotting points. Below that is a more detailed written description of each step in that process.

What are we graphing?

This seems obvious, but I see more mistakes made with this question than any other step in reading or creating a graph. Labels on the axes are always going to be the first thing we consider when reading a graph, and the first thing we write when making a graph. Take time on this step -- a mistake here makes the rest of the process moot.

Independent and dependent variables

In general, the placement of a measurement on the X or Y axis is NOT arbitrary. The convention is that the independent variable goes on the X (horizontal) axis, and the dependent variable goes on the Y (vertical) axis. The distinction can be subtle, but it is an important one: the dependent variable is determined (at least in part) by the value of the independent variable, or at least we believe that it is.

As an example, we might conduct an experiment to determine the relationship between the mass of a pendulum bob and the time it takes to make one complete motion. In this experiment, we might believe that adding mass to the end of a pendulum will cause the pendulum to swing more slowly or more rapidly. Since we are conducting this experiment under the impression that changes in mass cause a change in the time, mass would be the independent variable and time would be the dependent variable. Conversely, we would probably all agree that it would be silly to think that making a change to the time it takes to swing the pendulum would cause the mass of the bob to change.

Other times, the relationship between the two variables will be less obvious. Most commonly, this will occur when we measure some quantity and how it changes over time. For example, we might measure the speed of a ball as it rolls down a hill and how it changes over time. The fact that time is advancing isn't directly causing the ball to get faster; it's less direct than that. Instead, we have a force that pulls the ball downward and makes it accelerate. The longer it accelerates, the faster it goes. It certainly makes some sense to say that the amount of time the ball has been rolling down the ramp has an effect on how fast it is moving. It makes no sense to say that the speed of the ball somehow affects how long it has been rolling down the ramp. In situations where a variable is measured at different moments in time, we will always treat time as the independent variable and the other measured quantity as the dependent one.

Scale

The most important thing to remember about scale is that both axes in every graph must have consistent scales. This means that every time you go over one space on the graph, the associated X value changes by the same amount. Every time you go up or down one space on a graph, the associated Y value changes by the same amount. We base our analysis of the relationship between variables on the shapes we see in graphs. Without a consistent scale, the shape of the graph is meaningless.
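One way to see what "consistent scale" means numerically: the step between every pair of neighboring tick values on an axis must be the same. A minimal Python sketch (the helper name is made up for illustration, not part of any plotting library):

```python
# Check that a proposed set of axis tick values uses a consistent scale:
# each step between neighboring ticks must be equal.

def has_consistent_scale(ticks, tol=1e-9):
    """Return True if neighboring ticks are all the same distance apart."""
    steps = [b - a for a, b in zip(ticks, ticks[1:])]
    return all(abs(s - steps[0]) < tol for s in steps)

print(has_consistent_scale([0, 2, 4, 6, 8]))   # evenly spaced: True
print(has_consistent_scale([0, 1, 2, 5, 10]))  # spacing changes: False
```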

We must also consider that, even with a consistent scale, the choice of what scale to use will affect the way a graph looks. In particular, the choice to zoom in on a specific region by using a break on the axes can have a huge effect. Below are two graphs of the same data, but with different scales. The one on the left has a consistent scale, while the one on the right has a much smaller consistent scale and uses a break in the Y-axis.

When reading the graph on the left, we would probably conclude that the object had a roughly constant speed over the 11 or so seconds it travelled. From the right graph, however, it seems that the object was getting faster over time, maybe with a dip in the middle. Whether we are constructing a graph or interpreting one that is already made, making note of the scale on the Y-axis and using that scale to inform our interpretation of the data is essential.

Fitting lines and curves

When creating a scatterplot, we will often have a mixture of raw data (collected through experimentation) and added elements related to our interpretation of the data. One such element is a fitted line or curve. In adding this to a graph, we are acknowledging that each of our data points is subject to uncertainty, and therefore may not be an exact representation of the situation in question. A fitted line or curve is our best idea of what the true shape of the graph would be if it were possible to get perfect data. While it is possible to examine a system that behaves inconsistently, with lots of tiny, seemingly random fluctuations, we will tend to focus on systems where a consistent relationship between variables can be observed. In short, this means that our fitted lines or curves will generally have a consistent shape across the entire graph, or at least across sections of the graph. A fitted line or curve will NOT look like you've been playing "connect the dots" on your graph.

Two graphs of the same data are shown below. The one on the left has a fitted line; the one on the right has line segments connecting each data point.

The graph on the left shows that our interpretation of the data is that the object was getting faster consistently over time, but uncertainty in the measurements caused some of the data points not to show that. The graph on the right, however, shows that our interpretation of the data is that the object initially got faster, then a bit slower, and then faster again, and that the uncertainty in the measurements was not large enough for our data points to show anything but the true event. Either interpretation could be correct, depending on the situation and the amount of uncertainty in the measurements.

Fitting lines and curves by hand can be tricky (those using graphing calculators are welcome to utilize the built-in functionality to speed up that process). Here are a few guidelines to follow to make sure that your fitted line or curve is representative of the data:

    1. The fitted line or curve does not need to pass through each point. It does strengthen your argument if it passes through the error bars (discussed later) of each point, but even that doesn't always work out. It should pass as close to as many points as possible.

    2. When we account for uncertainty, we acknowledge that we will sometimes measure values as being larger than they actually are, and sometimes as smaller than they actually are. When dealing with random uncertainty, there is an equal probability that a data point will fall above or below the true value. As such, our idealized graph (the fitted line or curve) ought to have roughly as many points above the line as it does below the line.

    3. To the last point, it is unlikely that all the points in a particular range will be measured too high, while all the points in a different range will be measured too low. If you find, for example, that all the points on the left side of your graph are below your fitted line and all the ones on the right are above it, try tilting the line. If, for another example, you draw a line and find that all the points on the left and right sides of your graph are above the line, and all the points in the middle are below it, consider trying some kind of curve that fits that shape better.
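Guideline 2 can be checked numerically once a line is fitted. A sketch using NumPy's least-squares fit on speed-vs-time data (the numbers are invented for illustration):

```python
import numpy as np

# Made-up speed-vs-time data, invented for illustration.
t = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
v = np.array([0.10, 0.32, 0.55, 0.60, 0.82, 1.02])

# Fit a straight line v = m*t + b by least squares.
m, b = np.polyfit(t, v, 1)

# Residuals: positive means the data point sits above the fitted line.
residuals = v - (m * t + b)
above = int(np.sum(residuals > 0))
below = int(np.sum(residuals < 0))

# Guideline 2: roughly as many points above the line as below it.
print(above, below)
```

If the counts are badly lopsided, or the residual signs cluster by region as in guideline 3, a straight line is probably the wrong shape for the data.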

Uncertainty

There are no perfect measurements, and we need to be aware of this when we interpret relationships from sets of data. Students often use phrases like, "the data tells us this" or "the data indicates that". These statements ignore the creative, interpretive part of conducting an experiment. To learn about uncertainty and how to quantify it, check out this page. In this section, we'll look at how to use values of uncertainty when creating graphs to help with the interpretive part of the process.

For each value that we intend to plot on a graph, there will be some amount of associated uncertainty for both the X and Y positions. The dots we place on the graph will be shown at the measured values. In addition to the dots, we will add "error bars" or "uncertainty bars" in both dimensions to help us visualize the area in which the "true value" for that point might be found.

The size of these error bars will correspond to the uncertainty in each value. For example, if you plotted a point whose X value was 4.0 with an uncertainty of 0.2, then the dot would be placed at 4.0 and the error bars on each side would have a length of 0.2. The range of values where we expect the "true value" to fall, then, would be somewhere between 3.8 and 4.2. We do the same thing for the Y values, and end up with little rectangles in which we have high confidence that we would find the true values for each data point.
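The interval an error bar covers is simple to compute. A minimal sketch using the values from the example above (the helper name is made up for illustration):

```python
# Interval covered by an error bar: measured value plus/minus uncertainty.
def error_bar_range(value, uncertainty):
    """Return the (low, high) ends of the error bar."""
    return (value - uncertainty, value + uncertainty)

# The example from the text: X = 4.0 with an uncertainty of 0.2.
low, high = error_bar_range(4.0, 0.2)
print(low, high)  # 3.8 4.2
```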

These error bars should help you interpret your data, and will often help you decide on a scale. If we consider the earlier example in which we saw the same data represented on two very different scales, it might be unclear at first which is the more appropriate scale to use. Let's say that the uncertainty for each time in that data is 0.2 seconds, and for each speed in the data is 0.1 m/s. The graphs below are identical to the ones first presented, but this time have error bars included.

With the introduction of the error bars, we get a sense that the scale for the graph on the right is inappropriate for the data. With that scale, the only thing we can really say for sure is that all the data belongs somewhere on the graph. We have no ability to understand the shape of the data. We could draw a horizontal line that passes through the error bars of every data point. We could also draw a diagonal line with a positive or a negative slope that passes through each error bar. A variety of curves could be drawn that pass through these points.
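This ambiguity can be demonstrated numerically: with an uncertainty of 0.1 m/s, quite different trend lines all pass inside every error bar. A sketch with invented data resembling the example:

```python
# Invented speed-vs-time data resembling the example, with an assumed
# uncertainty of 0.1 m/s in each speed.
t = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
v = [1.00, 1.02, 1.03, 1.01, 1.00, 1.02, 1.04, 1.05, 1.06, 1.07]
dv = 0.1  # uncertainty in each speed, m/s

def line_fits(m, b):
    """True if the line v = m*t + b passes inside every error bar."""
    return all(abs((m * ti + b) - vi) <= dv for ti, vi in zip(t, v))

print(line_fits(0.0, 1.03))    # a horizontal line fits
print(line_fits(0.008, 0.98))  # so does a gently rising line
```

Both lines are consistent with the data once the error bars are taken into account, which is exactly why the graph alone cannot settle the question.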

On the graph on the left, however, the amount of uncertainty is shown as an insignificant amount compared to the graph as a whole. In this graph, the data seems very much to be linear, such that a horizontal line could pass through all the data. This graph may actually err on the other side: the scale is so large that we can barely see the error bars. If we use a large enough scale, we can make any data set appear to fit a horizontal line. For example, if you took the first 100 whole-number points that fit on the graph y = x ((1, 1), (2, 2), etc.) and graphed them on axes that went from 0 to 100 in both the X and Y directions, you would see a graph that probably looks quite familiar, as shown below:

The graph below shows the same data, but graphed with the Y-axis going from zero to one million:

Now that data looks identical to a graph of the equation y = 0.
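The flattening effect is easy to quantify. A short sketch of the argument above: on a Y-axis running from 0 to one million, every point of y = x for x from 1 to 100 sits in the bottom 0.01% of the plot, so the data appears flat.

```python
# The points (1, 1) ... (100, 100) from y = x, drawn on a Y-axis that
# runs from 0 to one million, all sit in the bottom sliver of the plot.
axis_max = 1_000_000
ys = range(1, 101)

max_fraction = max(ys) / axis_max  # tallest point as a fraction of the axis
print(max_fraction)  # 0.0001, i.e. 0.01% of the axis height
```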

The uncertainty of measurements can be a good guideline for choosing an appropriate scale for the graph. You want the error bars visible, but not to dominate the space. For the example above, in which the uncertainty for the speed is 0.1 m/s, a scale of roughly 0.1 m/s per division might be a good choice. This will require a break in the graph to get all data to show up, and would look like the following:

This graph will probably give us a strong position from which to make our argument. You might conclude that the speed increased slightly over time, or that it was constant. Either way, you'd add a line or curve of best fit that passed through each of the error bars in order to make a strong case for your conclusion. Here, it would probably make sense to conclude that, since we see a "change" in the speed of less than 0.1 m/s, and each measurement of speed has an uncertainty of 0.1 m/s, the speed remained constant (you could say "roughly constant") during this time, and any changes we see are just an artifact of the uncertainty of the measurements.
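One way to frame that final argument numerically is to compare the total change in measured speed with the measurement uncertainty. A sketch with assumed values (speeds in m/s, uncertainty 0.1 m/s, as in the example):

```python
# Assumed speed measurements (m/s) and their uncertainty, as in the example.
v = [1.00, 1.02, 1.03, 1.01, 1.00, 1.02, 1.04, 1.05, 1.06, 1.07]
uncertainty = 0.1

# If the full spread of measured speeds is no larger than the uncertainty
# of a single measurement, "roughly constant" is a defensible conclusion.
observed_change = max(v) - min(v)
print(observed_change <= uncertainty)  # True: change is within uncertainty
```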