Golden Software Newsletter

Issue 66

Obtaining Basic Statistical Information in Grapher 9

Before graphing data, it is often necessary to determine some basic statistics about the data. Grapher can calculate statistics in the worksheet and create several statistical graphs. These often help you decide how to work with your data and what assumptions about your data can be made.

 

The data used for this example can be downloaded from here. These are speed measurements (in seconds) from the same experiment. To decide how to treat this data for modeling purposes, first look at the data itself, then create box plots and histograms from it, and finally create a QQ plot.

 

Worksheet Statistics

Let’s start by opening the data file. Click the File | Open command and open the random sample.dat file. The data has been sorted using the Data | Sort command, so we can see the data in ascending order. Highlight the column and choose Data | Statistics. A dialog appears with the types of statistics to compute. Check the boxes next to minimum, maximum, mean, standard error of the mean, 95% confidence interval for the mean, and standard deviation. Select Copy to worksheet and set the Starting in cell value to C1. We can then see:

 


The Data | Statistics command provides
basic statistical information about the data.

 

From the copied statistics, we can see that the data ranges from 6.2 to 17.1 with a mean value of 11.954. The 95% confidence value for the mean, which is the half-width of the interval, rounds to 0.695. Adding this value to the mean and subtracting it from the mean gives a confidence interval of 11.259 to 12.649. This means that we are 95% certain that the true mean is within these values. The standard error can be used if a different confidence interval is desired, or for other testing. The standard deviation of 2.444 gives us an estimate of the spread of the values. These values can be used for more advanced statistical analysis, such as hypothesis testing, or can be graphed for additional information.
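As a rough check of the arithmetic above, the following Python sketch recomputes the standard error and the 95% confidence interval from the reported mean, standard deviation, and sample size of 50. It assumes the interval is based on Student's t distribution with 49 degrees of freedom, with the critical value taken from a standard t table; this is an assumption about how the value was calculated, not a statement of Grapher's internal method.

```python
import math

# Summary values reported in the worksheet statistics.
n = 50           # number of observations in the data file
mean = 11.954
std_dev = 2.444

# Assumption: two-sided 95% critical value of Student's t with
# n - 1 = 49 degrees of freedom, taken from a standard t table.
T_CRIT_95_DF49 = 2.0096

std_error = std_dev / math.sqrt(n)        # standard error of the mean
half_width = T_CRIT_95_DF49 * std_error   # the "95% confidence" value
ci_low = mean - half_width
ci_high = mean + half_width

print(f"standard error: {std_error:.3f}")
print(f"95% half-width: {half_width:.3f}")
print(f"95% CI:         {ci_low:.3f} to {ci_high:.3f}")
```

For a different confidence level, only the critical value changes; the standard error stays the same.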

 

Standard Statistical Graphs

Probably the first graph most people learn to create is a histogram. This quickly gives us an overview of how the data is distributed by counting the frequency of points between two values. In Grapher, the bin size defines the two values. Grapher allows the histogram to be split into as many bins as desired.

 

To create the histogram, click on the Graph | 2D XY Graphs | Histogram command. Select the data file and click Open. The histogram is initially created with the entire range (from 6.2 to 17.1) included and with a bin size and number of bins necessary to split the data appropriately. To change the number of bins and bin size, follow these steps:

  1. Click on the Histogram 1 plot in the Object Manager.
  2. In the Property Manager, click on the Plot tab.
  3. Scroll down and change the Number of bins to 12.
  4. Change the Bin size to 1.

We can visually compare the graph to known distributions or add a normal Gaussian fit curve to the graph. We might estimate that the data doesn’t quite look like a normal distribution, because the largest peak is off center and the data is not symmetrical. This may tell us that additional analysis is required.


The histogram gives a visual estimate of
the range and distribution of the data.
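To illustrate what the bin size and number of bins control, here is a minimal Python sketch that counts values into equal-width bins. The sample values, the starting edge of 6.0, and the function name histogram_counts are all hypothetical and chosen only for illustration; they are not the article's data file, and Grapher may place its bin edges differently.

```python
def histogram_counts(values, start, bin_size, n_bins):
    """Count values in bins [start + i*bin_size, start + (i+1)*bin_size)."""
    counts = [0] * n_bins
    for v in values:
        index = int((v - start) // bin_size)
        if 0 <= index < n_bins:      # ignore values outside the binned range
            counts[index] += 1
    return counts


# Hypothetical values, NOT the article's random sample.dat file.
sample = [6.2, 7.5, 9.1, 9.8, 10.4, 11.0, 11.3, 12.7, 13.9, 17.1]

# 12 bins of size 1, as in the steps above; the start edge is an assumption.
counts = histogram_counts(sample, start=6.0, bin_size=1.0, n_bins=12)

for i, count in enumerate(counts):
    low = 6.0 + i * 1.0
    print(f"{low:4.1f} to {low + 1.0:4.1f}: {'#' * count}")
```

A smaller bin size spreads the same points over more bars, which can make an off-center peak easier or harder to see.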

 

The box plot is a great tool for quickly examining the spread and central tendency of the raw data before modeling. The box itself spans the first to third quartiles, showing where the central portion of the data lies, and the whiskers extend to the values outside that central range. By comparing where the median line sits in the box, you can get a good visual idea of how the data is distributed about the median.

 

To create the box plot, click the Graph | Specialty Graphs | Box-Whisker Plot command. Select the data file and click Open. The default box plot is created. The box plot can be customized by adding labels, displaying outliers, or changing the properties of the plot.

 

To add labels and display outliers on the box plot, follow these steps:

  1. Click on the Box-Whisker Plot 1 in the Object Manager.
  2. In the Property Manager, click on the Labels tab.
  3. Open the Quartiles (25) section and place a check mark next to Display.
  4. Open the Median (50) section and place a check mark next to Display.
  5. Open the Quartiles (75) section and place a check mark next to Display.
  6. The labels appear on the graph. The labels can be moved by clicking the Graph | Move Labels command. After moving the labels to the desired location, press ESC on the keyboard to end the move labels mode.
  7. Click on the Plot tab.
  8. Check the box next to the Outliers as symbols option. This will display outliers as symbols, instead of including the values in the whiskers.
  9. With the default outlier definition, no symbols appear. This means that all values are within 1.5*IQR of the box. The IQR is defined as the distance between Quartiles (25) and Quartiles (75). The Factor can be changed to a smaller value, if desired.


The box plot displays the spread of the
data, along with the median, first
quartile, and third quartile values.
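The outlier rule described in the steps above can be sketched in Python as follows. This is an illustration under stated assumptions: the sample values and the function name find_outliers are hypothetical, and the quartiles come from Python's statistics.quantiles with the "inclusive" method, which may not match the quartile method Grapher uses.

```python
import statistics


def find_outliers(values, factor=1.5):
    """Return (q1, median, q3) and the values beyond factor * IQR from the box."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1                       # distance between the 25% and 75% quartiles
    low_fence = q1 - factor * iqr
    high_fence = q3 + factor * iqr
    outliers = [v for v in values if v < low_fence or v > high_fence]
    return (q1, median, q3), outliers


# Hypothetical values, NOT the article's random sample.dat file.
sample = [6.2, 7.5, 9.1, 9.8, 10.4, 11.0, 11.3, 12.7, 13.9, 17.1]

quartiles, outliers = find_outliers(sample)            # default factor of 1.5
print("quartiles:", quartiles)
print("outliers at factor 1.5:", outliers)

_, outliers_wide = find_outliers(sample, factor=3.0)   # a more permissive factor
print("outliers at factor 3.0:", outliers_wide)
```

Lowering the factor narrows the fences, so more points are drawn as outlier symbols; raising it does the opposite.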

 

A final simple graph type displays the central tendencies calculated with the worksheet Data | Statistics command and compares them to the box plot values. A floating bar chart can be used to show ranges of data. Let’s rearrange the worksheet statistical data slightly so that each row holds one range, with the lower value in one column and the upper value in another. In the first row, put the mean minus the 95% confidence value and the mean plus the 95% confidence value. In the next row, put the first quartile and the third quartile. In another row, put the mean minus the standard deviation and the mean plus the standard deviation. The rearranged data should look like:

 


Rearrange the data to create a floating bar chart.

 

To create the floating bar chart, click the Graph | 2D XY Graphs | Floating Bar command. Select the data and click Open. Any part of the graph can be changed, including the axis labels and grid lines, as shown here:

 


This floating bar chart shows the quartiles, mean ± standard
deviation, and 95% confidence interval for the original data.

 

QQ Plot

The QQ plot plots the actual data values against the normal curve values. If the points fall along a straight line, the data can be considered normally distributed. To create the graph, you first must have a set of normal curve data with the same number of points as the data set you are using. To get the normal curve data in the proper spacing and into the same worksheet, follow these steps:

  1. Open this graph in Grapher using the File | Open command. This graph was created with the equation for the normal distribution function.

  2. Click on the normal function curve to export the data.

  3. Click on the Normal Distribution Function Plot in the Object Manager.
  4. Since our data has 50 points, we need 50 points in the function. In the Property Manager, click on the Plot tab. Change the Number of points to 50.
  5. Click the Graph | Export Plot Data and the data is exported to a new worksheet with exactly 50 data points.
  6. Highlight the numbers in column A (the X values) and click the Edit | Copy command.
  7. Switch back to the original random sample.dat data file.
  8. Click in cell B1 and click Edit | Paste.

 

To make the QQ plot, click in the plot window and follow these steps:

  1. Click the Graph | 2D XY Graphs | Line/Scatter command.
  2. Select the worksheet and click Open.
  3. With the Line/Scatter Plot 1 selected, click on the Plot tab in the Property Manager.
  4. Change the X column to Column B.
  5. Change the Y column to Column A.
  6. To add a fit curve, click the <Click here to add/edit fits> next to the Fits command.
  7. In the dialog, click the Add button to add the Linear fit and click OK.


Plot the normal values versus the actual values to
determine if the data is normally distributed.

 

You can see that the data approximately follows the linear fit line, indicating that the data are approximately normal, with the exception of the extreme values.
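The idea behind the QQ plot can also be sketched numerically. The Python example below pairs sorted data values with theoretical standard-normal quantiles and reports how close the pairs are to a straight line. Note the assumptions: the sample values are hypothetical, the quantiles are computed with the inverse normal CDF at plotting positions (i - 0.5) / n, which is one common convention and differs from pasting X values exported from a function plot, and the function names are chosen only for this illustration.

```python
import math
from statistics import NormalDist, fmean


def normal_quantiles(n):
    """Standard-normal quantiles at plotting positions (i - 0.5) / n."""
    std_normal = NormalDist()   # mean 0, standard deviation 1
    return [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical values, NOT the article's random sample.dat file.
sample = sorted([6.2, 7.5, 9.1, 9.8, 10.4, 11.0, 11.3, 12.7, 13.9, 17.1])

theoretical = normal_quantiles(len(sample))   # X values of the QQ plot
r = pearson_r(theoretical, sample)            # closeness to a straight line

for x, y in zip(theoretical, sample):
    print(f"{x:7.3f}  {y:5.1f}")
print(f"correlation of QQ pairs: {r:.3f}")
```

A correlation near 1 corresponds to points lying close to the linear fit line; points that pull away at either end correspond to the extreme values departing from normality.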

 

Conclusion

These descriptive statistical calculations and graphs allow you, the researcher, to determine how your data can be treated before further analysis.

 
