Histogram
A histogram is a useful device for exploring the shape of the distribution of the values of a variable. Histograms are used for screening for outliers, checking normality, or suggesting another parametric shape for the distribution. A histogram breaks the range of values of a variable into intervals of equal length. This applet chooses the number of intervals as the smaller of the square root of the number of observations and twenty, so there are never more than twenty intervals. Thus, 16 observations give four intervals, whereas 1000 observations give 20 intervals. For each interval, the number of observations falling in that interval is counted. The histogram is constructed by displaying the midpoints of the intervals on the horizontal axis and the frequencies on the vertical axis. The height of the bar above each midpoint represents the number of observations in that interval.

Scatterplot
A scatterplot is useful for studying the association between two interval variables. It is a plot of the values of one variable against the values of the other. A scatterplot may suggest a relationship between the two variables, for instance a linear or quadratic relation, or may help to identify patterns or clusters in the data. Another reason to inspect these plots is to detect outliers.

QQ Plot
A normal distribution is often a reasonable model for the data. Without inspecting the data, however, it is risky to assume a normal distribution. A number of graphs can be used to check for deviations of the data from the normal distribution. A histogram is one example: if the data are normal, the histogram should reveal a bell-shaped curve. The most useful tool for assessing normality is a quantile-quantile (QQ) plot. This is a scatterplot with the quantiles of the scores on the horizontal axis and the expected normal scores on the vertical axis. The steps in constructing a QQ plot are as follows: First, we sort the data from smallest to largest. Next, we calculate the expected normal score for each observation as the z-score corresponding to the cumulative proportion (i - ½)/n, where i is the rank of the observation in increasing order and n is the number of observations. Finally, we plot the sorted scores against the expected normal scores. If the data are normal, the points should fall on a straight line. Curvature of the points indicates a departure from normality. This plot is also useful for detecting outliers, which appear as points that are far away from the overall pattern of points.
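
The QQ plot construction described above can be sketched in Python. This is only an illustration under stated assumptions, not the applet's actual code: the function name `qq_points` is hypothetical, the z-scores are taken from the standard library's `statistics.NormalDist`, and tied values are simply ranked in sorted order.

```python
from statistics import NormalDist


def qq_points(data):
    """Return (score, expected normal score) pairs for a normal QQ plot.

    Hypothetical helper: ranks are assigned in sorted order, and the
    expected normal score for rank i is the z-score of (i - 1/2) / n.
    """
    ordered = sorted(data)          # step 1: sort from smallest to largest
    n = len(ordered)
    std_normal = NormalDist()       # standard normal: mean 0, sd 1
    expected = [
        std_normal.inv_cdf((i - 0.5) / n)   # step 2: z-score of (i - 1/2)/n
        for i in range(1, n + 1)
    ]
    # step 3: pair each sorted score (horizontal axis) with its
    # expected normal score (vertical axis)
    return list(zip(ordered, expected))
```

Plotting the returned pairs with any plotting tool gives the QQ plot. Points that bend away from a straight line suggest non-normality, and isolated points far from the overall pattern suggest outliers.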
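
The interval rule described in the Histogram section can also be sketched in Python. This is a minimal sketch rather than the applet's implementation: the names `interval_count` and `histogram` are hypothetical, and the text does not say how a non-integer square root is handled, so rounding down is assumed here.

```python
import math


def interval_count(n_obs, max_intervals=20):
    """Smaller of sqrt(n_obs) and max_intervals.

    Assumption: a non-integer square root is rounded down; the source
    does not specify the rounding.
    """
    return max(1, min(math.isqrt(n_obs), max_intervals))


def histogram(values, max_intervals=20):
    """Return (midpoints, counts) for equal-length intervals."""
    k = interval_count(len(values), max_intervals)
    lo, hi = min(values), max(values)
    width = (hi - lo) / k or 1.0    # fall back to 1.0 if all values are equal
    counts = [0] * k
    for v in values:
        # the largest value is placed in the last interval
        idx = min(int((v - lo) / width), k - 1)
        counts[idx] += 1
    midpoints = [lo + (j + 0.5) * width for j in range(k)]
    return midpoints, counts
```

With 16 observations `interval_count` gives four intervals, and with 1000 observations it gives 20, matching the examples in the text.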