Wednesday, April 10, 2013

One-Way ANOVA Video

The video below demonstrates how to conduct a One-Way ANOVA:

One Way ANOVA from Statistics Solutions on Vimeo.

Conduct and Interpret a One-Way ANOVA

Monday, April 8, 2013

One Sample t Test Video

The video below demonstrates how to conduct a one sample t test.

One Sample t Test from Statistics Solutions on Vimeo.

For more information on conducting and interpreting a one sample t test, please click here.

Friday, March 15, 2013

Independent Sample T-Test Video

The video below demonstrates how to conduct an independent sample t-test.

Independent Sample t Test from Statistics Solutions on Vimeo.

Wednesday, January 30, 2013

Table and Symbols in a Logistic Regression




Source        B       SE B    Wald χ2   p      OR     95% CI OR
Variable 1    1.46    0.12    7.55      .006   4.31   [3.26, 5.35]
Variable 2    -0.43   0.15    6.31      .012   0.65   [0.18, 0.83]
Note. OR = odds ratio. CI = confidence interval.

The table above shows a typical logistic regression.  There are six sets of symbols used in the table (B, SE B, Wald χ2, p, OR, 95% CI OR).  The main values interpreted from the table are the p and the OR.  However, it can be useful to know what each symbol means.

B – This is the unstandardized regression weight. It is interpreted much like a multiple linear regression weight, and its sign gives a simple reading. For example, as Variable 1 increases, the likelihood of scoring a “1” on the dependent variable also increases.  As Variable 2 increases, the likelihood of scoring a “1” on the dependent variable decreases.

SE B – As in multiple linear regression, this is how much the unstandardized regression weight can vary. It is analogous to the standard deviation of a mean.

Wald χ2 – This is the test statistic for the individual predictor variable.  A multiple linear regression will have a t test, while a logistic regression will have a χ2 test.  This is used to determine the p value.

p – This is used to determine which variables are significant.  Typically, any variable with a p value below .050 is considered significant.  In the table above, both Variable 1 and Variable 2 are significant.

OR – This is the odds ratio, the measurement of likelihood.  For every one-unit increase in Variable 1, the odds of a participant having a “1” on the dependent variable increase by a factor of 4.31.  For Variable 2, the direct reading is less intuitive (for every one-unit increase in Variable 2, the odds of a participant having a “1” on the dependent variable increase by a factor of 0.65, which is actually a decrease).  Any significant variable with a negative B value is easier to interpret in the opposite direction.  Therefore, for every one-unit increase in Variable 2, the odds of a participant having a “0” on the dependent variable increase by a factor of 1 / 0.65 = 1.54.  To interpret in the opposite direction, simply divide one by the odds ratio.

95% CI OR – This is the 95% confidence interval for the odds ratio.  We are 95% confident that the true value of the odds ratio lies between these bounds.  If the confidence interval does not contain 1, the p value will be less than .050.
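Under the usual large-sample assumptions, these quantities are related by simple formulas: the odds ratio is e raised to the B, the Wald statistic is (B / SE B) squared, and the 95% confidence interval exponentiates B ± 1.96 × SE B. A minimal sketch using the (rounded, illustrative) values from the table above, so recomputed numbers will not match the table exactly:

```python
import math

# Illustrative values from the table above: Variable 1's B and SE B
b, se = 1.46, 0.12

odds_ratio = math.exp(b)            # OR = e^B, about 4.31
wald = (b / se) ** 2                # Wald chi-square = (B / SE B)^2
ci_low = math.exp(b - 1.96 * se)    # lower bound of the 95% CI for the OR
ci_high = math.exp(b + 1.96 * se)   # upper bound of the 95% CI for the OR

# Inverting a protective effect (negative B): Variable 2's OR of 0.65
inverse_or = 1 / math.exp(-0.43)    # about 1.54
```

The same exponentiation is what statistical packages do behind the scenes when they report an odds ratio next to a raw coefficient.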

Pearson Correlation Assumptions




The assumptions of the Pearson product moment correlation can be easily overlooked. The assumptions are as follows: level of measurement, related pairs, absence of outliers, normality of variables, linearity, and homoskedasticity.

Level of measurement refers to each variable. For a Pearson correlation, each variable should be continuous.  If one or both of the variables are ordinal in measurement, then a Spearman correlation could be conducted instead.
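The Spearman alternative mentioned above is simply a Pearson correlation computed on the ranks of the values rather than the values themselves. A hand-rolled sketch (standard library only, with average ranks for ties):

```python
import math

def ranks(values):
    """Assign 1-based ranks, averaging the ranks of tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the tied rank positions
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def pearson(x, y):
    """Pearson product moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))
```

Because Spearman works on ranks, any perfectly monotonic relationship yields ±1 even when the raw relationship is not linear.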

Related pairs refers to the pairs of variables. Each participant or observation should have a pair of values. So if the correlation was between weight and height, then each observation used should have both a weight and a height value.

Absence of outliers refers to not having outliers in either variable. Having an outlier can skew the results of the correlation by pulling the line of best fit formed by the correlation too far in one direction or another.  Typically, an outlier is defined as a value that is more than 3.29 standard deviations from the mean, that is, a standardized value beyond ±3.29.
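The ±3.29 rule amounts to standardizing each variable and flagging anything beyond that cutoff. A simple screen, sketched with Python's standard library:

```python
import statistics

def flag_outliers(values, cutoff=3.29):
    """Return the values whose standardized (z) score falls beyond ±cutoff."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation
    return [v for v in values if abs((v - mean) / sd) > cutoff]

# Twenty identical scores plus one extreme score: the extreme value is flagged
data = [10] * 20 + [100]
print(flag_outliers(data))
```

In practice you would screen both variables this way before computing the correlation, and decide whether to remove or winsorize any flagged cases.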

Linearity and homoskedasticity refer to the shape of the values formed by the scatterplot. For linearity, a “straight line” relationship between the variables should be formed.  If a line were drawn through all the dots going from left to right, the line should be straight and not curved.  Homoskedasticity refers to the distance between the points and that straight line. The shape of the scatterplot should be tube-like. If the shape is cone-like, then homoskedasticity would not be met.

Regression Table



Symbols Used in an APA-Style Regression Table

Source        B       SE B    β      t      p
Variable 1    1.57    0.23    .23    2.39   .020
Variable 2    1.26    2.26    .05    0.58   .560
Variable 3    -1.65   0.17    -.28   2.92   .005

There are five symbols that easily confuse students in a regression table: the unstandardized beta (B), the standard error for the unstandardized beta (SE B), the standardized beta (β), the t test statistic (t), and the probability value (p). Typically, the only two values examined are the B and the p. However, all of them are useful to know.

The first symbol is the unstandardized beta (B). This value represents the slope of the line between the predictor variable and the dependent variable. For Variable 1, this means that for every one-unit increase in Variable 1, the dependent variable increases by 1.57 units. Similarly, for Variable 3, for every one-unit increase in Variable 3, the dependent variable decreases by 1.65 units.

The next symbol is the standard error for the unstandardized beta (SE B). This value is similar to the standard deviation for a mean.  The larger the number, the more spread out the points are from the regression line. The more spread out the numbers are, the less likely that significance will be found.

The third symbol is the standardized beta (β). This works much like a correlation coefficient: it ranges from 0 to 1 or 0 to -1, depending on the direction of the relationship, and the closer the value is to 1 or -1, the stronger the relationship. Because all of the predictors are on this same standardized scale, you can compare them to see which had the strongest relationship with the dependent variable. In the table above, Variable 3 had the strongest relationship.

The fourth symbol is the t test statistic (t). This is the test statistic calculated for the individual predictor variable. This is used to calculate the p value.

The last symbol is the probability level (p). This tells whether or not an individual variable significantly predicts the dependent variable. You can have a significant model, but a non-significant predictor variable, as shown with Variable 2. Typically, if the p value is below .050, the value is considered significant.
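The relationships among these symbols can be sketched directly from the table rows: t is the ratio B / SE B, the strongest predictor is the one with the largest |β|, and significance is read from p. (The published values are rounded, so a recomputed t will not reproduce the table's t exactly.)

```python
# Rows from the illustrative table above: (name, B, SE B, beta, p)
rows = [
    ("Variable 1", 1.57, 0.23, 0.23, 0.020),
    ("Variable 2", 1.26, 2.26, 0.05, 0.560),
    ("Variable 3", -1.65, 0.17, -0.28, 0.005),
]

# t is the unstandardized beta divided by its standard error
t_values = {name: b / se for name, b, se, beta, p in rows}

# The strongest predictor has the largest absolute standardized beta
strongest = max(rows, key=lambda r: abs(r[3]))[0]

# A predictor is significant when its p value is below .050
significant = [name for name, b, se, beta, p in rows if p < 0.050]
```

Running this on the table's rows picks out Variable 3 as the strongest predictor and flags Variables 1 and 3 as significant, matching the interpretation in the text.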