To request a blog written on a specific topic, please email James@StatisticsSolutions.com with your suggestion. Thank you!

Wednesday, January 30, 2013

Regression Table



Symbols Used in an APA-Style Regression Table

Source        B       SE B     β       t       p
Variable 1    1.57    0.23     .23     2.39    .020
Variable 2    1.26    2.26     .05     0.58    .560
Variable 3   -1.65    0.17    -.28     2.92    .005

Five symbols in a regression table commonly confuse students: the unstandardized beta (B), the standard error of the unstandardized beta (SE B), the standardized beta (β), the t test statistic (t), and the probability value (p). Typically, the only two values examined are B and p. However, all of them are useful to know.

The first symbol is the unstandardized beta (B). This value represents the slope of the line between the predictor variable and the dependent variable. For Variable 1, this means that for every one unit increase in Variable 1, the dependent variable increases by 1.57 units. Similarly, for Variable 3, for every one unit increase in Variable 3, the dependent variable decreases by 1.65 units.

The next symbol is the standard error for the unstandardized beta (SE B). This value is similar to the standard deviation for a mean. The larger the number, the more spread out the points are from the regression line, and the less likely it is that significance will be found.

The third symbol is the standardized beta (β). This works much like a correlation coefficient: it typically ranges from -1 to 1, and the closer the value is to -1 or 1, the stronger the relationship. Because the standardized betas are all on the same scale, you can compare them to see which predictor had the strongest relationship with the dependent variable. In the table above, Variable 3 had the strongest relationship.

The fourth symbol is the t test statistic (t). This is the test statistic calculated for the individual predictor variable. This is used to calculate the p value.

The last symbol is the probability value (p). This tells whether or not an individual variable significantly predicts the dependent variable. You can have a significant model but a non-significant predictor variable, as shown with Variable 2. Typically, if the p value is below .050, the predictor is considered significant.
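To see how these five values fit together, here is a minimal sketch with NumPy and SciPy, using simulated data rather than the (illustrative) values in the table above. The variable names and the true slope of 1.5 are arbitrary choices for the example.

```python
import numpy as np
from scipy import stats

# Simulated data: one hypothetical predictor and an outcome
rng = np.random.default_rng(0)
x = rng.normal(size=50)
y = 1.5 * x + rng.normal(size=50)

# Design matrix with an intercept column
X = np.column_stack([np.ones_like(x), x])

# B: unstandardized coefficients (intercept and slope) from least squares
B, *_ = np.linalg.lstsq(X, y, rcond=None)

# SE B: standard error of each coefficient
resid = y - X @ B
df = len(y) - X.shape[1]
sigma2 = resid @ resid / df
se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))

# t and p: test statistic and two-tailed p value for the slope
t = B[1] / se[1]
p = 2 * stats.t.sf(abs(t), df)

# Standardized beta: the slope after z-scoring both variables
beta = B[1] * x.std(ddof=1) / y.std(ddof=1)
```

With a single predictor, the standardized beta equals the correlation between the two variables, which is why it falls between -1 and 1.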

Friday, November 30, 2012

Scientific Merit Review (SMR): Constructs, Variables and Operational Definitions



We were recently working with a dissertation client, helping them understand SMR section 3.3 Constructs, 3.4 Variables, and 3.5 Operational Definitions.  As we read it, it seemed that the SMR writers got this one correct.  Essentially, the pattern of 3.3-3.5 is to describe the theoretical constructs (3.3), discuss each of the variables (3.4), and tie together how the constructs are measured by the variables (3.5).
The Constructs section (3.3) is the place to talk about the theoretical constructs.  For example, self-efficacy is a construct that can be looked at from a social learning theory perspective, attribution theory, motivation theory, etc.  This section should describe just that: constructs from a theoretical perspective.  Every construct in your research questions should be described here.
The Variables section (3.4) of the SMR is the place to talk about the variables and the levels of the variables.  For example, one could be assessing participants’ sense of locus of control, stability, and controllability, and each of these measures could range on a continuous scale from 1-10 or be scored on an ordinal scale of low-medium-high.  Every variable used in your study needs to be discussed here.
The Operational definition (3.5) is the section to put 3.3 and 3.4 together: once you talk about the constructs and explain the scales, the operational definition is simply how the constructs in 3.3 are measured by the scales in section 3.4. 
The bottom line is that your language is going to matter: if you don’t have the correct language, the document will get kicked back to you, costing you even more time and tuition dollars, and that’s frustrating.  At Statistics Solutions, we can help with these sections; you can also see if you can get hold of your advisor’s previous students to learn how things were written.
Remember one thing: you only have to do this once!  You will get through it and you will succeed!

Monday, November 12, 2012

Bonferroni Correction



When the same dependent variable is used multiple times in an analysis, the likelihood of committing a Type I error increases.  A Type I error occurs when a researcher incorrectly rejects a true null hypothesis.  To correct for this type of error, a Bonferroni-type adjustment is typically made.  This is done by dividing the alpha level (typically set at .05) by the number of tests (n).  In this example we will assume three analyses use the same dependent variable.  The standard alpha level of .05 would be divided by three (the number of analyses for each DV), and the new alpha level would be established at .017.  This level would be used to determine statistical significance for the corresponding analyses (Tabachnick & Fidell, 2012).
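The adjustment can be sketched in a few lines of Python; the three p values below are hypothetical, chosen to show how the correction changes a significance decision:

```python
# Bonferroni adjustment: divide the alpha level by the number of tests
alpha = 0.05
n_tests = 3
adjusted_alpha = alpha / n_tests   # .05 / 3, roughly .017

# Hypothetical p values from three analyses of the same DV
p_values = [0.020, 0.560, 0.005]
significant = [p < adjusted_alpha for p in p_values]
```

Note that a p value of .020 would be significant at the unadjusted .05 level but is not significant after the correction; only the .005 result survives.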

Tuesday, November 6, 2012

Capella University and the Scientific Merit Review



For the past 20 years, Statistics Solutions’ mission has been to help graduate students graduate.  Whether you go to Berkeley or Capella, students need help.  Students (“learner” always reminded me of Milgram’s 1960s Obedience to Authority study) at Capella, however, have a couple of things working against them.  First, they’re not on campus to get the help they need, and second, they’re paying tuition as the process continues.  One of the places students get stuck is writing aspects of Capella’s Scientific Merit Review (SMR).

My staff and I have worked with over 2000 graduate students, and despite the resources at the universities, some still need help from an objective, non-evaluative professional.  We are such professionals!   When we work with students, they typically get stuck in the same few places: research questions, the proposed data analysis, and the target population and participant selection.
Research questions are easy to handle: make sure the constructs (your measures) are obtainable and measure what you want to measure, and that you arrange these constructs in statistical language.  For example, if you have constructs A and B and want to relate them (read “correlate” them), then say that.  If you are assessing whether A predicts B (read “regression”), then say predict, impact, or account for variability in B.

Capella’s Scientific Merit Review also asks for a data analysis plan.  These plans are based on two things: the statistical language you used in the research questions and the level of measurement of your variables.  We have resources on our website, or if you need more 1-to-1 help you can click here.  By the way, Capella will send you back for a round of revisions (tuition not included) if you don’t have this correct.  When I went to school 100 years ago, the IRB (to which we would have sent our SMR) made sure that we didn’t hurt our participants, but now they look at everything.  And let’s face it, the revision costs you both time and money.

Sample size is typically trickier still (even with the help of G*Power).  There are two tricks: selecting the right analysis (see the data plan above) and selecting the effect size.  Effect size can be derived by looking at past research that used these constructs and analyses, then calculating or noting the effect size reported.  There is a realistic aspect too: for dissertations (and I’ve seen thousands of them), large and medium effect sizes, requiring relatively small samples (under 100 participants), are the norm.  Requesting a small effect size (small effects take a lot of people to detect) typically requires 300-500 participants, and this is just not reasonable for a dissertation student to obtain.  Here are a couple of resources (sample size tool; power analysis) to get you started.  I should note that the exception is when you are conducting EFA, CFA, path analysis, or structural equation modeling; these techniques typically require 150 or more participants.
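As a rough sketch of how effect size drives sample size, the following uses the normal approximation to the power calculation for a two-sample t test. The alpha of .05 and power of .80 are conventional assumptions, and a tool such as G*Power will give slightly more exact figures:

```python
from scipy import stats

def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate per-group n for a two-sample t test
    (normal approximation to the power calculation)."""
    z_alpha = stats.norm.ppf(1 - alpha / 2)
    z_power = stats.norm.ppf(power)
    return 2 * (z_alpha + z_power) ** 2 / effect_size ** 2

large = n_per_group(0.8)    # large effect: roughly 25 per group
medium = n_per_group(0.5)   # medium effect: roughly 63 per group
small = n_per_group(0.2)    # small effect: several hundred per group
```

The pattern matches the advice above: halving the effect size roughly quadruples the required sample, which is why chasing a small effect is usually impractical for a dissertation.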

I’m going to leave you with a Dissertation Template to look at.  It’s free and you may find some definition of terms helpful.  

Good luck with your Scientific Merit Review and call us if you run into trouble.  Contact us at: http://www.statisticssolutions.com/contact or call us at (877) 437-8622 (M-F, 9-5 EST)


PS:  A Stanford Ph.D. student just called; their private stats consultant just took another job.  See, everybody needs help sometimes, even schools with lots of resources!
 

Wednesday, February 3, 2010

Probability

Probability is a value that specifies how likely an event is to happen. The value of a probability always lies between zero and one. If the probability of an event is zero, that event is impossible; if the probability of an event is one, that event is certain to occur.
Several formal definitions used in probability theory are given below.

Statistics Solutions is the country's leader in probability and dissertation statistics. Contact Statistics Solutions today for a free 30-minute consultation.

A sample space S in probability is a nonempty set whose elements are called outcomes. Events in probability are simply subsets of the sample space.

A probability space consists of a sample space and a probability function, which maps events to real numbers in the interval [0, 1] in such a way that the probability of the sample space is one. If A0, A1, … is a sequence of disjoint events, then the probability of the union of the sequence equals the sum of the probabilities of the individual events.

Conditional probability denotes the probability of a particular event given that another particular event has occurred, provided that the probability of that other event is not zero.

There is a product rule in probability which states that the probability of the intersection of any two particular events is equal to the product of the probability of the second event and the conditional probability of the first event given the second: P(A and B) = P(B) × P(A | B).

The theorem of total probability states that if the sample space is the disjoint union of events B1, B2, …, then for any event A, the probability of A equals the sum of the probabilities of the intersections of A with each of the disjoint events Bi.

Suppose the two events A and B have positive probability. The event A is independent of B if and only if the conditional probability of A given B is equal to the probability of A. It is important to remember that this definition applies only when the probability of the event B is not zero.

There is also an independence product rule in probability, which states that for independent events the probability of the intersection of the two events is equal to the product of the probability of event A and the probability of event B. It is important to remember that in the theory of probability, disjoint events are not the same as independent events.
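These rules (conditional probability, the product rule, total probability, and independence) can be checked by brute-force enumeration. The two-dice sample space and the events A and B below are a hypothetical illustration:

```python
from fractions import Fraction
from itertools import product

# Sample space: all 36 ordered outcomes of rolling two dice
S = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event, i.e. a subset of the sample space."""
    return Fraction(len([s for s in S if event(s)]), len(S))

A = lambda s: s[0] + s[1] == 7   # the sum is seven
B = lambda s: s[0] == 1          # the first die shows one

# Conditional probability: P(A | B) = P(A and B) / P(B)
p_AB = prob(lambda s: A(s) and B(s))
p_A_given_B = p_AB / prob(B)

# Product rule: P(A and B) = P(B) * P(A | B)
assert p_AB == prob(B) * p_A_given_B

# Total probability: partition on the value of the first die
parts = [prob(lambda s, i=i: A(s) and s[0] == i) for i in range(1, 7)]
assert sum(parts) == prob(A)

# Independence: P(A | B) equals P(A), yet A and B are not disjoint,
# since the outcome (1, 6) belongs to both
assert p_A_given_B == prob(A)
```

This example also shows the last point above: A and B are independent but not disjoint, so the two notions really are different.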

The theory of probability is the logic of science. According to James Clerk Maxwell (1850), the true logic involves the calculus of probability, which takes into consideration the magnitude of the probability that is supposed to be reasonable.

The theory of probability can be described with a popular example: the tossing of a coin with possible outcomes of “heads” or “tails.” Suppose “heads” is considered a success and “tails” a failure. The outcome “heads” can then be coded as the value one and “tails” as the value zero, so the probability of a success is the probability of observing a one. Similarly, rolling dice is another popular example based on the theory of probability.

Monday, February 1, 2010

F-test

An F-test is conducted on the basis of the F statistic, which is defined as the ratio of two independent chi-square variates, each divided by its respective degrees of freedom. The F statistic follows Snedecor’s F-distribution.


The F-test has several applications in statistical theory. This post details those applications.

The F-test is used to test for the equality of two population variances. If a researcher wants to test whether or not two independent samples have been drawn from normal populations with the same variability, the F-test is generally employed.

The F-test is also used to determine whether or not two independent estimates of the population variance are homogeneous.

As an example, suppose two sets of pumpkins are grown under two different experimental conditions. The researcher selects random samples of size 9 and 11; the standard deviations of their weights are 0.6 and 0.8, respectively. After assuming that the distribution of the weights is normal, the researcher conducts an F-test of the hypothesis that the true variances are equal.
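A minimal sketch of this variance-ratio test with SciPy, using the pumpkin numbers above and placing the larger sample variance in the numerator by convention:

```python
from scipy import stats

# Pumpkin example: samples of size 9 and 11, standard deviations 0.6 and 0.8
n1, s1 = 9, 0.6
n2, s2 = 11, 0.8

# F statistic: ratio of sample variances, larger variance on top
F = s2 ** 2 / s1 ** 2            # about 1.78
df_num, df_den = n2 - 1, n1 - 1  # (10, 8) degrees of freedom

# Two-tailed p value for H0: the population variances are equal
p = 2 * stats.f.sf(F, df_num, df_den)
```

Here the p value is well above .05, so with these small samples the hypothesis of equal variances cannot be rejected.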

The researcher uses the F-test to test the significance of an observed multiple correlation coefficient. The F-test is also used to test the significance of an observed sample correlation ratio, a measure of association based on the statistical dispersion within the categories relative to the sample as a whole.

The researcher should note that there is an association between the t and F distributions. According to this association, if a statistic t follows Student’s t distribution with n degrees of freedom, then the square of this statistic follows Snedecor’s F distribution with 1 and n degrees of freedom.
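This t-to-F relationship can be verified numerically with SciPy: the square of the two-tailed critical t value equals the upper-tail critical value of F with 1 and n degrees of freedom (n = 12 here is an arbitrary choice):

```python
from scipy import stats

n = 12  # an arbitrary choice of degrees of freedom

# Two-tailed .05 critical value of Student's t with n df
t_crit = stats.t.ppf(0.975, n)

# Upper .05 critical value of F with (1, n) df
f_crit = stats.f.ppf(0.95, 1, n)

# Because t squared follows F(1, n), the two critical values agree
# up to floating-point precision: t_crit ** 2 == f_crit
```

This is why a two-tailed t test on a single regression coefficient and the F test on that same coefficient always give the same p value.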

The F-test also has some other associations, like the association between the F-test and chi square distribution.

Due to such relationships, the F distribution has many properties in common with the chi-square distribution. F-values are all nonnegative. The F-distribution is not symmetric; it is skewed to the right. The mean of the F distribution is approximately one. An F distribution has two independent degrees of freedom, one for the numerator and one for the denominator, and there is a different F distribution for every pair of degrees of freedom.

The F-test is a parametric test that helps the researcher draw an inference about the population from which the data are drawn. The F-test is called a parametric test because of the presence of parameters in the test; these parameters are the mean and variance. The mode of the F distribution (its most probable value) is always less than unity. According to Karl Pearson’s coefficient of skewness, the F distribution is highly positively skewed. The probability density of F increases steadily to a peak and then decreases, becoming tangential to the axis at infinity; thus, the F axis is an asymptote of the right tail of the distribution.