To request a blog written on a specific topic, please email James@StatisticsSolutions.com with your suggestion. Thank you!

Tuesday, November 24, 2009

Analysis Of Variance (ANOVA)

The first question usually asked about Analysis of Variance (ANOVA) is simply what it is. ANOVA is the process of examining the differences among the means of two or more populations. The next question that arises in the researcher's mind is what null hypothesis is assumed in ANOVA. The null hypothesis is the following: "there is no significant difference among the means of the populations being examined."

Statistics Solutions is the country's leader in Analysis of Variance (ANOVA) and dissertation statistics. Contact Statistics Solutions today for a free 30-minute consultation.

The type of variable to which ANOVA applies is also an important issue. ANOVA is applicable where the dependent variable is measured on an interval or ratio scale and one or more of the independent variables are categorical. Researchers should note that the categorical variables are considered the factors in ANOVA, and the combinations of factor levels (categories) are generally termed the treatments.

An ANOVA with only one categorical independent variable, or in other words a single factor, is called one-way ANOVA. If the technique involves two or more factors, it is called n-way ANOVA, where 'n' refers to the number of factors.

Like regression analysis, ANOVA requires the calculation of several sums of squares to evaluate the test statistic used for testing the null and alternative hypotheses. One difference between ANOVA and regression analysis is that ANOVA uses the separate and combined means and variances of the samples when evaluating the sums of squares.

Often, the researcher asks what test statistic is used for testing the significance of the difference. It is the F statistic. In ANOVA, the F statistic is a ratio of sample variances: the mean square between groups divided by the mean square within groups. The task of the F test in ANOVA is to test the significance of the variability of the components in the study.
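As a sketch of how the F statistic is assembled from the sums of squares, a one-way ANOVA can be computed by hand in a few lines of Python (the three groups below are invented data, used purely for illustration):

```python
# One-way ANOVA by hand: F = MS_between / MS_within.
# The three groups below are made-up data for illustration only.
groups = [[4, 5, 6], [6, 7, 8], [8, 9, 10]]

n_total = sum(len(g) for g in groups)
grand_mean = sum(sum(g) for g in groups) / n_total

# Between-groups sum of squares: spread of group means around the grand mean.
ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
# Within-groups sum of squares: spread of observations around their group mean.
ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)

df_between = len(groups) - 1          # k - 1
df_within = n_total - len(groups)     # N - k

ms_between = ss_between / df_between
ms_within = ss_within / df_within
f_stat = ms_between / ms_within
print(f_stat)  # 12.0 for this data
```

The computed F is then compared with the critical value of the F distribution with (2, 6) degrees of freedom at the chosen significance level.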

The most important question is about the assumptions in Analysis of Variance (ANOVA).

The first assumption of Analysis of Variance (ANOVA) is that each sample has been drawn from the population by the process of random sampling.

The second assumption of Analysis of Variance (ANOVA) is that the population from which each sample is randomly drawn follows a normal distribution. In other words, it is assumed that the error term is normally distributed with a mean of zero and a variance of σ²e.

The third assumption of Analysis of Variance (ANOVA) is that there is homogeneity within the variances of the populations from which the sample has been drawn.

The fourth assumption of Analysis of Variance (ANOVA) is that the population of random effects (A) is normally distributed with a mean of zero and a variance of σ²a.

Thursday, November 19, 2009

Validity

Validity refers to the state in which the researcher or the investigator can be assured that the inferences drawn from the data are accurate, or error free. If the inferences drawn from the sample are valid, they can be generalized to the population from which that sample was drawn.

Statistics Solutions is the country's leader in validity and dissertation statistics. Contact Statistics Solutions today for a free 30-minute consultation.

There are basically four major types of validity: Internal Validity, External Validity, Statistical Conclusion Validity and Construct Validity.

Internal Validity concerns the causal relationship between the dependent and independent variables: it is the degree to which the observed effect on the dependent variable can be attributed to the independent variable rather than to other factors. This type of validity is strongest in designed experiments where the treatments are randomly assigned.

External Validity concerns whether an observed causal relationship can be generalized or transferred to different people, different settings, different treatment variables and different measurement variables.

Statistical Conclusion Validity refers to that type of validity in which the researcher is interested in the inference about the degree of association between two variables. For instance, in a study of the association between two variables, the researcher attains statistical conclusion validity only if he has performed statistical significance tests on the hypotheses he predicted. This type of validity is violated when the researcher commits either of two types of errors, namely Type I error and Type II error.

A Type I error violates this type of validity because the researcher rejects a null hypothesis that is in fact true.

A Type II error violates this type of validity because the researcher accepts (fails to reject) a null hypothesis that is in fact false.

Construct Validity refers to that type of validity in which the construct of the test is involved in predicting the relationship for the dependent variable. Evidence for construct validity is often supported with the help of Cronbach's alpha, a measure of internal consistency: a value of about 0.80 or above is considered good, and about 0.70 is considered adequate. If the construct satisfies such conditions, then validity holds; otherwise, it does not.
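As an illustration of the Cronbach's alpha criterion mentioned above, alpha can be computed directly from its definition (the item scores below are fabricated for the example):

```python
# Cronbach's alpha = k/(k-1) * (1 - sum of item variances / variance of totals).
# Four respondents answering three items; scores are invented for illustration.
from statistics import variance

items = [
    [2, 4, 3, 5],  # item 1, one score per respondent
    [3, 4, 3, 5],  # item 2
    [3, 5, 4, 5],  # item 3
]
k = len(items)
totals = [sum(scores) for scores in zip(*items)]  # each respondent's total score

alpha = (k / (k - 1)) * (1 - sum(variance(it) for it in items) / variance(totals))
print(round(alpha, 3))  # well above 0.80: the items hang together
```

A value this high would pass the "good for confirmation" threshold described above; a value below about 0.70 would call the construct into question.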

Convergent/divergent validation and factor analysis are also used to test this type of validity.
There is a strong relationship between validity and reliability. A test cannot be valid if it is not reliable, yet a reliable test is not necessarily valid: reliability is a necessary, but not a sufficient, condition for validity.

Thus, validity plays a significant role in making accurate inferences about the data.
There are certain things that act as threats to validity. These are as follows:

If the researcher collects insufficient data, valid inference is not feasible because insufficient data will not represent the population as a whole.

If the researcher measures the sample of the population with too few measurement variables, then he also cannot achieve validity of that sample.

If the researcher selects the wrong type of sample, then he too cannot achieve validity in the inference about the population.

If the researcher selects an inaccurate measurement method during analysis, then the researcher would not be able to achieve validity.

Tuesday, November 17, 2009

Kaplan-Meier survival analysis (KMSA)

Kaplan-Meier survival analysis (KMSA) is a method that involves generating tables and plots of the survival or the hazard function for event history data. KMSA does not determine the effect of covariates on either function. KMSA is an exploratory method for time-to-event data, where time is considered the most prominent variable.

Statistics Solutions is the country's leader in Kaplan-Meier survival analysis (KMSA) and dissertation statistics. Contact Statistics Solutions today for a free 30-minute consultation.

Kaplan-Meier survival analysis (KMSA) consists of certain terms that are very important to know and understand, as these terms form the basis of a strong understanding of Kaplan-Meier survival analysis (KMSA).

The censored cases in Kaplan-Meier survival analysis (KMSA) indicate those cases in which the event has not yet occurred. In this case of Kaplan-Meier survival analysis (KMSA), the event is considered as the variable of interest for the researcher. Kaplan-Meier survival analysis (KMSA) can efficiently compute the survival functions in those cases that are censored in nature.
The time is considered as the continuous variable in Kaplan-Meier survival analysis (KMSA). However, the researcher should note that in Kaplan-Meier survival analysis (KMSA), the initial time of the occurrence of the event must be clearly defined.

There is a variable called a status variable in Kaplan-Meier survival analysis (KMSA). This variable defines the terminal event. The status variable should be a categorical variable, coded to indicate whether the event occurred or the case was censored.

There is a variable called the stratification variable in Kaplan-Meier survival analysis (KMSA). As the name suggests, the stratification variable in Kaplan-Meier survival analysis (KMSA) should be a categorical type of variable. This variable in Kaplan-Meier survival analysis (KMSA) represents the grouping effect. In the medical field, the stratification variable in Kaplan-Meier survival analysis (KMSA) can be types of cancer, like lung cancer, blood cancer, etc.
The researcher should note that Kaplan-Meier survival analysis (KMSA) can give misleading results when covariates other than time play a prominent role in determining the outcome, because KMSA cannot adjust for covariates; a regression method such as Cox regression is needed in that case.

There is a variable called a factor variable in Kaplan-Meier survival analysis (KMSA). The factor variable in Kaplan-Meier survival analysis (KMSA) should be of categorical type. This type of variable in Kaplan-Meier survival analysis (KMSA) is used by the researcher to indicate the causal effect of a particular consequence. For example, in the case of the previous example, the treatment applied to decrease the effect of the cancer in the body is considered to be the factor variable in Kaplan-Meier survival analysis (KMSA).

The factor variable in Kaplan-Meier survival analysis (KMSA) is the main grouping variable, whereas the stratification variable is the sub grouping variable in Kaplan-Meier survival analysis (KMSA).

Kaplan-Meier survival analysis (KMSA) can be carried out by the researcher with the help of SPSS software.

The log rank test in Kaplan-Meier survival analysis (KMSA) provided in SPSS allows the investigator to examine whether or not the survival functions of the groups are equivalent, by comparing them across all the observed time points.

There are certain assumptions that are made in Kaplan-Meier survival analysis (KMSA). For one, it is assumed that the probability of the event depends only upon time, since survival in KMSA is always modeled as a function of time. This implies that censoring is non-informative: the censored cases are assumed to have the same survival prospects as the cases that remain under observation.
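The product-limit idea behind KMSA can be sketched in plain Python (the times and censoring flags below are invented; 1 marks an observed event and 0 a censored case):

```python
# Kaplan-Meier product-limit estimate of the survival function.
# Each pair is (time, event): event=1 means the event occurred,
# event=0 means the case was censored at that time. Data are made up.
from itertools import groupby

def kaplan_meier(data):
    data = sorted(data)
    at_risk = len(data)
    survival = 1.0
    curve = []
    for t, grp in groupby(data, key=lambda pair: pair[0]):
        grp = list(grp)
        deaths = sum(event for _, event in grp)
        if deaths:
            # At each event time, multiply by the conditional survival probability.
            survival *= (at_risk - deaths) / at_risk
            curve.append((t, survival))
        at_risk -= len(grp)  # events and censored cases both leave the risk set
    return curve

for t, s in kaplan_meier([(2, 1), (3, 1), (3, 0), (5, 1), (7, 0)]):
    print(t, round(s, 3))
# 2 0.8
# 3 0.6
# 5 0.3
```

Note how the censored cases at times 3 and 7 never trigger a drop in the curve, yet they still count in the risk set up to the moment they are censored.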

Monday, November 16, 2009

Hierarchical Linear Modeling

Suppose that a researcher wants to conduct Hierarchical Linear Modeling on educational data. Hierarchical linear modeling is a kind of regression technique that is designed to take the hierarchical structure of educational data into account.

Statistics Solutions is the country's leader in hierarchical linear modeling and dissertation statistics. Contact Statistics Solutions today for a free 30-minute consultation.

Hierarchical Linear Modeling is generally used to determine the relationship between a dependent variable (like test scores) and one or more independent variables (like a student's background, his previous academic record, etc.).

Classical regression theory assumes that the observations of any one individual are not systematically related to the observations of any other individual. In educational data this assumption is violated, because students are nested within classrooms and schools, and applying classical regression to such data yields biased estimates. Hierarchical Linear Modeling takes this nesting into account.

Hierarchical Linear Modeling is also called the method of multi level modeling. Hierarchical Linear Modeling allows the researcher working on educational data to systematically ask questions about how policies can affect a student’s test scores.

The advantage of Hierarchical Linear Modeling is that it allows the researcher to directly examine the effects of policy-relevant variables (like class size, or the introduction of a particular reform, etc.) on student test scores.

Hierarchical Linear Modeling is conducted by the researcher in two steps.

In the first step of Hierarchical Linear Modeling, the researcher must conduct the analyses individually for every school (in the case of educational data) or some other unit in the system.

The first step of Hierarchical Linear Modeling can be very well explained with the help of the following example. In the first step of Hierarchical Linear Modeling, the student’s academic scores in science are regressed on a set of student level predictor variables like a student’s background and a binary variable representing the student’s sex.

In the first step of Hierarchical Linear Modeling, the equation would be expressed mathematically as the following:

(Science)ij = β0j + β1j(SBG)ij + β2j(Male)ij + eij. In this first step of Hierarchical Linear Modeling, β0j signifies the level of performance for each school under consideration after controlling for SBG (student's background) and sex. β1j and β2j indicate the extent to which inequalities exist among the students with respect to the two variables under consideration.

In the second step of Hierarchical Linear Modeling, the regression parameters that are obtained from the first step of Hierarchical Linear Modeling become the outcome variables of interest.

The second step of Hierarchical Linear Modeling can be explained with the help of the following example. In the second step, the regression parameters from the first step are modeled using school-level predictors; for example, β0j is given by the following formula:

β0j = γ00 + γ01(class size)j + γ02(Discipline)j + u0j.

In the second step, γ01 indicates the expected gain (or loss) in the science test score due to a reduction in class size, and γ02 signifies the effect of the discipline policy implemented in the school.

According to Goldstein (1995) and Raudenbush and Bryk (1986), the statistical and computing techniques of Hierarchical Linear Modeling incorporate the multiple levels into a single model, in which the regression analyses described in the two steps above are performed simultaneously. Hierarchical Linear Modeling estimates the parameters specified in the model with the help of iterative procedures.

Friday, November 13, 2009

Fisher Exact test

The Fisher Exact test is a test of significance that is used in place of the chi square test for 2 by 2 tables, especially in the case of small samples.

Statistics Solutions is the country's leader in fisher exact test and dissertation consulting. Contact Statistics Solutions today for a free 30-minute consultation.

The Fisher Exact test computes the probability of getting a table that is as strong as, or stronger than, the observed one purely due to the chance of sampling, where 'strong' means that a higher proportion of the cases fall on the diagonal with the most cases.

The Fisher Exact test is generally used as a one tailed test, although it can be used as a two tailed test as well. The Fisher Exact test is sometimes called the Fisher-Irwin test, because the test was developed independently by Fisher, Irwin and Yates in the mid-1930s.

In SPSS, the Fisher Exact test is computed in addition to the chi square test for a 2X2 table when the table consists of a cell where the expected number of frequencies is fewer than 5.

There are certain terminologies that help in understanding the theory of Fisher Exact test.

The Fisher Exact test uses the following formula:

p = ( (a+b)! (c+d)! (a+c)! (b+d)! ) / ( a! b! c! d! N! )

In this formula of the Fisher Exact test, the ‘a,’ ‘b,’ ‘c’ and ‘d’ are the individual frequencies of the 2X2 contingency table, and ‘N’ is the total frequency.

The Fisher Exact test uses this formula to obtain the probability of the combination of frequencies that is actually obtained. The test then sums this probability with the probability of every possible combination that indicates even more evidence of association.
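The formula above can be applied directly with standard-library factorials. In the sketch below the 2×2 table is invented; the one-tailed p-value sums the point probabilities of the observed table and every table more extreme in the same direction, holding the margins fixed:

```python
# Point probability of a 2x2 table under Fisher's exact test,
# p = (a+b)!(c+d)!(a+c)!(b+d)! / (a! b! c! d! N!),
# plus a one-tailed p-value summing over more extreme tables.
from math import factorial as f

def table_prob(a, b, c, d):
    n = a + b + c + d
    return (f(a + b) * f(c + d) * f(a + c) * f(b + d)) / \
           (f(a) * f(b) * f(c) * f(d) * f(n))

def one_tailed_p(a, b, c, d):
    # Shift cases toward the top-left cell while the margins stay fixed.
    p = 0.0
    while b >= 0 and c >= 0:
        p += table_prob(a, b, c, d)
        a, b, c, d = a + 1, b - 1, c - 1, d + 1
    return p

print(round(table_prob(3, 1, 1, 3), 4))   # 0.2286: probability of this exact table
print(round(one_tailed_p(3, 1, 1, 3), 4))  # 0.2429: observed table plus more extreme ones
```

With a p-value this large, the association in the table could easily be due to the chance of sampling alone.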

There are certain assumptions on which the Fisher Exact test is based.

In the Fisher Exact test, it is assumed that the sample that has been drawn from the population is done by the process of random sampling. This assumption of the Fisher Exact test is also assumed in general in all the significance tests.

In the Fisher Exact test, a directional hypothesis is assumed. The directional hypothesis assumed in the Fisher Exact test is nothing but the hypothesis based on the one tailed test. In other words, the directional hypothesis assumed in the Fisher Exact test is that type of hypothesis which predicts either a positive association or a negative association, but not both.
In the Fisher Exact test, it is assumed that the value of the first person or unit sampled does not affect the value of the second person or unit sampled; in other words, the observations are independent. This assumption would be violated if the data were pooled.

In the Fisher Exact test, mutual exclusivity within the observations is assumed. In other words, in the Fisher Exact test, the given case should fall in only one cell in the table.
In the Fisher Exact test, the dichotomous level of measurement of the variables is assumed.

Thursday, November 12, 2009

t-test

The parametric test called the t-test provides a statistical inference about the population by testing the sample that has been drawn from that population in such a manner that it represents the population as a whole.

Statistics Solutions is the country's leader in t-test and dissertation statistics. Contact Statistics Solutions today for a free 30-minute consultation.

This parametric test, called the t-test, is based on Student's t statistic, which rests on the assumption that the samples are drawn from a normal population whose mean exists. The t distribution has a bell-shaped appearance, similar to the normal distribution but with heavier tails.

The t-test is most useful in those cases where the size of the sample is less than 30. If the sample size is larger than about 30, the t distribution becomes practically indistinguishable from the normal distribution, so the z test is often used instead; the t-test itself remains valid.
The t-test is called parametric because it involves the parameters of the population, namely the mean and the variance. There are chiefly three types of t-tests: the one sample t-test, the two independent samples t-test, and the paired samples t-test.

The first type of t-test is applicable where a single sample is tested. For example, if the researcher wants to test whether or not at least 65% of the students of a particular school would pass their 10th standard board exam, he could use this test. To conduct this type of t-test, a suitable null and alternative hypothesis is created by the researcher. The next step is to construct the test statistic; in this case, the t statistic. An appropriate level of significance is then selected for testing the null hypothesis, generally 0.05 (as in other significance tests as well). The level of significance is the probability of falsely rejecting the null hypothesis on which the t-test is carried out.

Now, the researcher compares the calculated value of the t statistic with the tabulated (critical) value. If the calculated value exceeds the tabulated value, then the null hypothesis is rejected at that level of significance. In the opposite case, the null hypothesis is not rejected.
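For the one sample case, the t statistic is t = (sample mean − μ0) / (s / √n). A small sketch, with an invented sample and hypothesized mean:

```python
# One-sample t statistic: t = (sample mean - mu0) / (s / sqrt(n)).
# The sample values and hypothesized mean mu0 are made up for illustration.
from math import sqrt
from statistics import mean, stdev

sample = [2.1, 2.5, 2.8, 3.0, 2.4]
mu0 = 2.0  # hypothesized population mean under the null

n = len(sample)
t = (mean(sample) - mu0) / (stdev(sample) / sqrt(n))
print(round(t, 3))  # compare against the tabulated t with n - 1 = 4 df
```

If this calculated value exceeds the tabulated value for 4 degrees of freedom at the 0.05 level, the null hypothesis is rejected.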

Similarly, in the case of the second type of t-test, two independent samples are tested by comparing their means with the help of the t-test. All the steps carried out previously remain the same, except that the hypothesis assumed by the researcher in this case concerns two independent samples.

Similarly, in the case of the paired samples t-test, paired observations are tested, and all the steps remain the same, except that the hypothesis on which the t-test is conducted is now formulated according to this third type of t-test.

Monday, November 9, 2009

Null hypothesis and Alternative Hypothesis

A hypothesis is a tentative explanation that relates to a set of facts and that can be tested by further investigation. There are basically two types of hypotheses, namely the null hypothesis and the alternative hypothesis. Research generally starts with a problem. Next, hypotheses such as the null hypothesis and alternative hypothesis provide the researcher with specific restatements and clarifications of the research problem.

Statistics Solutions is the country's leader in dissertation statistics. Contact Statistics Solutions today for a free 30-minute consultation.

The criteria for the null hypothesis and alternative hypothesis are that the statements should express a relationship between two or more measurable variables and should carry clear implications for testing and stating that relationship.

The major difference between the null and alternative hypotheses and a research problem is that the research problem is a simple question that cannot itself be tested, whereas the null hypothesis and alternative hypothesis can be tested.

The null hypothesis and alternative hypothesis are required to be formulated properly before the data collection and interpretation phase of the research. A well formulated null hypothesis and alternative hypothesis indicate that the researcher has adequate knowledge in that particular area and is thus able to take the investigation further, because they allow a much more systematic approach. The null hypothesis and alternative hypothesis give direction to the researcher's collection and interpretation of data.

The null hypothesis and alternative hypothesis are useful only if they state the expected relationship between the variables, are consistent with the existing body of knowledge, and have explanatory power. They should be expressed as simply and concisely as possible.

The purpose and importance of the null hypothesis and alternative hypothesis are several. They provide an approximate description of the phenomena; they give the researcher or investigator a relational statement that can be directly tested in a research study; they provide the framework for reporting the inferences of the study; they act as a working instrument of the theory; and they allow one to test whether or not the theory is supported, in a way that is separated from the investigator's own values and decisions. The null hypothesis and alternative hypothesis also provide direction to the research.

The null hypothesis is generally denoted as H0. It states the exact opposite of what the investigator or experimenter predicts or expects: basically, that there is no actual relationship between the variables.

The alternative hypothesis is generally denoted as H1. It makes a statement that suggests a potential result or outcome that the investigator or researcher may expect. The alternative hypothesis is categorized into two categories: the directional alternative hypothesis and the non-directional alternative hypothesis.

The directional hypothesis is a kind of alternative hypothesis that explains the direction of the expected findings. Sometimes this type of alternative hypothesis is developed to examine the relationship among the variables rather than a comparison between the groups.

The non directional hypothesis is a kind of alternative hypothesis that does not specify a definite direction for the expected findings.

Friday, November 6, 2009

LISREL

LISREL stands for LInear Structural RELations. The methodology of LISREL was first developed by Karl Jöreskog in 1970. LISREL is statistical software that is used for structural equation modeling. Structural equation models are systems of linear equations, and LISREL performs simultaneous estimation of the structural model and the measurement model. The structural model assumes that all variables are measured without error; factor analysis is the technique that deals with the measurement model. Factor analysis is of two types: exploratory factor analysis (where the procedure determines the underlying factors) and confirmatory factor analysis (where the researcher specifies the factor structure). LISREL makes it possible to combine the structural equations and factor analysis, and it can also generate path diagrams for structural equations. LISREL 8.8 is the latest version available.

LISREL is not only used for structural equation modeling; it also has several other program applications. In LISREL, the PRELIS (LISREL pre-processor) option is used for data manipulation and basic statistics. The SURVEYGLIM option is used for generalized linear modeling. For categorical response variables, the CATFIRM option (formal inference-based recursive modeling) is used; for continuous response variables, the CONFIRM option is used. For multilevel data, the MAPGLIM option is used for generalized linear modeling.

In business, psychology and medical research, most researchers use LISREL for structural equation modeling. LISREL was the first software used for structural equation modeling. Competing software for LISREL includes AMOS, SAS, EQS, etc. However, LISREL has its own importance due to its unique features.

Statistics Solutions is the country's leader in LISREL consulting and dissertation consulting. Contact Statistics Solutions today for a free 30-minute consultation.

The following are some basic features of LISREL:

Starting LISREL: Select "LISREL" from the Start menu, or create a shortcut and start from the shortcut.
Importing data into LISREL: To enter data into LISREL, select the import option from the File menu.

Opening a new window: In the LISREL File menu, the "New" option is used to open a new window. From the New option we can open a syntax, output, path diagram or data window as required.

Data manipulation: In the "Data" option of LISREL, there are options like variable properties, select variable, sort cases, insert variable, delete variable, assign weight, etc.
Transform option: Like SPSS, LISREL also has an option to recode or compute a new variable by using the "Transform" option.

Statistics option: In LISREL, by using the Statistics option, we can fit all the statistical models. LISREL can handle a number of models, including measurement models, non-recursive models, hierarchical linear models, confirmatory factor analysis models, ordinal regression models, multiple group comparison models, etc.

Graph option: Like many other statistical software packages, LISREL also has an option for graphs. By using the "Graph" option in LISREL, we can produce high quality univariate, bivariate and multivariate charts.

Advanced modeling: In LISREL, the multilevel option provides the flexibility to perform advanced modeling. By using the multilevel option, we can perform advanced linear and non-linear statistical methods.

View and Window options: Like any other statistical software, LISREL also has View and Window options. The View option has basic features like the tool bar, status bar, etc. By using the Window option, we can arrange the windows in a horizontal or vertical manner.

Advantages of LISREL:
1. This software provides full information about the model coefficients, which increases the power of the model.
2. It handles missing values well.
3. It provides significance testing for all the coefficients.
4. It can impose restrictions on models if that is what is wanted.
Drawbacks of LISREL:
1. It is complicated to handle for a novice.
2. Interaction effects are hard to handle.
3. A correlation matrix is used in SEM, and it is assumed that these correlations are derived from a multivariate normal distribution. This assumption is often not valid in practice.

Kolmogorov-Smirnov one sample test

The Kolmogorov-Smirnov one sample test is a test of goodness of fit. It is concerned with the degree of agreement between the distribution of the observed sample values and some specified theoretical distribution. The test determines whether or not the values in a sample can reasonably be thought to have come from a population having that theoretical distribution.

Statistics Solutions is the country's leader in the Kolmogorov-Smirnov one sample test and dissertation consulting. Contact Statistics Solutions today for a free 30-minute consultation.

In the Kolmogorov-Smirnov one sample test, it is assumed that the distribution of the underlying variable being tested is continuous in nature. The test is appropriate for variables that are measured at least on an ordinal scale.
One usually conducts the Kolmogorov-Smirnov one sample test in order to test the normality assumption in analysis of variance.

Suppose, for example, that F0(x) is the completely specified cumulative relative frequency distribution function under the null hypothesis. For any value of x, F0(x) is the proportion of cases expected to have values less than or equal to x.

Suppose Sn(x) is the observed cumulative relative frequency distribution function of a random sample of n observations. If xi is any possible value, then Sn(xi) = Fi/n, where Fi is the number of observations that are less than or equal to xi.

Now, under the null hypothesis of the Kolmogorov-Smirnov one sample test, it is expected that for every value of xi, Sn(xi) should be fairly close to F0(xi). In other words, if the null hypothesis is true, the difference between Sn(xi) and F0(xi) should be small and within the limits of random error.

The Kolmogorov-Smirnov one sample test focuses on the largest of these deviations. The largest deviation, called the maximum deviation D, is the largest absolute difference between the cumulative observed proportion and the cumulative proportion expected on the basis of the hypothesized distribution. The sampling distribution of D under the null hypothesis is known.
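The maximum deviation can be computed directly by comparing Sn just before and just after each ordered observation with F0. In the sketch below, F0 is taken to be the uniform distribution on [0, 1], and the sample values are invented:

```python
# Kolmogorov-Smirnov D statistic: the largest absolute gap between the
# empirical CDF S_n and the hypothesized CDF F0, checked on both sides
# of each jump of the step function. Sample data are made up.
def ks_statistic(sample, cdf):
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        d = max(d, abs(i / n - cdf(x)),        # just after the jump at x
                   abs((i - 1) / n - cdf(x)))  # just before the jump at x
    return d

uniform_cdf = lambda x: min(max(x, 0.0), 1.0)  # F0 for Uniform(0, 1)
print(ks_statistic([0.1, 0.2, 0.5, 0.7, 0.9], uniform_cdf))  # 0.2
```

The computed D is then compared with the tabulated critical value for the given sample size; if D exceeds it, the hypothesized distribution is rejected.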

There are certain assumptions that are made in the Kolmogorov-Smirnov one sample test.

It is assumed that the sample is drawn from the population by the process of random sampling.

It is assumed that the variables are measured at the continuous interval or ratio level in order to get exact results. If approximate results suffice, the researcher can use ordinal or grouped interval level data.

The Kolmogorov-Smirnov one sample test is also used for ordinal data when the large-sample assumptions of the chi-square goodness-of-fit test are not met.

The hypothesized distribution must be specified in advance.
In the case of the normal distribution, the expected mean and standard deviation should always be specified in advance rather than estimated from the sample.

In the case of the Poisson distribution and the exponential distribution, the expected mean should always be specified in advance.

In the case of the uniform distribution, the expected range, which consists of the minimum and maximum values, should always be specified in advance.