
Tuesday, January 27, 2009

Data Analysis

Data analysis refers to the larger subject of gathering, transforming, interpreting and modeling data into meaningful information relevant for business decisions. With the help of data analysis we can compare, support and highlight content based on relevance and context in a business or research setting. It consists of various techniques, approaches and applications which act as aids in producing information for managerial and business decision making. Data analysis can be applied in various fields such as research, medicine, engineering, election polling and gambling, to name a few. Some of the more popular data analysis techniques are:

· Data mining

Data mining is the process of obtaining meaningful information from a raw dataset. For instance, a bank may need to identify, from its entire database of collections and bad debts, the specific areas in which bad debts are frequent. Data mining is an important step in transforming data into information. Industries that implement data mining include retail, financial services, marketing organizations and many more.

· Business Intelligence

This is a technique for interpreting the data available to an organization using reports, transaction processing, analytics and scorecards; it even includes data mining in its overall context. It encompasses all the tools, skills, applications and strategies used in developing meaningful information that can be used and accessed by managers or decision makers. In essence, business intelligence helps present information to those interested in it, in a format that makes it easy to understand the historical context, key metrics, benchmarks and other relevant topics that aid in the analysis and planning of business activities and performance.

· Exploratory Data Analysis (EDA)

This method of data analysis refers to the initial, open-ended analysis and interpretation of data, before any specific hypothesis is tested. Using tools such as charts and diagrams, along with key descriptive statistics such as means, totals, crosstabs and other related methods, exploratory data analysis attempts to provide the information that helps business performance.

· Confirmatory Data Analysis (CDA)

The key difference between exploratory and confirmatory data analysis is that while the former aids in navigating the gamut of performance-related data, the latter aids in producing information to assess the validity of business decisions. It includes techniques such as hypothesis testing, structural equation modeling and other quantitative techniques.
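As a rough illustration of the difference between the exploratory and confirmatory techniques above, the following Python sketch first summarizes a small dataset (total, mean and a crosstab) and then computes a test statistic for a specific question. The sales records, the field layout and the helper name welch_t are all hypothetical, and Welch's t statistic is just one possible choice of hypothesis test, not one prescribed by this post.

```python
from collections import Counter
from math import sqrt
from statistics import mean, variance

# Hypothetical sales records: (region, channel, amount). Illustrative data only.
records = [
    ("North", "online", 120.0),
    ("North", "store", 80.0),
    ("South", "online", 100.0),
    ("South", "store", 60.0),
    ("North", "online", 140.0),
    ("South", "store", 70.0),
]

# Exploratory step: descriptive statistics and a simple crosstab of counts.
amounts = [amount for _, _, amount in records]
total = sum(amounts)
average = mean(amounts)
crosstab = Counter((region, channel) for region, channel, _ in records)


def welch_t(a, b):
    """Return Welch's t statistic for two independent samples."""
    standard_error = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error


# Confirmatory step: test a specific question, here whether online and
# in-store sales differ on average.
online = [amount for _, channel, amount in records if channel == "online"]
store = [amount for _, channel, amount in records if channel == "store"]
t_stat = welch_t(online, store)

print(f"total={total}, average={average}")
print(dict(crosstab))
print(f"Welch t statistic: {t_stat:.3f}")
```

The exploratory part only describes what is in the data; the confirmatory part commits to a question up front. Turning the statistic into a p-value would need a t distribution, which is left out of this sketch.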

Data analysis involves a number of steps in its general form. In order to understand the process behind data analysis in depth, one needs to evaluate the context and needs of the data analysis activity. However, the following steps can act as general guidelines for a data analysis project:

1. The first step is the screening or cleansing of the data to eliminate errors, such as missing and duplicate values. Typically, cleansing will also encompass normalizing data in order to make sure that the results can be easily analyzed using major statistical tools and methods.

2. The second step is to select the key indicators which are meant to be produced from the data analysis project, in terms of metrics, reports, benchmarks and formats. This is similar to a design phase where the entire project is planned and laid out.

3. Next, we conduct the actual data analysis and obtain results. Depending on the specific technique used, it could be a dynamic process (real-time visuals and reports) or a static one (wherein reports and other metrics are generated and delivered to be analyzed). In addition, the process could be interactive, wherein a user selects metrics on the fly and the on-screen report is generated. Many business intelligence platforms offer this facility.

4. Reliability testing should be a priority in nearly every data analysis exercise. It ensures that the output received is meaningful, particularly when statistical analyses are involved.
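The four steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a definitive implementation: the customer records, the indicator names and every function name here are hypothetical, min-max scaling is only one of several ways to normalize data, and the final check is a basic consistency test standing in for fuller reliability testing.

```python
from statistics import mean

# Hypothetical raw records: (customer_id, monthly_spend); None marks a missing value.
raw = [(1, 200.0), (2, None), (3, 150.0), (1, 200.0), (4, 250.0)]


def cleanse(rows):
    """Step 1: drop rows with missing values and exact duplicates, keeping order."""
    seen = set()
    cleaned = []
    for row in rows:
        if None in row or row in seen:
            continue
        seen.add(row)
        cleaned.append(row)
    return cleaned


def min_max_normalize(values):
    """Step 1 (continued): rescale values to the range [0, 1]."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]
    return [(value - low) / (high - low) for value in values]


# Step 2: decide up front which indicators the project should produce.
INDICATORS = {"average_spend": mean, "total_spend": sum, "max_spend": max}


def analyze(rows):
    """Step 3: compute each selected indicator on the cleaned data."""
    spend = [amount for _, amount in rows]
    return {name: compute(spend) for name, compute in INDICATORS.items()}


def is_consistent(rows, results):
    """Step 4: basic check that the average and the total agree with each other."""
    expected_total = results["average_spend"] * len(rows)
    return abs(expected_total - results["total_spend"]) < 1e-9


cleaned = cleanse(raw)
normalized = min_max_normalize([amount for _, amount in cleaned])
results = analyze(cleaned)

print(cleaned)
print(normalized)
print(results, is_consistent(cleaned, results))
```

Keeping the indicators in one dictionary mirrors the design-phase idea in step 2: the metrics are chosen before the analysis runs, so adding or removing one does not change the rest of the pipeline.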

‘Gotchas’ in data analysis: key validation requirements for meaningful data analysis

Data analysis can help create a very useful and valuable information chain in an organization, or it can disrupt perfectly good activities if it is based on nonsensical information founded on invalid data or analysis techniques. It is therefore important to know:

· Source of data and collection methods: it’s important to know how the data were collected, organized and sourced, in order for the data analysis to be meaningful. Cases of unethical data collection abound, which can lead to serious privacy concerns and consequences for the analyst as well as the reporter. In addition, knowing how the data were collected is critical to judging whether the underlying assumptions make sense.

· Analytical methods/techniques: based on the characteristics of the data set itself and the needs of the analysis, techniques or methods have to be chosen carefully. This could mean a simple decision such as z-test versus ANOVA or t-test, or a larger research design concern.

· Understand the limitations of quantitative facts: at the end of the day, the purpose of the data analysis is to help make decisions to improve performance. It is very easy to be misled by averages and totals that do not display the whole picture. A background in quantitative statistics and values can reveal facts that would otherwise be ignored.
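To make the point about misleading averages concrete, here is a small Python sketch using made-up branch revenue figures (the data and variable names are hypothetical). A single unusually large value pulls the mean far above what most branches actually earn, while the median stays close to the typical branch.

```python
from statistics import mean, median

# Hypothetical monthly revenue per branch, in thousands; one branch is an outlier.
branch_revenue = [10, 12, 11, 9, 13, 245]

average = mean(branch_revenue)    # pulled upward by the single large value
typical = median(branch_revenue)  # middle of the sorted values

# Branches that actually exceed the reported average.
above_average = [revenue for revenue in branch_revenue if revenue > average]

print(f"mean={average}, median={typical}")
print(f"branches above the mean: {len(above_average)} of {len(branch_revenue)}")
```

Here the mean is 50 and the median is 11.5, and only one of the six branches earns more than the average. Reporting the mean alone would hide that.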
