Dating behavior: a framework for analysis and an illustration
For textual data, spell checkers can be used to reduce the number of mistyped words, but it is harder to tell whether the words themselves are correct.
The process of exploration may result in additional data cleaning or additional requests for data, so these activities may be iterative in nature.
Mathematical formulas or models, called algorithms, may be applied to the data to identify relationships among the variables, such as correlation or causation.
In general terms, models may be developed to evaluate a particular variable in the data based on other variable(s) in the data, with some residual error depending on model accuracy (i.e., Data = Model + Error).
Data mining is a particular data analysis technique that focuses on modeling and knowledge discovery for predictive rather than purely descriptive purposes, while business intelligence covers data analysis that relies heavily on aggregation, focusing on business information.
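The decomposition of data into a model plus residual error can be illustrated with a small sketch. It assumes a simple straight-line model fitted by ordinary least squares, which is only one of many possible model choices; the function name `fit_line` and the sample values are hypothetical and chosen purely for illustration.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical observations (not taken from any real data set).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

intercept, slope = fit_line(xs, ys)

# Model: the value predicted for each observation.
model = [intercept + slope * x for x in xs]
# Error: whatever part of each observation the model does not explain.
error = [y - m for y, m in zip(ys, model)]

# Each observation is recovered as model value plus residual error.
for y, m, e in zip(ys, model, error):
    assert abs(y - (m + e)) < 1e-9
```

A more accurate model leaves smaller residuals, so inspecting `error` is one simple way to judge how well the chosen model describes the data.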
The need for data cleaning will arise from problems in the way that data is entered and stored.
Descriptive statistics such as the average or median may be generated to help understand the data.
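As a minimal sketch of this step, the following uses Python's standard `statistics` module to summarize a small list of values. The values are hypothetical and include one deliberately extreme entry to show how the average and the median can diverge.

```python
from statistics import mean, median, stdev

# Hypothetical measurements; 95 is an intentionally extreme value.
values = [12, 15, 11, 14, 95, 13, 12]

summary = {
    "count": len(values),
    "mean": mean(values),      # sensitive to the extreme value
    "median": median(values),  # robust to the extreme value
    "stdev": stdev(values),    # sample standard deviation
    "min": min(values),
    "max": max(values),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

Comparing the mean with the median is a quick way to notice skew or outliers before any modeling is attempted.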
Data visualization may also be used to examine the data in graphical format, to obtain additional insight regarding the messages within the data.
For example, with financial information, the totals for particular variables may be compared against separately published numbers believed to be reliable.
Unusual amounts above or below pre-determined thresholds may also be reviewed.
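The two checks described above, comparing a total against a separately published figure and reviewing amounts outside pre-determined thresholds, might be sketched as follows. The helper names `reconcile_total` and `flag_outside_thresholds`, the record layout, the tolerance, and all figures are assumptions made for illustration only.

```python
def reconcile_total(records, field, published_total, tolerance=0.01):
    """Sum one field and report whether it matches a published total."""
    computed = sum(record[field] for record in records)
    return computed, abs(computed - published_total) <= tolerance


def flag_outside_thresholds(records, field, low, high):
    """Return the records whose field value is below low or above high."""
    return [r for r in records if r[field] < low or r[field] > high]


# Hypothetical ledger entries.
ledger = [
    {"id": "A1", "amount": 120.00},
    {"id": "A2", "amount": 75.50},
    {"id": "A3", "amount": 9800.00},
    {"id": "A4", "amount": -15.25},
]

# Compare the computed total with an (assumed) independently published figure.
computed, matches = reconcile_total(ledger, "amount", published_total=9980.25)

# Collect entries outside the (assumed) acceptable range for manual review.
flagged = flag_outside_thresholds(ledger, "amount", low=0.0, high=5000.0)
```

Flagged records are candidates for review rather than confirmed errors; whether an unusual amount is actually wrong still has to be judged against the source documents.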