Friday, 5 April 2013

Data Preprocessing Definition:
  • Data preprocessing is a data mining technique that involves transforming raw data into an understandable format.
  • Real-world data is often incomplete, inconsistent, and/or lacking in certain behaviors or trends, and is likely to contain many errors.
  • Data preprocessing is an established method of resolving such issues.
  • Data preprocessing prepares raw data for further processing.
  • Data preprocessing is used in database-driven applications such as customer relationship management, in rule-based applications, and in machine-learning applications such as neural networks.
Data goes through a series of steps during preprocessing:

* Data Cleaning: Data is cleansed through processes such as filling in missing values, smoothing noisy data, and resolving inconsistencies in the data.
* Data Integration: Data from multiple sources with different representations is combined, and conflicts within the data are resolved.
* Data Transformation: Data is normalized, aggregated and generalized.
* Data Reduction: This step aims to obtain a reduced representation of the data in a data warehouse, smaller in volume but yielding the same or similar analytical results.
* Data Discretization: The number of values of a continuous attribute is reduced by dividing the attribute's range into intervals.
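The five steps above can be sketched in plain Python on a tiny, made-up customer table. This is a minimal illustration under stated assumptions, not a reference implementation: the data, the field names (`cust_id`, `customer_id`, `age`, `income`, `city`) and the helper functions (`clean`, `integrate`, `normalize`, `reduce_attrs`, `discretize`) are all hypothetical. Each step uses just one simple technique out of many: mean imputation for cleaning, a key-renaming join for integration, min-max normalization for transformation, attribute subset selection for reduction, and equal-width binning for discretization.

```python
from statistics import mean

# Hypothetical toy data: two sources describe the same customers but use
# different key names, a typical data-integration conflict.
source_a = [
    {"cust_id": 1, "age": 25, "income": 30000.0},
    {"cust_id": 2, "age": None, "income": 52000.0},  # missing age
    {"cust_id": 3, "age": 47, "income": None},       # missing income
]
source_b = [
    {"customer_id": 1, "city": "Pune"},
    {"customer_id": 2, "city": "Mumbai"},
    {"customer_id": 3, "city": "Pune"},
]


def clean(rows, attrs):
    """Data cleaning: replace each missing value with the mean of that attribute."""
    rows = [dict(r) for r in rows]  # work on copies, leave the input untouched
    for a in attrs:
        fill = mean([r[a] for r in rows if r[a] is not None])
        for r in rows:
            if r[a] is None:
                r[a] = fill
    return rows


def integrate(left, right):
    """Data integration: map both sources onto one key ("id") and join on it."""
    extra = {r["customer_id"]: {k: v for k, v in r.items() if k != "customer_id"}
             for r in right}
    merged = []
    for r in left:
        row = {"id": r["cust_id"]}
        row.update({k: v for k, v in r.items() if k != "cust_id"})
        row.update(extra.get(r["cust_id"], {}))
        merged.append(row)
    return merged


def normalize(rows, attr):
    """Data transformation: min-max normalization of one attribute to [0, 1]."""
    lo = min(r[attr] for r in rows)
    hi = max(r[attr] for r in rows)
    span = (hi - lo) or 1.0  # avoid dividing by zero if all values are equal
    return [{**r, attr: (r[attr] - lo) / span} for r in rows]


def reduce_attrs(rows, keep):
    """Data reduction (attribute subset selection): keep only the listed attributes."""
    return [{k: r[k] for k in keep} for r in rows]


def discretize(value, lo, hi, k):
    """Data discretization: equal-width binning of [lo, hi] (assumes lo < hi) into k bins."""
    width = (hi - lo) / k
    return min(int((value - lo) / width), k - 1)  # hi itself goes in the last bin


rows = integrate(clean(source_a, ["age", "income"]), source_b)
lo_age = min(r["age"] for r in rows)
hi_age = max(r["age"] for r in rows)
rows = [{**r, "age_bin": discretize(r["age"], lo_age, hi_age, 3)} for r in rows]
rows = normalize(rows, "income")
rows = reduce_attrs(rows, ["id", "age_bin", "income", "city"])  # drop raw age
for r in rows:
    print(r)
```

In practice a library such as pandas or scikit-learn would normally handle these steps, and their order can vary; here cleaning runs on one source before the join only to follow the order of the list.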
   


Data Cleaning
Importance:
“Data cleaning is the number one problem in data warehousing”
Data cleaning tasks: this routine attempts to
  • Fill in missing values
  • Identify outliers and smooth out noisy data
  • Correct inconsistent data
  • Resolve redundancy caused by data integration
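Each cleaning task in the list above can be illustrated with a small standard-library function. This is a sketch with assumed choices, not the only way to do it: missing values are filled with the median, outliers are flagged with Tukey's fences (1.5 × IQR), noise is smoothed by equal-frequency bin means, inconsistent codes are mapped through a hypothetical lookup table, and redundancy is handled only in its simplest form, exact duplicate records. The function names and sample values are illustrative assumptions.

```python
from statistics import median, quantiles


def fill_missing(values):
    """Fill in missing values (None) with the median of the known values."""
    known = [v for v in values if v is not None]
    m = median(known)
    return [m if v is None else v for v in values]


def iqr_outliers(values, k=1.5):
    """Identify outliers: values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lo or v > hi]


def smooth_by_bin_means(values, bin_size):
    """Smooth noisy data: sort, split into equal-frequency bins, replace by bin mean."""
    s = sorted(values)
    out = []
    for i in range(0, len(s), bin_size):
        b = s[i:i + bin_size]
        out.extend([sum(b) / len(b)] * len(b))
    return out


def correct_inconsistent(labels, canonical):
    """Correct inconsistent data: map spelling variants onto one canonical code."""
    return [canonical.get(x.strip().lower(), x) for x in labels]


def drop_duplicates(records):
    """Resolve simple redundancy: drop exact duplicate records, keeping the first."""
    seen, out = set(), []
    for r in records:
        key = tuple(sorted(r.items()))
        if key not in seen:
            seen.add(key)
            out.append(r)
    return out


print(fill_missing([4, None, 15, 21]))
print(iqr_outliers([4, 8, 15, 21, 21, 24, 25, 28, 34, 250]))  # [250]
print(smooth_by_bin_means([4, 8, 15, 21, 21, 24, 25, 28, 34], 3))
# [9.0, 9.0, 9.0, 22.0, 22.0, 22.0, 29.0, 29.0, 29.0]
gender_codes = {"m": "M", "male": "M", "f": "F", "female": "F"}  # hypothetical mapping
print(correct_inconsistent(["M", "male", " Male ", "F"], gender_codes))
print(drop_duplicates([{"id": 1}, {"id": 1}, {"id": 2}]))
```

Redundancy after integration can also take the form of redundant attributes (for example, two highly correlated columns), which this sketch does not attempt to detect.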
 

