Data preprocessing transforms raw data into well-formed data sets so that data mining analytics can be applied. Raw data is often incomplete and inconsistently formatted. Preprocessing involves both data validation and data imputation.

*Discretization:*

Discretization converts a continuous attribute into a discrete one by partitioning its range into a finite number of intervals (bins) and mapping each value to the bin it falls in. Common schemes include equal-width binning, where every interval spans the same range, and equal-frequency binning, where every interval holds roughly the same number of records.
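Equal-width binning can be sketched in a few lines. This is an illustrative example only; the function name, the sample ages, and the choice of four bins are my own, not from the text.

```python
def equal_width_bins(values, n_bins):
    """Map each continuous value to a bin index in [0, n_bins - 1]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    bins = []
    for v in values:
        # Values at the upper edge fall into the last bin.
        idx = min(int((v - lo) / width), n_bins - 1)
        bins.append(idx)
    return bins

ages = [15, 22, 37, 41, 58, 64, 79]
print(equal_width_bins(ages, 4))  # → [0, 0, 1, 1, 2, 3, 3]
```

Each bin here spans 16 years; an equal-frequency variant would instead choose cut points so that each bin contains about the same number of records.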

*Continuization:*

Continuization is the inverse of discretization: it transforms a discrete or categorical attribute into a **continuous (numeric) representation** so that methods expecting numeric input can be applied. A common approach is to replace a categorical variable with one or more numeric indicator variables.

*Normalization:*

Data normalization is **a basic element of data mining**. It means rescaling the source data into a common numeric range so that attributes measured on different scales can be compared and processed effectively. Typical techniques are min-max scaling, which maps values into a fixed interval such as [0, 1], and z-score standardization, which centers values around the mean with unit variance.
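Min-max scaling can be written directly from its definition. A minimal sketch (function name, target range, and sample values are my own choices):

```python
def min_max(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values into [new_min, new_max]."""
    lo, hi = min(values), max(values)
    scale = (new_max - new_min) / (hi - lo)
    return [new_min + (v - lo) * scale for v in values]

print(min_max([10, 20, 30, 40]))
```

The smallest value maps to 0.0 and the largest to 1.0; everything else falls proportionally in between. A real implementation would also guard against a constant column, where `hi - lo` is zero.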

*Randomization:*

Randomization in an experiment means **choosing your experimental participants at random**. For example, you might use simple random sampling, where participants' names are drawn randomly from a pool in which everyone has an equal probability of being chosen.
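Simple random sampling maps straight onto the standard library's `random.sample`. A minimal sketch (the helper name, pool, and seed are illustrative; seeding is optional and shown only to make the draw reproducible):

```python
import random

def draw_participants(pool, k, seed=None):
    """Draw k distinct participants uniformly at random from the pool."""
    rng = random.Random(seed)  # each name has equal probability of selection
    return rng.sample(pool, k)

pool = ["Ada", "Ben", "Cara", "Dev", "Eli"]
print(draw_participants(pool, 2, seed=0))
```

Because `random.sample` draws without replacement, no participant can be selected twice in the same draw.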