Data Mining functionalities

Last Updated on 1 month ago by Sarina Sindurakar

Data mining functionalities specify the kinds of patterns to be discovered in data mining tasks.

Some of the major data mining functionalities are as follows:

Class/concept descriptions: Characterization and Discrimination

A class/concept description summarises a class or concept in concise and precise terms.
It is obtained by generalising, summarising, and contrasting the features of the data.
For example, at an electronics shop, the classes of items for sale include computers and printers, and the concepts of customers include large spenders and budget spenders.
Data Characterization: A summary of the general characteristics of the objects in a target class. Its output is called a characteristic rule.
Data Discrimination: A comparison of the general features of the target class with those of one or more contrasting classes. The output can be presented in many forms, such as tables, charts, or discriminant rules.
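As a rough illustration of the two ideas, the sketch below summarises one class of customers (characterization) and then sets that summary beside a contrasting class (discrimination). The customer records, field layout, and function names are hypothetical and chosen only for this example.

```python
from statistics import mean

# Hypothetical customer records: (class label, items per visit, amount spent per visit).
customers = [
    ("large_spender", 5, 900.0),
    ("large_spender", 7, 1100.0),
    ("budget_spender", 2, 60.0),
    ("budget_spender", 1, 40.0),
]

def characterize(records, target):
    """Summarise the general features of the target class."""
    rows = [r for r in records if r[0] == target]
    return {
        "count": len(rows),
        "avg_items": mean(r[1] for r in rows),
        "avg_spend": mean(r[2] for r in rows),
    }

def discriminate(records, target, contrast):
    """Pair each feature of the target class with the contrasting class."""
    t = characterize(records, target)
    c = characterize(records, contrast)
    return {key: (t[key], c[key]) for key in t}

summary = characterize(customers, "large_spender")
comparison = discriminate(customers, "large_spender", "budget_spender")
print(summary)
print(comparison)
```

Each entry of `comparison` holds the target-class value first and the contrasting-class value second, which is one simple tabular form the output of discrimination might take.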

Mining frequent patterns, associations, and correlations

Patterns that appear frequently in data are known as frequent patterns.
Mining frequent patterns leads to the discovery of interesting associations and correlations within data.

For example, if milk and bread are often bought together, the frequent pattern is {milk, bread}.
The association rule is milk -> bread. If the sale of bread increases whenever the sale of milk increases, this indicates a correlation between the two items.
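Association rules are commonly scored with support and confidence. The sketch below computes both for the rule milk -> bread over a small set of transactions; the baskets and function names are made up for illustration and are not taken from any real dataset.

```python
# Hypothetical market-basket transactions.
transactions = [
    {"milk", "bread", "butter"},
    {"milk", "bread"},
    {"milk", "eggs"},
    {"bread", "jam"},
    {"milk", "bread", "eggs"},
]

def support(itemset, baskets):
    """Fraction of baskets that contain every item in the itemset."""
    hits = sum(1 for basket in baskets if itemset <= basket)
    return hits / len(baskets)

def confidence(antecedent, consequent, baskets):
    """Estimate of P(consequent | antecedent) for the rule antecedent -> consequent."""
    return support(antecedent | consequent, baskets) / support(antecedent, baskets)

pair_support = support({"milk", "bread"}, transactions)
rule_confidence = confidence({"milk"}, {"bread"}, transactions)
print(pair_support, rule_confidence)
```

Here {milk, bread} appears in three of the five baskets, and three of the four baskets containing milk also contain bread. An itemset would count as frequent when its support meets a minimum threshold chosen by the analyst.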

Classification

The process of finding a model that explains and separates data classes or ideas is known as classification.
It’s used to figure out what class an object belongs to when the class label isn’t known.
The model describes and distinguishes classes or concepts so that it can be used for future prediction.
For example, people can be classified based on attributes such as age and income.
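To make this concrete, here is a minimal sketch that assigns a class label to a new person from age and income. It uses a 1-nearest-neighbour rule, chosen here only because it is short; the text above does not prescribe any particular classifier, and the training data and labels are hypothetical.

```python
import math

# Hypothetical labelled training data: (age in years, income in thousands) -> class label.
training = [
    ((25, 30), "budget_spender"),
    ((30, 35), "budget_spender"),
    ((45, 90), "large_spender"),
    ((50, 110), "large_spender"),
]

def classify(point, labelled):
    """Return the label of the closest training example (1-nearest neighbour)."""
    nearest = min(labelled, key=lambda item: math.dist(point, item[0]))
    return nearest[1]

young_label = classify((28, 32), training)
older_label = classify((48, 100), training)
print(young_label, older_label)
```

In practice the attributes would usually be scaled first so that income does not dominate the distance, and a model such as a decision tree might be preferred when an interpretable description of each class is needed.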

Prediction

Prediction models continuous-valued functions.
It is used to predict missing or unavailable numerical data values rather than class labels.
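One common way to predict a numerical value is regression. The sketch below fits a straight line by ordinary least squares and uses it to estimate a value that is not in the data. The observations and the choice of a linear model are assumptions made for this example.

```python
# Hypothetical observations: (years of experience, salary in thousands).
observations = [(1, 30.0), (2, 35.0), (3, 40.0), (4, 45.0)]

def fit_line(points):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

slope, intercept = fit_line(observations)

# Predict the numerical value for an unseen input (5 years of experience).
predicted = slope * 5 + intercept
print(slope, intercept, predicted)
```

Unlike the classification case, the result is a number on a continuous scale, not one of a fixed set of class labels.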

Cluster analysis

Clustering groups data objects into clusters, for example clustering fruits to find distribution patterns.
Because the class labels are not known in advance, clustering can be used to generate them.
The objects are grouped based on the principle of maximising the intra-class similarity and minimising the inter-class similarity.
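The grouping principle can be sketched with a tiny one-dimensional k-means, which is only one of many clustering methods and is used here as an assumption for illustration. The fruit weights, starting centroids, and function name are hypothetical.

```python
def k_means_1d(values, centroids, iterations=10):
    """Tiny 1-D k-means: assign each value to its nearest centroid, then update."""
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Move each centroid to the mean of its cluster; keep it if the cluster is empty.
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical fruit weights in grams, with no class labels attached.
weights = [100, 105, 110, 300, 310, 320]
centroids, clusters = k_means_1d(weights, centroids=[100.0, 320.0])
print(centroids, clusters)
```

Values inside each resulting cluster are close to one another and far from the other cluster, and the cluster index can then serve as a generated label.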

Outlier analysis

A data object that does not comply with the general behaviour of the data is called an outlier.
Outliers may be detected using statistical tests that assume a distribution or probability model for the data, or using distance measures.
Outlier analysis is useful in fraud detection and rare-event analysis.
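A simple statistical approach is to flag values that lie far from the mean in units of standard deviation (a z-score test). The sketch below assumes roughly bell-shaped data; the threshold of 2.0 and the transaction amounts are hypothetical choices for this example.

```python
from statistics import mean, pstdev

def z_score_outliers(values, threshold=2.0):
    """Return values lying more than `threshold` standard deviations from the mean."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All values are identical, so nothing can be an outlier.
        return []
    return [v for v in values if abs(v - mu) / sigma > threshold]

# Hypothetical transaction amounts; the last one is unusually large.
amounts = [20, 22, 19, 21, 23, 20, 22, 500]
outliers = z_score_outliers(amounts)
print(outliers)
```

With very small samples a single extreme value inflates the standard deviation, so the threshold may need to be lower than the value of 3 often quoted for large datasets.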