Data mining functionalities specify the kinds of patterns to be discovered in data mining tasks.
Some of the major data mining functionalities are as follows:
Class/concept descriptions: Characterization and Discrimination
- A class/concept description is a concise summary of a class or concept found in the data.
- It is produced by generalising, summarising, and contrasting data features.
- For example, at an electronics shop, classes of items for sale include computers and printers, and concepts of customers include big spenders and budget spenders.
- Data Characterization: A summary of the general characteristics of the objects in a target class. The result can be expressed as characteristic rules.
- Data Discrimination: A comparison of the general features of the target class under study with those of one or more contrasting classes. The output can be presented in many forms.
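As a rough illustration of the two ideas above, the sketch below summarises one customer class (characterization) and then contrasts it with another (discrimination). The records, the field names, and the choice of the mean as the summary are all assumptions made for this example; they are not part of any particular tool.

```python
from statistics import mean

# Hypothetical customer records; fields and values are invented for illustration.
customers = [
    {"segment": "big_spender", "age": 45, "annual_spend": 5200},
    {"segment": "big_spender", "age": 50, "annual_spend": 4800},
    {"segment": "budget_spender", "age": 23, "annual_spend": 600},
    {"segment": "budget_spender", "age": 27, "annual_spend": 800},
]

def characterize(records, target, features):
    """Characterization: summarise the target class by the mean of each feature."""
    rows = [r for r in records if r["segment"] == target]
    return {f: mean(r[f] for r in rows) for f in features}

def discriminate(records, target, contrast, features):
    """Discrimination: difference between target and contrasting class summaries."""
    t = characterize(records, target, features)
    c = characterize(records, contrast, features)
    return {f: t[f] - c[f] for f in features}

features = ["age", "annual_spend"]
profile = characterize(customers, "big_spender", features)
difference = discriminate(customers, "big_spender", "budget_spender", features)
print(profile)
print(difference)
```

A real system would typically report richer summaries (charts, cross-tabs, or rules) rather than plain means.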
Mining frequent patterns, Association rules and Correlations
- Patterns that appear frequently in data are known as frequent patterns.
- Mining frequent patterns leads to the discovery of interesting associations and correlations within data.
| TID | Items |
| --- | --- |
| 1 | Milk, Bread, Cigarette |
| 2 | Milk, Bread, Sugar |
| 3 | Milk, Bread, Pen |
- Here, the frequent pattern is {Milk, Bread}, which appears in all three transactions.
- A possible association rule is Milk -> Bread: customers who buy milk also tend to buy bread. If bread sales rise whenever milk sales rise, the two items are correlated.
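To make the example concrete, the sketch below computes support and confidence for the three transactions in the table. Support and confidence are the standard measures for association rules, although the text above does not name them; the function names are hypothetical.

```python
# The three transactions from the table above, stored as sets of items.
transactions = [
    {"Milk", "Bread", "Cigarette"},
    {"Milk", "Bread", "Sugar"},
    {"Milk", "Bread", "Pen"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in the itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Of the transactions containing the antecedent, the fraction that also contain the consequent."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

pattern_support = support({"Milk", "Bread"}, transactions)
rule_confidence = confidence({"Milk"}, {"Bread"}, transactions)
print(pattern_support, rule_confidence)
```

In practice an itemset is called frequent when its support meets a user-chosen minimum threshold, and a rule is reported when its confidence meets a second threshold.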
Classification and Regression for Predictive analysis
- Classification is the process of finding a model that describes and distinguishes data classes or concepts.
- The model is then used to predict the class of objects whose class label is unknown.
- E.g., classify people based on age, income, etc.
- Regression, in contrast, models continuous-valued functions.
- It is used to predict missing or unavailable numerical data values rather than class labels.
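The difference between the two tasks can be sketched in a few lines. The example below uses a nearest-neighbour rule for classification and a least-squares line for regression; both techniques and all the data values are assumptions chosen for brevity, since the text does not prescribe a particular model.

```python
def nearest_neighbour_label(query, labelled_points):
    """Classification: return the label of the closest labelled point."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    _, label = min(labelled_points, key=lambda p: sq_dist(p[0], query))
    return label

def fit_line(xs, ys):
    """Regression: least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical training data: (age, income in thousands) -> spending class.
people = [
    ((25, 30), "budget_spender"),
    ((30, 35), "budget_spender"),
    ((45, 90), "big_spender"),
    ((50, 110), "big_spender"),
]
predicted_class = nearest_neighbour_label((48, 100), people)

# Hypothetical numeric data: income (thousands) -> annual spend (thousands).
incomes = [30, 40, 50, 60]
spends = [3, 4, 5, 6]
slope, intercept = fit_line(incomes, spends)
predicted_spend = slope * 70 + intercept

print(predicted_class, predicted_spend)
```

The classifier returns a class label, while the regression returns a number, which is the distinction the bullets above draw.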
Cluster Analysis
- Clustering groups data objects into clusters without consulting known class labels, e.g., cluster fruits to find distribution patterns.
- It can be used to generate class labels when none are available.
- Objects are grouped on the principle of maximising intra-class similarity and minimising inter-class similarity.
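One common way to apply this principle is k-means, sketched below for one-dimensional data. The choice of k-means, the starting centroids, and the fruit weights are assumptions for illustration only; the text does not name a specific clustering algorithm.

```python
def k_means_1d(values, centroids, iterations=10):
    """Tiny 1-D k-means: assign each value to its nearest centroid, then re-average."""
    centroids = list(centroids)
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical fruit weights in grams; two natural groups are visible by eye.
weights = [110, 120, 115, 300, 310, 305]
centroids, clusters = k_means_1d(weights, centroids=[100, 200])
print(centroids)
print(clusters)
```

Each resulting cluster index can then serve as a generated class label for the objects in it.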
Outlier Analysis
- A data object that does not comply with the general behaviour of the data is called an outlier.
- Outliers may be detected using statistical tests that assume a distribution or probability model for the data, or using distance measures.
- Outlier analysis is useful in fraud detection and rare-event analysis.
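A simple statistical test of the kind mentioned above is the z-score: flag any value that lies more than a chosen number of standard deviations from the mean. The sketch below assumes roughly normal data, and the threshold of 2.0 and the transaction amounts are invented for this example.

```python
from statistics import mean, pstdev

def z_score_outliers(values, threshold=2.0):
    """Return the values lying more than `threshold` standard deviations from the mean."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all values identical, so nothing stands out
    return [v for v in values if abs(v - mu) / sigma > threshold]

# Hypothetical transaction amounts; one is far larger than the rest.
amounts = [20, 22, 19, 21, 23, 20, 500]
suspicious = z_score_outliers(amounts)
print(suspicious)
```

On very small samples a single extreme value inflates the standard deviation, so the threshold usually needs tuning, or a more robust measure can be used instead.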