B.Sc. CSIT 7th Semester Data Warehousing and Data Mining

Knowledge Discovery in Databases (KDD)

Knowledge Discovery in Databases (KDD) is the process of discovering useful knowledge from a collection of data. Data mining is the central step of this widely used process, which also includes data selection and preparation, data cleansing, incorporating prior knowledge about the data sets, and interpreting the observed results to reach accurate conclusions.

Knowledge discovery consists of an iterative sequence of the following steps:

Data Cleaning

  • Data cleaning is the initial phase in the knowledge discovery process, and it involves removing noise and incorrect data.
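
As a minimal sketch of this step, the snippet below drops missing and implausible values from a list of readings. The sample data, the function name clean_readings, and the plausible range are all assumptions made for illustration; real cleaning rules depend on the data set.

```python
# Hypothetical raw sensor readings: None marks a missing value,
# and -999.0 / 150.0 stand in for noisy or incorrect entries.
raw_readings = [21.5, 22.0, None, 21.8, -999.0, 22.3, 150.0, 21.9]


def clean_readings(values, low=-50.0, high=60.0):
    """Keep only values that are present and inside the assumed range [low, high]."""
    return [v for v in values if v is not None and low <= v <= high]


cleaned = clean_readings(raw_readings)
print(cleaned)
```

Dropping bad values is only one option; depending on the task, missing values might instead be filled in with a mean or median.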

Data Integration

  • The second phase is data integration, which involves combining data from several sources into a single data warehouse.
  • Careful data integration can help to reduce or eliminate redundancies and inconsistencies in the resulting data set.
  • This, in turn, can improve the accuracy and speed of the subsequent data mining process.
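
The idea can be sketched as follows, assuming two hypothetical sources (a CRM export and a billing export) that name the customer key differently and contain a duplicate record. All field names and values are invented for illustration.

```python
# Two hypothetical sources describing the same customers with different field names.
crm_rows = [
    {"customer_id": 1, "name": "Asha", "city": "Kathmandu"},
    {"customer_id": 2, "name": "Bikash", "city": "Pokhara"},
]
billing_rows = [
    {"cust_id": 1, "total_spent": 1200.0},
    {"cust_id": 2, "total_spent": 450.0},
    {"cust_id": 2, "total_spent": 450.0},  # exact duplicate record (redundancy)
]


def integrate(crm, billing):
    """Merge both sources into one record per customer, keyed on the customer ID."""
    merged = {row["customer_id"]: dict(row) for row in crm}
    seen = set()
    for row in billing:
        key = (row["cust_id"], row["total_spent"])
        if key in seen:  # skip redundant duplicates
            continue
        seen.add(key)
        # Resolve the naming inconsistency: cust_id maps to customer_id.
        target = merged.setdefault(row["cust_id"], {"customer_id": row["cust_id"]})
        target["total_spent"] = target.get("total_spent", 0.0) + row["total_spent"]
    return list(merged.values())


warehouse = integrate(crm_rows, billing_rows)
for record in warehouse:
    print(record)
```

Treating identical billing rows as duplicates is an assumption of this sketch; in practice a transaction ID would be a safer way to detect redundancy.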

Data Selection

  • Data relevant to the analysis task is retrieved from the data store.
  • Selection is often a simple query, although techniques such as neural networks, decision trees, Naive Bayes, clustering, and regression may also be used to help identify the relevant data.
  • E.g., if you want to perform stock market prediction, retrieve only the data relevant to that prediction.
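
To illustrate the stock market example, here is a small sketch that keeps only the rows and columns relevant to one prediction task. The records, the symbols "ABC" and "XYZ", and the function select_relevant are hypothetical.

```python
from datetime import date

# Hypothetical stored records; only some rows and columns matter for the task.
records = [
    {"symbol": "ABC", "day": date(2023, 1, 2), "close": 101.0, "news_id": 7},
    {"symbol": "XYZ", "day": date(2023, 1, 2), "close": 55.0, "news_id": 8},
    {"symbol": "ABC", "day": date(2023, 1, 3), "close": 103.5, "news_id": 9},
    {"symbol": "ABC", "day": date(2022, 6, 1), "close": 90.0, "news_id": 3},
]


def select_relevant(rows, symbol, start, columns=("day", "close")):
    """Return the chosen columns of rows for one symbol on or after the start date."""
    return [
        {c: r[c] for c in columns}
        for r in rows
        if r["symbol"] == symbol and r["day"] >= start
    ]


selected = select_relevant(records, "ABC", date(2023, 1, 1))
for row in selected:
    print(row)
```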

Data Transformation

  • Data are transformed (e.g., by normalisation) or consolidated into forms appropriate for mining, typically by performing summary or aggregation operations.
  • For example, daily sales data may be aggregated to compute monthly and annual totals.
  • Data transformation is a two-step process:
  • Data mapping: assigning elements from the source to the destination in order to capture the transformations.
  • Code generation: creating the actual transformation program.
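
The aggregation and normalisation mentioned above can be sketched as follows. The sales figures are invented, and min-max scaling is just one common normalisation choice assumed here for illustration.

```python
from collections import defaultdict

# Hypothetical daily sales as (ISO date string, amount) pairs.
daily_sales = [
    ("2023-01-05", 100.0),
    ("2023-01-20", 150.0),
    ("2023-02-03", 200.0),
    ("2023-02-17", 50.0),
]


def monthly_totals(rows):
    """Aggregate daily amounts into totals per month."""
    totals = defaultdict(float)
    for day, amount in rows:
        totals[day[:7]] += amount  # the "YYYY-MM" prefix identifies the month
    return dict(totals)


def min_max_normalise(values):
    """Rescale values linearly so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:  # avoid division by zero when all values are equal
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


by_month = monthly_totals(daily_sales)
annual_total = sum(by_month.values())
scaled = min_max_normalise([amount for _, amount in daily_sales])

print(by_month)
print(annual_total)
print(scaled)
```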

Data Mining

  • Data mining methods are applied in order to extract patterns from the data.
  • The task-relevant data is analysed to produce candidate patterns.
  • The purpose of the model is determined using categorization (classification) or characterization.
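
As one possible illustration of pattern extraction, the sketch below counts item pairs that occur together in at least half of a set of shopping baskets. Frequent-pair counting is only one of many mining methods, and the baskets, the function frequent_pairs, and the threshold are assumptions.

```python
from collections import Counter
from itertools import combinations

# Hypothetical market-basket transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "jam"},
    {"bread", "butter"},
    {"bread", "butter", "milk"},
]


def frequent_pairs(baskets, min_support):
    """Return item pairs whose support (fraction of baskets) is at least min_support."""
    counts = Counter()
    for basket in baskets:
        # Sorting gives each pair one canonical order, e.g. ("bread", "milk").
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    n = len(baskets)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}


patterns = frequent_pairs(transactions, min_support=0.5)
for pair, supp in patterns.items():
    print(pair, supp)
```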

Pattern Evaluation

  • Patterns that represent knowledge are identified based on interestingness measures.
  • To make the results understandable to the user, this step employs summarization and visualisation.
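
Support, confidence, and lift are commonly used measures for judging how interesting an association rule is. The sketch below computes them for a hypothetical rule "bread implies milk" over invented baskets; it assumes the antecedent and the consequent each occur in at least one basket.

```python
# Hypothetical market-basket transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "jam"},
    {"bread", "butter"},
    {"bread", "butter", "milk"},
]


def support(baskets, items):
    """Fraction of baskets that contain every item in `items`."""
    return sum(1 for b in baskets if items <= b) / len(baskets)


def rule_measures(baskets, antecedent, consequent):
    """Compute support, confidence and lift for the rule antecedent -> consequent.

    Assumes both sides occur at least once, so no division by zero arises.
    """
    s_both = support(baskets, antecedent | consequent)
    s_ante = support(baskets, antecedent)
    s_cons = support(baskets, consequent)
    confidence = s_both / s_ante
    lift = confidence / s_cons  # above 1 suggests a positive association
    return {"support": s_both, "confidence": confidence, "lift": lift}


measures = rule_measures(transactions, {"bread"}, {"milk"})
for name, value in measures.items():
    print(f"{name}: {value:.2f}")
```

A rule would typically be kept only if these values exceed thresholds chosen by the analyst.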

Knowledge Presentation

  • Knowledge representation and visualisation techniques are used to present the mined knowledge to the user.
  • The output includes reports, tables, discriminant rules, classification rules, and characterization rules.
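
A very small sketch of presenting results as a table is shown below. The rules and their confidence values are made up, and render_report is a hypothetical helper; real systems would more likely use charts or a reporting tool.

```python
# Hypothetical mined rules with their confidence values.
rules = [
    {"if": "butter", "then": "bread", "confidence": 0.75},
    {"if": "bread", "then": "milk", "confidence": 0.80},
]


def render_report(rule_list):
    """Format rules as a plain-text table, highest confidence first."""
    header = f"{'Rule':<24}{'Confidence':>12}"
    lines = [header, "-" * len(header)]
    for rule in sorted(rule_list, key=lambda r: r["confidence"], reverse=True):
        text = f"IF {rule['if']} THEN {rule['then']}"
        lines.append(f"{text:<24}{rule['confidence']:>12.0%}")
    return "\n".join(lines)


report = render_report(rules)
print(report)
```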

About Author

Sarina Sindurakar