Motivation for Data Mining
Over the last three decades, steady advances in computer hardware have produced a large supply of powerful, affordable computers, data collection equipment, and storage media. This technology has given a significant boost to the database and information industries and has made a large number of databases and information repositories available. The sheer volume of stored data has created a data explosion problem: a situation that is data rich but knowledge poor. As a result, powerful and versatile tools are needed to automatically uncover valuable information from massive amounts of data and turn it into organised knowledge. Data mining arose in response to this need.
Data mining is the process of extracting information from large data sets to identify patterns and trends that allow businesses to make data-driven decisions. It examines hidden patterns from various perspectives and categorizes them into useful information. That information is collected and assembled in specific areas such as data warehouses, where efficient analysis and data mining algorithms support decision making and other data requirements, ultimately helping to cut costs and generate revenue.
Data mining employs complex mathematical algorithms to segment data and assess the likelihood of future events.
It is also known as Knowledge Discovery in Databases (KDD).
A data mining system architecture consists of several components that together carry out the data mining process. The architecture of a typical data mining system has the following major components:
- Database, Data Warehouse, World Wide Web, or Other Information Repository
- Database or Data Warehouse Server
- Knowledge Base
- Data Mining Engine
- Pattern Evaluation Module
- User Interface
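To make the flow between these components more concrete, here is a minimal Python sketch. It is only an illustration: the function names, the sample records, the spend threshold, and the "interestingness" rule are all hypothetical, and the role given to each component follows the conventional textbook description rather than anything specified above.

```python
# Hypothetical stand-ins for the components listed above.
# All names, data, and rules here are illustrative assumptions.

# Information repository: the raw records.
repository = [
    {"customer": "A", "age": 25, "amount": 120},
    {"customer": "B", "age": 41, "amount": 900},
    {"customer": "C", "age": 38, "amount": 40},
    {"customer": "D", "age": 52, "amount": 760},
]

# Knowledge base: domain knowledge that guides the search (here, one threshold).
knowledge_base = {"high_spend_threshold": 500}


def database_server(repo, min_age):
    """Fetch only the records relevant to the user's request."""
    return [row for row in repo if row["age"] >= min_age]


def mining_engine(rows, kb):
    """Produce candidate patterns; here, a trivial grouping by spend level."""
    groups = {"high": [], "low": []}
    for row in rows:
        level = "high" if row["amount"] >= kb["high_spend_threshold"] else "low"
        groups[level].append(row["customer"])
    return groups


def pattern_evaluation(groups, min_size):
    """Keep only patterns judged interesting (here, groups with enough members)."""
    return {level: members for level, members in groups.items() if len(members) >= min_size}


def user_interface(patterns):
    """Present the surviving patterns to the user."""
    for level, customers in patterns.items():
        print(f"{level} spenders: {', '.join(customers)}")


relevant = database_server(repository, min_age=30)
candidates = mining_engine(relevant, knowledge_base)
interesting = pattern_evaluation(candidates, min_size=2)
user_interface(interesting)
```

In a real system each stage is far more elaborate, but the direction of the data flow (repository, server, mining engine, pattern evaluation, user interface) is the point of the sketch.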
What Kinds of Data Are Mined in Data Mining?
Data mining, as a general technology, can be applied to any type of data as long as the data is relevant to the target application. For mining applications, the most basic types of data are:
- Relational database data
  - A relational database is a collection of data sets formally organised into tables, records, and columns, from which data can be accessed in various ways without having to reorganize the database tables.
  - Each table consists of a set of attributes and usually stores a large number of tuples.
  - Each tuple in a relational table represents a record identified by a unique key and described by a set of attribute values.
  - Data mining is applied to relational data to search for trends or data patterns.
  - Example: predicting the credit risk of customers based on their income, age, and expenses.
- Data warehouse data
  - A data warehouse (DW) is a repository of information collected from multiple sources and stored under a unified schema.
  - To facilitate data mining, the data in a data warehouse are:
    - organised around major subjects (e.g., customer, item, supplier, and activity);
    - stored to provide information from a historical perspective (for example, the past 6 to 12 months) and typically summarised;
    - modelled by multidimensional data structures.
- Transactional data
  - A transactional database consists of records, each of which represents a transaction, such as a customer's purchase.
  - Data mining can answer questions such as “Which items sold well together?” Such knowledge enables us to bundle groups of items together as a strategy for boosting sales.
  - For example, given the knowledge that printers are commonly purchased together with computers, you could offer printers at a steep discount to customers buying computers.
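The question “Which items sold well together?” can be illustrated with a small Python sketch that counts how often pairs of items appear in the same transaction. The transactions, the item names, and the `min_support` threshold are made up for illustration. This brute-force pair count is only a sketch of the idea; practical systems typically rely on dedicated frequent-itemset algorithms such as Apriori or FP-Growth.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions: each set holds the items of one purchase.
transactions = [
    {"computer", "printer", "mouse"},
    {"computer", "printer"},
    {"computer", "antivirus"},
    {"printer", "paper"},
    {"computer", "printer", "paper"},
]


def frequent_pairs(transactions, min_support):
    """Return item pairs whose support is at least min_support.

    Support of a pair = fraction of transactions that contain both items.
    """
    pair_counts = Counter()
    for items in transactions:
        # Sort the items so that each pair has a single canonical ordering.
        for pair in combinations(sorted(items), 2):
            pair_counts[pair] += 1
    total = len(transactions)
    return {
        pair: count / total
        for pair, count in pair_counts.items()
        if count / total >= min_support
    }


pairs = frequent_pairs(transactions, min_support=0.5)
print(pairs)  # {('computer', 'printer'): 0.6}
```

With this toy data, computers and printers appear together in three of the five transactions, which is the kind of evidence that would support bundling or discounting the two items together.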
Besides relational database data, data warehouse data, and transactional data, data mining can also be applied to other forms of data, such as:
- data streams
- sequence data
- graph or networked data
- spatial data
- text data
- multimedia data
- Web data