Data Warehousing and Data Mining
Course Title: Data Warehousing and Data Mining
Full Marks: 60 + 20 + 20
Course No: CSC410
Pass Marks: 24 + 8 + 8
Nature of the Course: Theory + Lab
Credit Hrs: 3
Semester: VII
Course Description:
This course introduces advanced aspects of data warehousing and data mining, covering the principles, research results, and commercial applications of current technologies.
Course Objective:
The main objective of this course is to provide knowledge of different data mining techniques and data warehousing.
Course Contents:
Unit 1: Introduction to Data Warehousing (5 Hrs.)
- Lifecycle of data
- Types of data
- Data warehouse and data warehousing, Differences between operational database and data warehouse
- A multidimensional data model
- OLAP operations in the multidimensional data model
- Conceptual modeling of data warehouse
- Architecture of data warehouse
- Data warehouse implementation
- Data marts, Components of data warehouse
- Need for data warehousing, Trends in data warehousing
Unit 2: Introduction to Data Mining (2 Hrs.)
- Motivation for data mining
- Introduction to data mining systems and data mining functionalities
- KDD (Knowledge Discovery in Databases)
- Data object and attribute types
- Statistical description of data, Issues and Applications
Unit 3: Data Preprocessing (3 Hrs.)
- Data cleaning, Data integration and transformation
- Data reduction, Data mining primitives
- Data discretization and Concept Hierarchy Generation
Unit 4: Data Cube Technology (4 Hrs.)
- Efficient methods for data cube computation
- Cube materialization (Introduction to Full cube, Iceberg cube, Closed cube, Shell cube)
- General strategies for cube computation
- Attribute oriented induction for data characterization
- Mining class comparison, Discriminating between different classes
Unit 5: Mining Frequent Patterns (6 Hrs.)
- Frequent patterns, Market basket analysis, Frequent itemsets, Closed itemsets
- Association rules, Types of association rules (Single dimensional, multidimensional, multilevel, quantitative)
- Finding frequent itemsets (Apriori algorithm, FP-growth)
- Generating association rules from frequent itemsets
- Limitations of Apriori and its improvements, From Association Mining to Correlation Analysis, Lift
Unit 6: Classification and Prediction (10 Hrs.)
- Definition (Classification, Prediction), Learning and testing of classification
- Classification by decision tree induction, ID3 as an attribute selection algorithm
- Bayesian classification, Laplace smoothing
- Classification by backpropagation, Rule-based classifier (Decision tree to rules, rule coverage and accuracy, efficiency of rule simplification)
- Support vector machine, Evaluating accuracy (precision, recall, F-measure)
- Issues in classification, Overfitting and underfitting
- K-fold cross-validation, Comparing two classifiers (McNemar’s test)
Unit 7: Cluster Analysis (8 Hrs.)
- Types of data in cluster analysis
- Similarity and dissimilarity between objects
- Clustering techniques: Partitioning (k-means, k-means++, Mini-Batch k-means, k-medoids)
- Hierarchical (Agglomerative and Divisive)
- Density based (DBSCAN), Outlier analysis
Unit 8: Graph Mining and Social Network Analysis (5 Hrs.)
- Graph mining, Why graph mining, Graph mining algorithms (Beam search, Inductive logic programming)
- Social network analysis, Link mining
- Friends of friends, Degree assortativity
- Signed networks (Theory of structural balance, Theory of status, Conflict between the theories of balance and status)
- Trust in a network (Atomic propagation, Propagation of distrust, Iterative propagation)
- Predicting positive and negative links
Unit 9: Mining Spatial, Multimedia, Text and Web Data (2 Hrs.)
- Spatial data mining, Spatial data cube
- Mining spatial association, Multimedia data mining
- Similarity search in multimedia data, Mining association in multimedia data
- An introduction to text mining, natural language processing and information extraction
- Web mining (Web content mining, Web structure mining, Web usage mining)
Laboratory Works:
The laboratory work should cover all the topics in the course, including data preprocessing and cleaning; implementing classification, clustering, and association algorithms in any programming language; and data visualization through data mining tools.
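As an illustration of the kind of association-algorithm exercise the laboratory work calls for, the following is a minimal sketch of Apriori-style frequent itemset mining in Python. The function name `apriori`, the `min_support` parameter (an absolute count), and the toy baskets are assumptions made for this example; they are not prescribed by the course, and any programming language may be used.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset (frozenset): support count} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]
    # Level-1 candidates: every distinct single item.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        # Count how many transactions contain each candidate itemset.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        k += 1
        # Join step: unions of frequent (k-1)-itemsets that have exactly k items.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: keep a candidate only if all its (k-1)-subsets are frequent.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


# Hypothetical toy baskets, made up for illustration only.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "butter"},
]
result = apriori(baskets, min_support=2)
for itemset, count in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

The sketch favors readability over speed: it rescans every transaction at each level, which is the main limitation of Apriori that improvements such as FP-growth address.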
Text Book:
1. Data Mining: Concepts and Techniques, 3rd ed. Jiawei Han, Micheline Kamber, and Jian Pei. Morgan Kaufmann Series in Data Management Systems. Morgan Kaufmann Publishers, July 2011.
Reference Books:
1. Introduction to Data Mining, 2nd ed. Pang-Ning Tan, Michael Steinbach, Anuj Karpatne, and Vipin Kumar. Pearson, 2019.
2. Mining of Massive Datasets. Jure Leskovec, Anand Rajaraman, and Jeffrey D. Ullman. 2014.