Data Mining and Data Warehousing Syllabus | BSc.CSIT 7th Semester

Data Warehousing and Data Mining

Course Title: Data Warehousing and Data Mining

Full Marks: 60 + 20 + 20

Course No: CSC410

Pass Marks: 24 + 8 + 8

Nature of the Course: Theory + Lab

Credit Hrs: 3

Semester: VII

Course Description:

This course introduces advanced aspects of data warehousing and data mining, encompassing the principles, research results, and commercial applications of current technologies.

Course Objective:

The main objective of this course is to provide knowledge of different data mining techniques and data warehousing.

Course Contents:

Unit 1: Introduction to Data Warehousing (5 Hrs.)

Unit 2: Introduction to Data Mining (2 Hrs.)

Unit 3: Data Preprocessing (3 Hrs.)

  • Data cleaning, Data integration and transformation
  • Data reduction, Data mining primitives
  • Data discretization and Concept hierarchy generation

Unit 4: Data Cube Technology (4 Hrs.)

  • Efficient methods for data cube computation
  • Cube materialization (Introduction to Full cube, Iceberg cube, Closed cube, Shell cube)
  • General strategies for cube computation
  • Attribute oriented induction for data characterization
  • Mining class comparison, Discriminating between different classes

Unit 5: Mining Frequent Patterns (6 Hrs.)

  • Frequent patterns, Market basket analysis, Frequent itemsets, Closed itemsets, Association rules
  • Types of association rules (Single-dimensional, Multidimensional, Multilevel, Quantitative)
  • Finding frequent itemsets (Apriori algorithm, FP-growth)
  • Generating association rules from frequent itemsets
  • Limitations and improvement of Apriori, From association mining to correlation analysis, Lift
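
As an illustration of the measures named in this unit, the following is a minimal sketch of support, confidence, and lift for an association rule. The transactions and function names are hypothetical and are not taken from the syllabus or the text book.

```python
# Hypothetical toy transactions (illustrative only).
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent) from the transactions."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

def lift(antecedent, consequent, transactions):
    """Confidence divided by the support of the consequent.

    A value near 1 suggests independence, above 1 a positive
    correlation, and below 1 a negative correlation.
    """
    return confidence(antecedent, consequent, transactions) / support(consequent, transactions)

rule_lift = lift({"bread"}, {"milk"}, transactions)
```

For the rule bread => milk on this toy data, the lift comes out below 1, so buying bread makes milk slightly less likely than its baseline frequency.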

Unit 6: Classification and Prediction (10 Hrs.)

  • Definition (Classification, Prediction), Learning and testing of classification
  • Classification by decision tree induction, ID3 as attribute selection algorithm
  • Bayesian classification, Laplace smoothing
  • Classification by backpropagation, Rule-based classifier (Decision tree to rules, Rule coverage and accuracy, Efficiency of rule simplification)
  • Support vector machine, Evaluating accuracy (Precision, Recall, F-measure)
  • Issues in classification, Overfitting and underfitting
  • K-fold cross-validation, Comparing two classifiers (McNemar’s test)
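
The evaluation measures listed above can be sketched as follows for a binary classifier. This is a minimal illustration: the function name, the choice of 1 as the positive label, and the sample labels are all assumptions made for the example.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Return (precision, recall, F-measure) for the given positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives

    # Fall back to 0.0 when a denominator is zero (a common convention).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Hypothetical true labels and predictions.
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
```

Here the F-measure is the harmonic mean of precision and recall (often written F1), which is one common reading of the term used in the unit.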

Unit 7: Cluster Analysis (8 Hrs.)

  • Types of data in cluster analysis
  • Similarity and dissimilarity between objects
  • Clustering techniques: Partitioning (k-means, k-means++, Mini-batch k-means, k-medoids)
  • Hierarchical (Agglomerative and Divisive)
  • Density-based (DBSCAN), Outlier analysis
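
As one possible starting point for the partitioning methods above, here is a minimal sketch of basic k-means (Lloyd's algorithm) on 2-D points in plain Python. The initial centroids are supplied by the caller rather than chosen by k-means++ seeding, and the data and names are hypothetical.

```python
def kmeans(points, centroids, max_iter=100):
    """Basic k-means on 2-D points with caller-supplied initial centroids."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid
        # (squared Euclidean distance).
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: (p[0] - centroids[i][0]) ** 2 + (p[1] - centroids[i][1]) ** 2,
            )
            clusters[nearest].append(p)

        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c, members in zip(centroids, clusters):
            if members:
                mean_x = sum(x for x, _ in members) / len(members)
                mean_y = sum(y for _, y in members) / len(members)
                new_centroids.append((mean_x, mean_y))
            else:
                new_centroids.append(c)  # leave an empty cluster's centroid in place

        if new_centroids == centroids:  # converged: centroids stopped moving
            break
        centroids = new_centroids
    return centroids, clusters

# Hypothetical data: two well-separated groups.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, clusters = kmeans(points, [(1, 1), (8, 8)])
```

On this toy data the two groups are recovered after one update. k-means++ and Mini-batch k-means differ in how centroids are seeded and in how many points are used per update, which this sketch does not cover.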

Unit 8: Graph Mining and Social Network Analysis (5 Hrs.)

  • Graph mining, Why graph mining, Graph mining algorithms (Beam search, Inductive logic programming)
  • Social network analysis, Link mining
  • Friends of friends, Degree assortativity
  • Signed networks (Theory of structural balance, Theory of status, Conflict between the theories of balance and status)
  • Trust in a network (Atomic propagation, Propagation of distrust, Iterative propagation)
  • Predicting positive and negative links

Unit 9: Mining Spatial, Multimedia, Text and Web Data (2 Hrs.)

  • Spatial data mining, Spatial data cube
  • Mining spatial association, Multimedia data mining
  • Similarity search in multimedia data, Mining association in multimedia data
  • An introduction to text mining, Natural language processing and Information extraction
  • Web mining (Web content mining, Web structure mining, Web usage mining)

Laboratory Works:

The laboratory work should cover all the topics mentioned in the course. It should include data preprocessing and cleaning; implementing classification, clustering, and association algorithms in any programming language; and data visualization through data mining tools.

Text Book:

1. Data Mining: Concepts and Techniques, 3rd ed. Jiawei Han, Micheline Kamber, and Jian Pei. The Morgan Kaufmann Series in Data Management Systems, Morgan Kaufmann Publishers, July 2011.

Reference Books:

1. Introduction to Data Mining, 2nd ed. Pang-Ning Tan, Michael Steinbach, Anuj Karpatne, and Vipin Kumar. Pearson, 2019.

2. Mining of Massive Datasets. Jure Leskovec, Anand Rajaraman, and Jeffrey D. Ullman, 2014.
