Data Mining and Data Warehousing Syllabus | BSc.CSIT 7th Semester

Course Title: Data Warehousing and Data Mining

Full Marks: 60 + 20 + 20

Course No: CSC410

Pass Marks: 24 + 8 + 8

Nature of the Course: Theory + Lab

Credit Hrs: 3

Semester: VII

Course Description:

This course introduces advanced aspects of data warehousing and data mining, covering the principles, research results, and commercial applications of current technologies.

Course Objective:

The main objective of this course is to provide knowledge of different data mining techniques and data warehousing.

Course Contents:

Unit 1: Introduction to Data Warehousing (5 Hrs.)

Unit 2: Introduction to Data Mining (2 Hrs.)

Unit 3: Data Preprocessing (3 Hrs.)

  • Data cleaning, Data integration and transformation
  • Data reduction, Data mining primitives
  • Data discretization and Concept Hierarchy Generation 

Unit 4: Data Cube Technology (4 Hrs.)

  • Efficient methods for data cube computation
  • Cube materialization (Introduction to Full cube, Iceberg cube, Closed cube, Shell cube)
  • General strategies for cube computation
  • Attribute oriented induction for data characterization
  • Mining class comparison, Discriminating between different classes

Unit 5: Mining Frequent Patterns (6 Hrs.)

  • Frequent patterns, Market basket analysis, Frequent itemsets, Closed itemsets
  • Association rules, Types of association rules (Single dimensional, multidimensional, multilevel, quantitative)
  • Finding frequent itemsets (Apriori algorithm, FP-growth)
  • Generating association rules from frequent itemsets
  • Limitations and improvements of Apriori, From Association Mining to Correlation Analysis, Lift
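
As an illustration for the lab work on this unit, the following is a minimal sketch of the Apriori join and prune steps together with the lift measure. It is not part of the official syllabus; the function names `apriori` and `lift` and the sample transactions are assumptions made for this example.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset (frozenset): support} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions that contain the whole itemset
        return sum(itemset <= t for t in transactions) / n

    def keep_frequent(candidates):
        scored = {c: support(c) for c in candidates}
        return {c: s for c, s in scored.items() if s >= min_support}

    # Level 1: frequent single items
    items = {item for t in transactions for item in t}
    current = keep_frequent({frozenset([i]) for i in items})
    frequent = dict(current)

    k = 2
    while current:
        prev = set(current)
        # Join step: merge frequent (k-1)-itemsets into candidate k-itemsets
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        current = keep_frequent(candidates)
        frequent.update(current)
        k += 1
    return frequent


def lift(frequent, antecedent, consequent):
    """Lift of antecedent -> consequent; all itemsets involved must be frequent."""
    a, c = frozenset(antecedent), frozenset(consequent)
    return frequent[a | c] / (frequent[a] * frequent[c])


# Hypothetical market-basket data, used only for illustration
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
frequent_sets = apriori(transactions, min_support=0.6)
diapers_beer_lift = lift(frequent_sets, {"diapers"}, {"beer"})
```

A lift above 1 suggests the two itemsets occur together more often than they would if independent, which is the link from association mining to correlation analysis named in this unit.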

Unit 6: Classification and Prediction (10 Hrs.)

  • Definition (Classification, Prediction), Learning and testing of classification
  • Classification by decision tree induction, ID3 as attribute selection algorithm
  • Bayesian classification, Laplace smoothing
  • Classification by backpropagation, Rule based classifier (Decision tree to rules, rule coverage and accuracy, efficiency of rule simplification)
  • Support vector machine, Evaluating accuracy (precision, recall, F-measure)
  • Issues in classification, Overfitting and underfitting
  • K-fold cross-validation, Comparing two classifiers (McNemar’s test)
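
The evaluation topics above can be sketched in a few lines for the lab. The example below is a hedged illustration, not an official part of the syllabus: it computes precision, recall, and F-measure for a binary classifier, and the McNemar chi-square statistic with continuity correction for comparing two classifiers on the same test set. The function names and the sample labels are assumptions.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall and F-measure (F1) for the chosen positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def mcnemar_statistic(y_true, pred_a, pred_b):
    """McNemar chi-square statistic with continuity correction.

    Only the test cases on which the two classifiers disagree are used.
    """
    # b: classifier A correct, classifier B wrong
    b = sum(a == t and p != t for t, a, p in zip(y_true, pred_a, pred_b))
    # c: classifier A wrong, classifier B correct
    c = sum(a != t and p == t for t, a, p in zip(y_true, pred_a, pred_b))
    if b + c == 0:
        return 0.0
    return (abs(b - c) - 1) ** 2 / (b + c)


# Hypothetical labels and predictions, used only for illustration
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
```

The McNemar statistic is compared against the chi-square distribution with one degree of freedom (critical value 3.841 at the 0.05 level); a larger value suggests the two classifiers differ in error rate.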

Unit 7: Cluster Analysis (8 Hrs.)

  • Types of data in cluster analysis
  • Similarity and dissimilarity between objects
  • Clustering techniques: Partitioning (k-means, k-means++, Mini-Batch k-means, k-medoids)
  • Hierarchical (Agglomerative and Divisive)
  • Density based (DBSCAN), Outlier analysis
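
For the partitioning techniques listed above, a minimal sketch of plain k-means (Lloyd's algorithm) with random initial centroids is given below as a possible lab starting point. It does not implement the k-means++ or Mini-Batch variants; the function name `kmeans`, its parameters, and the sample points are assumptions made for this example.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, max_iterations=100, seed=0):
    """Plain k-means: returns (centroids, clusters) after convergence."""
    rng = random.Random(seed)
    # Initialization: pick k distinct data points as starting centroids
    centroids = rng.sample(points, k)
    clusters = [[] for _ in range(k)]
    for _ in range(max_iterations):
        # Assignment step: attach each point to its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        new_centroids = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster))
            if cluster else centroids[i]  # keep the old centroid if a cluster is empty
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # assignments no longer change
        centroids = new_centroids
    return centroids, clusters


# Hypothetical 2-D points forming two well-separated groups
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, clusters = kmeans(points, k=2)
```

k-means++ changes only the initialization step (spreading the starting centroids apart), and k-medoids restricts centroids to actual data points, so both can be tried as modifications of this sketch.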

Unit 8: Graph Mining and Social Network Analysis (5 Hrs.)

  • Graph mining, Why graph mining, Graph mining algorithms (Beam search, Inductive logic programming)
  • Social network analysis, Link mining
  • Friends of friends, Degree assortativity
  • Signed networks (Theory of structural balance, Theory of status, Conflict between the theories of balance and status)
  • Trust in a network (Atomic propagation, Propagation of distrust, Iterative propagation)
  • Predicting positive and negative links

Unit 9: Mining Spatial, Multimedia, Text and Web Data (2 Hrs.)

  • Spatial data mining, Spatial data cube
  • Mining spatial association, Multimedia data mining
  • Similarity search in multimedia data, Mining association in multimedia data
  • An introduction to text mining, natural language processing and information extraction
  • Web mining (Web content mining, Web structure mining, Web usage mining)

Laboratory Works:

The laboratory work should cover all the topics mentioned in the course. It should include data preprocessing and cleaning; implementing classification, clustering, and association algorithms in any programming language; and data visualization through data mining tools.

Text Book:

1. Data Mining: Concepts and Techniques, 3rd ed. Jiawei Han, Micheline Kamber, and Jian Pei. Morgan Kaufmann Series in Data Management Systems, Morgan Kaufmann Publishers, July 2011.

Reference Books:

1. Introduction to Data Mining, 2nd ed. Pang-Ning Tan, Michael Steinbach, Anuj Karpatne, and Vipin Kumar. Pearson, 2019.

2. Mining of Massive Datasets. Jure Leskovec, Anand Rajaraman, and Jeffrey D. Ullman, 2014.

About Author

Karina Shakya