This course is an introduction to data mining. It covers the basic concepts, principles, methods, implementation techniques, and applications of data mining, with a focus on (1) data preprocessing and preparation, (2) classification, (3) pattern discovery, and (4) cluster analysis.
By the end of the course, students will have gained practical skills in formulating data mining problems, solving them with data mining techniques, and interpreting the output to derive meaningful insights.
Topics:
• Background, Knowledge Discovery from Data.
• Preprocessing and Preparation: data types, hypothesis testing, proximity measures, data quality, data cleaning, data integration, data reduction and transformation, dimensionality reduction, data exploration, and visualization.
• Classification: decision trees, naive Bayes, linear models, nearest-neighbor classifiers, support vector machines, and ensemble methods.
• Pattern Discovery: mining frequent patterns, association, compact representation of frequent itemsets, and evaluation.
• Advanced Pattern Discovery: sequences and graphs.
• Clustering: k-means, hierarchical clustering, spectral clustering, and density-based clustering.