Bachelor of Information Technology
IT3040 – Introduction to Data Mining
Most enterprises, including governments, are accumulating data at an incredible rate thanks to a host of technological advances such as the internet, e-commerce, electronic banking, point-of-sale devices, bar-code readers, and intelligent machines. Data mining is about extracting useful information from such mountains of data. This subject includes the following topics:
Introduction: what data mining is, the data mining process, data mining techniques, and some brief case studies. Data dredging, data fishing, and data scrubbing.
Association rules mining (ARM): the ARM task and naïve algorithms, the Apriori algorithm, improving efficiency, the hashing algorithm, and evaluation of algorithms.
Supervised classification: the task, decision trees and the tree-building algorithm, split algorithms, the naïve Bayes method, evaluating classification methods, and improving the accuracy of classification methods.
Cluster analysis: defining the task, desired features of cluster analysis, types of data, computing distance, partitional methods, hierarchical methods, density-based methods, and the quality and validity of results.
Web data mining: terminology and characteristics, locality and hierarchy in the web, web content mining, web structure mining, and web usage mining. Search engines and query mining.
Data warehousing: what a data warehouse is, operational data stores, ETL, warehouse design, guidelines for design, and metadata.
