JNTU B.Tech 4th Year DATA WAREHOUSING AND DATA MINING Apr/May 2008
Time: 3 hours Max Marks: 80
Answer any FIVE Questions
All Questions carry equal marks
1. (a) Draw and explain the architecture for on-line analytical mining.
(b) Briefly discuss the data warehouse applications. [8+8]
2. Briefly discuss the role of data cube aggregation and dimension reduction in the data reduction process. [16]
3. Write the syntax for the following data mining primitives:
(a) Task-relevant data.
(b) Concept hierarchies. 
4. Write short notes on the following:
(a) Measuring the central tendency.
(b) Measuring the dispersion of data. 
5. (a) Write the FP-growth algorithm. Explain.
(b) What is an iceberg query? Explain with example. [10+6]
6. (a) What is classification? What is prediction?
(b) What is Bayes' theorem? Explain naive Bayesian classification.
(c) Discuss k-nearest neighbor classifiers and case-based reasoning. [4+6+6]
7. (a) Given the following measurements for the variable age:
18, 22, 25, 42, 28, 43, 33, 35, 56, 28
Standardize the variable by the following:
i. Compute the mean absolute deviation of age.
ii. Compute the Z-score for the first four measurements.
(b) What is a distance-based outlier? What are efficient algorithms for mining distance-based outliers? How are outliers determined in this method? [4+4+2+3+3]
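Worked sketch for question 7(a). This is only one possible reading of the question: it assumes the Z-score in part ii is standardized by the mean absolute deviation from part i rather than by the standard deviation. The function names `mean_absolute_deviation` and `z_scores` are illustrative and do not come from the paper.

```python
# Age measurements as given in question 7(a).
ages = [18, 22, 25, 42, 28, 43, 33, 35, 56, 28]


def mean_absolute_deviation(values):
    """Average absolute distance of each value from the mean."""
    mean = sum(values) / len(values)
    return sum(abs(x - mean) for x in values) / len(values)


def z_scores(values, count):
    """Z-scores of the first `count` values.

    Assumption: the spread measure is the mean absolute deviation,
    not the standard deviation.
    """
    mean = sum(values) / len(values)
    spread = mean_absolute_deviation(values)
    return [(x - mean) / spread for x in values[:count]]


if __name__ == "__main__":
    print("mean absolute deviation:", mean_absolute_deviation(ages))
    print("z-scores (first four):", [round(z, 2) for z in z_scores(ages, 4)])
```

Under these assumptions the mean of age is 33 and the mean absolute deviation is 8.8, so the first four measurements standardize to roughly -1.70, -1.25, -0.91, and 1.02.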
8. An e-mail database is a database that stores a large number of electronic mail messages. It can be viewed as a semistructured database consisting mainly of text data. Discuss the following:
(a) How can such an e-mail database be structured so as to facilitate multidimensional search, such as by sender, by receiver, by subject, by time, and so on?
(b) What can be mined from such an e-mail database?
(c) Suppose you have roughly classified a set of your previous e-mail messages as junk, unimportant, normal, or important. Describe how a data mining system may take this as the training set to automatically classify new or unclassified e-mail messages. [5+5+6]