Information Technology (Semester 6)
TOTAL MARKS: 80
TOTAL TIME: 3 HOURS
(1) Question 1 is compulsory.
(2) Attempt any three from the remaining questions.
(3) Assume data if required.
(4) Figures to the right indicate full marks.
1(a) Define "Data Mining". Enumerate five example applications that can benefit from using Data Mining.(5 marks)
1(b) Clearly explain the data preprocessing phase for data mining.(5 marks)
1(c) Describe one hierarchical clustering algorithm using an example dendrogram.(5 marks)
1(d) Explain the concept of a decision support system with the help of an example application.(5 marks)
2(a) Partition the given data into 4 bins using the equi-depth binning method and perform smoothing according to the following methods:
Smoothing by bin means
Smoothing by bin medians
Smoothing by bin boundaries
Data: 11, 13, 13, 15, 15, 16, 19, 20, 20, 20, 21, 21, 22, 23, 24, 30, 40, 45, 45, 71, 72, 73, 75
(10 marks)
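A minimal sketch for checking an answer to 2(a). The data set has 23 values, which does not divide evenly into 4 bins, so the sketch assumes bin sizes of 6, 6, 6 and 5 (earlier bins absorb the remainder). It also assumes that, in boundary smoothing, a value equidistant from both boundaries moves to the lower one. The function names are illustrative; other conventions give slightly different answers.

```python
from statistics import mean, median

data = [11, 13, 13, 15, 15, 16, 19, 20, 20, 20, 21, 21,
        22, 23, 24, 30, 40, 45, 45, 71, 72, 73, 75]

def equi_depth_bins(values, k):
    """Split the sorted values into k bins whose sizes differ by at most one."""
    values = sorted(values)
    base, extra = divmod(len(values), k)
    bins, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)  # assumption: earlier bins take the remainder
        bins.append(values[start:start + size])
        start += size
    return bins

def smooth_by_mean(b):
    """Replace every value with the bin mean (rounded to 2 decimals)."""
    return [round(mean(b), 2)] * len(b)

def smooth_by_median(b):
    """Replace every value with the bin median."""
    return [median(b)] * len(b)

def smooth_by_boundaries(b):
    """Move every value to the nearer bin boundary; ties go to the lower one (assumption)."""
    lo, hi = b[0], b[-1]
    return [lo if v - lo <= hi - v else hi for v in b]

bins = equi_depth_bins(data, 4)
for b in bins:
    print(b, smooth_by_mean(b), smooth_by_median(b), smooth_by_boundaries(b))
```

Under these assumptions the first bin is 11, 13, 13, 15, 15, 16, and boundary smoothing maps it to 11, 11, 11, 16, 16, 16.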
2(b) For the same set of data points as in question 2(a):
a) Find the mean, median and mode.
b) Show a boxplot of the data, clearly indicating the five-number summary.(10 marks)
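A short sketch for checking the numbers in 2(b) with Python's standard `statistics` module. Quartile conventions differ between textbooks; the sketch assumes the "exclusive" method, which for these 23 values coincides with taking the medians of the lower and upper halves with the overall median left out.

```python
from statistics import mean, median, multimode, quantiles

data = [11, 13, 13, 15, 15, 16, 19, 20, 20, 20, 21, 21,
        22, 23, 24, 30, 40, 45, 45, 71, 72, 73, 75]

data_mean = mean(data)        # arithmetic mean
data_median = median(data)    # middle value of the sorted data
data_modes = multimode(data)  # all most-frequent values

# Quartiles under the "exclusive" convention (an assumption, see above).
q1, q2, q3 = quantiles(data, n=4, method="exclusive")
five_number = (min(data), q1, q2, q3, max(data))

print(round(data_mean, 2), data_median, data_modes)
print(five_number)
```

The five values in `five_number` are the minimum, Q1, median, Q3 and maximum, which are the whisker ends, box edges and centre line of the boxplot.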
3(a) The table below shows a sample dataset of whether a customer responds to a survey or not. "Outcome" is the class label. Construct a Naive Bayes classifier for the dataset. For a new example (Rural, Semi-detached, Low, No), what will be the predicted class label?
District | House Type | Income | Previous Customer | Outcome
Suburban | Detached | High | No | Nothing
Suburban | Detached | High | Yes | Nothing
Rural | Detached | High | No | Responded
Urban | Semi-detached | High | No | Responded
Urban | Semi-detached | Low | No | Responded
Urban | Semi-detached | Low | Yes | Nothing
Rural | Semi-detached | Low | Yes | Responded
Suburban | Terrace | High | No | Nothing
Suburban | Semi-detached | Low | No | Responded
Urban | Terrace | Low | No | Responded
Suburban | Terrace | Low | Yes | Responded
Rural | Terrace | High | Yes | Responded
Rural | Detached | Low | No | Responded
Urban | Terrace | High | Yes | Nothing
(10 marks)
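A minimal sketch of a Naive Bayes classifier for the dataset in 3(a), useful for checking hand-computed probabilities. It assumes plain relative-frequency estimates with no Laplace smoothing, so a value never seen with a class gives that class a score of zero. The function names are illustrative.

```python
from collections import Counter, defaultdict

# Rows: (District, House Type, Income, Previous Customer, Outcome)
rows = [
    ("Suburban", "Detached", "High", "No", "Nothing"),
    ("Suburban", "Detached", "High", "Yes", "Nothing"),
    ("Rural", "Detached", "High", "No", "Responded"),
    ("Urban", "Semi-detached", "High", "No", "Responded"),
    ("Urban", "Semi-detached", "Low", "No", "Responded"),
    ("Urban", "Semi-detached", "Low", "Yes", "Nothing"),
    ("Rural", "Semi-detached", "Low", "Yes", "Responded"),
    ("Suburban", "Terrace", "High", "No", "Nothing"),
    ("Suburban", "Semi-detached", "Low", "No", "Responded"),
    ("Urban", "Terrace", "Low", "No", "Responded"),
    ("Suburban", "Terrace", "Low", "Yes", "Responded"),
    ("Rural", "Terrace", "High", "Yes", "Responded"),
    ("Rural", "Detached", "Low", "No", "Responded"),
    ("Urban", "Terrace", "High", "Yes", "Nothing"),
]

def train(rows):
    """Count class frequencies and per-class attribute-value frequencies."""
    class_counts = Counter(r[-1] for r in rows)
    value_counts = defaultdict(Counter)  # (attribute index, class) -> value counts
    for *features, label in rows:
        for i, value in enumerate(features):
            value_counts[(i, label)][value] += 1
    return class_counts, value_counts

def class_scores(example, class_counts, value_counts):
    """Return P(class) * product of P(value | class) for each class (unnormalised)."""
    total = sum(class_counts.values())
    scores = {}
    for label, n in class_counts.items():
        p = n / total
        for i, value in enumerate(example):
            p *= value_counts[(i, label)][value] / n  # no smoothing (assumption)
        scores[label] = p
    return scores

class_counts, value_counts = train(rows)
scores = class_scores(("Rural", "Semi-detached", "Low", "No"), class_counts, value_counts)
predicted = max(scores, key=scores.get)
print(scores, predicted)
```

No "Nothing" row has District = Rural, so without smoothing that class scores zero and the prediction is "Responded".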
3(b) Briefly explain regression-based classifiers.(10 marks)
4(a) Use the Apriori algorithm to identify the frequent itemsets in the following database. Then extract the strong association rules from these sets. Min. Support = 30%, Min. Confidence = 75%
TID | Items
01 | A, B, D, E, F
02 | B, C, E
03 | A, B, D, E
04 | A, B, C, E
05 | A, B, C, D, E, F
06 | B, C, D
07 | A, B, D, E
(10 marks)
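A compact sketch of level-wise Apriori on the seven transactions of 4(a), intended only for checking a hand-worked answer. With 7 transactions, a 30% minimum support means an itemset must appear in at least 3 of them. The function names are illustrative, and the sketch recounts support by scanning all transactions, which is fine at this size.

```python
from itertools import combinations

transactions = [set("ABDEF"), set("BCE"), set("ABDE"), set("ABCE"),
                set("ABCDEF"), set("BCD"), set("ABDE")]
MIN_SUPPORT = 0.30
MIN_CONFIDENCE = 0.75

def support(itemset):
    """Fraction of transactions that contain every item of the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def apriori():
    """Return a dict mapping each frequent itemset to its support."""
    items = sorted({i for t in transactions for i in t})
    level = [frozenset([i]) for i in items if support(frozenset([i])) >= MIN_SUPPORT]
    frequent, k = {}, 1
    while level:
        for s in level:
            frequent[s] = support(s)
        k += 1
        # Join step: unions of two frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: keep a candidate only if all its (k-1)-subsets are frequent.
        level = [c for c in candidates
                 if all(frozenset(s) in frequent for s in combinations(c, k - 1))
                 and support(c) >= MIN_SUPPORT]
    return frequent

def strong_rules(frequent):
    """Return (antecedent, consequent, confidence) for rules meeting MIN_CONFIDENCE."""
    rules = []
    for itemset, supp in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                confidence = supp / frequent[antecedent]
                if confidence >= MIN_CONFIDENCE:
                    rules.append((antecedent, itemset - antecedent, confidence))
    return rules

frequent = apriori()
rules = strong_rules(frequent)
for a, c, conf in rules:
    print(sorted(a), "->", sorted(c), round(conf, 2))
```

Item F occurs in only 2 of the 7 transactions, so it is dropped at the first level and never appears in a candidate.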
4(b) Explain multidimensional and multi-level association rules with examples.(10 marks)
5(a) What is clustering? Explain the k-means clustering algorithm. Suppose the data for clustering is {2, 4, 10, 12, 3, 20, 11, 25}. Considering k=2, cluster the given data using the k-means algorithm.(10 marks)
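A minimal one-dimensional k-means sketch for the data in 5(a). The question does not fix the initial centroids, so the sketch assumes the first two data points (2 and 4) as the starting centroids; k-means converges to a local optimum, so a different starting pair can produce a different final clustering. The function name is illustrative.

```python
def kmeans_1d(points, centroids, max_iter=100):
    """Alternate assignment and centroid update until the centroids stop changing."""
    clusters = []
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid becomes the mean of its cluster.
        new_centroids = [sum(c) / len(c) if c else centroids[i]
                         for i, c in enumerate(clusters)]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters

points = [2, 4, 10, 12, 3, 20, 11, 25]
centroids, clusters = kmeans_1d(points, [2, 4])  # initial centroids are an assumption
print(centroids, clusters)
```

With this starting choice the run settles on the clusters {2, 4, 3} and {10, 12, 20, 11, 25}, with centroids 3 and 15.6.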
5(b) What is an outlier? Describe methods that can be used for outlier analysis.(10 marks)
6(a) Consider the following case study: A telecom company wants to analyze and improve its performance by introducing a series of innovative mobile payment plans. For this case study, design a BI system, clearly explaining all steps from data collection to decision making.(10 marks)
6(b) Clearly explain the working of the DBSCAN algorithm using appropriate diagrams.(10 marks)