Data Mining and Warehousing Question Paper - Dec 18 - Computer Engineering (Semester 7) - Pune University (PU)

Data Mining and Warehousing - Dec 18

Computer Engineering (Semester 7)

Total marks: 80
Total time: 3 Hours
INSTRUCTIONS
(1) Question 1 is compulsory.
(2) Attempt any three from the remaining questions.
(3) Draw neat diagrams wherever necessary.

1.a. For the given attribute AGE values: 16, 16, 180, 4, 12, 24, 26, 28, apply the following binning techniques for smoothing the noise.

i) Bin Medians

ii) Bin Boundaries

iii) Bin Means

(6 marks)
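A hand calculation is expected in the exam, but the answer can be cross-checked with a short Python sketch. Assumption: equal-depth (equal-frequency) bins of size 4, since the question does not fix a bin depth; ties in the boundary method go to the lower boundary.

```python
from statistics import mean, median

values = sorted([16, 16, 180, 4, 12, 24, 26, 28])
depth = 4  # assumed bin depth; the question does not state one
bins = [values[i:i + depth] for i in range(0, len(values), depth)]

# Replace every value in a bin by the bin mean / median / nearest boundary.
smoothed_means = [[mean(b)] * len(b) for b in bins]
smoothed_medians = [[median(b)] * len(b) for b in bins]
smoothed_bounds = [[b[0] if v - b[0] <= b[-1] - v else b[-1] for v in b]
                   for b in bins]

print(bins)              # [[4, 12, 16, 16], [24, 26, 28, 180]]
print(smoothed_means)    # bin means 12 and 64.5
print(smoothed_medians)  # bin medians 14.0 and 27.0
print(smoothed_bounds)   # [[4, 16, 16, 16], [24, 24, 24, 180]]
```

Note that the outlier 180 survives boundary smoothing (it is itself a boundary) but is heavily damped by the bin mean.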

1.b. Differentiate between Star schema and Snowflake schema.
(6 marks)

1.c. Calculate the Jaccard coefficient between Ram and Hari, assuming that all binary attributes are asymmetric and that, for each pair of values an attribute can take, the first one is more frequent than the second.

| Object | Gender | Food | Caste | Education | Hobby | Job |
| :----- | :----: | :--: | :---: | :-------: | :---: | :--: |
| Hari | M(1) | V(1) | M(0) | L(1) | C(0) | N(0) |
| Ram | M(1) | N(0) | M(0) | I(0) | T(1) | N(0) |
| Tomi | F(0) | N(0) | H(1) | L(1) | C(0) | Y(1) |

(8 marks)
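For asymmetric binary attributes the Jaccard coefficient is J = q / (q + r + s), where q counts attributes that are 1 for both objects and r, s count the mismatches; 0-0 matches are ignored. A minimal sketch using the encoded vectors from the table:

```python
# Encoded vectors from the table (1 = the more frequent value):
# Gender, Food, Caste, Education, Hobby, Job
hari = [1, 1, 0, 1, 0, 0]
ram  = [1, 0, 0, 0, 1, 0]

q  = sum(h == 1 and r == 1 for h, r in zip(hari, ram))  # both 1
r_ = sum(h == 1 and r == 0 for h, r in zip(hari, ram))  # Hari 1, Ram 0
s  = sum(h == 0 and r == 1 for h, r in zip(hari, ram))  # Hari 0, Ram 1

jaccard = q / (q + r_ + s)  # 0-0 matches are excluded for asymmetric attributes
print(jaccard)  # 0.25
```

Here q = 1 (Gender), r = 2 (Food, Education), s = 1 (Hobby), giving J(Hari, Ram) = 1/4 = 0.25.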

OR

2.a. Explain the following attribute types with examples.

i) Ordinal

ii) Binary

iii) Nominal

(6 marks)

2.b. Differentiate between OLTP and OLAP with examples.
(6 marks)

2.c. Calculate the Euclidean distance matrix for the given data points.

| Point | x | y |
| :---: | :-: | :-: |
| p1 | 0 | 2 |
| p2 | 2 | 0 |
| p3 | 3 | 1 |
| p4 | 5 | 1 |

(8 marks)
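The distance matrix can be verified with a few lines of Python (`math.dist` requires Python 3.8+):

```python
from math import dist  # Euclidean distance between two points

points = {"p1": (0, 2), "p2": (2, 0), "p3": (3, 1), "p4": (5, 1)}
names = list(points)

# Symmetric matrix of pairwise Euclidean distances, rounded to 3 decimals.
matrix = [[round(dist(points[a], points[b]), 3) for b in names] for a in names]
for name, row in zip(names, matrix):
    print(name, row)
# p1 [0.0, 2.828, 3.162, 5.099]
# p2 [2.828, 0.0, 1.414, 3.162]
# p3 [3.162, 1.414, 0.0, 2.0]
# p4 [5.099, 3.162, 2.0, 0.0]
```

In the exam the same values come from d(p, q) = sqrt((px - qx)^2 + (py - qy)^2), e.g. d(p1, p2) = sqrt(4 + 4) ≈ 2.828.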

3.a. A database has 6 transactions. Let minimum support = 60% and minimum confidence = 70%.

| Transaction ID | Items Bought |
| :---: | :--- |
| T1 | {A, B, C, E} |
| T2 | {A, C, D, E} |
| T3 | {B, C, E} |
| T4 | {A, C, D, E} |
| T5 | {C, D, E} |
| T6 | {A, D, E} |

i) Find closed frequent itemsets.

ii) Find maximal frequent itemsets.

iii) Construct the FP-tree using the FP-growth algorithm.

(8 marks)
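Parts i) and ii) can be cross-checked with a brute-force Python sketch (this is not FP-growth itself; the FP-tree in part iii) still has to be drawn by hand). With 6 transactions, 60% support means a support count of at least 4:

```python
from itertools import combinations

db = [{"A", "B", "C", "E"}, {"A", "C", "D", "E"}, {"B", "C", "E"},
      {"A", "C", "D", "E"}, {"C", "D", "E"}, {"A", "D", "E"}]
min_count = 4  # 60% of 6 transactions, rounded up

items = sorted(set().union(*db))
support = {}
for k in range(1, len(items) + 1):
    for cand in combinations(items, k):
        count = sum(set(cand) <= t for t in db)
        if count >= min_count:
            support[frozenset(cand)] = count

# Closed: frequent, with no proper superset having the same support.
closed = [s for s in support
          if not any(s < t and support[t] == support[s] for t in support)]
# Maximal: frequent, with no proper superset that is frequent at all.
maximal = [s for s in support if not any(s < t for t in support)]
```

This yields frequent itemsets {A}, {C}, {D}, {E}, {A,E}, {C,E}, {D,E}; closed frequent itemsets {E}, {A,E}, {C,E}, {D,E}; and maximal frequent itemsets {A,E}, {C,E}, {D,E}.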

3.b. Explain, with examples, multilevel and constraint-based association rule mining.
(5 marks)

3.c. How can we improve the efficiency of the Apriori algorithm?
(4 marks)

OR

4.a. Consider the market basket transactions shown below, assuming minimum support = 50% and minimum confidence = 80%.

i) Find all frequent itemsets using the Apriori algorithm.

ii) Find all association rules using the Apriori algorithm.

| Transaction ID | Items Bought |
| :---: | :--- |
| T1 | {Mango, Apple, Banana, Dates} |
| T2 | {Apple, Dates, Coconut, Banana, Fig} |
| T3 | {Apple, Coconut, Banana, Fig} |
| T4 | {Apple, Banana, Dates} |

(8 marks)
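Again the exam expects the level-wise Apriori hand trace, but a brute-force Python sketch can confirm the frequent itemsets and rules. With 4 transactions, 50% support means a count of at least 2:

```python
from itertools import combinations

db = [{"Mango", "Apple", "Banana", "Dates"},
      {"Apple", "Dates", "Coconut", "Banana", "Fig"},
      {"Apple", "Coconut", "Banana", "Fig"},
      {"Apple", "Banana", "Dates"}]
min_count, min_conf = 2, 0.8  # 50% of 4 transactions; 80% confidence

items = sorted(set().union(*db))
supp = {frozenset(c): sum(set(c) <= t for t in db)
        for k in range(1, len(items) + 1)
        for c in combinations(items, k)}
frequent = {s: c for s, c in supp.items() if c >= min_count}

# Rules X -> Y from each frequent itemset s = X ∪ Y with conf = supp(s)/supp(X).
rules = []
for s in frequent:
    if len(s) < 2:
        continue
    for k in range(1, len(s)):
        for lhs in map(frozenset, combinations(sorted(s), k)):
            conf = frequent[s] / supp[lhs]
            if conf >= min_conf:
                rules.append((set(lhs), set(s - lhs), round(conf, 2)))
```

For example, {Dates} → {Apple, Banana} holds with confidence 3/3 = 100%, since every transaction containing Dates also contains Apple and Banana.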

4.b. Explain the FP-growth algorithm with an example.
(5 marks)

4.c. Explain the following measures used in association rule mining:

i) Minimum Support

ii) Minimum Confidence

iii) Support

iv) Confidence

(4 marks)

5.a. Explain the training and testing phases using a Decision Tree in detail. Support your answer with a relevant example.
(8 marks)

5.b. Apply the KNN algorithm to find the class of a new tissue paper sample (X1 = 3, X2 = 7). Assume K = 3.

| X1 = Acid Durability (secs) | X2 = Strength (kg/sq. meter) | Y = Classification |
| :---: | :---: | :---: |
| 7 | 7 | Bad |
| 7 | 4 | Bad |
| 3 | 4 | Good |
| 1 | 4 | Good |

(5 marks)
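A minimal KNN sketch for this data (Euclidean distance, majority vote among the K nearest training samples):

```python
from collections import Counter
from math import dist

data = [((7, 7), "Bad"), ((7, 4), "Bad"), ((3, 4), "Good"), ((1, 4), "Good")]
query, k = (3, 7), 3

# Sort training samples by distance to the query and keep the k nearest.
neighbors = sorted(data, key=lambda p: dist(p[0], query))[:k]
label = Counter(lbl for _, lbl in neighbors).most_common(1)[0][0]
print(label)  # Good
```

The three nearest neighbors are (3, 4) at distance 3, (1, 4) at distance sqrt(13) ≈ 3.61, and (7, 7) at distance 4, so the majority vote (2 Good vs 1 Bad) classifies the new sample as Good.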

5.c. Explain the use of a regression model in the prediction of real estate prices.
(4 marks)

OR

6.a. What is a Bayesian Belief Network? Elaborate on the training process of a Bayesian Belief Network with a suitable example.
(8 marks)

6.b. Explain the K-nearest neighbor classifier algorithm with a suitable application.
(5 marks)

6.c. Elaborate on Associative Classification with appropriate applications.
(4 marks)

7.a. Discuss the Sequential Covering algorithm in detail.
(8 marks)

7.b. Explain the following measures for evaluating classifier accuracy:

i) Specificity

ii) Sensitivity

(4 marks)
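Both measures come from the confusion matrix: sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP). A tiny sketch with made-up counts (the numbers below are hypothetical, for illustration only):

```python
# Hypothetical confusion-matrix counts, assumed for illustration only.
TP, FN, FP, TN = 40, 10, 5, 45

sensitivity = TP / (TP + FN)  # true positive rate: 40 / 50
specificity = TN / (TN + FP)  # true negative rate: 45 / 50
print(sensitivity, specificity)  # 0.8 0.9
```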

7.c. Differentiate between Wholistic learning and Multi-perspective learning.
(4 marks)

OR

8.a. How is the performance of classifier algorithms evaluated? Discuss in detail.
(8 marks)

8.b. Discuss the relevance of reinforcement learning and its applications in real-time environments.
(4 marks)

8.c. Explain the following measures for evaluating classifier accuracy:

i) Recall

ii) Precision

(4 marks)
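These two are also confusion-matrix ratios: precision = TP / (TP + FP) and recall = TP / (TP + FN). A sketch with hypothetical counts (assumed, for illustration only):

```python
# Hypothetical counts, assumed for illustration only.
TP, FP, FN = 30, 10, 20

precision = TP / (TP + FP)  # fraction of predicted positives that are correct
recall = TP / (TP + FN)     # fraction of actual positives that are retrieved
print(precision, recall)  # 0.75 0.6
```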
