Data mining

1. Show that the entropy of a node never increases after splitting it into smaller successor nodes.
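The full proof rests on the concavity of the entropy function (Jensen's inequality): the size-weighted average of the children's entropies can never exceed the parent's entropy. A minimal numerical sketch of the inequality, using hypothetical counts that are not part of the exercise:

```python
from math import log2

def entropy(counts):
    """Entropy of a class distribution given as raw class counts."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)

# Hypothetical parent node with class counts (c1, c2) and a binary split.
parent = (100, 100)
children = [(30, 70), (70, 30)]  # counts in the two successor nodes

parent_entropy = entropy(parent)
n = sum(parent)
# Average of the child entropies, weighted by node size.
split_entropy = sum(sum(ch) / n * entropy(ch) for ch in children)

print(parent_entropy, split_entropy)  # the split entropy never exceeds the parent's
assert split_entropy <= parent_entropy
```

Equality holds when the split leaves the class proportions unchanged in every child.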

2. Compute a two-level decision tree using the greedy approach described in this chapter. Use the classification error rate as the criterion for splitting. What is the overall error rate of the induced tree?

Note: To determine the test condition at the root node, you first need to compute the error rates for attributes X, Y, and Z.

For attribute X the corresponding counts are:

x    c1    c2
0    60    60
1    40    40

For Y the corresponding counts are:

y    c1    c2
0    40    60
1    60    40

For Z the corresponding counts are:

z    c1    c2
0    30    70
1    70    30
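As a cross-check for the root-level computation, the weighted classification error of each candidate split can be sketched as below (counts taken from the tables above; the second-level splits still have to be derived from the full training set, which is not reproduced here):

```python
def split_error(counts_per_value):
    """Weighted classification error rate of a split.
    counts_per_value: list of (c1, c2) class counts, one per attribute value.
    Each leaf predicts its majority class, so its error is the minority count."""
    total = sum(c1 + c2 for c1, c2 in counts_per_value)
    errors = sum(min(c1, c2) for c1, c2 in counts_per_value)
    return errors / total

# Class counts from the tables above.
X = [(60, 60), (40, 40)]
Y = [(40, 60), (60, 40)]
Z = [(30, 70), (70, 30)]

for name, counts in [("X", X), ("Y", Y), ("Z", Z)]:
    print(name, split_error(counts))
# X: 0.5, Y: 0.4, Z: 0.3 -- so Z yields the lowest error rate at the root.
```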

3. Consider a binary classification problem with the following set of attributes and attribute values:

Air Conditioner = {Working, Broken}
Engine = {Good, Bad}
Mileage = {High, Medium, Low}
Rust = {Yes, No}
Suppose a rule-based classifier produces the following rule set:

Mileage = High → Value = Low

Mileage = Low → Value = High

Air Conditioner = Working, Engine = Good → Value = High

Air Conditioner = Working, Engine = Bad → Value = Low

Air Conditioner = Broken → Value = Low

(a) Are the rules mutually exclusive?

(b) Is the rule set exhaustive?

(c) Is ordering needed for this set of rules?

(d) Do you need a default class for the rule set?
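Questions (a) and (b) can be checked mechanically by enumerating all 2 × 2 × 3 × 2 = 24 attribute combinations and counting how many rules fire on each. A sketch, with the rule set encoded as hypothetical Python dictionaries:

```python
from itertools import product

# Each rule is (conditions, predicted Value); encoding is ours, not the book's.
rules = [
    ({"Mileage": "High"}, "Low"),
    ({"Mileage": "Low"}, "High"),
    ({"Air Conditioner": "Working", "Engine": "Good"}, "High"),
    ({"Air Conditioner": "Working", "Engine": "Bad"}, "Low"),
    ({"Air Conditioner": "Broken"}, "Low"),
]

attributes = {
    "Air Conditioner": ["Working", "Broken"],
    "Engine": ["Good", "Bad"],
    "Mileage": ["High", "Medium", "Low"],
    "Rust": ["Yes", "No"],
}

names = list(attributes)
mutually_exclusive = True
exhaustive = True
for values in product(*attributes.values()):
    record = dict(zip(names, values))
    fired = sum(1 for cond, _ in rules
                if all(record[a] == v for a, v in cond.items()))
    if fired > 1:
        mutually_exclusive = False  # more than one rule covers this record
    if fired == 0:
        exhaustive = False          # no rule covers this record

print(mutually_exclusive, exhaustive)  # prints: False True
```

For example, a record with Mileage = High, Air Conditioner = Working, and Engine = Good triggers both the first and the third rule, so the set is not mutually exclusive, while every record matches at least one rule, so it is exhaustive.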

4. Consider the one-dimensional data set shown below:

X    0.5   3.0   4.5   4.6   4.9   5.2   5.3   5.5   7.0   9.5
Y    -     -     +     +     +     -     -     +     -     -

(a) Classify the data point x = 5.0 according to its 1-, 3-, 5-, and 9-nearest neighbors (using majority vote).

(b) Repeat the previous analysis using the distance-weighted voting approach.
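Both voting schemes can be sketched as follows. The labels in `train_y` are an assumption, since the class row of the table is only partially legible (the four '+' entries are taken to fall on x = 4.5, 4.6, 4.9, and 5.5); the weighted scheme uses the common 1/d² weighting, so treat the predictions as illustrative of the mechanics rather than authoritative answers.

```python
from collections import defaultdict

def knn_predict(train_x, train_y, query, k, weighted=False):
    """Predict the class of `query` from its k nearest neighbors.
    weighted=False: simple majority vote.
    weighted=True:  each neighbor votes with weight 1 / d**2
    (assumes query does not coincide with a training point, so d > 0)."""
    neighbors = sorted(zip(train_x, train_y),
                       key=lambda p: abs(p[0] - query))[:k]
    votes = defaultdict(float)
    for x, label in neighbors:
        d = abs(x - query)
        votes[label] += 1.0 / d**2 if weighted else 1.0
    return max(votes, key=votes.get)

train_x = [0.5, 3.0, 4.5, 4.6, 4.9, 5.2, 5.3, 5.5, 7.0, 9.5]
# Placeholder labels -- the original table is only partially legible.
train_y = ["-", "-", "+", "+", "+", "-", "-", "+", "-", "-"]

for k in (1, 3, 5, 9):
    print(k,
          knn_predict(train_x, train_y, 5.0, k),
          knn_predict(train_x, train_y, 5.0, k, weighted=True))
```

Under these labels, majority vote and distance-weighted vote can disagree: at k = 3 the two nearest '-' points (5.2 and 5.3) outnumber the single '+' at 4.9, but the '+' at distance 0.1 carries weight 1/0.01 = 100 and dominates the weighted vote.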
