Statistics Analysis

  1. This problem is to be done by hand. In the parts below, you will set up the primal
    Lagrangian and derive the dual Lagrangian for a support vector machine classifier, for the
    case of data that is linearly separable in the expanded feature space.
    For SVM learning, we stated the optimization problem as:
    Minimize J(w) = (1/2) ||w||^2
    s.t. zi (w^T ui + w0) − 1 ≥ 0  ∀i
    Assume our data is linearly separable in the expanded feature space. Also, nonaugmented
    notation is used throughout this problem.
    (a) If the above set of constraints (second line of the equations above) is satisfied, will all the
    training data be correctly classified?
    (b) Write the Lagrangian function L(w, w0, λ) for the minimization problem stated above. Use
    λi, i = 1, 2, …, N for the Lagrange multipliers. Also state the KKT conditions. (Hint:
    there are 3 KKT conditions.)
    (c) Derive the dual Lagrangian L_D, by proceeding as follows:
    (i) Minimize L w.r.t. the weights.
    Hint: solve ∇w L = 0 for the optimal weight w* (in terms of λi and other variables);
    and set ∂L/∂w0 = 0 and simplify.
    (ii) Substitute your expressions from part (i) into L, and use your expression from
    ∂L/∂w0 = 0 as a new constraint, to derive L_D as:
    L_D(λ) = ∑_{i=1}^N λi − (1/2) ∑_{i=1}^N ∑_{j=1}^N λi λj zi zj ui^T uj
    subject to the (new) constraint: ∑_{i=1}^N λi zi = 0, which becomes a new KKT condition.
    Also give the other two KKT conditions on λi, which carry over from the primal form.
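    For reference, a generic textbook pattern for a problem of this type (a sketch only, with
    placeholder f and gi; it is not the SVM-specific Lagrangian or KKT conditions you are asked
    to write out above, which you still need to specialize) is:
        \min_x f(x) \quad \text{s.t.} \quad g_i(x) \ge 0, \; i = 1, \dots, N
        L(x, \lambda) = f(x) - \sum_{i=1}^{N} \lambda_i \, g_i(x)
        \text{KKT:} \quad \lambda_i \ge 0, \qquad g_i(x) \ge 0, \qquad \lambda_i \, g_i(x) = 0 \quad \forall i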
  2. In this problem you will do SVM learning on a small set of given data points, using the
    result from Problem 1 above. Problem 2 also uses nonaugmented notation.
    Coding. This problem involves some work by hand, and some solution by coding. As in
    previous homework assignments, in this problem you are required to write the code
    yourself; you may use Python built-in functions, NumPy, and matplotlib; and you may use
    pandas only for reading and/or writing csv files. For the plotting, we are not supplying
    plotting function(s) for you; you may use this as an opportunity to (learn, explore, and) use
    matplotlib functions as needed. Alternatively, you may do the plots by hand if you prefer.
    Throughout this problem, let the dataset have N = 3 points. We will work entirely in the
    expanded feature space (u-space).
    (a) Derive by hand a set of equations, with λ, µ as variables, and ui, zi, i = 1, 2, 3 as given
    constants, that when solved will give the SVM decision boundary and regions. To do this,
    start the Lagrangian process to optimize L_D(λ) w.r.t. λ and µ, subject to the equality
    constraint ∑_{i=1}^N λi zi = 0. Set all the derivatives equal to 0 to obtain the set of equations.
    Hint: Start from
    L_D′(λ, µ) = ∑_{i=1}^N λi − (1/2) ∑_{i=1}^N ∑_{j=1}^N λi λj zi zj ui^T uj + µ ∑_{i=1}^N zi λi,
    in which the last term has been added to incorporate the equality constraint stated above.
    Finally, formulate your set of equations as a matrix-vector equation of the form:
    A ρ = b
    in which ρ = [λ^T, µ]^T. Give your expressions for A and b.
    Tip: If you’d like a quick check to see if you’re on the right track, try plugging in (by hand)
    for this simple dataset:
    u1 = (1, 0)^T, u2 = (0, 0)^T ∈ S1 (z1 = z2 = 1); u3 = (1, 1)^T ∈ S2 (z3 = −1).
    The first row of A should be [1  0  −1  −1] and the first entry of b should be 1.
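    If you prefer to script this quick check rather than do it by hand, a minimal sketch is below.
    It assumes you have already assembled A and b (from your own part (a) result) as NumPy
    arrays, with the row/entry ordering implied by the check above; the arrays U_tip, z_tip and
    the helper name quick_check are only illustrative.

        import numpy as np

        # Tip dataset above, in array form: rows of U_tip are u1, u2, u3 (u-space),
        # z_tip holds the class labels (+1 for S1, -1 for S2).
        U_tip = np.array([[1.0, 0.0],
                          [0.0, 0.0],
                          [1.0, 1.0]])
        z_tip = np.array([1.0, 1.0, -1.0])

        def quick_check(A, b):
            """Compare a candidate A, b against the first row / first entry stated in the tip."""
            first_row_ok = np.allclose(A[0], [1.0, 0.0, -1.0, -1.0])
            first_entry_ok = np.isclose(b[0], 1.0)
            return bool(first_row_ok and first_entry_ok)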
    In the parts below you will use a computer to solve this matrix equation for λ and µ, then
    calculate, plot, and interpret the results, for 3 different datasets.
    For parts (b)-(e) below, use the following dataset:
    u1 = (1, 2)^T, u2 = (2, 1)^T ∈ S1
    u3 = (0, 0)^T ∈ S2
    For all parts below, consider S1 to be the positive class (zi = +1).
    (b) Write code in Python to solve for λ and µ, calculate w* and w0, and check the KKT
    conditions.
    Tip: Use variables ui, zi, i = 1, 2, 3 such that you can input their values for different
    datasets (a minimal sketch of this setup follows the list below).
    Specifically, the code should:
    (i) Use NumPy to invert your matrix, and to calculate the resulting values for λ and µ.
    (ii) Check that the resulting λ satisfies the KKT conditions involving λ (but not
    involving w) that you stated in Problem 1(c)(ii).
    (iii) Calculate the optimal (nonaugmented) weight vector w* by using your result from
    Problem 1(c)(i). And, find the optimal bias term w0 using one of the KKT conditions
    from Problem 1(b).
    (iv) Check that the resulting w* and w0 satisfy the KKT conditions on w and w0 of Pr.
    1(c).
    (v) Output the results of (i)-(iv) above.
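    A minimal sketch of the NumPy mechanics for items (i) and (ii) is below (this is the sketch
    referred to in the Tip above). It assumes ρ stacks λ1, λ2, λ3 followed by µ, and that you
    supply A and b from your own part (a) derivation; the only checks shown are the two
    λ-only KKT conditions from Problem 1(c)(ii), and the w*, w0 steps of items (iii)-(v) are
    left to your own Problem 1 results.

        import numpy as np

        # Dataset for parts (b)-(e): rows of U are u1, u2, u3; z gives the class labels (S1 -> +1).
        U = np.array([[1.0, 2.0],
                      [2.0, 1.0],
                      [0.0, 0.0]])
        z = np.array([1.0, 1.0, -1.0])

        def solve_and_check(A, b, z):
            """(i) Solve A rho = b; (ii) check the KKT conditions that involve only lambda.

            Assumes rho = [lambda_1, ..., lambda_N, mu]; A and b come from your part (a) result.
            """
            rho = np.linalg.solve(A, b)       # same result as np.linalg.inv(A) @ b, but more stable
            lam, mu = rho[:-1], rho[-1]
            nonneg_ok = bool(np.all(lam >= -1e-9))    # lambda_i >= 0 for all i
            sum_ok = bool(np.isclose(lam @ z, 0.0))   # sum_i lambda_i z_i = 0
            return lam, mu, nonneg_ok, sum_ok

        # (iii)-(v): compute w* and w0 from your Problem 1(c)(i) / 1(b) results, check the
        # remaining KKT conditions, and print everything; those steps are not filled in here.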
    (c) Run your code on the given dataset; give the resulting values for λ and µ, and the results of
    the KKT-conditions check on λ.
    Also give the resulting w* and w0, and the results of the KKT-conditions check on w* and
    w0.
    (d) Plot in 2D nonaugmented feature (u) space: the data points showing their class labels, the
    decision boundary defined by w* and w0, and an arrow showing which side of the boundary
    is class S1. While you are encouraged to do (some or all of) the plot by computer, you can
    do some or all of it by hand if you prefer, as long as it is neat and reasonably accurate.
    (e) Looking at the plot, does the decision boundary correctly classify the training data?
    And if yes, does it look like the maximum-margin boundary that will correctly classify the
    data? Briefly justify.
    (f)(i)-(iii) Repeat parts (c)-(e) except for the following dataset:
    u1 = (1, 2)^T, u2 = (2, 1)^T ∈ S1
    u3′ = (1, 1)^T ∈ S2
    in which the third data point has been changed.
    (iv) Explain the change, or lack of change, in the decision boundary from (d) to (f).
    (g) (i) How do you think the boundary will change (relative to (f)) if we instead use the
    following data:
    u1 = (1, 2)^T, u2 = (2, 1)^T ∈ S1
    u3′′ = (0, 1.5)^T ∈ S2
    in which the third data point has (yet again) been changed?
    (ii)-(iv) Try it by repeating (c)-(e), except with these data points.¹
    (v) Explain any change or lack of change in the decision boundary as compared with (d)
    and (f).
    ¹ Tip: Note that the linear algebra approach we take to solve this problem has no way of
    enforcing the requirement λi ≥ 0 ∀i. After you check this requirement, if any data
    point uk has λk < 0, then you can set λk = 0 as a given constant, and then re-optimize
    the other Lagrange multipliers. Then check the KKT conditions again to verify they are
    satisfied.
    Why? With no nonnegativity condition on λ, the optimal solution effectively finds a
    point that is on the boundary of all inequality constraint regions (as in case (b) of
    Lecture 12 p. 10). If one of the constraints (say the one on data point uk) is already
    satisfied (as in case (a) on p. 9), the optimization proceeds as in case (b) to find a point
    on the boundary of that constraint region, which results in a negative Lagrange
    multiplier λk < 0. Resetting it to 0, then re-optimizing the other parameters, is a way of
    enforcing the λk ≥ 0 requirement.
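    If you end up needing this reset-and-re-optimize step in code, one possible sketch is below.
    It assumes ρ is ordered with the λi first and µ last (so dropping row/column k of A
    corresponds to fixing λk = 0), matching the ordering used in the earlier sketches; adapt it
    if your ρ is ordered differently.

        import numpy as np

        def reoptimize_with_lambda_k_zero(A, b, k):
            """Sketch of the footnote's procedure: fix lambda_k = 0 and re-solve for the rest.

            Assumes rho = [lambda_1, ..., lambda_N, mu] (0-based index k picks lambda_{k+1});
            removing row/column k of A and entry k of b is equivalent to substituting
            lambda_k = 0 and no longer optimizing over it.
            """
            n = len(b)
            keep = [i for i in range(n) if i != k]      # keep all other lambdas and mu
            rho_reduced = np.linalg.solve(A[np.ix_(keep, keep)], b[keep])
            rho = np.zeros(n)
            rho[keep] = rho_reduced                     # lambda_k stays fixed at 0
            return rho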
    Comment: this method of implementation (Pr. 2, using a matrix inverse) is useful for working
    with algebraic expressions and theory, and for seeing/verifying how things work. For larger
    datasets, other implementation methods are typically used, such as quadratic programming or
    variations on gradient descent designed for this type of optimization.