Statistics Analysis

  1. This problem is to be done by hand. In the parts below, you will set up the primal
    Lagrangian and derive the dual Lagrangian for a support vector machine classifier, for the
    case of data that is linearly separable in the expanded feature space.
    For SVM learning, we stated the optimization problem as:
    Minimize J(w) = (1/2) ||w||^2
    s.t. zi (w^T ui + w0) − 1 ≥ 0  ∀i
    Assume our data is linearly separable in the expanded feature space. Also, nonaugmented
    notation is used throughout this problem.
    (a) If the above set of constraints (second line of the equations above) is satisfied, will all the
    training data be correctly classified?
    (b) Write the Lagrangian function L(w, w0, λ) for the minimization problem stated above. Use
    λi, i = 1, 2, …, N for the Lagrange multipliers. Also state the KKT conditions. (Hint:
    there are 3 KKT conditions.)
    (c) Derive the dual Lagrangian L_D, by proceeding as follows:
    (i) Minimize L w.r.t. the weights.
    Hint: solve ∇w L = 0 for the optimal weight w* (in terms of λi and other variables);
    and set ∂L/∂w0 = 0 and simplify.
    (ii) Substitute your expressions from part (i) into L, and use your expression from
    ∂L/∂w0 = 0 as a new constraint, to derive L_D as:
    L_D(λ) = ∑_{i=1}^N λi − (1/2) ∑_{i=1}^N ∑_{j=1}^N λi λj zi zj ui^T uj
    subject to the (new) constraint: ∑_{i=1}^N λi zi = 0, which becomes a new KKT condition.
    Also give the other two KKT conditions on λi, which carry over from the primal form.
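    For reference, a generic textbook pattern for a problem of this type (a sketch only, with
    placeholder f and gi; it is not the SVM-specific Lagrangian or KKT conditions you are asked
    to write out above, which you still need to specialize) is:
        \min_x f(x) \quad \text{s.t.} \quad g_i(x) \ge 0, \; i = 1, \dots, N
        L(x, \lambda) = f(x) - \sum_{i=1}^{N} \lambda_i \, g_i(x)
        \text{KKT:} \quad \lambda_i \ge 0, \qquad g_i(x) \ge 0, \qquad \lambda_i \, g_i(x) = 0 \quad \forall i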
  2. In this problem you will do SVM learning on a small set of given data points, using the
    result from Problem 1 above. Problem 2 also uses nonaugmented notation.
    Coding. This problem involves some work by hand, and some solution by coding. As in
    previous homework assignments, in this problem you are required to write the code
    yourself; you may use Python built-in functions, NumPy, and matplotlib; and you may use
    pandas only for reading and/or writing csv files. For the plotting, we are not supplying
    plotting function(s) for you; you may use this as an opportunity to (learn, explore, and) use
    matplotlib functions as needed. Alternatively, you may do the plots by hand if you prefer.
    Throughout this problem, let the dataset have N = 3 points. We will work entirely in the
    expanded feature space (u-space).
    (a) Derive by hand a set of equations, with λ, µ as variables, and ui, zi, i = 1, 2, 3 as given
    constants, that when solved will give the SVM decision boundary and regions. To do this,
    start the Lagrangian process to optimize L_D(λ) w.r.t. λ and µ, subject to the equality
    constraint ∑_{i=1}^N λi zi = 0. Set all the derivatives equal to 0 to obtain the set of equations.
    Hint: Start from
    L_D′(λ, µ) = ∑_{i=1}^N λi − (1/2) ∑_{i=1}^N ∑_{j=1}^N λi λj zi zj ui^T uj + µ ∑_{i=1}^N zi λi,
    in which the last term has been added to incorporate the equality constraint stated above.
    Finally, formulate your set of equations as a matrix-vector equation of the form:
    A ρ = b
    in which ρ = [λ^T, µ]^T. Give your expressions for A and b.
    Tip: If you’d like a quick check to see if you’re on the right track, try plugging in (by hand)
    for this simple dataset:
    u1 = (1, 0)^T, u2 = (0, 0)^T ∈ S1 (z1 = z2 = 1); u3 = (1, 1)^T ∈ S2 (z3 = −1).
    The first row of A should be [1  0  −1  −1] and the first entry of b should be 1.
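    If you prefer to script this quick check rather than do it by hand, a minimal sketch is below.
    It assumes you have already assembled A and b (from your own part (a) result) as NumPy
    arrays, with the row/entry ordering implied by the check above; the arrays U_tip, z_tip and
    the helper name quick_check are only illustrative.

        import numpy as np

        # Tip dataset above, in array form: rows of U_tip are u1, u2, u3 (u-space),
        # z_tip holds the class labels (+1 for S1, -1 for S2).
        U_tip = np.array([[1.0, 0.0],
                          [0.0, 0.0],
                          [1.0, 1.0]])
        z_tip = np.array([1.0, 1.0, -1.0])

        def quick_check(A, b):
            """Compare a candidate A, b against the first row / first entry stated in the tip."""
            first_row_ok = np.allclose(A[0], [1.0, 0.0, -1.0, -1.0])
            first_entry_ok = np.isclose(b[0], 1.0)
            return bool(first_row_ok and first_entry_ok)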
    In the parts below you will use a computer to solve this matrix equation for λ and µ, then
    calculate, plot, and interpret the results, for 3 different datasets.
    For parts (b)-(e) below, use the following dataset:
    u1 = (1, 2)^T, u2 = (2, 1)^T ∈ S1
    u3 = (0, 0)^T ∈ S2
    For all parts below, consider S1 to be the positive class (zi = +1).
    (b) Write code in Python to solve for λ and µ, calculate w* and w0, and check the KKT
    conditions.
    Tip: Use variables ui, zi, i = 1, 2, 3 such that you can input their values for different
    datasets (a minimal sketch of this setup follows the list below).
    Specifically, the code should:
    (i) Use NumPy to invert your matrix, and to calculate the resulting values for λ and µ.
    (ii) Check that the resulting λ satisfies the KKT conditions involving λ (but not
    involving w) that you stated in Problem 1(c)(ii).
    (iii) Calculate the optimal (nonaugmented) weight vector w* by using your result from
    Problem 1(c)(i). And, find the optimal bias term w0 using one of the KKT conditions
    from Problem 1(b).
    (iv) Check that the resulting w* and w0 satisfy the KKT conditions on w and w0 of Pr.
    1(c).
    (v) Output the results of (i)-(iv) above.
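    A minimal sketch of the NumPy mechanics for items (i) and (ii) is below (this is the sketch
    referred to in the Tip above). It assumes ρ stacks λ1, λ2, λ3 followed by µ, and that you
    supply A and b from your own part (a) derivation; the only checks shown are the two
    λ-only KKT conditions from Problem 1(c)(ii), and the w*, w0 steps of items (iii)-(v) are
    left to your own Problem 1 results.

        import numpy as np

        # Dataset for parts (b)-(e): rows of U are u1, u2, u3; z gives the class labels (S1 -> +1).
        U = np.array([[1.0, 2.0],
                      [2.0, 1.0],
                      [0.0, 0.0]])
        z = np.array([1.0, 1.0, -1.0])

        def solve_and_check(A, b, z):
            """(i) Solve A rho = b; (ii) check the KKT conditions that involve only lambda.

            Assumes rho = [lambda_1, ..., lambda_N, mu]; A and b come from your part (a) result.
            """
            rho = np.linalg.solve(A, b)       # same result as np.linalg.inv(A) @ b, but more stable
            lam, mu = rho[:-1], rho[-1]
            nonneg_ok = bool(np.all(lam >= -1e-9))    # lambda_i >= 0 for all i
            sum_ok = bool(np.isclose(lam @ z, 0.0))   # sum_i lambda_i z_i = 0
            return lam, mu, nonneg_ok, sum_ok

        # (iii)-(v): compute w* and w0 from your Problem 1(c)(i) / 1(b) results, check the
        # remaining KKT conditions, and print everything; those steps are not filled in here.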
    (c) Run your code on the given dataset; give the resulting values for λ and µ, and the results of
    the KKT-conditions check on λ.
    Also give the resulting w* and w0, and the results of the KKT-conditions check on w* and
    w0.
    (d) Plot in 2D nonaugmented feature (u) space: the data points showing their class labels, the
    decision boundary defined by w* and w0, and an arrow showing which side of the boundary
    is class S1. While you are encouraged to do (some or all of) the plot by computer, you can
    do some or all of it by hand if you prefer, as long as it is neat and reasonably accurate.
    (e) Looking at the plot, does the decision boundary correctly classify the training data?
    And if yes, does it look like the maximum-margin boundary that will correctly classify the
    data? Briefly justify.
    (f)(i)-(iii) Repeat parts (c)-(e) except for the following dataset:
    u1 = (1, 2)^T, u2 = (2, 1)^T ∈ S1
    u3′ = (1, 1)^T ∈ S2
    in which the third data point has been changed.
    (iv) Explain the change, or lack of change, in the decision boundary from (d) to (f).
    (g) (i) How do you think the boundary will change (relative to (f)) if we instead use the
    following data:
    u1 = (1, 2)^T, u2 = (2, 1)^T ∈ S1
    u3′′ = (0, 1.5)^T ∈ S2
    in which the third data point has (yet again) been changed?
    (ii)-(iv) Try it by repeating (c)-(e), except with these data points.¹
    (v) Explain any change or lack of change in the decision boundary as compared with (d)
    and (f).
    ¹ Tip: Note that the linear algebra approach we take to solve this problem has no way of
    enforcing the requirement λi ≥ 0 ∀i. After you check this requirement, if any data
    point uk has λk < 0, then you can set λk = 0 as a given constant, and then re-optimize
    the other Lagrange multipliers. Then check the KKT conditions again to verify they are
    satisfied.
    Why? With no nonnegativity condition on λ, the optimal solution effectively finds a
    point that is on the boundary of all inequality constraint regions (as in case (b) of
    Lecture 12 p. 10). If one of the constraints (say the one on data point uk) is already
    satisfied (as in case (a) on p. 9), the optimization proceeds as in case (b) to find a point
    on the boundary of that constraint region, which results in a negative Lagrange
    multiplier λk < 0. Resetting it to 0, then re-optimizing the other parameters, is a way of
    enforcing the λk ≥ 0 requirement.
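    If you end up needing this reset-and-re-optimize step in code, one possible sketch is below.
    It assumes ρ is ordered with the λi first and µ last (so dropping row/column k of A
    corresponds to fixing λk = 0), matching the ordering used in the earlier sketches; adapt it
    if your ρ is ordered differently.

        import numpy as np

        def reoptimize_with_lambda_k_zero(A, b, k):
            """Sketch of the footnote's procedure: fix lambda_k = 0 and re-solve for the rest.

            Assumes rho = [lambda_1, ..., lambda_N, mu] (0-based index k picks lambda_{k+1});
            removing row/column k of A and entry k of b is equivalent to substituting
            lambda_k = 0 and no longer optimizing over it.
            """
            n = len(b)
            keep = [i for i in range(n) if i != k]      # keep all other lambdas and mu
            rho_reduced = np.linalg.solve(A[np.ix_(keep, keep)], b[keep])
            rho = np.zeros(n)
            rho[keep] = rho_reduced                     # lambda_k stays fixed at 0
            return rho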
    Comment: this method of implementation (Pr. 2, using a matrix inverse) is useful for working
    with algebraic expressions and theory, and for seeing/verifying how things work. For larger
    datasets, other implementation methods are typically used, such as quadratic programming or
    variations on gradient descent designed for this type of optimization.