No More Worries!

Our orders are delivered strictly on time without delay

Paper Formatting

Double or single-spaced
1-inch margin
12 Font Arial or Times New Roman
300 words per page

No Lateness!

Our orders are delivered strictly on time without delay

Our Guarantees

Free Unlimited revisions
Guaranteed Privacy
Money Return guarantee
Plagiarism Free Writing

Marketing campaign to influence the opinion of voters

How to influence the opinion of people when on a budget. (15 points)
Apparently it is in fashion to use social networks, like facebook, to influence voters in elections.
In this problem, you will consider a simplified model of influence in social networks: the social
network is represented by an undirected graph G = (V, E) and for a set of nodes S ⊆ V , we
denote by N(S) the set of their neighbors:
N(S) = {v ∈ V | ∃u ∈ S, (u, v) ∈ E}.
The influence I(S) of a set of nodes is measured by I(S) = |N(S)|.
Imagine you work now for facebook and you are given the dataset facebookgraph.txt (available in the same canvas folder). This dataset is in fact a subgraph of the real Facebook social
graph. Each line in the file contains the id of two users, indicating that these two users are
friends on Facebook.
a. As the designer of a marketing campaign to influence the opinion of voters, your goal
is to find a subset S ⊆ V of at most K nodes whose influence is maximal. Write a
mathematical model to solve this problem. Can you solve the problem for the data you
were given using MATLAB or AMPL or SCIP? If not, can you write a practical method
to give an approximation to the optimum?
b. Using any of your models/methods from part (a) write a computer program which,
given the social network described in the dataset and a budget K ∈ N
+, returns an
approximately optimal set of nodes S for the influence function I(S). The function
should return both the users to influence and the value (total amount of influence)
obtained.
c. Plot the influence I(S) obtained by your function as a function of the budget K. E.g.,
what happens when K = 1 and K = |V |?
d. In the definition of I(S) a node with largest degree is one of the most influential persons,
i.e., he/she is a VIP. Can you think of a way to adapt the pagerank algorithm to identify
VIP nodes in the graph? Can you invent of a third additional way to define who are the
VIPs in this graph? What else could you use to measure influence?
e. A vertex-cover of a graph G is a set S of vertices of G such that each edge of G is incident
with at least one vertex of S. The vertex cover number τ (G) is the size of a minimum
vertex cover in G. A dominating set for a graph is a subset D of vertices such that every
vertex not in D is adjacent to at least one member of D. The domination number γ(G)
is the smallest dominating set. Are there any relationships among the influence function
I(S), τ (G), and γ(G)? Explain.
Formulate a discrete models that given a graph find a vertex cover with smallest number
of vertices and a smallest dominating set. Explain the reasoning on your variables and
constraints.
Clustering and pairing senators by their political ideas (12 points)
The zip-file 2-congress-datafiles.zip (in the same canvas folder) contains two csv files with
(real!) voting data. You only need to use the one named 114 USA-congress.csv. I call this
file F114, and it has information of how all USA senators (in the 114th congress) voted in
15 different bills. 1 means voted YES, 0 means voted NO, 0.5 means ABSTAINED.
We always assume senators of the same party vote similarly, but let us analyze the votes
using mathematics and uncover some information!
Clustering is an operation in data science where we divide data up into a fixed number of
clusters while trying to ensure that the items in each cluster are as similar as possible (e.g., if
we cluster by political party we presumably get similar voting records, similar opinions). Kmeans is a clustering algorithm that allows you to cluster into K clusters based on distances.
It is implemented in MATLAB already.
a. Use K-means on F114, with K = 2 to get a clustering in two groups. Obviously, do
not use the labels of the real affiliations I gave you! Choose one to be called democrats, the other to be the republicans (do not look at the original file). Make your best
mathematical guess of the party affiliation.
You have obtained a classification of senators into two classes. Does your classification
coincide exactly with the real party affiliation? Or are there surprises? To measure this
compute the false-positive rate and false-negative rate of your classification:
false-positive rate := number of senators incorrectly labeled as democrats
number of democratic senators
false-negative rate := number of senators incorrectly labeled as republicans
number of republican senators
b. What if you cluster with K = 3? That would after all mean there are three factions in
the senate (moderates, conservatives, liberals?).
c. Use part of the data to train a SVM model to classify the senators, and test the model
using the unused data. Report your results.
d. Clustering can also be model as an integer optimization problem. Write such a model
and code it in AMPL or SCIP. Run it in the senator’s data.
Recommendation Systems. (13 points). The Movielens (https://movielens.org/) is
a website for movie recommendations. It is run by a research lab at University of Minnesota.
The zip file ml-latest-small.zip (in the same canvas folder) contains 100836 ratings from
610 users to 9742 movies. More information of this dataset can be found here: http://
files.grouplens.org/datasets/movielens/ml-latest-small-README.html. Rearrange
the ratings to a matrix with size 610 × 9724 (there are some movies that did not receive any
rating and they were not included). Denote this matrix as M. Each row of M corresponds
to a user, and each column of M corresponds to a movie. In this matrix there are 100836
entries (ratings) are given, that is only 100836/(610×9724) = 1.7% of the entries. So, 98.3%
ratings are missing. You are asked to predict the missing ratings. Denote the set of indices
of the known entries in the matrix as Ω, and denote the indices set of the unknowns as Ω. ¯
That is:
Ω := {(i, j) | Mij is known}, Ω := ¯ {(i, j) | Mij is not known}
You can’t use all known entries in Ω to train your model, because you need some testing data
to test whether your model and algorithm perform well. Therefore, you need to partition Ω
to two parts: Ωtrain and Ωtest, i.e., Ω = Ωtrain S
Ωtest. Entries corresponding to Ωtrain are
used as training data, and entries corresponding to Ωtest are used as testing data.
a. The first model is a matrix factorization model. Denote m = 610 and n = 9724. The
model is:
min
U∈Rm×r,V ∈Rr×n
kPΩtrain (UV − M)k
2
F
, (MatFac)
where r < m, and the operator PΩtrain : R
m×n
7→ R
m×n
is defined as:
[PΩtrain (Z)]ij =

Zij , if (i, j) ∈ Ωtrain
0, otherwise
Implement the gradient method (with a fixed step size) to solve (MatFac). Randomly
choose p% indices of Ω as Ωtrain and the remaining (1−p%) as Ωtest. Measure the qualify
of (U
∗
, V ∗
) returned by your algorithm using the following mean squared error:
MSE :=
1
|Ωtest|
X
(i,j)∈Ωtest
((U
∗V
∗
)ij − Mij )
2
For different combinations of (p, r), report MSE in table(s). You can choose p =
{99, 95, 90, 80, 70, 60, 50, . . .}, and r = {1, 10, 20, 50, 100, 200, 300}. Moreover, report the
effect of the step size of the gradient method to the results.
b. In (MatFac) we used Frobenius norm in the objective. We can change it to 1 norm. min U∈Rm×r,V ∈Rr×n kPΩtrain (UV − M)k1. (MatFacL1) The1 norm of a matrix Z is defined as kZk1 =
P
ij |Zij |, that is, the sum of absolute
values of all entries. (MatFacL1) can’t be solved by gradient method, because the
absolute value function is not differentiable. But you can still solve it approximately
by slightly perturbing the absolute value function. For a scalar z, you can approximate
|z| by |z| ≈ √
z
2 + with a very small positive . Note that the latter function is
differentiable for z. Now by replacing all absolute values in the objective of (MatFacL1)
by this formula, you get a differentiable objective function. Implement the gradient
method (with a fixed step size) to solve (MatFacL1). Report the same everything as
what you did in part (a).
c. Discuss the two models (MatFac) and (MatFacL1), which one is better? Under what
situation (MatFac) is better? Under what situation (MatFacL1) is better?

No More Worries!

Paper Formatting

No Lateness!

Our Guarantees

Marketing campaign to influence the opinion of voters

Satisfaction Guaranteed

Why Choose Us