Probabilistic graphical models

Probabilistic graphical models Artificial Intelligence Assignment 3 Semester 1 2015 Due 11.59pm on Friday 12th June 2015 1 Probabilistic graphical models 1.1 Inference (undergrads and postgrads) In lectures, we mainly covered Bayesian networks, one type of probabilistic graphical model (PGM) using directed acrylic graphs. Your task is to write a program to implement likelihood weighting (or weighted) sampling, as described in lectures, to perform inference on an arbitrary Bayesian network. You will need to parse an input text file that encodes the graph and the conditional probability distributions for each variable (conditioned on its parents). The format of the file is given in section 1.1.1 below. Two PGM model files have been defined in this format and can be downloaded, along with two queries each, at: https://cs.adelaide.edu.au/users/third/ai/15s1-ai-adelaide/files/infer.zip PGM 1: student model The first PGM models a typical university student in USA. A student has high intelligence has higher chance to score well in SAT test before entering the university. The grade for university courses depends on the difficulty of the courses and the student’s intelligence. A good grade increases the chance of getting a good recommendation letter from the teachers. The letter and SAT result may affect the job that the student may get. Both the job and the grade may affect the student’s happiness — typically a student is happy if they get a good grade and a dream job. The PGM’s structure is shown in Figure 1. PGM 2: car problem model The second PGM is for diagnosing the problem of a car. The full detail is omitted here, but you can figure out some of its meaning based 1 G I J S D H L Difficulty Intelligence Grade Happy Letter SAT Job Figure 1: Student model on its file. Your code should be able to process both PGMs in above format. The grade of the assignment depends on your results on both PGMs. 1.1.1 Model file format The graphical model will be specified by a text file with format as in Figure 2 (Left), where: • N is the number of random variables in the network; • rv are the random variable names (arbitrary alphanumeric strings); • Graph matrix: the matrix of zeros and ones specifies the directed arcs in the graph; a one (1) in the i, j entry indicates there is an edge from i to j (so the ith variable is a parent of the jth variable. • mat are two dimensional arrays of real numbers (in ASCII) that specify the conditional probability table of each random variable conditioned on its parents; The format of the matrices is a bit subtle, and is illustrated in Figure 2 (Right). If a node has m parents, then the matrix needs to specify the probability of each outcome (f alse, true) conditioned on 2m different combinations of parent values, so the matrix will be 2m × 2 (rows × columns). Treating f alse as 0, and true as 1, concatenate the values of the parents in their numerical order from most significant bit to least significant 2 Figure 2: Left: File format; Right: the format of the conditional probability matrices. Note the numbers in the figure are for illustration purpose, and might be different from the actual numbers in the file. You should use your code to read the file. The right figure is from the wonderful textbook Probabilistic Graphical Models: Principles and Techniques. MIT Press, 2009, by D. Koller and N. Friedman. bit (left to right) to create a row index r. The entry in the first column, rth row is then the probability that the variable is f alse given the values of the other variables (the entry in the corresponding 2nd column is the probability that the variable is true). For example in Figure 2 (Right), assume that the variables are ordered as I, D, G, S, L. G has two parents I and D. P(G|I, D) is represented by mat2 (the 3rd matrix), which is the 4 × 3 matrix in the figure. Since I’s order is before D, the row indexing is (i 0 , d0 ), and then (i 0 , d1 ) and so on instead of (d 0 , i0 ). I has only two values (false or true) thus denoted as i 0 , i1 . Here G has 3 values thus denoted as g 1 , g2 , g3 (if you want to use g 0 , g1 , g2 to denote, it is fine too. It won’t affect the code anyway). Another example is that if A has parents C and F where C is the 3rd variable specified and F is the 6th, then the (2,0) entry of their matrix represents P(A = f alse|C = true, F = f alse), and the (0,1) entry represents P(A = true|C = f alse, F = f alse). The format of the query, entered via the console (or read from redirected standard input) is: P(rvQ | rvE1=val, rvE2=val, ...) where rvQ is the name of the query variable, and rvEx are the names of the evidence variables with their respective true/false values specified. As is normal in Bayesian 3 Case I D G S L J H 1 1 1 2 1 1 0 1 2 0 0 1 0 0 1 0 3 1 0 3 0 0 1 0 4 1 0 1 0 0 1 1 . . . Table 1: Samples inference, variables not included are unobserved and should be marginalised out. 1.2 Learning parameters (postgrads) This task is for postgrads. Undergrads can choose to do it too for bonus points. The parameters in a Bayesian network are the conditional probability matrices. If you have samples from the joint distribution represented by the Bayesian network, you can estimate (conditional) probabilities of any variable by frequentism. For example, if you have samples like in Table 1, you can estimate probabilities as follows: P(D = 1) = ND=1 Ntotal P(G = 1|D = 1, I = 0) = NG=1,D=1,I=0 ND=1,I=0 . . . Here ND=1 is the number of samples that D = 1, and Ntotal is the total number of samples. NG=1,D=1,I=0 is the number of samples that G = 1, D = 1, I = 0, and ND=1,I=0 is the number of samples that D = 1, I = 0. Your code must be able to read both the PGM model file, and the sample data file, and use them to produce a new PGM model file. The new PGM model file should be in the same format as the PGM model file. The sample data file for learning can be downloaded from https://cs.adelaide.edu.au/users/third/ai/15s1-ai-adelaide/files/sample. zip Note that the Case ID and variables’ names are not included in the sample file. Assume there are N variables. Each line is a sample vector with N dimensions in the 4 same order of variables as specified in the model file. Note that the old model file and the new model file shall only differ in the entries of matrices. The rest should not change, as we are only learning (updating) the parameters (not the graph). In principle, to learn the parameters does not need to know the mat in the old model file. We allow you to use the entire old model file for your convenience — if you have a different number of matrices or their sizes are different from the ones in the old model file, you must have done something wrong. 2 Deliverables 2.1 Inference (undergrads and postgrads) Write your program in Java or C/C++. In the case of Java, name your program inference.java. Your program must be able to be compiled and run as follows: $ javac inference.java $ java inference modelfile.txt queryfile.txt answerfile.txt In the case of C/C++, you must supply a makefile Makefile with a rule called inference to compile your program into a Linux executable binary named inference.bin. Your program must be able to be compiled and run as follows: $ make inference $ ./inference.bin modelfile.txt queryfile.txt answerfile.txt Output Your code must be able to read the model file, and the query file to produce an answer file. The answer file should be named as studentanswer.txt for the student model, and caranswer.txt for the car model. Each query file has two queries. The (output) answer file should have 2 lines: the first line is the vector (of the conditional distribution) of the first query, and the second is the vector of the second query. Each vector uses a space to separate the value of each dimension. NOTE1: If a makfile exists, the marking script will assume that the program is written in C/C++, otherwise Java will be assumed. NOTE2: modelfile.txt, queryfile.txt, and answerfile.txt are the placeholders for the names of the model file, the query file and the answer file. 5 2.2 Learning (postgrads) Write your program in Java or C/C++. In the case of Java, name your program learn.java. Your program must be able to be compiled and run as follows: $ javac learn.java $ java learn modelfile.txt samplefile.txt In the case of C/C++, you must supply a makefile Makefile with a rule called learn to compile your program into a Linux executable binary named learn.bin. Your program must be able to be compiled and run as follows: $ make learn $ ./learn.bin modelfile.txt samplefile.txt Output Your code must be able to read the PGM model file, and the sample data file to produce a new PGM model file. The new PGM model file should be named as newmodel.txt following the same format as the old PGM model file. NOTE1: If a makfile exists, the marking script will assume that the program is written in C/C++, otherwise Java will be assumed. NOTE2: modelfile.txt and samplefile.txt are the placeholders for the names of PGM model file, and sample data file. 3 Submission policies You must submit your program on the Computer Science Web Submission System. This means you must create the assignment under your own SVN repository to store the submission files. The SVN key for this submission is 2015/s1/ai/Assignment3. The link to the Web Submission System used for this assignment is https://cs.adelaide.edu.au/for/students/automark/ DO NOT use the link: https://cs.adelaide.edu.au/services/websubmission/ 4 Grading For more details on the online submission procedures including SVN visit the home page of the school and look for SVN information under the “For Students” tab. Your code will be compiled and run, testing its outputs for the two networks with two queries each that have been provided to you. None of the networks will contain more than 6 16 variables. Since the results are stochastic, the query answers will not be exact and will vary from run to run. The answers will therefore be tested against a tolerance of ±0.01 (i.e. your answers must be within 1% of the true values), so you must ensure convergence to this level of precision. If it passes all tests you will get 15% of the overall course mark. Note undergrads need to do inference (15%), and postgrads need to to do both inference (10%) and learning (5%). The objective of the tests is to check for the correct operation of your implementation. Hence, the basis of the assessment is to compare your results against the expected results. You must also ensure that you have an efficient implementation. 5 Bonus points for Undergrads Undergrads can choose to do the learning task described in section 1.2 and 2.2 for bonus point of 3%. 6 Using other source code You may not use other source code for this assignment. You should personally and carefully implement to fully understand the concept(s). 7 Due date and late submission policy This assignment is due 11.59pm on Friday 12th June 2015. If your submission is late the maximum mark you can obtain will be reduced by 25% per day (or part thereof) past the due date or any extension you are granted.