TD1: Automated reasoning techniques in OCaml

Your first practical / homework is to write a SAT-solver, i.e. a solver for mathematical problems expresses in propositional logic.

The SAT problem consists in deciding, given a formula in Conjunctive Normal Form, if it is satisfiable. We recall the basic definitions:

Let's go

First attempt

Complete the file Exercise.ml. We give examples of different difficulties to test the correction and efficiency of your program.

How do I run and test the code on my machine?

Here are different possibilities:

A bit more guidance: The DPLL approach

Today, we implement the SAT solver as a procedure known as DPLL (for Davis–Putnam–Logemann–Loveland). The idea is to compute all the direct consequences of a partial valuation: if the problem contains some clause such that all the literals but one are false in a partial valuation, then we can deduce that the remaining literal must be true. This step is called unit propagation. When no more propagation is possible, we have to branch on a variable, that is to say to see what happens if we set a currently assigned variable to or .

Download and complete the file dpll.ml.

For testing purposes, we provide a DIMACS parser (DIMACS is the standard format for SAT-soslving problems): DIMACS.ml, as well as a couple of tests tests.zip (but you can download other DIMACS files from the Internet).

Working modulo a theory

We now try to be more expressive than just propositional logic.

Download this archive DPLLT.zip.

It contains a file dpllT.ml which is very similar to dpll.ml.

The main difference is that literals may be taken as literals from a particular theory T, for which we have a decision procedure (a function of type (Lit.t list -> bool) deciding whether a conjunction of literals is consistent (outputs true) or inconsistent (outputs false) with the theory T.

This decision procedure is passed as an extra argument to function solve.

Adapt your DPLL algorithm into a DPLL(T) algorithm to decide whether a CNF is satisfiable in theory T or not.

You need to complete the function solve, which now takes the decision procedure as an extra argument, and which is now of type (solutionType -> bool) -> cnfType -> solutionType

In order to test your algorithm, we have included, in the subdirectory LRA of the archive

Note: When reaching the test code, the simple OCaml interpreter available from emacs may not be able to fetch the code in subdirectory LRA. A more robust approach is to use ocamlbuild from the command line, to compile a multi-file project: In the directory with files dpllT.ml and _tags, run the command ocamlbuild dpllT.native (compiling to native code) or ocamlbuild dpllT.byte (compiling to bytecode). If no error occurs, it will produce a file dpllT.native or dpllT.byte which you can run on your OS.

Among the tests, we have modelled the 2016 American elections and expressed as a CNF (built on LRA literals) the question "There is a tie between Clinton and Trump". Determining whether this CNF is SAT answers the question of whether a tie is possible. We added a piece of code to count the number of solutions (if any). For this particular question, there should be 5 solutions.

Proving any formula

We now try to prove any type of formula.

Download the file all_form.ml.

Some explanations about the provided code:

Implementing ∀x p(x) as we write it by hand is not convenient because we have to manipulate formulae as equivalence classes of trees, the equivalence being that generated by the renaming of bound variables. This is not impossible, e.g. working on canonical representatives in the classes, but is quite complicated, error-prone and actually slightly inefficient.

Most implementations use the solution proposed in the file: A term t is not considered independently from the context in which it is being used. If it is considered as a term occurring in a formula, in such a way that one of its variables "x" is bound by a quantifier ∀x or ∃x further up in that formula, then in the implementation, the quantifier ∀x or ∃x is simply encoded as or , forgetting the name of the bound variable; at the same time, every occurrence of x in the term t is encoded as (Bound i), where i is an integer: the number of quantifiers that one has to cross when going from x to the quantifier that binds x in the formula tree. For instance, ∀x (x=x) is implemented as ∀(Bound 0 = Bound 0), because there are no quantifiers to cross when going from an occurrence of x to ∀x. On the other hand ∀x ∃y (x=y) is implemented as ∀∃(Bound 1 = Bound 0) because from y to its binder, there are no quantifiers to cross, but from x to its binder, there is 1 quantifier to cross (that which binds y). Note that sometimes, not all occurrences of x gets the same i, e.g. in ∀x ((x=x) ∧ ∃y (x=y)), which is implemented as ∀((Bound 0 = Bound 0) ∧ ∃(Bound 1 = Bound 0)).

Now if the term t is considered in a context that does not bind x, so x is really a free variable, not just in the term t, but also free in the context in which t is being used, then x is implemented as (Free "x").

Note that the manipulation of bound variables, managing the integers i of the (Bound i) constructs, is not something you should worry about, because the only point of a variable x bound by ∀x or ∃x is at some point to be substituted by a term t', and we give you the substitution function: If you see the formula (∀A) implementing ∀x A(x), and you want get into A itself, you need to say with which x you are doing this, i.e. with which term t' you want to substitute x. In that case you call (subst t' 0 A) to have what you would write on paper as A(t'). In case you don't want to substitute it by any interesting term and really want to work on A(x), where x is really a free variable, you can just call (subst (Free "x") 0 A).