Graphs and minimum spanning trees
par Eric Fusy et Karthik Bhargavan

Login :  Mot de passe :

In this TD, we will implement a particular type of graph, where nodes represent cities on a map and arcs are drawn between neighbouring cities (adjacencies are not determined by road connections, but instead by the geodesic distance, precisely two cities are considered as adjacent if their geodesic distance is smaller than a certain maximal value). We will then implement two classical (greedy) algorithms, Kruskal's algorithm and Prim's algorithm, to find the minimum spanning tree in each connected component of the graph of cities, whose arcs are weighted by the geodesic distance. (You will normally be expected to complete at least up to question 6.)

Left: a graph with two connected components. Right: the arcs of the graph are assigned weights; the arcs of the two minimum spanning trees (one for each connected component) are shown bolder.

# Java standard library

We will use different interfaces and classes coming from the Java standard library:

• the Map interface, associating values to objects. We use its LinkedHashMap implementation, based on hash tables;
• the Collection interface, representing finite multisets. It is implemented in particular by LinkedLists;
• the List interface, representing ordered collections. It is implemented in particular by LinkedLists;
• the Set interface, representing finite sets. We do not care about its implementations;
• the Queue interface, representing first-in-first-out collections. It is implemented using LinkedLists and PriorityQueues;
• the Comparator interface, which gives a total ordering between objects of a certain type. We are going to give an implementation of it.
You are encouraged to look at those libraries to see which methods are available on each data structure.

# Getting started

To begin with, create a new project TD5. We give you partial code, available at td_5.zip. Download it and unzip it within the TD5 project (in the src/ folder if the project has one). This will create three packages called arbres, carte and graphe, and a Test class (remember to refresh within Eclipse using "F5").

The package arbres contains the squeletons of three classes: ArcComparator, UnionFind, and SpanningTree, which will have to be completed in exercises 3 to 8 (in order to compute minimum spanning trees of proximity graphs of cities).

The package carte contains two classes, Carte and Visualisation, which are used to load and visualise a map in a graphical window. These classes should not be modified.

The package graphe contains four classes:

• Ville: This class represents a city, that is, its name and coordinates (latitude, longitude).
```public class Ville {
// Donne le nom
public String getNom();

// Calcule la distance en metres entre la ville courante et dest
public double distance(Ville dest);

...
}
```
• Arc: This class represents a link between two neighbouring cities.
```public class Arc {
public final Ville source, target;
public int length;

public Arc(Ville o,Ville d, int l){
source=o;
target=d;
longueur=l;
}
...
}
```
• GrapheVille, representing a graph of all the cities on a map and their interconnections; This class (detailed below) will have to be completed in the first exercise.
• GrapheEuclidean, an auxiliary class (extending GrapheVille) for efficiently computing proximity graphs.
You are encouraged to look at the Ville and Arc classes before starting.

We will test the programs on the maps of Mayotte and France. Download the files mf.txt and fr.txt, and install these two files in the main directory TD5 of the TD. If you would like to work on files for other countries, you can find them here.

# Proximity graphs

We will construct graphs of cities, where each Ville will be connected by an Arc to a number of other Ville that we call its neighbours.

## The class GraphVille

First, consider the incomplete class GrapheVille given in package graphe. It contains a field toNeighbours of type Map<Ville, List<Arc>> that maps each ville v to an ordered collection of Arcs, each of which has v as source and one of v's neighbours as target.

```public class GrapheVille {
private Map<Ville, List<Arc>> toNeighbours;

// Initialize toNeighbours
public GrapheVille() {...}

// Is Ville v in the graph?
boolean contient(Ville v){...}

// All the Villes in the graph
Set<Ville> villes(){...}

// Add a Ville to the graph
void ajouterVille(Ville v){...}

// Add an Arc to the graph
void ajouterArc(Arc arc){..}

// All the arcs that go out from Ville v to its neighbours
List<Arc> arcsSortant(Ville v){...}

}
```

Complete the class GrapheVille by implementing the methods shown here.

• The constructor GrapheVille must initialize toNeighbours using a class that implements Map: use LinkedHashMap<Ville,List<Arc>> to obtain the same results as the tests below.
• The toNeighbours map contains all the Villes and Arcs of the graph: each Ville in the graph is a key in the map. So contient and villes simply use the appropriate methods of the Map interface.
• To add a new Ville v to the graph, ajouterVille adds an entry to toNeighbours that maps v to the empty list of Arcs. If a Ville is already in the graph, ajouterVille does nothing.
• If the source of a given Arc is not defined in the graph, ajouterArc must throw an IllegalArgumentException, with a message about, otherwise it retrieves the list of Arcs that the source is mapped to and appends the arc to the end of this list. Note that ajouterArc does nothing regarding the target of the given arc.
• If the given Ville is not defined in the graph, arcsSortant must throw an IllegalArgumentException, with a message about, otherwise it returns the arcs that the Ville is mapped to. To protect the structure of the graph, we must prevent any modification of the list of arcs through the reference returned by arcsSortant. This can easily be done by using the static method Collections.unmodifiableList.

For now, the body of method connectNeighbours must be left empty.

Test with the method test1 in Test.java. The result should look as follows:

```There are 87 nodes in this graph.
Accua	Latitude: -12.7225	Longitude: 45.0563889
Andrema	Latitude: -12.6786111	Longitude: 45.0977778
Apandzo	Latitude: -12.8305556	Longitude: 45.1302778
Bambo Est	Latitude: -12.9261111	Longitude: 45.1738889
Bambo Ouest	Latitude: -12.9219444	Longitude: 45.0872222
Bandraboua	Latitude: -12.7016667	Longitude: 45.1205556
Bandrajou	Latitude: -12.7452778	Longitude: 45.2186111
Bandrani	Latitude: -12.845	Longitude: 45.1019444
Bandrele	Latitude: -12.9066667	Longitude: 45.1913889
There are 86 arcs in this graph.
```

and the window:

## Connecting Neighbours

Next, we construct graphs where neighbours are any two cities within a certain distance of each other. Now fill in the method connectNeighbours that takes an argument maxDistance and adds an Arcs between any two distinct Ville (using equals) that are less than or equal to maxDistance away from each other.

```public void connectNeighbours(double maxDistance){...}
```

You can assume that the graph already contains all cities as nodes but no arc. You will use the method double distance(Ville dest) of the class Ville that computes the distance between the current Ville and any other Ville. You will also use the method ajouterArc of the class GrapheVille to add a new Arc between neighbouring Ville.

Test with test2. The result should be:

```There are 166 arcs in this graph.
```

and you should obtain the following window, where the color of each arc is a function of its length field:

To compute the proximity graph for France, replace mf.txt in initMayotte with fr.txt (and the two cities with Palaiseau and Montanceix). What do you obtain? Since the time taken by connectNeighbours is quadratic in the number of cities, you will find that constructing this graph takes a long time. Instead, from this exercise onwards, all our tests will use the function initFrance that uses a more efficient graph construction (implemented by the GrapheEuclidean in package graphe).

# Minimum spanning trees/forests

We now focus on the problem of computing minimum spanning trees of a graph G. Since G can be made of several connected components G1,...,Gk (as with the proximity graph when the maximal value for the distances becomes small), we consider more generally the problem of computing the minimum ''connecting spanning forest'' of G (a connecting spanning forest of G is a forest of trees T1,...,Tk such that each Ti is a spanning tree of Gi, for each i in [1..k]), as shown in the introductory figure of the TD.

## Comparing arcs

Since Kruskal's and Prim's algorithm consider the arcs in increasing order, we need to have a comparator for arcs. Fill in the method compare of ArcComparator.java, which compares two Arc objects according to the length (which is an attribute of the class Arc).

Test with test3. The result should be:

```-1
1
1
0
```

## Union-find

For Kruskal's algorithm, we will also need a union-find structure for Arc objects, which is to be implemented in the class UnionFind.java. We would like the UnionFind to rely on a HashMap mechanism. Precisely, there is an attribute parent of type HashMap<Ville,Ville> such that the instruction parent.get(v) returns the parent of v (possibly null), and the instruction parent.put(v,u) assigns u as the parent of v. Fill in the constructor and the methods union and find in the class UnionFind.java. Use the fact that parent.get(v) returns null as no parent is assigned to v. As HashMap uses equals for testing identity, you should do the same for consistency.

Remark: the union operations will be applied to arcs of a geographic map that are considered in the order given by the geodesic distance; it is reasonable to assume that the components along the way will look quite "random", so that the underlying trees of the union-find structures are expected to be quite well balanced. Therefore you do not have to use size comparison as in TD2, but you should use iterative path compression in the find method.

Test with test4. The result should be:

```true
true
false
```

## Kruskal's algorithm, all components together

We can now implement a first version of Kruskal's algorithm. At first we remind you how Kruskal's algorithm works:

1. the arcs are sorted in increasing order of weight (you can use an instruction as Collections.sort(col,ac), where col is of type Collection<Arc>, and ac is of type ArcComparator),
2. the arcs e1,e2,...,ek are considered one by one in increasing order, and at each step i one decides if ei is to be selected: ei is selected if and only if it satisfies the property that its two extremities are not connected using the previously selected arcs (testing this property is handled by the union-find structure).
The output (the collection of selected arcs) forms the minimum connecting spanning forest of the graph. Note that this forest will not contain some trivial components which are formed from a single vertex and which are therefore without an arc.
To initially collect all the arcs of the GraphVille g (which are then to be sorted at the beginning of the method kruskal), create an auxiliary method public List<Arc> tousLesArcs() in the class GrapheVille.

Fill in the method public static Collection<Arc> kruskal(UnionFind u, GrapheVille g) in the class SpanningTree.java, which is to return the collection of the arcs that forms the minimum connecting spanning forest. The UnionFind argument u is initially empty and, at return time, it should contain a description of the non-trivial connected components of g. This side-effect will be useful in the next exercise.

Remark: beware that Collection<E> (with E the type of elements in the collection) is not a class but an interface. Therefore, it has to be instantiated using an implementing class, for instance with an instruction such as Collection<E> col=new LinkedList<E>().

Test with test5. The result should be (the computation time depending on device/implementation):

```Computation time (Kruskal first version) : 613 ms
Total length:  : 110191528 m
49390 connected cities
```

and you should obtain the following window, where the arcs in the minimum connecting spanning forest F are colored orange, and arcs not in F are light gray:

## Separating the components in Kruskal's algorithm

Using the first kruskal method, fill in the method public static Collection<Collection<Arc>> kruskal(GrapheVille g), which is to return the list of collections of arcs (one collection for each non-trivial tree-component) of the minimum connecting spanning forest of g.

Remark: it is recommended to use a HashMap<Ville,Collection<Arc>> listes that maps each root of the UnionFind (a root for each connected component) to the list of arcs of the minimum spanning tree of the corresponding connected component, and to return listes.values() at the end of the method.

Test with test6. The result should be:

```Computation time (Kruskal second version) : 707 ms
Total length  : 110191528 m
49390 connected cities
3330 components in the forest
```

and you should obtain the following window, where the arcs in the minimum connecting spanning forest F are colored (a color is randomly assigned to each tree-component), and arcs not in F are light gray:

## Prim's algorithm for the component of a starting Ville

At first we recall how Prim's algorithm works. It uses a priority queue q (here, a queue of arcs, with the priority given by having smaller distance) as an auxiliary structure. One starts at a Ville v, puts all arcs at v into q, and remove v from the set of non-visited elements. Then as long as q is not empty,

1. pick up the next element from q (instruction q.poll()), which is an Arc a,
2. if the target u of a is in the non-visited elements, add to q all the arcs with u as source, and remove u from the set of non-visited elements.
Note that Prim's algorithm only visits nodes (Ville objects) that are in the connected component C containing v; its output is the minimum spanning tree of C.

Fill in the method public static Collection<Arc> primTree(HashSet nonVisited, GrapheVille g, Ville start), which is to return (in the form of a collection of arcs) the minimum spanning tree of the connected component C (of g) containing the Ville object start. During the process, the HashSet object nonVisited is to be modified so that, at the end of the execution, all the Ville in C have been taken out of nonVisited (this side effect will be useful for the next exercise).

Test with test7. The result should be:

```Computation time (Prim first version) : 109 ms
Total length:  : 44973288 m
20002 connected cities
```

and you should obtain the following window, where the arcs in the minimum spanning tree T of the component containing ''Palaiseau'' are colored orange, and arcs not in T are light gray:

## Prim's algorithm at each component

Using the primTree method, fill in the method public static Collection<Collection<Arc>> primForest(GrapheVille g), which is to return the list of collections of arcs (one collection for each tree-component) of the minimum connecting spanning forest of g. Again, note that this forest should not contain the trivial components which are formed from a single vertex and empty sub-collections are not allowed in the ouput of primForest.

To peek (without removing) one element from a collection or a set, one can paste the following generic method inside class SpanningTree and use it properly:

```  public static <E> E peekOne(Collection<E> c) {
for (E e : c)
return e;
return null;
}
```

Test with test8. The result should be:

```Computation time (Prim second version) : 506 ms
Total length  : 110191528 m
49390 connected cities
3330 components in the forest
```

and you should obtain the following window, where the arcs in the minimum connecting spanning forest F are colored (a color is randomly assigned to each tree-component), and arcs not in F are light gray: