Tasks completed in weeks 1 to 2
- Greedy Heuristics for the following problems:
- Travelling Salesman
- Minimum Vertex Cover
- Minimum Steiner Tree
- Maximum Matching
- Greedy Coloring with Interchange
- Independent Vertex Set
- Dominating Vertex Set
- Local Node Connectivity
- Implemented Johnson’s All Pairs Shortest Path algorithm.
- Improved Kruskal’s Minimum Spanning Tree.
- Created the Union-Find data structure.
Details
1. Greedy Hueristics
I implemented several straight forward approximation algorithms and greedy hueristics. I deviated from the proposal in the solution to Travelling Salesman because Christofides algorithm would require linear program solvers and LightGraphs has a policy of minimising dependencies. A common problem I faced while implementing the algorithms is, using a ‘distributed for loop’ would result in type instable code.
Example: Independent Vertex Set
A vertex is said to be valid if it is not in ind_set
or adjacent to ind_set
.
Problem: Find the largest subset of vertices of the graph, such that no two vertices are adjacent. The solutions are of the following form:
- Set
ind_set
as the empty set. - Using some method, choose a vertex that is valid and insert it into
ind_set
. Repeat this until all vertices are not valid. - Return
ind_set
.
Program 1: degree_independent_set(g)
Step 2 is performed finding the vertex that will “invalidate” the fewest vertices. That is, it will choose the vertex that is adjacent to the fewest valid vertices (not in ind_set
or adjacent to ind_set
).
Program 2: random_maximal_independent_set(g)
Step 2 is performed by randomly choosing a vertex from the set of vertices that is valid. This is simulated by iterating over a random permutation of the vertices.
It easy to see that both these algorithms will always return an independent set. It may not return the maximum independent set, but the returned independent set is always maximal. i.e. inserting any vertex into it will make it a non-independent set.
2. Johnson’s All Pairs Shortest Path Algorithm
Implemented Johnson’ All Pairs Shortest Path Algorithm. It is well known that Dijkstra is the fastest algorithm to solve the shortest paths problem. However, it only provides correct output when all weights are possitive. Johnson’s algorithm involves transforming the edge weights into positve edge weights using Bellman Ford Shortest Path Algorithm and then running Dijkstra from each source.
The code looked something like
I initially failed to consider the case where the edge weights are immutable.
An additional |E|
memory overhead will have to be incurred if the input includes immutable distmx
. Eg. johnson_shortest_paths(CompleteGraph(100), SymmetricMatrix(rand(100, 100))
would have resulted in an error because SymmetricMatrix
has immutable entries.
The parallel implementation invokes the parallel implementation of bellman_ford
and then
runs each Dijkstra call in parallel.
Benchmarks
nworkers() = 4
g = loadsnap(:facebook_combined)
(|V| = 4039, |E| = 88234)
distmx = sparse(g)
for e in edges(g)
d[e.src, e.dst] = d[e.dst, e.src] = rand(0:n)
end
johnson_shortest_paths(g, distmx)
: 48.071 s (186011 allocations: 1.18 GiB)
parallel_johnson_shortest_paths(g, distmx)
: 22.932 s (27205 allocations: 68.92 MiB)
3. Kruskal’s Minimum Spanning Tree
kruskal(g, distmx=weights(g))
In this program, we iterate over the edges of graph g
in the ascending order of weights (stored in distmx
) until a spanning tree is formed.
A priority queue was being used to obtain the edge with the smallest weight.
I replaced the priority queue with a sorting function. I also avoided creating a ‘struct’ to store the edges and its weights by using sortperm so I could first sort the weights and then permute the edges.
I replaced
with
Benchmarks
distmx = sparse(g)
for e in edges(g)
d[e.src, e.dst] = d[e.dst, e.src] = rand(-n:n)
end
g = loadsnap(:ego_twitter_u)
(|V| = 81306, |E| = 1342310)
old_kruskal_mst(g, distmx)
: 1.825 s (5512651 allocations: 116.70 MiB)
kruskal_mst(g, distmx)
: 237.103 ms (81324 allocations: 42.00 MiB)
g = loadsnap(:ca_astroph)
(|V| = 17903, |E| = 197031)
old_kruskal_mst(g, distmx)
: 134.475 ms (819823 allocations: 17.43 MiB)
kruskal_mst(g, distmx)
: 39.764 ms (17920 allocations: 5.95 MiB)
g = loadsnap(:facebook_combined)
(|V| = 4039, |E| = 88234)
old_kruskal_mst(g, distmx)
: 41.809 ms (350697 allocations: 7.46 MiB)
kruskal_mst(g, distmx)
: 15.303 ms (4056 allocations: 2.51 MiB)
4. Union Find Data Structure
This data structure is already available in DataStructures.jl. I coded a version of it that would be optimised for graph algorithms that involve iteratively adding edges to the graph. The main difference is my implementation does not maintain ranks for each element. However, there was only a minor improvement in performance when I coded Kruskal’s MST using my implementation of Union Find. Hence, maintaining this code in LightGraphs may not be worthwhile.
Benchmarks
g = CompleteGraph(1000)
: run-time improved from 28 ms to 26 ms.
Lessons Learned
I lost 2 days of code while using git. Always commit before switching/deleting branches.