Tasks completed in weeks 1 to 2

  1. Greedy Heuristics for the following problems:
    • Travelling Salesman
    • Minimum Vertex Cover
    • Minimum Steiner Tree
    • Maximum Matching
    • Greedy Coloring with Interchange
    • Independent Vertex Set
    • Dominating Vertex Set
    • Local Node Connectivity
  2. Implemented Johnson’s All Pairs Shortest Path algorithm.
  3. Improved Kruskal’s Minimum Spanning Tree.
  4. Created the Union-Find data structure.

Details

1. Greedy Hueristics

I implemented several straight forward approximation algorithms and greedy hueristics. I deviated from the proposal in the solution to Travelling Salesman because Christofides algorithm would require linear program solvers and LightGraphs has a policy of minimising dependencies. A common problem I faced while implementing the algorithms is, using a ‘distributed for loop’ would result in type instable code.

Example: Independent Vertex Set

A vertex is said to be valid if it is not in ind_set or adjacent to ind_set.

Problem: Find the largest subset of vertices of the graph, such that no two vertices are adjacent. The solutions are of the following form:

  1. Set ind_set as the empty set.
  2. Using some method, choose a vertex that is valid and insert it into ind_set. Repeat this until all vertices are not valid.
  3. Return ind_set.

Program 1: degree_independent_set(g)

Step 2 is performed finding the vertex that will “invalidate” the fewest vertices. That is, it will choose the vertex that is adjacent to the fewest valid vertices (not in ind_set or adjacent to ind_set).

Program 2: random_maximal_independent_set(g)

Step 2 is performed by randomly choosing a vertex from the set of vertices that is valid. This is simulated by iterating over a random permutation of the vertices.

It easy to see that both these algorithms will always return an independent set. It may not return the maximum independent set, but the returned independent set is always maximal. i.e. inserting any vertex into it will make it a non-independent set.

2. Johnson’s All Pairs Shortest Path Algorithm

Implemented Johnson’ All Pairs Shortest Path Algorithm. It is well known that Dijkstra is the fastest algorithm to solve the shortest paths problem. However, it only provides correct output when all weights are possitive. Johnson’s algorithm involves transforming the edge weights into positve edge weights using Bellman Ford Shortest Path Algorithm and then running Dijkstra from each source.

The code looked something like

 
function johnson_shortest_paths(g::AbstractGraph, distmx::AbstractMatrix)
    BF = bellman_ford(g, vertices(g), distmx)
    distmx_tranform!(distmx, BF)
    ans = [dijkstra(g, v, distmx) for v in vertices(g)]  
    inverse_distmx_transform!(distmx, BF)
    return ans
end

I initially failed to consider the case where the edge weights are immutable. An additional |E| memory overhead will have to be incurred if the input includes immutable distmx. Eg. johnson_shortest_paths(CompleteGraph(100), SymmetricMatrix(rand(100, 100)) would have resulted in an error because SymmetricMatrix has immutable entries.

The parallel implementation invokes the parallel implementation of bellman_ford and then runs each Dijkstra call in parallel.

Benchmarks


nworkers() = 4

g = loadsnap(:facebook_combined) (|V| = 4039, |E| = 88234)

distmx = sparse(g)

for e in edges(g)
  d[e.src, e.dst] = d[e.dst, e.src] = rand(0:n)
end

johnson_shortest_paths(g, distmx): 48.071 s (186011 allocations: 1.18 GiB)

parallel_johnson_shortest_paths(g, distmx): 22.932 s (27205 allocations: 68.92 MiB)


3. Kruskal’s Minimum Spanning Tree

kruskal(g, distmx=weights(g)) In this program, we iterate over the edges of graph g in the ascending order of weights (stored in distmx) until a spanning tree is formed. A priority queue was being used to obtain the edge with the smallest weight. I replaced the priority queue with a sorting function. I also avoided creating a ‘struct’ to store the edges and its weights by using sortperm so I could first sort the weights and then permute the edges.

I replaced

 
function old_kruskal_mst(g::AbstractGraph, distmx::AbstractMatrix)
    DO SOMETHING

    edge_wt_list = [KruskalHeapEntry(Edge(src(e), dst(e)), distmx[src(e), dst(e)]) for e in edges(g)]
    heapify!(edge_wt_list)

    while !isempty(edge_wt_list)
        e = (heappop!(edge_wt_list)).edge
        DO SOMETHING
    end

    return mst
end

with

 
function kruskal_mst(g::AbstractGraph, distmx::AbstractMatrix)
    DO SOMETHING

    edge_list = collect(edges(g))
    wt_list = [distmx[e.src, e.dst] for e in edge_list]
    
    for e in edge_list[sortperm(wt_list)]
        DO SOMETHING
    end

    return mst
end

Benchmarks


distmx = sparse(g)

for e in edges(g)
  d[e.src, e.dst] = d[e.dst, e.src] = rand(-n:n)
end


g = loadsnap(:ego_twitter_u) (|V| = 81306, |E| = 1342310)

old_kruskal_mst(g, distmx): 1.825 s (5512651 allocations: 116.70 MiB)

kruskal_mst(g, distmx): 237.103 ms (81324 allocations: 42.00 MiB)



g = loadsnap(:ca_astroph) (|V| = 17903, |E| = 197031)

old_kruskal_mst(g, distmx): 134.475 ms (819823 allocations: 17.43 MiB)

kruskal_mst(g, distmx): 39.764 ms (17920 allocations: 5.95 MiB)



g = loadsnap(:facebook_combined) (|V| = 4039, |E| = 88234)

old_kruskal_mst(g, distmx): 41.809 ms (350697 allocations: 7.46 MiB)

kruskal_mst(g, distmx): 15.303 ms (4056 allocations: 2.51 MiB)

4. Union Find Data Structure

This data structure is already available in DataStructures.jl. I coded a version of it that would be optimised for graph algorithms that involve iteratively adding edges to the graph. The main difference is my implementation does not maintain ranks for each element. However, there was only a minor improvement in performance when I coded Kruskal’s MST using my implementation of Union Find. Hence, maintaining this code in LightGraphs may not be worthwhile.

Benchmarks

g = CompleteGraph(1000): run-time improved from 28 ms to 26 ms.

Lessons Learned

I lost 2 days of code while using git. Always commit before switching/deleting branches.