Analysis of parallel version of PageRank algorithm

In this post, we are going to analyze a simplified parallel version of the famous algorithm PageRank, the algorithm used by Google Search to rank web pages in their search engine results.

The code of the PageRank algorithm has been written in C++ exploiting the library OpenMP to parallelize the code. Finally, it has been tested over a different number of threads as well as different scheduling policies in order to compare its performances and analyze them.

PageRank works by counting the number and quality of links to a page to determine a rough estimate of how important the website is. The underlying assumption is that more important websites are likely to receive more links from other websites

Important: The algorithm’s implementation details have not been reported here but can be found in the following repository.

Environment

The table below gives an overview of the configuration of the environment used to run the experiment.

Operative system	Ubuntu Server 16.04.1 LTS 64-bit
Memory	Intel(R) Xeon(R) CPU E5-2680 32 cores
C/C++ compiler	128 GB
CPU	gcc 7.4

Graph description

The experiments have been run on a huge graph containing 3072441 nodes (pages) and 117185083 edges (links). Each node had an average of 38,14 ingoing or outgoing edges. In addition, the node with more outgoing connections had a total of 33007 outgoing edges while the node receiving more input connections had not more than 3415 incoming edges.

Furthermore, it has been realized that 11% of the nodes didn’t have any outgoing edge and only 1 of the 3072441 nodes had zero incoming edges. Thus, the graph was not symmetric and its imbalances could result in different workloads while computing the rank of different nodes.

Important: The graph used for this experiment is not publicly available but a toy graph has been created to test the correctness of the algorithm.

Scheduling policies

The schedule clause affects how to loop iterations are mapped onto threads. For this experiment have been taken into examination 3 of the scheduling policies offered by OpenMP.

Static Policy

For N iterations and T threads, all threads equally get one chunk of N/T of loop iterations. The static scheduling type is appropriate when all iterations have the same computational cost.

Dynamic Policy

For N iterations and T threads, each thread gets a fixed-size chunk of k loop iterations and when a particular thread finishes its chunk of iterations, it simply gets assigned a new chunk. So, the relationship between iterations and threads is non-deterministic and would only be decided at run-time.

However, despite its good flexibility, this schedule presents a high overhead due to the lots of decisions regarding which thread gets each chunk. Thus, the dynamic scheduling type is appropriate when the iterations require different computational costs, that is, the iterations are poorly balanced between each other.

Guided Policy

For N iterations and T threads, initially, the first thread gets a fixed-size chunk of k=N/T loop iterations. Afterward, the second thread gets a chunk of size proportional to the number of iterations that have not been assigned divided by the number of threads. Therefore, the size of the chunks decreases until a minimum threshold is decided by the user. However, the chunk which contains the last iterations may also have a smaller size.

Its advantage over the static policy is that it can better handle the imbalanced load and it also takes fewer decisions than the pure dynamic version, thus causing less overhead. The guided scheduling type is appropriate when the iterations are poorly balanced between each other. The initial chunks are larger because they reduce overhead.

The smaller chunks fill the schedule towards the end of the computation and improve load balancing. This scheduling type is especially appropriate when poor load balancing occurs toward the end of the computation.

Running time comparison

Finally, the program has been tested on different numbers of threads and under different scheduling policies and the respective performances have been reported in the tables below.

In particular, in the first experiment has been used a chunk size of 1, which is the default set by OpenMP, while in the second experiment has been used a chunk size of 32 Let’s remind that the chunk size is the minimum number of iterations assigned to each tread.

Important: The last three columns of the first row of the table have to be interpreted as scheduling policy (chunk size). Furthermore, the running times have been expressed in seconds.

threads	static (1)	dynamic (1)	guided (1)
4	4.401911	4.351563	4.632874
8	2.879228	2.69747	2.846156
16	2.015405	1.91141	2.045555

Chunk size of 1

threads	STATIC (32)	dynamic (32)	guided (32)
4	4.281	3.427674	3.151209
8	2.38342	1.731104	1.887575
16	1.848175	1.140203	1.322415

Chunk size of 32

To make a comparison with the serial version, the program took an average of 9,5 seconds when running on a single thread. To understand the different running times taken by the program to terminate its task, let’s first briefly review how each of the examined policies works.

PageRank algorithm running time with 16 threads and dynamic policy

Results

Overall, the program runs faster when using a bigger chunk size. This is due to the overhead generated by the scheduling policies while assigning chunks to threads. As expected, the improvement is much more remarked in the dynamic and guided policy where the chunk assignment process is the main bottleneck.

The speedup obtained by the dynamic scheduling policies suggests that the analyzed graph was very unbalanced, thus resulting in some iterations taking much more time than others.

We can also note how the program’s running time for different numbers of threads followed Amdahl’s law. In fact, it benefits from bigger relative speedups when the number of multiple processes remains low and its increase in speedup cools down as more threads are used at the same time.

The program has also been tested running with 32 threads but the results have not been reported since the running time was similar or even worse with respect to the 16 threads version.

Find more

GitHub – PageRank-Parallel: Analysis of parallel version of PageRank algorithm implementation in C++ with OpenMP library

KejiTech

Falling in love with data and machine learning in the simplest and funniest way. A website that no data scientist should miss, but also open to everyone eager to explore something new.

Analysis of parallel version of PageRank algorithm

Environment

Graph description

Scheduling policies

Static Policy

Dynamic Policy

Guided Policy

Running time comparison

Results

Find more

References

Leave a comment Cancel reply

Published by

Davide Liu

Environment

Graph description

Scheduling policies

Static Policy

Dynamic Policy

Guided Policy

Running time comparison

Results

Find more

References

Share this:

Related

Leave a comment Cancel reply

Published by

Davide Liu