What is Distributed Computing?
Distributed Computing involves breaking down a computational problem into several parallel tasks to be completed by two or more networked computers, which together form a distributed system. This combines the computational power of several computers to solve large problems that involve processing large amounts of data or require a huge number of iterations.
Several large-scale Distributed Computing projects, spanning various fields, are currently open to computers from all over the world, and many of them have been running for years. This shows just how large computational problems can be these days!
Here are some examples:
- Great Internet Mersenne Prime Search (Math)
  Search for Mersenne primes, which are prime numbers of the form 2^n - 1; written in binary, they consist only of ‘1’s.
- Test4Theory (Physics)
  Search for new fundamental particles at CERN’s Large Hadron Collider.
- Folding@Home (Molecular Biology)
  Run simulations on the folding of proteins, which aid the study of diseases such as Alzheimer’s, Huntington’s, and many cancers.
- QMC@Home (Chemistry)
  Study the structure and reactivity of molecules using quantum chemistry and Monte Carlo techniques.
- MindModelling@Home (Cognitive Science)
  Use computational cognitive process modelling to better understand the mind.
- MilkyWay@Home (Astronomy)
  Create a highly accurate 3D model of the Milky Way galaxy.
- Enigma@Home (Cryptography)
  Break three original Enigma messages which have not yet been broken.
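To make the Mersenne prime example a little more concrete, here is a small sketch that prints 2^p - 1 in binary and checks it with the Lucas-Lehmer test, a classic primality test for numbers of this form. The function name `is_mersenne_prime` is my own, and this is only an illustration on tiny exponents, not a description of how any of the projects above is implemented.

```python
def is_mersenne_prime(p):
    """Lucas-Lehmer test: return True if 2**p - 1 is prime.

    Intended for prime exponents p; the name is illustrative only.
    """
    if p == 2:
        return True  # 2**2 - 1 = 3 is prime; the loop below needs p > 2
    m = (1 << p) - 1  # the Mersenne number 2**p - 1
    s = 4
    for _ in range(p - 2):
        s = (s * s - 2) % m
    return s == 0


for p in (2, 3, 5, 7, 11, 13):
    m = 2**p - 1
    # bin(m)[2:] is the binary form without the "0b" prefix: p ones in a row
    print(p, bin(m)[2:], is_mersenne_prime(p))
```

Every number of the form 2^p - 1 is a run of ‘1’s in binary, but only some of them are prime, which is why the search needs so much computing power as p grows.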
Demonstrating distributed computing using 2 Pis
For demonstration purposes, I shall connect 2 Raspberry Pis using an Ethernet cable and perform a simple merge sort on a large array of elements.
Firstly, here’s the code for a merge sort algorithm written in Python.
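A minimal sketch of such a merge sort could look like the following. The function names `merge` and `merge_sort` are my own choice for illustration, and this version returns a new list rather than sorting in place.

```python
def merge(left, right):
    """Merge two already-sorted lists into one sorted list."""
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    # One side is exhausted; whatever remains on the other is already sorted.
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


def merge_sort(items):
    """Return a new sorted list; the input list is left unchanged."""
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    return merge(merge_sort(items[:mid]), merge_sort(items[mid:]))
```

The split-then-merge structure is what makes merge sort easy to distribute: two halves can be sorted independently and combined with a single `merge` at the end.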
Next, here’s the code for using the merge sort algorithm to sort an array of 100000 elements using 1 Raspberry Pi.
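As a rough sketch, the single-Pi run can be written as a small timing harness. The helper name `time_sort`, the fixed random seed, and the use of random integers are assumptions of mine. The built-in `sorted` is used below only so that the snippet runs on its own; pass your merge sort function as `sort_fn` to time that instead.

```python
import random
import time


def time_sort(sort_fn, n=100000, seed=0):
    """Build a list of n random integers, sort it, and time the sort.

    sort_fn must take a list and return a new sorted list.
    Returns (elapsed_seconds, sorted_list).
    """
    rng = random.Random(seed)  # seeded so every run sorts the same data
    data = [rng.randint(0, n) for _ in range(n)]
    start = time.perf_counter()
    result = sort_fn(data)
    elapsed = time.perf_counter() - start
    return elapsed, result


# Stand-in run with the built-in sort on a small array.
elapsed, result = time_sort(sorted, n=1000)
print(f"Sorted {len(result)} elements in {elapsed:.4f} seconds")
```

Only the sort itself is timed, not the generation of the array, so the figure reflects the work that will later be split between the Pis.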
(The merge sort code will be used in the following programs as well, so keep all the files in the same directory before running them!)
With only one Raspberry Pi, the sort takes about 24 seconds to complete.
Now, let’s distribute the task across 2 Raspberry Pis. To do so, we first have to set the IP address of each Raspberry Pi.
On the first Pi, type the following in the command line to set its IP address to 192.168.1.1. This Pi will act as the Server in the network.
sudo ifconfig eth0 192.168.1.1 broadcast 192.168.1.255 netmask 255.255.255.0
Similarly, type the following on the second Pi to set its IP address to 192.168.1.2. This Pi will act as the Client.
sudo ifconfig eth0 192.168.1.2 broadcast 192.168.1.255 netmask 255.255.255.0
For the first Pi, run the following code.
When the line “Waiting for client…” is printed on the first Pi’s command line, run the following code on the second Pi.
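A minimal sketch of both sides is given below in one listing: `run_server` is meant for the first Pi and `run_client` for the second. Several details here are my own assumptions rather than the original programs: TCP sockets, port 5000, JSON with a 4-byte length prefix as the wire format, and all of the function names. The built-in `sorted` is only a stand-in default for `sort_fn`; pass your merge sort function to reproduce the experiment.

```python
import heapq
import json
import socket
import struct

PORT = 5000  # assumed port; any free port will do


def send_msg(sock, obj):
    """Send a JSON-serialisable object, prefixed with its 4-byte length."""
    payload = json.dumps(obj).encode("utf-8")
    sock.sendall(struct.pack("!I", len(payload)) + payload)


def recv_exact(sock, n):
    """Read exactly n bytes, since a single recv() may return fewer."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("socket closed before message was complete")
        buf.extend(chunk)
    return bytes(buf)


def recv_msg(sock):
    """Receive one length-prefixed JSON message."""
    (length,) = struct.unpack("!I", recv_exact(sock, 4))
    return json.loads(recv_exact(sock, length).decode("utf-8"))


def run_server(data, host="192.168.1.1", port=PORT, sort_fn=sorted):
    """First Pi: send half the array to the client, sort the rest, merge."""
    mid = len(data) // 2
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        srv.bind((host, port))
        srv.listen(1)
        print("Waiting for client...")
        conn, _addr = srv.accept()
        with conn:
            send_msg(conn, data[mid:])   # the client starts on its half
            local = sort_fn(data[:mid])  # meanwhile, sort the other half here
            remote = recv_msg(conn)      # wait for the client's sorted half
    # Both halves are sorted, so one merge pass finishes the job.
    return list(heapq.merge(local, remote))


def run_client(host="192.168.1.1", port=PORT, sort_fn=sorted):
    """Second Pi: receive a chunk, sort it, and send it back."""
    with socket.create_connection((host, port)) as sock:
        chunk = recv_msg(sock)
        send_msg(sock, sort_fn(chunk))
```

On the first Pi you would call something like `run_server(data)` with your array, and on the second Pi `run_client()` once the waiting message appears. The server sends the client its half before sorting its own, so the two Pis work at the same time.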
The time taken to sort the array has decreased to about 16 seconds, a speedup of 1.5 times over the single Pi’s 24 seconds. It falls short of a 2-fold speedup because of the overhead in processing and transferring the data between the 2 Pis. The speedup should become more prominent as more Raspberry Pis are added, connected via a hub.
What now?
Hopefully this really short tutorial gives you a glimpse of what distributed computing is about and how to implement it simply. As the examples above show, it has many applications, and perhaps you could start your own distributed computing project using your Raspberry Pi (and a friend’s or friends’)!