Toy Akka in Python



In this post, we present a toy implementation of Akka in Python. The idea is to create a separate process for each actor, which lets us get around the Python GIL. There is no free lunch, though: setting up processes introduces overhead, and we need a background process to distribute messages, which also consumes resources.

The source code can be found here:

Here is how we define a custom actor. We just need to override the onReceive method.

class Counter(Actor):
    def onReceive(self, message):
        self.getActorRef().send(self.getLastSender(), "finished")

To instantiate an actor, we need to create a Context object first. In the following code, we create a list of worker actors and an actor to collect results. Note that the actor code is wrapped in the context (the with block, line 9 of the listing) and that we need to call context.join() at the end of the block (line 13).

context = Context()

workers = []
for i in range(nWorkers):
    workers.append(Counter("worker-" + str(i), context).getActorRef())

resultCollector = ResultCollector(startTime, nWorkers, "actor-ResultCollector", context).getActorRef()

with context as c:
    logging.debug("enter the with block")
    for worker in workers:
        resultCollector.send(worker, n / nWorkers)
    context.join()

The implementation is deliberately minimal: it has no shutdown mechanism, so we need to kill the process in order to stop the program.
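As a rough illustration of the one-process-per-actor idea, here is a minimal sketch built on the standard multiprocessing module. It is not the implementation from this post: the names actor_loop, run_actors and STOP are hypothetical, each actor simply owns its own queue instead of relying on a background process to distribute messages, and a sentinel message is assumed as a way to shut the actors down.

```python
import multiprocessing as mp

STOP = None  # hypothetical sentinel: tells an actor process to exit


def count_down(n):
    """CPU-bound busy loop, same shape as countDown in this post."""
    while n > 0:
        n -= 1


def actor_loop(name, inbox, outbox):
    """Body of one actor: handle messages from inbox until STOP arrives."""
    while True:
        message = inbox.get()
        if message is STOP:
            break
        count_down(message)
        outbox.put((name, "finished"))


def run_actors(n, n_workers):
    """Start one process per actor, send each its share, collect replies."""
    results = mp.Queue()
    inboxes = [mp.Queue() for _ in range(n_workers)]
    procs = [
        mp.Process(target=actor_loop, args=(f"worker-{i}", inbox, results))
        for i, inbox in enumerate(inboxes)
    ]
    for p in procs:
        p.start()
    for inbox in inboxes:
        inbox.put(n // n_workers)  # the work message
        inbox.put(STOP)            # then ask the actor to shut down
    # Drain the replies before joining so no child blocks on a full pipe.
    replies = [results.get(timeout=60) for _ in procs]
    for p in procs:
        p.join()
    return replies


if __name__ == "__main__":
    print(run_actors(1_000_000, 5))
```

The main guard matters on platforms where child processes are started by re-importing the main module. Because actor_loop only needs get and put, it can also be exercised in a single process with a plain queue.Queue, which is handy for debugging the message handling separately from the process setup.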


To test the performance of this toy implementation, we define the following CPU-bound task:

def countDown(n):
    while n > 0:
        n -= 1

The tests were run on a 2.7 GHz Dual-Core Intel Core i5. Here is the test configuration:

n = 200000000
nWorkers = 5

test 1: run countDown(n) in a single thread (process).
test 2: run countDown(n/nWorkers) in a single thread (process).
test 3: run countDown(n/nWorkers) in a child process.
test 4: run countDown(n/nWorkers) in each of nWorkers threads.
test 5: run countDown(n/nWorkers) in each of nWorkers actors.
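The timing code itself is not shown in this post. A minimal sketch of how tests 1, 2 and 4 could be timed is given here, assuming time.perf_counter as the clock; the helpers timed and run_in_threads are hypothetical names, and n is much smaller than in the real configuration so that the sketch finishes quickly. Tests 3 and 5 are left out because they need process setup.

```python
import threading
import time


def count_down(n):
    """CPU-bound busy loop, same shape as countDown in this post."""
    while n > 0:
        n -= 1


def timed(fn, *args):
    """Return the wall-clock seconds taken by fn(*args)."""
    start = time.perf_counter()
    fn(*args)
    return time.perf_counter() - start


def run_in_threads(n, n_workers):
    """Split n across n_workers threads and wait for all of them (test 4)."""
    threads = [
        threading.Thread(target=count_down, args=(n // n_workers,))
        for _ in range(n_workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


if __name__ == "__main__":
    n, n_workers = 2_000_000, 5  # assumption: scaled down from the post's n
    print("test 1 (single threaded):", timed(count_down, n))
    print("test 2 (single threaded n / 5):", timed(count_down, n // n_workers))
    print("test 4 (multi-threaded):", timed(run_in_threads, n, n_workers))
```

The numbers reported in this post come from the original test code, not from this sketch.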

Here is the output:

test 1: Time elapsed (single threaded): 12.610621929168701
test 2: Time elapsed (single threaded n / 5): 2.455589771270752
test 3: Time elapsed (single threaded n / 5 in child process): 2.5409128665924072
test 4: Time elapsed (multi-threaded): 14.095125913619995
test 5: Time elapsed (actor model): 9.327059984207153

We have two observations:

  1. Due to the GIL, the multi-threaded approach takes longer than the single-threaded one (14.1 s versus 12.6 s).
  2. The actor model does speed up the execution (9.3 s), but we are not close to the theoretical performance: with two cores, a multi-process approach should take half the time.
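Both observations can be checked with a little arithmetic on the timings reported above. The variable names are only for illustration:

```python
# Timings in seconds, copied from the output of tests 1, 4 and 5.
single = 12.610621929168701
threads = 14.095125913619995
actors = 9.327059984207153

thread_slowdown = threads / single  # how much slower threads are
actor_speedup = single / actors     # how much faster actors are
ideal_two_core = single / 2         # perfect split across two cores

print(f"threads vs single: {thread_slowdown:.2f}x slower")  # 1.12x slower
print(f"actors vs single: {actor_speedup:.2f}x faster")     # 1.35x faster
print(f"ideal two-core time: {ideal_two_core:.2f} s")       # 6.31 s
```

So the actor version runs about 1.35 times faster than the single-threaded run, against an ideal of 2 times.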

Personally, I don't think we can blame all of the performance loss on the overhead of setting up actors and running background tasks. Part of the reason is that we should not expect the program to run twice as fast with multiple processes in the first place. For example, let \(T\) be the time the task needs on a dedicated core, and suppose the probability of a process being picked up by a given core is \(p\). When we run countDown(n) in a single process, the probability of that process being picked up by either core is \(1 - (1-p)^2 = 2p - p^2\), so it will take \(\frac{T}{2p - p^2}\). When we split the task into multiple pieces and assign them to different processes, we still have only 2 cores, so we essentially split the original task into two halves and each core works on one half. It follows that the multi-process approach should take \(\frac{T/2}{p} = \frac{T}{2p}\), and the ratio between the two approaches is \(\frac{T}{2p} \cdot \frac{2p - p^2}{T} = \frac{2-p}{2}\), which is greater than 0.5 whenever \(p < 1\).
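To get a feel for this model, the two running times and their ratio can be evaluated for a few values of p. This is only a numerical restatement of the formulas above; the function names are hypothetical, and T is set to 1 since it cancels in the ratio.

```python
def single_process_time(T, p):
    """Expected time when one process may be picked up by either of 2 cores."""
    return T / (2 * p - p * p)


def multi_process_time(T, p):
    """Expected time when each core works on half of the task."""
    return (T / 2) / p


def ratio(p):
    """Closed form of multi_process_time / single_process_time."""
    return (2 - p) / 2


for p in (0.25, 0.5, 0.75, 1.0):
    measured = multi_process_time(1.0, p) / single_process_time(1.0, p)
    print(f"p={p}: ratio={measured:.3f}, closed form={ratio(p):.3f}")
```

Only at p = 1, when a process is always scheduled, does the ratio reach the ideal 0.5. For comparison, the measured ratio of test 5 to test 1 is about 0.74; under this model that would correspond to p of roughly 0.52, which is a consistency check of the arithmetic rather than evidence for the model.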
