An Akka actor-based, generic parallel algorithm for computing primes
Actor programming is an interesting way to use all of your server's CPU cores to compute prime numbers while still being able to determine each prime's position in the sequence.
The principle is to create an actor router that load-balances primality checks across a pool of CheckerActors. Load balancing is based on mailbox size: each new task is given to the actor whose mailbox is currently the smallest.
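The routing rule can be illustrated without any Akka dependency. The sketch below is only an approximation of the idea: `Worker`, `pick` and `route` are hypothetical names, and a plain queue stands in for an actor mailbox. Akka ships a smallest-mailbox router for this purpose; which router class the project actually uses is not stated here.

```scala
object SmallestMailboxSketch {
  // Hypothetical stand-in for a CheckerActor: a name plus a queue of pending candidates.
  final class Worker(val name: String) {
    val mailbox = scala.collection.mutable.Queue.empty[Long]
  }

  // Pick the worker whose mailbox currently holds the fewest messages.
  // On a tie, minBy keeps the first worker in the sequence.
  def pick(workers: Seq[Worker]): Worker = workers.minBy(_.mailbox.size)

  // Deliver one candidate to the least loaded worker and return that worker.
  def route(workers: Seq[Worker], candidate: Long): Worker = {
    val target = pick(workers)
    target.mailbox.enqueue(candidate)
    target
  }
}
```

A real router only sees an approximate mailbox size, since workers drain their mailboxes concurrently, so the choice is a heuristic rather than an exact balance.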
All checked values are then sent to a single actor, the ValuesManagerActor, which reorders the received values and assigns each prime its position. This actor is also responsible for managing the load: it checks that the final destination of the results can keep up with the result flow.
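The reordering step can be sketched as a small buffer that holds early arrivals until every smaller candidate has been seen. This is a minimal illustration, not the ValuesManagerActor's actual code: the `Checked` and `Positioned` messages and the `Reorderer` class are assumed names, and it assumes every consecutive candidate value is eventually reported, prime or not.

```scala
object ReorderSketch {
  // Hypothetical result message: the checked value and whether it is prime.
  final case class Checked(value: Long, isPrime: Boolean)
  // A prime together with its 1-based position in the sequence of primes.
  final case class Positioned(position: Long, prime: Long)

  final class Reorderer(firstValue: Long) {
    private val pending = scala.collection.mutable.Map.empty[Long, Checked]
    private var nextValue = firstValue // smallest candidate not yet released
    private var nextPosition = 1L      // position to give to the next prime

    // Accept a result in any order; return the primes that can now be released in order.
    def receive(result: Checked): List[Positioned] = {
      pending(result.value) = result
      val released = List.newBuilder[Positioned]
      while (pending.contains(nextValue)) {
        val r = pending.remove(nextValue).get
        if (r.isPrime) {
          released += Positioned(nextPosition, r.value)
          nextPosition += 1
        }
        nextValue += 1
      }
      released.result()
    }
  }
}
```

Because positions are only assigned once the gap below a value is filled, the checkers are free to finish in any order.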
Here all results are sent to the PrinterActor, which writes them to standard output. If you try to print every result, this actor may not be as fast as the incoming result flow, so its mailbox will grow up to the JVM maximum heap size and eventually trigger an OutOfMemoryError. That's why I've added an acknowledgement mechanism that verifies we do not have too many results that have been sent but not yet processed. If there are too many unacknowledged results, the ValuesManagerActor suspends the load balancing to the CheckerActors.
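The acknowledgement bookkeeping boils down to two counters. The sketch below is one possible shape for it, under assumptions: `FlowControl`, `maxUnacked` and the method names are hypothetical, and the real actors exchange messages rather than calling methods directly.

```scala
object AckSketch {
  // Tracks results sent downstream that have not been acknowledged yet.
  final class FlowControl(maxUnacked: Int) {
    private var sent = 0L
    private var acked = 0L

    // Number of results sent to the printer but not yet acknowledged.
    def unacked: Long = sent - acked

    // The manager keeps feeding the checkers only while this is true.
    def mayDispatch: Boolean = unacked < maxUnacked

    // Called each time a result is forwarded to the printer.
    def resultSent(): Unit = { sent += 1 }

    // An acknowledgement may cover a single result or a whole batch.
    def ackReceived(count: Int): Unit = { acked = math.min(sent, acked + count) }
  }
}
```

Acknowledging in batches reduces the number of messages travelling back to the manager, at the cost of letting the printer's mailbox grow a little more between acknowledgements.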
My primes project on GitHub is available for experiments and tests.
Time needed to get the 500000th prime from the sbt console (first results; JVM 1.7.0.51, Intel Core i7 2.3 GHz, 4 CPUs):
| Algorithm | Duration | Comments |
|---|---|---|
| Classic | 61s | `pgen.primes.take(500000).last` |
| Parallel | 22s | `pgen.primesPar.take(500000).last` |
| Actor (ack on each response) | 38s | with fork-join-executor |
| Actor (ack every 5000 responses) | 35s | with fork-join-executor |
| Actor (ack on each response) | 33s | with thread-pool-executor |
| Actor (ack every 5000 responses) | 30s | with thread-pool-executor |
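To make the first two rows concrete, here is a rough sketch of what a sequential and a parallel baseline can look like. It is not the `pgen` implementation: `isPrime`, `nthPrime` and `nthPrimePar` are hypothetical names, trial division is an assumed primality test, and the parallel variant swaps in standard-library Futures instead of a parallel collection so that the example needs no extra module.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration

object BaselineSketch {
  // Trial division up to sqrt(n); an assumption, not necessarily the test used by pgen.
  def isPrime(n: Long): Boolean =
    n >= 2 && Iterator.iterate(2L)(_ + 1).takeWhile(d => d * d <= n).forall(n % _ != 0)

  // Sequential: walk the candidates in order and return the n-th prime (1-based).
  def nthPrime(n: Int): Long =
    Iterator.iterate(2L)(_ + 1).filter(isPrime).drop(n - 1).next()

  // Parallel: test one chunk of candidates at a time, each candidate in its own Future.
  // Future.sequence keeps the candidate order, so positions stay correct.
  // One Future per candidate is wasteful; it only keeps the sketch short.
  def nthPrimePar(n: Int, chunk: Int = 10000): Long = {
    val primes = Iterator.iterate(2L)(_ + chunk).flatMap { start =>
      val checks = (start until start + chunk).map(c => Future(c -> isPrime(c)))
      Await.result(Future.sequence(checks), Duration.Inf).collect { case (c, true) => c }
    }
    primes.drop(n - 1).next()
  }
}
```

Both functions return the same value for a given `n`; only the way the primality checks are scheduled differs.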
The actor-based algorithm is noticeably faster than the classic sequential one (30s to 38s versus 61s), but not as fast as parallelizing the classic algorithm with a parallel collection (22s).
Although my implementation is not as fast as the parallel solution (I'll have to investigate that point), it has one major benefit: it can be distributed across several computers. That will be the subject of a new article.