Friday, January 30, 2015

Quick hack to automatically geolocate the IPs of people who tried to access your server (using SSH)

The following script gives this kind of output (for only 3 days of logs...) :
# ./sshhack-iploc /var/log/auth.log*
-----------------------------
-- SSH failed distinct attempts by country
Hong Kong -> 42
China -> 38
France -> 10
United States -> 6
Germany -> 2
Venezuela -> 1
United Kingdom -> 1
India -> 1
Republic of Korea -> 1
Brazil -> 1
Netherlands -> 1
-----------------------------
-- SSH top 10 of tested users
User 'root' -> 78318
User 'admin' -> 158
User 'oracle' -> 21
User 'postgres' -> 18
User 'ts' -> 14
User 'vnc' -> 13
User 'test' -> 12
User 'bin' -> 12
User 'git' -> 11
User 'teamspeak3' -> 10
-----------------------------
-- General Information
Processed time range : 92 hours (~3 days)
Total number of auth failures 78757 (~856 by hour)
Distinct tested logins 116
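
The script itself is not reproduced here, so the following is only a minimal sketch of the counting part, written in Scala and leaving out the geolocation step. It assumes the usual OpenSSH "Failed password for ... from ... port ..." line format in auth.log; the object and function names (SshFailuresSketch, parse, topUsers, distinctIps) are made up for the illustration.

```scala
object SshFailuresSketch {
  // Assumed line shape, e.g.
  // "... sshd[123]: Failed password for invalid user admin from 203.0.113.7 port 4242 ssh2"
  private val Failed =
    """Failed password for (?:invalid user )?(\S+) from (\S+) port""".r.unanchored

  /** Extracts (user, ip) from one auth.log line when it is a failed password attempt. */
  def parse(line: String): Option[(String, String)] = line match {
    case Failed(user, ip) => Some((user, ip))
    case _                => None
  }

  /** Most frequently tried logins, highest count first (ties sorted by name). */
  def topUsers(lines: Seq[String], n: Int = 10): Seq[(String, Int)] =
    lines.flatMap(parse(_))
      .groupBy { case (user, _) => user }
      .map { case (user, hits) => user -> hits.size }
      .toSeq
      .sortBy { case (user, count) => (-count, user) }
      .take(n)

  /** Distinct source addresses, i.e. the input a geolocation lookup would need. */
  def distinctIps(lines: Seq[String]): Set[String] =
    lines.flatMap(parse(_)).map { case (_, ip) => ip }.toSet
}
```

The lines could be read with scala.io.Source.fromFile(path).getLines().toSeq; mapping each distinct address to a country is left to whatever geolocation source is at hand.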

Tuesday, January 27, 2015

Serial versus parallel isPrime function: parallelization is not always worthwhile...

A parallel implementation of the isPrime function based on promises and futures sometimes generates too much overhead compared to the simple serial one. It really is faster when the tested number is prime, since every candidate divisor has to be checked, but with non-prime numbers that is not always the case. So how to decide which implementation is best for a given number? No idea yet. Full code is available here.
$ sbt 'test-only fr.janalyse.primes.IsPrimesTest'

[info] IsPrimesTest:
[info] - isPrimePara tests
[info] - isPrime mono versus parallel tests
[info]   + duration for 10000 : 1050ms serial processing, highest prime 104729 
[info]   + duration for 10000 : 5451ms parallel processing, highest prime 104729 (ForkJoinPool) 
[info]   + duration for 10000 : 13384ms parallel processing, highest prime 104729 (CachedThreadPool) 
[info] - very big prime test
[info]   + duration for 17436413019234331 : 22063ms sequential isPrime 
[info]   + duration for 17436413019234331 : 5839ms parallel isPrime (ForkJoinPool) 
[info]   + duration for 17436413019234331 : 5848ms parallel isPrime (CachedThreadPool) 
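
To make the trade-off concrete, here is a minimal sketch of the two approaches using plain trial division. It is not the code from the linked project: the names (IsPrimeSketch, isPrimeSerial, isPrimeParallel, the slices parameter) are assumptions, and the parallel variant simply splits the candidate divisors into a few ranges checked by futures on the global execution context.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration

object IsPrimeSketch {
  /** True if some odd d in [from, to] divides n. */
  private def hasOddDivisor(n: Long, from: Long, to: Long): Boolean = {
    var d = if (from % 2 == 0) from + 1 else from
    while (d <= to) {
      if (n % d == 0) return true
      d += 2
    }
    false
  }

  /** sqrt(n) + 1: a safe upper bound for candidate divisors, always below n when n >= 5. */
  private def upperBound(n: Long): Long = math.sqrt(n.toDouble).toLong + 1

  /** Serial version: stops at the first divisor found. */
  def isPrimeSerial(n: Long): Boolean =
    if (n < 2) false
    else if (n < 4) true
    else if (n % 2 == 0) false
    else !hasOddDivisor(n, 3, upperBound(n))

  /** Parallel version: each future scans one slice of the candidate divisors. */
  def isPrimeParallel(n: Long, slices: Int = 4): Boolean =
    if (n < 2) false
    else if (n < 4) true
    else if (n % 2 == 0) false
    else {
      val last  = upperBound(n)
      val width = (last - 3) / slices + 1
      val checks = (0 until slices).map { i =>
        val from = 3 + i * width
        val to   = math.min(last, from + width - 1)
        Future(hasOddDivisor(n, from, to))
      }
      // Waits for every slice, even when another one has already found a divisor.
      !Await.result(Future.sequence(checks), Duration.Inf).exists(identity)
    }
}
```

In this sketch the serial version returns as soon as a small divisor is found, while the parallel one still pays for scheduling the futures and waiting for all slices, which is one plausible reason why non-prime inputs can favour the serial code.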

Thursday, January 22, 2015

Primes computation : from a home-made akka actors based solution to akka-stream

Available on GitHub, Primes is a library dedicated to prime number computation. It is fully generic and works with various numeric types (Int, Long, BigInt for example). The library generates CheckedValue instances, each containing the numeric value just tested: it tells you whether the value is prime, gives you its position, and the number of digits it contains. Because it computes the position among primes (or among non-primes), the computation has to start either from the beginning or from a previously persisted state (the previous highest prime CheckedValue and the previous highest non-prime CheckedValue).
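
As an illustration of why positions force the computation to start from the beginning or from a saved state, here is a small sequential sketch. The field names and the from function are assumptions made for this example, not the library's actual API, and it is fixed to BigInt rather than generic.

```scala
object CheckedValueSketch {
  // Hypothetical shape: nth is the position among primes when isPrime is true,
  // and the position among non-primes otherwise.
  final case class CheckedValue(value: BigInt, isPrime: Boolean, nth: Long, digitCount: Int)

  /** Naive trial division, only here to keep the sketch self-contained. */
  private def isPrime(v: BigInt): Boolean =
    v >= BigInt(2) &&
      Iterator.iterate(BigInt(2))(_ + 1)
        .takeWhile(d => d * d <= v)
        .forall(d => (v % d).signum != 0)

  /** Checks values from `start` on, continuing the two position counters. */
  def from(start: BigInt = BigInt(2),
           primesSeen: Long = 0L,
           notPrimesSeen: Long = 0L): Iterator[CheckedValue] = {
    var primes    = primesSeen
    var notPrimes = notPrimesSeen
    Iterator.iterate(start)(_ + 1).map { v =>
      val prime = isPrime(v)
      if (prime) primes += 1 else notPrimes += 1
      CheckedValue(v, prime, if (prime) primes else notPrimes, v.toString.length)
    }
  }
}
```

Resuming only needs the next value to test and the two counters, which is exactly the information carried by the last prime and the last non-prime checked value.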

In January 2014,

I spent some time writing an actor-based primes computation algorithm. It was not an easy task, because it meant managing many things myself :
  • back pressure management,
  • parallelism,
  • reordering results.

The full source code of this old implementation is available here.

Now, one year later,

it was time to look at something else. Back pressure, results ordering, parallelism, ... are such common patterns with actors that we shouldn't have to deal with them directly; it is up to the framework to provide us with the right solution.

Akka-stream brings us a great abstraction for them, and the result becomes much easier to read and understand. Back pressure is automatically taken into account; parallel processing and ordering are obtained simply through a mapAsync call instead of a plain map call.

The full source code of this new implementation is available here.
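
For readers without akka-stream at hand, the following sketch approximates the three properties with the standard library only. It is neither the linked implementation nor akka-stream's mapAsync: the name mapAsyncLike and the batching strategy are assumptions. A lazy Iterator stands in for back pressure (nothing is computed until the consumer pulls), each batch of futures gives bounded parallelism, and Future.sequence keeps results in input order.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration

object OrderedParallelMap {
  /** Applies f to batches of `parallelism` elements at a time, keeping input order. */
  def mapAsyncLike[A, B](in: Iterator[A], parallelism: Int)(f: A => Future[B]): Iterator[B] =
    in.grouped(parallelism).flatMap { batch =>
      // The whole batch runs concurrently; results come back in batch order.
      Await.result(Future.sequence(batch.map(f)), Duration.Inf)
    }

  /** Small demo on an infinite source: only the batches actually consumed are computed. */
  def firstSquares(count: Int): List[Long] =
    mapAsyncLike(Iterator.from(1), parallelism = 4)(n => Future(n.toLong * n))
      .take(count)
      .toList
}
```

Unlike a real stream stage, this version waits for a whole batch before starting the next one, so it only illustrates the idea.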

Now let's make way for the code, and just compare the NEW code with the OLD one :

NEW : Akka-stream actors primes computation

Check this to see how to use it, for example directly from the scala console (sbt console).

OLD : Akka actors primes computation

From a performance point of view

Some "naive" raw performance results ("sbt test" to try yourself) :

Wednesday, October 22, 2014

scala & drools : project skeleton updated

Scala & drools project skeleton updated with the latest release of Drools (6.1), more features added (logging, test cases wired, DRL errors caught), and rules cleaned up.
Some advice coming from experience with big scala drools projects (hundreds of rules) :
  • Scala case classes are perfect for your rules
  • Use only java collections within the classes used by your rules. Avoid the scala collections in that precise case but rely on collection.JavaConversions._ implicits to hide that restriction.
  • In knowledge bases only use declarative classes (declare) for internal usage, such as intermediary reasoning state.
  • To change the java compiler used by drools, use the following system property : "drools.dialect.java.compiler=JANINO". It will replace ECJ with Janino, but you will have to provide the dependency yourself : "org.codehaus.janino" % "janino" % "2.5.16" (use only this release, at least up to Drools 6.1)
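
The first two pieces of advice can be sketched as follows. The Order class and its fields are hypothetical, not taken from the skeleton. The sketch uses the explicit asJava / asScala converters rather than the implicit collection.JavaConversions._ mentioned above, because JavaConversions was removed in Scala 2.13; with older Scala versions the implicit import would hide the conversions entirely.

```scala
import scala.collection.JavaConverters._

object RulesModelSketch {
  // A case class used as a fact: the collection field is a java.util.List,
  // so rule conditions see a plain java collection.
  final case class Order(id: String, items: java.util.List[String])

  /** Scala-side constructor hiding the java collection from the calling code. */
  def order(id: String, items: Seq[String]): Order = Order(id, items.asJava)

  /** Scala-side view of the items, for code outside the rules. */
  def itemsOf(order: Order): List[String] = order.items.asScala.toList
}
```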

Sunday, July 13, 2014

Using Stream to simplify execution alternatives...

First thing to remember before using Stream to simplify execution alternatives : Stream(...) will evaluate all its arguments before the stream is created, but if you use the #:: notation, only the first one will be evaluated at stream creation.
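
A small sketch to check that rule, with a hypothetical traced helper that records each argument when it is really evaluated. It uses scala.Stream as in the text; note that Stream is deprecated since Scala 2.13 in favour of LazyList, whose head is lazy as well.

```scala
import scala.collection.mutable.ListBuffer

object StreamEvaluationSketch {
  /** Records i in trace when evaluated, then returns it. */
  def traced(trace: ListBuffer[Int], i: Int): Int = { trace += i; i }

  /** Stream(...) takes varargs: all three arguments are evaluated up front. */
  def withApply(trace: ListBuffer[Int]): Stream[Int] =
    Stream(traced(trace, 1), traced(trace, 2), traced(trace, 3))

  /** With #::, only the head is evaluated; the tail waits until it is needed. */
  def withCons(trace: ListBuffer[Int]): Stream[Int] =
    traced(trace, 1) #:: traced(trace, 2) #:: traced(trace, 3) #:: Stream.empty
}
```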
Now using this to simplify conditions :
So everything is working fine, and "oups" was not printed on the console.
But one of the best uses I found for this pattern is when we want to get some information that may be available from several sources :
This time we have a piece of code that is very easy to read !
Try to do the same job using only traditional "if-then-else" blocks, or pattern matching.
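
The several-sources case could look like the sketch below. The three sources (fromEnv, fromFile, fromDefault) and the trace buffer are hypothetical, introduced only to show which alternatives are really evaluated.

```scala
import scala.collection.mutable.ListBuffer

object FirstAvailableSketch {
  // Hypothetical sources: each records its name when it is really evaluated.
  def fromEnv(trace: ListBuffer[String]): Option[String]     = { trace += "env"; None }
  def fromFile(trace: ListBuffer[String]): Option[String]    = { trace += "file"; Some("value from file") }
  def fromDefault(trace: ListBuffer[String]): Option[String] = { trace += "default"; Some("fallback") }

  /** Returns the first defined result, evaluating the sources one by one, in order. */
  def lookup(trace: ListBuffer[String]): Option[String] = {
    val alternatives: Stream[Option[String]] =
      fromEnv(trace) #:: fromFile(trace) #:: fromDefault(trace) #:: Stream.empty
    alternatives.find(_.isDefined).flatten
  }
}
```

Each source is evaluated at most once, and the sources placed after the first defined result are never called.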

Traditional approach examples :
The if-block version suffers from a bug: the first fromX function returning a non-empty result is evaluated twice (we should have used a temporary value to keep the result of the fromX evaluation). The pattern matching variant is very difficult to read.

Sunday, April 27, 2014

java.library.path change at runtime, using scala

Based on the article "Changing Java Library Path at Runtime" on fahd.blog, here is the Scala version :
And a small function to take a resource and save it to the chosen destination (based on Scala IO) :
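
The original snippets are not reproduced here, so the following is only a sketch of both helpers under stated assumptions. setLibraryPath relies on the well-known trick of resetting the private ClassLoader.sys_paths field by reflection so that the path is re-read on the next library load; this is an assumption about JVM internals that held for the Java versions of that time and fails on recent JVMs, hence the Try. saveResource uses java.nio instead of Scala IO. Both function names are made up for the illustration.

```scala
import java.nio.file.{Files, Path, StandardCopyOption}
import scala.util.Try

object NativeLibSketch {
  /** Sets java.library.path, then tries to make the JVM forget its cached copy. */
  def setLibraryPath(path: String): Try[Unit] = {
    System.setProperty("java.library.path", path)
    Try {
      // Assumption: the private static field "sys_paths" exists and is reachable
      // by reflection; on recent JVMs this throws and the Try is a Failure.
      val field = classOf[ClassLoader].getDeclaredField("sys_paths")
      field.setAccessible(true)
      field.set(null, null)
    }
  }

  /** Copies a classpath resource to destination; false when the resource is missing. */
  def saveResource(resource: String, destination: Path): Boolean =
    Option(getClass.getResourceAsStream(resource)) match {
      case Some(in) =>
        try { Files.copy(in, destination, StandardCopyOption.REPLACE_EXISTING); true }
        finally in.close()
      case None => false
    }
}
```

A typical use would be to save a bundled native library to a temporary directory, then point java.library.path at that directory before calling System.loadLibrary.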

Sunday, March 2, 2014

Akka actors based, generic primes calculation parallel algorithm

Actor programming is an interesting way to take advantage of all your server's CPU cores to compute prime numbers while still being able to compute their ordered positions.

Here the principle is to create an actor router that load balances the primes computation across CheckerActors. The load balancing is based on the mailbox size of each routed actor: a new task is given to the actor with the smallest mailbox.

All checked values are then sent to a single actor, which reorders the received values and assigns them their respective positions. This actor, the ValuesManagerActor, is also responsible for managing the load, and checks that the final destination of all results can keep up with the result flow.
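
The reordering step can be sketched independently of actors. The Reorderer class below is hypothetical, not the ValuesManagerActor code: it assumes each result carries the index of the task it answers, buffers results that arrive early, and releases them as soon as the sequence is contiguous.

```scala
import scala.collection.mutable

final class Reorderer[A](firstIndex: Long = 0L) {
  private var next = firstIndex                 // index of the next result to release
  private val pending = mutable.Map.empty[Long, A]

  /** Accepts a possibly out-of-order result; returns the results now releasable, in order. */
  def offer(index: Long, value: A): List[A] = {
    pending.update(index, value)
    val ready = List.newBuilder[A]
    while (pending.contains(next)) {
      ready += pending.remove(next).get
      next += 1
    }
    ready.result()
  }
}
```

Positions can then be assigned while walking the released values, since they come out in their natural order.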

Here all results are sent to PrinterActor, which outputs results on the standard output. If you try to print all results, this actor may not be as fast as the incoming result flow, so its mailbox will grow up to the JVM maximum heap size, thus generating an OutOfMemory error. That's why I've added an acknowledgement mechanism that verifies we do not have too many results sent but not yet taken into account. If there are too many of them, the ValuesManagerActor suspends the load balancing to CheckerActors.

My primes project on GitHub, for experiments and tests.

How long does it take to get the 500000th prime (sbt console) (first results, jvm 1.7.0.51, Intel Core i7 2.3 GHz, 4 CPUs) :
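
The acknowledgement idea boils down to counting results that were sent but not yet acknowledged. Here is a minimal, actor-free sketch; the AckWindow class and its maxPending parameter are assumptions made for the illustration.

```scala
final class AckWindow(maxPending: Int) {
  private var pending = 0   // results sent and not yet acknowledged

  /** False when the consumer is too far behind: the producer should pause. */
  def canSend: Boolean = pending < maxPending

  /** Call when one result is sent to the consumer. */
  def sent(): Unit = { pending += 1 }

  /** Call when the consumer acknowledges `count` results (1, or a batch such as 5000). */
  def acked(count: Int = 1): Unit = { pending = math.max(0, pending - count) }
}
```

Acknowledging in batches means fewer acknowledgement messages, at the price of a coarser view of what the consumer has really processed.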

Algo                                Duration   Comments
Classic                             61s        pgen.primes.take(500000).last
Parallel                            22s        pgen.primesPar.take(500000).last
Actor (ack on each response)        38s        with fork-join-executor
Actor (ack every 5000 responses)    35s        with fork-join-executor
Actor (ack on each response)        33s        with thread-pool-executor
Actor (ack every 5000 responses)    30s        with thread-pool-executor

The actors based algorithm is noticeably faster than the classic sequential one, but not as fast as using a parallel collection to parallelize the classic algorithm.

Although my implementation is not as fast as the parallel solution (I'll have to investigate that point), it has a major benefit: it can be distributed across several computers. That will be the subject of a new article.