Sunday, November 24, 2013

Situations when you must not forget to use the scala blocking statement when working with futures...

Futures imply the use of an execution context, either a custom one or the default one. An execution context is a mechanism that manages threads for you and maps operations onto them. It comes with internal default limits, which are most of the time proportional to the number of CPUs you have. This default behavior is the right one for CPU-bound operations, but not for IO. If you need to make 100 remote calls (1 second each) to another system and you have only 2 CPUs, your total response time will be 50 seconds, for almost 0% CPU usage on your host...

That's why the blocking statement was introduced: it allows you to mark a code section as blocking, meaning time consuming rather than CPU consuming. Thanks to this statement, the execution context can grow its thread pool as necessary, and you'll get very low latency.

On my 6-core CPU, 50 simulated remote calls (2 seconds each) require only 2056ms with the blocking statement, and of course 18 seconds without it: with only 6 pool threads, the 50 two-second calls run in ceil(50/6) = 9 waves of 2 seconds each.
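A minimal sketch of the pattern, using the default global execution context (the 2-second sleep stands for any blocking IO call):

import scala.concurrent._
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global

// Mark the slow IO section as blocking so the pool can grow beyond the CPU count
def remoteCall(id: Int): String = blocking {
  Thread.sleep(2000) // simulated 2s remote call
  s"response-$id"
}

val all = Future.traverse((1 to 50).toList)(id => Future(remoteCall(id)))
Await.result(all, 1.minute) // ~2s with blocking, ~18s without it on 6 cores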

Take care with Futures created in a scala for-comprehension...

Create your Futures outside the for-comprehension, so that they start their operations immediately. If you create your futures inside the for-comprehension, your execution will be sequential, not concurrent! This mistake is very easy to make and not so easy to detect, because nothing fails; you just won't be able to use all your CPUs as you should when the processing is CPU-bound, or, for other cases, check for example how many network connections are established simultaneously...
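A minimal sketch of both versions (the task function stands for any slow computation):

import scala.concurrent._
import scala.concurrent.ExecutionContext.Implicits.global

def task(): Int = { Thread.sleep(1000); 1 } // any slow computation

// Sequential (~2s): the second Future is only created inside flatMap,
// once the first one has completed
val slow = for { a <- Future(task()); b <- Future(task()) } yield a + b

// Concurrent (~1s): both futures are created, and thus started, beforehand
val fa = Future(task())
val fb = Future(task())
val fast = for { a <- fa; b <- fb } yield a + b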

Tuesday, October 1, 2013

Ulam spiral in scala

Just published a scala project on github to play with primes and the Ulam spiral. To generate a 500x500 Ulam spiral use :
$ sbt "run 500"
It will generate a PNG image file named "ulam-spiral-500.png".
To start the console and play with primes :
$ sbt console

scala> sexyPrimeStream(4)
res3: primes.Primes.PInteger = 17

scala> primeStream.take(15).toList
res3: List[primes.Primes.PInteger] = List(2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47)

scala> sexyPrimeStream.take(5).toList
res5: List[primes.Primes.PInteger] = List(5, 7, 11, 13, 17)

scala> isolatedPrimeStream.take(10).toList
res2: List[primes.Primes.PInteger] = List(23, 37, 47, 53, 67, 79, 83, 89, 97, 113)

Saturday, September 28, 2013

Prime number scala stream based on an int stream

An integer stream filtered down to a prime-number stream. It takes 21 seconds on my laptop to compute the first 100,001 prime numbers; the last one is 1,299,721. Of course, only one processor core is used here...
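A minimal sketch of the idea (a naive trial-division filter, not the project's exact code):

// Naive trial division; enough to illustrate filtering an integer stream
def isPrime(n: Long): Boolean =
  n > 1 && (2L to math.sqrt(n.toDouble).toLong).forall(n % _ != 0)

val primeStream: Stream[Long] = Stream.from(2).map(_.toLong).filter(isPrime)

primeStream.take(15).toList // List(2, 3, 5, 7, 11, 13, 17, 19, ...)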

Saturday, September 21, 2013

Scala collection chisel

A first release, not yet good enough from my point of view, but it works almost as it should.

Wednesday, July 31, 2013

Generic collection compact functions : compact and compactBy

Truly generic collection compaction functions; with the second one you can even provide a custom merge function. By compaction I mean merging consecutive "identical" values into a single one.
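A hedged sketch of what such functions can look like on List (the original generic versions work on any collection type; this simplified reconstruction is an assumption):

// Merge consecutive elements with the same key into one, using a custom merge
def compactBy[A, K](xs: List[A])(key: A => K)(merge: (A, A) => A): List[A] =
  xs.foldLeft(List.empty[A]) {
    case (prev :: rest, cur) if key(prev) == key(cur) => merge(prev, cur) :: rest
    case (acc, cur)                                   => cur :: acc
  }.reverse

// compact keeps one element per run of equal consecutive values
def compact[A](xs: List[A]): List[A] = compactBy(xs)(identity)((a, _) => a)

compact(List(1, 1, 2, 2, 2, 3, 1)) // List(1, 2, 3, 1)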

Monday, July 29, 2013

Serialization example using json4s, and a small github project for test purposes

Once you are aware of json4s' serialization limitations, it is quite simple to use :

{
  "receipts":[
    {
      "amount":15.0,
      "currency":{
        "jsonClass":"Euro"
      },
      "when":"2013-07-29T20:45:44.121Z",
      "where":{
        "jsonClass":"Address",
        "address":"XIII",
        "town":"Paris"
      },
      "what":"meal",
      "keywords":[
        "food",
        "4work"
      ]
    },
    {
      "amount":1.0,
      "currency":{
        "jsonClass":"Euro"
      },
      "when":"2013-07-29T20:45:44.121Z",
      "where":{
        "jsonClass":"Address",
        "address":"I",
        "town":"Paris"
      },
      "what":"bread",
      "keywords":[
        "food"
      ]
    }
  ],
  "description":"2013 archives"
}
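The JSON above can be produced with something like the following sketch; the case class names match the jsonClass hints, but the rest is an assumption, not the original project code:

import org.json4s.ShortTypeHints
import org.json4s.native.Serialization
import org.json4s.native.Serialization.writePretty

trait Currency
case class Euro() extends Currency
case class Address(address: String, town: String)

case class Receipt(amount: Double, currency: Currency, when: java.util.Date,
                   where: Address, what: String, keywords: List[String])
case class Archives(receipts: List[Receipt], description: String)

// ShortTypeHints adds the "jsonClass" discriminator field seen above
implicit val formats = Serialization.formats(
  ShortTypeHints(List(classOf[Euro], classOf[Address])))

val meal = Receipt(15.0, Euro(), new java.util.Date(),
                   Address("XIII", "Paris"), "meal", List("food", "4work"))
println(writePretty(Archives(List(meal), "2013 archives")))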

Saturday, July 27, 2013

A scala script that counts the number of lines in source code files

The only dependencies of this script are :

  • "com.github.scala-incubator.io" %% "scala-io-core" % "0.4.2"
  • "com.github.scala-incubator.io" %% "scala-io-file" % "0.4.2"

scala-io provides a really great file-operations abstraction layer for the Scala language.

On some of my projects, this script prints :

$ time ./count-lines janalyse-*
----------------------------
All directories
global lines count : 9148 (in 101 files)
lines count for scala : 8875 (in 84 files)
lines count for sbt : 258 (in 14 files)
lines count for sh : 15 (in 3 files)
----------------------------
For directory : janalyse-jmx
global lines count : 1697 (in 21 files)
lines count for scala : 1607 (in 17 files)
lines count for sbt : 90 (in 4 files)
----------------------------
For directory : janalyse-ssh
global lines count : 3130 (in 34 files)
lines count for scala : 3064 (in 29 files)
lines count for sbt : 61 (in 4 files)
lines count for sh : 5 (in 1 files)
----------------------------
For directory : janalyse-snmp
global lines count : 342 (in 7 files)
lines count for scala : 308 (in 4 files)
lines count for sbt : 29 (in 2 files)
lines count for sh : 5 (in 1 files)
----------------------------
For directory : janalyse-series
global lines count : 3979 (in 39 files)
lines count for scala : 3896 (in 34 files)
lines count for sbt : 78 (in 4 files)
lines count for sh : 5 (in 1 files)

real 0m8.969s
user 0m14.058s
sys 0m0.819s
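A hedged sketch of the core counting logic, assuming scala-io 0.4.x (the report layout and the original script's exact API usage may differ):

import scalax.file.Path

val exts = Set("scala", "sbt", "sh")
val dir = Path.fromString("janalyse-jmx")
// Walk the tree, keep known source files, group them by extension
val files = dir.descendants()
               .filter(f => f.isFile && f.extension.exists(exts.contains))
               .toList
val byExt = files.groupBy(_.extension.get)
println(s"global lines count : ${files.map(_.lines().size).sum} (in ${files.size} files)")
for ((ext, fs) <- byExt)
  println(s"lines count for $ext : ${fs.map(_.lines().size).sum} (in ${fs.size} files)")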

Tuesday, July 23, 2013

Something to know if you try to save heap memory during some processing...

I used to use code blocks {} to separate logic, for example in report-generator code while chaining report sections. But recently I noticed that my application was using a lot of heap memory while processing 6GB of numerical series, and after some investigation I found that all loaded series were kept in memory even once we had exited the block. Thanks to the google group and its contributors.

The code example fails with an OutOfMemoryError (using locally{} blocks instead gives the same problem); the simple workaround is to extract each block into its own method, as sketched below :
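A hedged sketch (the array sizes and names are illustrative, not the original gists):

// Problem: within a single method, values created in an earlier {} block
// can stay reachable from the stack frame while the next block runs,
// so both big arrays may be live at the same time
def report(): Unit = {
  {
    val series = Array.ofDim[Double](256 * 1024 * 1024) // big first load
    println(series.length)
  }
  {
    val series = Array.ofDim[Double](256 * 1024 * 1024) // may OOM here
    println(series.length)
  }
}

// Workaround: extract each block into its own method, so its locals
// become unreachable as soon as the method returns
def section1(): Unit = { val s = Array.ofDim[Double](256 * 1024 * 1024); println(s.length) }
def section2(): Unit = { val s = Array.ofDim[Double](256 * 1024 * 1024); println(s.length) }
def reportFixed(): Unit = { section1(); section2() }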

Monday, July 22, 2013

Playing with generic numbers and collections through examples

Copy-paste this code into a scala worksheet to view the results.

The final asquares method is fully generic, both in collection type and number type; you'll get this result :

  asquares(Seq(1,2,3))               //> res1: Seq[Int] = List(1, 4, 9)
  asquares(List(1,2,3))              //> res2: List[Int] = List(1, 4, 9)
  asquares(Vector(BigDecimal(1),BigDecimal(2),BigDecimal(3)))
                                     //> res3: Vector[BigDecimal] = Vector(1, 4, 9)
  asquares(Array(1,2,3))             //> res4: Array[Int] = Array(1, 4, 9)
  asquares(Set(2,4))                 //> res5: Set[Int] = Set(4, 16)

The right types are always returned !
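A hedged reconstruction of such an asquares (the original was an embedded gist); it combines Numeric with CanBuildFrom to keep both the element type and the collection type:

import scala.collection.generic.CanBuildFrom

def asquares[N, C[X] <: Traversable[X]](xs: C[N])(
    implicit num: Numeric[N], bf: CanBuildFrom[C[N], N, C[N]]): C[N] = {
  val builder = bf(xs)
  for (x <- xs) builder += num.times(x, x)
  builder.result()
}

asquares(Seq(1, 2, 3)) // Seq[Int] = List(1, 4, 9)
asquares(Set(2, 4))    // Set[Int] = Set(4, 16)
// Array is not a Traversable and needs an extra bridge, left out here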

A small hack to check whether a remote HTTP server is accessible. Relying on a DNS resolution test alone is not enough; you really have to try to get the content.
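A hedged sketch using only the standard library (the URL and timeout values are illustrative):

import java.net.{HttpURLConnection, URL}
import scala.util.Try

def isReachable(url: String, timeoutMs: Int = 3000): Boolean = Try {
  val conn = new URL(url).openConnection().asInstanceOf[HttpURLConnection]
  conn.setConnectTimeout(timeoutMs)
  conn.setReadTimeout(timeoutMs)
  // actually fetch the content; DNS alone may succeed while the server is down
  io.Source.fromInputStream(conn.getInputStream).mkString.nonEmpty
}.getOrElse(false)

isReachable("http://example.com/") // true only if content can really be fetched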

Thursday, May 9, 2013

Find the index of the first element greater than or equal to a value

A small algorithm to find the index of the first element greater than or equal to a given value. The given sequence must be sorted. This tail-recursive algorithm is inspired by binary search.
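A hedged sketch of such a search (the original gist differs; this is the classic lower-bound variant):

import scala.annotation.tailrec

// Returns the index of the first element >= value, or xs.size if none exists
def firstIndexGte[T](xs: IndexedSeq[T], value: T)(implicit ord: Ordering[T]): Int = {
  @tailrec
  def go(lo: Int, hi: Int): Int =
    if (lo >= hi) lo
    else {
      val mid = (lo + hi) / 2
      if (ord.gteq(xs(mid), value)) go(lo, mid) else go(mid + 1, hi)
    }
  go(0, xs.size)
}

firstIndexGte(Vector(1, 3, 5, 7), 4) // 2 (the index of 5)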

Monday, April 22, 2013

Play : Get rid of Thread.sleep to simulate remote processing

With play/akka it is very easy to simulate remote processing that requires several seconds, without consuming system resources such as threads. Just create a Promise, ask the akka scheduler to execute a small piece of code that will fulfill the promise after the given delay, and then finish with the Async instruction.
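A hedged sketch for Play 2.1-era APIs (the action name is illustrative, and exact imports may differ across Play versions):

import scala.concurrent.Promise
import scala.concurrent.duration._
import play.api.mvc._
import play.api.libs.concurrent.Akka
import play.api.libs.concurrent.Execution.Implicits._
import play.api.Play.current

object Application extends Controller {
  // Simulates a 2-second remote call without blocking any thread:
  // the scheduler fulfills the promise instead of a Thread.sleep
  def simulate = Action {
    val p = Promise[String]()
    Akka.system.scheduler.scheduleOnce(2.seconds) {
      p.success("remote processing done")
    }
    Async { p.future.map(msg => Ok(msg)) }
  }
}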

Using this approach I was able to apply a C25K load (25,000 concurrent connections) on a play server using the gatling2 stress tool. With a linux 3.7.x kernel and some tuning (nproc at least), we can reach even higher values.

Sunday, March 17, 2013

list, view and stream - An example to understand the differences

Consider the following simple example :
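The original worksheet is embedded in the post; here is a hedged reconstruction consistent with the values discussed below (c1, c2, c3, c6 and c9 mark the evaluation points):

var x = 1
val data = List(0, 1, 2)

val m1 = data.map(_ + x)          // c1: fully evaluated now   -> List(1, 2, 3)
val v1 = data.view.map(_ + x)     // c2: nothing evaluated yet
val s1 = data.toStream.map(_ + x) // c3: head only             -> Stream(1, ?)

x = 2
val c4 = m1                       // List(1, 2, 3)
val c5 = v1.toList                // List(2, 3, 4)
val c6 = s1.toList                // List(1, 3, 4): whole stream forced now

x = 3
val c7 = m1                       // List(1, 2, 3)
val c8 = v1.toList                // List(3, 4, 5)
val c9 = s1.toList                // List(1, 3, 4): already evaluated at c6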

To understand why we get different values for the final lists when using list, view and stream, we have to understand when map evaluation occurs.

The result we get for m1 is of course always List(1, 2, 3), because m1 is entirely evaluated when it is created (c1), so further changes to the x variable have no effect.

When you create a view (c2), nothing is evaluated; the map operation does nothing, and v1 only contains the recipe for computing the result when you ask for it. It is only when you materialize a List from the view, for example, that the content is evaluated; that's why each change to the x variable has a direct effect on the results we get: List(2,3,4) when x=2, and List(3,4,5) when x=3.

Using a Stream is a little different: you must think of a Stream as a list that is not yet entirely built. In fact, when it is created (c3), only the first element is evaluated, and once an element has been evaluated, it won't be evaluated again. So at c3 the content is Stream(1, ...) (the tail is not yet known), only the first element having been evaluated. When we apply the toList method (c6) while x=2, we force evaluation of the entire stream; that's why x=3 at c9 has no effect: the stream has already been entirely evaluated at c6.

Sunday, February 10, 2013

1000 distinct remote ssh executions in 35s (1s required on each host)

1000 distinct SSH command executions using the JASSH SSH API, executed in less than 35 seconds; the executed remote shell code is "sleep 1 ; echo 'Hello from hostid'", so at least one second is spent on each host.

Of course, I'm using the same localhost destination host for this small SSH/JASSH performance test. So it may require some ssh daemon tuning; check the /etc/ssh/sshd_config file, parameters MaxSessions and MaxStartups: those parameters must be set to a value greater than the chosen parallelism level.

JASSH is based on JSCH, a library doing classic blocking IO, so there are as many threads as simultaneously established SSH connections, i.e. as many threads as the chosen parallelism level. A high parallelism value may be counterproductive; I would need a NIO SSH implementation for better performance.

On my 6-core host, I got the following results with various parallelism levels :
Parallelism   Duration   Best theoretical duration
1             1020s      1000s
10            116s       100s
25            48s        40s
50            35s        20s
75            37s        13s
100           41s        10s

So the best value for my server is around 50 for this kind of processing.

The scala script requires Java 7 or later (ForkJoinPool is used). The concurrency model is based on futures; check SIP-14 - Futures and Promises for more information.
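A hedged sketch of the approach (the jassh call shapes are assumptions; check the JASSH documentation for exact signatures):

import java.util.concurrent.Executors
import scala.concurrent._
import scala.concurrent.duration._
import fr.janalyse.ssh.SSH

// One thread per live SSH connection: fix the pool to the parallelism level
val parallelism = 50
implicit val ec = ExecutionContext.fromExecutorService(
  Executors.newFixedThreadPool(parallelism))

val futures = (1 to 1000).map { hostid =>
  Future {
    SSH.once(host = "localhost", username = "test") { ssh =>
      ssh.execute(s"sleep 1 ; echo 'Hello from $hostid'")
    }
  }
}
val results = Await.result(Future.sequence(futures), 2.minutes)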

Tuesday, February 5, 2013

Custom scala set example
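The set implementation itself is an embedded gist; a hedged reconstruction could look like this (the CanBuildFrom in the companion is what preserves the type, and the name, through map and ++):

import scala.collection.{SetLike, mutable}
import scala.collection.generic.CanBuildFrom

class NamedCustomSet[A](val name: String, elems: Set[A])
    extends Set[A] with SetLike[A, NamedCustomSet[A]] {
  def contains(key: A)  = elems.contains(key)
  def iterator          = elems.iterator
  def +(elem: A)        = new NamedCustomSet(name, elems + elem)
  def -(elem: A)        = new NamedCustomSet(name, elems - elem)
  override def empty    = new NamedCustomSet[A](name, Set.empty)
  override def toString = (name +: elems.toSeq).mkString("NamedCustomSet(", ",", ")")
}

object NamedCustomSet {
  def apply[A](name: String, elems: A*) = new NamedCustomSet(name, elems.toSet)
  implicit def cbf[A, B]: CanBuildFrom[NamedCustomSet[A], B, NamedCustomSet[B]] =
    new CanBuildFrom[NamedCustomSet[A], B, NamedCustomSet[B]] {
      def apply(from: NamedCustomSet[A]) = builder(from.name)
      def apply()                        = builder("")
      private def builder(name: String) = new mutable.Builder[B, NamedCustomSet[B]] {
        private val buf = mutable.Set.empty[B]
        def +=(b: B) = { buf += b; this }
        def clear()  = buf.clear()
        def result() = new NamedCustomSet(name, buf.toSet)
      }
    }
}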

which gives :

scala> val a=NamedCustomSet("toto", 1, 2, 3,4)
a: dummy.NamedCustomSet[Int] = NamedCustomSet(toto,1,2,3,4)

scala> val b=NamedCustomSet("tata")
b: dummy.NamedCustomSet[Nothing] = NamedCustomSet(tata)

scala> val c=NamedCustomSet("tutu",10)
c: dummy.NamedCustomSet[Int] = NamedCustomSet(tutu,10)

scala> a++c
res7: dummy.NamedCustomSet[Int] = NamedCustomSet(toto,10,1,2,3,4)

scala> a.map(_ + 1)
res8: dummy.NamedCustomSet[Int] = NamedCustomSet(toto,2,3,4,5)

scala> a++List(50,100)
res9: dummy.NamedCustomSet[Int] = NamedCustomSet(toto,1,2,3,50,4,100)

scala> a.map(_.toString)
res10: dummy.NamedCustomSet[String] = NamedCustomSet(toto,1,2,3,4)

scala> a.reduce(_ + _)
res11: Int = 10

scala> a.max
res12: Int = 4

Saturday, January 26, 2013

A scala script that ends before it begins :)

This script ends before it begins, thanks to scala 2.10 futures ! This can be quite disturbing if you're new to this programming concept. See futures and promises for more information.

This script contains two general functions : a future select (unfortunately not yet available in the scala future api; special thanks to Victor Klang for his help) and another one, asapFuturesProcess, that automatically triggers a callback as soon as any new result becomes available.
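A hedged sketch of the two functions (the original gist differs in its details):

import scala.concurrent.{ExecutionContext, Future, Promise}
import scala.util.Try

// Completes with the first finished future and the list of remaining ones
def select[T](fs: List[Future[T]])(implicit ec: ExecutionContext)
    : Future[(Try[T], List[Future[T]])] = {
  val p = Promise[(Try[T], List[Future[T]])]()
  fs.foreach { f =>
    f.onComplete { t => p.trySuccess((t, fs.filterNot(_ eq f))) }
  }
  p.future
}

// Fires the callback for each result as soon as it becomes available,
// in completion order rather than submission order
def asapFuturesProcess[T](fs: List[Future[T]])(todo: Try[T] => Unit)
    (implicit ec: ExecutionContext): Unit =
  if (fs.nonEmpty) select(fs).foreach { case (t, rest) =>
    todo(t)
    asapFuturesProcess(rest)(todo)
  }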

Simple custom scala collection examples...

Some examples to illustrate how to create custom collections in scala. These examples inherit all their features from the scala collections, and of course their types are preserved through any kind of operation (filtering, mapping, ...).

github project page, with test cases : custom collection examples project