Sunday, September 25, 2011

Timed numeric series library for Scala

The first public release of my open-source timed-series library, janalyse-series, is available. It comes with a CSV parser that guesses the CSV format in use.

A use case : Google stock quote operations.
#!/bin/sh
export CP=../target/scala-2.9.1.final/janalyse-series_2.9.1-0.5.1.jar
exec scala -cp $CP "$0" "$@"
!#

import fr.janalyse.series._

val allSeries = CSV2Series.fromURL("http://ichart.finance.yahoo.com/table.csv?s=GOOG")
val closeSeries = allSeries("Close")

println("GOOGLE stock summary")
println("Higher : "+closeSeries.max)
println("Lowest : "+closeSeries.min)
println("Week Trend : "+closeSeries.stat.linearApproximation.slope*1000*3600*24*7)
println("Latest : "+closeSeries.last)


Note that the startup script example hardcodes the path to the janalyse-series jar; update it if necessary.
$ ./stock.scala 
GOOGLE stock summary
Higher : (07-11-06 00:00:00 -> 741,79)
Lowest : (04-09-03 00:00:00 -> 100,01)
Week Trend : 0.9271503465158119
Latest : (11-03-25 00:00:00 -> 579,74)
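
The "Week Trend" figure takes the slope of a straight line fitted through the closing prices, apparently expressed per millisecond, and scales it to one week (1000*3600*24*7 ms). As a rough illustration of that computation, here is a minimal least-squares sketch. TrendSketch and its methods are hypothetical names, not the janalyse-series implementation.

```scala
// Minimal sketch: ordinary least-squares slope over (timestampMillis, value) points.
// Hypothetical helper, NOT the janalyse-series implementation.
object TrendSketch {
  val MillisPerWeek: Long = 1000L * 3600 * 24 * 7

  /** Slope of the best-fit line, in value units per millisecond. */
  def slopePerMilli(points: Seq[(Long, Double)]): Double = {
    require(points.size >= 2, "need at least two points")
    val n     = points.size.toDouble
    val meanT = points.map(_._1.toDouble).sum / n
    val meanV = points.map(_._2).sum / n
    val cov   = points.map { case (t, v) => (t - meanT) * (v - meanV) }.sum
    val varT  = points.map { case (t, _) => (t - meanT) * (t - meanT) }.sum
    require(varT != 0.0, "timestamps must not all be identical")
    cov / varT
  }

  /** Same slope, rescaled to value units per week. */
  def weeklyTrend(points: Seq[(Long, Double)]): Double =
    slopePerMilli(points) * MillisPerWeek
}
```

With three points one week apart whose values rise by one unit each time, weeklyTrend returns approximately 1.0.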

Sunday, September 18, 2011

Shell like code in Scala...

When you start a Java Virtual Machine, the current directory is set to the directory in which the JVM was started, and it cannot be changed afterwards. If you want to write shell-like commands in Scala, you have to find a way to write your own change-directory command (cd), but since the current directory can't be modified, it is not so simple.

One approach is to use an implicit mutable variable (which will contain the current directory) and to replace some of the implicit definitions that come with scala.sys.process.
#!/bin/sh
exec scala "$0" "$@"
!#
import sys.process.Process
import sys.process.ProcessBuilder._

case class CurDir(cwd:java.io.File)
implicit def stringToCurDir(d:String) = CurDir(new java.io.File(d))
implicit def stringToProcess(cmd: String)(implicit curDir:CurDir) = Process(cmd, curDir.cwd)

implicit var cwd:CurDir="/tmp"
def cd(dir:String=util.Properties.userDir) = cwd=dir

// ----------------------
"pwd"!

cd("/var/tmp/")
"pwd"!

cd("/var/log/")
"pwd"!

cd()
"pwd"!

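The script above passes the directory string straight to java.io.File, so a relative path such as ".." is resolved against the JVM start directory rather than against the tracked one. As a variation, here is a sketch without implicits that resolves relative paths against the tracked directory. MiniShell is a hypothetical name, and the only scala.sys.process call used is Process(command, cwd).

```scala
import java.io.File
import scala.sys.process.Process

// Sketch only: MiniShell is a hypothetical helper, not part of scala.sys.process.
class MiniShell(start: File = new File(sys.props("user.dir"))) {
  private var current: File = start.getCanonicalFile

  /** The directory currently tracked by this shell. */
  def cwd: File = current

  /** Change directory; relative paths are resolved against the tracked directory. */
  def cd(dir: String): Unit = {
    val target   = new File(dir)
    val resolved = (if (target.isAbsolute) target else new File(current, dir)).getCanonicalFile
    require(resolved.isDirectory, "not a directory: " + resolved)
    current = resolved
  }

  /** Run a command in the tracked directory and return its trimmed standard output. */
  def run(cmd: String): String = Process(cmd, current).!!.trim
}
```

For instance, after `val sh = new MiniShell()`, calling `sh.cd("/var/log")` then `sh.cd("..")` leaves the tracked directory at /var, and `sh.run("pwd")` is executed from there.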


Scala map implicit conversion into Java Properties

Since there seems to be no implicit conversion from a Scala Map (of Strings) to Java Properties, even in the latest Scala release, let's define one.
  implicit def map2Properties(map:Map[String,String]):java.util.Properties = {
    val props = new java.util.Properties()
    map foreach { case (key,value) => props.put(key, value)}
    props
  }


Or even as a one-liner, using a fold :
  implicit def map2Properties(map:Map[String,String]):java.util.Properties = {
    (new java.util.Properties /: map) {case (props, (k,v)) => props.put(k,v); props}
  }
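
To show the conversion at work, here is a small usage sketch. The describe method is a hypothetical stand-in for any Java API that expects a java.util.Properties, and the conversion is repeated (using setProperty) so that the block is self-contained.

```scala
import java.util.Properties
import scala.language.implicitConversions

object PropsSketch {
  // Same idea as the conversion above, written with setProperty.
  implicit def map2Properties(map: Map[String, String]): Properties = {
    val props = new Properties()
    map.foreach { case (key, value) => props.setProperty(key, value) }
    props
  }

  // Hypothetical consumer standing in for a Java API that expects Properties.
  def describe(props: Properties): String =
    props.getProperty("user") + "@" + props.getProperty("host", "localhost")

  val settings: Map[String, String] = Map("user" -> "guest")

  // The Map is converted implicitly because describe expects Properties.
  val summary: String = describe(settings)
}
```

Here PropsSketch.summary evaluates to "guest@localhost", since the "host" key is absent and the default is used.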


CONTEXT : Linux Gentoo / Scala 2.9.1 / Java 1.6.0_26

Sunday, September 11, 2011

Using O/R Broker for JDBC SQL operations from Scala

O/R Broker is an interesting tool which aims to simplify relational database operations from Scala. It has many interesting features: no external configuration files are needed (the mapping is done in Scala), SQL requests can be stored in plain files, which makes tuning them easier, and nothing leaks (connections, statements, result sets, ... are all "managed").

Let's first install & prepare a mysql service on a gentoo linux system :
As root user : 
$ emerge mysql
     ==> To install the latest mysql
$ emerge --config =dev-db/mysql-5.1.56  
     ==> Initializes / prepares the database !! Asks for a root password
     ==> choose 'mysql2011root' for example
     ==> You may have to change the release to the precise one that has been installed
$ eselect rc add mysql  default
     ==> For automatic startup as system starts 
$ /etc/init.d/mysql start
     ==> Manual start
$ mysql -u root -h localhost -p            (Will ask you the root password  = 'mysql2011root')
     SHOW DATABASES;
     CREATE DATABASE science;
     USE science;
     CREATE TABLE SCIENTIST (
             SCIENTIST_ID INT NOT NULL PRIMARY KEY AUTO_INCREMENT, 
             FIRST_NAME VARCHAR(128) NOT NULL,
             LAST_NAME VARCHAR(128) NOT NULL,
              BIRTH_DATE DATE);
     INSERT INTO SCIENTIST (FIRST_NAME, LAST_NAME, BIRTH_DATE) VALUES ('Albert', 'Einstein', STR_TO_DATE('1879-03-14','%Y-%m-%d'));
     INSERT INTO SCIENTIST (FIRST_NAME, LAST_NAME, BIRTH_DATE) VALUES ('Thomas', 'Edison', STR_TO_DATE('1847-02-11','%Y-%m-%d'));
     INSERT INTO SCIENTIST (FIRST_NAME, LAST_NAME, BIRTH_DATE) VALUES ('Isaac', 'Newton', STR_TO_DATE('1643-01-04','%Y-%m-%d'));
     INSERT INTO SCIENTIST (FIRST_NAME, LAST_NAME, BIRTH_DATE) VALUES ('Samuel', 'Morse', STR_TO_DATE('1791-04-27','%Y-%m-%d'));
     INSERT INTO SCIENTIST (FIRST_NAME, LAST_NAME, BIRTH_DATE) VALUES ('Kurt', 'Gödel', STR_TO_DATE('1906-04-28','%Y-%m-%d'));
     select * from SCIENTIST;
     GRANT ALL ON science.* TO 'guest'@'localhost' IDENTIFIED BY 'guest2011';
$ mysql -u guest -h localhost -D science -p          (Password = guest2011)
     select * from SCIENTIST;

Let's create the SBT build configuration file (Build.scala) with all required dependencies :
import sbt._
import Keys._
object SQLSandboxBuild extends Build { 
  val ssbsettings = Defaults.defaultSettings  ++ Seq(
    name         := "sqlSandbox",
    version      := "1.0",
    scalaVersion := "2.9.1",
    libraryDependencies ++= Seq (
       "org.orbroker"   % "orbroker"             % "3.1.1"    % "compile",
       "mysql"          % "mysql-connector-java" % "5.1.6"    % "compile",
       "org.scalatest"  %% "scalatest"           % "1.6.1"    % "test",
       "c3p0"           % "c3p0"                 % "0.9.1.2"  % "compile"
    ),     
    resolvers += "GEOTools" at "http://download.osgeo.org/webdav/geotools/",
    resolvers += "JBoss Repository" at "http://repository.jboss.org/maven2",
    resolvers += "Typesafe Repository" at "http://repo.typesafe.com/typesafe/releases/"
  )
  lazy val ssbproject = Project("ssbproject",file("."), settings = ssbsettings)
}
Now the next step is to define the mapping for our Scientists database :
case class Scientist(firstName:String, lastName:String, birthDate:Option[Date]=None, id:Option[Int]=None)

object ScientistExtractor extends RowExtractor[Scientist] {
  val key = Set("SCIENTIST_ID")  
  def extract(row: Row) = {
    val id = row.integer("SCIENTIST_ID")
    val List(firstName,lastName) = List("FIRST_NAME", "LAST_NAME") map {row.string(_).get}
    val birthDate = row.date("BIRTH_DATE")
    Scientist(firstName, lastName, birthDate, id)
  }
}

object Tokens extends TokenSet(true) {
  val scientistSelectAll   = Token('scientistSelectAll, ScientistExtractor)
  val scientistSelectById  = Token('scientistSelectById, ScientistExtractor)
  val scientistInsert      = Token[Int]('scientistInsert)
  val scientistDelete      = Token('scientistDelete)
  val scientistUpdate      = Token('scientistUpdate)
}

So we defined a Scientist case class and its extractor, which tells how to build a Scientist instance from a row of the "SCIENTIST" table.

We store our SQL requests in plain text files :
 $ find src/main/resources/sql/ -name "*.sql"
src/main/resources/sql/scientistSelectAll.sql
src/main/resources/sql/scientistDelete.sql
src/main/resources/sql/scientistSelectById.sql
src/main/resources/sql/scientistUpdate.sql
src/main/resources/sql/scientistInsert.sql

For example, scientistSelectById.sql looks like :
SELECT *
FROM SCIENTIST
where SCIENTIST_ID = :id

In order to be able to execute sql requests, you first need to create a "broker" :
  val broker =  {
    val url = "jdbc:mysql://localhost/science"
    val username = "guest"
    val password = "guest2011"  
    val ds = getPooledDataSource(url,username,password) //new SimpleDataSource(url)
    val builder = new BrokerBuilder(ds)
    val res:Map[Symbol,String] = Tokens.idSet map {sym => sym -> "/sql/%s.sql".format(sym.name)} toMap;
    ClasspathRegistrant(res).register(builder)
    builder.verify(Tokens.idSet)
    builder.setUser(username, password)
    builder.build
  }

Get a scientist :
    val scientist = broker.readOnly() {
      _.selectOne(scientistSelectById, "id"->1)
    }
    scientist should not equal(None)

Get all scientists :
    val scientists = broker.readOnly() {
      _.selectAll(scientistSelectAll)
    }

Browse scientists :
    broker.readOnly() {
      _.select(scientistSelectAll) { scientist =>
        println(scientist)
        true
      }
    }

And now the classical CRUD operations :
    // Create
    val newscientist = broker.transaction() { txn =>
      val scientist = Scientist("Benoit","Mandelbrot")
      val newid = txn.executeForKey(scientistInsert, "scientist"->scientist)
      scientist.copy(id = newid)
    }
    // Update
    val updatedScientist = newscientist.copy(birthDate = "1924-11-20")
    broker.transaction() { txn =>
      txn.execute(scientistUpdate, "scientist"->updatedScientist)
    }
    // Read
    val foundScientist = broker.readOnly() {
      _.selectOne(scientistSelectById, "id"->updatedScientist.id).get
    }
    foundScientist should equal(updatedScientist)
    // Delete
    broker.transaction() { txn =>
      txn.execute(scientistDelete, "id" -> updatedScientist.id.get)
    }
Note that once the new scientist has been created in the database, we get back the automatically generated numeric ID, and we return a new Scientist instance containing the ID chosen by the database. The original instance has not been modified: we want to keep all the benefits of immutability !
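
The copy-based update can be tried in isolation. The sketch below uses a simplified stand-in for the Scientist case class (birth date omitted), and 42 is a made-up key standing in for the value the database would return.

```scala
// Simplified stand-in for the Scientist case class above (birth date omitted).
case class ScientistSketch(firstName: String, lastName: String, id: Option[Int] = None)

object CopySketch {
  val original = ScientistSketch("Benoit", "Mandelbrot")

  // 42 is a made-up key; in the real code it comes from executeForKey.
  val stored = original.copy(id = Some(42))
}
```

After this, CopySketch.original.id is still None while CopySketch.stored.id is Some(42); all other fields are shared unchanged.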

And now let's generate some load on the database (40 actors running in parallel, each executing 10000 SQL requests), and check the c3p0 database connection pool using an external jconsole :
    import actors.Actor._
    val howmany=40
    val creator = self
    for (n <- 1 to howmany) {
      actor {
        try {
          for(i<-1 to 10000) {
            val scientists = broker.readOnly() {
               _.selectAll(scientistSelectAll)
            }
          }
        } finally {
          creator ! "finished"
        }
      }
    }    
    for (i<- 1 to howmany) receive {
      case _ =>
    }
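
The scala.actors library used above was deprecated in Scala 2.10 and is no longer part of the standard distribution in later releases. As a hedged alternative, the same fan-out and wait pattern can be sketched with scala.concurrent.Future. LoadSketch and generateLoad are hypothetical names, and the query parameter stands in for the broker.readOnly() call.

```scala
import java.util.concurrent.atomic.AtomicInteger
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object LoadSketch {
  /** Starts `workers` concurrent tasks, each calling `query` `iterations` times,
    * and returns the total number of completed calls. */
  def generateLoad(workers: Int, iterations: Int)(query: () => Unit): Int = {
    val completed = new AtomicInteger(0)
    val tasks = (1 to workers).map { _ =>
      Future {
        for (_ <- 1 to iterations) {
          query()
          completed.incrementAndGet()
        }
      }
    }
    // Block until every worker is done (the role played by the receive loop above).
    Await.result(Future.sequence(tasks), 10.minutes)
    completed.get
  }
}
```

A call could look like `LoadSketch.generateLoad(40, 10000) { () => broker.readOnly() { _.selectAll(scientistSelectAll) } }`. Note that the global execution context is sized to the number of available processors, so reaching 40 truly concurrent blocking JDBC calls would probably require a dedicated thread pool.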

The full SBT project is available here : sqlsandbox.tar.gz.

CONTEXT : Linux Gentoo / Scala 2.9.1 / Java 1.6.0_26 / SBT 0.10 / ScalaTest 1.6.1 / ORBroker 3.1.1

Sunday, September 4, 2011

Method receiver implicit conversions - data units use case

Scala implicit conversion is a great concept. Basically, you can use this feature to simplify object instantiation: instead of the profusion of constructors you would write in Java, you provide a single constructor in Scala, plus a set of implicit conversions (and of course default values). What's great with implicits is that they also apply to the receiver of a method call; let's see an example.

We want to be able to give durations or byte sizes in a natural way. Let's start a Scala console through sbt and try out my "unittools" library (download link : unittools.tar.gz) :
dcr@lanfeust ~/dev/unittools $ sbt
[info] Set current project to default-0b115b (in build file:/home/dcr/dev/unittools/)
> console
[info] Starting scala interpreter...
[info] 
Welcome to Scala version 2.9.1.final (Java HotSpot(TM) 64-Bit Server VM, Java 1.6.0_26).
Type in expressions to have them evaluated.
Type :help for more information.
scala> import unittools.UnitTools._
import unittools.UnitTools._

scala> "10h50m30s".toDuration
res0: Long = 39030000

scala> "1gb100kb".toSize
res1: Long = 1073844224

scala> 15444002.toDurationDesc
res2: String = 4h17m24s2ms

scala> 68675454743L.toSizeDesc
res3: String = 63gb982mb17kb791b

It is much nicer to write "10h50m30s" in your configuration files than "39030000", or to report a response time as "4h17m24s2ms" rather than "15444002" !

Of course strings don't have toDuration and toSize methods. In fact, "import unittools.UnitTools._" makes new implicit conversion definitions available to the compiler; those conversions make it possible to convert a String instance into an internal helper object on which toDuration or toSize is available.

The "plumbing" is the following :
package unittools

class DurationHelper(value:Long, desc:String) {
  def toDuration()=value
  def toDurationDesc()=desc
}

class SizeHelper(value:Long, desc:String) {
  def toSize()=value
  def toSizeDesc()=desc
}

object UnitTools {
  // Implicit conversions
  implicit def string2DurationHelper(desc:String) = new DurationHelper(desc2Duration(desc), desc)
  implicit def long2DurationHelper(value:Long) = new DurationHelper(value, duration2Desc(value))

  implicit def string2SizeHelper(desc:String) = new SizeHelper(desc2Size(desc), desc)
  implicit def long2SizeHelper(value:Long) = new SizeHelper(value, size2Desc(value))

  //...
}
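
The conversion functions themselves (desc2Duration, duration2Desc, desc2Size, size2Desc) are elided in the listing above. The following is one possible sketch of how they could be written, not the actual unittools code. The unit lists are an assumption limited to the units visible in the console session (h, m, s, ms and gb, mb, kb, b).

```scala
object UnitSketch {
  // Assumed units, largest first; durations in milliseconds, sizes in bytes.
  private val durationUnits = List("h" -> 3600000L, "m" -> 60000L, "s" -> 1000L, "ms" -> 1L)
  private val sizeUnits     = List("gb" -> (1L << 30), "mb" -> (1L << 20), "kb" -> (1L << 10), "b" -> 1L)

  // One chunk is a number followed by a unit name, e.g. "50m" or "2ms".
  private val Chunk = """(\d+)([a-z]+)""".r

  // "10h50m30s" -> sum of quantity * unit weight over all chunks.
  // An unknown unit name raises NoSuchElementException; unmatched text is ignored.
  private def parse(desc: String, units: List[(String, Long)]): Long = {
    val weights = units.toMap
    Chunk.findAllMatchIn(desc.toLowerCase)
      .map(m => m.group(1).toLong * weights(m.group(2)))
      .sum
  }

  // 15444002 -> "4h17m24s2ms": peel off each unit in turn, skipping zero quantities.
  private def describe(value: Long, units: List[(String, Long)]): String = {
    val (_, parts) = units.foldLeft((value, List.empty[String])) {
      case ((rest, acc), (name, weight)) =>
        val qty = rest / weight
        (rest % weight, if (qty > 0) acc :+ (qty.toString + name) else acc)
    }
    if (parts.isEmpty) "0" + units.last._1 else parts.mkString
  }

  def desc2Duration(desc: String): Long  = parse(desc, durationUnits)
  def duration2Desc(value: Long): String = describe(value, durationUnits)
  def desc2Size(desc: String): Long      = parse(desc, sizeUnits)
  def size2Desc(value: Long): String     = describe(value, sizeUnits)
}
```

With these definitions, UnitSketch.desc2Duration("10h50m30s") gives 39030000 and UnitSketch.size2Desc(68675454743L) gives "63gb982mb17kb791b", matching the console session above.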

It's worth noting that the Scala unittools internal algorithm contains less code than the Java version I first made: 25 lines for Scala, 40 lines for Java.

CONTEXT : Linux Gentoo / Scala 2.9.1 / Java 1.6.0_26 / SBT 0.10 / ScalaTest 1.6.1