Sunday, November 13, 2011

Get the elements which are in duplicates in a collection

The groupBy method is a powerful feature that can be used for many kinds of operations; it works like an SQL "GROUP BY". Let's use it to find the elements that appear more than once in a given collection:
val alist = List(1,2,3,4,3,2,5,7,5)
val duplicatesItem = alist.groupBy(x => x).filter { case (_, lst) => lst.size > 1 }.keys
// The result is : Set(5, 2, 3)

case class Dog(name: String)
val dogs=List(Dog("milou"), Dog("zorglub"), Dog("milou"), Dog("gaillac"))
val duplicatesDog = dogs.groupBy(_.name).filter { case (_, lst) => lst.size > 1 }.keys
// The result is : Set(milou)

Now if we want to define a function to do the job for us :
def findDuplicates[A,B](list:List[B])(crit:(B)=>A):Iterable[A] = {
  // Group by the criterion, keep the groups with more than one element, return their keys
  list.groupBy(crit).filter { case (_, l) => l.size > 1 }.keys
}
Let's use this function :
scala> findDuplicates(alist) {x=>x}
res21: Iterable[Int] = Set(5, 2, 3)

scala> findDuplicates(dogs) {dog=>dog.name}
res22: Iterable[String] = Set(milou)
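
If the number of occurrences matters as well, the same groupBy idea can be extended. The following is only a sketch: the names DuplicatesSketch and duplicateCounts are hypothetical, and it accepts any Iterable rather than just a List.

```scala
object DuplicatesSketch {
  // Hypothetical helper (not part of the original post):
  // maps each duplicated key to the number of elements sharing that key.
  def duplicateCounts[A, B](items: Iterable[B])(crit: B => A): Map[A, Int] =
    items.groupBy(crit)
      .map { case (key, group) => key -> group.size } // key -> occurrence count
      .filter { case (_, count) => count > 1 }        // keep only real duplicates
}
```

For the list used above, DuplicatesSketch.duplicateCounts(alist)(x => x) should contain the keys 2, 3 and 5, each with a count of 2.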

Saturday, November 12, 2011

IO Read binary file with Stream continually helper method

A straightforward way to read a binary stream in Scala. The Stream.continually helper method gives the code a functional flavor.
#!/bin/sh
exec scala -J-Xmx1g -J-Xms512m -savecompiled "$0" "$@"
!#

import java.io._
import util.Properties.{userHome}
import java.io.File.{separatorChar=> sep, pathSeparatorChar=>pathSep}

def readBinaryFile(input:InputStream):Array[Byte] = {
  val fos = new ByteArrayOutputStream(65535)
  val bis = new BufferedInputStream(input)
  val buf = new Array[Byte](1024)
  Stream.continually(bis.read(buf))
      .takeWhile(_ != -1)
      .foreach(fos.write(buf, 0, _))
  fos.toByteArray
}

// Read a 191 MB file
val fname=userHome+sep+"LibO_3.4.4_Win_x86_install_multi.exe"
val fb = readBinaryFile(new FileInputStream(fname))
println("%s, %d bytes".format(fname, fb.size))
This reads much better than the imperative version:
  var c=0
  do {
    c = bis.read(buf)
    if (c > 0) fos.write(buf,0,c)
  } while(c > -1)
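
The same read loop can also be written with Iterator.continually, which, unlike Stream, does not memoize the values it produces. This is a sketch under assumptions: the names ReadAllSketch and readAll are hypothetical, and closing the stream in a finally block is an addition that the function above does not make.

```scala
import java.io.{BufferedInputStream, ByteArrayOutputStream, InputStream}

object ReadAllSketch {
  // Hypothetical variant of readBinaryFile based on Iterator instead of Stream.
  def readAll(input: InputStream): Array[Byte] = {
    val out = new ByteArrayOutputStream()
    val in  = new BufferedInputStream(input)
    val buf = new Array[Byte](1024)
    try {
      Iterator.continually(in.read(buf)) // number of bytes read, or -1 at end of stream
        .takeWhile(_ != -1)
        .foreach(n => out.write(buf, 0, n))
    } finally in.close()                 // assumption: the caller wants the stream closed
    out.toByteArray
  }
}
```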

Wednesday, November 9, 2011

md5sum in scala

How to compute the MD5 checksum of a file, giving the same result as the Linux md5sum command.
Inspired by "MD5Sum in Scala" and modified to work with input streams.

def md5sum(input:InputStream):String = {
  val bis = new BufferedInputStream(input)
  val buf = new Array[Byte](1024)
  val md5 = java.security.MessageDigest.getInstance("MD5")
  Stream.continually(bis.read(buf)).takeWhile(_ != -1).foreach(md5.update(buf, 0, _))
  // Convert each byte to an unsigned value, then to two hexadecimal digits
  md5.digest().map(0xFF & _).map { "%02x".format(_) }.mkString
}
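
A quick way to gain confidence in such a function is to compare it against well-known digests, for instance those of the empty input and of "abc" from the MD5 specification's test suite. The sketch below is self-contained and uses hypothetical names (Md5Sketch, md5Hex); it follows the same approach as md5sum above, with Iterator in place of Stream.

```scala
import java.io.{BufferedInputStream, ByteArrayInputStream, InputStream}
import java.security.MessageDigest

object Md5Sketch {
  // Hypothetical stand-in for md5sum, kept separate so the block runs on its own.
  def md5Hex(input: InputStream): String = {
    val in  = new BufferedInputStream(input)
    val buf = new Array[Byte](1024)
    val md5 = MessageDigest.getInstance("MD5")
    Iterator.continually(in.read(buf)).takeWhile(_ != -1).foreach(n => md5.update(buf, 0, n))
    md5.digest().map(b => "%02x".format(b & 0xFF)).mkString
  }

  // Helper for in-memory inputs (also hypothetical).
  def ofString(s: String): String = md5Hex(new ByteArrayInputStream(s.getBytes("UTF-8")))
}

// Reference digests from the MD5 test suite
println(Md5Sketch.ofString(""))    // d41d8cd98f00b204e9800998ecf8427e
println(Md5Sketch.ofString("abc")) // 900150983cd24fb0d6963f7d28e17f72
```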

Some notes about how to simplify scala code (and monads)

These are just some notes, to be continued and completed. Updated on 2011-11-17.
Interesting articles (and their comment threads) for understanding and recognizing monads:
- Monads are not metaphors
- Monads are elephants

Exploring the best way to get a value from a map, or a default value if the key doesn't exist :
val codes=Map("A"->10, "B"->5, "C"->20)

{ // Basic approach
  var code = 30
  if (codes contains "D") code = codes.get("D").get
}

{ // More compact approach
  val code = if (codes contains "D") codes.get("D").get else 30
}

{ // Match approach
  val code = codes.get("D") match {
    case Some(v)=>v
    case None=>30
  }
}

{ // Option monad approach
  val code = codes get "D" getOrElse 30
}

{ // Compact approach
  val code = codes.getOrElse("D",30)
}
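
When the same fallback applies to every lookup, the default can also be attached to the map itself. A minimal sketch, assuming the standard withDefaultValue method of immutable maps; the object and value names are made up for illustration.

```scala
object MapDefaultSketch {
  val codes = Map("A" -> 10, "B" -> 5, "C" -> 20)

  // A view of the same map whose apply() falls back to 30 for unknown keys
  val codesWithDefault = codes.withDefaultValue(30)

  val known   = codesWithDefault("A")      // 10
  val missing = codesWithDefault("D")      // 30, no exception thrown
  val asOpt   = codesWithDefault.get("D")  // None: get() ignores the default
}
```

Note that only apply() uses the default; get still returns an Option, so the approaches above keep working unchanged.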
With monads, many explicit conditional tests can be avoided, and the code reads like a data flow:
import java.io.File
val sp = sys.props  // the system properties, seen as a Scala Map[String, String]

val lookInto = sp.get("baseDir") orElse sp.get("rootDir") orElse sp.get("user.home")
// Keep the path only if it is an existing directory (File.list returns null otherwise)
val lookIntoDirFile = lookInto map {new File(_)} filter {_.isDirectory}
val count = lookIntoDirFile map {_.list.size}

println(count map {_+" files"} getOrElse "Dir not found")

// will print "Dir not found" or for example "184 files"

Another example using the Option monad; the second fullName implementation becomes very simple:
// Using classical approach, a Java like approach
case class Person1(firstName:String, lastName:String) {
  def fullName() = {
    if (firstName!=null) {
      if (lastName!=null) {
        firstName+" "+lastName
      } else null
    } else null
  }
}

{
  val p1 = Person1("Einstein", "Albert")
  val p2 = Person1("Aristote", null)
  println("1-%s" format p1.fullName)
  val fullName = if (p2.fullName!=null) p2.fullName else "Unknown"
  println("1-%s" format fullName)
}

// Using Option monad approach  
case class Person2(firstName:Option[String], lastName:Option[String]) {
  def fullName() = firstName flatMap {fn => lastName map {ln => fn+" "+ln}}
}

{
  val p1 = Person2(Some("Einstein"), Some("Albert"))
  val p2 = Person2(Some("Aristote"), None)
  println("2-%s" format p1.fullName.getOrElse("Unknown"))
  println("2-%s" format p2.fullName.getOrElse("Unknown"))
}

// The best and cleanest version, a for-comprehension; same usage as the previous one
case class Person3(firstName:Option[String], lastName:Option[String]) {
  def fullName() = for(fn<-firstName ; ln <- lastName) yield fn+" "+ln
}
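
To make the equivalence between the last two versions concrete, here is a small sketch that puts both forms side by side. The names FullNameSketch, Person and fullNameDesugared are hypothetical; the for-comprehension is the same one as in Person3.

```scala
object FullNameSketch {
  case class Person(firstName: Option[String], lastName: Option[String]) {
    // for-comprehension form, as in Person3
    def fullName: Option[String] =
      for (fn <- firstName; ln <- lastName) yield fn + " " + ln

    // flatMap/map form, as in Person2
    def fullNameDesugared: Option[String] =
      firstName.flatMap(fn => lastName.map(ln => fn + " " + ln))
  }
}

val p1 = FullNameSketch.Person(Some("Einstein"), Some("Albert"))
val p2 = FullNameSketch.Person(Some("Aristote"), None)
println("3-%s" format p1.fullName.getOrElse("Unknown")) // 3-Einstein Albert
println("3-%s" format p2.fullName.getOrElse("Unknown")) // 3-Unknown
```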


To be continued and completed ...

Sunday, November 6, 2011

Automatic resource liberation

It is very easy to implement automatic resource release in Scala, thanks to structural types: a type parameter can be bounded not by a specific class, but by any type that provides a given method signature!
In our case, we define a method that accepts any object with a close method.
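// On Scala 2.10 and later, add "import scala.language.reflectiveCalls"
// to use structural types without a feature warning.
def using[T <: { def close(): Unit }, R] (resource: T) (block: T => R): R = {
  try     block(resource)
  finally resource.close()
}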
An example usage :
import java.io._

using(new PrintStream("/tmp/dummy")) { o =>
  o.print("Hello world")
}
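
Structural types rely on reflection at run time. When every resource involved implements java.lang.AutoCloseable (Java 7 and later), a nominal bound is a possible alternative. The sketch below swaps in that technique; it is not the method shown above, and the names UsingSketch, usingCloseable and Probe are hypothetical.

```scala
object UsingSketch {
  // Same shape as using, but bounded by the AutoCloseable interface
  // instead of a structural type.
  def usingCloseable[T <: AutoCloseable, R](resource: T)(block: T => R): R =
    try block(resource)
    finally resource.close()

  // Hypothetical resource that records whether close() was called
  class Probe extends AutoCloseable {
    var closed = false
    def close(): Unit = { closed = true }
  }
}

val probe = new UsingSketch.Probe
val answer = UsingSketch.usingCloseable(probe) { _ => 42 }
println(answer + ", closed=" + probe.closed) // 42, closed=true
```

The finally clause also runs when the block throws, so the resource is closed on both the normal and the exceptional path.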