Sunday, November 13, 2011

Finding the duplicated elements in a collection

The groupBy method is a powerful feature that supports many kinds of operations; it works like SQL's GROUP BY, returning a Map from each key to the elements that share it. Let's use it to find the elements that appear more than once in a given collection:
val alist = List(1,2,3,4,3,2,5,7,5)
val duplicatesItem = alist.groupBy(x => x).filter { case (_, lst) => lst.size > 1 }.keys
// The result is : Set(5, 2, 3)
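
To make the intermediate step visible, here is a small sketch of what groupBy returns before the filter is applied. The names grouped and duplicates are only illustrative and do not come from the snippet above; keySet is used instead of keys so the result is explicitly a Set.

```scala
val alist = List(1, 2, 3, 4, 3, 2, 5, 7, 5)

// groupBy builds a Map from each key to the list of elements sharing that key.
val grouped: Map[Int, List[Int]] = alist.groupBy(x => x)

// An element that occurs twice ends up in a group of size 2...
val threes = grouped(3) // List(3, 3)
// ...while a unique element sits alone in its group.
val fours = grouped(4)  // List(4)

// Keeping only the groups with more than one member leaves the duplicated keys.
val duplicates: Set[Int] = grouped.filter { case (_, group) => group.size > 1 }.keySet
```

Since the result is a Set, the order in which the duplicated values are printed is not guaranteed.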

// Dog is assumed to be a simple case class with a name field
case class Dog(name: String)
val dogs = List(Dog("milou"), Dog("zorglub"), Dog("milou"), Dog("gaillac"))
val duplicatesDog = dogs.groupBy(_.name).filter { case (_, lst) => lst.size > 1 }.keys
// The result is : Set(milou)

Now let's define a function that does the job for us:
// Returns the keys (as computed by crit) that are shared by more than one element
def findDuplicates[A, B](list: List[B])(crit: B => A): Iterable[A] = {
  list.groupBy(crit).filter { case (_, l) => l.size > 1 }.keys
}
Let's use this function:
scala> findDuplicates(alist) {x=>x}
res21: Iterable[Int] = Set(5, 2, 3)

scala> findDuplicates(dogs) {dog=>dog.name}
res22: Iterable[String] = Set(milou)
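
If you also want to know how many times each duplicate occurs, the same groupBy idea can be extended. The following is only a sketch under my own assumptions: duplicateCounts is a hypothetical helper, not part of the function above, and it accepts any Iterable rather than just a List.

```scala
// Hypothetical helper (not from the original function): maps each duplicated
// key to the number of elements that share it.
def duplicateCounts[A, K](items: Iterable[A])(key: A => K): Map[K, Int] =
  items.groupBy(key).collect {
    // Keep only groups with more than one member, and record their size.
    case (k, group) if group.size > 1 => k -> group.size
  }

val numberCounts = duplicateCounts(List(1, 2, 3, 4, 3, 2, 5, 7, 5))(x => x)
// numberCounts contains 2 -> 2, 3 -> 2 and 5 -> 2

val wordCounts = duplicateCounts(List("milou", "zorglub", "milou", "gaillac"))(w => w)
// wordCounts contains milou -> 2
```

Using collect here combines the filter and the size computation into a single pass over the grouped Map.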

2 comments:

  1. Two years later, but I found this to be handy today. Thanks!

  2. It was helpful for me too. Thanks!
