void techBlog() { //... }

My attemps to arrange 0 and 1 and make something vaguely useful

File parsing in Scala with Pattern Matching and Tail-Call optimization

In the process of learning Scala, I had to find a subject for my very first "Non Hello World" program in Scala. It was a good opportunity to code a script I wanted to write a long time ago in order to automatize the sync of the podcasts I download for my Mp3 player (I have very specific use cases that are not addressed by the usual podcast downloader software).

As my player is an MTP player,the script launches some mtp-tools command line utilities of libmtp to drive the player.

In particular the mtp-tracks command lists some information for the tracks stored into the player. The output of the command looks like :

Attempting to connect device(s)

mtp-tracks: Successfully connected

Friendly name: My Zen

Track ID: 40337

Title: Java Posse #273 - Roundup 09 - Managing Technical Debt

Artist: Tor Norbye, Carl Quinn, Joe Nuxoll, Dick Wall

Genre: Podcast

Album: The Java Posse

Origfilename: Tor Norbye, Carl Quinn, Joe Nuxoll, Dick Wall-Java Posse #273 - Roundup 09 - Managing Technical Debt.mp3

Track number: 0

Duration: 3809000 milliseconds

File size 45766104 bytes

Filetype: ISO MPEG-1 Audio Layer 3

Use count: 9 times

Track ID: 41737

Title: Java Posse #272 - Newscast for July 30th 2009

Artist: Tor Norbye, Carl Quinn, Joe Nuxoll, Dick Wall

Genre: Podcast


So basically, in order to parse the file I must skip the header (there’s a footer too) and find each track information. It looks like a job for Pattern Matching on list with a mix of RegExp matching. And as he goal is also to learn Scala (And improve my Functional Programming skills) I wanted to code the parsing without any mutable object (yes, I’m trying to follow the rules described in Programming in Scala)

So here is the part of the code that parses the mtp-tracks output :

//let's define some RegExp
Expval TrackId = """Track ID: ([0-9]+)""".r
val Title = """ *Title: (.+)""".r
val Album = """ *Album: (.+)""".r
val Filename = """ *Origfilename: (.+)""".r

//The recursive function
def collect(lines: List[String],tracks: List[MtpTrack]): List[MtpTrack] = {
  lines match{
   case TrackId(id) :: Title(title) :: _ :: _:: Album(album) :: Filename(filename) :: rest =>
      collect(rest,new MtpTrack(id,album,title,filename) :: tracks)
   case _ :: rest => collect(rest,tracks)
   case Nil => tracks

The list of lines of the mtp-tracks command output are stored in the lines list. This recursive function can then be called with collect(lines,Nil), it returns a list of parsed MtpTrack object.

To explain the three pattern matching cases of the collect function, I must first explain how pattern matching works for regexp.

In this code, four matchers are defined for four regexp : TrackId, Title, Album and Filename. For instance, if one line matches on the expression Track(id) the first group of the regexp is captures in the id variable.

So, now, the description of the three cases after lines match

caseTrackId(id) :: Title(title) :: _ :: _:: Album(album) :: Filename(filename) :: rest

this one says "match on list if list first element matches the regexp TrackId, then regexp Title, then whatever twice, then regexp Album, then regexp Filename. And also, if it matches extract the rest of the list in the rest variable. Also, capture all regexp groups in the declared variables id,title,album and filename". Rather powerfull.

Then, with the matched information we create an MtpTrack object that is added to tracks, the already collected list of MtpTrack (actually a new list is created as a List in Scala is immutable).

case _ :: rest

this one is tried when the previous case does not match. It says "if list starts with any element, extract the rest of the list after this element in a rest variable". rest may also by empty.case Nil

this one is the stop clause for the recursion, it matches when the list is empty.

Thanks to the power of pattern matching composition (RexgExp in list), header and footer are automatically ignored. And as the function ends with a direct recursive call, the Scala compiler has compiled the code with tail-call optimisation, so not stack overflow, even with very large command output.

I must say that I was amazed by how much can be expressed in Scala with so little lines of code. And as I’m a beginner in Scala, there are surely even smaller and idiomatic ways to code the same functionnality.

comments powered by Disqus