Better CLI option parsing in Scala
Command line interfaces have been with us almost as long as computers themselves - the first CLIs were created in the early 1960s, only two decades after the first general-purpose computer was presented (in 1946). Since those early days, the command line and command line interfaces have become trusted and loyal friends of the programmer.
Fast forward to modern times. Now, half a century later, graphical user interfaces seem to be triumphing over all other interfaces. GUI evangelists love to present pretty charts which demonstrate that CLIs are dying. But is the CLI really dead, or even planning to die?
No way, of course. The whole UNIX philosophy is built around small, orthogonal utilities, each doing its small piece of work perfectly (think wc, cat, find, bash pipes). And, obviously, this effect is almost impossible to achieve using pretty graphical systems with lots of buttons, checkboxes, sliders, knobs and what-not.
The "almost" part of that statement is what keeps the momentum in GUI development - Gnome, Ubuntu, KDE, Apple all search for the elusive solution, which could unite the world of interfaces once and for all.
The biggest problem with their effort is the heterogeneity of the user base.
At one end of the scale, we have people who are seeing a computer for the first time in their lives - so the interface must be immediately obvious to them. This is the main focus of Apple's development, for example - they build stunning, intuitive, easy-to-use interfaces, which are a pleasure to use. A focus on users who "see the computer for the first time in their life" may be good for marketing, but not for real computer use - just try to do any real-world system scripting task via a GUI. OK, you can even forget scripting - just copy a file from one directory to another, and you are already losing precious time if you are doing it with a GUI.
Which brings us to the other end of the scale - hardcore programmers. These guys use nothing but the command line, and they are extremely productive with it (I can't back up this claim with any hard data, only my observations - can you do better?). Again, this approach has problems - it's hard to learn and requires some specialized knowledge. So the command line will probably remain the weapon of choice for "elite" users, while the mainstream stays in GUI land.
But since most programmers sit closer to the "hardcore" end of the distribution, products for them should surely expose a command-line interface. And exposing that interface must be as convenient as possible for the developer - ideally, the perfect CLI framework should be baked into the language or be easily available.
So, here begins my story :)
Some time ago, I needed to add a CLI to one of my simple programs. I immediately turned to the often-mentioned Apache Commons CLI library. Since the Apache community usually produces something awesome, I expected that the library would quickly relieve me of my problems, and that I would be able to move on to other tasks.
I was hugely disappointed.
First, the default option choice is crippled - to get anything meaningful, you need to use OptionBuilder. And that option builder looks like it was built as an example of how you should not architect a builder - and then the developer committed it to the master branch (by mistake, of course). I have many questions about the whole concept of mutable builders, but creating a mutable static builder?! And the fact that constructing a simple option (say, an option that takes a single string argument) takes at least 5 lines of code is a bit overwhelming.
My next stop was scopt, a popular utility for parsing options in Scala. It looks good, but it is unable to parse options that take a list of arguments (e.g. -a 1 2 3). And you have no way to extend it to handle those lists (except forking the lib).
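To make the missing feature concrete: an option like -a 1 2 3 can be handled by collecting every token after the flag until the next option-like token shows up. The following is only a minimal sketch using the standard library; multiValues and isOption are hypothetical names invented for this illustration, and this is not how scopt or Scallop are implemented.

```scala
// Hypothetical helper (not part of scopt or Scallop): collects the values
// that follow each occurrence of a short option such as -a.
def multiValues(args: List[String], short: Char): List[List[String]] = {
  val flag = "-" + short

  // Assumption: a token is an option if it starts with '-' and is not
  // a negative number like -5.
  def isOption(s: String): Boolean =
    s.startsWith("-") && s.length > 1 && !s.charAt(1).isDigit

  args match {
    case Nil => Nil
    case `flag` :: rest =>
      // Take values up to the next option, then keep scanning the remainder.
      val (values, remaining) = rest.span(t => !isOption(t))
      values :: multiValues(remaining, short)
    case _ :: rest => multiValues(rest, short)
  }
}
```

Each invocation of the option gets its own list, so "-a 1 2 -a 3 4 5" yields List(List("1", "2"), List("3", "4", "5")), which leaves the decision of how to merge repeated invocations to the caller.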
So I set off on a journey to create "The Better Option Parser". I was inspired by the option definition syntax in Ruby's Trollop library and by the way options are extracted in configrity.
The journey was quite a success - I recently released the library, called Scallop.
It features:
- POSIX-style option parsing - capable of parsing short and long options
- property arguments - most famous from Ant (-Dkey=value key2=value2)
- Extracts flags, single-argument and multiple-argument options
- Default and required options
- Careful and powerful parsing of trailing arguments
- Completely immutable option builder - you can reuse it, delegate option definitions to submodules, etc.
On top of that, Scallop is easily extendable with new argument types.
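The property arguments in the list above (-Dkey=value key2=value2) are the least familiar item, so here is a rough, standard-library-only sketch of how such arguments could be gathered. parseProps and toPair are hypothetical names made up for this illustration; Scallop's real implementation may work quite differently.

```scala
// Hypothetical helper (not Scallop's API): gathers key=value pairs for one
// property option such as -D. Returns the properties found and the
// arguments that were left unconsumed, in their original order.
def parseProps(args: List[String], name: Char): (Map[String, String], List[String]) = {
  val prefix = "-" + name

  // Splits "key=value" at the first '='; anything else is not a pair.
  def toPair(s: String): Option[(String, String)] = {
    val i = s.indexOf('=')
    if (i > 0) Some(s.substring(0, i) -> s.substring(i + 1)) else None
  }

  def loop(rest: List[String], inProps: Boolean,
           props: Map[String, String], others: List[String]): (Map[String, String], List[String]) =
    rest match {
      case Nil => (props, others.reverse)
      case head :: tail if head.startsWith(prefix) =>
        // "-Dkey=value" carries a pair inline; a bare "-D" only opens a run of pairs.
        loop(tail, true, props ++ toPair(head.drop(prefix.length)), others)
      case head :: tail if inProps && !head.startsWith("-") && toPair(head).isDefined =>
        // Still inside the run: plain key=value tokens belong to the property option.
        loop(tail, true, props ++ toPair(head), others)
      case head :: tail =>
        // Anything else ends the run and is passed through untouched.
        loop(tail, false, props, head :: others)
    }

  loop(args, false, Map.empty, Nil)
}
```

The sketch deliberately ignores error handling (for example, "-Dfoo" without an equals sign is silently dropped), which a real parser would have to report.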
Enough talking! Let me show you some code:
import org.rogach.scallop._;
val opts = Scallop(List("-d","--num-limbs","1"))
.version("test 1.2.3 (c) 2012 Mr S") // --version option is provided for you
// in "verify" stage it would print this message and exit
.banner("""Usage: test [OPTION]... [pet-name]
|test is an awesome program, which does something funny
|Options:
|""".stripMargin) // --help is also provided
// will also exit after printing version, banner, and options usage
.opt[Boolean]("donkey", descr = "use donkey mode") // simple flag option
.opt("monkeys", default = Some(2), short = 'm') // you can add the default option
// the type will be inferred
.opt[Int]("num-limbs", 'k',
"number of libms", required = true) // you can override the default short-option character
.opt[List[Double]]("params") // default converters are provided for all primitives
//and for lists of primitives
.props('D',"some key-value pairs")
.args(List("-Dalpha=1","-D","betta=2","gamma=3", "Pigeon")) // you can add parameters a bit later
.trailArg[String]("pet name") // you can specify what you want to get from the end of
// the args list
.verify
opts.get[Boolean]("donkey") should equal (Some(true))
opts[Int]("monkeys") should equal (2)
opts[Int]("num-limbs") should equal (1)
opts.prop('D',"alpha") should equal (Some("1"))
opts.prop('E',"gamma") should equal (None)
opts[String]("pet name") should equal ("Pigeon")
intercept[WrongTypeRequest] {
opts[Double]("monkeys") // this will throw an exception at runtime
// because the wrong type is requested
}
println(opts.help) // returns options description
println(opts.summary) // returns summary of parser status (with current arg values)
If you run this option setup with the "--help" option, you will see:
test 1.2.3 (c) 2012 Mr Placeholder
Usage: test [OPTION]...
test is an awesome program, which does something funny
Options:
-Dkey=value [key=value]...
some key-value pairs
-d, --donkey
use donkey mode
-m, --monkeys
-k, --num-limbs
number of libms
-p, --params ...
Scallop has extensive support for trailing arguments parsing, which can be used for simple things:
val opts = Scallop(List("first","second"))
.trailArg[String]("required file")
.trailArg[String]("optional file", required = false)
.verify
opts[String]("required file") should equal ("first")
opts.get[String]("optional file") should equal (Some("second"))
...and for complex things. For example, scallop's parser is clever enough to handle the following case correctly:
val opts = Scallop(List("-Ekey1=value1", "key2=value2", "key3=value3",
"first", "1","2","3","second","4","5","6"))
.props('E')
.trailArg[String]("first list name")
.trailArg[List[Int]]("first list values")
.trailArg[String]("second list name")
.trailArg[List[Double]]("second list values")
.verify
opts.propMap('E') should equal ((1 to 3).map(i => ("key"+i,"value"+i)).toMap)
opts[String]("first list name") should equal ("first")
opts[String]("second list name") should equal ("second")
opts[List[Int]]("first list values") should equal (List(1,2,3))
opts[List[Double]]("second list values") should equal (List[Double](4,5,6))
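One way to picture how trailing arguments like these can be distributed is a small backtracking matcher: every list-valued slot first grabs as many valid tokens as it can and then gives some back if the remaining slots cannot be satisfied. The sketch below uses only the standard library; Slot, One, Many and assign are hypothetical names invented here, and this is an illustration of the general idea rather than Scallop's actual algorithm.

```scala
import scala.util.Try

// Hypothetical model of a trailing-argument slot (not Scallop's types).
sealed trait Slot
case class One(ok: String => Boolean) extends Slot  // exactly one argument
case class Many(ok: String => Boolean) extends Slot // zero or more arguments

// Tries to give every slot its share of args; None means no assignment fits.
def assign(slots: List[Slot], args: List[String]): Option[List[List[String]]] =
  slots match {
    case Nil => if (args.isEmpty) Some(Nil) else None
    case One(ok) :: rest =>
      args match {
        case a :: tail if ok(a) => assign(rest, tail).map(List(a) :: _)
        case _ => None
      }
    case Many(ok) :: rest =>
      // Start with the longest valid prefix, then shrink until the rest fits.
      val maxLen = args.takeWhile(ok).length
      (maxLen to 0 by -1).iterator
        .map(n => assign(rest, args.drop(n)).map(args.take(n) :: _))
        .collectFirst { case Some(result) => result }
  }

// Simplistic validity checks, good enough for the illustration.
val anyString: String => Boolean = _ => true
val isInt: String => Boolean = s => s.nonEmpty && s.forall(_.isDigit)
val isDouble: String => Boolean = s => Try(s.toDouble).isSuccess

val assigned = assign(
  List(One(anyString), Many(isInt), One(anyString), Many(isDouble)),
  List("first", "1", "2", "3", "second", "4", "5", "6"))
```

Here the type checks alone are enough to split the arguments, since "second" is not an Int. Backtracking only kicks in for ambiguous layouts, such as a list of strings followed by a single required string.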
And last but not least, you can easily extend it to support your own argument types:
case class Person(name:String, phone:String)
val personConverter = new ValueConverter[Person] {
val nameRgx = """([A-Za-z]*)""".r
val phoneRgx = """([0-9\-]*)""".r
// parse is a method, that takes a list of arguments to all option invocations:
// for example, "-a 1 2 -a 3 4 5" would produce List(List(1,2),List(3,4,5)).
// parse returns Left, if there was an error while parsing
// if no option was found, it returns Right(None)
// and if option was found, it returns Right(...)
def parse(s:List[List[String]]):Either[Unit,Option[Person]] =
s match {
case ((nameRgx(name) :: phoneRgx(phone) :: Nil) :: Nil) =>
Right(Some(Person(name,phone))) // successfully found our person
case Nil => Right(None) // no person found
case _ => Left(()) // error when parsing
}
val manifest = implicitly[Manifest[Person]] // some magic to make typing work
val argType = org.rogach.scallop.ArgType.LIST
}
val opts = Scallop(List("--person", "Pete", "123-45"))
.opt[Person]("person")(personConverter)
.verify
opts[Person]("person") should equal (Person("Pete", "123-45"))
The code is hosted on GitHub - suggestions, bug reports, and pull requests are all welcome!
I'm using this in production at Klout already, it even parses some big data arguments from Hadoop as a Scoobi jar! --Alexy
if you wanted to leverage something better than ApacheCLI on the jvm - args4j or jcommander
brian:
I was fully aware of those projects at that moment. But I felt that they lacked Scala-specific features and proper type safety. Thus I decided to create the "better" option parser, and I feel that now (long after this post :) Scallop eclipses both args4j and jcommander in terms of features and code conciseness. You can read more in the Scallop documentation - https://github.com/Rogach/scallop/wiki.
Hey Rogach,
Scallop is awesome. For the most part it allows a very clean specification of the commandline interface and allows me to declare configuration variables in exactly one place. Another favorite feature of mine is the summary method which I just discovered this morning. I have three feature requests / suggestions if you are up for that:
1) We would really like to avoid having to dereference Options with parens whenever we read data from the Config object (i.e. we want to be able to write config.numIterations rather than config.numIterations()). The parens significantly reduce readability as they suggest to someone reading the code that some method is being called and that the method may have side effects (you may disagree here). Anyway, to avoid the parens, we've resorted to something like the following:
val _numIterations = opt[Int]("num-iterations", default = Some(Int.MaxValue)); lazy val numIterations = _numIterations()
Which has the effect that we want, but looks uglier than we'd like. It would be great if those semantics could be matched (however, access to the _numIterations variable isn't necessary) without the ugly syntax. Any ideas?
2) (config.summary almost entirely achieves this) I have appreciated CL toolkits that print default values in the automatic help
3) (this last one is not essential) often the variable name is redundant with the command line parameter name; it would be cool if there were a way to have a default for the commandline parameter name be a function of the variable name (maybe inferred somehow via reflection?)
Anyway, thanks again for great work on an awesome commandline parser for scala!
Thanks for your review! Sorry for the late reply, though - was away.
About your suggestions:
1) This was quite a big problem almost from the start. Most other CLI parsers work around that problem by using annotations + reflection to set option values, an approach I decided to avoid - since it closes off a lot of interesting possibilities. I hope that macros in 2.10 will help to completely banish the issue (the option definition will be just a simple macro call that will generate all the needed boilerplate).
For now, I can suggest this implicit conversion as a workaround (very ugly, contributes to "implicits hell", but works sometimes):
implicit def openOption[A](option: ScallopOption[A]) = option.apply()
val opts = new ScallopConf(Seq("-a","2")) {
val apples = opt[Int]("apples")
}
opts.apples + 2 // 4
2) This managed to sneak under my radar :) Really good idea. I'm not sure about the right place to put default value, but it's definitely a worthy addition. Can you suggest some CL toolkit that does this - so I could look at their examples?
3) This one is hard. I can get all the methods that return ScallopOption, and somehow keep count of how many option initialisations were already done. But this approach would break on the following snippet:
object Conf extends ScallopConf(args) {
val apples = opt[Int]()
val applesPlus = apples.map(2+)
val bananas = opt[Int]()
}
Here, I'll find three methods that seem to be options, but there will be only two actual options. And for some reason I can't get the proper stack trace while I'm initialising the class.
On the bright side - when macros arrive, all those problems will vanish.
Again, thanks a lot for your suggestions!
Thanks for your response. (I did notice your mention of the macro version, but I didn't realize before that you needed to wait for the new version of scala :)
> Can you suggest some CL toolkit that does this - so I could look at their examples?
Aaron Dunlop's cltool4j: http://code.google.com/p/cltool4j/
take a look at the Usage Information example (the verbosity option has a default value)
Thanks again!
--Adam
I managed to solve points 2 and 3 - see the latest release of Scallop (0.5.2).
For extra safety, I decided to hide option name guessing behind a flag - see this test for example: https://github.com/Rogach/scallop/blob/master/src/test/scala/OptionNameGuessing.scala#L7
Hi rogach, scallop looks nice, congrats!
How do you compare it to argot (http://software.clapper.org/argot/)?
Thanks!
Well, I feel that argot lacks some features that I found quite useful in my projects - for example, multi-value options (that accept not only one value, but a list - argot's "multi-value option" does not). And some more advanced features are also missing.
Yet it still has its benefits (I like the usage formatting), so using Scallop or Argot is, as usual, a matter of taste :)
I can't see in the article how I can be DRY about option names & types with scallop, when trying to mimic argot, particularly this idiom:
val argotParser = ...
val someOption = argotParser.option[Int]("some-name", ...)
...
argotParser.parse(...)
...
// from here can use
someOption.value
// as an Int corresponding to the argument parsed from cl
So I have static type checking on someOption and I don't need to repeat "some-name" when declaring and when getting the value, avoiding potential errors and hassle.
How can I accomplish this with scallop?
Well, this article documents quite an old version of Scallop :) You can read the next one, http://rogach-scala.blogspot.com/2012/04/configuration-objects-in-scallop.html, which describes configuration objects:
object Conf extends ScallopConf(args) {
val someOption = opt[Int]("some-name", ...)
}
...
someOption() // Int
// or
someOption.get // Option[Int]
You have static typing on option, you don't need to repeat "some-name", (avoiding errors :), and more - you can pass this config object into methods, as Conf.type.
Argot and Scallop have more things in common than differences, there are only some things that I feel are done better in Scallop.
Thanks for sharing the article.