Scala with bigger tuples
Scala enforces an upper limit of 22 on the number of elements a tuple can have. If you try to go over that limit, you are greeted with a friendly compiler message: "Implementation restriction...". At first sight, that seems quite logical - after all, if you need to write "tup._69" in your code, you are surely in a lot of trouble.
But it is not so simple! The same restriction applies to the number of function arguments and, more importantly, to the number of elements in case classes.
It just so happens that, over time, people have created amazing libraries that let us do truly spectacular things. Incidentally, some of those libraries need case classes with more arguments.
One example of such a library is ScalaQuery, which enables us to write type-safe database queries. It maps tuples and case classes to database rows, which, combined with the 22-element limit, effectively restricts your database tables to 22 columns (including id and scaffolding!).
I ran into that restriction recently, when I needed to create a small app that stores patient data at our lab. Obviously, you need to store a lot of information about each patient - names, addresses, phone numbers, experiment data.
And this case just isn't very well suited to the "split your table in two" technique. For now, I did split the table in two (keeping a link to the secondary table in each row of the "primary" one). But it grows quickly, and soon it will require 3 (!) 22-column tables. Not good.
Side note: maybe I should have used some non-relational data storage for this? Probably, but the only easily reachable storage was MySQL on the server, so I went with it. If you have a good suggestion, I'll be happy to hear it.
Ok, so I decided to work around the problem the hard way. If you think about it, there's nothing "fundamental" about the number 22; it's just hard-wired into the compiler. So it should be easy to increase it - just download the compiler source, run a search-and-replace from 22 to, say, 222, build it, and you're done. Great, eh?
Wrong. Not so simple.
The biggest problem is that the compiler is bootstrapped, so you can't just generate source files for bigger tuples and build it - you'll get that message about the implementation restriction, since the old compiler (the one that builds your new version) still believes that the limit is 22.
To be specific, the Scala build system features three versions of the compiler. First, there is "starr", the most stable, old and wise one, which is just the previous stable release compiler. It is used to build the distribution and the newer compiler, called "locker". "locker" is the compiler that usually builds the compiler sources, outputting the "quick" compiler.
The interesting files for this experiment are src/build/genprod.scala, which generates the ProductN, TupleN and FunctionN source files, and src/compiler/scala/tools/nsc/symtab/Definitions.scala, which governs the 22 limit.
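Before touching the real Definitions.scala, it can help to try the substitution on a throwaway file. Below is a minimal sketch of that dry run. The sample declaration line is my assumption about what the file contains (check your own checkout), and preview_arity_patch is a hypothetical helper name; the sed expression itself is the one used in the script further down.

```shell
# Hypothetical helper: apply the arity edit to a file and print the result.
# No -i here, so the file itself is left untouched.
preview_arity_patch() {
  sed 's/\(val MaxTupleArity, .*\) 22/\1 222/' "$1"
}

sample=$(mktemp)
# ASSUMPTION: the real declaration in Definitions.scala resembles this line.
printf '%s\n' 'val MaxTupleArity, MaxProductArity, MaxFunctionArity = 22' > "$sample"
preview_arity_patch "$sample"
# prints: val MaxTupleArity, MaxProductArity, MaxFunctionArity = 222
rm -f "$sample"
```

If the preview prints the line unchanged, the pattern did not match and the in-place edit would silently do nothing, so it is worth checking before starting a long build.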
I used Scala 2.9.2 for this, since it's the most stable release. But I believe this technique can be used on later Scala versions as well, with some modifications.
First, we'll need to build the compiler with the restriction lifted - in this example, I'll just change it from 22 to 222. Then we'll replace the "locker" compiler with this newer, more liberal version, and use it to compile the new compiler with big, fat tuples. (Note: I compiled 222 tuples, and it took more than 10 GB of memory on my machine. So be warned, you'll need a lot of time and memory!)
After that, a bit of trickery happens: the "starr" compiler is used to build the distribution, but we can't replace it automatically, for reasons I can't fully comprehend ("ant replacestarr" fails). So we will just replace it manually, since we know where it lives. After that, we can finally build the full distribution and even publish it locally, to be able to use it in our projects (the most amazing thing: sbt will even understand it and work without any specific changes!).
Obviously, after this you'll need to recompile every library that needs big tuples. For example, I have to recompile ScalaQuery now, but that's a lot easier than recompiling scalac.
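A quick sanity check for any compiler you end up with is to feed it a tuple with more than 22 elements. Here is a minimal sketch that generates such a source file. The names make_tuple_source and BigTuple and the temporary directory are made up for illustration, and the scalac call is left as a comment because it assumes a scalac on your PATH.

```shell
# Hypothetical helper: print a Scala object holding a tuple of the integers 1..N.
# Assumes N >= 2 (a single element in parentheses is not a tuple).
make_tuple_source() {
  n=$1
  elems=1
  i=2
  while [ "$i" -le "$n" ]; do
    elems="$elems, $i"
    i=$((i + 1))
  done
  printf 'object BigTuple { val t = (%s) }\n' "$elems"
}

workdir=$(mktemp -d)
make_tuple_source 23 > "$workdir/BigTuple.scala"
echo "wrote $workdir/BigTuple.scala"
# To try it (assumes a scalac on your PATH):
#   scalac -d "$workdir" "$workdir/BigTuple.scala"
```

A stock 2.9.2 compiler should stop with the "Implementation restriction" message on this file, while the rebuilt one should accept it.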
Here's the automated script that pulls, modifies, recompiles, and locally publishes the big-tupled Scala distribution (the resulting version is "2.9.2-tuplicity", thanks to som-snytt for the pretty name):
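# Fetch the sources and check out the 2.9.2 release tag.
git clone git://github.com/scala/scala.git scala-tuplicity
cd scala-tuplicity
git checkout v2.9.2
# Give ant plenty of heap and stack; building 222 arities is memory-hungry.
export ANT_OPTS="-Xmx8192m -Xss25M -Xms4096M -XX:MaxPermSize=512M"
VERS="-Dbuild.release=true -Dversion.number=2.9.2-tuplicity -Dmaven.version.number=2.9.2-tuplicity"
# Build the unmodified sources first, so there is a working compiler to start from.
ant build
# Lift the limit in the compiler, rebuild, and promote the result to "locker".
sed -i 's/\(val MaxTupleArity, .*\) 22/\1 222/' src/compiler/scala/tools/nsc/symtab/Definitions.scala
ant build
ant replacelocker
# Regenerate the ProductN, TupleN and FunctionN sources up to arity 222.
sed -i 's/\(MAX_ARITY .*\) 22/\1 222/' src/build/genprod.scala
echo 'genprod.main(Array("src/library/scala/"))' >> src/build/genprod.scala
scala src/build/genprod.scala
# Build the library and compiler with the big tuples.
ant build $VERS
# Replace "starr" by hand: copy the fresh jars into lib and rewrite their checksum files.
cd lib
cp ../build/pack/lib/* ./
for x in *.jar; do sha1sum $x | sed 's/\(\w*\) \(.*\)/\1 ?\2/' > ${x}.desired.sha1; done
cd ..
# Build the distribution and publish it locally.
ant fastdist-opt distpack-opt $VERS
cd dists/maven/latest
ant deploy.release.local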
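The least obvious step in the script is the loop that rewrites the *.desired.sha1 files after the jars are copied into lib. As far as I can tell, it is there so that the checksums recorded next to each jar match the freshly built files. Here is a minimal sketch of that step in isolation, run against a dummy file in a temporary directory. write_desired_sha1 is a hypothetical helper name, dummy.jar is just a stand-in, and sha1sum and the \w pattern assume GNU tools.

```shell
# Hypothetical helper mirroring the loop from the script:
# write <jar>.desired.sha1 next to every jar in the given directory.
write_desired_sha1() {
  ( cd "$1" || exit 1
    for x in *.jar; do
      sha1sum "$x" | sed 's/\(\w*\) \(.*\)/\1 ?\2/' > "${x}.desired.sha1"
    done )
}

demo=$(mktemp -d)
printf 'not a real jar\n' > "$demo/dummy.jar"   # stand-in content, not a real jar
write_desired_sha1 "$demo"
# Show the generated checksum line: the SHA-1 hash, then "?" and the file name.
cat "$demo/dummy.jar.desired.sha1"
rm -rf "$demo"
```

Comparing the first field of each generated file with a fresh sha1sum of the jar is an easy way to confirm that the manual "starr" replacement left lib in a consistent state.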
So, hooray for open source! As we have seen, you can always go and recompile your language to suit your needs. Happy hacking!
Update: the local Maven repository does not seem to be enough to make it actually work with sbt. You'll need to specify the path to your distribution by hand: scalaHome := Some(file("/path/to/scala-tuplicity/dists/latest/")).