Hi, I'm Martin. You should follow me: @martinklepsch

November 2014 Why Boot is Relevant For The Clojure Ecosystem

Boot is a build system for Clojure projects. It roughly competes in the same area as Leiningen but Boot’s new version brings some interesting features to the table that make it an alternative worth assessing.

Compose Build Steps

If you’ve used Leiningen for more than packaging jars and uberjars, you’ve likely come across plugins like lein-cljsbuild or lein-garden, both of which compile your sources into a target format (e.g. JS or CSS). Now if you want to run both of these tasks at the same time, which you probably do during development, you have two options: either you open two terminals and start them separately, or you fall back to something like the snippet below that you run in a dev profile (this is how it’s done in Chestnut):

(defn start-garden []
  (future
    (print "Starting Garden.\n")
    (lein/-main ["garden" "auto"])))

In my opinion there are issues with both of these options. Opening two terminals just to start your development environment is not very user friendly, and putting build-related code into your codebase is boilerplate that can cause trouble by becoming outdated.

What Boot allows developers to do is write small, composable tasks. These work somewhat similarly to stateful transducers and Ring middleware in that you can combine them with regular function composition.
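As a sketch of that middleware shape (the task below is hypothetical, not one of Boot's built-ins): a task is a function that receives the next handler in the pipeline and returns a new handler over the build state, which is exactly why plain `comp` works.

```clojure
(require '[boot.core :refer [deftask]])

;; A do-nothing task illustrating the middleware shape: the task body
;; returns a function of the next handler, which returns a function of
;; the build state, so tasks compose with ordinary `comp`.
(deftask say-hi
  "Log a message, then hand the build state on to the next task."
  []
  (fn [next-handler]
    (fn [build-state]
      (println "hi from say-hi")
      (next-handler build-state))))
```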

A Quick Example

Playing around with Boot, I tried to write a task. To test this task in an actual project I needed to install it into my local repository (in Leiningen: lein install). Knowing that I’d have to reinstall the task constantly as I changed it, I looked for something like Leiningen’s checkouts so I wouldn’t have to reinstall after every change.

It turns out Boot can solve this problem in a very different way, one that illustrates the composition mechanism nicely. Boot ships with a bunch of built-in tasks that help with packaging and installing a jar: pom, add-src, jar & install.

We could call all of these on the command line as follows:

boot pom add-src jar install

Because we’re lazy, we’ll define it as a task in our project’s build.boot file. (Command-line tasks and their arguments are symmetric to their Clojure counterparts.)

(require '[boot.core          :refer [deftask]]
         '[boot.task.built-in :refer [pom add-src jar install]])

(deftask build-jar
  "Build jar and install to local repo."
  []
  (comp (pom) (add-src) (jar) (install)))

Now boot build-jar is roughly equivalent to lein install. To have any changes directly reflected on our classpath, we can simply compose our newly written build-jar task with another task from the repertoire of built-in tasks: watch. The watch task observes the file system for changes and initiates a new build cycle when they occur:

boot watch build-jar

With that command we just composed our already composed task with another task. Look at that cohesion!
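For convenience, that same composition can itself be captured in build.boot, e.g. as a hypothetical `dev` task (the name is made up; the built-in tasks are the ones used above):

```clojure
(require '[boot.core          :refer [deftask]]
         '[boot.task.built-in :refer [watch pom add-src jar install]])

(deftask dev
  "Rebuild and reinstall the jar whenever a file changes."
  []
  ;; watch re-triggers the rest of the pipeline on file-system changes
  (comp (watch) (pom) (add-src) (jar) (install)))
```

After that, `boot dev` does the same as `boot watch build-jar`.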

There Are Side-Effects Everywhere!

Is one concern that has been raised about Boot. Leiningen is beautifully declarative: it’s one immutable map that describes your whole project. Boot, on the other hand, looks a bit different. A typical boot file might contain a bunch of side-effectful functions; in general it’s much more a program than it is data.

I understand that this might seem like a step back at first sight; in fact, I initially looked at it with confusion as well. There are some problems with Leiningen, though, that are probably hard to work out in Leiningen’s declarative manner (think back to running multiple lein X auto commands).

Looking at Boot’s code it becomes apparent that the authors spent a great deal of time isolating the side effects that can occur in various build steps. I recommend reading the comments on this Hacker News thread for more information on that.

When To Use Boot, When To Use Leiningen

Boot is a build tool. That said, its task composition features only get to shine when multiple build steps are involved. If you’re developing a library, I’m really not going to try to convince you to switch to Boot. Leiningen works great for that and is, I’d assume, more stable than Boot.

If you, however, develop an application that requires various build steps (like Clojurescript, Garden, live reloading, or a browser REPL), you should totally check out Boot. There are tasks for all of the above: Clojurescript, Clojurescript REPL, Garden, and live reloading. I wrote the Garden task, and writing tasks is not hard once you have a basic understanding of Boot.

If you need help or have questions, join the #hoplon channel on Freenode IRC. I’ll try to help, and if I can’t, Alan or Micha, the authors of Boot, probably can.


October 2014 S3-Beam — Direct Upload to S3 with Clojure & Clojurescript

In a previous post I described how to upload files from the browser directly to S3 using Clojure and Clojurescript. I’ve now packaged this up into a small (tiny, actually) library: s3-beam.

An interesting note on what changed from the process described in the earlier post: the code now uses pipeline-async instead of transducers. After some discussion with Timothy Baldridge this seemed more appropriate, even though there were some aspects of the transducer approach that I liked but didn’t get to explore further.

Maybe in an upcoming version it will make sense to reevaluate that decision. If you have any questions, feedback or suggestions I’m happy to hear them!


October 2014 Patalyze — An Experiment Exploring Publicly Available Patent Data

For a few months now I’ve been working on and off on a little “data project”: analyzing patents published by the US Patent & Trademark Office. Looking at the time I’ve spent on it so far, I think I should start talking about it instead of just hacking away evening after evening.

It started with a simple observation: there are companies like Apple that sometimes collaborate with smaller companies building a small part of Apple’s next device. A contract like this usually gives the stock of the small company a significant boost. What if you could foresee those relationships by finding patents that employees of Apple and of the small company filed together?

An API for patent data?

Obviously this isn’t going to change the world for the better, but just the possibility that such predictions, or at least indications, might work kept me curious to look out for APIs offering patent data. I did not find much. So, thinking about something small that could be “delivered”, I figured a patent API would be great. To build the dataset I’d parse the archives provided on Google’s USPTO Bulk downloads page.

I later found out about Enigma and some offerings by Thomson Reuters. Their prices are high, and the sort of analysis we wanted to do would have been hard with their inflexible query APIs.

For what we wanted to do we only required a small subset of the data a patent contains: the organization, its authors, the title and description, filing and publication dates, and some identifiers. With such a reduced amount of data, which is almost only useful in combination with the complete data set, I discarded the plan to build an API. Maybe at some point it will make sense to publish reduced, more easily parseable versions of the archives Google provides. Let me know if you would be interested in that.
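To make that reduced subset concrete, a single record in such a dataset might look like the EDN map below. The field names and values are illustrative only; they are not the actual schema.

```clojure
(def example-patent
  ;; Illustrative record covering the fields mentioned above;
  ;; all values are made up.
  {:publication-id   "US-2014-0000001-A1"
   :organization     "Example Corp."
   :inventors        ["Jane Doe" "John Smith"]
   :title            "Method and apparatus for frobnicating widgets"
   :description      "A short description of the invention."
   :filing-date      "2013-06-01"
   :publication-date "2014-01-02"})
```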

What’s next

So far I’ve built a system to parse, store and query some 4 million patents filed with the USPTO since the beginning of 2001. While it sure would be great to make some money off the work I’ve done so far, I’m not sure what product could be built from the technology I’ve created. Maybe I could sell the dataset, but the number of potential customers is probably small, and in general I’d much prefer to make it public. I’ll continue to explore the possibilities in that regard.

For now I want to explore the data and share the results of that exploration. I set up a small site that I’d like to use as a home for any further work on this. So far it only has a newsletter signup form (just like any other landing page), but I hope to share some interesting analysis with the subscribers to the list every now and then. Check it out at patalyze.co. There is even a small chart showing some data.


September 2014 Running a Clojure Uberjar inside Docker

For a side project I wanted to deploy a Clojure uberjar on a remote server using Docker. I imagined that to be fairly straightforward, but there are some caveats you need to be aware of.

Naively my first attempt looked somewhat like this:

FROM dockerfile/java
ADD https://example.com/app-standalone.jar /
EXPOSE 8080
ENTRYPOINT [ "java", "-verbose", "-jar", "/app-standalone.jar" ]

I expected this to work. But it didn’t. Instead it just printed the following:

[Opened /usr/lib/jvm/java-7-oracle/jre/lib/rt.jar]
# this can vary depending on what JRE you're using

And that line only got printed because I added -verbose when starting the jar, so if you’re not running the jar verbosely it’ll fail without any output at all. It took me quite some time to figure that out.

As it turns out, the dockerfile/java image contains a WORKDIR instruction that somehow breaks the java invocation, even though absolute paths are used everywhere.

What worked for me

I ended up splitting the procedure into two files in a way that allowed me to always get the most recent jar when starting the docker container.

The Dockerfile basically just adds a small script to the container that downloads a jar from somewhere (S3 in my case) and starts it.

FROM dockerfile/java
ADD fetch-and-run.sh /
EXPOSE 42042
EXPOSE 3000
CMD ["/bin/sh", "/fetch-and-run.sh"]

And here is fetch-and-run.sh:

#! /bin/sh
wget https://s3.amazonaws.com/example/yo-standalone.jar -O /yo-standalone.jar;
java -verbose -jar /yo-standalone.jar

Now when you build a new image from that Dockerfile it adds the fetch-and-run.sh script to the image’s filesystem. Note that the jar is not part of the image; it is downloaded whenever a new container is started from the image. That way a simple restart will always fetch the most recent version of the jar. In some scenarios the lack of precise deployment tracking might become confusing, but in my case this turned out to be much more convenient than destroying the container, deleting the image, creating a new image and starting up a new container for every deploy.
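Under that setup, the day-to-day workflow might look roughly like this (the image and container names are made up for illustration):

```shell
# Build the image once; the jar itself is not baked in.
docker build -t yo-app .
docker run -d --name yo -p 3000:3000 -p 42042:42042 yo-app

# Deploy a new version: upload the new jar to S3, then simply restart.
# fetch-and-run.sh downloads the fresh jar on container start.
docker restart yo
```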


September 2014 Using core.async and Transducers to upload files from the browser to S3

In a project I’m working on we needed to enable users to upload media content. In many scenarios it makes sense to upload to S3 directly from the browser instead of routing uploads through a server; if you’re hosting on Heroku you need to do this anyway. After digging a bit into core.async, this seemed like a neat little excuse to give Clojure’s new transducers a go.

The Problem

To upload files directly to S3 without any server in between you need to do a couple of things:

  1. Enable Cross-Origin Resource Sharing (CORS) on your bucket
  2. Provide special parameters in the request that authorize the upload

Enabling CORS is fairly straightforward; just follow the documentation provided by AWS. The aforementioned special parameters are based on your AWS credentials, the key you want to save the file to, its content type and a few other things. Because you don’t want to store your credentials in client-side code, the parameters need to be computed on a server.

We end up with the following procedure to upload a file to S3:

  1. Get a Javascript File object from the user
  2. Retrieve special parameters for post request from server
  3. Post directly from the browser to S3

Server-side code

I won’t go into detail here, but here’s some rough Clojure code illustrating the construction of the special parameters and how they’re sent to the client.
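A minimal sketch of what such server-side code might look like, assuming S3's browser-based POST policy scheme (base64-encode a policy document, sign it with HMAC-SHA1 using the AWS secret key, return the fields to the client). The `sign-policy` helper, its key names, and the clojure.data.json dependency are assumptions for illustration, not s3-beam's actual API.

```clojure
;; Assumes [org.clojure/data.json] is on the classpath.
(require '[clojure.data.json :as json])
(import '(javax.crypto Mac)
        '(javax.crypto.spec SecretKeySpec)
        '(java.util Base64))

(defn sign-policy
  "Return the form fields the browser must include in its POST to S3."
  [{:keys [access-key secret-key bucket]} {:keys [file-key content-type]}]
  (let [b64    #(.encodeToString (Base64/getEncoder) %)
        ;; The policy restricts what the signed request may upload.
        policy (b64 (.getBytes
                     (json/write-str
                      {;; compute from (now + n minutes) in real code
                       :expiration "2015-01-01T00:00:00Z"
                       :conditions [{:bucket bucket}
                                    {:acl "public-read"}
                                    ["starts-with" "$key" file-key]
                                    ["starts-with" "$Content-Type" content-type]]})
                     "UTF-8"))
        mac    (doto (Mac/getInstance "HmacSHA1")
                 (.init (SecretKeySpec. (.getBytes secret-key "UTF-8")
                                        "HmacSHA1")))]
    {:action         (str "https://" bucket ".s3.amazonaws.com/")
     :key            file-key
     :Content-Type   content-type
     :policy         policy
     :signature      (b64 (.doFinal mac (.getBytes policy "UTF-8")))
     :acl            "public-read"
     :AWSAccessKeyId access-key}))
```

A ring handler can then serialize this map as the response to the client's signing request.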

Client-side: Transducers and core.async

As we can see, the process involves multiple asynchronous steps.

To wrap all of that up into a minimal, useful API that hides the complex back and forth happening until a file is uploaded, core.async channels and transducers turned out to be very useful:

(defn s3-upload [report-chan]
  (let [upload-files (map #(upload-file % report-chan))
        upload-chan  (chan 10 upload-files)
        sign-files   (map #(sign-file % upload-chan))
        signing-chan (chan 10 sign-files)]
    (go (while true
          (let [[v ch] (alts! [signing-chan upload-chan])]
            ;; not strictly required, but useful for debugging
            (log v))))
    signing-chan))

This function takes one channel as argument where it will put! the result of the S3 request. You can take a look at the upload-file and sign-file functions in this gist.

So what’s happening here? We use a channel for each step of the process: signing-chan and upload-chan. Both of those channels have an associated transducer. In this case you can best think of a transducer as a function that is applied to each item in a channel on its way through the channel. I initially tripped over the fact that the transducing function is only applied when the element is taken from the channel: just putting things into a channel doesn’t trigger its execution.
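The basic mechanics of a channel transducer can be seen in isolation with plain core.async on the JVM (using the blocking `<!!`, which isn't available in Clojurescript):

```clojure
(require '[clojure.core.async :refer [chan put! <!!]])

;; A buffered channel with a mapping transducer: every value is
;; transformed on its way through the channel.
(def xform-chan (chan 10 (map inc)))

(put! xform-chan 1)
(<!! xform-chan) ;; => 2
```

In s3-upload the mapping functions are side-effectful (they fire off AJAX requests), which is what chains the two channels together.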

signing-chan’s transducer initiates the request to sign the File object that has been put into the channel. The second argument to the sign-file function is a channel where the AJAX callback will put its result. Similarly, upload-chan’s transducer initiates the upload to S3 based on the information that has been put into the channel. A callback then puts S3’s response into the supplied report-chan.

The last line returns the channel that can be used to initiate a new upload.

Using this

Putting this into a library and opening it up for other people to use isn’t overly complicated; the exposed API is actually very simple. Imagine an Om component upload-form:

(defn queue-file [e owner {:keys [upload-queue]}]
  (put! upload-queue (first (array-seq (.. e -target -files)))))

(defcomponent upload-form [text owner]
  (init-state [_]
    (let [rc (chan 10)]
      {:upload-queue (s3-upload rc)
       :report-chan  rc}))
  (did-mount [_]
    (let [{:keys [report-chan]} (om/get-state owner)]
      (go (while true (log (<! report-chan))))))
  (render-state [this state]
    (dom/form
     (dom/input {:type "file" :name "file"
                 :on-change #(queue-file % owner state)} nil))))

I really like how simple this is: you put a file into a channel, and whenever it’s done you take the result from another channel. s3-upload could take additional options like logging functions or a custom URL from which to retrieve the special parameters that authorize the request to S3.

This is the first time I’ve done something useful with core.async and, probably less surprisingly, the first time I’ve played with transducers. I assume many things can be done better, and I still need to look into some things, like how to properly shut down the go blocks. Any feedback is welcome! Tweet or mail me!

Thanks to Dave Liepmann who let me peek into some code he wrote that did similar things and to Kevin Downey (hiredman) who helped me understand core.async and transducers by answering my stupid questions in #clojure on Freenode.

