acha.ninja

Powerful scripting with janetsh

In this post we will write a Janetsh script that will do some parsing and scraping of command line output. Janetsh itself is my new shell and scripting tool based on the Janet programming language.

The task we are trying to solve is to connect to a remote git repository, list all the remote tags that are also semantic version numbers, and print those tags as useful JSON values.

Writing the script

First we set up a shebang so our script can be run directly, and import the sh module so we can call the functions we need.

#! /usr/bin/env janetsh
(import sh)

The first step is to fetch the list of remote refs, which includes the tags, from a git repository. The git ls-remote command does this job, and we can easily run it from janetsh and collect the output into a string.

(sh/$$_ git ls-remote "https://github.com/janet-lang/janet")

The $$ function from the sh module is shorthand for running a command and capturing its output; $$_ does the same but also trims any trailing whitespace.
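
To see why the trimmed variant matters for the line splitting we do later, here is a small sketch in plain Janet. It assumes the captured output ends with a newline and that $$_ trims it much as string/trimr would; the raw string is a made-up stand-in for real command output.

```janet
# Made-up stand-in for untrimmed command output (note the final newline).
(def raw "aaa\trefs/heads/master\nbbb\trefs/tags/v0.5.0\n")

# Splitting the raw text leaves an empty string after the last newline.
(def untrimmed-lines (string/split "\n" raw))

# Trimming trailing whitespace first yields exactly one entry per line.
(def trimmed-lines (string/split "\n" (string/trimr raw)))

(pp (length untrimmed-lines))  # 3
(pp (length trimmed-lines))    # 2
```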

The string we are given looks a bit like:

0ce5acec89b697b79fec898c93558534e7c70ad0        refs/heads/clean-string-api
b61d1a0a0ec624a3ad6165303687e0921a24c0b5        refs/heads/core-image
...
75271425491a5c0b6d5b56b18de6cc9798ac7b34        refs/tags/v0.4.1
c7dc3611bcf2325f2a319ff5c5c4c42e9cbe4cb7        refs/tags/v0.5.0

Given this knowledge, we can use the janet PEG (parsing expression grammar) module to write a parser for this output. We are only interested in semantic version tags, so our parser will also filter unwanted lines.

To use the PEG module we need to write a PEG grammar. I will break it down into understandable parts and then show the final grammar. The general idea is to define what we are parsing in simple terms that the PEG library can understand.

If you are unsure of what any of the following terms mean the janet peg documentation should be your first reference.

Now for the parser…

Let’s define a commit hash term as a list of hex chars:

:hex-char (choice (range "09") (range "af"))
:commit-hash (some :hex-char)

The separator after the commit hash is one or more tabs or spaces:

:sep (some (choice "\t" " "))

A semantic version is an optional ‘v’ followed by three dot-separated numbers:

:vnum (some (range "09"))
:semver (sequence (? "v") :vnum "." :vnum "." :vnum )
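
The terms so far can already be tried on their own. The sketch below wraps them in a small stand-alone grammar (the name semver-only is only for illustration) and runs it with peg/match from plain Janet.

```janet
# Stand-alone grammar containing only the version terms defined so far.
(def semver-only
  '{:vnum (some (range "09"))
    :semver (sequence (? "v") :vnum "." :vnum "." :vnum)
    :main :semver})

# peg/match returns an array of captures on success and nil on failure.
# Nothing is captured yet, so a successful match yields an empty array.
(pp (peg/match semver-only "v0.5.0"))  # matches
(pp (peg/match semver-only "0.5.0"))   # matches: the "v" is optional
(pp (peg/match semver-only "v0.5"))    # nil: the patch number is missing
```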

A semantic version tag ref looks like:

:semver-tag (sequence :commit-hash :sep "refs/tags/" :semver )

We also want to extract the commit hash and version numbers from the parse, so we wrap those terms in capture, which pushes the matched text onto the array that peg/match returns.

:commit-hash (capture ...)
:vnum (capture ...)
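
The effect of capture is easiest to see side by side. A minimal sketch, using two throwaway patterns whose names are only for illustration:

```janet
# The same pattern, once without and once with a capture.
(def without-capture '(some (range "09")))
(def with-capture '(capture (some (range "09"))))

# Both match the leading digits; only the second records what it matched.
(pp (peg/match without-capture "123abc"))  # @[]
(pp (peg/match with-capture "123abc"))     # @["123"]
```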

And here we put it all together into a peg grammar and define a parser function:

(def semver-tag-grammar
  (quote # We must use quoting to avoid symbol lookups/calls
    {
      :hex-char (choice (range "09") (range "af"))
      :commit-hash (capture (some :hex-char))
      :sep (some (choice "\t" " "))
      :vnum (capture (some (range "09")))
      :semver (sequence (? "v") :vnum "." :vnum "." :vnum )
      :semver-tag (sequence :commit-hash :sep "refs/tags/" :semver )
      :main :semver-tag
    }
  ))

(defn parse-semver-tag
  [s]
  (peg/match semver-tag-grammar s))

Let’s test our parser:

$ (parse-semver-tag "41bb6a9833cbb06ff1b83c10b42e446330eadd21        refs/heads/pointer")
nil
$ (parse-semver-tag "c7dc3611bcf2325f2a319ff5c5c4c42e9cbe4cb7        refs/tags/v0.5.0")
@["c7dc3611bcf2325f2a319ff5c5c4c42e9cbe4cb7" "0" "5" "0"]
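
One thing to be aware of: peg/match only requires the pattern to match at the start of the string, not to consume all of it. A ref such as refs/tags/v0.5.0-rc1, or the peeled refs ending in ^{} that git ls-remote lists for annotated tags, would therefore still match on its v0.5.0 prefix. If that is not what you want, one possible variation (not part of the script in this post) is to end the main rule with -1, which only succeeds at the end of the input. The grammar name and the shortened commit hashes below are made up for the example.

```janet
# Variation on the grammar above that must consume the whole line.
(def strict-semver-tag-grammar
  '{:hex-char (choice (range "09") (range "af"))
    :commit-hash (capture (some :hex-char))
    :sep (some (choice "\t" " "))
    :vnum (capture (some (range "09")))
    :semver (sequence (? "v") :vnum "." :vnum "." :vnum)
    # -1 succeeds only when no input is left.
    :main (sequence :commit-hash :sep "refs/tags/" :semver -1)})

(pp (peg/match strict-semver-tag-grammar "abc123\trefs/tags/v1.2.3"))
(pp (peg/match strict-semver-tag-grammar "abc123\trefs/tags/v1.2.3-rc1"))
(pp (peg/match strict-semver-tag-grammar "abc123\trefs/tags/v1.2.3^{}"))
```

The first call returns the captures; the other two return nil because text is left over after the version.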

Now that we have a parser, we need a function that takes a remote URL as an argument and returns a list of parsed tags from that remote.

(defn ls-remote-semver-tags
  [remote]
  (def output (sh/$$_ git ls-remote [remote]))
  (def split-output (string/split "\n" output))
  (def tags-and-nils (map parse-semver-tag split-output))
  (filter (fn [x] (not (nil? x))) tags-and-nils))

->> is a Janet threading macro. It takes the value of its first argument and inserts it as the last argument of the next form, then passes that result on to the form after it, and so on. This is one way to avoid excessive nesting or pointless intermediate variables in Janet.
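
A small stand-alone comparison, using only core functions and unrelated to git, shows the threading idea:

```janet
# Nested form: has to be read inside-out.
(def nested
  (filter even? (map inc (range 5))))

# Threaded form: reads top to bottom. Each result becomes the last
# argument of the next form.
(def threaded
  (->> (range 5)
       (map inc)
       (filter even?)))

(pp nested)    # @[2 4]
(pp threaded)  # @[2 4]
```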

Here we refactor our code to use the threading macro:

(defn ls-remote-semver-tags
  [remote]
  (->> 
    (sh/$$_ git ls-remote [remote])
    (string/split "\n")
    (map parse-semver-tag)
    (filter (fn [x] (not (nil? x))))))

We are finally ready to print our collected tags in whatever format we want. We can print straight to well-formed JSON because our parser only captures hex digits and decimal digits, so the values cannot contain quotation marks or other garbage that ad hoc scripts typically miss.


# get returns nil when no argument was given, so the check below can run.
(def remote (get process/args 1))

(when (not remote)
  (error "please specify a remote."))

(def tags (ls-remote-semver-tags remote))
(each tag tags
    (print 
      "{ \"commit\" : \""  (tag 0) "\", "
         "\"major\" : \""  (tag 1) "\", "
         "\"minor\" : \""  (tag 2) "\", "
         "\"patch\" : \""  (tag 3) "\" }"))

Let’s test it out:

$ ./get-semver-tags https://github.com/janet-lang/janet
...
{ "commit" : "75271425491a5c0b6d5b56b18de6cc9798ac7b34", "major" : "0", "minor" : "4", "patch" : "1" }
{ "commit" : "c7dc3611bcf2325f2a319ff5c5c4c42e9cbe4cb7", "major" : "0", "minor" : "5", "patch" : "0" }
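
If you want to check the formatting without touching the network, the printing step can be factored into a function that returns the line instead of printing it. This is only a sketch: tag->json is a hypothetical helper name, and it builds the same string as the print call in the script.

```janet
# Hypothetical helper: turn one parsed tag (commit, major, minor, patch)
# into a single JSON line, using the same layout as the script.
(defn tag->json
  [tag]
  (string
    "{ \"commit\" : \"" (tag 0) "\", "
    "\"major\" : \"" (tag 1) "\", "
    "\"minor\" : \"" (tag 2) "\", "
    "\"patch\" : \"" (tag 3) "\" }"))

(print (tag->json @["c7dc3611bcf2325f2a319ff5c5c4c42e9cbe4cb7" "0" "5" "0"]))
```

Note that the version parts are emitted as JSON strings, exactly as the parser captured them.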

Note that the script is also robust to error conditions: if git exits with an error, $$_ raises a Janet error and the script stops.

$ ./get-semver-tags https://foo/
fatal: unable to access 'https://foo/': Could not resolve host: foo
error: job failed!
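
If you would rather handle such a failure than let it end the script, Janet's try form can catch the error. A minimal sketch: failing-fetch is a hypothetical stand-in that raises an error with the same message, so the example runs without janetsh or network access, and tags-or-empty is likewise just an illustrative name.

```janet
# Hypothetical stand-in for a command that fails.
(defn failing-fetch
  [remote]
  (error "job failed!"))

# Call fetch on the remote; on error, report it on stderr and
# fall back to an empty list of tags.
(defn tags-or-empty
  [fetch remote]
  (try
    (fetch remote)
    ([err]
      (eprint "could not list tags for " remote ": " err)
      @[])))

(pp (tags-or-empty failing-fetch "https://foo/"))
```

In the real script, the function passed in would be ls-remote-semver-tags.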

Conclusion

Here is the final full script:

#! /usr/bin/env janetsh

(import sh)

(def semver-tag-grammar
  (quote # We must use quoting to avoid symbol lookups/calls
    {
      :hex-char (choice (range "09") (range "af"))
      :commit-hash (capture (some :hex-char))
      :sep (some (choice "\t" " "))
      :vnum (capture (some (range "09")))
      :semver (sequence (? "v") :vnum "." :vnum "." :vnum )
      :semver-tag (sequence :commit-hash :sep "refs/tags/" :semver )
      :main :semver-tag
    }
  ))

(defn parse-semver-tag
  [s]
  (peg/match semver-tag-grammar s))

(defn ls-remote-semver-tags
  [remote]
  (->> 
    (sh/$$_ git ls-remote [remote])
    (string/split "\n")
    (map parse-semver-tag)
    (filter (fn [x] (not (nil? x))))))

# get returns nil when no argument was given, so the check below can run.
(def remote (get process/args 1))

(when (not remote)
  (error "please specify a remote."))

(each tag (ls-remote-semver-tags remote)
    (print 
      "{ \"commit\" : \""  (tag 0) "\", "
         "\"major\" : \""  (tag 1) "\", "
         "\"minor\" : \""  (tag 2) "\", "
         "\"patch\" : \""  (tag 3) "\" }"))

Janet + Janetsh is a powerful choice as a shell and scripting language. Using it you will be able to write readable and robust scripts while also having a lot of fun :).

Note that Janetsh is currently pre-release software, so it won’t (yet) be packaged for your operating system. If you would like to help make Janetsh ready for general use, please donate via PayPal or give it a star.