../wasm⥪fressian⥭cljs

2018_10_17

In Nov 2017 support for compiling directly to wasm landed in Rust. Afterwards there was a window, before wasm-bindgen was a thing, where it was simultaneously very easy to compile wasm but very hard to use it from javascript: WebAssembly functions can only return numbers that fit inside a double to their javascript callers, and any other exchange of information must be done via shared access to the module’s memory. This means reading and writing binary data and swapping pointers back and forth between javascript and rust, ffi style.
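
To make that ffi-style dance concrete, here is a minimal std-only sketch of what the Rust side of such a handshake could look like with no helper library at all. The names alloc_bytes, free_bytes, and sum_bytes are hypothetical and exist only for illustration. In a real wasm build each function would be exported with #[no_mangle] pub extern "C", as in the echo example later in this post.

```rust
use std::ptr;
use std::slice;

/// Reserve `len` bytes inside the module's memory and hand the address out.
/// The caller (JavaScript, in the wasm case) writes its payload there.
pub fn alloc_bytes(len: usize) -> *mut u8 {
    // A boxed slice has capacity == length, so `len` alone is enough to free it later.
    let buf: Box<[u8]> = vec![0u8; len].into_boxed_slice();
    Box::into_raw(buf) as *mut u8
}

/// Give memory obtained from `alloc_bytes` back to the allocator.
///
/// Safety: `ptr` and `len` must come from one `alloc_bytes(len)` call
/// and must not be used again afterwards.
pub unsafe fn free_bytes(ptr: *mut u8, len: usize) {
    drop(unsafe { Box::from_raw(ptr::slice_from_raw_parts_mut(ptr, len)) });
}

/// The only thing that crosses the boundary as a return value is a number;
/// the actual payload is read out of shared memory via pointer + length.
///
/// Safety: `ptr` must point at `len` readable bytes.
pub unsafe fn sum_bytes(ptr: *const u8, len: usize) -> u32 {
    let bytes = unsafe { slice::from_raw_parts(ptr, len) };
    bytes.iter().map(|&b| u32::from(b)).sum()
}
```

Every value more interesting than a number needs this allocate, write, call, free sequence on both sides, which is exactly the bookkeeping a serialization format can hide.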

So I wondered, did clojure offer a solution for this?

Fressian

Fressian is a self-describing binary format designed by Rich Hickey. As you would expect from the Clojure team, Fressian is highly dynamic and easily extended.

In short, the Fressian data model includes all the standard clojure primitives and generic sequential and map-like collections. Users can easily use these primitives to extend Fressian with their own types and handle them using their own read and write handlers. You can read the official documentation here, and Stuart Halloway gave a talk about Fressian that you can watch here.


So… Fressian is flexible and dynamic… words that have never ever been used to describe Rust. Assuming we can implement fressian in clojurescript, how can we get those dynamic values to play nicely with the strongly typed and infamously restrictive rustlang?


Serde

Serde is a data serialization framework for rust authored by David Tolnay. Framework is the operative word here: Serde doesn’t know anything about your actual format. It is up to implementers to rig up their own parsers and serializers into Serde’s data model. The Deserialize trait lets types pull the data they need from deserializers in a format agnostic way. The Serialize trait lets types write data in a format agnostic way. I highly recommend reading the serde docs and playing with serde-json if you need something more concrete and familiar than fressian.

There is good news though: Serde is magic. It is so magical that you can take the serializer for one format, rig it up to the deserializer for another format, and you get a transcoder for free. Yeah! And it’s fast… all of this trait machinery is resolved at compile time, and the only dynamic dispatch is whatever implementation authors introduce.
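
If the transcoder trick sounds too good to be true, a toy model may help. The sketch below is not Serde’s API: Sink, drive, JsonText, and the three-tag binary format are all made up for illustration and use only the standard library. It shows the shape of the idea: the reader for one format pushes what it finds into a format-agnostic interface, and the writer for another format implements that interface.

```rust
/// Toy stand-in for a serializer: anything that can receive values one event at a time.
pub trait Sink {
    fn int(&mut self, n: i64);
    fn text(&mut self, s: &str);
    fn begin_list(&mut self);
    fn end_list(&mut self);
}

/// Toy stand-in for a deserializer over a made-up binary format:
/// tag 1 = one-byte integer, tag 2 = length-prefixed UTF-8 string, tag 3 = counted list.
/// It never builds a value in memory; it forwards what it reads straight to the sink.
pub fn drive<S: Sink>(input: &mut &[u8], sink: &mut S) -> Result<(), String> {
    let tag = take_byte(input)?;
    match tag {
        1 => sink.int(i64::from(take_byte(input)?)),
        2 => {
            let len = usize::from(take_byte(input)?);
            if input.len() < len {
                return Err("truncated string".to_string());
            }
            let (bytes, rest) = input.split_at(len);
            *input = rest;
            sink.text(std::str::from_utf8(bytes).map_err(|e| e.to_string())?);
        }
        3 => {
            let count = take_byte(input)?;
            sink.begin_list();
            for _ in 0..count {
                drive(input, sink)?;
            }
            sink.end_list();
        }
        other => return Err(format!("unknown tag {}", other)),
    }
    Ok(())
}

/// Pop one byte off the front of the input, or fail if it is exhausted.
fn take_byte(input: &mut &[u8]) -> Result<u8, String> {
    let (&first, rest) = input.split_first().ok_or("unexpected end of input")?;
    *input = rest;
    Ok(first)
}

/// Toy serializer for a second format: JSON-like text (no string escaping, to keep it short).
#[derive(Default)]
pub struct JsonText {
    pub out: String,
    first_in_list: Vec<bool>,
}

impl JsonText {
    /// Emit a comma before every list element except the first.
    fn separate(&mut self) {
        if let Some(first) = self.first_in_list.last_mut() {
            if *first {
                *first = false;
            } else {
                self.out.push(',');
            }
        }
    }
}

impl Sink for JsonText {
    fn int(&mut self, n: i64) {
        self.separate();
        self.out.push_str(&n.to_string());
    }
    fn text(&mut self, s: &str) {
        self.separate();
        self.out.push('"');
        self.out.push_str(s);
        self.out.push('"');
    }
    fn begin_list(&mut self) {
        self.separate();
        self.out.push('[');
        self.first_in_list.push(true);
    }
    fn end_list(&mut self) {
        self.first_in_list.pop();
        self.out.push(']');
    }
}
```

Calling drive with a JsonText sink turns the toy binary bytes into text without ever materializing an intermediate value. Because drive is generic over S, the compiler specializes it per sink, which is the toy version of the "no dispatch unless you add it" claim above.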

Serde is spectacularly easy for end users to consume. Converting Fressian bytes into Rust data structures is literally a one-liner.

serde-fressian

Serde-fressian is an implementation of Fressian, in Rust, using Serde. Fress is a Fressian implementation for clojurescript. Together, they can seamlessly exchange values between WebAssembly modules and Clojurescript.

The serde_fressian::wasm module is designed to interface with the fress.wasm namespace. From cljs, all you really need is one function: fress.wasm/call takes any fressianable object (or none), writes it into wasm memory, and hands off those bytes to an exported wasm function. The wasm function then does its thing. If it returns a pointer, fress.wasm/call will take it and read a value from it.

Rust…

use serde_fressian::error::{Error as FressError};
use serde_fressian::value::{Value};
use serde_fressian::wasm::{self};

#[no_mangle]
pub extern "C" fn echo(ptr: *mut u8, len: usize) -> *mut u8
{
    let res: Result<Value, FressError> = wasm::from_ptr(ptr, len);

    wasm::fress_dealloc(ptr, len);

    wasm::to_js(res)
}

CLJS…

(require '[fress.wasm])

(def module (atom nil))

(defn init-module [buffer]
  (.then (fress.wasm/instantiate buffer)
    (fn [Mod] (reset! module Mod))
    (fn [err] (handle-err err))))

(defn echo [any]
  (if-let [Mod @module]
    (fress.wasm/call Mod "echo" any)
    (throw (js/Error. "missing module"))))

(assert (= (echo "hello world!") [nil "hello world!"]))

This interaction is walked through in detail here.

You can take things for a spin with the fressian-wasm-quick-start.

Wait, what about those dynamic types?

Just as described for serde-json, serde-fressian supports both ‘weakly typed deserialization’ and ‘strongly typed deserialization’.

weakly typed deserialization

This is where we let a self-describing format drive deserialization. We accomplish this with serde_fressian::value::Value, an enum that can handle any fressian type you throw at it. Value’s Deserialize impl will just read values off as they come, storing them in a giant recursive enum. Want to give rust a map where the keys are all different types, some even themselves maps? Go for it. It may not make any sense to do stuff like that, but the point is you can. If you don’t know what you are being given, deserializing to Value will give you something to work with.
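
To picture what a "giant recursive enum" means, here is a minimal std-only sketch. DynValue, kind, and mixed_key_map are hypothetical names, and the variant list is an assumption chosen for brevity; it is not the actual definition of serde_fressian::value::Value, which covers far more of Fressian.

```rust
use std::collections::BTreeMap;

/// A tiny dynamically typed value. Deriving `Ord` is what lets any value,
/// including another map, be used as a map key. (Floats are left out here
/// because they have no total order in Rust.)
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
pub enum DynValue {
    Nil,
    Bool(bool),
    Int(i64),
    Str(String),
    List(Vec<DynValue>),
    Map(BTreeMap<DynValue, DynValue>),
}

/// When you do not know what you were handed, you find out by matching.
pub fn kind(value: &DynValue) -> &'static str {
    match value {
        DynValue::Nil => "nil",
        DynValue::Bool(_) => "bool",
        DynValue::Int(_) => "int",
        DynValue::Str(_) => "str",
        DynValue::List(_) => "list",
        DynValue::Map(_) => "map",
    }
}

/// A map whose keys are an integer, a string, and another map.
pub fn mixed_key_map() -> DynValue {
    let mut inner = BTreeMap::new();
    inner.insert(DynValue::Str("nested".to_string()), DynValue::Bool(true));

    let mut outer = BTreeMap::new();
    outer.insert(DynValue::Int(1), DynValue::Str("an integer key".to_string()));
    outer.insert(DynValue::Str("a".to_string()), DynValue::Nil);
    outer.insert(
        DynValue::Map(inner),
        DynValue::List(vec![DynValue::Int(2), DynValue::Int(3)]),
    );
    DynValue::Map(outer)
}
```

The price of this flexibility is that every consumer has to match on the enum before it can do anything useful, which is what the strongly typed route avoids.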

strongly typed deserialization

This is where you know the content encoded in the bytes. This time, Fressian’s self-describing nature is incidental. Using the same interface, you can send more specific types, and serde doesn’t care as long as the bytes are valid.

let res: Result<Vec<String>, FressError> = wasm::from_ptr(ptr, len);

Want to send a vector of strings from clojurescript? Serde-fressian can pick it up as Vec<String> seamlessly.

What’s the catch?

There are two:

  • If you want to accept values from cljs, you must make sure your rust interfaces accept strictly (ptr: *mut u8, len: usize) to be deserialized by wasm::from_ptr.
  • If you want to send values to cljs, you must return *mut u8 via wasm::to_js.

Deserializing with wasm::from_ptr only borrows the bytes it is given. You could also take ownership of them via a vec and deserialize with serde_fressian::de::from_vec. Your rust function is responsible for dealing with those bytes in a way that makes sense for what you are trying to do. This isn’t really a catch as much as it is a feature. Setting up those interfaces and reasoning about ownership of the byte slices is really all the interop overhead there is.
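
The borrow-versus-own choice is easier to see with plain std calls. The sketch below does not show serde-fressian’s internals; leak_exact, count_spaces_borrowed, and into_owned_string are hypothetical helpers that only illustrate the two ways a function can treat a (ptr, len) pair.

```rust
use std::slice;
use std::string::FromUtf8Error;

/// Hand an allocation out as a raw (ptr, len) pair, the way bytes arrive at
/// a wasm export. Going through a boxed slice makes capacity == length.
pub fn leak_exact(bytes: Vec<u8>) -> (*mut u8, usize) {
    let boxed = bytes.into_boxed_slice();
    let len = boxed.len();
    (Box::into_raw(boxed) as *mut u8, len)
}

/// Borrowing: look at the bytes, but leave freeing them to someone else.
///
/// Safety: `ptr` must point at `len` readable bytes for the duration of the call.
pub unsafe fn count_spaces_borrowed(ptr: *const u8, len: usize) -> usize {
    let bytes: &[u8] = unsafe { slice::from_raw_parts(ptr, len) };
    bytes.iter().filter(|&&b| b == b' ').count()
}

/// Owning: rebuild the Vec, so the allocation is freed when the result drops.
///
/// Safety: `ptr` must come from the global allocator with capacity exactly
/// `len` (as `leak_exact` guarantees) and must not be used again by the caller.
pub unsafe fn into_owned_string(ptr: *mut u8, len: usize) -> Result<String, FromUtf8Error> {
    let bytes = unsafe { Vec::from_raw_parts(ptr, len, len) };
    String::from_utf8(bytes)
}
```

With the borrowing style somebody still has to free the buffer afterwards, which is why the echo example calls wasm::fress_dealloc after reading. With the owning style the free happens implicitly when the value goes out of scope.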

Wasm-Bindgen

Wasm-bindgen solves the interface problem by creating an AST and using it to generate both wrapper rust functions that produce the necessary pointers, and typescript functions to interop with them. It will also automatically handle js function imports.

It should be possible to do this for fressian. We can use the same AST technique to generate similar rust wrappers, and then hand off a json description of the signatures to clojure for generating functions or namespaces. Is it worth going the whole way and defining concrete type signatures? I don’t know, you might as well use wasm-bindgen at that point (and tbf maybe you should anyway), but certainly things can be easier.

In the long term, all of this may not matter. Wasm engines are being developed faster than you think, and it wouldn’t be surprising to see threads, native javascript values, and DOM apis landing in browsers in the next couple of years, rendering all this stuff moot.

Regardless, Wasm-Bindgen really has a lot to offer, at the cost of some extra build time, and well, typescript. The rust-wasm team is very friendly and has all kinds of tools in development, including a packaging solution. You should also check out their book.

👌 One does not simply ‘write rust’

If this is your first brush with Rust, or you’ve had a bad go of it already, please do yourself the favor of doing your homework:

  1. TRPL
  2. Rust By Example
  3. Programming Rust

Rust is different, and if you bash at it like you would a more familiar language, you will get frustrated quickly. Most friction surrounds getting function interfaces to agree with the ‘ownership’ needs of the data you are trying to work with. Probably more than half of the time I have spent with Rust has been just refactoring things so that signatures agree. Once you get your data though, operating on it is a breeze. Rust has expressions, pattern matching, closures, and a wonderful iterator system. It all feels succinct, natural, and fast.

Trust the compiler to put you on the right track. I think Rust’s difficulty is oversold simply because the compiler is so vocal and smart in comparison to other languages. Rustc is an angry pedant and it is going to yell at you all the time… and it will be right to do so. Make peace with it and converse like you would a repl. Give it a fair shot and do your reading. You can do it.

…or WebAssembly

So Fressian makes value exchange pretty easy, and consequently, development is smoother. What it does not do, however, is make writing wasm functions automatically worthwhile.

There are many examples out there of successfully compiling game engines to wasm, but if you are a webdev don’t succumb to FOMO quite yet. Bashing pixel buffers and handing off views on memory is a simpler exchange than, say, allocating collections of JS strings from serialized bytes. There is a fixed cost to reading bytes and turning them into Javascript values, and it is non-trivial to find performance margins that justify that cost.

You really need a problem complex enough where you can accept the handicap and let rust’s fine-grained memory control gain ground and overcome javascript engines. Earlier this year there was a very interesting back and forth between @fitzgen and @mraleph about moving Mozilla’s source-map parsing from JavaScript into WebAssembly. If you are looking for performance, it’s an enlightening read.

What’s next

✨✨✨ I want a wasm pretty printer and I want it yesterday. ✨✨✨

To get there we need patterns for persisting state that unlock rust’s finer capabilities. Serde-Fressian is still pretty naive: serializing strings and bytes involves copying them, when really it should be possible to skip the copy and serialize pointers themselves for javascript to read from. If you understand Rust’s ownership rules, then you can anticipate that doing this requires data to survive the function that serializes it. This means global state and maybe a compiler flag.
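
As a rough picture of what "data must survive the function" could mean, here is a std-only sketch of one possible keep-alive pattern. This is speculation on my part, not how serde-fressian works today; KEPT, publish, and release_all are hypothetical names.

```rust
use std::cell::RefCell;

thread_local! {
    // Global parking spot: buffers stored here outlive the function that made them.
    static KEPT: RefCell<Vec<Box<[u8]>>> = RefCell::new(Vec::new());
}

/// Park `bytes` in global state and return a (ptr, len) pair that stays valid
/// until `release_all` is called. No copy of the payload is made; only the
/// pointer and length would need to be serialized for the other side to read.
pub fn publish(bytes: Vec<u8>) -> (*const u8, usize) {
    KEPT.with(|kept| {
        let mut kept = kept.borrow_mut();
        kept.push(bytes.into_boxed_slice());
        // Take the pointer only after the buffer has reached its final owner.
        let stored = kept.last().expect("just pushed");
        (stored.as_ptr(), stored.len())
    })
}

/// Drop every parked buffer and report how many were freed. Any pointers
/// handed out by `publish` are dangling after this call.
pub fn release_all() -> usize {
    KEPT.with(|kept| {
        let mut kept = kept.borrow_mut();
        let freed = kept.len();
        kept.clear();
        freed
    })
}
```

The awkward part is the second half of the contract: the reader has to signal when it is done so release_all can run, and that is the kind of coordination the interop layer would need to own.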

Fress is also a little fat and slow. There is unnecessary dispatch, and some well-placed compiler constants could allow large swaths of undesired fressian type support to be excised from advanced builds. For example, if you know you’ll never use chunked strings or bytes, you shouldn’t have to ship the associated handlers. Maybe it should even be rewritten in js, similar to transit.

And finally, documentation & tutorials. This is a lot, I know, but there’s cool stuff on the way.

HMU \m/

– Thanks to John Newman and Stuart Halloway for reading drafts of this post.