JNA is surely deserving of all the praise it has been getting. It’s being used on some pretty high-profile projects like JRuby with great success. After having done JNI the hard way, the painful, torturous, despicable, bang-head-on-keyboard-while-wondering-if-ReleaseStringUTFChars-applies-here and why-the-jvm-is-segfaulting-again way, I have a deep appreciation for JNA.

Still, who wants to go write a bunch of useless Java interfaces for stuff that is already built into the native library itself? Not me. So that’s where Jython and JRuby come in.

This week I needed Jython/Python access to some native modules, namely ssdeep for fuzzy hashing. There’s already a pretty nice solution for connecting pure Python to native libraries: you can use either swig or pyrex. The pyrex piece for ssdeep has been mostly written here. I needed to add the fuzzy_hash_buf method into that mix, but it was nice and easy. From inside pure Python, with ssdeepmodule.so (via pyrex) and libfuzzy.so (from ssdeep; .dylib on Mac or .dll on Windows) sitting there on your LD_LIBRARY_PATH, you get to do this coolness:

  from ssdeep import ssdeep
  import sys, os
  f = open("/bin/ls","rb")
  data = f.read()
  ss = ssdeep()
  fuzzy_hash = ss.fuzzy_hash_buf(data)

Pretty nice, you have to admit. But from my pure Java p2p data-driven workflow framework, I really wanted to do this from Jython to keep from having to start the interpreter up in a subprocess over and over. Pyrex extensions do not work in Jython. Makes perfect sense. M’kay. I could write a whole bunch of lame JNI code to hook libfuzzy.so in there. Or I could use JNA and write some non-DRY interface in Java and figure out all the details of the types and so forth. Or … I could just push all that code down into the Python module that I’m going to call.

from com.sun.jna import NativeLibrary, Function, Memory
import sys,os

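class ssdeep:
    # Size of the result buffer. 116 assumes FUZZY_MAX_RESULT from
    # ssdeep's fuzzy.h; check it against your copy of the header.
    FUZZY_HASH_SIZE = 116
    fuzlib = None
    hash_func = None

    def __init__(self):
        # Load libfuzzy and look up the one native function we need.
        self.fuzlib = NativeLibrary.getInstance('fuzzy')
        self.hash_func = self.fuzlib.getFunction('fuzzy_hash_buf')

    # Same method name as the pyrex module, so callers can use either one.
    def fuzzy_hash_buf(self,data):
        # The native call writes the NUL-terminated hash into this buffer.
        ptr = Memory(self.FUZZY_HASH_SIZE)
        # Arguments are: input buffer, its length, output buffer.
        i = self.hash_func.invokeInt([data,len(data),ptr])
        return ptr.getString(0,False)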

With a class and method conveniently named exactly the same as the pyrex module I can make it all flexible enough to work either way:

try:
    # try the pyrex extension module
    from ssdeep import ssdeep
except ImportError:
    try:
        # try the jna wrapper when in jython
        from ssdeepjna import ssdeep
    except ImportError:
        # write tmp files and just exec the dumb thing
        ssdeep = None  # placeholder: no in-process backend available
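If more backends show up later, the same fallback chain can be written as a loop rather than nested try blocks. This is only a sketch: load_first is a made-up helper name, and the only things taken from the post are the module names ssdeep and ssdeepjna and the class name ssdeep.

```python
import importlib

def load_first(candidates, attr):
    """Return `attr` from the first importable module in `candidates`, or None."""
    for name in candidates:
        try:
            module = importlib.import_module(name)
        except ImportError:
            continue  # this backend is not available here, try the next one
        if hasattr(module, attr):
            return getattr(module, attr)
    return None

# Prefer the pyrex extension, then the JNA wrapper (module names from the post).
ssdeep = load_first(["ssdeep", "ssdeepjna"], "ssdeep")
if ssdeep is None:
    print("no in-process backend found; fall back to exec'ing the ssdeep binary")
```

The order of the list is the order of preference, so swapping the two names would make Jython's JNA wrapper win when both are importable.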

Which is all pretty nice I think. Not too many worries about creating interfaces or other crazy things. Seems very efficient, a little extra packaging and we are good to go.

Well, there was one problem: the hashes came out different when the data was binary. Turns out the JNA layer needs to be told how to convert strings to native bytes, with -Djna.encoding=8859_1 on the JVM command line. Since I usually run with -Dfile.encoding=UTF-8 and in a UTF-8 locale, this made all the difference.
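Here is a rough illustration of why the encoding matters, in plain Python with no JNA involved. It assumes the byte string reaches Java as a String whose code points equal the byte values, and that the string is then encoded back to bytes on its way to the native call. That is a simplified model of what happens, not a trace of JNA internals.

```python
# Every possible byte value, standing in for binary file contents.
data = bytes(bytearray(range(256)))

# Model of the string as Java would see it: one code point per byte.
as_java_string = data.decode("iso-8859-1")

# ISO-8859-1 maps each code point back to exactly one byte: nothing changes.
as_latin1 = as_java_string.encode("iso-8859-1")

# UTF-8 turns every code point above 127 into two bytes: the buffer changes.
as_utf8 = as_java_string.encode("utf-8")

print(as_latin1 == data)              # True
print(len(as_utf8), as_utf8 == data)  # 384 False
```

Under that model, a UTF-8 conversion hands the native library a different buffer than the one you read from disk, so a different hash is what you would expect.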

If that is inconvenient or you want to encode things differently only sometimes, the extra steps in the python layer would be something like

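  from java.lang import String
  def wrap_hash_buf(self,data):
    # Round-trip through ISO-8859-1 so every byte survives unchanged,
    # then hand JNA a Java byte[] instead of a string.
    javastr = String(data,"8859_1")
    jbytes = javastr.getBytes("8859_1")
    return self.fuzzy_hash_buf(jbytes)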

The same type of thing would work just as well from JRuby.

Welcome to the sweet spot. Code on, baby!