Writing Haskell FFI for Binary Ninja

Haskell · FFI · Binary Ninja — December 22, 2025

This post details implementation of haskell bindings for Binary Ninja.

Resources

Imperative functional programming. Simon L. Peyton Jones and Philip Wadler. Paper pdf.
Stranger in a Strange Land: An introductory tour of the Haskell FFI. P.C. Shyamshankar @ galois. Youtube video

Motivation

Haskell is one of the best languages. Binary ninja provides the best api, ui and performance compared to ida pro and ghidra.

I want to implement efficient, parallel program analysis code in Haskell to aid reverse engineering and vulnerability research. Think Frama-C but for Medium Level IL instead of C.

There are three first-party supported languages for the binary ninja API:

Python is the most popular and most documented first party binding though python is fairly slow. C++ and rust are both promising however it's easier for me to write parallel code in Haskell and the GC is nice to have.

There exists a third-party binding for haskell from kudu-dynamics (acquired by Leidos). This repo generates bindings by processing a binary ninja C header file provided by vector35 into compliant C for use by c2hs. Due to the required processing of this header file along with my preference to have more control over types I decided to roll my own bindings.

How?

Elements:

C header file to reference available functions and types: binaryninjacore.h
Python bindings (indirect documentation for binaryninjacore.h)
ForeignFunctionInterface language extension: GHC FFI Doc
Shared object shipped with binary ninja

The general process I've taken:

Pick a feature
Enumerate required types and functions via Python bindings
Call functions in python to help derive specification
Implement the low level types and functions in haskell
Spot-check test and implement higher level types / functions to taste
Repeat

Examples: C to Haskell

Most binja types and functions lift to Haskell in a handful of patterns.

Binding of an enum type

typedef enum BNStringType
{
  AsciiString = 0,
  Utf16String = 1,
  Utf32String = 2,
  Utf8String = 3
} BNStringType;

Haskell

data BNStringType
  = AsciiString
  | Utf16String
  | Utf32String
  | Utf8String
  deriving (Eq, Show, Enum)

Note if the C enum type has jumps (such as: 0, 1, 125, ...) then instead of deriving Enum for the haskell type an instance of Enum should be created to match values of the C enum.

Binding of a struct

typedef struct BNStringReference
{
  BNStringType type;
  uint64_t start;
  size_t length;
} BNStringReference;

Haskell

data BNStringRef = BNStringRef
  { bnType :: !BNStringType,
    bnStart :: !Word64,
    bnLength :: !CSize
  }
  deriving (Eq, Show)

instance Storable BNStringRef where
  sizeOf _ = 24
  alignment _ = Binja.Types.alignmentS
  peek ptr = do
    t <- toEnum . fromIntegral <$> (peekByteOff ptr 0 :: IO CInt)
    s <- peekByteOff ptr 8 :: IO Word64
    l <- peekByteOff ptr 16 :: IO CSize
    pure (BNStringRef t s l)
  poke ptr (BNStringRef t s l) = do
    pokeByteOff ptr 0 $ fromEnum t
    pokeByteOff ptr 8 s
    pokeByteOff ptr 16 l

The Storable instance offsets of BNStringRef can be derived by reviewing binary ninja's shared object in binary ninja or trial and error based on the C header file types. A Storable instance allows for converting a pointer to a value and a value to a pointer.

Binding of a function involving list of pointers

BNFunction** BNGetAnalysisFunctionList(
   BNBinaryView* view,
   size_t* count);

Haskell

foreign import ccall unsafe "BNGetAnalysisFunctionList"
  c_BNGetAnalysisFunctionList ::
    BNBinaryViewPtr ->
    Ptr CSize ->
    IO (Ptr BNFunctionPtr)

foreign import ccall unsafe "BNFreeFunctionList"
  c_BNFreeFunctionList ::
    Ptr BNFunctionPtr ->
    CSize ->
    IO ()

data FunctionList = FunctionList
  { flArrayPtr :: !(ForeignPtr BNFunctionPtr),
    flCount :: !Int,
    flList :: ![BNFunctionPtr],
    flViewPtr :: !BNBinaryViewPtr
  }
  deriving (Eq, Show)

getFunctionList :: BNBinaryViewPtr ->
                   IO FunctionList
getFunctionList view =
  alloca $ \countPtr -> do
    rawPtr <- c_BNGetAnalysisFunctionList view countPtr
    count <- fromIntegral <$> peek countPtr
    xs <-
      if rawPtr == nullPtr || count == 0
        then pure []
        else peekArray count rawPtr
    arrPtr <- newForeignPtr rawPtr (c_BNFreeFunctionList rawPtr (fromIntegral count))
    pure
      FunctionList
        { flArrayPtr = arrPtr,
          flCount = count,
          flList = xs,
          flViewPtr = view
        }

functions :: BNBinaryViewPtr ->
             IO [BNFunctionPtr]
functions = fmap flList . getFunctionList

A function BNGetAnalysisFunctionList that takes as input a pointer to a BNBinaryView and a count. After the call count is populated with the amount of BNFunction*'s found in the view.

c_BNGetAnalysisFunctionList and c_BNFreeFunctionList are allocation and deallocation direct bindings. getFunctionList retrieves a FunctionList and manages memory via alloca and newForeignPtr. The function functions is a convenience to extract the list of BNFunctionPtr. Note how the IO monad trickles upward through

c_BNGetAnalysisFunctionList
getFunctionList
functions

Convert return type from value to pointer

Functions which return a structure by value must be wrapped to populate a reference passed as a function argument. The set of these wrappers used by Beluga is here.

BNParameterVariablesWithConfidence
  BNGetFunctionParameterVariables(BNFunction* func);

void BNGetFunctionParameterVariablesPtr(BNParameterVariablesWithConfidence* out,
                                        BNFunction* func)
{
  *out = BNGetFunctionParameterVariables(func);
}

Testing

Differential testing is used as the testing oracle over a large set of executables. The dataset consists of:

945,220 windows executables of total size 140GB Assemblage
639,000 linux executables containing 502,000,000 functions via Assemblage
High value macOS targets including v8, webkit, sudo, signal with more planned
Drivers/firmware from pixel cellphones planned
High value userspace targets for android planned (messaging apps, etc)
csmith generated testsuite

Currently the purpose of testing is:

Similarity of the python bindings with haskell bindings
Regression testing full stack (program analysis, haskell bindings, ghc, binja)
Measurement for optimization as a dual purpose

Three hosts are used for testing covering 2 operating systems and 2 architectures.

Full testsuite: Apple m4 max with 128gig memory
Full testsuite: Debian i9 13900k 24 core cpu with 64gig memory
High value targets: Linux arm64 x13s thinkpad