Writing Haskell FFI for Binary Ninja logo, ninja beluga whale

This post details implementation of haskell bindings for Binary Ninja.

Resources

  • Imperative functional programming. Simon L. Peyton Jones and Philip Wadler. Paper pdf.
  • Stranger in a Strange Land: An introductory tour of the Haskell FFI. P.C. Shyamshankar @ galois. Youtube video

Motivation

Haskell is one of the best languages. Binary ninja provides the best api, ui and performance compared to ida pro and ghidra.

I want to implement efficient, parallel program analysis code in Haskell to aid reverse engineering and vulnerability research. Think Frama-C but for Medium Level IL instead of C.

There are three first-party supported languages for the binary ninja API:

Python is the most popular and most documented first party binding though python is fairly slow. C++ and rust are both promising however it's easier for me to write parallel code in Haskell and the GC is nice to have.

There exists a third-party binding for haskell from kudu-dynamics (acquired by Leidos). This repo generates bindings by processing a binary ninja C header file provided by vector35 into compliant C for use by c2hs. Due to the required processing of this header file along with my preference to have more control over types I decided to roll my own bindings.

How?

Elements:
  • C header file to reference available functions and types: binaryninjacore.h
  • Python bindings (indirect documentation for binaryninjacore.h)
  • ForeignFunctionInterface language extension: GHC FFI Doc
  • Shared object shipped with binary ninja

The general process I've taken:

  1. Pick a feature
  2. Enumerate required types and functions via Python bindings
  3. Call functions in python to help derive specification
  4. Implement the low level types and functions in haskell
  5. Spot-check test and implement higher level types / functions to taste
  6. Repeat

Examples: C to Haskell

Most binja types and functions lift to Haskell in a handful of patterns.

Binding of an enum type

C
typedef enum BNStringType
{
  AsciiString = 0,
  Utf16String = 1,
  Utf32String = 2,
  Utf8String = 3
} BNStringType;
Haskell
data BNStringType
  = AsciiString
  | Utf16String
  | Utf32String
  | Utf8String
  deriving (Eq, Show, Enum)
          
Note if the C enum type has jumps (such as: 0, 1, 125, ...) then instead of deriving Enum for the haskell type an instance of Enum should be created to match values of the C enum.

Binding of a struct

C
typedef struct BNStringReference
{
  BNStringType type;
  uint64_t start;
  size_t length;
} BNStringReference;
Haskell
data BNStringRef = BNStringRef
  { bnType :: !BNStringType,
    bnStart :: !Word64,
    bnLength :: !CSize
  }
  deriving (Eq, Show)

instance Storable BNStringRef where
  sizeOf _ = 24
  alignment _ = Binja.Types.alignmentS
  peek ptr = do
    t <- toEnum . fromIntegral <$> (peekByteOff ptr 0 :: IO CInt)
    s <- peekByteOff ptr 8 :: IO Word64
    l <- peekByteOff ptr 16 :: IO CSize
    pure (BNStringRef t s l)
  poke ptr (BNStringRef t s l) = do
    pokeByteOff ptr 0 $ fromEnum t
    pokeByteOff ptr 8 s
    pokeByteOff ptr 16 l
    
The Storable instance offsets of BNStringRef can be derived by reviewing binary ninja's shared object in binary ninja or trial and error based on the C header file types. A Storable instance allows for converting a pointer to a value and a value to a pointer.

Binding of a function involving list of pointers

C
BNFunction** BNGetAnalysisFunctionList(
   BNBinaryView* view,
   size_t* count);
            
Haskell
foreign import ccall unsafe "BNGetAnalysisFunctionList"
  c_BNGetAnalysisFunctionList ::
    BNBinaryViewPtr ->
    Ptr CSize ->
    IO (Ptr BNFunctionPtr)

foreign import ccall unsafe "BNFreeFunctionList"
  c_BNFreeFunctionList ::
    Ptr BNFunctionPtr ->
    CSize ->
    IO ()

data FunctionList = FunctionList
  { flArrayPtr :: !(ForeignPtr BNFunctionPtr),
    flCount :: !Int,
    flList :: ![BNFunctionPtr],
    flViewPtr :: !BNBinaryViewPtr
  }
  deriving (Eq, Show)

getFunctionList :: BNBinaryViewPtr ->
                   IO FunctionList
getFunctionList view =
  alloca $ \countPtr -> do
    rawPtr <- c_BNGetAnalysisFunctionList view countPtr
    count <- fromIntegral <$> peek countPtr
    xs <-
      if rawPtr == nullPtr || count == 0
        then pure []
        else peekArray count rawPtr
    arrPtr <- newForeignPtr rawPtr (c_BNFreeFunctionList rawPtr (fromIntegral count))
    pure
      FunctionList
        { flArrayPtr = arrPtr,
          flCount = count,
          flList = xs,
          flViewPtr = view
        }

functions :: BNBinaryViewPtr ->
             IO [BNFunctionPtr]
functions = fmap flList . getFunctionList
              

A function BNGetAnalysisFunctionList that takes as input a pointer to a BNBinaryView and a count. After the call count is populated with the amount of BNFunction*'s found in the view.

c_BNGetAnalysisFunctionList and c_BNFreeFunctionList are allocation and deallocation direct bindings. getFunctionList retrieves a FunctionList and manages memory via alloca and newForeignPtr. The function functions is a convenience to extract the list of BNFunctionPtr. Note how the IO monad trickles upward through

    • c_BNGetAnalysisFunctionList
    • getFunctionList
    • functions

    Convert return type from value to pointer

    Functions which return a structure by value must be wrapped to populate a reference passed as a function argument. The set of these wrappers used by Beluga is here.
    C
    BNParameterVariablesWithConfidence
      BNGetFunctionParameterVariables(BNFunction* func);
    
    C
    void BNGetFunctionParameterVariablesPtr(BNParameterVariablesWithConfidence* out,
                                            BNFunction* func)
    {
      *out = BNGetFunctionParameterVariables(func);
    }
    	
              

    Testing

    Differential testing is used as the testing oracle over a large set of executables. The dataset consists of:
    • 945,220 windows executables of total size 140GB Assemblage
    • 639,000 linux executables containing 502,000,000 functions via Assemblage
    • High value macOS targets including v8, webkit, sudo, signal with more planned
    • Drivers/firmware from pixel cellphones planned
    • High value userspace targets for android planned (messaging apps, etc)
    • csmith generated testsuite
    Currently the purpose of testing is:
    • Similarity of the python bindings with haskell bindings
    • Regression testing full stack (program analysis, haskell bindings, ghc, binja)
    • Measurement for optimization as a dual purpose
    Three hosts are used for testing covering 2 operating systems and 2 architectures.
    • Full testsuite: Apple m4 max with 128gig memory
    • Full testsuite: Debian i9 13900k 24 core cpu with 64gig memory
    • High value targets: Linux arm64 x13s thinkpad