Writing Haskell FFI for Binary Ninja
This post details the implementation of Haskell bindings for Binary Ninja.
Resources
- Imperative functional programming. Simon L. Peyton Jones and Philip Wadler. Paper (PDF).
- Stranger in a Strange Land: An introductory tour of the Haskell FFI. P.C. Shyamshankar, Galois. YouTube video.
Motivation
In my view, Haskell is one of the best languages, and Binary Ninja provides the best API, UI, and performance compared to IDA Pro and Ghidra.
I want to implement efficient, parallel program analysis code in Haskell to aid reverse engineering and vulnerability research. Think Frama-C, but for Binary Ninja's Medium Level IL (MLIL) instead of C.
The Binary Ninja API has first-party support for three languages:
- C++: Documentation
- Python: Documentation
- Rust: Documentation
A third-party Haskell binding exists from Kudu Dynamics (acquired by Leidos). That repo generates bindings by preprocessing the Binary Ninja C header provided by Vector 35 into C that c2hs accepts. Because of this required header processing, and because I prefer more control over the types, I decided to roll my own bindings.
How?
Elements:
- C header file to reference available functions and types: binaryninjacore.h
- Python bindings (indirect documentation for binaryninjacore.h)
- ForeignFunctionInterface language extension: GHC FFI Doc
- Shared object shipped with Binary Ninja
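For readers new to the FFI, this is roughly the smallest useful shape of a binding. It uses libc's strlen as a stand-in for a binaryninjacore.h function so it runs without Binary Ninja installed; binding the real shared object follows the same pattern but also needs the library on the link path (for example through Cabal's extra-libraries and extra-lib-dirs fields). cLength is a hypothetical name used only for illustration.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
import Foreign.C.String (CString, withCString)
import Foreign.C.Types (CSize (..))

-- Bind a C symbol by name. "unsafe" makes the call cheaper, at the cost
-- that the C code must not call back into Haskell.
foreign import ccall unsafe "string.h strlen"
  c_strlen :: CString -> IO CSize

-- Marshal a Haskell String into a temporary NUL-terminated C string,
-- call the C function, and convert the CSize result to an Int.
cLength :: String -> IO Int
cLength s = fromIntegral <$> withCString s c_strlen
```

The same three ingredients (a foreign import, C types from Foreign.C.Types, and a marshalling wrapper that lives in IO) recur in every binding below.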
The general process I've taken:
- Pick a feature
- Enumerate the required types and functions via the Python bindings
- Call the functions from Python to help derive their specification
- Implement the low-level types and functions in Haskell
- Spot-check with tests, then implement higher-level types and functions to taste
- Repeat
Examples: C to Haskell
Most Binary Ninja types and functions map to Haskell using a handful of patterns.
Binding of an enum type
typedef enum BNStringType
{
    AsciiString = 0,
    Utf16String = 1,
    Utf32String = 2,
    Utf8String = 3
} BNStringType;

data BNStringType
  = AsciiString
  | Utf16String
  | Utf32String
  | Utf8String
  deriving (Eq, Show, Enum)
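A derived Enum instance numbers constructors from 0 in declaration order, so fromEnum reproduces the C values here only because BNStringType's values are consecutive from 0; an enum with gaps would need a hand-written mapping. A derived toEnum also raises an error on an out-of-range value, which matters when reading integers produced by C. A minimal sketch of checked conversions, assuming an added Bounded instance; toCStringType and fromCStringType are hypothetical helper names, not part of the bindings.

```haskell
import Foreign.C.Types (CInt)

-- Same constructors as the binding above, plus Bounded for the range check.
data BNStringType
  = AsciiString
  | Utf16String
  | Utf32String
  | Utf8String
  deriving (Eq, Show, Enum, Bounded)

-- Haskell -> C: declaration order yields 0..3, matching the C enum values.
toCStringType :: BNStringType -> CInt
toCStringType = fromIntegral . fromEnum

-- C -> Haskell: reject values outside the known range instead of letting
-- the derived toEnum raise an error.
fromCStringType :: CInt -> Maybe BNStringType
fromCStringType n
  | n >= lo && n <= hi = Just (toEnum (fromIntegral n))
  | otherwise = Nothing
  where
    lo = toCStringType minBound
    hi = toCStringType maxBound
```

Returning Maybe pushes the decision about unknown values (for example, a newer Binary Ninja adding a string type) to the caller instead of crashing inside peek.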
Binding of a struct
typedef struct BNStringReference
{
    BNStringType type;
    uint64_t start;
    size_t length;
} BNStringReference;

data BNStringRef = BNStringRef
  { bnType :: !BNStringType,
    bnStart :: !Word64,
    bnLength :: !CSize
  }
  deriving (Eq, Show)
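instance Storable BNStringRef where
  -- 4-byte enum + 4 bytes padding + uint64_t + size_t on a 64-bit target
  sizeOf _ = 24
  alignment _ = Binja.Types.alignmentS
  peek ptr = do
    -- C enums are typically int-sized, so read the tag as a CInt
    t <- toEnum . fromIntegral <$> (peekByteOff ptr 0 :: IO CInt)
    s <- peekByteOff ptr 8 :: IO Word64
    l <- peekByteOff ptr 16 :: IO CSize
    pure (BNStringRef t s l)
  poke ptr (BNStringRef t s l) = do
    -- write the tag back as a 4-byte CInt; fromEnum alone would poke an 8-byte Int
    pokeByteOff ptr 0 (fromIntegral (fromEnum t) :: CInt)
    pokeByteOff ptr 8 s
    pokeByteOff ptr 16 l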
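Because peekArray and withArray step through memory by sizeOf, a wrong size or offset corrupts every element after the first. A quick self-check is to marshal a list into a temporary C array and read it back. This self-contained sketch repeats the instance with a literal alignment of 8 (an assumption standing in for Binja.Types.alignmentS on a 64-bit target); roundTripArray is a hypothetical name. A round trip only proves that peek and poke agree with each other; matching the real C layout still depends on the offsets, which hsc2hs can compute from the header with #{offset ...}.

```haskell
import Data.Word (Word64)
import Foreign.C.Types (CInt, CSize)
import Foreign.Marshal.Array (peekArray, withArrayLen)
import Foreign.Storable (Storable (..))

data BNStringType = AsciiString | Utf16String | Utf32String | Utf8String
  deriving (Eq, Show, Enum)

data BNStringRef = BNStringRef
  { bnType :: !BNStringType,
    bnStart :: !Word64,
    bnLength :: !CSize
  }
  deriving (Eq, Show)

instance Storable BNStringRef where
  sizeOf _ = 24
  alignment _ = 8 -- assumption: 64-bit target, uint64_t forces 8-byte alignment
  peek ptr = do
    t <- toEnum . fromIntegral <$> (peekByteOff ptr 0 :: IO CInt)
    s <- peekByteOff ptr 8
    l <- peekByteOff ptr 16
    pure (BNStringRef t s l)
  poke ptr (BNStringRef t s l) = do
    pokeByteOff ptr 0 (fromIntegral (fromEnum t) :: CInt)
    pokeByteOff ptr 8 s
    pokeByteOff ptr 16 l

-- Write the list into a temporary C array, then read it back element by
-- element; each element lives at index * sizeOf from the start.
roundTripArray :: [BNStringRef] -> IO [BNStringRef]
roundTripArray xs = withArrayLen xs (\n p -> peekArray n p)
```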
Binding of a function returning a list of pointers
BNFunction** BNGetAnalysisFunctionList(
    BNBinaryView* view,
    size_t* count);
import Foreign (ForeignPtr, Ptr, alloca, nullPtr, peek, peekArray)
import Foreign.C.Types (CSize (..))
import qualified Foreign.Concurrent as FC

-- Assumed to be defined elsewhere in the bindings: BNBinaryViewPtr and
-- BNFunctionPtr as newtypes over Ptr, with a Storable instance for
-- BNFunctionPtr (needed by peekArray).

foreign import ccall unsafe "BNGetAnalysisFunctionList"
  c_BNGetAnalysisFunctionList ::
    BNBinaryViewPtr ->
    Ptr CSize ->
    IO (Ptr BNFunctionPtr)

foreign import ccall unsafe "BNFreeFunctionList"
  c_BNFreeFunctionList ::
    Ptr BNFunctionPtr ->
    CSize ->
    IO ()

data FunctionList = FunctionList
  { flArrayPtr :: !(ForeignPtr BNFunctionPtr),
    flCount :: !Int,
    flList :: ![BNFunctionPtr],
    flViewPtr :: !BNBinaryViewPtr
  }
  deriving (Eq, Show)

getFunctionList :: BNBinaryViewPtr -> IO FunctionList
getFunctionList view =
  alloca $ \countPtr -> do
    rawPtr <- c_BNGetAnalysisFunctionList view countPtr
    count <- fromIntegral <$> peek countPtr
    xs <-
      if rawPtr == nullPtr || count == 0
        then pure []
        else peekArray count rawPtr
    -- Foreign.Concurrent.newForeignPtr takes an arbitrary IO () finalizer;
    -- Foreign.ForeignPtr.newForeignPtr would require a FunPtr finalizer instead.
    arrPtr <- FC.newForeignPtr rawPtr (c_BNFreeFunctionList rawPtr (fromIntegral count))
    pure
      FunctionList
        { flArrayPtr = arrPtr,
          flCount = count,
          flList = xs,
          flViewPtr = view
        }

functions :: BNBinaryViewPtr -> IO [BNFunctionPtr]
functions = fmap flList . getFunctionList
BNGetAnalysisFunctionList takes a pointer to a BNBinaryView and a pointer to a count. After the call, the count holds the number of BNFunction* entries in the returned array.
c_BNGetAnalysisFunctionList and c_BNFreeFunctionList are direct bindings to the allocating and freeing functions. getFunctionList builds a FunctionList and manages memory with alloca (for the temporary count) and newForeignPtr (whose finalizer frees the array). functions is a convenience wrapper that extracts the list of BNFunctionPtr. Note how the IO monad propagates upward through:
- c_BNGetAnalysisFunctionList
- getFunctionList
- functions
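One caveat about functions: it discards the ForeignPtr, so GHC may run the finalizer, and free the array, any time after the list is returned. If BNFreeFunctionList also releases each BNFunction reference, the raw pointers in flList could then dangle. A scoped accessor that only hands out the list inside withForeignPtr avoids this. The sketch below is self-contained, so it uses malloc'd memory and finalizerFree in place of the Binary Ninja allocator; OwnedList, newOwnedList and withOwnedList are hypothetical names, not part of the bindings.

```haskell
import Foreign.ForeignPtr (ForeignPtr, newForeignPtr, withForeignPtr)
import Foreign.Marshal.Alloc (finalizerFree)
import Foreign.Marshal.Array (mallocArray, peekArray, pokeArray)
import Foreign.Storable (Storable)

-- Hypothetical pairing of an owning C array with the elements read from it.
data OwnedList a = OwnedList
  { olOwner :: ForeignPtr a,
    olItems :: [a]
  }

-- Stand-in for a C call that returns a heap array: malloc it, fill it, and
-- attach free() as the finalizer (the bindings would use BNFreeFunctionList).
newOwnedList :: Storable a => [a] -> IO (OwnedList a)
newOwnedList xs = do
  let n = length xs
  raw <- mallocArray n
  pokeArray raw xs
  owner <- newForeignPtr finalizerFree raw
  items <- withForeignPtr owner (peekArray n)
  pure (OwnedList owner items)

-- The owner stays alive until the callback returns, so whatever the callback
-- does with the elements happens before the finalizer can run.
withOwnedList :: OwnedList a -> ([a] -> IO r) -> IO r
withOwnedList ol k = withForeignPtr (olOwner ol) (\_ -> k (olItems ol))
```

Applied to the bindings, the same shape would wrap getFunctionList so callers never hold flList outside the scope that keeps flArrayPtr alive.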
Convert return type from value to pointer
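Functions that return a structure by value must be wrapped, because the Haskell FFI can only pass and return primitive types (integers, floating-point values, pointers), not C structs. Each wrapper instead writes the result through a pointer passed as an extra argument. The set of these wrappers used by Beluga is here.
/* Declared in binaryninjacore.h: returns the struct by value. */
BNParameterVariablesWithConfidence
BNGetFunctionParameterVariables(BNFunction* func);

/* Wrapper: store the result through out so the caller supplies the memory. */
void BNGetFunctionParameterVariablesPtr(BNParameterVariablesWithConfidence* out,
                                        BNFunction* func)
{
    *out = BNGetFunctionParameterVariables(func);
}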
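On the Haskell side, such a wrapper is called by allocating space for the result, passing its address, and reading it back. The sketch below captures that in a small helper, withOut (a hypothetical name), and exercises it with libm's modf, which also returns part of its result through an out-pointer, so the example runs without Binary Ninja. The commented binding for the Beluga wrapper assumes a Haskell record mirroring BNParameterVariablesWithConfidence with a Storable instance; it is illustrative, not taken from the bindings.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
import Foreign.C.Types (CDouble (..))
import Foreign.Marshal.Alloc (alloca)
import Foreign.Ptr (Ptr)
import Foreign.Storable (Storable, peek)

-- Hypothetical helper: allocate space for one out-parameter, let the C call
-- fill it in, then read the result back into Haskell.
withOut :: Storable a => (Ptr a -> IO ()) -> IO a
withOut fill = alloca $ \out -> fill out >> peek out

-- modf(x, &ip) stores the integral part of x in ip and returns the
-- fractional part; here it stands in for a *Ptr wrapper.
foreign import ccall unsafe "math.h modf"
  c_modf :: CDouble -> Ptr CDouble -> IO CDouble

integralPart :: Double -> IO Double
integralPart x = realToFrac <$> withOut (\p -> () <$ c_modf (realToFrac x) p)

-- With the Beluga wrapper the binding would take the same shape, assuming a
-- Storable BNParameterVariablesWithConfidence record (hypothetical names):
--   foreign import ccall unsafe "BNGetFunctionParameterVariablesPtr"
--     c_getParamVars :: Ptr BNParameterVariablesWithConfidence -> BNFunctionPtr -> IO ()
--   parameterVariables f = withOut (\out -> c_getParamVars out f)
```

Since alloca frees the buffer when the action returns, anything the struct itself points to must be copied out (or freed with the matching Binary Ninja function) before leaving the scope.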
Testing
Differential testing serves as the test oracle over a large set of executables. The dataset consists of:
- 945,220 Windows executables totaling 140 GB, from Assemblage
- 639,000 Linux executables containing 502,000,000 functions, from Assemblage
- High-value macOS targets including V8, WebKit, sudo, and Signal, with more planned
- Drivers and firmware from Pixel phones (planned)
- High-value Android userspace targets such as messaging apps (planned)
- A Csmith-generated test suite
The testing covers:
- Similarity of the Python bindings with the Haskell bindings
- Full-stack regression testing (program analysis, Haskell bindings, GHC, Binary Ninja)
- Performance measurement for optimization, as a dual purpose
Hardware:
- Full test suite: Apple M4 Max with 128 GB of memory
- Full test suite: Debian on an Intel Core i9-13900K (24 cores) with 64 GB of memory
- High-value targets: Linux (arm64) on a ThinkPad X13s
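Differential testing here means comparing what the Python and Haskell bindings report for the same binary. A minimal sketch of the comparison step, assuming each side has already dumped the start addresses of the functions it found (producing those dumps is out of scope); Diff, diffStarts and agrees are hypothetical names.

```haskell
import Data.List (sort, (\\))
import Data.Word (Word64)

-- Hypothetical result of one differential check on a single binary.
data Diff = Diff
  { onlyInPython :: [Word64],
    onlyInHaskell :: [Word64]
  }
  deriving (Eq, Show)

-- Compare function start addresses reported by the two bindings; (\\) is
-- multiset difference, so a duplicated address on one side is also reported.
diffStarts :: [Word64] -> [Word64] -> Diff
diffStarts py hs = Diff (sort (py \\ hs)) (sort (hs \\ py))

-- A binary passes when neither side reports anything the other missed.
agrees :: Diff -> Bool
agrees (Diff a b) = null a && null b
```

Keeping both directions of the difference separates "Haskell missed a function" from "Haskell invented one", which usually point to different bugs.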