Operational Models

In ESBMC we replaced some libc APIs with our own implementation to facilitate the verification. This page aims to ramp up new team members to have a good understanding of the big picture of how C operational models (OMs) are built and linked, and how C++ OMs are used in a different way from the C’s. More importantly, this page aims to explain the concept in layman’s terms. Note that OMs and internal libraries are used interchangeably in the sections below.

In order to better explain the basic ideas for C OMs build and linking, let’s start with the snapshot commit 11d4688af7(referred to as the snapshot version) to get the gist - this will (hopefully) help the reader better understand the major extensions in PR 602 in the current master branch tip as of this writing.

Build and link process:

As of the snapshot version, the c2goto directory contains 2x CMakeLists which generates:

  • 2x static libraries: libclibs.a and libc2gotoheaders.a
  • 1x executable: c2goto

libc2gotoheaders.a is of no interest to us, as it is only used to build the old C frontend. There are two groups of OMs we’d like to use in our verification flow:

  • internal C libraries (consider them as ESBMC’s implementation of libc APIs)
  • internal C math libraries (consider them as ESBMC’s implementation of libm APIs)

Screenshot 2023-05-21 at 20 58 29

How does ESBMC use these interal libraries to replace the standard C library APIs? We got to find a way to somehow link them in our verification flow …

This is done three phases:

  • [Phase 1] First the c2goto executable is built WITHOUT internal libraries (see NO_CPROVER_LIBRARY in the above figure). It is a compiler shares the Clang frontend with esbmc converting C OMs to *.goto binaries. Then these binaries will be flailed to char arrays. Using file command to check their type, e.g. file clib32.goto returns ‘data’ type.

You may want use hexdump clib32.goto -C to take a look at the their header “GBF”. Looks familiar? - it’s from the global function write_goto_binary

Screenshot 2023-05-21 at 21 18 34

called as part of the c2goto’s conversion flow. So the c2goto first converts ESBMC’s internal libraries to goto functions and the associated context (aka. symbol table). Then write them to the corresponding *.goto binary file. Four biinaries files are created using difference machine word length 32/64 in float-bv and fixed-bv respectively. They are clib32.goto, clib32_fp.goto, clib64.goto and clib64_fp.goto.

  • [Phease 2] CMake flails the *.goto binaries of Phase 1 to generate four *.c file that contains a char array. This process is done by the flail.py script in src/scripts directory. The output files are clib32.c, clib32_fp.c, clib64.c and clib64_fp.c.

Now we’ve got a bunch of .goto binary files and the corresponding .c files in which we encode the symbol table and the equivalent GOTO functions of ESBMC’s internal libraries (i.e. the OMs). So how are they used? … OK here comes the linking phase.

  • [Phease 3] These binary files and source files generated by c2goto and flail.py are linked with cprover_library.cpp and read by a function called add_cprover_library.

Based on the machine word length configuration and fixed- or float-bv encoding, add_cprover_library knows which clib* array and binary to read and decode them to get the internal C APIs for verification. Basically, we dump all the model’s goto functions and load only the functions that are used by the program (and their dependencies).

You may wondering why we go through these phases. Could there be a simpler way? — unfortunately, no.

The C/C++ language families currently do not have a way to include the binary content of separate files into a translation unit, see also the rationale provided in n2499 submitted for inclusion to C23. Commonly used methods are either platform-specific (e.g. by invoking ld or objcopy) or encode the binary content as an array of bytes (or chars) in a new translation unit. The latter approach is taken by ESBMC’s build system via scripts/flail.py.

Adding C operational models:

The reason for bundling headers in esbmc is the same as for c2goto: consistency of types defined in these models. On the other hand the models’ source files are bundled, too, in order for esbmc to be able to reason about programs using library functions such as strlen() or pthread_create(). For this purpose, final() links the operational models to the translations of the given sources: either a pre-compiled version if the translation parameters match (e.g., --32 --floatbv), or, as a fallback if no matching pre-compiled version has been bundled, the bundled sources of the operational models are extracted to a temporary directory and translated using the Clang frontend.

Updated libc and libm build process:

PR 602 updated the internal libraries build process. The new build process is different from the snapshot version in the previous section. We now bundle the libraries in a way more like the standard C library releases( e.g. glibc ): Libraries in src/c2goto/library/ are bundled in libc and those in src/c2goto/library/libm are bundled in libm - this is more like the standard C libraries as pointed by “LIBRARY_PATH”, you may want to check libc.a and libm.a in /usr/lib/x86_64-linux-gnu/ or something similar.

c_link:

Since linking was aforementioned, it’s probably a good time to explain this terminology in ESBMC world.

In ESBMC linking actually means merging two contexts (i.e. two symbol tables), and is done by a function called c_link. Your program under verification probably uses some standard C APIs and there might exist an ESBMC’s internal implementation in one of the OMs. c_link merges them by pulling in the corresponding function body (i.e. ESBMC’s internal implementation). c_link used to be called merge_context (see this commit back in 2015).

c_link merges two symbol tables at a time. For example, let’s say you have a module that contains multiple {file1,file2,file3,}.c files to verify and the corresponding translations units (TUs) are {TU1, TU2, TU3} as well as a TU from the ESBMC’s OMs. Then c_link combines them in the following way: Screenshot 2023-05-21 at 22 29 23

C++ OMs

ESBMC’s C++ OMs are used in a different way from C’s. It’s not part of the build process and there are no binaries or cryptic char arrays generated. For the time being it’s included in the source files under verification directly, and we just use the in-memory AST to generate the symbol table and the corresponding goto functions.