Operational Models
In ESBMC we replaced some libc APIs with our own implementation to facilitate the verification. This page aims to ramp up new team members to have a good understanding of the big picture of how C operational models (OMs) are built and linked, and how C++ OMs are used in a different way from the C’s. More importantly, this page aims to explain the concept in layman’s terms. Note that OMs and internal libraries are used interchangeably in the sections below.
In order to better explain the basic ideas for C OMs build and linking, let’s start with the snapshot commit 11d4688af7(referred to as the snapshot version) to get the gist - this will (hopefully) help the reader better understand the major extensions in PR 602 in the current master branch tip as of this writing.
Build and link process:
As of the snapshot version, the c2goto directory contains 2x CMakeLists which generates:
- 2x static libraries:
libclibs.aandlibc2gotoheaders.a - 1x executable:
c2goto
libc2gotoheaders.a is of no interest to us, as it is only used to build the old C frontend. There are two groups of OMs we’d like to use in our verification flow:
- internal C libraries (consider them as ESBMC’s implementation of
libcAPIs) - internal C math libraries (consider them as ESBMC’s implementation of
libmAPIs)
How does ESBMC use these interal libraries to replace the standard C library APIs? We got to find a way to somehow link them in our verification flow …
This is done three phases:
- [Phase 1] First the
c2gotoexecutable is built WITHOUT internal libraries (seeNO_CPROVER_LIBRARYin the above figure). It is a compiler shares the Clang frontend withesbmcconverting C OMs to *.goto binaries. Then these binaries will be flailed to char arrays. Usingfilecommand to check their type, e.g.file clib32.gotoreturns ‘data’ type.
You may want use hexdump clib32.goto -C to take a look at the their header “GBF”. Looks familiar? - it’s from the global function write_goto_binary
called as part of the c2goto’s conversion flow. So the c2goto first converts ESBMC’s internal libraries to goto functions and the associated context (aka. symbol table). Then write them to the corresponding *.goto binary file. Four biinaries files are created using difference machine word length 32/64 in float-bv and fixed-bv respectively. They are clib32.goto, clib32_fp.goto, clib64.goto and clib64_fp.goto.
- [Phease 2] CMake flails the *.goto binaries of Phase 1 to generate four *.c file that contains a char array. This process is done by the
flail.pyscript insrc/scriptsdirectory. The output files areclib32.c,clib32_fp.c,clib64.candclib64_fp.c.
Now we’ve got a bunch of .goto binary files and the corresponding .c files in which we encode the symbol table and the equivalent GOTO functions of ESBMC’s internal libraries (i.e. the OMs). So how are they used? … OK here comes the linking phase.
- [Phease 3] These binary files and source files generated by
c2gotoandflail.pyare linked withcprover_library.cppand read by a function calledadd_cprover_library.
Based on the machine word length configuration and fixed- or float-bv encoding, add_cprover_library knows which clib* array and binary to read and decode them to get the internal C APIs for verification. Basically, we dump all the model’s goto functions and load only the
functions that are used by the program (and their dependencies).
You may wondering why we go through these phases. Could there be a simpler way? — unfortunately, no.
The C/C++ language families currently do not have a way to include the binary content of separate files into a translation unit,
see also the rationale provided in n2499 submitted for inclusion to C23.
Commonly used methods are either platform-specific (e.g. by invoking ld or objcopy) or encode the binary content as an array of bytes (or chars) in a new translation unit. The latter approach is taken by ESBMC’s build system via scripts/flail.py.
Adding C operational models:
The reason for bundling headers in esbmc is the same as for c2goto: consistency of types defined in these models.
On the other hand the models’ source files are bundled, too,
in order for esbmc to be able to reason about programs using library functions such as strlen() or pthread_create().
For this purpose, final() links the operational models to the translations of the given sources:
either a pre-compiled version if the translation parameters match (e.g., --32 --floatbv),
or, as a fallback if no matching pre-compiled version has been bundled,
the bundled sources of the operational models are extracted to a temporary directory and translated using the Clang frontend.
Updated libc and libm build process:
PR 602 updated the internal libraries build process. The new build process is different from the snapshot version in the previous section. We now bundle the libraries in a way more like the standard C library releases( e.g. glibc ): Libraries in src/c2goto/library/ are bundled in libc and those in src/c2goto/library/libm are bundled in libm - this is more like the standard C libraries as pointed by “LIBRARY_PATH”, you may want to check libc.a and libm.a in /usr/lib/x86_64-linux-gnu/ or something similar.
c_link:
Since linking was aforementioned, it’s probably a good time to explain this terminology in ESBMC world.
In ESBMC linking actually means merging two contexts (i.e. two symbol tables), and is done by a function called c_link. Your program under verification probably uses some standard C APIs and there might exist an ESBMC’s internal implementation in one of the OMs. c_link merges them by pulling in the corresponding function body (i.e. ESBMC’s internal implementation). c_link used to be called merge_context (see this commit back in 2015).
c_link merges two symbol tables at a time. For example, let’s say you have a module that contains multiple {file1,file2,file3,}.c files to verify and the corresponding translations units (TUs) are {TU1, TU2, TU3} as well as a TU from the ESBMC’s OMs. Then c_link combines them in the following way:
C++ OMs
ESBMC’s C++ OMs are used in a different way from C’s. It’s not part of the build process and there are no binaries or cryptic char arrays generated. For the time being it’s included in the source files under verification directly, and we just use the in-memory AST to generate the symbol table and the corresponding goto functions.