Implicit function prototypes are evil.

Recently I stumbled upon a major issue when using the cffi library to pass data generated inside a C library to Python. The following code was supposed to read the pointers and lengths of arrays from the C library and make the data available for further processing in Python:

count = lib.overflow_buffer_count()
if count > 0:
    buf_size = lib.overflow_buffer_size(0)
    for i in range(0, count - 1):
        data.append(ffi.unpack(lib.overflow_buffer_access(i), buf_size))
    # last buffer might not be full
    buf_size = lib.overflow_buffer_size(count - 1)
    data.append(ffi.unpack(lib.overflow_buffer_access(count - 1), buf_size))

This code would work until it wouldn’t. On rare occasions, when the buffer length was large enough that only a single allocation was created, the code would cause a segmentation fault. Naturally, I first looked for the bug in the code handling the allocation of multiple buffers. Yet, that wasn’t the reason behind the incorrect memory accesses. Finally, I was able to confirm that the pointer values seen in the C and Python code were different!

# C output
PYPAPI: Overflow buffer acccess 0x7f0c5af5e010 at 0
# Python output
<cdata 'long long *' 0x5af5e010>

# C output
PYPAPI: Overflow buffer acccess 0x7f25d19a9010 at 0
# Python output
<cdata 'long long *' 0xffffffffd19a9010>

The 64-bit pointer has been truncated to a 32-bit value! In the second case, the truncated value has additionally been sign-extended back to 64 bits. The bug would manifest itself only when the pointer returned from malloc did not fit in 32 bits. The question remained: why did it happen?

An inspection of the build process of the C library allocating the buffers revealed a couple of warnings of the kind that is usually ignored as harmless.

pypapi/_papi.c:1939:10: warning: implicit declaration of function ‘overflow_buffer_access’; did you mean ‘_cffi_d_overflow_buffer_access’? [-Wimplicit-function-declaration]

Clearly, the C bindings generated by cffi were missing the declaration of my C function overflow_buffer_access. The problem lay deep in the build code:

import os

from cffi import FFI

_ROOT = os.path.abspath(os.path.dirname(__file__))
_PAPI_H = os.path.join(_ROOT, "papi.h")
_PAPI_CALLBACKS = os.path.join(_ROOT, "papi_callbacks.h")


ffibuilder = FFI()
ffibuilder.set_source(
        "pypapi._papi",
        '#include "papi.h"',
        sources=['pypapi/papi_callbacks.c'],
        extra_objects=[os.path.join(_ROOT, "..", "papi", "src", "libpapi.a")],
        include_dirs=[_ROOT],
        )
ffibuilder.cdef(open(_PAPI_H, "r").read())
ffibuilder.cdef(open(_PAPI_CALLBACKS, "r").read())

While we provide the headers through calls to cdef, cffi uses them only to register C types for use from Python code. The headers actually used to compile the bindings are provided in the second argument of the call to set_source. Due to a small oversight, I included the PAPI header used by my C library but forgot the #include "papi_callbacks.h". Adding the include resolved both the compiler warnings and the problem with pointer truncation.
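One way to make this kind of omission harder is to derive both the set_source preamble and the cdef calls from a single list of headers. The following is only a sketch of that idea, not the fix I actually applied; the names HEADERS, include_preamble and configure are hypothetical helpers introduced for illustration, while set_source and cdef are the cffi calls used above.

```python
import os

# Single source of truth: every header listed here is both compiled into
# the binding (via #include) and parsed by cdef().
HEADERS = ["papi.h", "papi_callbacks.h"]


def include_preamble(headers):
    """Build the C source passed as the second argument of set_source."""
    return "\n".join('#include "{}"'.format(name) for name in headers)


def configure(ffibuilder, root):
    """Hypothetical helper: register each header with set_source and cdef."""
    ffibuilder.set_source(
        "pypapi._papi",
        include_preamble(HEADERS),
        sources=["pypapi/papi_callbacks.c"],
        extra_objects=[os.path.join(root, "..", "papi", "src", "libpapi.a")],
        include_dirs=[root],
    )
    for name in HEADERS:
        with open(os.path.join(root, name), "r") as header:
            ffibuilder.cdef(header.read())


# Show the preamble that would be handed to the C compiler.
print(include_preamble(HEADERS))
```

With this layout, adding a new header to HEADERS is enough to keep the compiled preamble and the cdef declarations in sync.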

But why did it cause the issue in the first place? The actual reason is that C permissively allows calling a function without a visible declaration, in contrast to C++, whose rules are much stricter here. Since the function declaration might not be available when the calling code is processed, the compiler generates a declaration by applying a set of standard rules. In particular, the return type cannot be deduced from the call and is assumed to be int by default. As we can find in the C89 standard (link to one of the drafts available online; the official standard release is still quite expensive even though the language is 30 years old):

If the expression that precedes the parenthesized argument list in a function call consists solely of an identifier, and if no declaration is visible for this identifier, the identifier is implicitly declared exactly as if, in the innermost block containing the function call, the declaration

         extern int  identifier();

appeared.
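The effect of such an int return type on the two addresses logged earlier can be reproduced in a few lines of Python. This is a small simulation using ctypes, not the actual code path taken by the compiler; it assumes a typical 64-bit platform where int is 32 bits wide, and through_c_int is a hypothetical helper name.

```python
import ctypes


def through_c_int(pointer_value):
    """Simulate returning a 64-bit pointer through a function assumed to return int."""
    # Only the low 32 bits survive, interpreted as a signed integer.
    truncated = ctypes.c_int32(pointer_value).value
    # Converting the int back to a 64-bit pointer sign-extends it.
    return ctypes.c_uint64(truncated).value


for address in (0x7f0c5af5e010, 0x7f25d19a9010):
    print(hex(address), "->", hex(through_c_int(address)))
# 0x7f0c5af5e010 -> 0x5af5e010
# 0x7f25d19a9010 -> 0xffffffffd19a9010
```

Both results match the values printed on the Python side: the first address has bit 31 clear and stays positive, while the second has bit 31 set and is sign-extended.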

With the return value treated as an int, the mystery of pointer truncation is resolved: on a platform with 64-bit pointers and a 32-bit int, only the lower half of the address survives. Of course, I wasn’t the only one to cause such problems. After all, it’s an easy mistake to make. Fortunately, implicit function declarations have been removed in the C99 standard, although a compiler is still allowed to assume an implicit declaration for compatibility reasons, as we find in the rationale for the C99 standard:

A new feature of C99: The rule for implicit declaration of functions has been removed in C99. The effect is to guarantee the production of a diagnostic that will catch an additional category of programming errors. After issuing the diagnostic, an implementation may choose to assume an implicit declaration and continue translation in order to support existing programs that exploited this feature.

One more reason to enable the -std=c99 switch in all C builds and, of course, to always pay attention to compiler warnings!
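Depending on the compiler and its version, -std=c99 alone may still produce only a warning, so it can help to turn this particular diagnostic into a hard error. Below is a sketch of how such flags could be attached to a cffi build through the extra_compile_args keyword of set_source. The flag spelling assumes GCC or Clang, and STRICT_C_FLAGS and with_strict_flags are hypothetical names used only for illustration.

```python
# Assumed flags for GCC or Clang; other compilers spell these differently.
STRICT_C_FLAGS = ["-std=c99", "-Werror=implicit-function-declaration"]


def with_strict_flags(build_kwargs):
    """Return a copy of set_source keyword arguments with the strict flags appended."""
    merged = dict(build_kwargs)
    existing = list(build_kwargs.get("extra_compile_args", []))
    merged["extra_compile_args"] = existing + STRICT_C_FLAGS
    return merged


kwargs = with_strict_flags({"sources": ["pypapi/papi_callbacks.c"]})
print(kwargs["extra_compile_args"])
# Usage (not executed here):
#   ffibuilder.set_source("pypapi._papi", c_source, **kwargs)
```

With these flags, a missing include in the set_source preamble should stop the build instead of scrolling by as a warning.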



