alittletooquiet.net

The Problem With ctypes

Not long ago, I migrated pytagsfs from the python-fuse C bindings to custom bindings implemented with ctypes, based on fuse.py. All was going well, and I believed it was the right choice at the time. If I knew then what I know now, though, I wouldn't have gone down that path.

I don't want to go into great detail explaining why I wanted to move away from the standard python-fuse bindings. Briefly, though, the things that I don't like about python-fuse:

  • The implementation suffers from some ugliness that makes certain kinds of bugs difficult to track down. This mostly comes down to some pieces being rather complicated and difficult to understand due to unnecessarily dynamic programming, bad variable names, and a lack of code comments.
  • It tightly couples user-visible command-line options to the command line that gets passed to the FUSE library.
  • The bindings expect to deal with file objects to implement stateful I/O, but I wanted to deal with integer file handles.

So I implemented my own bindings to the FUSE library based on fuse.py by Giorgos Verigakis. It worked very well, and I wasn't aware of any problems with this approach, although I would've been if I'd thought it through.

Bitten By Bad Assumptions

Then I tested my bindings on OS X. The first hint that I'd made the wrong choice was that an unrelated piece of test code that I'd also implemented with ctypes was failing. This code used ctypes to call the truncate system call. The test file was not being truncated at all.

This came as a bit of a shock. I debugged for a little while before a harsh reality (one that we Python programmers are usually shielded from) sank in.

Here's the function signature for truncate (from the manpage on my Linux system):

int truncate(const char *path, off_t length);

This function has the same signature on Darwin. However, as it turns out, off_t is a 64-bit type on Darwin, but on my Linux system it was only a 32-bit type (as it is on 32-bit Linux unless large file support is enabled). I was passing a Python integer to the function via ctypes, with no casting, and ctypes passes a bare Python integer as a C int by default. That 32-bit value landed in one half of the 64-bit slot, and the other half did not hold what truncate expected. The end result was that the truncation was always being performed with a rather large offset.

In other words, when you use ctypes, you are relying on an unchanging ABI (application binary interface). While most systems and libraries make guarantees about APIs, most don't make any promises when it comes to ABIs. Types may vary between operating systems, OS versions, system architectures, library versions, etc.
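
To make the failure mode concrete, here is a minimal sketch of the truncate call with its argument types declared explicitly. The c_off_t alias and the truncate and demo helpers are names made up for this illustration, and the 64-bit width of off_t is an assumption: it holds on Darwin and on 64-bit Linux, but it is exactly the kind of guess this post warns about.

```python
import ctypes
import ctypes.util
import os
import tempfile

# ASSUMPTION: off_t is 64 bits wide. True on Darwin and 64-bit Linux,
# false on 32-bit Linux built without large file support.
c_off_t = ctypes.c_int64

# find_library may return None; CDLL(None) then exposes the symbols
# already loaded into the process, which include libc on Linux.
libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)

# Without argtypes, ctypes would pass a Python int as a plain C int.
libc.truncate.argtypes = [ctypes.c_char_p, c_off_t]
libc.truncate.restype = ctypes.c_int


def truncate(path, length):
    """Call truncate(2) through ctypes, raising OSError on failure."""
    if libc.truncate(os.fsencode(path), length) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err), path)


def demo():
    """Write 100 bytes, truncate to 10, and return the resulting size."""
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(b"x" * 100)
        name = f.name
    try:
        truncate(name, 10)
        return os.path.getsize(name)
    finally:
        os.unlink(name)
```

Declaring argtypes fixes the call on the systems where the assumed width is right, but the width is still hard-coded in Python rather than taken from the system headers.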

One Wrong Way To Solve The Problem

One way that people deal with this is by trying to guess the right types to use depending on some of the factors listed above. For instance, fuse.py tries to do this, as you can see in this excerpt:

# Copyright (c) 2008 Giorgos Verigakis <verigak@gmail.com>
#
# Permission to use, copy, modify, and distribute this software for any
# purpose with or without fee is hereby granted, provided that the above
# copyright notice and this permission notice appear in all copies.
#
# THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
# WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
# MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
# ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
# WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
# ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
# OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.

_system = system()

if _system == 'Darwin':
    ENOTSUP = 45
    c_dev_t = c_int32
    c_mode_t = c_uint16
    c_nlink_t = c_uint16

    class c_stat(Structure):
        _fields_ = [
                ('st_dev', c_dev_t),
                ('st_ino', c_ino_t),
                ('st_mode', c_mode_t),
                ('st_nlink', c_nlink_t),
                ('st_uid', c_uid_t),
                ('st_gid', c_gid_t),
                ('st_rdev', c_dev_t),
                ('st_atimespec', c_timespec),
                ('st_mtimespec', c_timespec),
                ('st_ctimespec', c_timespec),
                ('st_size', c_off_t),
                ('st_blocks', c_blkcnt_t),
                ('st_blksize', c_blksize_t),
        ]
elif _system == 'Linux':
    ENOTSUP = 95
    c_dev_t = c_ulonglong
    c_mode_t = c_uint
    c_nlink_t = c_ulong

    if machine() == 'x86_64':
        class c_stat(Structure):
            _fields_ = [
                    ('st_dev', c_dev_t),
                    ('st_ino', c_ino_t),
                    ('st_nlink', c_nlink_t),
                    ('st_mode', c_mode_t),
                    ('st_uid', c_uid_t),
                    ('st_gid', c_gid_t),
                    ('__pad0', c_int),
                    ('st_rdev', c_dev_t),
                    ('st_size', c_off_t),
                    ('st_blksize', c_blksize_t),
                    ('st_blocks', c_blkcnt_t),
                    ('st_atimespec', c_timespec),
                    ('st_mtimespec', c_timespec),
                    ('st_ctimespec', c_timespec),
            ]
    else:
        class c_stat(Structure):
            _fields_ = [
                    ('st_dev', c_dev_t),
                    ('__pad1', c_short),
                    ('st_ino', c_ino_t),
                    ('st_mode', c_mode_t),
                    ('st_nlink', c_nlink_t),
                    ('st_uid', c_uid_t),
                    ('st_gid', c_gid_t),
                    ('st_rdev', c_dev_t),
                    ('__pad2', c_short),
                    ('st_size', c_off_t),
                    ('st_blksize', c_blksize_t),
                    ('st_blocks', c_blkcnt_t),
                    ('st_atimespec', c_timespec),
                    ('st_mtimespec', c_timespec),
                    ('st_ctimespec', c_timespec),
            ]
else:
    raise NotImplementedError('%s is not supported.' % _system)

It should be obvious why this kind of runtime detection is a bad idea. The definitions have to be maintained, which means they are almost guaranteed to be out of date. Even in the best case, the information is really only likely to be accurate on the author's system. And there are so many potential combinations of OS, OS version, library versions, and system architectures that it is incredibly likely that some combination will be incorrectly handled.
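
A small sketch of what a wrong guess does in practice: the two structures below are hypothetical, shortened layouts (not the real struct stat of any system) that differ only in the widths of the first two fields. Reading bytes written under one layout through the other shifts every later field.

```python
import ctypes


class NarrowStat(ctypes.Structure):
    # Hypothetical layout: 32-bit st_dev, 16-bit st_mode.
    _fields_ = [
        ("st_dev", ctypes.c_int32),
        ("st_mode", ctypes.c_uint16),
        ("st_size", ctypes.c_int64),
    ]


class WideStat(ctypes.Structure):
    # Hypothetical layout: 64-bit st_dev, 32-bit st_mode.
    _fields_ = [
        ("st_dev", ctypes.c_uint64),
        ("st_mode", ctypes.c_uint32),
        ("st_size", ctypes.c_int64),
    ]


# The "library" fills in the structure using one layout...
real = WideStat(st_dev=1, st_mode=0o100644, st_size=1234)

# ...and the "binding" interprets the same bytes using the other.
misread = NarrowStat.from_buffer_copy(bytes(real))

print("st_size offsets:", NarrowStat.st_size.offset, WideStat.st_size.offset)
print("real st_size:", real.st_size, "misread st_size:", misread.st_size)
```

No exception is raised anywhere; the binding simply reports a file size that has nothing to do with the real one.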

C programmers don't have to worry about this because the compiler and the C preprocessor do all of the work for them. The types are usually defined in header files using typedefs, with preprocessor directives such as #define and #ifdef selecting the right definition for the platform. ctypes doesn't have access to these definitions, as the header files may not even be present at runtime.

The Really Bad News

So your software doesn't work on some systems. Okay, that would be fine if it halted execution and complained (like fuse.py does in some cases).

However, if the application doesn't recognize that the system is not a supported one, it will probably continue executing, but with the wrong types. If type definitions are wrong, the problems will manifest themselves in subtle ways that will likely lead to data corruption. The application will probably not have any way of even detecting the problem, except via explicit test cases and runtime assertions.
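
One possible shape for such a runtime assertion, as a rough sketch: before trusting a binding, exercise it on a scratch file and compare the outcome with what Python's own os module reports. The verify_truncate_binding helper and its parameters are invented for this example; a real binding would need a check like this for each call it wraps.

```python
import os
import tempfile


def verify_truncate_binding(truncate_func, original_size=4096, target_size=100):
    """Raise RuntimeError unless truncate_func(path, length) really truncates.

    truncate_func is whatever binding is under suspicion, for example a
    wrapper around a ctypes call.
    """
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(b"\0" * original_size)
        truncate_func(path, target_size)
        actual = os.path.getsize(path)
    finally:
        os.unlink(path)
    if actual != target_size:
        raise RuntimeError(
            "truncate binding is broken on this system: "
            "expected %d bytes, found %d" % (target_size, actual)
        )


def broken_truncate(path, length):
    """Stand-in for a binding that silently does nothing."""


# A working implementation passes silently.
verify_truncate_binding(os.truncate)

# A broken one is refused before it can damage real data.
try:
    verify_truncate_binding(broken_truncate)
except RuntimeError as exc:
    print("refusing to continue:", exc)
```

Running a check like this at startup turns silent corruption into a loud failure, though it only covers the behavior it actually tests.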

Well, Maybe It's Not Always A Bad Idea

Some systems and libraries actually do make guarantees about ABI stability. For instance, I assume that Windows makes such guarantees, since applications are usually distributed in binary form only. ctypes is probably remarkably useful and reliable on Windows platforms.

But those of us writing Python code for other systems need to be aware of the limitations of ctypes. If you are just writing code that will see limited use in well-understood environments, ctypes is probably a great option. However, if you are distributing your software for use by others, it could end up being used in a wide variety of situations, many of which you may not have planned for. You had best make sure that it either works correctly or doesn't work at all. That may mean writing your bindings in C or Pyrex.