Learning Rust: Interfacing with C

Why Rust?

I spent the last two rainy days of my summer vacation learning Rust. Rust is becoming ever more popular and is even making its way into the Linux kernel – so it feels like something I should know a little about.

There have been a lot of new languages in recent years, like Kotlin or Go. None of them are particularly attractive to me personally, as their strengths and “selling points” just don’t apply enough to what I do – so far, that has been covered rather well between C, Python, and JavaScript. But Rust’s central design of “you can’t write unsafe code” is a really convincing argument to start letting go of C code at last.

In particular, Cockpit has a lot of C code, much of which is very security critical – messing up the webserver or its setuid root session helper can have dire consequences. Both of these look like great candidates for Rust (in some still rather distant future..).

I started working through the Rust Book (chapter 15 by now), and keep a little exercise git repo. I am seriously impressed by the thoroughness of the language design, the proactive and helpful compiler errors, and the quality of the standard library – having comprehensive, executable, and tested code examples for pretty much every function is super-helpful! Likewise, the interactive in-browser playground is an ingenious tool for trying out APIs, sharing code snippets, and even testing a little thing from a phone.

Calling C functions from Rust

Starting new projects with pure Rust is all good, but most of my life revolves around maintaining and using large existing C code bases – it’s neither an option nor even desirable to throw these all out in one step. In order to even be a viable successor to C, a programming language must be able to interface with C in both directions. This is what I started with on today’s Red Hat Day of Learning.

The Embedded Rust book has a little chapter about how to call C functions from Rust. This gave me enough confidence that this actually works in a reasonable way, and also pointed out bindgen to automatically create a Rust interface for a C header file (it’s just an apt install or dnf install away).

The book tutorial interfaces with a local C file – but to get a real-world use case I want to call an existing shared library. As a first step I created two little example C programs as a reference – c-langinfo.c calls glibc’s nl_langinfo(), and c-mount.c calls libmount. The latter seemed a little more interesting to me, so I started rustifying that.

The libmount API is structurally relatively easy, but not too easy: It has things like output string function arguments (const char **ver_string) and custom structs (struct libmnt_fs and struct libmnt_table), which makes it a good experiment.

My first struggle was about creating a Rust interface for libmount.h:

$ bindgen /usr/include/libmount/libmount.h -- -I /usr/include/linux
/usr/include/stdio.h:36:10: fatal error: 'stdarg.h' file not found, err: true
thread 'main' panicked at 'Unable to generate bindings: ()', src/main.rs:54:36

Indeed that’s a bit weird – stdarg.h only exists in /usr/include/c++/11/tr1/, /usr/lib64/clang/12.0.1/include/, and /usr/lib/gcc/x86_64-redhat-linux/11/include/, so it is some internal compiler magic. Fortunately this can be avoided by removing the unnecessary #include <stdio.h>. But after doing that:

libmount-hack.h:368:63: error: unknown type name 'size_t'

size_t normally comes from stddef.h, which, like stdarg.h, ships with the compiler rather than in /usr/include, so presumably it goes missing for the same reason. So I applied a typedef hack and built a temporary local hacked header file which gets fed to bindgen, and then it’s happy. Neither of these problems has a lot of Google juice, so I left a proper fix as a future exercise.

After that little detour, calling libmount from Rust was actually reasonably straightforward once I discovered the three main workhorses: the unsafe keyword (which has to wrap every C call), the std::ptr module to manipulate raw pointers, and std::ffi::CStr to wrap C’s byte pointers (const char* in C, *const c_char in Rust parlance, which is *const i8 on x86_64) into something Rust can reasonably work with.

Thus a C call like

const char *version;
mnt_get_library_version(&version);
printf("libmount version: %s\n", version);

becomes

let mut version = ptr::null();
unsafe { libmount::mnt_get_library_version(ptr::addr_of_mut!(version)) };
println!("libmount version: {:?}", unsafe { CStr::from_ptr(version) });

I left out the error handling in the above, but my actual commits have it.
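As a rough idea of what such error handling can look like, here is a minimal sketch. It does not link against libmount: fake_get_version() is a made-up stand-in with the same output-argument shape as the call above, and the "negative return code means failure" convention is my assumption for illustration, not something taken from the libmount documentation.

```rust
use std::ffi::CStr;
use std::os::raw::{c_char, c_int};
use std::ptr;

/// Hypothetical stand-in for a C function with an output string argument.
/// It stores a pointer to a static NUL-terminated string and returns 0.
extern "C" fn fake_get_version(ver_string: *mut *const c_char) -> c_int {
    static VERSION: &[u8] = b"2.36.2\0";
    if !ver_string.is_null() {
        unsafe { *ver_string = VERSION.as_ptr() as *const c_char };
    }
    0
}

/// Safe wrapper: checks the return code, the pointer, and the encoding
/// before handing an owned String back to the rest of the program.
fn library_version() -> Result<String, String> {
    let mut version: *const c_char = ptr::null();
    let rc = fake_get_version(&mut version);
    if rc < 0 {
        // Assumed convention: negative means failure
        return Err(format!("call failed with code {}", rc));
    }
    if version.is_null() {
        return Err("no version string returned".to_string());
    }
    // Only now is it reasonable to wrap the raw pointer in a CStr
    let cstr = unsafe { CStr::from_ptr(version) };
    cstr.to_str().map(str::to_owned).map_err(|e| e.to_string())
}
```

With a real binding, the only change should be replacing fake_get_version() with the bindgen-generated function inside an unsafe block.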

The first commit has all the build system, bindgen, and Cargo plumbing and the first function call, and the second commit implements the full functionality of the reference C file, and adds tasteful error handling. Both programs now show pretty much the same thing:

fstab path C-String: "/etc/fstab"
fstab path str: /etc/fstab
libmount version: "2.36.2"
first fs type: "proc" source: "proc" target: "/proc"

The main wart is the .cargo/config file, which specifies -lmount to link against. This should rather be some magic flag inside the generated libmount.rs. One can do that as a #[link(name = "mount")] annotation on every single generated declaration, but that’s too unwieldy to do with sed. Maybe/hopefully I am missing something here, but then again this is “good enough” at least for a single library.
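One candidate for the missing piece is a Cargo build script: a build.rs in the crate root may print a cargo:rustc-link-lib=NAME line, which Cargo turns into the corresponding linker flag. I have not tried this in the example repo, so take it as a sketch; link_lib_directive() is a helper name I made up to keep the snippet testable.

```rust
// build.rs (sketch, not part of the commits described above)

/// Format the directive that asks Cargo to link against the given library.
fn link_lib_directive(name: &str) -> String {
    format!("cargo:rustc-link-lib={}", name)
}

// In build.rs, `fn main()` would then simply do:
//     println!("{}", link_lib_directive("mount"));
```

This keeps the linker flag next to the crate that needs it, instead of in a per-user or per-checkout .cargo/config.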

Calling Rust functions from C

This direction is equally important – e.g. I want to convert some complicated and error-prone code to Rust, without having to port my entire program all at once.

The aforementioned book also has a small chapter about calling Rust from C. This tells you the most important things: cbindgen (available as deb/rpm as well) to generate header files from a Rust library, and how to declare Rust functions in a C compatible way. Indeed it does not take much to set up a skeleton and call a trivial function.

It becomes more interesting with non-trivial data types (strings and arrays of strings in particular), and when the Rust code calls its own standard library. It took me two hours to get the right incantation of pointer type declarations, CStr/CString conversions, discovering mem::forget(), and getting over linker errors, but at last I have three increasingly complex string handling/returning functions which work. 🎉

A fairly generic helper for that is the following function, which returns a vector to C as a pointer list, and lobotomizes it from Rust’s memory cleanup:

/// Return a vector of pointers as C pointer array
///
/// The memory of `vec` gets leaked, as otherwise Rust would free it once the vector
/// goes out of scope, and C would access invalid memory.
fn return_c_vec<T>(mut vec: Vec<T>) -> *mut T {
    let p = vec.as_mut_ptr();
    std::mem::forget(vec);
    p
}

String lists (i.e. const char**) have to be treated in a similar manner, and of course each individual string first has to be converted to a const char*:

fn impl_grep<'a>(needle: &str, haystack: &'a str) -> Vec<&'a str> {
   // Rust code...
}

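use std::ffi::{CStr, CString};
use std::os::raw;
use std::ptr;

/// C adapter for impl_grep()
#[no_mangle]
pub extern "C" fn r_grep(needle: *const raw::c_char, haystack: *const raw::c_char) -> *mut *const raw::c_char {
    // Both arguments must be valid NUL-terminated UTF-8 strings; anything else panics here
    let needle_str = unsafe { CStr::from_ptr(needle) }.to_str().unwrap();
    let haystack_str = unsafe { CStr::from_ptr(haystack) }.to_str().unwrap();
    let strvec = impl_grep(needle_str, haystack_str);

    // Vec<&str> -> Vec<*const c_char>, plus a trailing NULL so that C can find the end
    let p_vec: Vec<*const raw::c_char> = strvec.into_iter()
        // into_raw() hands ownership of the string to the caller, so Rust will not free it
        .map(|s| CString::new(s).unwrap().into_raw() as *const raw::c_char)
        .chain(std::iter::once(ptr::null()))
        .collect();

    return_c_vec(p_vec)
}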

This is quite some syntactical overhead, but in a real-world library one would create a few more helper functions/macros for this to make this look less creepy.
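One such helper could cover the input side, i.e. turning an incoming const char* into a &str without panicking on NULL or invalid UTF-8. The following is only a sketch: c_str_arg() and r_count_lines() are names I invented for illustration, and a real export would additionally carry the #[no_mangle] attribute like r_grep() does.

```rust
use std::ffi::CStr;
use std::os::raw::{c_char, c_int};

/// Borrow a C string argument as `&str`; `None` for NULL or invalid UTF-8.
///
/// Safety: a non-NULL `p` must point to a NUL-terminated string that
/// stays alive and unmodified for the lifetime `'a`.
unsafe fn c_str_arg<'a>(p: *const c_char) -> Option<&'a str> {
    if p.is_null() {
        return None;
    }
    unsafe { CStr::from_ptr(p) }.to_str().ok()
}

/// Hypothetical adapter: count the lines in a C string.
/// Returns -1 instead of panicking when the argument is unusable.
pub extern "C" fn r_count_lines(text: *const c_char) -> c_int {
    match unsafe { c_str_arg(text) } {
        Some(s) => s.lines().count() as c_int,
        None => -1,
    }
}
```

Returning an error value is a design choice here: unwinding a Rust panic across the C boundary is best avoided, so the adapter reports bad input through its return code.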

On the C side, calling r_grep() looks very straightforward:

    const char **matches = r_grep("ell", "Hello\nworld\ncan you tell?");
    for (const char **m = matches; *m; m++)
        printf("matched line: %s\n", *m);
    free (matches);

However, I am fairly convinced that the strings inside the returned list get leaked. There is no CString::as_mut_ptr(), I can’t free them on the C side as they are const (only the list itself is not const and can be free()d), and I did not find a way to clean them up on the Rust side after releasing them. Thus returning complex data structures from Rust to C is better avoided in long-running programs.
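A possible way out, which I have not wired into the example repo, is to let Rust free what Rust allocated: the standard library pairs CString::into_raw() with CString::from_raw(), and Box::into_raw() with Box::from_raw(). The sketch below assumes the C caller invokes a dedicated free function instead of free(); strings_to_c() and r_strv_free() are hypothetical names, and a real export would again need #[no_mangle].

```rust
use std::ffi::CString;
use std::os::raw::c_char;
use std::ptr;

/// Hand a list of strings to C as a NULL-terminated `char **`.
pub fn strings_to_c(items: &[&str]) -> *mut *mut c_char {
    let mut ptrs: Vec<*mut c_char> = items
        .iter()
        .map(|s| CString::new(*s).expect("no interior NUL").into_raw())
        .collect();
    ptrs.push(ptr::null_mut());
    // A boxed slice has no spare capacity, so pointer + length is enough
    // to rebuild and free it later.
    Box::into_raw(ptrs.into_boxed_slice()) as *mut *mut c_char
}

/// Hypothetical counterpart that C calls instead of free().
/// Must only be given a list created by strings_to_c(), or NULL.
pub extern "C" fn r_strv_free(list: *mut *mut c_char) {
    if list.is_null() {
        return;
    }
    unsafe {
        let mut len = 0;
        while !(*list.add(len)).is_null() {
            // Take ownership back; dropping the CString frees the string
            drop(CString::from_raw(*list.add(len)));
            len += 1;
        }
        // len + 1 accounts for the NULL terminator in the pointer array
        let slice = ptr::slice_from_raw_parts_mut(list, len + 1);
        drop(Box::from_raw(slice));
    }
}
```

On the C side the loop over the matches would stay the same, with the final free(matches) replaced by r_strv_free(matches).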

Conclusion

Rust offers a reasonably powerful interface to/from C, and its standard library has all the required tools: C data types, pointers, and even the “scope cleanup escape hatch” std::mem::forget(). It does take some time to figure out all this plumbing, but after that, adding new functions is rather straightforward. This works best for complicated C code which is hard to port, but has a reasonably simple API.