This short ~30-line Rust program [1] does not contain any `unsafe` keywords. It slices an array of length 3 and tries to access the 16777216th element of the slice. Obviously, the resulting slice has length 2**64 - 1, so the program tries to write to an invalid memory location, segfaults and dies.
Oh wait.
This is not supposed to happen. In fact, the whole selling point of Rust is that it can't happen, provided that your code doesn't use `unsafe`, the unsafe code from the standard library is free from UB (see [2] for an attempt to formalize this), and there are no bugs in the Rust compiler. Any illegal memory access you make should terminate your program quickly and ruthlessly before you actually read or write data, unless you want your system to be completely compromised. If this sounds overly dramatic to you, read this OpenBSD [3] advisory. The OpenBSD developers take security extremely seriously (I mean this in a literal way, not in the "oops, your data has just been leaked" way), and yet they just found a remote code execution hole in their software, all due to a single out-of-bounds read.
It turns out that in this case both the Rust compiler and the Rust standard library have no bugs. The culprit is LLVM -- the code generation framework Rust uses internally. Fortunately, this particular issue is incredibly hard to trigger accidentally (see [4] for the whole investigation with all the gory details), but still, I had a couple of teachable moments when I was trying to debug this.
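For contrast, here is a minimal sketch of what safe Rust is normally expected to do with an out-of-bounds index. It is not the program from [1]; the helper names checked_read and index_panics are made up for illustration. The point is only that `slice[idx]` goes through a bounds check and panics, while `slice.get(idx)` returns None, and neither is supposed to touch memory outside the slice.

```rust
use std::panic;

/// Hypothetical helper: returns the element at `idx`, or `None` when
/// `idx` is out of bounds. No memory outside the slice is read.
fn checked_read(data: &[u8], idx: usize) -> Option<u8> {
    data.get(idx).copied()
}

/// Hypothetical helper: indexes with `[]` and reports whether the
/// bounds check fired (i.e. whether the access panicked).
/// Assumes the default panic strategy (unwind), not `panic = "abort"`.
fn index_panics(data: &[u8], idx: usize) -> bool {
    panic::catch_unwind(|| data[idx]).is_err()
}

fn main() {
    let array = [10u8, 20, 30];
    let slice = &array[..2]; // a slice of length 2
    let idx = 1usize << 24; // 16777216, far past the end

    println!("len = {}", slice.len());
    println!("get = {:?}", checked_read(slice, idx));
    println!("panicked = {}", index_panics(slice, idx));
}
```

In a correctly compiled program the out-of-bounds index is always caught by one of these two paths. The bug discussed here is scary precisely because the optimizer made such a check disappear.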
First, contrary to what I believed, Rust-related LLVM bugs happen all the time. In retrospect, this should be pretty obvious, since LLVM is at least twice as large as the Rust compiler and the standard library combined. An ongoing effort to upgrade Rust to LLVM 10 [5] lists lots of examples; as far as I can see, none of those are memory safety issues, but there is a lot of scary stuff there -- infinite loops in LLVM passes, compile-time regressions, performance regressions, etc.
And second, the complexity is there for a reason. The stuff LLVM does to your code is mind-boggling; already after a dozen passes I can barely recognize the initial program I wrote, because all abstractions are stripped away and the control flow is aggressively restructured. I can't say that I understand even 10% of what's going on in LLVM passes, but the part I really liked (incidentally, the one that caused the LLVM bug in question) is the ScalarEvolution framework [6], [7], which allows you to infer all sorts of info about the loops in your program.
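To give a rough feel for the kind of loop fact ScalarEvolution is about, here is a small sketch in plain Rust. It does not call LLVM and is not how the analysis is implemented. It just compares a counting loop with the closed-form trip count that this style of analysis aims to derive. In ScalarEvolution's notation the induction variable below would be written as the add recurrence {start,+,step}. The function names are invented for this example, and whether LLVM actually derives or uses this particular formula depends on the version and the passes that run.

```rust
/// Runs `i = start; while i < end { i += step }` and counts iterations.
/// `i` is widened to u64 so the addition cannot overflow.
fn trip_count_loop(start: u32, end: u32, step: u32) -> u32 {
    assert!(step > 0, "step must be positive");
    let mut i = u64::from(start);
    let mut count = 0u32;
    while i < u64::from(end) {
        i += u64::from(step);
        count += 1;
    }
    count
}

/// Closed form for the same loop: ceil((end - start) / step),
/// or 0 when the loop body never runs.
fn trip_count_closed(start: u32, end: u32, step: u32) -> u32 {
    assert!(step > 0, "step must be positive");
    if start >= end {
        return 0;
    }
    let span = u64::from(end - start);
    let step = u64::from(step);
    ((span + step - 1) / step) as u32
}

fn main() {
    // The loop and the closed form should agree on every input.
    for &(start, end, step) in &[(0u32, 10u32, 3u32), (0, 9, 3), (5, 5, 1), (7, 2, 4)] {
        let looped = trip_count_loop(start, end, step);
        let closed = trip_count_closed(start, end, step);
        println!("start={} end={} step={}: loop={} closed={}", start, end, step, looped, closed);
        assert_eq!(looped, closed);
    }
}
```

Once the optimizer knows a trip count like this, it can reason about the range of values an index takes inside the loop, which is what lets it drop bounds checks it believes are redundant.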
[1]: https://play.rust-lang.org/?version=stable&mode=release&edition=2018&gist=0e6866dbb74fd88796c73b218bffcbf6
[2]: https://plv.mpi-sws.org/rustbelt/stacked-borrows/paper.pdf
[3]: https://www.openwall.com/lists/oss-security/2020/02/24/5
[4]: https://github.com/rust-lang/rust/issues/69225
[5]: https://github.com/rust-lang/rust/pull/67759
[6]: https://www.youtube.com/watch?v=AmjliNp0_00
[7]: https://github.com/llvm/llvm-project/blob/master/llvm/lib/Analysis/ScalarEvolution.cpp