Undefined Behavior: Rust comes to the rescue December 26, 2022 | 13 min Read

Have you ever encountered a strange error or bug in your code that you just couldn’t seem to fix? It’s frustrating, isn’t it? It’s even more frustrating when the root cause of the issue is something that you didn’t even know was a potential problem in the first place. This is where the concept of undefined behavior comes into play.

In computer programming, undefined behavior refers to the outcome of an operation or expression that is not specified by the language’s specification. In other words, the language doesn’t specify what should happen when certain conditions are met, leaving it up to the compiler or runtime to decide how to handle it. This can lead to all sorts of weird and unexpected issues in your code.

Undefined behavior can have a huge negative impact on a business, as it can cause unexpected errors and bugs that can be difficult to track down and fix. These issues can lead to lost time, lost money, and even lost customers if they result in a faulty product. It’s important for programmers to be aware of and avoid undefined behavior in order to produce stable, reliable code.

Enter Rust. Rust is a programming language that was designed with the goal of eliminating undefined behavior from the get-go. In safe Rust, compile-time checks and runtime checks work together to catch and prevent undefined behavior, so that your code runs reliably and stably. In this blog post, I’ll take a closer look at undefined behavior in three popular programming languages (C, C++, and Java) and how Rust can help prevent it.

Undefined behavior in C

C is a popular programming language that has been around for decades. It’s known for being fast and efficient, but it also has a reputation for being a bit unforgiving when it comes to undefined behavior. Let’s take a look at a couple of examples of undefined behavior in C and how Rust can help prevent them.

Signed integer overflow

One common type of undefined behavior in C is signed integer overflow. This occurs when the result of an arithmetic operation falls outside the range of values that can be stored in a signed integer data type. In C, the behavior of signed integer overflow is undefined, meaning the compiler is free to do whatever it wants with the result. This can lead to all sorts of strange and unexpected issues in your code.

Here’s an example in C:

#include <stdio.h>

int main() {
  int x = 2147483647;  // maximum value for a signed int
  x = x + 1;           // undefined behavior
  printf("%d", x);
  return 0;
}

In this example, we start with the maximum value for a signed int and try to add 1 to it. This results in signed integer overflow, and because the behavior of the operation is undefined, it’s anyone’s guess what will happen. The compiler might just wrap the value around to the minimum value for a signed int, or it might optimize the code under the assumption that overflow never happens and produce something completely different. This can be very confusing and frustrating for programmers trying to debug their code.

Now let’s see how Rust handles this situation:

fn main() {
  let x: i32 = 2147483647;       // maximum value for a signed 32-bit int
  let y = x.overflowing_add(1);  // tuple of (wrapped result, overflow)
  println!("{:?}", y);           // (-2147483648, true)
}

In Rust, we have a method called overflowing_add() that we can use to add two integers and check if the result overflowed. This method returns a tuple containing the result of the operation and a boolean value indicating whether the operation resulted in overflow. This way, we can handle the overflow in a controlled and predictable manner rather than relying on undefined behavior.

In debug builds, Rust checks for integer overflow and panics when it occurs. In release builds, these overflow checks are removed by default and arithmetic wraps around. This behavior can be overridden per profile with the overflow-checks setting in Cargo.toml. Read more about it in the Rust Book.
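Besides overflowing_add(), the integer types in the standard library offer further methods that make the intended overflow handling explicit, independent of the build mode. The following is a minimal sketch; the helper checked_increment is a made-up name for illustration only.

```rust
/// Hypothetical helper: adds 1 to `value` and reports an error on overflow.
fn checked_increment(value: i32) -> Result<i32, String> {
    // checked_add returns None instead of overflowing.
    value
        .checked_add(1)
        .ok_or_else(|| format!("{} + 1 does not fit into an i32", value))
}

fn main() {
    let max = i32::MAX;

    // These methods behave the same in debug and release builds.
    println!("{:?}", max.checked_add(1));   // None
    println!("{}", max.wrapping_add(1));    // -2147483648
    println!("{}", max.saturating_add(1));  // 2147483647

    match checked_increment(max) {
        Ok(v) => println!("result: {}", v),
        Err(e) => println!("overflow: {}", e),
    }
}
```

Which variant fits depends on the use case: checked_add() when overflow is an error, wrapping_add() when wrap-around is intended, and saturating_add() when the value should stop at the limit.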

Out of bound access

Another common type of undefined behavior in C is out of bound access, which occurs when we try to access an element of an array that is outside the bounds of the array. In C, this behavior is also undefined, meaning the compiler is free to do whatever it wants with the result. This can lead to all sorts of strange and unexpected issues in your code.

Here’s an example in C:

#include <stdio.h>

int main() {
  int arr[5] = {1, 2, 3, 4, 5};
  int x = arr[5];   // undefined behavior
  printf("%d", x);
  return 0;
}

In this example, we try to access the element at index 5 of an array that only has 5 elements, so the valid indices are 0 to 4. This is an out of bound access, and the behavior of the operation is undefined.

Now let’s see how Rust handles this situation:

fn main() {
  let arr = [1, 2, 3, 4, 5];
  let x = match arr.get(5) {  // returns None if out of bounds
    Some(x) => x,
    None => {
      println!("Out of bounds!");   // out of bounds error printed
      return;
    }
  };
  println!("{}", x);
}

In Rust, we have a method called get() that we can use to safely access elements of an array. This method returns an Option type that contains either the requested element or None if the index is out of bounds. This way, we can handle out of bound access in a controlled and predictable manner rather than relying on undefined behavior.
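Plain indexing is bounds-checked as well: an out of bounds index leads to a panic, which is defined behavior, instead of reading arbitrary memory. The sketch below illustrates this. The helpers read_at and read_or are hypothetical names, and catch_unwind is used only to make the panic observable, assuming the default unwinding panic strategy.

```rust
use std::panic;

/// Hypothetical helper: plain indexing, which is bounds-checked at runtime.
fn read_at(values: &[i32], index: usize) -> i32 {
    values[index] // panics if index >= values.len()
}

/// Hypothetical helper: falls back to a default value instead of panicking.
fn read_or(values: &[i32], index: usize, default: i32) -> i32 {
    values.get(index).copied().unwrap_or(default)
}

fn main() {
    let arr = [1, 2, 3, 4, 5];

    println!("{}", read_at(&arr, 4));     // 5
    println!("{}", read_or(&arr, 5, -1)); // -1

    // The failed access is a panic (reported on stderr), not a wild read.
    let result = panic::catch_unwind(|| read_at(&arr, 5));
    println!("{}", result.is_err());      // true
}
```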

As we can see, Rust provides a number of ways to avoid the kinds of undefined behavior that occur in C, making it a safer and more reliable choice for programming.

Undefined behavior in C++

C++ is another popular programming language that is widely used in the industry. Like C, it’s known for being fast and efficient, but it also has a reputation for being prone to undefined behavior. Let’s take a look at an example of undefined behavior in C++ and how Rust can help prevent it.

The use of an object after it has been destroyed

A common type of undefined behavior in C++ is the use of an object after it has been destroyed. This can happen when an object is deleted or goes out of scope, but it is still being used or accessed in some way. In C++, the behavior of this operation is undefined, meaning the compiler is free to do whatever it wants with the result. This can lead to all sorts of strange and unexpected issues in your code.

Here’s an example in C++:

#include <iostream>

class MyClass {
 public:
  MyClass() {}
  ~MyClass() {}
  void some_method() { std::cout << "something" << std::endl; }
};

int main() {
  MyClass* obj = new MyClass();
  delete obj;
  obj->some_method();  // undefined behavior
  return 0;
}

In this example, we create a new instance of MyClass using the new keyword and then delete it using the delete keyword. However, we then try to call a method on the object after it has been destroyed, which will result in undefined behavior.

Now let’s see how Rust handles this situation:

struct MyStruct {
  data: i32,
}

fn main() {
  let obj = MyStruct { data: 42 };
  let p = &obj;
  drop(obj); // compile-time error: cannot move out of `obj` because it is borrowed
  println!("{}", p.data);
}

In Rust, we don’t have to worry about the use of an object after it has been destroyed because the Rust compiler will catch this issue at compile time and prevent it from happening. In this example, we create a new instance of MyStruct and then take a reference to it. We then try to drop the object, which would destroy it and make the reference invalid. The Rust compiler rejects this with a compile-time error. More precisely, it tells us that we can’t move obj into drop() to begin with, because obj is still borrowed by the reference p, which is used later on. This ensures that safe code can never access an object after it has been destroyed.
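For comparison, here is a minimal sketch of a variant that the compiler accepts, because the reference is no longer used once the object is dropped. The function name read_then_drop is made up for this illustration.

```rust
struct MyStruct {
    data: i32,
}

/// Hypothetical helper: reads through a reference, then drops the object.
fn read_then_drop() -> i32 {
    let obj = MyStruct { data: 42 };
    let p = &obj;
    let value = p.data; // last use of the reference `p`
    drop(obj);          // fine: no borrow of `obj` is used after this point
    value
}

fn main() {
    println!("{}", read_then_drop()); // 42
}
```

The i32 is copied out before the drop, so nothing refers to the destroyed object afterwards.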

As we can see, Rust provides a number of ways to prevent undefined behavior that occurs in C++.

Undefined behavior in Java

Java is a popular programming language known for its simplicity, portability, and object-oriented nature. It is used in a wide range of applications, from web development to Android app development.

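One aspect of Java that can be challenging for programmers is its handling of concurrent access to shared data by multiple threads. In Java, unsynchronized read and write access to shared data is a data race. The Java Memory Model limits what a data race may do, so it is not as unbounded as undefined behavior in C or C++, but the outcome is still unpredictable, and for the purposes of this post it is treated as undefined behavior. Neither the Java compiler nor the runtime guarantees the correctness of a program when multiple threads access shared data without synchronization.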

This can be a problem for programmers, as it can be difficult to determine if a class in Java is thread-safe or not. Even if a class seems thread-safe, it may still exhibit unexpected behavior when used in a multithreaded environment. This can lead to difficult-to-debug issues and potential security vulnerabilities.

Java Example

To illustrate the issue of undefined behavior in Java when using multiple threads, consider the following example:

public class Counter {
    private int count = 0;

    public void increment() {
        count++;
    }

    public int getCount() {
        return count;
    }
}

public class Main {
    public static void main(String[] args) {
        Counter counter = new Counter();
        Runnable r = () -> {
            for (int i = 0; i < 1000; i++) {
                counter.increment();
            }
        };

        Thread t1 = new Thread(r);
        Thread t2 = new Thread(r);

        t1.start();
        t2.start();

        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            e.printStackTrace();
        }

        System.out.println(counter.getCount());  // expected 2000, but may print less
    }
}

In this example, we have a class Counter that has a simple increment method and a getter method for the count variable. In the main method, we create two threads that each run a loop that increments the count 1000 times. We then start the threads and wait for them to finish before printing the final count.

Based on the code, we would expect the final count to be 2000, since each thread increments the count 1000 times. However, because the increment operation is not synchronized, it is possible for the threads to interleave their increments in such a way that the final count is less than 2000: count++ is a read-modify-write operation, so two threads can read the same old value and one of the increments is lost. This is an example of undefined behavior in Java.

Despite this, the Java compiler will not reject the code as incorrect. It is up to the programmer to ensure that the code is correct and thread-safe.

Rust compile time super powers

In contrast to Java, Rust has a strong focus on compile-time verified thread safety. To help ensure this, Rust has, besides the borrow checker, two traits called Send and Sync.

The Send trait indicates that ownership of a value can safely be transferred to another thread. If a type implements Send, a value of that type can, for example, be moved into the closure passed to thread::spawn().

The Sync trait indicates that a type is safe to be shared across threads: a type T is Sync if a shared reference &T can be sent to another thread. A value of such a type can be placed in an Arc (atomically reference-counted pointer) and used from multiple threads concurrently. Mutex<T> is Sync for any T that is Send, RwLock<T> (read-write lock) is Sync if T is both Send and Sync, and the Atomic* types are also Sync.

By using these traits, Rust can guarantee at compile time that certain types are thread-safe. This helps programmers avoid the issues of undefined behavior that can occur in Java when using multiple threads.
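A small sketch of how these bounds can be checked by hand: the two helper functions below are hypothetical and do nothing at runtime, but a call only compiles if the given type satisfies the trait bound.

```rust
use std::sync::atomic::AtomicU32;
use std::sync::{Arc, Mutex};

// Hypothetical helpers: a call compiles only if the bound holds.
fn assert_send<T: Send>() {}
fn assert_sync<T: Sync>() {}

fn main() {
    assert_send::<u32>();
    assert_sync::<AtomicU32>();
    assert_sync::<Mutex<u32>>();
    assert_send::<Arc<Mutex<u32>>>();

    // Uncommenting either line below is a compile-time error:
    // assert_sync::<std::cell::Cell<u32>>(); // Cell<u32> is not Sync
    // assert_send::<std::rc::Rc<u32>>();     // Rc<u32> is not Send

    println!("all checks passed at compile time");
}
```

The standard library applies the same kind of bound in thread::spawn(), which requires the closure and its return value to be Send.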

To demonstrate how Rust’s Send and Sync traits can be used to address this issue of undefined behavior we saw in the Java example, let’s consider the following Rust code:

use std::cell::Cell;
use std::sync::Arc;
use std::thread;

struct Counter {
    count: Cell<u32>,
}

impl Counter {
    fn new() -> Counter {
        Counter { count: Cell::new(0) }
    }

    fn increment(&self) {
        let old = self.count.get();
        self.count.set(old + 1);
    }

    fn get_count(&self) -> u32 {
        self.count.get()
    }
}

fn main() {
    let counter = Arc::new(Counter::new());
    let mut handles = vec![];

    for _ in 0..1000 {
        let counter = Arc::clone(&counter);
        // compile-time error: `Cell<u32>` cannot be shared between threads safely
        let handle = thread::spawn(move || {
            let counter = counter;
            counter.increment();
        });
        handles.push(handle);
    }

    for handle in handles {
        handle.join().unwrap();
    }

    println!("{}", counter.get_count());  // expected output: 1000
}

In this Rust code, we have a Counter struct with an increment method and a getter method for the count variable. In the main function, we create a vector of 1000 threads, each of which increments the count once.

However, unlike the Java example, the Rust code will not compile. The Counter struct does not implement the Sync trait, because it stores the counter value in a Cell, which is not thread-safe. Since Counter is not Sync, Arc<Counter> is not Send, so it cannot be moved into the closure passed to thread::spawn().

Fixing the problem

So how can we make the Counter Sync, so that multiple threads can increment the count concurrently?

To address this issue, we can use Rust’s atomic types to ensure that the increment operation is atomic. An atomic type is a type that provides atomic access to its data, meaning that it is guaranteed to be updated in a single operation and can be safely accessed by multiple threads concurrently.

Here is an example of how we can modify the Counter struct to use an atomic type:

use std::thread;
use std::sync::Arc;
use std::sync::atomic::{AtomicU32, Ordering};

struct Counter {
    count: AtomicU32,
}

impl Counter {
    fn new() -> Counter {
        Counter { count: AtomicU32::new(0) }
    }

    fn increment(&self) {
        self.count.fetch_add(1, Ordering::SeqCst);
    }

    fn get_count(&self) -> u32 {
        self.count.load(Ordering::SeqCst)
    }
}

fn main() {
    let counter = Arc::new(Counter::new());
    let mut handles = vec![];

    for _ in 0..1000 {
        let counter = Arc::clone(&counter);
        let handle = thread::spawn(move || {
            let counter = counter;
            counter.increment();
        });
        handles.push(handle);
    }

    for handle in handles {
        handle.join().unwrap();
    }

    println!("{}", counter.get_count());  // expected output: 1000
}

In this example, we use the AtomicU32 type to store the count. The fetch_add() method is used to atomically increment the count, and the load method is used to atomically read the count. The Ordering parameter specifies the memory ordering of the atomic operations.

Using atomic types in this way makes the Counter struct thread-safe, and the compiler verifies this at compile time, which avoids the issue that we saw in the Java example.
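Atomics are not the only option. As a sketch of an alternative, the counter could also be protected by a Mutex from std::sync. The function count_with_mutex and its parameters are assumptions made for this illustration, mirroring the two threads with 1000 increments each from the Java example.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical helper: spawns `threads` threads that each increment a
/// shared counter `increments` times, and returns the final count.
fn count_with_mutex(threads: u32, increments: u32) -> u32 {
    let counter = Arc::new(Mutex::new(0u32));
    let mut handles = Vec::new();

    for _ in 0..threads {
        let counter = Arc::clone(&counter);
        handles.push(thread::spawn(move || {
            for _ in 0..increments {
                // lock() blocks until this thread has exclusive access.
                *counter.lock().unwrap() += 1;
            }
        }));
    }

    for handle in handles {
        handle.join().unwrap();
    }

    let total = *counter.lock().unwrap();
    total
}

fn main() {
    println!("{}", count_with_mutex(2, 1000)); // 2000
}
```

A Mutex is usually the better fit when several fields have to change together, while an atomic is enough for a single counter.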

In summary, Rust’s Send and Sync traits, together with the types in std::sync, provide a robust and safe way to handle concurrent access to shared data in multithreaded programs. They let the compiler catch, at compile time, the kind of undefined behavior that can occur in languages like Java when using multiple threads.

Preventing undefined behavior with Rust

In this blog post, we’ve seen how undefined behavior can be a major issue in programming languages like C, C++, and Java. Undefined behavior can lead to all sorts of strange and unexpected issues in your code, making it difficult to debug and maintain.

However, we’ve also seen how Rust prevents these kinds of undefined behavior. Rust provides a number of features that help ensure that your code is safe and reliable, including:

  • Automatic bounds checking for arrays and vectors
  • Controlled and predictable handling of out of bounds access
  • Safe handling of integer overflow
  • Automatic compile time detection of use after free errors
  • Compile time checked synchronized access to shared data between threads

By using these and other features, Rust can help you write code that is safe, reliable, and free from undefined behavior. This makes Rust a great choice for programming, especially when working on mission-critical or business-critical projects.

So if you’re tired of dealing with undefined behavior and want to write code that is safe, reliable, and easy to maintain, give Rust a try. You might just find that it’s the perfect programming language for your needs.

Nils Hasenbanck

Nils Hasenbanck is the founder of Tsukisoft GmbH and a senior developer. His passion is building technically elegant, easy to maintain …