Now it is time to talk about references and borrowing. To understand this topic, first check out this post where I talk about ownership and move semantics. As we saw in that article, the way Rust manages memory allocations is rather unique. The same is true when it comes to referencing a location in memory, something that is achieved in C with pointers.
GDB
In this post I am going to explore what is happening in memory using the GNU Debugger (gdb) with the special command rust-gdb:
$ rust-gdb ./target/debug/references_and_borrowing
What is a reference in Rust?
A reference is a value that points to data in memory. Although it is similar to a classic pointer, there is a crucial difference between the two: a reference is guaranteed to always point to a memory address that contains a valid piece of data, whereas a pointer is not¹. The checks that guarantee a reference is always valid are performed at compile time.
Consider the following code:
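A minimal sketch of such a program, consistent with the statements visible in the GDB session below (the exact line numbers and blank lines are assumptions):

```rust
fn main() {
    let s1 = String::from("hello world!");

    // `s2` is a reference to `s1`: it borrows the value instead of moving it.
    let s2 = &s1;

    println!("{}", s2);
}
```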
This code compiles and runs correctly. What happens in memory? Let’s check it out with GDB!
Breakpoint 1, references_and_borrowing::main () at src/main.rs:3
3 let s1 = String::from("hello world!");
(gdb) n
5 let s2 = &s1;
(gdb) n
8 println!("{}", s2);
At this point, the String s1 is initialized with the text "hello world!" and s2, a reference to s1, is set. Let’s check the stack:
Lines 4 to 6 are the representation of s1 on the stack: 0x7fffffffd938 is ptr, 0x7fffffffd940 is len and 0x7fffffffd948 is capacity. The reference to s1 is located at 0x7fffffffd950. Let’s print the value stored at that address in hexadecimal:
(gdb) x/xg 0x7fffffffd950
0x7fffffffd950: 0x00007fffffffd938
As we can see, the value stored at address 0x7fffffffd950 is 0x00007fffffffd938², the beginning of s1’s stack representation!
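The same relationship can be observed without a debugger. The sketch below prints the sizes and addresses involved; it assumes a 64-bit target and the current standard library layout of `String` (three machine words), neither of which is guaranteed by the language. The helper `points_to` is a made-up name, and the addresses printed will differ from the ones in the GDB session.

```rust
use std::mem::size_of;

/// Returns true when `reference` holds the address of `owner`.
fn points_to(reference: &String, owner: &String) -> bool {
    std::ptr::eq(reference, owner)
}

fn main() {
    let s1 = String::from("hello world!");
    let s2 = &s1;

    // ptr + len + capacity: three machine words in current implementations.
    println!("size of String:  {} bytes", size_of::<String>());
    // A reference to a sized type is a single machine word: an address.
    println!("size of &String: {} bytes", size_of::<&String>());

    // `{:p}` prints the address a reference points to.
    println!("s1 is stored at  {:p}", &s1);
    println!("s2 points to     {:p}", s2);
    // The text itself lives on the heap, at the address held in `ptr`.
    println!("heap buffer at   {:p}", s1.as_ptr());

    assert!(points_to(s2, &s1));
}
```

The two middle addresses should be identical, while the heap buffer address belongs to a different memory region.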
¹ An invalid memory region refers to a region that was not assigned to our process or memory that was valid at some point of the program execution but then was freed.
² Leading zeroes are trimmed for legibility when GDB prints a value as a memory address.
The two rules of references
Like everything in Rust, references have their own set of rules.
- References are always valid.
- At any given time we either have any number of immutable references or one mutable reference.
References are always valid
There’s no way of testing this rule at runtime (or at least I don’t know of one). As I stated earlier in this post, references are guaranteed to always be valid, and this validation is done at compile time.
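What can be shown is the compiler refusing a reference that would outlive its data. The sketch below compiles as written; the variable names are made up, and the comment describes the one-line change that makes the compiler reject it.

```rust
fn main() {
    let outer = String::from("lives until the end of main");
    let r: &String;
    {
        let inner = String::from("dropped at the end of this block");
        println!("{}", inner);

        // Writing `r = &inner;` here instead is rejected at compile time with
        // error[E0597]: `inner` does not live long enough.
        r = &outer;
    }
    // `r` is used after the block, so whatever it points to must still exist.
    println!("{}", r);
}
```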
At any given time we either have any number of immutable references or one mutable reference
At first glance, this rule feels like an unnecessary limitation but thanks to it we are able to catch hidden bugs in our code because data races are avoided at compile time.
A classic example is the one where we have n mutable references to the same piece of numeric data that represents a counter, all in different threads. The only thing the threads do is increment the counter. References by themselves do not have a synchronization mechanism, so increments can be lost. This is the concurrent counter problem; here’s the whole explanation and an example in Java. This can’t happen in Rust (the code won’t compile), since we need some kind of synchronization mechanism to mutate the same piece of data in different threads.
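A sketch of how such a counter can be written in Rust, assuming `Arc<Mutex<_>>` as the synchronization mechanism (one choice among several; an atomic integer would also work). The function name `count_concurrently` is made up for illustration.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Spawns `threads` threads that each add 1 to a shared counter `increments` times.
fn count_concurrently(threads: usize, increments: usize) -> usize {
    // `Arc` shares ownership across threads; `Mutex` serializes the mutation.
    let counter = Arc::new(Mutex::new(0usize));

    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..increments {
                    // The lock hands out the only mutable access at this moment.
                    *counter.lock().unwrap() += 1;
                }
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }

    let total = *counter.lock().unwrap();
    total
}

fn main() {
    println!("{}", count_concurrently(8, 1_000)); // 8000
}
```

Sharing a plain `&mut usize` between the spawned threads instead does not compile, which is the rule doing its job.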
This is not the only problem this rule keeps us away from! In fact, we don’t even need concurrency: the rule can prevent bugs in simpler situations. Consider the following code in Python:
What we are trying to do here is to insert a 0 at the index of a value, if the value is an even number. The expected result for the input [1, 2, 3, 4, 5, 6] is [1, 0, 2, 3, 0, 4, 5, 0, 6], but if we run it, we get [1, 0, 0, 0, 0, 0, 0, 2, 3, 4, 5, 6]. What is happening? The source of the problem is that we are mutating the vector while iterating over it:
- We start at index 0, where the value 1 is located. Since 1 is not even, we continue to index 1.
- At index 1 we find the value 2. It is even, so we insert a 0 at index 1. Now the array is: [1, 0, 2, 3, 4, 5, 6]. We continue to index 2.
- At index 2 we find the value 2 again, because it was moved from its original position in the previous iteration. It is even, so we insert a 0 at index 2. Now the array is: [1, 0, 0, 2, 3, 4, 5, 6].
- This process is repeated 4 more times, since the array length is 6 and the number of iterations to execute is calculated at the beginning of the for statement.
This is known as the iterator invalidation problem.
What happens in Rust?
We get a compilation error that enforces the rule!
error[E0502]: cannot borrow `*vec` as mutable because it is also borrowed as immutable
 --> src/main.rs:4:13
  |
2 |     for (i, n) in vec.iter().enumerate() {
  |                   ----------------------
  |                   |
  |                   immutable borrow occurs here
  |                   immutable borrow later used here
3 |         if n % 2 == 0 {
4 |             vec.insert(i, 0);
  |             ^^^^^^^^^^^^^^^^ mutable borrow occurs here

For more information about this error, try `rustc --explain E0502`.
Where are the mutable and immutable references? In the vector’s function signatures:
.iter() takes an immutable reference to the vector (&self):
pub fn iter(&self) -> Iter<'_, T>
.insert() takes a mutable reference to the vector (&mut self):
pub fn insert(&mut self, index: usize, element: T)
Does this mean that there’s no way of modifying a vector in Rust while iterating it? No! You can do it:
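A minimal sketch of such a loop follows. It is an illustration under assumptions, not necessarily the exact listing discussed in this section: the function name `insert_zeros` and the manual index are made up, chosen so that the result matches the expected output from the Python example.

```rust
fn insert_zeros(vec: &mut Vec<i32>) {
    let mut i = 0;
    // `vec.len()` borrows the vector immutably, but only for this call.
    while i < vec.len() {
        if vec[i] % 2 == 0 {
            // `insert` borrows the vector mutably; no other borrow is alive.
            vec.insert(i, 0);
            // Step over the even value that was just shifted one slot right.
            i += 1;
        }
        i += 1;
    }
}

fn main() {
    let mut numbers = vec![1, 2, 3, 4, 5, 6];
    insert_zeros(&mut numbers);
    println!("{:?}", numbers); // [1, 0, 2, 3, 0, 4, 5, 0, 6]
}
```

Note that the compiler accepts this loop whether or not the extra `i += 1` is present: the borrow rules prevent the aliasing, but the index bookkeeping is still the programmer's responsibility.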
We also have two references, one immutable (the len function) and one mutable (the insert function). Why does it work? Because the scope of the immutable reference taken by len ends right after the function is used (the scope of a reference begins at its creation and extends until the last time the reference is used).
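The "until the last use" part of that rule can be seen in isolation with a small sketch. The function `first_then_push` is a hypothetical helper written only for this illustration.

```rust
fn first_then_push(numbers: &mut Vec<i32>) -> i32 {
    // Immutable borrow of the vector starts here.
    let first = &numbers[0];
    // Last use of `first`: the immutable borrow ends with this line.
    let copy = *first;

    // A mutable borrow is fine now, because `first` is never used again.
    numbers.push(copy * 10);

    // Reading `first` after the `push` would extend the immutable borrow
    // past the mutation and bring back error E0502.
    copy
}

fn main() {
    let mut numbers = vec![1, 2, 3];
    let first = first_then_push(&mut numbers);
    println!("{} {:?}", first, numbers); // 1 [1, 2, 3, 10]
}
```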
Notice that the error message we got with the for loop says “immutable borrow occurs here” and “immutable borrow later used here”. Both labels point to the same place, the iter() call, whose immutable reference stays in use for as long as the loop runs.
Does it make sense for a programming language to have these kinds of rules if it is possible to write code to circumvent them? Yes! The way the last code is written is rather “unnatural”. Most of the time Rust will catch bugs at compile time thanks to these rules.
Borrowing
There are times when you don’t want a specific scope to lose ownership of a value. There could be several reasons for that, for example, you need to reuse the value. Consider the following code:
This code won’t compile:
error[E0382]: use of moved value: `s`
  --> src/main.rs:12:9
   |
10 |     let s = String::from("fellow blog reader");
   |         - move occurs because `s` has type `String`, which does not implement the `Copy` trait
11 |     hello(s);
   |           - value moved here
12 |     bye(s);
   |         ^ value used here after move
   |
note: consider changing this parameter type in function `hello` to borrow instead if owning the value isn't necessary
  --> src/main.rs:1:13
   |
1  | fn hello(s: String) {
   |    -----    ^^^^^^ this parameter takes ownership of the value
   |    |
   |    in this function
help: consider cloning the value if the performance cost is acceptable
   |
11 |     hello(s.clone());
   |            ++++++++
Here we have a situation similar to the one we saw in the post about ownership and move semantics. As the compiler error says, we are moving s into hello, so when we try to use it in bye we get the “use after move” error. How can we solve this?
Solution 1: Duplicating the value
We can do as the compiler says, and clone the value. This way, both functions get a separate copy of the value that they can own:
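A sketch of this approach, using the signatures shown in the compiler output. The function bodies and the printed messages are assumptions made for illustration.

```rust
// Both functions take ownership of their argument.
fn hello(s: String) {
    println!("Hello, {}!", s);
}

fn bye(s: String) {
    println!("Bye, {}!", s);
}

fn main() {
    let s = String::from("fellow blog reader");
    // `clone` allocates a second heap buffer and copies the bytes into it.
    hello(s.clone());
    // The original `s` was never moved, so it can still be passed to `bye`.
    bye(s);
}
```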
This works! The code compiles and executes without a warning. Is this a good solution? No.
We don’t really need to duplicate s, since we are only reading it to print it out. This solution does a lot of extra work by duplicating s’s value in memory.
Solution 2: Returning the ownership back to the caller
Instead of duplicating s’s value, we can return the ownership to the caller, so it can use it again:
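A sketch of this approach. As before, the function bodies and messages are assumptions; only the idea of returning the `String` comes from the text.

```rust
// Each function takes ownership and hands it back through the return value.
fn hello(s: String) -> String {
    println!("Hello, {}!", s);
    s
}

fn bye(s: String) -> String {
    println!("Bye, {}!", s);
    s
}

fn main() {
    let s = String::from("fellow blog reader");
    // Rebinding `s` with the returned value keeps it usable after each call.
    let s = hello(s);
    let s = bye(s);
    println!("main still owns: {}", s);
}
```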
This also works! The code compiles and executes without a warning. Is this a good solution? Also no.
Passing ownership back and forth between functions is neither comfortable nor idiomatic. On top of that, the function signatures are not semantically accurate. The signature of hello suggests that we pass in a String value and get another String value back. By just looking at it, it is hard to understand what the function intends to do, and it does not make sense to return anything if the only objective of the function is to print something.
Solution 3: Borrowing
We need to keep the ownership of s in the scope of the main function, we don’t want to duplicate values and we don’t want to move the values back and forth either. What can we do? Use a reference!
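A sketch of the borrowing version. The bodies and messages are again assumptions. The parameters are typed `&String` so that each argument is a single pointer to the `String` on main's stack, matching the memory inspection later in this section; in everyday code `&str` is often preferred for read-only string parameters.

```rust
// The functions only read the value, so a shared reference is enough.
fn hello(s: &String) {
    println!("Hello, {}!", s);
}

fn bye(s: &String) {
    println!("Bye, {}!", s);
}

fn main() {
    let s = String::from("fellow blog reader");
    // `&s` lends the value; ownership never leaves `main`.
    hello(&s);
    bye(&s);
    // `s` is dropped here, when `main` ends.
}
```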
The code compiles and executes without a warning. Is this a good solution? Yes.
Given that we only need to read the value and we don’t want to move it or duplicate it, using a reference is the best solution. It is also more idiomatic and semantically correct. By looking at the function signatures we know that the functions do not need to own any value and that they will not return any result from the operation they are performing.
Let’s now check what is happening in the memory with GDB:
Looks like our String representation in the stack starts at 0x7fffffffd980. Let’s confirm it.
(gdb) x/xg 0x7fffffffd980
0x7fffffffd980: 0x00005555555afba0
(gdb) x/18c 0x00005555555afba0
0x5555555afba0: 102 'f' 101 'e' 108 'l' 108 'l' 111 'o' 119 'w' 32 ' ' 98 'b'
0x5555555afba8: 108 'l' 111 'o' 103 'g' 32 ' ' 114 'r' 101 'e' 97 'a' 100 'd'
0x5555555afbb0: 101 'e' 114 'r'
Excellent, now let’s continue with the program execution and check what’s in the hello function’s stack:
At 0x7fffffffd920 we find a pointer to s in main’s stack (0x7fffffffd920: 0x00007fffffffd980)! We can confirm that the whole representation still belongs to main’s scope and that, in the hello and bye functions, we are just referencing it. The memory of s will be freed once main finishes.
There’s no need to change the scope to borrow a value: the code used at the beginning of this post is just a slight modification of an example from a previous post that did not compile. We fixed it by letting s2 borrow s1’s value.
Conclusion
Sometimes we have a hard time fighting the Rust compiler because it usually fails with errors that do not exist in other programming languages. Those errors feel arbitrary but, as we have seen in this post, they are there to protect us. It can take some time to wrap your head around them.
The more you code in Rust, the less you fight with the compiler and you end up with more performant and more secure programs. Also, a lot of errors are caught at compile time, saving us a lot of precious debugging time.
This post concludes a series of posts about how Rust handles memory and its internals: