Ownership · Prerequisite: lesson 09

10. String and &str in Depth — UTF-8, indexing

Rust strings are UTF-8 encoded. `String` is an owned, growable, heap-allocated buffer; `&str` is a borrowed view of UTF-8 bytes stored elsewhere, often inside a `String`. This lesson explains why integer indexing is intentionally disallowed and how `.chars()` / `.bytes()` / `.char_indices()` give you explicit control over Unicode iteration.

Duration: ~1.5 hours · Level: Intermediate

What you'll learn

  1. Describe the memory layouts of `String` and `&str`
  2. Build `String`s and extend them with `push_str` / `push` / `+`
  3. Explain why integer indexing on strings is blocked
  4. Choose between `.chars()` / `.bytes()` / `.char_indices()`
  5. Compose strings with the `format!` macro

Overview

If you're used to writing `s[0]` to get the first character, Rust will feel awkward at first: that code doesn't compile. The reason is principled. In UTF-8 a character occupies 1 to 4 bytes, so an integer index is ambiguous (a byte offset? a character position?), and finding the n-th character would require scanning from the start of the string. Rust refuses to guess and gives you `.chars()` instead.

Core Concepts

1) String's memory layout

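Internally a `String` wraps a `Vec<u8>`: a (pointer, capacity, length) triple whose pointer refers to heap-owned, growable bytes. Those bytes are always valid UTF-8, and every safe method that creates or modifies a `String` upholds that invariant.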
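A minimal sketch of that layout using only the standard library. `len_and_capacity` is a helper invented for this example, not a standard API. The exact capacities printed depend on the allocator, so the comments only state what is guaranteed (capacity is at least what was requested).

```rust
use std::mem::size_of;

/// Helper for this sketch: (byte length, allocated capacity) of a String.
fn len_and_capacity(s: &String) -> (usize, usize) {
    (s.len(), s.capacity())
}

fn main() {
    // The String handle is three machine words (pointer, capacity, length);
    // the text bytes live in a separate heap allocation.
    println!("String handle: {} bytes", size_of::<String>());

    let mut s = String::with_capacity(4); // room for at least 4 bytes
    s.push_str("hi");
    println!("{:?}", len_and_capacity(&s)); // length 2, capacity at least 4

    s.push_str(", world"); // the buffer grows if the capacity is too small
    println!("{:?}", len_and_capacity(&s)); // length 9, capacity at least 9

    // into_bytes hands back the underlying Vec<u8> without copying...
    let bytes: Vec<u8> = s.into_bytes();
    // ...while from_utf8 must re-validate before it builds a String again.
    let back = String::from_utf8(bytes).expect("bytes came from a String");
    println!("{back}");

    // Bytes that are not valid UTF-8 are rejected rather than stored.
    assert!(String::from_utf8(vec![0xFF, 0xFE]).is_err());
}
```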

2) &str's memory layout

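A fat pointer made of (data pointer, byte length): two machine words, with no capacity field. It can point into a `String`, at a `'static` string literal compiled into the binary, or into any other valid UTF-8 memory it borrows.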
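The sketch below checks the two-word size of `&str` and builds one view of each kind described above. `points_into` is a hypothetical helper written for this example that compares raw addresses; it is there for illustration only, not something you would normally need.

```rust
use std::mem::size_of;

/// Helper for this sketch: does `part` lie inside the bytes of `whole`?
fn points_into(whole: &str, part: &str) -> bool {
    let start = whole.as_ptr() as usize;
    let end = start + whole.len();
    let p = part.as_ptr() as usize;
    p >= start && p + part.len() <= end
}

fn main() {
    // &str is two machine words: data pointer + byte length (no capacity).
    println!("&str: {} bytes", size_of::<&str>());

    // 1) A view into a heap-owned String: nothing is copied.
    let owned = String::from("hello world");
    let word: &str = &owned[6..]; // "world"
    println!("{word} points into owned: {}", points_into(&owned, word));

    // 2) A string literal: a &'static str stored in the compiled binary.
    let literal: &'static str = "hello";
    println!("{literal}: {} bytes", literal.len());

    // 3) Any other valid UTF-8 memory, here a byte array on the stack.
    let buf = [b'o', b'k'];
    let view: &str = std::str::from_utf8(&buf).expect("ASCII is valid UTF-8");
    println!("{view}");
}
```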

3) UTF-8 variable width

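| Character | UTF-8 bytes | Example |
|---|---|---|
| ASCII | 1 | 'a' = 0x61 |
| Latin accents, Greek, Cyrillic | 2 | 'é' = 0xC3 0xA9 |
| Korean, most CJK | 3 | '한' = 0xED 0x95 0x9C |
| Most emoji | 4 | '😀' = 0xF0 0x9F 0x98 0x80 |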
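To confirm these widths yourself, this sketch encodes the table's example characters (plus 'é', a 2-byte character) with the standard `char::len_utf8` and `char::encode_utf8` methods. `utf8_bytes` is a helper name invented for this example.

```rust
/// Helper for this sketch: encode one char as UTF-8 and return its bytes.
fn utf8_bytes(c: char) -> Vec<u8> {
    let mut buf = [0u8; 4]; // 4 bytes is the maximum UTF-8 width
    c.encode_utf8(&mut buf).as_bytes().to_vec()
}

fn main() {
    for c in ['a', 'é', '한', '😀'] {
        // Render each byte as 0xNN, matching the table's notation.
        let hex: Vec<String> = utf8_bytes(c).iter().map(|b| format!("0x{b:02X}")).collect();
        println!("{c:?}: {} byte(s) = {}", c.len_utf8(), hex.join(" "));
    }
}
```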

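Should `s[0]` mean the first byte or the first character? Because that is ambiguous, `str` and `String` do not implement indexing by an integer, so `s[0]` is a compile error. Slicing with a byte range such as `&s[0..3]` does compile, but it panics at runtime if either end falls inside a multi-byte character.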
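A short sketch of what does and does not work. The commented-out lines mark the compile error and the runtime panic; `str::get` and `str::is_char_boundary` are the standard non-panicking alternatives, and `byte_prefix` is just an illustrative wrapper name.

```rust
/// Helper for this sketch: the first `n` bytes of `s` as a &str, or None if
/// `n` is past the end or falls inside a multi-byte character.
fn byte_prefix(s: &str, n: usize) -> Option<&str> {
    s.get(..n) // non-panicking counterpart of &s[..n]
}

fn main() {
    let s = "한글"; // 2 chars, 6 bytes

    // let c = s[0];            // compile error: str cannot be indexed by an integer
    println!("{}", &s[0..3]);    // OK: bytes 0..3 are exactly '한'
    // println!("{}", &s[0..1]); // compiles, but panics: byte 1 is inside '한'

    println!("{}", s.is_char_boundary(1)); // false
    println!("{:?}", byte_prefix(s, 1));   // None
    println!("{:?}", byte_prefix(s, 3));   // Some("한")
}
```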

4) Iteration choices

  • **.chars()**: each Unicode scalar value, as a `char`
  • **.bytes()**: each raw UTF-8 byte, as a `u8`
  • **.char_indices()**: `(byte index, char)` pairs, where the index is the byte offset at which that `char` starts
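One string run through all three iterators, as a sketch. `char_starts` is a helper name invented for this example. The last lines use a decomposed 'é' (plain 'e' followed by the combining accent U+0301) to show that one `char` is not always one visible character.

```rust
/// Helper for this sketch: byte offsets where each char of `s` starts.
fn char_starts(s: &str) -> Vec<usize> {
    s.char_indices().map(|(i, _)| i).collect()
}

fn main() {
    let s = "aé한";

    // .bytes(): every raw UTF-8 byte, as u8 (1 + 2 + 3 = 6 of them)
    let bytes: Vec<u8> = s.bytes().collect();
    println!("{:X?}", bytes); // [61, C3, A9, ED, 95, 9C]

    // .chars(): every Unicode scalar value, as char
    let chars: Vec<char> = s.chars().collect();
    println!("{:?}", chars); // ['a', 'é', '한']

    // .char_indices(): (byte offset where the char starts, char)
    for (i, c) in s.char_indices() {
        println!("byte {i}: {c}");
    }
    println!("{:?}", char_starts(s)); // [0, 1, 3]

    // 'e' + U+0301 renders like "é" but is two chars.
    println!("{}", "e\u{301}".chars().count()); // 2
}
```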

Hands-on Examples

Building and concatenating strings:

```rust
fn main() {
    let mut s = String::new();
    s.push_str("hello"); // append a &str
    s.push(' ');         // append a single char
    s.push_str("world");
    println!("{}", s); // hello world

    let a = String::from("Hello, ");
    let b = String::from("world!");
    let c = a + &b;    // a is moved, b is borrowed
    println!("{}", c); // Hello, world!
}
```

Per-character processing:

```rust
fn main() {
    let s = "안녕 hi😀";
    println!("byte length: {}", s.len());            // 13 (3 + 3 + 1 + 1 + 1 + 4)
    println!("char count: {}", s.chars().count());   // 6
    for c in s.chars() {
        print!("[{}]", c);
    }
    println!(); // the loop printed [안][녕][ ][h][i][😀]
}
```

`format!` — the idiomatic string builder:

```rust
fn main() {
    let name = "Rust";
    let n = 22;
    let msg = format!("{} track {} lessons", name, n);
    println!("{}", msg); // Rust track 22 lessons
}
```
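`format!` also accepts variable names inline (Rust 1.58 and later) and format specs for width, alignment, padding, and precision. A small sketch; `table_row` is a hypothetical helper.

```rust
/// Helper for this sketch: one fixed-width report row.
fn table_row(name: &str, n: u32, x: f64) -> String {
    // <6 left-aligns in 6 columns, >4 right-aligns in 4,
    // 04 zero-pads to 4 digits, .2 keeps two decimal places
    format!("[{name:<6}|{n:>4}|{n:04}|{x:.2}]")
}

fn main() {
    let name = "Rust";
    let n = 22;

    // Variables can be named directly inside the braces.
    let msg = format!("{name} track {n} lessons");
    println!("{msg}"); // Rust track 22 lessons

    println!("{}", table_row(name, n, 1.23456)); // [Rust  |  22|0022|1.23]

    // format! only borrows its arguments, so name and n are still usable.
    println!("{name} {n}");
}
```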

Common Mistakes

Q. Why can't I grab the first character with s[0]?

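A. Because `str` has no integer indexing. Use `s.chars().next()`, which returns `Option<char>`: `Some` with the first character, or `None` for an empty string, so calling `.unwrap()` on it panics when `s` is empty. `.chars().nth(0)` is equivalent. For pure ASCII text you can read the first byte with `s.as_bytes()[0]`, which is a `u8`, not a `char`.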
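A sketch of the safe options as small functions; `first_char` and `first_byte` are illustrative names, not standard APIs. Both return `Option`, so an empty string cannot cause a panic.

```rust
/// Helper for this sketch: first character of `s`, or None if empty.
fn first_char(s: &str) -> Option<char> {
    s.chars().next()
}

/// Helper for this sketch: first byte of `s`; equals the first character
/// only for ASCII text.
fn first_byte(s: &str) -> Option<u8> {
    s.as_bytes().first().copied()
}

fn main() {
    println!("{:?}", first_char("한글")); // Some('한')
    println!("{:?}", first_char(""));     // None instead of an unwrap panic
    println!("{:?}", first_byte("abc"));  // Some(97), i.e. b'a'
    println!("{:?}", first_byte("한글")); // Some(237): 0xED, only part of '한'
}
```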

Q. Adding two Strings with + made the first one disappear

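A. `a + &b` takes `a` by value (a move), appends `b` into `a`'s buffer, and returns the result, so `a` can no longer be used; `b` is only borrowed. To keep both alive, use `format!("{}{}", a, b)`, which only borrows its arguments, or clone first: `a.clone() + &b`.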
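The usual ways to concatenate, side by side, as a sketch. `append` is a hypothetical helper that simply spells out what `+` does with its left operand.

```rust
/// Helper for this sketch: what `left + right` does, namely consume
/// `left` and append `right` into its buffer.
fn append(left: String, right: &str) -> String {
    left + right
}

fn main() {
    let a = String::from("Hello, ");
    let b = String::from("world!");

    // Option 1: format! only borrows, so a and b both stay usable.
    let joined = format!("{a}{b}");
    println!("{joined} | {a} | {b}");

    // Option 2: `+` consumes a clone, leaving the original a intact.
    let c = a.clone() + &b;
    println!("{c} | {a}");

    // Option 3: if a is no longer needed, move it and reuse its buffer.
    let d = append(a, &b); // a is moved here and cannot be used afterwards
    println!("{d}");
}
```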

Q. The length of a Korean string looks wrong

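A. `.len()` returns the **byte length**, and each Hangul syllable takes 3 bytes. For the character count use `.chars().count()`. Mixing these up is a common source of panics in Korean text processing: slicing with `&s[..n]` where `n` is a character count usually lands inside a multi-byte character, which panics at runtime.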
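A sketch of truncating to a number of characters without panicking. `take_chars` is a hypothetical helper: it uses `.char_indices()` to turn a character count into a byte offset that is guaranteed to sit on a character boundary.

```rust
/// Helper for this sketch: the first `n` characters (not bytes) of `s`.
fn take_chars(s: &str, n: usize) -> &str {
    match s.char_indices().nth(n) {
        Some((byte_idx, _)) => &s[..byte_idx], // cut right before char n
        None => s,                             // fewer than n chars: keep all
    }
}

fn main() {
    let s = "안녕하세요";
    println!("bytes = {}, chars = {}", s.len(), s.chars().count()); // bytes = 15, chars = 5

    // Wrong: treats a character count as a byte offset.
    // let head = &s[..2]; // panics: byte 2 is inside '안'

    println!("{}", take_chars(s, 2)); // 안녕
}
```

The same `char_indices` pattern works for any "first N characters" task, such as previews or column limits.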

Recap

  • `String` = heap-owned `Vec<u8>`, `&str` = fat-pointer view
  • UTF-8 variable width means integer indexing is blocked on purpose
  • Use `.chars()` / `.bytes()` / `.char_indices()` for explicit iteration
  • `format!` is the idiomatic way to build a new String

Try It Yourself

  1. Read a Korean sentence and print byte length vs. character count
  2. Write `fn reverse(s: &str) -> String` using `.chars().rev()`
  3. Split a string on spaces and uppercase the first letter of each word
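If you want to compare notes after trying exercises 2 and 3, here is one possible sketch. It assumes words are separated by single spaces (`split(' ')` keeps runs of spaces intact when the words are joined back); `capitalize_words` is a name chosen for this example.

```rust
/// Exercise 2: reverse by chars (not bytes), so multi-byte text stays valid.
fn reverse(s: &str) -> String {
    s.chars().rev().collect()
}

/// Exercise 3: uppercase the first letter of each space-separated word.
fn capitalize_words(s: &str) -> String {
    s.split(' ')
        .map(|word| {
            let mut chars = word.chars();
            match chars.next() {
                // to_uppercase() may yield more than one char ('ß' becomes "SS")
                Some(first) => first.to_uppercase().chain(chars).collect::<String>(),
                None => String::new(), // empty word between two spaces
            }
        })
        .collect::<Vec<String>>()
        .join(" ")
}

fn main() {
    println!("{}", reverse("안녕 hi"));                   // ih 녕안
    println!("{}", capitalize_words("hello rust world")); // Hello Rust World
}
```

`to_uppercase` returns an iterator rather than a single `char` because some characters, such as 'ß', uppercase to more than one.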
Example code / lecture materials

All lecture materials and example code are openly available on GitHub.
