10. String and &str in Depth — UTF-8, indexing

Rust strings are UTF-8 encoded. `String` is a heap-owned, growable buffer; `&str` is a borrowed view into UTF-8 bytes stored elsewhere (in a `String`, a string literal, or another buffer). This lesson explains why integer indexing is intentionally disallowed and how `.chars()` / `.bytes()` / `.char_indices()` give you explicit control over Unicode iteration.

Rust, String, &str, UTF-8, Unicode, chars

Duration

⏱ ~1.5 hours

Level

📊 Intermediate

Prerequisite

🎯 Lesson 09

OUTCOME

What you'll learn

1. Describe the memory layouts of `String` and `&str`
2. Build Strings and extend them with `push_str` / `push` / `+`
3. Explain why integer indexing on strings is blocked
4. Pick between `.chars()` / `.bytes()` / `.char_indices()`
5. Compose strings with the `format!` macro

Overview

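If you're used to writing `s[0]` to get the first character, Rust will feel awkward at first: that expression doesn't compile. The reason is principled. In UTF-8 a character occupies 1 to 4 bytes, so an integer index is ambiguous (the n-th byte or the n-th character?), and finding the n-th character means scanning from the start rather than doing a constant-time lookup. Rust refuses to guess and makes you choose explicitly with `.chars()`, `.bytes()`, or `.char_indices()`.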

Core Concepts

1) String's memory layout

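Internally, a `String` wraps a `Vec<u8>`: a heap-allocated, growable byte buffer described by a pointer, a length, and a capacity. The bytes are always valid UTF-8, and the safe methods on `String` preserve that invariant.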
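The layout described above can be observed from safe code. The following is a minimal sketch, not a definitive tour of the API; `round_trip` is a hypothetical helper introduced only for illustration.

```rust
/// Converts a String into its raw bytes and back again.
/// `round_trip` is a hypothetical helper for this sketch, not a std API.
fn round_trip(s: String) -> String {
    // into_bytes() hands over the underlying Vec<u8> without copying.
    let bytes: Vec<u8> = s.into_bytes();
    // from_utf8() validates the bytes before turning them back into a String.
    String::from_utf8(bytes).expect("bytes came from a String, so they are valid UTF-8")
}

fn main() {
    // Reserve room up front; the buffer lives on the heap.
    let mut s = String::with_capacity(16);
    s.push_str("héllo"); // 'é' takes 2 bytes in UTF-8

    // len() counts bytes, not characters.
    println!("len = {}", s.len()); // 6
    println!("capacity is at least 16: {}", s.capacity() >= 16);

    // as_bytes() borrows the UTF-8 bytes.
    println!("{:?}", s.as_bytes()); // [104, 195, 169, 108, 108, 111]

    // Arbitrary bytes are rejected: 0xFF never appears in valid UTF-8.
    assert!(String::from_utf8(vec![0xFF]).is_err());

    println!("{}", round_trip(s)); // héllo
}
```

The validation step in `String::from_utf8` is what keeps the "always valid UTF-8" guarantee intact when bytes come from outside.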

2) &str's memory layout

A `&str` is a fat pointer: a (data pointer, byte length) pair. It can point into a `String`, a `'static` string literal, or any other buffer that holds valid UTF-8.
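To make the fat-pointer idea concrete, here is a small sketch that checks the size of `&str` and borrows views from both a `String` and a literal. The helper `prefix` is a hypothetical name used only in this example.

```rust
use std::mem::size_of;

/// Returns the first `n` bytes of `s` as a borrowed view (no allocation).
/// Hypothetical helper: it panics if `n` is not on a character boundary,
/// which is fine for the ASCII-only demo below.
fn prefix(s: &str, n: usize) -> &str {
    &s[..n]
}

fn main() {
    // A &str is two machine words: data pointer + length in bytes.
    assert_eq!(size_of::<&str>(), 2 * size_of::<usize>());

    let owned = String::from("hello world");
    let literal: &'static str = "hello world";

    let from_string = prefix(&owned, 5);   // view into the String's heap buffer
    let from_literal = prefix(literal, 5); // view into data baked into the binary
    println!("{} {}", from_string, from_literal); // hello hello

    // The slice shares its starting address with the String it borrows from.
    assert_eq!(from_string.as_ptr(), owned.as_ptr());
}
```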

3) UTF-8 variable width

Character class	Bytes per character	Example
ASCII	1	'a' = 0x61
Accented Latin	2	'é' = 0xC3 0xA9
Korean / CJK	3	'한' = 0xED 0x95 0x9C
Some emoji	4	'😀' = 0xF0 0x9F 0x98 0x80

Because `s[0]` could mean either "first byte" or "first character", `String` and `str` do not implement integer indexing at all, so the expression is a compile error.
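The byte widths in the table can be checked directly with `char::len_utf8` and `char::encode_utf8`. This is a minimal sketch; `utf8_widths` is a hypothetical helper written for this example.

```rust
/// Collects the UTF-8 width in bytes of every character in `s`.
/// Hypothetical helper for this sketch.
fn utf8_widths(s: &str) -> Vec<usize> {
    s.chars().map(|c| c.len_utf8()).collect()
}

fn main() {
    for c in ['a', 'é', '한', '😀'] {
        // Four bytes is the maximum length of one UTF-8 encoded character.
        let mut buf = [0u8; 4];
        let encoded = c.encode_utf8(&mut buf);
        println!("{:?}: {} byte(s), {:02X?}", c, c.len_utf8(), encoded.as_bytes());
    }

    // let first = "한글"[0]; // does not compile: strings cannot be indexed by an integer

    println!("{:?}", utf8_widths("a한😀")); // [1, 3, 4]
}
```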

4) Iteration choices

**.chars()** — per-character (char)
**.bytes()** — per-byte (u8)
**.char_indices()** — (byte index, char) pairs
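The three iterators above can be compared side by side on one short string. The sketch below also shows a typical use of `.char_indices()`: finding the byte offset where a given character starts. `byte_offset_of_char` is a hypothetical helper, not part of the standard library.

```rust
/// Returns the byte offset at which the `n`-th character (0-based) starts,
/// or None if the string has fewer than `n + 1` characters.
/// Hypothetical helper for this sketch.
fn byte_offset_of_char(s: &str, n: usize) -> Option<usize> {
    s.char_indices().nth(n).map(|(offset, _)| offset)
}

fn main() {
    let s = "a한😀";

    let chars: Vec<char> = s.chars().collect();
    let bytes: Vec<u8> = s.bytes().collect();
    let indexed: Vec<(usize, char)> = s.char_indices().collect();

    println!("{:?}", chars);     // ['a', '한', '😀']
    println!("{}", bytes.len()); // 8
    println!("{:?}", indexed);   // [(0, 'a'), (1, '한'), (4, '😀')]

    // The third character starts at byte 4 (1 byte for 'a' + 3 bytes for '한').
    println!("{:?}", byte_offset_of_char(s, 2)); // Some(4)
}
```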

Hands-on Examples

Building and concatenating strings:

rust

fn main() {
    let mut s = String::new();
    s.push_str("hello");
    s.push(' ');
    s.push_str("world");
    println!("{}", s); // hello world

    let a = String::from("Hello, ");
    let b = String::from("world!");
    let c = a + &b;    // a is moved, b is borrowed
    println!("{}", c);
}

Per-character processing:

rust

fn main() {
    let s = "안녕 hi😀";
    println!("byte length: {}", s.len());            // 13
    println!("char count: {}", s.chars().count());   // 6
    for c in s.chars() { print!("[{}]", c); }
    println!();
}

`format!` — the idiomatic string builder:

rust

fn main() {
    let name = "Rust";
    let n = 22;
    let msg = format!("{} track {} lessons", name, n);
    println!("{}", msg);
}
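One property of `format!` worth seeing in action is that it only borrows its arguments, so the inputs stay usable afterwards. The sketch below assumes Rust 1.58 or newer for the inline `{name}` capture syntax; `lesson_summary` is a hypothetical helper.

```rust
/// Builds a summary line from borrowed inputs.
/// Hypothetical helper; uses inline captured identifiers (Rust 1.58+).
fn lesson_summary(name: &str, n: u32) -> String {
    format!("{name} track {n} lessons")
}

fn main() {
    let name = String::from("Rust");
    let msg = lesson_summary(&name, 22);
    println!("{}", msg); // Rust track 22 lessons

    // format! borrowed `name`, so it is still usable here.
    println!("{}", name); // Rust
}
```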

Common Mistakes

Q. Why can't I grab the first character with s[0]?

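A. Strings do not support integer indexing, because the first byte is not necessarily the first character. Use `.chars().next()`, which returns an `Option<char>` (`None` for an empty string); calling `.unwrap()` on it panics if the string is empty. `.chars().nth(0)` is equivalent. If you know the text is pure ASCII, `.as_bytes()[0]` gives you the first byte as a `u8`.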
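A sketch of the non-panicking version of this answer, which handles the empty-string case explicitly; `first_char` is a hypothetical helper name.

```rust
/// First character of `s`, or None for an empty string.
/// Hypothetical helper for this sketch.
fn first_char(s: &str) -> Option<char> {
    s.chars().next()
}

fn main() {
    match first_char("한글") {
        Some(c) => println!("first char: {}", c), // first char: 한
        None => println!("empty string"),
    }

    // No panic on empty input, unlike `.chars().next().unwrap()`.
    assert_eq!(first_char(""), None);

    // as_bytes()[0] yields a byte (u8), which equals the character only for ASCII.
    assert_eq!("abc".as_bytes()[0], b'a');
    assert_eq!("한글".as_bytes()[0], 0xED); // first byte of '한', not a character
}
```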

Q. Adding two Strings with + made the first one disappear

A. `+` takes ownership of its left operand (a move) and borrows the right one. To keep both alive, use `format!("{}{}", a, b)`, which only borrows, or clone the left side first: `a.clone() + &b`.
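The two workarounds and the move itself can be sketched in one place. `join_keep_both` is a hypothetical helper used only here.

```rust
/// Concatenates two strings while only borrowing both inputs.
/// Hypothetical helper for this sketch.
fn join_keep_both(a: &str, b: &str) -> String {
    format!("{}{}", a, b)
}

fn main() {
    let a = String::from("Hello, ");
    let b = String::from("world!");

    // Option 1: format! borrows, so `a` and `b` both survive.
    let c = join_keep_both(&a, &b);

    // Option 2: clone the left side, so the clone is what gets moved.
    let d = a.clone() + &b;

    println!("{}{} | {} | {}", a, b, c, d);

    // Plain `+` moves `a`; only `b` is usable afterwards.
    let e = a + &b;
    println!("{} (b is still here: {})", e, b);
    // println!("{}", a); // does not compile: `a` was moved by `+`
}
```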

Q. The length of a Korean string looks wrong

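A. `.len()` returns the **byte length**. For the character count use `.chars().count()`. Mixing the two up is a common source of panics in Korean text processing, typically when a character count is used as a byte offset in a slice such as `&s[..n]` and the offset lands in the middle of a character.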
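The following sketch shows the byte/character mismatch on a Korean string and one way to slice by character count without panicking. `take_chars` is a hypothetical helper; it is one possible approach, not the only one.

```rust
/// Returns a slice containing the first `n` characters of `s`
/// (or all of `s` if it has fewer). Hypothetical helper for this sketch.
fn take_chars(s: &str, n: usize) -> &str {
    match s.char_indices().nth(n) {
        // `end` is the byte offset where character `n` starts,
        // so it is always a valid character boundary.
        Some((end, _)) => &s[..end],
        None => s,
    }
}

fn main() {
    let s = "안녕하세요";
    println!("bytes: {}, chars: {}", s.len(), s.chars().count()); // bytes: 15, chars: 5

    // `&s[..2]` would panic at runtime: byte 2 is in the middle of '안'.
    assert!(!s.is_char_boundary(2));

    // `get` is the non-panicking form of slicing.
    assert_eq!(s.get(..2), None);
    assert_eq!(s.get(..3), Some("안"));

    println!("{}", take_chars(s, 2)); // 안녕
}
```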

Recap

`String` = heap-owned `Vec<u8>`, `&str` = fat-pointer view
UTF-8 variable width means integer indexing is blocked on purpose
Use `.chars()` / `.bytes()` / `.char_indices()` for explicit iteration
`format!` is the idiomatic way to build a new String

Try It Yourself

Read a Korean sentence and print byte length vs. character count
Write `fn reverse(s: &str) -> String` using `.chars().rev()`
Split a string on spaces and uppercase the first letter of each word

Example code / lecture materials

All lecture materials and example code are openly available on GitHub.
