Exceptions & I/O

17. String Processing

Strings are the data type you handle most often. This lecture gathers the everyday tools: interpolation, `Split`, `Join`, `Replace`, `Substring`, `Trim`, `StringBuilder`, and `string.Format`.

C# · .NET 8 · string
Duration: ~1-1.5 hours
Level: Intermediate
Prerequisite: File I/O

What you'll learn

  1. Know core methods like `Trim`/`Split`/`Join`/`Replace`/`ToUpper`/`Contains`/`StartsWith`
  2. Use interpolated strings `$"..."` and format specifiers (`F2`, `yyyy-MM-dd`, ...)
  3. Know that `StringBuilder` is faster for repeated concatenation
  4. Use `Regex.IsMatch`/`Match`/`Matches` basics
  5. Understand that parsing results can change with `CultureInfo`

Overview

Cleaning up text you've read is a must in almost every program. C# `string` has many methods, `StringBuilder` handles repeated concatenation, and `Regex` does pattern matching. We'll also call out the pitfall that **culture** affects number/date parsing.

Core Concepts

1) Common `string` methods

csharp
"  hello  ".Trim();          // "hello"
"a,b,c".Split(',');          // ["a", "b", "c"]
string.Join("-", ["a","b"]); // "a-b"
"abc".Replace("b", "X");     // "aXc"
"hi".ToUpper();              // "HI"
"hello".Contains("ll");      // true
"hello".StartsWith("he");    // true

`string` is immutable (lecture 02), so methods such as `Trim` and `Replace` never modify the original; they **return a new string**, which you must store or use.
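Because of immutability, a call like `s.Trim();` on its own changes nothing; you have to keep the returned copy. The sketch below shows that pitfall and also `Substring(startIndex, length)`, which the lecture intro mentions but the method list above omits. The sample strings are illustrative values only.

```csharp
using System;

string raw = "  hello  ";
raw.Trim();                           // the trimmed copy is discarded; raw is unchanged
Console.WriteLine($"[{raw}]");        // [  hello  ]

raw = raw.Trim();                     // reassign to keep the new string
Console.WriteLine($"[{raw}]");        // [hello]

// Substring(startIndex, length) copies out a slice; the source string is untouched
string date = "2025-05-18";
string year  = date.Substring(0, 4);  // "2025"
string month = date.Substring(5, 2);  // "05"
Console.WriteLine($"{year}/{month}"); // 2025/05
```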

2) Interpolated strings `$"..."`

The cleanest way to embed variables and expressions.

csharp
double pi = 3.14159;
DateTime now = DateTime.Now;

Console.WriteLine($"Pi is about {pi:F2}");         // 2 decimal places
Console.WriteLine($"Today is {now:yyyy-MM-dd}");
Console.WriteLine($"Aligned: |{42,5}|");            // width 5, right-aligned

Format specifiers follow the colon: `F2` means two decimal places and `yyyy-MM-dd` gives an ISO 8601 date. A number after a comma, as in `{42,5}`, sets the field width.
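`string.Format`, also named in the intro, uses the same `{index,width:format}` placeholder syntax, with numbered slots instead of inline expressions. A minimal sketch comparing the two; passing `CultureInfo.InvariantCulture` and wrapping the interpolation in `FormattableString.Invariant` are choices made here so the output does not depend on the machine's culture.

```csharp
using System;
using System.Globalization;

double pi = 3.14159;
int score = 42;

// Numbered placeholders {0}, {1} take the same ",width:format" parts as interpolation
string a = string.Format(CultureInfo.InvariantCulture,
                         "Pi is about {0:F2}, score |{1,5}|", pi, score);

// The interpolated form, pinned to the invariant culture
string b = FormattableString.Invariant($"Pi is about {pi:F2}, score |{score,5}|");

Console.WriteLine(a);       // Pi is about 3.14, score |   42|
Console.WriteLine(a == b);  // True
```

Interpolation is usually easier to read; `string.Format` is still useful when the format string itself comes from somewhere else, such as a resource file.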

3) `StringBuilder` — for repeated concatenation

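Concatenating with `+=` in a loop creates a brand-new string on every iteration and copies all the characters built so far, so the total work grows roughly with the square of the iteration count. With many iterations this gets very slow. `StringBuilder` appends into an internal, growable buffer instead, so each append is cheap.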

csharp
using System.Text;

var sb = new StringBuilder();
for (int i = 0; i < 1000; i++)
    sb.Append(i).Append(',');

string result = sb.ToString();

For a few concatenations `+` is fine; for hundreds or thousands, `StringBuilder`.
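To see the difference, this sketch builds the same text both ways and times each with `Stopwatch`. The iteration count `N` is an arbitrary choice and timings vary by machine, so no numbers are claimed here; only the equality of the two results is guaranteed.

```csharp
using System;
using System.Diagnostics;
using System.Text;

const int N = 10_000;  // arbitrary size for the comparison

var sw = Stopwatch.StartNew();
string viaPlus = "";
for (int i = 0; i < N; i++)
    viaPlus += i + ",";          // allocates a new string and copies everything so far
sw.Stop();
long plusMs = sw.ElapsedMilliseconds;

sw.Restart();
var sb = new StringBuilder();
for (int i = 0; i < N; i++)
    sb.Append(i).Append(',');    // appends into the builder's buffer
string viaBuilder = sb.ToString();
sw.Stop();
long builderMs = sw.ElapsedMilliseconds;

Console.WriteLine($"same result   : {viaPlus == viaBuilder}");
Console.WriteLine($"+=            : {plusMs} ms");
Console.WriteLine($"StringBuilder : {builderMs} ms");
```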

4) `Regex` — regular expressions

Lives in `System.Text.RegularExpressions`.

csharp
using System.Text.RegularExpressions;

Regex.IsMatch("hello123", @"\d+");           // true (any digits?)

var m = Regex.Match("order #4242", @"\d+");
if (m.Success) Console.WriteLine(m.Value);    // "4242"

foreach (Match x in Regex.Matches("a1 b22 c333", @"\d+"))
    Console.WriteLine(x.Value);               // 1, 22, 333

`@"..."` is a verbatim string that preserves backslashes — pairs well with regex.
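A quick check of that claim: a regular string and a verbatim string can spell the same pattern, but doubling backslashes *inside* a verbatim string changes the pattern. The sample inputs are illustrative.

```csharp
using System;
using System.Text.RegularExpressions;

string regular  = "\\d+";  // regular string: each backslash must be doubled
string verbatim = @"\d+";  // verbatim string: the text is taken as-is
Console.WriteLine(regular == verbatim);               // True: both are the 3 characters \d+

// @"\\d" is the regex \\d: a literal backslash followed by the letter d
Console.WriteLine(Regex.IsMatch("abc123", @"\\d"));   // False: no backslash in the input
Console.WriteLine(Regex.IsMatch(@"C:\dir", @"\\d"));  // True: matches the "\d" in the path
```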

5) Culture pitfall

Parsing numbers and dates is affected by the **system culture**. For example, Germany uses `,` as the decimal separator.

csharp
using System.Globalization;

// Korea/UK etc.: "3.14" is 3.14
// Germany (de-DE): "3.14" is parsed as 314, and "3,14" is 3.14
decimal de = decimal.Parse("3,14", CultureInfo.GetCultureInfo("de-DE"));
decimal inv = decimal.Parse("3.14", CultureInfo.InvariantCulture);

For **machine-readable formats** (config, CSV, logs) always specify `CultureInfo.InvariantCulture`. Use the system culture only for UI display.
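The same pitfall applies in the formatting direction. A minimal sketch, assuming a hand-built `NumberFormatInfo` as a stand-in for a German-style culture (so it runs even where `de-DE` culture data is unavailable): text written with the invariant culture round-trips exactly, while reading it back with the comma-decimal format silently yields a different number.

```csharp
using System;
using System.Globalization;

// Stand-in for a German-style locale: ',' for decimals, '.' for thousands
var germanLike = new NumberFormatInfo { NumberDecimalSeparator = ",", NumberGroupSeparator = "." };
var inv = CultureInfo.InvariantCulture;

decimal price = 1234.56m;

string forUser    = price.ToString("N2", germanLike); // "1.234,56" for display
string forMachine = price.ToString(inv);              // "1234.56" for files and logs

// Reading with the same culture that wrote the text round-trips exactly
decimal back = decimal.Parse(forMachine, inv);

// Reading machine text with the wrong format: '.' is taken as a thousands separator
decimal wrong = decimal.Parse(forMachine, germanLike); // 123456, no exception thrown

Console.WriteLine($"{forUser} | {forMachine} | round-trip ok: {back == price} | wrong: {wrong}");
```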

Examples

Example 1 — `StringMethods`: everyday methods

csharp
string s = "  Hello, C# World  ";

Console.WriteLine($"Source  : [{s}]");
Console.WriteLine($"Trim    : [{s.Trim()}]");
Console.WriteLine($"Upper   : [{s.Trim().ToUpper()}]");
Console.WriteLine($"Replace : [{s.Trim().Replace("C#", "DotNet")}]");

string[] parts = "apple,banana,cherry".Split(',');
Console.WriteLine($"Split   : {parts.Length} items → {string.Join(" | ", parts)}");

Console.WriteLine($"Contains 'C#'   : {s.Contains("C#")}");
Console.WriteLine($"StartsWith ' ' : {s.StartsWith(" ")}");

**Output**

text
Source  : [  Hello, C# World  ]
Trim    : [Hello, C# World]
Upper   : [HELLO, C# WORLD]
Replace : [Hello, DotNet World]
Split   : 3 items → apple | banana | cherry
Contains 'C#'   : True
StartsWith ' ' : True

**Note:** Chain method calls to compose transformations.
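`Split` also accepts `StringSplitOptions`. A short sketch with a made-up input, contrasting the plain call with `RemoveEmptyEntries | TrimEntries`; `TrimEntries` exists from .NET 5 onward, so it is available on the lecture's .NET 8 target.

```csharp
using System;

string csvish = " apple , banana,, cherry ";

// Plain Split keeps surrounding spaces and the empty entry between ",,"
string[] raw = csvish.Split(',');

// TrimEntries trims each piece; RemoveEmptyEntries then drops the empty ones
string[] clean = csvish.Split(',', StringSplitOptions.RemoveEmptyEntries | StringSplitOptions.TrimEntries);

Console.WriteLine(raw.Length);               // 4
Console.WriteLine(string.Join("|", clean));  // apple|banana|cherry
```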

Example 2 — `Interpolation`: interpolation and formatting

csharp
double pi = 3.14159265;
int score = 87;
DateTime now = new DateTime(2025, 5, 18, 14, 30, 0);

Console.WriteLine($"Pi        : {pi:F2}");
Console.WriteLine($"Score     : {score,5} pts");     // width 5, right-aligned
Console.WriteLine($"Score(L) : {score,-5}|");        // negative width → left-aligned
Console.WriteLine($"Date      : {now:yyyy-MM-dd HH:mm}");
Console.WriteLine($"Percent   : {0.873:P1}");        // 1-decimal percent

**Output**

text
Pi        : 3.14
Score     :    87 pts
Score(L) : 87   |
Date      : 2025-05-18 14:30
Percent   : 87.3%

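**Note:** `,N` sets the width and `:F2` sets the format; combine them as `{value,10:F2}`. The output above assumes an English-style culture: `F2` and `P1` follow the current culture, so the decimal separator and the percent layout can differ on other machines.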
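Combining width and format is handy for aligned, table-like console output. The item names and prices below are hypothetical sample data, and `FormattableString.Invariant` is used so the decimal separator is always `.`.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sample data, for illustration only
(string Name, double Price)[] items = { ("Apple", 1.5), ("Banana", 0.25), ("Cherry", 12.0) };

var lines = new List<string>();
foreach (var (name, price) in items)
{
    // {name,-8}: left-aligned in 8 columns; {price,10:F2}: right-aligned in 10 columns, 2 decimals
    lines.Add(FormattableString.Invariant($"{name,-8}|{price,10:F2}"));
}

foreach (string line in lines)
    Console.WriteLine(line);
// Apple   |      1.50
// Banana  |      0.25
// Cherry  |     12.00
```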

Example 3 — `StringBuilderUse`: repeated-concatenation performance

csharp
using System.Text;

var sb = new StringBuilder();
sb.Append("[");
for (int i = 1; i <= 5; i++)
{
    if (i > 1) sb.Append(", ");
    sb.Append(i);
}
sb.Append("]");

Console.WriteLine(sb.ToString());
Console.WriteLine($"Length: {sb.Length}");
// Performance note: with hundreds of iterations or more, this is dramatically faster than +=

**Output**

text
[1, 2, 3, 4, 5]
Length: 15

**Note:** A few concatenations with `+` are fine; the more iterations, the more `StringBuilder` pays off.

Example 4 — `RegexBasics`: regex basics

csharp
using System.Text.RegularExpressions;

string text = "Contact emails: alice@example.com or bob@test.io.";

// 1) Match presence
bool hasDigit = Regex.IsMatch("order #42", @"\d+");
Console.WriteLine($"Contains digits? {hasDigit}");

// 2) First match
Match first = Regex.Match("price 1500 won", @"\d+");
Console.WriteLine($"First number: {first.Value}");

// 3) Extract all emails
foreach (Match m in Regex.Matches(text, @"[\w\.-]+@[\w\.-]+\.\w+"))
{
    Console.WriteLine($"email: {m.Value}");
}

**Output**

text
Contains digits? True
First number: 1500
email: alice@example.com
email: bob@test.io

**Note:** Real-world email regex is much more complex — this is a simple learning pattern.

Example 5 — `Culture`: same string, different result

csharp
using System.Globalization;

string s = "3,14";

decimal de  = decimal.Parse(s, CultureInfo.GetCultureInfo("de-DE"));
decimal inv;
bool ok = decimal.TryParse(s, NumberStyles.Number, CultureInfo.InvariantCulture, out inv);

Console.WriteLine($"de-DE parse     : {de}");          // 3.14
Console.WriteLine($"Invariant parse : ok={ok}, value={inv}"); // takes ',' as thousand separator → 314

// Safe machine format
decimal price = 1234.56m;
Console.WriteLine($"machine format: {price.ToString(CultureInfo.InvariantCulture)}");

**Output**

text
de-DE parse     : 3.14
Invariant parse : ok=True, value=314
machine format: 1234.56

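**Note:** Always read and write CSV and config files with `InvariantCulture`, and use the user's culture only for UI display. Interpolation such as `{de}` also formats with the current culture, so on a German-locale machine the first line prints `3,14`.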
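A minimal sketch of that rule for a single CSV line, using hypothetical field values. It formats and parses every number and date with `InvariantCulture`; it does not handle quoted fields that contain commas, which a real CSV reader must.

```csharp
using System;
using System.Globalization;

// Hypothetical record, for illustration only
string name = "Widget";
decimal price = 1234.56m;
DateTime date = new DateTime(2025, 5, 18);

var inv = CultureInfo.InvariantCulture;

// Write: every number and date is formatted with the invariant culture
string line = string.Join(",", name, price.ToString(inv), date.ToString("yyyy-MM-dd", inv));
Console.WriteLine(line);  // Widget,1234.56,2025-05-18

// Read: parse each field back with the same culture and format
string[] f = line.Split(',');
decimal p = decimal.Parse(f[1], inv);
DateTime d = DateTime.ParseExact(f[2], "yyyy-MM-dd", inv);
Console.WriteLine($"{f[0]} {p == price} {d == date}");  // Widget True True
```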

Full example code (src/)

src/Culture/Culture.csproj

xml
<Project Sdk="Microsoft.NET.Sdk">

  <PropertyGroup>
    <OutputType>Exe</OutputType>
    <TargetFramework>net8.0</TargetFramework>
    <RootNamespace>CodingNow.Lecture.IoEx17</RootNamespace>
    <ImplicitUsings>enable</ImplicitUsings>
    <Nullable>enable</Nullable>
  </PropertyGroup>

</Project>

src/Culture/Program.cs

csharp
// Same string can be parsed differently per culture
using System.Globalization;

string s = "3,14";

decimal de = decimal.Parse(s, CultureInfo.GetCultureInfo("de-DE"));
bool ok = decimal.TryParse(
    s, NumberStyles.Number, CultureInfo.InvariantCulture, out decimal inv);

Console.WriteLine($"de-DE parse     : {de}");                  // 3.14
Console.WriteLine($"Invariant parse : ok={ok}, value={inv}");   // ',' as thousand sep → 314

// Always specify InvariantCulture for machine formats
decimal price = 1234.56m;
Console.WriteLine($"machine format: {price.ToString(CultureInfo.InvariantCulture)}");

src/Interpolation/Interpolation.csproj

xml
<Project Sdk="Microsoft.NET.Sdk">

  <PropertyGroup>
    <OutputType>Exe</OutputType>
    <TargetFramework>net8.0</TargetFramework>
    <RootNamespace>CodingNow.Lecture.IoEx17</RootNamespace>
    <ImplicitUsings>enable</ImplicitUsings>
    <Nullable>enable</Nullable>
  </PropertyGroup>

</Project>

src/Interpolation/Program.cs

csharp
// Interpolated strings and format specifiers
double pi = 3.14159265;
int score = 87;
DateTime now = new DateTime(2025, 5, 18, 14, 30, 0);

Console.WriteLine($"Pi        : {pi:F2}");          // 2 decimal places
Console.WriteLine($"Score     : {score,5} pts");     // width 5, right-aligned
Console.WriteLine($"Score(L) : {score,-5}|");        // negative → left-aligned
Console.WriteLine($"Date      : {now:yyyy-MM-dd HH:mm}");
Console.WriteLine($"Percent   : {0.873:P1}");        // 1-decimal percent

src/RegexBasics/Program.cs

csharp
// Regex's IsMatch / Match / Matches
using System.Text.RegularExpressions;

string text = "Contact emails: alice@example.com or bob@test.io.";

// 1) presence
bool hasDigit = Regex.IsMatch("order #42", @"\d+");
Console.WriteLine($"Contains digits? {hasDigit}");

// 2) first match
Match first = Regex.Match("price 1500 won", @"\d+");
Console.WriteLine($"First number: {first.Value}");

// 3) extract all emails (simple learning pattern)
foreach (Match m in Regex.Matches(text, @"[\w\.-]+@[\w\.-]+\.\w+"))
{
    Console.WriteLine($"email: {m.Value}");
}

src/RegexBasics/RegexBasics.csproj

xml
<Project Sdk="Microsoft.NET.Sdk">

  <PropertyGroup>
    <OutputType>Exe</OutputType>
    <TargetFramework>net8.0</TargetFramework>
    <RootNamespace>CodingNow.Lecture.IoEx17</RootNamespace>
    <ImplicitUsings>enable</ImplicitUsings>
    <Nullable>enable</Nullable>
  </PropertyGroup>

</Project>

src/StringBuilderUse/Program.cs

csharp
// StringBuilder: strong at repeated concatenation
using System.Text;

var sb = new StringBuilder();
sb.Append("[");
for (int i = 1; i <= 5; i++)
{
    if (i > 1) sb.Append(", ");
    sb.Append(i);
}
sb.Append("]");

Console.WriteLine(sb.ToString());
Console.WriteLine($"Length: {sb.Length}");
// Performance note: with hundreds of iterations or more, this is much faster than +=

src/StringBuilderUse/StringBuilderUse.csproj

xml
<Project Sdk="Microsoft.NET.Sdk">

  <PropertyGroup>
    <OutputType>Exe</OutputType>
    <TargetFramework>net8.0</TargetFramework>
    <RootNamespace>CodingNow.Lecture.IoEx17</RootNamespace>
    <ImplicitUsings>enable</ImplicitUsings>
    <Nullable>enable</Nullable>
  </PropertyGroup>

</Project>

src/StringMethods/Program.cs

csharp
// Common string method roundup
string s = "  Hello, C# World  ";

Console.WriteLine($"Source  : [{s}]");
Console.WriteLine($"Trim    : [{s.Trim()}]");
Console.WriteLine($"Upper   : [{s.Trim().ToUpper()}]");
Console.WriteLine($"Replace : [{s.Trim().Replace("C#", "DotNet")}]");

string[] parts = "apple,banana,cherry".Split(',');
Console.WriteLine($"Split   : {parts.Length} items → {string.Join(" | ", parts)}");

Console.WriteLine($"Contains 'C#'   : {s.Contains("C#")}");
Console.WriteLine($"StartsWith ' ' : {s.StartsWith(" ")}");

src/StringMethods/StringMethods.csproj

xml
<Project Sdk="Microsoft.NET.Sdk">

  <PropertyGroup>
    <OutputType>Exe</OutputType>
    <TargetFramework>net8.0</TargetFramework>
    <RootNamespace>CodingNow.Lecture.IoEx17</RootNamespace>
    <ImplicitUsings>enable</ImplicitUsings>
    <Nullable>enable</Nullable>
  </PropertyGroup>

</Project>

Common Mistakes

  1. Concatenating with `+=` in a loop — switch to `StringBuilder` for many iterations.
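  2. Comparing user input without `Trim`: the comparison fails because of stray leading or trailing whitespace.
  3. Fighting backslash escapes in regex patterns: a regular string needs `"\\d+"` (plain `"\d+"` does not even compile), so prefer the verbatim string `@"\d+"`, which reads exactly like the pattern.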
  4. Ignoring culture in numeric parsing — results vary by environment locale. For data files, use **`InvariantCulture`**.
  5. Using `==` when you need a case-insensitive comparison: `==` compares exactly, so use `string.Equals(a, b, StringComparison.OrdinalIgnoreCase)` instead.
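Mistakes 2 and 5 often show up together when checking an answer typed by a user. A small sketch with an illustrative input string:

```csharp
using System;

string input = "  YES \n";  // illustrative raw text with stray whitespace and different case

bool naive  = input == "yes";  // exact match: fails on both whitespace and case
bool robust = string.Equals(input.Trim(), "yes", StringComparison.OrdinalIgnoreCase);

Console.WriteLine($"naive: {naive}, robust: {robust}");  // naive: False, robust: True
```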

Summary

  • `Trim`/`Split`/`Join`/`Replace` are the everyday tools
  • Compose with interpolated strings `$"..."` + format specifiers
  • For lots of repetitions, use `StringBuilder`
  • Pattern matching: `Regex.IsMatch`/`Match`/`Matches`
  • Specify `CultureInfo.InvariantCulture` when parsing data formats

Practice

**Practice - 17. String Processing**

Problem 1 — Word count and longest word

  • Project folder: `Homework01/`
  • Key concepts: `Split`, iteration, comparison

Requirements

  • Read one line from the user (`Console.ReadLine()`).
  • Split by space and drop empty entries.
  • Print the word count and the longest word (first one wins on ties).

Expected output

text
Enter a sentence: The quick brown fox jumps over the lazy dog
Word count: 9
Longest word: quick

Hints

  • `input.Split(' ', StringSplitOptions.RemoveEmptyEntries)` to drop blanks.
  • Or `Split(new[]{' '}, StringSplitOptions.RemoveEmptyEntries)`.
  • Initialize "longest" with the first word; replace when a longer one appears.

Problem 2 — Extract all emails

  • Project folder: `Homework02/`
  • Key concepts: `Regex.Matches`, verbatim strings

Requirements

  • Find every email in the following body and print one per line:

text
Contacts: alice@example.com, bob.smith@test.io.
For help, contact admin+help@foo.co.kr.

  • A pattern like `[\w\.\+-]+@[\w\.-]+\.\w+` is enough (learning only).

Expected output

text
alice@example.com
bob.smith@test.io
admin+help@foo.co.kr

Hints

  • `using System.Text.RegularExpressions;`
  • `foreach (Match m in Regex.Matches(text, pattern))`
  • The body can be a string constant in code.

Check your answer

Try it yourself, then compare against the [`answer/`](./answer/) folder.

Answer (answer/)

homework/answer/Homework01/Homework01.csproj

xml
<Project Sdk="Microsoft.NET.Sdk">

  <PropertyGroup>
    <OutputType>Exe</OutputType>
    <TargetFramework>net8.0</TargetFramework>
    <RootNamespace>CodingNow.Lecture.IoEx17</RootNamespace>
    <ImplicitUsings>enable</ImplicitUsings>
    <Nullable>enable</Nullable>
  </PropertyGroup>

</Project>

homework/answer/Homework01/Program.cs

csharp
// Word count and longest word
Console.Write("Enter a sentence: ");
string input = Console.ReadLine() ?? "";

string[] words = input.Split(' ', StringSplitOptions.RemoveEmptyEntries);

if (words.Length == 0)
{
    Console.WriteLine("Input is empty.");
    return;
}

string longest = words[0];
foreach (var w in words)
{
    if (w.Length > longest.Length)
        longest = w;
}

Console.WriteLine($"Word count: {words.Length}");
Console.WriteLine($"Longest word: {longest}");

homework/answer/Homework02/Homework02.csproj

xml
<Project Sdk="Microsoft.NET.Sdk">

  <PropertyGroup>
    <OutputType>Exe</OutputType>
    <TargetFramework>net8.0</TargetFramework>
    <RootNamespace>CodingNow.Lecture.IoEx17</RootNamespace>
    <ImplicitUsings>enable</ImplicitUsings>
    <Nullable>enable</Nullable>
  </PropertyGroup>

</Project>

homework/answer/Homework02/Program.cs

csharp
// Extract every email from a body of text
using System.Text.RegularExpressions;

string text = "Contacts: alice@example.com, bob.smith@test.io.\n"
            + "For help, contact admin+help@foo.co.kr.";

string pattern = @"[\w\.\+-]+@[\w\.-]+\.\w+";

foreach (Match m in Regex.Matches(text, pattern))
{
    Console.WriteLine(m.Value);
}

Try It Yourself

bash
cd src/StringMethods && dotnet run
cd ../Interpolation && dotnet run
cd ../StringBuilderUse && dotnet run
cd ../RegexBasics && dotnet run
cd ../Culture && dotnet run

Next Lecture

[18_Delegate_and_Lambda](../../05_%EB%AA%A8%EB%8D%98_CSharp/18_%EB%8D%B8%EB%A6%AC%EA%B2%8C%EC%9D%B4%ED%8A%B8%EC%99%80_%EB%9E%8C%EB%8B%A4/) — Learn how to treat functions themselves as variables.

Example code / lecture materials

All lecture materials and example code are openly available on GitHub.
