17. String Processing
Strings are the data type you will handle most often. This lecture collects the everyday tools: interpolation, `Split`, `Join`, `Replace`, `Substring`, `Trim`, `StringBuilder`, and `string.Format`.
What you'll learn
- Know core methods such as `Trim`/`Split`/`Join`/`Replace`/`ToUpper`/`Contains`/`StartsWith`
- Use interpolated strings `$"..."` and format specifiers (`F2`, `yyyy-MM-dd`, ...)
- Know that `StringBuilder` is faster for repeated concatenation
- Use the basics of `Regex.IsMatch`/`Match`/`Matches`
- Understand that parsing results can change with `CultureInfo`
Overview
Cleaning up text you've read is a must in almost every program. C# `string` has many methods, `StringBuilder` handles repeated concatenation, and `Regex` does pattern matching. We'll also call out the pitfall that **culture** affects number/date parsing.
Core Concepts
1) Common `string` methods
" hello ".Trim(); // "hello"
"a,b,c".Split(','); // ["a", "b", "c"]
string.Join("-", ["a","b"]); // "a-b"
"abc".Replace("b", "X"); // "aXc"
"hi".ToUpper(); // "HI"
"hello".Contains("ll"); // true
"hello".StartsWith("he"); // true
`string` is immutable (lecture 02), so every method **returns a new string** and leaves the original unchanged.
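The intro also names `Substring` and `string.Format`, which the list above does not show. A minimal sketch, assuming a hypothetical `path` value: `IndexOf` finds a position, `Substring` cuts around it, and `string.Format` fills numbered placeholders the way `$"..."` fills named expressions.

```csharp
using System;

string path = "docs/report.txt";              // hypothetical sample input
int slash = path.IndexOf('/');                // index of the first '/' (4)
string folder = path.Substring(0, slash);     // start 0, length 4 -> "docs"
string file = path.Substring(slash + 1);      // from index 5 to the end -> "report.txt"

// string.Format fills {0}, {1}, ... in argument order; $"..." is the modern equivalent
string line = string.Format("{0} is in {1}", file, folder);
string same = $"{file} is in {folder}";

Console.WriteLine(line);          // report.txt is in docs
Console.WriteLine(line == same);  // True
```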
2) Interpolated strings `$"..."`
The cleanest way to embed variables and expressions.
double pi = 3.14159;
DateTime now = DateTime.Now;
Console.WriteLine($"Pi is about {pi:F2}"); // 2 decimal places
Console.WriteLine($"Today is {now:yyyy-MM-dd}");
Console.WriteLine($"Aligned: |{42,5}|"); // width 5, right-aligned
Format specifiers follow the colon: `F2` means 2 decimal places, and `yyyy-MM-dd` is the ISO 8601 date format. A number after a comma sets the field width.
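The same `,width:format` syntax also works in `string.Format`, the older API listed in the intro. A small sketch, assuming .NET 6 or later for `string.Create` with a format provider: it shows that both forms produce the same text, and how to pin the culture when the output must not depend on the machine.

```csharp
using System;
using System.Globalization;

double pi = 3.14159;

// The ",width:format" syntax is shared by $"..." and string.Format
string interpolated = $"[{pi,8:F2}]";
string formatted = string.Format("[{0,8:F2}]", pi);
Console.WriteLine(interpolated == formatted); // True (both use the current culture)

// Pin the culture when the text must look the same on every machine (.NET 6+)
string fixedText = string.Create(CultureInfo.InvariantCulture, $"[{pi,8:F2}]");
Console.WriteLine(fixedText); // [    3.14]
```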
3) `StringBuilder` — for repeated concatenation
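Because `string` is immutable, concatenating with `+=` in a loop allocates a new string and copies every character built so far on each iteration, so the total work grows quadratically with the number of iterations. `StringBuilder` appends into an internal, growable buffer and creates the final string only when you call `ToString()`.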
using System.Text;
var sb = new StringBuilder();
for (int i = 0; i < 1000; i++)
    sb.Append(i).Append(',');
string result = sb.ToString();
For a handful of concatenations `+` is fine; for hundreds or thousands, use `StringBuilder`.
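To see the difference rather than take it on faith, here is a rough sketch that builds the same text both ways and times each with `Stopwatch`. The iteration count `N` is an arbitrary choice for illustration, and the millisecond values depend on the machine and runtime, so treat them as a trend, not a benchmark.

```csharp
using System;
using System.Diagnostics;
using System.Text;

const int N = 10_000;   // iteration count chosen for illustration

var sw = Stopwatch.StartNew();
string slow = "";
for (int i = 0; i < N; i++)
    slow += i + ",";            // builds a brand-new string on every pass
long plusMs = sw.ElapsedMilliseconds;

sw.Restart();
var sb = new StringBuilder();
for (int i = 0; i < N; i++)
    sb.Append(i).Append(',');   // appends into the builder's buffer
string fast = sb.ToString();    // one final string
long builderMs = sw.ElapsedMilliseconds;

Console.WriteLine($"+=            : {plusMs} ms");
Console.WriteLine($"StringBuilder : {builderMs} ms");
Console.WriteLine($"Same text     : {slow == fast}"); // True
```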
4) `Regex` — regular expressions
Lives in `System.Text.RegularExpressions`.
using System.Text.RegularExpressions;
Regex.IsMatch("hello123", @"\d+"); // true (any digits?)
var m = Regex.Match("order #4242", @"\d+");
if (m.Success) Console.WriteLine(m.Value); // "4242"
foreach (Match x in Regex.Matches("a1 b22 c333", @"\d+"))
    Console.WriteLine(x.Value); // 1, 22, 333
`@"..."` is a verbatim string: backslashes are kept as-is instead of starting escape sequences, which makes it a good fit for regex patterns.
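Two details from the snippet above are easy to miss: the verbatim and escaped spellings of a pattern are the same string, and `IsMatch` with `\d+` only asks whether digits appear *somewhere*. A short sketch using anchors (`^` for start, `$` for end) to require that the whole input is digits:

```csharp
using System;
using System.Text.RegularExpressions;

// The same pattern written two ways: a verbatim string keeps the backslash as-is
string verbatim = @"^\d+$";
string escaped = "^\\d+$";
Console.WriteLine(verbatim == escaped); // True

// \d+ looks for digits anywhere; ^ and $ anchor the match to the start and end of the input
bool anyDigits = Regex.IsMatch("hello123", @"\d+");   // True
bool allDigits = Regex.IsMatch("hello123", @"^\d+$"); // False
bool onlyNumber = Regex.IsMatch("4242", @"^\d+$");    // True
Console.WriteLine($"{anyDigits} {allDigits} {onlyNumber}");
```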
5) Culture pitfall
Parsing numbers and dates is affected by the **system culture**. For example, Germany uses `,` as the decimal separator.
using System.Globalization;
// Korea/UK etc.: "3.14" is 3.14
// Germany (de-DE): "3.14" is parsed as 314, and "3,14" is 3.14
decimal de = decimal.Parse("3,14", CultureInfo.GetCultureInfo("de-DE"));
decimal inv = decimal.Parse("3.14", CultureInfo.InvariantCulture);
For **machine-readable formats** (config, CSV, logs), always specify `CultureInfo.InvariantCulture`. Use the system culture only for UI display.
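The snippet above covers numbers; the same rule applies to dates. A minimal sketch (the value and date are hypothetical): format and parse a number with the same invariant culture so it round-trips, and parse a date with an explicit pattern via `DateTime.ParseExact` so the day/month order never depends on the locale.

```csharp
using System;
using System.Globalization;

// Numbers: write and read back with the same culture-independent format
decimal price = 1234.56m;
string text = price.ToString(CultureInfo.InvariantCulture);  // "1234.56" on every machine
bool ok = decimal.TryParse(text, NumberStyles.Number, CultureInfo.InvariantCulture, out decimal back);
Console.WriteLine($"{text} -> ok={ok}, same value={back == price}");

// Dates: an explicit pattern avoids locale-dependent day/month order
DateTime day = DateTime.ParseExact("2025-05-18", "yyyy-MM-dd", CultureInfo.InvariantCulture);
Console.WriteLine(day.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture)); // 2025-05-18
```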
Examples
Example 1 — `StringMethods`: everyday methods
string s = " Hello, C# World ";
Console.WriteLine($"Source : [{s}]");
Console.WriteLine($"Trim : [{s.Trim()}]");
Console.WriteLine($"Upper : [{s.Trim().ToUpper()}]");
Console.WriteLine($"Replace : [{s.Trim().Replace("C#", "DotNet")}]");
string[] parts = "apple,banana,cherry".Split(',');
Console.WriteLine($"Split : {parts.Length} items → {string.Join(" | ", parts)}");
Console.WriteLine($"Contains 'C#' : {s.Contains("C#")}");
Console.WriteLine($"StartsWith ' ' : {s.StartsWith(" ")}");
**Output**
Source : [ Hello, C# World ]
Trim : [Hello, C# World]
Upper : [HELLO, C# WORLD]
Replace : [Hello, DotNet World]
Split : 3 items → apple | banana | cherry
Contains 'C#' : True
StartsWith ' ' : True
**Note:** Chain method calls to compose transformations, e.g. `s.Trim().ToUpper()`.
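Two overloads often save a chain of calls. A sketch, assuming .NET 5 or later for `StringSplitOptions.TrimEntries`: `Split` can drop empty pieces and trim each piece in one step, and `Contains` accepts a `StringComparison` so you do not need `ToUpper()` or `ToLower()` just to ignore case.

```csharp
using System;

string csv = " apple , banana,, cherry ";

// RemoveEmptyEntries drops empty pieces; TrimEntries (.NET 5+) trims each piece
string[] items = csv.Split(',', StringSplitOptions.RemoveEmptyEntries | StringSplitOptions.TrimEntries);
Console.WriteLine(string.Join("|", items)); // apple|banana|cherry

// Contains with a StringComparison ignores case without creating an upper/lower copy
bool hasCSharp = " Hello, C# World ".Contains("c#", StringComparison.OrdinalIgnoreCase);
Console.WriteLine(hasCSharp); // True
```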
Example 2 — `Interpolation`: interpolation and formatting
double pi = 3.14159265;
int score = 87;
DateTime now = new DateTime(2025, 5, 18, 14, 30, 0);
Console.WriteLine($"Pi : {pi:F2}");
Console.WriteLine($"Score : {score,5} pts"); // width 5, right-aligned
Console.WriteLine($"Score(L) : {score,-5}|"); // negative width → left-aligned
Console.WriteLine($"Date : {now:yyyy-MM-dd HH:mm}");
Console.WriteLine($"Percent : {0.873:P1}"); // 1-decimal percent
**Output**
Pi : 3.14
Score :    87 pts
Score(L) : 87   |
Date : 2025-05-18 14:30
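Percent : 87.3%
**Note:** `,N` sets the width and `:F2` sets the format; combine them as `{value,10:F2}`. Numeric output such as `F2` and `P1` follows the current culture, so the exact text can differ on other machines.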
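Width and format together are what make plain-text tables line up. A small sketch with hypothetical rows, pinning the culture with `string.Create` (.NET 6+) so the decimal point is always `.`:

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical sample rows
var rows = new (string Name, double Price)[] { ("Apple", 1.5), ("Banana", 12.25), ("Cherry", 100.0) };
var lines = new List<string>();

foreach (var (name, price) in rows)
{
    // {name,-8}: left-aligned in 8 chars; {price,10:F2}: right-aligned in 10 chars, 2 decimals
    lines.Add(string.Create(CultureInfo.InvariantCulture, $"{name,-8}|{price,10:F2}|"));
}

foreach (string line in lines)
    Console.WriteLine(line);
// Apple   |      1.50|
// Banana  |     12.25|
// Cherry  |    100.00|
```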
Example 3 — `StringBuilderUse`: repeated-concatenation performance
using System.Text;
var sb = new StringBuilder();
sb.Append("[");
for (int i = 1; i <= 5; i++)
{
    if (i > 1) sb.Append(", ");
    sb.Append(i);
}
sb.Append("]");
Console.WriteLine(sb.ToString());
Console.WriteLine($"Length: {sb.Length}");
// Performance note: for hundreds of iterations or more, this is dramatically faster than +=
**Output**
[1, 2, 3, 4, 5]
Length: 15
**Note:** A few concatenations are fine with `+`; the more iterations, the more `StringBuilder` pays off.
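For the specific "items separated by commas" pattern in Example 3, the manual `if (i > 1)` check can be delegated. A sketch of two alternatives that produce the same `[1, 2, 3, 4, 5]`: `string.Join` when you only need the joined text, and `StringBuilder.AppendJoin` when you are already filling a builder.

```csharp
using System;
using System.Linq;
using System.Text;

// string.Join inserts the separator only between items, so no "if (i > 1)" check is needed
string joined = "[" + string.Join(", ", Enumerable.Range(1, 5)) + "]";

// StringBuilder.AppendJoin does the same inside a builder you are already filling
var sb = new StringBuilder("[");
sb.AppendJoin(", ", Enumerable.Range(1, 5)).Append(']');

Console.WriteLine(joined);                   // [1, 2, 3, 4, 5]
Console.WriteLine(sb.ToString() == joined);  // True
```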
Example 4 — `RegexBasics`: regex basics
using System.Text.RegularExpressions;
string text = "Contact emails: alice@example.com or bob@test.io.";
// 1) Match presence
bool hasDigit = Regex.IsMatch("order #42", @"\d+");
Console.WriteLine($"Contains digits? {hasDigit}");
// 2) First match
Match first = Regex.Match("price 1500 won", @"\d+");
Console.WriteLine($"First number: {first.Value}");
// 3) Extract all emails
foreach (Match m in Regex.Matches(text, @"[\w\.-]+@[\w\.-]+\.\w+"))
{
    Console.WriteLine($"email: {m.Value}");
}
**Output**
Contains digits? True
First number: 1500
email: alice@example.com
email: bob@test.io
**Note:** Real-world email validation is far more complex; this is a simplified pattern for learning only.
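`Match` can also return the *parts* of each hit, not just the whole value. A sketch that wraps the same simple learning pattern in named capture groups, `(?<user>...)` and `(?<domain>...)`, and reads them through `Groups`; the group names are arbitrary choices for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

string text = "Contact emails: alice@example.com or bob@test.io.";

// (?<user>...) and (?<domain>...) are named capture groups around the same simple pattern
var email = new Regex(@"(?<user>[\w\.-]+)@(?<domain>[\w\.-]+\.\w+)");
var domains = new List<string>();

foreach (Match m in email.Matches(text))
{
    Console.WriteLine($"user={m.Groups["user"].Value}, domain={m.Groups["domain"].Value}");
    domains.Add(m.Groups["domain"].Value);
}
// user=alice, domain=example.com
// user=bob, domain=test.io
```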
Example 5 — `Culture`: same string, different result
using System.Globalization;
string s = "3,14";
decimal de = decimal.Parse(s, CultureInfo.GetCultureInfo("de-DE"));
decimal inv;
bool ok = decimal.TryParse(s, NumberStyles.Number, CultureInfo.InvariantCulture, out inv);
Console.WriteLine($"de-DE parse : {de}"); // 3.14
Console.WriteLine($"Invariant parse : ok={ok}, value={inv}"); // takes ',' as thousand separator → 314
// Safe machine format
decimal price = 1234.56m;
Console.WriteLine($"machine format: {price.ToString(CultureInfo.InvariantCulture)}");
**Output**
de-DE parse : 3.14
Invariant parse : ok=True, value=314
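machine format: 1234.56
**Note:** Always read and write CSV and config files with `InvariantCulture`, and use the user's culture only for UI display. The output above assumes an English-style current culture: `{de}` is itself formatted with the current culture, so on a de-DE machine the first line would print `3,14`.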
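Putting the advice to work on a CSV line: a sketch with hypothetical values that writes numbers with `InvariantCulture` (so `,` stays free to be the field separator) and reads them back with the same culture.

```csharp
using System;
using System.Globalization;
using System.Linq;

double[] values = { 3.14, 1234.5, -0.25 };   // hypothetical data

// Write: InvariantCulture always uses '.', so ',' is safe as the field separator
string line = string.Join(",", values.Select(v => v.ToString(CultureInfo.InvariantCulture)));
Console.WriteLine(line); // 3.14,1234.5,-0.25

// Read: split the fields and parse each one with the same culture
double[] back = line.Split(',')
    .Select(field => double.Parse(field, CultureInfo.InvariantCulture))
    .ToArray();
Console.WriteLine(back.SequenceEqual(values)); // True
```

With a de-DE culture on the writing side, `3.14` would come out as `3,14` and collide with the separator, which is exactly the bug this rule prevents.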
Full example code (src/)
src/Culture/Culture.csproj
<Project Sdk="Microsoft.NET.Sdk">
<PropertyGroup>
<OutputType>Exe</OutputType>
<TargetFramework>net8.0</TargetFramework>
<RootNamespace>CodingNow.Lecture.IoEx17</RootNamespace>
<ImplicitUsings>enable</ImplicitUsings>
<Nullable>enable</Nullable>
</PropertyGroup>
</Project>
src/Culture/Program.cs
// Same string can be parsed differently per culture
using System.Globalization;
string s = "3,14";
decimal de = decimal.Parse(s, CultureInfo.GetCultureInfo("de-DE"));
bool ok = decimal.TryParse(
s, NumberStyles.Number, CultureInfo.InvariantCulture, out decimal inv);
Console.WriteLine($"de-DE parse : {de}"); // 3.14
Console.WriteLine($"Invariant parse : ok={ok}, value={inv}"); // ',' as thousand sep → 314
// Always specify InvariantCulture for machine formats
decimal price = 1234.56m;
Console.WriteLine($"machine format: {price.ToString(CultureInfo.InvariantCulture)}");
src/Interpolation/Interpolation.csproj
<Project Sdk="Microsoft.NET.Sdk">
<PropertyGroup>
<OutputType>Exe</OutputType>
<TargetFramework>net8.0</TargetFramework>
<RootNamespace>CodingNow.Lecture.IoEx17</RootNamespace>
<ImplicitUsings>enable</ImplicitUsings>
<Nullable>enable</Nullable>
</PropertyGroup>
</Project>
src/Interpolation/Program.cs
// Interpolated strings and format specifiers
double pi = 3.14159265;
int score = 87;
DateTime now = new DateTime(2025, 5, 18, 14, 30, 0);
Console.WriteLine($"Pi : {pi:F2}"); // 2 decimal places
Console.WriteLine($"Score : {score,5} pts"); // width 5, right-aligned
Console.WriteLine($"Score(L) : {score,-5}|"); // negative → left-aligned
Console.WriteLine($"Date : {now:yyyy-MM-dd HH:mm}");
Console.WriteLine($"Percent : {0.873:P1}"); // 1-decimal percent
src/RegexBasics/Program.cs
// Regex's IsMatch / Match / Matches
using System.Text.RegularExpressions;
string text = "Contact emails: alice@example.com or bob@test.io.";
// 1) presence
bool hasDigit = Regex.IsMatch("order #42", @"\d+");
Console.WriteLine($"Contains digits? {hasDigit}");
// 2) first match
Match first = Regex.Match("price 1500 won", @"\d+");
Console.WriteLine($"First number: {first.Value}");
// 3) extract all emails (simple learning pattern)
foreach (Match m in Regex.Matches(text, @"[\w\.-]+@[\w\.-]+\.\w+"))
{
    Console.WriteLine($"email: {m.Value}");
}
src/RegexBasics/RegexBasics.csproj
<Project Sdk="Microsoft.NET.Sdk">
<PropertyGroup>
<OutputType>Exe</OutputType>
<TargetFramework>net8.0</TargetFramework>
<RootNamespace>CodingNow.Lecture.IoEx17</RootNamespace>
<ImplicitUsings>enable</ImplicitUsings>
<Nullable>enable</Nullable>
</PropertyGroup>
</Project>
src/StringBuilderUse/Program.cs
// StringBuilder: strong at repeated concatenation
using System.Text;
var sb = new StringBuilder();
sb.Append("[");
for (int i = 1; i <= 5; i++)
{
    if (i > 1) sb.Append(", ");
    sb.Append(i);
}
sb.Append("]");
Console.WriteLine(sb.ToString());
Console.WriteLine($"Length: {sb.Length}");
// Performance note: for hundreds of iterations or more, this is much faster than +=
src/StringBuilderUse/StringBuilderUse.csproj
<Project Sdk="Microsoft.NET.Sdk">
<PropertyGroup>
<OutputType>Exe</OutputType>
<TargetFramework>net8.0</TargetFramework>
<RootNamespace>CodingNow.Lecture.IoEx17</RootNamespace>
<ImplicitUsings>enable</ImplicitUsings>
<Nullable>enable</Nullable>
</PropertyGroup>
</Project>
src/StringMethods/Program.cs
// Common string method roundup
string s = " Hello, C# World ";
Console.WriteLine($"Source : [{s}]");
Console.WriteLine($"Trim : [{s.Trim()}]");
Console.WriteLine($"Upper : [{s.Trim().ToUpper()}]");
Console.WriteLine($"Replace : [{s.Trim().Replace("C#", "DotNet")}]");
string[] parts = "apple,banana,cherry".Split(',');
Console.WriteLine($"Split : {parts.Length} items → {string.Join(" | ", parts)}");
Console.WriteLine($"Contains 'C#' : {s.Contains("C#")}");
Console.WriteLine($"StartsWith ' ' : {s.StartsWith(" ")}");
src/StringMethods/StringMethods.csproj
<Project Sdk="Microsoft.NET.Sdk">
<PropertyGroup>
<OutputType>Exe</OutputType>
<TargetFramework>net8.0</TargetFramework>
<RootNamespace>CodingNow.Lecture.IoEx17</RootNamespace>
<ImplicitUsings>enable</ImplicitUsings>
<Nullable>enable</Nullable>
</PropertyGroup>
</Project>
Common Mistakes
- Concatenating with `+=` in a loop — switch to `StringBuilder` for many iterations.
- Comparing user input without `Trim` — fails because of stray whitespace.
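- Escaping mix-ups in regex patterns: in a regular string, `"\d"` is a compile error (unrecognized escape sequence) and must be written `"\\d"`, while `@"\\d"` in a verbatim string matches a literal backslash followed by `d`. Prefer the verbatim string `@"\d+"`.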
- Ignoring culture in numeric parsing — results vary by environment locale. For data files, use **`InvariantCulture`**.
- Using `==` when you need a case-insensitive comparison: `==` is case-sensitive, so `"Yes" == "yes"` is false. Use `string.Equals(a, b, StringComparison.OrdinalIgnoreCase)` instead.
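Two of the mistakes above often show up together when checking user input. A minimal sketch with a hypothetical input value:

```csharp
using System;

string input = "  YES ";   // hypothetical user input with stray spaces and different case

bool naive = input == "yes";                                                          // False
bool robust = string.Equals(input.Trim(), "yes", StringComparison.OrdinalIgnoreCase); // True

Console.WriteLine($"naive={naive}, robust={robust}");
```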
Summary
- `Trim`/`Split`/`Join`/`Replace` are the everyday tools
- Compose with interpolated strings `$"..."` + format specifiers
- For lots of repetitions, use `StringBuilder`
- Pattern matching: `Regex.IsMatch`/`Match`/`Matches`
- Specify `CultureInfo.InvariantCulture` when parsing data formats
Practice
**Practice - 17. String Processing**
Problem 1 — Word count and longest word
- Project folder: `Homework01/`
- Key concepts: `Split`, iteration, comparison
Requirements
- Read one line from the user (`Console.ReadLine()`).
- Split by space and drop empty entries.
- Print the word count and the longest word (first one wins on ties).
Expected output
Enter a sentence: The quick brown fox jumps over the lazy dog
Word count: 9
Longest word: quick
Hints
- `input.Split(' ', StringSplitOptions.RemoveEmptyEntries)` to drop blanks.
- Or `Split(new[]{' '}, StringSplitOptions.RemoveEmptyEntries)`.
- Initialize "longest" with the first word; replace when a longer one appears.
Problem 2 — Extract all emails
- Project folder: `Homework02/`
- Key concepts: `Regex.Matches`, verbatim strings
Requirements
- Find every email in the following body and print one per line:
```
Contacts: alice@example.com, bob.smith@test.io.
For help, contact admin+help@foo.co.kr.
```
- A pattern like `[\w\.\+-]+@[\w\.-]+\.\w+` is enough (learning only).
Expected output
alice@example.com
bob.smith@test.io
admin+help@foo.co.kr
Hints
- `using System.Text.RegularExpressions;`
- `foreach (Match m in Regex.Matches(text, pattern))`
- The body can be a string constant in code.
Check your answer
Try it yourself, then compare against the [`answer/`](./answer/) folder.
Answer (answer/)
homework/answer/Homework01/Homework01.csproj
<Project Sdk="Microsoft.NET.Sdk">
<PropertyGroup>
<OutputType>Exe</OutputType>
<TargetFramework>net8.0</TargetFramework>
<RootNamespace>CodingNow.Lecture.IoEx17</RootNamespace>
<ImplicitUsings>enable</ImplicitUsings>
<Nullable>enable</Nullable>
</PropertyGroup>
</Project>
homework/answer/Homework01/Program.cs
// Word count and longest word
Console.Write("Enter a sentence: ");
string input = Console.ReadLine() ?? "";
string[] words = input.Split(' ', StringSplitOptions.RemoveEmptyEntries);
if (words.Length == 0)
{
    Console.WriteLine("Input is empty.");
    return;
}
string longest = words[0];
foreach (var w in words)
{
    if (w.Length > longest.Length)
        longest = w;
}
Console.WriteLine($"Word count: {words.Length}");
Console.WriteLine($"Longest word: {longest}");
homework/answer/Homework02/Homework02.csproj
<Project Sdk="Microsoft.NET.Sdk">
<PropertyGroup>
<OutputType>Exe</OutputType>
<TargetFramework>net8.0</TargetFramework>
<RootNamespace>CodingNow.Lecture.IoEx17</RootNamespace>
<ImplicitUsings>enable</ImplicitUsings>
<Nullable>enable</Nullable>
</PropertyGroup>
</Project>
homework/answer/Homework02/Program.cs
// Extract every email from a body of text
using System.Text.RegularExpressions;
string text = "Contacts: alice@example.com, bob.smith@test.io.\n"
+ "For help, contact admin+help@foo.co.kr.";
string pattern = @"[\w\.\+-]+@[\w\.-]+\.\w+";
foreach (Match m in Regex.Matches(text, pattern))
{
    Console.WriteLine(m.Value);
}
Try It Yourself
cd src/StringMethods && dotnet run
cd ../Interpolation && dotnet run
cd ../StringBuilderUse && dotnet run
cd ../RegexBasics && dotnet run
cd ../Culture && dotnet run
Next Lecture
[18_Delegate_and_Lambda](../../05_%EB%AA%A8%EB%8D%98_CSharp/18_%EB%8D%B8%EB%A6%AC%EA%B2%8C%EC%9D%B4%ED%8A%B8%EC%99%80_%EB%9E%8C%EB%8B%A4/) — Learn how to treat functions themselves as variables.
All lecture materials and example code are openly available on GitHub.