7. Working with strings

7.1 Case insensitive matching

The I flag (or its lowercase form i) of the substitution command makes the match case insensitive. Here the pattern is written in lowercase, yet the s command still finds "Victor Hugo" and replaces it with the uppercase form.

sed  -n 's/victor hugo/VICTOR HUGO/pi' books.log

The same command without the i flag becomes case sensitive and will give no output.
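
To check the difference without needing books.log, here is a minimal sketch assuming GNU sed. The sample line is made up for illustration, and the variable names are ours.

```shell
# Hypothetical one-line sample standing in for books.log (not the real file).
line='1. Les Miserables | Victor Hugo | Format: Poche | Pages: 1232'

# With the I flag the lowercase pattern still matches "Victor Hugo".
with_flag=$(printf '%s\n' "$line" | sed -n 's/victor hugo/VICTOR HUGO/Ip')

# Without it the match is case sensitive, so nothing is printed.
without_flag=$(printf '%s\n' "$line" | sed -n 's/victor hugo/VICTOR HUGO/p')

printf 'with I:    [%s]\n' "$with_flag"
printf 'without I: [%s]\n' "$without_flag"
```

The second result is empty because -n suppresses the default output and p only fires when a substitution happened.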

7.2 escape sequences in the replacement strings

To complement the I flag, GNU sed (not necessarily available on other sed versions) provides escape sequences to change the case of replacement strings.

escape sequence Description
\L converts all the following characters to lowercase (overridden by \U or \E)
\U converts all the following characters to uppercase (overridden by \L or \E)
\l converts the next character to lowercase
\u converts the next character to uppercase
\E stops the case conversion (end)


* \L : converts all the characters that follow \L in the replacement to lowercase (until \U or \E)

sed -n 's/Amélie Nothomb/\LAMELIE Nothomb/p' books.log

7. Stupeur et tremblements | amelie nothomb | Format: Poche | Pages: 160

* \U : converts all the characters that follow \U in the replacement to uppercase (until \L or \E)

sed -n 's/Amélie Nothomb/A\UMELIE Nothomb/p' books.log

7. Stupeur et tremblements | AMELIE NOTHOMB | Format: Poche | Pages: 160

* \l : converts only the character immediately after \l to lowercase

sed -n 's/Amélie Nothomb/\lAMELIE Nothomb/p' books.log

7. Stupeur et tremblements | aMELIE Nothomb | Format: Poche | Pages: 160

* \u : converts only the character immediately after \u to uppercase

sed -n 's/Amélie Nothomb/A\umélie Nothomb/p' books.log

7. Stupeur et tremblements | AMélie Nothomb | Format: Poche | Pages: 160

* \E : stop case conversion (2 examples)

sed -n -E 's/(Amélie Nothomb)/\U\1\E/p' books.log

7. Stupeur et tremblements | AMÉLIE NOTHOMB | Format: Poche | Pages: 160
  1. -E : Enables Extended Regular Expressions (ERE), so parentheses and + can be used without backslashes.
  2. (Amélie Nothomb) : Captures the author’s name as group 1.
  3. \U : Starts uppercase conversion. All text that follows will be uppercased.
  4. \1 : Inserts the text matched by group 1 — i.e. Amélie Nothomb.
  5. \E : Stops the uppercase conversion so the rest of the line remains unchanged.

The example above converts one specific author to uppercase. The second example applies to every line and converts all the authors in the file to uppercase:

sed -E 's/\| ([^|]+) \|/\| \U\1\E \|/' books.log

1. Les Misérables | VICTOR HUGO | Format: Poche | Pages: 1232
2. Madame Bovary | GUSTAVE FLAUBERT | Format: Poche | Pages: 464
3. Le Père Goriot | HONORÉ DE BALZAC | Format: Poche | Pages: 368
4. Oscar et la dame rose | ÉRIC-EMMANUEL SCHMITT | Format: Poche | Pages: 112
5. Monsieur Ibrahim et les fleurs du Coran | ÉRIC-EMMANUEL SCHMITT | Format: Poche | Pages: 96
6. L'Étranger | ALBERT CAMUS | Format: Poche | Pages: 186
7. Stupeur et tremblements | AMÉLIE NOTHOMB | Format: Poche | Pages: 160
8. Vernon Subutex 1 | VIRGINIE DESPENTES | Format: Poche | Pages: 352
9. Couleur du temps | LEÏLA SLIMANI | Format: Poche | Pages: 224
10. La vie est facile, ne t'inquiète pas | AGNÈS MARTIN-LUGAND | Format: Poche | Pages: 288
11. book book title | AUTHOR AUTHOR | size: pocket standard | Pages: 1000 500 500 500

Part one of the command : the search pattern /\| ([^|]+) \|/

  1. \| : matches a literal pipe (escaped, because | means OR in ERE)
  2. (space) : matches a space character after the pipe
  3. ( ) : capture parentheses. They define a group whose content is stored in memory and can then be accessed using \1, \2, etc. Here, everything between ( and ) is stored as group 1.
  4. [^|] : any character except |
  5. + : one or more times
  4+5. [^|]+ : takes all consecutive characters until a pipe (|) is encountered. We cannot use .* here because it is greedy: it takes as many characters as possible, so it would run past the next pipe to the last one on the line. So as soon as a file contains fixed separators (| : , ; etc.) we use [^SEP]+ to cleanly capture a field between two separators.
  6. (space) : matches a space character before the next pipe
  7. \| : matches the second pipe
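
To see why [^|]+ is preferred over .* for a field between separators, here is a small sketch assuming GNU sed. The sample line and the [ ] markers in the replacement are made up for illustration.

```shell
# Hypothetical pipe-separated sample line.
line='a | b | c | d'

# Greedy .* runs to the last " |" it can reach, swallowing two fields.
greedy=$(printf '%s\n' "$line" | sed -E 's/\| (.*) \|/[\1]/')

# [^|]+ cannot cross a pipe, so it captures exactly one field.
bounded=$(printf '%s\n' "$line" | sed -E 's/\| ([^|]+) \|/[\1]/')

printf 'greedy : %s\n' "$greedy"    # a [b | c] d
printf 'bounded: %s\n' "$bounded"   # a [b] c | d
```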

Part two of the command : the replacement part \| \U\1\E \|

  1. \| : prints the literal pipe
  2. (space) : prints a space character after the pipe
  3. \U : starts uppercase conversion; everything after it is uppercased
  4. \1 : refers to the text from capturing group 1
  5. \E : ends uppercase conversion; the following text keeps its original case
  6. (space) : prints a space again
  7. \| : prints the closing pipe

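
To see what \E actually changes, we can run the same substitution with and without it. This is a sketch assuming GNU sed; the sample line is made up, and the second group (.*) is added only to carry the rest of the line into the replacement.

```shell
# Hypothetical sample line (ASCII only, to stay independent of the locale).
line='7. Stupeur | amelie nothomb | Format: Poche'

# \E stops the conversion, so group 2 keeps its original case.
with_end=$(printf '%s\n' "$line" | sed -E 's/(amelie nothomb)(.*)/\U\1\E\2/')

# Without \E the uppercase conversion also applies to group 2.
without_end=$(printf '%s\n' "$line" | sed -E 's/(amelie nothomb)(.*)/\U\1\2/')

printf '%s\n' "$with_end"      # 7. Stupeur | AMELIE NOTHOMB | Format: Poche
printf '%s\n' "$without_end"   # 7. Stupeur | AMELIE NOTHOMB | FORMAT: POCHE
```

The text before the match is never affected, since case conversion only applies to what the replacement produces.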
7.3 useful examples
  • change all characters to lowercase
sed -n '7 s/.*/\L&/p' books.log

7. stupeur et tremblements | amélie nothomb | format: poche | pages: 160
  • change all characters to uppercase
sed -n '7 s/.*/\U&/p' books.log

7. STUPEUR ET TREMBLEMENTS | AMÉLIE NOTHOMB | FORMAT: POCHE | PAGES: 160
  • change the first character of each word to uppercase
sed -n '7 s/\b\w/\u&/pg' books.log

7. Stupeur Et Tremblements | Amélie Nothomb | Format: Poche | Pages: 160
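
The last command only raises the first letter and leaves the rest of each word as it was. If the input has mixed case, \U and \L can be combined to normalize it. A minimal sketch, assuming GNU sed and a made-up input string:

```shell
# Hypothetical mixed-case input.
name='vICTOR hUGO'

# Group 1 is the first letter of a word, group 2 the rest of the word.
# \U uppercases group 1, then \L takes over and lowercases group 2.
title=$(printf '%s\n' "$name" | sed -E 's/(\w)(\w*)/\U\1\L\2/g')

printf '%s\n' "$title"   # Victor Hugo
```

With the g flag each word is handled as a separate match, and the case conversion does not carry over from one match to the next.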


8. Flags (Modifiers)

Flags, some of which we have already seen, control aspects of a command such as case sensitivity or global matching, much as options modify the behavior of shell commands. In sed they are added after the closing delimiter (e.g., s/pattern/replacement/flags), and for now we'll delve into a couple of them.

8.1 write to file

In 4.2 we saw w used on its own, where it writes the lines it is applied to into a file regardless of any substitution. When w is used as a flag of the s command, it writes a line to the file only if the replacement actually happened on that line.

sed -E 's/TND/USD/w modified.log' bank.csv

cat modified.log

withdrawal,15-08-07,100,USD,2630
purchase,25-08-07,60,USD,2650
payment,25-08-02,900,USD,2950

In the above example, the file modified.log contains only the lines where TND was changed to USD.
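
The same behaviour can be reproduced in a throwaway directory. This is a sketch assuming GNU sed and mktemp; the two CSV lines are sample data in the style of bank.csv, not the real file.

```shell
# Work in a temporary directory so nothing is left behind.
tmp=$(mktemp -d)
printf '%s\n' 'withdrawal,25-08-10,150,USD,2600' \
              'purchase,25-08-07,60,TND,2650' > "$tmp/bank.csv"

# The file name after w runs to the end of the script, so w must be the last flag.
sed "s/TND/USD/w $tmp/modified.log" "$tmp/bank.csv" > /dev/null

# Only the line where the substitution happened was written.
modified=$(cat "$tmp/modified.log")
printf '%s\n' "$modified"   # purchase,25-08-07,60,USD,2650

rm -r "$tmp"
```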

8.2 m or M : Multiline mode (GNU Sed)

The m (or M) flag in GNU sed enables multi-line mode for regular expression matching. This modifier changes the behavior of certain special characters within a regular expression:

  • ^ and $: In multi-line mode, ^ matches the beginning of the line and the position immediately after a newline character. Similarly, $ matches the end of the line and the position immediately before a newline character. Without the m flag, ^ and $ only match the beginning and end of the entire pattern space, respectively.
  • . (dot): In multi-line mode, the period character does not match a newline character. This means that to match across newlines we need to include \n explicitly in the regular expression (e.g., (.|\n)*). The N command is the usual way to get several lines into the pattern space.

Example 1 : Replace "Line 1" and "Line 2" with a single string. We will use the N command to append the next line to the pattern space, and then the m flag to allow ^ and $ to match within the combined lines.

cat test.txt

Line 1
Line 2
Line 3

sed -E ':j;N;$!bj;s/^Line 1\nLine 2$/Multiline in action/m' test.txt

Multiline in action
Line 3

Part one of the command : reads the whole file into the pattern space :j;N;$!bj;

Symbol  Meaning
:j      creates a label named j
N       appends the next input line to the current pattern space, inserting a literal newline (\n) between the old and appended lines
$!bj    if we are not on the last line ($!), branch (b) back to label j (so bj jumps to :j)

After that loop, the entire file is in the pattern space, with original newlines preserved as \n. pattern space becomes the single string: "Line 1\nLine 2\nLine 3"
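
The loop can be tried on its own, for instance to join all lines with commas once the whole input sits in the pattern space. A minimal sketch, assuming GNU sed (which accepts ; right after a label name):

```shell
# :j;N;$!bj slurps every input line into the pattern space,
# then the embedded newlines are replaced with commas.
joined=$(printf 'Line 1\nLine 2\nLine 3\n' | sed ':j;N;$!bj;s/\n/,/g')

printf '%s\n' "$joined"   # Line 1,Line 2,Line 3
```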

Part two : the substitution s/^Line 1\nLine 2$/Multiline in action/m

  1. s/.../.../ : standard substitution (no g, so it replaces at most one match). The pattern reads:
    • ^ = beginning-of-line (anchor)
    • Line 1
    • \n = literal newline between lines
    • Line 2
    • $ = end-of-line (anchor)

m enables multiline anchor behaviour: it makes ^ and $ match not only the start/end of the entire pattern space, but also positions immediately after/ before internal newlines.

^ can match the start of the buffer or the position right after any \n.

$ can match the end of the buffer or the position right before any \n.

The pattern ^Line 1\nLine 2$ will match the two-line sequence Line 1\nLine 2 even though the pattern space contains additional data (\nLine 3) after it — because with the m flag ^ matches the very beginning of the buffer and $ matches the position before the newline that precedes Line 3.

Without m, ^ and $ by default in sed match only the start and the very end of the entire pattern space. With the whole file in the buffer ("Line 1\nLine 2\nLine 3"), the pattern ^Line 1\nLine 2$ would not match, because the $ would only match at the end of the buffer (after Line 3), and Line 2 is not right before that end.
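
The effect on ^ is easy to observe by inserting a marker at every position where the anchor matches. A minimal sketch, assuming GNU sed; the two-line input and the "> " marker are made up for illustration.

```shell
# Without M, ^ matches only at the start of the pattern space.
plain=$(printf 'one\ntwo\n' | sed 'N; s/^/> /g')

# With M, ^ also matches right after the embedded newline.
multi=$(printf 'one\ntwo\n' | sed 'N; s/^/> /gM')

printf '%s\n---\n%s\n' "$plain" "$multi"
```

The first result marks only "one", while the second marks both lines.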

Example 2 : two commands that highlight the effect of multi-line mode on the . (dot). We use the N command here too, with the same basic substitution with and without the m flag.

a- without m :

printf 'Hey team\npush ourselves to the limits and achieve our goals\n'| sed 'N; s/H.*m/---/'

---its and achieve our goals

  • After reading the first line (Hey team), sed executes N, so the pattern space becomes: Hey team\npush ourselves to the limits and achieve our goals
  • Then the substitute command s/H.*m/---/ is applied to this pattern space.
    • H matches the literal character H (capital H).
    • .* matches any character, repeated zero or more times. Since the m flag is not used, the behavior of . is the default one in GNU sed: it also matches the newline character.
    • m matches a literal ‘m’.
    • The regex H.*m therefore matches from the H in “Hey team”, through the newline character and into “push ourselves to the limits and achieve our goals”, up to the last ‘m’ it can reach, which is the one in “limits”.

This entire matched portion is replaced by ---. After the substitution, what remains in the pattern space is --- (the replacement) plus the part of the original pattern space after the matched portion.

b- with m :

printf 'Hey team\npush ourselves to the limits and achieve our goals\n'| sed 'N; s/H.*m/---/gm'

---
push ourselves to the limits and achieve our goals
  • With the m flag . does NOT match newlines. So in this example the . metacharacter stops at the newline: it cannot “cross” from one line to the next, and H.*m matches only “Hey team”.
  • No match is found in the second line (there’s no capital “H” there), so the first line is replaced with --- and the second line remains unchanged.

9. Basic and Extended regular expressions : BRE - ERE

Sed supports regular expressions (regex), which extend its capabilities and help in resolving advanced cases. By default sed uses Basic Regular Expressions (BRE), as grep does, in contrast to egrep and awk, which use Extended Regular Expressions (ERE) by default. To enable ERE in sed we have to use the -E option.

What is the difference between BRE and ERE ? :

BRE is the older, simpler form of regex in Unix tools, in which some special characters (like ?, +, |, and ()) need to be escaped with a backslash \ to get their special meaning. ERE is a newer, more natural form of regex syntax where we don’t need to escape (), +, ?, or |, which makes expressions easier to read and write.

Key Differences:

In BRE, (, ), {, }, ?, +, and | are treated as literal characters unless escaped with a backslash (e.g., \(, \{, \?). In ERE, (, ), {, }, ?, +, and | are metacharacters by default and must be escaped with a backslash if we want to treat them literally.
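
The role of the backslash flips between the two syntaxes, which is easy to check with the + character. A minimal sketch, assuming GNU sed (\+ in BRE is a GNU extension); the input string is made up.

```shell
text='a+b aab'

# BRE: a bare + is a literal plus sign.
bre_plain=$(printf '%s\n' "$text" | sed 's/a+/X/')
# BRE: \+ is the "one or more" quantifier.
bre_escaped=$(printf '%s\n' "$text" | sed 's/a\+/X/')
# ERE: a bare + is the quantifier.
ere_plain=$(printf '%s\n' "$text" | sed -E 's/a+/X/')
# ERE: \+ is a literal plus sign.
ere_escaped=$(printf '%s\n' "$text" | sed -E 's/a\+/X/')

printf '%s\n' "$bre_plain" "$bre_escaped" "$ere_plain" "$ere_escaped"
# Xb aab
# X+b aab
# X+b aab
# Xb aab
```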

9.1 Line anchors
  1. ^ : the caret (^) symbol matches the start of a line.

Print all the lines that start with "w" :

sed -n '/^w/p' bank.csv

withdrawal,25-08-10,150,USD,2600
withdrawal,15-08-07,100,TND,2630

  2. $ : the dollar ($) symbol matches the end of a line.

Print all the lines that end with "2600" :

sed -n '/2600$/p' bank.csv

withdrawal,25-08-10,150,USD,2600
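
Used together, the two anchors pin a pattern to the whole line. Two common uses, sketched on made-up input: deleting empty lines and matching a line exactly.

```shell
# ^$ matches a line with nothing between its start and its end.
cleaned=$(printf 'first\n\nsecond\n' | sed '/^$/d')

# ^with$ matches the line "with" but not "withdrawal".
exact=$(printf 'with\nwithdrawal\n' | sed -n '/^with$/p')

printf '%s\n' "$cleaned"   # first, then second
printf '%s\n' "$exact"     # with
```
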
9.2 metacharacters

9.2.1 Extensions to regular expressions

\b indicates a word boundary. It functions for anchoring at both the start and the end of a word.

words starting with "with" :

sed -n '/\bwith/p' bank.csv

withdrawal,25-08-10,150,USD,2600
withdrawal,15-08-07,100,TND,2630


words ending with "wal" :

sed -n '/wal\b/p' bank.csv

withdrawal,25-08-10,150,USD,2600
withdrawal,15-08-07,100,TND,2630


\B indicates the opposite of \b.

The example below matches 'wit' only when it is not at the start of a word. In bank.csv it gives no output :

sed -n '/\Bwit/p' bank.csv


match 'wit' but not at the end of a word :

sed -n '/wit\B/p' bank.csv

withdrawal,25-08-10,150,USD,2600
withdrawal,15-08-07,100,TND,2630
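
Since bank.csv has few words sharing a prefix, the contrast between \b and \B is easier to see on a made-up string. A minimal sketch, assuming GNU sed:

```shell
text='cat concat category'

# \bcat\b : "cat" as a whole word only.
whole=$(printf '%s\n' "$text" | sed 's/\bcat\b/DOG/g')

# \Bcat : "cat" only when it does not start a word.
inner=$(printf '%s\n' "$text" | sed 's/\Bcat/DOG/g')

printf '%s\n' "$whole"   # DOG concat category
printf '%s\n' "$inner"   # cat conDOG category
```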


\s matches a single whitespace character : space, tab, newline, vertical tab, form feed or carriage return.

The following example replaces each run of whitespace between fields with a single space, which is useful if the file has inconsistent spacing :

sed -E 's/\s+/ /g' students.txt


Another example that replaces any whitespace around pipes | with a single space, to normalize "fields" in the book lines

sed -E 's/\s*\|\s*/ | /g' books.log

note : In ERE mode (sed -E), the pipe | is a special regex operator that means OR, so we had to escape it to match a literal pipe character.
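
Both commands can be checked on inline input, since their effect is hard to see without the files. A minimal sketch, assuming GNU sed; the sample strings are made up.

```shell
# Runs of spaces and tabs collapse to a single space.
squeezed=$(printf 'a   b\t\tc\n' | sed -E 's/\s+/ /g')

# Any amount of whitespace (even none) around a pipe becomes exactly " | ".
padded=$(printf 'x|y  |  z\n' | sed -E 's/\s*\|\s*/ | /g')

printf '%s\n' "$squeezed"   # a b c
printf '%s\n' "$padded"     # x | y | z
```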

\w Matches any single word character : letter, digit or the underscore character _.

echo -e "-$%#\n1900\nABC\ng_h" | sed -n '/\w/p'

1900
ABC
g_h


\W Matches any single non-word character, so only the lines that contain at least one such character are printed.

echo -e "-$%#\n1900\nABC\ng_h" | sed -n '/\W/p'

-$%#


\< Matches the beginning of a word.

echo "alpha %-= omega." | sed 's/\</*/g'

*alpha %-= *omega.


\> Matches the end of a word.

echo "alpha %-= omega." | sed 's/\>/*/g'

alpha* %-= omega*.
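
The two anchors combine naturally to wrap every word. A minimal sketch, assuming GNU sed; the square brackets are just an arbitrary marker.

```shell
# \< inserts "[" before each word, \> inserts "]" after each word.
wrapped=$(printf '%s\n' 'alpha %-= omega.' | sed 's/\</[/g; s/\>/]/g')

printf '%s\n' "$wrapped"   # [alpha] %-= [omega].
```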


\` Matches only at the beginning of the pattern space. Unlike ^, it does not match after embedded newlines in multi-line mode.

printf 'Hey team\npush ourselves to the limits and achieve our goals\n'| sed 'N; s/\`/---/gm'

---Hey team
push ourselves to the limits and achieve our goals


\' Matches only at the end of the pattern space. Unlike $, it does not match before embedded newlines in multi-line mode.

printf 'Hey team\npush ourselves to the limits and achieve our goals\n'| sed "N; s/\'/\!/gm"

Hey team
push ourselves to the limits and achieve our goals!


9.2.2 The pipe |

The | pipe is similar to logical OR, it matches items from either side of the pipe.

Print operations from either the 2nd or the 7th of august : (using BRE )

sed -n '/08-02\|08-07/p' bank.csv

withdrawal,15-08-07,100,TND,2630
purchase,25-08-07,60,TND,2650
payment,25-08-02,900,TND,2950

The same command using ERE : (no need to escape the pipe | metacharacter)

sed -n -E '/08-02|08-07/p' bank.csv

withdrawal,15-08-07,100,TND,2630
purchase,25-08-07,60,TND,2650
payment,25-08-02,900,TND,2950
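
Parentheses limit how far the alternation reaches, which matters when only part of a word varies. A minimal sketch on a made-up string, assuming GNU sed, in both syntaxes:

```shell
text='gray grey groy'

# ERE: the alternation applies only to the letter inside the group.
ere=$(printf '%s\n' "$text" | sed -E 's/gr(a|e)y/GREY/g')

# BRE: same pattern, with the group and the pipe escaped.
bre=$(printf '%s\n' "$text" | sed 's/gr\(a\|e\)y/GREY/g')

printf '%s\n' "$ere"   # GREY GREY groy
printf '%s\n' "$bre"   # GREY GREY groy
```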


9.2.3 The dot .

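The dot . metacharacter matches any single character, including a newline embedded in the pattern space (unless the m flag is used, see 8.2). It never sees the newline that ends the input line, because sed removes it before applying the commands.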

Example 1 : show lines containing any date that ends with “07”

sed -n '/08.07/p' bank.csv

withdrawal,15-08-07,100,TND,2630
purchase,25-08-07,60,TND,2650

08 followed by any single character (the .) and then by 07. Here the . matches the dash (-) in the date.

Example 2 : select lines where the amount is three digits

sed -n '/,.[0-9][0-9],/p' bank.csv
withdrawal,25-08-10,150,USD,2600
withdrawal,15-08-07,100,TND,2630
payment,25-08-02,900,TND,2950

The line with the amount 60 (two digits) is not displayed, as it doesn't match the pattern. Note that the . would also accept a non-digit as the first character of the field.
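
Because the dot accepts anything, it has to be escaped when a real dot is meant. A minimal sketch on a made-up string:

```shell
text='25.08 25-08 25x08'

# Unescaped: . matches the dot, the dash and the x alike.
any=$(printf '%s\n' "$text" | sed 's/25.08/DATE/g')

# Escaped: \. matches only a literal dot.
literal=$(printf '%s\n' "$text" | sed 's/25\.08/DATE/g')

printf '%s\n' "$any"       # DATE DATE DATE
printf '%s\n' "$literal"   # DATE 25-08 25x08
```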

9.2.4 Quantifiers

  1. The question mark ? (unescaped in ERE; GNU sed also accepts \? in BRE)

Matches 0 or 1 occurrence of the preceding element.

Example 1 : The following example will match "withdraw" and "withdrawal"

sed -E -n '/withdraw(al)?/p' bank.csv

withdraw → the fixed base
(al)? → the group al may appear once or not at all
/p → print matching lines

Example 2 : The following example will match "payment" and "payments" and substitute with "pay-in"

sed -E -n 's/payments?/pay-in/p' bank.csv
pay-in,25-08-02,900,TND,2950
pay-in,25-08-01,200,USD,1450

Example 3 : match dates where the day may be one or two digits

sed -E -n '/[0-9]?[0-9]-08-10/p' bank.csv

[0-9]? → one optional digit
[0-9] → one required digit
Together, [0-9]?[0-9] means one or two digits, so it matches both 5-08-10 and 25-08-10.

  2. The plus symbol + (unescaped in ERE; GNU sed also accepts \+ in BRE)

Matches one or more occurrences of the preceding element (group or character).

Example 1 : The following example matches every line that contains the digit 1 one or more times in a row (1, 11, 111, etc.)

sed -E -n '/1+/p' bank.csv

withdrawal,25-08-10,150,USD,2600
withdrawal,15-08-07,100,TND,2630
payments,25-08-01,200,USD,1450

Example 2 : The following example will match words containing double letters like “ss”, “oo”, “ee”, etc.

sed -E -n '/([a-z])\1+/p' bank.csv

operation,date,amount,currency,balance

([a-z]) → capture any lowercase letter

\1+ → match the same letter one or more times after it

  3. The asterisk symbol *

Matches zero or more occurrences of the preceding element (character or bracket expression). It is less strict than the + symbol: a+ needs at least one a, while a* accepts zero or more, so it matches even if there is no a at all.

Example 1 : The following example matches all lines where the currency TND appears. Since D* means zero or more D, the pattern would also match TN or TNDD.

sed -E -n '/TND*/p' bank.csv

withdrawal,15-08-07,100,TND,2630
purchase,25-08-07,60,TND,2650
payment,25-08-02,900,TND,2950

Example 2 : The following example matches “withdrawal” even if the h is missing (“witdrawal”) or repeated (“withhdrawal”)

sed -E -n '/with*drawal/p' bank.csv

withdrawal,25-08-10,150,USD,2600
withdrawal,15-08-07,100,TND,2630
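
The three quantifiers seen so far can be compared side by side on the same input. A minimal sketch on a made-up string, assuming GNU sed in ERE mode:

```shell
text='ac abc abbc'

star=$(printf '%s\n' "$text" | sed -E 's/ab*c/X/g')   # zero or more b
plus=$(printf '%s\n' "$text" | sed -E 's/ab+c/X/g')   # one or more b
opt=$(printf '%s\n' "$text" | sed -E 's/ab?c/X/g')    # zero or one b

printf '%s\n' "$star"   # X X X
printf '%s\n' "$plus"   # ac X X
printf '%s\n' "$opt"    # X X abbc
```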


  4. Range of integer numbers {}

For interval expressions, with or without bounds, braces are used as follows (e.g., a{2} matches "aa") :

Quantifier  Description
{n}         matches exactly n occurrences of the preceding element
{n,}        matches at least n occurrences of the preceding element
{,n}        matches up to n occurrences of the preceding element, 0 included (accepted by GNU sed, not part of POSIX)
{m,n}       matches at least m and at most n occurrences of the preceding element (no space after the comma)


Example with {n} : print only the books whose page count has exactly 3 digits. Book 1 (1232 pages) and book 5 (96 pages) are excluded.

sed -E -n '/Pages: [0-9]{3}$/p' books.log 

2. Madame Bovary | Gustave Flaubert | Format: Poche | Pages: 464
3. Le Père Goriot | Honoré de Balzac | Format: Poche | Pages: 368
4. Oscar et la dame rose | Éric-Emmanuel Schmitt | Format: Poche | Pages: 112
6. L'Étranger | Albert Camus | Format: Poche | Pages: 186
7. Stupeur et tremblements | Amélie Nothomb | Format: Poche | Pages: 160
8. Vernon Subutex 1 | Virginie Despentes | Format: Poche | Pages: 352
9. Couleur du temps | Leïla Slimani | Format: Poche | Pages: 224
10. La vie est facile, ne t'inquiète pas | Agnès Martin-Lugand | Format: Poche | Pages: 288


Examples with {n,} :

Print only the books whose page count has at least 4 digits. Only book 1 is printed; all the others are excluded this time.

sed -E -n '/Pages: [0-9]{4,}$/p' books.log
1. Les Misérables | Victor Hugo | Format: Poche | Pages: 1232


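Lines containing a run of 10 or more letters. In the output below, “tremblements” has 12 letters and “Misérables” has 10 (whether é is matched by the range depends on the locale). “Éric-Emmanuel” does not qualify, because the hyphen splits it into two shorter runs :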

sed -E -n '/[A-Za-z]{10,}/p' books.log

1. Les Misérables | Victor Hugo | Format: Poche | Pages: 1232
7. Stupeur et tremblements | Amélie Nothomb | Format: Poche | Pages: 160


Example with {,n} : match short names (second column, ≤5 letters)

sed -E -n '/^[0-9]+ [A-Za-z]{,5} /p' students.txt

140 dridi mohamed 80 success
148 mekki ryan 75 success
153 sako aminata 70 fail

Examples with {m,n} :

Print only bank amounts that are 1 to 2 digits long

sed -nE '/,[0-9]{1,2},/p' bank.csv

purchase,25-08-07,60,TND,2650


Books where the page count has 2–3 digits

sed -n -E '/Pages: [0-9]{2,3}$/p' books.log

2. Madame Bovary | Gustave Flaubert | Format: Poche | Pages: 464
3. Le Père Goriot | Honoré de Balzac | Format: Poche | Pages: 368
4. Oscar et la dame rose | Éric-Emmanuel Schmitt | Format: Poche | Pages: 112
5. Monsieur Ibrahim et les fleurs du Coran | Éric-Emmanuel Schmitt | Format: Poche | Pages: 96
6. L'Étranger | Albert Camus | Format: Poche | Pages: 186
7. Stupeur et tremblements | Amélie Nothomb | Format: Poche | Pages: 160
8. Vernon Subutex 1 | Virginie Despentes | Format: Poche | Pages: 352
9. Couleur du temps | Leïla Slimani | Format: Poche | Pages: 224
10. La vie est facile, ne t'inquiète pas | Agnès Martin-Lugand | Format: Poche | Pages: 288


{m,n} with substitution : uppercase only the forename column when it contains 3 to 10 lowercase letters, without changing the ID, name, grade, or result. The header line is untouched because it does not start with a number, and the last line is untouched because its result "fail" does not match the pattern.

sed -E 's/^([0-9]+\s+[a-z]+\s+)([a-z]{3,10})(\s+[0-9]+\s+success)/\1\U\2\E\3/' students.txt

ID name forename grade result
120 weslati FIRAS 85 success
140 dridi MOHAMED 80 success
145 yacoubi SALAH 95 success
147 benrejeb WISSAL 90 success
148 mekki RYAN 75 success
152 mbanebe PHILIPPE 80 success
153 sako aminata 70 fail

Symbol                 Meaning
^                      Start of line
([0-9]+\s+[a-z]+\s+)   Group 1: ID + name + spaces (unchanged)
([a-z]{3,10})          Group 2: forename, 3–10 lowercase letters → uppercased
(\s+[0-9]+\s+success)  Group 3: the rest of the line (unchanged)
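
Intervals are also handy for validating a fixed format. The sketch below keeps only the strings shaped like the dates of bank.csv (two digits, dash, two digits, dash, two digits); the three input lines are made up, and GNU sed is assumed.

```shell
# Anchoring with ^ and $ forces the whole line to fit the format.
valid=$(printf '25-08-10\n5-8-10\n2025-08-10\n' |
        sed -nE '/^[0-9]{2}-[0-9]{2}-[0-9]{2}$/p')

printf '%s\n' "$valid"   # 25-08-10
```

Without the anchors, 2025-08-10 would also be printed, because 25-08-10 appears inside it.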


9.3 character classes and their metacharacters

A list of characters enclosed by brackets [ and ] is called a character set or bracket expression. It is used to match any single character in that list. Bracket expressions work in both BRE and ERE, with or without the -E option. Some metacharacters seen previously have a different meaning when used inside a bracket expression.

9.3.1 Match Character Set [ ]

We have already used bracket expressions in several examples; here is another basic one. Print lines where the currency is either USD or TND by checking the first two letters of the currency column (note that the pattern would also accept UND or TSD) :

sed -n -E '/,[UT][NS]D,/p' bank.csv

withdrawal,25-08-10,150,USD,2600
withdrawal,15-08-07,100,TND,2630
purchase,25-08-07,60,TND,2650
payment,25-08-02,900,TND,2950
payments,25-08-01,200,USD,1450

9.3.2 Exclusive Set [^]

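It negates the set of characters, so any character other than those in the brackets will match. The example prints the lines that do not start with 1, 2 or 3, which also excludes lines 10 and 11 :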

sed -n -E '/^[^123]/p' books.log

4. Oscar et la dame rose | Éric-Emmanuel Schmitt | Format: Poche | Pages: 112
5. Monsieur Ibrahim et les fleurs du Coran | Éric-Emmanuel Schmitt | Format: Poche | Pages: 96
6. L'Étranger | Albert Camus | Format: Poche | Pages: 186
7. Stupeur et tremblements | Amélie Nothomb | Format: Poche | Pages: 160
8. Vernon Subutex 1 | Virginie Despentes | Format: Poche | Pages: 352
9. Couleur du temps | Leïla Slimani | Format: Poche | Pages: 224
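
Two other frequent uses of a negated set, sketched on made-up input: stripping everything that is not a digit, and cutting a line at the first separator.

```shell
# Delete every character that is not a digit.
digits=$(printf '%s\n' 'a1b2c3' | sed 's/[^0-9]//g')

# [^,]+ captures the first comma-separated field, the rest is dropped.
first=$(printf '%s\n' 'withdrawal,25-08-10,150' | sed -E 's/^([^,]+),.*/\1/')

printf '%s\n' "$digits"   # 123
printf '%s\n' "$first"    # withdrawal
```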


9.3.3 Character Range [-]

A hyphen between two characters inside square brackets defines a range, so the regular expression matches any character in that range without listing each one separately.

Example 1 : Replace names with *** in all lines :

sed -E 's/^([0-9]+)(\s+)[A-Za-z]+/\1\2*** /' students.txt
ID name forename grade result
120 ***  firas 85 success
140 ***  mohamed 80 success
145 ***  salah 95 success
147 ***  wissal 90 success
148 ***  ryan 75 success
152 ***  philippe 80 success
153 ***  aminata 70 fail

Symbol     Meaning
([0-9]+)   The ID → kept via \1
(\s+)      The whitespace after ID → kept via \2
[A-Za-z]+  The name field → replaced
***        New name placeholder


Example 2 : Digits can be used to create a numeric range. Here every number of three or more digits (the IDs) is replaced with ELEC1 :

sed -E 's/\b0*[1-9][0-9]{2,}\b/ELEC1/g' students.txt

ID name forename grade result
ELEC1 weslati firas 85 success
ELEC1 dridi mohamed 80 success
ELEC1 yacoubi salah 95 success
ELEC1 benrejeb wissal 90 success
ELEC1 mekki ryan 75 success
ELEC1 mbanebe philippe 80 success
ELEC1 sako aminata 70 fail

Symbol     Meaning
\b         Word boundary → makes sure the number is a standalone word
0*         Zero or more leading zeros allowed
[1-9]      First non-zero digit, ensures number ≥ 100
[0-9]{2,}  At least 2 more digits → total ≥ 3 digits (≥ 100)
\b         End word boundary → stops matching inside longer strings


9.4 POSIX bracket expressions

The POSIX standard defines classes (categories) of characters, each with a name enclosed between [: and :]. A class has to be used inside a bracket expression [], along with other characters as needed (e.g., [[:digit:]_]).

Class       Equivalent to                          Description
[:alnum:]   [A-Za-z0-9]                            Digits, uppercase and lowercase letters.
[:alpha:]   [A-Za-z]                               Upper- and lowercase letters.
[:ascii:]   [\x00-\x7F]                            ASCII characters (not a POSIX class; GNU sed rejects it).
[:blank:]   [ \t]                                  Space and horizontal tab only.
[:cntrl:]   [\x00-\x1F\x7F]                        Control characters (non-printables + DEL).
[:digit:]   [0-9]                                  Digits 0-9.
[:graph:]   [^[:cntrl:] ]                          Graphic characters (printable excluding space).
[:lower:]   [a-z]                                  Lowercase letters only.
[:print:]   [[:graph:] ]                           Printable characters including space.
[:punct:]   [-!"#$%&'()*+,./:;<=>?@\[\\\]^_`{|}~]  Punctuation characters (all printable except letters, digits and space).
[:space:]   [ \t\n\r\f\v]                          Whitespace: space, tab, newline, carriage return, form feed, vertical tab.
[:upper:]   [A-Z]                                  Uppercase letters only.
[:word:]    [A-Za-z0-9_]                           Word characters: letters, digits & underscore (not a POSIX class; in GNU sed use \w instead).
[:xdigit:]  [0-9A-Fa-f]                            Hexadecimal digits.


Some simple examples :

Class       Goal                                Command example                               Result / explanation
[:alnum:]   Replace alphanumeric characters     sed 's/[[:alnum:]]/_/g' students.txt          Every letter and digit → _
[:alpha:]   Remove letters, keep digits/spaces  sed 's/[[:alpha:]]//g' students.txt           Leaves only numbers & whitespace
[:blank:]   Show where spaces & tabs exist      sed 's/[[:blank:]]/·/g' students.txt          Spaces and tabs become visible (·)
[:digit:]   Remove all digits                   sed 's/[[:digit:]]//g' students.txt           Keeps only names and words
[:lower:]   Mask lowercase letters              sed 's/[[:lower:]]/*/g' students.txt          Every lowercase letter → *
[:space:]   Remove all whitespace               sed 's/[[:space:]]//g' students.txt           Every line becomes one compact block
[:upper:]   Show uppercase letters only         sed -n 's/[^[:upper:]]//gp' students.txt      Prints ID for the header line; the other lines come out empty
[:punct:]   Keep only punctuation characters    sed -E 's/[^[:punct:]]+//g' bank.csv          Only punctuation remains
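
Since the table gives no output, here are three of the classes applied to one inline string. A minimal sketch with a made-up sentence; only POSIX class names are used.

```shell
text='Hello, World 42!'

# Remove punctuation.
no_punct=$(printf '%s\n' "$text" | sed 's/[[:punct:]]//g')

# Keep digits only, by deleting everything that is not a digit.
only_digits=$(printf '%s\n' "$text" | sed 's/[^[:digit:]]//g')

# Mask uppercase letters.
masked=$(printf '%s\n' "$text" | sed 's/[[:upper:]]/_/g')

printf '%s\n' "$no_punct"      # Hello World 42
printf '%s\n' "$only_digits"   # 42
printf '%s\n' "$masked"        # _ello, _orld 42!
```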


9.5 Escape Sequences

Some special characters like tab, carriage return, newline and vertical tab can’t be typed easily. To use these non-printing characters in patterns or replacement strings we have to escape them using the backslash (\) character.

Common escape sequences in a cheat sheet :

Escape  Meaning                                     Example                                   Result / notes
\n      Newline                                     sed 's/,/\n/g' bank.csv                   Replaces each comma with a newline
\t      Horizontal tab                              sed 's/ /\t/g' students.txt               Converts spaces to tabs
\r      Carriage return                                                                       Useful with Windows CRLF line endings
\\      Literal backslash                           sed 's/,/\\/g' bank.csv                   Replaces each comma with a backslash
\/      Literal slash (when / is the delimiter)     echo "/home/user/file" | sed 's/\//\\/g'  \home\user\file
\^      Literal caret ^                             echo 'a^c' | sed 's/\^/b/'                abc (without escaping, ^ = start of line in regex)
\$      Literal dollar $                            echo 'abc$' | sed 's/\$/d/'               abcd (without escaping, $ = end of line)
\dnnn   Character whose decimal ASCII value is nnn  echo -e "p\nn\ng" | sed -n '/\d112/ p'    p (the ASCII value of p is 112)
\xHH    Character whose hexadecimal value is HH     echo "Hello A world" | sed 's/A/\x42/'    Hello B world (hexadecimal 42 is B)
\onnn   Character whose octal ASCII value is nnn    echo "Hello A world" | sed 's/A/\o102/'   Hello B world (octal 102 is B)
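
A few of these sequences can be verified on inline input. This is a sketch assuming GNU sed, since \t, \xHH and \onnn are GNU extensions; the input strings are made up.

```shell
# \n in the replacement splits one line into three.
lines=$(printf 'a,b,c\n' | sed 's/,/\n/g')

# \t inserts a real tab character.
tabbed=$(printf 'a b\n' | sed 's/ /\t/')

# \x42 (hexadecimal) and \o102 (octal) both produce the letter B.
hex=$(printf 'Hello A world\n' | sed 's/A/\x42/')
oct=$(printf 'Hello A world\n' | sed 's/A/\o102/')

printf '%s\n' "$lines" "$tabbed" "$hex" "$oct"
```

Note the o in \o102: without it, \1 would be read as a back-reference to group 1.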


10. Branching and loops
