CIS52 – File Manipulation
File Manipulation Utilities
Regular Expressions
sed, awk
© 2001 John Urrutia. All rights reserved.
Overview
comm – comparison of sorted files
cut – output sections of lines in a file
find – find files that match a pattern
paste – merges records in files
pr – paginate files into pages
tr – translate or delete characters
Overview
regular expressions
sed – Stream Editor (batch file editor)
awk – Aho, Weinberger, Kernighan (pattern matching)
The comm before the storm
Compares 2 sorted files
Results reported in 3 columns
1st – records found only in file 1
2nd – records found only in file 2
3rd – records that match in both files
Options suppress the corresponding
columns
comm [-1] [-2] [-3] file1 file2
comm – cont.
Either file name can be replaced
with - to read standard input
Example:
File1
aa
dd
ee
gg
hh
File2
bb
cc
dd
ee
ff
comm results
Option   Columns kept                    Output
(none)   File1 only, File2 only, both    aa bb cc dd ee ff gg hh
-12      Both                            dd ee
-1       File2 only and both             bb cc dd ee ff
-2       File1 only and both             aa dd ee gg hh
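The File1/File2 example above can be reproduced with a short script. This is a minimal sketch: the temporary directory and the variable names are illustrative, not part of the original slides.

```shell
# Recreate the two sorted example files in a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' aa dd ee gg hh > "$tmp/File1"
printf '%s\n' bb cc dd ee ff > "$tmp/File2"

# Suppressing two of the three columns leaves exactly one.
both=$(comm -12 "$tmp/File1" "$tmp/File2")         # records in both files
only_file1=$(comm -23 "$tmp/File1" "$tmp/File2")   # records only in File1
only_file2=$(comm -13 "$tmp/File1" "$tmp/File2")   # records only in File2

printf 'both:\n%s\nonly File1:\n%s\nonly File2:\n%s\n' \
  "$both" "$only_file1" "$only_file2"
rm -rf "$tmp"
```

Both inputs must already be sorted; comm does not sort them.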
cut to the chase
Allows you to extract portions of
each record in a file.
Delimits data in the file into fields or
columns.
Default delimiter is the tab character
Can be changed by the -d option
cut cont.
cut -b list | -c list | -f list [-d char] [-s]
[--output-delimiter=string]
b – bytes
c – characters (same as bytes in GNU cut)
f – fields
d – delimiter character
s – display only records that contain the
delimiter
cut ! print
char – single byte used to delimit
fields in a record
list – list of positions or ranges (bytes,
characters or fields) to display
Entries are comma separated.
1-7 first 7 characters in record
1,7 first and seventh characters
cut ! print again
string – text written between the selected
fields in place of the input delimiter.
cut - Example
[/@linux2 uid]$ cat file1
The quick brown fox eyed the jactitating dog
[/@linux2 uid]$ cut -f1,3,5,8 -d' ' file1
The brown eyed dog
[/@linux2 uid]$ cut -f1,4-6,8 -d' ' file1
The fox eyed the dog
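The remaining cut options can be tried on a colon-delimited record. This is a sketch on made-up data; the record and the variable names are assumptions, and --output-delimiter is available in GNU cut but may be missing elsewhere.

```shell
# An illustrative passwd-style record: seven fields delimited by ':'.
record='alice:x:1001:1001:Alice Example:/home/alice:/bin/sh'

# -d sets the delimiter, -f picks fields 1 and 7.
name_shell=$(printf '%s\n' "$record" | cut -d: -f1,7)

# -c picks character positions 1 through 5.
first_five=$(printf '%s\n' "$record" | cut -c1-5)

# -s drops records that do not contain the delimiter at all.
skipped=$(printf 'no delimiter here\n' | cut -d: -f1 -s)

# GNU cut only: write a space between the selected fields.
respaced=$(printf '%s\n' "$record" | cut -d: -f1,7 --output-delimiter=' ')

printf '%s\n' "$name_shell" "$first_five" "[$skipped]" "$respaced"
```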
find that pot of gold
find – selects all files that meet the
selection criteria in the expression
No action is taken unless it is specified
Sub-directories are scanned
automatically
The expression can be simple or
complex
find me something
The criteria expression:
ANDs operands that are separated by a
space
ORs operands that are separated by -o
Evaluates left to right
find criteria continued
Actions
-print prints the path of every file that
meets the selection criteria
-exec cmd \; executes the command that
precedes the \; for each selected file
-ok same as -exec but asks for a
y from stdin before each command.
find criteria continued again
Evaluations
-type c file type (e.g. d for directory)
-atime ±n accessed more (+n) or less (-n) than n days ago.
-mtime ±n modified more (+n) or less (-n) than n days ago.
-user uid owner of the file
-nouser owner is not known to the
system
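The criteria and actions above can be combined as in the following sketch. The scratch directory and file names are invented for the example, and -name (match on file name) is a standard find criterion that the slides do not list.

```shell
# Build a small scratch tree to search.
tmp=$(mktemp -d)
mkdir "$tmp/sub"
touch "$tmp/a.txt" "$tmp/sub/b.txt" "$tmp/sub/c.log"

# Space-separated criteria are ANDed: regular files whose name ends in .txt.
txt_files=$(find "$tmp" -type f -name '*.txt' -print | LC_ALL=C sort)

# -o ORs two criteria; the escaped parentheses group them.
logs_or_dirs=$(find "$tmp" \( -name '*.log' -o -type d \) -print | wc -l | tr -d ' ')

# -mtime -1 selects files modified less than a day ago;
# -exec runs the command once per file, with {} replaced by its path.
recent_count=$(find "$tmp" -type f -mtime -1 -exec echo {} \; | wc -l | tr -d ' ')

printf '%s\n' "$txt_files" "$logs_or_dirs" "$recent_count"
rm -rf "$tmp"
```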
paste tastes good
paste [options] [filelist]
the corresponding records of each file are
merged into 1 record
-s process filelist sequentially. All
records of one file are merged into 1 record
before going to the next file
-d [delimiter list] each character in turn
delimits the merged records (default is tab).
paste continued
[/@linux2 uid]$ cat file1
A
B
C
[/@linux2 uid]$ cat file2
1
2
3
[/@linux2 uid]$ cat file3
x
y
z
paste continued
[/@linux2 uid]$ paste file1 file2 file3
A    1    x
B    2    y
C    3    z
[/@linux2 uid]$ paste -s file1 file2 file3
A    B    C
1    2    3
x    y    z
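The two paste modes can be compared side by side. This sketch rebuilds file1, file2 and file3 from the slides in a scratch directory and uses -d, so the delimiter is visible; the variable names are illustrative.

```shell
# Recreate the three example files.
tmp=$(mktemp -d)
printf '%s\n' A B C > "$tmp/file1"
printf '%s\n' 1 2 3 > "$tmp/file2"
printf '%s\n' x y z > "$tmp/file3"

# Default: record N of every file is merged into output record N.
parallel=$(cd "$tmp" && paste -d, file1 file2 file3)

# -s: all records of one file are merged before moving to the next file.
serial=$(cd "$tmp" && paste -s -d, file1 file2 file3)

printf 'parallel:\n%s\nserial:\n%s\n' "$parallel" "$serial"
rm -rf "$tmp"
```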
pr – public relations--NOT
pr paginate file(s) for printing
Can specify page attributes
Lines per page changed through the -l option
For multiple files each starts a new
page
pr – continued
pr paginate a file for printing
Creates a header and trailer
Header text changed through the -h option
Suppressed through the -t option
Can create columns of data
-nbr Number of columns across the page
-Sx Character used to separate
columns
pr – continued
Can create numbers for each line
-nck
c – character that separates the number from the line
default is tab character
k – number of digits
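A short sketch of the numbering and header options, assuming a pr that follows the usual -n, -t, -l and -h conventions (GNU pr does). The input lines and the header text are made up.

```shell
# -t suppresses the header and trailer; -n:3 numbers each line,
# using ':' as the separator and 3 digits for the number.
numbered=$(printf '%s\n' alpha beta gamma | pr -t -n:3)
printf '%s\n' "$numbered"

# -l sets the page length in lines; -h replaces the file name in the header.
paged=$(printf '%s\n' alpha beta gamma | pr -l 20 -h 'Demo listing')
printf '%s\n' "$paged"
```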
Regular Expressions
A set of characters that define the
criteria used to identify a string
within a record.
Used by vi, grep, sed, awk, and
others.
tr – Translate this
tr [-c] [-d] [-s] [-t] set1 [set2]
Translate from set1 to set2
c – complement of set1
d – delete characters found in set1
s – squeeze runs of a repeated character into one
t – truncate set1 to length of set2
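A few one-line sketches of the tr options; the input strings and variable names are invented for illustration.

```shell
# Translate: map each lowercase letter in set1 to the uppercase letter in set2.
upper=$(printf 'hello world\n' | tr 'a-z' 'A-Z')

# -d: delete every character found in set1.
no_digits=$(printf 'a1b22c333\n' | tr -d '0-9')

# -s: squeeze each run of repeated spaces down to one space.
squeezed=$(printf 'too    many   spaces\n' | tr -s ' ')

# -c with -d: delete everything that is NOT a lowercase letter or newline.
letters_only=$(printf 'a1b22c333\n' | tr -cd 'a-z\n')

printf '%s\n' "$upper" "$no_digits" "$squeezed" "$letters_only"
```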
Regular Expressions
Simple strings
Bound by / … /
Interpreted literally
e.g. /e D/ matches exactly e D
Taste Dee – OK
Taste don’t – not OK
Regular Expressions
The . (period) special single substitution character
Matches any single character
e.g. /.eny/ matches Aeny Beny Ceny
The [ char-range ] defines a character
class
The [^ char-range ] defines the not-in-character class
Regular Expressions
The * (asterisk)
Matches 0 or more of the preceding character.
What’s this?
/.*/
/[a-zA-Z]*/
/([^)]*)/
Regular Expressions
The /^ (caret, for the rabbit) character
In the beginning … anchors the match to the start of the line
The $/ (dollar, for the teacher) character
At the end … anchors the match to the end of the line
Regular Expressions
Quote the raven – backslash
\. yields .
\\ yields \
\* yields *
\[ yields [
\] yields ]
\/ yields /
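The regular expression pieces above can be tried with grep, one of the tools that uses them. This is a sketch: the sample lines and variable names are made up, and grep takes the expression without the surrounding / … / delimiters.

```shell
# Five illustrative input lines.
sample=$(printf '%s\n' 'Taste Dee' "Taste don't" 'f(x) = 2' 'a.b' 'axb')

# Simple string, interpreted literally: matches only "Taste Dee".
literal=$(printf '%s\n' "$sample" | grep 'e D')

# . matches any single character: matches both a.b and axb.
any_char=$(printf '%s\n' "$sample" | grep 'a.b')

# \. is a quoted (literal) period: matches only a.b.
escaped=$(printf '%s\n' "$sample" | grep 'a\.b')

# ^ anchors at the beginning, $ at the end of the line.
at_start=$(printf '%s\n' "$sample" | grep '^T')
at_end=$(printf '%s\n' "$sample" | grep '2$')

# Character class plus *: lines made up of letters only.
letters=$(printf '%s\n' "$sample" | grep '^[a-zA-Z]*$')

printf '%s\n' "$literal" "$any_char" "$escaped" "$at_start" "$at_end" "$letters"
```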
sed – the old Stream EDitor
sed [-n] [-f script-file] [file-list]
Copies and edits to standard output
Edits file(s) in a non-interactive mode
Gets its instructions from a script file
-f filename contains sed instructions
Without -f the 1st command argument is used as the script
-n suppresses stdout unless a print is specified
sed – the old mill stream
Record processing
1. Read record from file list
2. Read instruction from script (or cmd line)
3. Apply selection criteria
4. If selected perform instruction
and repeat 2-4 until no more script
5. Repeat 1-4 until no more file list.
He sed what!!??
Instruction format
[addr1 [,addr2]] inst [arg-list]
Address
A line number
Regular expression
Addr1 – start
Addr2 – stop
Address line numbers
$ Designates the last line of the last file
1st address line number
Starts selecting records based on their
position in the input file list relative to 1.
2nd address line number
Stops selecting records when the position in
the input file list is greater than the line number.
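The address forms can be sketched as follows. The five input lines and the variable names are illustrative; -n and p are used so that only the selected records are written.

```shell
# Five illustrative input records.
lines=$(printf '%s\n' one two three four five)

# Line-number range: records 2 through 4.
second_to_fourth=$(printf '%s\n' "$lines" | sed -n '2,4p')

# $ addresses the last record of the input.
last_line=$(printf '%s\n' "$lines" | sed -n '$p')

# Regular expressions as addr1 and addr2: from "two" through "four".
pattern_range=$(printf '%s\n' "$lines" | sed -n '/two/,/four/p')

printf '%s\n' "$second_to_fourth" "$last_line" "$pattern_range"
```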
He sed some more
Instructions
! – Not negates the address selection
(written after the address)
sed -n '/line/!p' file.list
{…} – Groups the instructions for the
address selection
sed Instructions
p – Print now and continue
d – Delete and get the next record
q – Quit processing; Stop; Go Away
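A sketch of negation, grouping and the p, d and q instructions on made-up input; the variable names are illustrative.

```shell
# Five illustrative input records.
lines=$(printf '%s\n' one two three four five)

# ! negates the address: print records that do NOT start with t.
not_t=$(printf '%s\n' "$lines" | sed -n '/^t/!p')

# d deletes record 2 and moves on to the next record.
without_second=$(printf '%s\n' "$lines" | sed '2d')

# q quits after record 2 has been written.
first_two=$(printf '%s\n' "$lines" | sed '2q')

# {…} groups two instructions under one address.
grouped=$(printf '%s\n' "$lines" | sed -n '/^f/{s/^f/F/;p;}')

printf '%s\n' "$not_t" "$without_second" "$first_two" "$grouped"
```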
sed Instructions
c – Change
[addr1] [,addr2] c\ yada yada yada
all selected records are replaced as a
group by the change value
a – Append
[addr1] a\ …
adds the text after each selected
record
sed Instructions
i – Insert
[addr1] i\ …
adds the text before each
selected record
n – Next
[addr1] n
writes the current record, reads the next and
continues the script
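The change, append, insert and next instructions can be sketched as below. The text follows the backslash on its own line, which is the portable form; the input records and variable names are made up.

```shell
lines=$(printf '%s\n' one two three)

# a\ writes the text after the selected record.
appended=$(printf '%s\n' "$lines" | sed '2a\
after two')

# i\ writes the text before the selected record.
inserted=$(printf '%s\n' "$lines" | sed '2i\
before two')

# c\ replaces the selected record with the text.
changed=$(printf '%s\n' "$lines" | sed '2c\
TWO')

# n writes the current record and reads the next one,
# so the following d removes every second record.
every_other=$(printf '%s\n' one two three four | sed 'n;d')

printf '%s\n' "$appended" "$inserted" "$changed" "$every_other"
```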
sed Instructions
w – Write
[addr1] [,addr2] w filename
writes the selected records to a file
r – Read
[addr1] r filename
reads records from the filename and
appends them to the selected record
sed Instructions
s – Substitute
[addr1] [,addr2] s/ptrn/repl/[g] [p] [w file]
for each selected record match the
pattern and replace
g – Replace all non-overlapping
occurrences
p – Print the record
w – write the record to the filename
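A sketch of the substitute flags on made-up text. The scratch directory, the file name changed.txt and the variable names are assumptions for the example.

```shell
tmp=$(mktemp -d)
text=$(printf '%s\n' 'banana bread' 'cherry pie' 'banana split')

# Without g only the first match in each record is replaced.
first_only=$(printf '%s\n' "$text" | sed 's/an/AN/')

# g replaces all non-overlapping occurrences.
every_match=$(printf '%s\n' "$text" | sed 's/an/AN/g')

# -n plus the p flag prints only the records that were changed.
changed_only=$(printf '%s\n' "$text" | sed -n 's/banana/BANANA/p')

# The w flag writes each changed record to the named file.
printf '%s\n' "$text" | sed -n "s/pie/tart/w $tmp/changed.txt"
written=$(cat "$tmp/changed.txt")

printf '%s\n' "$first_only" "$every_match" "$changed_only" "$written"
rm -rf "$tmp"
```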
Hawk – Squawk – awk
The programmable utility that does everything.
Aho – Weinberger – Kernighan
Provides:
Conditional execution
Looping
Handles:
Numeric & string variables
Regular expressions
C-style printf facilities
awk
awk [-Fc] [-f program-file | 'program'] [file-list]
F – field delimiter character
f – name of the awk program file
'program'
inline instructions, used when -f is not given
file-list
list of files to process
awk – program lines
[pattern] [{ action }]
Like sed, the pattern selects records
Record processing is the same as sed
awk – pattern
Patterns follow regular expression format.
~ Tests for match to regular expression
!~ Tests for NO match to regular expression
, – Establishes a pattern range; all records
within the range are processed, inclusively
BEGIN
executes before the first record is processed
END
executes after the last record is processed
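The pattern forms can be sketched on a small colon-delimited data set. The records, field layout and variable names are invented for the example; -F: sets the field delimiter.

```shell
# Illustrative records: name:age:team
data=$(printf '%s\n' 'alice:30:dev' 'bob:25:ops' 'carol:35:dev' 'dave:28:qa')

# ~ tests a field for a match to a regular expression.
devs=$(printf '%s\n' "$data" | awk -F: '$3 ~ /dev/ { print $1 }')

# !~ tests for NO match.
non_devs=$(printf '%s\n' "$data" | awk -F: '$3 !~ /dev/ { print $1 }')

# pattern1,pattern2 selects an inclusive range of records.
range=$(printf '%s\n' "$data" | awk -F: '/^bob/,/^carol/ { print $1 }')

# BEGIN runs before the first record, END after the last.
framed=$(printf '%s\n' "$data" |
  awk 'BEGIN { print "start" } END { print "records: " NR }')

printf '%s\n' "$devs" "$non_devs" "$range" "$framed"
```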
awk – relational operators
< – less than
<= – less than or equal to
== – equal to
!= – not equal to
>= – greater than or equal to
> – greater than
awk – operators
Arithmetic
+ – addition
- – subtraction
* – multiplication
/ – division
Assignment
= – assigns the value on the right to the variable on the left
+= – adds the value on the right to the variable on the left
awk – boolean operators
&& – and
|| – or
! – not
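The relational, arithmetic, assignment and boolean operators can be combined in one pattern and action, as in this sketch. The records and variable names are made up for illustration.

```shell
# Illustrative records: name:age:team
data=$(printf '%s\n' 'alice:30:dev' 'bob:25:ops' 'carol:35:dev' 'dave:28:qa')

# Select records with age >= 28 AND team not qa, accumulate with +=,
# then divide in the END block to get the average age.
average=$(printf '%s\n' "$data" |
  awk -F: '$2 >= 28 && $3 != "qa" { total += $2; n = n + 1 }
           END { print total / n }')

# ! negates a comparison; || ORs two tests.
young_or_dave=$(printf '%s\n' "$data" |
  awk -F: '!($2 > 26) || $1 == "dave" { print $1 }')

printf '%s\n' "$average" "$young_or_dave"
```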
awk – actions
# - Comment to the right on any line
Default action is print to stdout
Multiple actions can be taken
Use {…} to enclose multiple actions
Separate actions with ;
awk – actions
print variable …
Var , Var2 , Var3
Prints variables separated by the output field separator (OFS)
Var Var2 Var3
NO separators (the values are concatenated)
"literal value"
Prints exactly everything between the " "
awk – actions
printf "cntl string", variable …
Control String
\n – new line
\t – tab
%[-] [n] [.d] conv char
- left justification
n minimum number of characters
.d decimal positions
awk – actions
conv char – conversion character
d - decimal, e - exponent, f - floating-point
o - octal, x - hexadecimal
s - string
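A sketch contrasting print and printf. The input record, the format widths and the variable names are illustrative.

```shell
record='alice:30:3.14159'

# Comma-separated values are joined with the output field separator (a space).
with_sep=$(printf '%s\n' "$record" | awk -F: '{ print $1, $2 }')

# Values written side by side are concatenated with no separator.
no_sep=$(printf '%s\n' "$record" | awk -F: '{ print $1 $2 }')

# printf: %-8s left-justifies in 8 characters, %5d right-justifies in 5,
# %.2f keeps 2 decimal positions; \n ends the line.
formatted=$(printf '%s\n' "$record" |
  awk -F: '{ printf "%-8s|%5d|%.2f\n", $1, $2, $3 }')

# x and o convert a number to hexadecimal and octal.
bases=$(awk 'BEGIN { printf "%x %o\n", 255, 8 }')

printf '%s\n' "$with_sep" "$no_sep" "$formatted" "$bases"
```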
awk – variables
awk provided variables
NF – total number of fields
$1…$n – each field in the current record
FS – input field separator
(default space or tab )
OFS – output field separator
(default space )
awk – variables
awk provided variables
NR – current record number
$0 – entire current record
RS – record separator
(default newline )
ORS – output record separator
(default newline )
FILENAME – name of current input file
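The built-in variables can be inspected directly, as in this sketch. The file name letters.txt, its contents and the variable names are made up for the example.

```shell
tmp=$(mktemp -d)
printf '%s\n' 'a b c' 'd e' > "$tmp/letters.txt"

# FILENAME, NR and NF for each record; $NF is the last field.
report=$(cd "$tmp" && awk '{ print FILENAME, NR, NF, $NF }' letters.txt)

# Changing OFS affects the record once a field is reassigned.
joined=$(printf 'a b c\n' | awk 'BEGIN { OFS = "-" } { $1 = $1; print }')

# ORS replaces the newline written after each record.
semicolons=$(awk 'BEGIN { ORS = ";" } { print }' "$tmp/letters.txt")

printf '%s\n' "$report" "$joined" "$semicolons"
rm -rf "$tmp"
```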
awk - variables
Associative Arrays
array_name [ string ]
The array name should be meaningful
The index of the array is a string
Elements are automatically created
for ( element in array ) actions
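An associative array is a natural fit for counting. In this sketch the input words and the array name count are illustrative; the output is piped through sort because the order of a for-in loop is not defined.

```shell
# Count how often each word appears; elements are created on first use.
counts=$(printf '%s\n' dev ops dev qa dev ops |
  awk '{ count[$1]++ }
       END { for (team in count) print team, count[team] }' |
  LC_ALL=C sort)

printf '%s\n' "$counts"
```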
awk - functions
length(string) – returns the number of
characters in string
int(num) – returns the integer portion
index(str1,str2) – returns the index of
str2 found in str1 or 0 if not present
split(str,arr,del) – populates arr[ ] from
fields in str delimited by del – returns
count of elements.
awk - functions
sprintf(fmt , args) – formats args using
the fmt and returns the formatted string.
substr(str , pos , len) – returns a
substring of str starting with position
pos for a length of len.
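The string and numeric functions can be exercised from a BEGIN block, which needs no input file. The sample string and variable names in this sketch are made up.

```shell
result=$(awk 'BEGIN {
  s = "file manipulation"
  print length(s)                    # number of characters in s
  print int(7.9)                     # integer portion
  print index(s, "man")              # position of "man" in s
  n = split("a:b:c", parts, ":")     # fill parts[] and return the count
  print n, parts[1], parts[3]
  print substr(s, 6, 3)              # 3 characters starting at position 6
  print sprintf("%03d", 7)           # format without printing directly
}')

printf '%s\n' "$result"
```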