The Unix Shell

Sort a list of numbers from smallest to largest

Run this script and look at the order of the numbers it prints. Is the smallest number first? What order would you expect?

#!/usr/bin/env bash
# Print the numbers in numericsort.txt in ascending order.

sort numericsort.txt

Show explanation

The bug is calling sort without the -n flag, which causes lexicographic (alphabetical) ordering so 10 appears before 2.

Shows: the difference between alphabetical and numeric sort and when to use sort -n.

To find it: look at where 10 appears in the output — if it comes before 2, the sort is alphabetical because "1" < "2" at the first character. Run sort -n on the same input and compare.
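
The comparison can be sketched as follows. The contents of numericsort.txt are not shown above, so the numbers here are made-up stand-ins, and LC_ALL=C is set only to make the lexicographic result predictable.

```shell
#!/usr/bin/env bash
# Sample numbers are hypothetical; the real numericsort.txt is not shown.
workdir=$(mktemp -d)
printf '%s\n' 10 2 33 4 1 > "$workdir/numericsort.txt"

# Default sort compares character by character, so "10" lands before "2".
lexical=$(LC_ALL=C sort "$workdir/numericsort.txt" | paste -sd ' ' -)

# sort -n compares the lines as numbers.
numeric=$(sort -n "$workdir/numericsort.txt" | paste -sd ' ' -)

echo "sort:    $lexical"
echo "sort -n: $numeric"
rm -r "$workdir"
```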

Collect a summary line from each file in a directory

Run this script and then look at summary.txt. How many lines does it contain? How many did you expect?

#!/usr/bin/env bash
# Collect the first (header) line of each .dat file into summary.txt.

for f in overwrite_a.dat overwrite_b.dat overwrite_c.dat
do
    head -n 1 "$f" > summary.txt
done

Show explanation

The bug is using > inside the loop, which overwrites the file on every iteration instead of appending to it, so the summary contains only one entry.

Shows: the difference between > (overwrite) and >> (append).

To find it: run the script and then count the lines in summary.txt with wc -l summary.txt. If only one line appears instead of one per file, look inside the loop for > and check whether it should be >>.
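
A minimal sketch of the two redirections side by side, assuming three small .dat files whose contents are invented for the demonstration:

```shell
#!/usr/bin/env bash
# File contents are hypothetical stand-ins for the three .dat files.
workdir=$(mktemp -d)
for name in a b c; do
    printf 'header %s\nvalue 1\n' "$name" > "$workdir/overwrite_$name.dat"
done
summary="$workdir/summary.txt"

# Buggy form: > truncates summary.txt on every iteration.
for f in "$workdir"/overwrite_?.dat; do
    head -n 1 "$f" > "$summary"
done
overwrite_lines=$(wc -l < "$summary")

# Fixed form: empty the file once, then append inside the loop.
: > "$summary"
for f in "$workdir"/overwrite_?.dat; do
    head -n 1 "$f" >> "$summary"
done
append_lines=$(wc -l < "$summary")

echo "with >:  $overwrite_lines line(s)"
echo "with >>: $append_lines line(s)"
rm -r "$workdir"
```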

Copy monthly notes files to an archive directory

Look at the list of .txt files in the directory. Which files does *.txt match? Are all of them files you want to copy?

#!/usr/bin/env bash
# Copy the monthly notes files to the archive directory.
# Monthly notes are notes_jan.txt and notes_feb.txt.
# notes_archive.txt is the running archive and should NOT be copied.

cp *.txt archive/

Show explanation

The bug is that *.txt matches every .txt file in the directory, including the running archive file itself, so the script copies more than the monthly notes files.

Shows: how to check what a wildcard matches before using it in a destructive command, and how to narrow a pattern (e.g., notes_???.txt) to match only the intended files.

To find it: before running the script, run echo *.txt in the directory and read the expansion. If the list includes files you do not want to copy, narrow the pattern before using it.
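
One way to try the preview step safely is in a scratch directory. The sketch below recreates the three filenames from the script's comments as empty files and compares the two patterns before copying anything.

```shell
#!/usr/bin/env bash
start_dir=$PWD
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch notes_jan.txt notes_feb.txt notes_archive.txt
mkdir archive

# Preview each pattern with echo before handing it to cp.
broad=$(echo *.txt)
narrow=$(echo notes_???.txt)
echo "*.txt         -> $broad"
echo "notes_???.txt -> $narrow"

# Only the narrowed pattern is used for the real copy.
cp notes_???.txt archive/
copied=$(ls archive | paste -sd ' ' -)
echo "copied: $copied"

cd "$start_dir" || exit 1
rm -r "$workdir"
```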

Count distinct species in a CSV file

Run this script and check the count it reports. Then look at the species column in species.csv. How many distinct species are there, and does the count match?

#!/usr/bin/env bash
# Count the number of distinct species recorded in species.csv.

cut -d, -f2 species.csv | uniq | wc -l

date,species,count
2024-03-01,sparrow,12
2024-03-01,robin,5
2024-03-02,sparrow,8
2024-03-02,crow,3
2024-03-03,robin,7
2024-03-03,sparrow,10
2024-03-04,crow,4
2024-03-04,robin,6
2024-03-05,sparrow,9
2024-03-05,crow,2

Show explanation

The bug is piping directly to uniq without sorting first. Only adjacent identical lines are collapsed, so non-adjacent occurrences of the same species are counted separately and the reported count is too high.

Shows: that uniq only removes adjacent duplicates and that sort | uniq is the correct pattern for counting distinct values.

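To find it: run cut -d, -f2 species.csv | sort | uniq | wc -l and compare the count to cut -d, -f2 species.csv | uniq | wc -l. If the two counts differ, non-adjacent duplicates are being missed by uniq alone. Both counts include the header value species, so subtract one or skip the header line with tail -n +2.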
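
The sketch below runs the pipeline on the sample data with and without sort. The third pipeline, which drops the header with tail -n +2, is an addition for illustration and is not part of the original script.

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)
cat > "$workdir/species.csv" <<'EOF'
date,species,count
2024-03-01,sparrow,12
2024-03-01,robin,5
2024-03-02,sparrow,8
2024-03-02,crow,3
2024-03-03,robin,7
2024-03-03,sparrow,10
2024-03-04,crow,4
2024-03-04,robin,6
2024-03-05,sparrow,9
2024-03-05,crow,2
EOF

# uniq alone: no two adjacent values are equal, so nothing collapses.
unsorted=$(cut -d, -f2 "$workdir/species.csv" | uniq | wc -l)

# sort first: equal values become adjacent and uniq can collapse them.
sorted=$(cut -d, -f2 "$workdir/species.csv" | sort | uniq | wc -l)

# Skip the header line so the word "species" is not counted as a species.
no_header=$(tail -n +2 "$workdir/species.csv" | cut -d, -f2 | sort | uniq | wc -l)

echo "uniq only: $unsorted, sort | uniq: $sorted, without header: $no_header"
rm -r "$workdir"
```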

Count lines in files whose names contain spaces

Run this script with a filename that contains a space (e.g., "field notes.txt"). Does it process the file correctly?

#!/usr/bin/env bash
# Count the lines in each text file passed to this script.

for f in "$@"
do
    wc -l $f
done

Show explanation

The bug is using $f without quotes, so the shell splits the filename on the space and passes the two halves as separate arguments to wc, causing the loop to fail for any filename that contains a space.

Shows: why loop variables should always be quoted as "$f" and how spaces in filenames require consistent quoting throughout a script.

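To find it: replace the command inside the loop with two consecutive lines, printf '[%s]\n' "$f" and printf '[%s]\n' $f, then run with a filename containing a space. The unquoted version prints two bracketed words on separate lines; the quoted version prints the whole name inside one pair of brackets. (echo is a poor probe here because it joins its arguments with a space, so both forms look the same.)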
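
A small sketch of how to see the split without touching any files. It counts the arguments printf receives and assumes the shell's default word-splitting settings.

```shell
#!/usr/bin/env bash
f="field notes.txt"   # example name taken from the exercise text

# printf repeats its format once per argument, so the number of output
# lines equals the number of words the shell passed to it.
unquoted_words=$(printf '[%s]\n' $f | wc -l)
quoted_words=$(printf '[%s]\n' "$f" | wc -l)

echo "unquoted: $unquoted_words argument(s)"
echo "quoted:   $quoted_words argument(s)"
```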

Filter lines from an input file into an output file

Run the script with a small input file and the name of an output file. Look at which file was created and which file was modified.

#!/usr/bin/env bash
# Usage: wrongarg.sh input_file output_file
# Copy the first 20 lines of input_file into output_file.

head -n 20 "$2" > "$1"

Show explanation

The bug is that the arguments $1 and $2 are swapped, so the script reads from the output path and writes to the input path instead.

Shows: how to verify which argument is which by reading the usage comment, and how to use echo to print argument values before acting on them.

To find it: temporarily replace the head command with echo "reading from $2, writing to $1", which mirrors the redirection in the script, and run it. If the message names the output file as the one being read, swap $1 and $2.

Print the most recent entries from a log file

Run this script and note which rows are printed. Are they from the beginning or end of the file?

#!/usr/bin/env bash
# Print the last 5 data rows of headtail.txt (excluding the header line).

head -n 5 headtail.txt

experiment_id,temperature,pressure
E001,22.1,101.3
E002,23.4,100.8
E003,21.9,102.1
E004,24.0,99.5
E005,22.8,101.7
E006,23.1,100.2
E007,21.5,102.8
E008,24.3,99.1
E009,22.6,101.4
E010,23.9,100.5

Show explanation

The bug is using head (which prints the first N lines) instead of tail (which prints the last N lines), so the script shows the oldest entries instead of the most recent ones.

Shows: the difference between head and tail and how to combine them, for example tail -n 5 for the last 5 or head -n 10 | tail -n 5 for lines 6-10.

To find it: run the script and compare the experiment IDs in its output to the sample file. If the output starts with the header line and E001, the script is printing the beginning of the file. Run tail -n 5 headtail.txt to confirm that it prints E006 through E010, the last five data rows.
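
As a rough check of the three commands mentioned above, the sketch below runs them on the sample data and keeps only the ID column so the results are easy to compare.

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)
cat > "$workdir/headtail.txt" <<'EOF'
experiment_id,temperature,pressure
E001,22.1,101.3
E002,23.4,100.8
E003,21.9,102.1
E004,24.0,99.5
E005,22.8,101.7
E006,23.1,100.2
E007,21.5,102.8
E008,24.3,99.1
E009,22.6,101.4
E010,23.9,100.5
EOF
file="$workdir/headtail.txt"

# First five lines: the header plus four data rows.
first_ids=$(head -n 5 "$file" | cut -d, -f1 | paste -sd ' ' -)
# Last five lines: the five most recent data rows.
last_ids=$(tail -n 5 "$file" | cut -d, -f1 | paste -sd ' ' -)
# Lines 6-10 of the file, counting the header as line 1.
middle_ids=$(head -n 10 "$file" | tail -n 5 | cut -d, -f1 | paste -sd ' ' -)

echo "head -n 5:              $first_ids"
echo "tail -n 5:              $last_ids"
echo "head -n 10 | tail -n 5: $middle_ids"
rm -r "$workdir"
```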

Save output to a shared directory one level up

Read the comment at the top of the script. Map out the expected directory structure on paper. How many levels up does ../.. go? Is that where shared/ lives?

#!/usr/bin/env bash
# Run this script from inside results/2024/.
# Save a sorted copy of measurements.txt to the shared/ directory,
# which is one level above the current directory (i.e., results/shared/).

sort -n measurements.txt > ../../shared/sorted.txt

depth_m,temp_c,salinity
0,18.4,35.1
10,17.9,35.2
20,17.1,35.4
50,14.3,35.7
100,10.8,36.0
200,7.2,36.3

Show explanation

The bug is writing ../.. when the target is only one level up (..), so the script saves output two levels up instead of one.

Shows: how to trace relative paths by counting directory levels, and how to use pwd and ls .. to verify the directory structure before running a script.

To find it: add echo "saving to: $(pwd)/../../shared/" before the output command and run the script. Count how many directory levels ../../ climbs in the printed path and compare that to a sketch of the directory tree you drew on paper.
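
The directory counting can also be checked mechanically. This sketch rebuilds the layout described in the script's comment inside a temporary directory (the temporary root is an assumption standing in for wherever results/ really lives) and tests where shared/ is visible from results/2024/.

```shell
#!/usr/bin/env bash
start_dir=$PWD
root=$(mktemp -d)
mkdir -p "$root/results/2024" "$root/results/shared"
cd "$root/results/2024" || exit 1

# Resolve each relative path to see where it really points.
one_up=$(cd .. && pwd)
two_up=$(cd ../.. && pwd)
echo ".. resolves to    $one_up"
echo "../.. resolves to $two_up"

# shared/ exists one level up, not two.
if [ -d ../shared ]; then one_level=yes; else one_level=no; fi
if [ -d ../../shared ]; then two_levels=yes; else two_levels=no; fi
echo "../shared exists: $one_level, ../../shared exists: $two_levels"

cd "$start_dir" || exit 1
rm -r "$root"
```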

Count the records in a data file

Run this script and look at the number it prints. Does it match the number of lines (records) in the file?

#!/usr/bin/env bash
# Report the number of observation records in wcflag.txt.
# Each line is one record.

wc -w wcflag.txt

sparrow observed at grid B4
robin observed at grid A2
crow observed at grid C1
sparrow observed at grid D3
robin observed at grid B1
crow observed at grid A4
sparrow observed at grid C3

Show explanation

The bug is using wc -w (count words) instead of wc -l (count lines), so the script prints a much larger number than the number of records.

Shows: the difference between wc flags and how to use wc --help to check which flag produces which count.

To find it: run both wc -l wcflag.txt and wc -w wcflag.txt and compare the two numbers. If the script is printing the larger number, it is counting words rather than lines.
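
A quick sketch of the two counts on the sample data, reading from standard input so that wc prints only the number:

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)
cat > "$workdir/wcflag.txt" <<'EOF'
sparrow observed at grid B4
robin observed at grid A2
crow observed at grid C1
sparrow observed at grid D3
robin observed at grid B1
crow observed at grid A4
sparrow observed at grid C3
EOF

# -l counts newline-terminated lines; -w counts whitespace-separated words.
line_count=$(wc -l < "$workdir/wcflag.txt")
word_count=$(wc -w < "$workdir/wcflag.txt")

echo "records (wc -l): $line_count"
echo "words   (wc -w): $word_count"
rm -r "$workdir"
```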

Find all CSV files in a directory tree

Run this script from a directory that contains at least one .csv file. What arguments does find actually receive? Use echo in place of find to check.

#!/usr/bin/env bash
# Find all CSV files anywhere under the data/ directory.

find data -name *.csv

Show explanation

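The bug is passing *.csv without quotes, so the shell expands the glob against the current directory before find runs. find then receives specific filenames instead of the pattern: with one match it searches data/ only for that exact name, and with several matches it typically stops with an error because the extra names are not valid expressions. If nothing in the current directory matches, bash by default passes the pattern through unchanged, which is why the bug can go unnoticed.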

Shows: that the shell expands unquoted wildcards before passing them to any command, and that patterns given to find -name must be quoted.

To find it: replace find with echo find and run the command. The shell will print the arguments find would receive. If *.csv was expanded, the argument list shows specific filenames rather than the literal pattern *.csv.
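
The echo trick can be tried in a scratch directory. In this sketch the filenames (top.csv, a.csv, b.csv) are invented for the demonstration; the point is only what find would receive in each case.

```shell
#!/usr/bin/env bash
start_dir=$PWD
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir -p data/sub
touch top.csv data/a.csv data/sub/b.csv   # hypothetical files

# Unquoted: the shell replaces *.csv with the match in the current directory.
unquoted_args=$(echo find data -name *.csv)
# Quoted: find receives the pattern itself and does the matching.
quoted_args=$(echo find data -name "*.csv")
echo "unquoted -> $unquoted_args"
echo "quoted   -> $quoted_args"

found_unquoted=$(find data -name *.csv | wc -l)
found_quoted=$(find data -name "*.csv" | wc -l)
echo "files found: unquoted=$found_unquoted quoted=$found_quoted"

cd "$start_dir" || exit 1
rm -r "$workdir"
```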

Extract family names from a tab-separated file

Run this script and examine the output. Does each output line contain just the family name, or does it contain the whole row?

#!/usr/bin/env bash
# Extract the family name (first column) from the tab-delimited roster.

cut -d, -f1 cutdelim.txt

Smith	Jane	Biology
Jones	Tom	Chemistry
Garcia	Maria	Physics
Chen	Wei	Biology
Okafor	Amara	Chemistry

Show explanation

The bug is specifying -d, (comma delimiter) when the file uses tabs. Since there are no commas, cut treats each line as a single field and returns it whole.

Shows: how to identify the actual delimiter in a file (using cat -A to show invisible characters) and how to specify a tab with -d$'\t'.

To find it: run cat -A cutdelim.txt | head -n 3 to make invisible characters visible (cat -A is a GNU option; on macOS and the BSDs, cat -et gives similar output). Tab characters appear as ^I. If ^I separates the fields but -d, specifies a comma, cut will see no delimiter and return each entire line as a single field.
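
A short sketch of the difference, using the first two rows of the roster. It relies on the fact that tab is already cut's default delimiter, so the fix can be as small as dropping -d, entirely.

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)
printf 'Smith\tJane\tBiology\nJones\tTom\tChemistry\n' > "$workdir/cutdelim.txt"

# Wrong delimiter: no commas in the file, so each line comes back whole.
with_comma=$(cut -d, -f1 "$workdir/cutdelim.txt" | head -n 1)

# Default delimiter is a tab; in bash, cut -d$'\t' -f1 is equivalent.
with_tab=$(cut -f1 "$workdir/cutdelim.txt" | paste -sd ' ' -)

echo "cut -d, -f1 -> $with_comma"
echo "cut -f1     -> $with_tab"
rm -r "$workdir"
```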

Filter error lines from a log file

Run this script and read the output. Do the lines shown contain the word you were searching for?

#!/usr/bin/env bash
# Print all lines in the log that contain the word "ERROR".

grep -v "ERROR" grepinvert.txt

2024-03-01 08:02 INFO  server started
2024-03-01 08:15 ERROR disk usage above 90%
2024-03-01 09:00 INFO  backup complete
2024-03-01 09:47 ERROR connection timeout after 30s
2024-03-01 10:30 INFO  request processed
2024-03-01 11:15 ERROR out of memory in worker 3
2024-03-01 12:00 INFO  daily report sent

Show explanation

The bug is the -v flag, which inverts the match so grep shows lines that do not contain the pattern. The script prints everything except the error lines instead of just the error lines.

Shows: what -v does and how to check the result of a grep command against a small known file to confirm it is filtering in the right direction.

To find it: run the script on the sample file and check whether any output line contains the search term. If every output line lacks the search term, -v is inverting the match. Remove -v to keep only matching lines.
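
Counting matches is a compact way to check the direction of a filter. The sketch below uses grep -c on the sample log, with and without -v.

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)
cat > "$workdir/grepinvert.txt" <<'EOF'
2024-03-01 08:02 INFO  server started
2024-03-01 08:15 ERROR disk usage above 90%
2024-03-01 09:00 INFO  backup complete
2024-03-01 09:47 ERROR connection timeout after 30s
2024-03-01 10:30 INFO  request processed
2024-03-01 11:15 ERROR out of memory in worker 3
2024-03-01 12:00 INFO  daily report sent
EOF

# -c prints the number of selected lines instead of the lines themselves.
matching=$(grep -c "ERROR" "$workdir/grepinvert.txt")
inverted=$(grep -vc "ERROR" "$workdir/grepinvert.txt")

echo "lines with ERROR:    $matching"
echo "lines without ERROR: $inverted"
rm -r "$workdir"
```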

Count the lines in a file given on the command line

Run this script with no arguments. Does it print useful output, produce an error, or do something else? Use Ctrl-C to stop it if it appears to hang.

#!/usr/bin/env bash
# Usage: missingarg.sh filename
# Print the filename and its line count.

echo "File: $1"
wc -l $1

Show explanation

The bug is that $1 expands to nothing when no argument is given. Because $1 is unquoted, the empty expansion disappears entirely, so wc -l runs with no filename and reads from standard input, and the script appears to hang while it waits for keystrokes.

Shows: how positional parameters expand to empty strings when omitted, and how to test for missing arguments and print a message such as echo "Usage: …" before using them.

To find it: run the script with no arguments and watch whether it hangs. Press Ctrl-C to stop it. Then add echo "Got: '$1'" as the first line — running again shows Got: '', proving $1 is empty and that wc -l is reading from stdin instead of a file.
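
One possible guard is sketched below. The function count_lines is a hypothetical wrapper around the script body, used here so the check can be exercised without a separate script file; in the script itself the same test would use exit instead of return.

```shell
#!/usr/bin/env bash
# count_lines is a hypothetical stand-in for the body of missingarg.sh.
count_lines() {
    if [ "$#" -lt 1 ]; then
        echo "Usage: missingarg.sh filename" >&2
        return 1
    fi
    echo "File: $1"
    wc -l "$1"
}

workdir=$(mktemp -d)
printf 'a\nb\nc\n' > "$workdir/sample.txt"

# With no argument the guard refuses to run instead of waiting on stdin.
if count_lines 2>/dev/null; then no_arg=ran; else no_arg=refused; fi
with_arg=$(count_lines "$workdir/sample.txt" | head -n 1)

echo "no argument: $no_arg"
echo "$with_arg"
rm -r "$workdir"
```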

Concatenate report sections in the correct order

Run this script and read the resulting report.txt. Does the report begin with the introduction, or with a different section?

#!/usr/bin/env bash
# Assemble the three sections of the report in order: introduction, methods, results.

cat section3.txt section1.txt section2.txt > report.txt

Introduction
============
This report summarises findings from the 2024 field season.
Three sites were surveyed between January and March.

Methods
=======
Samples were collected using standard protocols.
Each site was visited on three consecutive days.
Temperature and salinity were recorded at each visit.

Results
=======
Site A showed elevated salinity compared to baseline.
Site B was within normal range on all measures.
Site C had anomalously low temperatures on day 2.

Show explanation

The bug is that the filenames are listed in the wrong sequence on the cat command line, so the sections appear in the wrong order.

Shows: that cat concatenates files in the order they are given, and how to verify the result with head before treating the output as correct.

To find it: run head -n 3 report.txt after the script finishes. The first few lines should come from the introduction file. If they do not, compare the order of filenames on the cat command line to the intended section order.

Append a summary line from each file to a report

Run this script once, then run it again. How many lines does summary.txt contain after the first run? After the second run?

#!/usr/bin/env bash
# Collect word counts for all .dat files into summary.txt.
# Running this script a second time should produce the same summary,
# not a file with doubled entries.

for f in *.dat
do
    wc -w "$f" >> summary.txt
done

Show explanation

The bug is using >> inside the loop without clearing the file first, so each run appends to whatever the previous run left and the file contains double the expected entries after a second run.

Shows: how to decide between > and >>, and the pattern of redirecting the first write with > or removing the output file before the loop begins.

To find it: run the script twice and count the lines in summary.txt with wc -l after each run. If the count doubles on the second run, the file is not being cleared before the loop. Remove the file or use > on the first write before switching to >> for subsequent writes.
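
Another way to get the same effect, offered here as a sketch rather than the only fix, is to redirect the output of the whole loop once. The .dat contents and the build_summary function name are invented for the demonstration.

```shell
#!/usr/bin/env bash
start_dir=$PWD
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'one two\n' > a.dat            # hypothetical data files
printf 'three four five\n' > b.dat

build_summary() {
    # Redirecting the loop as a whole opens summary.txt once with >,
    # so every run starts from an empty file.
    for f in *.dat; do
        wc -w "$f"
    done > summary.txt
}

build_summary
first_run=$(wc -l < summary.txt)
build_summary
second_run=$(wc -l < summary.txt)
echo "lines after run 1: $first_run, after run 2: $second_run"

cd "$start_dir" || exit 1
rm -r "$workdir"
```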

List the three largest files in a directory

Run this script in a directory that has files of different sizes. Compare the output to ls -s | sort -rn | head -n 3. Are the results the same?

#!/usr/bin/env bash
# List the three largest files in the current directory by block size,
# largest first.

ls -s | sort -r | head -n 3

Show explanation

The bug is using sort -r, which reverses alphabetical order rather than numeric order. A size of 100 blocks sorts below a size of 8 because the comparison is character by character and "1" comes before "8", so the script may list the wrong files, not just the right files in the wrong order.

Shows: that -r alone reverses the current sort order and that -rn is needed to sort numbers in descending order.

To find it: run ls -s in the directory and note the block counts for a few files. If the script output shows a file with size 100 below a file with size 8, the sort is alphabetical rather than numeric. Compare with ls -s | sort -rn | head -n 3.
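
Because real ls -s output depends on the directory and may be padded for alignment, the sketch below uses a small invented listing in the same "size name" shape to show how the two sorts pick different top three entries. LC_ALL=C is set only to make the alphabetical result predictable.

```shell
#!/usr/bin/env bash
# Hypothetical listing standing in for ls -s output.
listing=$(printf '%s\n' '8 small.txt' '100 big.txt' '24 medium.txt' '9 notes.txt')

# Reverse alphabetical: "9" and "8" outrank "24" and "100".
alpha_top=$(printf '%s\n' "$listing" | LC_ALL=C sort -r | head -n 3 | cut -d' ' -f2 | paste -sd ' ' -)

# Reverse numeric: the largest block counts come first.
numeric_top=$(printf '%s\n' "$listing" | sort -rn | head -n 3 | cut -d' ' -f2 | paste -sd ' ' -)

echo "sort -r  -> $alpha_top"
echo "sort -rn -> $numeric_top"
```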

Extract ten data rows from a CSV file

Run this script and count the data rows in the output, leaving out the header. Does the count match the ten data rows the comment asks for?

#!/usr/bin/env bash
# Extract the header row plus the first 10 data rows from headcount.csv
# (11 lines total: 1 header + 10 data).

head -n 10 headcount.csv

site,reading,value
A,1,42.3
A,2,38.9
A,3,55.1
B,1,31.7
B,2,29.4
B,3,33.8
C,1,47.2
C,2,51.0
C,3,44.6
D,1,28.3
D,2,30.1

Show explanation

The bug is head -n 10 when the header itself is one of the ten lines, so only nine data rows remain instead of ten.

Shows: how to account for header lines when counting with head and how to use wc -l to verify the actual line count of the output.

To find it: run wc -l on both the script's output and the original file. If the output has 10 lines total but you expected 10 data rows plus a header, the header was counted as one of the 10 lines. Use head -n 11 to include the header and 10 data rows.
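
The off-by-one can be confirmed by stripping the header from each result and counting what is left, as in this sketch on the sample file:

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)
cat > "$workdir/headcount.csv" <<'EOF'
site,reading,value
A,1,42.3
A,2,38.9
A,3,55.1
B,1,31.7
B,2,29.4
B,3,33.8
C,1,47.2
C,2,51.0
C,3,44.6
D,1,28.3
D,2,30.1
EOF
file="$workdir/headcount.csv"

# tail -n +2 drops the first line (the header) before counting.
rows_with_10=$(head -n 10 "$file" | tail -n +2 | wc -l)
rows_with_11=$(head -n 11 "$file" | tail -n +2 | wc -l)

echo "head -n 10 keeps $rows_with_10 data rows"
echo "head -n 11 keeps $rows_with_11 data rows"
rm -r "$workdir"
```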

Process a list of filenames passed as script arguments

Run this script with a filename that contains a space. Does it process the file, or does it report an error about a non-existent file?

#!/usr/bin/env bash
# Usage: quotedall.sh file1 file2 ...
# Count the lines in each file passed as an argument.

for f in $@
do
    wc -l "$f"
done

Show explanation

The bug is using $@ without quotes, which causes word-splitting so each space-separated token is treated as a separate argument and the script fails for any argument that contains a space.

Shows: the difference between $@ and "$@": the quoted form preserves each argument as a single token, even if it contains spaces.

To find it: add echo "Processing: '$f'" inside the loop and run with a filename containing a space. The unquoted $@ version prints two separate single-word arguments for one filename; the quoted "$@" version prints one argument with the space preserved.
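
A minimal sketch of the difference, using two hypothetical helper functions that only count how many times the loop body runs. No files are needed, and the default word-splitting settings are assumed.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: each prints the number of loop iterations.
count_unquoted() {
    n=0
    for f in $@; do n=$((n + 1)); done
    echo "$n"
}
count_quoted() {
    n=0
    for f in "$@"; do n=$((n + 1)); done
    echo "$n"
}

# Two arguments, one of which contains a space.
unquoted=$(count_unquoted "field notes.txt" data.txt)
quoted=$(count_quoted "field notes.txt" data.txt)

echo "for f in \$@     -> $unquoted iterations"
echo "for f in \"\$@\"   -> $quoted iterations"
```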

Count log files in a directory tree

Create a directory whose name ends in .log (e.g., mkdir debug.log). Now run this script. Does the count include the directory?

#!/usr/bin/env bash
# Count all log files (and only log files) under the current directory.

find . -name "*.log" | wc -l

Show explanation

The bug is that -name "*.log" matches any filesystem entry, not just regular files, so directories whose names end in .log are included in the count.

Shows: the use of -type f to restrict find results to regular files and -type d to restrict to directories.

To find it: create a directory whose name ends in .log — e.g., mkdir debug.log — then run find . -name "*.log" and see whether that directory appears in the output. Add -type f and rerun to confirm it is excluded.
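
The experiment can be run in a scratch tree so it leaves nothing behind. The file names app.log and trace.log in this sketch are invented for the demonstration.

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)
mkdir -p "$workdir/tree/debug.log"
touch "$workdir/tree/app.log" "$workdir/tree/debug.log/trace.log"

# -name alone matches the directory debug.log as well as the two files.
any_entry=$(find "$workdir/tree" -name "*.log" | wc -l)
# -type f keeps only regular files.
files_only=$(find "$workdir/tree" -name "*.log" -type f | wc -l)

echo "without -type f: $any_entry"
echo "with -type f:    $files_only"
rm -r "$workdir"
```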

Copy a directory to a backup location

Run this script. Does it copy the directory, or does it produce an error message? What does the error message say?

#!/usr/bin/env bash
# Back up the entire results/ directory to backup/results/.

cp results/ backup/results/

Show explanation

The bug is calling cp without the -r flag, which is required to copy a directory and its contents recursively. Without it, cp copies nothing and reports that it is omitting the directory.

Shows: the difference between copying a file and copying a directory, and how to read cp --help to find the right flag.

To find it: run the script and read the error message. With GNU cp it begins cp: -r not specified; omitting directory, i.e., the message names the missing flag directly. Other implementations of cp word the message differently.
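
A sketch of both behaviours in a temporary directory. It assumes backup/ already exists, so that cp -r places the copy at backup/results/; the run1/data.txt file is invented for the demonstration.

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)
mkdir -p "$workdir/results/run1" "$workdir/backup"
echo "42" > "$workdir/results/run1/data.txt"   # hypothetical content

# Without -r, cp skips the directory and exits with a non-zero status.
if cp "$workdir/results" "$workdir/backup/" 2>/dev/null; then
    plain=copied
else
    plain=failed
fi

# With -r, the directory and everything under it is copied.
cp -r "$workdir/results" "$workdir/backup/"
if [ -f "$workdir/backup/results/run1/data.txt" ]; then
    recursive=copied
else
    recursive=missing
fi

echo "cp:    $plain"
echo "cp -r: $recursive"
rm -r "$workdir"
```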