CS/CE 218 Lecture -*-Outline-*-

Note: This wasn't a lecture delivered in class; it should have been.
It's a synopsis of what I discussed at a review session.

* How to program in the Bourne Shell

** mindset
*** Pascal programming
	I think: what are the variables?
	Also: how do I update the variables?
*** Shell programming
	I think: what program am I going to run?
	Also: how to check that this program has been called correctly

** Places
	Data in a shell program is all character strings (or files).

	Q: Where can data appear in a shell program?  What kinds of places
		hold the data that a shell manipulates?

	positional parameters (command line arguments, $1, $2, ... $*)
	variables
		special variables
			name of the shell ($0)
			exit value of previous command ($?)
			process id of the shell ($$)
			number of positional parameters ($#)
	files
	standard input
	standard output
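
	A minimal sketch for looking at the special variables listed above.
	The arguments a, b, c and the names NPARAMS and STATUS are made up
	for the demonstration; run it as a script with sh.

```shell
# Sketch only: a b c are made-up arguments.
set -- a b c			# give the shell three positional parameters
NPARAMS=$#
echo "name of the shell:    $0"
echo "process id:           $$"
echo "number of parameters: $NPARAMS"
echo "all parameters:       $*"
false || STATUS=$?		# false always fails; $? holds its exit value
echo "exit value of false:  $STATUS"
```

	The process id differs on every run, so no particular output is
	shown for it.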

	It's interesting to consider the matrix of how to transfer data from
	one place to another:

	e.g., to move into a variable X:
	from positional parameter $1		X=$1
	from another variable, Y		X=$Y
	from file f				X=`cat f`
	from standard output of cmd		X=`cmd`
	from a line of standard input		read X
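
	The table above can be tried out with a sketch like the following.
	The strings, the variable F, and the scratch file name are all
	invented for the demonstration; a here-document stands in for
	standard input.

```shell
# Sketch only: the strings and the scratch file name are made up.
F=/tmp/f.$$
set -- "from a parameter"	# give $1 a value
Y="from a variable"
echo "from a file" > $F

X=$1;		echo "$X"	# from positional parameter $1
X=$Y;		echo "$X"	# from another variable
X=`cat $F`;	echo "$X"	# from a file
X=`echo from a command`
echo "$X"			# from standard output of a command
read X <<EOF
from standard input
EOF
echo "$X"			# from a line of standard input
rm -f $F
```

	Note that in most shells cmd | read X does not work for the last
	case, because the read then runs in a separate process.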

	...

** examples
	We worked 2 examples...
*** makelist (page 488 of unix book)
**** step 1: what is the problem?
	What does the data look like, what does the output look like?
	We solved this assuming that the input files had two fields
		separated by tabs; the book apparently assumes otherwise.
	Also, we assumed that the printouts didn't have to be separated...
	What is the syntax?
	The first thing is to get an idea of the parameters...
		I record this in the USAGE variable:
USAGE="Usage: $0 [-t] [-m] [-p] file ..."
	Then I would write a comment describing the entire program
# makelist -- print names from the given files, with a header, omitting phone
# numbers...

**** step 2: how does the command work, ignoring the options?
	have to cut the first field from each file...
		at this point I would consult the book or manual about cut
		and options for pr...
	assuming cut works for each of its file parameters, one can write
cut -f1 $* | pr -h'Name Listing'
	if cut doesn't take more than one file name, one can write
cat $* | cut -f1 | pr -h'Name Listing'
	or
for f
do
	cut -f1 $f
done | pr -h'Name Listing'
	I believe that all the above alternatives have the same effect.
	If we want a separate pr for each file we would write:
for f in $*
do
	cut -f1 $f |pr -h'Name Listing'
done
	(one can also leave out the "in $*" above; the effect is the same
	as long as no argument contains white space or wildcard characters).
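
	A small sketch comparing the two loop headers. The arguments are
	made up, and the second one deliberately contains a space, which is
	the one case where the two forms behave differently. The counters N
	and M are names I chose for the demonstration.

```shell
# Sketch only: made-up arguments; the second contains a space.
set -- one "two words" three

N=0
for f				# iterates over the positional parameters as given
do
	N=`expr $N + 1`
	echo "for f, item $N: $f"
done

M=0
for f in $*			# $* is split again at white space
do
	M=`expr $M + 1`
	echo "for f in \$*, item $M: $f"
done

echo "$N items versus $M items"
```

	With ordinary file names (no spaces) both loops see the same items.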

	Don't be baffled by the several different ways of doing this;
	what's important is to understand at least one.

	Let's take as our basic way of doing things the following:

# makelist -- print names from the given files, with a header, omitting phone
# numbers...
USAGE="Usage: $0 [-t] [-m] [-p] file ..."

cut -f1 $* | pr -h'Name Listing'

**** step 4: I would do a bit of testing at this point...
**** step 5: have to parse the arguments.

	have to check for the options first...
	the outline for doing that is the following code, which is standard.
				   
while test -n "$1"
do
	case "$1" in
	-t)
		#record t option
		shift
		;;
	-p)
		#record p option
		shift
		;;
	-m)
		#record m option
		shift
		;;
	-*)
		echo "$USAGE" >&2
		exit 1
		;;
	*)
		break
		;;
	esac
done
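
	To watch the outline above at work, one can wrap it in a function
	and call it on a few argument lists. This is only a sketch:
	parse_demo and SEEN are made-up names, the three options are lumped
	into one case for brevity, and return replaces exit so that the
	calling shell survives a bad option.

```shell
# Sketch only: parse_demo and SEEN are made-up names.
parse_demo () {
	SEEN=""
	while test -n "$1"
	do
		case "$1" in
		-t|-p|-m)
			SEEN="$SEEN $1"		# record the option
			shift
			;;
		-*)
			echo "bad option: $1" >&2
			return 1
			;;
		*)
			break			# first non-option: stop parsing
			;;
		esac
	done
	echo "options:$SEEN  files: $*"
}

parse_demo -t -m a.txt b.txt
parse_demo -x a.txt || echo "rejected with status $?"
```

	After the loop, the options have been shifted away and $* holds
	only the file names.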

	Now, how to record the options?

	  The book shows one way,
		for each option, have a shell variable that holds either yes
			or no...
		then in the main program use an if statement
		if test $together = yes
		then
			sort $* | cut -f1 | pr -h'Name Listing'
		else
			cut -f1 $* | pr -h'Name Listing'
		fi
	The problem with the book's way is that with many options,
		code has to be repeated, as can be seen from above.
	Slightly better would be something like
		if test $together = yes
		then
			sort $*
		else
			cat $*
		fi | cut -f1 | pr -h'Name Listing'
	(Even better, by setting together to "true" or "false" instead of
		"yes" or "no", one can write if $together instead of
		if test $together = yes, since true and false are commands.)
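
	A sketch of the true/false variant, assuming together defaults to
	false and that a -t option would set it to true. The variable CMD
	is a made-up name that just records which branch was taken.

```shell
# Sketch only: CMD is a made-up name.
together=false		# the default; a -t option would set together=true

if $together		# runs the command true or false and tests its exit value
then
	CMD=sort
else
	CMD=cat
fi
echo "would run: $CMD"
```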

	I like a different approach, that of storing the name of the command
	to execute (sort or cat) in the variable in question.
	So one executes
		$SORT $* | cut -f1 | pr -h'Name Listing'
	where $SORT is either sort or cat
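
	A sketch of this trick on invented data: two lines, each a name,
	a tab, and a phone number. The file name DATA and the variables
	UNSORTED and SORTED are assumptions for the demonstration, and pr
	is left out because its header contains the date.

```shell
# Sketch only: the data and all names here are made up.
DATA=/tmp/names.$$
printf 'Smith\t555-1234\nAdams\t555-9876\n' > $DATA

SORT=cat				# the default: pass the data through
UNSORTED=`$SORT $DATA | cut -f1`

SORT=sort				# what the -t option would select
SORTED=`$SORT $DATA | cut -f1`

echo "with SORT=cat:";  echo "$UNSORTED"
echo "with SORT=sort:"; echo "$SORTED"
rm -f $DATA
```

	The pipeline itself never changes; only the value of $SORT does.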

	Using this trick for each option gives the following program

#!/bin/sh
# makelist -- print names from the given files, with a header, omitting phone
# numbers...
USAGE="Usage: $0 [-t] [-m] [-p] file ..."

#defaults
SORT=cat
MULTIPLE=""
LP=cat

while test -n "$1"
do
	case "$1" in
	-t)
		SORT=sort
		shift
		;;
	-p)
		LP=lp
		shift
		;;
	-m)
		MULTIPLE=-3
		shift
		;;
	-*)
		echo "$USAGE" >&2
		exit 1
		;;
	*)
		break
		;;
	esac
done

$SORT $* |cut -f1 | pr $MULTIPLE -h'Name Listing' | $LP

**** exercises
	try the above, does it work?
	The above doesn't do quite the same thing as the program in the book;
	can you adapt it to do the same thing but using the same coding style?
	Which program do you think is clearer, the one in the book or this one?

*** cleanup (from hw3)
**** step 1: what is the problem
	we know this somewhat from homework 3.
# cleanup -- remove files not named *.c or [Mm]akefile under user control
	To record the usage...
USAGE="cleanup [dir]"

**** step 2: what program am I going to call?
	I'll need to find the right files, and then pass those on the
		command line to rm.
	So I look at the manual page for rm.
	I see in the synopsis that rm takes arguments like
		rm [-ifr] file ...
	The -i option is interesting (it's discussed in the book)
	turns out that rm -i has the right behavior...

	$ rm -i foo
	foo: ? (y/n) n
	$ ls foo
	foo

	so rm -i gives the right prompts (wasn't that nice of the prof...)
	moreover, rm iterates, since it handles multiple files...

	$ rm -i foo bar
	foo: ? (y/n) y
	bar: ? (y/n) n

	this means that what I want to do is invoke rm -i with the right
	list of arguments...

	Now clearly, getting the right list of arguments is going to
	involve a pipeline, some greps, etc.
	I can get the output of that pipeline to the arguments of rm -i
	by using command substitution
rm -i `ls -l | grep '^-' | cut ...`
	where the 3 dots represent stuff I haven't written yet.

	So to complete the main part of the program, we have to figure
	out this pipeline.  Part of the pipeline was given in the first exam.
	To get the names of files, we use
ls -l | grep '^-' | cut -c55- | ...
	What we want with the list of files is all files not named *.c or
	[Mm]akefile.  So because we want names that do *not* match these
	patterns, grep -v is called for; by a little set theory we see that
	the right thing is
ls -l | grep '^-' | cut -c55- | grep -v '.*\.c$' | grep -v '[Mm]akefile'
	so the guts of our command is

rm -i `ls -l | grep '^-' | cut -c55- | grep -v '.*\.c$' | grep -v '[Mm]akefile'`
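
	The two grep -v filters can be tried without touching any files.
	In this sketch printf stands in for the ls -l | grep | cut part of
	the pipeline; the file names and the variable KEPT are made up.
	The makefile pattern is quoted here so that the shell passes it to
	grep unchanged.

```shell
# Sketch only: made-up file names; printf replaces ls -l | grep | cut.
KEPT=`printf '%s\n' main.c notes.txt Makefile makefile a.out Makefile.old |
	grep -v '.*\.c$' | grep -v '[Mm]akefile'`
echo "$KEPT"
```

	Only notes.txt and a.out survive. Makefile.old is filtered out as
	well, because the second pattern matches anywhere in the name.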

**** step 3: check the arguments
	this command has 1 optional argument, so 2 or more arguments are bad

if test $# -gt 1
then
	echo "$USAGE" >&2
	exit 1
fi

**** step 4: use the argument given, take care of the default
	if no argument is given, the files are sought in directory .
	I think it easiest to change to that directory...
	One could write
		if test $# -eq 1
		then
			cd $1
		else
			cd .
		fi
	but here the cd . is redundant, and I would leave out the whole else
		clause from the above if.
	We also have to check that the directory given is writeable,
		and issue an error message if it's not.
	So we would write something like
		if test $# -eq 1
		then
			if test ! -w $1
			then
				echo "$0: ERROR: directory \`$1' is not writeable" >&2
				exit 2
			fi
			cd $1
		else
			if test ! -w .
			then
				echo "$0: ERROR: directory \`.' is not writeable" >&2
				exit 2
			fi
		fi
	but notice the duplication of code for testing the directory
		and issuing the error message.
	to avoid this I prefer the following.

if test $# -eq 0
then
	set -- .   # sets $1 to be .
fi

cd $1

if test ! -w .
then
	echo "$0: ERROR: directory \`$1' is not writeable" >&2
	exit 2
fi

	Several things to note about the above: I try to get the program
		into a single case, and using set and cd helps with that;
		you can see the payoff in the test for the directory being
		writeable.
	The -- in set -- . tells set not to treat what comes next
		as an option to the shell.
		If you say set -x, it makes the shell turn on tracing;
			consider what would happen if $X was -x and
			you said set $X instead of set -- $X:
			the former has the effect of turning on tracing,
			the latter sets the first positional parameter to be -x.
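
	A short sketch of the difference, with X set to -x as in the
	discussion above. The set $X case is run in a subshell, so that the
	tracing it turns on does not carry over to the rest of the script.

```shell
# Sketch only: X is set to -x just to show the two behaviors.
X=-x

# Without --, the shell takes -x as an option and turns on tracing;
# the trace of the echo goes to standard error.
( set $X; echo "inside the subshell: $# parameters" )

# With --, -x becomes the first positional parameter.
set -- $X
echo "number of parameters: $#, first is: $1"
```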

	Putting the whole program together, one gets the following:

# cleanup -- remove files not named *.c or [Mm]akefile under user control

USAGE="cleanup [dir]"

if test $# -gt 1
then
	echo "$USAGE" >&2
	exit 1
fi

if test $# -eq 0
then
	set -- .   # sets $1 to be .
fi

cd $1

if test ! -w .
then
	echo "$0: ERROR: directory \`$1' is not writeable" >&2
	exit 2
fi

rm -i `ls -l | grep '^-' | cut -c55- | grep -v '.*\.c$' | grep -v '[Mm]akefile'`

**** exercises
	Try the above, does it really work?
	Can you explain the pipeline?
	Where does rm -i send its prompts?  To the terminal or standard output?
		if it doesn't send them to the terminal, adapt the above
		using a for loop...
	Does the above program give the right exit value in all cases?

** systematization
	The above program developments followed a few rules:
		understand the problem first
		understand the tools you have (the programs to call)
		get the main case working first
		check the arguments for correct syntax etc.;
			this has to be done at run-time before the real work
	What other rules are there that you see that I can't see?

	Is this the best way to go about it?
	How could the process be improved?