CS/CE 218 Lecture  -*-Outline-*-  for text-filters

note: when I taught from this, I didn't draw the diagrams, except the
    first one.  Doing so would help the students, but would take more
    time....

connection: we have finished the introductory section of the course.
    We learned about input to Unix, how to invoke commands, how to find
    out information about specific commands, and about files and
    directories, the basic data manipulated by Unix.

    Thus far we've been looking at ways that you, a human, can create,
    modify, and observe directories and files.  But the real strength
    of Unix is that it amplifies our feeble fingers by letting us
    manipulate directories and files under the control of our programs.
    Manipulation under program control is the topic of our next large
    section of the course: shell programming.

* Unix text filters and pipelines

advert: Shell programming is not only useful in and of itself, it's
    also the medium through which we'll describe the basic conventions
    for Unix programs:
        the "shape" of a Unix process,
        a powerful, standard way to construct software (very influential),
        helps make tools easy to combine.
    Similar ideas exist in MS-DOS, ... (copied from Unix),
    but they are not applied as consistently.

as a demonstration below, use a file (call it testio) containing the
following

------------
#! /bin/sh
echo "stdout"
echo "stderr" 1>&2
------------

** plumbing connections of a process

a process is a program that is running (has space, takes up CPU)

Q: If a process were a piece of plumbing, what shape would it have?

                  /dev/tty
                      ^
                      |
                      | 2=stderr
            ==========|==========
/dev/tty --->                    --> /dev/tty
 0=stdin    =====================     1=stdout

a process in Unix can be viewed as a pipe, more accurately as a T:
it reads from one standard file, writes to another, and sends errors
to a third

Q: What are the standard files that a process uses?
    diagnostic output is often called standard error

does a program have to follow these conventions?
    no, but most do
    general exceptions are ``display'' programs (emacs)

Q: Why do you think diagnostic output is distinguished from regular
   output in Unix?

to carry the plumbing analogy further, water comes from a spring, etc.,
and eventually goes to a waste treatment plant...

What are some sources of input in Unix?
    the terminal also has a file name: /dev/tty

What are some sinks for output?
    the ultimate sink is /dev/null (like NUL in MS-DOS)

normal connections of a process run by the shell...

------------
$ testio
stdout
stderr
------------

picture of the execution of testio
    0=stdin  <-- /dev/tty
    1=stdout --> /dev/tty
    2=stderr --> /dev/tty

*** redirection

to carry the plumbing analogy further, we should be able to divert the
water (data) into "buckets" (files) or to pump it out of other buckets:
the standard files can be redirected to other files, e.g.,

    sed -e 's/dis//g' <input >output

copies the file input to output, changing disorder to order

**** output

as in MS-DOS...

Q: Where does the standard output go by default?

to redirect the standard output, you use > and then a pathname...

Q: What does the character that is used to redirect standard output
   mean to you pictorially?  Is it well-chosen?

-------------
$ testio >junk
stderr
$ cat junk
stdout
-------------

picture of the execution of testio >junk
    0=stdin  <-- /dev/tty
    1=stdout --> junk
    2=stderr --> /dev/tty

when you say foo >bar, bar is created as an empty file;
this means that if bar already exists it's emptied, then a new one...

Q: Do you agree with the way the shell handles redirection of output
   to an existing file?
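for instance, here is a small sketch of that emptying behavior,
reusing the scratch file name junk (whatever junk held before is lost
each time > is used):

-------------
$ echo first >junk
$ echo second >junk
$ cat junk
second
-------------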
To add to the end of an existing file instead, you use >>

Q: In what situations would you want to use >> instead of >?

**** input

Q: Where does the standard input come from by default?

You may need to tell a program you don't want to talk to it anymore.
What character indicates end-of-file (by default)?
    ^D (not ^Z as in MS-DOS)

redirection of input is similar to redirection of output, except that
the sign is different (hence < instead of >), and it hardly makes any
sense to create the input file

picture of the execution of foo <bar
    0=stdin  <-- bar
    1=stdout --> /dev/tty
    2=stderr --> /dev/tty

It's possible (as in the sed example earlier) to use both input and
output redirection in the same command.

Q: Which is done first: creation of files for output redirection or
   reading of input files?
    so what happens if you say sort <bar >bar?

redirection has important implications for programmers:
    prompting: stdin might not be a person at a keyboard,
        so a program should test to see if stdin is a terminal;
        prompts sent to stdout or stderr might not be seen!
        so prompts should be sent to /dev/tty
        (a small example script appears below, after the discussion of
        duplication of file descriptors)

**** diagnostic output (section 4.7)

where does stderr go by default?

like standard output, stderr can also be redirected in the shell

Note: this is an advantage of the Bourne shell;
    you can't easily redirect stderr in the C shell!
    it only allows you to send both stderr and stdout to the same place

Q: How do you send error messages to a file instead of the terminal?
    2> errfile

This is curious; can it be used in general?
    yes, 1> f is like >f, and 0< f is like <f

---------------
$ testio 2>errs
stdout
$ cat errs
stderr
---------------

picture of the execution of testio 2>errs
    0=stdin  <-- /dev/tty
    1=stdout --> /dev/tty
    2=stderr --> errs

-------------
$ testio 2>errs 1>junk
$ cat junk
stdout
$ cat errs
stderr
$
-------------

*** duplication of file descriptors (&1, &2, ...)

the file descriptors of a child process are inherited from the shell

you can treat a file descriptor as a file for redirection, e.g.,
    2>&1 makes 2 go to a "duplicate" of fd 1, i.e., stdout
    1>&2 makes stdout go to the error stream
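here is the promised sketch of careful prompting, assuming a shell
whose test command supports the -t ("is this descriptor a terminal?")
operator; the prompt text and variable name are made up for
illustration:

------------
#! /bin/sh
# prompt only when standard input is a terminal, and send the prompt to
# /dev/tty so it is seen even if stdout and stderr have been redirected
if [ -t 0 ]
then
    echo "what is your name? " >/dev/tty
fi
read name
echo "hello, $name"
------------

run interactively, it prompts; run with stdin redirected from a file,
it stays quiet and simply reads the first line of that file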
Q: If "echo" sends output to standard output, how would you use it to
   send an error message to the diagnostic output?
    echo "$errmsg" 1>&2
    or simply
    echo "$errmsg" >&2

----------------
$ testio 1>&2
stdout
stderr
$ testio 2>&1
stdout
stderr
----------------

picture of the execution of testio 1>&2
    0=stdin  <-- /dev/tty
    1=stdout --> /dev/tty (made a duplicate of fd 2)
    2=stderr --> /dev/tty
(the descriptors are inherited from sh, whose 0, 1, and 2 are all
connected to /dev/tty)

picture of the execution of testio 2>&1
    0=stdin  <-- /dev/tty
    1=stdout --> /dev/tty
    2=stderr --> /dev/tty (made a duplicate of fd 1)

**** order of duplication and redirection matters

consider the following examples (>&2 is the same as 1>&2)

-------------
$ testio >&2 2>errs
stdout
$ cat errs
stderr
-------------

picture of the execution of testio >&2 2>errs
    0=stdin  <-- /dev/tty
    1=stdout --> /dev/tty (duplicated from fd 2 before 2 was redirected)
    2=stderr --> errs

-------------
$ testio 2>errs >&2
$ cat errs
stdout
stderr
-------------

picture of the execution of testio 2>errs >&2
    0=stdin  <-- /dev/tty
    1=stdout --> errs (duplicated from fd 2 after 2 was redirected)
    2=stderr --> errs

-------------
$ testio 2>&1 >junk
stderr
$ cat junk
stdout
$ testio >junk 2>&1
$ cat junk
stdout
stderr
-------------

picture of the execution of testio 2>&1 >junk
    0=stdin  <-- /dev/tty
    1=stdout --> junk
    2=stderr --> /dev/tty (duplicated from fd 1 before 1 was redirected)

picture of the execution of testio >junk 2>&1
    0=stdin  <-- /dev/tty
    1=stdout --> junk
    2=stderr --> junk (duplicated from fd 1 after 1 was redirected)

** pipelines (section 4.5, 4.6)

plumbing is most useful to transport water over long distances, without
having to put it in buckets; similarly, "pipelines" can put data
through its paces without having to store the intermediate results in
buckets

*** examples

-----------------------
echo iron copper | sed -e 's/.*/gold/'

echo these are different args | tr ' ' '\012' | sort

find . -type f -mtime $N -print | sed -e 's!\./!!'
-----------------------

the last finds files modified in the last N days; the sed takes the ./
out of their names

-----------------------
deroff -w <$f | sort -u +0 | \
/usr/lib/spell/spellprog /usr/lib/spell/hstop 1 | \
/usr/lib/spell/spellprog /usr/lib/spell/hlista /dev/null | \
sed '/^\./d' | sort -u +1f +0
-----------------------

the spell program is basically like this: $f is filtered into words,
then sorted, then checked for various kinds of matches, then .'s are
deleted, then the output is sorted
    it's easier to sort twice than to get the programs to produce
    sorted output

*** principles of pipelining

Q: What are some of the advantages of using pipelines instead of
   storing intermediate results in temporary files?
    you don't have to name and remove temporary files
    parallel processing (overlap between I/O and computing)
        may be really parallel on a multicomputer

a "piece of a pipeline" is an important thing in Unix, so it has to
have a name...

Q: What is a "step of a pipeline" called?
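one more example, a sketch of a classic word-frequency pipeline in
which every step is a filter; the input file name document is made up,
and the exact range syntax accepted by tr varies a little among old
versions:

-----------------------
# one word per line, folded to lower case, sorted, counted,
# most frequent first, top ten only
tr -cs 'A-Za-z' '\012' <document |
    tr 'A-Z' 'a-z' |
    sort |
    uniq -c |
    sort -rn |
    sed 10q
-----------------------

no temporary files are named anywhere, and the stages can run in
parallel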
Can all commands be used as filters?
    no, it's possible to write ones that can't be, but that is not
    good style!  A command should always be usable as a filter if
    possible:
        if a command can take one file argument, it should treat no
        arguments as meaning "read standard input"
        if a command can sensibly write to standard output, it should
        do so by default

*** more intricate plumbing with tee

to carry the analogy of pipelines further, it would be nice to be able
to split a stream and do several things with it (in parallel)

the shell doesn't allow this in full generality, but the command tee
lets you capture intermediate stages of a pipeline

Q: How would you make a script of your terminal session using tee?
    tee inputs | ksh -i 2>&1 | tee outputs

Q: How else could you do the job that tee does without using it?

one can imagine a graphical interface that allows more interesting
plumbing to be set up...
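returning to the question of doing tee's job without it: one possible
answer is a small shell loop, a rough sketch only (it is slow, it trims
leading blanks, and it mishandles backslashes unless the shell's read
has a -r option; the file name copy is made up):

-----------------------
# pass each line of standard input on to standard output,
# appending a copy of it to the (made-up) file "copy"
while read line
do
    echo "$line"
    echo "$line" >>copy
done
-----------------------

the real tee copies bytes rather than lines, so it has none of these
problems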