CS/CE218 --- Unix and C Programming.                    17 September 91

                  HOMEWORK 2: TEXT MANIPULATION

Due: 27 September 1991

(Note: as an exam over this material is scheduled for October 4, you
should make every effort to complete this homework on time, so that it
may be graded and returned to you before the exam.)

Be sure to work this homework on Zippy or Zaphod, since there are small
differences between HP-UX and other versions of Unix (e.g., on Vincent)
that will cause you to get different answers on other systems.

1. (2 pts) After executing the following commands

        umask 077
        mkdir temp

   and assuming that mkdir temp succeeded,
   a. what permissions does the command

        ls -ld temp

      show?
   b. why were the permissions set in that way?

2. (extra credit only) Suppose file ``junk'' already exists and is
   writeable by you. There are two ways that Unix could interpret the
   command

        echo 'some text' > junk

   It could either (i) empty out the existing file and then put
   ``some text'' in the existing file without removing the file first,
   or (ii) remove the file first and then create a new file.
   a. What does Unix actually do?
   b. What evidence do you have that proves that your interpretation
      is correct?

3. (3 pts) Consider the following pipeline (where f is a shell variable
   bound to some file name, and S=/usr/lib/spell).

        deroff -w <$f | sort -u +0 | $S/spellprog $S/hstop 1 | \
          $S/spellprog $S/hlista /dev/null | sed '/^\./d' | sort -u +1f +0

   Write down a pipeline using ``tee'' that captures the data flowing
   through each ``joint'' (|) of the above pipeline. The standard
   output of the original pipeline and the standard output of the
   pipeline using tee should be identical; that is, don't change the
   output of the whole pipeline.

4. a. (1 pt) Should a compiler's messages about syntax errors be sent
      to standard output or to the diagnostic output in Unix?
   b. (2 pts) Why?

5. (4 pts) Unix compilers typically don't produce a listing of the
   source program being compiled; indeed there is no way to produce a
   listing using either cc or gcc.
   a. Where (what file descriptor) do the C compilers ``cc'' and
      ``gcc'' on Zippy or Zaphod send their messages about syntax
      errors?
   b. If what you found in part (a) does NOT agree with your answer to
      problem (4a), explain why you think the compiler writers chose
      the file descriptor (stdout vs. stderr) that they did.

6. (2 pts) Execute the command

        /bin/sh >junk

   a. Where (what file descriptor or file) does the shell (/bin/sh)
      send its prompts?
   b. Where does it send its error messages?
   (Note: to get out of this experiment, type ^D or ``exit''.)

7. (extra credit only) What happens if you execute the following?

        tee inputs | /bin/sh | tee outputs

   a. Does the shell prompt you?
   b. Why?
   c. Do you think the shell is doing the right thing?

8. (2 pts) In his ``note'' command, Prof. Leavens has a sed command to
   extract an emacs outline from a file, which is essentially as
   follows.

        sed -n -e '/^\*/p' file.txt

   This prints only (-n) the lines that match the regular expression
   \* at the beginning of a line. Without using sed, but using some
   other command described in Chapter 6, how could you do the same
   thing? (Show the command and its arguments.)

9. (extra credit only) Use the ``time'' command to see whether using
   sed or your alternative in problem 8 is faster for some of the files
   in $PUB/lectures/*/*.txt. (You should count the user and sys times,
   as the real time is a measure of the wall clock time that varies
   depending on how many people are logged in, etc.)

10. This problem requires you to do some simple relational data base
    manipulations using the commands described in Chapter 6. This
    problem will help you develop your skill in writing pipelines (Unix
    ``one-liners'') and show you some of the things that can be done
    from the shell.

    The files you'll need, named ``people'', ``occupation'', and
    ``live'', are all in the directory /home/cs218/public/hw2; a
    description follows.

    File ``people'' contains 4 columns of data in the following format:

        idnumber firstname middlename lastname

    File ``live'' contains 2 columns of data in the following format:

        idnumber city

    File ``occupation'' contains 2 columns of data in the following
    format:

        idnumber occupation

    You'll have to investigate to see what the column separators are.

    Write a command to do each of the following. (If possible, do each
    part with one pipeline. You may want to use a file containing your
    commands as a shell script to develop your code; see Chapter 13.)
    Make sure you list all the required commands with their arguments
    in full.

    a. (4 pts) Print all people's names (names only), in sorted order.

    b. (4 pts) Print all people's ids (ids only), in sorted order.

    c. (6 pts) Print all people's names in the following format:

        lastname, firstname x.

       where x is the first letter of the middle name. (Note the exact
       spacing, with single blanks, and the required period. For
       example, one line should be ``Reedy, Margaret E.'')
       (Hint: you will need to use sed for this.)

    d. (6 pts) Print a table of each student's id, followed by their
       name. (Student is their occupation; don't print people who
       aren't students.)
       Hint: read the manual page for the Unix command ``join'' to see
       how to build a table of people and their occupations. You won't
       need to use any of the options of join.

    e. (8 pts) Print a table of each student's id and their name, but
       only for the students that live in Ames.
       Hint: you'll also need to use ``join'' for this.
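
As background for the ``join'' hints above, here is a minimal sketch of
how join combines two tables on their first column. The files ``names''
and ``jobs'', their contents, and the blank column separator are all
made up for this illustration; they are not the course files, whose
separators you still have to investigate. The sketch also assumes a
Bourne-style shell with $( ) command substitution.

```shell
#!/bin/sh
# Hypothetical scratch directory and data (NOT the hw2 files).
dir=${TMPDIR:-/tmp}/joindemo.$$
mkdir "$dir"
printf '2 Bob\n1 Ann\n' > "$dir/names"        # id name, deliberately unsorted
printf '1 student\n2 clerk\n' > "$dir/jobs"   # id occupation, already sorted

# join expects both inputs to be sorted on the join field,
# so sort the unsorted table first.
sort "$dir/names" > "$dir/names.sorted"

# With no options, join matches lines on the first field of each file
# and prints the key followed by the remaining fields from each line.
result=$(join "$dir/names.sorted" "$dir/jobs")
echo "$result"
# prints:
# 1 Ann student
# 2 Bob clerk

rm -r "$dir"
```

Because the output of join is itself a table keyed on the first column,
it can be fed to further commands in a pipeline.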