CS 228 meeting -*- Outline -*-

* searching random access collections (HR 12.4)

	this section studies searching in arrays (random access collections)

	the difference between linear and binary search is that
		linear search works on data in any order,
		while binary search requires the data to be sorted

** linear search (pp. 556-7)

------------------------------------------
        LINEAR SEARCH PROBLEM

// Poison.h
#ifndef Poison_h
#define Poison_h 1
#include "String.h"
struct Poison {
  String name;      // the key
  String treatment;
};
#endif

// LinearSearch.h
#include "Poison.h"
#include "bool.h"

extern void LinearSearch(
	     const Poison arr[],
	     int size,
	     const String & key,
	     Boolean & found,
	     int & loc);
  // PRE: 0 <= size
  // && 0..size-1 are legal indexes of arr
  // MODIFIES: found, loc
  // POST: found --> arr[loc].name == key
  // (NOT found) --> there is no element
  // in arr[0..size-1] with name field key
------------------------------------------

	Q: What does a caller have to do to get the treatment?
		LinearSearch(poisonDB, DB_SIZE, poisonName, found, loc);
		if (found) {
		  cout << poisonDB[loc].treatment << endl;
		} else {
		  cout << "Sorry, better get to a hospital!" << endl;
		}

	Q: Why not return a String (the treatment) instead of void?
		there might not be one (the poison may not be in the array)

	Q: Why return the index of the record, instead of the treatment?
		the treatment alone is less useful; for example, without
		the index the caller can't update the treatment.

------------------------------------------
          LINEAR SEARCH ALGORITHM

// LinearSearch.C
#include "LinearSearch.h"

void LinearSearch(
	     const Poison arr[],
	     int size,
	     const String & key,
	     Boolean & found,
	     int & loc)
{


------------------------------------------
  loc = 0;
  // INV: arr[0..loc-1].name != key && loc <= size
  while (loc < size && arr[loc].name != key) {
    loc ++;
  }
  // ASSERT: arr[0..loc-1].name != key && loc <= size
  // && either loc == size or arr[loc].name == key
  found = (loc < size);
}

	We've seen this before; it should be automatic.

	Have them fill in the assertions!!

	Q: What's the time complexity of this?
		linear, hence the name

	Q: Is that good for the poison hotline, where every second counts?
		No (:-)
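
	A self-contained sketch of the linear search and its caller, for
	trying out in lab. Assumptions: std::string and bool stand in for
	the course's String.h and bool.h, and TreatmentFor is a
	hypothetical helper wrapping the caller code from the first
	question; neither is from the text.

```cpp
#include <cassert>
#include <string>

// Assumption: std::string stands in for the course's String class.
struct Poison {
  std::string name;       // the key
  std::string treatment;
};

// Same contract as LinearSearch above, with bool in place of Boolean.
void LinearSearch(const Poison arr[], int size, const std::string & key,
                  bool & found, int & loc)
{
  assert(0 <= size);  // PRE
  loc = 0;
  // INV: arr[0..loc-1].name != key && loc <= size
  while (loc < size && arr[loc].name != key) {
    loc++;
  }
  found = (loc < size);
}

// Hypothetical helper: does what the caller code above does, but
// returns the message instead of printing it.
std::string TreatmentFor(const Poison arr[], int size,
                         const std::string & key)
{
  bool found;
  int loc;
  LinearSearch(arr, size, key, found, loc);
  if (found) {
    return arr[loc].treatment;
  }
  return "Sorry, better get to a hospital!";
}
```

	Note that loc is only meaningful to the caller when found is true;
	after a failed search it equals size.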

** binary search

	Q: Suppose you had a copy of the "Poison Dictionary".
		How would you look up the poison for the caller?

	2 basic ways:
		- guess about where it is, then correct using linear search
			actually a lot like hashing

		- since it's ordered, you can open to the middle and compare
			keys to see whether to continue in the left or right half
				this is like binary search

	Idea: each probe eliminates 1/2 of the remaining stuff to search,
		so only a logarithmic number of probes is needed
	The array must be sorted, however

------------------------------------------
	BINARY SEARCH OF SORTED ARRAY

// BinarySearch.h
#include "Poison.h"
#include "bool.h"

extern void BinarySearch(
	     const Poison arr[],
	     int size,
	     const String & key,
	     Boolean & found,
	     int & loc);
  // PRE: 0 <= size <= INT_MAX / 2
  // && 0..size-1 are legal indexes of arr
  // && arr is sorted in increasing order
  // MODIFIES: found, loc
  // POST: found --> arr[loc].name == key
  // (NOT found) --> there is no element
  // in arr[0..size-1] with name field key
------------------------------------------
	The implementation, besides assuming more about its input,
	also keeps more information in its local data

	Idea: more restrictive assumptions lead to speedup
		(trading generality and space for time)
	      in binary search we make heavy use of the fact that arr is sorted

------------------------------------------
       BINARY SEARCH ALGORITHM

// BinarySearch.C
#include "BinarySearch.h"
void BinarySearch(
	     const Poison arr[],
	     int size,
	     const String & key,
	     Boolean & found,
	     int & loc)
{


------------------------------------------
	draw a picture of the execution as you explain the loop,
		showing where lowerIndex and upperIndex are.
	talk about the invariant at end of loop

  int lowerIndex = 0;
  int upperIndex = size - 1;

  loc = upperIndex / 2;
  // INV: IF key is in arr[0..size-1].name,
  //      THEN key is in arr[lowerIndex..upperIndex].name
  while (lowerIndex <= upperIndex && arr[loc].name != key) {
    if (key > arr[loc].name) {
      lowerIndex = loc + 1;
    } else {
      upperIndex = loc - 1;
    }
    // ASSERT: size <= INT_MAX / 2
    /* hence */
    // ASSERT: upperIndex + lowerIndex <= INT_MAX
    loc = (upperIndex + lowerIndex) / 2;
  }
  // ASSERT: IF key is in arr[0..size-1].name,
  //      THEN key is in arr[lowerIndex..upperIndex].name
  // && either NOT(lowerIndex <= upperIndex) or arr[loc].name == key
  found = (lowerIndex <= upperIndex);
}

	Q: (if time) What would this look like written recursively?
		Hint, you'll need a helping function.
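
	One possible answer, as a sketch rather than the text's version.
	Assumptions: std::string and bool replace String and Boolean, and
	the names RecursiveBinarySearch and BinarySearchHelper are made up
	here. The helping function carries the two indexes that the loop
	kept in local variables.

```cpp
#include <cassert>
#include <climits>
#include <string>

// Assumption: std::string stands in for the course's String class.
struct Poison {
  std::string name;       // the key
  std::string treatment;
};

// Hypothetical helping function: searches arr[lowerIndex..upperIndex].
// loc is meaningful only when found comes back true.
static void BinarySearchHelper(const Poison arr[],
                               int lowerIndex, int upperIndex,
                               const std::string & key,
                               bool & found, int & loc)
{
  if (lowerIndex > upperIndex) {   // empty range: key cannot be here
    found = false;
    return;
  }
  loc = (upperIndex + lowerIndex) / 2;
  if (arr[loc].name == key) {
    found = true;
  } else if (key > arr[loc].name) {
    // key can only be in the right half
    BinarySearchHelper(arr, loc + 1, upperIndex, key, found, loc);
  } else {
    // key can only be in the left half
    BinarySearchHelper(arr, lowerIndex, loc - 1, key, found, loc);
  }
}

// Same contract as BinarySearch above.
void RecursiveBinarySearch(const Poison arr[], int size,
                           const std::string & key,
                           bool & found, int & loc)
{
  assert(0 <= size && size <= INT_MAX / 2);  // PRE
  BinarySearchHelper(arr, 0, size - 1, key, found, loc);
}
```

	Each recursive call is the last action taken, so the recursion
	depth matches the number of loop iterations in the iterative
	version.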

*** pictures

	Draw one execution picture (see page 559 for examples),
	and then have them do at least one by themselves.

	Draw both successful and unsuccessful searches

*** analysis (HR 12.5)

	Reemphasize that the running time is O(log size)

------------------------------------------
      ANALYSIS OF SEARCH ALGORITHMS

def: a *best case analysis* gives ideal
     (smallest) cost.

   Technique		Best Case Cost
   linear search
   binary search
------------------------------------------
			... O(1)
			... O(1)

------------------------------------------
def: a *worst case analysis* gives maximum
     (largest) cost.


   Technique		Worst Case Cost
   linear search
   binary search
------------------------------------------
			... O(size)
			... O(log size)

	This is what we usually mean by complexity.

------------------------------------------
def: an *average case analysis* gives
     a representative cost.


   Technique		Average Case Cost
   linear search
   binary search
------------------------------------------
			... O(size)
			... O(log size)

	For linear, the average cost of a successful search is about
		size/2 comparisons.

	For binary, the average cost is still about log_2 size probes.
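
	The size/2 figure for linear search can be checked with a short
	sketch. Assumptions: the key is present, and it is equally likely
	to be at each position. AverageLinearComparisons is a hypothetical
	helper, not from the text.

```cpp
#include <cassert>

// Hypothetical helper: average number of name comparisons the linear
// search loop makes over all successful searches in an array of the
// given size, assuming each position is equally likely.
double AverageLinearComparisons(int size)
{
  assert(0 < size);
  long long total = 0;
  for (int loc = 0; loc < size; loc++) {
    total += loc + 1;   // a key at index loc costs loc + 1 comparisons
  }
  return static_cast<double>(total) / size;
}
```

	The sum is 1 + 2 + ... + size, so the average works out to
	(size + 1)/2, which is about size/2 for large arrays.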

	Q: When size is 100,000, how many comparisons are needed in worst case
		for binary search?
		34: log_2 100,000 is about 16.6, so 17 iterations,
		and each iteration makes 2 comparisons.
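
	The worst case count can be checked by simulating the loop of the
	binary search above without any data. This is a sketch:
	WorstCaseProbes and WorstCaseComparisons are hypothetical helpers,
	and the simulation assumes the key is larger than every element,
	so each probe keeps the upper part, which is never the smaller one.

```cpp
#include <cassert>
#include <climits>

// Hypothetical helper: number of loop iterations the binary search
// above makes when the key is larger than every name in the array.
int WorstCaseProbes(int size)
{
  assert(0 <= size && size <= INT_MAX / 2);  // PRE of BinarySearch
  int lowerIndex = 0;
  int upperIndex = size - 1;
  int probes = 0;
  while (lowerIndex <= upperIndex) {
    int loc = (upperIndex + lowerIndex) / 2;
    lowerIndex = loc + 1;   // key > arr[loc].name on every probe
    probes++;
  }
  return probes;
}

// Each iteration evaluates both arr[loc].name != key and
// key > arr[loc].name, so it costs two comparisons.
int WorstCaseComparisons(int size)
{
  return 2 * WorstCaseProbes(size);
}
```

	Have them call WorstCaseComparisons(100000) and compare the result
	with their hand calculation.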

	Q: Is there any way to get rid of that extra comparison in each loop?
		yes: since the average and worst case times are the same,
		don't try to "get out early" by comparing with != in the loop test;
		just narrow the range until the indexes meet, then compare once

------------------------------------------
    IMPROVED BINARY SEARCH

// ImprovedBinarySearch.C
#include "ImprovedBinarySearch.h"
void ImprovedBinarySearch(
	     const Poison arr[],
	     int size,
	     const String & key,
	     Boolean & found,
	     int & loc)
{
  int lowerIndex = 0;
  int upperIndex = size - 1;

  // INV: if key in arr[0..size-1].name,
  //  key in a name field of
  // arr[lowerIndex..upperIndex]
  // && if 0 < size, then
  // lowerIndex <= upperIndex
  while (lowerIndex < upperIndex) {
    loc = (upperIndex + lowerIndex) / 2;
    if (key > arr[loc].name) {
      lowerIndex = loc + 1;
    } else {
      // ASSERT:
      upperIndex = loc;
    }
  }
  // ASSERT: 
  //
  loc = lowerIndex;
  found = (0 < size
           && arr[loc].name == key);
}
------------------------------------------
	Q: Can you fill in the assertions?
		(have them do that)

	draw pictures of executions (or have them)

	Q: To summarize, what's being saved here over the old binary search?
		a lot of calls to String::operator !=, which can be expensive.
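
	The saving can be made concrete by counting comparisons. A sketch
	under assumptions: CountedName is a hypothetical wrapper around
	std::string (standing in for String) whose operators bump a global
	counter, and both searches work on bare names rather than Poison
	records. The loops are otherwise the ones shown above.

```cpp
#include <cassert>
#include <string>

// Hypothetical instrumentation, not part of the course code:
// every comparison of two names adds one to this counter.
static long comparisons = 0;

// Assumption: a thin wrapper around std::string stands in for String.
struct CountedName {
  std::string text;
};

bool operator==(const CountedName & a, const CountedName & b)
{ comparisons++; return a.text == b.text; }

bool operator!=(const CountedName & a, const CountedName & b)
{ comparisons++; return a.text != b.text; }

bool operator>(const CountedName & a, const CountedName & b)
{ comparisons++; return a.text > b.text; }

// The first binary search: two comparisons per iteration.
void BinarySearch(const CountedName arr[], int size,
                  const CountedName & key, bool & found, int & loc)
{
  assert(0 <= size);
  int lowerIndex = 0;
  int upperIndex = size - 1;
  loc = upperIndex / 2;
  while (lowerIndex <= upperIndex && arr[loc] != key) {
    if (key > arr[loc]) {
      lowerIndex = loc + 1;
    } else {
      upperIndex = loc - 1;
    }
    loc = (upperIndex + lowerIndex) / 2;
  }
  found = (lowerIndex <= upperIndex);
}

// The improved version: one comparison per iteration, one at the end.
void ImprovedBinarySearch(const CountedName arr[], int size,
                          const CountedName & key, bool & found, int & loc)
{
  assert(0 <= size);
  int lowerIndex = 0;
  int upperIndex = size - 1;
  while (lowerIndex < upperIndex) {
    loc = (upperIndex + lowerIndex) / 2;
    if (key > arr[loc]) {
      lowerIndex = loc + 1;
    } else {
      upperIndex = loc;
    }
  }
  loc = lowerIndex;
  found = (0 < size && arr[loc] == key);
}
```

	To use it, set comparisons to 0, run one search, and read the
	counter; then repeat with the other search on the same sorted
	array and key. The gap is widest for keys that are not in the
	array.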