CS 228 meeting -*- Outline -*-

* searching random access collections (HR 12.4)

this section studies searching in arrays (random access collections)

the difference between linear and binary search is whether
the data are ordered or not

** linear search (pp. 556-7)

------------------------------------------
            LINEAR SEARCH PROBLEM

// Poison.h
#ifndef Poison_h
#define Poison_h 1
#include "String.h"

struct Poison {
  String name;        // the key
  String treatment;
};
#endif

// LinearSearch.h
#include "Poison.h"
#include "bool.h"

extern void LinearSearch(
    const Poison arr[], int size,
    const String & key,
    Boolean & found, int & loc);
  // PRE: 0 <= size
  //      && 0..size-1 are legal indexes of arr
  // MODIFIES: found, loc
  // POST: found --> arr[loc].name == key
  //       (NOT found) --> there is no element
  //           in arr[0..size-1] with name field key
------------------------------------------

Q: What does a caller have to do to get the treatment?

    LinearSearch(poisonDB, DB_SIZE, poisonName, found, loc);
    if (found) {
      cout << poisonDB[loc].treatment << endl;
    } else {
      cout << "Sorry, better get to a hospital!" << endl;
    }

Q: Why not return a String (the treatment) instead of void?

  there might not be one (the poison may not be in the database)

Q: Why return the index of the record, instead of the treatment?

  returning the treatment would be less useful;
  for example, the caller couldn't update the treatment that way.

------------------------------------------
            LINEAR SEARCH ALGORITHM

// LinearSearch.C
#include "LinearSearch.h"

void LinearSearch(
    const Poison arr[], int size,
    const String & key,
    Boolean & found, int & loc)
{
------------------------------------------

  loc = 0;
  // INV: arr[0..loc-1].name != key && loc <= size
  while (loc < size && arr[loc].name != key) {
    loc++;
  }
  // ASSERT: arr[0..loc-1].name != key && loc <= size
  //    && either loc == size or arr[loc].name == key
  found = (loc < size);
}

We've seen this before; it should be automatic.
Have them fill in the assertions!!

Q: What's the time complexity of this?
  O(size), i.e., linear, hence the name

Q: Is that good for the poison hotline, where every second counts?

  No (:-)

** binary search

Q: Suppose you had a copy of the "Poison Dictionary".
   How would you look up the poison for the caller?

  2 basic ways:
  - guess about where it is, then correct using linear search
    (actually a lot like hashing)
  - since it's ordered, open to the middle and compare keys
    to see whether to go right or left
    (this is like binary search)

Idea: each probe eliminates half of the remaining elements,
so only a logarithmic number of probes is needed.
The array must be sorted, however.

------------------------------------------
         BINARY SEARCH OF SORTED ARRAY

// BinarySearch.h
#include "Poison.h"
#include "bool.h"

extern void BinarySearch(
    const Poison arr[], int size,
    const String & key,
    Boolean & found, int & loc);
  // PRE: 0 <= size <= INT_MAX / 2
  //      && 0..size-1 are legal indexes of arr
  //      && arr is sorted in increasing order (by name)
  // MODIFIES: found, loc
  // POST: found --> arr[loc].name == key
  //       (NOT found) --> there is no element
  //           in arr[0..size-1] with name field key
------------------------------------------

The implementation, besides assuming more about its input,
also keeps more information in its local data.

Idea: more restrictive assumptions lead to speedup
(trading generality and space for time);
binary search makes heavy use of the fact that arr is sorted.

------------------------------------------
          BINARY SEARCH ALGORITHM

// BinarySearch.C
#include "BinarySearch.h"

void BinarySearch(
    const Poison arr[], int size,
    const String & key,
    Boolean & found, int & loc)
{
------------------------------------------

draw a picture of the execution as you explain the loop,
showing where lowerIndex and upperIndex are.
talk about the invariant at the end of the loop

  int lowerIndex = 0;
  int upperIndex = size - 1;
  loc = upperIndex / 2;
  // INV: IF key is in arr[0..size-1].name,
  //      THEN key is in arr[lowerIndex..upperIndex].name
  while (lowerIndex <= upperIndex && arr[loc].name != key) {
    if (key > arr[loc].name) {
      lowerIndex = loc + 1;
    } else {
      upperIndex = loc - 1;
    }
    // ASSERT: size <= INT_MAX / 2
    /* hence */
    // ASSERT: upperIndex + lowerIndex <= INT_MAX
    loc = (upperIndex + lowerIndex) / 2;
  }
  // ASSERT: IF key is in arr[0..size-1].name,
  //         THEN key is in arr[lowerIndex..upperIndex].name
  //   && either NOT(lowerIndex <= upperIndex) or arr[loc].name == key
  found = (lowerIndex <= upperIndex);
}

Q: (if time) What would this look like written recursively?
   Hint: you'll need a helper function.

*** pictures

Draw one execution picture (see page 559 for examples),
and then have them do at least one by themselves.
Draw both successful and unsuccessful searches.

*** analysis (HR 12.5)

Reemphasize that the running time is O(log size).

------------------------------------------
       ANALYSIS OF SEARCH ALGORITHMS

def: a *best case analysis* gives the ideal (smallest) cost.

   Technique         Best Case Cost
   linear search
   binary search
------------------------------------------

   ... O(1)
   ... O(1)

------------------------------------------
def: a *worst case analysis* gives the maximum (largest) cost.

   Technique         Worst Case Cost
   linear search
   binary search
------------------------------------------

   ... O(size)
   ... O(log size)

This is what we usually mean by complexity.

------------------------------------------
def: an *average case analysis* gives a representative cost.

   Technique         Average Case Cost
   linear search
   binary search
------------------------------------------

   ... O(size)
   ... O(log size)

For linear search, the average cost of a successful search
is about size/2 comparisons.
For binary search, the average cost is still about log_2 size.

Q: When size is 100,000, how many comparisons are needed
   in the worst case for binary search?
  34: the loop runs at most floor(log_2 100,000) + 1 = 17 times
  (2^16 = 65,536 <= 100,000 < 131,072 = 2^17),
  and each iteration makes 2 comparisons (!= and >).

Q: Is there any way to get rid of that extra comparison in each loop?

  Yes. Since the average and worst case times are about the same,
  don't try to "get out early" by comparing with !=;
  just let the indexes meet, then test for equality once at the end.

------------------------------------------
          IMPROVED BINARY SEARCH

// ImprovedBinarySearch.C
#include "ImprovedBinarySearch.h"

void ImprovedBinarySearch(
    const Poison arr[], int size,
    const String & key,
    Boolean & found, int & loc)
{
  int lowerIndex = 0;
  int upperIndex = size - 1;
  // INV: if key in arr[0..size-1].name,
  //        key in a name field of
  //        arr[lowerIndex..upperIndex]
  //      && if 0 < size, then
  //        lowerIndex <= upperIndex
  while (lowerIndex < upperIndex) {
    loc = (upperIndex + lowerIndex) / 2;
    if (key > arr[loc].name) {
      lowerIndex = loc + 1;
    } else {
      // ASSERT:

      upperIndex = loc;
    }
  }
  // ASSERT:

  loc = lowerIndex;
  found = (0 < size && arr[loc].name == key);
}
------------------------------------------

Q: Can you fill in the assertions? (have them do that)

Draw pictures of executions (or have them do it).

Q: To summarize, what's being saved here over the old binary search?

  a lot of calls to String::operator !=, which can be expensive.
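One way to make the savings concrete is to count name comparisons in both versions. This is a sketch under stated assumptions, not the course's code: std::string stands in for the course's String class, bool for Boolean, and a sorted std::vector of names for the Poison array. The counter nameComparisons and the helpers nameEqual, nameGreater, and worstCaseComparisons are hypothetical; the two search loops follow BinarySearch and ImprovedBinarySearch line for line.

```cpp
// Sketch: count name comparisons made by the two binary searches.
// ASSUMPTIONS (not the course code): std::string replaces String,
// bool replaces Boolean, a sorted std::vector of names replaces Poison[].
#include <cassert>
#include <string>
#include <vector>

// Hypothetical counter: incremented once per comparison of two names.
static long nameComparisons = 0;

static bool nameEqual(const std::string & a, const std::string & b) {
  ++nameComparisons;
  return a == b;
}

static bool nameGreater(const std::string & a, const std::string & b) {
  ++nameComparisons;
  return a > b;
}

// Same loop as BinarySearch: tests != (here !nameEqual) on every probe.
void binarySearch(const std::vector<std::string> & arr,
                  const std::string & key, bool & found, int & loc) {
  int lowerIndex = 0;
  int upperIndex = static_cast<int>(arr.size()) - 1;
  loc = upperIndex / 2;
  while (lowerIndex <= upperIndex && !nameEqual(arr[loc], key)) {
    if (nameGreater(key, arr[loc])) {
      lowerIndex = loc + 1;
    } else {
      upperIndex = loc - 1;
    }
    loc = (upperIndex + lowerIndex) / 2;
  }
  found = (lowerIndex <= upperIndex);
}

// Same loop as ImprovedBinarySearch: one > per probe, one == at the end.
void improvedBinarySearch(const std::vector<std::string> & arr,
                          const std::string & key, bool & found, int & loc) {
  int size = static_cast<int>(arr.size());
  int lowerIndex = 0;
  int upperIndex = size - 1;
  while (lowerIndex < upperIndex) {
    loc = (upperIndex + lowerIndex) / 2;
    if (nameGreater(key, arr[loc])) {
      lowerIndex = loc + 1;
    } else {
      upperIndex = loc;   // key <= arr[loc], so keep loc in range
    }
  }
  loc = lowerIndex;
  found = (0 < size && nameEqual(arr[loc], key));
}

// Searches for every element of arr (sorted, no duplicates) and returns
// the largest number of name comparisons any single search made.
long worstCaseComparisons(const std::vector<std::string> & arr,
                          bool improved) {
  long worst = 0;
  for (int i = 0; i < static_cast<int>(arr.size()); ++i) {
    bool found = false;
    int loc = -1;
    nameComparisons = 0;
    if (improved) {
      improvedBinarySearch(arr, arr[i], found, loc);
    } else {
      binarySearch(arr, arr[i], found, loc);
    }
    assert(found && loc == i);   // every present key is found at its index
    if (nameComparisons > worst) {
      worst = nameComparisons;
    }
  }
  return worst;
}
```

Searching for each of the 100,000 zero-padded names "000000" through "099999", the first version's worst successful search makes 33 comparisons (17 probes: two comparisons on each of the first 16, one on the last), while the improved version never makes more than 18 (at most 17 > tests plus the final ==).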