CS 132
Computer Science II
Sorting Out Sorts

Steven K. Andrianoff
Robert M. Harlan
David B. Levine
Computer Science Department
St. Bonaventure University
Copyright, 1996, 1999, 2000

Objective

      The objective for this lab is to explore to what degree the “rough analysis” we’ve been doing in class can enable us to make good predictions about the behavior of a program on a given computer.  We will do this by measuring the actual time taken to sort lists of various sizes using insertion, selection and quick sort, and comparing the actual growth rates of the time taken by the respective sorts to those predicted by our run time analysis. We will also compare best and average case performance of selection and insertion sort.

 

Background

 

On the basis of our run time analysis, we have argued that selection and insertion have a O(n2) average case growth rate and that quick sort has a growth rate in the best case that is O(nlog2n). We have also argued that selection sort is insensitive to the order of the list, and hence will have a O(n2) growth rate even when the list is completely sorted, whereas insertion sort should have a O(n) growth rate in the best case.

 

We now wish to put our theoretical predictions to empirical test. This lab will determine the actual amount of time 3 different sorts take to perform an internal sort on a vector of randomly generated numbers. Each sort will sort a vector of 2,000, 4,000, 8,000, 16,000, 32,000 and 64,000 integers. For quick sort, you will investigate the time required to sort data sets of size up to 1,024,000.

 

After you complete the data gathering portion of this lab, you will prepare a report summarizing the data, the predictions from our theoretical analysis, and the degree to which the theory enables us to predict the running time for various sorts.  Your report should include relevant tables of data and graphs prepared using Microsoft Excel that show the time taken by each algorithm to sort data sets of various sizes. We have chosen sample sizes that double with each increment, which will make it easy to compare the actual growth rate with the predicted one. The report should also analyze the actual growth rate of each algorithm and compare it with the predicted one.

 

Your report should also compare the best case growth rates of insertion and selection sort by running the sorts on sorted arrays of up to 64,000 elements; again, tables and graphs are expected.

Instructions:

  1. Begin by copying the program SortDemo.exe from the CSNet server to your local machine..  


  2. The program you have just built will allow you to store a large number of randomly ordered integers and will allow you to sort them using insertion, selection or quick sort. It will report the time taken to perform the sort.

     

    The interface is, alas, primitive. The program does protect for an array that is too small or too large. It does not protect against an upper bound for integer values that is too large. Remember that the largest signed integer is 32767.

     

    Sorttest prompts for the number of elements to sort (n) and for the upper bound of the size of each element.  You may view the array at any time -- prior to a sort to verify that it is randomly ordered, or after a sort to verify that it has been sorted. You select the sort you wish to use. The program will report the time taken to sort the array in seconds. The timer permits measurement to the sixtieth of a second, so the results are fairly fine grained, but small data sets may be sorted so quickly that the time reports 0.  Verify that you understand the user interface by creating a data set of size 100, printing it to see it, sorting it (using any sort) and then printing it again.  Once this works the way you anticipate, you are ready to begin the “real” portion of the lab.

     

    Note that to sort an ordered list, run the sort on the list that has been sorted. Make sure, however, that the list is randomized if you wish to perform a test on an unordered list


  3. Run each of the sorts on a data set of size 2000 – 64000, doubling each set size as you go.  Time each sort on both randomly ordered data and sorted data.  Construct tables to record your test results on ordered and unordered data sets, the rows representing sorts and the columns the number of items sorted.

     

    Sample table:

               

    N

    Selection

    Insertion

    Quick

    2000

     

     

     

    4000

     

     

     

    8000

     

     

     

    16000

     

     

     

    32000

     

     

     

    64000

     

     

     



  4. Run QuickSort on randomly ordered data sets of size 128000, 256000, 512000, and 1024000.  Additionally, use these data set sizes on any other sort that would appear to be able to finish in a reasonable amount of time.


  5. Perform the requisite analysis, generate the appropriate graphs and write your report.


  6. Analyze the empirical run time behavior of insertion, selection and quick sorts on the randomly ordered data of various sizes. Compare the observed growth rate with the one predicted by running time analysis. Also compare insertion and selection sorts with one another and explain any differences. In particular, explain properties they have in common and those in which they differ.


  7. Analyze the behavior of selection and insertion sorts on the ordered data set. Compare the results obtained with those predicted by the running time analysis.

A Note on Writing

 

            A portion of the lab grade for this lab will be determined by the quality of the writing of the report.  This includes the completeness of the report, the clarity (and grammar) of the writing, and the use of visual aids (i.e. tables and graphs) to present data.  The grades on Lab 3 were lower than on the other labs for two reasons; one was that the testing performed was not always complete, but the more significant reason is that the analysis was poorly explained.  Don’t make that mistake here!!!!

 

A brief document explaining how to graph data in Microsoft Excel and how to include such data in a Microsoft Word document can be found on the server.  If you already know how to do these things, then you may ignore it; if not, read it and use what you learn.

 

To Hand In

 

            You need only hand in the final report.  It is due on Tuesday, November 7.

 

Assignment Type (see Academic Practices and Policies Document):

            Group assignment, limited collaboration.