CPS 100E Fall 1999: Anagrams

Due: Friday, October 1, by midnight

20 points


Two words are anagrams if they are composed of the same letters. For example, "bagel" and "gable" are anagrams as are "drainage" and "gardenia". In this assignment you'll write a program that reads a dictionary (a sorted list of words) and finds all the words within the list that are anagrams of each other. This assignment is meant to familiarize you with the following concepts discussed in class and the textbook:

Getting Started

Change into your cps100e directory and create a directory called assign3. Change into that directory and copy the following files from the professor's home directory using the Unix command cp (do not forget the trailing dot "."):
cp ~rcduvall/cps100e/assign/assign3/* .

If you type the Unix command ls you should see the files listed below.

Several example files are provided in ~rcduvall/data on the acpub system: words4, words5 and words4-8, and words. These files contain between 1,785 and 45,402 words, so you may want to create smaller examples to test your program initially. You do not need to copy these files, you can simply link to them by typing:

ln -s ~rcduvall/data

Understanding the Code Provided

You must use the class Anaword whose declaration is in the file anaword.h; you will need to complete the implementation of this class in the file anaword.cpp which has been started for you.

An Anaword object is constructed from a string, and prints as the string, but is compared using a normalized or canonical form created by sorting the string. For example, the code fragment below should print the two lines of output shown.

Anaword a("bagel"); 
Anaword b("gable"); 
cout << a << " " << b << endl; 
if (a == b)
   cout << "they're anagrams!" << endl;

Output as shown:

bagel gable
they're anagrams!

The objects a and b are equal because the operator == is overloaded for Anaword objects to use the sorted or normalized form of the word for comparison. The normalized form of "bagel" and "gable" is "abegl", the sorted version of each word. In an Anaword object, the private instance variable mySortedWord should store the sorted version of the string in myWord, e.g., "abegl" when myWord is "bagel".

You must implement the member functions described in anaword.h so that the real word (e.g., "bagel") is used for printing, but the normalized or sorted word is used for comparison using == and <.

You will need to implement the function Anaword::Normalize, and write implementations for overloaded operators <, <=, >, and >=. You can implement all overloaded boolean operators, e.g., !=, using Anaword::IsEqual, Anaword::IsLess, or operators you have already implemented, e.g., operator ==.

Algorithm

You are given code that reads all the words in a file, and stores them in a vector of Anawords. Storing Anaword objects uses the tvector::push_back function which adds the new words as the last element in the vector and automatically grows the vector if necessary. When using push_back to add new elements, the function tvector::size returns the number of elements stored in a vector, which is not the same as the vector's capacity, since the vector grows to accommodate new elements as needed.

In addition to completing the implementation of the Anaword member functions in anaword.cpp, you should modify the code given in anagrams.cpp to implement the algorithm and class your group designed in class. If you need to change the group's design, you should justify your changes in your README file.

Submitting

When your programs compile and produce the correct output, you should create a README file for this and all assignments. In this file, include your name, the date, and an estimate of how long you worked on the assignment. You must also include a list of names of all those people (students, professor, TAs, tutor) with whom you consulted on the assignment. See the rules for collaboration in the CPS 100e syllabus.

To submit your assignment, type:

submit100e assign3 README anagrams.cpp anaword.h anaword.cpp 

You should receive a message telling you that the program was submitted correctly.

To see that files were submitted correctly, you can use the submit100e command with just the assignment name and it will list the files submitted. That is, type:

submit100e assign3

Extra Credit

Before starting the extra credit, you should create a new directory assign3X and copy your solution from assign3 to assign3X. You should implement your extra credit on these copied files. Note, you must submit assign3 using the directions above to receive extra credit!

For extra credit you should implement Anaword using another method described here. You should time both implementations and write up your findings in your README file. You should take care to have enough data from running the code to backup claims you make about the two methods.

Modify the class Anaword so that the private string variables are pointers to strings rather than strings:

class Anaword 
{ 
    // from before 
  private: 
    string * myWord;       // regular string: "bagel"
    string * mySortedWord; // sorted form: "abegl"
};

You should do this only after you have implemented the original version of the program and timed it in finding all the anagrams in the list of all 40,000+ words in the file words. Then you should time the new pointer-based version of the program and compare the times.

To time a segment of code, you should use the CTimer object provided in the tapestry library. Below is an example of its use:

void TimeSomething ()
{
    CTimer timer;
    timer.Start();
    // do something
    timer.Stop();
    cout << "My something took " << timer.ElapsedTime() << " seconds." << endl;
};

To submit your assignment with extra credit,

submit100e assign3X README anagrams.cpp anaword.h anaword.cpp 

Comments?