Data Sets

A major factor in the success of your visualization is how the test data will be supplied. Fundamentally, there are three approaches: supply built-in data sets, generate a random data set, or let the user supply the data. Each approach has pros and cons.

User-provided Data Sets

Under the constructivist theory of learning, learners should be encouraged to provide their own test data when studying a given algorithm or data structure. This lets them focus on the aspects that interest them or that they have difficulty understanding, and it gives them a measure of control and a stake in their learning. Data entry can be as simple as an input box in which users type numbers or whatever else is appropriate. Most algorithm visualizations provide this feature.
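As a minimal sketch of the code behind such an input box, assuming the data are integers separated by commas or whitespace, the hypothetical helper `parse_user_data` below turns the typed text into a list and reports the first bad token instead of silently discarding it.

```python
import re


def parse_user_data(text):
    """Turn the contents of a data-entry box into a list of integers.

    Assumes values separated by commas and/or whitespace, e.g. "5, 3 8,1".
    Raises ValueError naming the first token that is not an integer, so the
    visualization can report the problem to the user.
    """
    # Split on any run of commas/whitespace and drop empty pieces.
    tokens = [t for t in re.split(r"[,\s]+", text.strip()) if t]
    values = []
    for token in tokens:
        try:
            values.append(int(token))
        except ValueError:
            raise ValueError(f"not an integer: {token!r}") from None
    return values
```

For example, `parse_user_data("5, 3 8,1")` returns `[5, 3, 8, 1]`, and an empty box yields an empty list.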

If you intend the visualization to be used by an instructor giving a presentation, then there needs to be an easy mechanism by which the instructor can enter a relatively complex "canned" data set. For example, if the instructor wants to show insertion or deletion in a complex tree structure, he or she needs a fast and easy way to build up the tree from a meaningful set of input data, and then a way to invoke the desired insert or delete command on that built-up tree. A good way to support this is to let the user prepare the data in a file and then either load that file or paste its entire contents into the visualization's data entry box.
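One way to support canned data sets is a small command file. The sketch below assumes a hypothetical one-command-per-line format (`insert N` or `delete N`, with `#` comment lines) and a structure that exposes `insert` and `delete` methods. `parse_script`, `run_script`, and `SortedKeys` are illustrative names; `SortedKeys` is only a stand-in for the tree your visualization would actually build and animate.

```python
import bisect


def parse_script(text):
    """Parse a canned data file into (command, value) pairs.

    Hypothetical format: one command per line, "insert N" or "delete N";
    blank lines and lines starting with '#' are ignored.
    """
    ops = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        if len(parts) != 2 or parts[0] not in ("insert", "delete"):
            raise ValueError(f"line {lineno}: expected 'insert N' or 'delete N', got {line!r}")
        ops.append((parts[0], int(parts[1])))
    return ops


class SortedKeys:
    """Stand-in for the visualized tree: keeps its keys in sorted order."""

    def __init__(self):
        self.keys = []

    def insert(self, key):
        bisect.insort(self.keys, key)

    def delete(self, key):
        # Remove one occurrence of key if present; ignore missing keys.
        i = bisect.bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            del self.keys[i]


def run_script(ops, structure):
    """Apply each parsed command to the structure, in file order."""
    for command, value in ops:
        getattr(structure, command)(value)
    return structure
```

With the script `insert 50`, `insert 30`, `insert 70`, `delete 30` (one per line), `run_script(parse_script(script), SortedKeys()).keys` is `[50, 70]`. In a real visualization, the file would typically build the structure silently, and the instructor would then trigger the operation of interest by hand so that only that step is animated.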

This also opens up a new use for your visualization: debugging. Learners who have implemented an algorithm themselves and want to check their results can use your visualization as a reference implementation. They can also compare the steps their own code takes, as seen in a debugger, with the steps taken by your visualization to pinpoint where their errors lie.
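To make such a comparison concrete, a visualization could expose its steps as a trace. The sketch below assumes insertion sort as the example algorithm and records the array after each pass of the outer loop. `insertion_sort_trace` and `first_divergence` are hypothetical names, and the learner's trace would come from their own logging or debugger session.

```python
def insertion_sort_trace(values):
    """Reference insertion sort that records the array after each outer pass."""
    a = list(values)
    trace = []
    for i in range(1, len(a)):
        key = a[i]
        j = i - 1
        # Shift larger elements one slot right to open a gap for key.
        while j >= 0 and a[j] > key:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key
        trace.append(list(a))  # snapshot of the array after this pass
    return trace


def first_divergence(expected, actual):
    """Return the index of the first step where two traces differ, or None."""
    for step, (want, got) in enumerate(zip(expected, actual)):
        if want != got:
            return step
    if len(expected) != len(actual):
        # One trace is a prefix of the other: they diverge where it ends.
        return min(len(expected), len(actual))
    return None
```

For input `[3, 1, 2]` the reference trace is `[[1, 3, 2], [1, 2, 3]]`, so a learner whose second step still reads `[1, 3, 2]` diverges at step 1.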

Random Data Sets

Generating random data can be a good way to give the user control over certain aspects of the data set, such as its size, without making them manage all the details of providing the complete set. For example, the user might want to control the number of records to be sorted, and perhaps a few other key parameters such as the data distribution, or whether the data are presorted or reverse sorted. The user might not care about the exact values being sorted, and might not want to type in, say, 100 values. On the other hand, random data could be a problem if it means that the user cannot reproduce previous runs of the visualization.
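The reproducibility problem is commonly addressed by seeding the random number generator and showing the seed to the user, so that entering the same seed again regenerates the same data. The hypothetical `make_data_set` below is a sketch of that idea: it takes the size, an ordering option, and an optional seed. It only draws from a uniform distribution; other distributions would be additional options.

```python
import random


def make_data_set(size, order="random", low=1, high=1000, seed=None):
    """Generate `size` integers drawn uniformly from [low, high].

    order: "random", "sorted", or "reversed".
    seed:  the same seed always reproduces the same data set; None picks
           a fresh seed. Returns (seed, data) so the seed can be displayed.
    """
    if seed is None:
        seed = random.randrange(2**32)
    rng = random.Random(seed)  # private generator; global state untouched
    data = [rng.randint(low, high) for _ in range(size)]
    if order == "sorted":
        data.sort()
    elif order == "reversed":
        data.sort(reverse=True)
    elif order != "random":
        raise ValueError(f"unknown order: {order!r}")
    return seed, data
```

Returning the seed alongside the data lets the interface display it next to the visualization, so a learner can rerun exactly the case that confused them.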

Good Built-in Data Sets

While constructivist learning theory indicates that allowing users to input data is the ideal, it is also well known that introductory computer science students are poor testers and debuggers. Thus, they will likely benefit from being provided with good test data sets that appropriately exercise the key features of the algorithm or data structure being visualized. For instance, if you are demonstrating one of the sorting algorithms, it is normally best to use a uniform random data set of sufficient size (unless you are specifically trying to show the algorithm's performance on a different class of input). Feeding the nearly sorted input "10, 20, 30, 50, 40" to MergeSort and BubbleSort would give learners a misleading idea of the algorithms' relative performance: with only five values and a single one out of place, a BubbleSort that stops once a pass makes no swaps finishes after two passes and looks about as efficient as MergeSort.
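Counting comparisons makes this point measurable. The sketch below instruments a BubbleSort with early exit and a top-down MergeSort (both are assumptions about which variants you visualize; the function names are hypothetical) and runs them on the nearly sorted input and on a seeded uniform random input of 100 values.

```python
import random


def bubble_sort_comparisons(values):
    """BubbleSort that stops after a pass with no swaps; returns comparison count."""
    a = list(values)
    comparisons = 0
    for i in range(len(a) - 1):
        swapped = False
        for j in range(len(a) - 1 - i):
            comparisons += 1
            if a[j] > a[j + 1]:
                a[j], a[j + 1] = a[j + 1], a[j]
                swapped = True
        if not swapped:  # no swaps means the list is already sorted
            break
    return comparisons


def merge_sort_comparisons(values):
    """Top-down MergeSort; returns the number of key comparisons."""
    def sort(a):
        if len(a) <= 1:
            return a, 0
        mid = len(a) // 2
        left, count_left = sort(a[:mid])
        right, count_right = sort(a[mid:])
        merged, count = [], count_left + count_right
        i = j = 0
        while i < len(left) and j < len(right):
            count += 1
            if left[i] <= right[j]:
                merged.append(left[i])
                i += 1
            else:
                merged.append(right[j])
                j += 1
        # One side is exhausted; the rest is copied without comparisons.
        merged.extend(left[i:])
        merged.extend(right[j:])
        return merged, count

    return sort(list(values))[1]


if __name__ == "__main__":
    nearly_sorted = [10, 20, 30, 50, 40]
    rng = random.Random(42)  # fixed seed so the run can be repeated
    uniform_random = [rng.randint(1, 1000) for _ in range(100)]
    for name, data in [("nearly sorted", nearly_sorted), ("random, n=100", uniform_random)]:
        print(f"{name}: bubble={bubble_sort_comparisons(data)}, merge={merge_sort_comparisons(data)}")
```

On `[10, 20, 30, 50, 40]` this BubbleSort makes 7 comparisons and this MergeSort makes 5, so the two look comparable. On 100 random values BubbleSort's comparison count grows roughly quadratically while MergeSort's stays near n log n, which is the contrast learners should see.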