Merge-sort for non-numerical data
Most sorting algorithms seem to be based on the idea that each node of data to be sorted has a numerical value associated with it. This, of course, is not always a case.Background
I had just signed up to the site YMDb.com, which allows users to put up a list of their top twenty favourite films. Being a meek individual, and daunted by the prospect of having to arrange my video library into some kind of order, I decided I needed some assistance.Method
I took a look on Wikipedia for sorting algorithms, and looked through the most common ones. A lot of them were supposedly quite efficient on large amounts of data, and even small amounts, but I found that a lot of the algorithms might end up comparing two bits of data twice.I needed an algorithm that only compared two items (films in my case) once, and additionally that the sorting algorithm only relied on a simple "greater than" or "less than" comparison, since it wouldn't be sensible for me to first go through my film collection and assign an arbitrary number to each film; I just wanted to be asked, "Is FILM1 better than FILM2?" lots of times.
I settled on Merge Sort, which seemed to work in a sensible way, and wrote a quick script in Python to do this. It divides a data set up into two groups based on whether each item was greater than or less than the middle item in the group, then sorts those subsets in the same fashion until there are only two items in each subgroup. This was ideal since the comparison could be in the form of a simple question to the user; me! And I like simple questions.
Solution
Click here to download the source code, shown below. Run python mergesort.py after editting the main list to contain the items you want.# Implementation of mergesort with user prompts. # Copyright (c) 2006 Nick Murdoch. list = ["Item 1", "Item 2", "Item 3", "..."] def mergesort(list): """Merge-sorts a list, giving the user prompts Returns a list with the user's preferences, sorted with their favourites at the end of the list""" #print "working on " + str(list) length = len(list) left = [] right = [] if length <= 1: return list else: middle = list[length/2] for item in list: if item is middle: pass else: try: choice = int(raw_input(middle + " or " + item + "? ")) except ValueError: choice = 2 if choice == 1: # if < middle left = left + [item] else: # if > middle right = right + [item] left = mergesort(left) right = mergesort(right) return left + [middle] + right print " *** MERGESORT by Nick Murdoch ***" print " Copyright (c) 2006\n" print "Type 1 if you prefer the first film given, or 2 if you prefer the second.\n" newlist = mergesort(list) print "" i = 1 length = len(newlist) while i <= length: print str(i) + ": " + newlist[length - i] i = i + 1
Last Modified: Tuesday, 06-Jun-2006 22:18:11 BST