Computer Science


Computers are used to store and search data.

Computer Science is the scientific approach to this.

public class HelloWorld {
	public static void main(String[] args) {
		System.out.println("Hello World!");
	}
}

Computer science is no more about computers (or programming) than astronomy is about telescopes.

Basic Object-Oriented Programming, Java Language

Abstract Classes - Abstract classes permit declaration of classes that define only part of an implementation, leaving the subclasses to provide the details

If an abstract class contains only abstract method declarations, it should be implemented as an interface instead.

Concrete superclass - use extends to make a subclass

Abstract class (at least one method is abstract) - use extends to make a subclass

Completely abstract class (an interface) - a subclass implements the interface

An interface can be implemented by many classes.

A class can implement many interfaces.

An interface can extend many interfaces.
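
A minimal sketch of these relationships (Shape, Drawable and Circle are made-up names for illustration):

// Abstract class: defines only part of an implementation.
abstract class Shape {
	public String describe() {     // concrete method, shared by all subclasses
		return "A shape with area " + area();
	}
	public abstract double area(); // abstract method: subclasses provide the details
}

// Completely abstract class: an interface.
interface Drawable {
	void draw();
}

// A subclass extends the abstract class and implements the interface.
class Circle extends Shape implements Drawable {
	private double radius;
	public Circle(double radius) { this.radius = radius; }
	public double area() { return Math.PI * radius * radius; }
	public void draw() { System.out.println(describe()); }
}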


Generics = type safety: errors are caught at compile time.

ArrayList<String> words = new ArrayList<String>(); - the element type is declared as String instead of defaulting to Object.

public static void printCollection(List<?> c);

<?> - any type. unbounded wildcard type. List of unknowns.

public static void printCollection(List<? extends T> c); <? extends T> - any subclass of T, including T itself. Also known as a "bounded wildcard".

<?> = <? extends Object> (Object is top of hierarchy)

Methods:

public static <T> List<T> fromArrayToList(T[] a); - one type parameter, T, which is unbounded. Examples:
List<Movie> moviesA = fromArrayToList(movies); OK
List<String> wordsA = fromArrayToList(words); OK
public <T extends Item> List<T> startsWithA(List<T> list); - the type parameter T here is restricted to Item and its subclasses. Let Movie be a subclass of Item.
List<Movie> moviesB = startsWithA(movies); OK
List<String> wordsB = startsWithA(words); COMPILE TIME ERROR
public <T extends Item & Priced> List<T> pricedItems(List<T> list); - a multiple bound: T must be a subclass of Item AND implement Priced. Let Movie be a subclass of Item that implements Priced, and let CD implement Priced without being an Item.
List<Movie> moviesC = pricedItems(movies); OK because Movie is both an Item and Priced
List<CD> cdsC = pricedItems(cds); COMPILE TIME ERROR because CD is not an Item
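
A compilable sketch of the three flavours above (Item, Priced, Movie and CD are made-up types for illustration):

import java.util.ArrayList;
import java.util.List;

class Item { String title = "Alien"; }
interface Priced { double price(); }
class Movie extends Item implements Priced { public double price() { return 10.0; } }
class CD implements Priced { public double price() { return 5.0; } } // Priced, but not an Item

public class GenericsDemo {
	// unbounded type parameter: works for any element type
	public static <T> List<T> fromArrayToList(T[] a) {
		List<T> list = new ArrayList<T>();
		for (T t : a) list.add(t);
		return list;
	}

	// bounded type parameter: only Item and its subclasses
	public static <T extends Item> List<T> startsWithA(List<T> list) {
		List<T> result = new ArrayList<T>();
		for (T t : list) if (t.title.startsWith("A")) result.add(t);
		return result;
	}

	// multiple bounds: T must be an Item AND be Priced
	public static <T extends Item & Priced> double totalPrice(List<T> list) {
		double total = 0;
		for (T t : list) total += t.price();
		return total;
	}

	public static void main(String[] args) {
		List<Movie> movies = fromArrayToList(new Movie[] { new Movie() });
		System.out.println(totalPrice(startsWithA(movies))); // OK: Movie is an Item and Priced
		// totalPrice(fromArrayToList(new CD[] { new CD() })); // COMPILE TIME ERROR: CD is not an Item
	}
}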

Control flow

How to decide which statements to execute, and in what order.

if-statements

loops: while, for

Loops can be expressed as recursion or iteration.

Examples of the two in C++

#include <iostream>
#include <string>

std::string reverse(std::string a);
std::string reverseIt(std::string a);

int main ()
{
	std::string s0 ("0123456789");

	std::cout << reverse(s0) << std::endl;
	std::cout << reverseIt(s0) << std::endl;
	return 0;
}
std::string reverse(std::string a)
{
	std::cout << a << std::endl;

	int length = a.length();

	if (length <= 1)
	{
		return a;
	}

	// last character + reversed middle + first character
	std::string last(1, a.at(length - 1));
	std::string middle(reverse(a.substr(1, length - 2)));
	std::string first(1, a.at(0));

	return last + middle + first;
}
std::string reverseIt(std::string a)
{
	// swap characters from both ends, moving towards the middle
	for (std::string::size_type i = 0; i < a.length() / 2; i++)
	{
		char temp1 = a.at(a.length() - 1 - i);
		char temp2 = a.at(i);
		a.replace(i, 1, 1, temp1);
		a.replace(a.length() - 1 - i, 1, 1, temp2);
	}
	return a;
}

Examples of the two in Java

public class Main {
	public static void main(String[] args) {

		String to = args[0];

		System.out.println(to);
		System.out.println(reverse(to));
		System.out.println(reverseIt(to));

	}
	public static String reverse(String a){
		System.out.println(a);

		int length = a.length();

		if(length<=1){
			return a;
		}

		return a.charAt(length-1)+
		reverse(a.substring(1,length-1))+
		a.charAt(0);
	}

	public static String reverseIt(String a){

		String newstring = ""; //String are immutable in Java. You can't change them. Not going to swap, instead create a new one. 

		for(int i =a.length()-1;i>=0;i--){
			newstring += a.charAt(i);
		}

		return newstring;
	}
}

Note: iteration uses a loop (in this case a for, though a while would also work); recursion uses an if-statement to check for the base case, and if the base case has not been reached, the method calls itself.

FizzBuzz

for i in range(1, 20):
    if i % 3 == 0 and i % 5 == 0:
        print("fizzbuzz")
    elif i % 3 == 0:
        print("fizz")
    elif i % 5 == 0:
        print("buzz")
    else:
        print(i)

Data Structures

One of the most important uses of computers is to store data.

A data structure is a systematic way of organizing data in memory. The array is a basic data structure - the oldest and one of the most important. An array is a collection of variables, each identified by its index. Arrays are important because they mimic the way memory (RAM) works: RAM stores the most fundamental data type, the byte ("the smallest addressable unit of memory"), in a linear order, so that each byte is addressable by what is known as a memory address. This is useful because the index can be calculated at runtime, which makes it easy to do things like go through all the variables.

Let's say you have a collection of test scores and you want to know the average mark. Instead of doing:

int score1 = 31
int score2 = 74
int score3 = 44
int score4 = 5
int score5 = 62
int score6 = 50
int score7 = 3
int score8 = 40
int score9 = 55
int score10 = 62

double average = (score1 + score2 + score3 + score4 + score5 + score6 + score7 + score8 + score9 + score10) / 10.0;

you can do

int score[] = new int[10];
score[0] = 31;
score[1] = 74;
score[2] = 44;
score[3] = 5;
score[4] = 62;
score[5] = 50;
score[6] = 3;
score[7] = 40;
score[8] = 55;
score[9] = 62;

double average = 0;
for(int i = 0; i < score.length; i++){
	average += score[i];
}
average /= score.length;
// Iteration (going through everything and then printing them)
for(int i =0; i<score.length; i++){
	System.out.println(score[i]);
}

Arrays are the most primitive data structure: they directly mimic the way computers store data in memory. Memory locations and their addresses are sequential, just like array contents and their indexes.

Abstract Data Types (ADTs)

You can abstract from this: specify a data structure in terms of what it can do (its functionality) rather than how it works (a specific representation). For example, a List data structure allows us to insert or delete elements in a way that arrays do not, even though the List itself may be implemented by an array.

Abstract data types are mathematical models for certain classes of data structures or types, defined by the operations that may be performed on them.

Imagine an ice cube maker. A user of an ice cube doesn't need to know about the evaporator, the condenser, the compressor and the throttle valve. These components form a black box the user doesn't need to worry about. All a user needs to know is to attach this black box to a water and power supply; then, for day to day operation, the user presses a button and expects ice to come out. An ice cube maker also has different states it can be in, such as empty or full.

Similarly, for abstract data types a user doesn't need to know the finer details of the data structure. All the user needs to know is what operations it can perform and how to invoke them.

ADTs allow us to think at a higher level about algorithms. Algorithms work on ADTs.

All the following data structures are abstractions of arrays.

List

A list is an ADT that implements an ordered collection of values. Each instance of a value is called an element of the list. While duplicate values can occur in a list, each element is considered a distinct thing identified by its index.

Operations on a list can include creating a new, blank list. After creating this list, you can start to add items to the start of the list, or end of the list. After adding some elements you may want to find what the first item is, or the last item, or if there are any items at all.
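
Expressed as a Java interface, such a List ADT might look like this (a hypothetical sketch of the operations just described, not the java.util.List API):

// A minimal List ADT: what a list can do, not how it works.
interface SimpleList<T> {
	void addFirst(T element); // add an item to the start of the list
	void addLast(T element);  // add an item to the end of the list
	T getFirst();             // what is the first item?
	T getLast();              // what is the last item?
	boolean isEmpty();        // are there any items at all?
}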

LinkedList

The abstract concept of a list can be implemented in (mainly) two ways. One way is a linked list. In a linked list, the elements are nodes. Each node is composed of a value that is stored, like an integer or a string, and a reference to the next node. Having each node be responsible for the next allows for efficient insertion and removal of elements from any position. The downside is that to search through all the nodes, to reach a specific node, or even just to get to the end, you must start at the beginning and follow the references to get to where you want to be.

class Node {
    public int data;
    public Node nextLink;

    //Node constructor
    public Node(int d) {
	    data = d;
    }

    //Print Node data
    public void printLink() {
	    System.out.print("{" + data + "} ");
    }
}

class LinkList {
    private Node first;

    //LinkList constructor
    public LinkList() {
	    first = null;
    }

    //Returns true if list is empty
    public boolean isEmpty() {
	    return first == null;
    }

    //Inserts a new Node at the first of the list
    public void insert(int d) {
	    Node link = new Node(d);
	    link.nextLink = first;
	    first = link;
    }

    //Deletes the link at the first of the list
    public Node delete() {
	    Node temp = first;
	    first = first.nextLink;
	    return temp;
    }
    
    //Prints list data
    public void printList() {
	    Node currentLink = first;
	    System.out.print("List: ");
	    while(currentLink != null) {
		    currentLink.printLink();
		    currentLink = currentLink.nextLink;
	    }
	    System.out.println("");
    }
}

class LinkListTest {
    public static void main(String[] args) {
	    LinkList list = new LinkList();

	    list.insert(1);
	    list.insert(2);
	    list.insert(5);
	    list.insert(4);
	    list.insert(3);

	    list.printList();

	    while(!list.isEmpty()) {
			Node deletedLink = list.delete();
			System.out.print("deleted: ");
			deletedLink.printLink();
			System.out.println("");
			list.printList();
	    }
    }
}

ArrayList

Another way to implement a list is with an array list. An array list is similar to an array except that it is not fixed in size. A count is kept of the number of elements. An array list is backed by an array, and when this array is close to getting full a new, larger one is created and the old data is copied over (see the sketch after the example below).

import java.util.ArrayList;

public class ArrayListTest {

	public static void main(String[] args) {

		ArrayList<String> dictionary = new ArrayList<String>();
		dictionary.add("Hello");
		dictionary.add("World");
		
		System.out.println(dictionary.get(0));
		System.out.println(dictionary.get(1));
	}

}
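
The growth step described above can be sketched like this (a simplified hand-rolled dynamic array of ints, not the real java.util.ArrayList internals):

// Simplified dynamic array: grow by doubling when full.
class IntArrayList {
	private int[] data = new int[4];
	private int count = 0; // the number of elements actually stored

	public void add(int value) {
		if (count == data.length) { // the array is full: allocate a bigger one
			int[] bigger = new int[data.length * 2];
			for (int i = 0; i < count; i++) { // copy the old data over
				bigger[i] = data[i];
			}
			data = bigger;
		}
		data[count++] = value;
	}

	public int get(int index) { return data[index]; }
	public int size() { return count; }
}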

ArrayList vs LinkedList

Linked lists and array lists each have advantages over the other.

Insertion/Deletion: Insertion or deletion of an element in a linked list simply requires one node reference to be rewritten. This is a constant time operation if you are already at the node in question; otherwise it takes O(n) time to walk from the beginning to a specific node.

Insertion or deletion of elements in an arraylist presents some interesting problems.

Insertion or deletion of an element in an array list requires shifting half of the elements on average, and all of the elements in the worst case - this is what happens when elements are inserted or deleted at the front or in the middle, and it takes Θ(n) time. Inserting at the end takes Θ(1) time, because a variable keeps track of the location of the last element (by keeping track of the size). However, when the underlying array is almost full, a new one has to be allocated and repopulated, which is an expensive Θ(n) operation. Since this happens rarely (the capacity is typically doubled, so reallocations become less and less frequent), it does not change the overall cost: insertion at the end runs in Θ(1) amortized time.

Indexing: Dynamic arrays (like the plain arrays that underlie them) allow random access to any element in constant time, while in a linked list getting to a given element requires stepping through the elements before it.

This makes linked lists unsuitable for applications where it's useful to look up an element by its index quickly, such as heapsort.

A linked list also requires more memory, storing both the value and a reference to the next node, while an array list needs just enough to store the values plus a small fixed overhead.

The lesson, dear reader:

  • Arrays have O(1) random access, but are really expensive to add stuff onto or remove stuff from.
  • Linked lists are really cheap to add or remove items anywhere and to iterate, but random access is O(n).

Hashtable

Let's step away from ADTs and go back to arrays. Arrays are useful mostly because the element indices can be computed at run time. As you can see in the previous example, all the scores - stored as integers - were accessed by iteratively incrementing an integer index.

Let's take this further. Instead of merely incrementing, we can do something more complicated.

We create a function that, given something - say, a key - computes an integer that serves as an index into a table. At that location is the value we want.

Suppose that each student has a name and an id number. The id number can serve as a key to retrieve the value, which is the name. Since the id number is already numerical, it can be used directly as an index into an array of Strings.

public class Hashtable {
	
	public static void main(String[] args) {

		//students have id numbers and names
		//1111111 Adam
		//2222222 Bob
		//3333333 Chris
		//4444444 David
		
		String[] students = new String[10000000];
		students[1111111] = "Adam";
		students[2222222] = "Bob";
		students[3333333] = "Chris";
		students[4444444] = "David";
		
		System.out.println( students[1111111] );
		System.out.println( students[2222222] );
		System.out.println( students[3333333] );
		System.out.println( students[4444444] );	
	}
}

The above code gives the basic idea. Of course it is inefficient to use the id number in such a way (and if the key is not a number, it is impossible).

What is needed is a function that can take in a key and compute a unique number from it. That number is called a hash code and is used as an index into an array, and the function that produces the hash code is called a hash function.

The benefits are huge. Hash tables are fast because they access the location directly. The complicated part of this is in the hash function.

Hash functions are designed to give a unique output based on an input. They are used to verify that files are correctly transferred (say, over the internet) and are also used in cryptography to securely store passwords. Passwords are not stored as entered; instead they are put through a hash function. Cryptographic hash functions are designed to be irreversible, so that there is no way to get the original file or password from just the hash. They are also designed to have an avalanche effect, so that even the smallest change to the input gives a dramatically different hash. This is useful for detecting when a file downloaded over the internet is slightly corrupted.

Examples of cryptographic hash functions include MD5 and SHA-1; both have been used in cryptography and for file integrity checks.

Techniques used in a hash function

  • division
    • h(k) = k % n as the hash value, with k the key and n the table size, ideally a prime number.
  • folding
    • divide the integer key, k, into sections
    • add, subtract and/or multiply them, combining them into the hash code, e.g.
      • k = 013402122
      • h(k) = 013 + 402 + 122 = 537
      • can also multiply the section of numbers by prime numbers.
  • middle squaring
    • k = 013402122
    • mid = 402
    • 402 ^ 2 = 161604
    • h(k) = 6160 (the middle digits of the square)
  • truncation - delete part of the key
    • k = 013402122
    • h(k) = 122 (last 3 digits)
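
The division and folding techniques can be sketched in Java like this (the table size 97 and the key are made-up example values):

public class HashFunctions {
	// division: h(k) = k % n, with n the table size (ideally prime)
	static int divisionHash(int k, int n) {
		return k % n;
	}

	// folding: split the key into 3-digit sections and add them up
	static int foldingHash(int k) {
		int hash = 0;
		while (k > 0) {
			hash += k % 1000; // take the last three digits as one section
			k /= 1000;        // move on to the next section
		}
		return hash;
	}

	public static void main(String[] args) {
		System.out.println(divisionHash(13402122, 97)); // an index into a 97-slot table
		System.out.println(foldingHash(13402122));      // 122 + 402 + 13 = 537
	}
}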

A good hash function gives a valid array index for every key, is fast, and produces hash codes that are uniformly distributed. This helps avoid collisions (where two different inputs give the same output).

A perfect hash function would give a different hash code (index value) for every key. Of course we don't live in a perfect world; collisions will always occur.

To resolve this, each array location that suffers collisions holds a list of all the values that hash there - a technique called chaining (sketched below). In the worst case all the hash codes are the same, meaning all the values land in the same array location and are all stored in one list. Because it is a list, it takes O(n) to search through it.
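
A bare-bones sketch of chaining - an array of linked lists, with a fixed made-up table size:

import java.util.LinkedList;

// Each array slot holds a list of the entries whose keys hash to that slot.
class ChainedHashTable {
	private static final int SIZE = 97; // table size, a prime
	private LinkedList<int[]>[] table;  // each entry is a {key, value} pair

	@SuppressWarnings("unchecked")
	public ChainedHashTable() {
		table = new LinkedList[SIZE];
		for (int i = 0; i < SIZE; i++) table[i] = new LinkedList<int[]>();
	}

	public void put(int key, int value) {
		table[Math.abs(key) % SIZE].add(new int[]{key, value});
	}

	public Integer get(int key) {
		for (int[] entry : table[Math.abs(key) % SIZE]) { // scan the chain: O(chain length)
			if (entry[0] == key) return entry[1];
		}
		return null; // key not present
	}
}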

Summary of Hash table

        Average  Worst case
Space   O(n)     O(n)
Search  O(1)     O(n)
Insert  O(1)     O(n)
Delete  O(1)     O(n)

ADTs: Queue, Stack, Priority Queues

These ADTs are really basic: you can only add and remove items. What differs is the order in which items are removed.

Queues, like queues in real life, operate on a first in, first out basis. The most recently added item has to wait until all the other items are removed before it can be removed.

Stacks, like stacks of plates, operate on a last in, first out basis. The most recently added item is the first one to be removed.

Priority queues are similar to queues, except they remove the highest-priority value first instead of the value that was inserted first.

Here is the basic idea of how to use them.

import java.util.LinkedList;
import java.util.Queue;
import java.util.Stack;
import java.util.PriorityQueue;

public class HelloWorld {
	
	public static void main(String[] args) {
                Stack<Integer> lifo = new Stack<Integer>();

                for (int i = 1; i <= 10; i++)
                {
                        lifo.push(i); // autoboxing converts the int to an Integer
                }

                while ( !lifo.empty() )
                {
                        System.out.print ( lifo.pop() );
                        System.out.print ( ',' );
                }
                System.out.println("");
                
                Queue<Integer> fifo = new LinkedList<Integer>(); // Queue is an interface; LinkedList is one implementation of it
                
                for (int i = 1; i <= 10; i++)
                {
                	fifo.offer(i);
                }
                
                while ( !fifo.isEmpty() )
                {
                        System.out.print ( fifo.poll() );
                        System.out.print ( ',' );
                }
                System.out.println("");
                
                PriorityQueue<Integer> fifo2 = new PriorityQueue<Integer>();
                fifo2.offer(1);
                fifo2.offer(7);
                fifo2.offer(3);
                fifo2.offer(5);
                
                while ( !fifo2.isEmpty() )
                {
                        System.out.print ( fifo2.poll() );
                        System.out.print ( ',' );
                }
                
	}

}

Array based implementations of these ADTs:

https://www.dropbox.com/s/64m0ikkubewsi01/CityStore.java

https://www.dropbox.com/s/t41bzhba18frvtd/CityStack.java

https://www.dropbox.com/s/2zt8w94uvzk8d64/CityQueue.java

Trees and Graphs

See: Graphs

Summary of Data Structures

Structure                   Linked list         Array  ArrayList       Balanced tree  Random access list
Indexing                    Θ(n)                Θ(1)   Θ(1)            Θ(log n)       Θ(log n)
Insert/delete at beginning  Θ(1)                N/A    Θ(n)            Θ(log n)       Θ(1)
Insert/delete at end        Θ(n)                N/A    Θ(1) amortized  Θ(log n)       Θ(log n) updating
Insert/delete in middle     search time + Θ(1)  N/A    Θ(n)            Θ(log n)       Θ(log n) updating
Wasted space (average)      Θ(n)                0      Θ(n)            Θ(n)           Θ(n)

Algorithms

An algorithm is a clearly stated, step-by-step method of solving a problem.

There are often many ways to solve the same problem. 

Example

Imagine you are trying to compress an audio file to make it smaller. 

There are many algorithms to achieve this depending on what you want. 

Lossy audio compression algorithms throw out data to reduce file size. This works because the human brain can't perceive all the sounds, especially when multiple sounds are being played at the same time.

Lossless audio compression algorithms work by modelling future values of the digital signal in terms of previous values. So imagine your audio streams looked like this:               

[ABCDEFABCDGHABCDEFABCDIJ]

you could rewrite it like this:

[^=ABCD][^EF^GH^EF^IJ]

No information is permanently lost, but the file size is reduced and the original can be reconstructed. 

In both cases, file sizes are reduced because you eliminate redundant information. 

A program is a sequence of computer instructions implementing the algorithm.

In audio and video compression, these programs are called codecs. Lossless codecs include Free Lossless Audio Codec (FLAC) and Apple Lossless (ALAC); lossy codecs include MP3 and Windows Media Audio (WMA).

Complexities (Big O)

Big-O notation is a relative representation of the complexity of an algorithm: it describes how running time (or space) grows with the input size n, ignoring constant factors.
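
To get a rough feel for the common growth rates, take n = 1,000,000 items:

O(log n) ≈ 20 steps
O(n) = 1,000,000 steps
O(n log n) ≈ 20,000,000 steps
O(n^2) = 1,000,000,000,000 steps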

Sorting Algorithms

One of the most important uses of computers is to sort data. 

Sorting a list assumes that everything on the list can be compared with everything else. Numbers can be compared by size. Words can be compared alphabetically. These are examples of totally ordered sets. Most sorting algorithms are comparison based: they use an order relation on keys to decide what goes where.

Selection Sort

Let's say you wanted to sort the list of scores above. One way is to grab the smallest element in the list and stick it in front. Then grab the next smallest element and stick it right next to the first one, and so on. This is the basis of selection sort.

public class SelectionSort {

	public static void main(String[] args) {

		int score[] = new int[10];
		score[0] = 31;
		score[1] = 74;
		score[2] = 44;
		score[3] = 5;
		score[4] = 62;
		score[5] = 50;
		score[6] = 3;
		score[7] = 40;
		score[8] = 55;
		score[9] = 62;

		// lets go through the array from left to right
		for (int i = 0; i < score.length; i++) {

			// assume the element at i is the smallest seen so far
			int min = i;

			// lets go through the part of the array we haven't seen yet
			// and pick out the smallest element we can find
			for (int j = i + 1; j < score.length; j++) {
				if (score[j] < score[min]) {
					min = j;
				}
			}

			// if the smallest element isn't the one we are on right now
			// swap them around
			if (min != i) {
				int temp = score[i];
				score[i] = score[min];
				score[min] = temp;
			}

			// the elements previous to this iteration
			// are all sorted correctly!
		}

		// show off the sorted list
		for(int i = 0; i<score.length; i++){
			System.out.println(score[i]);
		}
	}
}

Selection sort is "in place" - it uses only a fixed amount of additional space, independent of n. As you can see in the code, elements are merely swapped, never copied out to a separate buffer. Because swapping occurs, a pair of elements with the same key that appear in the input in a certain order can come out of that order:

suppose: 3a = 3b
3b 1 3a 4 2 5
1 3b 3a 4 2 5   // 1 swapped into place
1 2 3a 4 3b 5   // 2 swapped with 3b - 3b has jumped past 3a: not stable!
1 2 3a 3b 4 5

Selection sort makes the smallest possible number of swaps (writes), which makes sense in places where writes are expensive (e.g. flash memory, which has a limited write lifespan). Selection sort is a very simple algorithm, but it is inefficient (especially on large lists) and generally performs worse than the similar insertion sort.

Insertion Sort

Insertion sort is another simple sorting algorithm that builds up the final sorted list one element at a time. Like selection sort it is not efficient on large lists, but it has some unique advantages.

For each element in the unsorted part of the input list, find where it should go and put it there, shifting the other elements up. This is not unlike the way card players sort their hand: take a card, and put it where it belongs.

public class InsertionSort {

	public static void main(String[] args) {

		int score[] = new int[10];
		score[0] = 31;
		score[1] = 74;
		score[2] = 44;
		score[3] = 5;
		score[4] = 62;
		score[5] = 50;
		score[6] = 3;
		score[7] = 40;
		score[8] = 55;
		score[9] = 62;

		// lets go through the array from left to right
		// but skip the first one
		for (int i = 1; i < score.length; i++) {

			// lets remember where we are now
			// and what we are looking at
			int currentValue = score[i];
			int holePos = i;

			// we now look back and try to find where this current element should go
			// we shift the elements up one while doing this
			// thus leaving a hole
			while(holePos>0 && currentValue < score[holePos-1]){
				score[holePos] = score[holePos -1];
				holePos = holePos - 1;
			}
			// our current value goes into the hole we made 
			score[holePos] = currentValue;

			// the elements before position i are now sorted
			// relative to each other, though elements that belong
			// between them may still arrive in later iterations
		}

		// show off the sorted list
		for(int i = 0; i<score.length; i++){
			System.out.println(score[i]);
		}
	}
}

Unlike selection sort, insertion sort is stable: elements are only shifted, never swapped past each other. Insertion sort also happens in place, with no extra buffer needed.

Selection vs Insertion Sort

selection sort: the array has a sorted subarray and an unsorted subarray
insertion sort: the array has a sorted subarray and an unsorted subarray
selection sort: grab the min element from the unsorted part (which takes some effort to find)
insertion sort: grab the next element from the unsorted part
selection sort: and put it in the current place
insertion sort: and put it in the right place (which takes some effort to find)
selection sort: swapping
insertion sort: shifting

Analysis

Worst case performance: (assumes the list to be sorted in the exact opposite order)

Selection sort: O(n^2) - finding lowest element requires scanning all n elements, next lowest requires scanning (n-1) elements, etc..

Insertion sort: O(n^2) - every iteration of the inner loop will scan and shift the entire sorted subsection of the array before inserting the next element

Best case performance: (assumes list is already in correct order)

Selection sort: O(n^2) - finding the lowest element still requires scanning all n elements, the next lowest requires scanning (n-1) elements, and so on

Insertion sort: O(n) - during each iteration, the first element of the unsorted subarray is compared only to the rightmost element of the sorted subarray

It takes more effort to find the smallest / largest element than it takes to find the right place to put an element.

Additional notes

Because insertion sort performs its heavy lifting on the sorted part of the list (in finding the correct place to insert), it has a unique property: it can sort a list without seeing all of it, which means it can sort elements as they arrive. This makes it an online algorithm.


Merge Sort

Rather than iterating through the entire list in one direction or another, merge sort splits the list into two halves, a left half and a right half. It splits recursively until only pieces of size one remain. Then the pieces are merged back together in the correct order.

public class MergeSort{

	public static void main(String[] args) {

		int score[] = new int[10];
		score[0] = 31;
		score[1] = 74;
		score[2] = 44;
		score[3] = 5;
		score[4] = 62;
		score[5] = 50;
		score[6] = 3;
		score[7] = 40;
		score[8] = 55;
		score[9] = 62;

		int[] output = mergeSort(score);

		for(int i : output){
			System.out.println(i);
		}
	}

	public static int[] mergeSort(int a[]){
		// if list size is 0 (empty) or 1, consider it sorted and return it
		// (using less than or equal prevents infinite recursion for a zero length m)		
		if(a.length<=1){
			return a;
		}
		// else list size is > 1, so split the list into two sublists
		int middle = a.length / 2;
		int[] left = new int[middle];
		int[] right = new int[a.length-middle];
		// populate left and right sublists
		for(int i = 0; i<middle;i++){
			left[i] = a[i];
		}
		int j = 0;
		for(int i = middle; i<a.length;i++){
			right[j] = a[i];
			j++;
		}

		left = mergeSort(left);
		right = mergeSort(right);

		return merge(left, right);
	}


	public static int[] merge(int[] left, int[] right){
		int[] result = new int[left.length+right.length];
		int righti = 0;
		int lefti = 0;
		int i=0;

		while(i<result.length){
			// if there are left elements and right elements remaining
			// then choose the smaller one
			// otherwise if there are only left or only right elements remaining
			// then choose the one that has elements
			if(lefti<left.length && righti<right.length && left[lefti]<=right[righti]){
				result[i] = left[lefti];
				lefti++;
			}else if(lefti<left.length && righti<right.length && left[lefti]>right[righti]){
				result[i] = right[righti];
				righti++;
			}else if(lefti>=left.length && righti<right.length){
				result[i] = right[righti];
				righti++;
			}else if(righti>=right.length && lefti<left.length){
				result[i] = left[lefti];
				lefti++;
			}
			i++;
		}
		return result;

	}
}

Quick Sort

In a similar way to merge sort, quicksort also divides the list into two smaller sublists. However, unlike merge sort, which splits the list in half, quicksort splits the list around a pivot element. The pivot is an element of the list, either the first one or one chosen at random, and the list is reordered so that elements less than the pivot come before it and elements greater than the pivot come after it. This is the partition operation, and it works by swapping elements between the two sides. The partition operation is then applied recursively to the smaller sublists.

public class QuickSort{

	public static void main(String a[]){

		int score[] = new int[10];
		score[0] = 31;
		score[1] = 74;
		score[2] = 44;
		score[3] = 5;
		score[4] = 62;
		score[5] = 50;
		score[6] = 3;
		score[7] = 40;
		score[8] = 55;
		score[9] = 62; // a duplicate value; safe now that the recursion excludes the pivot

		System.out.println("start");
		quickSort(score,0,score.length-1);

		for(int i : score){
			System.out.println(i);
		}

	}

	public static void quickSort(int array[],int left, int right){
		if(left<right)
		{
			// Pick an element, called a pivot, from the list.
			int pivot = partition(array, left, right);
			// Recursively apply the above steps to the sub-list of 
			// elements with smaller values and separately the sub-list of 
			// elements with greater values.

			quickSort(array, left, pivot-1); // pivot is already in its final position
			quickSort(array, pivot+1, right);
		}
	}
	/*
	 * This is the in-place partition algorithm. It partitions the portion of the array between indexes 
	 * left and right, inclusively, by moving all elements less than array[pivotIndex] before the pivot, 
	 * and the equal or greater elements after it. In the process it also finds the final position for 
	 * the pivot element, which it returns. It temporarily moves the pivot element to the end of the subarray, 
	 * so that it doesn't get in the way. Because it only uses exchanges, the final list has the same elements 
	 * as the original list. Notice that an element may be exchanged multiple times before reaching its final place. 
	 * Also, in case of pivot duplicates in the input array, they can be spread across the right subarray, in any order. 
	 * This doesn't represent a partitioning failure, as further sorting will reposition and finally "glue" them together.
	 */
	public static int partition(int[] list, int leftIndex, int rightIndex) {
		//random
		int pivotIndex = (int) ((Math.random()*(rightIndex-leftIndex +1)+leftIndex));
		int pivotValue = list[pivotIndex];
		int storeIndex = leftIndex;

		for(int p:list){

			if(p==pivotValue){
				System.out.print("["+p+"] ");
			}else{
				System.out.print(p+" ");
			}
		}
		System.out.println();

		swap(list, pivotIndex, rightIndex);

		for (int i = leftIndex; i < rightIndex; i++) {
			if (list[i] <= pivotValue) {
				swap(list, storeIndex, i);
				storeIndex++;
			}
		}

		swap(list, rightIndex, storeIndex);

		return storeIndex;
	}

	/**
	 * Swaps two elements in a list.
	 *
	 * @param list      The list.
	 * @param index1    The index of the first element to swap.
	 * @param index2    The index of the second element to swap.
	 */
	public static void swap(int[] list, int index1, int index2) {
		int temp = list[index1];
		list[index1] = list[index2];
		list[index2] = temp;
	}
}

Merge Sort vs Quick Sort

Both mergesort and quicksort are divide and conquer algorithms: they break the list into parts that are easier to solve. Mergesort splits the list into equal halves; quicksort splits the list around a pivot element. Both sort their sublists recursively.

Analysis

Mergesort

Worst case performance O(n log n)

Best case performance O(n log n)

Average case performance O(n log n)

Because these are recursive algorithms, they can be analysed using recurrence relations

T(n) = 2T(n/2) + n          // T(n) is the running time of merge sort on a list of length n
                            // 2 - two sublists
                            // T(n/2) - each of half the size
                            // n - linear time to merge
T(1) = 1, n = 2^k           // recursion requires a base case; let n = 2^k
T(n/2) = 2T(n/4) + n/2      // start by expanding T(n/2)
T(n) = 2(2T(n/4) + n/2) + n // substitute back into the original formula
T(n) = 2^2 T(n/2^2) + 2n
T(n) = 2^3 T(n/2^3) + 3n
...
T(n) = 2^k T(n/2^k) + kn
n = 2^k => k = log n
T(n) = n T(1) + n log n
T(n) = n + n log n ∈ Θ(n log n)

Quicksort

Worst case performance O(n^2) (extremely rare when the pivot is chosen at random)

Best case performance O(n log n)

Average case performance O(n log n)

The worst case splits the list into sublists of size 0 and n-1:

T(n) = O(n) + T(0) + T(n-1)
     = O(n) + T(n-1)
T(n) = O(n^2), the same as insertion and selection sort

The best case splits the list into two sublists of size n/2:

T(n) = O(n) + 2T(n/2)
Thus T(n) = O(n log n)


The Master Theorem for recurrences of this shape:

T(n) = aT(n/b) + n^c
if log_b a < c, T(n) ∈ Θ(n^c)
if log_b a = c, T(n) ∈ Θ(n^c log n)
if log_b a > c, T(n) ∈ Θ(n^(log_b a))
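
Applying this to the merge sort recurrence above: T(n) = 2T(n/2) + n gives a = 2, b = 2, c = 1. Since log_2 2 = 1 = c, the middle case applies, and T(n) ∈ Θ(n log n) - matching the expansion worked out earlier.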

Summary of Sorting Algorithms

Method     Worst    Average  Best     Stable  In-place
Insertion  n^2      n^2      n        y       y
Selection  n^2      n^2      n^2      n       y
Shellsort  ??       ??       n        n       y
Mergesort  n log n  n log n  n log n  y       n
Quicksort  n^2      n log n  n log n  n       almost
Heapsort   n log n  n log n  n log n  n       y

All sorting algorithms take at least linear, Ω(n), time, because a sorting algorithm always has to go through the entire input to ensure that it is sorted.

Selection Algorithms

A selection algorithm is an algorithm for finding the kth smallest number in a list (such a number is called the kth order statistic).

QuickSelect
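
QuickSelect finds the kth smallest element by partitioning the list exactly as quicksort does, but then recursing into only the side that contains position k. This gives O(n) average time (O(n^2) in the worst case). A sketch, using a simple last-element Lomuto partition rather than the randomized one shown under quicksort:

public class QuickSelect {
	// Find the kth smallest element (k is 0-based) in a[left..right].
	public static int select(int[] a, int left, int right, int k) {
		if (left == right) return a[left];
		int pivot = partition(a, left, right); // pivot lands in its final sorted position
		if (k == pivot) return a[k];
		else if (k < pivot) return select(a, left, pivot - 1, k); // only look left
		else return select(a, pivot + 1, right, k);               // only look right
	}

	// Lomuto partition using the last element as the pivot.
	static int partition(int[] a, int left, int right) {
		int pivotValue = a[right];
		int store = left;
		for (int i = left; i < right; i++) {
			if (a[i] <= pivotValue) { swap(a, store, i); store++; }
		}
		swap(a, right, store);
		return store;
	}

	static void swap(int[] a, int i, int j) { int t = a[i]; a[i] = a[j]; a[j] = t; }

	public static void main(String[] args) {
		int[] scores = {31, 74, 44, 5, 62, 50, 3, 40, 55, 62};
		System.out.println(select(scores, 0, scores.length - 1, 4)); // 5th smallest: 44
	}
}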

Graphs

A tree is a special kind of graph. Let's start with trees.

Unlike arrays, which are linear data structures, trees are hierarchical (non-linear) data structures.

A tree is made up of nodes which are connected to each other.

Going back to arrays, an array is constructed by initializing with a type and size. Then at each index a value is inserted. Then finally, starting from 0, the index is incremented one by one and the value is printed in order to iterate through the array.

In comparison, trees are constructed by creating a number of nodes and joining them together. A node is inserted into a tree by having an existing node point to it. A node can have any number of pointers to other nodes. The tree equivalent of iterating through an array is called a tree traversal.

Two things define a node: the data item it contains, and the nodes it points to. In a binary tree, each node can point to up to two other nodes - these are referred to as the left and right child nodes.

Note that we didn't define the size of the tree up front. Trees are recursive data structures: each node is responsible for connecting itself to at most two other nodes, and collectively this makes a tree. Getting from one specific node to another may involve traversing through unrelated nodes.

Linked List

See also previous page on LinkedList. A Linked list is defined by the nodes it contains. A node holds the data and a pointer to one other node.

A single node can't see everything. It only knows about the other node it should point to. A node can be chosen to be the head. From the head all the other nodes making up the linked list can be visited by following the links between nodes.

In a tree, a node can point to at most two other nodes.

In a graph (which encompasses linked lists and trees) a node can point to any number of nodes.

Actually there is a type of linked list called a doubly linked list, in which node A links to node B and node B links back to node A, in addition to B linking on to a new node C. This makes it possible to go backwards in the list, from B to A.

Tree Traversal

Given that trees are recursive data structures, they are more complicated to traverse than linear data structures like arrays. The complexity comes from the fact that each node has multiple pointers, and a decision must be made about which one to follow. Tree traversal aims to visit each node once and only once in a systematic order.

There are two main methods in tree traversal. Depth first search and Breadth first search.

Depth First Search: start with a node, then go deep by following child nodes - go to the node's first child, then to that child's first child, and so on. This is recursion. It ends when you reach a leaf, a node without children. Then you backtrack and try to go deep again, and this repeats until all the nodes have been visited.

Breadth First Search. Starts with a node, then you queue up all the children of that node. Then you take node at the front of the queue which is the oldest node in the queue. Remove it, and add its children to the back of the queue. Consistently taking the oldest value, and adding the newer nodes to the end of the queue ensures that you will finish one level before moving to the next. This is breadth first search. You want to cast a wide net before going down to the next level.

DFS - Stacks - each recursive call places the method onto the call stack (managed by the runtime). A call is removed from the stack when all the nodes below it have been visited.
BFS - Queues - each node is visited by removing it from the front of the queue.

There are three types of DFS traversal. They differ in the order of the operations executed.

Preorder traversal: visit root, visit left subtree, visit right subtree. 
Inorder traversal: visit left subtree, visit root, visit right subtree. 
Postorder traversal: visit left subtree, visit right subtree, visit root. 

Level order is BFS. It works iteratively finishing one level before moving on to the next and uses a Queue to keep track of the nodes to be visited. The next node to be visited is removed from the front of the queue, and its child nodes are added to the end. This ensures you finish going through one level before starting another. Queue is an abstract data type (ADT) that is implemented with a LinkedList.

Here is an illustration of it:
           60
          /  \
        40    70
       /  \   / \
      20  50 65  90

Pre-order traversal:
	Visit the root.
	Traverse the left subtree.
	Traverse the right subtree.
i.e.: 60 40 20 50 70 65 90
In-order traversal:
	Traverse the left subtree.
	Visit the root.
	Traverse the right subtree.
i.e.: 20 40 50 60 65 70 90
Post-order traversal:
	Traverse the left subtree.
	Traverse the right subtree.
	Visit the root.
i.e.: 20 50 40 65 90 70 60
Level-order traversal: 
i.e.: 60 40 70 20 50 65 90

Similarly to LinkedList, there is no need to define a size initially, since each node references the next.

Note how inorder traversal of a binary search tree gives the elements in sorted order. Tree sort uses this.
Treesort: Average case O(n log n), Worst case O(n^2)
Remember, all sorting algorithms have to run in at least linear time.

Another thing about traversals: if you serialize a tree by saving its elements in inorder (sorted) order, you can use that to rebuild a binary search tree of minimum height. A minimum height binary tree has as many nodes as possible at each level except the last. Serializing with preorder traversal instead lets you restore the original shape of the tree.

Example code:

import java.util.Queue; // The data structure behind Level order traversal 
import java.util.LinkedList; //Queue implemented as a LinkedList

public class TreeTraverse
{
    private static class Node<T> // a node can store any type of data item. so we use generics here <T>
    {
        public Node<T> left;
        public Node<T> right;
        public T data;
		
        public Node(T data)
        {
            this.data = data;
        }
        public Node<T> getLeft()
        {
            return this.left;
        }
        public void setLeft(Node<T> left)
        {
            this.left = left;
        }
        public Node<T> getRight()
        {
            return this.right;
        }
        public void setRight(Node<T> right)
        {
            this.right = right;
        }
    }
    public static void preorder(Node<?> n)
    {
        if (n != null)
        {
            System.out.print(n.data + " ");
            preorder(n.getLeft());
            preorder(n.getRight());
        }
    }
    public static void inorder(Node<?> n)
    {
        if (n != null)
        {
            inorder(n.getLeft());
            System.out.print(n.data + " ");
            inorder(n.getRight());
        }
    }
    public static void postorder(Node<?> n)
    {
        if (n != null)
        {
            postorder(n.getLeft());
            postorder(n.getRight());
            System.out.print(n.data + " ");
        }
    }
    public static void levelorder(Node<?> n)
    {
        Queue<Node<?>> nodequeue = new LinkedList<Node<?>>();
        if (n != null)
            nodequeue.add(n);
        while (!nodequeue.isEmpty())
        {
            Node<?> next = nodequeue.remove();
            System.out.print(next.data + " ");
            if (next.getLeft() != null)
            {
                nodequeue.add(next.getLeft());
            }
            if (next.getRight() != null)
            {
                nodequeue.add(next.getRight());
            }
        }
    }
    public static void main(final String[] args)
    {
        Node<Integer> x60 = new Node<Integer>(60);
        Node<Integer> x40 = new Node<Integer>(40);
        Node<Integer> x70 = new Node<Integer>(70);
        Node<Integer> x20 = new Node<Integer>(20);
        Node<Integer> x50 = new Node<Integer>(50);
        Node<Integer> x65 = new Node<Integer>(65);
        Node<Integer> x90 = new Node<Integer>(90);

        x60.setLeft(x40);
        x60.setRight(x70);
        x40.setLeft(x20);
        x40.setRight(x50);
        x70.setLeft(x65);
        x70.setRight(x90);
		
        preorder(x60);
        System.out.println();
        inorder(x60);
        System.out.println();
        postorder(x60);
        System.out.println();
        levelorder(x60);
        System.out.println();
    }
}

Sometimes trees get so large that it is not practically possible to go through them entirely - game trees, for example - so other traversal methods must be used. Imagine a chess computer that plays by brute-forcing every possible move. Each node represents a possible game state, and each edge represents a legal move. All chess games start out the same way: this state forms the root node. White goes first with 20 possible moves (each of the 8 pawns can move to 2 possible squares, and either of the two knights can move to 2 possible squares), so the root node has 20 children. Then black moves, also with 20 possible moves, so each of the 20 nodes representing white's first move has 20 children representing black's first move. After white's first turn there are 20 possible board states; after black's first turn there are 20 * 20 = 400.

If you were to build the entire tree of all possible chess games you would have a tree with roughly 10^120 nodes, give or take a few. For comparison, there have been about 10^26 nanoseconds since the Big Bang, and there are thought to be only around 10^75 atoms in the entire universe. No chess computer would or could calculate this number of moves. Instead they calculate only 5, 10 or 20 moves into the future. This is iterative deepening depth-first search: it performs a depth-first search up to a certain depth, then restarts one level deeper. This is not a perfect solution; there is a flaw known as the horizon effect. Say the computer looks 6 moves ahead and then plays its best move - it may turn out that had it looked 8 moves ahead, it could have played a better one.

Binary Search Tree

A binary search tree is a tree with two defining properties. Each node has at most two child nodes, hence "binary", and it is ordered in a way that makes searching easier: each node has a value, every node in its left subtree has a value less than it, and every node in its right subtree has a value greater than it. Duplicate values are not allowed.

Every binary search tree has a root node. New nodes can be inserted systematically - there is no need to manually define and link them - and this process preserves the binary search property.

It is this property that makes binary search trees much faster for search applications than other data structures. Because of their branching structure, whole subtrees can be skipped, getting closer to the target value faster. Linear data structures instead involve going through each value one by one and comparing it.

Imagine finding a number in a phone book given a name. You don't methodically go through the As, then the Bs, then the Cs. You open the book halfway, figure out whether what you are looking for is in the first half or the second half, take that half, and recursively search again until you find what you are looking for. If your contact starts with a letter in N - Z, there is no need to look through A - M. A BST lets you do the same thing: each comparison lets you skip unnecessary nodes, giving a faster search. This is why the operations on a tree are typically O(log n) - and log n grows far more slowly than n.

           60
          /  \
        40    70
       /  \   / \
      20  50 65  90

The root is 60. 
The child nodes of the root form the roots of other trees. In the above example, 
40 and 70 form the roots of their own tree. 
And note that 20 is less than 40 and 50 is greater than 40. 
65 is less than 70 and 90 is greater than 70. 

Example code:

/*

Because of this comparison operation, a BST needs to know that the type of data it holds can be compared. This starts with a bound on the class's type parameter:
public class BinarySearchTree<T extends Comparable<T>>
The <T extends Comparable<T>> allows any type that can be compared with itself, which is what compareTo requires.

There is a constructor because of the need to set up the root.
Methods relating to retrieving, inserting and deleting are specific to an instance of a BST, so those methods are not static.

Of course, traversal is independent of the data type, so the traversal methods remain the same as before and stay static.

A compareTo method is used to compare two objects. Each object should have a key that can be used for comparisons.
For example, an address book BST may be made up of Contact objects. A Contact has a full name and a phone number. Every person's name is unique (for our purposes, anyway),
and a typical operation of an address book is to return a phone number given a name. Therefore the key should be the String representing the name. It may be necessary to invoke a getKey() method to return it.

Inserting an item into the BST involves traversing to the right place, via a recursive call similar to a traversal. The base case is reached when null is found. At each node a decision is made whether to go to the left child or the right child, based on the data in the current node and the data to be inserted.

              60
            /    \
    	  40      70
       /     \    / \
      20     50  65  90
     / \     / \
	10  30  45  55

EXAMPLE DELETE 40

Remember: in a BST, for each node, all the values in its left subtree are less than it and all the values in its right subtree are greater than it.

To delete a node with 0 children, simply mark that node as null. For a node with one child, replace the node with its child. For a node with two children, we need to pick a replacement value. A good candidate is the value closest to it:
either the greatest value among those lower than it, or the lowest value among those greater than it.
Or, in tree terminology, we replace the value with the rightmost value in the left subtree, or the leftmost value in the right subtree.

Take a look at 40. All the values in its left subtree are less than 40; all the values in its right subtree are greater than 40.
To properly replace 40, we use either the greatest (rightmost) value in the left subtree or the smallest (leftmost) value in the right subtree. Let's choose the smallest value in the right subtree.

Finding the smallest value in the right subtree simply involves taking the subtree's root and recursively following left children until there isn't one. This node has either one right child or no child at all, so to delete it we simply replace it with its right child: either the right child takes its place, or there is no child and the node becomes null. In either case the node has been deleted.
*/

import java.util.Queue;
import java.util.LinkedList;

public class BinarySearchTree<T extends Comparable<T>> // we need to know that the elements can be compared with each other,
// because insertion, retrieval and deletion all rely on compareTo.
{
    protected Node<T> root;

    public BinarySearchTree(){
        root = null;
    }

    public BinarySearchTree(T rootItem){
        root = new Node<T>(rootItem);
    }

    public void setRootItem(T newItem) {
        root = new Node<T>(newItem);
    }

    public T getRootItem() {
        return root.getItem();
    }

    public Node<T> getRoot() {
        return root;
    }

    public void insert(T newItem){
        root = insertItem(root, newItem);
    }

    public T retrieve(T searchKey){
        return retrieveItem(root, searchKey);
    }

    public void delete(T searchKey) throws Exception{
        root = deleteItem(root, searchKey);
    }

    protected Node<T> insertItem(Node<T> tNode, T newItem){
        Node<T> newSubtree;
        if (tNode == null){
            //position of insertion found
            //insert after leaf
            //create new node
            tNode = new Node<T>(newItem);
            return tNode;
        }

        T nodeItem = tNode.getItem();

        //search for the insertion position

        if(newItem.compareTo(nodeItem) < 0){
            //search left subtree
            newSubtree = insertItem(tNode.getLeft(), newItem);
            tNode.setLeft(newSubtree);
            return tNode;
        } else {
            newSubtree = insertItem(tNode.getRight(), newItem);
            tNode.setRight(newSubtree);
            return tNode;
        }
    }
	
	protected T retrieveItem(Node<T> tNode, T searchKey){
        T treeItem;
        if (tNode == null){
            treeItem = null;
        } else {
            T nodeItem = tNode.getItem();
            if (searchKey.compareTo(nodeItem)==0){
                //item is in root of some subtree
                treeItem = tNode.getItem();
            } else if (searchKey.compareTo(nodeItem)<0){
                //search left subtree
                treeItem = retrieveItem(tNode.getLeft(), searchKey);
            }else{
                //search right subtree
                treeItem = retrieveItem(tNode.getRight(), searchKey);
            }
        }
        return treeItem;
    }

    protected Node<T> deleteItem(Node<T> tNode, T searchKey) throws Exception{
        Node<T> newSubtree;
        if(tNode == null){
            throw new Exception("TreeException: Item not found");
        }else{
            T nodeItem = tNode.getItem();
            if (searchKey.compareTo(nodeItem) == 0){
                //root
                tNode = deleteNode(tNode);
            }else if(searchKey.compareTo(nodeItem) < 0){
                //left
                newSubtree = deleteItem(tNode.getLeft(), searchKey);
                tNode.setLeft(newSubtree);
            }else{
                //right
                newSubtree = deleteItem(tNode.getRight(), searchKey);
                tNode.setRight(newSubtree);
            }
        }
        return tNode;
    }

    protected Node<T> deleteNode(Node<T> tNode){
        // Algorithm note: There are four cases to consider:
        //   1. The tNode is a leaf.
        //   2. The tNode has no left child.
        //   3. The tNode has no right child.
        //   4. The tNode has two children.
        // Calls: findLeftmost and deleteLeftmost

        T replacementItem;

        if( (tNode.getLeft()==null) && (tNode.getRight()==null)){
            //leaf
            return null;
        }
        else if(tNode.getLeft()==null){
            return tNode.getRight();
        }
        else if(tNode.getRight()==null){
            return tNode.getLeft();
        }
        else{
            replacementItem = findLeftmost(tNode.getRight());
            tNode.setItem(replacementItem);
            tNode.setRight(deleteLeftmost(tNode.getRight()));
            return tNode;
        }
    }

    protected T findLeftmost(Node<T> tNode){
        if(tNode.getLeft()==null){
            return tNode.getItem();
        }
        else{
            return findLeftmost(tNode.getLeft());
        }
    }

    protected Node<T> deleteLeftmost(Node<T> tNode){
        if(tNode.getLeft()==null){
            return tNode.getRight();
        }
        else{
            tNode.setLeft(deleteLeftmost(tNode.getLeft()));
            return tNode;
        }
    }

    protected int getHeight(){
        return getHeight(root, 0);
    }

    protected int getHeight(Node<?> root, int startHeight){
        if (root==null) {
            return startHeight;
        }
        int leftHeight = getHeight(root.getLeft(), startHeight+1);
        int rightHeight = getHeight(root.getRight(), startHeight+1);
        if (leftHeight>rightHeight) {
            return leftHeight;
        }else {
            return rightHeight;
        }
    }

    private static class Node<T>
    {
        public Node<T> left;
        public Node<T> right;
        public T data;
        public Node(T data)
        {
            this.data = data;
        }
        public Node<T> getLeft()
        {
            return this.left;
        }
        public void setLeft(Node<T> left)
        {
            this.left = left;
        }
        public Node<T> getRight()
        {
            return this.right;
        }
        public void setRight(Node<T> right)
        {
            this.right = right;
        }
        public T getItem()
        {
            return data;
        }
        public void setItem(T data)
        {
            this.data = data;
        }
        public T getKey()
        {
            return data;
        }
    }
    public static void preorder(Node<?> n)
    {
        if (n != null)
        {
            System.out.print(n.data + " ");
            preorder(n.getLeft());
            preorder(n.getRight());
        }
    }
    public static void inorder(Node<?> n)
    {
        if (n != null)
        {
            inorder(n.getLeft());
            System.out.print(n.data + " ");
            inorder(n.getRight());
        }
    }
    public static void postorder(Node<?> n)
    {
        if (n != null)
        {
            postorder(n.getLeft());
            postorder(n.getRight());
            System.out.print(n.data + " ");
        }
    }
    public static void levelorder(Node<?> n)
    {
        Queue<Node<?>> nodequeue = new LinkedList<Node<?>>();
        if (n != null)
            nodequeue.add(n);
        while (!nodequeue.isEmpty())
        {
            Node<?> next = nodequeue.remove();
            System.out.print(next.data + " ");
            if (next.getLeft() != null)
            {
                nodequeue.add(next.getLeft());
            }
            if (next.getRight() != null)
            {
                nodequeue.add(next.getRight());
            }
        }
    }
    public static void main(final String[] args)
    {
        BinarySearchTree<Integer> tree = new BinarySearchTree<Integer>(100);
        tree.insert(10);
        tree.insert(20);
        tree.insert(30);
        tree.insert(40);

        tree.insert(140);
        tree.insert(130);
        tree.insert(120);
        tree.insert(110);

        tree.insert(80);
        tree.insert(115);
        tree.insert(90);
        tree.insert(125);
		
        preorder(tree.getRoot());
        System.out.println();
        inorder(tree.getRoot());
        System.out.println();
        postorder(tree.getRoot());
        System.out.println();
        levelorder(tree.getRoot());
        System.out.println();
        System.out.println("The height of the binary search tree is "+tree.getHeight()+".");
    }
}

Analysis

There are two ways to go forward in a tree: the left child or the right child. Compare this to an array, in which there is only one way to go forward: the next index. In the worst case only one of the children is ever used, for example when every node is the left child of its parent. This makes the height of the tree equal to the number of elements, as in a LinkedList, which means the operations work in O(n) time. More typically, however, we use both children to store our data, so at each node a decision is made whether to go with the left child or the right child. This cuts down the number of elements to go through, and the operations work in O(log n) time.

            Average case   Worst case
Retrieval   O(log n)       O(n)
Insertion   O(log n)       O(n)
Deletion    O(log n)       O(n)
Traversal   O(n)           O(n)

Traversal goes through every element, so it has to be O(n). The other operations can skip some elements.
Tree terminology
full, all nodes have 0 or 2 children
perfect, full, every parent has 2 children, and all leaves are at the same depth
complete, every level is full up to depth h - 1, and the last level h is filled from left to right (it need not be completely full)
balanced, for every node the left and right subtrees differ in height by 0 or 1
perfect => full, and perfect => complete => balanced (a perfect tree is both full and complete; a complete tree is balanced)
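These properties are easy to check mechanically. A minimal sketch in Python, assuming trees are written as nested (left, key, right) tuples, a representation invented here for illustration:

# a node is a (left, key, right) tuple; None is an empty tree
def height(t):
    if t is None:
        return 0
    return 1 + max(height(t[0]), height(t[2]))

def is_balanced(t):
    if t is None:
        return True
    return (abs(height(t[0]) - height(t[2])) <= 1
            and is_balanced(t[0]) and is_balanced(t[2]))

chain = (((None, 20, None), 40, None), 60, None) # 60 -> 40 -> 20, all left children
print is_balanced(chain) # False: 60's subtree heights differ by 2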
Balance property: a balanced tree can be restored (rebuilt) from an inorder traversal.

           60
          /  \
        40    70
       /  \  
      20  50 

20 40 50 60 70

      50
     /  \
   40    60
   /       \
 20         70
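A sketch of that rebuild in Python, assuming the inorder traversal is available as a sorted list (same illustrative tuple representation as in the balance checker above): take the middle element as the root and recurse on the two halves. It produces a balanced shape, though not necessarily the exact tree drawn above.

# rebuild a balanced BST from a sorted (inorder) list of keys;
# a node is a (left, key, right) tuple, None is an empty tree
def rebuild(keys):
    if not keys:
        return None
    mid = len(keys) / 2 # middle element becomes the root (Python 2 integer division)
    return (rebuild(keys[:mid]), keys[mid], rebuild(keys[mid+1:]))

print rebuild([20, 40, 50, 60, 70])
# (((None, 20, None), 40, None), 50, ((None, 60, None), 70, None))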

AVL Trees...

Maintain balance automatically. How it works is that every time an operation changes the tree, its balance is measured, and a rotation is performed if necessary. A balanced tree operates more efficiently than an out-of-balance one.

An AVL tree is one type of self-balancing binary search tree.

The balance of a node is the difference in the heights of its two child subtrees. Every time an insert or delete operation takes place, the balance is checked; if its absolute value is greater than one, rebalancing occurs.

diagram: http://en.wikipedia.org/wiki/AVL_tree Rotation rearranges the nodes of a tree while still keeping the binary search tree property: every value in the left subtree of a node is less than the node's value, and every value in the right subtree is greater than its value.

/*

http://en.wikipedia.org/wiki/File:AVL_Tree_Rebalancing.svg
Consider the left-right case. Imagine that 4 has just been inserted, and 5 is complaining that its left side is too heavy! First, 3 and 4 need
to swap places. Currently, 4 is attached as the right child of 3 because 4 is greater than 3. If these two are going to swap places and
preserve the BST order, then 3 needs to be attached as the left child of 4, because 3 is less than 4. Note that A can stay attached where it is,
because A is less than 3, and it is less than anything greater than 3, and always will be.
Similarly with C: C is greater than 4, and is greater than anything less than 4, and always will be.
B is another story entirely. Currently it is greater than 3, but less than 4. It has to move, because 4's left child will now be 3.
B moves from being 4's left child to being 3's right child. Previously B was a value
greater than 3 and less than 4. Now it remains a value that is
less than 4 and greater than 3,
and both statements mean the same thing.

Now it's time for the right rotate. The recently inserted value of 4 continues rotating, now with 5. Currently 4 and 5 are arranged in a way that
says 4 is less than 5, because 4 is the left child of 5. These two need to be rearranged to say that 5 is greater than 4, by making 5 the right
child of 4. So that rearrangement is made. 3 can remain as it is, as the left child of 4, because 3 is less than both 4 and 5. D can remain as it is,
because it is greater than both 4 and 5. C needs to change ever so slightly, from being the left child's right subtree to being the right child's left subtree.
In other words, it was previously where it was because it was less than 5 and greater than 4; now it is somewhere where it is greater than 4 and less
than 5.

After all of this, the tree is balanced, because for each node the height of one subtree is not significantly greater than the other.
Note that the root has changed.

AVL tree
    Average Worst case
Space   O(n)    O(n)
Search  O(log n)    O(log n)
Insert  O(log n)    O(log n)
Delete  O(log n)    O(log n)

Binary search tree
    Average Worst case
Space   O(n)    O(n)
Search  O(log n)    O(n)
Insert  O(log n)    O(n)
Delete  O(log n)    O(n)

The worst case in a BST is O(n) because all its values could be chained in a single line (like a LinkedList).

*/


import java.util.Queue;
import java.util.LinkedList;

public class AVLTree<T extends Integer>
{
    protected Node<T> root;

    public AVLTree(){
        root = null;
    }

    public AVLTree(T rootItem){
        root = new Node<T>(rootItem);
    }

    public void setRootItem(T newItem) {
        root = new Node<T>(newItem);
    }

    public Node<T> getRoot() {
        return root;
    }

    public void insert(T newItem){
        root = insertItem(root, newItem);
    }

    public T retrieve(T searchKey){
        return retrieveItem(root, searchKey);
    }

    public void delete(T searchKey) throws Exception{
        root = deleteItem(root, searchKey);
    }

    public int getBalance(Node<T> tree)
    {
        if (tree!= null)
        {
            int x = 0, y = 0;
            if (tree.getLeft() != null)
            {
                x = tree.getLeft().getHeight();
            }
            if (tree.getRight() != null)
            {
                y = tree.getRight().getHeight();
            }
            return (x-y);
        }
        return 0;
    }

    protected Node<T> insertItem(Node<T> tNode, T newItem){
        System.out.println("insert item");

        Node<T> newSubtree;
        if (tNode == null){
            //position of insertion found
            //insert after leaf
            //create new node
            tNode = new Node<T>(newItem);
            return tNode;
        }
        System.out.println("inserting:"+newItem+" at loc:"+tNode.getItem());
        T nodeItem = tNode.getItem();

        //search for the insertion position

        if(newItem.compareTo(nodeItem) < 0){
            //search left subtree
            newSubtree = insertItem(tNode.getLeft(), newItem);
            tNode.setLeft(newSubtree);
        } else {
            newSubtree = insertItem(tNode.getRight(), newItem);
            tNode.setRight(newSubtree);
        }

        // find out the height of the node formed
        int a = 0, b = 0;
        if (tNode.getLeft() != null)
        {
            a = tNode.getLeft().getHeight();
        }
        if (tNode.getRight() != null)
        {
            b = tNode.getRight().getHeight();
        }
        tNode.setHeight((a>b?a:b)+1);

        int balance = getBalance(tNode);
        System.out.println (newItem + "   " + balance);

        // Left Left Case
        if (balance > 1 && newItem < tNode.getLeft().getItem())
        {
            //System.out.println ("left-left case : going for right : " + newItem + "   " + tNode.getRight().getItem());
            System.out.println("left-left");
            return rightRotate(tNode);
        }

        // Right Right Case
        if (balance < -1 && newItem > tNode.getRight().getItem())
        {
            //System.out.println ("right-right case : going for left : " + newItem + "   " + tNode.getRight().getItem());
            System.out.println("right-right");
            return leftRotate(tNode);
        }

        // Left Right Case
        if (balance > 1 && newItem > tNode.getLeft().getItem())
        {
            //System.out.println ("left-right case : going for left and then right : " + newItem + "   " + tNode.getRight().getItem());
            System.out.println("left-right");
            tNode.setLeft( leftRotate(tNode.getLeft()));
            return rightRotate(tNode);
        }

        // Right Left Case
        if (balance < -1 && newItem < tNode.getRight().getItem())
        {
            //System.out.println ("right-left case : going for right and then left : " + newItem + "   " + tNode.getRight().getItem());
            System.out.println("right-left");
            tNode.setRight(rightRotate(tNode.getRight()));
            return leftRotate(tNode);
        }

        return tNode;
    }

    protected Node<T> rightRotate(Node<T> x)
    {
        System.out.println("right-rotate");
        System.out.println("x:"+x.getItem());
        Node<T> y = x.getLeft();
        System.out.println("y:"+y.getItem());
        Node<T> tmp = y.getRight();

        //right rotation being done
        y.setRight(x);
        x.setLeft(tmp);

        //update the height of x node of AVL Tree
        int a = 0, b = 0;
        if(x.getLeft() != null)
            a = x.getLeft().getHeight();
        if(x.getRight() != null)
            b = x.getRight().getHeight();
        x.setHeight( (a > b ? a : b) + 1);

        //update the height of y node of AVL Tree
        a = 0; b = 0;
        if(y.getLeft() != null)
            a = y.getLeft().getHeight();
        if(y.getRight() != null)
            b = y.getRight().getHeight();
        y.setHeight( (a > b ? a : b) + 1);

        // y is the head of new AVL tree to be returned
        return y;
    }

    protected Node<T> leftRotate(Node<T> x)
    {
        System.out.println("left-rotate");
        System.out.println("x:"+x.getItem());
        Node<T> y = x.getRight();
        System.out.println("y:"+y.getItem());
        Node<T> tmp = y.getLeft();

        //left rotation being done
        y.setLeft(x);
        x.setRight(tmp);

        //update the height of x node of AVL Tree
        int a = 0, b = 0;
        if(x.getLeft() != null)
            a = x.getLeft().getHeight();
        if(x.getRight() != null)
            b = x.getRight().getHeight();
        x.setHeight( (a > b ? a : b) + 1);

        //update the height of y node of AVL Tree
        a = 0; b = 0;
        if(y.getLeft() != null)
            a = y.getLeft().getHeight();
        if(y.getRight() != null)
            b = y.getRight().getHeight();
        y.setHeight( (a > b ? a : b) + 1);

        // y is the head of new AVL tree to be returned
        return y;
    }

    protected T retrieveItem(Node<T> tNode, T searchKey){
        T treeItem;
        if (tNode == null){
            treeItem = null;
        } else {
            T nodeItem = tNode.getItem();
            if (searchKey.compareTo(nodeItem)==0){
                //item is in root of some subtree
                treeItem = tNode.getItem();
            } else if (searchKey.compareTo(nodeItem)<0){
                //search left subtree
                treeItem = retrieveItem(tNode.getLeft(), searchKey);
            }else{
                //search right subtree
                treeItem = retrieveItem(tNode.getRight(), searchKey);
            }
        }
        return treeItem;
    }

    protected Node<T> deleteItem(Node<T> tNode, T searchKey) throws Exception{
        Node<T> newSubtree;
        if(tNode == null){
            throw new Exception("TreeException: Item not found");
        }else{
            T nodeItem = tNode.getItem();
            if (searchKey.compareTo(nodeItem) == 0){
                //root
                tNode = deleteNode(tNode);
            }else if(searchKey.compareTo(nodeItem) < 0){
                //left
                newSubtree = deleteItem(tNode.getLeft(), searchKey);
                tNode.setLeft(newSubtree);
            }else{
                //right
                newSubtree = deleteItem(tNode.getRight(), searchKey);
                tNode.setRight(newSubtree);
            }
        }
        if (tNode == null) {
            // the node just deleted was a leaf; nothing left here to update or rebalance
            return tNode;
        }
        // find out the height of the node formed
        int a = 0, b = 0;
        if (tNode.getLeft() != null)
        {
            a = tNode.getLeft().getHeight();
        }
        if (tNode.getRight() != null)
        {
            b = tNode.getRight().getHeight();
        }
        tNode.setHeight((a>b?a:b)+1);

        int balance = getBalance(tNode);
        System.out.println (searchKey + "   " + balance);

        // Left Left Case
        if (balance > 1 && searchKey < tNode.getLeft().getItem())
        {
            //System.out.println ("left-left case : going for right : " + newItem + "   " + tNode.getRight().getItem());
            System.out.println("left-left");
            return rightRotate(tNode);
        }

        // Right Right Case
        if (balance < -1 && searchKey > tNode.getRight().getItem())
        {
            //System.out.println ("right-right case : going for left : " + newItem + "   " + tNode.getRight().getItem());
            System.out.println("right-right");
            return leftRotate(tNode);
        }

        // Left Right Case
        if (balance > 1 && searchKey > tNode.getLeft().getItem())
        {
            //System.out.println ("left-right case : going for left and then right : " + newItem + "   " + tNode.getRight().getItem());
            System.out.println("left-right");
            tNode.setLeft( leftRotate(tNode.getLeft()));
            return rightRotate(tNode);
        }

        // Right Left Case
        if (balance < -1 && searchKey < tNode.getRight().getItem())
        {
            //System.out.println ("right-left case : going for right and then left : " + newItem + "   " + tNode.getRight().getItem());
            System.out.println("right-left");
            tNode.setRight(rightRotate(tNode.getRight()));
            return leftRotate(tNode);
        }

        return tNode;
    }

    protected Node<T> deleteNode(Node<T> tNode){
        // Algorithm note: There are four cases to consider:
        //   1. The tNode is a leaf.
        //   2. The tNode has no left child.
        //   3. The tNode has no right child.
        //   4. The tNode has two children.
        // Calls: findLeftmost and deleteLeftmost

        T replacementItem;

        if( (tNode.getLeft()==null) && (tNode.getRight()==null)){
            //leaf
            return null;
        }
        else if(tNode.getLeft()==null){
            return tNode.getRight();
        }
        else if(tNode.getRight()==null){
            return tNode.getLeft();
        }
        else{
            replacementItem = findLeftmost(tNode.getRight());
            tNode.setItem(replacementItem);
            tNode.setRight(deleteLeftmost(tNode.getRight()));
            return tNode;
        }
    }

    protected T findLeftmost(Node<T> tNode){
        if(tNode.getLeft()==null){
            return tNode.getItem();
        }
        else{
            return findLeftmost(tNode.getLeft());
        }
    }

    protected Node<T> deleteLeftmost(Node<T> tNode){
        if(tNode.getLeft()==null){
            return tNode.getRight();
        }
        else{
            tNode.setLeft(deleteLeftmost(tNode.getLeft()));
            return tNode;
        }
    }

    protected int getHeight(){
        return getHeight(root, 0);
    }

    protected int getHeight(Node<?> root, int startHeight){
        if (root==null) {
            return startHeight;
        }
        int leftHeight = getHeight(root.getLeft(), startHeight+1);
        int rightHeight = getHeight(root.getRight(), startHeight+1);
        if (leftHeight>rightHeight) {
            return leftHeight;
        }else {
            return rightHeight;
        }
    }

    private static class Node<T>
    {
        public Node<T> left;
        public Node<T> right;
        public T data;
        public int height;

        private Node(){

        }

        public Node(T data)
        {
            this.data = data;
            height = 1;
        }
        public Node<T> getLeft()
        {
            return this.left;
        }
        public void setLeft(Node<T> left)
        {
            this.left = left;
        }
        public Node<T> getRight()
        {
            return this.right;
        }
        public void setRight(Node<T> right)
        {
            this.right = right;
        }
        public T getItem()
        {
            return data;
        }
        public void setItem(T data)
        {
            this.data = data;
        }
        public T getKey()
        {
            return data;
        }
        public int getHeight()
        {
            return height;
        }
        public void setHeight(int height)
        {
            this.height = height;
        }
    }
    public static void preorder(Node<?> n)
    {
        if (n != null)
        {
            System.out.print(n.data + " ");
            preorder(n.getLeft());
            preorder(n.getRight());
        }
    }
    public static void inorder(Node<?> n)
    {
        if (n != null)
        {
            inorder(n.getLeft());
            System.out.print(n.data + " ");
            inorder(n.getRight());
        }
    }
    public static void postorder(Node<?> n)
    {
        if (n != null)
        {
            postorder(n.getLeft());
            postorder(n.getRight());
            System.out.print(n.data + " ");
        }
    }
    public static void levelorder(Node<?> n)
    {
        Queue<Node<?>> nodequeue = new LinkedList<Node<?>>();
        if (n != null)
            nodequeue.add(n);
        while (!nodequeue.isEmpty())
        {
            Node<?> next = nodequeue.remove();
            System.out.print(next.data + " ");
            if (next.getLeft() != null)
            {
                nodequeue.add(next.getLeft());
            }
            if (next.getRight() != null)
            {
                nodequeue.add(next.getRight());
            }
        }
    }
    public static void main(final String[] args)
    {
        AVLTree<Integer> tree = new AVLTree<Integer>(100);
        System.out.println("root 100");
        preorder(tree.getRoot());
        System.out.println();
        inorder(tree.getRoot());
        System.out.println();
        postorder(tree.getRoot());
        System.out.println();
        levelorder(tree.getRoot());
        System.out.println();
        System.out.println("inserting 10");
        tree.insert(10);
        preorder(tree.getRoot());
        System.out.println();
        inorder(tree.getRoot());
        System.out.println();
        postorder(tree.getRoot());
        System.out.println();
        levelorder(tree.getRoot());
        System.out.println();
        System.out.println("inserting 20");
        tree.insert(20);
        preorder(tree.getRoot());
        System.out.println();
        inorder(tree.getRoot());
        System.out.println();
        postorder(tree.getRoot());
        System.out.println();
        levelorder(tree.getRoot());
        System.out.println();
        System.out.println("inserting 15");
        tree.insert(15);
        preorder(tree.getRoot());
        System.out.println();
        inorder(tree.getRoot());
        System.out.println();
        postorder(tree.getRoot());
        System.out.println();
        levelorder(tree.getRoot());
        System.out.println();
        System.out.println("deleting 20");
        try{tree.delete(20);}catch(Exception e){System.out.println(e);}
        
        preorder(tree.getRoot());
        System.out.println();
        inorder(tree.getRoot());
        System.out.println();
        postorder(tree.getRoot());
        System.out.println();
        levelorder(tree.getRoot());
        System.out.println();
        System.out.println("The height of the binary search tree is "+tree.getHeight()+".");

    }    
}

Graphs

Now to move on to something more general. Graphs are a fundamental data structure.

A graph (G) is a set of vertices (V(G)) together with a set of edges (E(G)) which connect these vertices.

Graphs have many applications, including the web (vertices = web pages, edges = hyperlinks) and social networks (vertices = users, edges = 'is a friend'). Edges can be either directed (one direction only: webpage A has a link to webpage B, but webpage B does not have a link back to webpage A) or undirected (both directions: friendships are not one-sided, so person A being a friend of person B implies person B is a friend of person A). Here is an interesting presentation by the NSA on graphs and their applications: http://www.pdl.cmu.edu/SDI/2013/slides/big_graph_nsa_rd_2013_56002v1.pdf

Graph problems can range from finding a path (or a shortest path) from one vertex to another, to finding the maximum flow of water that can be put through a set of connected pipes.

Trees are a special type of graph. Any two vertices in a tree are connected by exactly one simple path (a simple path is a path with no repeated vertices), and a tree is a connected graph. All in all, a tree is a graph that has just enough edges to connect every two vertices to each other ("connected") and no more ("one simple path").

The order of a graph is the number of vertices in it: |V(G)|.

The size of a graph is the number of edges: |E(G)|. Sparse graphs have fewer edges than dense graphs.

The degree of a vertex is the number of edges connected to it.

In a tree, |E(G)| = |V(G)| - 1, because a tree has just enough edges to ensure that everything is connected. For example, a tree on 4 vertices has exactly 3 edges.

Terminology

  • tree - a collection of vertices and edges
    • nodes - store the data items
      • parent nodes have children
      • child nodes have a parent
      • leaves have no children and only have a parent
      • the root has no parent and only has children
    • edges - links between vertices
      • depth (distance between the current node and the root)
      • height (greatest distance between the current node and a leaf)
      • both measured in edge count

Graphs can have at most |V(G)| * (|V(G)| - 1) / 2 edges. For example, an undirected graph on 4 vertices has at most 4 * 3 / 2 = 6 edges.

  • |V(G)| * (|V(G)| - 1), because an edge should connect a vertex to another vertex; the -1 is because the other endpoint should be a different vertex (self-loops are not counted here, though an edge from a vertex to itself is possible in general).
  • /2, because it is assumed that an edge from u to v, (u,v), is the same as (v,u). This is the case for undirected graphs; it is not the case with directed graphs, where edges, like arrows, have direction.


Neighbourhoods. Neighbours are the vertices that are one edge away from you. In directed graphs, there are two different types of neighbours. Let A be a vertex and B be a neighbor. B is an out-neighbor of A if it is possible to walk from A to B. B is an in-neighbor of A if it is possible to walk from B to A. Essentially you go out to meet your out-neighbours, and your in-neighbours walk in to see you.

  • Source: a vertex where all the arcs come from. Everything points out; nothing points in (in-degree = 0). The root of a tree would be a source node.
  • Sink: a vertex where all the arcs point to. Nothing points out (out-degree = 0); everything points in.
  • Walk: a sequence of vertices where each consecutive pair is connected by an edge.
  • Path: a walk with no repeated vertices (sometimes also known as a simple path). In a tree, every two vertices have a simple path between them.
  • Cycle: a walk that starts and ends at the same vertex (must have >= 3 vertices). A cycle cannot exist in a tree.

Subgraphs and subdigraphs: Subsets of a graph/digraph.

  • Spanning sub(di)graph: contains all the original vertices (in other words, only edges are removed).
  • Induced sub(di)graph: keeps a subset of the original vertices, and every original edge between the kept vertices must be present.
  • Reverse digraph: all arcs are reversed.
  • Underlying graph of a digraph: all the arcs are converted into edges (from directed to undirected).

Digraphs can have cycles.

  • A digraph without a cycle is called acyclic. The full term is directed acyclic graph, or 'DAG'.
  • Without a cycle, a walk that involves all the vertices has limited length; it starts at a source and ends at a sink.
  • Imagine a digraph and all its directed edges, and then imagine physically moving the vertices around (the graph remains logically the same). Eventually you could get a physical arrangement of the vertices so that all the arcs point in one direction. This is called a topological order, and only DAGs have one (see the sketch after this list).
  • Recall that in connected graphs you can get from any vertex to any other.
  • Connected components are the maximal subgraphs that are connected. The original graph may not be connected, but parts of it form subgraphs that are; these are its connected components. They are like cliques, in that no member of one clique contacts a member of another clique, yet all the cliques together can still be considered students of one school.
  • A digraph is weakly connected if it is connected once you ignore the direction of the arcs. It is strongly connected if it is connected without ignoring the directions.
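Here is a minimal topological-order sketch using Kahn's algorithm; the helper and the example digraph are invented for illustration, with adj mapping each vertex to its out-neighbours:

def topological_order(adj):
    indegree = dict((v, 0) for v in adj) # count incoming arcs for every vertex
    for v in adj:
        for w in adj[v]:
            indegree[w] += 1
    order = []
    sources = [v for v in adj if indegree[v] == 0] # sources can go first
    while sources:
        v = sources.pop()
        order.append(v)
        for w in adj[v]: # removing v frees up its out-neighbours
            indegree[w] -= 1
            if indegree[w] == 0:
                sources.append(w)
    return order # comes out shorter than len(adj) if the digraph has a cycle

print topological_order({'a': ['b', 'c'], 'b': ['c'], 'c': []}) # ['a', 'b', 'c']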

Storing a graph (serializing)

There are two ways to store a graph: an adjacency matrix and an adjacency list.

An adjacency matrix is a |V| * |V| matrix (which takes up O(|V|^2) space). Each vertex is represented by a row and a column, and the entries represent edges: (i,j)=1 means there is an edge from i to j, and (i,j)=0 means there is no edge from i to j. With weighted edges (possibly negative) the value can be >1 or <0. In a directed graph it can happen that (i,j)=1 while (j,i)=0; in an undirected graph (i,j)=1 must mean (j,i)=1.

An adjacency list is a list of vertex entries. Each entry of this list is another list, holding that vertex's outbound neighbours.
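For a concrete feel, here is one small undirected graph (edges a-b and b-c) written both ways, a sketch with made-up vertex names:

# adjacency list: each vertex maps to its neighbours
adj_list = {'a': ['b'], 'b': ['a', 'c'], 'c': ['b']}

# adjacency matrix: 1 marks an edge, 0 marks none
adj_matrix = {
    'a': {'a': 0, 'b': 1, 'c': 0},
    'b': {'a': 1, 'b': 0, 'c': 1},
    'c': {'a': 0, 'b': 1, 'c': 0},
}

print adj_list['b']        # neighbours of b: ['a', 'c']
print adj_matrix['a']['b'] # is there an edge a-b? 1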

For graphs with large numbers of edge connections ("dense") and for applications requiring large numbers of edge operations, the adjacency matrix is fast, with O(1) edge add/remove/lookup, at the cost of higher storage and slower vertex operations (O(|V|^2)).

The adjacency list is well rounded, with O(1) best case for adding edges and vertices, and O(|V|+|E|) worst case storage cost. Finding a particular edge requires traversing a vertex's list, O(|V|).

http://bigocheatsheet.com/#graphs

Elementary Graph Operations

  • Does the arc (u,v) exist?
  • What is the outdegree of vertex u?
  • What is the indegree of vertex u?
  • Add an arc between two vertices u and v.
  • Delete the arc between vertices u and v.
  • Add a node to the (di)graph.
  • Delete a node from the (di)graph
class Edge(object):
    def __init__(self, u, v):
        self.source = u
        self.sink = v
    def __repr__(self):
        return "%s->%s" % (self.source, self.sink)
 
class Graph(object):
    def __init__(self):
        self.adj = {}
 
    def add_vertex(self, vertex):
        self.adj[vertex] = []
 
    def get_edges(self, v):
        return self.adj[v]
 
    def add_edge(self, u, v, w=0):
        if u == v:
            raise ValueError("u == v")
        edge = Edge(u,v)
        redge = Edge(v,u)
        edge.redge = redge  #redge is not defined in Edge class
        redge.redge = edge
        self.adj[u].append(edge)
        self.adj[v].append(redge)
 
g=Graph()
map(g.add_vertex, ['s','o','p','q','r','t'])
g.add_edge('s','o')
g.add_edge('s','p')
g.add_edge('o','p')
g.add_edge('o','q')
g.add_edge('p','r')
g.add_edge('r','t')
g.add_edge('q','r')
g.add_edge('q','t')

#print map(g.get_edges, ['s','o','p','q','r','t'])
print "%s: %s"% ('s',g.get_edges('s')) #[s->o, s->p]
print "%s: %s"% ('o',g.get_edges('o')) #[o->s, o->p, o->q]
print "%s: %s"% ('p',g.get_edges('p')) #[p->s, p->o, p->r]
print "%s: %s"% ('q',g.get_edges('q')) #[q->o, q->r, q->t]
print "%s: %s"% ('r',g.get_edges('r')) #[r->p, r->t, r->q]
print "%s: %s"% ('t',g.get_edges('t')) #[t->r, t->q]

print "Does the arc (p,o) exist?"
retrieve = g.adj['p'] # O(1)
result = "no"
for i in retrieve: # O(d)
    if i.sink == 'o':
        result = "yes"
print result

print "Does the arc (o, t) exist?"
retrieve = g.adj['o'] # O(1)
result = "no"
for i in retrieve: # O(d)
    if i.sink == 't':
        result = "yes"
print result

print "What is the out-degree of vertex 'o'?"
retrieve = g.adj['o'] # O(1)
result = 0
result = len(retrieve)
print result

print "What is the in-degree of vertex 'o'?" #O(n + |E|)
retrieve = g.adj # O(1)
result = 0
for i in retrieve: # O(n)
    if i != 'o':
        for j in retrieve[i]: # O(|E|)
            if j.sink == 'o':
                result = result + 1
print result

print "Add an arc between two vertices s and t."
g.adj['s'].append(Edge('s','t')) # O(1)
print "Done"

print "Delete the arc between two vertices s and t."
retrieve = g.adj['s']
for i in retrieve: # O(d)
    if i.sink == 't':
        retrieve.remove(i)
print "Done"

print "Add vertex 'b'"
retrieve = g.adj
retrieve['b'] = [] # O(1)
print "Done"

print "Remove vertex 'b'"

print "Done"

class Graph(object):
    def __init__(self):
        self.adj = {}
 
    def add_vertex(self, vertex):
        self.adj[vertex] = {}
        for i in self.adj:
            self.adj[i][vertex] = 0
            self.adj[vertex][i] = 0

    def get_edges(self, v):
        return self.adj[v]
 
    def add_edge(self, u, v, w=0):
        self.adj[u][v] = 1
        self.adj[v][u] = 1
 
g=Graph()
map(g.add_vertex, ['s','o','p','q','r','t'])

g.add_edge('s','o')
g.add_edge('s','p')
g.add_edge('o','p')
g.add_edge('o','q')
g.add_edge('p','r')
g.add_edge('r','t')
g.add_edge('q','r')
g.add_edge('q','t')

#print map(g.get_edges, ['s','o','p','q','r','t'])
print "%s: %s"% ('s',g.get_edges('s')) #[s->o, s->p]
print g.get_edges('o') #
print g.get_edges('p') #
print g.get_edges('q') #
print g.get_edges('r') #
print g.get_edges('t') #


print "Does the arc (p,o) exist?"
retrieve = g.adj['p']['o'] # O(1)
result = "no"
if retrieve == 1:
    result = "yes"
print result

print "Does the arc (o, t) exist?"
retrieve = g.adj['o']['t'] # O(1)
result = "no"
if retrieve == 1:
    result = "yes"
print result

print "What is the out-degree of vertex 'o'?"
retrieve = g.adj['o'] # O(1)
result = 0
for i in retrieve: #O(n)
    if g.adj['o'][i] == 1:
        result = result + 1
print result

print "What is the in-degree of vertex 'o'?"
retrieve = g.adj # O(1)
result = 0
for i in retrieve: #O(n)
    if g.adj[i]['o'] == 1:
        result = result + 1 # rebinds the name rather than incrementing in place; Python ints are immutable (like Java Strings)
print result

print "Add an arc between two vertices s and t."
g.adj['s']['t'] = 1 #O(1)
print "Done"

print "Delete the arc between two vertices s and t."
g.adj['s']['t'] = 0 #O(1)
print "Done"

print "Add vertex 'b'"
retrieve = g.adj 
retrieve['b'] = {}
for i in retrieve: # O(n)
    retrieve[i]['b'] = 0
    retrieve['b'][i] = 0
print "Done"

print "Remove vertex 'b'"

print "Done"

Delete a vertex from the (di)graph:

  • Adjacency matrix: remove a row and a column (shift n-1 elements up and n-1 elements left), Θ(n^2).
  • Adjacency list: look at all n entries and check whether a node needs to be removed, Θ(m+n).

In sufficiently sparse graphs the list wins, but as m ≤ n(n-1) for digraphs and m ≤ n(n-1)/2 for graphs, this isn't necessarily always so!

Conclusion: use an adjacency matrix for small, dense (di)graphs for which we wish to test for the existence of arcs, find the indegree of vertices, and/or delete arcs. Use an adjacency list for large, sparse (di)graphs for which we need to compute outdegree and add and/or delete vertices.

Basic Graph Example

Basic example of a graph done in python.

An Edge connects two vertices.

A Graph supports the insertion of vertices and edges which connect the vertices to each other.

Inserting a vertex involves creating a blank entry for it in the adjacency list. Adding an edge between two vertices involves creating an Edge object and then appending it to the appropriate vertex entry in the adjacency list.

To do: code the above basic graph operations.

class Edge(object):
    def __init__(self, u, v):
        self.source = u # it's not strictly necessary to store the source in the Edge object, because the Edge is already stored under its source's entry in the adjacency list; this is redundant!
        self.sink = v
    def __repr__(self):
        return "%s->%s" % (self.source, self.sink)
 
class Graph(object):
    def __init__(self):
        self.adj = {}
 
    def add_vertex(self, vertex):
        self.adj[vertex] = []
 
    def get_edges(self, v):
        return self.adj[v]
 
    def add_edge(self, u, v, w=0):
        if u == v:
            raise ValueError("u == v")
        edge = Edge(u,v)
        redge = Edge(v,u)
        edge.redge = redge  #redge is not defined in Edge class
        redge.redge = edge
        self.adj[u].append(edge)
        self.adj[v].append(redge)
 
g=Graph()
map(g.add_vertex, ['s','o','p','q','r','t'])
g.add_edge('s','o')
g.add_edge('s','p')
g.add_edge('o','p')
g.add_edge('o','q')
g.add_edge('p','r')
g.add_edge('r','t')
g.add_edge('q','r')
g.add_edge('q','t')

#print map(g.get_edges, ['s','o','p','q','r','t'])
print g.get_edges('s')
print g.get_edges('o')
print g.get_edges('p')
print g.get_edges('q')
print g.get_edges('r')
print g.get_edges('t')


The above uses a python dictionary data type to serve as an adjacency list. A vertex is added in by creating an empty entry with the vertex as a reference. Edges are attached to vertices by finding the vertice entry in the adjacency list and appending an Edge object to it. An Edge object consists of two end points which are the vertices it connects.

It is also possible to use an adjacency matrix instead of a list. Here is code that constructs an adjacency matrix from the adjacency list.

    def to_matrix(self):
        self.muh_matrix = {}
        for i in self.adj:
            self.muh_matrix[i] = {}
            for j in self.adj:
                self.muh_matrix[i][j] = 0 # start with no edges anywhere
        for i in self.adj:
            for edge in self.adj[i]: # then mark the edges that actually exist
                self.muh_matrix[i][edge.sink] = 1

Graph Traversal

We want to visit all vertices by following the existing arcs/edges, while avoiding revisiting vertices. We avoid revisits by marking each vertex as not yet visited ("white"), currently visiting ("grey"), or already visited ("black"). We have a data structure that stores the list of vertices to be visited; it starts off with one vertex (a root). The graph traversal procedure removes a vertex from this data structure, marks it as visited, and adds its neighbours to the structure. We repeat this until all vertices have been visited.

There are two graph traversal algorithms: depth-first search and breadth-first search. Fundamentally they are the same: we add neighbours to a collection of vertices to visit. The thing that makes them different is how the next vertex is chosen, as the sketch below shows.
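Here is a minimal sketch of that shared skeleton (the function and example are invented for illustration), assuming adj maps each vertex to a list of its neighbours:

def traverse(adj, root):
    to_visit = [root] # the grey vertices
    visited = []      # the black vertices
    while to_visit:
        v = to_visit.pop() # pop() = stack = DFS; pop(0) = queue = BFS
        if v in visited:
            continue
        visited.append(v)
        for w in adj[v]:
            if w not in visited:
                to_visit.append(w)
    return visited

print traverse({'a': ['b', 'c'], 'b': ['a'], 'c': ['a']}, 'a') # ['a', 'c', 'b']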

Search forest

  • A is a parent of B if there is a directed edge from A to B
  • A is a child of B if there is a directed edge from B to A
  • A is an ancestor of B if there is a directed path from A to B
  • A is a descendant of B if there is a directed path from B to A

Imagine a family tree with the older members on top and arrows pointing down the generations. Every time we pick a node at random to traverse (as opposed to following neighbours) we start a new disjoint tree. Put together, these disjoint trees are called a forest.

  • Arcs: there are four different kinds of arcs in the search forest F for a graph G. The arcs that exist in G can be classified as follows.
    • The arcs in G that are also in F are called tree arcs. Remember, like a family tree, the arcs point from the older generation to the younger. Tree arcs are the arcs in the graph that point to an immediate descendant: parents to their children.
    • Forward arcs: they don't exist in F, but they do connect a vertex to a descendant. Imagine an arc that connects your grandparents to you.
    • Back arcs: they don't exist in F, but they do connect a vertex to an ancestor. Imagine an arc that connects you to your grandparents. Back arcs don't exist in DAGs.
    • Cross arcs: neither of the above; say, a sibling or an uncle/aunt arc.

Depth first search

The data structure behind this is the stack (LIFO). It specifies that the next grey vertex to pick is the youngest remaining grey vertex. We start at the root and push all its children onto the stack. We take one vertex off the stack (the last one we added) and push its children on. Eventually we hit a leaf node that has no children, so we go back up to its parent and pick another child, and so on and so forth. Stacks can mean recursion: in the recursive implementation below, the stack is the call stack in system memory. When a function is called it is added to the stack; when it finishes it is removed.

Breadth first search

The data structure behind this is the queue (FIFO). It specifies that the next grey vertex to pick is the oldest remaining grey vertex. We start at the root and add its children to the queue. We take one vertex out of the queue (the first one we added) and add its children to the queue; then we go back to the queue, take one out, and add its children in. Queues = iteration. BFS does not produce forward arcs (it never gets deep enough), but it does produce back and cross arcs, in addition to tree arcs. Depth-first search produces all the arc types, because it can go deep enough. There is no time complexity difference between DFS and BFS; choosing the next vertex is a constant-time operation.

//Actually do DFS or BFS Graph Search. http://community.topcoder.com/tc?module=Static&d1=tutorials&d2=graphsDataStrucs2

//See that CS 220 assignment

//One difficulty I'm having so far: in BFS and DFS, how do we pick which neighbour node to traverse first? The description doesn't specify; it just says pick one. Technically, you're supposed to pick the leftmost (lowest value) one, right? Like how in a family tree the siblings are arranged left (youngest) to right (oldest). Not too big a deal, but it may mess up the arc types for DFS.

But let's ignore that for now...


Note that if the data is already in a sorted order, then use a binary search tree instead! Use this method when the structure is not a binary (search) tree.

class Edge(object):
    def __init__(self, u, v):
        self.source = u
        self.sink = v
    def __repr__(self):
        return "%s->%s" % (self.source, self.sink)

class Graph(object):
    def __init__(self):
        self.adj = {}
        self.visited = []

    def add_vertex(self, vertex):
        self.adj[vertex] = []

    def get_edges(self, v):
        return self.adj[v]

    def add_edge(self, u, v, w=0):
        if u == v:
            raise ValueError("u == v")
        edge = Edge(u,v)
        redge = Edge(v,u)
        edge.redge = redge  #redge is not defined in Edge class
        redge.redge = edge
        self.adj[u].append(edge)
        self.adj[v].append(redge)

    def dfs(self, s):
        stack = [] # discovered ("grey") vertices still to be explored
        explored = [] # fully explored ("black") vertices
        stack.append(s) # s is discovered (first visited)
        while len(stack) > 0:
            print stack
            top = stack.pop()
            print top
            # terminate here if searching for a particular node and it is found;
            # continue if not yet found, or if just traversing through all

            # note: this edge categorising is not working properly!
            # see https://en.wikipedia.org/wiki/Depth-first_search#Pseudocode
            # a worked example: http://www.cs.auckland.ac.nz/compsci220s1t/archive/compsci220ft/tutorials/tut06/TUTORIALl-6.pdf
            for i in self.adj[top]: # all adjacent edges (i.sink are the adjacent nodes)
                if i.sink not in explored and i.sink not in stack: # not discovered and not explored
                    stack.append(i.sink)
                    print "%s-%s is tree-edge"%(top,i.sink)
                elif i.sink in stack:
                    print "%s-%s is back-edge"%(top,i.sink)
                else:
                    print "%s-%s is forward or cross edge"%(top,i.sink)
            explored.append(top)

    def dfs_recursive(self, s, visited=None):
        if visited is None: # avoid a shared mutable default argument
            visited = []
        print visited
        print s
        visited.append(s)

        for i in self.adj[s]:
            if i.sink not in visited:
                self.dfs_recursive(i.sink, visited)

    def bfs(self, s): # https://en.wikipedia.org/wiki/Breadth-first_search
        queue = []
        queue.append(s)
        while len(queue) > 0:
            print queue
            top = queue.pop(0) # take out the oldest entry (FIFO)
            print top

            # we could now do something with this node if we wanted to,
            # e.g. an if statement to see if this is the node we want,
            # and if so, return it; for a plain traversal, do nothing

            self.visited.append(top)

            for i in self.adj[top]:
                if i.sink not in self.visited and i.sink not in queue:
                    queue.append(i.sink)


g=Graph()
map(g.add_vertex, ['s','o','p','q','r','t'])
g.add_edge('s','o')
g.add_edge('s','p')
g.add_edge('o','p')
g.add_edge('o','q')
g.add_edge('p','r')
g.add_edge('r','t')
g.add_edge('q','r')
g.add_edge('q','t')

#print map(g.get_edges, ['s','o','p','q','r','t'])
print g.get_edges('s')
print g.get_edges('o')
print g.get_edges('p')
print g.get_edges('q')
print g.get_edges('r')
print g.get_edges('t')

#g.bfs('s')
g.dfs('s')
#g.dfs_recursive('s')

Single source shortest path

Dijkstra's algorithm

We want to find the shortest route from node A to node B in a graph with non-negative edge weights.

Each node has a distance from the start. The start node has a distance of 0. All other nodes start with a distance of infinity, to represent a not-yet-reachable target.

At each step we pick the unvisited node with the shortest distance from the start. This is where a heap comes in handy. A heap implements an abstract data type called a priority queue. Just like with other ADTs, you can add items in and then take them out later. In a queue the item taken out is the oldest one; in a stack it is the newest one. In a priority queue each item has a priority, and the one taken out is the one with the lowest (or highest) priority. Heaps implement the priority queue using a tree-based data structure that satisfies the heap property: in a max-heap the root's value is greater than any value in its descendant nodes, and in a min-heap, which is what Dijkstra's uses, it is smaller.
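As a small illustration of the priority-queue behaviour, here is a sketch using Python's heapq module, which is a binary min-heap (node names made up):

import heapq

pq = []
heapq.heappush(pq, (7, 'B')) # entries are (priority, item) pairs
heapq.heappush(pq, (0, 'A'))
heapq.heappush(pq, (3, 'C'))

while pq:
    distance, node = heapq.heappop(pq) # always pops the entry with the smallest priority
    print node, distance
# prints: A 0, then C 3, then B 7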

So after we've picked a node, we cause a ripple effect on its neighbours, in a process called 'relaxation'. The candidate distance to each neighbour is calculated by taking the current node's distance from the start plus the weight of the edge to that neighbour. If this distance is less than the recorded distance of that neighbour node, the recorded distance gets updated.
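In isolation, the relaxation step might look like this (a minimal sketch; dist and previous are the bookkeeping maps described above, and (u, v) is an edge with weight w):

def relax(dist, previous, u, v, w):
    # dist maps each node to its best known distance from the start;
    # previous maps each node to the node we reached it from
    if dist[u] + w < dist[v]:
        dist[v] = dist[u] + w # found a shorter route to v, going through u
        previous[v] = u       # remember how we got there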

wiki

Each node in the graph is visited only once, though its recorded distance may be updated multiple times (or not at all), depending on the number of neighbours. Each relaxation can only make a recorded distance smaller. This allows us to find the best path through a graph in O(m * log(n)) time, where n is the number of vertices and m is the number of edges.

Example: traffic flow. Vertices are intersections, and the weight of an edge represents the time spent on that stretch of road. You want to minimize time spent in traffic, so you prefer routes with small-weight edges. Dijkstra's algorithm can be used here.

# based on http://maxburstein.com/blog/introduction-to-graph-theory-finding-shortest-path/
import heapq
import sys

class Edge(object):
    def __init__(self, u, v, w):
        self.source = u
        self.sink = v
        self.distance = w
    def __repr__(self):
        return "%s->%s:%s" % (self.source, self.sink, self.distance)
 
class Graph(object):
    def __init__(self):
        self.adj = {}
 
    def add_vertex(self, vertex):
        self.adj[vertex] = []
 
    def get_edges(self, v):
        return self.adj[v]
 
    def add_edge(self, u, v, w=0):
        if u == v:
            raise ValueError("u == v")
        edge = Edge(u,v,w)
        redge = Edge(v,u,w)
        edge.redge = redge  #redge is not defined in Edge class
        redge.redge = edge
        self.adj[u].append(edge)
        self.adj[v].append(redge)

    def shortest_path(self, start, finish):
        distances = {} # Distance from start to node
        previous = {} # Previous node in optimal path from source
        nodes = [] # Priority queue of all nodes in Graph

        # init
        for vertex in self.adj:
            if vertex == start:
                distances[vertex] = 0
                heapq.heappush(nodes, [0, vertex])
            else:
                distances[vertex] = sys.maxsize
                heapq.heappush(nodes, [sys.maxsize, vertex])
            previous[vertex] = None

        print "This is the heap which will be used in traversal"
        print "Remember a heap is popped to retrieve an entry. This entry is the smallest one."
        print "In this case it is the node with the smallest distance from start."
        for n in nodes:
            print n

        while nodes:
            print ""
            print "Looking at this node now:",
            smallest = heapq.heappop(nodes)[1]
            print smallest
            print "%s is this far from start: %s"%(smallest,distances[smallest])

            if smallest == finish:
                print ""
                print "Finished. We found our goal."
                print "Going through 'previous' array now"
                path = []
                while previous[smallest]:
                    path.append(smallest)
                    print "%s:%s"%(smallest,previous[smallest])
                    smallest = previous[smallest]
                path.append(smallest)
                return path
            if distances[smallest] == sys.maxsize:
                print ""
                print "Broke"
                break

            for neighbor in self.adj[smallest]:
                print " This is a neighbor: %s"%neighbor.sink
                print "  %s is this far from start: %s"%(neighbor.sink,distances[neighbor.sink])
                print "  %s is this far from %s: %s"%(neighbor.sink,smallest,neighbor.distance)
                alt = distances[smallest] + neighbor.distance
                print "  %s is this far from start, going through %s: %s"%(neighbor.sink,smallest,alt)
                if alt < distances[neighbor.sink]:
                    distances[neighbor.sink] = alt
                    previous[neighbor.sink] = smallest # record how we got to this neighbor
                    print "  %s has a new distance from start: %s"%(neighbor.sink,distances[neighbor.sink])
                    print "  To get to %s you go through %s"%(neighbor.sink,smallest)
                    print "  This is recorded in 'previous' array: %s"%previous
                    # the next few lines keep the heap in sync: the traversal is based
                    # on a heap, and a heap always pops its smallest value; each heap
                    # entry is a [distance, node name] pair. Updating the entry keeps
                    # the heap choosing the closest unvisited node next. Note this only
                    # decides which node to traverse next; 'distances' holds the actual
                    # distances from start.
                    print "  We added a new distance, so it's time to update our heap"

                    for n in nodes:
                        if n[1] == neighbor.sink:
                            n[0] = alt
                            break
                    heapq.heapify(nodes)

                    print "  The heap looks like this now"

                    for n in nodes:
                        print "  %s"%n
        return distances
 
g=Graph()
#map(g.add_vertex, ['A','B','C','D','E','F','G','H'])
g.add_vertex('A')
g.add_vertex('B')
g.add_vertex('C')
g.add_vertex('D')
g.add_vertex('E')
g.add_vertex('F')
g.add_vertex('G')
g.add_vertex('H')

g.add_edge('A','B',7)
g.add_edge('A','C',8)
g.add_edge('B','F',2)
g.add_edge('C','F',6)
g.add_edge('C','G',4)
g.add_edge('D','F',8)
g.add_edge('E','H',1)
g.add_edge('F','G',9)
g.add_edge('F','H',3)

print "These are the edge connections"
print g.get_edges('A')
print g.get_edges('B')
print g.get_edges('C')
print g.get_edges('D')
print g.get_edges('E')
print g.get_edges('F')
print g.get_edges('G')
print g.get_edges('H')

print g.shortest_path('A','H')

Analysis

An algorithm that can sometimes solve optimization problems is the greedy algorithm. In a greedy algorithm we take several small steps towards our goal, and at each step we choose the locally optimal step: the greedy choice.

See more information on Dynamic_Programming_vs_Greedy_Algorithms#Dijkstra.27s_algorithm


Other related algorithms.

The Bellman–Ford algorithm can be used on graphs with negative edge weights, as long as the graph contains no negative cycle reachable from the source vertex s. This makes sense, because with a negative cycle you could go round and round it to get a smaller and smaller total weight.

The A* search algorithm is a generalization of Dijkstra's algorithm that cuts down on the size of the subgraph that must be explored, if additional information is available that provides a lower bound on the "distance" to the target.

It first searches the routes that appear most likely to lead towards the goal. What sets A* apart from a greedy best-first search is that it also takes the distance already traveled into account. It uses a bit of intelligence called a heuristic to decide which node to visit next. The heuristic here is a guess of how far the current node is from the goal node; this can be the node's Chebyshev distance or Manhattan distance.

Just like with Dijkstra's, a priority queue is used, and with each visited node its neighbours are added to the priority queue. With A*, however, each node added also includes its heuristic value, which determines when it is picked out of the priority queue. We pick the node with the smallest score, that is, the node we think is closest to the goal node.

In Dijkstra's, the next node chosen is the unvisited node closest to the start. Dijkstra's can be viewed as a special case of A* where h(x) = 0 for all x (the heuristic is 0 in all cases).

The process that underlies Dijkstra's algorithm is similar to the greedy process used in Prim's algorithm. Prim's purpose is to find a minimum spanning tree that connects all nodes in the graph; Dijkstra's is concerned with only two nodes. Prim's does not evaluate the total weight of the path from the starting node, only the individual edge weights.

Bellman–Ford algorithm

There may be graphs with negative weight edges on which you want to perform SSSP. In this case, use the Bellman–Ford algorithm. Like Dijkstra's, it solves the single source shortest path problem. Unlike Dijkstra's, it can handle negative weight edges, at the cost of being a bit slower than Dijkstra's.

Like Dijkstra's Algorithm, Bellman–Ford is based on the principle of relaxation, in which an approximation to the correct distance is gradually replaced by more accurate values until eventually reaching the optimum solution. In both algorithms, the approximate distance to each vertex is always an overestimate of the true distance, and is replaced by the minimum of its old value with the length of a newly found path. However, Dijkstra's algorithm greedily selects the minimum-weight node that has not yet been processed, and performs this relaxation process on all of its outgoing edges; in contrast, the Bellman–Ford algorithm simply relaxes all the edges, and does this |V | − 1 times, where |V | is the number of vertices in the graph. In each of these repetitions, the number of vertices with correctly calculated distances grows, from which it follows that eventually all vertices will have their correct distances. This method allows the Bellman–Ford algorithm to be applied to a wider class of inputs than Dijkstra.

wiki

import sys

class Edge(object):
    def __init__(self, u, v, w):
        self.source = u
        self.sink = v
        self.distance = w
    def __repr__(self):
        return "%s->%s:%s" % (self.source, self.sink, self.distance)
 
class Graph(object):
    def __init__(self):
        self.adj = {}
 
    def add_vertex(self, vertex):
        self.adj[vertex] = []
 
    def get_edges(self, v):
        return self.adj[v]
 
    def add_edge(self, u, v, w=0):
        if u == v:
            raise ValueError("u == v")
        edge = Edge(u,v,w)
        redge = Edge(v,u,w)
        edge.redge = redge  #redge is not defined in Edge class
        redge.redge = edge
        self.adj[u].append(edge)
        self.adj[v].append(redge)

    def shortest_path(self, start, finish):
        distances = {} # Distance from start to node
        previous = {} # Previous node in optimal path from source

        # init
        for vertex in self.adj:
            if vertex == start:
                distances[vertex] = 0
            else:
                distances[vertex] = sys.maxsize
            previous[vertex] = None
        
        # relax all the edges, |V| - 1 times
        for i in range(len(self.adj) - 1):
            for vertex in self.adj:
                print vertex
                for neighbor in self.adj[vertex]:
                    print " This is a neighbor: %s"%neighbor.sink
                    print "  %s is this far from start: %s"%(neighbor.sink,distances[neighbor.sink])
                    print "  %s is this far from %s: %s"%(neighbor.sink,vertex,neighbor.distance)
                    if distances[vertex] + neighbor.distance < distances[neighbor.sink]:
                        distances[neighbor.sink] = distances[vertex] + neighbor.distance
                        previous[neighbor.sink] = vertex

        # check for negative weight cycles: if an edge can still be relaxed, there is one
        for vertex in self.adj:
            for neighbor in self.adj[vertex]:
                if distances[vertex] + neighbor.distance < distances[neighbor.sink]:
                    print "Error: Graph contains negative weight cycle"
    
        # print the path out
        print ""
        print "Finished. We found our goal."
        print "Going through 'previous' array now"
        smallest = finish
        while previous[smallest]:
            print "%s ->"%smallest,
            smallest = previous[smallest]
            if smallest == start:
                print "%s"%start
                break

        return distances

g=Graph()
#map(g.add_vertex, ['A','B','C','D','E','F','G','H'])
g.add_vertex('A')
g.add_vertex('B')
g.add_vertex('C')
g.add_vertex('D')
g.add_vertex('E')
g.add_vertex('F')
g.add_vertex('G')
g.add_vertex('H')

g.add_edge('A','B',7)
g.add_edge('A','C',8)
g.add_edge('B','F',2)
g.add_edge('C','F',6)
g.add_edge('C','G',4)
g.add_edge('D','F',8)
g.add_edge('E','H',1)
g.add_edge('F','G',9)
g.add_edge('F','H',3)

print "These are the edge connections"
print g.get_edges('A')
print g.get_edges('B')
print g.get_edges('C')
print g.get_edges('D')
print g.get_edges('E')
print g.get_edges('F')
print g.get_edges('G')
print g.get_edges('H')

print g.shortest_path('A','H')

A* Search Algorithm

A* search is used for pathfinding and graph traversal. It extends Dijkstra's algorithm by using heuristics (it doesn't just take the shortest known distance).

Like in games, especially RTS games: a unit is at location A and needs to be at B.

Dijkstra: from a node, examine all connected edges, mark that node so we don't visit it again, then pick the closest unvisited node and repeat.

It is a best-first search: it finds the least-cost path from one node to another.

Heuristic: gives a dynamic way to pick a path to the target node. It's a best guess of the path to take.

The heuristic function must be admissible: it never overestimates the cost of reaching the goal. Two such functions are the Chebyshev distance (diagonals and up/down/left/right) and the Manhattan distance (only up/down/left/right).

Chebyshev distance: the heuristic distance between two nodes is chebyshev_distance = max(abs(current.x - target.x), abs(current.y - target.y))

Uses a priority queue, sort of like Dijkstra's; it stores the alternative paths.

The knowledge-plus-heuristic cost function of node x is f(x) = g(x) + h(x). The past path-cost function g(x) is the known distance from the starting node to the current node x; the future path-cost function h(x) is an admissible "heuristic estimate" of the distance from x to the goal.

Note that the g(x) part is the cost from the starting point, not simply the local cost from the previously expanded node.
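A quick sketch of the two grid heuristics mentioned above, assuming points are plain (x, y) tuples invented for illustration:

def manhattan(current, target): # 4-directional movement
    return abs(current[0] - target[0]) + abs(current[1] - target[1])

def chebyshev(current, target): # 8-directional movement (diagonals allowed)
    return max(abs(current[0] - target[0]), abs(current[1] - target[1]))

print manhattan((0, 0), (3, 2)) # 5
print chebyshev((0, 0), (3, 2)) # 3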


Uses an n x n grid as a graph. Finds the distance between two points (nodes) in this grid.

good visual: https://upload.wikimedia.org/wikipedia/commons/5/5d/Astar_progress_animation.gif

wiki

Complete: it will find a solution if one exists (like BFS). Dijkstra's algorithm, as another example of a uniform-cost search algorithm, can be viewed as a special case of A* where h(x) = 0 for all x.

# original: http://maxburstein.com/blog/intro-to-graph-theory-finding-shortest-path-part2/
import heapq
import sys

class Node(object):
    def __init__(self, x, y):
        self.x = x # coord pos
        self.y = y # coord pos
        self.obstacle = False # can this node be passed through or not?
        self.g_score = sys.maxsize # best known distance from the source

    def __repr__(self):
        return "(%s,%s,%s)"%(self.x, self.y, self.obstacle)

class Graph(object):
    def __init__(self, size):
        self.size = size
        self.grid = []
        for y in range(0, self.size): # 0..size-1
            row = []
            for x in range(0, self.size): # 0..size-1
                row.append(Node(x, y))
            self.grid.append(row)

    def set_obstacle(self, x, y):
        self.grid[y][x].obstacle = True

    def shortest_path(self, start_x, start_y, end_x, end_y): # takes the starting and ending x,y positions
        def heuristics(current, target): # Chebyshev distance
            return max(abs(current.x-target.x), abs(current.y-target.y))

        start = self.grid[start_y][start_x]
        finish = self.grid[end_y][end_x]

        visited = [] # the set of nodes already evaluated;
        # we do not want to visit the same node twice
        previous = {} # previous node in the optimal path from the source;
        # keeps track of how we got to a node, just like in Dijkstra's
        previous[start] = 0
        f_score = {} # dict used as a priority queue:
        # stores each node's distance from the source + heuristic estimate

        # all possible moves from a node;
        # stepping through these gives all 8 adjacent nodes
        dx = [1, 1, 0, -1, -1, -1, 0, 1]
        dy = [0, 1, 1, 1, 0, -1, -1, -1]

        start.g_score = 0 # source distance is set to 0
        f_score[start] = start.g_score + heuristics(start, finish)
        # first calculation of the heuristic function;
        # f_score holds the nodes still to be traversed
        while len(f_score) > 0:
            current = min(f_score, key=f_score.get) # node with smallest f_score
            f_score.pop(current, None)
            visited.append(current) # mark that node as visited

            if current == finish: # last step is to print the path
                path = [start]
                while previous[current]:
                    path.append(current)
                    current = previous[current]
                self.print_path(path)
                print "Path found"
                return finish.g_score # stop searching once the goal is reached

            for direction in range(0, 8): # all the nodes that touch the current node
                new_x = current.x + dx[direction]
                new_y = current.y + dy[direction]

                if new_x < 0 or new_x >= self.size or new_y < 0 or new_y >= self.size:
                    continue # out of range (off the edge)

                # our node exists!
                neighbor = self.grid[new_y][new_x]

                # skip obstacles, nodes we've already evaluated
                # (a node is in visited after we pop it from f_score),
                # and nodes already queued in f_score
                if neighbor.obstacle or neighbor in visited or neighbor in f_score:
                    continue

                # getting path distances now: odd directions are diagonal
                # because dx and dy are both nonzero at the same time
                if direction % 2 == 1:
                    tentative_g_score = current.g_score + 14 # traveled so far + diagonal step (technically sqrt(200) ~ 14.14)
                else:
                    tentative_g_score = current.g_score + 10 # traveled so far + vertical or horizontal step

                # if there is a new shortest path, update our priority queue (relax):
                # is the new path shorter than the old path found?
                if tentative_g_score < neighbor.g_score:
                    # if so, update how we get there and the distance,
                    # and queue the node to be evaluated later
                    previous[neighbor] = current
                    neighbor.g_score = tentative_g_score
                    f_score[neighbor] = neighbor.g_score + heuristics(neighbor, finish)

        return None # exhausted the queue without reaching the goal

    def print_path(self, path):
        for y in range(0, self.size):
            for x in range(0, self.size):
                if self.grid[y][x].obstacle:
                    print "X ",
                elif self.grid[y][x] in path:
                    print "- ",
                else:
                    print "0 ",
            print ""



g=Graph(8)
g.set_obstacle(1,1)
g.set_obstacle(2,2)
g.set_obstacle(3,3)
g.set_obstacle(4,4)
g.set_obstacle(5,5)
g.set_obstacle(6,6)
g.set_obstacle(2,1)
g.set_obstacle(3,2)
g.set_obstacle(4,3)
g.set_obstacle(5,4)
g.set_obstacle(6,5)
print g.shortest_path(0,2,6,3)

g = Graph(8)
g.set_obstacle(1,0)
g.set_obstacle(1,1)
g.set_obstacle(1,2)
g.set_obstacle(1,3)
g.set_obstacle(1,4)
g.set_obstacle(1,5)
g.set_obstacle(1,6)
print g.shortest_path(0,0,4,2)

All Pairs Shortest Path

Floyd–Warshall algorithm

Both Dijkstra's and Bellman–Ford solve the single-source shortest path (SSSP) problem: they get the shortest paths from a single source. (Actually, Dijkstra can solve for multiple sources: just set multiple nodes to have a distance of 0, or connect those multiple sources to a single dummy source node using 0-weight edges.)
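A minimal sketch of the multi-source trick, assuming a plain adjacency dict of (neighbor, weight) pairs rather than the Graph class used elsewhere on this page (the function name is hypothetical):

import heapq

def multi_source_dijkstra(adj, sources):
    dist = dict((v, float('inf')) for v in adj)
    pq = []
    for s in sources: # every source is seeded with distance 0
        dist[s] = 0
        heapq.heappush(pq, (0, s))
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist[u]: # stale queue entry, skip it
            continue
        for v, w in adj[u]: # relax each outgoing edge
            if d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(pq, (dist[v], v))
    return dist

adj = {'A': [('B', 7)], 'B': [('A', 7), ('C', 2)], 'C': [('B', 2)]}
print multi_source_dijkstra(adj, ['A', 'C'])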

Floyd-Warshall is slightly different. It computes the shortest paths between every pair of vertices: All Pairs Shortest Path (APSP). It works with negative-weight edges (though not with negative cycles).

Floyd–Warshall finds an optimal path between each pair of vertices. It does so by incrementally improving an estimate of the shortest path between two vertices until the estimate is optimal: for each pair (i, j), it tries every other vertex k as a possible intermediate stop and keeps the better of the current estimate and the path through k.
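A compact sketch of that triple loop on a plain distance matrix (a standalone illustration; the Graph-based code below does the same work inside to_matrix):

INF = float('inf')

def floyd_warshall(dist):
    # dist is an n x n matrix: dist[i][j] is the edge weight,
    # INF if there is no edge, and dist[i][i] is 0
    n = len(dist)
    for k in range(n): # allow k as an intermediate vertex
        for i in range(n):
            for j in range(n):
                # keep the better of "current estimate" and "through k"
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist

print floyd_warshall([[0, 3, INF],
                      [3, 0, 1],
                      [INF, 1, 0]])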


Example: taking trips by bus costs money. But suppose that, because the government wants to increase bus usage, some routes add to your bus credit rather than take away from it. Vertices are bus stops/transfers, and edge weights are the monetary cost of the trip: positive means you pay to take the trip, negative means you get paid to take it. Negative cycles are bad. In that case you could make money by travelling the same circuit over and over again, which I'm sure the government would want to avoid.
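A useful consequence for detecting that situation: after Floyd–Warshall runs, a negative entry on the diagonal means some stop can be reached from itself at negative total cost, i.e. the graph contains a negative cycle. A sketch against the matrix form from the snippet above (the function name is hypothetical):

def has_negative_cycle(dist):
    # dist[i][i] started at 0; only a negative cycle can make it negative
    return any(dist[i][i] < 0 for i in range(len(dist)))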

import sys

class Edge(object):
    def __init__(self, u, v, w):
        self.source = u
        self.sink = v
        self.distance = w
    def __repr__(self):
        return "%s->%s:%s" % (self.source, self.sink, self.distance)
 
class Graph(object):
    def __init__(self):
        self.adj = {}
        self.muh_matrix = {}
        self.next = {} # trace back path
 
    def add_vertex(self, vertex):
        self.adj[vertex] = []
 
    def get_edges(self, v):
        return self.adj[v]
 
    def add_edge(self, u, v, w=0):
        if u == v:
            raise ValueError("u == v")
        edge = Edge(u,v,w)
        redge = Edge(v,u,w)
        edge.redge = redge # redge attribute is set dynamically (not defined in the Edge class)
        redge.redge = edge
        self.adj[u].append(edge)
        self.adj[v].append(redge) # both directions get an edge: the graph is undirected

    def to_matrix(self):
        # build the adjacency matrix from the adjacency lists
        for i in self.adj:
            self.muh_matrix[i] = {}
            self.next[i] = {}
            for j in self.adj:
                self.muh_matrix[i][j] = sys.maxsize
                self.next[i][j] = sys.maxsize
                for k in self.adj[i]: # have to iterate to get the proper weight, which is a pain
                    if k.sink == j:
                        self.muh_matrix[i][j] = k.distance
            self.muh_matrix[i][i] = 0 # distance to self is 0
        # the Floyd-Warshall triple loop: try each k as an intermediate vertex
        for k in self.adj:
            for i in self.adj:
                for j in self.adj:
                    if self.muh_matrix[i][k] == sys.maxsize or self.muh_matrix[k][j] == sys.maxsize:
                        continue # no path through k; skip so maxsize doesn't act like a real distance
                    if self.muh_matrix[i][k] + self.muh_matrix[k][j] < self.muh_matrix[i][j]:
                        self.muh_matrix[i][j] = self.muh_matrix[i][k] + self.muh_matrix[k][j]
                        self.next[i][j] = k
    def shortest_path(self, start, end):
        # reconstruct the path by recursing through the 'next' intermediates
        if self.muh_matrix[start][end] == sys.maxsize:
            return "no path"
        if start == end:
            return ""
        inter = self.next[start][end]
        if inter == sys.maxsize: # no intermediate: this is a direct edge
            return " "
        else:
            return self.shortest_path(start, inter) + inter + self.shortest_path(inter, end)

g = Graph()
map(g.add_vertex, ['A','B','C','D','E','F','G','H'])

g.add_edge('A','B',7)
g.add_edge('A','C',8)
g.add_edge('B','F',2)
g.add_edge('C','F',6)
g.add_edge('C','G',4)
g.add_edge('D','F',8)
g.add_edge('E','H',1)
g.add_edge('F','G',9)
g.add_edge('F','H',3)
g.to_matrix()
print g.shortest_path('A','H')
print g.muh_matrix['A']['H']


See Dynamic_Programming_vs_Greedy_Algorithms#Floyd.E2.80.93Warshall_Algorithm for more information.