# Graphs

A tree is a subset of graphs. Lets start with trees.

Unlike arrays, which are linear data structures, trees are hierarchical (or non-linear) data structure.

**Nodes**

class Node { public int data; public Node nextLink; //Node constructor public Node(int d) { data = d; } //Print Node data public void printLink() { System.out.print("{" + data + "} "); } }

Nodes are used in Linked List as described previously here. Nodes store a data item and a link to one or more other nodes. This linking is implemented by the data structure that uses nodes. For Linked List there is only one link to another node. For graphs and trees one node can point to many other nodes.

In arrays each entry is indexed and can be written to or read directly. This is not the case for nodes. Getting to a specific node involves starting from one node and traversing the links until you find the target node.

Nodes are described by their relations between each other which means they can be considered recursive data types.

## Contents

## Linked List

See also previous page on LinkedList. A Linked list is defined by the nodes it contains. Linked List nodes holds the data and a pointer to one other node. Having only one link means there is only one direction to traverse a Linked List.

Actually there is type of linked list called a doubly linked list in which node A links to node B and then node B links back to node A in addition to B linking to a new node C. This means it is possible to go backwards in the list and go from B to A.

## Trees

Trees are made up of nodes and are defined by the relationships between these nodes. They are hierarchical. Parent nodes link "down" to children nodes. Nodes without children are called "leaf".

- There exactly one root node. This root node has no parent.

- All nodes except for the root has exactly one parent.

- There are no cycles. There is only one path from the root to any other node. (A path is a series of links).

## Binary Trees

A binary tree is a special type of tree where every node has up to two children.

Node making up a binary tree.

private static class Node<T> // a node can store any type of data item. so we use generics here <T> { public Node<T> left; public Node<T> right; public T data; public Node(T data) { this.data = data; } public Node<T> getLeft() { return this.left; } public void setLeft(Node<T> left) { this.left = left; } public Node<T> getRight() { return this.right; } public void setRight(Node<T> right) { this.right = right; } }

Then we can do things with these nodes like for example:

Drawing this: 60 / \ 40 70 / \ / \ 20 50 65 90 Node<Integer> x60 = new Node<Integer>(60); Node<Integer> x40 = new Node<Integer>(40); Node<Integer> x70 = new Node<Integer>(70); Node<Integer> x20 = new Node<Integer>(20); Node<Integer> x50 = new Node<Integer>(50); Node<Integer> x65 = new Node<Integer>(65); Node<Integer> x90 = new Node<Integer>(90); x60.setLeft(x40); x60.setRight(x70); x40.setLeft(x20); x40.setRight(x50); x70.setLeft(x65); x70.setRight(x90);

This basic Node class can be inherited into a BinaryTree class where the tree class can implement a methods like setting a root node and clearing root node.

(Probably should use a LinkedList of Nodes so that a Node can have a variable amount of children - would require methods like inserting, deletion and retrieval for the childs of a node).

A BinarySearchTree class would enforce having a most two children and require methods for insertion, deletion and retrieval because those processes are more strict for a Binary Search Tree in order to preserve its property. (namely that if you have a node in a BST then the values in the left subtree must be less than the node, and the values of the right subtree must be greater than the node). All nodes must be comparable to each other, and the values must all be unique. This property allows searches to run much faster than say, in a sorted or unsorted array. O(log n) vs O(n). At each step the number of nodes to check is cut by half. Of course if all the nodes in a BST were connected all in a line like in a LinkedList then the run time for a search would be no better than for a typical array. Keeping balance is important to ensure BST faster than array, can do this automatically.

## Tree Traversal

Given that trees are recursive data structures, they are more complicated to traverse compared to linear data structures like arrays. The complexity comes from the fact that each node has multiple points and the decision has to be made to pick one. Tree traversal aims to visit each node once and only once in a systematic order.

There are two main methods in tree traversal. Depth first search and Breadth first search.

Depth First Search. Starts with a node, and then you go deep. You go deep by following the child nodes. You go to the node's first child node, then you go to that child's node. This is recursion. It ends when you get to a leaf which is a node without children. Then you backtrack and try to go deep again and this repeats until all the nodes have been visited.

Breadth First Search. Starts with a node, then you queue up all the children of that node. Then you take node at the front of the queue which is the oldest node in the queue. Remove it, and add its children to the back of the queue. Consistently taking the oldest value, and adding the newer nodes to the end of the queue ensures that you will finish one level before moving to the next. This is breadth first search. You want to cast a wide net before going down to the next level.

DFS - Stacks - each recursive method call places the method onto a call stack. this is handled by the operating system. A method is removed from the stack when all the nodes below it have been visited. BFS - Queues - each node is iterated through by removing it from the start of the queu

There are three types of DFS traversal. They differ in the order of the operations executed.

Preorder traversal: visit root, visit left subtree, visit right subtree. Inorder traversal: visit left subtree, visit root, visit right subtree. Postorder traversal: visit left subtree, visit right subtree, visit root.

Level order is BFS. It works iteratively finishing one level before moving on to the next and uses a Queue to keep track of the nodes to be visited. The next node to be visited is removed from the front of the queue, and its child nodes are added to the end. This ensures you finish going through one level before starting another. Queue is an abstract data type (ADT) that is implemented with a LinkedList.

Here is an illustration of it: 60 / \ 40 70 / \ / \ 20 50 65 90 Pre-order traversal: Visit the root. Traverse the left subtree. Traverse the right subtree. i.e.: 60 40 20 50 70 65 90 In-order traversal: Traverse the left subtree. Visit the root. Traverse the right subtree. i.e.: 20 40 50 60 65 70 90 Post-order traversal: Traverse the left subtree. Traverse the right subtree. Visit the root. i.e.: 20 50 40 65 90 70 60 Level-order traversal: i.e.:60 40 70 20 50 65 90 Similarly to LinkedList there is no need to define a size initially as each node references another node. Note how inorder gives a sorted order of elements. Tree sort uses this. Treesort: Average case Worst case O(n*logn) O(n^2) Remember, all sorting algorithms have to run in at least O(n) time.

Another thing about inorder traversal is that, if you save a graph in this format (serializing) you can use that to restore a binary search tree that is minimum height. A minimum height binary tree has as many nodes as possible at each level except the last. Doing this using

AVL?

preorder traversal gets you the original form of the the graph.

Example code:

import java.util.Queue; // The data structure behind Level order traversal import java.util.LinkedList; //Queue implemented as a LinkedList public class TreeTraverse { private static class Node<T> // a node can store any type of data item. so we use generics here <T> { public Node<T> left; public Node<T> right; public T data; public Node(T data) { this.data = data; } public Node<T> getLeft() { return this.left; } public void setLeft(Node<T> left) { this.left = left; } public Node<T> getRight() { return this.right; } public void setRight(Node<T> right) { this.right = right; } } public static void preorder(Node<?> n) { if (n != null) { System.out.print(n.data + " "); preorder(n.getLeft()); preorder(n.getRight()); } } public static void inorder(Node<?> n) { if (n != null) { inorder(n.getLeft()); System.out.print(n.data + " "); inorder(n.getRight()); } } public static void postorder(Node<?> n) { if (n != null) { postorder(n.getLeft()); postorder(n.getRight()); System.out.print(n.data + " "); } } public static void levelorder(Node<?> n) { Queue<Node<?>> nodequeue = new LinkedList<Node<?>>(); if (n != null) nodequeue.add(n); while (!nodequeue.isEmpty()) { Node<?> next = nodequeue.remove(); System.out.print(next.data + " "); if (next.getLeft() != null) { nodequeue.add(next.getLeft()); } if (next.getRight() != null) { nodequeue.add(next.getRight()); } } } public static void main(final String[] args) { Node<Integer> x60 = new Node<Integer>(60); Node<Integer> x40 = new Node<Integer>(40); Node<Integer> x70 = new Node<Integer>(70); Node<Integer> x20 = new Node<Integer>(20); Node<Integer> x50 = new Node<Integer>(50); Node<Integer> x65 = new Node<Integer>(65); Node<Integer> x90 = new Node<Integer>(90); x60.setLeft(x40); x60.setRight(x70); x40.setLeft(x20); x40.setRight(x50); x70.setLeft(x65); x70.setRight(x90); preorder(x60); System.out.println(); inorder(x60); System.out.println(); postorder(x60); System.out.println(); levelorder(x60); System.out.println(); } }

Sometime trees can get really complicated and it is not even possible go through it entirely (pratically speaking), for example game trees, so other traversal methods must be used. Imagine a chess computer. It plays by brute forcing every possible move. The nodes represent a possible game state, and the edges represent a legal move. All chess games start out in the same way. This state forms the root node. White goes first with 20 possible moves (each of the 8 pawns can be moved into 2 possible positions, and either one of the two knights can be moved into 2 possible positions). This means the root node has 20 children. Then black goes with 20 possible moves. This means that each of the 20 nodes representing white's first move has 20 children which represent black's first move. After white plays his first turn there are 20 possible states the board can be in, and after black plays his first turn there are 20*20 = 400 possible states the board can be in. If you were to calculate the entire tree for all possible chess games you would have a tree with 1,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000, or 10^120 nodes, give or take a few. For comparision, there have been about 10^26 nanoseconds since the Big Bang. There are thought to be only 10^75 atoms in the entire universe. No chess computer would or could calculate this number of moves. Instead what they do is calculate only 5, 10 or 20 moves into the future. This is Iterative deepening depth-first search. It performs a depth first search up to a certain depth. This is not a perfect solution, there is a flaw known as the horizon effect. Lets say the computer looks 6 moves a head, and then plays its best move. Then it turns out that if it had just looked 8 moves a head it could have played a better move.

## Binary Search Tree

A binary search tree is a tree that has two defining properties. Each node has at most two child nodes, hence "binary" and it is ordered in a way that makes search easier. Each node has value, and every node in its left subtree has a value less than it. And every node to the right subtree has a value greater than it. Duplicate values are not allowed.

Every binary search tree has a root node. And new nodes can be inserted systematically, there is no need to manually define and then link the nodes. And this process preserves the binary search property.

It is this property that makes binary search trees much faster for search applications than other data structures. Because of its 2D structure, values can be skipped to get closer to the target value quicker. Other, linear, data structures involve going through each value one by one and comparing it.

Imagine finding a number in a phone book given a name. You don't methodically go through As and Bs then Cs. You open the book to half way, figure out if you what you are looking is in the first half or second half, and then you take that half and recursively search again. until you find what you are looking for. If your contact starts with a letter in M - Z, there is no need to look through A - M. A BST allows you to do this. Comparisions lets to skip unnecsssariy nodes allowing for a faster search. This is why the operations of a tree are typically O(log n). log n is smaller than n (graph it).

60 / \ 40 70 / \ / \ 20 50 65 90 The root is 60. The child nodes of the root form the roots of other trees. In the above example, 40 and 70 form the roots of their own tree. And note that 20 is less than 40 and 50 is greater 40. 65 is less than 70 and 90 is greater than 70.

Example code:

/* Because of this comparision operation, a BST needs to be sensitive to the type of data it holds. This starts with putting a generic on the class public class BinarySearchTree<T extends Integer> The <T extends Integer> allows an Integer type or a subclass of it. There exists a constructor because of this need to enforce a data type. Method relating to retrieving, inserting and deleting are specific to that instance of a BST, so these methods are not static. Of course the traversal is independent of the data type so the remain same as before. Traversal methods are left static. a compareTo method is used to compare to different objects. Each object should have a key that can be used for comparisons. For example an addressbook BST may be made up of Contact objects. A Contact has a full name and phone number. Every person's name is unique (for this purposes anyway) and a typical operation of an address book is to return a phone number given a name. there for, the key should be the String representing the name. It may be necessary to invoke a getKey() method to return this. Inserting an item in the BST invovles traversing to the right place. It invovles a recursive call similar to a traversal. The base cases is when null is reached. At each node a decision is made whether to go left child or right child. This decision is based on the data in the current node and the data that is to be inserted. 60 / \ 40 70 / \ / \ 20 50 65 90 / \ / \ 10 30 45 55 EXAMPLE DELETE 40 Remember, in BST for each node all the nodes of the left child have to be less than it and all the nodes to the right child have to be greater than it. To delete a node with 0 children, simply mark that node as null. For a node with one child, replace the node with the child. For a node with two children, we need to pick a replacement value. A good candidate for the replacement value is the value most closests to it. This is either the greatest value found in the values that are lower than it. The the lowest value found in the values that are greater than it. Or, in tree terminology, we replace the value with the rightmost value in the left subtree, or the leftmost value in the right subtree. Take a look at 40. All the values forming the left subtree have values less than 40. all the values forming the right subtree have the values greater than 40. to properly replace 40, we replace this values with either the greatest (right-most) value found in the left subtree. or the smallest (left-most) value in the right subtree. Lets choose the replacement value to be the smallest value found in right subtree. Finding the smallest value in the right subtree simply involves taking the root of the subtree and recursively finding the left child until there isn't one. This node either has one right child or no child. To delete this node simply set it to be the right child. Either there is a right child which will take the node's place, or there isn't meaning the node will be marked null. In either case the node has been deleted. */ import java.util.Queue; import java.util.LinkedList; public class BinarySearchTree<T extends Integer> // need to be more specific about the object type we are dealing with here because we need to know for comparisons. // In this case we know that we are dealing with Integer objects or a subclass of them. { protected Node<T> root; public BinarySearchTree(){ root = null; } public BinarySearchTree(T rootItem){ root = new Node<T>(rootItem); } public void setRootItem(T newItem) { root = new Node<T>(newItem); } public T getRootItem() { return root.getItem(); } public Node<T> getRoot() { return root; } public void insert(T newItem){ root = insertItem(root, newItem); } public T retrieve(T searchKey){ return retrieveItem(root, searchKey); } public void delete(T searchKey) throws Exception{ root = deleteItem(root, searchKey); } protected Node<T> insertItem(Node<T> tNode, T newItem){ Node<T> newSubtree; if (tNode == null){ //position of insertion found //insert after leaf //create new node tNode = new Node<T>(newItem); return tNode; } T nodeItem = tNode.getItem(); //search for the insertion position if(newItem.compareTo(nodeItem) < 0){ //search left subtree newSubtree = insertItem(tNode.getLeft(), newItem); tNode.setLeft(newSubtree); return tNode; } else { newSubtree = insertItem(tNode.getRight(), newItem); tNode.setRight(newSubtree); return tNode; } } protected T retrieveItem(Node<T> tNode, T searchKey){ T treeItem; if (tNode == null){ treeItem = null; } else { T nodeItem = tNode.getItem(); if (searchKey.compareTo(nodeItem)==0){ //item is in root of some subtree treeItem = tNode.getItem(); } else if (searchKey.compareTo(nodeItem)<0){ //search left subtree treeItem = retrieveItem(tNode.getLeft(), searchKey); }else{ //search right subtree treeItem = retrieveItem(tNode.getRight(), searchKey); } } return treeItem; } protected Node<T> deleteItem(Node<T> tNode, T searchKey) throws Exception{ Node<T> newSubtree; if(tNode == null){ throw new Exception("TreeException: Item not found"); }else{ T nodeItem = tNode.getItem(); if (searchKey.compareTo(nodeItem) == 0){ //root tNode = deleteNode(tNode); }else if(searchKey.compareTo(nodeItem) < 0){ //left newSubtree = deleteItem(tNode.getLeft(), searchKey); tNode.setLeft(newSubtree); }else{ //right newSubtree = deleteItem(tNode.getRight(), searchKey); tNode.setRight(newSubtree); } } return tNode; } protected Node<T> deleteNode(Node<T> tNode){ // Algorithm note: There are four cases to consider: // 1. The tNode is a leaf. // 2. The tNode has no left child. // 3. The tNode has no right child. // 4. The tNode has two children. // Calls: findLeftmost and deleteLeftmost T replacementItem; if( (tNode.getLeft()==null) && (tNode.getRight()==null)){ //leaf return null; } else if(tNode.getLeft()==null){ return tNode.getRight(); } else if(tNode.getRight()==null){ return tNode.getLeft(); } else{ replacementItem = findLeftmost(tNode.getRight()); tNode.setItem(replacementItem); tNode.setRight(deleteLeftmost(tNode.getRight())); return tNode; } } protected T findLeftmost(Node<T> tNode){ if(tNode.getLeft()==null){ return tNode.getItem(); } else{ return findLeftmost(tNode.getLeft()); } } protected Node<T> deleteLeftmost(Node<T> tNode){ if(tNode.getLeft()==null){ return tNode.getRight(); } else{ tNode.setLeft(deleteLeftmost(tNode.getLeft())); return tNode; } } protected int getHeight(){ return getHeight(root, 0); } protected int getHeight(Node<?> root, int startHeight){ if (root==null) { return startHeight; } int leftHeight = getHeight(root.getLeft(), startHeight+1); int rightHeight = getHeight(root.getRight(), startHeight+1); if (leftHeight>rightHeight) { return leftHeight; }else { return rightHeight; } } private static class Node<T> { public Node<T> left; public Node<T> right; public T data; public Node(T data) { this.data = data; } public Node<T> getLeft() { return this.left; } public void setLeft(Node<T> left) { this.left = left; } public Node<T> getRight() { return this.right; } public void setRight(Node<T> right) { this.right = right; } public T getItem() { return data; } public void setItem(T data) { this.data = data; } public T getKey() { return data; } } public static void preorder(Node<?> n) { if (n != null) { System.out.print(n.data + " "); preorder(n.getLeft()); preorder(n.getRight()); } } public static void inorder(Node<?> n) { if (n != null) { inorder(n.getLeft()); System.out.print(n.data + " "); inorder(n.getRight()); } } public static void postorder(Node<?> n) { if (n != null) { postorder(n.getLeft()); postorder(n.getRight()); System.out.print(n.data + " "); } } public static void levelorder(Node<?> n) { Queue<Node<?>> nodequeue = new LinkedList<Node<?>>(); if (n != null) nodequeue.add(n); while (!nodequeue.isEmpty()) { Node<?> next = nodequeue.remove(); System.out.print(next.data + " "); if (next.getLeft() != null) { nodequeue.add(next.getLeft()); } if (next.getRight() != null) { nodequeue.add(next.getRight()); } } } public static void main(final String[] args) { BinarySearchTree<Integer> tree = new BinarySearchTree<Integer>(100); tree.insert(10); tree.insert(20); tree.insert(30); tree.insert(40); tree.insert(140); tree.insert(130); tree.insert(120); tree.insert(110); tree.insert(80); tree.insert(115); tree.insert(90); tree.insert(125); preorder(tree.getRoot()); System.out.println(); inorder(tree.getRoot()); System.out.println(); postorder(tree.getRoot()); System.out.println(); levelorder(tree.getRoot()); System.out.println(); System.out.println("The height of the binary search tree is "+tree.getHeight()+"."); } }

Sample iterative implementation (as opposed to recursive). The problem with iterating recursively is that it can't just stop at each process. Recursive functions are usually sub optimal.

Add(T data) { // create a new Node instance BinaryTreeNode<T> n = new BinaryTreeNode<T>(data); int result; // now, insert n into the tree // trace down the tree until we hit a NULL BinaryTreeNode<T> current = root, parent = null; while (current != null) { result = comparer.Compare(current.Value, data); if (result == 0) // they are equal - attempting to enter a duplicate - do nothing return; else if (result > 0) { // current.Value > data, must add n to current's left subtree parent = current; current = current.Left; } else if (result < 0) { // current.Value < data, must add n to current's right subtree parent = current; current = current.Right; } } // We're ready to add the node! count++; if (parent == null) // the tree was empty, make n the root root = n; else { result = comparer.Compare(parent.Value, data); if (result > 0) // parent.Value > data, therefore n must be added to the left subtree parent.Left = n; else // parent.Value < data, therefore n must be added to the right subtree parent.Right = n; } } public bool Remove(T data) { // first make sure there exist some items in this tree if (root == null) return false; // no items to remove // Now, try to find data in the tree BinaryTreeNode<T> current = root, parent = null; int result = comparer.Compare(current.Value, data); while (result != 0) { if (result > 0) { // current.Value > data, if data exists it's in the left subtree parent = current; current = current.Left; } else if (result < 0) { // current.Value < data, if data exists it's in the right subtree parent = current; current = current.Right; } // If current == null, then we didn't find the item to remove if (current == null) return false; else result = comparer.Compare(current.Value, data); } // At this point, we've found the node to remove count--; // We now need to "rethread" the tree // CASE 1: If current has no right child, then current's left child becomes // the node pointed to by the parent if (current.Right == null) { if (parent == null) root = current.Left; else { result = comparer.Compare(parent.Value, current.Value); if (result > 0) // parent.Value > current.Value, so make current's left child a left child of parent parent.Left = current.Left; else if (result < 0) // parent.Value < current.Value, so make current's left child a right child of parent parent.Right = current.Left; } } // CASE 2: If current's right child has no left child, then current's right child // replaces current in the tree else if (current.Right.Left == null) { current.Right.Left = current.Left; if (parent == null) root = current.Right; else { result = comparer.Compare(parent.Value, current.Value); if (result > 0) // parent.Value > current.Value, so make current's right child a left child of parent parent.Left = current.Right; else if (result < 0) // parent.Value < current.Value, so make current's right child a right child of parent parent.Right = current.Right; } } // CASE 3: If current's right child has a left child, replace current with current's // right child's left-most descendent else { // We first need to find the right node's left-most child BinaryTreeNode<T> leftmost = current.Right.Left, lmParent = current.Right; while (leftmost.Left != null) { lmParent = leftmost; leftmost = leftmost.Left; } // the parent's left subtree becomes the leftmost's right subtree lmParent.Left = leftmost.Right; // assign leftmost's left and right to current's left and right children leftmost.Left = current.Left; leftmost.Right = current.Right; if (parent == null) root = leftmost; else { result = comparer.Compare(parent.Value, current.Value); if (result > 0) // parent.Value > current.Value, so make leftmost a left child of parent parent.Left = leftmost; else if (result < 0) // parent.Value < current.Value, so make leftmost a right child of parent parent.Right = leftmost; } } return true; }

### Analysis

There are two ways to go forward, left child or right child. This compares to array in which only one way to go forward, just use next index. In the worst case scenario only one of the child is used. For example, all nodes are left child of each another. This makes height of the tree equal to the number of elements, like in the case of the LinkedList. This means that operations work in O(n) time. More typically however, we use both childs in storing our data. So for each node, a decision is to be made whether to go with the left child or right child. This cuts down on the number of elements to go through. Operations work in O(log n) time now.

Average case Worst case Retrieval O(log n) O(n) Insertion O(log n) O(n) Insertion O(log n) O(n) Traversal O(n) O(n) Traversal goes through each element so it has to be O(n). But the other operations can skip some elements.

Tree terminology full, all nodes have 0 or 2 children perfect, full, every parent has 2 children, leaves at same depth complete, full until h - 1, h filled from left to right. last level not completely full. balanced, every left and right subtree differs in height by 0 or 1 full => perfect => complete => balanced

Balance property. Restore from inorder traversal 60 / \ 40 70 / \ 20 50 20 40 50 60 70 50 40 60 20 70

## AVL Trees

Maintain balance automatically.

A binary tree limits each node to no more than two children. A binary search tree or BST is a binary tree with nodes arranged in a way that for every node n, each value in n's left subtree is less than the value of n, and each value in n's right subtree is greater than the value of n. This property allows for logn asymptotic time for inserts, deletes and searches (faster than arrays). This is because for each step down from the top of the tree the number of nodes to check is cut in half. The problem is that if each node is inserted into the BST in such a way that a linear line in formed (for example by inserting the values in an already sorted order) then the performance of the BST is no better than an array.

To fix this issue we need to balance the tree. Balancing a tree means ensuring that the difference in heights between any two subtrees is not greater than one. This means that the number of leaves are maximised leading to a shorter distance from the root to any leaf.

How it works is that every time an operation changes a tree, a balance is taken, and is rotation is taken if necessary. A balanced tree operates more efficiently than an out of balance one.

An AVL Tree is one type of self balancing binary search tree. diagram: http://en.wikipedia.org/wiki/AVL_tree

The balance of a tree is the difference in the heights of the two child sub trees. Every time an insert or delete operation takes place the balance of the tree is taken. If this is greater than one, rebalancing occurs.

Rotation rearranges nodes of a tree while still keeping the binary search tree property - every value in the left subtree of a node is less than the its value. every value in right subtree of node is greater that its value.

/* http://en.wikipedia.org/wiki/File:AVL_Tree_Rebalancing.svg Consider the left -right case. imagine that 4 has just been inserted. and 5 is complaining that its left side is too heavy! first 3 and 4 need to swap places. currently, 4 is attached as the right child of 3 because 4 is greater than 3. if these two are going to swap placed and preserve the bst order then 3 needs to be attached as the left child of 4 because 3 is less than 4. note that A can stay attached to where it is because A is less than 3 and it is less than anything greater than 3 and will always be. similarly with C, C is greater than 4 and is greater than anything less than 4, and will alway be greater. B is a whole another story. Currently it is greater than 3, but less than 4. It has to move because 4's left child will now be 3. B will now move from being 4's left child to being 3's right child. Previously B was a value greater than 3 and less than 4. Now it remains a value that is less than 4 and greater than 3. and both statements mean the same thing. now its time for the right rotate. the recently inserted value of 4 continues on rotating. now with 5. currently 4 and 5 are arranged in a way that suggests 4 is less than 5 because 4 is the left child of 5. these two need to be rearrange to say that 5 is greater than 4 because 5 is the right child of 4. so that rearrangement is made. and 3 can remain as it is as the left child of 4 because 3 is less than both 4 and 5. D can remain as it is because is it greater than both 4 and 5. C needs to change ever slightly from being the left child's right node to being the right child's left node. in other words it was previously where is was be cause it was less than5 and greater than4. now it is in somewhere where it is greater than4 and less than 5. after all of this. the tree is balance because for each node the height of one subtree is not sig greater than the other. and the root has changed. AVL tree Average Worst case Space O(n) O(n) Search O(log n) O(log n) Insert O(log n) O(log n) Delete O(log n) O(log n) Binary search tree Average Worst case Space O(n) O(n) Search O(log n) O(n) Insert O(log n) O(n) Delete O(log n) O(n) Worst case in BST is so because it could have all its values chained in a single line (like linkedlist) */ import java.util.Queue; import java.util.LinkedList; public class AVLTree<T extends Integer> { protected Node<T> root; public AVLTree(){ root = null; } public AVLTree(T rootItem){ root = new Node<T>(rootItem); } public void setRootItem(T newItem) { root = new Node<T>(newItem); } public Node<T> getRoot() { return root; } public void insert(T newItem){ root = insertItem(root, newItem); } public T retrieve(T searchKey){ return retrieveItem(root, searchKey); } public void delete(T searchKey) throws Exception{ root = deleteItem(root, searchKey); } public int getBalance(Node<T> tree) { if (tree!= null) { int x = 0, y = 0; if (tree.getLeft() != null) { x = tree.getLeft().getHeight(); } if (tree.getRight() != null) { y = tree.getRight().getHeight(); } return (x-y); } return 0; } protected Node<T> insertItem(Node<T> tNode, T newItem){ System.out.println("insert item"); Node<T> newSubtree; if (tNode == null){ //position of insertion found //insert after leaf //create new node tNode = new Node<T>(newItem); return tNode; } System.out.println("inserting:"+newItem+" at loc:"+tNode.getItem()); T nodeItem = tNode.getItem(); //search for the insertion position if(newItem.compareTo(nodeItem) < 0){ //search left subtree newSubtree = insertItem(tNode.getLeft(), newItem); tNode.setLeft(newSubtree); } else { newSubtree = insertItem(tNode.getRight(), newItem); tNode.setRight(newSubtree); } // find out the height of the node formed int a = 0, b = 0; if (tNode.getLeft() != null) { a = tNode.getLeft().getHeight(); } if (tNode.getRight() != null) { b = tNode.getRight().getHeight(); } tNode.setHeight((a>b?a:b)+1); int balance = getBalance(tNode); System.out.println (newItem + " " + balance); // Left Left Case if (balance > 1 && newItem < tNode.getLeft().getItem()) { //System.out.println ("left-left case : going for right : " + newItem + " " + tNode.getRight().getItem()); System.out.println("left-left"); return rightRotate(tNode); } // Right Right Case if (balance < -1 && newItem > tNode.getRight().getItem()) { //System.out.println ("right-right case : going for left : " + newItem + " " + tNode.getRight().getItem()); System.out.println("right-right"); return leftRotate(tNode); } // Left Right Case if (balance > 1 && newItem > tNode.getLeft().getItem()) { //System.out.println ("left-right case : going for left and then right : " + newItem + " " + tNode.getRight().getItem()); System.out.println("left-right"); tNode.setLeft( leftRotate(tNode.getLeft())); return rightRotate(tNode); } // Right Left Case if (balance < -1 && newItem < tNode.getRight().getItem()) { //System.out.println ("right-left case : going for right and then left : " + newItem + " " + tNode.getRight().getItem()); System.out.println("right-left"); tNode.setRight(rightRotate(tNode.getRight())); return leftRotate(tNode); } return tNode; } protected Node<T> rightRotate(Node<T> x) { System.out.println("right-rotate"); System.out.println("x:"+x.getItem()); Node<T> y = x.getLeft(); System.out.println("y:"+y.getItem()); Node<T> tmp = y.getRight(); //right rotation being done y.setRight(x); x.setLeft(tmp); //update the height of x node of AVL Tree int a = 0, b = 0; if(x.getLeft() != null) a = x.getLeft().getHeight(); if(x.getRight() != null) b = x.getRight().getHeight(); x.setHeight( (a > b ? a : b) + 1); //update the height of y node of AVL Tree a = 0; b = 0; if(y.getLeft() != null) a = y.getLeft().getHeight(); if(y.getRight() != null) b = y.getRight().getHeight(); y.setHeight( (a > b ? a : b) + 1); // y is the head of new AVL tree to be returned return y; } protected Node<T> leftRotate(Node<T> x) { System.out.println("left-rotate"); System.out.println("x:"+x.getItem()); Node<T> y = x.getRight(); System.out.println("y:"+y.getItem()); Node<T> tmp = y.getLeft(); //left rotation being done y.setLeft(x); x.setRight(tmp); //update the height of x node of AVL Tree int a = 0, b = 0; if(x.getLeft() != null) a = x.getLeft().getHeight(); if(x.getRight() != null) b = x.getRight().getHeight(); x.setHeight( (a > b ? a : b) + 1); //update the height of y node of AVL Tree a = 0; b = 0; if(y.getLeft() != null) a = y.getLeft().getHeight(); if(y.getRight() != null) b = y.getRight().getHeight(); y.setHeight( (a > b ? a : b) + 1); // y is the head of new AVL tree to be returned return y; } protected T retrieveItem(Node<T> tNode, T searchKey){ T treeItem; if (tNode == null){ treeItem = null; } else { T nodeItem = tNode.getItem(); if (searchKey.compareTo(nodeItem)==0){ //item is in root of some subtree treeItem = tNode.getItem(); } else if (searchKey.compareTo(nodeItem)<0){ //search left subtree treeItem = retrieveItem(tNode.getLeft(), searchKey); }else{ //search right subtree treeItem = retrieveItem(tNode.getRight(), searchKey); } } return treeItem; } protected Node<T> deleteItem(Node<T> tNode, T searchKey) throws Exception{ Node<T> newSubtree; if(tNode == null){ throw new Exception("TreeException: Item not found"); }else{ T nodeItem = tNode.getItem(); if (searchKey.compareTo(nodeItem) == 0){ //root tNode = deleteNode(tNode); }else if(searchKey.compareTo(nodeItem) < 0){ //left newSubtree = deleteItem(tNode.getLeft(), searchKey); tNode.setLeft(newSubtree); }else{ //right newSubtree = deleteItem(tNode.getRight(), searchKey); tNode.setRight(newSubtree); } } // find out the height of the node formed int a = 0, b = 0; if (tNode.getLeft() != null) { a = tNode.getLeft().getHeight(); } if (tNode.getRight() != null) { b = tNode.getRight().getHeight(); } tNode.setHeight((a>b?a:b)); int balance = getBalance(tNode); System.out.println (searchKey + " " + balance); // Left Left Case if (balance > 1 && searchKey < tNode.getLeft().getItem()) { //System.out.println ("left-left case : going for right : " + newItem + " " + tNode.getRight().getItem()); System.out.println("left-left"); return rightRotate(tNode); } // Right Right Case if (balance < -1 && searchKey > tNode.getRight().getItem()) { //System.out.println ("right-right case : going for left : " + newItem + " " + tNode.getRight().getItem()); System.out.println("right-right"); return leftRotate(tNode); } // Left Right Case if (balance > 1 && searchKey > tNode.getLeft().getItem()) { //System.out.println ("left-right case : going for left and then right : " + newItem + " " + tNode.getRight().getItem()); System.out.println("left-right"); tNode.setLeft( leftRotate(tNode.getLeft())); return rightRotate(tNode); } // Right Left Case if (balance < -1 && searchKey < tNode.getRight().getItem()) { //System.out.println ("right-left case : going for right and then left : " + newItem + " " + tNode.getRight().getItem()); System.out.println("right-left"); tNode.setRight(rightRotate(tNode.getRight())); return leftRotate(tNode); } return tNode; } protected Node<T> deleteNode(Node<T> tNode){ // Algorithm note: There are four cases to consider: // 1. The tNode is a leaf. // 2. The tNode has no left child. // 3. The tNode has no right child. // 4. The tNode has two children. // Calls: findLeftmost and deleteLeftmost T replacementItem; if( (tNode.getLeft()==null) && (tNode.getRight()==null)){ //leaf return null; } else if(tNode.getLeft()==null){ return tNode.getRight(); } else if(tNode.getRight()==null){ return tNode.getLeft(); } else{ replacementItem = findLeftmost(tNode.getRight()); tNode.setItem(replacementItem); tNode.setRight(deleteLeftmost(tNode.getRight())); return tNode; } } protected T findLeftmost(Node<T> tNode){ if(tNode.getLeft()==null){ return tNode.getItem(); } else{ return findLeftmost(tNode.getLeft()); } } protected Node<T> deleteLeftmost(Node<T> tNode){ if(tNode.getLeft()==null){ return tNode.getRight(); } else{ tNode.setLeft(deleteLeftmost(tNode.getLeft())); return tNode; } } protected int getHeight(){ return getHeight(root, 0); } protected int getHeight(Node<?> root, int startHeight){ if (root==null) { return startHeight; } int leftHeight = getHeight(root.getLeft(), startHeight+1); int rightHeight = getHeight(root.getRight(), startHeight+1); if (leftHeight>rightHeight) { return leftHeight; }else { return rightHeight; } } private static class Node<T> { public Node<T> left; public Node<T> right; public T data; public int height; private Node(){ } public Node(T data) { this.data = data; height = 1; } public Node<T> getLeft() { return this.left; } public void setLeft(Node<T> left) { this.left = left; } public Node<T> getRight() { return this.right; } public void setRight(Node<T> right) { this.right = right; } public T getItem() { return data; } public void setItem(T data) { this.data = data; } public T getKey() { return data; } public int getHeight() { return height; } public void setHeight(int height) { this.height = height; } } public static void preorder(Node<?> n) { if (n != null) { System.out.print(n.data + " "); preorder(n.getLeft()); preorder(n.getRight()); } } public static void inorder(Node<?> n) { if (n != null) { inorder(n.getLeft()); System.out.print(n.data + " "); inorder(n.getRight()); } } public static void postorder(Node<?> n) { if (n != null) { postorder(n.getLeft()); postorder(n.getRight()); System.out.print(n.data + " "); } } public static void levelorder(Node<?> n) { Queue<Node<?>> nodequeue = new LinkedList<Node<?>>(); if (n != null) nodequeue.add(n); while (!nodequeue.isEmpty()) { Node<?> next = nodequeue.remove(); System.out.print(next.data + " "); if (next.getLeft() != null) { nodequeue.add(next.getLeft()); } if (next.getRight() != null) { nodequeue.add(next.getRight()); } } } public static void main(final String[] args) { AVLTree<Integer> tree = new AVLTree<Integer>(100); System.out.println("root 100"); preorder(tree.getRoot()); System.out.println(); inorder(tree.getRoot()); System.out.println(); postorder(tree.getRoot()); System.out.println(); levelorder(tree.getRoot()); System.out.println(); System.out.println("inserting 10"); tree.insert(10); preorder(tree.getRoot()); System.out.println(); inorder(tree.getRoot()); System.out.println(); postorder(tree.getRoot()); System.out.println(); levelorder(tree.getRoot()); System.out.println(); System.out.println("inserting 20"); tree.insert(20); preorder(tree.getRoot()); System.out.println(); inorder(tree.getRoot()); System.out.println(); postorder(tree.getRoot()); System.out.println(); levelorder(tree.getRoot()); System.out.println(); System.out.println("inserting 15"); tree.insert(15); preorder(tree.getRoot()); System.out.println(); inorder(tree.getRoot()); System.out.println(); postorder(tree.getRoot()); System.out.println(); levelorder(tree.getRoot()); System.out.println(); System.out.println("deleting 20"); try{tree.delete(20);}catch(Exception e){e.toString();} preorder(tree.getRoot()); System.out.println(); inorder(tree.getRoot()); System.out.println(); postorder(tree.getRoot()); System.out.println(); levelorder(tree.getRoot()); System.out.println(); System.out.println("The height of the binary search tree is "+tree.getHeight()+"."); } }

## Red Black Trees

Are another form of self balancing binary search trees.

http://alaning.me:8080/root/RedBlackTree

Red-black trees are trees that have the following four properties: - Every node is colored either red or black. - Every NIL node is black. - If a node is red, then both of its children are black. - Every path from a node to a descendant leaf contains the same number of black nodes.

The first three properties are pretty self-explanatory. The fourth property, which is the most important of the four, simply states that starting from any node in the tree, the number of black nodes from that node to any leaf (NIL), must be the same. In Figure 7, take the root node as an example. Starting from 41 and going to any NIL, you will encounter the same number of black nodes—3. For example, taking a path from 41 to the left-most NIL node, we start on 41, a black node. We then travel down to node 9, then node 2, which is also black, then node 1, and finally the left-most NIL node. In this journey we encountered three black nodes—41, 2, and the final NIL node. In fact, if we travel from 41 to any NIL node, we'll always encounter precisely three black nodes.

Like the AVL tree, red-black trees are another form of self-balancing binary search tree. Whereas the balance property of an AVL tree was explicitly stated as a relationship between the heights of each node's left and right subtrees, red-black trees guarantee their balance in a more conspicuous manner. It can be shown that a tree that implements the four red-black tree properties has a height that is always less than 2 * lg(n+1), where n is the total number of nodes in the tree. For this reason, red-black trees ensure that all operations can be performed within an asymptotic running time of lg n.

Like AVL trees, any time a red-black tree has nodes inserted or deleted, it is important to verify that the red-black tree properties have not been violated. With AVL trees, the balance property was restored using rotations. With red-black trees, the red-black tree properties are restored through recoloring and rotations. Red-black trees are notoriously complex in their recoloring and rotation rules, requiring the nodes along the access path to make decisions based upon their color in contrast to the color of their parents and uncles. (An uncle of a node n is the node that is n's parent's sibling node.) A thorough discussion of recoloring and rotation rules is far beyond the scope of this article.

## Graphs

Now to move onto something that is more general. Graphs are a fundamental data structure.

A graph (G) is set of vertices (V(G)) and edges which connect these vertices (E(G)).

Graphs can be used to model many real-world problems. Graphs have many applications including the web (vertices=web pages, edges =hyperlinks), and social networks (vertices=users, edges='is a friend'). Edges can be both directed (one direction only, webpage A has a link to webpage B, but webpage B does not have a link to webpage A) or undirected (both directions, friendships are not one sided so person A being a friend of person B implies person B is a friend of person A). Here is an interesting presentation by the NSA on graphs and their applications: http://www.pdl.cmu.edu/SDI/2013/slides/big_graph_nsa_rd_2013_56002v1.pdf

Graphs problems can range from finding a path (or a shortest ptah) from one vertex to another, to finding the max flow of water that can be put through a set of connected pipes.

**Trees** are a special type of graph. Any two vertices in a tree are connected by one simple path. A **simple path** is a path with no repeated vertices. Trees are also a connected graph. All in all, a tree is a graph that has just enough edges to connect every two vertices to each other ("connected") and no more ("simple path").

**Order** of a graph is the number of vertices in a graph. |V(G)|

**Size** of a graph is the number of edges. |E(G)| **Sparse** graphs have fewer edges than **dense** graphs.

**Degree** of a vertex is the number of edges that is connected to it.

In a tree, |E(G)| = |V(G)| - 1. Because a tree has just enough edges to ensure that everything is connected.

**Terminology**

- tree - collection of vertices and edges
- nodes - stores data item
- parent nodes have children
- child nodes have parent
- leaf have no children and only have parent
- root have no parent and only have children
- sets of nodes can be disconnected from other nodes in the same graph

- edges - links between vertices
- depth (distance between current node and the root)
- height (greatest distance between current node and a leaf)
- both measured in edge count
- directed/undirected
- weighted/unweighted

- nodes - stores data item

Graphs can have at most (|V(G)| * |V(G)| - 1) / 2 edges.

- (|V(G)| * |V(G)| - 1), because an edge should connect a vertex to another vertex. -1 because it should be different though it is possible for an edge to connect a vertex to itself.
- /2, because it is assumed that and edge from u to v (u,v) is same as (v,u). this is the case for undirected graphs. not the case with directed graph where edges, like arrows, have direction.

**Neighbourhoods**. Neighbours are the vertices that are one edge away from you. In directed graphs, there are two different types of neighbours. Let A be a vertex and B be a neighbor. B is an **out-neighbor** of A if it is possible to walk from A to B. B is an **in-neighbor** of A if it is possible to walk from B to A. Essentially you go out to meet your out-neighbours, and your in-neighbours walk in to see you.

**Source**: the vertex where all the arcs come from. Everything point out. Nothing points in (in-degree = 0). The root of a tree would be a source node.**Sink**: the vertex where all the arcs point to. Nothing points out (out-degree = 0). Everything points in.**Walks**: Sequence of vertices that are connected by an edge.**Paths**: walks with no repeated vertices (sometimes also known as a simple path). In a tree every two vertices has a simple path between them.**Cycle**: walks that start and end at same vertex (must have >=3 vertices). A cycle cannot exist in a tree.

**Subgraphs** and **subdigraphs**: Subsets of a graph/digraph.

**Spanning sub(di)graph**: contains all the original vertices (in other words, unnecessary edges are removed)**Induced sub(di)graph**: contains the original edges when necessary (in other words, not all the original vertices needs to be present, but if they do the edges that connect them must be there).**Reverse digraph**: the all arcs are reversed**Underlying graph**of a digraph has all the arc converted into edges. (from directed to undirected).

**Digraphs** can have cycles.

- Digraph without cycle is called acyclic. Full term is
**directed acyclic graph**or 'DAG'. - Without a cycle, a walk involving all the vertices has limited length, starts with a source and ends with a sink.
- Imagine a digraph and all its directed edges. And then imagine physically moving the vertices around (the graph remains logically the same). Eventually you could get a physical arrangement of the vertices so that all the arcs point in one direction. This is called a
**topological order**, and only DAGs have them.

- Recall that in
**connected graphs**you can go from each vertex to another. **Connected components**describe the subgraphs that are connected. The original graph is not connected. But if you take parts of this graph you can get subgraphs which are connected. These subgraphs are connected components. They're sort of like cliques in that no member of a clique contacts a member of another clique. But these all these cliques can all be considered students of a school.- A digraph is
**weakly connected**if it can be described as connected if and only if you ignore the direction of the arcs. It is**strongly connected**if it is connected without ignoring the arcs.- The
**diameter**of a strongly connected digraph is the length of the longest path in it.

- The

**Storing a graph (serializing)**

In the general graph data structure we do not use nodes. Node classes were used in BinaryTree, BST, and SkipList classes. Nodes can handle an arbitrary number of neighbors and a value to each neighbor. However, because we can calculate costs between neighbors in a graph, using nodes can get really messy. Instead, for serialization and normal usage each graph node is stored in either an adjacency list or adjacency matrix.

**Adjacency matrix** is an |V| * |V| matrix (which takes up O(|V|^2) space). Each vertex is represented by a row and a column. Entries represents edges. (i,j)=1 represents edge from i to j. (i,j)=0 means no edge from i to j. Then there are weighted and negative weight edges where the value can be >1 or <0. Directed graphs can mean that (i,j)=1 and (j,i)=0. In undirected graphs (i,j)=1 must mean (j,i)=1.

**Adjacency list** is a list of vertex entries. Each entry of this list is another list of the vertex's out-bound neighbours.

For large numbers of edge connections ("dense") and for applications requiring large numbers of edge operations Adjacency matrix is fast with O(1) for add/remove/retrieval at the cost of higher storage cost and slower vertex operations O(|V|^2).

Adjacency list is well rounded with O(1) best case for adding edges and vertices, O(|V|+|E|) worst case for storage costs. Reading of edges requires traversal O(|V|).

http://bigocheatsheet.com/#graphs

## Basic Graph Example

Basic example of a graph done in python.

An Edge connects two vertices.

A Graph supports the insertion of vertices and edges which connect the vertices to each other.

Inserting a vertex involves creating a blank entry for it in the adjacency list. Adding an edge between two vertices involve creating an Edge object and then appending that to the appropriate vertex entry in the adjacency list.

To do: code the above basic graph operations.

class Edge(object): def __init__(self, u, v): self.source = u # Its not necess. to store source in Edge object because Edge is stored at an index in adj list/mat. this is redundant! self.sink = v def __repr__(self): return "%s->%s" % (self.source, self.sink) class Graph(object): def __init__(self): self.adj = {} def add_vertex(self, vertex): self.adj[vertex] = [] def get_edges(self, v): return self.adj[v] def add_edge(self, u, v, w=0): if u == v: raise ValueError("u == v") edge = Edge(u,v) redge = Edge(v,u) edge.redge = redge #redge is not defined in Edge class redge.redge = edge self.adj[u].append(edge) self.adj[v].append(redge) g=Graph() map(g.add_vertex, ['s','o','p','q','r','t']) g.add_edge('s','o') g.add_edge('s','p') g.add_edge('o','p') g.add_edge('o','q') g.add_edge('p','r') g.add_edge('r','t') g.add_edge('q','r') g.add_edge('q','t') #print map(g.get_edges, ['s','o','p','q','r','t']) print g.get_edges('s') print g.get_edges('o') print g.get_edges('p') print g.get_edges('q') print g.get_edges('r') print g.get_edges('t')

The above uses a python dictionary data type to serve as an adjacency list. A vertex is added in by creating an empty entry with the vertex as a reference. Edges are attached to vertices by finding the vertice entry in the adjacency list and appending an Edge object to it. An Edge object consists of two end points which are the vertices it connects.

It is also possible to use an adjacency matrix instead of a list. Here is the code that can construct an adjacency matrix from a list.

self.muh_matrix = {} def to_matrix(self): for i in self.adj: self.muh_matrix[i] = {} for j in self.adj: self.muh_matrix[i][j] = 1 self.muh_matrix[i][i] = 0

### Elementary Graph Operations

- Does the arc (u,v) exist?
- What is the outdegree of vertex u?
- What is the indegree of vertex u?
- Add an arc between two vertices u and v.
- Delete the arc between vertices u and v.
- Add a node to the (di)graph.
- Delete a node from the (di)graph

class Edge(object): def __init__(self, u, v): self.source = u self.sink = v def __repr__(self): return "%s->%s" % (self.source, self.sink) class Graph(object): def __init__(self): self.adj = {} def add_vertex(self, vertex): self.adj[vertex] = [] def get_edges(self, v): return self.adj[v] def add_edge(self, u, v, w=0): if u == v: raise ValueError("u == v") edge = Edge(u,v) redge = Edge(v,u) edge.redge = redge #redge is not defined in Edge class redge.redge = edge self.adj[u].append(edge) self.adj[v].append(redge) g=Graph() map(g.add_vertex, ['s','o','p','q','r','t']) g.add_edge('s','o') g.add_edge('s','p') g.add_edge('o','p') g.add_edge('o','q') g.add_edge('p','r') g.add_edge('r','t') g.add_edge('q','r') g.add_edge('q','t') #print map(g.get_edges, ['s','o','p','q','r','t']) print "%s: %s"% ('s',g.get_edges('s')) #[s->o, s->p] print "%s: %s"% ('o',g.get_edges('o')) #[o->s, o->p, o->q] print "%s: %s"% ('p',g.get_edges('p')) #[p->s, p->o, p->r] print "%s: %s"% ('q',g.get_edges('q')) #[q->o, q->r, q->t] print "%s: %s"% ('r',g.get_edges('r')) #[r->p, r->t, r->q] print "%s: %s"% ('t',g.get_edges('t')) #[t->r, t->q] print "Does the arc (p,o) exist?" retrieve = g.adj['p'] # O(1) result = "no" for i in retrieve: # O(d) if i.sink == 'o': result = "yes" print result print "Does the arc (o, t) exist?" retrieve = g.adj['o'] # O(1) result = "no" for i in retrieve: # O(d) if i.sink == 't': result = "yes" print result print "What is the out-degree of vertex 'o'?" retrieve = g.adj['o'] # O(1) result = 0 result = len(retrieve) print result print "What is the in-degree of vertex 'o'?" #O(n + |E|) retrieve = g.adj # O(1) result = 0 for i in retrieve: # O(n) if i != 'o': for j in retrieve[i]: # O(|E|) if j.sink == 'o': result = result + 1 print result print "Add an arc between two vertices s and t." g.adj['s'].append(Edge('s','t')) # O(1) print "Done" print "Delete the arc between two vertices s and t." retrieve = g.adj['s'] for i in retrieve: # O(d) if i.sink == 't': retrieve.remove(i) print "Done" print "Add vertex 'b'" retrieve = g.adj retrieve['b'] = [] # O(1) print "Done" print "Remove vertex 'b'" print "Done"

class Graph(object): def __init__(self): self.adj = {} def add_vertex(self, vertex): self.adj[vertex] = {} for i in self.adj: self.adj[i][vertex] = 0 self.adj[vertex][i] = 0 def get_edges(self, v): return self.adj[v] def add_edge(self, u, v, w=0): self.adj[u][v] = 1 self.adj[v][u] = 1 g=Graph() map(g.add_vertex, ['s','o','p','q','r','t']) g.add_edge('s','o') g.add_edge('s','p') g.add_edge('o','p') g.add_edge('o','q') g.add_edge('p','r') g.add_edge('r','t') g.add_edge('q','r') g.add_edge('q','t') #print map(g.get_edges, ['s','o','p','q','r','t']) print "%s: %s"% ('s',g.get_edges('s')) #[s->o, s->p] print g.get_edges('o') # print g.get_edges('p') # print g.get_edges('q') # print g.get_edges('r') # print g.get_edges('t') # print "Does the arc (p,o) exist?" retrieve = g.adj['p']['o'] # O(1) result = "no" if retrieve == 1: result = "yes" print result print "Does the arc (o, t) exist?" retrieve = g.adj['o']['t'] # O(1) result = "no" if retrieve == 1: result = "yes" print result print "What is the out-degree of vertex 'o'?" retrieve = g.adj['o'] # O(1) result = 0 for i in retrieve: #O(n) if g.adj['o'][i] == 1: result = result + 1 print result print "What is the in-degree of vertex 'o'?" retrieve = g.adj # O(1) result = 0 for i in retrieve: #O(n) if g.adj[i]['o'] == 1: result = result + 1 # reassignment, not increment. in python integers are immutable. in java strings are immutable. print result print "Add an arc between two vertices s and t." g.adj['s']['t'] = 1 #O(1) print "Done" print "Delete the arc between two vertices s and t." g.adj['s']['t'] = 0 #O(1) print "Done" print "Add vertex 'b'" retrieve = g.adj retrieve['b'] = {} for i in retrieve: # O(n) retrieve[i]['b'] = 0 retrieve['b'][i] = 0 print "Done" print "Remove vertex 'b'" print "Done"

Delete a vertex from the (di)graph Adjacency matrix remove row and column (shift n-1 elements up and n-1 elements left) Î˜(n^2) Adjacency list look at all n entries check if node needs to be removed Î˜(m+n) In sufficiently sparse graphs, the list wins, but as mâ‰¤ 2n(n-1) for digraphs and mâ‰¤ n(n-1) for graphs, this isn't necessarily always so! Conclusion Use adjacency matrix for small, dense (di)graphs for which we wish to test for the existence of arcs, find the indegree of vertices, and/or delete arcs Use adjacency list for large, sparse (di)graphs for which we need to compute outdegree and add and/or delete vertices.

## Graph Traversal

We want to visit all vertices by following the existing arcs/edges. Need to avoid revisiting vertices. We avoid revisiting vertices by marking them as either not yet visited ("white"), currently visiting ("grey") and have visited ("black"). We have a data structure that stores a list of nodes to be visited. It starts off with one vertex (a root). The graph traversal procedure removes this vertex from the data structure, marks it as visited and add its neighbors to it. We repeat this until all vertices have been visited.

There are two graph traversal algorithms. Depth-first search and Breadth-first search. Fundamentally the are the same. We add neighbours to a list of vertexs to visit. The thing that makes them different is how the next vertices are chosen.

Search forest

- A is a parent of B if there is a directed edge from A to B
- A is a child of B if there is a directed edge from B to A
- A is a ancestor of B if if there is a directed path from A to B
- A is a descendant of B if there is a directed path from B to A

Imagine a family tree with the older members on top and arrows pointing down the generations. Everytime we pick a node at random to traverse (as opposed to following neighbours) we start a new disjoint tree. Put together, these disjoint trees are called forests.

- Arcs: There are four different kinds of arcs in the search forest F for a graph G. In the graph G, there exists arcs. These arcs can be classified in the following.
- The arcs in G that also in F are called tree arcs. Remember, like a family tree, with arcs point from the older generation to the younger. Tree arcs are the arcs in the graph that point to an immediate descendent. Parents to their children.
- Forward arcs. They don't exist in F. But they do connect one vertex to an decedent. Imagine an arc that connects from your grandparents to you.
- Back arcs. They don't exist in F. But they do connect one vertext to an ancestor. Imagine an arc that connects from you to your grandparents. They don't exist with DAGs.
- Cross arcs. Neither of the above. Say a sibling or an uncle/aunt arc.

### Depth first search

The data struct behind this is the stack. LIFO. It specifies that we select the next grey vertex to pick as the youngest remaining grey vertex. We start at the root and add all its child nodes into the stack. We take one vertex out of the stack (the last one we add). And add its childs back in. Eventually we hit a leaf node that has no children. So we go up to its parent node and pick another child. and so on an so sort. stacks can mean recursion. in the recursive implementation below the stack is the system memory's call stack. when a function is called it is added to the stack. when its finished its removed.

### Breadth first search

The data struct behind this is the queue. FIFO. It specifies that we select the next grey vertex to pick as the oldest remaining grey vertex. We start at the root and add their children into the queue. we take one vertex out of the queue (the first one we added) and add its children to the queue. then we go back to the queue, take one out and add its children in. queues = iteration Does not produce forward arcs (you're not going to be deep enough). but do have back and cross arcs. in addition to tree arcs. Depth first search produces all the arcs: forward arcs because it can go deep enough. There is no time complexity difference between DFS and BFS. Choosing next vertex is a constant time operation.

//Actually do DFS or BFS Graph Search. http://community.topcoder.com/tc?module=Static&d1=tutorials&d2=graphsDataStrucs2

//See that CS 220 assignment

//One difficulty i'm having so far is in the bfs and dfs pick the neighbor nodes to traverse. it doesn't specify how to, it just says pick one. technically, you're supposed to pick the left most (lowest value) right? like how in a family tree the siblings are arranged left (youngest) to right (oldest). not too big a deal, but it may mess up the arc type for dfs.

but lets ignore that for now....

Note that if the graph is already in a sorted order. Then use Binary Search Tree instead! Use this method if not a binary (search) tree.

class Edge(object): def __init__(self, u, v): self.source = u self.sink = v def __repr__(self): return "%s->%s" % (self.source, self.sink) class Graph(object): def __init__(self): self.adj = {} self.visited = [] def add_vertex(self, vertex): self.adj[vertex] = [] def get_edges(self, v): return self.adj[v] def add_edge(self, u, v, w=0): if u == v: raise ValueError("u == v") edge = Edge(u,v) redge = Edge(v,u) edge.redge = redge #redge is not defined in Edge class redge.redge = edge self.adj[u].append(edge) self.adj[v].append(redge) def dfs(self, s): label = {} stack = [] explored = [] # stack to store explored (all neighbors discovered stack.append(s) # s is discovered (first visited) while len(stack) > 0: print stack top = stack.pop() print top # terminated if correct node found # continue if not yet found # or just traversing through all # the categorising is not working properly! https://en.wikipedia.org/wiki/Depth-first_search#Pseudocode # a worked example http://www.cs.auckland.ac.nz/compsci220s1t/archive/compsci220ft/tutorials/tut06/TUTORIALl-6.pdf for i in self.adj[top]: # get all adjacent edges (i.sink are adjacent nodes) if i.sink not in explored and i.sink not in stack: # not discovered and not explored stack.append(i.sink) print "%s-%s is tree-edge"%(top,i.sink) elif i.sink in stack: print "%s-%s is back-edge"%(top,i.sink) else: print "%s-%s is forward or cross edge"%(top,i.sink) explored.append(top) def dfs_recursive(self, s, visited=[]): print visited print s visited.append(s) for i in self.adj[s]: if i.sink not in visited: g.dfs_recursive(i.sink,visited) def bfs(self, s): # https://en.wikipedia.org/wiki/Breadth-first_search queue = [] queue.append(s) while len(queue) > 0: print queue top = queue[0] queue.remove(top) print top # we now can do something with this node if we want to # eg if statement to see if thise is node we want # if so, then return it. #traverse, no need do anythin. self.visited.append(top) for i in self.adj[top]: if i.sink not in self.visited and i.sink not in queue: queue.append(i.sink) g=Graph() map(g.add_vertex, ['s','o','p','q','r','t']) g.add_edge('s','o') g.add_edge('s','p') g.add_edge('o','p') g.add_edge('o','q') g.add_edge('p','r') g.add_edge('r','t') g.add_edge('q','r') g.add_edge('q','t') #print map(g.get_edges, ['s','o','p','q','r','t']) print g.get_edges('s') print g.get_edges('o') print g.get_edges('p') print g.get_edges('q') print g.get_edges('r') print g.get_edges('t') #g.bfs('s') g.dfs('s') #g.dfs_recursive('s')

## Advanced Graph Operations

### Cycles and Girths

cycles: walk with start=end

girth: length of shortest cycle in graph

How to find girth?

- Simple algorithm based on BFS:
- Perform BFS |V| times, starting at each vertex v ϵ V in turn.
- If during a BFS search, we encounter a grey (rather than white) neighbour while exploring the edges that start at grey vertex x, we continue to the end of the current level (i.e., explore the remaining vertices at this distance from the starting vertex) and then stop.
- For each edge (x,y) at this level for which both x and y are grey and for which v is the lowest common ancestor in the search tree, we compute the distances d(x) and d(y)to v.
- The length of the cycle involving x, y and v is then d(x) + d(y) + 1
- The minimum of these lengths at the level is the smallest cycle that involves v.
- The smallest cycle among all possible start vertices v is the girth

class Edge(object): def __init__(self, u, v, w): self.source = u self.sink = v self.distance = w def __repr__(self): return "%s->%s (%s)" % (self.source, self.sink, self.distance) class Graph(object): def __init__(self): self.adj = {} def add_vertex(self, vertex): self.adj[vertex] = [] def get_edges(self, v): return self.adj[v] def add_edge(self, u, v, w=0): if u == v: raise ValueError("u == v") edge = Edge(u,v, w) redge = Edge(v,u, w) edge.redge = redge #redge is not defined in Edge class redge.redge = edge self.adj[u].append(edge) self.adj[v].append(redge) def shortest_cycle(self, s): queue = [] queue.append(s) distances = {} visited = [] while len(queue) > 0: #print queue top = queue[0] queue.remove(top) #print top if top == s: distances[top] = 0 visited.append(top) for i in self.adj[top]: if i.sink in queue: #print "It's already grey!: %s %s <-- %s %s" % (i.sink, distances[i.sink], top, distances[top]) return distances[i.sink] + distances[top] + 1 if i.sink not in visited and i.sink not in queue: queue.append(i.sink) distances[i.sink] = distances[top] +1 g=Graph() map(g.add_vertex, ['1','2','3','4', '5', '6', '7', '8']) g.add_edge('1','2') g.add_edge('1','6') g.add_edge('1','8') g.add_edge('2','3') g.add_edge('3','4') g.add_edge('4','5') g.add_edge('5','6') g.add_edge('6','7') g.add_edge('7','8') print "Shortest cycle with vertex 1: %d" % (g.shortest_cycle('1') ) print "Shortest cycle with vertex 2: %d" % (g.shortest_cycle('2') ) print "..." print "The girth (shortest cycle) of this graph is" print min(map(g.shortest_cycle, ['1','2','3','4', '5', '6', '7', '8']))

### Cycle detection

Cycle detection is the algorithmic problem of finding a cycle in a sequence of iterated function values.

cycle: repeat values

sequence: series of values

iterated function: composed in itself (results fed as input to get new results) if that function maps a finite set S to itself. must eventually repeat a cycle of values.

Cycle detection: find where repeats given function and starting value.

can form functional graph from the iterated function. (that is, a directed graph in which each vertex has a single outgoing edge)

**Floyd's cycle-finding algorithm**

#https://en.wikipedia.org/wiki/Cycle_detection#Algorithms #http://alaning.me/index.php/Graphs#Cycles_and_Girths x0 = 0 set = [6,6,0,1,4,3,3,4,0] def f(a): return set[a] def floyd(f, x0): tortoise = f(x0) # f(x0) is the element/node next to x0. hare = f(f(x0)) # value of node in position after that while tortoise != hare: tortoise = f(tortoise) # keep moving in sequence hare = f(f(hare)) # moves through twice as quick as tortoise # until the values are the same (position not necess) # we found the starting value # hare moves two steps for each tortoise one step # tortoise at half distance of hare # eventually both in cycle # distance between them will be divisible by # cycle period lam # distance between hare and tortoise (v) = tortoise position # is divisible by cycle period (lam) # hare moves in the circle one step at a time # tortoise moves to the circle from initial location # will intersect at beginning of circle # tortoise, hare distance is constant 2(v) a multiple of (lam) # will agree when tortoise reaches (u) # Find the position μ of first repetition. mu = 0 tortoise = x0 while tortoise != hare: tortoise = f(tortoise) hare = f(hare) # Hare and tortoise move at same speed mu += 1 # Find the length of the shortest cycle starting from x_u # value at position of first repetition # The hare moves one step at a time while tortoise doesn't move. # lam is incremented until lam is found. lam = 1 hare = f(tortoise) while tortoise != hare: hare = f(hare) lam += 1 return lam, mu print floyd(f,x0)

**Analysis**
This code only accesses the sequence by storing and copying pointers, function evaluations, and equality tests; therefore, it qualifies as a pointer algorithm. The algorithm uses O(λ + μ) operations of these types, and O(1) storage space.

μ position of first repetition

λ length of the shortest cycle starting from x_μ

### k-colourings

Graph G has k-colourings if its vertices can be split into k-distinct sets so that every edge connects two vertices from different set. Edges are not allowed to connect two vertices belonging to the same set. Used in the Four colour theorem for map makers.

2-colouring graphs are **bipartite graphs**. Every edge crosses from one set to the other. No cycles of odd lengths.

### Matchings

set of edges that do not share a vertex.

can be used with bipartite graphs

Matchings are used whenever we need to assign members of a set to each other as exclusive pairs based on suitability criteria. Used in river crossing puzzle and variations of it.

### Minimum spanning trees

Prim's algo (based off Dijkstra's)

import heapq import sys class Edge(object): def __init__(self, u, v, w): self.source = u self.sink = v self.distance = w def __repr__(self): return "%s->%s:%s" % (self.source, self.sink, self.distance) class Graph(object): def __init__(self): self.adj = {} def add_vertex(self, vertex): self.adj[vertex] = [] def get_edges(self, v): return self.adj[v] def add_edge(self, u, v, w=0): if u == v: raise ValueError("u == v") edge = Edge(u,v,w) redge = Edge(v,u,w) edge.redge = redge #redge is not defined in Edge class redge.redge = edge self.adj[u].append(edge) self.adj[v].append(redge) def shortest_path(self, start): distances = {} # Distance from start to node previous = {} # Previous node in optimal path from source nodes = [] # Priority queue of all nodes in Graph # init for vertex in self.adj: if vertex == start: distances[vertex] = 0 heapq.heappush(nodes, [0, vertex]) else: distances[vertex] = sys.maxsize heapq.heappush(nodes, [sys.maxsize, vertex]) previous[vertex] = None while nodes: smallest = heapq.heappop(nodes)[1] if distances[smallest] == sys.maxsize: break for neighbor in self.adj[smallest]: # alt = distances[smallest] + neighbor.distance alt = neighbor.distance if alt < distances[neighbor.sink]: distances[neighbor.sink] = alt previous[neighbor.sink] = smallest for n in nodes: if n[1] == neighbor.sink: n[0] = alt break heapq.heapify(nodes) return previous g=Graph() #map(g.add_vertex, ['A','B','C','D']) #g.add_edge('A','B',2) #g.add_edge('A','D',1) #g.add_edge('B','D',2) #g.add_edge('C','D',3) map(g.add_vertex, ['A','B','C','D','E','F','G']) g.add_edge('A','B',7) g.add_edge('A','D',5) g.add_edge('B','C',8) g.add_edge('B','D',9) g.add_edge('B','E',7) g.add_edge('C','E',5) g.add_edge('D','E',15) g.add_edge('D','F',6) g.add_edge('E','F',8) g.add_edge('E','G',9) g.add_edge('F','G',11) print "This shows the shortest path" print g.shortest_path('A') # 'A' chosen arbitrarily

Kruskual's

from operator import itemgetter, attrgetter class Edge(object): def __init__(self, u, v, w): self.source = u self.sink = v self.weight = w def __repr__(self): return "%s->%s" % (self.source, self.sink) def __eq__(self, other): return self.source == other.source and self.sink == other.sink and self.weight == other.weight class Graph(object): def __init__(self): self.adj = {} def add_vertex(self, vertex): self.adj[vertex] = [] def get_edges(self, v): return self.adj[v] def add_edge(self, u, v, w=0): if u == v: raise ValueError("u == v") edge = Edge(u,v, w) redge = Edge(v,u, w) if edge not in self.adj[u]: self.adj[u].append(edge) if redge not in self.adj[v]: self.adj[v].append(redge) def bfs(self, s, t): queue = [] queue.append(s) visited = [] while len(queue) > 0: #print queue top = queue[0] queue.remove(top) #print top if top == t: return "found!" visited.append(top) for i in self.adj[top]: if i.sink not in visited and i.sink not in queue: queue.append(i.sink) return "not found" g=Graph() map(g.add_vertex, ['A', 'B', 'C', 'D', 'E', 'F', 'G']) g.add_edge('A','B', 7) g.add_edge('A','D', 5) g.add_edge('B','C', 8) g.add_edge('B','D', 9) g.add_edge('B','E', 7) g.add_edge('C','E', 5) g.add_edge('D','E', 15) g.add_edge('D','F', 6) g.add_edge('E','F', 8) g.add_edge('E','G', 9) g.add_edge('F','G', 11) g.bfs('A', 'H') all_edges = [] for i in g.adj: #print i for j in g.adj[i]: #print j.sink all_edges.append((i, j.sink, j.weight)) #print all_edges #print sorted(all_edges, key=itemgetter(2)) print "new start" t=Graph() # its called graph but really this is a tree where we build up our min span tree. for i in sorted(all_edges, key=itemgetter(2)): print i print i[0] try: print t.adj[i[0]] # if not found then create it except KeyError: print "nothing" t.add_vertex(i[0]) try: print t.adj[i[1]] except KeyError: print "nothing" t.add_vertex(i[1]) print "finding path between %s and %s: %s" %(i[0], i[1], t.bfs(i[0],i[1])) if t.bfs(i[0],i[1])=="not found": t.add_edge(i[0],i[1],i[2]) print t.adj[i[0]] print t.adj[i[1]] for i in t.adj: for j in t.adj[i]: print "%s %s %d"%(j.source, j.sink, j.weight) # t is min span tree # via Kruskal's algorithm. https://en.wikipedia.org/wiki/Kruskal%27s_algorithm

## Single source shortest path

### Dijkstra's algorithm

We find shortest route from node A to node B in a graph with non-negative weight edges.

Each node has a distance from the start. The start node has a distance of 0. All other nodes have distance of inf to represent a not yet reachable target.

At each step we pick the node with the shortest distance from the start that has not been visited yet. This is where a heap comes in handy. A heap implements an abstract data type called a priority queue. Just like other ADTs, you can add items in and then take them out later. In a queue the item taken out is oldest one, in a stack it is the newest one. In a priority queue each item has a priority. And the one taken out is the one with the lowest (or highest) priority. Heaps implement the priority queue by using a tree-based data structure that satisfies the heap property. The heap property is that the root nodes have a value greater than that of any value in its decendante nodes.

So after we've picked a node we cause a ripple effect to the neighbors, in a process called 'relaxation'. The distance to each neighbor is calculated by taking into account the current node's distance from the start and the weight of the edge to that neighbor. If this distance is less than the recorded distance of that neighbor node, the recorded distance gets updated.

There are two tables

- A distance table, which keeps an up-to-date "best distance" from the source node to every other node.
- A route table, which, for each node n, indicates what node was used to reach n to get the best distance.

Each node in the graph is visited only once, though its distance may be updated multiple times (or not at all) depending on the number of neighbors. Each relaxation gets a smaller and smaller value. This allows us to find the best path through a graph in O(m * log(n)) time where n is the number of vertices and m is the number of edges in the graph.

Example. Traffic flow. Weights of edges represent how fast traffic flows. Vertices are intersections. You want to minimize time spent in traffic so you prefer routes with small weight edges. Dijkstra's algorithm can be used here.

# based on http://maxburstein.com/blog/introduction-to-graph-theory-finding-shortest-path/ import heapq import sys class Edge(object): def __init__(self, u, v, w): self.source = u self.sink = v self.distance = w def __repr__(self): return "%s->%s:%s" % (self.source, self.sink, self.distance) class Graph(object): def __init__(self): self.adj = {} def add_vertex(self, vertex): self.adj[vertex] = [] def get_edges(self, v): return self.adj[v] def add_edge(self, u, v, w=0): if u == v: raise ValueError("u == v") edge = Edge(u,v,w) redge = Edge(v,u,w) edge.redge = redge #redge is not defined in Edge class redge.redge = edge self.adj[u].append(edge) self.adj[v].append(redge) def shortest_path(self, start, finish): distances = {} # Distance from start to node previous = {} # Previous node in optimal path from source nodes = [] # Priority queue of all nodes in Graph # init for vertex in self.adj: if vertex == start: distances[vertex] = 0 heapq.heappush(nodes, [0, vertex]) else: distances[vertex] = sys.maxsize heapq.heappush(nodes, [sys.maxsize, vertex]) previous[vertex] = None print "This is the heap which will be used in traversal" print "Remember a heap is popped to retrieve an entry. This entry is smallest one." print "In this case it is the node with the smallest distance from start." for n in nodes: print n while nodes: print "" print "Looking at this node now:", smallest = heapq.heappop(nodes)[1] print smallest print "%s is this far from start: %s"%(smallest,distances[smallest]) if smallest == finish: print "" print "Finished. We found our goal." print "Going through 'previous' array now" path = [] while previous[smallest]: path.append(smallest) print "%s:%s"%(smallest,previous[smallest]) smallest = previous[smallest] path.append(smallest) return path if distances[smallest] == sys.maxsize: print "" print "Broke" break for neighbor in self.adj[smallest]: print " This is a neighbor: %s"%neighbor.sink print " %s is this far from start: %s"%(neighbor.sink,distances[neighbor.sink]) print " %s is this far from %s: %s"%(neighbor.sink,smallest,neighbor.distance) alt = distances[smallest] + neighbor.distance print " %s is this far from start, going through %s: %s"%(neighbor.sink,smallest,alt) if alt < distances[neighbor.sink]: distances[neighbor.sink] = alt previous[neighbor.sink] = smallest #mark this line. how does previous change? print " %s has a new distance from start: %s"%(neighbor.sink,distances[neighbor.sink]) print " To get to %s you go through %s"%(neighbor.sink,smallest) print " This is recorded in 'previous' array: %s"%previous # what do these next 4 lines do? # traversal is based on a heap # heaps always pull out the lowest (or smallest) value # this heap is made up of a pair, distance and node name # adding new distances helps choose the next one to visit, beacuse it always visits the smallest one # it will visit the largest one iff no other connection and goal not yet found (does not exist) # not this is only for deciding which to traverse next # distances give the actual distance from start print " We added a new distance, so its time to update our heap" for n in nodes: if n[1] == neighbor.sink: n[0] = alt break heapq.heapify(nodes) print " The heap looks like this now" for n in nodes: print " %s"%n return distances g=Graph() #map(g.add_vertex, ['A','B','C','D','E','F','G','H']) g.add_vertex('A') g.add_vertex('B') g.add_vertex('C') g.add_vertex('D') g.add_vertex('E') g.add_vertex('F') g.add_vertex('G') g.add_vertex('H') g.add_edge('A','B',7) g.add_edge('A','C',8) g.add_edge('B','F',2) g.add_edge('C','F',6) g.add_edge('C','G',4) g.add_edge('D','F',8) g.add_edge('E','H',1) g.add_edge('F','G',9) g.add_edge('F','H',3) print "These are the edge connections" print g.get_edges('A') print g.get_edges('B') print g.get_edges('C') print g.get_edges('D') print g.get_edges('E') print g.get_edges('F') print g.get_edges('G') print g.get_edges('H') print g.shortest_path('A','H')

**Anaylsis**

An algorithm that sometimes can solve optimization problems is the Greedy Algorithm. In the greedy algorithm we make several small steps to our goal and at each step we choose the optimal step, greedy-choice.

See more information on Dynamic_Programming_vs_Greedy_Algorithms#Dijkstra.27s_algorithm

**Other related algorithms.**

**Bellman–Ford** algorithm can be used on graphs with negative edge weights, as long as the graph contains no negative cycle reachable from the source vertex s. Which makes sense because you could go round and round the cycle to get a smaller and smaller weight.

The **A* search** algorithm is a generalization of Dijkstra's algorithm that cuts down on the size of the subgraph that must be explored, if additional information is available that provides a lower bound on the "distance" to the target.

It first searches the routes that appear to be most likely to lead towards the goal. What sets A* apart from a greedy best-first search is that it also takes the distance already traveled into account. It uses a bit of intelligence called a heuristic to decide the next node to visit. The heuristic here is a guess of how far the current node is to the goal node. This can be node's Chebychev distance or Manhattan distance.

Just like with Dijkstra's a priority queue is used and with each visited node it adds its neighbors to the pq. With A* however, each node added also includes its heuristic which determines when its picked out from the pq. We pick the node with the smallest score, that is, the node we think is closets to the goal node.

In Dijkstra's the next node chosen is the one closest to any visited node. Dijkstra's can be viewed as a special case of A* where h(x) = 0 for all x (heuristic is 0 in all cases). .

The process that underlies Dijkstra's algorithm is similar to the greedy process used in **Prim's algorithm**. Prim's purpose is to find a minimum spanning tree that connects all nodes in the graph; Dijkstra is concerned with only two nodes. Prim's does not evaluate the total weight of the path from the starting node, only the individual path.

### Bellman–Ford algorithm

There may be graphs with negative weight edges in which you want to perform SSSP on. In this case use Bellman–Ford algorithm. Like Dijkstra's it solves the single source shortest path problem. Unlike Dijkstra's it can handle negative weight edges at the cost of being a bit slower than Dijkstra's.

Like Dijkstra's Algorithm, Bellman–Ford is based on the principle of relaxation, in which an approximation to the correct distance is gradually replaced by more accurate values until eventually reaching the optimum solution. In both algorithms, the approximate distance to each vertex is always an overestimate of the true distance, and is replaced by the minimum of its old value with the length of a newly found path. However, Dijkstra's algorithm greedily selects the minimum-weight node that has not yet been processed, and performs this relaxation process on all of its outgoing edges; in contrast, the Bellman–Ford algorithm simply relaxes all the edges, and does this |V | − 1 times, where |V | is the number of vertices in the graph. In each of these repetitions, the number of vertices with correctly calculated distances grows, from which it follows that eventually all vertices will have their correct distances. This method allows the Bellman–Ford algorithm to be applied to a wider class of inputs than Dijkstra.

import heapq import sys class Edge(object): def __init__(self, u, v, w): self.source = u self.sink = v self.distance = w def __repr__(self): return "%s->%s:%s" % (self.source, self.sink, self.distance) class Graph(object): def __init__(self): self.adj = {} def add_vertex(self, vertex): self.adj[vertex] = [] def get_edges(self, v): return self.adj[v] def add_edge(self, u, v, w=0): if u == v: raise ValueError("u == v") edge = Edge(u,v,w) redge = Edge(v,u,w) edge.redge = redge #redge is not defined in Edge class redge.redge = edge self.adj[u].append(edge) self.adj[v].append(redge) def shortest_path(self, start, finish): distances = {} # Distance from start to node previous = {} # Previous node in optimal path from source nodes = [] # Priority queue of all nodes in Graph # init for vertex in self.adj: if vertex == start: distances[vertex] = 0 else: distances[vertex] = sys.maxsize previous[vertex] = None # relax all the edges. for vertex in self.adj: print vertex for neighbor in self.adj[vertex]: print " This is a neighbor: %s"%neighbor.sink print " %s is this far from start: %s"%(neighbor.sink,distances[neighbor.sink]) print " %s is this far from %s: %s"%(neighbor.sink,vertex,neighbor.distance) if distances[vertex] + neighbor.distance < distances[neighbor.sink]: distances[neighbor.sink] = distances[vertex] + neighbor.distance previous[neighbor.sink] = vertex # check for -ve weight for vertex in self.adj: for neighbor in self.adj[vertex]: if distances[vertex] + neighbor.distance < distances[neighbor.sink]: print "Error: Graph contains negative weight cycle" # print it out print "" print "Finished. We found our goal." print "Going through 'previous' array now" smallest = finish while previous[smallest]: print "%s ->"%smallest, smallest = previous[smallest] if smallest == start: print "%s"%start break return distances g=Graph() #map(g.add_vertex, ['A','B','C','D','E','F','G','H']) g.add_vertex('A') g.add_vertex('B') g.add_vertex('C') g.add_vertex('D') g.add_vertex('E') g.add_vertex('F') g.add_vertex('G') g.add_vertex('H') g.add_edge('A','B',7) g.add_edge('A','C',8) g.add_edge('B','F',2) g.add_edge('C','F',6) g.add_edge('C','G',4) g.add_edge('D','F',8) g.add_edge('E','H',1) g.add_edge('F','G',9) g.add_edge('F','H',3) print "These are the edge connections" print g.get_edges('A') print g.get_edges('B') print g.get_edges('C') print g.get_edges('D') print g.get_edges('E') print g.get_edges('F') print g.get_edges('G') print g.get_edges('H') print g.shortest_path('A','H')

### A * Search Algorithm

A* Search used for path finding and graph traversal. extends Dijkstra's algorithm. uses heuristics (doesn't just take the shortest distance)

like in games. esp rts. A unit is at loc A and needs to be at B.

Dijkstra: from a node, examine all connected edges. mark that node so we don't visit it again. and then pick the closest node edge and repeat again.

It is a best first search. Finds the least cost path from one node to another.

Heuristic, gives a dynamic way to pick a path to target node. Its a best guess of the path to take.

The heuristic function is admissible. never overestimates the cost of reaching the goal. Two functions are Chebychev distance (diagonals and up/down/left/right, and Manhattan distance(only up/down/left/right).

Chebyshev Distance heuristic distance between two nodes is: chebyshev_distance = [(current.x - target.x).abs, (current.y - target.y).abs].max

Uses a priority queue sort of like Dijkstra's. stores alternate paths

knowledge-plus-heuristic cost function of node x the past path-cost function, which is the known distance from the starting node to the current node x (usually denoted g(x)) a future path-cost function, which is an admissible "heuristic estimate" of the distance from x to the goal (usually denoted h(x)).

the g(x) part of the heuristic is the cost from the starting point, not simply the local cost from the previously expanded node.

Uses a (nxn?) grid as a graph. Finds distance between two points (nodes) in this grid.

good visual: https://upload.wikimedia.org/wikipedia/commons/5/5d/Astar_progress_animation.gif

complete, will find a solution if exists (like bfs) Dijkstra's algorithm, as another example of a uniform-cost search algorithm, can be viewed as a special case of A* where h(x) = 0 for all x.

# original: http://maxburstein.com/blog/intro-to-graph-theory-finding-shortest-path-part2/ import heapq import sys class Node(object): def __init__(self, x, y): self.x = x # coord pos self.y = y # coord pos self.obstacle = False # is a node that can go through or not? self.g_score = sys.maxsize # best distance from the source def __repr__(self): return "(%s,%s,%s)"%(self.x, self.y, self.obstacle) class Graph(object): def __init__(self): None def __init__(self, size): self.size = size self.grid = [] for y in range(0,self.size): # 0..7 row = [] for x in range(0,self.size): # 0..7 row.append(Node(x,y)) self.grid.append(row) def set_obstacle(self,x,y): self.grid[y][x].obstacle = True def shortest_path(self, start_x, start_y, end_x, end_y): # takes in the starting and ending x y pos. def heuristics(current,target): return max(abs(current.x-target.x),abs(current.y-target.y)) start = self.grid[start_y][start_x] finish = self.grid[end_y][end_x] visited = [] # The set of nodes already evaluated # we do not want to visit the same node twice. previous = {} # Previous node in optimal path from source # keeps track of how we go to a node, just like in dijkstra's previous[start] = 0 f_score = {} # min-heap (priority queue) # stores node's distance from source + heuristics calculation # All possible ways to go in a node dx = [1, 1, 0, -1, -1, -1, 0, 1] dy = [0, 1, 1, 1, 0, -1, -1, -1] # going through these we get all 8 possible adjacent nodes start.g_score = 0 # source distance is set to 0 f_score[start] = start.g_score + heuristics(start,finish) # first calc of heuristic function # priority queue of nodes to be traversed while len(f_score)>0: #current = f_score.pop() current = min(f_score, key = f_score.get) # node with smallest f_score f_score.pop(current,None) visited.append(current) # mark that node as visited if current == finish: # last step is to print path path = [start] while previous[current]: path.append(current) current = previous[current] g.print_path(path) print "Path found" for direction in range(0,8): # all the nodes that touch the current node new_x = current.x + dx[direction] new_y = current.y + dy[direction] if new_x < 0 or new_x >= self.size or new_y < 0 or new_y >= self.size: # check out of range (off the edge) continue # next if so # our node exists! neighbor = self.grid[new_y][new_x] # Check if we've been to a node or if it is an obstacle # we do not want to visit nodes that we've already been to # a node is in visited after we pop it from f_score # a node is added to f_score... if neighbor.obstacle or neighbor in visited or neighbor in f_score: continue # getting path distances now if direction % 2 == 1: # using direction? odd is diagonal because there is a # 1 or -1 in both dx and dy at same time tentative_g_score = current.g_score + 14 # traveled so far + distance to next node diagonal (technically sqrt(200)0 else: tentative_g_score = current.g_score + 10 # traveled so far + distance to next node vertical or horizontal # If there is a new shortest path update our priority queue (relax) # is our new path going to be shorter than the older path found? if tentative_g_score < neighbor.g_score: # if so then update how we go there, distance to there and add node to pq # to be eval later previous[neighbor] = current neighbor.g_score = tentative_g_score f_score[neighbor] = neighbor.g_score + heuristics(neighbor,finish) def print_path(self, path): for y in range(0,self.size): for x in range(0,self.size): if self.grid[y][x].obstacle: print "X ", elif self.grid[y][x] in path: print "- ", else: print "0 ", print "" g=Graph(8) g.set_obstacle(1,1) g.set_obstacle(2,2) g.set_obstacle(3,3) g.set_obstacle(4,4) g.set_obstacle(5,5) g.set_obstacle(6,6) g.set_obstacle(2,1) g.set_obstacle(3,2) g.set_obstacle(4,3) g.set_obstacle(5,4) g.set_obstacle(6,5) print g.shortest_path(0,2,6,3) g = Graph(8) g.set_obstacle(1,0) g.set_obstacle(1,1) g.set_obstacle(1,2) g.set_obstacle(1,3) g.set_obstacle(1,4) g.set_obstacle(1,5) g.set_obstacle(1,6) print g.shortest_path(0,0,4,2)

## All Pairs Shortest Path

### Floyd–Warshall algorithm

Both Dijkstra's and Bellman–Ford solve SSSP. Get the shortest path from a single source. (Actually Dijkstra can solve for multiple sources, just set multiple nodes to have distance of 0, or connect those multiple sources to a single node of distance 0 using 0 weight edges).

Floyd-Warshall is slightly different. It computes all the possible shortest paths from each vertex to another. All Pairs Shortest Path. It works with negative weight edges.

Floyd–Warshall finds optimal path between each pair of vertices. It does so by incrementally improving an estimate on the shortest path between two vertices, until the estimate is optimal. Between each pair of vertices it tries each possible path between them to eventually get an optimal estimate of the shortest path between the two.

Example. Taking trips by bus costs money. But, turns out because gov wants to increase bus usage some routes can add to your bus credit rather than take away from. Vertices are bus stops / transfers. Weights of the edges are the monetary cost of the trip. Positive means you pay to take the trip. Negative means you can get paid to take that trip. Negative cycles are bad. In this case you could make money by travelling in the same circuit over and over again, which I'm sure the gov would want to avoid.

import heapq import sys class Edge(object): def __init__(self, u, v, w): self.source = u self.sink = v self.distance = w def __repr__(self): return "%s->%s:%s" % (self.source, self.sink, self.distance) class Graph(object): def __init__(self): self.adj = {} self.muh_matrix = {} self.next = {} #trace back path def add_vertex(self, vertex): self.adj[vertex] = [] def get_edges(self, v): return self.adj[v] def add_edge(self, u, v, w=0): if u == v: raise ValueError("u == v") edge = Edge(u,v,w) redge = Edge(v,u,w) edge.redge = redge #redge is not defined in Edge class redge.redge = edge self.adj[u].append(edge) self.adj[v].append(redge) def to_matrix(self): for i in self.adj: self.muh_matrix[i] = {} self.next[i] = {} for j in self.adj: self.muh_matrix[i][j] = sys.maxsize self.muh_matrix[i][i] = 0 self.next[i][j] = sys.maxsize for k in self.adj[i]: # gotta iterate to get proper weight which is a pain if k.sink == j: #print "%s %s %s"%(i, j, k.distance) self.muh_matrix[i][j] = k.distance for k in self.adj: for i in self.adj: for j in self.adj: if self.muh_matrix[i][k] + self.muh_matrix[k][j] < self.muh_matrix[i][j]: self.muh_matrix[i][j] = self.muh_matrix[i][k]+self.muh_matrix[k][j] self.next[i][j] = k def shortest_path(self, start, end): if self.muh_matrix[start][end] == sys.maxsize: return "no path" if start==end: return 0 inter = self.next[start][end] if inter is sys.maxsize: return " " else: return self.shortest_path(start, inter) + inter + self.shortest_path(inter, end) g=Graph() map(g.add_vertex, ['A','B','C','D','E','F','G','H']) g.add_edge('A','B',7) g.add_edge('A','C',8) g.add_edge('B','F',2) g.add_edge('C','F',6) g.add_edge('C','G',4) g.add_edge('D','F',8) g.add_edge('E','H',1) g.add_edge('F','G',9) g.add_edge('F','H',3) g.to_matrix() print g.shortest_path('A','H') print g.muh_matrix['A']['H']

See more information on Dynamic_Programming_vs_Greedy_Algorithms#Floyd.E2.80.93Warshall_Algorithm

## Flow Networks

Another example to give. Imagine water networks. Try as hard as you can you can not compress water. Which means that all pipes have their own max capacity.

You have a complicated set up of pipes which connect to each other. Actually you can think of the entire pipe network as a single point. What you want to know is how much flow of water can you get through this pipe network. Not enough flow and you're not being as efficient as you can be. Too much flow and there will be back log or pipe burst and that is not a good thing.

This is the max flow problem See also flow networks

### Ford–Fulkerson algorithm

Ford–Fulkerson algorithm can compute max flow in a flow network.

Also related: https://en.wikipedia.org/wiki/Kirchhoff%27s_current_law#Kirchhoff.27s_current_law_.28KCL.29 total in = total out

class Edge(object): def __init__(self, u, v, w): self.source = u self.sink = v self.capacity = w def __repr__(self): return "%s->%s:%s" % (self.source, self.sink, self.capacity) class FlowNetwork(object): def __init__(self): self.adj = {} self.flow = {} def add_vertex(self, vertex): self.adj[vertex] = [] def get_edges(self, v): return self.adj[v] def add_edge(self, u, v, w=0): if u == v: raise ValueError("u == v") edge = Edge(u,v,w) redge = Edge(v,u,0) edge.redge = redge #redge is not defined in Edge class redge.redge = edge self.adj[u].append(edge) self.adj[v].append(redge) self.flow[edge] = 0 self.flow[redge] = 0 def find_path(self, source, sink, path, path_set): if source == sink: return path for edge in self.get_edges(source): residual = edge.capacity - self.flow[edge] if residual > 0 and not (edge,residual) in path_set: path_set.add((edge, residual)) result = self.find_path( edge.sink, sink, path + [(edge,residual)], path_set) if result != None: return result def max_flow(self, source, sink): path = self.find_path(source, sink, [], set()) while path != None: flow = min(res for edge,res in path) for edge,res in path: self.flow[edge] += flow self.flow[edge.redge] -= flow path = self.find_path(source, sink, [], set()) return sum(self.flow[edge] for edge in self.get_edges(source)) g=FlowNetwork() map(g.add_vertex, ['s','o','p','q','r','t']) g.add_edge('s','o',3) g.add_edge('s','p',3) g.add_edge('o','p',2) g.add_edge('o','q',3) g.add_edge('p','r',2) g.add_edge('r','t',3) g.add_edge('q','r',4) g.add_edge('q','t',2) print g.max_flow('s','t')