Computer Science

Computers are used to store and search data.

Computer Science is the scientific approach to this.

public class HelloWorld {
	public static void main(String[] args) {
		System.out.println("Hello World!");
	}
}

Computer science is no more about computers (or programming) than astronomy is about telescopes.

Basic Object-oriented programming, Java Language

Object - a variable, a function, or a data structure (or a combination of these). It is a specific instance of a class.

Class - a template for Objects. Provides initial values for variables (state) and implements certain behaviours (methods).

Abstract Classes - permit declaration of classes that define only part of an implementation, leaving the subclasses to provide the details.

If an abstract class contains only abstract method declarations, it should be implemented as an interface instead.

Concrete super class - Use extends to make subclass

Abstract class (At least one method is abstract) - Use extends to make subclass

Completely abstract class (interface) - A subclass implements the interface

An interface can be implemented by many classes.

A class can implement many interfaces.

Interfaces can extend many interfaces.
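
A minimal sketch tying these together (the Shape, Printable, and Circle names are hypothetical, invented purely for illustration):

// An abstract class: defines part of an implementation, subclasses provide the rest.
abstract class Shape {
	protected String name; // state shared by all subclasses
	public Shape(String name) { this.name = name; }
	public abstract double area(); // abstract: declared here, defined in subclasses
	public String describe() { // concrete: inherited as-is
		return name + " with area " + area();
	}
}
// A completely abstract class: an interface.
interface Printable {
	void print();
}
// A concrete subclass extends the abstract class and implements the interface.
class Circle extends Shape implements Printable {
	private double radius;
	public Circle(double radius) { super("circle"); this.radius = radius; }
	public double area() { return Math.PI * radius * radius; }
	public void print() { System.out.println(describe()); }
}
class ShapeDemo {
	public static void main(String[] args) {
		new Circle(2.0).print(); // circle with area 12.566370614359172
	}
}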

Generics

Used in classes and methods. Generics allow data structures to be reused in other applications: because type specifics are factored out, the structure becomes more flexible.

But making it too general means that bugs may not be caught until runtime.

Generics = Type Safety - check for errors at compile time. Specific types can be chosen by the developer when using the data structure.

Performance - no need to type check at run time.

Reusability - breaks the tight coupling between a data structure and the application using it, allowing it to be reused.

ArrayList<String> words = new ArrayList<String>(); // element type fixed to String, instead of a raw list defaulting to Object

public static void printCollection(List<?> c);

<?> - any type. unbounded wildcard type. List of unknowns.

public static void printCollection(List<? extends T> c); <? extends T> - any subclass of T, including T itself. also known as "bounded wildcard".

<?> = <? extends Object> (Object is top of hierarchy)

Methods:

public static <T> void fromArrayToList(T[] a, List<T> c) { - one type parameter, T; this is unbounded. Note the method is void, so it fills the list passed in:

fromArrayToList(movies, moviesA); // OK
fromArrayToList(words, wordsA); // OK

public <T extends Item> List<T> startsWithA(List<T> list); - the type parameter T here is restricted to Item and its subclasses. Let Movie be a subclass of Item.

List<Movie> moviesA = startsWithA(movies); // OK
List<String> wordsA = startsWithA(words); // COMPILE TIME ERROR

public <T extends Item & Priced> void pricedItems(List<T> list); - T must be a subclass of Item that also implements Priced. Let Movie be a subclass of Item that implements Priced, and let CD implement Priced without being a subclass of Item.

pricedItems(movies); // OK because Movie is both an Item and Priced
pricedItems(cds); // COMPILE TIME ERROR because CD is Priced but not an Item
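
Putting one of these signatures into a compilable sketch (Item, Movie, and the method body are stand-ins I've invented; the page above only gives the signatures):

import java.util.ArrayList;
import java.util.List;

class Item {
	String title;
	Item(String title) { this.title = title; }
}
class Movie extends Item {
	Movie(String title) { super(title); }
}
public class GenericsDemo {
	// T is bounded: only Item or its subclasses are accepted
	public static <T extends Item> List<T> startsWithA(List<T> list) {
		List<T> result = new ArrayList<T>();
		for (T t : list) {
			if (t.title.startsWith("A")) {
				result.add(t);
			}
		}
		return result;
	}
	public static void main(String[] args) {
		List<Movie> movies = new ArrayList<Movie>();
		movies.add(new Movie("Alien"));
		movies.add(new Movie("Blade Runner"));
		System.out.println(startsWithA(movies).size()); // 1
		// startsWithA(new ArrayList<String>()); // compile-time error: String is not an Item
	}
}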

Control flow

how to decide what statements to execute.

if-statements

loops: while, for

Loops can be expressed as recursion or iteration.

Examples of the two in C++

#include <iostream>
#include <string>
std::string reverse(std::string a);
std::string reverseIt(std::string a);
int main ()
{
	std::string s0 ("0123456789");
	std::cout << reverse(s0) << std::endl;
	std::cout << reverseIt(s0) << std::endl;
	return 0;
}
// recursive version: reverse the middle, then swap the outer characters around it
std::string reverse(std::string a)
{
	std::cout << a << std::endl;
	int length = a.length();
	if(length<=1)
	{
		return a;
	}
	std::string last (1, a.at(length-1));   // last character moves to the front
	std::string middle (reverse(a.substr(1,length-2)));
	std::string first (1, a.at(0));         // first character moves to the back
	return last + middle + first;
}
// iterative version: swap characters pairwise from both ends towards the middle
std::string reverseIt(std::string a)
{
	for (std::string::size_type i = 0; i < a.length()/2; i++)
	{
		char temp1 = a.at(a.length()-1-i);
		char temp2 = a.at(i);
		a.replace(i, 1, 1, temp1);
		a.replace(a.length()-1-i, 1, 1, temp2);
	}
	return a;
}

Examples of the two in Java

public class Main {
	public static void main(String[] args) {
		String to = args[0];
		System.out.println(to);
		System.out.println(reverse(to));
		System.out.println(reverseIt(to));
	}
	// recursive version: last char + reversed middle + first char
	public static String reverse(String a){
		System.out.println(a);
		int length = a.length();
		if(length<=1){
			return a;
		}
		return a.charAt(length-1)+
		reverse(a.substring(1,length-1))+
		a.charAt(0);
	}
	// iterative version: build a new string back to front
	public static String reverseIt(String a){
		String newstring = ""; // Strings are immutable in Java: you can't change them in place.
		                       // Instead of swapping, build a new one.
		for(int i = a.length()-1; i >= 0; i--){
			newstring += a.charAt(i);
		}
		return newstring;
	}
}

Note: iteration uses a loop (in this case a for, though a while would also work); recursion uses an if-statement to check the base case, and if the base case is not reached, the method calls itself.

FizzBuzz

for i in range(1, 20):
    if i % 3 == 0 and i % 5 == 0:
        print("fizzbuzz")
    elif i % 3 == 0:
        print("fizz")
    elif i % 5 == 0:
        print("buzz")
    else:
        print(i)

Data Types

All data in digital computers are represented as bits (0s and 1s) at the lowest level. Stick a few of these together and you can get specific data types. For example:

  • integers (whole numbers), which can be represented by any number of bits, from 4 to 128 to n (though 8 to 64 is typical).
  • floating-point values (real numbers, i.e. numbers with decimal digits) can be made up of 16 to 128 bits, with some bits forming the significand and the other bits forming the exponent.
  • booleans (store "true" or "false") can be represented by one bit.
  • characters (letters 'A' to 'Z' and 'a' to 'z', digits '0' to '9', punctuation marks like '!', and control characters like '\n') are commonly represented by 8 to 32 bits, e.g. as ASCII or Unicode.
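
As a concrete illustration of the floating-point layout, Java's Float.floatToIntBits exposes the 32 bits of a float (1 sign bit, 8 exponent bits, 23 significand bits); a small sketch:

public class FloatBits {
	public static void main(String[] args) {
		int bits = Float.floatToIntBits(-6.25f);
		int sign = (bits >>> 31) & 0x1;          // 1 sign bit
		int exponent = (bits >>> 23) & 0xFF;     // 8 exponent bits (biased by 127)
		int significand = bits & 0x7FFFFF;       // 23 significand bits
		System.out.printf("sign=%d exponent=%d significand=%d%n",
				sign, exponent, significand);
	}
}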

Data Structures

One of the most important uses of computers is to store data. Stick a few data types together and you can get a data structure. Data structures are classes used to organize data and to provide operations upon them.

The most common data structure is the array.

An array is a basic data structure; indeed, arrays are among the oldest and most important data structures. An array is a collection of variables, each identified by an index. Arrays are important because they mimic the way memory (RAM) works: RAM stores the most fundamental unit of data, the byte ("the smallest addressable unit of memory"), in a linear order, so that each byte is addressable by what is known as a memory address. An array index can be calculated at runtime, which makes it easy to do things like go through all the variables. Let's say you have a collection of test scores and you want to know the average mark. Instead of doing:

int score1 = 31
int score2 = 74
int score3 = 44
int score4 = 5
int score5 = 62
int score6 = 50
int score7 = 3
int score8 = 40
int score9 = 55
int score10 = 62

double average = (score1 + score2 + score3 + score4 + score5 + score6 + score7 + score8 + score9 + score10)/10.0;

you can do

int score[] = new int[10];
score[0] = 31;
score[1] = 74;
score[2] = 44;
score[3] = 5;
score[4] = 62;
score[5] = 50;
score[6] = 3;
score[7] = 40;
score[8] = 55;
score[9] = 62;

double average = 0;
for(int i = 0; i < score.length; i++){
	average += score[i];
}
average /= score.length;
// Iteration (going through every element and printing it)
for(int i = 0; i < score.length; i++){
	System.out.println(score[i]);
}

Arrays are one of the most basic data structures there is (though not a primitive type like char, int, or float; in Java an array is an object). More complicated data types are built up from basic ones: for example, a string is essentially an array of chars. Arrays mimic the way computers store data in memory; memory locations and their addresses are sequential, just like array contents and indexes.

The data structure used for a particular algorithm can greatly impact its performance.

Asymptotic Analysis

Data structures provide operations, and these operations can be measured by their efficiency. Asymptotic analysis examines how the efficiency of a data structure changes as the data structure's size approaches infinity: for each new element stored, how are the running times of the data structure's operations affected? The notation used for asymptotic analysis is big O notation. For example, O(n) denotes linear growth, which is the run time for searching an unsorted array: the number of steps grows linearly as the size of the array grows.

  • Asymptotic analysis is only concerned with sizes of data approaching infinity. In some cases it may be preferable to use an asymptotically inefficient algorithm, if it will only run on small sets of data or will be used infrequently, because other factors come into play. Compare merge sort with bubble sort. Merge sort splits the array in half, recursively sorts each half, and then merges them back together. Bubble sort has two nested for loops: every adjacent pair of values that is out of order is swapped, so each value eventually bubbles to its correct location. Asymptotically merge sort is much better (O(n log n) compared to O(n^2)). For small sets, however, it may be preferable to use bubble sort: splitting, recursive function calls, and recombining arrays is more complicated than merely swapping array values.

Merge sort performs fewer steps, but each step is more computationally expensive than the steps involved in bubble sort. For sets of small sizes this may make a significant difference.
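
Since bubble sort is described above but not shown anywhere on this page, here is a minimal sketch of the two nested loops (my own illustration):

public class BubbleSort {
	public static void bubbleSort(int[] a) {
		// after each pass of the outer loop, the largest remaining
		// value has "bubbled" to the end of the unsorted part
		for (int i = 0; i < a.length - 1; i++) {
			for (int j = 0; j < a.length - 1 - i; j++) {
				if (a[j] > a[j + 1]) { // swap the adjacent pair if out of order
					int temp = a[j];
					a[j] = a[j + 1];
					a[j + 1] = temp;
				}
			}
		}
	}
	public static void main(String[] args) {
		int[] a = {31, 74, 44, 5, 62};
		bubbleSort(a);
		for (int v : a) System.out.println(v);
	}
}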

Arrays

Properties

  • Contents must be all of the same type (or derived type)
  • Contents are stored in contiguous memory (one after another with nothing in between).
  • Array elements can be accessed directly. Eg. arrayName[i] to access the ith element of the array arrayName.

Operations

  • Allocation
  • Accessing - O(1) constant time because of the contiguous property. It is easy to calculate the location of where a value should go.
  • Searching an unsorted array - O(n). OK for small arrays; other data structures should be used for larger amounts of data.
  • Searching a sorted array - binary search gives O(log n) time (see the sketch below).
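
A sketch of binary search on a sorted int array, as referenced in the list above (my own illustration, not code from the original page):

public class BinarySearch {
	// returns the index of target in the sorted array a, or -1 if absent
	public static int binarySearch(int[] a, int target) {
		int lo = 0, hi = a.length - 1;
		while (lo <= hi) {
			int mid = lo + (hi - lo) / 2;       // midpoint, written to avoid overflow
			if (a[mid] == target) return mid;
			if (a[mid] < target) lo = mid + 1;  // discard the left half
			else hi = mid - 1;                  // discard the right half
		}
		return -1;
	}
	public static void main(String[] args) {
		int[] sorted = {3, 5, 31, 40, 44, 50, 55, 62, 74};
		System.out.println(binarySearch(sorted, 40)); // 3
		System.out.println(binarySearch(sorted, 41)); // -1
	}
}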

Abstract Data Types (ADTs)

You can abstract from this: specify a data structure in terms of what it can do (functionality) rather than how it works (its specific representation). For example, a List data structure allows us to insert or delete elements in a way that arrays do not, even though the List itself can be implemented by an array.

Abstract data types are mathematical models for certain classes of data structures or types defined by the operations that may be performed on it.

Imagine ice cube makers. A user of an ice cube doesn't need to know about the evaporator, the condenser, the compressor and the throttle valve. These components form a black box which a user doesn't need to worry about. All a user needs to know is to attach this black box to a water and power supply. Then for day to day operation a user presses a button and expects ice to come out. An ice cube maker also has different states it is in, like if it is empty or full.

Similarly for abstract data types a user doesn't need to know the finer details of the data structure. All the user needs to know it what operations it can perform and how to perform these.

ADTs allow us to think at a higher level about algorithms. Algorithms work on ADTs.

All the following data structures are abstractions of arrays.

List

The problem with arrays is that they are fixed in size. To use an array you need to initialize it with a fixed size, and after it's created it can't be resized. You can, however, resize it manually:

// C# example: create an integer array with three elements
int [] fib = new int[3];
fib[0] = 1;
fib[1] = 1;
fib[2] = 2;
// Redimension fib to a 10 element array
int [] temp = new int[10];
// Copy the fib array to temp
fib.CopyTo(temp, 0);
// Assign temp to fib
fib = temp;

A better way would be to do this automatically. Create a new data structure that wraps around an array, provides read write access to the array and automatically resizes as needed.

The List class is like an array in that it is a homogeneous data structure: all elements of a List are of the same type (generics can be used to enforce this).

// Create a List of integers
List<int> myFavoriteIntegers = new List<int>();

// Create a list of strings
List<string> friendsNames = new List<string>();

The List class is an ADT that takes a basic array and wraps it in a class that hides the implementation complexity. There is no need to set an explicit initial size, and when adding items to the list there is no need to worry about resizing the internal array. List also implements basic operations like Sort(), Find(), and Contains(), and lets you iterate through all the elements.

The asymptotic running time of the List's operations are the same as those of the standard array's. While the List does indeed have more overhead, the relationship between the number of elements in the List and the cost per operation is the same as the standard array.

Each instance of a value is called an element of the list. While duplicate values can occur in a list, each element is considered a distinct thing identified by its index.

Operations on a list include creating a new, blank list. After creating it, you can add items to the start or the end of the list. After adding some elements you may want to find the first item, the last item, or whether there are any items at all.

The List ADT can be implemented in two main ways: as an ArrayList or as a LinkedList. The C# List above is an ArrayList. It is also possible to make a list by linking nodes together in a chain.

LinkedList

The abstract concept of a list can be implemented in (mainly) two ways. One way is a linked list. In a linked list, the elements are nodes. Each node is composed of a stored value, like an integer or a string, and a reference to the next node. Having each node be responsible for the next node allows for efficient insertion and removal of elements at any position. The downside is that to search through all the nodes, to get to a specific node, or even just to get to the end, you must start at the beginning and follow the references to where you want to be.

class Node {
    public int data;
    public Node nextLink;
    //Node constructor
    public Node(int d) {
	    data = d;
    }
    //Print Node data
    public void printLink() {
	    System.out.print("{" + data + "} ");
    }
}
class LinkList {
    private Node first;
    //LinkList constructor
    public LinkList() {
	    first = null;
    }
    //Returns true if list is empty
    public boolean isEmpty() {
	    return first == null;
    }
    //Inserts a new Node at the first of the list
    public void insert(int d) {
	    Node link = new Node(d);
	    link.nextLink = first;
	    first = link;
    }
    //Deletes the link at the first of the list
    public Node delete() {
	    Node temp = first;
	    first = first.nextLink;
	    return temp;
    }
    //Prints list data
    public void printList() {
	    Node currentLink = first;
	    System.out.print("List: ");
	    while(currentLink != null) {
		    currentLink.printLink();
		    currentLink = currentLink.nextLink;
	    }
	    System.out.println("");
    }
}
class LinkListTest {
    public static void main(String[] args) {
	    LinkList list = new LinkList();
	    list.insert(1);
	    list.insert(2);
	    list.insert(5);
	    list.insert(4);
	    list.insert(3);
	    list.printList();
	    while(!list.isEmpty()) {
			Node deletedLink = list.delete();
			System.out.print("deleted: ");
			deletedLink.printLink();
			System.out.println("");
			list.printList();
	    }
    }
}

ArrayList

Another way to implement a list is with an array list. An array list is similar to an array except that it is not fixed in size; a count is kept of the number of elements. An array list is backed by an array, and when this array is close to getting full a new one is created and the old data is copied over.

import java.util.ArrayList;
public class ArrayListTest {
	public static void main(String[] args) {
		ArrayList<String> dictionary = new ArrayList<String>();
		dictionary.add("Hello");
		dictionary.add("World");
		System.out.println(dictionary.get(0));
		System.out.println(dictionary.get(1));
	}
}

ArrayList vs LinkedList

Linked lists have several advantages over array lists.

Insertion/Deletion: inserting or deleting an element in a linked list simply requires rewriting the references of the surrounding nodes. This is a constant time operation if you are already at the node in question; otherwise, starting from the beginning, it takes O(n) time to walk to a specific node.

Insertion or deletion of elements in an arraylist presents some interesting problems.

Insertion or deletion of an element in an array list requires shifting half of the elements on average, and all of the elements in the worst case; this is the situation when elements are inserted or deleted at the front or in the middle, and it takes Θ(n) time. Inserting at the end takes Θ(1) time because a variable keeps track of the location of the last element (by keeping track of the size). However, when the underlying array is almost full, a new one has to be allocated and repopulated, an expensive operation that runs in Θ(n) time. Since this doesn't happen often (the size is typically doubled, so new allocations become rarer and rarer), it doesn't dominate the cost of insertion at the end. So insertion at the end happens in Θ(1) amortized time.
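
A quick worked example of why the doubling is cheap on average: appending n items to an array that doubles its capacity starting from 1 costs n writes for the appends themselves, plus copies of 1 + 2 + 4 + ... + n/2 < n elements during the resizes, for a total of fewer than 2n writes, i.e. Θ(1) amortized per append.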

Indexing: dynamic arrays (like the array data structure that underlies them) allow random access to any element in constant time, while in a linked list getting to a given element requires walking through the preceding elements.

This makes linked lists unsuitable for applications where it's useful to look up an element by its index quickly, such as heapsort.

Linked lists also require more memory, storing both the value and the reference to the next value, while an array list requires just enough to store the values plus a small fixed overhead.

The lesson, dear reader:

  • Arrays have O(1) random access, but are really expensive to add stuff onto or remove stuff from.
  • Linked lists are really cheap to add or remove items anywhere and to iterate, but random access is O(n).

Hashtable

Let's step away from ADTs and go back to arrays. Arrays are useful mostly because we can retrieve a value directly if we know its index: the ith element of an array can be read or written in constant time, O(1).

But most of the time, the index of an element has very little to do with the contents of that element. That means finding a specific value in an array involves going through all the indexes until you find the one you want: O(n). If the elements are sorted, this can be cut down to O(log n) with binary search.

If retrieving random values (as in retrieving values non-sequentially) from an array is something that happens a lot in an application, then it is important to get this run time down to O(1). This can be done by using a value derived from the item itself as the index, thus forming a relationship between the item and its position (index) in the array where previously there was none.

Suppose that each student has a name and an id number. The id number can serve as key to retrieve the value which is the name. Since the id number is already a numerical value, it can be used as an index to an array of Strings.

public class Hashtable {
	public static void main(String[] args) {
		//students have id numbers and names
		//1111111 Adam
		//2222222 Bob
		//3333333 Chris
		//4444444 David
		String[] students = new String[10000000];
		students[1111111] = new String("Adam");
		students[2222222] = new String("Bob");
		students[3333333] = new String("Chris");
		students[4444444] = new String("David");
		System.out.println( students[1111111] );
		System.out.println( students[2222222] );
		System.out.println( students[3333333] );
		System.out.println( students[4444444] );	
	}
}

The above code gives the basic idea. Of course it is inefficient to use the id number directly in this way (and if the key is not a number, it can't be used as an index at all).

What is needed is a function that takes a key and computes a unique number from it. This number is called a hash code and is used as an index into an array; the function that produces the hash code is called a hash function.

The benefits are huge. Hash tables are fast because the hash code can be used to access an array element directly. The complicated part is the hash function. The most important thing is not to implement it yourself; rely on prewritten hash table libraries.

E.g. in C#:

Dictionary<int, Employee> employeeData = new Dictionary<int, Employee>();
employeeData.Add(455110189, new Employee("Scott Mitchell"));
if (employeeData.ContainsKey(123456789)) ...

and in Java:

import java.util.HashMap;
HashMap<String, Boolean> blacklist = new HashMap<String, Boolean>();
blacklist.put("word", Boolean.TRUE);
if (blacklist.get("word")) ... // retrieving the Boolean value stored under a string key


These libraries also handle hash collisions, where the hash codes for two or more different keys are the same. The odds of collision increase as the table fills up; this is why hash tables are designed not to be filled beyond a certain percentage, called the load factor. Typically only about 70% of the array should be occupied (the load factor is often set at 0.72). When a collision occurs, a resolution needs to happen: the goal of a resolution is to find another place to store the item.

Closed addressing / open hashing ("chaining") - each array slot holds a chain (e.g. a linked list) of all the entries that hash to it. The worst case is that all values map to the same hash code, so everything ends up in one big chain attached to a single slot: O(n) to search, just like an unsorted array.

Open addressing / closed hashing ("probing") -

  • Linear probing: uses a start value and a fixed step size, cycling through slots over and over to find the next free space. newLocation = (startingValue + stepSize) % arraySize
  • Quadratic probing: uses a start value and a step size that grows on each attempt (e.g. 1, 4, 9, ...).
  • Double hashing: a second hash function supplies the step size.

The table may need to be resized when using this. Cycling: this is why it is important for the hashtable size and step size to be relatively prime, so that all slots can be stepped through and checked (typically the table size is simply made prime, which makes this easy). Let's say your hashtable size is 4 and your step size is 2: you would only ever check slots 0 and 2 for a free space, which isn't very helpful compared with checking all slots.
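
A toy sketch of insertion with linear probing as just described (fixed prime table size, step size 1, and no resizing or deletion, purely for illustration):

public class LinearProbingDemo {
	static Integer[] table = new Integer[11]; // prime table size

	static int hash(int key) {
		return key % table.length;
	}
	static void insert(int key) {
		int slot = hash(key);
		while (table[slot] != null) {         // collision: step to the next slot
			slot = (slot + 1) % table.length; // wrap around with modulo
		}
		table[slot] = key;
	}
	public static void main(String[] args) {
		insert(11); // hashes to slot 0
		insert(22); // also hashes to 0, probes to slot 1
		insert(33); // also hashes to 0, probes to slot 2
		for (int i = 0; i < table.length; i++) {
			System.out.println(i + ": " + table[i]);
		}
	}
}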

Hash functions are designed to give a unique-looking output for each input. They are used to verify that files are transferred correctly (say, over the internet) and in cryptography to securely store passwords. Passwords are not stored as entered; instead they are put through a hash function. Hash functions are designed to be irreversible, so there is no way to recover the original file or password from just the hash. They are also designed to have an avalanche effect, so that even the smallest change to the input gives a dramatically different hash; this is useful for detecting when a file downloaded over the internet is slightly corrupted.

Examples of hash functions include md5 and sha1. Both are used in cryptography and file integrity.

Techniques used in a hash function

  • division
    • h(k) = k % n, with k as the key and n a prime table size.
  • folding
    • divide the integer key, k, into sections
    • add, subtract and/or multiply them, combining them into the hash code, e.g.
      • k = 013402122
      • h(k) = 013 + 402 + 122 = 537
      • the sections can also be multiplied by prime numbers.
  • middle squaring
    • k = 013402122
    • mid = 402
    • 402 ^ 2 = 161604
    • h(k) = 6160 (the middle digits of the square)
  • truncation - delete part of the key
    • k = 013402122
    • h(k) = 122 (last 3 digits)
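
A sketch of these techniques applied to the key 013402122 from the examples above (toy illustrations, not production-quality hash functions):

public class HashTechniques {
	static int division(int k, int n) {
		return k % n;                    // n: prime table size
	}
	static int folding(int k) {
		// split 013402122 into the sections 013, 402, 122 and add them
		int h = 0;
		while (k > 0) {
			h += k % 1000;               // take the last three digits
			k /= 1000;
		}
		return h;                        // 122 + 402 + 13 = 537
	}
	static int middleSquare(int k) {
		int mid = (k / 1000) % 1000;     // middle section: 402
		int squared = mid * mid;         // 161604
		return (squared / 10) % 10000;   // middle digits of the square: 6160
	}
	static int truncation(int k) {
		return k % 1000;                 // keep the last three digits
	}
	public static void main(String[] args) {
		int k = 13402122;                // 013402122 with the leading zero dropped
		System.out.println(division(k, 97));
		System.out.println(folding(k));      // 537
		System.out.println(middleSquare(k)); // 6160
		System.out.println(truncation(k));   // 122
	}
}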

A good hash function gives a valid array index for each key, is fast, and produces hash codes that are uniformly distributed. This helps avoid collisions (where two different inputs give the same output).

A perfect hash function would give a different hash code (index value) for every key. Of course we don't live in a perfect world; collisions will always occur.

To resolve this, array locations with collisions hold a list of all the colliding values, in the technique called chaining. In the worst case all the hash codes are the same, meaning all the values end up in the same array location and thus in one list. Because it is a list, it takes O(n) to go through it all.

Summary of Hash table

Operation  Average  Worst case
Space      O(n)     O(n)
Search     O(1)     O(n)
Insert     O(1)     O(n)
Delete     O(1)     O(n)

ADTs: Queue, Stack, Priority Queues

Arrays are great because you can retrieve and set values at specific locations. List builds on this by automatically resizing the internal array when required, which makes it more flexible than a bare array.

These two data structures assume that existing values will be read a lot more often than new ones are inserted or old ones removed; they are geared for massive amounts of reads as opposed to writes or deletes. The problem is that inserting always takes up more space, because new space has to be allocated for the inserted values, and that space becomes wasted empty space as soon as the value is removed.

How to reuse space? Turn the linear array into a circular one: no start or end, just variables to keep track of the current position(s) and a method to grab the next one (using modulo).

 int increment(int variable) 
 {
    return (variable + 1) % theArray.Length; 
 }

Just like a list, there is an internal array with an initial fixed size that can later be resized by some growth factor (typically doubling).

These ADTs are really basic: you can only add and remove items. You can't retrieve a specific item by index as with an array, but you can still find out whether a value exists. The most important difference between them is how they remove items.

Queues, like queues in real life, operate on a first in, first out (FIFO) basis. The most recently added item has to wait until all the other items are removed before it can be removed. Examples: web servers (the first request gets processed first, then the next...), print queues, and other programs that handle multiple incoming requests.

Stacks, like stacks of plates, operate on a last in, first out (LIFO) basis. The most recently added item is the first one to be removed. Examples: plates at a buffet restaurant; the call stack, which keeps track of function invocations, especially in recursion.

Priority Queues are similar to queues, except they remove the highest-priority item first (in Java's PriorityQueue, the smallest by default, or as defined by a Comparator) instead of the oldest.

Methods :

Queues - FIFO:

  • Enqueue() - checks capacity, adds the element at the current "tail" index, increments the index using modulo, and grows the array if necessary.
  • Dequeue() - returns the element at the "head" index, nulls that slot, and increments the head index using modulo.
  • Peek() - same as Dequeue(), except nothing is removed or incremented.
  • Contains() - no random access, but you can still check whether an item is in the queue.
  • ToArray() - gets an array representation of the current queue.

Stacks - LIFO:

  • Push() - adds an item to the top of the stack
  • Pop() - removes the item at the top of the stack
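
A minimal sketch of the circular-array queue these methods describe (fixed capacity with no resizing, to keep it short; a real implementation would grow the array as discussed above):

public class CircularQueue {
	private int[] theArray;
	private int head = 0; // index of the next element to dequeue
	private int tail = 0; // index where the next element is enqueued
	private int size = 0;

	public CircularQueue(int capacity) {
		theArray = new int[capacity];
	}
	private int increment(int variable) {
		return (variable + 1) % theArray.Length; // wrap around the array -- see note below
	}
	public void enqueue(int value) {
		if (size == theArray.length) {
			throw new IllegalStateException("full"); // a resizable queue would grow here
		}
		theArray[tail] = value;
		tail = increment(tail);
		size++;
	}
	public int dequeue() {
		if (size == 0) {
			throw new IllegalStateException("empty");
		}
		int value = theArray[head];
		head = increment(head);
		size--;
		return value;
	}
	public static void main(String[] args) {
		CircularQueue q = new CircularQueue(3);
		q.enqueue(1); q.enqueue(2);
		System.out.println(q.dequeue()); // 1
		q.enqueue(3); q.enqueue(4);      // reuses the slot freed by the dequeue
		System.out.println(q.dequeue()); // 2
	}
}

(Note: in Java the array length is theArray.length, lowercase; the earlier snippet's theArray.Length is C# syntax, so use the lowercase form here.)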

Here is the basic idea of how to use them.

import java.util.LinkedList;
import java.util.Queue;
import java.util.Stack;
import java.util.PriorityQueue;
import java.util.Comparator;
public class HelloWorld {
	public static void main(String[] args) {
                Stack<Integer> lifo = new Stack<Integer>();
                for (int i = 1; i <= 10; i++)
                {
                        lifo.push ( new Integer(i) );
                }
                while ( !lifo.empty() )
                {
                        System.out.print ( lifo.pop() );
                        System.out.print ( ',' );
                }
                //10,9,8,7,6,5,4,3,2,1,
                System.out.println("");
                Queue<Integer> fifo = new LinkedList<Integer>(); //Queue is an interface; LinkedList provides an implementation of it
                for (int i = 1; i <= 10; i++)
                {
                	fifo.offer( new Integer(i) );
                }
                while ( !fifo.isEmpty() )
                {
                        System.out.print ( fifo.poll() );
                        System.out.print ( ',' );
                }
                //1,2,3,4,5,6,7,8,9,10,
                System.out.println("");
                PriorityQueue<Integer> pq1 = new PriorityQueue<Integer>();
                pq1.offer(new Integer(1));
                pq1.offer(new Integer(7));
                pq1.offer(new Integer(3));
                pq1.offer(new Integer(5));
                while ( !pq1.isEmpty() )
                {
                        System.out.print ( pq1.poll() );
                        System.out.print ( ',' );
                }
                //1,3,5,7,
                System.out.println("");
                PriorityQueue<Integer> pq2 = new PriorityQueue<Integer>(1,
                        new Comparator<Integer>(){
                                public int compare(Integer a, Integer b){
                                        if (a>b) return -1;
                                        if (a<b) return 1;
                                        return 0;
                                }
                        });
                pq2.offer(new Integer(1));
                pq2.offer(new Integer(7));
                pq2.offer(new Integer(3));
                pq2.offer(new Integer(5));
                while ( !pq2.isEmpty() )
                {
                        System.out.print ( pq2.poll() );
                        System.out.print ( ',' );
                }
                //7,5,3,1,
                System.out.println("");
	}
}

Array based implementation of these ADTs in Java:

Stack: http://alaning.me:8080/root/Stack/blob/master/Stack.java

Queue: http://alaning.me:8080/root/Queue/blob/master/Queue.java

Implementation in C: http://alaning.me:8080/root/Stackinc

Trees and Graphs

See: Graphs

Skip list

A skip list is a variation of the linked list data structure that achieves running times comparable to a self-balanced binary tree.

What we have so far:

binary tree node

  • data
  • references left and right child

linked list node

  • data
  • references next node

binary tree data structure

  • contains reference to root node

linked list data structure

  • contains reference to head of list

linked list performance

  • searches - same as an array (need to iterate over the elements one by one)
  • deletions - find the node as above, then reassign the neighbouring references
  • inserting, unordered - can just add the node pointing at the current head; the linked list data structure will now point to this as the new head
  • inserting, ordered - need to iterate to where it belongs (no direct access / indexing)

Linked lists do not offer better search running times than arrays, but they do make adding and removing items easier, especially in sorted lists: simply change a few references, with no need to resize an array and shift items down.

use cases:

  • array list: the upper bound on the number of items is known
  • linked list: no idea how many elements will be stored
  • self-balancing BST - has to be sorted
  • skip list - has to be sorted

This sorted property allows look-ups to be faster, because it makes traversals more predictable, and you can skip over some elements to get to the intended destination.

This is used in skip lists. A skip list is like a linked list, but some nodes carry additional links to nodes further ahead (a top link as well as the bottom link). Nodes are still linked as usual, but the extra links let you skip ahead more quickly; which nodes get the extra links is chosen randomly.

In a skip list the head node does not contain any data. only pointers to other nodes.
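
A sketch of the node shape this describes, with an array of forward pointers per node and a search that starts at the top level and drops down (my own illustration; insertion is omitted for brevity):

import java.util.Random;

class SkipNode {
	int value;
	SkipNode[] next; // next[0] is the ordinary linked-list link; next[1..] skip further ahead
	SkipNode(int value, int height) {
		this.value = value;
		this.next = new SkipNode[height];
	}
}
class SkipListSketch {
	static final Random rng = new Random();

	// coin-flip heights: height h occurs with probability 1/2^h
	static int randomHeight(int maxHeight) {
		int h = 1;
		while (h < maxHeight && rng.nextBoolean()) h++;
		return h;
	}
	// search: skip ahead on the top level, drop down a level when we would overshoot
	static boolean contains(SkipNode head, int target) {
		SkipNode current = head; // head holds no data, only pointers
		for (int level = head.next.length - 1; level >= 0; level--) {
			while (current.next[level] != null && current.next[level].value < target) {
				current = current.next[level]; // skip ahead on this level
			}
		}
		SkipNode candidate = current.next[0];
		return candidate != null && candidate.value == target;
	}
	public static void main(String[] args) {
		SkipNode head = new SkipNode(0, 2); // head stores no data
		SkipNode a = new SkipNode(3, 1), b = new SkipNode(7, 2), c = new SkipNode(9, 1);
		head.next[0] = a; a.next[0] = b; b.next[0] = c; // level 0 links every node
		head.next[1] = b;                               // level 1 skips over node a
		System.out.println(contains(head, 7)); // true
		System.out.println(contains(head, 8)); // false
	}
}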

Analysis

Search, insertion, and deletion running times are asymptotically bounded by log n (average case).

The worst case is linear time (rare): if the randomly chosen heights all come out the same, the skip list degenerates into a normal linked list.

This is the same asymptotic running time as a self-balanced BST. Skip lists are sometimes used over red-black trees because they are more open to concurrent access and modification: when a red-black tree is modified it needs to be rebalanced into a specific shape, and rebalancing can affect large parts of the tree, which requires mutual-exclusion locks on those parts. Inserting into a skip list is far more localized (heights are random, with no global invariant), so only the nodes directly connected to the affected node need to be locked.

See more: https://en.wikipedia.org/wiki/Skip_list

External Links

Data Structure Visualizations - http://www.cs.usfca.edu/~galles/visualization/Algorithms.html

Summary of Data Structures

Structure                   Linked list         Array  ArrayList       Balanced tree  Random access list
Indexing                    Θ(n)                Θ(1)   Θ(1)            Θ(log n)       Θ(log n)
Insert/delete at beginning  Θ(1)                N/A    Θ(n)            Θ(log n)       Θ(1)
Insert/delete at end        Θ(n)                N/A    Θ(1) amortized  Θ(log n)       Θ(log n) updating
Insert/delete in middle     search time + Θ(1)  N/A    Θ(n)            Θ(log n)       Θ(log n) updating
Wasted space (average)      Θ(n)                0      Θ(n)            Θ(n)           Θ(n)

Algorithms

An algorithm is a clearly stated, step-by-step method of solving a problem.

There are often many ways to solve the same problem. 

Example

Imagine you are trying to compress an audio file to make it smaller. 

There are many algorithms to achieve this depending on what you want. 

Lossy audio compression algorithms throw out data to reduce file size. This works because the human brain can't perceive all the sounds, especially when multiple sounds are being played at the same time.

Lossless audio compression algorithms work by modelling future values of the digital signal in terms of previous values. So imagine your audio stream looked like this: [ABCDEFABCDGHABCDEFABCDIJ]. You could rewrite it as: [^=ABCD][^EF^GH^EF^IJ]. No information is permanently lost, but the file size is reduced and the original can be reconstructed.
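
A toy sketch of that substitution idea (the '^' encoding is made up for illustration; real codecs model the signal rather than literal byte patterns):

public class ToyLossless {
	public static void main(String[] args) {
		String stream = "ABCDEFABCDGHABCDEFABCDIJ";
		String block = "ABCD"; // the repeated pattern
		String compressed = "[^=" + block + "]" + stream.replace(block, "^");
		System.out.println(compressed); // [^=ABCD]^EF^GH^EF^IJ
		// decompression reverses the substitution, so no information is lost
		String decompressed = compressed
				.substring(compressed.indexOf(']') + 1)
				.replace("^", block);
		System.out.println(decompressed.equals(stream)); // true
	}
}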

In both cases, file sizes are reduced because you eliminate redundant information. 

A program is a sequence of computer instructions implementing the algorithm.

In audio and video compression, these programs are called codecs. Lossless codecs include Free Lossless Audio Codec (FLAC) and Apple Lossless (ALAC); lossy codecs include MP3 and Windows Media Audio (WMA).

Complexities (Big O)

Big-O notation is a relative representation of the complexity of an algorithm.

Sorting Algorithms

One of the most important uses of computers is to sort data. 

Sorting a list assumes that everything on the list can be compared with everything else. Numbers can be compared by size; words can be compared alphabetically. These are examples of totally ordered sets. Most sorting algorithms are comparison based: they use an order relation on keys to decide what goes where.

Selection Sort

Let's say you wanted to sort the list of scores above. One way is to grab the smallest element in the list and stick it in front, then grab the next smallest element and stick it right next to the first one, and so on. This forms the basis of selection sort.

public class SelectionSort {
	public static void main(String[] args) {
		int score[] = new int[10];
		score[0] = 31;
		score[1] = 74;
		score[2] = 44;
		score[3] = 5;
		score[4] = 62;
		score[5] = 50;
		score[6] = 3;
		score[7] = 40;
		score[8] = 55;
		score[9] = 62;
		// lets go through the array from left to right
		for (int i = 0; i < score.length; i++) {
			int min = i; // assume the smallest remaining element is at i
			// lets go through the part of the array we haven't seen yet
			// and pick out the smallest element we can find
			for (int j = i + 1; j < score.length; j++) {
				if (score[j] < score[min]) {
					min = j;
				}
			}
			// if the smallest element isn't the one we are on right now
			// swap them around
			if (min != i) {
				int temp = score[i];
				score[i] = score[min];
				score[min] = temp;
			}
			// the elements previous to this iteration
			// are all sorted correctly!
		}
		// show off the sorted list
		for(int i = 0; i<score.length; i++){
			System.out.println(score[i]);
		}
	}
}

Selection sort is "in place": it does not take any additional space to perform the sorting. As you can see in the code, elements are merely swapped, not copied to another buffer. Because swapping occurs, pairs of elements with the same key can come out in a different order than they went in, so selection sort is not stable:

Suppose 3a = 3b (equal keys on distinct elements):

3b 1 3a 4 2 5
1 3b 3a 4 2 5
1 2 3a 4 3b 5   // 3b has jumped past 3a: not stable!
1 2 3a 3b 4 5

Selection sort makes the smallest possible number of swaps (writes), which makes sense where writes are expensive (e.g. flash memory, which has a limited write lifespan). Selection sort is a very simple algorithm, but it is inefficient (especially on large lists) and generally performs worse than the similar insertion sort.

Insertion Sort

Insertion sort is another simple sorting algorithm that builds up a final sorted list one element at a time. Similarly to selection sort it is not as efficient on large lists, however it has some unique advantages.

For each element in the unsorted part of the input list, find where it should go and put it there, shifting the other elements up. This sorting algorithm is not unlike the method card players use to sort their hand: take a card and put it where it should be.

public class InsertionSort {
	public static void main(String[] args) {
		int score[] = new int[10];
		score[0] = 31;
		score[1] = 74;
		score[2] = 44;
		score[3] = 5;
		score[4] = 62;
		score[5] = 50;
		score[6] = 3;
		score[7] = 40;
		score[8] = 55;
		score[9] = 62;
		// lets go through the array from left to right
		// but skip the first one
		for (int i = 1; i < score.length; i++) {
			// lets remember where we are now
			// and what we are looking at
			int currentValue = score[i];
			int holePos = i;
			// we now look back and try to find where this current element should go
			// we shift the elements up one while doing this
			// thus leaving a hole
			while(holePos>0 && currentValue < score[holePos-1]){
				score[holePos] = score[holePos -1];
				holePos = holePos - 1;
			}
			// our current value goes into the hole we made 
			score[holePos] = currentValue;
			// the elements previous to this iteration are sorted
			// relative to each other, although elements that belong
			// between them may still be inserted later
		}
		// show off the sorted list
		for(int i = 0; i<score.length; i++){
			System.out.println(score[i]);
		}
	}
}

Unlike selection sort, insertion sort is stable: there are no swaps, only shifts. It also happens in place, with no extra buffer needed.

Selection vs Insertion Sort

selection sort: the array has a sorted subarray and an unsorted subarray
insertion sort: the array has a sorted subarray and an unsorted subarray

selection sort: grab the min element from the unsorted part (which takes some effort to find)
insertion sort: grab the next element from the unsorted part

selection sort: and put it in the current place
insertion sort: and put it in the right place (which takes some effort to find)

selection sort: swaps
insertion sort: shifts

Analysis

Worst case performance (assumes the list is sorted in exactly the opposite order):

Selection sort: O(n^2) - finding lowest element requires scanning all n elements, next lowest requires scanning (n-1) elements, etc..

Insertion sort: O(n^2) - every iteration of the inner loop will scan and shift the entire sorted subsection of the array before inserting the next element

Best case performance (assumes the list is already in the correct order):

Selection sort: O(n^2) - finding the lowest element still requires scanning all n elements, the next lowest requires scanning (n-1) elements, and so on.

Insertion sort: O(n) - during each iteration, the first element of the unsorted subarray is compared only to the rightmost element of the sorted subarray.

It takes more effort to find the smallest / largest element than it takes to find the right place to put an element.

Additional notes

Because insertion sort performs its heavy lifting on the sorted part of the list (finding the correct place to insert), it has a unique property: it can sort a list without seeing all of it, which means it can sort elements as soon as it receives them. This makes it an online algorithm.

Note on 'in-place': an algorithm is in-place if it uses only a fixed amount of additional space, independent of n.

Merge Sort

Rather than iterating through the entire list in one way or another, merge sort splits the list into two halves, a left half and a right half. It splits the list recursively until only pieces of size one remain; then the pieces are merged back together in the correct order.

public class MergeSort{
	public static void main(String[] args) {
		int score[] = new int[10];
		score[0] = 31;
		score[1] = 74;
		score[2] = 44;
		score[3] = 5;
		score[4] = 62;
		score[5] = 50;
		score[6] = 3;
		score[7] = 40;
		score[8] = 55;
		score[9] = 62;
		int[] output = mergeSort(score);
		for(int i : output){
			System.out.println(i);
		}
	}
	public static int[] mergeSort(int a[]){
		// if list size is 0 (empty) or 1, consider it sorted and return it
		// (using less than or equal prevents infinite recursion for a zero length m)		
		if(a.length<=1){
			return a;
		}
		// else list size is > 1, so split the list into two sublists
		int middle = a.length / 2;
		int[] left = new int[middle];
		int[] right = new int[a.length-middle];
		// populate left and right sublists
		for(int i = 0; i<middle;i++){
			left[i] = a[i];
		}
		int j = 0;
		for(int i = middle; i<a.length;i++){
			right[j] = a[i];
			j++;
		}
		left = mergeSort(left);
		right = mergeSort(right);
		return merge(left, right);
	}
	public static int[] merge(int[] left, int[] right){
		int[] result = new int[left.length+right.length];
		int righti = 0;
		int lefti = 0;
		int i=0;
		while(i<result.length){
			// if there are left elements and right elements remaining
			// then choose the smaller one
			// otherwise if there are only left or only right elements remaining
			// then choose the one that has elements
			if(lefti<left.length && righti<right.length && left[lefti]<=right[righti]){
				result[i] = left[lefti];
				lefti++;
			}else if(lefti<left.length && righti<right.length && left[lefti]>right[righti]){
				result[i] = right[righti];
				righti++;
			}else if(lefti>=left.length && righti<right.length){
				result[i] = right[righti];
				righti++;
			}else if(righti>=right.length && lefti<left.length){
				result[i] = left[lefti];
				lefti++;
			}
			i++;
		}
		return result;
	}
}

Quick Sort

In a similar way to merge sort, quick sort also divides the list into two smaller sublists. However, unlike merge sort, which splits the list in half, quick sort splits the list around a pivot element. A pivot element is an element of the list, either the first one or one chosen at random, and the list is reordered so that elements less than the pivot come before it and elements greater than the pivot come after it. This is the partition operation, and it works by swapping elements over from both sides. The partition operation is then applied recursively to the smaller sublists.

public class QuickSort{
	public static void main(String a[]){
		int score[] = new int[10];
		score[0] = 31;
		score[1] = 74;
		score[2] = 44;
		score[3] = 5;
		score[4] = 62;
		score[5] = 50;
		score[6] = 3;
		score[7] = 40;
		score[8] = 55;
		score[9] = 62; // duplicate values are fine now that the recursion excludes the placed pivot (see below)
		System.out.println("start");
		quickSort(score,0,score.length-1);
		for(int i : score){
			System.out.println(i);
		}
	}
	public static void quickSort(int array[],int left, int right){
		if(left<right)
		{
			// Pick an element, called a pivot, from the list;
			// partition() returns the pivot's final resting place.
			int pivot = partition(array, left, right);
			// Recursively apply the above steps to the sub-list of
			// elements with smaller values and separately the sub-list of
			// elements with greater values. The pivot itself is already in
			// its final position, so it is excluded from both calls
			// (including it can recurse forever on duplicate values).
			quickSort(array, left, pivot-1);
			quickSort(array, pivot+1, right);
		}
	}
	/*
	 * This is the in-place partition algorithm. It partitions the portion of the array between indexes 
	 * left and right, inclusively, by moving all elements less than array[pivotIndex] before the pivot, 
	 * and the equal or greater elements after it. In the process it also finds the final position for 
	 * the pivot element, which it returns. It temporarily moves the pivot element to the end of the subarray, 
	 * so that it doesn't get in the way. Because it only uses exchanges, the final list has the same elements 
	 * as the original list. Notice that an element may be exchanged multiple times before reaching its final place. 
	 * Also, in case of pivot duplicates in the input array, they can be spread across the right subarray, in any order. 
	 * This doesn't represent a partitioning failure, as further sorting will reposition and finally "glue" them together.
	 */
	public static int partition(int[] list, int leftIndex, int rightIndex) {
		//random
		int pivotIndex = (int) ((Math.random()*(rightIndex-leftIndex +1)+leftIndex));
		int pivotValue = list[pivotIndex];
		int storeIndex = leftIndex;
		for(int p:list){
			if(p==pivotValue){
				System.out.print("["+p+"] ");
			}else{
				System.out.print(p+" ");
			}
		}
		System.out.println();
		swap(list, pivotIndex, rightIndex);
		for (int i = leftIndex; i < rightIndex; i++) {
			if (list[i] <= pivotValue) {
				swap(list, storeIndex, i);
				storeIndex++;
			}
		}
		swap(list, rightIndex, storeIndex);
		return storeIndex;
	}
	/**
	 * Swaps two elements in a list.
	 *
	 * @param list      The list.
	 * @param index1    The index of the first element to swap.
	 * @param index2    The index of the second element to swap.
	 */
	public static void swap(int[] list, int index1, int index2) {
		int temp = list[index1];
		list[index1] = list[index2];
		list[index2] = temp;
	}
}

Merge Sort vs Quick Sort

Both merge sort and quicksort are divide and conquer algorithms: they break the list into parts that are easier to solve. Merge sort splits the list equally in half; quicksort splits the list around a pivot element. Both recursively sort their sublists.

Java's Arrays.sort uses different algorithms for different types: a quicksort variant for primitives and a merge sort for object types.

Quicksort is not stable (equal keys can move around). For primitive types there is no identity, so stability doesn't matter; for reference types it does, so a stable sort must be used.

Merge sort requires making a clone of the array, which matters little for reference types, where the objects are much larger than the array of references, but for primitives cloning the array doubles the memory usage.

Analysis

Mergesort

Worst case performance O(n log n)

Best case performance O(n log n)

Average case performance O(n log n)

Because these are recursive algorithms, they can be analysed using recurrence relations:

T(n) = 2T(n/2) + n        // T(n): running time of merge sort on a list of length n;
                          // 2 sublists of half size, plus linear time to merge
T(1) = 1, n = 2^k         // the recursion needs a base case; let n = 2^k
T(n/2) = 2T(n/4) + n/2    // start by expanding T(n/2)
T(n) = 2(2T(n/4) + n/2) + n   // substitute back into the original formula
T(n) = 2^2 T(n/2^2) + 2n
T(n) = 2^3 T(n/2^3) + 3n
...
T(n) = 2^k T(n/2^k) + kn
n = 2^k  =>  k = log n
T(n) = n T(1) + n log n = n + n log n ∈ Θ(n log n)

Quicksort

Worst case performance O(n2) (extremely rare)

Best case performance O(n log n)

Average case performance O(n log n)

The worst case splits the list into sizes 0 and n-1:

T(n) = O(n) + T(0) + T(n-1) = O(n) + T(n-1)
T(n) = O(n^2)   // same as insertion and selection sort

The best case splits the list into two halves of size n/2:

T(n) = O(n) + 2T(n/2)
T(n) = O(n log n)


Master Theorem for recurrence relations of the form T(n) = aT(n/b) + n^c:

  if log_b(a) < c, then T(n) ∈ Θ(n^c)
  if log_b(a) = c, then T(n) ∈ Θ(n^c log n)
  if log_b(a) > c, then T(n) ∈ Θ(n^(log_b(a)))

Summary of Sorting Algorithms

Method     Worst    Average  Best     Stable  In-place
Insertion  n^2      n^2      n        y       y
Selection  n^2      n^2      n^2      n       y
Shellsort  ??       ??       n        n       y
Mergesort  n log n  n log n  n log n  y       n
Quicksort  n^2      n log n  n log n  n       almost
Heapsort   n log n  n log n  n log n  n       y


The ideal sorting algorithm would have the following properties:

  • Stable: Equal keys aren't reordered.
  • Operates in place, requiring O(1) extra space.
  • Worst-case O(n·lg(n)) key comparisons.
  • Worst-case O(n) swaps.
  • Adaptive: Speeds up to O(n) when data is nearly sorted or when there are few unique keys.

There is no algorithm that has all of these properties, and so the choice of sorting algorithm depends on the application.

http://www.sorting-algorithms.com/


All sorting algorithms take at least linear time (Ω(n)), because an algorithm must examine the entire set at least once to ensure that it is sorted.

Selection Algorithms

A selection algorithm is an algorithm for finding the kth smallest number in a list (such a number is called the kth order statistic).

QuickSelect
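
The page ends here; as a sketch of where it was headed, here is a minimal quickselect that reuses the random-pivot partition idea from the quicksort above (my own illustration):

public class QuickSelect {
	// returns the kth smallest element of a (k is 0-based)
	public static int quickSelect(int[] a, int left, int right, int k) {
		while (true) {
			if (left == right) return a[left];
			int pivot = partition(a, left, right);
			if (k == pivot) return a[k];
			if (k < pivot) right = pivot - 1; // answer is in the left part
			else left = pivot + 1;            // answer is in the right part
		}
	}
	// Lomuto partition with a random pivot, as in the quicksort above
	private static int partition(int[] a, int left, int right) {
		int pivotIndex = left + (int) (Math.random() * (right - left + 1));
		int pivotValue = a[pivotIndex];
		swap(a, pivotIndex, right);           // park the pivot at the end
		int store = left;
		for (int i = left; i < right; i++) {
			if (a[i] <= pivotValue) {
				swap(a, store, i);
				store++;
			}
		}
		swap(a, right, store);                // pivot moves to its final place
		return store;
	}
	private static void swap(int[] a, int i, int j) {
		int t = a[i]; a[i] = a[j]; a[j] = t;
	}
	public static void main(String[] args) {
		int[] score = {31, 74, 44, 5, 62, 50, 3, 40, 55, 62};
		System.out.println(quickSelect(score, 0, score.length - 1, 0)); // 3 (smallest)
		System.out.println(quickSelect(score, 0, score.length - 1, 4)); // 44 (5th smallest)
	}
}

Unlike quicksort, only one side of the partition is recursed into, which is why the average running time drops to O(n) rather than O(n log n).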