CS 216 Course Coverage Outline
This course at its core is an examination of abstraction and implementation.
We will look especially at the environment where high level language programs
operate. We will also look at different methods of implementing a
Table (Dictionary) data structure. Numerous small related topics
will be covered.
NOTE: I expect to complete the material between two lines in a single
day.
-
Introduction to course
-
Complexity is controlled by abstraction
-
The Interface is the services provided by the abstraction
-
The Environment is the services available to the abstraction
-
The Representation is the way services are implemented for the abstraction
-
Abstractions are generally hierarchical and often layered
-
Example: quantum physics, classical physics, transistors, ..., user interface
-
Silicon: Atomic Weight = 28, Density = 2.33 g/cm^3, 12.1 cm^3/mol
-
10 um feature size in 1971, 0.35 um feature size in 1995
-
How long until feature size breaks the quantum/classical physics abstraction?
-
Layers can be transparent or opaque (obscuring lower levels)
-
The selected Representation of an Abstraction determines performance
-
We will go down from the HLL abstraction to the Machine Language/OS abstraction
-
Code -> Machine language
-
IBCM - RISC-like, clean design, easy to learn the basics on
-
x86 - CISC, lots of complications
-
How "new" works
-
How function calls work
-
How OS calls work
-
How the I/O libraries work
-
We will go above the HLL abstraction into Data Structure and Algorithm
abstraction
-
Multiple representations of a Table
-
Other commonly used structures (Heaps & Graphs)
-
Machine Level Basics
-
Data Formats
-
Bytes, Words, Double Words, Quad Words
-
Rationale for finite sizes
-
Number formats
-
binary representation of unsigned integers
-
conversion to and from decimal and hexadecimal
-
addition, subtraction of unsigned integers
-
signed integer format
-
sign-magnitude
-
1's complement
-
bias
-
2's complement for assignment
-
IEEE Floating Pt. format basics
-
Mention Fixed Point arithmetic in DSPs
-
ASSIGNMENT:
-
binary math
-
think on numbering scheme for 2's complement
-
Character formats
-
Array Formats
-
Multi-dimensional Arrays
-
pointer vs. rectangular subscript calculations
-
Row-major layout A[row][col]
-
A[3][2] -> A[0][0], A[0][1], A[1][0], A[1][1], A[2][0], A[2][1]
-
K&R's "ANSI C" pgs. 110 to 114 (C uses ROW major)
-
Trade-offs of array bounds checking
-
Mention array alignments for speed (Byte Benchmark problem)
-
Struct Format
-
Class Format
-
2 structs are used for class members
-
one holds instance members
-
the other holds members common to all instances of the class
-
An offset is used to convert a pointer to a class into a pointer to the parent class
-
A pointer is followed to convert a pointer to a class into a pointer to a virtual parent class
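A hedged sketch of finding the locations of instance members (the class and members are hypothetical; exact layout is compiler-dependent):
    #include <cstdio>

    class Base    { public: int a; virtual void f() {} };
    class Derived : public Base { public: int b; };

    int main() {
        Derived d;
        // Print each member's offset from the start of the object; on most
        // compilers a hidden vtable pointer occupies the first bytes.
        printf("a at offset %ld\n", (long)((char*)&d.a - (char*)&d));
        printf("b at offset %ld\n", (long)((char*)&d.b - (char*)&d));
        return 0;
    }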
-
ASSIGNMENT: Compare speed of traversing an array rows-cols vs. cols-rows
-
ASSIGNMENT: Declare a class, find out locations of data items in an instance
-
Computer Layout (Mem, CPU, I/O, buses)
-
Memory is simple - address goes in one clock cycle, data comes out K cycles
later
-
Caches provide faster access at cheap price
-
exploit temporal and spatial locality
-
require "snooping" to prevent errors
-
RAM = Random Access Memory
-
Volatile!
-
Main memory is Dynamic RAM (DRAM) 1 bit = 1 transistor + 1 capacitor
-
Cache memory is Static RAM (SRAM) 1 bit = 6 to 8 transistors
-
ROM = Read Only Memory
-
Non-volatile
-
Good for initialization
-
PROM - Programmable ROM - can be changed
-
NOTE: Memory does not include typing - bits is bits!
-
CPU, to the outsider, merely appears to be doing read and write operations
-
Under "Memory Mapped I/O" address determines if its is I/O or to main memory
-
I/O has 3 basic ways of communicating
-
Being Polled - CPU reads registers on the I/O device
-
Interrupts - I/O device sends signal to CPU
-
requires individual wire in bus for each type of interrupt
-
Direct Memory Access (DMA) - I/O device becomes master of the bus
-
IBCM - Itty Bitty Computing Machine
-
IBCM basics
-
Fetch-Execute cycle
-
Operations and Op-codes
-
ASSIGNMENT: Write small program in IBCM Machine Code
-
High Level Language conversion to IBCM code
-
stack-based calculations of arithmetic expressions (is this possible with
IBCM?)
-
ASSIGNMENT: Convert C expression to IBCM Machine Code
-
Memory Organization (code, consts, globals, stack, heap)
-
Code and consts often are write protected (permits sharing)
-
Globals - often placed at bottom of the stack
-
Stack
-
What is saved on the stack?
-
Parameters
-
Local variables
-
The Link - the address to return to
-
The size of the stack frame
-
Return values
-
Register values (in case of "callee saves")
-
Why a stack?
-
Most processors have a "stack pointer"; the SPARC's register windows manage the stack directly!
-
2 major classes of "Calling Conventions"
-
Callee saves
-
Subroutine call just branch-and-links to subroutine code
-
Subroutine code saves registers before use, replaces before returning
-
Caller saves
-
Caller saves important values before branch-and-linking to subroutine code
-
Subroutine code assumes anything in registers can be overwritten
-
Stack often "grows down" from the top of memory
-
Heap Management
-
Heap often "grows up" from the base of memory
-
Simple linked list implementation (First Fit)
-
First Fit with merging of contiguous free blocks
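A minimal first-fit sketch (the node layout and names are invented for illustration; a real allocator also keeps headers so blocks can be split, freed, and merged):
    #include <cstddef>

    struct FreeBlock { size_t size; FreeBlock* next; };
    FreeBlock* free_list = nullptr;   // assumed to be seeded with the heap

    // First fit: walk the list and take the first block big enough.
    void* first_fit_alloc(size_t n) {
        for (FreeBlock** p = &free_list; *p != nullptr; p = &(*p)->next) {
            if ((*p)->size >= n) {
                FreeBlock* b = *p;
                *p = b->next;   // unlink the whole block; not splitting it
                return b;       // here is a source of internal fragmentation
            }
        }
        return nullptr;         // no block fits
    }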
-
Best Fit
-
Worst Fit
-
Fragmentation: wasted memory
-
Internal fragmentation - the allocated memory is larger than the request
-
External fragmentation - memory between allocated blocks
-
especially bad when requests could have been fulfilled if memory were contiguous
-
ASSIGNMENT: Think up better implementation
-
Quick Fit
-
For small sizes, array holds pointers to blocks of that size
-
Larger size blocks are allocated separately
-
Allocate based on 2^N sized blocks
-
Array where A[N] is a pointer to a list of blocks of size 2^N
-
At most N steps to initially allocate a block of size 2^N, averages 2 steps
-
When a merge occurs, the block is moved into the neighboring list of larger
blocks
-
Fast, but suffers from internal fragmentation
-
Testing Basics
-
NOTE: Testing can only tell you that code is bad - it cannot prove the code is
good!
-
Black Box Testing
-
Triangle Classification example
-
Use the Specifications and Your Experience to uncover problems
-
Can only test a finite number of cases - make them go far
-
Divide problem space into equivalence classes and make sure there is a
test for each
-
Things to think on
-
Illegal inputs
-
Boundary conditions
-
Overlap of conditions
-
Order sensitivity
-
Look for missing or unspecified conditions in the Spec [HARD]
-
Think of multiple ways of implementation and each one's weaknesses
-
Test Plan and Testing Environment
-
Description of test data and rationale
-
Description of expected results for each test
-
Driver programs
-
Contains (or reads in) test data
-
Contains code to compare black box results to expected results
-
Purpose: Convince yourself and others that all cases are covered and each
test is valuable
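A sketch of such a driver for the Triangle example (classify() here is a hypothetical stand-in for the code under test):
    #include <cstdio>
    #include <cstring>

    // Hypothetical function under test.
    const char* classify(int a, int b, int c) {
        if (a <= 0 || b <= 0 || c <= 0 ||
            a + b <= c || a + c <= b || b + c <= a) return "illegal";
        if (a == b && b == c) return "equilateral";
        if (a == b || b == c || a == c) return "isosceles";
        return "scalene";
    }

    struct Test { int a, b, c; const char* expected; };

    int main() {
        // One test per equivalence class, plus boundary and illegal inputs.
        Test tests[] = {
            {3, 3, 3, "equilateral"},
            {3, 3, 4, "isosceles"},
            {3, 4, 5, "scalene"},
            {1, 2, 3, "illegal"},    // boundary: a + b == c (degenerate)
            {0, 0, 0, "illegal"},    // boundary: zero-length sides
            {-3, 4, 5, "illegal"},   // illegal input: negative side
        };
        for (Test& t : tests) {
            const char* got = classify(t.a, t.b, t.c);
            if (strcmp(got, t.expected) != 0)
                printf("FAIL (%d,%d,%d): got %s, expected %s\n",
                       t.a, t.b, t.c, got, t.expected);
        }
        return 0;
    }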
-
Clear Box
-
Code coverage
-
Many problems show up the first time you run a piece of code.
-
Tools like "PureCoverage" are used
-
Some code is difficult/impossible to cover (OS error code)
-
Code inspections
-
Considered the #1 technique in terms of bugs found per time spent!
-
Tables (Dictionaries, Symbol Table, Associative Memory, Associative Arrays,
Content-addressable Memory)
-
Description of the requirements
-
A table is a 1-to-1 or a 1-to-many mapping from one set to another
-
A phonebook can be seen as a table, mapping names to phone numbers
-
Can grow (not bounded in size)
-
Not dense (can be huge gaps between keys without wasting memory)
-
Declaration:
-
Example: word frequency counter
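A minimal sketch of the word frequency counter, using the C++ library's std::map as the Table:
    #include <iostream>
    #include <map>
    #include <string>

    int main() {
        std::map<std::string, int> freq;   // maps each word to its count
        std::string word;
        while (std::cin >> word)
            ++freq[word];                  // insert-or-increment
        for (const auto& p : freq)
            std::cout << p.first << " " << p.second << "\n";
        return 0;
    }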
-
Issues
-
Other methods (delete, is_empty, member, ...)
-
Iterator
-
Simple Linear array implementation
-
Abstract performance measurements
-
Original idea - non-abstract performance measurement
-
Count exact numbers of each operation on a machine
-
Non-transferable - different architectures, compilers, primitive operations...
-
However, dominant terms existed....
-
Big O notation
-
It's an upper bound on the time or space required by a program
-
We say a function f = O(g) if there exist two constants C and N such that
-
f(n) <= C*g(n) for all n >= N
-
This only gives us a rough estimate for very large values of n!
-
Does not necessarily mean a tight upper bound!
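For example, f(n) = 3n + 10 is O(n): take C = 4 and N = 10, since 3n + 10 <= 4n for all n >= 10. (It is also, less usefully, O(n^2).)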
-
Big Omega notation - same except for lower bounds.
-
Worst case measurements are the standard
-
Average case is sometimes stated, but how do you tell if your data is average?
-
Speed and Size measurement examples
-
Linked-list implementations
-
Each node in list stores (Key, Value) pair
-
Basic
-
Insert at front or back
-
Required to search entire list to show something is not present
-
Move to front
-
Whenever a Key is looked up, the containing node is moved to the front
of the list
-
Front of the list acts as a cache - frequently used items remain near front
-
Provides fast access to commonly requested items
-
Ordered
-
Requires more time at insertion
-
Can determine node is not present with only searching a part of the list!
-
Twice as fast (on average) at determining a key is not present
-
Ordered linear array implementation (Binary Search)
-
Insertion is much slower: O(n) vs. O(1)
-
Finding a key is much faster: O(log(n)) vs. O(n)
-
The optimum choice is determined by the ratio of insertions to searches
-
Recursive relation: f(n) = constant, if (n == 0 or n == 1)
-
f(n) = constant + f(n/2), otherwise
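A sketch of the search itself:
    // Binary search over a sorted array; returns the index of key,
    // or -1 if it is not present. Each pass halves the range,
    // matching the recurrence above.
    int binary_search(const int* a, int n, int key) {
        int lo = 0, hi = n - 1;
        while (lo <= hi) {
            int mid = lo + (hi - lo) / 2;
            if (a[mid] == key) return mid;
            if (a[mid] < key) lo = mid + 1;
            else              hi = mid - 1;
        }
        return -1;
    }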
-
Binary Search Trees
-
Most of the beauty of binary search with a faster insertion time!
-
Gives up the worst case O(log(n)) time, but achieves it on average.
-
Speed of data structure is sensitive to the order keys are inserted.
-
Recursive descent to find node
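A sketch of the descent (the node layout is invented for illustration):
    struct Node { int key; Node* left; Node* right; };

    // Go left for smaller keys, right for larger; cost is the depth
    // of the node, O(log(n)) on average for random insertion orders.
    Node* find(Node* root, int key) {
        if (root == nullptr) return nullptr;   // not present
        if (key == root->key) return root;
        return find(key < root->key ? root->left : root->right, key);
    }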
-
Other considerations
-
Allocating dynamic memory takes time, usually independent of size allocated
-
Lots of small allocations = lots of time
-
Wulf states "this might be the biggest factor for small N"
-
Pointers + hidden sizes + internal fragmentation take up space.
-
Other things can be stored in binary trees
-
Arithmetic expressions
-
Any other tree
-
Traversals L = left subtree, R = right subtree, N = node itself
-
In order (LNR)
-
preorder (NLR)
-
postorder (LRN)
-
HashTables
-
Perfect Hashes
-
Requires all KEYs be known beforehand
-
Hash function - converts a KEY to an integer
-
Modulus - a size for the array such that no collisions occur
-
bool is_member(KEY k) { return (array[ HashFunction(k) % Modulus ].key == k); }
-
Constant Time for searching! O(1)!
-
Dynamic Hash Tables
-
Can no longer guarantee that no collisions occur - Collision resolution
strategies!
-
Hashing to buckets
-
Linked lists are used to store Key/Value pairs
-
Linear probing
-
If a spot is full in the array, put the Key/Value pair in the next free
location
-
Leads to packed sections in the array
-
If deletion is supported, special Key is required to mark deletions
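A sketch of linear-probe lookup (the sentinel and hash function are stand-ins; a table supporting deletion would also probe past a DELETED sentinel):
    const int EMPTY = -1;                   // sentinel for an unused slot

    int hash_function(int k) { return k; }  // stand-in hash for illustration

    // Start at the hashed slot and walk forward until the key or an
    // empty slot is found. Assumes the table is never completely full.
    int find_slot(const int* keys, int modulus, int k) {
        int i = hash_function(k) % modulus;
        while (keys[i] != k && keys[i] != EMPTY)
            i = (i + 1) % modulus;
        return (keys[i] == k) ? i : -1;
    }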
-
Double hashing
-
A variation on Linear Probing, but prevents packed sections of the array
-
Instead of inserting at Ideal + 1 spot, insert at Ideal + hash_function2(KEY)
% Modulus
-
May need to resize array to guarantee average O(1) searches
-
Usually multiply size of array by factor (I like to go to next larger size
of 2^N - 1)
-
Resize is usually done when used capacity is greater than 70%
-
Good Hash Functions
-
Fast
-
Spreads keys evenly (no matter what modulus is used)
-
Look hard at expected data!
-
length sensitivity
-
commonly occurring sequences
-
String s[0...length]
-
Sum s[0] to s[length]
-
Sum s[0]*k^0 + s[1]*k^1 + ... + s[length]*k^length
-
Any CRC (has nearly the same purpose as hash function!)
-
hashpjw() multiplies by 16, adds the current char, then xors the top 4 bits
back into the low bits.
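For reference, a version along the lines of the classic hashpjw() from Aho, Sethi & Ullman's compiler text:
    unsigned hashpjw(const char* s) {
        unsigned h = 0;
        for (; *s != '\0'; ++s) {
            h = (h << 4) + (unsigned char)*s;   // multiply by 16, add char
            unsigned g = h & 0xf0000000u;       // grab the top 4 bits
            if (g != 0) {
                h ^= g >> 24;                   // fold them into the low bits
                h ^= g;                         // and clear them off the top
            }
        }
        return h;
    }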
-
Balanced Binary Trees
-
2-3-4 Trees
-
Are NOT binary trees - each node can have up to 4 children
-
Are the smallest example of a B-tree
-
Each node has N keys, 1 <= N <= 3, and N+1 children
-
Tree is always perfectly balanced (in terms of nodes, not keys)
-
Insertion
-
If target node is a 2- or a 3-node, just insert
-
If target node is a 4-node, split into 2 2-nodes, and insert center key
into parent node
-
then do insert into a 2-node
-
Can cause recursive nightmare if all ancestors are 4-nodes
-
Trick: split all 4 nodes on the way down!
-
Guarantees that parent is never a 4-node.
-
All transforms keep the tree perfectly balanced
-
Tree only grows in height if root is a 4-node that gets split.
-
Tree height is at most log(N), so all operations are O(log(N))!
-
B-Trees
-
can go from simple 2-3-4 Trees, to nodes that have thousands of keys
-
Used on disk drives where loading a node from the drive takes a while
-
Maps file numbers to data structures which hold the locations of pieces
of a file.
-
File names are mapped to file numbers by directory files.
-
Red-Black Trees
-
Binary trees which function like 2-3-4 trees
-
Two different kinds of edges in tree:
-
"black" links connect 2-3-4 nodes to 2-3-4 nodes
-
"red" links bind binary nodes into 3-nodes and 4-nodes (none needed for
a 2-node)
-
O(log(n)) search time
-
Since 2-3-4 tree is perfectly balanced, searches traverse log(n) black
links
-
Paths contain at most as many red edges as black edges
-
So, log(n) + log(n) = O(log(n))
-
Insertion
-
Once again, we must break up 4-nodes as we go down
-
BUT - this may require rotations for the proper orientation of 4-nodes!
-
New node is inserted with a red link
-
AND - may also require a rotation if it makes a 4-node!
-
Rotation
-
Done when a 4-node is laid out linearly, rather than hierarchically
-
2 versions
-
Single rotation - middle key of 4-node is in the middle of the line
-
Double rotation - middle key of 4-node is at the bottom
-
Check out: http://www.cs.virginia.edu/~mdn4f/binary_tree/BinaryTree.html
-
Skip Lists
-
An improvement to linked lists to avoid O(n) search times
-
Actually composed of a collection of O(log(n)) linked lists organized as
levels
-
Searching for k:
-
Start at the top level; if k >= the node's key and k < the next node's key, go
down a level.
-
When at the bottom level, if k == the node's key return the node, else return
not found.
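A sketch of the search loop, assuming each node carries an array of forward pointers, one per level it appears in:
    const int MAX_LEVEL = 16;   // arbitrary cap for this sketch

    struct SkipNode { int key; SkipNode* next[MAX_LEVEL]; };

    // At each level, run right while the next key is still < k, then
    // drop down a level; check for k at the bottom.
    SkipNode* find(SkipNode* head, int top_level, int k) {
        SkipNode* node = head;   // header node holds only links, no key
        for (int level = top_level; level >= 0; --level)
            while (node->next[level] != nullptr && node->next[level]->key < k)
                node = node->next[level];
        node = node->next[0];
        return (node != nullptr && node->key == k) ? node : nullptr;
    }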
-
Inserting:
-
Imitate search, saving final node followed in each level
-
Insert node at the correct place in the bottom level
-
Keep flipping a coin: until it comes up tails, also insert the node at the next
higher level
-
Search time is based on randomness - but a randomness independent of keys!
(unlike BST)
-
With a good random number generator, can almost make guarantees.
-
Simple to implement delete! (unlike Red-black trees)
-
Read Bill Wulf's notes (PS and PDF versions)
-
Source code for skip lists is available at http://www.cs.virginia.edu/~cs216/doc/skiplist.h
-
80x86 Assembly
-
Ferrari et al.'s Tiny Guide to 32-bit x86
-
Organization of the 8086 processor
-
8 "general purpose"? registers
-
Machine Status Word - holds flag bits for jumps
-
x86 Assembly
-
How the assembler works
-
OP - DEST/SRC1 - SRC2 format of x86 assembly instructions
-
e.g. ADD EAX, EBX adds the contents of EAX and EBX and stores the result
in EAX.
-
Some Op-codes match to more than one Machine Language instruction e.g.
MOV
-
Object Code - Machine Language which still requires offsets for external
labels
-
How variable length instructions affect the fetch-execute cycle
-
Little Endianness
-
lowest address byte holds least significant bits
-
The VAX is little-endian, as is the x86
-
Nearly all other processors are big-endian (although many can switch)
-
Comparison with IBCM
-
Basic programs
-
.DATA section for globals
-
DUP can be used for declaring arrays
-
No Automatic Alignment!
-
.CODE section for code
-
Addressing modes - max 2 registers, multiply one by 1, 2, 4, or 8, plus a
constant
-
PTR directive for ambiguous memory references
-
General instructions - ADD, SUB, XOR, etc.
-
CMP & JMP - how to implement loops and branches
-
CALL & RET - how subroutine calls work with the stack
-
PUBLIC directive to export a label
-
PROC directive tells which machine language form to convert CALL to.
-
Calling Convention
-
EAX, EBX, ECX, and EDX are caller saved
-
All other registers are callee saved
-
Caller pushes parameters onto the stack in Right to Left order
-
First, the callee saves the base ptr and copies the stack ptr into
the base ptr.
-
base ptr is used to access parameters of the function
-
Interrupts
-
An interrupt halts what the processor is currently doing and has it start
something else
-
Interrupt Vector
-
Each kind of interrupt is associated with a different number
-
The interrupt vector is an array of "interrupt function" pointers at a
fixed location in memory
-
On an interrupt, the processor automatically jumps to the associated interrupt
function
-
What are interrupts good for?
-
I/O - handling input/output events that happen on an irregular basis
-
Changing which process has the CPU
-
Operating System calls (software generated interrupts!)
-
Keeping track of time (Timers are used to cause an interrupt)
-
Trapping errors (divide by zero, overflow)
-
Trapping violations (illegal instructions, using memory you aren't allowed
to use)
-
Failure warnings (UPS is now on...)
-
Now is the time to reboot (Vulcan neck pinch)
-
Debugging (stepping through a program, breakpoints)
-
Multiprocessor synchronization
-
Signaling the processor to start again (after entering a low power mode)
-
Makes julienne fries
-
Masking Interrupts "turning off interrupts"
-
Done by clearing the I-bit
-
Prevents an interrupt of an interrupt
-
Also good for "Critical Sections" of multiprogrammed code
-
What happens on an interrupt
-
Push Machine Status Word (a.k.a. Flags register)
-
Clear I-bit and T-bit
-
Push Program Counter (a.k.a. "IP" register)
-
load Program Counter with [Interrupt_Vector + 4*Interrupt_Number]
-
NOTE: STACK IS USED!!! ANYTHING BEYOND [SP] IS WRITTEN OVER!!!
-
INT - Software Interrupt
-
Takes a constant argument, the interrupt number
-
INT 3 fits in a single byte (good for breakpoints)
-
IRET - Interrupt Return
-
Does the reverse of an interrupt
-
How I/O uses interrupts
-
Interrupt service routine reads data from a keyboard
-
Definitions:
-
Interrupt - an event which causes the processor to stop what it's doing
and do something else
-
Interrupt Service Routine (ISR)- code to handle an interrupt which returns
with an IRET instr.
-
Interrupt Number - a number indicating what kind of event triggered the
interrupt
-
Interrupt Vector - an array of pointers to ISRs which is indexed by the
Interrupt Number
-
Exception, Trap = other names for interrupts
-
Clearing a flag - setting the associated bit to 0
-
Setting a flag - setting the associated bit to 1
-
Critical Section - a section of code which must be run by only 1 process
at a time
-
Often in the OS
-
Actions must be seen to be atomic (indivisible) - all happening at once
-
How OS System Calls work
-
User code pushes arguments on the stack
-
User code puts value in register indicating which OS system call
-
User code triggers software interrupt
-
Processor looks up ISR in Interrupt Vector and jumps to it
-
OS code contains "switch()" statement to select OS system call
-
OS system call operation is performed
-
Optional delay (e.g. a disk read operation must wait until data is read from
the disk)
-
OS system call places return code in register
-
OS executes IRET
-
User code checks return code of OS to confirm that everything went okay.
-
Data Structures
-
Heaps
-
For any subtree, the key at the root is greatest (least) of keys in the
subtree
-
Root holds maximum for the entire tree
-
Insertion:
-
place new node in tree such that the tree is optimally balanced
-
N = new node;
-
while (N != root && N->key > N->parent->key) { swap(N, N->parent); }
-
upper bounded by height of tree = O(log(n))
-
Deletion:
-
remove root from the tree
-
move lowest leaf into the old place of root
-
N = root;
-
swap N with whichever of N, N->left, N->right has the greatest key
-
repeat until no swaps are made
-
upper bounded by height of tree = O(log(n))
-
Fast heap construction:
-
slow version is inserting n keys = O(n*log(n))
-
fast version creates n/2 heaps of size 1
-
n/4 heaps of size 3
-
then n/8 heaps of size 7
-
etc.
-
n/2 * log(1) + n/4 * log(3) + n/8 * log(7) + n/16 * log(15) + ... = n -
log(n) - 1 = O(n)
-
Building a heap in an array
-
It's possible to encode any binary tree in an array - it works well only
if the tree is balanced.
-
root is in location 1
-
left child of node in location x is in location 2*x
-
right child of node in location x is in location 2*x+1
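A sketch of the index arithmetic plus the "swap down" step of deletion, for a max-heap stored this way (1-based indexing):
    #include <algorithm>

    // heap[1..n] with heap[x] >= heap[2*x] and heap[x] >= heap[2*x+1].
    // Sift the value at location x down until both children are smaller.
    void sift_down(int* heap, int n, int x) {
        while (2 * x <= n) {
            int child = 2 * x;                      // left child
            if (child + 1 <= n && heap[child + 1] > heap[child])
                ++child;                            // right child is bigger
            if (heap[x] >= heap[child]) break;      // heap property holds
            std::swap(heap[x], heap[child]);
            x = child;
        }
    }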
-
Queues
-
FIFO = first in, first out
-
Can be implemented using the "circular buffer" technique.
-
Requires in_end = out_end = 0, buffer[size]
-
Insert "k":
-
if ( (in_end + 1) % size == out_end) then the buffer is full, report
error (or expand the buffer)
-
buffer[in_end] = k;
-
in_end = (in_end + 1) % size;
-
Removal:
-
if (in_end == out_end) then buffer is empty, report error
-
temp = buffer[out_end];
-
out_end = (out_end + 1) % size;
-
return temp;
-
File buffers in libraries often use circular buffers
-
inserts or removals to the OS side are done in block transfers.
-
"flush" or "endl" force a block transfer before the buffer is full.
-
Graphs
-
Array based representation
-
Linked list of edges representation
-
Set representation
-
Huffman Encoding Trees (optional)
-
Sorting - Assume array A of size N
-
Bubble Sort
-
if A[i] > A[i+1] swap them
-
repeat until array is sorted
-
requires O(N^2) compares and O(N^2) swaps
-
Selection Sort
-
A[0] to A[i] is sorted and are the i+1 lowest elements in the array
-
A[i+1] to A[N-1] is not sorted.
-
To advance i
-
find the minimum value in the unsorted range
-
Stick it in A[i+1]
-
Now A[0] to A[i+1] is sorted.
-
Requires O(N^2) compares but only O(N) swaps
-
Insertion Sort
-
A[0] to A[i] is sorted
-
A[i+1] to A[N-1] is not sorted
-
To advance i
-
Save the element in location A[i+1] into the variable "k"
-
Find the location "j" in A[0] to A[i] where "k" should go to keep the array
sorted
-
Move elements A[j] to A[i] over one spot into the range A[j+1] to A[i+1]
-
Move "k" into location A[j]
-
Now A[0] to A[i+1] is sorted
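The steps above as a compact sketch:
    // Grow the sorted prefix A[0..i] one element at a time.
    void insertion_sort(int* A, int N) {
        for (int i = 0; i + 1 < N; ++i) {
            int k = A[i + 1];                 // save the next element
            int j = i + 1;
            while (j > 0 && A[j - 1] > k) {   // shift larger elements over
                A[j] = A[j - 1];
                --j;
            }
            A[j] = k;                         // drop "k" into location j
        }
    }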
-
Requires as few as O(N*log(N)) compares (with binary search for the location)
and O(N^2) moves
-
Quick Sort
-
Quick Sort uses the "divide and conquer" strategy
-
Divide the problem into smaller pieces
-
Solve each of the pieces individually
-
Merge the solutions of the pieces into a solution for the whole
-
(Often uses recursion)
-
To sort the array:
-
Partition it
-
Pick a partition element "k"
-
find the first element on the left end greater than "k"
-
find the first element on the right end less than "k"
-
swap the locations of the two elements.
-
repeat until "k" is in the middle, every thing to the left is less than
"k", and everything to the right is greater than "k"
-
Sort everything less than the partition element
-
Sort everything greater than the partition element
-
The array is now sorted!
-
f(N) = O(N) + 2*f(N/2)
-
f(N) = O(N*log(N))
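A compact sketch of the whole procedure (the pivot choice here is arbitrary):
    #include <algorithm>

    void quick_sort(int* A, int lo, int hi) {
        if (lo >= hi) return;
        int k = A[(lo + hi) / 2];   // partition element
        int i = lo, j = hi;
        while (i <= j) {
            while (A[i] < k) ++i;   // first element on the left >= k
            while (A[j] > k) --j;   // first element on the right <= k
            if (i <= j) { std::swap(A[i], A[j]); ++i; --j; }
        }
        quick_sort(A, lo, j);       // sort everything less than k
        quick_sort(A, i, hi);       // sort everything greater than k
    }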
-
Merge Sort
-
Also "divide and conquer"
-
To sort the array:
-
Divide the array in half
-
Sort the left half
-
Sort the right half
-
Merge the two halves together
-
have "p" point to the first element in the left half
-
have "q" point to the first element in the right half
-
if (q is past the end OR (p is not past the end AND *p < *q) ) select *p, p++
-
else select *q, q++
-
Repeat until both go past their ends.
-
Put elements into the array in the selected order
-
The array is now sorted.
-
f(N) = 2*f(N/2) + O(N)
-
f(N) = O(N*log(N))
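The merge step as a sketch (merging through a temporary buffer):
    #include <vector>

    // Merge the sorted halves A[lo..mid] and A[mid+1..hi] back into A.
    void merge(int* A, int lo, int mid, int hi) {
        std::vector<int> out;
        int p = lo, q = mid + 1;          // one cursor per half
        while (p <= mid || q <= hi) {
            // Take from the left half if the right is exhausted or the
            // left element is smaller; otherwise take from the right.
            if (q > hi || (p <= mid && A[p] < A[q]))
                out.push_back(A[p++]);
            else
                out.push_back(A[q++]);
        }
        for (int i = lo; i <= hi; ++i)
            A[i] = out[i - lo];           // copy back in sorted order
    }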
-
Heap Sort
-
To sort the array:
-
Make a heap in place in the array (linear time if you use the bottom-up
method)
-
One-by-one remove items from the heap and place them at the end of the
array
-
The array is now sorted.
-
f(N) = O(N) + N*O(log(N))
-
f(N) = O(N*log(N))
Labs
-
Linked Lists
-
IBCM Programming
-
Testing of SubSort Function
-
Linear Table
-
Binary Trees
-
Hash Table
-
Red-Black Trees
-
x86 Assembly
-
Priority Queues
Issues
An interesting item might be how a debugger works.
How buffering within "cin" and "cout" works.
Also vectors and double ended queues might be interesting.
We've looked at data abstraction, how about algorithm abstraction?
Project?