CS 216 Course Coverage Outline
This course at its core is an examination of abstraction and implementation.
We will look especially at the environment where high level language programs
operate. We will also look at different methods of implementing a
Table (Dictionary) data structure. Numerous small related topics
will be covered.
NOTE: I expect to complete the material between two horizontal rules in a
single day.

Introduction to course

Complexity is controlled by abstraction

The Interface is the set of services provided by the abstraction

The Environment is the set of services available to the abstraction

The Representation is the way the services are implemented for the abstraction

Abstractions are generally hierarchical and often layered

Example: quantum physics, classical physics, transistors, ..., user interface

Silicon: Atomic Weight = 28, Density = 2.33 g/cm^3, Molar Volume = 12.1 cm^3/mol

10 um feature size in 1971, 0.35 um feature size in 1995

How long until feature size breaks the quantum/classical physics abstraction?

Layers can be transparent or opaque (obscuring lower levels)

The selected Representation of an Abstraction determines performance

We will go down from the HLL abstraction to the Machine Language/OS abstraction

Code -> Machine language

IBCM - RISC-like, clean design, easy to learn the basics on

x86 - CISC, lots of complications

How "new" works

How function calls work

How OS calls work

How the I/O libraries work

We will go above the HLL abstraction into Data Structure and Algorithm
abstraction

Multiple representations of a Table

Other commonly used structures (Heaps & Graphs)

Machine Level Basics

Data Formats

Bytes, Words, Double Words, Quad Words

Rationale for finite sizes

Number formats

binary representation of unsigned integers

conversion to/from decimal and hexadecimal

addition, subtraction of unsigned integers

signed integer format

sign-magnitude

1's complement

bias

2's complement for assignment

IEEE Floating Pt. format basics

Mention Fixed Point arithmetic in DSPs
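For concreteness, a small C++ sketch (not from the notes) showing the same
8-bit pattern read as unsigned vs. 2's complement; the signed conversion
wraps this way on all mainstream machines:

    #include <cstdio>

    int main() {
        unsigned char bits = 0xFC;                   // 1111 1100 in binary
        printf("as unsigned: %u\n", (unsigned)bits); // prints 252
        signed char s = (signed char)bits;           // same bits, signed view
        printf("as 2's complement: %d\n", (int)s);   // prints -4 = 252 - 256
        return 0;
    }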

ASSIGNMENT:

binary math

think on numbering scheme for 2's complement

Character formats

Array Formats

Multidimensional Arrays

pointer vs. rectangular subscript calculations

Row-major layout A[row][col]

A[3][2] -> A[0][0], A[0][1], A[1][0], A[1][1], A[2][0], A[2][1]

K&R's "ANSI C" pgs. 110 to 114 (C uses ROW major)
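A hedged sketch of the subscript calculation a compiler performs for a
row-major array (dimensions and names are illustrative):

    #include <cassert>

    int main() {
        const int ROWS = 3, COLS = 2;
        int A[ROWS][COLS] = {};
        // Row-major: element (r, c) lives at offset r*COLS + c from &A[0][0]
        int r = 2, c = 1;
        int* base = &A[0][0];
        assert(&A[r][c] == base + r * COLS + c);
        return 0;
    }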

Tradeoffs of array bounds checking

Mention array alignments for speed (Byte Benchmark problem)

Struct Format

Class Format

2 structs are used for class members

one holds instance members

the other holds members common to all instances of the class

An offset is used to convert a pointer of a class to a pointer to the parent
class

A pointer is followed to convert a pointer of a class to a pointer to a
virtual parent class
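Related to the assignment below, a minimal C++ sketch for finding the
locations of data items in an instance (the layout printed is typical, not
guaranteed by the language):

    #include <cstdio>

    class C {
    public:
        int a;                 // instance members live in each object
        double b;
        static int shared;     // class-wide member: stored once, not per instance
        virtual void f() {}    // forces a hidden vtable pointer into the object
    };
    int C::shared = 0;

    int main() {
        C obj;
        printf("sizeof(C) = %lu\n", (unsigned long)sizeof(C));
        printf("offset of a = %ld\n", (long)((char*)&obj.a - (char*)&obj));
        printf("offset of b = %ld\n", (long)((char*)&obj.b - (char*)&obj));
        return 0;
    }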

ASSIGNMENT: Compare speed of traversing an array rows-cols vs. cols-rows

ASSIGNMENT: Declare a class, find out locations of data items in an instance

Computer Layout (Mem, CPU, I/O, buses)

Memory is simple - address goes in one clock cycle, data comes out K cycles
later

Caches provide faster access at a cheap price

exploit temporal and spatial locality

require "snooping" to prevent errors

RAM = Random Access Memory

Volatile!

Main memory is Dynamic RAM (DRAM) 1 bit = 1 transistor + 1 capacitor

Cache memory is Static RAM (SRAM) 1 bit = 6 to 8 transistors

ROM = Read Only Memory

Non-volatile

Good for initialization

PROM - Programmable ROM - can be changed

NOTE: Memory does not include typing - bits is bits!

CPU, to the outsider, merely appears to be doing read and write operations

Under "Memory Mapped I/O" address determines if its is I/O or to main memory

I/O has 3 basic ways of communicating

Being Polled - CPU reads registers on the I/O device

Interrupts - I/O device sends a signal to the CPU

requires an individual wire in the bus for each type of interrupt

Direct Memory Access (DMA) - I/O device becomes master of the bus

IBCM - Itty Bitty Computing Machine

IBCM basics

Fetch-Execute cycle

Operations and Opcodes

ASSIGNMENT: Write small program in IBCM Machine Code

High Level Language conversion to IBCM code

stack-based calculations of arithmetic expressions (is this possible with
IBCM?)

ASSIGNMENT: Convert C expression to IBCM Machine Code

Memory Organization (code, consts, globals, stack, heap)

Code and consts often are write protected (permits sharing)

Globals - often placed at bottom of the stack

Stack

What is saved on the stack?

Parameters

Local variables

The Link - the address to return to

The size of the stack frame

Return values

Register values (in case of "callee saves")

Why a stack?

Most processors have a "stack pointer"; SPARCs directly manage the stack!

2 major classes of "Calling Conventions"

Callee saves

Subroutine call just branch-and-links to subroutine code

Subroutine code saves registers before use, replaces before returning

Caller saves

Caller saves important values before branch-and-linking to subroutine code

Subroutine code assumes anything in registers can be overwritten

Stack often "grows down" from the top of memory

Heap Management

Heap often "grows up" from the base of memory

Simple linked list implementation (First Fit)

First Fit with merging of contiguous free blocks

Best Fit

Worst Fit

Fragmentation: wasted memory

Internal fragmentation - the allocated memory is larger than the request

External fragmentation - memory between allocated blocks

especially bad when requests could have been fulfilled if the free memory were
contiguous
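A minimal first-fit sketch, assuming a singly linked free list with a header
on each block (illustrative names; splitting and merging omitted):

    #include <cstddef>

    struct FreeBlock {
        size_t     size;   // usable bytes in this block
        FreeBlock* next;   // next block on the free list
    };

    FreeBlock* free_list = nullptr;   // head of the free list

    // First Fit: walk the list, take the first block big enough.
    void* ff_alloc(size_t request) {
        FreeBlock** prev = &free_list;
        for (FreeBlock* b = free_list; b; prev = &b->next, b = b->next) {
            if (b->size >= request) {   // first block that fits wins
                *prev = b->next;        // unlink it from the free list
                return b + 1;           // payload starts past the header
            }
        }
        return nullptr;                 // no fit: would grow the heap here
    }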

ASSIGNMENT: Think up better implementation

Quick Fit

For small sizes, array holds pointers to blocks of that size

Larger size blocks are allocated separately

Allocate based on 2^N sized blocks

Array where A[N] is a pointer to a list of blocks of size 2^N

At most N steps to initially allocate a block of size 2^N, averages 2 steps

When a merge occurs, the block is moved into the neighboring list of larger
blocks

Fast, but suffers from internal fragmentation

Testing Basics

NOTE: Testing can only tell you that code is bad - it cannot prove the code is
good!

Black Box Testing

Triangle Classification example

Use the Specifications and Your Experience to uncover problems

Can only test a finite number of cases - make them go far

Divide problem space into equivalence classes and make sure there is a
test for each

Things to think on

Illegal inputs

Boundary conditions

Overlap of conditions

Order sensitivity

Look for missing or unspecified conditions in the Spec [HARD]

Think of multiple ways of implementation and each one's weaknesses

Test Plan and Testing Environment

Description of test data and rationale

Description of expected results for each test

Driver programs

Contains (or reads in) test data

Contains code to compare black box results to expected results

Purpose: Convince yourself and others that all cases are covered and each
test is valuable

Clear Box

Code coverage

Many problems show up the first time you run a piece of code.

Tools like "PureCoverage" are used

Some code is difficult/impossible to cover (e.g., OS error paths)

Code inspections

Considered the #1 payoff in bugs found per time spent!

Tables (Dictionaries, Symbol Table, Associative Memory, Associative Arrays,
Content-addressable Memory)

Description of the requirements

A table is a 1-to-1 or a 1-to-many mapping from one set to another

A phonebook can be seen as a table, mapping names to phone numbers

Can grow (not bounded in size)

Not dense (can be huge gaps between keys without wasting memory)

Declaration:

Example: word frequency counter

Issues

Other methods (delete, is_empty, member, ...)

Iterator

Simple Linear array implementation

Abstract performance measurements

Original idea - non-abstract performance measurement

Count exact numbers of each operation on a machine

Non-transferable - different architectures, compilers, primitive operations...

However, dominant terms existed....

Big O notation

It's an upper bound on the time or space required by a program

We say a function f = O(g) if there exist two constants C and N such that

f(n) <= C*g(n) for all n >= N

This only gives us a rough estimate for very large values of n!

Does not necessarily mean a tight upper bound!

Big Omega notation - same except for lower bounds.

Worst case measurements are the standard

Average case is sometimes stated, but how do you tell if your data is average?

Speed and Size measurement examples

Linked-list implementations

Each node in list stores (Key, Value) pair

Basic

Insert at front or back

Must search the entire list to show something is not present

Move to front

Whenever a Key is looked up, the containing node is moved to the front
of the list

Front of the list acts as a cache - frequently used items remain near the front

Provides fast access to commonly requested items
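A minimal move-to-front lookup sketch (illustrative node type):

    #include <string>

    struct Node {
        std::string key;
        int         value;
        Node*       next;
    };

    // Move-to-front lookup: the found node is unlinked and re-linked
    // at the head, so hot keys stay near the front of the list.
    Node* find_mtf(Node*& head, const std::string& k) {
        Node* prev = nullptr;
        for (Node* n = head; n; prev = n, n = n->next) {
            if (n->key == k) {
                if (prev) {              // not already at the front
                    prev->next = n->next;
                    n->next = head;
                    head = n;
                }
                return n;
            }
        }
        return nullptr;                  // searched the whole list
    }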

Ordered

Requires more time at insertion

Can determine a node is not present by searching only part of the list!

Twice as fast (on average) at determining a key is not present

Ordered linear array implementation (Binary Search)

Insertion is much slower: O(n) vs. O(1)

Finding a key is much faster: O(log(n)) vs. O(n)

Optimum choice is determined by the ratio of insertions to searches

Recursive relations: f(n) =

constant, if n == 0 or n == 1

constant + f(n/2), otherwise
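A sketch of the binary search itself, matching the recurrence above (integer
keys assumed for brevity):

    // Binary search over a sorted array of keys; returns the index
    // of k, or -1 if absent. Each step halves the range, matching
    // the recurrence f(n) = constant + f(n/2) above.
    int bsearch_keys(const int keys[], int n, int k) {
        int lo = 0, hi = n - 1;
        while (lo <= hi) {
            int mid = lo + (hi - lo) / 2;   // avoids overflow of lo + hi
            if (keys[mid] == k) return mid;
            if (keys[mid] < k)  lo = mid + 1;
            else                hi = mid - 1;
        }
        return -1;
    }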

Binary Search Trees

Most of the beauty of binary search with a faster insertion time!

Gives up the worst-case O(log(n)) time, but achieves it on average.

Speed of the data structure is sensitive to the order in which keys are inserted.

Recursive descent to find node
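A minimal sketch of the recursive descent (integer keys assumed):

    struct TreeNode {
        int       key;
        int       value;
        TreeNode* left;    // keys less than this node's key
        TreeNode* right;   // keys greater than this node's key
    };

    // Recursive descent: O(log n) on average, O(n) if the insertion
    // order degenerated the tree into a list.
    TreeNode* bst_find(TreeNode* t, int k) {
        if (t == nullptr) return nullptr;          // fell off the tree
        if (k < t->key)  return bst_find(t->left, k);
        if (k > t->key)  return bst_find(t->right, k);
        return t;                                  // found it
    }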

Other considerations

Allocating dynamic memory takes time, usually independent of the size allocated

Lots of small allocations = lots of time

Wulf states "this might be the biggest factor for small N"

Pointers + hidden sizes + internal fragmentation take up space.

Other things can be stored in binary trees

Arithmetic expressions

Any other tree

Traversals - L = left subtree, R = right subtree, N = node itself

Inorder (LNR)

Preorder (NLR)

Postorder (LRN)

Hash Tables

Perfect Hashes

Requires all KEYs be known beforehand

Hash function - converts a KEY to an integer

Modulus - a size for the array such that no collisions occur

bool is_member(KEY k) { return array[HashFunction(k) % Modulus].key == k; }

Constant Time for searching! O(1)!

Dynamic Hash Tables

Can no longer guarantee that no collisions occur - Collision resolution
strategies!

Hashing to buckets

Linked lists are used to store Key/Value pairs

Linear probing

If a spot is full in the array, put the Key/Value pair in the next free
location

Leads to packed sections in the array

If deletion is supported, special Key is required to mark deletions

Double hashing

A variation on Linear Probing, but prevents packed sections of the array

Instead of inserting at the Ideal + 1 spot, insert at Ideal + hash_function2(KEY)
% Modulus

May need to resize array to guarantee average O(1) searches

Usually multiply the size of the array by a factor (I like to go to the next
larger size of the form 2^N - 1)

Resize is usually done when used capacity is greater than 70%
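A hedged sketch of the double-hashing probe loop; the two hash functions and
the table layout here are illustrative, not the course's:

    #include <cstddef>
    #include <string>

    struct Slot { bool used; std::string key; int value; };

    // Illustrative hash functions (any reasonable pair would do):
    size_t hash1(const std::string& k) {
        size_t h = 0;
        for (char c : k) h = h * 31 + (unsigned char)c;
        return h;
    }
    size_t hash2(const std::string& k) {
        size_t h = 0;
        for (char c : k) h = h * 37 + (unsigned char)c;
        return h;
    }

    // Double hashing: probe ideal, ideal+step, ideal+2*step, ... (mod size).
    // Returns the index of k's slot, or size if k is absent.
    size_t dh_find(const Slot table[], size_t size, const std::string& k) {
        size_t i    = hash1(k) % size;
        size_t step = hash2(k) % size;
        if (step == 0) step = 1;                 // ensure the probe advances
        for (size_t tries = 0; tries < size; ++tries) {
            if (!table[i].used)    return size;  // empty slot: not present
            if (table[i].key == k) return i;     // found it
            i = (i + step) % size;               // next probe location
        }
        return size;                             // scanned everything
    }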

Good Hash Functions

Fast

Spreads keys evenly (no matter what modulus is used)

Look hard at expected data!

length sensitivity

commonly occurring sequences

String s[0...length]

Sum s[0] to s[length]

Sum s[0]*k^0 + s[1]*k^1 + ... + s[length]*k^length

Any CRC (has nearly the same purpose as hash function!)

hashpjw() multiplies by 16 (a shift left by 4), adds the current char, then
folds the top 4 bits back into the low-order bits with an xor.
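For reference, the classic hashpjw (after Aho, Sethi, and Ullman) looks
roughly like this; the shift left by 4 is the "multiply by 16":

    // hashpjw-style string hash: shift left 4 (multiply by 16),
    // add the current character, then fold the top 4 bits back in.
    unsigned hashpjw(const char* s) {
        unsigned h = 0;
        for (; *s != '\0'; ++s) {
            h = (h << 4) + (unsigned char)*s;
            unsigned g = h & 0xF0000000u;   // the 4 bits about to overflow
            if (g != 0) {
                h ^= g >> 24;               // xor them back near the bottom
                h &= ~g;                    // and clear them off the top
            }
        }
        return h;
    }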

Balanced Binary Trees

2-3-4 Trees

Are NOT binary trees - each node can have up to 4 children

Are the smallest example of a B-tree

Each node has N keys (0 < N < 4) and N+1 children

Tree is always perfectly balanced (in terms of nodes, not keys)

Insertion

If the target node is a 2-node or a 3-node, just insert

If the target node is a 4-node, split it into two 2-nodes, and insert the
center key into the parent node

then do the insert into a 2-node

Can cause a recursive nightmare if all ancestors are 4-nodes

Trick: split all 4-nodes on the way down!

Guarantees that the parent is never a 4-node.

All transforms keep the tree perfectly balanced

Tree only grows in height if the root is a 4-node that gets split.

Tree height is at most log(N), so all operations are O(log(N))!

B-Trees

Can go from simple 2-3-4 Trees to nodes that have thousands of keys

Used on disk drives where loading a node from the drive takes a while

Maps file numbers to data structures which hold the locations of pieces
of a file.

File names are mapped to file numbers by directory files.

Red-Black Trees

Binary trees which function like 2-3-4 trees

Two different kinds of edges in tree:

"black" links connect 2-3-4 nodes to 2-3-4 nodes

"red" links bind binary nodes into 3-nodes and 4-nodes (none needed for
a 2-node)

O(log(n)) search time

Since the 2-3-4 tree is perfectly balanced, searches traverse log(n) black
links

Paths contain at most as many red edges as black edges

So, log(n) + log(n) = O(log(n))

Insertion

Once again, we must break up 4-nodes as we go down

BUT - this may require rotations for the proper orientation of 4-nodes!

New node is inserted with a red link

AND - may also require a rotation if it makes a 4-node!

Rotation

Done when a 4-node is laid out linearly, rather than hierarchically

2 versions

Single rotation - middle key of the 4-node is in the middle of the line

Double rotation - middle key of the 4-node is at the bottom

Check out: http://www.cs.virginia.edu/~mdn4f/binary_tree/BinaryTree.html

Skip Lists

An improvement to linked lists to avoid O(n) search times

Actually composed of a collection of O(log(n)) linked lists organized as
levels

Searching for k:

Start at the top level; if k >= the current node's key and k < the next node's
key, go down a level.

When at the bottom level, if k == the node's key, return the node; else return
not found.

Inserting:

Imitate a search, saving the final node visited at each level

Insert node at the correct place in the bottom level

Keep flipping a coin: until it comes up tails, keep inserting the node at the
next higher level

Search time is based on randomness - but a randomness independent of the keys!
(unlike BST)

With a good random number generator, can almost make guarantees.

Simple to implement delete! (unlike Red-Black trees)
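A minimal sketch of the search, assuming each node carries an array of
forward pointers (one per level) and the list has a header node:

    #include <vector>

    struct SkipNode {
        int key;
        int value;
        std::vector<SkipNode*> next;   // next[i] = successor at level i
    };

    // Start at the top level of the header; move right while the next
    // key is still < k, then drop down a level. O(log n) expected,
    // independent of the order the keys arrived in.
    SkipNode* skip_find(SkipNode* header, int levels, int k) {
        SkipNode* n = header;
        for (int lvl = levels - 1; lvl >= 0; --lvl) {
            while (n->next[lvl] != nullptr && n->next[lvl]->key < k)
                n = n->next[lvl];
        }
        SkipNode* cand = n->next[0];             // bottom-level successor
        return (cand && cand->key == k) ? cand : nullptr;
    }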

Read Bill Wulf's notes (PS, PDF)

Source code for skip lists is available at http://www.cs.virginia.edu/~cs216/doc/skiplist.h

80x86 Assembly

Ferrari et al.'s Tiny Guide to 32-bit x86

Organization of the 8086 processor

8 "general purpose"? registers

Machine Status Word - holds flag bits for jumps

x86 Assembly

How the assembler works

OP DEST/SRC1, SRC2 format of x86 assembly instructions

e.g., ADD EAX, EBX - adds the contents of EAX and EBX and stores the result
in EAX.

Some Opcodes map to more than one Machine Language instruction, e.g.,
MOV

Object Code - Machine Language which still requires offsets for external
labels

How variable-length instructions affect the fetch-execute cycle

Little Endianness

lowest address byte holds least significant bits

x86 is little-endian; so was the VAX

Nearly all other processors are big-endian (although many can switch)

Comparison with IBCM

Basic programs

.DATA section for globals

DUP can be used for declaring arrays

No Automatic Alignment!

.CODE section for code

Addressing modes - at most 2 registers, one multiplied by 1, 2, 4, or 8, plus
a constant

PTR directive for ambiguous memory references

General instructions  ADD, SUB, XOR, etc.

CMP & JMP  how to implement loops and branches

CALL & RET  how subroutine calls work with the stack

PUBLIC directive to export a label

PROC directive tells which machine language form to convert CALL to.

Calling Convention

EAX, ECX, and EDX are caller saved

All other registers are callee saved

Caller pushes parameters onto the stack in Right to Left order

First, the callee saves the base ptr and copies the stack ptr into the
base ptr.

base ptr is used to access parameters of the function

Interrupts

An interrupt halts what the processor is currently doing and has it start
something else

Interrupt Vector

Each kind of interrupt is associated with a different number

The interrupt vector is an array of "interrupt function" pointers at a
fixed location in memory

On an interrupt, the processor automatically jumps to the associated interrupt
function
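Conceptually (the hardware does this dispatch itself from a fixed memory
location), the vector is just an array of function pointers indexed by
interrupt number; a C++ sketch with hypothetical handlers:

    // Conceptual sketch only, not how an OS is actually written.
    typedef void (*ISR)();                 // interrupt service routine

    void keyboard_isr() { /* read the key, queue it for the program */ }
    void timer_isr()    { /* update the tick count */ }

    ISR interrupt_vector[256];             // one slot per interrupt number

    void dispatch(int interrupt_number) {
        // assumes the slot was initialized with a handler
        interrupt_vector[interrupt_number]();
    }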

What are interrupts good for?

I/O - handling input/output events that happen on an irregular basis

Changing which process has the CPU

Operating System calls (software generated interrupts!)

Keeping track of time (Timers are used to cause an interrupt)

Trapping errors (divide by zero, overflow)

Trapping violations (illegal instructions, using memory you aren't allowed
to use)

Failure warnings (UPS is now on...)

Now is the time to reboot (Vulcan neck pinch)

Debugging (stepping through a program, breakpoints)

Multiprocessor synchronization

Signaling the processor to start again (after entering a low power mode)

Makes julienne fries

Masking Interrupts - "turning off interrupts"

Done by clearing the I-bit

Prevents an interrupt of an interrupt

Also good for "Critical Sections" of multiprogrammed code

What happens on an interrupt

Push Machine Status Word (a.k.a. Flags register)

Clear the I-bit and T-bit

Push Program Counter (a.k.a. "IP" register)

load Program Counter with [Interrupt_Vector + 4*Interrupt_Number]

NOTE: STACK IS USED!!! ANYTHING BEYOND [SP] IS WRITTEN OVER!!!

INT - Software Interrupt

Takes a constant argument, the interrupt number

INT 3 fits in a single byte (good for breakpoints)

IRET - Interrupt Return

Does the reverse of an interrupt

How I/O uses interrupts

Interrupt service routine reads data from a keyboard

Definitions:

Interrupt - an event which causes the processor to stop what it's doing
and do something else

Interrupt Service Routine (ISR) - code to handle an interrupt, which returns
with an IRET instr.

Interrupt Number - a number indicating what kind of event triggered the
interrupt

Interrupt Vector - an array of pointers to ISRs which is indexed by the
Interrupt Number

Exception, Trap = other names for interrupts

Clearing a flag - setting the associated bit to 0

Setting a flag - setting the associated bit to 1

Critical Section - a section of code which must be run by only 1 process
at a time

Often in the OS

Actions must be seen to be atomic (indivisible) - all happening at once

How OS System Calls work

User code pushes arguments on the stack

User code puts value in register indicating which OS system call

User code triggers software interrupt

Processor looks up ISR in Interrupt Vector and jumps to it

OS code contains "switch()" statement to select OS system call

OS system call operation is performed

Optional delay (e.g., a disk read operation must wait until data is read from
the disk)

OS system call places return code in register

OS executes IRET

User code checks return code of OS to confirm that everything went okay.
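A conceptual C++ sketch of the OS side of that sequence (do_read, do_write,
and the call numbers are hypothetical):

    // Hypothetical OS operations, stubbed for illustration:
    int do_read(void* args)  { (void)args; return 0; }
    int do_write(void* args) { (void)args; return 0; }

    // The software-interrupt ISR selects the requested system call
    // (the real selector arrives in a register; here it is a
    // parameter) and returns a result code.
    int syscall_isr(int call_number, void* args) {
        switch (call_number) {          // "switch()" to select the OS call
            case 1:  return do_read(args);
            case 2:  return do_write(args);
            default: return -1;         // unknown system call
        }
    }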

Data Structures

Heaps

For any subtree, the key at the root is the greatest (or least) of the keys in
the subtree

Root holds maximum for the entire tree

Insertion:

place new node in tree such that the tree is optimally balanced

N = new node;

while (root != N && N->parent->key < N->key) { swap(N, N->parent); }

upper-bounded by the height of the tree = O(log(n))

Deletion:

remove root from the tree

move lowest leaf into the old place of root

N = root;

swap N with whichever of N, N->left, N->right has the greatest key

repeat until no swaps are made

upper-bounded by the height of the tree = O(log(n))

Fast heap construction:

slow version is inserting n keys = O(n*log(n))

fast version creates n/2 heaps of size 1

n/4 heaps of size 3

then n/8 heaps of size 7

etc.

n/2 * log(1) + n/4 * log(3) + n/8 * log(7) + n/16 * log(15) + ... = n -
log(n) - 1 = O(n)

Building a heap in an array

It's possible to encode any binary tree in an array - it works well only
if it's balanced.

root is in location 1

left child of node in location x is in location 2*x

right child of node in location x is in location 2*x+1
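A sketch of insertion using that 1-indexed layout (max-heap assumed; the
caller must ensure the array has room for one more element):

    #include <utility>

    // Insert k into a 1-indexed array max-heap of current size n.
    // The new key starts at the bottom leaf and swaps with its parent
    // (at x/2) until the heap property holds: O(log n) steps.
    void heap_insert(int heap[], int& n, int k) {
        heap[++n] = k;                        // place at the next leaf
        for (int x = n; x > 1 && heap[x / 2] < heap[x]; x /= 2)
            std::swap(heap[x / 2], heap[x]);  // bubble up past smaller parent
    }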

Queues

FIFO = first in, first out

Can be implemented using the "circular buffer" technique.

Requires in_end = out_end = 0, buffer[size]

Insert "k":

if ( (in_end + 1) % size == out_end ) then the buffer is full, report
error (or expand the buffer)

buffer[in_end] = k;

in_end = (in_end + 1) % size;

Removal:

if (in_end == out_end) then the buffer is empty, report error

temp = buffer[out_end];

out_end = (out_end + 1) % size;

return temp;

File buffers in libraries often use circular buffers

Inserts or removals on the OS side are done in block transfers.

"flush" or "endl" force a block transfer before the buffer is full.

Graphs

Array based representation

Linked list of edges representation

Set representation

Huffman Encoding Trees (optional)

Sorting - Assume array A of size N

Bubble Sort

if A[i] > A[i+1] swap them

repeat until array is sorted

requires O(N^2) compares and O(N^2) swaps

Selection Sort

A[0] to A[i] is sorted and holds the i+1 lowest elements in the array

A[i+1] to A[N-1] is not sorted.

To advance i

find the minimum value in the unsorted range

Stick it in A[i+1]

Now A[0] to A[i+1] is sorted.

Requires O(N^2) compares but only O(N) swaps

Insertion Sort

A[0] to A[i] is sorted

A[i+1] to A[N-1] is not sorted

To advance i

Save the element in location A[i+1] into the variable "k"

Find the location "j" in A[0] to A[i] where "k" should go to keep the array
sorted

Move elements A[j] to A[i] over one spot into the range A[j+1] to A[i+1]

Move "k" into location A[j]

Now A[0] to A[i+1] is sorted

Requires as few as O(N*log(N)) compares and O(N^2) moves
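A minimal insertion sort sketch following the steps above (a linear search
for the spot, so O(N^2) compares in this version):

    // Insertion sort: A[0..i] stays sorted; each pass slides the
    // next element left into its place.
    void insertion_sort(int A[], int N) {
        for (int i = 1; i < N; ++i) {
            int k = A[i];                  // save the element being placed
            int j = i - 1;
            while (j >= 0 && A[j] > k) {   // shift larger elements right
                A[j + 1] = A[j];
                --j;
            }
            A[j + 1] = k;                  // k lands in its sorted spot
        }
    }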

Quick Sort

Quick Sort uses the "divide and conquer" strategy

Divide the problem into smaller pieces

Solve each of the pieces individually

Merge the solutions of the pieces into a solution for the whole

(Often uses recursion)

To sort the array:

Partition it

Pick a partition element "k"

find the first element on the left end greater than "k"

find the first element on the right end less than "k"

swap the locations of the two elements.

repeat until "k" is in the middle, every thing to the left is less than
"k", and everything to the right is greater than "k"

Sort everything less than the partition element

Sort everything greater than the partition element

The array is now sorted!

f(N) = O(N) + 2*f(N/2)

f(N) = O(N*log(N))
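A sketch using the simpler one-sided (Lomuto-style) partition rather than the
two-ended scan described above; picking A[hi] as the partition element is one
common choice:

    #include <utility>

    // Quick Sort: partition around A[hi], then recurse on both sides.
    void quick_sort(int A[], int lo, int hi) {
        if (lo >= hi) return;
        int k = A[hi];                        // partition element
        int i = lo;
        for (int j = lo; j < hi; ++j)
            if (A[j] < k) std::swap(A[i++], A[j]);
        std::swap(A[i], A[hi]);               // k is now in its final spot
        quick_sort(A, lo, i - 1);             // sort the "less than" side
        quick_sort(A, i + 1, hi);             // sort the "greater than" side
    }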

Merge Sort

Also "divide and conquer"

To sort the array:

Divide the array in half

Sort the left half

Sort the right half

Merge the two halves together

have "p" point to the first element in the left half

have "q" point to the first element in the right half

if (q is past the end OR (p is not past the end AND *p < *q)) select *p, p++

else select *q, q++

Repeat until both go past their ends.

Put elements into the array in the selected order

The array is now sorted.

f(N) = 2*f(N/2) + O(N)

f(N) = O(N*log(N))
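A sketch of the merge step (the order of the guards matters: check that q is
past the end before comparing):

    #include <vector>

    // Merge: combine sorted halves A[lo..mid] and A[mid+1..hi] into a
    // temporary buffer, then copy the merged run back.
    void merge(int A[], int lo, int mid, int hi) {
        std::vector<int> out;
        int p = lo, q = mid + 1;            // heads of the two halves
        while (p <= mid || q <= hi) {
            if (q > hi || (p <= mid && A[p] < A[q]))
                out.push_back(A[p++]);      // left head is next
            else
                out.push_back(A[q++]);      // right head is next
        }
        for (int i = lo; i <= hi; ++i)      // copy merged run back
            A[i] = out[i - lo];
    }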

Heap Sort

To sort the array:

Make a heap in place in the array (linear time if you use the bottom-up
method)

One by one, remove items from the heap and place them at the end of the
array

The array is now sorted.

f(N) = O(N) + N*O(log(N))

f(N) = O(N*log(N))

Labs

Linked Lists

IBCM Programming

Testing of SubSort Function

Linear Table

Binary Trees

Hash Table

Red-Black Trees

x86 Assembly

Priority Queues
Issues
An interesting item might be how a debugger works.
How buffering within "cin" and "cout" works.
Also vectors and double ended queues might be interesting.
We've looked at data abstraction, how about algorithm abstraction?
Project?