CS 216 Course Coverage Outline
This course at its core is an examination of abstraction and implementation.
We will look especially at the environment where high level language programs
operate. We will also look at different methods of implementing a
Table (Dictionary) data structure. Numerous small related topics
will be covered.
NOTE: I expect to complete the material between two lines in a single
day.
-
Introduction to course
-
Complexity is controlled by abstraction
-
The Interface is the services provided by the abstraction
-
The Environment is the services available to the abstraction
-
The Representation is the way services are implemented for the abstraction
-
Abstractions are generally hierarchical and often layered
-
Example: quantum physics, classical physics, transistors, ..., user interface
-
Silicon: Atomic Weight = 28, Density = 2.33 g/cm^3, 12.1 cm^3/mol
-
10 um feature size in 1971, 0.35 um feature size in 1995
-
How long until feature size breaks the quantum/classical physics abstraction?
-
Layers can be transparent or opaque (obscuring lower levels)
-
The selected Representation of an Abstraction determines performance
-
We will go down from the HLL abstraction to the Machine Language/OS abstraction
-
Code -> Machine language
-
IBCM - RISC-like, clean design, easy to learn the basics on
-
x86 - CISC, lots of complications
-
How "new" works
-
How function calls work
-
How OS calls work
-
How the I/O libraries work
-
We will go above the HLL abstraction into Data Structure and Algorithm
abstraction
-
Multiple representations of a Table
-
Other commonly used structures (Heaps & Graphs)
-
Machine Level Basics
-
Data Formats
-
Bytes, Words, Double Words, Quad Words
-
Rationale for finite sizes
-
Number formats
-
binary representation of unsigned integers
-
conversion to and from decimal and hexadecimal
-
addition, subtraction of unsigned integers
-
signed integer format
-
sign-magnitude
-
1's complement
-
bias
-
2's complement for assignment
-
IEEE Floating Pt. format basics
-
Mention Fixed Point arithmetic in DSPs
-
ASSIGNMENT:
-
binary math
-
think on numbering scheme for 2's complement
-
Character formats
-
Array Formats
-
Multi-dimensional Arrays
-
pointer vs. rectangular subscript calculations
-
Row-major layout A[row][col]
-
A[3][2] -> A[0][0], A[0][1], A[1][0], A[1][1], A[2][0], A[2][1]
-
K&R's "ANSI C" pgs. 110 to 114 (C uses ROW major)
-
Trade-offs of array bounds checking
-
Mention array alignments for speed (Byte Benchmark problem)
-
Struct Format
-
Class Format
-
2 structs are used for class members
-
one holds instance members
-
the other holds members common to all instances of the class
-
An offset is used to convert a pointer to a class into a pointer to the parent class
-
A pointer is followed to convert a pointer to a class into a pointer to a virtual parent class
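A hedged sketch of finding the locations of instance members (the class and members are hypothetical; exact layout is compiler-dependent):
    #include <cstdio>

    class Base    { public: int a; virtual void f() {} };
    class Derived : public Base { public: int b; };

    int main() {
        Derived d;
        // Print each member's offset from the start of the object; on most
        // compilers a hidden vtable pointer occupies the first bytes.
        printf("a at offset %ld\n", (long)((char*)&d.a - (char*)&d));
        printf("b at offset %ld\n", (long)((char*)&d.b - (char*)&d));
        return 0;
    }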
-
ASSIGNMENT: Compare speed of traversing an array rows-cols vs. cols-rows
-
ASSIGNMENT: Declare a class, find out locations of data items in an instance
-
Computer Layout (Mem, CPU, I/O, buses)
-
Memory is simple - address goes in one clock cycle, data comes out K cycles
later
-
Caches provide faster access at cheap price
-
exploit temporal and spatial locality
-
require "snooping" to prevent errors
-
RAM = Random Access Memory
-
Volatile!
-
Main memory is Dynamic RAM (DRAM) 1 bit = 1 transistor + 1 capacitor
-
Cache memory is Static RAM (SRAM) 1 bit = 6 to 8 transistors
-
ROM = Read Only Memory
-
Non-volatile
-
Good for initialization
-
PROM - Programmable ROM - can be changed
-
NOTE: Memory does not include typing - bits is bits!
-
CPU, to the outsider, merely appears to be doing read and write operations
-
Under "Memory Mapped I/O" address determines if its is I/O or to main memory
-
I/O has 3 basic ways of communicating
-
Being Polled - CPU reads registers on the I/O device
-
Interrupts - I/O device sends signal to CPU
-
requires individual wire in bus for each type of interrupt
-
Direct Memory Access (DMA) - I/O device becomes master of the bus
-
IBCM - Itty Bitty Computing Machine
-
IBCM basics
-
Fetch-Execute cycle
-
Operations and Op-codes
-
ASSIGNMENT: Write small program in IBCM Machine Code
-
High Level Language conversion to IBCM code
-
stack-based calculations of arithmetic expressions (is this possible with
IBCM?)
-
ASSIGNMENT: Convert C expression to IBCM Machine Code
-
Memory Organization (code, consts, globals, stack, heap)
-
Code and consts often are write protected (permits sharing)
-
Globals - often placed at bottom of the stack
-
Stack
-
What is saved on the stack?
-
Parameters
-
Local variables
-
The Link - the address to return to
-
The size of the stack frame
-
Return values
-
Register values (in case of "callee saves")
-
Why a stack?
-
Most processors have a "stack pointer"; the SPARC's register windows manage the stack directly!
-
2 major classes of "Calling Conventions"
-
Callee saves
-
Subroutine call just branch-and-links to subroutine code
-
Subroutine code saves registers before use, replaces before returning
-
Caller saves
-
Caller saves important values before branch-and-linking to subroutine code
-
Subroutine code assumes anything in registers can be overwritten
-
Stack often "grows down" from the top of memory
-
Heap Management
-
Heap often "grows up" from the base of memory
-
Simple linked list implementation (First Fit)
-
First Fit with merging of contiguous free blocks
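A minimal first-fit sketch (the node layout and names are invented for illustration; a real allocator also keeps headers so blocks can be split, freed, and merged):
    #include <cstddef>

    struct FreeBlock { size_t size; FreeBlock* next; };
    FreeBlock* free_list = nullptr;   // assumed to be seeded with the heap

    // First fit: walk the list and take the first block big enough.
    void* first_fit_alloc(size_t n) {
        for (FreeBlock** p = &free_list; *p != nullptr; p = &(*p)->next) {
            if ((*p)->size >= n) {
                FreeBlock* b = *p;
                *p = b->next;   // unlink the whole block; not splitting it
                return b;       // here is a source of internal fragmentation
            }
        }
        return nullptr;         // no block fits
    }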
-
Best Fit
-
Worst Fit
-
Fragmentation: wasted memory
-
Internal fragmentation - the allocated memory is larger than the request
-
External fragmentation - memory between allocated blocks
-
especially bad when requests could have been fulfilled if memory were contiguous
-
ASSIGNMENT: Think up better implementation
-
Quick Fit
-
For small sizes, array holds pointers to blocks of that size
-
Larger size blocks are allocated separately
-
Allocate based on 2^N sized blocks
-
Array where A[N] is a pointer to a list of blocks of size 2^N
-
At most N steps to initially allocate a block of size 2^N, averages 2 steps
-
When a merge occurs, the block is moved into the neighboring list of larger
blocks
-
Fast, but suffers from internal fragmentation
-
Testing Basics
-
NOTE: Testing can only tell you that code is bad - it cannot prove the code is
good!
-
Black Box Testing
-
Triangle Classification example
-
Use the Specifications and Your Experience to uncover problems
-
Can only test a finite number of cases - make them go far
-
Divide problem space into equivalence classes and make sure there is a
test for each
-
Things to think on
-
Illegal inputs
-
Boundary conditions
-
Overlap of conditions
-
Order sensitivity
-
Look for missing or unspecified conditions in the Spec [HARD]
-
Think of multiple ways of implementation and each one's weaknesses
-
Test Plan and Testing Environment
-
Description of test data and rationale
-
Description of expected results for each test
-
Driver programs
-
Contains (or reads in) test data
-
Contains code to compare black box results to expected results
-
Purpose: Convince yourself and others that all cases are covered and each
test is valuable
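A sketch of such a driver for the Triangle example (classify() here is a hypothetical stand-in for the code under test):
    #include <cstdio>
    #include <cstring>

    // Hypothetical function under test.
    const char* classify(int a, int b, int c) {
        if (a <= 0 || b <= 0 || c <= 0 ||
            a + b <= c || a + c <= b || b + c <= a) return "illegal";
        if (a == b && b == c) return "equilateral";
        if (a == b || b == c || a == c) return "isosceles";
        return "scalene";
    }

    struct Test { int a, b, c; const char* expected; };

    int main() {
        // One test per equivalence class, plus boundary and illegal inputs.
        Test tests[] = {
            {3, 3, 3, "equilateral"},
            {3, 3, 4, "isosceles"},
            {3, 4, 5, "scalene"},
            {1, 2, 3, "illegal"},    // boundary: a + b == c (degenerate)
            {0, 0, 0, "illegal"},    // boundary: zero-length sides
            {-3, 4, 5, "illegal"},   // illegal input: negative side
        };
        for (Test& t : tests) {
            const char* got = classify(t.a, t.b, t.c);
            if (strcmp(got, t.expected) != 0)
                printf("FAIL (%d,%d,%d): got %s, expected %s\n",
                       t.a, t.b, t.c, got, t.expected);
        }
        return 0;
    }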
-
Clear Box
-
Code coverage
-
Many problems show up the first time you run a piece of code.
-
Tools like "PureCoverage" are used
-
Some code is difficult/impossible to cover (OS error code)
-
Code inspections
-
Considered the #1 technique in terms of bugs found per time spent!
-
Tables (Dictionaries, Symbol Table, Associative Memory, Associative Arrays,
Content-addressable Memory)
-
Description of the requirements
-
A table is a 1-to-1 or a 1-to-many mapping from one set to another
-
A phonebook can be seen as a table, mapping names to phone numbers
-
Can grow (not bounded in size)
-
Not dense (can be huge gaps between keys without wasting memory)
-
Declaration:
-
Example: word frequency counter
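A minimal sketch of the word frequency counter, using the C++ library's std::map as the Table:
    #include <iostream>
    #include <map>
    #include <string>

    int main() {
        std::map<std::string, int> freq;   // maps each word to its count
        std::string word;
        while (std::cin >> word)
            ++freq[word];                  // insert-or-increment
        for (const auto& p : freq)
            std::cout << p.first << " " << p.second << "\n";
        return 0;
    }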
-
Issues
-
Other methods (delete, is_empty, member, ...)
-
Iterator
-
Simple Linear array implementation
-
Abstract performance measurements
-
Original idea - non-abstract performance measurement
-
Count exact numbers of each operation on a machine
-
Non-transferable - different architectures, compilers, primitive operations...
-
However, dominant terms existed....
-
Big O notation
-
It's an upper bound on the time or space required by a program
-
We say a function f = O(g) if there exist two constants C and N such that
-
f(n) <= C*g(n) for all n >= N
-
This only gives us a rough estimate for very large values of n!
-
Does not necessarily mean a tight upper bound!
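For example, f(n) = 3n + 10 is O(n): take C = 4 and N = 10, since 3n + 10 <= 4n for all n >= 10. (It is also, less usefully, O(n^2).)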
-
Big Omega notation - same except for lower bounds.
-
Worst case measurements are the standard
-
Average case is sometimes stated, but how do you tell if your data is average?
-
Speed and Size measurement examples
-
Linked-list implementations
-
Each node in list stores (Key, Value) pair
-
Basic
-
Insert at front or back
-
Required to search entire list to show something is not present
-
Move to front
-
Whenever a Key is looked up, the containing node is moved to the front
of the list
-
Front of the list acts as a cache - frequently used items remain near front
-
Provides fast access to commonly requested items
-
Ordered
-
Requires more time at insertion
-
Can determine node is not present with only searching a part of the list!
-
Twice as fast (on average) at determining a key is not present
-
Ordered linear array implementation (Binary Search)
-
Insertion is much slower: O(n) vs. O(1)
-
Finding a key is much faster: O(log(n)) vs. O(n)
-
The optimum choice is determined by the ratio of insertions to searches
-
Recursive relation: f(n) = constant, if (n == 0 or n == 1)
-
f(n) = constant + f(n/2), otherwise
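A sketch of the search itself:
    // Binary search over a sorted array; returns the index of key,
    // or -1 if it is not present. Each pass halves the range,
    // matching the recurrence above.
    int binary_search(const int* a, int n, int key) {
        int lo = 0, hi = n - 1;
        while (lo <= hi) {
            int mid = lo + (hi - lo) / 2;
            if (a[mid] == key) return mid;
            if (a[mid] < key) lo = mid + 1;
            else              hi = mid - 1;
        }
        return -1;
    }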
-
Binary Search Trees
-
Most of the beauty of binary search with a faster insertion time!
-
Gives up the worst case O(log(n)) time, but achieves it on average.
-
Speed of data structure is sensitive to the order keys are inserted.
-
Recursive descent to find node
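A sketch of the descent (the node layout is invented for illustration):
    struct Node { int key; Node* left; Node* right; };

    // Go left for smaller keys, right for larger; cost is the depth
    // of the node, O(log(n)) on average for random insertion orders.
    Node* find(Node* root, int key) {
        if (root == nullptr) return nullptr;   // not present
        if (key == root->key) return root;
        return find(key < root->key ? root->left : root->right, key);
    }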
-
Other considerations
-
Allocating dynamic memory takes time, usually independent of size allocated
-
Lots of small allocations = lots of time
-
Wulf states "this might be the biggest factor for small N"
-
Pointers + hidden sizes + internal fragmentation take up space.
-
Other things can be stored in binary trees
-
Arithmetic expressions
-
Any other tree
-
Traversals L = left subtree, R = right subtree, N = node itself
-
In order (LNR)
-
preorder (NLR)
-
postorder (LRN)
-
HashTables
-
Perfect Hashes
-
Requires all KEYs be known beforehand
-
Hash function - converts a KEY to an integer
-
Modulus - a size for the array such that no collisions occur
-
bool is_member(KEY k) { return (array[ HashFunction(k) % Modulus ].key == k); }
-
Constant Time for searching! O(1)!
-
Dynamic Hash Tables
-
Can no longer guarantee that no collisions occur - Collision resolution
strategies!
-
Hashing to buckets
-
Linked lists are used to store Key/Value pairs
-
Linear probing
-
If a spot is full in the array, put the Key/Value pair in the next free
location
-
Leads to packed sections in the array
-
If deletion is supported, special Key is required to mark deletions
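A sketch of linear-probe lookup (the sentinel and hash function are stand-ins; a table supporting deletion would also probe past a DELETED sentinel):
    const int EMPTY = -1;                   // sentinel for an unused slot

    int hash_function(int k) { return k; }  // stand-in hash for illustration

    // Start at the hashed slot and walk forward until the key or an
    // empty slot is found. Assumes the table is never completely full.
    int find_slot(const int* keys, int modulus, int k) {
        int i = hash_function(k) % modulus;
        while (keys[i] != k && keys[i] != EMPTY)
            i = (i + 1) % modulus;
        return (keys[i] == k) ? i : -1;
    }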
-
Double hashing
-
A variation on Linear Probing, but prevents packed sections of the array
-
Instead of inserting at Ideal + 1 spot, insert at Ideal + hash_function2(KEY)
% Modulus
-
May need to resize array to guarantee average O(1) searches
-
Usually multiply size of array by factor (I like to go to next larger size
of 2^N - 1)
-
Resize is usually done when used capacity is greater than 70%
-
Good Hash Functions
-
Fast
-
Spreads keys evenly (no matter what modulus is used)
-
Look hard at expected data!
-
length sensitivity
-
commonly occurring sequences
-
String s[0...length]
-
Sum s[0] to s[length]
-
Sum s[0]*k^0 + s[1]*k^1 + ... + s[length]*k^length
-
Any CRC (has nearly the same purpose as hash function!)
-
hashpjw() multiplies by 16, adds the current char, then xors the top 4 bits
back into the low bits.
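For reference, a version along the lines of the classic hashpjw() from Aho, Sethi & Ullman's compiler text:
    unsigned hashpjw(const char* s) {
        unsigned h = 0;
        for (; *s != '\0'; ++s) {
            h = (h << 4) + (unsigned char)*s;   // multiply by 16, add char
            unsigned g = h & 0xf0000000u;       // grab the top 4 bits
            if (g != 0) {
                h ^= g >> 24;                   // fold them into the low bits
                h ^= g;                         // and clear them off the top
            }
        }
        return h;
    }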
-
Balanced Binary Trees
-
2-3-4 Trees
-
Are NOT binary trees - each node can have up to 4 children
-
Are the smallest example of a B-tree
-
Each node has N keys, 1 <= N <= 3, and N+1 children
-
Tree is always perfectly balanced (in terms of nodes, not keys)
-
Insertion
-
If target node is a 2- or a 3-node, just insert
-
If target node is a 4-node, split into 2 2-nodes, and insert center key
into parent node
-
then do insert into a 2-node
-
Can cause recursive nightmare if all ancestors are 4-nodes
-
Trick: split all 4 nodes on the way down!
-
Guarantees that parent is never a 4-node.
-
All transforms keep the tree perfectly balanced
-
Tree only grows in height if root is a 4-node that gets split.
-
Tree height is at most log(N), so all operations are O(log(N))!
-
B-Trees
-
can go from simple 2-3-4 Trees, to nodes that have thousands of keys
-
Used on disk drives where loading a node from the drive takes a while
-
Maps file numbers to data structures which hold the locations of pieces
of a file.
-
File names are mapped to file numbers by directory files.
-
Red-Black Trees
-
Binary trees which function like 2-3-4 trees
-
Two different kinds of edges in tree:
-
"black" links connect 2-3-4 nodes to 2-3-4 nodes
-
"red" links bind binary nodes into 3-nodes and 4-nodes (none needed for
a 2-node)
-
O(log(n)) search time
-
Since 2-3-4 tree is perfectly balanced, searches traverse log(n) black
links
-
Paths contain at most as many red edges as black edges
-
So, log(n) + log(n) = O(log(n))
-
Insertion
-
Once again, we must break up 4-nodes as we go down
-
BUT - this may require rotations for the proper orientation of 4-nodes!
-
New node is inserted with a red link
-
AND - may also require a rotation if it makes a 4-node!
-
Rotation
-
Done when a 4-node is laid out linearly, rather than hierarchically
-
2 versions
-
Single rotation - middle key of 4-node is in the middle of the line
-
Double rotation - middle key of 4-node is at the bottom
-
Check out: http://www.cs.virginia.edu/~mdn4f/binary_tree/BinaryTree.html
-
Skip Lists
-
An improvement to linked lists to avoid O(n) search times
-
Actually composed of a collection of O(log(n)) linked lists organized as
levels
-
Searching for k:
-
Start at the top level; if k >= the node's key and k < the next node's key, go
down a level.
-
When at the bottom level, if k == the node's key return the node, else return
not found.
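A sketch of the search loop, assuming each node carries an array of forward pointers, one per level it appears in:
    const int MAX_LEVEL = 16;   // arbitrary cap for this sketch

    struct SkipNode { int key; SkipNode* next[MAX_LEVEL]; };

    // At each level, run right while the next key is still < k, then
    // drop down a level; check for k at the bottom.
    SkipNode* find(SkipNode* head, int top_level, int k) {
        SkipNode* node = head;   // header node holds only links, no key
        for (int level = top_level; level >= 0; --level)
            while (node->next[level] != nullptr && node->next[level]->key < k)
                node = node->next[level];
        node = node->next[0];
        return (node != nullptr && node->key == k) ? node : nullptr;
    }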
-
Inserting:
-
Imitate search, saving final node followed in each level
-
Insert node at the correct place in the bottom level
-
Keep flipping a coin: until it comes up tails, also insert the node at the next
higher level
-
Search time is based on randomness - but a randomness independent of keys!
(unlike BST)
-
With a good random number generator, can almost make guarantees.
-
Simple to implement delete! (unlike Red-black trees)
-
Read Bill Wulf's notes (PS and PDF versions)
-
Source code for skip lists is available at http://www.cs.virginia.edu/~cs216/doc/skiplist.h
-
80x86 Assembly
-
Ferrari et al.'s Tiny Guide to 32-bit x86
-
Organization of the 8086 processor
-
8 "general purpose"? registers
-
Machine Status Word - holds flag bits for jumps
-
x86 Assembly
-
How the assembler works
-
OP - DEST/SRC1 - SRC2 format of x86 assembly instructions
-
e.g. ADD EAX, EBX adds the contents of EAX and EBX and stores the result
in EAX.
-
Some Op-codes match to more than one Machine Language instruction e.g.
MOV
-
Object Code - Machine Language which still requires offsets for external
labels
-
How variable length instructions affect the fetch-execute cycle
-
Little Endianness
-
lowest address byte holds least significant bits
-
The VAX is little-endian, as is the x86
-
Nearly all other processors are big-endian (although many can switch)
-
Comparison with IBCM
-
Basic programs
-
.DATA section for globals
-
DUP can be used for declaring arrays
-
No Automatic Alignment!
-
.CODE section for code
-
Addressing modes - max 2 registers, multiply one by 1, 2, 4, or 8, plus a
constant
-
PTR directive for ambiguous memory references
-
General instructions - ADD, SUB, XOR, etc.
-
CMP & JMP - how to implement loops and branches
-
CALL & RET - how subroutine calls work with the stack
-
PUBLIC directive to export a label
-
PROC directive tells which machine language form to convert CALL to.
-
Calling Convention
-
EAX, EBX, ECX, and EDX are caller saved
-
All other registers are callee saved
-
Caller pushes parameters onto the stack in Right to Left order
-
First, the callee saves the base ptr and copies the stack ptr into
the base ptr.
-
base ptr is used to access parameters of the function
-
Interrupts
-
An interrupt halts what the processor is currently doing and has it start
something else
-
Interrupt Vector
-
Each kind of interrupt is associated with a different number
-
The interrupt vector is an array of "interrupt function" pointers at a
fixed location in memory
-
On an interrupt, the processor automatically jumps to the associated interrupt
function
-
What are interrupts good for?
-
I/O - handling input/output events that happen on an irregular basis
-
Changing which process has the CPU
-
Operating System calls (software generated interrupts!)
-
Keeping track of time (Timers are used to cause an interrupt)
-
Trapping errors (divide by zero, overflow)
-
Trapping violations (illegal instructions, using memory you aren't allowed
to use)
-
Failure warnings (UPS is now on...)
-
Now is the time to reboot (Vulcan neck pinch)
-
Debugging (stepping through a program, breakpoints)
-
Multiprocessor synchronization
-
Signaling the processor to start again (after entering a low power mode)
-
Makes julienne fries
-
Masking Interrupts "turning off interrupts"
-
Done by clearing the I-bit
-
Prevents an interrupt of an interrupt
-
Also good for "Critical Sections" of multiprogrammed code
-
What happens on an interrupt
-
Push Machine Status Word (a.k.a. Flags register)
-
Clear I-bit and T-bit
-
Push Program Counter (a.k.a. "IP" register)
-
load Program Counter with [Interrupt_Vector + 4*Interrupt_Number]
-
NOTE: STACK IS USED!!! ANYTHING BEYOND [SP] IS WRITTEN OVER!!!
-
INT - Software Interrupt
-
Takes a constant argument, the interrupt number
-
INT 3 fits in a single byte (good for breakpoints)
-
IRET - Interrupt Return
-
Does the reverse of an interrupt
-
How I/O uses interrupts
-
Interrupt service routine reads data from a keyboard
-
Definitions:
-
Interrupt - an event which causes the processor to stop what it's doing
and do something else
-
Interrupt Service Routine (ISR)- code to handle an interrupt which returns
with an IRET instr.
-
Interrupt Number - a number indicating what kind of event triggered the
interrupt
-
Interrupt Vector - an array of pointers to ISRs which is indexed by the
Interrupt Number
-
Exception, Trap = other names for interrupts
-
Clearing a flag - setting the associated bit to 0
-
Setting a flag - setting the associated bit to 1
-
Critical Section - a section of code which must be run by only 1 process
at a time
-
Often in the OS
-
Actions must be seen to be atomic (indivisible) - all happening at once
-
How OS System Calls work
-
User code pushes arguments on the stack
-
User code puts value in register indicating which OS system call
-
User code triggers software interrupt
-
Processor looks up ISR in Interrupt Vector and jumps to it
-
OS code contains "switch()" statement to select OS system call
-
OS system call operation is performed
-
Optional delay (e.g. a disk read operation must wait until data is read from
the disk)
-
OS system call places return code in register
-
OS executes IRET
-
User code checks return code of OS to confirm that everything went okay.
-
Data Structures
-
Heaps
-
For any subtree, the key at the root is greatest (least) of keys in the
subtree
-
Root holds maximum for the entire tree
-
Insertion:
-
place new node in tree such that the tree is optimally balanced
-
N = new node;
-
while (N != root && N->key > N->parent->key) { swap(N, N->parent); }
-
upper bounded by height of tree = O(log(n))
-
Deletion:
-
remove root from the tree
-
move lowest leaf into the old place of root
-
N = root;
-
swap N with whichever of N, N->left, N->right has the greatest key
-
repeat until no swaps are made
-
upper bounded by height of tree = O(log(n))
-
Fast heap construction:
-
slow version is inserting n keys = O(n*log(n))
-
fast version creates n/2 heaps of size 1
-
n/4 heaps of size 3
-
then n/8 heaps of size 7
-
etc.
-
n/2 * log(1) + n/4 * log(3) + n/8 * log(7) + n/16 * log(15) + ... = n -
log(n) - 1 = O(n)
-
Building a heap in an array
-
It's possible to encode any binary tree in an array - it works well only
if the tree is balanced.
-
root is in location 1
-
left child of node in location x is in location 2*x
-
right child of node in location x is in location 2*x+1
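A sketch of the index arithmetic plus the "swap down" step of deletion, for a max-heap stored this way (1-based indexing):
    #include <algorithm>

    // heap[1..n] with heap[x] >= heap[2*x] and heap[x] >= heap[2*x+1].
    // Sift the value at location x down until both children are smaller.
    void sift_down(int* heap, int n, int x) {
        while (2 * x <= n) {
            int child = 2 * x;                      // left child
            if (child + 1 <= n && heap[child + 1] > heap[child])
                ++child;                            // right child is bigger
            if (heap[x] >= heap[child]) break;      // heap property holds
            std::swap(heap[x], heap[child]);
            x = child;
        }
    }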
-
Queues
-
FIFO = first in, first out
-
Can be implemented using the "circular buffer" technique.
-
Requires in_end = out_end = 0, buffer[size]
-
Insert "k":
-
if ( (in_end + 1) % size == out_end) then the buffer is full, report
error (or expand the buffer)
-
buffer[in_end] = k;
-
in_end = (in_end + 1) % size;
-
Removal:
-
if (in_end == out_end) then buffer is empty, report error
-
temp = buffer[out_end];
-
out_end = (out_end + 1) % size;
-
return temp;
-
File buffers in libraries often use circular buffers
-
inserts or removals to the OS side are done in block transfers.
-
"flush" or "endl" force a block transfer before the buffer is full.
-
Graphs
-
Array based representation
-
Linked list of edges representation
-
Set representation
-
Huffman Encoding Trees (optional)
-
Sorting - Assume array A of size N
-
Bubble Sort
-
if A[i] > A[i+1] swap them
-
repeat until array is sorted
-
requires O(N^2) compares and O(N^2) swaps
-
Selection Sort
-
A[0] to A[i] is sorted and are the i+1 lowest elements in the array
-
A[i+1] to A[N-1] is not sorted.
-
To advance i
-
find the minimum value in the unsorted range
-
Stick it in A[i+1]
-
Now A[0] to A[i+1] is sorted.
-
Requires O(N^2) compares but only O(N) swaps
-
Insertion Sort
-
A[0] to A[i] is sorted
-
A[i+1] to A[N-1] is not sorted
-
To advance i
-
Save the element in location A[i+1] into the variable "k"
-
Find the location "j" in A[0] to A[i] where "k" should go to keep the array
sorted
-
Move elements A[j] to A[i] over one spot into the range A[j+1] to A[i+1]
-
Move "k" into location A[j]
-
Now A[0] to A[i+1] is sorted
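The steps above as a compact sketch:
    // Grow the sorted prefix A[0..i] one element at a time.
    void insertion_sort(int* A, int N) {
        for (int i = 0; i + 1 < N; ++i) {
            int k = A[i + 1];                 // save the next element
            int j = i + 1;
            while (j > 0 && A[j - 1] > k) {   // shift larger elements over
                A[j] = A[j - 1];
                --j;
            }
            A[j] = k;                         // drop "k" into location j
        }
    }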
-
Requires as few as O(N*log(N)) compares (with binary search for the location)
and O(N^2) moves
-
Quick Sort
-
Quick Sort uses the "divide and conquer" strategy
-
Divide the problem into smaller pieces
-
Solve each of the pieces individually
-
Merge the solutions of the pieces into a solution for the whole
-
(Often uses recursion)
-
To sort the array:
-
Partition it
-
Pick a partition element "k"
-
find the first element on the left end greater than "k"
-
find the first element on the right end less than "k"
-
swap the locations of the two elements.
-
repeat until "k" is in the middle, every thing to the left is less than
"k", and everything to the right is greater than "k"
-
Sort everything less than the partition element
-
Sort everything greater than the partition element
-
The array is now sorted!
-
f(N) = O(N) + 2*f(N/2)
-
f(N) = O(N*log(N))
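A compact sketch of the whole procedure (the pivot choice here is arbitrary):
    #include <algorithm>

    void quick_sort(int* A, int lo, int hi) {
        if (lo >= hi) return;
        int k = A[(lo + hi) / 2];   // partition element
        int i = lo, j = hi;
        while (i <= j) {
            while (A[i] < k) ++i;   // first element on the left >= k
            while (A[j] > k) --j;   // first element on the right <= k
            if (i <= j) { std::swap(A[i], A[j]); ++i; --j; }
        }
        quick_sort(A, lo, j);       // sort everything less than k
        quick_sort(A, i, hi);       // sort everything greater than k
    }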
-
Merge Sort
-
Also "divide and conquer"
-
To sort the array:
-
Divide the array in half
-
Sort the left half
-
Sort the right half
-
Merge the two halves together
-
have "p" point to the first element in the left half
-
have "q" point to the first element in the right half
-
if (q is past the end OR (p is not past the end AND *p < *q) ) select *p, p++
-
else select *q, q++
-
Repeat until both go past their ends.
-
Put elements into the array in the selected order
-
The array is now sorted.
-
f(N) = 2*f(N/2) + O(N)
-
f(N) = O(N*log(N))
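The merge step as a sketch (merging through a temporary buffer):
    #include <vector>

    // Merge the sorted halves A[lo..mid] and A[mid+1..hi] back into A.
    void merge(int* A, int lo, int mid, int hi) {
        std::vector<int> out;
        int p = lo, q = mid + 1;          // one cursor per half
        while (p <= mid || q <= hi) {
            // Take from the left half if the right is exhausted or the
            // left element is smaller; otherwise take from the right.
            if (q > hi || (p <= mid && A[p] < A[q]))
                out.push_back(A[p++]);
            else
                out.push_back(A[q++]);
        }
        for (int i = lo; i <= hi; ++i)
            A[i] = out[i - lo];           // copy back in sorted order
    }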
-
Heap Sort
-
To sort the array:
-
Make a heap in place in the array (linear time if you use the bottom-up
method)
-
One-by-one remove items from the heap and place them at the end of the
array
-
The array is now sorted.
-
f(N) = O(N) + N*O(log(N))
-
f(N) = O(N*log(N))
Labs
-
Linked Lists
-
IBCM Programming
-
Testing of SubSort Function
-
Linear Table
-
Binary Trees
-
Hash Table
-
Red-Black Trees
-
x86 Assembly
-
Priority Queues
Issues
An interesting item might be how a debugger works.
How buffering within "cin" and "cout" works.
Also vectors and double ended queues might be interesting.
We've looked at data abstraction, how about algorithm abstraction?
Project?