More on algorithms

6.11.3 Binary search

However, if we place our items in an array and sort them in either ascending or descending order on the key first, then we can obtain much better performance with an algorithm called binary search.

In binary search, we first compare the key with the item in the middle position of the array. If there's a match, we can return immediately. If the key is less than the middle key, then the item sought must lie in the lower half of the array; if it's greater, then the item sought must lie in the upper half of the array. So we repeat the procedure on the lower (or upper) half of the array.

The search function can now be implemented:

static void *bin_search( collection c, int low, int high, void *key ) {
    int mid, cmp;

    /* Termination check: empty partition */
    if (low > high) return NULL;

    mid = (high + low) / 2;

    /* memcmp returns a negative, zero or positive value,
       not necessarily -1, 0 or +1 */
    cmp = memcmp( ItemKey( c->items[mid] ), key, c->size );

    if (cmp == 0)
        /* Match, return item found */
        return c->items[mid];
    else if (cmp > 0)
        /* key is less than the middle item, search lower half */
        return bin_search( c, low, mid - 1, key );
    else
        /* key is greater than the middle item, search upper half */
        return bin_search( c, mid + 1, high, key );
}

void *FindInCollection( collection c, void *key ) {
    /* Find an item in a collection
       Pre-condition:
           c is a collection created by ConsCollection
           c is sorted in ascending order of the key
           key != NULL
       Post-condition: returns an item identified by key if
           one exists, otherwise returns NULL
    */
    int low, high;

    low = 0; high = c->item_cnt - 1;
    return bin_search( c, low, high, key );
}

Points to note:

a. bin_search is recursive: it determines whether the search key lies in the lower or upper half of the array, then calls itself on the appropriate half.

b. There is a termination condition (two of them in fact!):
   i. if low > high then the partition to be searched has no elements in it, and
   ii. if there is a match with the element in the middle of the current partition, then we can return immediately.

c. AddToCollection will need to be modified to ensure that each item added is placed in its correct place in the array. The procedure is simple:
   i. search the array until the correct spot to insert the new item is found,
   ii. move all the following items up one position and
   iii. insert the new item into the empty position thus created.

d. bin_search is declared static. It is a local function and is not used outside this class: if it were not declared static, it would be exported and be available to all parts of the program. The static declaration also allows other classes to use the same name internally.

Analysis

Each step of the algorithm divides the block of items being searched in half. We can divide a set of n items in half at most log2 n times. Thus the running time of a binary search is proportional to log n and we say this is an O(log n) algorithm.

Binary search requires a more complex program than our original search and thus for small n it may run slower than the simple linear search. However, for large n, log n is much smaller than n; consequently an O(log n) algorithm is much faster than an O(n) one. For example, for n = 1,000,000, log2 n is only about 20.

Plot of n and log n vs n.

In the worst case, insertion may require n operations to insert into a sorted list.

1. We can find the place in the list where the new item belongs using binary search in O(log n) operations.

2. However, we have to shuffle all the following items up one place to make way for the new one. In the worst case, the new item is the first in the list, requiring n move operations for the shuffle!

A similar analysis will show that deletion is also an O(n) operation.
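As a minimal sketch of this insertion procedure (using a plain int array and a separate element count for illustration, rather than the collection type used above):

#include <stdio.h>

/* Insert key into a[0..n-1], which is sorted in ascending order, keeping it
   sorted. Returns the new element count. Assumes the array has room for one
   more element. */
int sorted_insert(int a[], int n, int key) {
    int lo = 0, hi = n - 1, i;

    /* binary search for the insertion point: O(log n) */
    while (lo <= hi) {
        int mid = (lo + hi) / 2;
        if (a[mid] < key) lo = mid + 1;
        else hi = mid - 1;
    }

    /* shuffle the following items up one place: O(n) in the worst case */
    for (i = n; i > lo; i--)
        a[i] = a[i - 1];

    a[lo] = key;
    return n + 1;
}

int main(void) {
    int a[8] = {2, 5, 9, 14}, n = 4, i;
    n = sorted_insert(a, n, 7);
    for (i = 0; i < n; i++) printf("%d ", a[i]);   /* prints: 2 5 7 9 14 */
    printf("\n");
    return 0;
}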

If our collection is static, ie it doesn't change very often - if at all - then we may not be

concerned with the time required to change its contents: we may be prepared for the initial

build of the collection and the occasional insertion and deletion to take some time. In

return, we will be able to use a simple data structure (an array) which has little memory

overhead.

However, if our collection is large and dynamic, ie items are being added and deleted

continually, then we can obtain considerably better performance using a data structure

called a tree.

Binary search requires an ordered list.

Iterative Algorithm

int find (const apvector<double> &list, double target)
// pre: list is sorted in ascending order
// post: ITERATIVE binary search will return the index of the target element, else -1
{
    int mid;
    int first = 0;
    int last = list.length( ) - 1;

    while ( first <= last )
    {
        mid = (first + last) / 2;
        if ( list[mid] == target )
            return mid;
        if ( list[mid] > target )
            last = mid - 1;
        else
            first = mid + 1;
    }
    return -1;
}

Recursive Algorithm

int find (const apvector<double> &list, double target, int first, int last)
// pre: list is sorted in ascending order
// post: RECURSIVE binary search will return the index of the target element, else -1
{
    if (first > last)
        return -1;
    int mid = (first + last) / 2;
    if (list[mid] == target)
        return mid;
    if (list[mid] < target)
        return find(list, target, mid+1, last);
    return find(list, target, first, mid-1);
}

UNIT 7

Algorithms for External Storage

7.1 A Model of External Computation

7.2 External sorting

7.3 Characteristics of External Sorting

7.4 Criteria for Developing an External Sorting Algorithm

7.5 Important Uses of External Sorting

7.6 Merge Sort--A Digression

7.7 Top-Down Strategy

7.8 Bottom-Up Strategy

7.9 Storing Information in Files

7.10 Hashed Files

7.11 Indexed Files

7.1 A Model of External Computation


In the algorithms discussed so far, we have assumed that the amount of input data is

sufficiently small to fit in main memory at the same time. But what if we want to sort all the

employees of the government by length of service or store all the information in the nation’s tax

returns? In such problems the amount of data to be processed exceeds the capacity of the main

memory. Most large computer systems have on-line external storage devices, such as disks or

mass storage devices, on which vast quantities of data can be stored. These on-line storage

devices, however, have access characteristics that are quite different from those of main memory.

A number of data structures and algorithms have been developed to utilize these devices more

effectively. This chapter discusses data structures and algorithms for sorting and retrieving

information stored in secondary memory.

Pascal, and some other languages, provides the file data type, which is intended to

represent data stored in secondary memory. Even if the language being used does not have a file

data type, the operating system undoubtedly supports the notion of files in secondary memory.

Whether we are talking about Pascal files or files maintained by the operating system directly, we

are faced with limitations on how files may be accessed. The operating system divides secondary

memory into equal-sized blocks. The size of blocks varies among operating systems, but 512 to

4096 bytes is typical.

We may regard a file as stored in a linked list of blocks, although more typically the operating system uses a tree-like arrangement, where the blocks holding the file are leaves of the tree, and interior nodes each hold pointers to many blocks of the file. If, for example, 4 bytes suffice to hold the address of a block, and blocks are 4096 bytes long, then a root block can hold pointers to up to 1024 blocks. Thus, files of up to 1024 blocks, i.e., about four million bytes, could be represented by a root block and blocks holding the file. Files of up to 2^20 blocks, or 2^32 bytes, could be represented by a root block pointing to 1024 blocks at an intermediate level, each of which points to 1024 leaf blocks holding a part of the file, and so on.

The basic operation on files is to bring a single block to a buffer in main memory; a buffer

is simply a reserved area of main memory whose size is the same as the size of a block. A typical

operating system facilitates reading the blocks in the order in which they appear in the list of

blocks that holds the file. That is, we initially read the first block into the buffer for that file, and

then replace it by the second block, which is written into the same buffer, and so on.

We can now see the rationale behind the rules for reading Pascal files. Each file is stored

in a sequence of blocks, with a whole number of records in each block. (Space may be wasted, as

we avoid having one record split across block boundaries.) The read-cursor always points to one

of the records in the block that is currently in the buffer. When that cursor must move to a record

not in the buffer, it is time to read the next block of the file.

Similarly, we can view the Pascal file-writing process as one of creating a file in a buffer. As

records are “written” into the file, they are placed in the buffer for that file, in the position

immediately following any previously placed records. When the buffer cannot hold another

complete record, the buffer is copied into an available block of secondary storage and that block is

appended to the end of the list of blocks for that file. We can now regard the buffer as empty and

write more records into it.

7.2 External sorting

This term is used to refer to sorting methods that are employed when the data to be sorted is too

large to fit in primary memory.

Why we Need New Algorithms?

Most of the internal sorting algorithms take advantage of the fact that memory is directly addressable. Shell sort compares elements A[i] and A[i - h_k] in one time unit. Heap sort compares elements A[i] and A[2i + 1] in one time unit. Quicksort, with median-of-three partitioning, requires comparing A[left], A[center], and A[right] in a constant number of time units. If the input is on a tape, it can only be accessed sequentially. Even if the data is on a disk, there is still a practical loss of efficiency, because of the delay required to spin the disk and move the disk head.

To see how slow external accesses really are, create a student file that is large, but not too big to fit in main memory. Read the file in and sort it using an efficient algorithm. The time it takes to sort the input is certain to be insignificant compared to the time to read the input, even though sorting is an O(n log n) operation and reading the input is only O(n).

Model for External Sorting

The wide variety of mass storage devices makes external sorting much more device-dependent than internal sorting. The algorithms that we will consider work on tapes, which are probably the most restrictive storage medium. Since access to an element on tape is done by winding the tape to the correct location, tapes can be efficiently accessed only in sequential order (in either direction).

We will assume that we have at least three tape drives to perform the sorting. We need two drives to do an efficient sort; the third drive simplifies matters. If only one tape drive is present, then we are in trouble: any algorithm will require Omega(N^2) tape accesses.

The Simple Algorithm

The basic external sorting algorithm uses the Merge routine from mergesort. Suppose we have

four tapes Ta1, Ta2, Tb1, Tb2, which are two input and two output tapes. Depending on the point in

the algorithm, the a and b tapes are either input tapes or output tapes. Suppose the data is

initially on Ta1. Suppose further that the internal memory can hold (and sort) M records at a time.

A natural first step is to read M records at a time from the input tape, sort the records internally,

and then write the sorted records alternately to Tb1 and Tb2. We will call each set of sorted records

a run. When this is done, we rewind all the tapes. Suppose we have the same input as our

example for Shellsort.

Ta1 81 94 11 96 12 35 17 99 28 58 41 75 15

Ta2

Tb1

Tb2

If M = 3, then after the runs are constructed, the tapes will contain the data indicated in the

following figure.

Ta1

Ta2

Tb1 11 81 94 17 28 99 15

Tb2 12 35 96 41 58 75

Now Tb1 and Tb2 contain a group of runs. We take the first run from each tape and merge them, writing the result, which is a run twice as long, onto Ta1. Then we take the next run from each tape, merge these, and write the result to Ta2. We continue this process, alternating between Ta1 and Ta2, until either Tb1 or Tb2 is empty. At this point either both are empty or there is one run left. In the latter case, we copy this run to the appropriate tape. We rewind all four tapes and repeat the same steps, this time using the a tapes as input and the b tapes as output. This will give runs of length 4M. We continue the process until we get one run of length N.
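As a rough sketch of the run-construction pass only (the merge passes are not shown), the following C fragment reads up to M records at a time, sorts them in memory, and writes the sorted runs alternately to two output files; the integer records, file names and the value of M are assumptions made for the example.

#include <stdio.h>
#include <stdlib.h>

#define M 3   /* number of records that fit in "internal memory" for this example */

static int cmp_int(const void *a, const void *b) {
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Read runs of up to M integers from 'in', sort each run internally, and write
   the runs alternately to out[0] and out[1] (the Tb1 and Tb2 tapes of the text). */
void make_runs(FILE *in, FILE *out[2]) {
    int buf[M], n, i, which = 0;

    do {
        for (n = 0; n < M && fscanf(in, "%d", &buf[n]) == 1; n++)
            ;
        if (n > 0) {
            qsort(buf, n, sizeof(int), cmp_int);
            for (i = 0; i < n; i++)
                fprintf(out[which], "%d ", buf[i]);
            fprintf(out[which], "\n");      /* one sorted run per line */
            which = 1 - which;              /* alternate output "tapes" */
        }
    } while (n == M);
}

int main(void) {
    FILE *in = fopen("ta1.txt", "r");       /* hypothetical input tape */
    FILE *out[2] = { fopen("tb1.txt", "w"), fopen("tb2.txt", "w") };
    if (!in || !out[0] || !out[1]) { perror("fopen"); return 1; }
    make_runs(in, out);
    return 0;
}

With the input 81 94 11 96 12 35 17 99 28 58 41 75 15 and M = 3, this produces exactly the runs shown in the figure above.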


This algorithm will require ⌈log2(N/M)⌉ passes, plus the initial run-constructing pass. For instance, if we have 10 million records of 128 bytes each, and four megabytes of internal memory, then the first pass will create 320 runs. We would then need nine more passes to complete the sort. Our example requires ⌈log2(13/3)⌉ = 3 more passes, which are shown in the following figure.

Ta1 11 12 35 81 94 96 15

Ta2 17 28 41 58 75 99

Tb1

Tb 2

Ta1

Ta2

Tb1 11 12 17 28 35 41 58 75 81 94 96 99

Tb2 15

Ta1 11 12 15 17 28 35 41 58 75 81 94 96 99

Ta2

Tb1

Tb2

7.3 Characteristics of External Sorting

1. During the sort, some of the data must be stored externally. Typically the data will be

stored on tape or disk.

2. The cost of accessing data is significantly greater than either bookkeeping or comparison

costs.

3. There may be severe restrictions on access. For example, if tape is used, items must be

accessed sequentially.

7.4 Criteria for Developing an External Sorting Algorithm

1. Minimize number of times an item is accessed.

2. Access items in sequential order

7.5 Important Uses of External Sorting

1. Business applications where a "transaction" file updates a master file. Examples: updating an inventory database based on sales; updating a personnel database based on new hires, promotions, dismissals, etc.

2. Database applications:

o Projection: The user requests a subset of the fields in a file. When a subset of the fields is taken, there might be duplicate records, so an external sort is used to remove duplicates.

o Join: Two files are "joined" on a common field(s) to create a new file whose fields are the union of the fields of the two files. The two files must be sorted so that the "join" fields can be matched.

Example: Suppose one database contains information about courses and rooms and another database contains information about students and courses. To find out which classrooms are being used by CS students, one could:

1. Select from the student database the records with student.major = CS.

2. Join the records selected in the previous query with the courses database using the common courses field.

3. Project the result on rooms to produce a listing of the rooms being used by CS students.

7.6 Merge Sort--A Digression

Merge sort is an ideal candidate for external sorting because it satisfies the two criteria for

developing an external sorting algorithm. Merge sort can be implemented either top-down or

bottom-up. The top-down strategy is typically used for internal sorting, whereas the bottom-up

strategy is typically used for external sorting.

7.7 Top-Down Strategy

The top-down strategy works by:

1. Dividing the data in half

2. Sorting each half

3. Merging the two halves

Example

Try using this strategy on the following set of characters (ignore the blanks):

a sorting example

Code

This section presents code for implementing top-down mergesort using arrays.

Merge code for arrays

i = 1;
j = 1;
a[M+1] = INT_MAX;
b[N+1] = INT_MAX;
for (k = 1; k <= M+N; k++)
    c[k] = (a[i] < b[j]) ? a[i++] : b[j++];

Array Mergesort

/* b[] and c[] are auxiliary arrays declared elsewhere, each large enough
   to hold half of a[] plus a sentinel */
void mergesort(int a[], int left, int right)
{
    int i, j, k, mid;

    if (right > left) {
        mid = (right + left) / 2;
        mergesort(a, left, mid);
        mergesort(a, mid+1, right);

        /* copy the first run into array b */
        for (i = left, j = 0; i <= mid; i++, j++)
            b[j] = a[i];
        b[j] = INT_MAX;     /* sentinel */

        /* copy the second run into array c */
        for (i = mid+1, j = 0; i <= right; i++, j++)
            c[j] = a[i];
        c[j] = INT_MAX;     /* sentinel */

        /* merge the two runs back into a */
        i = 0;
        j = 0;
        for (k = left; k <= right; k++)
            a[k] = (b[i] < c[j]) ? b[i++] : c[j++];
    }
}

· This code is a more straightforward but less elegant version of mergesort than the

mergesort routine presented on page 166 of Sedgewick.

· My code incorporates the merging code shown in the first bullet.

· Sedgewick's code uses a characteristic of the two halves of the data to cleverly avoid using

sentinels. Right before the merging code begins, he also manages to avoid the assignment i

= left by loading the first half of the array from back to front (i.e., by decrementing i rather

than incrementing i).

· Which is faster? I coded up both algorithms and ran them on 100,000 elements. The result

was a virtual dead heat (mine ran in 2.93 seconds versus 2.95 seconds for Sedgewick).

· What is the moral? Sedgewick's code is undeniably more elegant than mine. However, my

code is more straightforward and could be more easily understood by a programmer trying

to maintain the code. Since the two programs run almost identically fast, I would prefer

my code since it's easier to understand. The moral here is that unless you can get a

significant performance improvement from clever but subtle algorithmic tricks, you

should use the more straightforward approach. If you can get significant performance

improvements, then you should clearly document the tricks you used so that a

programmer unfamiliar with your code can figure out what you were doing.

7.8 Bottom-Up Strategy

The bottom-up strategy is the strategy that we will use for external sorting. The bottom-up

strategy for mergesort works by:

1. Scanning through data performing 1-by-1 merges to get sorted lists of size 2.

2. Scanning through the size 2 sublists and performing 2-by-2 merges to get sorted lists of size 4.

3. Continuing the process of scanning through size n sublists and performing n-by-n merges

to get sorted lists of size 2n until the file is sorted (i.e., 2n >= N, where N is the size of the

file).

Example

Try implementing this strategy on the following set of characters (ignore the blanks):

a sorting example

You get the following runs:

as, or, it, gn, ex, am, lp, e

aors, gint, aemx, elp

aginorst, aeelmpx

aaeegilmnoprstx

Code for Bottom-Up Strategy

The code for doing an in-memory sort using the bottom-up strategy is much more complicated

than for the top-down strategy. In general, one is better off using the top-down strategy if one

wants to use mergesort to perform an internal memory sort.

Mergesort Performance

1. Mergesort requires about N lg N comparisons to sort any file of N elements.


o Bottom Up: Each M-by-N merge requires M+N comparisons. Each merge doubles

the size of the runs, so lg N passes are required and each pass requires about N

comparisons.

o Top Down: The problem is broken into two halves, the two halves are sorted, and

then N comparisons are used to merge them. This gives the recurrence:

o M[N] = 2M[N/2] + N with M[1] = 0

2. Mergesort requires extra space proportional to N.

3. Mergesort is stable.

4. Mergesort is insensitive to the initial order of its input. In other words, it will require

roughly N lg N comparisons on any input.
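For reference, the recurrence above can be unrolled to see where the N lg N bound comes from (assuming N is a power of two):

M[N] = 2M[N/2] + N
     = 4M[N/4] + 2N
     = ...
     = 2^k M[N/2^k] + kN

Taking k = lg N, so that N/2^k = 1 and M[1] = 0, gives M[N] = N lg N.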

Programs

The programs take one argument, the name of a file to sort, and output the sorted file (they do not

write to the file so the file remains unsorted). For example, one could type:

array_mergesort my_file

Test data can be generated using the program generate_mergesort_data, which is found in the

above mentioned bin directory. generate_mergesort_data takes the number of integers to be

generated as an argument and outputs the desired number of integers. The integers are generated

randomly. For example, one could type:

generate_mergesort_data 10 > my_file

array_mergesort my_file

7.9 Storing Information in Files

In this section we consider data structures and algorithms for storing and retrieving

information in externally stored files. We shall view a file as a sequence of records, where each

record consists of the same sequence of fields. Fields can be either fixed length, having a

predetermined number of bytes, or variable length, having an arbitrary size. Files with fixed

length records are commonly used in database management systems to store highly structured

data. Files with variable length records are typically used to store textual information; they are

not available in Pascal. In this section we shall assume fixed-length fields; the fixed-length techniques can be modified simply to work for variable-length records.

The operations on files we shall consider are the following

a. INSERT a particular record into a particular file.

b. DELETE from a particular file all records having a designated value in each of a designated

set of fields.

c. MODIFY all records in a particular file by setting certain fields to designated values in

those records that have a designated value in each of another set of fields.

d. RETRIEVE all records having designated values in each of a designated set of fields.

A Simple Organization

The simplest, and also least efficient, way to implement the above file operations is to use

the file reading and writing primitives such as found in Pascal. In this “organization” (which is

really a “lack of organization”), records can be stored in any order. Retrieving a record with

specified values in certain fields is achieved by scanning the file and looking at each record to see

if it has the specified values. An insertion into a file can be performed by appending the record to

the end of the file.

For modification of records, scan the file and check each record to see if it matches the

designated fields, and if so, make the required changes to the record. A deletion operation works

almost the same way, but when we find a record whose fields match the values required for the

deletion to take place, we must find a way to delete the record. One possibility is to shift all

subsequent records one position forward in their blocks, and move the first record of each

subsequent block into the last position of the previous block of the file. However, this approach

will not work if records are pinned, because a pointer to the ith record in the file would then point

to the (i+1)st record.

If records are pinned, we must use a somewhat different approach. We mark deleted

records in some way, but we do not move records to fill their space, nor do we ever insert a new


record into their space. Thus, the record becomes deleted logically from the file, but its space is

still used for the file.

This is necessary so that if we ever follow a pointer to a deleted record, we shall discover

that the record pointed to was deleted and take some appropriate action, such as making the

pointer NIL so it will not be followed again. Two ways to mark records as deleted are:

1. Replace the record by some value that could never be the value of a "real" record, and when following a pointer, assume the record is deleted if it has that value.

2. Let each record have a deletion bit, a single bit that is 1 in records that have been deleted and 0 otherwise.

7.10 Hashed Files

Hashing is a common technique used to provide fast access to information stored on

secondary files. We divide the records of a file among buckets, each consisting of a linked list of

one or more blocks of external storage. The organization is similar to that portrayed in Fig. 4.10.

There is a bucket table containing B pointers, one for each bucket. Each pointer in the bucket

table is the physical address of the first block of the linked- list of blocks for that bucket.

The buckets are numbered 0, 1, ..., B-1. A hash function h maps each key value into one of the integers 0 through B-1. If x is a key, h(x) is the number of the bucket that contains the record with key x, if such a record is present at all. The blocks making up each bucket are chained together in a linked list. Thus, the header of the ith block of a bucket contains a pointer to the physical address of the (i+1)st block. The last block of a bucket contains a NIL pointer in its header. The major difference from in-memory chained hashing is that elements stored in one block of a bucket do not need to be chained by pointers; only the blocks need to be chained.

Hashing with buckets consisting of chained blocks.

If the size of the bucket table is small, it can be kept in main memory. Otherwise, it can be stored sequentially on as many blocks as necessary. To look for the record with key x, we compute h(x), and find the block of the bucket table containing the pointer to the first block of bucket h(x). We then read the blocks of bucket h(x) successively, until we find a block that contains the record with key x. If we exhaust all blocks in the linked list for bucket h(x), we conclude that x is not the key of any record.
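A minimal in-memory sketch of this lookup, assuming a bucket table of B chain headers and records carrying an integer key (the block-level chaining and the disk reads are abstracted away; the names used here are not from the original text):

#include <stddef.h>

#define B 128   /* number of buckets, assumed for the example */

typedef struct record {
    int key;
    struct record *next;   /* next record in the chain for this bucket */
    /* ... other fields of the record ... */
} record;

record *bucket_table[B];   /* bucket_table[i] heads the chain for bucket i */

/* the hash function h: map a key into one of the buckets 0 .. B-1 */
static int h(int key) {
    return (key % B + B) % B;
}

/* follow the chain for bucket h(x); return the record with key x, or NULL */
record *lookup(int x) {
    record *r;
    for (r = bucket_table[h(x)]; r != NULL; r = r->next)
        if (r->key == x)
            return r;
    return NULL;   /* chain exhausted: x is not the key of any record */
}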

This structure is quite efficient if the operation is one that specifies values for the fields

in the key, such as retrieving the record with a specified key value or inserting a record (which,

naturally, specifies the key value for that record). The average number of block accesses required

for an operation that specifies the key of a record is roughly the average number of blocks in a

bucket, which is n/bk if n is the number of records, a block holds b records, and k is the number

of buckets. Thus, on the average, operations based on keys are k times faster with this

organization than with the unorganized file. Unfortunately, operations not based on keys are not

speeded up, as we must examine essentially all the buckets during these other operations. The

only general way to speed up operations not based on keys seems to be the use of secondary

indices, which are discussed at the end of this section.


To insert a record with key value x, we first check to see if there is already a record with

key x. If there is, we report an error, since we assume that the key uniquely identifies each record. If

there is no record with key x, we insert the new record in the first block in the chain for bucket h

(x) into which the record can fit. If the record cannot fit into any existing block in the chain for

bucket h (x), we call upon the file system to find a new block into which the record is placed. This

new block is then added to the end of the chain for bucket h (x).

To delete a record with key x, we first locate the record, and then set its deletion bit.

Another possible deletion strategy (which cannot be used if the records are pinned) is to replace

the deleted record with the last record in the chain for h (x). If the removal of the last record

makes the last block in the chain for h (x) empty, we can then return the empty block to the file

system for later re-use.

A well-designed hashed-access file organization requires only a few block accesses for

each file operation. If our hash function is good, and the number of buckets is roughly equal to

the number of records in the file divided by the number of records that can fit on one block, then

the average bucket consists of one block. Excluding the number of block accesses to search the

bucket table, a typical retrieval based on keys will then take one block access, and a typical

insertion, deletion, or modification will take two block accesses. If the average number of records

per bucket greatly exceeds the number that will fit on one block, we can periodically reorganize

the hash table by doubling the number of buckets and splitting each bucket into two.

7.11 Indexed Files

Another common way to organize a file of records is to maintain the file sorted by key

values. We can then search the file as we would a dictionary or telephone directory, scanning only

the first name or word on each page. To facilitate the search we can create a second file, called a

sparse index, which consists of pairs (x, b), where x is a key value and b is the physical address of

the block in which the first record has key value x. This sparse index is maintained sorted by key

values.
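A minimal sketch of a lookup through such a sparse index, assuming the index is held in memory as an array of (key, block address) pairs sorted by key; reading and scanning the data block itself is left out, and the names are illustrative only:

/* one sparse-index entry: the key of the first record in a block,
   together with that block's physical address */
typedef struct {
    int key;
    long block_addr;
} index_entry;

/* Binary search the sparse index for the block that could contain 'key':
   the entry with the largest key value <= key. Returns -1 if key is smaller
   than the first key in the file. */
long find_block(const index_entry idx[], int n, int key) {
    int lo = 0, hi = n - 1, ans = -1;
    while (lo <= hi) {
        int mid = (lo + hi) / 2;
        if (idx[mid].key <= key) {
            ans = mid;        /* candidate block; a later block may still qualify */
            lo = mid + 1;
        } else {
            hi = mid - 1;
        }
    }
    return (ans >= 0) ? idx[ans].block_addr : -1;
}

The caller would then read the block at the returned address and scan it for the record with the given key.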


UNIT 8

Memory Management

8.1 The Issues in Memory

8.2 Garbage Collection Algorithms For Equal-Sized Block (Collection in Place)

8.3 Buddy system (Distribution of blocks, Allocation blocks, Returning blocks to available

storage)

8.4 Storage compaction and Compaction problem

8.1 The Issues in Memory

In actual practice, most systems use various combinations of different storage

allocation methods. This is because each method performs better under a different class of

demands. Most Pascal, C, and C++ implementations, for example, rely on a stack for local

variable allocation and a heap, typically implemented with a boundary tag method, for

dynamic or pointer variable allocation. On machines, which use virtual memory, both the

stack and heap areas share a single virtual address space, which is constructed by a

combination of hardware and software from a number of pages. Thus, fixed sized blocks

(pages) are allocated to create the memory region from which the variable size blocks on

the heap are later allocated!

Internal data structures in many operating systems are frequently allocated in terms of

a standard block size from a pool of such blocks, which is quite independent of the storage

allocation mechanism, used for allocation of page frames for user virtual address spaces.

For example, virtual address spaces might be composed of 4096 byte blocks, a size which

is also appropriate for disk buffers, but inappropriate for the records in service request

queues; the latter might be allocated from a separate pool of 16 or 32 byte blocks.

When dynamic data structures are maintained in virtual memory, poor locality of

reference can result. It is highly likely that a linked list, for example, will end up with each

list element stored in a separate page. Some heap managers have been constructed which

attempt to improve the locality of heap data structures, for example, by allocating

successive elements of a list in the same page when possible. This requires that the free

space data structures be organized, when possible, by page. It also requires that the

allocate routine be informed of the desired location of the allocated storage. The Pascal

new procedure has access to the address of the pointer to the newly allocated storage

because it returns its result in a parameter passed by reference, while the C malloc

function is not generally aware of this. As a result, an implementation of new could

attempt to allocate the new data structure in the same page as the pointer to the new data

structure, while this would be more difficult with malloc. Of course, even if new is coded to

allow this, it will only be useful if users are careful to call new(element^.next) to extend a

list, instead of using a temporary variable and then assigning that temporary to the

appropriate field of the data structure.

8.2 Garbage Collection Algorithms for Equal-Sized Block (Collection in Place)

The simplest general-purpose dynamic allocation mechanism of all requires that all of the available storage be divided into equal sized blocks. When this is done, the size field on the allocate and deallocate requests is not very meaningful, since only one size is available. If the user requests more than this amount, the request will fail; if the user requests less and there are any free blocks, the request will be successful and an entire block will be allocated no matter how little memory the user actually requests.

The heap manager must maintain enough information to determine which blocks in

the heap are free at any time. Typically, this information will be stored in the form of a

linked list, where the first word of each free block holds the address of the next free block.

As with chained resolution of forward references, there is a problem with representing the

end of the free list. Throughout this section, this problem will be ignored, and the symbol

NULL, standing for an illegal or reserved pointer value, will be used; it should be

remembered that there are occasional contexts where there are no reserved addresses,

requiring a more complex solution to the problem of marking the end of a list.

When writing a heap manager in assembly language or other low-level languages,

where there is no difference between pointers and integers, it is natural to think of

memory as one big array in which it is perfectly appropriate to do arithmetic on array

indices. In higher level languages, where the type system discourages or even prevents

this, we could take two approaches to explaining such algorithms. One approach is to

explain the algorithm in terms of a large array of integers, M, with integer memory

addresses. Another is to use pointers with carefully documented and very unsafe

conversions between integer and pointer types.

In C, for example, *(int*)a refers to the integer stored at the memory address a, while *(char*)a refers to the character stored at the same memory address. The variable a used in these examples could have been an integer or a pointer; in either case, it holds the

same memory address. In most C implementations, the integer occupies 4 successive

bytes of memory starting at this address, while the character occupies only one byte at

this address.

Using these conventions, C code to allocate from a free list on the heap is shown below.

typedef void * pointer;   /* used for untyped pointers */

pointer freelist;         /* pointer to the first free block in memory */

pointer allocate( int size )
{
    if (size > blocksize) {
        error( "size too big" );
        return NULL;
    } else if (freelist == NULL) {
        error( "no space available" );
        return NULL;
    } else {
        pointer block;
        block = freelist;
        freelist = *(pointer *)freelist;
        return block;
    }
}

void deallocate( pointer block, int size )
{
    if (block != NULL) {
        *(pointer *)block = freelist;
        freelist = block;
    }
}
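The routines above assume that freelist has already been built. A minimal sketch of that initialization, chaining every fixed-size block of the heap onto the free list (heapbase, heapsize and blocksize are assumed to be supplied by the surrounding system):

/* Chain all fixed-size blocks in the heap onto the free list.
   Assumes heapbase points to heapsize bytes reserved for the heap and that
   blocksize is at least sizeof(pointer). */
void init_heap( void *heapbase, int heapsize )
{
    char *p = (char *)heapbase;
    int nblocks = heapsize / blocksize;
    int i;

    freelist = NULL;
    for (i = 0; i < nblocks; i++) {
        /* the first word of each free block points to the previous list head */
        *(pointer *)p = freelist;
        freelist = (pointer)p;
        p += blocksize;
    }
}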

Although fixed size block allocation is presented here primarily to set the stage for

more interesting allocation algorithms, there are many places where it is the most

appropriate approach. When memory is divided into fixed length pages (or sectors on disk),

and a non-contiguous allocation scheme is being used, fixed size blocks are clearly

appropriate. Fixed size blocks are also appropriate for those applications where all

requests will be for the same size block. This latter situation shows up on many systems

where the only dynamically allocated storage is for input-output buffering, and all sectors

are of the same size. More commonly, there will be two or three common request sizes;

perhaps a small request for device control blocks, and a large one for input-output buffers.

When this is the case, fixed block size allocation can still be used, but only by maintaining

a separate heap for each size.

One disadvantage of a fixed block size is that, if a program needs to create a data

structure larger than the block size, the program must use multiple blocks for this

structure. Programmers who remember the early days of Microsoft's DOS and Apple's

MacOS are likely to remember a time when the largest memory increment the operating

system would allocate to an application was 64K bytes. In those settings, many

programmers were forced to write awkwardly structured programs to manipulate such

things as databases, images or other large objects.

Another disadvantage of a fixed block size is that, if a program needs blocks

smaller than the fixed size, the standard size will be allocated anyway. As a result, each

user object will be stored in a container larger than necessary, wasting space. In each

block of memory allocated to a user, we refer to the unused space as a fragment of what

ought to be free space. Because this fragment is inside an allocated block, we refer to this

problem as internal fragmentation of the available free space. In general, any storage

allocation mechanism which will, on occasion, allocate blocks larger than the requested

size is said to cause some degree of internal fragmentation.

An effective measure of the extent to which a storage allocation algorithm leads to

internal fragmentation is the percent of allocated space which goes unused because it is

contained in internal fragments. If the block sizes requested by users are uniformly

distributed between one and the allocated block size, the fixed size block mechanism

described in this section would result in about 50% waste due to internal fragmentation.

In fact, a uniform distribution of block sizes is quite uncommon! Most real problems only

require a few well-defined block sizes.

8.3 Buddy system (Distribution of blocks, Allocation blocks, Returning blocks to

available storage)

One way of dealing with internal fragmentation is to allow a variety of block sizes.

Blocks of each size can be allocated and deallocated by the use of a fixed size block allocate and deallocate mechanism, and if a block of one size is not available, a larger block can be allocated and a block of the desired size split off of it. When this is done, all

blocks resulting from splitting a particular block are called buddies, and the block from

which they were split is called their parent. The resulting storage allocation mechanism is

said to use a buddy system. All buddy systems maintain an array of lists of free blocks,


where all blocks in each list are the same size, and the array is indexed by a value

computed from the size.

The oldest buddy system, the binary buddy system has block sizes that are powers

of two. Therefore, when a block is split, it is split exactly in half, and when blocks are

combined, two equal size blocks are combined to make a block twice as big. With the

binary buddy system, we arrange things so that blocks of size 2^n always begin at memory addresses where the n least significant bits are zero. Thus, blocks of size 1 (2^0) may begin at any address, but blocks of size 2 (2^1) may only begin at even addresses, and blocks of size 4 (2^2) only begin at addresses with the least significant 2 bits equal to zero.

The constraints on the block addresses in the binary buddy system have an

interesting consequence. When a block of size 2^(n+1) is split into two blocks of size 2^n, the addresses of these two blocks will differ in exactly one bit, bit n, using the counting scheme that numbers bits starting with 0 at the least significant end. Thus, given a block of size 2^n at address a, we can compute the address of its buddy, the other half of the block from which it was split, by exclusive-oring a with 2^n. This leads to the allocate

routine.

typedef void * pointer;   /* used for untyped pointers */

/* pointers to the free space lists */
pointer freelists[sizes];

/* blocks in freelists[i] are of size 2**i. */
#define BLOCKSIZE(i) (1 << (i))

/* the address of the buddy of a block from freelists[i]. */
#define BUDDYOF(b,i) ((pointer)( ((int)b) ^ (1 << (i)) ))

pointer allocate( int size )
{
    int i;

    /* compute i as the least integer such that i >= log2(size) */
    for (i = 0; BLOCKSIZE( i ) < size; i++);

    if (i >= sizes) {
        error( "no space available" );
        return NULL;
    } else if (freelists[i] != NULL) {
        /* we already have the right size block on hand */
        pointer block;
        block = freelists[i];
        freelists[i] = *(pointer *)freelists[i];
        return block;
    } else {
        /* we need to split a bigger block */
        pointer block, buddy;
        block = allocate( BLOCKSIZE( i + 1 ) );
        if (block != NULL) {
            /* split and put extra on a free list */
            buddy = BUDDYOF( block, i );
            *(pointer *)buddy = freelists[i];
            freelists[i] = buddy;
        }
        return block;
    }
}

There are two terminating conditions. As shown, recursion terminates when a

sufficiently large block of free space is available; in this case, either that block is returned

directly to the user, or it is split, perhaps repeatedly, and some piece of it is returned. If

there is no sufficiently large block available, the algorithm terminates with an error. The

same error message results when there is not enough free space as when the user

requests a block larger than the largest allowed, because the recursion keeps asking for

larger and larger blocks!

The for loop in allocate uses purely integer arithmetic to take the log of the requested block size. This is almost always faster than converting the block size to floating point, taking the log, and then rounding up to the nearest integer. Note that, in C, 1<<i is 2^i. Also, note that the exclusive-or of variables a and b in C is a^b.

The deallocate routine for the binary buddy system is also recursive.

void deallocate( pointer block, int size )
{
    int i;
    pointer * p;
    pointer buddy;

    /* compute i as the least integer such that i >= log2(size) */
    for (i = 0; BLOCKSIZE( i ) < size; i++);

    /* see if this block's buddy is free */
    buddy = BUDDYOF( block, i );
    p = &freelists[i];
    while ((*p != NULL) && (*p != buddy)) p = (pointer *)*p;

    if (*p != buddy) {
        /* buddy not free, put block on its free list */
        *(pointer *)block = freelists[i];
        freelists[i] = block;
    } else {
        /* buddy found, remove it from its free list */
        *p = *(pointer *)buddy;

        /* deallocate the block and its buddy as one block */
        if (block > buddy) {
            deallocate( buddy, BLOCKSIZE( i + 1 ));
        } else {
            deallocate( block, BLOCKSIZE( i + 1 ));
        }
    }
}


The block's buddy may lie either directly above or directly below it in the heap, so there are two possible recursive calls to deallocate, one dealing with the former case and one with the latter. The recursion will continue so long as we keep finding buddies to combine with

the successively larger blocks. Eventually, we must reach the point where the block's

buddy is not found in a free list, and at that point, the block goes in that free list with no

further recursion.

In the binary buddy system, each block has exactly one buddy, the identity of

which can be determined directly from the address and size of that block. At the start, all

of memory, or at least, all of the memory used for the heap, may be a single block, which

is then subdivided as required. Only when the user deallocates the very last block would

the buddy system be able to reconstruct this block. The figure below illustrates this for a heap of 128 bytes starting at address 0.

p1 = allocate(24);
p2 = allocate(12);
p3 = allocate(24); deallocate(p1,24);

An example of the binary buddy system in use. (The original figure shows the 128-byte heap, starting at address 0, after each of these steps: p1 occupies the 32-byte block at address 0, p2 the 16-byte block at address 32, and p3 the 32-byte block at address 64; after deallocate(p1,24) the blocks at addresses 0, 48 and 96 are free, and the shaded parts of allocated blocks are internal fragments.)

This example illustrates three things: First, the successive subdivision of blocks as

requests are made, second, the internal fragmentation resulting from allocating more than

was requested, and third, the external fragmentation resulting from the breaking up of the

free space into multiple fragments. By the end of the example, there are two free blocks of

size 32, but there is no way to combine these blocks to get a block of size 64 should one be

needed.

The cost of external fragmentation is not as easy to measure as that of internal

fragmentation, but it can be partially characterized in terms of such statistics as the

average number of free blocks and the distribution of free block sizes.


Consider the two blocks of size 16 starting at addresses 32 and 48. Should the block at address 32 be deallocated, these two must be combined into a new block of size 32 starting at address 32. This new block is the buddy of the block at address 0, and since that block is also free, they must be combined to make a block of size 64. The two blocks of size 32 starting at addresses 64 and 96 are also buddies.

What if the initial heap does not begin at zero? What if the size of the initial heap is

not a power of 2? The answer used with the binary buddy system is fairly straightforward.

The rules already given define what addresses are legal for each block, so if we start at a

random address, we must follow those rules, breaking the initial heap into whatever size

blocks work. Consider a heap starting at address 100 and running to address 274.

A heap running from address 100 to address 274 divides into free blocks of size 4 (at 100), 8 (at 104), 16 (at 112), 128 (at 128), 16 (at 256) and 2 (at 272).

The initial division of an odd heap into free blocks.

The heap shown in the figure begins with one free block of size 2 (at address 272), one of size 4 (at address 100), one of size 8 (at address 104), two of size 16 (at addresses 112 and 256), and one of size 128 (at address 128). These blocks have no buddies in the heap, so they can never be combined with anything else, but they may be allocated to users.

Some aspects of the performance of the binary buddy system are easy to

determine. For example, if over a long period of time, requests for all possible block sizes

are evenly distributed, the average allocated block will be about one third larger than

required (one quarter of each block will typically be wasted, three quarters may be in use).

This is because blocks of size s will only be allocated to satisfy requests for between s/2

and s words of memory. This waste, which represents internal fragmentation, could clearly

be reduced if there were a larger assortment of available block sizes. The nature of the

external fragmentation is harder to determine without an experimental measurement, and

in any case, the assumption of uniformly distributed requests for different block sizes is

very unrealistic. In fact, programmers frequently prefer powers of two and powers of 10

when deciding on the sizes of different data structures!

The Fibonacci series provides the basis for a buddy system variant that

reduces the degree of internal fragmentation by providing a wider variety of free block

sizes. Each member of the Fibonacci series is the sum of the two previous elements, as

shown in Figure 14.7.

i : 0 1 2 3 4 5 6 7 8 9 10 11 ...

fib(i): 0 1 1 2 3 5 8 13 21 34 55 89 ...

i < 2:  fib(i) = i
i >= 2: fib(i) = fib(i-1) + fib(i-2)

Figure 14.7. The Fibonacci Series

Of course, we are not interested in blocks too small to hold a pointer, so if

memory is byte addressed, and if pointers are 32 bits each, the smallest block size that we

will use with this series is 5 bytes. In the limit, the ratio of successive numbers in the

Fibonacci series approaches the golden ratio (approximately 1.61803).


When a block is split using the Fibonacci buddy system, the fragments which result

are not the same size; thus, for example, when a block of size 55 is split, the results are of size

21 and 34. The figure below illustrates a sequence of operations using the Fibonacci buddy system.

p1 := allocate(15);
p2 := allocate(10);
deallocate(p1,15); p3 := allocate(6);

An example of the Fibonacci buddy system in use. (The original figure shows a heap of size 55 starting at address 0: p1 first occupies the size-21 block at address 0 and p2 the size-13 block at address 21; after p1 is freed, p3 occupies the size-8 block at address 0, leaving free blocks of size 13 at address 8 and size 21 at address 34; the shaded parts of allocated blocks are internal fragments.)

With the Fibonacci buddy system, there is no simple formula for deriving the buddy of

a block from information about its size and location. In Figure 14.8, there are two blocks of

size 13; one of these has a buddy of size 8 below it, while the other has a buddy of size 21

above it. Thus, the code for the Fibonacci buddy system will necessarily be more complex than

that for the binary buddy system. The additional cost of the Fibonacci buddy system is offset

by the reduction in internal fragmentation resulting from having a more diverse assortment of

block sizes than the binary buddy system.
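As a small illustrative sketch (not code from the original text), the free-list index used for a request under a Fibonacci buddy system can be found by scanning a table of Fibonacci block sizes for the smallest usable one; the 5-byte minimum block mentioned above corresponds to index 5, since fib(5) = 5:

#define SIZES 24        /* number of free lists, assumed for the example */
#define MIN_INDEX 5     /* fib(5) = 5 bytes, the smallest block we will use */

static int fibsize[SIZES];   /* fibsize[i] = fib(i), the block size for list i */

void init_fibsizes(void) {
    int i;
    fibsize[0] = 0;
    fibsize[1] = 1;
    for (i = 2; i < SIZES; i++)
        fibsize[i] = fibsize[i-1] + fibsize[i-2];
}

/* return the least index i >= MIN_INDEX with fibsize[i] >= size, or -1 if the
   request is larger than the largest block size we manage */
int fib_index(int size) {
    int i;
    for (i = MIN_INDEX; i < SIZES; i++)
        if (fibsize[i] >= size)
            return i;
    return -1;
}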

8.4 Storage Compaction and Compaction problem

Compaction works by actually moving blocks of data etc. from one location in memory to another so as to collect all the free blocks into one large block. The allocation problem then becomes greatly simplified. Allocation now consists of merely moving a pointer which points to the top of this successively shortening block of free storage. Once this single block gets too small again, the compaction mechanism is again invoked to reclaim what unused storage may now exist among allocated blocks. There is generally no storage release mechanism. Instead, a marking algorithm is used to mark blocks that are still in use. Then, instead of freeing each unmarked block by calling a release mechanism to put it on the free list, the compactor simply collects all unmarked blocks into one large block at one end of the memory segment. The only real problem in this method is the redefining of pointers. This is solved by making extra passes through memory. After blocks are marked, the entire memory is stepped through and the new address for each marked block is determined. This new address is stored in the block itself. Then another pass over memory is made. On this pass, pointers that point to marked blocks are reset to point to where the marked blocks will be after compaction. This is why the new address is stored right in the block - it is easily obtainable. After all pointers have been reset, then the marked blocks are moved to their new locations. A general algorithm for the compaction routine is as follows.

1. Invoke the garbage collection marking routine.
2. Repeat step 3 until the end of memory is reached.
3. If the current block of storage being examined has been marked, then set the new address of the block to the current starting address of unused memory and update the starting address of unused memory.
4. Redefine variable references to point to the new addresses.
5. Define new values for pointers in marked blocks.
6. Repeat step 7 until the end of memory is reached.
7. Move marked blocks into their new locations and reset marks.

One approach to reducing external fragmentation is to move blocks in memory so that fragments

can be combined; this is called compacting the heap. This is much easier to say than it is to do! To

do it, we need a way to find and update all pointers to a block when we move the block. In

segmented virtual memories, there is usually a master segment table containing descriptions of

all of the segments in the system. If all pointers to a segment are indirect through an entry in this

table, segments may easily be moved to compact storage by updating this entry, in general,

though, finding all of the pointers to a block is computationally difficult.

Consider, for example, the problem of compacting the storage used by a Pascal, C or C++

program, which has built extensive data structures in the heap. Each item allocated by that

program in the heap may contain any number of pointers to other items in the heap, and no item

may be moved without updating all of the pointers that refer to it. This is possible only when there

is a description of each block in the heap, which tells the system which words in that block

contain pointers. Some machines have what are called tagged architectures, where each word has

a small field describing the type of the contents of that word, but the majority of machines provide

no help in solving this problem.

A Pascal or Java compiler supporting heap compaction must explicitly include this

information in each item stored on the heap, typically by including a special pointer to a type or

class description record as the first word of each object. Doing this in Pascal and Java is feasible

because these are strongly typed languages. Doing it in C or C++ is harder because the compiler

has less licence to include auxiliary information in a C or C++ structure. In those languages,

programmers usually assume that the pointer to a structure is also a pointer to the first explicitly

declared element of the structure, so insertions by the compiler pose problems.

The ability of the system to find all pointers to a block allows for more than just heap compaction;

it allows automatic reclamation of those memory blocks in the heap which are no longer

referenced by any of the user's pointers. This is valuable enough that it has a name, garbage

collection.

In a language implementation that supports garbage collection, so that storage which is no

longer referenced by the user is automatically reclaimed, the user need not make any calls to a

service such as deallocate; in fact, such a service need not even be provided to the user.

One approach to automatic storage reclamation uses a reference count associated with each block.

Many programmers consider reference counting to be a simple form of garbage collection, but it is

more properly considered as an inferior alternative to garbage collection. The reference count

attached to a block is initially zero when the block is allocated, but it is immediately incremented

when the pointer to that block is assigned to any pointer variable. The reference count of a block

is incremented each time the pointer to that block is assigned to a pointer variable, and it is

decremented each time the pointer to that block, stored in some pointer variable, is overwritten

with a pointer to something else or when a pointer variable holding a pointer to that block is

deallocated. The block is deallocated whenever the block's reference count falls to zero.
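A minimal sketch of this bookkeeping, assuming each heap object carries a reference count and that every assignment to a pointer variable goes through a helper routine (the type and function names here are purely illustrative):

#include <stdlib.h>

typedef struct object {
    int refcount;   /* number of pointer variables currently referring to this object */
    /* ... the object's real data ... */
} object;

object *obj_new(void) {
    object *o = malloc(sizeof(object));
    if (o != NULL) o->refcount = 0;   /* zero until assigned to a pointer variable */
    return o;
}

/* assign 'src' to the pointer variable '*dst', maintaining the counts */
void obj_assign(object **dst, object *src) {
    if (src != NULL)
        src->refcount++;                      /* one more reference to src */
    if (*dst != NULL && --(*dst)->refcount == 0)
        free(*dst);                           /* deallocated when the count falls to zero */
    *dst = src;
}

Assigning NULL to a pointer variable through obj_assign plays the role of overwriting or discarding the pointer it previously held.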

The problem with reference counts is that they cannot detect the fact that a circularly linked

structure has been cut loose from the remainder of the structure, and thus, they tend to slowly

lose storage in environments where such structures occur. Reference counting is widely used in

modern file systems, but it is not generally used in heap managers.

Real garbage collection, in contrast, does not suffer from this problem. With garbage

collection, a special routine, the garbage collector, periodically traverses all data structures

accessible to the user, marking each item on the heap, which is encountered; this is called the

78

marking phase of the garbage collector. After doing so, the garbage collector claims all unmarked

blocks as garbage and sweeps them into the free space list; this is called the sweep phase of the

garbage collector. The names for these two phases lead to the name for the entire algorithm, the

mark sweep garbage collection algorithm.

Generally, the mark-sweep algorithm requires a heap data structure that includes a mark

bit in each item on the heap. We might store this in the status field of the boundary tag on each

item. Given this, the sweep phase becomes a simple scan of the boundary tag list of the entire

heap, gathering the unmarked blocks, marking them as free, merging them if they are adjacent,

and stringing the results onto the free list. In the process, the mark bits on all blocks are cleared

to prepare for the next marking phase.

The marking phase is more interesting. In its classical form, this assumes that the entire heap

is reachable from some root item. The marking phase operates by recursively traversing the items

reachable from the root. If, during this recursion, an unmarked item is found, the item is

immediately marked and then the recursion continues. If an already marked item is found, it is

ignored since the presence of a mark indicates that it either has already been dealt with or that it

is on the stack of items being dealt with. The recursion itself, of course, only follows the pointers

stored in each item, completely ignoring integers, characters or other non-pointer content.
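A minimal sketch of such a marking phase, assuming each heap item lists its outgoing pointers in an explicit array (a real collector would discover the pointers from type descriptions or tags, as discussed above):

#define MAX_CHILDREN 8   /* assumed bound on pointers per item, for the sketch */

typedef struct item {
    int marked;                            /* the mark bit */
    int nchildren;                         /* how many pointers this item holds */
    struct item *children[MAX_CHILDREN];   /* the pointers stored in this item */
} item;

/* recursive marking phase: mark every item reachable from 'root' */
void mark(item *root) {
    int i;
    if (root == NULL || root->marked)
        return;            /* already dealt with, or on the stack of items being dealt with */
    root->marked = 1;      /* mark immediately, then continue the recursion */
    for (i = 0; i < root->nchildren; i++)
        mark(root->children[i]);
}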

UNIT 9

NP Complete Problem

9.1 Introduction

9.2 Polynomial-time

9.3 Abstract Problems

9.4 Encoding

9.5 NP-Completeness and Reducibility

9.6 NP-Completeness

9.7 Circuit Satisfiability

9.8 NP-Complete Problems

9.9 The Vertex-cover Problem

9.10 The Hamiltonian-cycle Problem

9.11 The Traveling-salesman Problem

9.1 Introduction

All of the algorithms we have studied thus far have been polynomial-time algorithms: on inputs of size n, their worst-case running time is O(n^k) for some constant k. Can every problem be solved in polynomial time? The answer is no. Some problems, such as Turing's famous "Halting Problem," cannot be solved by any computer, no matter how much time is provided. Other problems can be solved, but not in time O(n^k) for any constant k. Problems that are solvable by polynomial-time algorithms are regarded as tractable, and problems that require superpolynomial time as intractable.

In this chapter we discuss an interesting class of problems, the "NP-complete" problems, whose status is unknown. No polynomial-time algorithm has been discovered for any NP-complete problem, nor has anyone yet been able to prove a superpolynomial-time lower bound for any of them. This P ≠ NP question has been one of the deepest, most perplexing open research problems in theoretical computer science since it was first posed in 1971.

Most scientists believe that the NP-complete problems cannot be solved in polynomial time, i.e. that they are intractable, because if any single NP-complete problem could be solved in polynomial time, then every NP-complete problem could be solved in polynomial time.

If you want to design good algorithms, you should understand the rudiments of the theory of NP-completeness. If a problem's intractability can be proved, then as an engineer you would do better to spend your time developing an approximation algorithm rather than searching for a fast algorithm that solves the problem exactly. Moreover, some problems that seem no harder than sorting, graph searching, or network flow look easy but are in fact NP-complete. Hence we should become familiar with this important class of problems.

9.2 Polynomial Time

We begin our study of NP-completeness by defining polynomial-time solvable problems. These problems are generally regarded as tractable, for reasons that are partly mathematical. We give three supporting arguments here.

The first argument is that, although it is reasonable to regard a problem that requires time Θ(n^100) as intractable, very few practical problems require time on the order of such a high-degree polynomial. The polynomial-time problems encountered in practice typically require much less time.

Second, for many reasonable models of computation, a problem that can be solved in polynomial time in one model can also be solved in polynomial time in another.

And the third is that the class of polynomial-time solvable problems has closure properties. For

example if we fed the output of one polynomial-time algorithm into the input of another, the

resulting algorithm is polynomial. Similarly, if a polynomial-time algorithm makes a constant number of calls to polynomial-time subroutines, the running time of the composite algorithm is polynomial.

9.3 Abstract Problems

To understand the class of polynomial-time solvable problems, we must first define what a problem is. We define an abstract problem Q to be a binary relation on a set I of problem instances and a set S of problem solutions. For example, recall the problem SHORTEST PATH of finding a shortest path between two given vertices in an unweighted, undirected graph G = (V,E). We define an instance for SHORTEST PATH as a triple consisting of a graph and two vertices, and a solution as a sequence of vertices in the graph (with the empty sequence denoting that no path exists). The problem SHORTEST PATH itself is the relation that associates each instance of a graph and two vertices with a shortest path in the graph joining the two vertices. Since shortest paths are not necessarily unique, a given instance may have more than one solution.

This formulation of an abstract problem is sufficient for our purposes. To keep things simple, the theory of NP-completeness restricts attention to decision problems: those having a yes/no solution. In this case we can view an abstract decision problem as a function that maps the instance set I to the solution set {0,1}. For example, a decision problem PATH related to the shortest-path problem is: "Given a graph G = (V,E), two vertices p, q ∈ V, and a positive integer k, does a path exist in G between p and q whose length is at most k?" If i = (G,p,q,k) is an instance of this problem, then PATH(i) = 1 (yes) if a shortest path from p to q has length at most k, and PATH(i) = 0 (no) otherwise.

Certain other abstract problems, called optimization problems, require some value to be minimized or maximized; these are not decision problems. If we want to apply the theory of NP-completeness to optimization problems, however, we must recast them as decision problems. Typically, an optimization problem can be recast by imposing a bound on the value to be optimized. For example, in recasting the shortest-path problem as the decision problem PATH, we added a bound k to the problem instance.
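As a small illustration of this recasting, the sketch below pairs a shortest-path routine (here a plain breadth-first search over an adjacency matrix, an assumption made only for the example) with the decision version PATH, which simply compares the computed distance against the bound k.

#include <string.h>

#define MAXV 100

/* An unweighted, undirected graph stored as an adjacency matrix (for illustration). */
struct graph {
    int n;                   /* number of vertices                  */
    int adj[MAXV][MAXV];     /* adj[u][v] != 0 iff (u,v) is an edge */
};

/* Optimization version: length (in edges) of a shortest p-q path,
   found by breadth-first search, or -1 if q is unreachable from p. */
static int shortest_path_length(const struct graph *g, int p, int q) {
    int dist[MAXV], queue[MAXV], head = 0, tail = 0, u, v;
    memset(dist, -1, sizeof dist);
    dist[p] = 0;
    queue[tail++] = p;
    while (head < tail) {
        u = queue[head++];
        if (u == q) return dist[u];
        for (v = 0; v < g->n; v++)
            if (g->adj[u][v] && dist[v] < 0) {
                dist[v] = dist[u] + 1;
                queue[tail++] = v;
            }
    }
    return -1;
}

/* Decision version: PATH(G,p,q,k) = 1 ("yes") iff some p-q path has length at most k. */
static int path_decision(const struct graph *g, int p, int q, int k) {
    int d = shortest_path_length(g, p, q);
    return d >= 0 && d <= k;
}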

The requirement to recast optimization problems as decision problems does not diminish the impact of the theory. Generally, if we can solve an optimization problem quickly, we can also solve its related decision problem quickly: we simply compare the value obtained from the solution of the optimization problem with the bound provided as input to the decision problem. If an optimization problem is easy, therefore, its related decision problem is easy as well. Stated in a way that has more relevance to NP-completeness, if we can provide evidence that a decision problem is hard, we also provide evidence that its related optimization problem is hard. Thus, even though it restricts attention to decision problems, the theory of NP-completeness applies much more widely.

9.4 Encodings

If we want to make a computer program that can solve an abstract problem, we have to represent

instances in a way that the program understands. An encoding of a set S of abstract objects is a mapping e from S to the set of binary strings {0,1}*. For example, the familiar encoding of the natural numbers N = {0,1,2,3,4,...} is as the strings {0,1,10,11,100,...}; by this encoding, e(17) = 10001. Anyone who has looked at computer representations of keyboard characters is familiar with either the ASCII or EBCDIC codes. In the ASCII code, e(A) = 1000001. Even a compound object can be encoded as a binary string by combining the representations of its constituent parts. Polygons, graphs, functions, ordered pairs, and programs can all be encoded as binary strings.
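A tiny C sketch of one such encoding, the binary encoding of the natural numbers used above, is shown below; encode_natural is a name invented for this example.

#include <stdio.h>

/* Write the binary encoding of a natural number into buf,
   e.g. e(17) = "10001". buf needs room for at least 33 characters. */
static void encode_natural(unsigned int n, char *buf) {
    char tmp[33];
    int  len = 0, i;
    if (n == 0) { buf[0] = '0'; buf[1] = '\0'; return; }
    while (n > 0) {                        /* collect bits, least significant first   */
        tmp[len++] = (char) ('0' + (n & 1u));
        n >>= 1;
    }
    for (i = 0; i < len; i++)              /* reverse to most-significant-first order */
        buf[i] = tmp[len - 1 - i];
    buf[len] = '\0';
}

int main(void) {
    char buf[33];
    encode_natural(17, buf);
    printf("e(17) = %s\n", buf);           /* prints e(17) = 10001 */
    return 0;
}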

Hence a computer algorithm that "solves" some abstract decision problem actually takes an encoding of a problem instance as input. A concrete problem is a problem whose instance set is a set of binary strings. We say that an algorithm solves a concrete problem in time O(T(n)) if, when it is provided a problem instance i of length n = |i|, the algorithm can produce the solution in at most O(T(n)) time. A concrete problem is polynomial-time solvable, therefore, if there exists an algorithm that solves it in time O(n^k) for some constant k.

We will now define the complexity class P as the set of concrete decision problems that can be

solved in polynomial-time.

Encodings can be used to map abstract problems to concrete problems. Given an abstract decision problem Q mapping an instance set I to {0,1}, an encoding e : I → {0,1}* can be used to induce a related concrete decision problem, which we denote by e(Q). If the solution to an abstract-problem instance i ∈ I is Q(i) ∈ {0,1}, then the solution to the concrete-problem instance e(i) ∈ {0,1}* is also Q(i). There may be some binary strings that represent no meaningful abstract-problem instance. For convenience, we shall assume that any such string is mapped arbitrarily to 0. Thus, the concrete problem produces the same solutions as the abstract problem on binary-string instances that represent the encodings of abstract-problem instances.

Now we generalize the definition of polynomial-time solvability from concrete problems to abstract

problems using encoding as the bridge, but we keep the definition independent of any particular

encoding. We would like to say that the efficiency of solving a problem should not depend on how the problem is encoded. Unfortunately, it depends quite heavily. For example, suppose that an integer k is to be provided as the sole input to an algorithm, and suppose that the running time of the algorithm is Θ(k). If the integer k is provided in unary—a string of k 1's—then the running time of the algorithm is O(n) on length-n inputs, which is polynomial time. If we use the more natural binary representation of the integer k, however, then the input length is n = ⌊lg k⌋ + 1. In this case, the running time of the algorithm is Θ(k) = Θ(2^n), which is exponential in the size of the input. Thus, depending on the encoding, the algorithm runs in either polynomial or superpolynomial time.

The encoding of an abstract problem therefore matters a great deal to how we understand polynomial time; we cannot really talk about solving an abstract problem without first specifying an encoding. Practically speaking, if we rule out expensive encodings such as unary ones, the actual encoding of a problem makes little difference to whether the problem can be solved in polynomial time. For example, representing integers in base 6 instead of binary has no effect on whether a problem is solvable in polynomial time, since an integer represented in base 6 can be converted to one represented in base 2 in polynomial time.

A function f : {0,1}* → {0,1}* is polynomial-time computable if there exists a polynomial-time algorithm A that, for any input x ∈ {0,1}*, produces as output f(x). For some set I of problem instances, two encodings e1 and e2 are polynomially related if there exist two polynomial-time computable functions f12 and f21 such that for any i ∈ I, we have f12(e1(i)) = e2(i) and f21(e2(i)) = e1(i). That is, the encoding e2(i) can be computed from the encoding e1(i) by a polynomial-time algorithm, and vice versa.

9.5 NP-completeness and Reducibility

The reason that theoretical computer scientists believe that P ≠ NP is the existence of the class of "NP-complete" problems. This class has the interesting property that if any one NP-complete problem can be solved in polynomial time, then every problem in NP has a polynomial-time solution, that is, P = NP. Despite decades of study, however, no polynomial-time algorithm has ever been discovered for any NP-complete problem.

The language HAM-CYCLE is one NP-complete problem. If we could decide HAM-CYCLE in polynomial time, then we could solve every problem in NP in polynomial time. In fact, if NP − P should turn out to be nonempty, we could say with certainty that HAM-CYCLE ∈ NP − P.

Figure: An illustration of a polynomial-time reduction from a language L1 to a language L2 via a reduction function f. For any input x ∈ {0,1}*, the question of whether x ∈ L1 has the same answer as the question of whether f(x) ∈ L2.

The NP-complete languages are the "hardest" languages in NP. We shall show how to compare the

relative "hardness" of languages using "polynomial-time reducibility." First, we formally define the

NP-complete languages, and then we sketch a proof that one such language, called CIRCUIT-SAT,

is NP-complete. We then use the notion of reducibility to show that many other problems are NP-complete.

Reducibility

A problem P can be reduced to another problem P' if any instance of P can be "easily rephrased" as an instance of P', the solution to which provides a solution to the instance of P. For example, the problem of solving linear equations in an indeterminate x reduces to the problem of solving quadratic equations: given an instance ax + b = 0, we can transform it to 0x^2 + ax + b = 0, whose solution provides a solution to ax + b = 0. Thus, if a problem P reduces to another problem P', then P is, in a sense, "no harder to solve" than P'.
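The linear-to-quadratic example can be written out directly, as in the hedged sketch below: the reduction maps the instance (a, b) of the linear problem to the instance (0, a, b) of the quadratic problem, and the quadratic solver's answer is read back as the answer to the original equation. The function names are invented for this illustration.

#include <math.h>
#include <stdio.h>

/* Solve A*x^2 + B*x + C = 0, storing one real root in *x.
   Returns 1 if a real root exists, 0 otherwise.            */
static int solve_quadratic(double A, double B, double C, double *x) {
    if (A == 0.0) {                         /* degenerate case: really linear */
        if (B == 0.0) return 0;
        *x = -C / B;
        return 1;
    } else {
        double d = B * B - 4.0 * A * C;     /* discriminant */
        if (d < 0.0) return 0;
        *x = (-B + sqrt(d)) / (2.0 * A);
        return 1;
    }
}

/* Reduction: rephrase "solve a*x + b = 0" as the quadratic instance (0, a, b). */
static int solve_linear_by_reduction(double a, double b, double *x) {
    return solve_quadratic(0.0, a, b, x);
}

int main(void) {
    double x;
    if (solve_linear_by_reduction(2.0, -6.0, &x))
        printf("2x - 6 = 0 has root x = %g\n", x);   /* prints x = 3 */
    return 0;
}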

Returning to our formal-language framework for decision problems, a language L1 is said to be polynomial-time reducible to a language L2, written L1 ≤p L2, if there exists a polynomial-time computable function f : {0,1}* → {0,1}* such that for all x ∈ {0,1}*,

x ∈ L1 if and only if f(x) ∈ L2. ..........(1)

The function f is called the reduction function, and a polynomial-time algorithm F that computes f is known as a reduction algorithm.

The idea of a polynomial-time reduction from a language L1 to another language L2 is shown in the figure. Each language is a subset of {0,1}*. The reduction function f gives a polynomial-time mapping such that if x ∈ L1, then f(x) ∈ L2; likewise, if x ∉ L1, then f(x) ∉ L2. Thus, the reduction function maps any instance x of the decision problem represented by the language L1 to an instance f(x) of the problem represented by the language L2. Providing an answer to whether f(x) ∈ L2 directly provides the answer to whether x ∈ L1.

Figure: The algorithm F is a reduction algorithm that computes the reduction function f from L1 to L2 in polynomial time, and A2 is a polynomial-time algorithm that decides L2. Illustrated is an algorithm A1 that decides whether x ∈ L1 by using F to transform any input x into f(x) and then using A2 to decide whether f(x) ∈ L2.


Polynomial-time reductions give us a powerful tool for proving that various languages belong to P.
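In code, the composition pictured above is just one function calling another: A1 decides membership in L1 by running the reduction F on its input and handing the result to A2. The languages and reduction below are toy stand-ins chosen only so the sketch compiles and runs (L1 = even integers, L2 = multiples of 10, and f multiplies by 5).

#include <stdio.h>

/* Toy stand-ins, with integers playing the role of encoded instances:
   L1 = even numbers, L2 = multiples of 10.
   The reduction f multiplies by 5: x is even iff 5*x is a multiple of 10. */

static long reduce_F(long x)  { return 5 * x; }          /* reduction function f */
static int  decide_L2(long y) { return y % 10 == 0; }    /* algorithm A2 for L2  */

/* Algorithm A1: decide membership in L1 by composing F with A2.
   If both F and A2 run in polynomial time, so does A1.           */
static int decide_L1(long x)  { return decide_L2(reduce_F(x)); }

int main(void) {
    printf("6 in L1? %d\n", decide_L1(6));   /* prints 1: 6 is even */
    printf("7 in L1? %d\n", decide_L1(7));   /* prints 0: 7 is odd  */
    return 0;
}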

9.6 NP-completeness

Polynomial-time reductions give us a formal means for showing that one problem is at least as hard as another, to within a polynomial-time factor. That is, if L1 ≤p L2, then L1 is not more than a polynomial-time factor harder than L2, which is why the "less than or equal to" notation for reductions is mnemonic. We can now define the set of NP-complete languages, which are the hardest problems in NP.

A language L ⊆ {0,1}* is NP-complete if

1. L ∈ NP, and

2. L1 ≤p L for every L1 ∈ NP.

Figure: How most theoretical computer scientists view the relationships among P, NP, and NPC. Both P and NPC are wholly contained within NP, and P ∩ NPC = ∅.

If a language L satisfies property 2, but not necessarily property 1, we say that L is NP-hard. We also define NPC to be the class of NP-complete languages.

9.7 Circuit Satisfiability

Up to this point we have defined NP-complete problems, but we have not actually proved that any problem is NP-complete. Once we prove that at least one problem is NP-complete, polynomial-time reducibility can be used as a tool to prove the NP-completeness of other problems. So we now focus on showing the existence of an NP-complete problem: the circuit-satisfiability problem.

Figure: Two instances of the circuit-satisfiability problem. (a) The assignment x1 = 1, x2 = 1, x3 = 0 to the inputs of this circuit causes the output of the circuit to be 1. The circuit is therefore satisfiable. (b) No assignment to the inputs of this circuit can cause the output of the circuit to be 1. The circuit is therefore unsatisfiable.

We shall informally describe a proof that relies on a basic understanding of boolean combinational circuits. Two boolean combinational circuits are shown in the figure. Each circuit has three inputs and a single output. A truth assignment is a set of boolean input values for the circuit. We say that a one-output boolean combinational circuit is satisfiable if it has a satisfying assignment: a truth assignment that causes the output of the circuit to be 1. The circuit in Figure 4.4(a) has the satisfying assignment x1 = 1, x2 = 1, x3 = 0, and so it is satisfiable. No assignment of values to x1, x2, and x3 causes the circuit in Figure 4.4(b) to produce a 1 output; it always produces 0, and so it is unsatisfiable.

We now state the circuit-satisfiability problem as: "Given a boolean combinational circuit composed of AND, OR, and NOT gates, is it satisfiable?" In order to pose this question formally, however, we must agree on a standard encoding. We can devise a graph-like encoding that maps any given circuit C into a binary string <C> whose length is not much larger than the size of the circuit itself. As a formal language, we can therefore define

CIRCUIT-SAT = {<C> : C is a satisfiable boolean combinational circuit}.

The circuit-satisfiability problem is of great importance in the area of computer-aided hardware optimization. If a circuit always produces 0, it can be replaced by a simpler circuit that omits all logic gates and provides the constant 0 value as its output. A polynomial-time algorithm for the problem would therefore have considerable practical application.

Suppose we are given a circuit C. We might attempt to determine whether it is satisfiable by simply checking all possible assignments to the inputs. But if there are k inputs, there are 2^k possible assignments. When the size of C is polynomial in k, checking each one leads to a superpolynomial-time algorithm. In fact, there is strong evidence that no polynomial-time algorithm exists that solves the circuit-satisfiability problem, because circuit satisfiability is NP-complete. We break the proof of this fact into two parts, based on the two parts of the definition of NP-completeness.
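To see where the 2^k blow-up comes from, here is a sketch of the brute-force check. The circuit is represented, purely for illustration, as a callback that evaluates the output for a given input assignment; the outer loop plainly runs 2^k times.

#include <stdio.h>

/* For this sketch, a circuit is a function mapping k boolean inputs to one output. */
typedef int (*circuit_fn)(const int *inputs, int k);

/* Try all 2^k assignments; return 1 if some assignment makes the output 1. */
static int is_satisfiable(circuit_fn circuit, int k) {
    int inputs[32];
    unsigned long a, bit;
    for (a = 0; a < (1UL << k); a++) {             /* 2^k assignments in total */
        for (bit = 0; bit < (unsigned long) k; bit++)
            inputs[bit] = (int) ((a >> bit) & 1UL);
        if (circuit(inputs, k))
            return 1;                              /* found a satisfying assignment */
    }
    return 0;
}

/* Example circuit with three inputs: output = x1 AND x2 AND (NOT x3). */
static int example_circuit(const int *x, int k) {
    (void) k;
    return x[0] && x[1] && !x[2];
}

int main(void) {
    printf("satisfiable? %d\n", is_satisfiable(example_circuit, 3));  /* prints 1 */
    return 0;
}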

9.8 NP-complete Problems

NP-complete problems arise in many domains: boolean logic, arithmetic, automata and language theory, network design, sets and partitions, storage and retrieval, sequencing and scheduling, graphs, mathematical programming, algebra and number theory, games and puzzles, program optimization, and so on. Here we use the reduction methodology to provide NP-completeness proofs for problems related to graph theory and set partitioning.

Figure: The structure of the NP-completeness proofs, showing the chain of reductions among CIRCUIT-SAT, SAT, 3-CNF-SAT, CLIQUE, VERTEX-COVER, HAM-CYCLE, TSP, and SUBSET-SUM.

9.9 The Vertex-cover Problem

We define a vertex cover of an undirected graph G = (V,E) as a subset V' ⊆ V such that if (u,v) ∈ E, then u ∈ V' or v ∈ V' (or both). That is, each vertex in the cover "covers" its incident edges, and a vertex cover for G is a set of vertices that covers all the edges in E. The size of a vertex cover is the number of vertices in it. For example, the graph in the figure has a vertex cover {w, z} of size 2.

The vertex-cover problem is to find a vertex cover of minimum size in a given graph. Restating it as a decision problem, we wish to determine whether a graph has a vertex cover of a given size k. As a language, we define

VERTEX-COVER = {<G, k> : graph G has a vertex cover of size k}.
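Although finding a minimum vertex cover appears to be hard, checking a proposed cover is easy, which is exactly the kind of polynomial-time verification NP asks for. The sketch below assumes an edge-list representation chosen for the example.

/* An undirected graph represented as a list of edges (illustrative only). */
struct edge { int u, v; };

/* Return 1 if every edge has at least one endpoint among cover[0..k-1]. */
static int is_vertex_cover(const struct edge *edges, int n_edges,
                           const int *cover, int k) {
    int i, j, covered;
    for (i = 0; i < n_edges; i++) {
        covered = 0;
        for (j = 0; j < k; j++)
            if (edges[i].u == cover[j] || edges[i].v == cover[j]) {
                covered = 1;
                break;
            }
        if (!covered)
            return 0;       /* this edge is covered by no chosen vertex */
    }
    return 1;               /* all edges are covered */
}

The check runs in O(|E| * k) time, so a proposed cover of size k serves as a certificate that can be verified in polynomial time.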

9.10 The Hamiltonian-cycle Problem


Figure: (a) Widget A, used in the reduction from 3-CNF-SAT to HAM-CYCLE. (b)-(c) If A is a subgraph of some graph G that contains a hamiltonian cycle, and the only connections from A to the rest of G are through the vertices a, a', b, and b', then the shaded edges represent the only two possible ways in which the hamiltonian cycle may traverse the edges of subgraph A. (d) A compact representation of the A widget.

Theorem

The Hamiltonian cycle problem is NP-complete.

Proof: We first show that HAM-CYCLE belongs to NP. Given a graph G = (V,E), our certificate is the sequence of |V| vertices that make up the hamiltonian cycle. The verification algorithm checks that this sequence contains each vertex in V exactly once and that, with the first vertex repeated at the end, it forms a cycle in G. This verification can be performed in polynomial time.
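A concrete version of this verification algorithm is sketched below, assuming for illustration that the graph is given as an adjacency matrix and the certificate as an array of |V| vertex numbers.

#include <string.h>

#define MAXV 100

/* Verify a HAM-CYCLE certificate: cert[0..n-1] lists the n = |V| vertices of the
   claimed cycle, and adj is the adjacency matrix of G. Runs in polynomial time. */
static int verify_ham_cycle(const int adj[MAXV][MAXV], int n, const int *cert) {
    int seen[MAXV], i, from, to;
    memset(seen, 0, sizeof seen);
    for (i = 0; i < n; i++) {
        if (cert[i] < 0 || cert[i] >= n || seen[cert[i]])
            return 0;                   /* out of range, or a vertex repeated   */
        seen[cert[i]] = 1;              /* each vertex must appear exactly once */
    }
    for (i = 0; i < n; i++) {
        from = cert[i];
        to   = cert[(i + 1) % n];       /* wrap around: last vertex back to the first     */
        if (!adj[from][to])
            return 0;                   /* consecutive vertices must be joined by an edge */
    }
    return 1;
}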

We now prove that HAM-CYCLE is NP-complete by showing that 3-CNF-SAT ≤p HAM-CYCLE. Given a 3-CNF boolean formula φ over variables x1, x2, ..., xn with clauses C1, C2, ..., Ck, each containing exactly three distinct literals, we construct a graph G = (V,E) in polynomial time such that G has a hamiltonian cycle if and only if φ is satisfiable. Our construction is based on widgets, which are pieces of graphs that enforce certain properties.

Our first widget is the subgraph A shown in the figure. Suppose that A is a subgraph of some graph G and that the only connections between A and the remainder of G are through the vertices a, a', b, and b'. Since any hamiltonian cycle of G must traverse the edges of A in one of the ways shown in parts (b) and (c) of the figure, we may treat subgraph A as if it were simply a pair of edges (a,a') and (b,b'), with the restriction that any hamiltonian cycle of G must include exactly one of these edges. We shall represent widget A as shown in part (d) of the figure.

The subgraph B shown in the figure is our second widget. Suppose that B is a subgraph of some graph G and that the only connections from B to the remainder of G are through the vertices b1, b2, b3, and b4. A hamiltonian cycle of graph G cannot traverse all of the edges (b1,b2), (b2,b3), and (b3,b4), since then all vertices in the widget other than b1, b2, b3, and b4 would be missed. A hamiltonian cycle of G may, however, traverse any proper subset of these edges. Parts (a)-(e) of the figure show five such subsets; the remaining two subsets can be obtained by performing a top-to-bottom flip of parts (b) and (e). We represent this widget as in part (f) of the figure, the idea being that at least one of the paths pointed to by the arrows must be taken by a hamiltonian cycle of G.

The graph G that we shall construct consists mostly of copies of these two widgets. The construction is illustrated in Figure 4.13. For each of the k clauses of φ, we include a copy of widget B, and we join these copies together in series as follows. Letting bi,j be the copy of vertex bj in the ith copy of widget B, we connect bi,4 to bi+1,1 for i = 1, 2, ..., k − 1.

Then, for each variable xm of φ, we include two vertices x'm and x''m. We connect these two vertices by means of two copies of the edge (x'm, x''m), which we denote by em and ēm to distinguish them. The idea is that if the hamiltonian cycle takes edge em, it corresponds to assigning variable xm the value 1; if it takes edge ēm, the variable is assigned the value 0. Each pair of these edges forms a two-edge loop, and we connect these small loops in series by adding edges (x''m, x'm+1) for m = 1, 2, ..., n − 1. We connect the left (clause) side of the graph to the right (variable) side by means of two edges (b1,1, x'1) and (bk,4, x''n), which are the topmost and bottommost edges in the figure.

We are not yet finished with the construction of graph G, since we have yet to relate the variables to the clauses. If the jth literal of clause Ci is xm, then we use an A widget to connect edge (bi,j, bi,j+1) with edge em. If the jth literal of clause Ci is ¬xm, then we instead put an A widget between edge (bi,j, bi,j+1) and ēm. In the figure, for example, because clause C2 contains the literals x1, ¬x2, and x3, we place three A widgets as follows:

between (b2,1, b2,2) and e1,

between (b2,2, b2,3) and ē2, and

between (b2,3, b2,4) and e3.

Note that connecting two edges by means of A widgets actually entails replacing each edge by the five edges in the top or bottom of Figure (a) and, of course, adding the connections that pass through the z vertices as well. A given literal lm may appear in several clauses, and thus an edge em or ēm may be influenced by several A widgets (edge ē3, for example). In this case, we connect the A widgets in series, effectively replacing edge em or ēm by a series of edges.

Figure: Widget B, used in the reduction from 3-CNF-SAT to HAM-CYCLE. No path from vertex b1 to vertex b4 containing all the vertices in the widget may use all three edges (b1,b2), (b2,b3), and (b3,b4). Any proper subset of these edges may be used, however. (a)-(e) Five such subsets. (f) A representation of this widget in which at least one of the paths pointed to by the arrows must be taken by a hamiltonian cycle.

Figure: The graph constructed from the formula φ = (x1x2x3)(x1x2x3)(x1x2x3). A satisfying assignment s to the variables of φ is s(x1) = 0, s(x2) = 1, and s(x3) = 1, which corresponds to the hamiltonian cycle shown. Note that if s(xm) = 1, then edge em is in the hamiltonian cycle, and if s(xm) = 0, then edge ēm is in the hamiltonian cycle.


We claim that formula φ is satisfiable if and only if graph G contains a hamiltonian cycle. We first suppose that G has a hamiltonian cycle h and show that φ is satisfiable. Cycle h must take a particular form:

First, it traverses edge (b1,1, x'1) to go from the top left to the top right.

It then follows all of the x'm and x''m vertices from top to bottom, choosing either edge em or edge ēm, but not both.

It next traverses edge (bk,4, x''n) to get back to the left side.

Finally, it traverses the B widgets from bottom to top on the left.

(It actually traverses edges within the A widgets as well, but we use these subgraphs to enforce the either/or nature of the edges they connect.)

Given the hamiltonian cycle h, we define a truth assignment for φ as follows. If edge em belongs to h, then we set xm = 1. Otherwise, edge ēm belongs to h, and we set xm = 0.

We claim that this assignment satisfies φ. Consider a clause Ci and the corresponding B widget in G. Each edge (bi,j, bi,j+1) is connected by an A widget to either edge em or edge ēm, depending on whether xm or ¬xm is the jth literal in the clause. The edge (bi,j, bi,j+1) is traversed by h if and only if the corresponding literal is 0. Since each of the three edges (bi,1, bi,2), (bi,2, bi,3), (bi,3, bi,4) of clause Ci is also in a B widget, all three cannot be traversed by the hamiltonian cycle h. One of the three edges, therefore, must have a corresponding literal whose assigned value is 1, and clause Ci is satisfied. This property holds for each clause Ci, i = 1, 2, ..., k, and thus formula φ is satisfied.

Figure: An instance of the traveling-salesman problem. Shaded edges represent a minimum-cost tour, with cost 7.

Conversely, let us suppose that formula φ is satisfied by some truth assignment s. By following the rules above, we can construct a hamiltonian cycle for graph G: traverse edge em if s(xm) = 1, traverse edge ēm if s(xm) = 0, and traverse edge (bi,j, bi,j+1) if and only if the jth literal of clause Ci is 0 under the assignment. These rules can indeed be followed, since s is a satisfying assignment for formula φ.

Finally, we note that graph G can be constructed in polynomial time. It contains one B widget for each of the k clauses of φ and one A widget for each of the 3k literals, so there are 3k A widgets. Since the A and B widgets are of fixed size, the graph G has O(k) vertices and is easily constructed in polynomial time. Thus we have provided a polynomial-time reduction from 3-CNF-SAT to HAM-CYCLE.

9.11 The Traveling-salesman Problem

In the traveling-salesman problem, which is closely related to the hamiltonian-cycle problem, a salesman must visit n cities. Modeling the problem as a complete graph with n vertices, we can say that the salesman wishes to make a tour, or hamiltonian cycle, visiting each city exactly once and finishing at the city he starts from. There is an integer cost c(i, j) to travel from city i to city j, and the salesman wishes to make the tour whose total cost is minimum, where the total cost is the sum of the individual costs along the edges of the tour. For example, in Figure 4.15 a minimum-cost tour is <u, w, v, x, u> with cost 7. The formal language for the traveling-salesman problem is:

TSP = {<G, c, k> : G = (V, E) is a complete graph,

c is a function from V × V → Z,

k ∈ Z, and

G has a traveling-salesman tour with cost at most k}.

The following theorem shows that a fast algorithm for the travelling-salesman problem is unlikely

to exist.

Theorem

The travelling-salesman problem is NP-complete.

Proof: We first show that TSP belongs to NP. Given an instance of the problem, we use as a

certificate the sequence of n vertices in the tour. The verification algorithm checks that this

sequence contains each vertex exactly once, sums up the edge costs and checks whether the sum

is at most k. This process can certainly be done in polynomial-time.

To prove that TSP is NP-hard, we show that HAM-CYCLE ≤p TSP. Let G = (V,E) be an instance of HAM-CYCLE. We form the complete graph G' = (V,E'), where E' = {(i,j) : i,j ∈ V and i ≠ j}, and we define the cost function c by

c(i,j) = 0 if (i,j) ∈ E, and

c(i,j) = 1 if (i,j) ∉ E.

The instance of TSP is then <G', c, 0>, which is easily formed in polynomial time.
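The reduction itself is easy to code: given an adjacency matrix for G, build the cost matrix of the complete graph G' by giving cost 0 to the edges of G and cost 1 to every other vertex pair. The matrix representation below is an assumption made for this sketch.

#define MAXV 100

/* Build the TSP instance (G', c, 0) from a HAM-CYCLE instance G:
   G' is the complete graph on the same vertex set, and
   c(i,j) = 0 if (i,j) is an edge of G, and 1 otherwise.          */
static void ham_cycle_to_tsp(const int adj[MAXV][MAXV], int n,
                             int cost[MAXV][MAXV]) {
    int i, j;
    for (i = 0; i < n; i++)
        for (j = 0; j < n; j++)
            cost[i][j] = adj[i][j] ? 0 : 1;   /* the diagonal is never used by a tour */
    /* The decision bound for the TSP instance is k = 0: G has a hamiltonian
       cycle exactly when G' has a tour of total cost at most 0.             */
}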

We now show that graph G has a hamiltonian cycle if and only if graph G' has a tour of cost at most 0. Suppose that graph G has a hamiltonian cycle h. Each edge in h belongs to E and thus has cost 0 in G'. Thus, h is also a tour in G' with cost 0. Conversely, suppose that graph G' has a tour h' of cost at most 0. Since the costs of the edges in E' are 0 and 1, the cost of tour h' is exactly 0, and therefore h' contains only edges in E. We conclude that h' is a hamiltonian cycle in graph G.
