UNB/ CS/ David Bremner/ teaching/ cs3383/ assignments/ CS3383 Assignment 4

Introduction

For general information about assignments in this course, see assignments

Huffman Coding

Consider a discrete probability distribution on symbols [math]1[/math] to [math]n[/math] with probabilities [math]p_1 \ldots p_n[/math] where each [math]p_i[/math] is an integer power of [math]1/2[/math]. Suppose a long sequence [math]S[/math] of samples is drawn from the distribution such that symbol [math]i[/math] occurs [math]p_i|S|[/math] times. Prove that if a Huffman code based on the probabilities [math]p_i[/math] is used, the total number of bits to encode S is

\begin{equation} |S|\sum_{i=1}^n p_i \lg (1/p_i) \end{equation}

You may use without proof the fact that the two smallest probabilities have to be the same.

Hint: It suffices to prove that symbol $j$ is at depth $\lg (1/p_j)$ in the resulting tree.

Greed is Good

Let [math]e[/math] be the unique lightest edge in a graph [math]G[/math]. Let [math]T[/math] be a spanning tree of [math]G[/math] such that [math]e \notin T[/math] . Prove using elementary properties of spanning trees (i.e. not the cut property) that [math]T[/math] is not a minimum spanning tree of [math]G[/math].

Union by weight

Consider the following alternative implementation of the disjoint-sets data structure.

class Partition:
    def __init__(P,n):
        # sometimes called makeset(j)
        P.parent = [j for j in range(n)]
        P.weight = [1] * n

    def find(P, key):
        if P.parent[key] != key:
            key = P.find(P.parent[key])
        return key

    def union(P,x,y):
        rx = P.find(x)
        ry = P.find(y)

        if rx == ry:
            return

        if P.weight[rx] > P.weight[ry]:
            P.parent[ry] = rx
            P.weight[rx] += P.weight[ry]
        else:
            P.parent[rx] = ry
            P.weight[ry] += P.weight[rx]

Prove that after any sequence of union operations, P.find(x) takes [math]O(\lg n)[/math] time.