Hash Functions

In the previous example, we determined a hash value by examining the remainder after division. In this section we'll examine several algorithms that compute a hash value.

Division Method (TableSize = Prime)

A hash value, from 0 to (HashTableSize - 1), is computed by dividing the key value by the size of the hash table and taking the remainder:

Public Function Hash(ByVal Key As Long) As Long
    Hash = Key Mod HashTableSize
End Function

Selecting an appropriate HashTableSize is important to the success of this method. For example, a HashTableSize divisible by two would yield even hash values for even keys, and odd hash values for odd keys. This is an undesirable property, as all keys would hash to even values if they happened to be even. If HashTableSize is a power of two, then the hash function simply selects a subset of the key bits as the table index. To obtain a more random scattering, HashTableSize should be a prime number not too close to a power of two.

Multiplication Method (TableSize = 2n)

The multiplication method may be used for a HashTableSize that is a power of 2. The Key is multiplied by a constant, and then the necessary bits are extracted to index into the table. One method uses the fractional part of the product of the key and the golden ratio, or (sqrt(5) - 1)/2. For example, assuming a word size of 8 bits, the golden ratio is multiplied by 28 to obtain 158. The product of the 8-bit key and 158 results in a 16-bit integer. For a table size of 25, the 5 most significant bits of the least significant word are extracted for the hash value. The following definitions may be used for the multiplication method:

' 8-bit index
Private Const K As Long = 158

' 16-bit index
Private Const K As Long = 40503

' 32-bit index
Private Const K As Long = 2654435769

' bitwidth(index)=w, size of table=2^m
Private Const S As Long = 2^(w - m)
Private Const N As Long = 2^m - 1
Hash = ((K * Key) And N) \ S

For example, if HashTableSize is 1024 (210), then a 16-bit index is sufficient and S would be assigned a value of 2(16 - 10) = 64. Constant N would be 210 - 1, or 1023. Thus, we have:

Private Const K As Long = 40503
Private Const S As Long = 64
Private Const N As Long = 1023

Public Function Hash(ByVal Key As Long) As Long
    Hash = ((K * Key) And N) \ S
End Function

Variable String Addition Method (TableSize = 256)

To hash a variable-length string, each character is added, modulo 256, to a total. A hash value, range 0–255, is computed.

Public Function Hash(ByVal S As String) As Long
    Dim h As Byte
    Dim i As Long
    
    h = 0
    For i = 1 to Len(S)
        h = h + Asc(Mid(S, i, 1))
    Next i
    Hash = h
End Function

Variable String Exclusive-Or Method (TableSize = 256)

This method is similar to the addition method, but successfully distinguishes similar words and anagrams. To obtain a hash value in the range 0–255, all bytes in the string are exclusive-or'd together. However, in the process of doing each exclusive-or, a random component is introduced.

Private Rand8(0 To 255) As Byte

Public Function Hash(ByVal S As String) As Long
    Dim h As Byte
    Dim i As Long

    h = 0
    For i = 1 To Len(S)
        h = Rand8(h Xor Asc(Mid(S, i, 1)))
    Next i
    Hash = h
End Function

Rand8 is a table of 256 8-bit unique random numbers. The exact ordering is not critical. The exclusive-or method has its basis in cryptography, and is quite effective.

Variable String Exclusive-Or Method (TableSize <= 65536)

If we hash the string twice, we may derive a hash value for an arbitrary table size up to 65536. The second time the string is hashed, one is added to the first character. Then the two 8-bit hash values are concatenated together to form a 16-bit hash value:

Private Rand8(0 To 255) As Byte

Public Function Hash(ByVal S As String) As Long
    Dim h1 As Byte
    Dim h2 As Byte
    Dim c As Byte
    Dim i As Long

    if Len(S) = 0 Then
        Hash = 0
        Exit Function
    End If

    h1 = Asc(Mid(S, 1, 1))
    h2 = h1 + 1
    For i = 2 To Len(S)
        c = Asc(Mid(S, i, 1))
        h1 = Rand8(h1 Xor c)
        h2 = Rand8(h2 Xor c)
    Next i

    ' Hash is in range 0 .. 65535
    Hash = (h1 * 256) + h2
    ' scale Hash to table size
    Hash = Hash Mod HashTableSize
End Function

Hashing strings is computationally expensive, as we manipulate each byte in the string. A more efficient technique utilizes a DLL written in C to perform the hash function. Included in the download is a test program that hashes strings using C and Visual Basic. The C version is typically 20 times faster.