【轉載文章】Google Topics in Programming Languages: A Lock-Free Hash Table - 蜂蜜弟弟的網誌

字體：小中大

【轉載文章】Google Topics in Programming Languages: A Lock-Free Hash Table

2009/04/04 22:30:10瀏覽1139｜回應0｜推薦0

A hash table, put simply, is an abstraction of an array that allows any value to be used as an index. While an array requires that indices be integers, a hash table can use a floating-point value, a string, another array, or even a structure as the index. This index is called the key, and the contents of the array element at that index is called the value. So a hash table is a data structure that stores key/value pairs and can be quickly searched by the key. Because insertion and removal are operations dependent on the speed of the search, they tend to be fast as well.

To achieve this magic, a hash table uses a helper function that converts any object into an integral index suitable for subscripting the array. For example, to index an array with strings representing the numbers 0 through 9, a hash function might look like this:

1unsigned hash ( const char key )


2...{


3  return key - '0';


4}

The first character is expected to always be '0', '1', '2', '3', '4', '5', '6', '7', '8', or '9', and the trick of subtracting '0' from one of those characters evaluates to the integer value that the character represents. This works because you can subtract the lowest value in a contiguous range from another value in that range and find the zero-based distance between them. So ('2'-'0') results in 2, ('0'-'0') results in 0, and ('9'-'0') results in 9.

For those of you who aren't familiar with this trick, it might help to work with the actual values rather than character constants. Let's take an arbitrary range of [12, 16). You want 12 to represent 0, 13 to represent 1, 14 to represent 2, and so on up to 16 which should represent 4. To do this you can take any value in the range and subtract the lowest value, 12, from it. Try it with a calculator.

The hash function shown above can then be used to index a suitable array. For illustrative purposes, I'll include a complete test program for you to play with. Notice how problems occur when two entries are hashed to the same index. This is called a collision, and properly handling collisions is where most of the effort in implementing hash tables comes in. Here is the example program:

 1#include <stdio.h>  2


 3/**//* Find the number of elements in an array */  4#define length(a) ( sizeof a / sizeof *a )  5


 6static size_t hash ( const char key )


 7...{


 8  return key - '0';


 9} 10


11static void jsw_flush ( void );


12static int get_digit ( char *ch );


13


14static void fill_table ( char table[] );


15static void show_table ( const char table[], size_t size );


16


17int main ( void )


18...{


19  /**//* This is our hash table */ 20  char table[10] = ...{0};


21  


22  fill_table ( table );


23  show_table ( table, length ( table ) );


24


25  return 0;


26} 27


28/**//* 29  Discard remaining characters in stdin


30  up to the next newline or end-of-file


31*/ 32void jsw_flush ( void )


33...{


34  int ch;


35


36  do 37    ch = getchar();


38  while ( ch != '\n' && ch != EOF );


39


40  clearerr ( stdin );


41} 42


43/**//* 44  Get a single character digit, '0'-'9'.


45  Return 1 on success, 0 on input error,


46  end-of-file, or invalid input


47*/ 48int get_digit ( char *ch )


49...{


50  int rc;


51


52  printf ( "Enter a single digit: " );


53  fflush ( stdout );


54  


55  if ( rc = scanf ( "%1[0123456789]", ch ) == 1 ) ...{


56    /**//* At least a newline is left over; clear it all */ 57    jsw_flush();


58  } 59


60  return rc;


61} 62


63/**//* Populate the hash table */ 64void fill_table ( char table[] )


65...{


66  char ch;


67


68  while ( get_digit ( &ch ) ) ...{


69    /**//* Create an index from the character */ 70    unsigned i = hash ( ch );


71    


72    if ( table[i] != 0 ) ...{


73      /**//* Duplicate indexes are called collisions */ 74      printf ( "That index has been filled\n" );


75    } 76    else ...{


77      /**//* The element is empty; fill it in */ 78      table[i] = ch;


79    } 80  } 81} 82


83/**//* Display the contents of the hash table */ 84void show_table ( const char table[], size_t size )


85...{


86  size_t i;


87


88  for ( i = 0; i < size; i++ ) ...{


89    /**//* Mark empty elements as 'null' with the ~ character */ 90    printf ( "%c\n", table[i] != 0 ? table[i] : '~' );


91  } 92}

Unfortunately, not all cases are as clean cut as this contrived example. In reality, a hash function will be expected to convert a much broader range of items that don't fall right into indices. In general, it's not possible to create such a perfect hashing scheme that every item can be converted into a unique index. For details about why this is so, see The Art of Hashing on this website.

In the example above, all of the valid keys mapped perfectly to an index. However, when you have a larger range of values, two keys that are not equal may be mapped to the same index. For example, if we continue to use an array of 10, but want to hash all values in the range of [0,20), we might use a hash function like the following:

1size_t hash ( const unsigned int key )


2

( 知識學習｜檔案分享 )