HASH index method not correctly handling NULL text values? - Mailing list pgsql-general

From David Madore
Subject HASH index method not correctly handling NULL text values?
Date
Msg-id 20020203004915.A4956@clipper.ens.fr
Whole thread Raw
Responses Re: HASH index method not correctly handling NULL text  (Neil Conway <nconway@klamath.dyndns.org>)
Re: HASH index method not correctly handling NULL text values?  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-general
Hello.

I don't know if this counts as a bug or not.  I'm using PostgreSQL
version 7.1.3, and I tried creating an index using HASH (rather than
BTREE as I usually do) on a text column of one of my tables.  The
table has 345442 rows, of which 344339 have a non NULL value in the
column in question.  The index creation proceeded without trouble, but
a subsequent "VACUUM VERBOSE ANALYZE" command gave the following
warning: "NUMBER OF INDEX' TUPLES (344339) IS NOT THE SAME AS HEAP'
(345442)" with the advice to recreate the index.  Essentially, my
question is: is the warning incorrect or is the index so?  In the end
I recreated my index as a BTREE and this gave no problem.

Incidentally, I might ask, which is the best choice of indexing method
considering that the only comparison I will ever make on this column
is equality (this is what led me to prefer HASH initially) and
considering that the column takes of the order of magnitude of a few
hundred distinct values (as opposed to the third-of-a-million rows of
the table)?  The BTREE index creation was considerably faster than the
HASH index creation.

In case this is relevant or of any help, here is the creation command
for the table:

CREATE TABLE articles (
  id serial PRIMARY KEY ,
  msgid text UNIQUE NOT NULL ,
  from_header text NOT NULL ,
  subject_header text NOT NULL ,
  date timestamp NOT NULL ,
  posting_date timestamp ,
  last_reference text ,
  lines int NOT NULL ,
  bytes int NOT NULL ,
  sender_name text ,
  sender_host text
) ;

here is the command I tried to create the HASH index:

CREATE INDEX articles_sender_name_key ON articles USING HASH ( sender_name ) ;

and here is the corresponding output from "VACUUM VERBOSE ANALYZE"
(the databased had already been vacuumed just prior to index
creation):

NOTICE:  --Relation articles--
NOTICE:  Pages 10326: Changed 0, reaped 3, Empty 0, New 0; Tup 345442: Vac 0, Keep/VTL 0/0, Crash 0, UnUsed 9, MinLen
151,MaxLen 833; Re-using: Free/Avail. Space 3356/0; EndEmpty/Avail. Pages 0/0. CPU 0.58s/0.36u sec. 
NOTICE:  Index articles_pkey: Pages 1809; Tuples 345442: Deleted 0. CPU 0.25s/1.54u sec.
NOTICE:  Index articles_msgid_key: Pages 2524; Tuples 345442: Deleted 0. CPU 0.22s/1.63u sec.
NOTICE:  Index articles_posting_date_key: Pages 957; Tuples 345442: Deleted 0. CPU 0.12s/1.59u sec.
NOTICE:  Index articles_last_reference_key: Pages 2324; Tuples 345442: Deleted 0. CPU 0.18s/1.68u sec.
NOTICE:  Index articles_sender_name_key: Pages 3089; Tuples 344339: Deleted 0. CPU 0.17s/10.79u sec.
NOTICE:  Index articles_sender_name_key: NUMBER OF INDEX' TUPLES (344339) IS NOT THE SAME AS HEAP' (345442).
        Recreate the index.

PS: Please send copy of replies to me personally as I do not receive
mail from the list.  Thanks again.

--
     David A. Madore
    (david.madore@ens.fr,
     http://www.eleves.ens.fr:8080/home/madore/ )

pgsql-general by date:

Previous
From: Mike Mascari
Date:
Subject: Re: PostgreSQL transaction locking problem
Next
From: Neil Conway
Date:
Subject: Re: HASH index method not correctly handling NULL text