Thread: Known problem with HASH index?

Known problem with HASH index?

From
Allan Engelhardt
Date:
Is there a known problem with HASH type index in PostgreSQL 7.1.2 4PGDG on Red Hat Linux 7.1 (2.4.2 kernel)?  I can't
finda lot of documentation, but this is what I observe: 


[playpen]$ dropdb test; createdb test; psql -f create_table.sql test; psql -c "COPY clients FROM '/tmp/input.txt';"
test;psql -c 'CREATE INDEX clients_idx ON clients USING HASH (tel);' test; vacuumdb test; vacuumdb --analyze test 
DROP DATABASE
CREATE DATABASE
CREATE
COPY
CREATE
NOTICE:  Index clients_idx: NUMBER OF INDEX' TUPLES (916864) IS NOT THE SAME AS HEAP' (1000000).
 Recreate the index.
VACUUM
NOTICE:  Index clients_idx: NUMBER OF INDEX' TUPLES (916864) IS NOT THE SAME AS HEAP' (1000000).
 Recreate the index.
VACUUM

[playpen]$ cat create_table.sql
CREATE TABLE clients (
       ClientID integer,
       firstname varchar(5),
       surname varchar(22),
       area varchar(3),
       neigh varchar(27),
       dimos varchar(50),
       tel varchar(7)

The input file is a bit big to include, but was created using this brain-damaged perl script (somebody please teach me
howto do random letter strings :-)) 

#!/usr/local/bin/perl -w
my $i;
my $j;
srand(time||$$);

for ($i = 0; $i<1_000_000; $i++) {
    print int(rand(1<<31)), "\t";
    foreach $j (1..5)  { printf "%c", int(rand(57)+65); } print "\t";
    foreach $j (1..22) { printf "%c", int(rand(57)+65); } print "\t";
    foreach $j (1..3)  { printf "%c", int(rand(57)+65); } print "\t";
    foreach $j (1..27) { printf "%c", int(rand(57)+65); } print "\t";
    foreach $j (1..50) { printf "%c", int(rand(57)+65); } print "\t";
    foreach $j (1..7)  { printf "%c", int(rand(10)+48); } print "\n";
}
print "\\.\n";



Re: Known problem with HASH index?

From
"Neil Conway"
Date:
> Is there a known problem with HASH type index in PostgreSQL 7.1.2
> 4PGDG on Red Hat Linux 7.1 (2.4.2 kernel)?  I can't find a lot of
> documentation, but this is what I observe:

Tom Lane has mentioned several times that index types other than BTREE
have suffered some bit-rot over the past few years, and probably have
problems with concurrent access -- so using a BTREE is recommended
whenever possible. It would be cool if someone felt like taking the
time to clean up HASH indexes though...

> [playpen]$ dropdb test; createdb test; psql -f create_table.sql
> test; psql -c "COPY clients FROM '/tmp/input.txt';" test; psql -c
> 'CREATE INDEX clients_idx ON clients USING HASH (tel);' test;
> vacuumdb test; vacuumdb --analyze test DROP DATABASE

Doing vacuumdb followed by vacuumdb --analyze is redundant; doing a
VACUUM ANALYZE also does a regular VACUUM.

> NOTICE:  Index clients_idx: NUMBER OF INDEX' TUPLES (916864) IS NOT
> THE SAME AS HEAP' (1000000).
>  Recreate the index.

That's strange... Personally, I'd just switch to a BTREE, where
presumambly this won't occur. However, I'd suggest waiting for one of
the Postgres hackers to give you a proper answer ;-)

> [playpen]$ cat create_table.sql
> CREATE TABLE clients (
>        ClientID integer,
>        firstname varchar(5),
>        surname varchar(22),
>        area varchar(3),
>        neigh varchar(27),
>        dimos varchar(50),
>        tel varchar(7)

The missing ');' at the end is a typo, right?

> The input file is a bit big to include, but was created using this
> brain-damaged perl script (somebody please teach me how to do
> random letter strings :-))

Grab stuff from /usr/share/dict?

>srand(time||$$);

On modern Perls, the default srand() seed is more secure than this, I
believe. (Although it doesn't matter for your script, of course)

Cheers,

Neil


Re: Known problem with HASH index?

From
Tom Lane
Date:
Allan Engelhardt <allane@cybaea.com> writes:
> Is there a known problem with HASH type index in PostgreSQL 7.1.2
> 4PGDG on Red Hat Linux 7.1 (2.4.2 kernel)?

It's got a number of known shortcomings, but not anything like the one
you describe.  I couldn't reproduce it.  I ran your script, then did
VACUUM VERBOSE and VACUUM VERBOSE ANALYZE:

NOTICE:  --Relation clients--
NOTICE:  Pages 23256: Changed 0, reaped 0, Empty 0, New 0; Tup 1000000: Vac 0, Keep/VTL 0/0, Crash 0, UnUsed 0, MinLen
167,MaxLen 183; Re-using: Free/Avail. Space 2345632/1600; EndEmpty/Avail. Pages 0/1. CPU 4.37s/0.95u sec elapsed 55.65
sec.
NOTICE:  Index clients_idx: Pages 6614; Tuples 1000000. CPU 1.18s/35.89u sec elapsed 59.93 sec.
NOTICE:  Rel clients: Pages: 23256 --> 23256; Tuple(s) moved: 0. CPU 0.00s/0.00u sec elapsed 0.00 sec.
VACUUM
NOTICE:  --Relation clients--
NOTICE:  Pages 23256: Changed 0, reaped 0, Empty 0, New 0; Tup 1000000: Vac 0, Keep/VTL 0/0, Crash 0, UnUsed 0, MinLen
167,MaxLen 183; Re-using: Free/Avail. Space 2345632/1600; EndEmpty/Avail. Pages 0/1. CPU 4.41s/0.96u sec elapsed 25.78
sec.
NOTICE:  Index clients_idx: Pages 6614; Tuples 1000000. CPU 1.16s/35.85u sec elapsed 62.99 sec.
NOTICE:  Rel clients: Pages: 23256 --> 23256; Tuple(s) moved: 0. CPU 0.00s/0.00u sec elapsed 0.00 sec.
NOTICE:  Analyzing clients
VACUUM

Looks fine here.

            regards, tom lane