Thread: Known problem with HASH index?
Is there a known problem with HASH type index in PostgreSQL 7.1.2 4PGDG on Red Hat Linux 7.1 (2.4.2 kernel)? I can't finda lot of documentation, but this is what I observe: [playpen]$ dropdb test; createdb test; psql -f create_table.sql test; psql -c "COPY clients FROM '/tmp/input.txt';" test;psql -c 'CREATE INDEX clients_idx ON clients USING HASH (tel);' test; vacuumdb test; vacuumdb --analyze test DROP DATABASE CREATE DATABASE CREATE COPY CREATE NOTICE: Index clients_idx: NUMBER OF INDEX' TUPLES (916864) IS NOT THE SAME AS HEAP' (1000000). Recreate the index. VACUUM NOTICE: Index clients_idx: NUMBER OF INDEX' TUPLES (916864) IS NOT THE SAME AS HEAP' (1000000). Recreate the index. VACUUM [playpen]$ cat create_table.sql CREATE TABLE clients ( ClientID integer, firstname varchar(5), surname varchar(22), area varchar(3), neigh varchar(27), dimos varchar(50), tel varchar(7) The input file is a bit big to include, but was created using this brain-damaged perl script (somebody please teach me howto do random letter strings :-)) #!/usr/local/bin/perl -w my $i; my $j; srand(time||$$); for ($i = 0; $i<1_000_000; $i++) { print int(rand(1<<31)), "\t"; foreach $j (1..5) { printf "%c", int(rand(57)+65); } print "\t"; foreach $j (1..22) { printf "%c", int(rand(57)+65); } print "\t"; foreach $j (1..3) { printf "%c", int(rand(57)+65); } print "\t"; foreach $j (1..27) { printf "%c", int(rand(57)+65); } print "\t"; foreach $j (1..50) { printf "%c", int(rand(57)+65); } print "\t"; foreach $j (1..7) { printf "%c", int(rand(10)+48); } print "\n"; } print "\\.\n";
> Is there a known problem with HASH type index in PostgreSQL 7.1.2 > 4PGDG on Red Hat Linux 7.1 (2.4.2 kernel)? I can't find a lot of > documentation, but this is what I observe: Tom Lane has mentioned several times that index types other than BTREE have suffered some bit-rot over the past few years, and probably have problems with concurrent access -- so using a BTREE is recommended whenever possible. It would be cool if someone felt like taking the time to clean up HASH indexes though... > [playpen]$ dropdb test; createdb test; psql -f create_table.sql > test; psql -c "COPY clients FROM '/tmp/input.txt';" test; psql -c > 'CREATE INDEX clients_idx ON clients USING HASH (tel);' test; > vacuumdb test; vacuumdb --analyze test DROP DATABASE Doing vacuumdb followed by vacuumdb --analyze is redundant; doing a VACUUM ANALYZE also does a regular VACUUM. > NOTICE: Index clients_idx: NUMBER OF INDEX' TUPLES (916864) IS NOT > THE SAME AS HEAP' (1000000). > Recreate the index. That's strange... Personally, I'd just switch to a BTREE, where presumambly this won't occur. However, I'd suggest waiting for one of the Postgres hackers to give you a proper answer ;-) > [playpen]$ cat create_table.sql > CREATE TABLE clients ( > ClientID integer, > firstname varchar(5), > surname varchar(22), > area varchar(3), > neigh varchar(27), > dimos varchar(50), > tel varchar(7) The missing ');' at the end is a typo, right? > The input file is a bit big to include, but was created using this > brain-damaged perl script (somebody please teach me how to do > random letter strings :-)) Grab stuff from /usr/share/dict? >srand(time||$$); On modern Perls, the default srand() seed is more secure than this, I believe. (Although it doesn't matter for your script, of course) Cheers, Neil
Allan Engelhardt <allane@cybaea.com> writes: > Is there a known problem with HASH type index in PostgreSQL 7.1.2 > 4PGDG on Red Hat Linux 7.1 (2.4.2 kernel)? It's got a number of known shortcomings, but not anything like the one you describe. I couldn't reproduce it. I ran your script, then did VACUUM VERBOSE and VACUUM VERBOSE ANALYZE: NOTICE: --Relation clients-- NOTICE: Pages 23256: Changed 0, reaped 0, Empty 0, New 0; Tup 1000000: Vac 0, Keep/VTL 0/0, Crash 0, UnUsed 0, MinLen 167,MaxLen 183; Re-using: Free/Avail. Space 2345632/1600; EndEmpty/Avail. Pages 0/1. CPU 4.37s/0.95u sec elapsed 55.65 sec. NOTICE: Index clients_idx: Pages 6614; Tuples 1000000. CPU 1.18s/35.89u sec elapsed 59.93 sec. NOTICE: Rel clients: Pages: 23256 --> 23256; Tuple(s) moved: 0. CPU 0.00s/0.00u sec elapsed 0.00 sec. VACUUM NOTICE: --Relation clients-- NOTICE: Pages 23256: Changed 0, reaped 0, Empty 0, New 0; Tup 1000000: Vac 0, Keep/VTL 0/0, Crash 0, UnUsed 0, MinLen 167,MaxLen 183; Re-using: Free/Avail. Space 2345632/1600; EndEmpty/Avail. Pages 0/1. CPU 4.41s/0.96u sec elapsed 25.78 sec. NOTICE: Index clients_idx: Pages 6614; Tuples 1000000. CPU 1.16s/35.85u sec elapsed 62.99 sec. NOTICE: Rel clients: Pages: 23256 --> 23256; Tuple(s) moved: 0. CPU 0.00s/0.00u sec elapsed 0.00 sec. NOTICE: Analyzing clients VACUUM Looks fine here. regards, tom lane