Michael Loftis <mloftis@wgops.com> writes:
> [ on hash vs btree indexing ]
> Most of the time until the btree gets deep they are nearly equivalent.
> When the tree ends up becoming many levels deep it can take longer to
> walk than the hash.
Maybe. I've just completed a simple benchmark of btree vs hash indexes
as implemented in Postgres, and I can't see any advantage.
Using current sources on Red Hat Linux 7.2, I built a simple test table
containing one integer column, and filled it with 16 million random
integers generated by int4(1000000000 * random()). With a btree index,
"explain analyze select * from foo where f1 = 314888455" (matching a
single row of the table) took about 22 msec on first try (nothing in
cache), and subsequent repetitions about 0.11 msec. With a hash index,
the first try took about 28 msec and repetitions about 0.15 msec.
Moreover, the hash index was a whole lot bigger: main table size 674
meg, btree 301 meg, hash 574 meg, which possibly offers part of the
explanation for the greater access time.
I would have tried a larger test case, but this one already taxed
my patience: it took 36 hours to build the hash index (vs 19 minutes
for the btree index). It looks like hash index build has an O(N^2)
performance curve --- the thing had 100 meg of hash index built within
an hour of starting, but got slower and slower after that.
In short, lack of support for concurrent operations is hardly the
worst problem with Postgres' hash indexes. If you wanna fix 'em,
be my guest ... but I think I shall spend my time elsewhere.
regards, tom lane