Thread: Behavior of hash index on a text field
Hello -
My environment: Postgres 8.2, FreeBSD.
I have a db of about 250G. In it is a table of Internet domain information containing, among other things, the domain name. In that table I have a serial int index as the primary key to use as a foreign key in a number of other tables. The domain name is a text field. There are about 60M rows in the table, all with unique names.
Many queries begin with the specification of the domain name and then fan out through joins to other tables. Obviously I need to be able to locate a domain name quickly so I can get the index of the row to use in the joins. I believe (but don't know) that a hash index would be better for this than a btree. I don't know enough to have an opinion wrt GiST or GIN indexes. Any advice on this problem would be greatly appreciated.
TIA,
david
My environment: Postgres 8.2, FreeBSD.
I have a db of about 250G. In it is a table of Internet domain information containing, among other things, the domain name. In that table I have a serial int index as the primary key to use as a foreign key in a number of other tables. The domain name is a text field. There are about 60M rows in the table, all with unique names.
Many queries begin with the specification of the domain name and then fan out through joins to other tables. Obviously I need to be able to locate a domain name quickly so I can get the index of the row to use in the joins. I believe (but don't know) that a hash index would be better for this than a btree. I don't know enough to have an opinion wrt GiST or GIN indexes. Any advice on this problem would be greatly appreciated.
TIA,
david
"David Monarchi" <david.e.monarchi@gmail.com> writes: > I believe (but don't know) that a hash index would be better for this than a > btree. A fairly reliable rule of thumb is that there isn't *any* situation where a Postgres hash index outperforms a btree. Why this is so is not entirely clear, and various people keep poking at the code in hopes of making it better. Sooner or later we'll either succeed in getting hash indexes to be a win for specific use cases, or give up and drop them entirely. But at present it is undeniable that Postgres hash indexes are not production quality. (Even if they had a performance win, their current lack of WAL backup makes them unfit for production use...) regards, tom lane
Thanks for the insight, Tom. I had read that Postgres hash indexes didn't work too well, but thought that had been remedied in 8.2. Do you have any thoughts about btree versus GiST or GIN for text data with very few duplicate values?
Best,
david
Best,
david
On 3/31/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:
"David Monarchi" <david.e.monarchi@gmail.com> writes:
> I believe (but don't know) that a hash index would be better for this than a
> btree.
A fairly reliable rule of thumb is that there isn't *any* situation
where a Postgres hash index outperforms a btree.
Why this is so is not entirely clear, and various people keep poking
at the code in hopes of making it better. Sooner or later we'll either
succeed in getting hash indexes to be a win for specific use cases,
or give up and drop them entirely.
But at present it is undeniable that Postgres hash indexes are not
production quality. (Even if they had a performance win, their
current lack of WAL backup makes them unfit for production use...)
regards, tom lane