Thread: Ignore hash indices on replicas

Ignore hash indices on replicas

From

Steven Schlansker

Date:

10 July 2012, 04:10:11

I'm using Postgres hash indices on a streaming replica master.
As is documented, hash indices are not logged, so the replica does not have access to them.

I understand that the current wisdom is "don't use hash indices", but (unfortunately?) I have benchmarks that
show that our particular application is faster by quite a bit when a hash index is available.

I assume that fixing the hash index logging issue hasn't been a priority due to low interest / technical limitations,
butI'm curious for a stopgap measure -- can we somehow configure Postgres to ignore hash indices on a replica, using
otherb-tree indices or even a sequential scan?  I know I can do this on a per-connection basis by disabling various
indexlookup methods, but it'd be nice if it just ignored invalid indices on its own. 

I've not seen much reference to this problem around, but I do apologize if I've missed it in the manual or it is
extremelyobvious how you do this :) 

Thanks,
Steven

Re: Ignore hash indices on replicas

From

Jeff Davis

Date:

19 August 2012, 21:37:41

On Tue, 2012-07-10 at 00:09 -0700, Steven Schlansker wrote:
> I understand that the current wisdom is "don't use hash indices", but
> (unfortunately?) I have benchmarks that
> show that our particular application is faster by quite a bit when a
> hash index is available.

Can you publish the results somewhere? It might provoke some interest.

> I assume that fixing the hash index logging issue hasn't been a
> priority due to low interest / technical limitations, but I'm curious
> for a stopgap measure -- can we somehow configure Postgres to ignore
> hash indices on a replica, using other b-tree indices or even a
> sequential scan?  I know I can do this on a per-connection basis by
> disabling various index lookup methods, but it'd be nice if it just
> ignored invalid indices on its own.

This might work for you:

http://sigaev.ru/git/gitweb.cgi?p=plantuner.git;a=blob;hb=HEAD;f=README.plantuner

Regards,
    Jeff Davis

Re: Ignore hash indices on replicas

From

Scott Marlowe

Date:

20 August 2012, 03:01:36

On Tue, Jul 10, 2012 at 1:09 AM, Steven Schlansker <steven@likeness.com> wrote:
> I'm using Postgres hash indices on a streaming replica master.
> As is documented, hash indices are not logged, so the replica does not have access to them.
>
> I understand that the current wisdom is "don't use hash indices", but (unfortunately?) I have benchmarks that
> show that our particular application is faster by quite a bit when a hash index is available.

You could use a slony slave and have different indexes etc between
master and slave but it's more complex to setup, maintain and monitor
for most people.

Re: Ignore hash indices on replicas

From

Steven Schlansker

Date:

20 August 2012, 17:26:20

On Aug 19, 2012, at 8:01 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:

> On Tue, Jul 10, 2012 at 1:09 AM, Steven Schlansker <steven@likeness.com> wrote:
>> I'm using Postgres hash indices on a streaming replica master.
>> As is documented, hash indices are not logged, so the replica does not have access to them.
>>
>> I understand that the current wisdom is "don't use hash indices", but (unfortunately?) I have benchmarks that
>> show that our particular application is faster by quite a bit when a hash index is available.
>
> You could use a slony slave and have different indexes etc between
> master and slave but it's more complex to setup, maintain and monitor
> for most people.

Thanks for the suggestion, but we finally have replication working in a way we understand / like and I don't really
considerthis a viable option.  The built-in replication has been treating us very well.

Re: Ignore hash indices on replicas

From

Steven Schlansker

Date:

20 August 2012, 17:29:43

On Aug 19, 2012, at 2:37 PM, Jeff Davis <pgsql@j-davis.com> wrote:

> On Tue, 2012-07-10 at 00:09 -0700, Steven Schlansker wrote:
>> I understand that the current wisdom is "don't use hash indices", but
>> (unfortunately?) I have benchmarks that
>> show that our particular application is faster by quite a bit when a
>> hash index is available.
>
> Can you publish the results somewhere? It might provoke some interest.

I might be able to spend some time looking at making this public, but the general parameters are:

122M rows, lookup key is a UUID type.  Lookups are ~1000 random keys at a time (as in, a giant SELECT * FROM table
WHEREkey IN (?,?,?,?,…) 

>
>> I assume that fixing the hash index logging issue hasn't been a
>> priority due to low interest / technical limitations, but I'm curious
>> for a stopgap measure -- can we somehow configure Postgres to ignore
>> hash indices on a replica, using other b-tree indices or even a
>> sequential scan?  I know I can do this on a per-connection basis by
>> disabling various index lookup methods, but it'd be nice if it just
>> ignored invalid indices on its own.
>
> This might work for you:
>
> http://sigaev.ru/git/gitweb.cgi?p=plantuner.git;a=blob;hb=HEAD;f=README.plantuner

Thanks for the link; that looks interesting.  It is a bit unfortunate that I would have to find and exclude indices
manually,but very doable...

Re: Ignore hash indices on replicas

From

Jeff Janes

Date:

20 August 2012, 17:55:13

On Mon, Aug 20, 2012 at 10:29 AM, Steven Schlansker <steven@likeness.com> wrote:
>
> On Aug 19, 2012, at 2:37 PM, Jeff Davis <pgsql@j-davis.com> wrote:
>
>> On Tue, 2012-07-10 at 00:09 -0700, Steven Schlansker wrote:
>>> show that our particular application is faster by quite a bit when a
>>> hash index is available.
>>
>> Can you publish the results somewhere? It might provoke some interest.
>
> I might be able to spend some time looking at making this public, but the general parameters are:
>
> 122M rows, lookup key is a UUID type.  Lookups are ~1000 random keys at a time (as in, a giant SELECT * FROM table
WHEREkey IN (?,?,?,?,…) 

How well cached is the data?

If it has to be read from disk, then a bitmap index scan on a btree
index with effective_io_concurrency set high might do quite well,
assuming you can convince the planner to use one.

Cheers,

Jeff