Thread: Ignore hash indices on replicas
I'm using Postgres hash indices on a streaming replica master. As is documented, hash indices are not logged, so the replica does not have access to them. I understand that the current wisdom is "don't use hash indices", but (unfortunately?) I have benchmarks that show that our particular application is faster by quite a bit when a hash index is available. I assume that fixing the hash index logging issue hasn't been a priority due to low interest / technical limitations, butI'm curious for a stopgap measure -- can we somehow configure Postgres to ignore hash indices on a replica, using otherb-tree indices or even a sequential scan? I know I can do this on a per-connection basis by disabling various indexlookup methods, but it'd be nice if it just ignored invalid indices on its own. I've not seen much reference to this problem around, but I do apologize if I've missed it in the manual or it is extremelyobvious how you do this :) Thanks, Steven
On Tue, 2012-07-10 at 00:09 -0700, Steven Schlansker wrote: > I understand that the current wisdom is "don't use hash indices", but > (unfortunately?) I have benchmarks that > show that our particular application is faster by quite a bit when a > hash index is available. Can you publish the results somewhere? It might provoke some interest. > I assume that fixing the hash index logging issue hasn't been a > priority due to low interest / technical limitations, but I'm curious > for a stopgap measure -- can we somehow configure Postgres to ignore > hash indices on a replica, using other b-tree indices or even a > sequential scan? I know I can do this on a per-connection basis by > disabling various index lookup methods, but it'd be nice if it just > ignored invalid indices on its own. This might work for you: http://sigaev.ru/git/gitweb.cgi?p=plantuner.git;a=blob;hb=HEAD;f=README.plantuner Regards, Jeff Davis
On Tue, Jul 10, 2012 at 1:09 AM, Steven Schlansker <steven@likeness.com> wrote: > I'm using Postgres hash indices on a streaming replica master. > As is documented, hash indices are not logged, so the replica does not have access to them. > > I understand that the current wisdom is "don't use hash indices", but (unfortunately?) I have benchmarks that > show that our particular application is faster by quite a bit when a hash index is available. You could use a slony slave and have different indexes etc between master and slave but it's more complex to setup, maintain and monitor for most people.
On Aug 19, 2012, at 8:01 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote: > On Tue, Jul 10, 2012 at 1:09 AM, Steven Schlansker <steven@likeness.com> wrote: >> I'm using Postgres hash indices on a streaming replica master. >> As is documented, hash indices are not logged, so the replica does not have access to them. >> >> I understand that the current wisdom is "don't use hash indices", but (unfortunately?) I have benchmarks that >> show that our particular application is faster by quite a bit when a hash index is available. > > You could use a slony slave and have different indexes etc between > master and slave but it's more complex to setup, maintain and monitor > for most people. Thanks for the suggestion, but we finally have replication working in a way we understand / like and I don't really considerthis a viable option. The built-in replication has been treating us very well.
On Aug 19, 2012, at 2:37 PM, Jeff Davis <pgsql@j-davis.com> wrote: > On Tue, 2012-07-10 at 00:09 -0700, Steven Schlansker wrote: >> I understand that the current wisdom is "don't use hash indices", but >> (unfortunately?) I have benchmarks that >> show that our particular application is faster by quite a bit when a >> hash index is available. > > Can you publish the results somewhere? It might provoke some interest. I might be able to spend some time looking at making this public, but the general parameters are: 122M rows, lookup key is a UUID type. Lookups are ~1000 random keys at a time (as in, a giant SELECT * FROM table WHEREkey IN (?,?,?,?,…) > >> I assume that fixing the hash index logging issue hasn't been a >> priority due to low interest / technical limitations, but I'm curious >> for a stopgap measure -- can we somehow configure Postgres to ignore >> hash indices on a replica, using other b-tree indices or even a >> sequential scan? I know I can do this on a per-connection basis by >> disabling various index lookup methods, but it'd be nice if it just >> ignored invalid indices on its own. > > This might work for you: > > http://sigaev.ru/git/gitweb.cgi?p=plantuner.git;a=blob;hb=HEAD;f=README.plantuner Thanks for the link; that looks interesting. It is a bit unfortunate that I would have to find and exclude indices manually,but very doable...
On Mon, Aug 20, 2012 at 10:29 AM, Steven Schlansker <steven@likeness.com> wrote: > > On Aug 19, 2012, at 2:37 PM, Jeff Davis <pgsql@j-davis.com> wrote: > >> On Tue, 2012-07-10 at 00:09 -0700, Steven Schlansker wrote: >>> show that our particular application is faster by quite a bit when a >>> hash index is available. >> >> Can you publish the results somewhere? It might provoke some interest. > > I might be able to spend some time looking at making this public, but the general parameters are: > > 122M rows, lookup key is a UUID type. Lookups are ~1000 random keys at a time (as in, a giant SELECT * FROM table WHEREkey IN (?,?,?,?,…) How well cached is the data? If it has to be read from disk, then a bitmap index scan on a btree index with effective_io_concurrency set high might do quite well, assuming you can convince the planner to use one. Cheers, Jeff