Thread: Indexes slower when used in descending vs. ascending order?
I was under the impression that Postgres did not need explicit descending
order indexes created and that standard indexes would work fine. I am
noticing a significant speed difference between the following two queries
(the one using "ORDER BY clientkey desc, premiseskey desc, logtime desc,
logkey desc" takes 19 seconds to execute, versus almost immediate execution
of the "ORDER BY clientkey, premiseskey, logtime, logkey") and was wondering
if there was anything I could do to make execution any faster. Any thoughts?
Queries follow:

The slow one:

vigprem=> explain SELECT * FROM log WHERE clientkey in
('000000004000000000010000000001') AND premiseskey in
('000000004000000000030000000001') and logicaldel = 'N' /*VIG_GEN*/
ORDER BY clientkey desc, premiseskey desc, logtime desc, logkey desc
LIMIT 20 OFFSET 0;

                                                                 QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..74.89 rows=20 width=548)
   ->  Index Scan Backward using logtime_index on log  (cost=0.00..6195163.59 rows=1654578 width=548)
         Index Cond: ((clientkey = '000000004000000000010000000001'::bpchar) AND (premiseskey = '000000004000000000030000000001'::bpchar))
         Filter: (logicaldel = 'N'::bpchar)
(4 rows)

The fast one:

vigprem=> explain SELECT * FROM log WHERE clientkey in
('000000004000000000010000000001') AND premiseskey in
('000000004000000000030000000001') and logicaldel = 'N' /*VIG_GEN*/
ORDER BY clientkey, premiseskey, logtime, logkey
LIMIT 20 OFFSET 0;

                                                                 QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..74.89 rows=20 width=548)
   ->  Index Scan using logtime_index on log  (cost=0.00..6195163.59 rows=1654578 width=548)
         Index Cond: ((clientkey = '000000004000000000010000000001'::bpchar) AND (premiseskey = '000000004000000000030000000001'::bpchar))
         Filter: (logicaldel = 'N'::bpchar)
(4 rows)
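[For reference: a plain multi-column btree can indeed be read in either
direction, which is why the descending ORDER BY shows up above as an
"Index Scan Backward" on the same index. In later PostgreSQL releases
(8.3 and up) you can also declare the sort order per column, so the
descending query gets a forward scan of a purpose-built index. A sketch
using the column names from this thread; the index name here is made up:

    -- Requires PostgreSQL 8.3 or later; not an option on 7.3.
    CREATE INDEX logtime_index_desc
        ON log (clientkey DESC, premiseskey DESC, logtime DESC, logkey DESC);

On the versions discussed in this thread, the backward scan is the only
mechanism available, which is what the rest of the thread digs into.]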
Alasdair Young <ayoung@vigilos.com> writes:
> I am noticing a significant speed difference between the following two
> queries (the one using "ORDER BY clientkey desc, premiseskey desc,
> logtime desc, logkey desc" takes 19 seconds to execute, versus almost
> immediate execution of the "ORDER BY clientkey, premiseskey, logtime,
> logkey") and was wondering if there was anything I could do to make
> execution any faster.

I'd bet that the problem is the "filter" on logicaldel --- is the value
of that heavily correlated with the index ordering?  You could be
scanning through many many rows of the index to reach the first (last)
one with logicaldel = 'N'.

			regards, tom lane
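[One quick way to test that theory is to look at how logicaldel is
distributed within the key range being scanned. A sketch using the key
values from the thread; the actual counts are not known here:

    SELECT logicaldel, count(*)
    FROM log
    WHERE clientkey = '000000004000000000010000000001'
      AND premiseskey = '000000004000000000030000000001'
    GROUP BY logicaldel;

If nearly everything at the high-logtime end of that range had
logicaldel <> 'N', the backward scan would have to skip past all of it
before it could return its first 20 rows.]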
On Tue, 2006-04-11 at 14:18 -0400, Tom Lane wrote:
> Alasdair Young <ayoung@vigilos.com> writes:
> > I am noticing a significant speed difference between the following two
> > queries (the one using "ORDER BY clientkey desc, premiseskey desc,
> > logtime desc, logkey desc" takes 19 seconds to execute, versus almost
> > immediate execution of the "ORDER BY clientkey, premiseskey, logtime,
> > logkey") and was wondering if there was anything I could do to make
> > execution any faster.
>
> I'd bet that the problem is the "filter" on logicaldel --- is the value
> of that heavily correlated with the index ordering?  You could be
> scanning through many many rows of the index to reach the first (last)
> one with logicaldel = 'N'.
>
> 			regards, tom lane

Removing the logicaldel seems to give the same results. (The archives
seem to indicate the two queries should take roughly the same amount of
time.)

Any other ideas?

- alasdair

The slow version:

vigprem=> explain analyze SELECT * FROM log WHERE clientkey in
('000000004000000000010000000001') AND premiseskey in
('000000004000000000030000000001') /*VIG_GEN*/
ORDER BY clientkey desc, premiseskey desc, logtime desc, logkey desc
LIMIT 20 OFFSET 0;

                                                                 QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..74.84 rows=20 width=548) (actual time=19799.54..19799.95 rows=20 loops=1)
   ->  Index Scan Backward using logtime_index on log  (cost=0.00..6191056.91 rows=1654586 width=548) (actual time=19799.54..19799.92 rows=21 loops=1)
         Index Cond: ((clientkey = '000000004000000000010000000001'::bpchar) AND (premiseskey = '000000004000000000030000000001'::bpchar))
 Total runtime: 19800.03 msec
(4 rows)

The fast version:

vigprem=> explain analyze SELECT * FROM log WHERE clientkey in
('000000004000000000010000000001') AND premiseskey in
('000000004000000000030000000001') /*VIG_GEN*/
ORDER BY clientkey, premiseskey, logtime, logkey
LIMIT 20 OFFSET 0;

                                                                 QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..74.84 rows=20 width=548) (actual time=0.35..0.75 rows=20 loops=1)
   ->  Index Scan using logtime_index on log  (cost=0.00..6191056.91 rows=1654586 width=548) (actual time=0.35..0.72 rows=21 loops=1)
         Index Cond: ((clientkey = '000000004000000000010000000001'::bpchar) AND (premiseskey = '000000004000000000030000000001'::bpchar))
 Total runtime: 0.80 msec
(4 rows)
Alasdair Young <ayoung@vigilos.com> writes:
> On Tue, 2006-04-11 at 14:18 -0400, Tom Lane wrote:
>> I'd bet that the problem is the "filter" on logicaldel --- is the value
>> of that heavily correlated with the index ordering?

> Removing the logicaldel seems to give the same results.

Hmm.  Maybe a whole lot of recently-dead row versions near the upper end
of the index range?

> (The archives seem to indicate the two queries should take roughly the
> same amount of time)

Yeah, the scan speed should be essentially the same in either direction,
I'd think.  I have to suppose that the backwards scan is fetching a
whole lot of rows that it ends up not returning.  Offhand the only
reasons I can think of for that are that the rows are not visible
according to the current MVCC snapshot, or because of a post-index
filter condition.

> Limit  (cost=0.00..74.84 rows=20 width=548) (actual
> time=19799.54..19799.95 rows=20 loops=1)
>    ->  Index Scan Backward using logtime_index on log
> (cost=0.00..6191056.91 rows=1654586 width=548) (actual
> time=19799.54..19799.92 rows=21 loops=1)
>          Index Cond: ((clientkey =
> '000000004000000000010000000001'::bpchar) AND (premiseskey =
> '000000004000000000030000000001'::bpchar))
>  Total runtime: 19800.03 msec
> (4 rows)

That's pretty spectacular.  There is no way that Postgres is only
fetching one row per second; it's got to be discarding a whole lot
of rows under the hood.  It'd be useful to run VACUUM VERBOSE on
this table and see what it's got to say.

			regards, tom lane
> That's pretty spectacular.  There is no way that Postgres is only
> fetching one row per second; it's got to be discarding a whole lot
> of rows under the hood.  It'd be useful to run VACUUM VERBOSE on
> this table and see what it's got to say.

vigprem=# vacuum verbose log;
INFO:  --Relation public.log--
INFO:  Pages 82731: Changed 0, Empty 0; Tup 1654586: Vac 0, Keep 0, UnUsed 0.
        Total CPU 0.70s/0.26u sec elapsed 10.63 sec.
VACUUM

Doesn't mean a lot to me. (I've only just started using postgres, coming
from an Oracle background.)

I'm going to go try to work out what possible post-index filter
conditions could be in place...

Thanks for your help. If you have any further ideas, please post them.
I'm convinced I can get this to go faster :)

- alasdair
Alasdair Young <ayoung@vigilos.com> writes:
>> That's pretty spectacular.  There is no way that Postgres is only
>> fetching one row per second; it's got to be discarding a whole lot
>> of rows under the hood.  It'd be useful to run VACUUM VERBOSE on
>> this table and see what it's got to say.

> vigprem=# vacuum verbose log;
> INFO:  --Relation public.log--
> INFO:  Pages 82731: Changed 0, Empty 0; Tup 1654586: Vac 0, Keep 0,
> UnUsed 0.
>         Total CPU 0.70s/0.26u sec elapsed 10.63 sec.
> VACUUM

[ looks again... ]  Hmm, I can give you one quick performance tip: use
something newer than PG 7.3.x.  VACUUM's output hasn't looked like that
in a long time.

Realizing that you're using such an old version, I now have another
theory: are there by any chance a whole lot of rows matching the WHERE
condition?  If so I think this is explained by a problem we fixed in
PG 8.0: the original coding for btree index search didn't handle lots of
equal keys very well.  IIRC, what it's doing is descending the search
tree to find the first entry matching the WHERE condition --- and then,
because you asked for a backwards scan, it sequentially steps forward
over all the matching keys to get in position to scan them backwards.
In 8.0 we made the tree descent logic smart enough to arrive at the end
of the run of matching keys to start with.

2003-12-20 20:23  tgl

	* src/: backend/access/nbtree/nbtinsert.c,
	backend/access/nbtree/nbtpage.c, backend/access/nbtree/nbtsearch.c,
	include/access/nbtree.h: Improve btree's initial-positioning-strategy
	code so that we never need to step more than one entry after
	descending the search tree to arrive at the correct place to start
	the scan.  This can improve the behavior substantially when there
	are many entries equal to the chosen boundary value.  Per suggestion
	from Dmitry Tkach, 14-Jul-03.

			regards, tom lane
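[The shape of this problem can be reproduced on a scratch table that has
one long run of identical leading key values. The sketch below is purely
illustrative: the table name, index name, and data are made up, and
generate_series() itself only exists from 8.0 onward, so on a 7.3 server
the filler rows would have to be loaded some other way:

    -- Hypothetical scratch table mimicking the thread's schema.
    CREATE TABLE log_demo (
        clientkey   char(30),
        premiseskey char(30),
        logtime     timestamp,
        logkey      char(30)
    );

    CREATE INDEX log_demo_idx
        ON log_demo (clientkey, premiseskey, logtime, logkey);

    -- One big run of identical (clientkey, premiseskey) values.
    INSERT INTO log_demo
    SELECT '000000004000000000010000000001',
           '000000004000000000030000000001',
           timestamp '2006-01-01' + (i || ' seconds')::interval,
           lpad(i::text, 30, '0')
    FROM generate_series(1, 1000000) AS g(i);

    ANALYZE log_demo;

    -- On a pre-8.0 server the backward scan first walks forward over
    -- the whole run of equal keys before returning anything; on 8.0+
    -- it starts at the end of the run and returns almost immediately.
    EXPLAIN ANALYZE SELECT * FROM log_demo
    WHERE clientkey = '000000004000000000010000000001'
      AND premiseskey = '000000004000000000030000000001'
    ORDER BY clientkey DESC, premiseskey DESC, logtime DESC, logkey DESC
    LIMIT 20;
]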
This is exactly what I needed to know. Thanks again :)

(Now I just need to get Postgres 8.0 running on a RedHat 7.2/7.3 box :)
(Compiling 8.0 from source seems to require a whole stack of things that
are missing from RH7.2.) Ah well :)

- alasdair
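[For the record, the source build itself is the standard configure/make
sequence; the usual blockers on a stock RH7.2 install are the readline
and zlib development headers, which configure will complain about. A
sketch only, with a made-up install prefix and an 8.0.x version picked
for illustration; --without-readline and --without-zlib are real
configure switches but make psql and pg_dump less pleasant to live with:

    # Hypothetical paths and version; adjust as needed.
    tar xzf postgresql-8.0.7.tar.gz
    cd postgresql-8.0.7
    ./configure --prefix=/usr/local/pgsql-8.0
    make
    make install
    /usr/local/pgsql-8.0/bin/initdb -D /usr/local/pgsql-8.0/data
]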
Oh: and as a followup, I ran the same query against the same data in 8.1
and it returned in 0.5 seconds. Looks like this bug/issue/feature in
pre-8.0 versions was to blame.

Thanks once again for your help.

- alasdair
"Alasdair Young" <ayoung@vigilos.com> writes: > (Now I just need to get Postgres 8.0 running on a RedHat 7.2/7.3 box :) Um ... I work for Red Hat, so I think I have to slip you the bad news twice today: no sane person would still be running that OS version either. 7.2 was obsolete about the time I went to work at Red Hat (in 2001). If you want something that was current last year and is probably pretty stable now, PG 8.0.x on Fedora Core 4 would be a good bet. Personally I'd recommend PG 8.1.x on FC5 if you are just looking at deploying now. It'll be much more maintainable a year or two out. regards, tom lane
> Um ... I work for Red Hat, so I think I have to slip you the bad news
> twice today: no sane person would still be running that OS version
> either.  7.2 was obsolete about the time I went to work at Red Hat
> (in 2001).

Sadly, some users are stuck. For instance, if you have Dialogic D240SC/T1
Rev 1 cards then the latest drivers you can get are the 5.x series, and
they are very picky about kernel versions, streams libs, etc. If you want
to build a supported system, you must start with Red Hat 7.2/7.3 and must,
among other detailed steps, apply specific kernel patches. Or you can go
through a similar set of steps on Windows. I'll take Linux - even an old
version.

Fortunately I have it on the authority of an Intel employee that they will
be open-sourcing the Dialogic drivers sometime in the next year, so over
time this problem will probably disappear.

Cheers,
Steve
I'm basically stuck with that as well: we run a custom kernel and a stack
of drivers that have no support past 7.2 (at least presently), so it looks
like I'll just need to work out a way of getting 8.0 to compile on RH7.2
(or patch Postgres 7.3 with the bugfix and recompile :)

None of those look like fun, so this issue will probably be added to our
stack of "reasons to upgrade to FC5" :)

- alasdair