Thread: [PERFORM] Huge difference between ASC and DESC ordering

[PERFORM] Huge difference between ASC and DESC ordering

From
twoflower
Date:
I have the following query
select *
from "JOB_MEMORY_STORAGE" st inner join "JOB_MEMORY" s on s.fk_id_storage = st.id
where st.fk_id_client = 20045
order by s.id asc limit 50

which takes 90 seconds to finish. JOB_MEMORY has 45 million rows, JOB_MEMORY_STORAGE has 50 000 rows.

Query plan:
Limit  (cost=0.98..1971.04 rows=50 width=394) (actual time=93357.197..93357.654 rows=50 loops=1) ->  Nested Loop  (cost=0.98..344637384.09 rows=8746875 width=394) (actual time=93357.194..93357.584 rows=50 loops=1)       ->  Index Scan Backward using "JOB_MEMORY_id_desc" on "JOB_MEMORY" s  (cost=0.56..113858938.25 rows=45452112 width=164) (actual time=0.059..18454.332 rows=18883917 loops=1)       ->  Index Scan using "JOB_MEMORY_STORAGE_pkey" on "JOB_MEMORY_STORAGE" st  (cost=0.41..5.07 rows=1 width=222) (actual time=0.002..0.002 rows=0 loops=18883917)             Index Cond: (id = s.fk_id_storage)             Filter: (fk_id_client = 20045)             Rows Removed by Filter: 1
Planning time: 1.932 ms
Execution time: 93357.745 ms

As you can see, it is indeed using an index JOB_MEMORY_id_desc in a backward direction, but it is very slow.

When I change ordering to desc in the query, the query finishes immediately and the query plan is
Limit  (cost=0.98..1981.69 rows=50 width=394) (actual time=37.577..37.986 rows=50 loops=1) ->  Nested Loop  (cost=0.98..344613154.25 rows=8699235 width=394) (actual time=37.575..37.920 rows=50 loops=1)       ->  Index Scan using "JOB_MEMORY_id_desc" on "JOB_MEMORY" s  (cost=0.56..113850978.19 rows=45448908 width=165) (actual time=0.013..5.117 rows=6610 loops=1)       ->  Index Scan using "JOB_MEMORY_STORAGE_pkey" on "JOB_MEMORY_STORAGE" st  (cost=0.41..5.07 rows=1 width=221) (actual time=0.003..0.003 rows=0 loops=6610)             Index Cond: (id = s.fk_id_storage)             Filter: (fk_id_client = 20045)             Rows Removed by Filter: 1
Planning time: 0.396 ms
Execution time: 38.058 ms

There is also an index on JOB_MEMORY.id. I also tried a composite index on (fk_id_storage, id), but it did not help (and was not actually used).
I ran ANALYZE on both tables.

Postgres 9.6.2, Ubuntu 14.04, 192 GB RAM, SSD, shared_buffers = 8196 MB.
How can I help Postgres execute the query with asc ordering as fast as the one with desc?

Thank you.

View this message in context: Huge difference between ASC and DESC ordering
Sent from the PostgreSQL - performance mailing list archive at Nabble.com.

Re: [PERFORM] Huge difference between ASC and DESC ordering

From
Jeff Janes
Date:
On Mon, Mar 6, 2017 at 6:22 AM, twoflower <standa.kurik@gmail.com> wrote:
I have the following query
select *
from "JOB_MEMORY_STORAGE" st inner join "JOB_MEMORY" s on s.fk_id_storage = st.id
where st.fk_id_client = 20045
order by s.id asc limit 50

The query stops as soon as it finds 50 rows which meet fk_id_client = 20045.  When you order one way, it needs to cover 18883917 to find those 50.  When you order the other way, it takes 6610 to find those 50.   So the problem is that the tuples which satisfy st.fk_id_client = 20045 all lie towards one end of the s.id range, but PostgreSQL doesn't know that. This is a hard type of problem to solve at a fundamental level.  The best you can do is work around it.  Do you really need the order to be on s.id?  If so, you can get PostgreSQL to stop trying to use the index for ordering purposes by writing that as "order by s.id+0 asc limit 50", or by using a CTE which does the join and have the ORDER BY and LIMIT outside the CTE.

Do you have an index on fk_id_client?  Or perhaps better, (fk_id_client, id)?  How many rows satisfy fk_id_client = 20045?


How can I help Postgres execute the query with asc ordering as fast as the one with desc?

You probably can't.  Your data us well suited to one, and ill suited for the other.  You can probably make it faster than it currently is, but not as fast as the DESC version.

Cheers,

Jeff

Re: [PERFORM] Huge difference between ASC and DESC ordering

From
twoflower
Date:
Thank you Jeff.

There are 7 million rows satisfying fk_id_client = 20045. There is an index on fk_id_client, now I added a composite (fk_id_client, id) index but that did not help.

I see the point of what you are saying, but still don't understand how these two situations (asc vs. desc) are not symmetrical. I mean, there is an ascending index on JOB_MEMORY.id, so why does it matter which end I am picking the data from?

The thing is, even when I force Postgres to use the ascending index on id, it's still orders of magnitude slower than the desc version (even when that one goes through the index backwards).

View this message in context: Re: Huge difference between ASC and DESC ordering
Sent from the PostgreSQL - performance mailing list archive at Nabble.com.

Re: [PERFORM] Huge difference between ASC and DESC ordering

From
Jeff Janes
Date:
On Mon, Mar 6, 2017 at 8:46 AM, twoflower <standa.kurik@gmail.com> wrote:
Thank you Jeff.

There are 7 million rows satisfying fk_id_client = 20045. There is an index on fk_id_client, now I added a composite (fk_id_client, id) index but that did not help.

With 7 million rows, you shouldn't expect any magic here.  But still 7 million is less than 18 million, and you may be able to get that 7 million with more sequential-like IO.

Did you force PostgreSQL to stop using the index on s.id?  If not, do that.  If so, please post the EXPLAIN (analyze) of the plan it does switch to.



I see the point of what you are saying, but still don't understand how these two situations (asc vs. desc) are not symmetrical.

They return different data.  How could they be symmetrical?  You are getting a different 50 rows depending on which way you order the data in the query.  You are **not** getting the same 50 rows, just in a different order from among the 50.

 
I mean, there is an ascending index on JOB_MEMORY.id, so why does it matter which end I am picking the data from?


The query stops as soon as it finds 50 rows which meet fk_id_client = 20045.  When you order one way, it needs to cover 18883917 to find those 50.  When you order the other way, it takes 6610 to find those 50. This fact does not depend on whether the index is ASC or DESC.  If you traverse a DESC index backwards, it has exactly the same issue as if you traverse a ASC index forward.  Either way, once it decides to use that index to obtain the ordering of the query, it has to inspect 18883917 tuples before it satisfies the LIMIT.
 

The thing is, even when I force Postgres to use the ascending index on id, it's still orders of magnitude slower than the desc version (even when that one goes through the index backwards).

Right.  PostgreSQL has to return the rows commanded by your query.  It can't just decide to return a different set of rows because doing so would be faster.  If that is what you want, wrap the whole query into a subselect and move the ORDER BY into the outer query, like "select * from (SELECT ... LIMIT 50) foo order by foo.id"

Changing the ordering direction of the index doesn't change which rows get returned, while changing the ordering direction of the query does.

Cheers,

Jeff

Re: [PERFORM] Huge difference between ASC and DESC ordering

From
twoflower
Date:
Thank you Jeff.


Jeff Janes wrote
> Did you force PostgreSQL to stop using the index on s.id?  If not, do
> that.  If so, please post the EXPLAIN (analyze) of the plan it does switch
> to.

Yes, this



finishes in 20 seconds, which is two times faster than *order by id asc*.
Query plan:




Jeff Janes wrote
> The query stops as soon as it finds 50 rows which meet fk_id_client =
> 20045.  When you order one way, it needs to cover 18883917 to find those
> 50.  When you order the other way, it takes 6610 to find those 50. This
> fact does not depend on whether the index is ASC or DESC.  If you traverse
> a DESC index backwards, it has exactly the same issue as if you traverse a
> ASC index forward.  Either way, once it decides to use that index to
> obtain
> the ordering of the query, it has to inspect 18883917 tuples before it
> satisfies the LIMIT.

I think I finally get it. I investigated the query result set more closely
and realized that indeed the relevant rows start only after > 18 million
rows in the asc *id* order and that's the problem. On the other hand, with
*desc* Postgres very quickly finds 50 rows matching *fk_id_client = 20045*.
So it is just the process of scanning the index and checking the condition
which takes all of the time.

Understanding the problem more, it brought me to a solution I might end up
going with (and which you also suggested by asking whether I really need
ordering the data by *id*), a different order clause which still makes sense
in my scenario:



Finishes in 7 seconds.



Best regards,
Stanislav



--
View this message in context:
http://www.postgresql-archive.org/Huge-difference-between-ASC-and-DESC-ordering-tp5947712p5947887.html
Sent from the PostgreSQL - performance mailing list archive at Nabble.com.