Thread: How do I bump a row to the front of sort efficiently

How do I bump a row to the front of sort efficiently

From
Sam Saffron
Date:
I have this query:

select * from topics
order by case when id=1 then 0 else 1 end, bumped_at desc
limit 30

It works fine, bumps id 1 to the front of the sort fine but is
terribly inefficient and scans

OTH

"select * from topics where id = 1" is super fast

"select * from topics order by bumped_at desc limit 30" is super fast

Even this is fast, and logically equiv as id is primary key unique

select * from topic
where id = 1000
union all
select * from (
  select * from topics
  where id <> 1000
  order by bumped_at desc
  limit 30
) as x
limit 30


However, the contortions on the above query make it very un-ORM
friendly as I would need to define a view for it but would have no
clean way to pass limits and offsets in.

Is there any clean technique to bump up particular rows to the front
of a sort if a certain condition is met without paying a huge
performance hit?


Re: How do I bump a row to the front of sort efficiently

From
hubert depesz lubaczewski
Date:
On Mon, Feb 02, 2015 at 05:16:40PM +1100, Sam Saffron wrote:
> Even this is fast, and logically equiv as id is primary key unique
> select * from topic
> where id = 1000
> union all
> select * from (
>   select * from topics
>   where id <> 1000
>   order by bumped_at desc
>   limit 30
> ) as x
> limit 30
> Is there any clean technique to bump up particular rows to the front
> of a sort if a certain condition is met without paying a huge
> performance hit?

Why don't you use the union-all approach? If it's fast, and does what
you need ?

depesz

--
The best thing about modern society is how easy it is to avoid contact with it.
                                                             http://depesz.com/


Re: How do I bump a row to the front of sort efficiently

From
Tim Clarke
Date:
On 02/02/15 08:40, hubert depesz lubaczewski wrote:
> On Mon, Feb 02, 2015 at 05:16:40PM +1100, Sam Saffron wrote:
>> Even this is fast, and logically equiv as id is primary key unique
>> select * from topic
>> where id = 1000
>> union all
>> select * from (
>>   select * from topics
>>   where id <> 1000
>>   order by bumped_at desc
>>   limit 30
>> ) as x
>> limit 30
>> Is there any clean technique to bump up particular rows to the front
>> of a sort if a certain condition is met without paying a huge
>> performance hit?
> Why don't you use the union-all approach? If it's fast, and does what
> you need ?
>
> depesz
>

Or create another index or column that's build at insertion time with
the sort that you want in it. At least then retrieval time will be fast
at the cost of extra space.

--
Tim Clarke

A: Because we read from top to bottom, left to right.
Q: Why should I start my reply below the quoted text?


Re: How do I bump a row to the front of sort efficiently

From
BladeOfLight16
Date:
On Mon, Feb 2, 2015 at 1:16 AM, Sam Saffron <sam.saffron@gmail.com> wrote:
However, the contortions on the above query make it very un-ORM
friendly as I would need to define a view for it but would have no
clean way to pass limits and offsets in.

This is why ORMs are bad. They make hard problems much harder, and the only benefit is that they maybe make easy problems a little quicker. The cost/savings is heavily skewed toward the cost, since there's no upper bound on the cost and there is a pretty small lower bound on the savings. Micro-ORMs tend to do a better job of not shielding you from (or rather, getting in the way of) the SQL while still providing some good result-to-object translation. Whether even that is necessary depends on your language, though. (For example, in Python, psycopg2 has a built in way of spitting out namedtuples, which means you get result-to-object translation out of the box. That makes even a micro-ORM pretty unnecessary. On the other hand, a micro-ORM that does this well without blocking you from the SQL, such as PetaPOCO, is a boon in .NET.)

If you can, your best bet would probably be to find a way to get your ORM to execute raw SQL (with good parametrization to prevent injection attacks!!!!) and be done with it. It took me way too much experience fighting with an ORM on complicated queries to realize that.

Re: How do I bump a row to the front of sort efficiently

From
BladeOfLight16
Date:
On Tue, Feb 3, 2015 at 9:33 PM, BladeOfLight16 <bladeoflight16@gmail.com> wrote:
This is why ORMs are bad. They make hard problems much harder, and the only benefit is that they maybe make easy problems a little quicker. The cost/savings is heavily skewed toward the cost, since there's no upper bound on the cost and there is a pretty small lower bound on the savings. Micro-ORMs tend to do a better job of not shielding you from (or rather, getting in the way of) the SQL while still providing some good result-to-object translation. Whether even that is necessary depends on your language, though. (For example, in Python, psycopg2 has a built in way of spitting out namedtuples, which means you get result-to-object translation out of the box. That makes even a micro-ORM pretty unnecessary. On the other hand, a micro-ORM that does this well without blocking you from the SQL, such as PetaPOCO, is a boon in .NET.)

If you can, your best bet would probably be to find a way to get your ORM to execute raw SQL (with good parametrization to prevent injection attacks!!!!) and be done with it. It took me way too much experience fighting with an ORM on complicated queries to realize that.

Er, *pretty small upper bound on the savings.

Re: How do I bump a row to the front of sort efficiently

From
Sam Saffron
Date:
Note: I still consider this a bug/missing feature of sorts since the
planner could do better here, and there is no real clean way of
structuring a query to perform efficiently here, which is why I
erroneously cross posted this to hacker initially:


# create table testing(id serial primary key, data varchar);
# insert into testing(data) select 'test' from pg_tables a,pg_tables
b,pg_tables c,pg_tables d limit 100000


# explain select * from testing order by id limit 30;
                                        QUERY PLAN
------------------------------------------------------------------------------------------
 Limit  (cost=0.29..1.24 rows=30 width=9)
   ->  Index Scan using testing_pkey on testing  (cost=0.29..3148.29
rows=100000 width=9)
(2 rows)

# explain select * from testing where id = 1000;
                                 QUERY PLAN
----------------------------------------------------------------------------
 Index Scan using testing_pkey on testing  (cost=0.29..8.31 rows=1 width=9)
   Index Cond: (id = 1000)
(2 rows)

# explain select * from testing order by case when id = 1000 then 0
else 1 end, id limit 30;
                                QUERY PLAN
---------------------------------------------------------------------------
 Limit  (cost=4744.45..4744.52 rows=30 width=9)
   ->  Sort  (cost=4744.45..4994.45 rows=100000 width=9)
         Sort Key: (CASE WHEN (id = 1000) THEN 0 ELSE 1 END), id
         ->  Seq Scan on testing  (cost=0.00..1791.00 rows=100000 width=9)
(4 rows)

Cost goes through the roof for a query that pg could have have done
better with if it were able to better "understand" the case statement.


Re: How do I bump a row to the front of sort efficiently

From
BladeOfLight16
Date:
On Tue, Feb 3, 2015 at 11:28 PM, Sam Saffron <sam.saffron@gmail.com> wrote:
Note: I still consider this a bug/missing feature of sorts since the
planner could do better here, and there is no real clean way of
structuring a query to perform efficiently here, which is why I
erroneously cross posted this to hacker initially:

No, it should not be considered a bug or a deficiency. You're telling the system to use a computed value that uses an arbitrary logic construct for sorting, and that's before you actually give it a filter to work with. You're also trying to abuse ORDER BY to make LIMIT do filtering. How do you expect that to go? I would expect the planner to do exactly what you told it you wanted: sort the entire table by that computed value, and it would have to do a sequential scan to compute the value for every row before it knew which ones came first for the LIMIT.

CASE is well known to cause optimization problems; arbitrary conditional logic isn't especially conducive to a planner optimizing things. In general, the planner has no idea what the logic really means, so the planner would have to have some kind of special logic trying to pick up on this case. Your particular use case is uncommon; why should the planner code be junked up with a thousand little optimizations for uncommon situations like this (which would make it unmaintainable) when you already have a reasonable alternative? PG is great for a reason: the devs have made a lot of fantastic choices in designing it. Trying to keep the planner relatively simple is one of them, if I recall correctly.

What you want is well represented by a UNION query: you want it to fetch one particular row by ID, and then you want to tack on 29 other particular rows based on a particular sort order and possibly an offset. These are two completely disparate ways of fetching data; of course the most optimal way is going to be to essentially write two queries and put the results together. That's also going to be the clearest way of writing the query, so the next person to work on it knows what you were doing.

And that aside, you don't even need to use CASE. You could've just used (id = 1), which would give you a boolean result. You can most certainly sort by a boolean. (I believe PG sorts TRUE first.) You should see if that gets optimized at all. (If it doesn't, I won't be surprised and everything else I said still holds.)

By the way, you can do better on your UNION query:

select * from topic
where id = 1000
union all
(select * from topic
where id <> 1000
order by bumped_at desc
limit 29)

I imagine your original would be at risk of LIMITing out the very row you seek to get at the "top", since you don't have an ORDER BY to tell it which ones to keep during the outer LIMIT.

Re: How do I bump a row to the front of sort efficiently

From
Sam Saffron
Date:
The union hack may be able to work, but what I want is slightly more complex

I want this to work efficiently, even without the case it chokes:

explain select * from testing order by id in (100,2,-1) desc, id limit 30;
                                QUERY PLAN
---------------------------------------------------------------------------
 Limit  (cost=4869.45..4869.52 rows=30 width=9)
   ->  Sort  (cost=4869.45..5119.45 rows=100000 width=9)
         Sort Key: ((id = ANY ('{100,2,-1}'::integer[]))), id
         ->  Seq Scan on testing  (cost=0.00..1916.00 rows=100000 width=9)
(4 rows)

I need to be able to offset and limit the union hack in a view, which
is proving very tricky.

On Wed, Feb 4, 2015 at 9:15 PM, BladeOfLight16 <bladeoflight16@gmail.com> wrote:
> On Tue, Feb 3, 2015 at 11:28 PM, Sam Saffron <sam.saffron@gmail.com> wrote:
>>
>> Note: I still consider this a bug/missing feature of sorts since the
>> planner could do better here, and there is no real clean way of
>> structuring a query to perform efficiently here, which is why I
>> erroneously cross posted this to hacker initially:
>
>
> No, it should not be considered a bug or a deficiency. You're telling the
> system to use a computed value that uses an arbitrary logic construct for
> sorting, and that's before you actually give it a filter to work with.
> You're also trying to abuse ORDER BY to make LIMIT do filtering. How do you
> expect that to go? I would expect the planner to do exactly what you told it
> you wanted: sort the entire table by that computed value, and it would have
> to do a sequential scan to compute the value for every row before it knew
> which ones came first for the LIMIT.
>
> CASE is well known to cause optimization problems; arbitrary conditional
> logic isn't especially conducive to a planner optimizing things. In general,
> the planner has no idea what the logic really means, so the planner would
> have to have some kind of special logic trying to pick up on this case. Your
> particular use case is uncommon; why should the planner code be junked up
> with a thousand little optimizations for uncommon situations like this
> (which would make it unmaintainable) when you already have a reasonable
> alternative? PG is great for a reason: the devs have made a lot of fantastic
> choices in designing it. Trying to keep the planner relatively simple is one
> of them, if I recall correctly.
>
> What you want is well represented by a UNION query: you want it to fetch one
> particular row by ID, and then you want to tack on 29 other particular rows
> based on a particular sort order and possibly an offset. These are two
> completely disparate ways of fetching data; of course the most optimal way
> is going to be to essentially write two queries and put the results
> together. That's also going to be the clearest way of writing the query, so
> the next person to work on it knows what you were doing.
>
> And that aside, you don't even need to use CASE. You could've just used (id
> = 1), which would give you a boolean result. You can most certainly sort by
> a boolean. (I believe PG sorts TRUE first.) You should see if that gets
> optimized at all. (If it doesn't, I won't be surprised and everything else I
> said still holds.)
>
> By the way, you can do better on your UNION query:
>
> select * from topic
> where id = 1000
> union all
> (select * from topic
> where id <> 1000
> order by bumped_at desc
> limit 29)
>
> I imagine your original would be at risk of LIMITing out the very row you
> seek to get at the "top", since you don't have an ORDER BY to tell it which
> ones to keep during the outer LIMIT.


Re: How do I bump a row to the front of sort efficiently

From
Paul Jungwirth
Date:
>> I imagine your original would be at risk of LIMITing out the very row you
>> seek to get at the "top", since you don't have an ORDER BY to tell it which
>> ones to keep during the outer LIMIT.

Here is an old thread about combining ORDER BY with UNION:

http://www.postgresql.org/message-id/16814.1280268424@sss.pgh.pa.us

So I think this query would work:

select * from topic
where id = 1000
union all
(select * from topic
where id <> 1000
order by bumped_at desc
limit 29)
order by case when id = 1000 then 0 else 1 end, bumped_at desc
;

> I need to be able to offset and limit the union hack in a view, which
> is proving very tricky.

Since this is sort of a "parameterized view" (which Postgres does not
have) you are probably better off figuring out how to make the UNION
query work with your ORM. What ORM is it? Maybe someone here can help
you with that. Or maybe instead of a view you could write a
set-returning function, e.g. as described here:

http://stackoverflow.com/questions/11401749/pass-in-where-parameters-to-postgresql-view

Paul

--
_________________________________
Pulchritudo splendor veritatis.


Re: How do I bump a row to the front of sort efficiently

From
Rémi Cura
Date:
Hey,
I'm not a guru, here is what I understood.
You are mixing several problems in the same question :
 - 1. why the planner isn't more efficient
 - 2. why the workaround is difficult to use with an ORM.

for 1. you can't do much (as said by others, you don't really need a case here anyway). I think using a CASE is equivalent for the planner to using your own custom blackbox function. So no way to improve that.
for 2. : if you can't pass limit and offset in your ORM,
a small workaround is to number your row following the order you defined with the function row_number() over(your order here),
then you can use your ORM to design where conditions equivalent to limit and offset :

WHERE row_number BETWEEN your_offset AND your_limit

Cheers,
Rémi-C

2015-02-04 21:40 GMT+01:00 Paul Jungwirth <pj@illuminatedcomputing.com>:
>> I imagine your original would be at risk of LIMITing out the very row you
>> seek to get at the "top", since you don't have an ORDER BY to tell it which
>> ones to keep during the outer LIMIT.

Here is an old thread about combining ORDER BY with UNION:

http://www.postgresql.org/message-id/16814.1280268424@sss.pgh.pa.us

So I think this query would work:

select * from topic
where id = 1000
union all
(select * from topic
where id <> 1000
order by bumped_at desc
limit 29)
order by case when id = 1000 then 0 else 1 end, bumped_at desc
;

> I need to be able to offset and limit the union hack in a view, which
> is proving very tricky.

Since this is sort of a "parameterized view" (which Postgres does not
have) you are probably better off figuring out how to make the UNION
query work with your ORM. What ORM is it? Maybe someone here can help
you with that. Or maybe instead of a view you could write a
set-returning function, e.g. as described here:

http://stackoverflow.com/questions/11401749/pass-in-where-parameters-to-postgresql-view

Paul

--
_________________________________
Pulchritudo splendor veritatis.


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: How do I bump a row to the front of sort efficiently

From
Paul Jungwirth
Date:
> Or maybe instead of a view you could write a
> set-returning function, e.g. as described here:

I thought I'd see if I could make this work just for fun. Here is a
simple proof of concept (on 9.3):

-- DROP TABLE IF EXISTS topics;
CREATE TABLE topics (
  id INTEGER PRIMARY KEY,
  bumped_at INTEGER NOT NULL
);
INSERT INTO topics
SELECT a, a * 2
FROM   generate_series(1, 1000) s(a)
;

CREATE OR REPLACE FUNCTION topics_sorted_after_id(INT, INT)
RETURNS TABLE(id int, after_top int, bumped_at int)
AS $$
SELECT  id, 0 AS after_top, bumped_at
FROM    topics
WHERE   id = $1
UNION ALL
(SELECT id, 1 AS after_top, bumped_at
 FROM   topics
 WHERE  id IS DISTINCT FROM $1
 ORDER BY bumped_at DESC
 LIMIT $2 - 1)
ORDER BY after_top, bumped_at DESC
$$
LANGUAGE sql;

SELECT * FROM topics_sorted_after_id(45, 30);

That looks to me like it gives the right results. I'm curious if
RETURNS TABLE is the right approach to use here or if there is
something nicer.

What if the ORM insists on `FROM topics`? Is there any way to rewrite
the query or function to work around that?

Paul

--
_________________________________
Pulchritudo splendor veritatis.