Home > mailing lists

Re: [PATCH] Incremental sort (was: PoC: Partial sort) - Mailing list pgsql-hackers

From	James Coleman
Subject	Re: [PATCH] Incremental sort (was: PoC: Partial sort)
Date	September 6, 2018 21:04:44
Msg-id	CAAaqYe_tFxFN-+ESfvw5iMEc9GhX1FuMHhr0xZ2tUNFxkvJ46w@mail.gmail.com Whole thread Raw
In response to	Re: [PATCH] Incremental sort (was: PoC: Partial sort) (Tom Lane <tgl@sss.pgh.pa.us>)
Responses	Re: [PATCH] Incremental sort (was: PoC: Partial sort)
List	pgsql-hackers

Tree view

I disagree because it's not ideal to basically have to append pk to every index in the system just to get the ability to tie-break when there's actually very low likelihood of ties anyway.

A similar use case is trying to batch through a result set, in which case you need a stable sort as well.

The benefit is retaining the generality of indexes (and saving space in them etc.) while still allowing using them for more specific sorts. Any time you paginate or batch this way you benefit from this, which in many applications applies to a very high percentage of indexes.

On Thu, Sep 6, 2018 at 10:39 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

James Coleman <jtc331@gmail.com> writes:
> A fairly common planning problem for us is what we call "most recent first" queries; i.e., "the 50 most recent <table> rows for a <foreign key>".

> Here's a basic setup:

> -- created_at has very high cardinality
> create table foo(pk serial primary key, owner_fk integer, created_at timestamp);
> create index idx_foo_on_owner_and_created_at on foo(owner_fk, created_at);

> -- technically this data guarantees unique created_at values,
> -- but there's no reason it couldn't be modified to have a few
> -- random non-unique values to prove the point
> insert into foo(owner_fk, created_at)
> select i % 100, now() - (i::text || ' minutes')::interval
> from generate_series(1, 1000000) t(i);

> And here's the naive query to get the results we want:

> select *
> from foo
> where owner_fk = 23
> -- pk is only here to disambiguate/guarantee a stable sort
> -- in the rare case that there are collisions in the other
> -- sort field(s)
> order by created_at desc, pk desc
> limit 50;

If you're concerned about the performance of this case, why don't you make
an index that actually matches the query?

regression=# create index on foo (owner_fk, created_at, pk);
CREATE INDEX
regression=# explain analyze select * from foo where owner_fk = 23 order by created_at desc, pk desc limit 50;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.42..110.92 rows=50 width=16) (actual time=0.151..0.280 rows=50 loops=1)
-> Index Only Scan Backward using foo_owner_fk_created_at_pk_idx on foo (cost=0.42..20110.94 rows=9100 width=16) (actual time=0.146..0.255 rows=50 loops=1)
Index Cond: (owner_fk = 23)
Heap Fetches: 50
Planning Time: 0.290 ms
Execution Time: 0.361 ms
(6 rows)

There may be use-cases for Alexander's patch, but I don't find this
one to be terribly convincing.

regards, tom lane

pgsql-hackers by date:

From: "R, Siva"
Date: 06 September 2018, 21:02:20
Subject: Re: Bug in ginRedoRecompress that causes opaque data on page to beoverrun

From: Michael Paquier
Date: 06 September 2018, 21:56:27
Subject: Re: Prevent concurrent DROP SCHEMA when certain objects are beinginitially created in the namespace

Re: [PATCH] Incremental sort (was: PoC: Partial sort) - Mailing list pgsql-hackers

Previous

Next