Thread: Limit and inherited tables

Limit and inherited tables

From

Konstantin Knizhnik

Date:

15 January 2016, 14:49:44

Hi,

I am sorry if this question was already discussed but I failed to find 
any information about in archive.
I noticed that LIMIT clause is not pushed down to inherited tables.
Consider the following tables:

create table foo(x integer primary key);
create table foo1 () inherits(foo);
create table foo2 () inherits(foo);
insert into foo1 values (generate_series(0,100000));
insert into foo2 values (generate_series(0,100000));


explain select * from foo order by x limit 1;                               QUERY PLAN
------------------------------------------------------------------------ Limit  (cost=5.10..5.10 rows=1 width=4)   ->
Sort (cost=5.10..5.61 rows=200000 width=4)         Sort Key: foo.x         ->  Append  (cost=0.00..4.06 rows=200000
width=4)              ->  Seq Scan on foo  (cost=0.00..0.00 rows=1 width=4)               ->  Seq Scan on foo1
(cost=0.00..2.03rows=100000 width=4)               ->  Seq Scan on foo2  (cost=0.00..2.03 rows=100000 width=4)
 
(7 rows)

So Postgres has to merge two large data sets and sort the result, while 
the optimal plan is to take just one record from each inherited table, 
sort 2 records and then limit the result.

Such optimization will be especially useful in case of using 
postgres_fdw - when inherited tables are located at remote nodes.
Are there any plans to support this optimization or may be somebody is 
already working on it?
Otherwise I can try to investigate it and propose optimizer patch for it.

-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: Limit and inherited tables

From

Tom Lane

Date:

15 January 2016, 14:58:43

Konstantin Knizhnik <k.knizhnik@postgrespro.ru> writes:
> I noticed that LIMIT clause is not pushed down to inherited tables.

It is when appropriate.

> Consider the following tables:

> create table foo(x integer primary key);
> create table foo1 () inherits(foo);
> create table foo2 () inherits(foo);
> insert into foo1 values (generate_series(0,100000));
> insert into foo2 values (generate_series(0,100000));

This example is lacking indexes on the child tables, which is
why the plan shown is about as good as you're going to get.
The contents of foo1 and foo2 have to be read in entirety in any
case, and sorting them separately is not a win compared to doing
a single sort.

With indexes, you get something like
Limit  (cost=0.73..0.78 rows=1 width=4)  ->  Merge Append  (cost=0.73..9778.76 rows=200003 width=4)        Sort Key:
foo.x       ->  Index Only Scan using foo_pkey on foo  (cost=0.12..8.14 rows=1 width=4)        ->  Index Only Scan
usingfoo1_x_idx on foo1  (cost=0.29..3050.31 rows=100001 width=4)        ->  Index Only Scan using foo2_x_idx on foo2
(cost=0.29..3050.31rows=100001 width=4)
 
        regards, tom lane

Re: Limit and inherited tables

From

Konstantin Knizhnik

Date:

15 January 2016, 16:17:08

> This example is lacking indexes on the child tables, which is
> why the plan shown is about as good as you're going to get.
> The contents of foo1 and foo2 have to be read in entirety in any
> case, and sorting them separately is not a win compared to doing
> a single sort.
It is true, but not in case of FDW connected to remote host.
In this case sending large volumes of data through network will be very 
inefficient.

There will be no problem if FDW can provide index scan - in this case 
MergeAppend will fetch only required number of records:

postgres=# explain analyze select * from t order by u limit 1;
QUERYPLAN
 
-----------------------------------------------------------------------------------------------------------------------
Limit (cost=300.17..300.23 rows=1 width=8) (actual time=4.588..4.588 
 
rows=1 loops=1)   ->  Merge Append  (cost=300.17..762.76 rows=7681 width=8) (actual 
time=4.586..4.586 rows=1 loops=1)         Sort Key: t.u         ->  Index Scan using t_pkey on t  (cost=0.12..8.14
rows=1
 
width=8) (actual time=0.003..0.003 rows=0 loops=1)         ->  Foreign Scan on t_fdw1  (cost=100.00..193.92 rows=2560 
width=8) (actual time=1.532..1.532 rows=1 loops=1)         ->  Foreign Scan on t_fdw2  (cost=100.00..193.92 rows=2560 
width=8) (actual time=1.510..1.510 rows=1 loops=1)         ->  Foreign Scan on t_fdw3  (cost=100.00..193.92 rows=2560 
width=8) (actual time=1.535..1.535 rows=1 loops=1)

But if sort is performed by non-indexed fields, then current behaviour 
will be inefficient and can be significantly improved by pushing limits 
to remote hosts.


-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: Limit and inherited tables

From

Tom Lane

Date:

15 January 2016, 16:30:38

Konstantin Knizhnik <k.knizhnik@postgrespro.ru> writes:
>> This example is lacking indexes on the child tables, which is
>> why the plan shown is about as good as you're going to get.
>> The contents of foo1 and foo2 have to be read in entirety in any
>> case, and sorting them separately is not a win compared to doing
>> a single sort.

> It is true, but not in case of FDW connected to remote host.
> In this case sending large volumes of data through network will be very 
> inefficient.

If the FDW isn't providing a sorted path, there is no way to improve the
situation much.  You can't just "push the LIMIT", you'd have to push
"ORDER BY ... LIMIT", which will mean that a sort has to happen anyway.
If the remote end can do a fast-start sort efficiently, it should report
that, and then a merge-append plan is just fine.
        regards, tom lane

Re: Limit and inherited tables

From

Ashutosh Bapat

Date:

18 January 2016, 06:16:01

On Fri, Jan 15, 2016 at 9:47 PM, Konstantin Knizhnik <k.knizhnik@postgrespro.ru> wrote:

This example is lacking indexes on the child tables, which is
why the plan shown is about as good as you're going to get.
The contents of foo1 and foo2 have to be read in entirety in any
case, and sorting them separately is not a win compared to doing
a single sort.
It is true, but not in case of FDW connected to remote host.
In this case sending large volumes of data through network will be very inefficient.

There will be no problem if FDW can provide index scan - in this case MergeAppend will fetch only required number of records:

postgres=# explain analyze select * from t order by u limit 1;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
Limit (cost=300.17..300.23 rows=1 width=8) (actual time=4.588..4.588 rows=1 loops=1)
-> Merge Append (cost=300.17..762.76 rows=7681 width=8) (actual time=4.586..4.586 rows=1 loops=1)
Sort Key: t.u
-> Index Scan using t_pkey on t (cost=0.12..8.14 rows=1 width=8) (actual time=0.003..0.003 rows=0 loops=1)
-> Foreign Scan on t_fdw1 (cost=100.00..193.92 rows=2560 width=8) (actual time=1.532..1.532 rows=1 loops=1)
-> Foreign Scan on t_fdw2 (cost=100.00..193.92 rows=2560 width=8) (actual time=1.510..1.510 rows=1 loops=1)
-> Foreign Scan on t_fdw3 (cost=100.00..193.92 rows=2560 width=8) (actual time=1.535..1.535 rows=1 loops=1)

But if sort is performed by non-indexed fields, then current behaviour will be inefficient and can be significantly improved by pushing limits to remote hosts.

Pushing ORDER BY clause to the foreign server is supported with commit f18c944b6137329ac4a6b2dce5745c5dc21a8578. Check if that helps to get sorted data from the foreign server. LIMIT pushdown is not supported yet, though.

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company