Thread: How does the planner determine plan_rows ?

How does the planner determine plan_rows ?

From

Donald Dong

Date:

11 January 2019, 05:41:46

Hi,

I created some empty tables and run ` EXPLAIN ANALYZE` on `SELECT * `. I found
the results have different row numbers, but the tables are all empty.

=# CREATE TABLE t1(id INT, data INT);
=# EXPLAIN ANALYZE SELECT * FROM t1;
  Seq Scan on t1  (cost=0.00..32.60 rows=2260 width=8) (actual
  time=0.003..0.003 rows=0 loops=1)

=# CREATE TABLE t2(data VARCHAR);
=# EXPLAIN ANALYZE SELECT * FROM t2;
  Seq Scan on t2  (cost=0.00..23.60 rows=1360 width=32) (actual
  time=0.002..0.002 rows=0 loops=1)

=# CREATE TABLE t3(id INT, data VARCHAR);
=# EXPLAIN ANALYZE SELECT * FROM t3;
  Seq Scan on t3  (cost=0.00..22.70 rows=1270 width=36) (actual
  time=0.001..0.001 rows=0 loops=1)

I found this behavior unexpected. I'm still trying to find out how/where the planner
determines the plan_rows. Any help will be appreciated!

Thank you,
Donald Dong

Re: How does the planner determine plan_rows ?

From

Andrew Gierth

Date:

11 January 2019, 06:48:32

>>>>> "Donald" == Donald Dong <xdong@csumb.edu> writes:

 Donald> Hi,
 Donald> I created some empty tables and run ` EXPLAIN ANALYZE` on
 Donald> `SELECT * `. I found the results have different row numbers,
 Donald> but the tables are all empty.

Empty tables are something of a special case, because the planner
doesn't assume that they will _stay_ empty, and using an estimate of 0
or 1 rows would tend to create a distorted plan that would likely blow
up in runtime as soon as you insert a second row.

The place to look for info would be estimate_rel_size in
optimizer/util/plancat.c, from which you can see that empty tables get
a default size estimate of 10 pages. Thus:

 Donald> =# CREATE TABLE t1(id INT, data INT);
 Donald> =# EXPLAIN ANALYZE SELECT * FROM t1;
 Donald>   Seq Scan on t1  (cost=0.00..32.60 rows=2260 width=8) (actual
 Donald>   time=0.003..0.003 rows=0 loops=1)

An (int,int) tuple takes about 36 bytes, so you can get about 226 of
them on a page, so 10 pages is 2260 rows.

 Donald> =# CREATE TABLE t2(data VARCHAR);
 Donald> =# EXPLAIN ANALYZE SELECT * FROM t2;
 Donald>   Seq Scan on t2  (cost=0.00..23.60 rows=1360 width=32) (actual
 Donald>   time=0.002..0.002 rows=0 loops=1)

Size of a varchar with no specified length isn't known, so the planner
determines an average length of 32 by the time-honoured method of rectal
extraction (see get_typavgwidth in lsyscache.c), making 136 rows per
page.

-- 
Andrew (irc:RhodiumToad)

Re: How does the planner determine plan_rows ?

From

Donald Dong

Date:

11 January 2019, 06:56:15

Thank you for the great explanation!

> On Jan 10, 2019, at 7:48 PM, Andrew Gierth <andrew@tao11.riddles.org.uk> wrote:
>
>>>>>> "Donald" == Donald Dong <xdong@csumb.edu> writes:
>
> Donald> Hi,
> Donald> I created some empty tables and run ` EXPLAIN ANALYZE` on
> Donald> `SELECT * `. I found the results have different row numbers,
> Donald> but the tables are all empty.
>
> Empty tables are something of a special case, because the planner
> doesn't assume that they will _stay_ empty, and using an estimate of 0
> or 1 rows would tend to create a distorted plan that would likely blow
> up in runtime as soon as you insert a second row.
>
> The place to look for info would be estimate_rel_size in
> optimizer/util/plancat.c, from which you can see that empty tables get
> a default size estimate of 10 pages. Thus:
>
> Donald> =# CREATE TABLE t1(id INT, data INT);
> Donald> =# EXPLAIN ANALYZE SELECT * FROM t1;
> Donald>   Seq Scan on t1  (cost=0.00..32.60 rows=2260 width=8) (actual
> Donald>   time=0.003..0.003 rows=0 loops=1)
>
> An (int,int) tuple takes about 36 bytes, so you can get about 226 of
> them on a page, so 10 pages is 2260 rows.
>
> Donald> =# CREATE TABLE t2(data VARCHAR);
> Donald> =# EXPLAIN ANALYZE SELECT * FROM t2;
> Donald>   Seq Scan on t2  (cost=0.00..23.60 rows=1360 width=32) (actual
> Donald>   time=0.002..0.002 rows=0 loops=1)
>
> Size of a varchar with no specified length isn't known, so the planner
> determines an average length of 32 by the time-honoured method of rectal
> extraction (see get_typavgwidth in lsyscache.c), making 136 rows per
> page.
>
> --
> Andrew (irc:RhodiumToad)

Re: How does the planner determine plan_rows ?

From

Tom Lane

Date:

11 January 2019, 07:01:51

Donald Dong <xdong@csumb.edu> writes:
> I created some empty tables and run ` EXPLAIN ANALYZE` on `SELECT * `. I found
> the results have different row numbers, but the tables are all empty.

This isn't a terribly interesting case, since you've neither loaded
any data nor vacuumed/analyzed the table, but ...

> I found this behavior unexpected. I'm still trying to find out how/where the planner
> determines the plan_rows.

... estimate_rel_size() in plancat.c is where to look to find out
about the planner's default estimates when it's lacking hard data.

            regards, tom lane

Re: How does the planner determine plan_rows ?

From

Donald Dong

Date:

11 January 2019, 10:41:51

> On Jan 10, 2019, at 8:01 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> ... estimate_rel_size() in plancat.c is where to look to find out
> about the planner's default estimates when it's lacking hard data.

Thank you! Now I see how the planner uses the rows to estimate the cost and
generates the best_plan.

To me, tracing the function calls is not a simple task. I'm using cscope, and I
use printf when I'm not entirely sure. I was considering to use gbd, but I'm
having issues referencing the source code in gdb.

I'm very interested to learn how the professionals explore the codebase!

Re: How does the planner determine plan_rows ?

From

Bruce Momjian

Date:

26 January 2019, 03:01:53

On Thu, Jan 10, 2019 at 11:41:51PM -0800, Donald Dong wrote:
> 
> > On Jan 10, 2019, at 8:01 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > 
> > ... estimate_rel_size() in plancat.c is where to look to find out
> > about the planner's default estimates when it's lacking hard data.
> 
> Thank you! Now I see how the planner uses the rows to estimate the cost and
> generates the best_plan.
> 
> To me, tracing the function calls is not a simple task. I'm using cscope, and I
> use printf when I'm not entirely sure. I was considering to use gbd, but I'm
> having issues referencing the source code in gdb.
> 
> I'm very interested to learn how the professionals explore the codebase!

Uh, the developer FAQ has some info on this:

    https://wiki.postgresql.org/wiki/Developer_FAQ

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +