Thread: How does the planner determine plan_rows ?
Hi, I created some empty tables and run ` EXPLAIN ANALYZE` on `SELECT * `. I found the results have different row numbers, but the tables are all empty. =# CREATE TABLE t1(id INT, data INT); =# EXPLAIN ANALYZE SELECT * FROM t1; Seq Scan on t1 (cost=0.00..32.60 rows=2260 width=8) (actual time=0.003..0.003 rows=0 loops=1) =# CREATE TABLE t2(data VARCHAR); =# EXPLAIN ANALYZE SELECT * FROM t2; Seq Scan on t2 (cost=0.00..23.60 rows=1360 width=32) (actual time=0.002..0.002 rows=0 loops=1) =# CREATE TABLE t3(id INT, data VARCHAR); =# EXPLAIN ANALYZE SELECT * FROM t3; Seq Scan on t3 (cost=0.00..22.70 rows=1270 width=36) (actual time=0.001..0.001 rows=0 loops=1) I found this behavior unexpected. I'm still trying to find out how/where the planner determines the plan_rows. Any help will be appreciated! Thank you, Donald Dong
>>>>> "Donald" == Donald Dong <xdong@csumb.edu> writes: Donald> Hi, Donald> I created some empty tables and run ` EXPLAIN ANALYZE` on Donald> `SELECT * `. I found the results have different row numbers, Donald> but the tables are all empty. Empty tables are something of a special case, because the planner doesn't assume that they will _stay_ empty, and using an estimate of 0 or 1 rows would tend to create a distorted plan that would likely blow up in runtime as soon as you insert a second row. The place to look for info would be estimate_rel_size in optimizer/util/plancat.c, from which you can see that empty tables get a default size estimate of 10 pages. Thus: Donald> =# CREATE TABLE t1(id INT, data INT); Donald> =# EXPLAIN ANALYZE SELECT * FROM t1; Donald> Seq Scan on t1 (cost=0.00..32.60 rows=2260 width=8) (actual Donald> time=0.003..0.003 rows=0 loops=1) An (int,int) tuple takes about 36 bytes, so you can get about 226 of them on a page, so 10 pages is 2260 rows. Donald> =# CREATE TABLE t2(data VARCHAR); Donald> =# EXPLAIN ANALYZE SELECT * FROM t2; Donald> Seq Scan on t2 (cost=0.00..23.60 rows=1360 width=32) (actual Donald> time=0.002..0.002 rows=0 loops=1) Size of a varchar with no specified length isn't known, so the planner determines an average length of 32 by the time-honoured method of rectal extraction (see get_typavgwidth in lsyscache.c), making 136 rows per page. -- Andrew (irc:RhodiumToad)
Thank you for the great explanation! > On Jan 10, 2019, at 7:48 PM, Andrew Gierth <andrew@tao11.riddles.org.uk> wrote: > >>>>>> "Donald" == Donald Dong <xdong@csumb.edu> writes: > > Donald> Hi, > Donald> I created some empty tables and run ` EXPLAIN ANALYZE` on > Donald> `SELECT * `. I found the results have different row numbers, > Donald> but the tables are all empty. > > Empty tables are something of a special case, because the planner > doesn't assume that they will _stay_ empty, and using an estimate of 0 > or 1 rows would tend to create a distorted plan that would likely blow > up in runtime as soon as you insert a second row. > > The place to look for info would be estimate_rel_size in > optimizer/util/plancat.c, from which you can see that empty tables get > a default size estimate of 10 pages. Thus: > > Donald> =# CREATE TABLE t1(id INT, data INT); > Donald> =# EXPLAIN ANALYZE SELECT * FROM t1; > Donald> Seq Scan on t1 (cost=0.00..32.60 rows=2260 width=8) (actual > Donald> time=0.003..0.003 rows=0 loops=1) > > An (int,int) tuple takes about 36 bytes, so you can get about 226 of > them on a page, so 10 pages is 2260 rows. > > Donald> =# CREATE TABLE t2(data VARCHAR); > Donald> =# EXPLAIN ANALYZE SELECT * FROM t2; > Donald> Seq Scan on t2 (cost=0.00..23.60 rows=1360 width=32) (actual > Donald> time=0.002..0.002 rows=0 loops=1) > > Size of a varchar with no specified length isn't known, so the planner > determines an average length of 32 by the time-honoured method of rectal > extraction (see get_typavgwidth in lsyscache.c), making 136 rows per > page. > > -- > Andrew (irc:RhodiumToad)
Donald Dong <xdong@csumb.edu> writes: > I created some empty tables and run ` EXPLAIN ANALYZE` on `SELECT * `. I found > the results have different row numbers, but the tables are all empty. This isn't a terribly interesting case, since you've neither loaded any data nor vacuumed/analyzed the table, but ... > I found this behavior unexpected. I'm still trying to find out how/where the planner > determines the plan_rows. ... estimate_rel_size() in plancat.c is where to look to find out about the planner's default estimates when it's lacking hard data. regards, tom lane
> On Jan 10, 2019, at 8:01 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > ... estimate_rel_size() in plancat.c is where to look to find out > about the planner's default estimates when it's lacking hard data. Thank you! Now I see how the planner uses the rows to estimate the cost and generates the best_plan. To me, tracing the function calls is not a simple task. I'm using cscope, and I use printf when I'm not entirely sure. I was considering to use gbd, but I'm having issues referencing the source code in gdb. I'm very interested to learn how the professionals explore the codebase!
On Thu, Jan 10, 2019 at 11:41:51PM -0800, Donald Dong wrote: > > > On Jan 10, 2019, at 8:01 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > > > ... estimate_rel_size() in plancat.c is where to look to find out > > about the planner's default estimates when it's lacking hard data. > > Thank you! Now I see how the planner uses the rows to estimate the cost and > generates the best_plan. > > To me, tracing the function calls is not a simple task. I'm using cscope, and I > use printf when I'm not entirely sure. I was considering to use gbd, but I'm > having issues referencing the source code in gdb. > > I'm very interested to learn how the professionals explore the codebase! Uh, the developer FAQ has some info on this: https://wiki.postgresql.org/wiki/Developer_FAQ -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +