Thread: Query Optimizer Failure / Possible Bug
The query and the corresponding EXPLAIN is at http://hannes.imos.net/query.txt I'd like to use the column q.replaced_serials for multiple calculations in the SELECT clause, but every time it is referenced there in some way the whole query in the FROM clause returning q is executed again. This doesn't make sense to me at all and eats performance. If this wasn't clear enough, for every q.replaced_serials <insert_random_calculation> AS some_column in the SELECT clause there is new block of --------------------------------------------------------------- -> Aggregate (cost=884.23..884.23 rows=1 width=0) -> Nested Loop (cost=0.00..884.23 rows=1 width=0) -> Index Scan using ix_rma_ticket_serials_replace on rma_ticket_serials rts (cost=0.00..122.35 rows=190 width=4) Index Cond: ("replace" = false) -> Index Scan using pk_serials on serials s (cost=0.00..3.51 rows=1 width=4) Index Cond: (s.serial_id = "outer".serial_id) Filter: ((article_no = $0) AND (delivery_id = $1)) --------------------------------------------------------------- in the EXPLAIN result. For those who wonder why I do this FROM (SELECT...). I was searching for a way to use the result of an subselect for multiple calculations in the SELECT clause and return that calculation results as individual columns. I tested a bit further and found out that PG behaves the same in case q is a view. This makes me wonder how efficient the optimizer can work with views - or even worse - nested views. Tested and reproduced on PG 7.4.1 linux and 8.0.0 win32. Thanks in advance, Hannes Dorbath
Hannes, > The query and the corresponding EXPLAIN is at > > http://hannes.imos.net/query.txt The problem is that you're using a complex corellated sub-select in the SELECT clause: SELECT d.delivery_id, da.article_no, da.amount, ( SELECT COUNT(*) FROM serials s INNER JOIN rma_ticket_serials rts ON ( s.serial_id = rts.serial_id ) WHERE s.article_no = da.article_no AND s.delivery_id = d.delivery_id AND rts.replace = FALSE ) AS replaced_serials This means that the planner pretty much has to iterate over the subquery, running it once for each row in the result set. If you want the optimizer to use a JOIN structure instead, put the subselect in the FROM clause. -- --Josh Josh Berkus Aglio Database Solutions San Francisco
Thank you very much for your reply. I'll try to modify it. Josh Berkus wrote: > Hannes, > > >>The query and the corresponding EXPLAIN is at >> >>http://hannes.imos.net/query.txt > > > The problem is that you're using a complex corellated sub-select in the SELECT > clause: > > SELECT > d.delivery_id, > da.article_no, > da.amount, > ( > SELECT > COUNT(*) > FROM > serials s > INNER JOIN rma_ticket_serials rts ON ( > s.serial_id = rts.serial_id > ) > WHERE > s.article_no = da.article_no AND > s.delivery_id = d.delivery_id AND > rts.replace = FALSE > ) AS replaced_serials > > This means that the planner pretty much has to iterate over the subquery, > running it once for each row in the result set. If you want the optimizer > to use a JOIN structure instead, put the subselect in the FROM clause. > -- imos Gesellschaft fuer Internet-Marketing und Online-Services mbH Alfons-Feifel-Str. 9 // D-73037 Goeppingen // Stauferpark Ost Tel: 07161 93339-14 // Fax: 07161 93339-99 // Internet: www.imos.net
hm, a few days and not a single reply :| any more information needed? test data? simplified test case? anything? thanks Hannes Dorbath wrote: > The query and the corresponding EXPLAIN is at > > http://hannes.imos.net/query.txt > > I'd like to use the column q.replaced_serials for multiple calculations > in the SELECT clause, but every time it is referenced there in some way > the whole query in the FROM clause returning q is executed again. > > This doesn't make sense to me at all and eats performance. > > If this wasn't clear enough, for every > > q.replaced_serials <insert_random_calculation> AS some_column > > in the SELECT clause there is new block of > > --------------------------------------------------------------- > -> Aggregate (cost=884.23..884.23 rows=1 width=0) > -> Nested Loop (cost=0.00..884.23 rows=1 width=0) > -> Index Scan using ix_rma_ticket_serials_replace on > rma_ticket_serials rts (cost=0.00..122.35 > rows=190 width=4) > Index Cond: ("replace" = false) > -> Index Scan using pk_serials on serials s > (cost=0.00..3.51 rows=1 width=4) > Index Cond: (s.serial_id = "outer".serial_id) > Filter: ((article_no = $0) AND (delivery_id = $1)) > --------------------------------------------------------------- > > in the EXPLAIN result. > > For those who wonder why I do this FROM (SELECT...). I was searching for > a way to use the result of an subselect for multiple calculations in the > SELECT clause and return that calculation results as individual columns. > > I tested a bit further and found out that PG behaves the same in case q > is a view. This makes me wonder how efficient the optimizer can work > with views - or even worse - nested views. > > Tested and reproduced on PG 7.4.1 linux and 8.0.0 win32. > > > Thanks in advance, > Hannes Dorbath -- imos Gesellschaft fuer Internet-Marketing und Online-Services mbH Alfons-Feifel-Str. 9 // D-73037 Goeppingen // Stauferpark Ost Tel: 07161 93339-14 // Fax: 07161 93339-99 // Internet: www.imos.net
Noticed this problem,too. You can always make the calculation you want done once inside a set returning function so it'll behave like a table, but that's ugly. On Mon, 28 Mar 2005 16:14:44 +0200, Hannes Dorbath <light@theendofthetunnel.de> wrote: > hm, a few days and not a single reply :| > > any more information needed? test data? simplified test case? anything? > > > thanks > > > Hannes Dorbath wrote: >> The query and the corresponding EXPLAIN is at >> http://hannes.imos.net/query.txt >> I'd like to use the column q.replaced_serials for multiple calculations >> in the SELECT clause, but every time it is referenced there in some way >> the whole query in the FROM clause returning q is executed again. >> This doesn't make sense to me at all and eats performance. >> If this wasn't clear enough, for every >> q.replaced_serials <insert_random_calculation> AS some_column >> in the SELECT clause there is new block of >> --------------------------------------------------------------- >> -> Aggregate (cost=884.23..884.23 rows=1 width=0) >> -> Nested Loop (cost=0.00..884.23 rows=1 width=0) >> -> Index Scan using ix_rma_ticket_serials_replace on >> rma_ticket_serials rts (cost=0.00..122.35 >> rows=190 width=4) >> Index Cond: ("replace" = false) >> -> Index Scan using pk_serials on serials s >> (cost=0.00..3.51 rows=1 width=4) >> Index Cond: (s.serial_id = "outer".serial_id) >> Filter: ((article_no = $0) AND (delivery_id = $1)) >> --------------------------------------------------------------- >> in the EXPLAIN result. >> For those who wonder why I do this FROM (SELECT...). I was searching >> for >> a way to use the result of an subselect for multiple calculations in the >> SELECT clause and return that calculation results as individual columns. >> I tested a bit further and found out that PG behaves the same in case q >> is a view. This makes me wonder how efficient the optimizer can work >> with views - or even worse - nested views. >> Tested and reproduced on PG 7.4.1 linux and 8.0.0 win32. >> Thanks in advance, >> Hannes Dorbath >
Mhh. I have no clue about the internals of PostgreSQL and query planing, but to me as user this should really be a thing the optimizer has to work out.. On 03.04.2005 10:01, PFC wrote: > > Noticed this problem,too. > You can always make the calculation you want done once inside a set > returning function so it'll behave like a table, but that's ugly. > > On Mon, 28 Mar 2005 16:14:44 +0200, Hannes Dorbath > <light@theendofthetunnel.de> wrote: > >> hm, a few days and not a single reply :| >> >> any more information needed? test data? simplified test case? anything? >> >> >> thanks >> >> >> Hannes Dorbath wrote: >> >>> The query and the corresponding EXPLAIN is at >>> http://hannes.imos.net/query.txt >>> I'd like to use the column q.replaced_serials for multiple calculations >>> in the SELECT clause, but every time it is referenced there in some way >>> the whole query in the FROM clause returning q is executed again. >>> This doesn't make sense to me at all and eats performance. >>> If this wasn't clear enough, for every >>> q.replaced_serials <insert_random_calculation> AS some_column >>> in the SELECT clause there is new block of >>> --------------------------------------------------------------- >>> -> Aggregate (cost=884.23..884.23 rows=1 width=0) >>> -> Nested Loop (cost=0.00..884.23 rows=1 width=0) >>> -> Index Scan using ix_rma_ticket_serials_replace on >>> rma_ticket_serials rts (cost=0.00..122.35 >>> rows=190 width=4) >>> Index Cond: ("replace" = false) >>> -> Index Scan using pk_serials on serials s >>> (cost=0.00..3.51 rows=1 width=4) >>> Index Cond: (s.serial_id = "outer".serial_id) >>> Filter: ((article_no = $0) AND (delivery_id = $1)) >>> --------------------------------------------------------------- >>> in the EXPLAIN result. >>> For those who wonder why I do this FROM (SELECT...). I was >>> searching for >>> a way to use the result of an subselect for multiple calculations in the >>> SELECT clause and return that calculation results as individual columns. >>> I tested a bit further and found out that PG behaves the same in case q >>> is a view. This makes me wonder how efficient the optimizer can work >>> with views - or even worse - nested views. >>> Tested and reproduced on PG 7.4.1 linux and 8.0.0 win32. >>> Thanks in advance, >>> Hannes Dorbath >> >> > > > > ---------------------------(end of broadcast)--------------------------- > TIP 5: Have you checked our extensive FAQ? > > http://www.postgresql.org/docs/faq >
Some people on the #postgresql irc channel pointed out that it's a known issue. http://www.qaix.com/postgresql-database-development/246-557-select-based-on-function-result-read.shtml A more simple testcase is below. Adding OFFSET 0 to the inner query does indeed fix it in my case. SELECT tmp.user_id AS foo, tmp.user_id AS bar, tmp.user_id AS baz FROM ( SELECT u.user_id FROM users u ) AS tmp; Seq Scan on users (cost=0.00..1.53 rows=53 width=4) (actual time=0.230..0.233 rows=1 loops=1) Total runtime: 0.272 ms --------------------------- SELECT tmp.user_id AS foo, tmp.user_id AS bar, tmp.user_id AS baz FROM ( SELECT (SELECT 1) AS user_id FROM users u ) AS tmp; Seq Scan on users u (cost=0.03..1.56 rows=53 width=0) (actual time=0.216..0.219 rows=1 loops=1) InitPlan -> Result (cost=0.00..0.01 rows=1 width=0) (actual time=0.004..0.006 rows=1 loops=1) -> Result (cost=0.00..0.01 rows=1 width=0) (actual time=0.002..0.004 rows=1 loops=1) -> Result (cost=0.00..0.01 rows=1 width=0) (actual time=0.002..0.003 rows=1 loops=1) Total runtime: 0.270 ms --------------------------- SELECT tmp.user_id AS foo, tmp.user_id AS bar, tmp.user_id AS baz FROM ( SELECT (SELECT 1) AS user_id FROM users u OFFSET 0 ) AS tmp; Subquery Scan tmp (cost=0.01..1.03 rows=1 width=4) (actual time=0.032..0.042 rows=1 loops=1) -> Limit (cost=0.01..1.02 rows=1 width=0) (actual time=0.026..0.033 rows=1 loops=1) InitPlan -> Result (cost=0.00..0.01 rows=1 width=0) (actual time=0.003..0.004 rows=1 loops=1) -> Seq Scan on users u (cost=0.00..1.01 rows=1 width=0) (actual time=0.022..0.027 rows=1 loops=1) Total runtime: 0.090 ms On 04.04.2005 17:18, Hannes Dorbath wrote: > Mhh. I have no clue about the internals of PostgreSQL and query planing, > but to me as user this should really be a thing the optimizer has to > work out.. > > > On 03.04.2005 10:01, PFC wrote: > >> >> Noticed this problem,too. >> You can always make the calculation you want done once inside a >> set returning function so it'll behave like a table, but that's ugly. >> >> On Mon, 28 Mar 2005 16:14:44 +0200, Hannes Dorbath >> <light@theendofthetunnel.de> wrote: >> >>> hm, a few days and not a single reply :| >>> >>> any more information needed? test data? simplified test case? anything? >>> >>> >>> thanks >>> >>> >>> Hannes Dorbath wrote: >>> >>>> The query and the corresponding EXPLAIN is at >>>> http://hannes.imos.net/query.txt >>>> I'd like to use the column q.replaced_serials for multiple >>>> calculations >>>> in the SELECT clause, but every time it is referenced there in some way >>>> the whole query in the FROM clause returning q is executed again. >>>> This doesn't make sense to me at all and eats performance. >>>> If this wasn't clear enough, for every >>>> q.replaced_serials <insert_random_calculation> AS some_column >>>> in the SELECT clause there is new block of >>>> --------------------------------------------------------------- >>>> -> Aggregate (cost=884.23..884.23 rows=1 width=0) >>>> -> Nested Loop (cost=0.00..884.23 rows=1 width=0) >>>> -> Index Scan using ix_rma_ticket_serials_replace on >>>> rma_ticket_serials rts (cost=0.00..122.35 >>>> rows=190 width=4) >>>> Index Cond: ("replace" = false) >>>> -> Index Scan using pk_serials on serials s >>>> (cost=0.00..3.51 rows=1 width=4) >>>> Index Cond: (s.serial_id = "outer".serial_id) >>>> Filter: ((article_no = $0) AND (delivery_id = $1)) >>>> --------------------------------------------------------------- >>>> in the EXPLAIN result. >>>> For those who wonder why I do this FROM (SELECT...). I was >>>> searching for >>>> a way to use the result of an subselect for multiple calculations in >>>> the >>>> SELECT clause and return that calculation results as individual >>>> columns. >>>> I tested a bit further and found out that PG behaves the same in >>>> case q >>>> is a view. This makes me wonder how efficient the optimizer can work >>>> with views - or even worse - nested views. >>>> Tested and reproduced on PG 7.4.1 linux and 8.0.0 win32. >>>> Thanks in advance, >>>> Hannes Dorbath >>> >>> >>> >> >> >> >> ---------------------------(end of broadcast)--------------------------- >> TIP 5: Have you checked our extensive FAQ? >> >> http://www.postgresql.org/docs/faq >>