Re: [HACKERS] Optimizer badness in 7.0 beta - Mailing list pgsql-hackers

From Brian Hirt
Subject Re: [HACKERS] Optimizer badness in 7.0 beta
Date
Msg-id 20000305150132.B29257@loopy.berkhirt.com
Whole thread Raw
In response to Re: [HACKERS] Optimizer badness in 7.0 beta  (Peter Eisentraut <peter_e@gmx.net>)
Responses Re: [HACKERS] Optimizer badness in 7.0 beta
List pgsql-hackers
Peter,

Actually, the query you supply will not work, I'll get duplicate 
rows because the relationship between game and game_developer is 
a one to many.  Of course you had no way of knowing that from the 
information I supplied.  I could throw a distinct in there to get 
the results, but that really feels like bad form because of the large
amount of duplicate rows.  In any case, the original query I supplied 
is generated SQL, created by a Database to Object persistance layer 
and cannot by design have multiple tables in the from clause, so 
restrictions to the table seleted from must be in the form of a 
qualifier.  

I realize that the query in question could be written better.  My 
concern was the huge difference in performance between 6.5 and 7.0 on
this type of query.  Other people may be bitten by this one, so I wanted
to bring it up.  I've been able to easily to work around this problem, 
it just seems wrong that the difference in execution time is so far 
off from the previous release.  

I dont know too much about the PG internals, but when I used sybase, 
it would usually execute the sub-select independently and stuff the 
results into a temp table and then do another query, joining to the 
results of the sub-select.  In a situation like this one, worst case 
without indexes you would get a table scan for the sub-select 
and then a merge join with a sequential scan on the temp table and a 
sequential scan on the other table (example below).  Using that
approach, with no indexes, the query still executes in a fraction of a 
second.  It just seems that a query on tables as small as I'm describing
should never take as long as it did.   It seems like a problem with
the optimizer, but if people are happy with currenty functionality that's 
fine with me also.

-brian

For Example:

SELECT DISTINCT game_id INTO temp tmp_res
FROM game_developer
WHERE developer_id = 3

SELECT *
FROM game, tmp_res
WHERE game.game_id = tmp_res.game_id AND game.approved = 1
ORDER BY copyright_year desc, game_title

On Sun, Mar 05, 2000 at 03:15:45PM +0100, Peter Eisentraut wrote:
> This query can be rewritten as
> 
> SELECT creation_timestamp, etc. 
> FROM game, game_developer
> WHERE game.game_id = game_developer.game_id
>   AND approved = 1 AND developer_id = 3
> ORDER BY copyright_year desc, game_title
> 
> The way you're writing it you're almost asking it to be slow. :)
> 
> Of course that still doesn't explain why it's now 94sec versus formerly 1
> but I'm sure Tom Lane will enlighten us all very soon. :)
> 
> 
> Brian Hirt writes:
> 
> > select 
> >     creation_timestamp,
> [snip]
> > from 
> >     game 
> > where 
> >     approved = 1 
> > and 
> >     game_id in (
> >         select 
> >             distinct game_id
> >         from 
> >             game_developer
> >         where 
> >             developer_id = 3) 
> > order by 
> >     copyright_year desc,
> >     game_title;
> 
> 
> -- 
> Peter Eisentraut                  Sernanders väg 10:115
> peter_e@gmx.net                   75262 Uppsala
> http://yi.org/peter-e/            Sweden
> 
> 
> 
> ************

-- 
The world's most ambitious and comprehensive PC game database project.
                     http://www.mobygames.com


pgsql-hackers by date:

Previous
From: Peter Eisentraut
Date:
Subject: Re: [HACKERS] Optimizer badness in 7.0 beta
Next
From: Mike Mascari
Date:
Subject: Re: [HACKERS] DROP TABLE inside a transaction block