Re: How to optimize SQL query ? - Mailing list pgsql-sql

From Cédric Dufour (Cogito Ergo Soft)
Subject Re: How to optimize SQL query ?
Date
Msg-id NDBBIFNBODNADCAOFDOAMEJGCDAA.cedric.dufour@cogito-ergo-soft.com
Whole thread Raw
In response to How to optimize SQL query ?  (Milosz Krajewski <Milosz.Krajewski@FinSkog.com.pl>)
List pgsql-sql
You're right ! Its always a question of keeping the cartesian products of
joints as low as possible, depending on what you know of your data structure
and on the WHERE clause(s) that you know might be most used on your query...

Note that if you do not explicitely give the order of joints to PostgreSQL
(using, the 'table1, table2, ... tableN WHERE ...' syntax), PostgreSQL
should try to achieve the joints the best way as possible, based on the
tables' statistics. I had no opportunity so far to check how well it
behaves.

On ther other hand, is the optimizing algorithm clever enough to know that
even though a table might be fairly large, the quantity of rows important to
your query might be very low (depending on a WHERE clause) and that the
given table shall thus appear in the first joints rather than in the last
ones ? Anyone has experience with this ?

Happy optimizing !

> -----Original Message-----
> From: Milosz Krajewski [mailto:Milosz.Krajewski@FinSkog.com.pl]
> Sent: Friday, August 02, 2002 15:42
> To: Cédric Dufour (Cogito Ergo Soft)
> Subject: Re: [SQL] How to optimize SQL query ?
>
>
> Cédric Dufour (Cogito Ergo Soft) wrote:
> >
> >
> > Use the explicit JOIN syntax and join each table one after another in
> > the order you feel is the more adequate for your query. PostgreSQL
> > will respect this order.
> > From one I understand, you should write it this way:
> >
> > SELECT * FROM
> >   continents
> >   INNER JOIN countries ON ( continents.id = country.id_continent )
> >   INNER JOIN cities ON ( countries.id = cities.id_country )
> >   INNER JOIN temp_cities ON ( cities.id = temp_cities.id )
> > WHERE
> >  ( continents.active = 1 )
> >  AND ( countries.active = 1 )
> >  AND ( cities.active = 1 )
> >
> > The reason to do so are:
> > 1. Joining first on the tables that contain the less rows
> contributes > to keep the cartesian product between the joins as
> low as possible
> > 2. Thus if a continent - respectively country - is not active, it
> > will be casted out from the join immediately and thus reduce the
> > cartesian product for the next join(s)
> > 3. Joining on 'temp-cities' allows the usage of the hopefully defined
> > index
> >
> > I achieved ratio from 10000 to 1 respecting this strategy on a
> > scenario fairly closed to yours ;-)
> >
> > Ce.D
>
> Thaks for your answer.
>
> I this schema temp_cities is the smallest table (ie 5 records) so
> shouldnt't it look like this ?
>
> select *
> from
>   temp_cities
>   inner join cities using (id)
>   inner join countries on (...)
>   inner join continents on (...)
> where
>   ...
> ?
>
>
> --
> [ Milosz "Krashan" Krajewski ][ mail: vilge@mud.org.pl, UIN: 1319535 ]
> [ inet: Vilge, Vilgefortz    ][ www: http://www.ds2.pg.gda.pl/~krash ]
> [      2 nie jest rowne 3, nawet dla bardzo duzych wartosci 2        ]
>




pgsql-sql by date:

Previous
From: Tom Lane
Date:
Subject: Re: How to optimize SQL query ?
Next
From: Stephan Szabo
Date:
Subject: Re: Random resultset retrieving -> performance bottleneck