Thread: How to optimize SQL query ?

How to optimize SQL query ?

From
Milosz Krajewski
Date:
How to optimize query or just force postgre to do it my way ?

Example: table continets (   id numeric, ..., active numeric );
 table countries (   id numeric, id_continent numeric, ..., active numeric );
 table cities (   id numeric, id_country numeric, ..., active numeric );

relations: cities.id_county are in countries.id countries.id_continent are on continents.id

Query: table temp_cities (   id_city numeric; );


temp_cities is temp table which holds few (~20) id of cities, to
show them.

so: select * from cities    where cities.id in (select id_city from temp_cities);
or: select * from cities, temp_cities tmp   where cities.id = tmp.id_city;
works fine.

But the problem starts here:
 select * from cities, coutries, continets   where      (cities.id in (select id_city from temp_cities)) and
(cities.id_county= countries.id) and     (countries.id_continent = continents.id) and     (cities.active = 1) and
(coutries.active= 1) and     (continents.active = 1)
 

(active means is row active or archive, many of them are active,
but I have to check it)

Posgre is planning it like this: joins cities with coutries joins countries with continents selects active filtering
withcities.id (with temp_cities)
 

If I could force it to filter cities.id first 
(I can do this with Oracle by changing 
"select id_city from temp_cities" to 
"select id_city from temp_cities group by id_city")
it will work much (1000x) faster.

Can I force postgre do it my way ?
-- 
[ Milosz "Krashan" Krajewski ][ mail: vilge@mud.org.pl, UIN: 1319535 ]
[ inet: Vilge, Vilgefortz    ][ www: http://www.ds2.pg.gda.pl/~krash ]


Re: How to optimize SQL query ?

From
Cédric Dufour (Cogito Ergo Soft)
Date:
> -----Original Message-----
> From: pgsql-sql-owner@postgresql.org
> [mailto:pgsql-sql-owner@postgresql.org]On Behalf Of Milosz Krajewski
> Sent: Monday, July 29, 2002 16:50
> To: pgsql-sql@postgresql.org
> Subject: [SQL] How to optimize SQL query ?
>
>
> How to optimize query or just force postgre to do it my way ?
>
> Example:
>   table continets (
>     id numeric, ..., active numeric
>   );
>
>   table countries (
>     id numeric, id_continent numeric, ..., active numeric
>   );
>
>   table cities (
>     id numeric, id_country numeric, ..., active numeric
>   );
>
> relations:
>   cities.id_county are in countries.id
>   countries.id_continent are on continents.id
>
> Query:
>   table temp_cities (
>     id_city numeric;
>   );
>
>
> temp_cities is temp table which holds few (~20) id of cities, to
> show them.
>
> so:
>   select * from cities
>     where cities.id in (select id_city from temp_cities);

AVOID: indexes (which you should have defined on primary keys [implicitely
defined by PostgreSQL] and foreign keys [must be defined explicitely]) are
not used

> or:
>   select * from cities, temp_cities tmp
>     where cities.id = tmp.id_city;

BETTER ;-)

> works fine.
>
> But the problem starts here:
>
>   select * from cities, coutries, continets
>     where
>       (cities.id in (select id_city from temp_cities)) and
>       (cities.id_county = countries.id) and
>       (countries.id_continent = continents.id) and
>       (cities.active = 1) and (coutries.active = 1) and
>       (continents.active = 1)
>
> (active means is row active or archive, many of them are active,
> but I have to check it)
>
> Posgre is planning it like this:
>   joins cities with coutries
>   joins countries with continents
>   selects active
>   filtering with cities.id (with temp_cities)
>
> If I could force it to filter cities.id first
> (I can do this with Oracle by changing
> "select id_city from temp_cities" to
> "select id_city from temp_cities group by id_city")
> it will work much (1000x) faster.
>
> Can I force postgre do it my way ?

Use the explicit JOIN syntax and join each table one after another in the
order you feel is the more adequate for your query. PostgreSQL will respect
this order.

From one I understand, you should write it this way:
SELECT*
FROMcontinents
INNER JOINcountriesON ( continents.id = country.id_continent )
INNER JOINcitiesON ( countries.id = cities.id_country )
INNER JOINtemp_citiesON ( cities.id = temp_cities.id )
WHERE( continents.active = 1 )AND ( countries.active = 1 )AND ( cities.active = 1 )

The reason to do so are:
1. Joining first on the tables that contain the less rows contributes to
keep the cartesian product between the joins as low as possible
2. Thus if a continent - respectively country - is not active, it will be
casted out from the join immediately and thus reduce the cartesian product
for the next join(s)
3. Joining on 'temp-cities' allows the usage of the hopefully defined index

I achieved ratio from 10000 to 1 respecting this strategy on a scenario
fairly closed to yours ;-)

Ce.D

> --
> [ Milosz "Krashan" Krajewski ][ mail: vilge@mud.org.pl, UIN: 1319535 ]
> [ inet: Vilge, Vilgefortz    ][ www: http://www.ds2.pg.gda.pl/~krash ]
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: you can get off all lists at once with the unregister command
>     (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)
>




Re: How to optimize SQL query ?

From
Tom Lane
Date:
Milosz Krajewski <Milosz.Krajewski@FinSkog.com.pl> writes:
> Can I force postgre do it my way ?

Possibly.  See
http://www.ca.postgresql.org/users-lounge/docs/7.2/postgres/explicit-joins.html
        regards, tom lane


Re: How to optimize SQL query ?

From
Cédric Dufour (Cogito Ergo Soft)
Date:
You're right ! Its always a question of keeping the cartesian products of
joints as low as possible, depending on what you know of your data structure
and on the WHERE clause(s) that you know might be most used on your query...

Note that if you do not explicitely give the order of joints to PostgreSQL
(using, the 'table1, table2, ... tableN WHERE ...' syntax), PostgreSQL
should try to achieve the joints the best way as possible, based on the
tables' statistics. I had no opportunity so far to check how well it
behaves.

On ther other hand, is the optimizing algorithm clever enough to know that
even though a table might be fairly large, the quantity of rows important to
your query might be very low (depending on a WHERE clause) and that the
given table shall thus appear in the first joints rather than in the last
ones ? Anyone has experience with this ?

Happy optimizing !

> -----Original Message-----
> From: Milosz Krajewski [mailto:Milosz.Krajewski@FinSkog.com.pl]
> Sent: Friday, August 02, 2002 15:42
> To: Cédric Dufour (Cogito Ergo Soft)
> Subject: Re: [SQL] How to optimize SQL query ?
>
>
> Cédric Dufour (Cogito Ergo Soft) wrote:
> >
> >
> > Use the explicit JOIN syntax and join each table one after another in
> > the order you feel is the more adequate for your query. PostgreSQL
> > will respect this order.
> > From one I understand, you should write it this way:
> >
> > SELECT * FROM
> >   continents
> >   INNER JOIN countries ON ( continents.id = country.id_continent )
> >   INNER JOIN cities ON ( countries.id = cities.id_country )
> >   INNER JOIN temp_cities ON ( cities.id = temp_cities.id )
> > WHERE
> >  ( continents.active = 1 )
> >  AND ( countries.active = 1 )
> >  AND ( cities.active = 1 )
> >
> > The reason to do so are:
> > 1. Joining first on the tables that contain the less rows
> contributes > to keep the cartesian product between the joins as
> low as possible
> > 2. Thus if a continent - respectively country - is not active, it
> > will be casted out from the join immediately and thus reduce the
> > cartesian product for the next join(s)
> > 3. Joining on 'temp-cities' allows the usage of the hopefully defined
> > index
> >
> > I achieved ratio from 10000 to 1 respecting this strategy on a
> > scenario fairly closed to yours ;-)
> >
> > Ce.D
>
> Thaks for your answer.
>
> I this schema temp_cities is the smallest table (ie 5 records) so
> shouldnt't it look like this ?
>
> select *
> from
>   temp_cities
>   inner join cities using (id)
>   inner join countries on (...)
>   inner join continents on (...)
> where
>   ...
> ?
>
>
> --
> [ Milosz "Krashan" Krajewski ][ mail: vilge@mud.org.pl, UIN: 1319535 ]
> [ inet: Vilge, Vilgefortz    ][ www: http://www.ds2.pg.gda.pl/~krash ]
> [      2 nie jest rowne 3, nawet dla bardzo duzych wartosci 2        ]
>