Re: How to optimize SQL query ? - Mailing list pgsql-sql
From | Cédric Dufour (Cogito Ergo Soft) |
---|---|
Subject | Re: How to optimize SQL query ? |
Date | |
Msg-id | NDBBIFNBODNADCAOFDOAIEJFCDAA.cedric.dufour@cogito-ergo-soft.com Whole thread Raw |
In response to | How to optimize SQL query ? (Milosz Krajewski <Milosz.Krajewski@FinSkog.com.pl>) |
List | pgsql-sql |
> -----Original Message----- > From: pgsql-sql-owner@postgresql.org > [mailto:pgsql-sql-owner@postgresql.org]On Behalf Of Milosz Krajewski > Sent: Monday, July 29, 2002 16:50 > To: pgsql-sql@postgresql.org > Subject: [SQL] How to optimize SQL query ? > > > How to optimize query or just force postgre to do it my way ? > > Example: > table continets ( > id numeric, ..., active numeric > ); > > table countries ( > id numeric, id_continent numeric, ..., active numeric > ); > > table cities ( > id numeric, id_country numeric, ..., active numeric > ); > > relations: > cities.id_county are in countries.id > countries.id_continent are on continents.id > > Query: > table temp_cities ( > id_city numeric; > ); > > > temp_cities is temp table which holds few (~20) id of cities, to > show them. > > so: > select * from cities > where cities.id in (select id_city from temp_cities); AVOID: indexes (which you should have defined on primary keys [implicitely defined by PostgreSQL] and foreign keys [must be defined explicitely]) are not used > or: > select * from cities, temp_cities tmp > where cities.id = tmp.id_city; BETTER ;-) > works fine. > > But the problem starts here: > > select * from cities, coutries, continets > where > (cities.id in (select id_city from temp_cities)) and > (cities.id_county = countries.id) and > (countries.id_continent = continents.id) and > (cities.active = 1) and (coutries.active = 1) and > (continents.active = 1) > > (active means is row active or archive, many of them are active, > but I have to check it) > > Posgre is planning it like this: > joins cities with coutries > joins countries with continents > selects active > filtering with cities.id (with temp_cities) > > If I could force it to filter cities.id first > (I can do this with Oracle by changing > "select id_city from temp_cities" to > "select id_city from temp_cities group by id_city") > it will work much (1000x) faster. > > Can I force postgre do it my way ? Use the explicit JOIN syntax and join each table one after another in the order you feel is the more adequate for your query. PostgreSQL will respect this order. From one I understand, you should write it this way: SELECT* FROMcontinents INNER JOINcountriesON ( continents.id = country.id_continent ) INNER JOINcitiesON ( countries.id = cities.id_country ) INNER JOINtemp_citiesON ( cities.id = temp_cities.id ) WHERE( continents.active = 1 )AND ( countries.active = 1 )AND ( cities.active = 1 ) The reason to do so are: 1. Joining first on the tables that contain the less rows contributes to keep the cartesian product between the joins as low as possible 2. Thus if a continent - respectively country - is not active, it will be casted out from the join immediately and thus reduce the cartesian product for the next join(s) 3. Joining on 'temp-cities' allows the usage of the hopefully defined index I achieved ratio from 10000 to 1 respecting this strategy on a scenario fairly closed to yours ;-) Ce.D > -- > [ Milosz "Krashan" Krajewski ][ mail: vilge@mud.org.pl, UIN: 1319535 ] > [ inet: Vilge, Vilgefortz ][ www: http://www.ds2.pg.gda.pl/~krash ] > > ---------------------------(end of broadcast)--------------------------- > TIP 2: you can get off all lists at once with the unregister command > (send "unregister YourEmailAddressHere" to majordomo@postgresql.org) >