performance problems of a complex query with 2 subselects - Mailing list pgsql-sql
From: Juris
Subject: performance problems of a complex query with 2 subselects
Msg-id: 36BB1581.555A2364@rfb.lv
Responses: Re: [SQL] performance problems of a complex query with 2 subselects
List: pgsql-sql
Hello!

I am quite new to PostgreSQL and was hoping to use it as an alternative to MS Access for the product catalog software I am developing (the data is represented in a meta-model). I fully recreated the environment I had in MS Access (see attachment) and tried to run the query. The execution time for this query in MS Access on 40000 records was approx. 10 seconds.

At first the pgsql query on 400 records just seemed to hang. I installed a newer version of pgsql (6.4.2) and created indices on the columns used until 'explain' showed only one 'seq scan' (I couldn't get rid of that one). I got the query time on 400 records down to 800 ms and decided to try it on 40000 records, but to no avail: it thrashed the disk for 5 minutes and then declared it was out of virtual memory (which was probably true).

So the question is: why did Access do just fine (I didn't even create any indices) while PostgreSQL didn't? What could I be doing wrong? What can I improve? Any other comments? I've heard Access keeps the whole .mdb file in memory, which is why it is faster for small databases... Could that be the answer?

Best regards,
Juris.

P.S. Here are my SQL statements and 'explain' output:

Create Table C (C_id int, C_name char(20));
Create Table CP (C_id int, P_id int);
Create Table OPV (O_id int, P_id int, OPV_value char(20));
Create Table O (O_id int, C_id int);
Create Table P (P_id int, P_name char(20));

Insert Into P (P_id, P_name) Values (1, 'Name');
Insert Into P (P_id, P_name) Values (2, 'Manufacturer');
Insert Into P (P_id, P_name) Values (3, 'Price');
Insert Into P (P_id, P_name) Values (4, 'URL');
Insert Into C (C_id, C_name) Values (1, 'Trivial');
Insert Into CP (C_id, P_id) Values (1, 1);
Insert Into CP (C_id, P_id) Values (1, 2);
Insert Into CP (C_id, P_id) Values (1, 3);
Insert Into CP (C_id, P_id) Values (1, 4);

Then I inserted 40000 records (4 properties for each of 10000 objects) into OPV (I have a Python script which does that).
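[The population script is not attached to the post. A minimal sketch of what it might look like, in Python against an in-memory sqlite3 database as a stand-in for the original PostgreSQL backend; the object count matches the post, but the property values themselves are hypothetical:]

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("Create Table OPV (O_id int, P_id int, OPV_value char(20))")

# Hypothetical values for the 4 properties; the real data set is not shown.
NUM_OBJECTS = 10000
PROPERTIES = {1: "Widget", 2: "GigaCorp Inc.", 3: "500", 4: "http://example.com"}

# 4 properties for each of 10000 objects -> 40000 OPV rows.
for o_id in range(1, NUM_OBJECTS + 1):
    for p_id, value in PROPERTIES.items():
        cur.execute(
            "Insert Into OPV (O_id, P_id, OPV_value) Values (?, ?, ?)",
            (o_id, p_id, value),
        )
conn.commit()

count = cur.execute("Select count(*) From OPV").fetchone()[0]
print(count)  # 40000
```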
create index opv_value_idx on opv (opv_value);
create index opv_oid_idx on opv (o_id);
create index opv_pid_idx on opv (p_id);
create unique index o_oid_idx on o (o_id);
create unique index p_pid_idx on p (p_id);

select o.o_id, p.p_id, opv.opv_value
from o, p, opv
where (opv.o_id = o.o_id)
  and (opv.p_id = p.p_id)
  and (p.p_id in (1, 2, 3, 4))
  and (o.o_id in (select opv.o_id from opv
                  where opv.p_id = 2 and opv.opv_value = 'GigaCorp Inc.'))
  and (o.o_id in (select opv.o_id from opv
                  where opv.p_id = 3 and int4( trim(opv.opv_value) ) >= 100))
  and (o.o_id in (select opv.o_id from opv
                  where opv.p_id = 3 and int4( trim(opv.opv_value) ) <= 800))
order by o.o_id;

Sort (cost=3460.13 size=0 width=0)
  -> Hash Join (cost=3460.13 size=30000 width=28)
       -> Nested Loop (cost=751.13 size=30000 width=24)
            -> Seq Scan on p (cost=1.13 size=3 width=4)
            -> Index Scan using opv_pid_idx on opv (cost=250.00 size=40000 width=20)
       -> Hash (cost=0.00 size=0 width=0)
            -> Seq Scan on o (cost=394.00 size=10000 width=4)
  SubPlan
    -> Index Scan using opv_pid_idx on opv (cost=250.00 size=401 width=4)
    -> Index Scan using opv_pid_idx on opv (cost=250.00 size=2001 width=4)
    -> Index Scan using opv_pid_idx on opv (cost=250.00 size=2001 width=4)
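[A rewrite often suggested for IN (subselect) performance trouble in PostgreSQL releases of that era is turning each subselect into an extra join against OPV; the two p_id = 3 subselects also collapse into a single BETWEEN. A sketch of that rewrite, checked here against sqlite3 with a tiny hypothetical two-object data set rather than the real 40000 rows (cast(... as int) stands in for the int4() cast):]

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
Create Table O (O_id int, C_id int);
Create Table P (P_id int, P_name char(20));
Create Table OPV (O_id int, P_id int, OPV_value char(20));
Insert Into P Values (1,'Name'),(2,'Manufacturer'),(3,'Price'),(4,'URL');
Insert Into O Values (1,1),(2,1);
-- Hypothetical data: object 1 passes both filters; object 2 has the
-- wrong manufacturer and should be excluded.
Insert Into OPV Values
  (1,1,'Widget'),(1,2,'GigaCorp Inc.'),(1,3,'500'),(1,4,'http://a'),
  (2,1,'Gadget'),(2,2,'Other Corp'),(2,3,'500'),(2,4,'http://b');
""")

# Each IN (subselect) becomes a join against a filtered copy of OPV:
# m restricts on manufacturer, pr restricts the price range.
rows = cur.execute("""
Select o.o_id, p.p_id, opv.opv_value
From o, p, opv, opv m, opv pr
Where opv.o_id = o.o_id
  And opv.p_id = p.p_id
  And p.p_id in (1, 2, 3, 4)
  And m.o_id = o.o_id And m.p_id = 2 And m.opv_value = 'GigaCorp Inc.'
  And pr.o_id = o.o_id And pr.p_id = 3
  And cast(trim(pr.opv_value) as int) Between 100 And 800
Order by o.o_id, p.p_id
""").fetchall()
print(rows)  # only object 1, with its 4 property rows
```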