Thread: performance problems of a complex query with 2 subselects
Hello! I am quite new to PostgreSQL and was hoping to use it as an alternative to MS Access for the product catalog software I am developing (the data is represented in a meta-model). I fully recreated the environment I had in MS Access (see attachment) and tried to run the query.

The execution time for this query in MS Access on 40000 records was approx. 10 seconds. At first, the pgsql query just seemed to hang even on 400 records. I installed a newer version of pgsql (6.4.2) and created indices on the columns used until 'explain' showed only one 'Seq Scan' (I couldn't get rid of that one). That brought the query time on 400 records down to 800 ms, so I decided to try it on 40000 records, but to no avail: it thrashed the disk for 5 minutes and then declared it was out of virtual memory (which was probably true).

So the question here is: why did Access do just fine (I didn't even create any indices there) while pgsql didn't? What could I be doing wrong? What can I improve? Any other comments? I've heard Access keeps the whole .mdb file in memory, which is why it is faster for small databases... Could that be the answer?

Best regards,
Juris.

P.S. Here come my SQL statements and 'explain' output:

Create Table C (C_id int, C_name Char(20));
Create Table CP (C_id int, P_id int);
Create Table OPV (O_id int, P_id int, OPV_value char(20));
Create Table O (O_id int, C_id int);
Create Table P (P_id int, P_name char(20));

Insert Into P (P_id, P_name) Values (1, 'Name');
Insert Into P (P_id, P_name) Values (2, 'Manufacturer');
Insert Into P (P_id, P_name) Values (3, 'Price');
Insert Into P (P_id, P_name) Values (4, 'URL');
Insert Into C (C_Id, C_Name) Values (1, 'Trivial');
Insert Into CP (C_id, P_id) Values (1, 1);
Insert Into CP (C_id, P_id) Values (1, 2);
Insert Into CP (C_id, P_id) Values (1, 3);
Insert Into CP (C_id, P_id) Values (1, 4);

Then I inserted 40000 records (4 properties for each of 10000 objects) into OPV; I have a Python script which does that.

create index opv_value_idx on opv (opv_value);
create index opv_oid_idx on opv (o_id);
create index opv_pid_idx on opv (p_id);
create unique index o_oid_idx on o (o_id);
create unique index p_pid_idx on p (p_id);

select o.o_id, p.p_id, opv.opv_value
from o, p, opv
where (opv.o_id = o.o_id)
  And (opv.p_id = p.p_id)
  And (p.p_id in (1, 2, 3, 4))
  And (o.o_id in (select opv.o_id from opv
                  where opv.p_id = 2
                    and opv.opv_value = 'GigaCorp Inc.'))
  And (o.o_id in (select opv.o_id from opv
                  where opv.p_id = 3
                    and int4( trim(opv.opv_value) ) >= 100))
  And (o.o_id in (select opv.o_id from opv
                  where opv.p_id = 3
                    and int4( trim(opv.opv_value) ) <= 800))
order by o.o_id;

Sort (cost=3460.13 size=0 width=0)
  -> Hash Join (cost=3460.13 size=30000 width=28)
       -> Nested Loop (cost=751.13 size=30000 width=24)
            -> Seq Scan on p (cost=1.13 size=3 width=4)
            -> Index Scan using opv_pid_idx on opv (cost=250.00 size=40000 width=20)
       -> Hash (cost=0.00 size=0 width=0)
            -> Seq Scan on o (cost=394.00 size=10000 width=4)
  SubPlan
    -> Index Scan using opv_pid_idx on opv (cost=250.00 size=401 width=4)
    -> Index Scan using opv_pid_idx on opv (cost=250.00 size=2001 width=4)
    -> Index Scan using opv_pid_idx on opv (cost=250.00 size=2001 width=4)
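(A side note on the query shape, a sketch only and not verified on 6.4.2; the aliases mfr and price are introduced here purely for illustration. Old PostgreSQL releases evaluate each 'IN (subquery)' as a subplan that is re-run against outer rows, which gets expensive fast. Since each object in this data has at most one opv row per p_id, the three subselects can instead be expressed as self-joins on opv:

select o.o_id, p.p_id, opv.opv_value
from o, p, opv, opv mfr, opv price
where opv.o_id = o.o_id
  and opv.p_id = p.p_id
  and p.p_id in (1, 2, 3, 4)
  and mfr.o_id = o.o_id
  and mfr.p_id = 2
  and mfr.opv_value = 'GigaCorp Inc.'
  and price.o_id = o.o_id
  and price.p_id = 3
  and int4( trim(price.opv_value) ) >= 100
  and int4( trim(price.opv_value) ) <= 800
order by o.o_id;

If an object could carry several values for the same property, this join form would produce duplicate rows where the subselect form does not, so a 'select distinct' would then be needed.)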
Re: [SQL] performance problems of a complex query with 2 subselects
From: pierre@desertmoon.com
> Hello!
>
> I am quite new to PostgreSQL and was hoping to use it as an alternative
> to MS Access for the product catalog software I am developing (the data
> is represented in a meta-model).
>
> I fully recreated the environment I had in MS Access (see attachment)
> and tried to run the query.
>
> The execution time for this query in MS Access on 40000 records was
> approx. 10 seconds. At first, the pgsql query just seemed to hang even
> on 400 records. I installed a newer version of pgsql (6.4.2) and
> created indices on the columns used until 'explain' showed only one
> 'Seq Scan' (I couldn't get rid of that one).

You need to 'vacuum analyze <table>' each table you have created, AFTER you've created its indexes, before the SQL optimizer will use the indexes. This should solve most of your speed issues. I also believe this is in the FAQ.

-=pierre
> You need to 'vacuum analyze <table>' each table you have created, AFTER
> you've created its indexes, before the SQL optimizer will use the
> indexes. This should solve most of your speed issues. I also believe
> this is in the FAQ.

I forgot to tell you, but I did do:

vacuum;
vacuum analyze;

It seemed to process all tables, so I thought there was no need to vacuum analyze individual tables. I'll try that, thanks.

Juris.
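(A cheap way to see whether the fresh statistics helped is to 'explain' one of the subselect probes on its own; with up-to-date statistics the planner should pick an index scan over a sequential scan. A minimal check, assuming the data above:

vacuum analyze opv;
explain select o_id from opv where p_id = 2 and opv_value = 'GigaCorp Inc.';
-- with fresh statistics this should use opv_pid_idx or opv_value_idx
-- rather than a Seq Scan on opv
)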