performance problems of a complex query with 2 subselects - Mailing list pgsql-sql

From Juris
Subject performance problems of a complex query with 2 subselects
Date
Msg-id 36BB1581.555A2364@rfb.lv
Whole thread Raw
Responses Re: [SQL] performance problems of a complex query with 2 subselects  (pierre@desertmoon.com)
List pgsql-sql
            Hello!

I am quite new to PostgreSQL and was hoping to use it as an alternative
to MS Access for the product catalog software I am developing (the data
is represented in a meta-model).

I fully recreated the environment I had on MS Access (see attachment)
and tried to run the query.

The execution time for this query on MS Access for 40000 records was
approx. 10 seconds.
First the pgsql query on 400 records just seemed to hang. I installed a
newer version of pgsql (6.4.2), created indices on the columns
used until 'explain' showed only 1 'seq search' (couldn't get rid of
that one).
I got the query time on 400 records to be 800ms and decided
to try it on 40000 records, but to no avail. It trashed disk for 5
minutes and then declared it was out of virtual memory (which was
probably
true).

So the question here is why did Access do just fine (I didn't even
create any indices) while Pgsql didn't? What could I be doing wrong?
What can I improve? Any more comments?

I've heard Access keeps the whole .mdb file in memory that's why for
small databases it is faster... Could that be the answer?

                            Best regards,
                                Juris.
P.S. Here come my SQL statements and 'explain' output:

Create Table C (C_id int, C_name Char(20));
Create Table CP (C_id int, P_id int);
Create Table OPV (O_id int, P_id int, OPV_value char(20));
Create Table O (O_id int, C_id int);
Create Table P (P_id int, P_name char(20));

Insert Into P (P_id, P_name) Values (1, 'Name');
Insert Into P (P_id, P_name) Values (2, 'Manufacturer');
Insert Into P (P_id, P_name) Values (3, 'Price');
Insert Into P (P_id, P_name) Values (4, 'URL');
Insert Into C (C_Id, C_Name) Values (1, 'Trivial');
Insert Into CP (C_id, P_id) Values (1, 1);
Insert Into CP (C_id, P_id) Values (1, 2);
Insert Into CP (C_id, P_id) Values (1, 3);
Insert Into CP (C_id, P_id) Values (1, 4);

Inserted 40000 records (4 properties for each of 10000 objects) into OPV
(I have a python script which does that).

create index opv_value_idx on opv (opv_value);
create index opv_oid_idx on opv (o_id);
create index opv_pid_idx on opv (p_id);
create unique index o_oid_idx on o (o_id);
create unique index p_pid_idx on p (p_id);

select o.o_id, p.p_id, opv.opv_value from o, p, opv where
        (opv.o_id = o.o_id) And
        (opv.p_id = p.p_id) And
        (p.p_id in (1, 2, 3, 4)) And
        (o.o_id in
                (select opv.o_id from opv where opv.p_id = 2 and
                 opv.opv_value = 'GigaCorp Inc.'
                )
        ) And
        (o.o_id in
                (select opv.o_id from opv where opv.p_id = 3 and
                 int4( trim(opv.opv_value)  ) >= 100
                )
        ) And
        (o.o_id in
                (select opv.o_id from opv where opv.p_id = 3 and
                 int4( trim(opv.opv_value)  ) <= 800
                )
        )
order by o.o_id;

Sort  (cost=3460.13 size=0 width=0)
  ->  Hash Join  (cost=3460.13 size=30000 width=28)
        ->  Nested Loop  (cost=751.13 size=30000 width=24)
              ->  Seq Scan on p  (cost=1.13 size=3 width=4)
              ->  Index Scan using opv_pid_idx on opv  (cost=250.00
size=40000 width=20)
        ->  Hash  (cost=0.00 size=0 width=0)
              ->  Seq Scan on o  (cost=394.00 size=10000 width=4)
                    SubPlan
                      ->  Index Scan using opv_pid_idx on opv
(cost=250.00 size=401 width=4)
                      ->  Index Scan using opv_pid_idx on opv
(cost=250.00 size=2001 width=4)
                      ->  Index Scan using opv_pid_idx on opv
(cost=250.00 size=2001 width=4)

pgsql-sql by date:

Previous
From: Patrik Kudo
Date:
Subject: Function returning multiple rows?
Next
From: pierre@desertmoon.com
Date:
Subject: Re: [SQL] performance problems of a complex query with 2 subselects