Re: Why are queries with subselects so slow? - Mailing list pgsql-general

From shreedhar
Subject Re: Why are queries with subselects so slow?
Date
Msg-id 00df01c2e49c$86ec4f20$1201a8c0@a4005
Whole thread Raw
In response to Why are queries with subselects so slow?  ("Robert Wille" <robertw@willeweb.com>)
List pgsql-general
Hi,

A subselect like this can force the planner to re-scan a materialized temporary result for every outer row, which makes the process slow.

On small data sets, subselects and joins do not show much difference, but as the data grows the subselect becomes very slow.

Rewriting the query as a join should make it much faster.
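As a sketch of that advice, the original update could be expressed with PostgreSQL's UPDATE ... FROM join syntax instead of the IN subselect (untested against the poster's data; the constants are taken from the original statement):

```sql
-- Join rewrite of the poster's update: restrict image rows via a join
-- to ancestry rather than an IN (subselect). UPDATE ... FROM is a
-- PostgreSQL extension.
UPDATE image
SET state = ((state & -35184340369664::int8) | 0::int8)
FROM ancestry
WHERE image.containerid = ancestry.containerid
  AND ancestry.ancestorid = 1228067;
```

This lets the planner drive the update from a join plan rather than probing the subselect result once per image row.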
 
Regards,
Sreedhar
----- Original Message -----
Sent: Friday, March 07, 2003 11:08 AM
Subject: [GENERAL] Why are queries with subselects so slow?

I am trying to execute an update that has a subselect, and it takes an extraordinarily long time.
By performing the subselect in my application, and issuing the update multiple times, it appears that I can do the entire update about 400 times faster.
 
Following is a detailed description of the problem.
 
I am running Postgres 7.2.3 on Linux, and I have a freshly vacuum analyze'd database. Here are the tables:
 
create table container(
 containerid integer not null,
 parentid integer not null,
 name varchar(40) not null,
 createtime timestamp not null default current_timestamp,
 type smallint not null,
 primary key (containerid),
 foreign key (parentid) references container (containerid) on delete cascade,
 unique (parentid, name)
);
 
create table image(
 imageid integer not null,
 containerid integer not null,
 name varchar(120) not null,
 createtime timestamp not null default current_timestamp,
 state bigint not null default 0,
 articlescope integer not null default 0,
 primary key (imageid),
 foreign key (containerid) references container (containerid) on delete cascade,
 unique (containerid, name)
);
 
create table ancestry(
 containerid integer not null,
 ancestorid integer not null,
 foreign key (containerid) references container (containerid) on delete cascade,
 foreign key (ancestorid) references container (containerid) on delete cascade,
 unique (containerid, ancestorid),
 unique (ancestorid, containerid)
);
 
The image table has 2043686 rows
The container table has 4016 rows
The ancestry table has 24728 rows
 
I am trying to execute the following update statement:
 
update image set state = ((state & -35184340369664::int8) | 0::int8) where containerid in (select containerid from ancestry where ancestorid = 1228067)
 
This statement will eventually update 766459 of the 2043686 rows.
 
When I explain this statement, I get the following:
 
Seq Scan on image  (cost=0.00..906827616.06 rows=1023222 width=47)
  SubPlan
    ->  Materialize  (cost=443.10..443.10 rows=898 width=4)
          ->  Seq Scan on ancestry  (cost=0.00..443.10 rows=898 width=4)
 
It is predicting that it will take upwards of a billion page fetches to update about one million rows, or nearly a thousand page fetches per row. Obviously, this can be done more efficiently, as my image table is only about 50 megs in size. So in order to understand its execution plan, I asked it to explain smaller parts.
 
If I explain the subselect, I get:
 
explain select containerid from ancestry where ancestorid = 1228067
 
Seq Scan on ancestry  (cost=0.00..443.10 rows=898 width=4)
 
This seems fairly reasonable (the predicted number of rows is very close to the actual number of rows), except that I wonder why it is doing a sequential scan of a table with 25K rows to retrieve less than 4% of them.
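One way to check whether an index plan is even available here (the unique constraint on (ancestorid, containerid) should have created a usable index) is to disable sequential scans for the session and re-run the explain; this is a diagnostic setting, not a fix:

```sql
-- Diagnostic only: forbid seq scans so the planner reveals any index path.
SET enable_seqscan = off;
EXPLAIN SELECT containerid FROM ancestry WHERE ancestorid = 1228067;
-- Remember to restore the default afterwards:
SET enable_seqscan = on;
```

If the cost of the forced index scan is close to the seq-scan cost, the planner's choice on a 25K-row table is unsurprising.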
 
If I explain the cost of doing an update of the image table for one row in the subselect, I get the following:
 
explain update image set state = ((state & -35184340369664::int8) | 0::int8) where containerid=1228533
 
Index Scan using image_containerid_key on image  (cost=0.00..2809.74 rows=862 width=47)
 
Again, this seems reasonable. If I were to do the update for each row in the subselect, it should take about 2.5 million (898 * 2809) page fetches, which is about 400 times fewer page fetches than the original statement. So I should be able to finish this update before I am an old man by executing the subselect and then calling update 865 times.
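The application-side workaround described above might look like this (a sketch; the `:id` placeholder stands for whatever parameter-binding syntax the client library uses):

```sql
-- Step 1: run the subselect once in the application.
SELECT containerid FROM ancestry WHERE ancestorid = 1228067;

-- Step 2: for each containerid returned, issue a single-container update,
-- which uses the index scan shown above.
UPDATE image
SET state = ((state & -35184340369664::int8) | 0::int8)
WHERE containerid = :id;
```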

Can anybody explain why the database performs so poorly in this case? Is there anything I can do besides executing the subselect and then calling update multiple times?
