Re: [pgsql-hackers-win32] Poor Performance for large queries in functions - Mailing list pgsql-performance
From:      John Meinel
Subject:   Re: [pgsql-hackers-win32] Poor Performance for large queries in functions
Date:
Msg-id:    415A575F.2060802@johnmeinel.com
Responses:
  Re: [pgsql-hackers-win32] Poor Performance for large queries
  Re: [pgsql-hackers-win32] Poor Performance for large queries in functions
List:      pgsql-performance
Tom Lane wrote:
> John Meinel <john@johnmeinel.com> writes:
>> ... However, if I try to bundle this query up into a server side
>> function, it runs very slow (10 seconds). I'm trying to figure out
>> why, but since I can't run EXPLAIN ANALYZE inside a function, I don't
>> really know what else to do.
>
> A parameterized query inside a function is basically the same as a
> PREPARE'd query with parameters at the SQL level.  So you can
> investigate what's happening here with
>
>     PREPARE foo(int) AS
>       SELECT * FROM object WHERE id IN (
>         SELECT id FROM data_t WHERE project_id = $1
>         UNION SELECT ... ;
>
>     EXPLAIN ANALYZE EXECUTE foo(48542);
>
> I'm not sure where the problem is either, so please do send along the
> results.
>
> 			regards, tom lane
>
> PS: pgsql-performance would be a more appropriate venue for this
> discussion.

Well, I think I tracked the problem down to the fact that the column does
not have a "not null" constraint on it. Here is a demonstration. Basically,
I have three tables: tobjects, tdata, and tproject. tdata just links
tobjects to tproject, but isn't required to link to tproject. Yes, the real
data has more columns, but this shows the problem.
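To make Tom's point concrete: a parameterized query inside a function is planned the same way as a PREPAREd statement. The sketch below (a hypothetical function, not from the thread; it assumes the tdata table described in this message and PostgreSQL 8.0-era plpgsql quoting) shows the pairing:

```sql
-- Hypothetical function whose body contains a parameterized query.
-- Its internal plan is built once, before $1 has a concrete value.
CREATE FUNCTION getids(int) RETURNS SETOF integer AS '
DECLARE
    r integer;
BEGIN
    FOR r IN SELECT id FROM tdata WHERE project_id = $1 LOOP
        RETURN NEXT r;
    END LOOP;
    RETURN;
END;
' LANGUAGE plpgsql;

-- The SQL-level equivalent; EXPLAIN ANALYZE EXECUTE exposes the plan
-- the function would use, which plain EXPLAIN inside a function cannot.
PREPARE getids_plan(int) AS
    SELECT id FROM tdata WHERE project_id = $1;
EXPLAIN ANALYZE EXECUTE getids_plan(30000);
```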
jfmeinel=> \d tobjects
     Table "public.tobjects"
 Column |  Type   | Modifiers
--------+---------+-----------
 id     | integer | not null
Indexes:
    "tobjects_pkey" primary key, btree (id)

jfmeinel=> \d tproject
     Table "public.tproject"
 Column |  Type   | Modifiers
--------+---------+-----------
 id     | integer | not null
Indexes:
    "tproject_pkey" primary key, btree (id)

jfmeinel=> \d tdata
      Table "public.tdata"
   Column   |  Type   | Modifiers
------------+---------+-----------
 id         | integer | not null
 project_id | integer |
Indexes:
    "tdata_pkey" primary key, btree (id)
    "tdata_project_id_idx" btree (project_id)
Foreign-key constraints:
    "tdata_id_fkey" FOREIGN KEY (id) REFERENCES tobjects(id)
        ON UPDATE CASCADE ON DELETE CASCADE
    "tdata_project_id_fkey" FOREIGN KEY (project_id) REFERENCES tproject(id)
        ON UPDATE CASCADE ON DELETE SET DEFAULT

jfmeinel=> select count(*) from tdata;
 count
--------
 545768

jfmeinel=> select count(*) - count(project_id) from tdata;
 ?column?
----------
      240

So tdata(project_id) is almost completely non-null: of the 545,768 entries,
only 240 are null.

jfmeinel=> prepare myget(int) as select id from tdata
jfmeinel-> where project_id = $1;
PREPARE
jfmeinel=> explain analyze execute myget(30000);
                             QUERY PLAN
--------------------------------------------------------------------
 Seq Scan on tdata  (cost=0.00..9773.10 rows=181923 width=4)
                    (actual time=1047.000..1047.000 rows=0 loops=1)
   Filter: (project_id = $1)
 Total runtime: 1047.000 ms

jfmeinel=> explain analyze select id from tdata where project_id = 30000;
                             QUERY PLAN
-------------------------------------------------------------------------
 Index Scan using tdata_project_id_idx on tdata
     (cost=0.00..4.20 rows=1 width=4)
     (actual time=0.000..0.000 rows=0 loops=1)
   Index Cond: (project_id = 30000)
 Total runtime: 0.000 ms

So notice that when running the select directly it is able to use the
index. But for some reason, with a prepared statement, it is not. Any
ideas?
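One way to see why the prepared plan guesses 181923 rows: when the comparison value is an unknown parameter, the planner cannot consult the column's value distribution and falls back to an average selectivity derived from the statistics (roughly 1/n_distinct for an equality test). A sketch for inspecting those statistics, assuming ANALYZE has been run on tdata:

```sql
-- What the planner knows about project_id without a concrete value.
-- null_frac, n_distinct, and most_common_vals are standard pg_stats columns.
SELECT null_frac, n_distinct, most_common_vals, most_common_freqs
FROM pg_stats
WHERE tablename = 'tdata' AND attname = 'project_id';
```

If n_distinct comes out very low here, 545768 rows divided by that figure would land near the 181923-row estimate, which is more than enough to make a sequential scan look cheaper than the index.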
Since I only have the integers now, I can send the data to someone if they
care to investigate it. It comes to 2.2M as a .tar.bz2, so obviously I'm
not going to spam the list.

If I rewrite myget as:

    prepare myget(int) as select id from tdata where project_id = 30000;

it does the right thing again. So it's something about how a parameter
interacts with an indexed column that contains null values.

Note: I've tried creating a script that generates dummy data to show this
problem and I have failed (it always performed the query correctly). But
this test data definitely shows the problem. And yes, I've vacuum analyzed
all over the place.

John
=:->

PS> I tested this on PostgreSQL 7.4.3, and it did not demonstrate this
problem. I am using PostgreSQL 8.0.0beta2 (probably -dev1).
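A workaround available in this era (a sketch, not something proposed in the thread): inside plpgsql, running the query through EXECUTE with the value interpolated as a literal forces a fresh plan on every call, so the planner sees the actual constant and can pick the index scan. The function name is hypothetical; quote_literal and FOR ... IN EXECUTE are standard plpgsql facilities:

```sql
-- Hypothetical workaround: dynamic SQL replans with the literal value.
CREATE FUNCTION getids_dyn(int) RETURNS SETOF integer AS '
DECLARE
    r RECORD;
BEGIN
    FOR r IN EXECUTE ''SELECT id FROM tdata WHERE project_id = ''
                     || quote_literal($1) LOOP
        RETURN NEXT r.id;
    END LOOP;
    RETURN;
END;
' LANGUAGE plpgsql;
```

The trade-off is a parse-and-plan cost on every call, which is usually negligible next to a half-million-row sequential scan.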