Thread: Confirmation of bad query plan generated by 7.4 tree
Just so I don't think I'm insane:

warehouse=# explain analyze select e.event_date::date
warehouse-# from l_event_log e
warehouse-# JOIN c_event_type t ON (t.id = e.event_type_id)
warehouse-# WHERE e.event_date > now() - interval '2 days'
warehouse-# AND t.event_name = 'activation';
                                                                      QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=9.22..2723869.56 rows=268505 width=8) (actual time=107.324..408.466 rows=815 loops=1)
   Hash Cond: ("outer".event_type_id = "inner".id)
   ->  Index Scan using idx_evt_dt on l_event_log e  (cost=0.00..2641742.75 rows=15752255 width=12) (actual time=0.034..229.641 rows=38923 loops=1)
         Index Cond: (event_date > (now() - '2 days'::interval))
   ->  Hash  (cost=9.21..9.21 rows=3 width=4) (actual time=0.392..0.392 rows=0 loops=1)
         ->  Index Scan using pk_c_event_type on c_event_type t  (cost=0.00..9.21 rows=3 width=4) (actual time=0.071..0.353 rows=6 loops=1)
               Filter: ((event_name)::text = 'activation'::text)
 Total runtime: 412.015 ms
(8 rows)

Am I correct in assuming this terrible plan is due to our ancient
version of Postgres?  This plan is so bad, the system prefers a
sequence scan on our 27M row table with dates spanning 4 years.  Two
days' worth of data should come back instantly.  Both tables are
freshly vacuumed and analyzed, so I'll just chalk this up to 7.4
sucking unless someone says otherwise.

--
Shaun Thomas
Database Administrator

Leapfrog Online
807 Greenwood Street
Evanston, IL 60201
Tel. 847-440-8253
Fax. 847-570-5750
www.leapfrogonline.com

Confidentiality Note: The document(s) accompanying this e-mail
transmission, if any, and the e-mail transmittal message contain
information from Leapfrog Online Customer Acquisition, LLC that is
confidential or privileged. The information is intended to be for the
use of the individual(s) or entity(ies) named on this e-mail
transmission message. If you are not the intended recipient, be aware
that any disclosure, copying, distribution or use of the contents of
this e-mail is prohibited. If you have received this e-mail in error,
please immediately delete this e-mail and notify us by telephone of
the error.
"Shaun Thomas" <sthomas@leapfrogonline.com> writes: > Am I correct in assuming this terrible plan is due to our ancient > version of Postgres? I missed the part where you explain why you think this plan is terrible? 412ms for what seems a rather expensive query doesn't sound so awful. Do you know an alternative that is better? regards, tom lane
> warehouse-# WHERE e.event_date > now() - interval '2 days'

Try explicitly querying:

    WHERE e.event_date > '2006-06-11 20:15:00'

In my understanding, 7.4 does not precalculate this timestamp value for
the purpose of choosing a plan.

Greetings
Marcin
>>> On 6/13/2006 at 1:09 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> I missed the part where you explain why you think this plan is
> terrible?  412ms for what seems a rather expensive query doesn't
> sound so awful.

Sorry, I based that statement on the estimated/actual disparity. That
particular query plan is not terrible in its results, but look at the
estimates and how viciously the explain analyze corrects the values.
Here's an example:

   ->  Index Scan using idx_evt_dt on l_event_log e
       (cost=0.00..2641742.75 rows=15752255 width=12)
       (actual time=0.034..229.641 rows=38923 loops=1)

rows=15752255?  That's over half the 27M row table.  As expected, the
*actual* match is much, much lower at 38923.  As it turns out, Marcin
was right.  Simply changing:

    now() - interval '2 days'

to

    '2006-06-11 15:30:00'

generated a much more accurate set of estimates.  I have to assume that
7.4 is incapable of that optimization step.  Now that I know this, I
plan on modifying my stored proc to calculate the value before
inserting it into the query.

Thanks!

--
Shaun Thomas
"Shaun Thomas" <sthomas@leapfrogonline.com> writes: > Simply changing: > now() - interval '2 days' > to > '2006-06-11 15:30:00' > generated a much more accurate set of estimates. Yeah, 7.4 won't risk basing estimates on the results of non-immutable functions. We relaxed that in 8.0 I believe. regards, tom lane
On Tue, Jun 13, 2006 at 03:54:44PM -0500, Shaun Thomas wrote:
> Sorry, I based that statement on the estimated/actual disparity. That
> particular query plan is not terrible in its results, but look at the
> estimates and how viciously the explain analyze corrects the values.
> Here's an example:
>
>    ->  Index Scan using idx_evt_dt on l_event_log e
>        (cost=0.00..2641742.75 rows=15752255 width=12)
>        (actual time=0.034..229.641 rows=38923 loops=1)
>
> rows=15752255?  That's over half the 27M row table.  As expected, the
> *actual* match is much, much lower at 38923.  As it turns out, Marcin
> was right.  Simply changing:
>
>     now() - interval '2 days'
>
> to
>
>     '2006-06-11 15:30:00'
>
> generated a much more accurate set of estimates.  I have to assume
> that 7.4 is incapable of that optimization step.  Now that I know
> this, I plan on modifying my stored proc to calculate the value
> before inserting it into the query.

Is there some compelling reason to stick with 7.4? In my experience
you'll see around double (+100%) the performance going to 8.1...

Also, I'm not sure that the behavior is entirely changed, either. On an
8.1.4 database I'm still seeing a difference between now() - interval
and a hard-coded date.

What's your stats target set to for that table?

> Leapfrog Online
> 807 Greenwood Street
> Evanston, IL 60201

Heh, I grew up 3 miles from there. In fact, IIRC my old dentist is/was
at 807 Davis.

--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461
>>> On 6/13/2006 at 4:13 PM, "Jim C. Nasby" <jnasby@pervasive.com> wrote:

> Is there some compelling reason to stick with 7.4? In my experience
> you'll see around double (+100%) the performance going to 8.1...

Not really. We *really* want to upgrade, but we're in the middle of
buying the new machine right now. There's also the issue of migrating
37GB of data, which I don't look forward to, considering we'll need to
set up Slony replication for the entire thing to avoid the hours of
downtime necessary for a full dump/restore.

> What's your stats target set to for that table?

Not sure what you mean by that. It's just that this table has 27M rows
extending over 4 years, and I'm not quite sure how to hint at that. An
index scan for a few days would be a tiny fraction of the entire table,
so PG being insistent on the sequence scans was confusing the hell out
of me.

--
Shaun Thomas
On Tue, Jun 13, 2006 at 04:35:41PM -0500, Shaun Thomas wrote:
> Not really. We *really* want to upgrade, but we're in the middle of
> buying the new machine right now. There's also the issue of migrating
> 37GB of data which I don't look forward to, considering we'll need to
> set up a slony replication for the entire thing to avoid the hours
> of downtime necessary for a full dump/restore.

As long as the master isn't very heavily loaded it shouldn't be that
big a deal to do so...

> > What's your stats target set to for that table?
>
> Not sure what you mean by that. It's just that this table has 27M
> rows extending over 4 years, and I'm not quite sure how to hint to
> that. An index scan for a few days would be a tiny fraction of the
> entire table, so PG being insistent on the sequence scans was
> confusing the hell out of me.

What's the output of

    SELECT attname, attstattarget
    FROM pg_attribute
    WHERE attrelid='table_name'::regclass AND attnum >= 0;

and

    SHOW default_statistics_target;

?

--
Jim C. Nasby
"Jim C. Nasby" <jnasby@pervasive.com> writes: > Also, I'm not sure that the behavior is entirely changed, either. On a > 8.1.4 database I'm still seeing a difference between now() - interval > and a hard-coded date. It'd depend on the context, possibly, but it's easy to show that the current planner does fold "now() - interval_constant" when making estimates. Simple example: -- create and populate 1000-row table: regression=# create table t1 (f1 timestamptz); CREATE TABLE regression=# insert into t1 select now() - x * interval '1 day' from generate_series(1,1000) x; INSERT 0 1000 -- default estimate is pretty awful: regression=# explain select * from t1 where f1 > now(); QUERY PLAN ----------------------------------------------------- Seq Scan on t1 (cost=0.00..39.10 rows=647 width=8) Filter: (f1 > now()) (2 rows) regression=# vacuum t1; VACUUM -- now the planner at least knows how many rows in the table with some -- accuracy, but with no stats it's still falling back on a default -- selectivity estimate: regression=# explain select * from t1 where f1 > now(); QUERY PLAN ----------------------------------------------------- Seq Scan on t1 (cost=0.00..21.00 rows=333 width=8) Filter: (f1 > now()) (2 rows) -- and the default doesn't really care what the comparison value is: regression=# explain select * from t1 where f1 > now() - interval '10 days'; QUERY PLAN ----------------------------------------------------- Seq Scan on t1 (cost=0.00..23.50 rows=333 width=8) Filter: (f1 > (now() - '10 days'::interval)) (2 rows) -- but let's give it some stats: regression=# vacuum analyze t1; VACUUM -- and things get better: regression=# explain select * from t1 where f1 > now() - interval '10 days'; QUERY PLAN --------------------------------------------------- Seq Scan on t1 (cost=0.00..23.50 rows=9 width=8) Filter: (f1 > (now() - '10 days'::interval)) (2 rows) 7.4 would still be saying "rows=333" in the last case, because it's falling back on DEFAULT_INEQ_SEL whenever the 
comparison value isn't strictly constant. regards, tom lane
On Tue, Jun 13, 2006 at 06:04:42PM -0400, Tom Lane wrote:
> It'd depend on the context, possibly, but it's easy to show that the
> current planner does fold "now() - interval_constant" when making
> estimates.  Simple example:

Turns out the difference is between feeding a date vs a timestamp into
the query... I would have thought that since date is a date, the WHERE
clause would be cast to a date if it was a timestamptz, but I guess
not...

stats=# explain select * from email_contrib where project_id=8 and date >= now()-'15 days'::interval;
                                                 QUERY PLAN
----------------------------------------------------------------------------------------------------------
 Index Scan using email_contrib__project_date on email_contrib  (cost=0.01..45405.83 rows=14225 width=24)
   Index Cond: ((project_id = 8) AND (date >= (now() - '15 days'::interval)))
(2 rows)

stats=# explain select * from email_contrib where project_id=8 AND date >= '2006-05-29 22:09:56.814897+00'::date;
                                                 QUERY PLAN
----------------------------------------------------------------------------------------------------------
 Index Scan using email_contrib__project_date on email_contrib  (cost=0.00..48951.74 rows=15336 width=24)
   Index Cond: ((project_id = 8) AND (date >= '2006-05-29'::date))
(2 rows)

stats=# explain select * from email_contrib where project_id=8 AND date >= '2006-05-29 22:09:56.814897+00'::timestamp;
                                                 QUERY PLAN
----------------------------------------------------------------------------------------------------------
 Index Scan using email_contrib__project_date on email_contrib  (cost=0.00..45472.76 rows=14246 width=24)
   Index Cond: ((project_id = 8) AND (date >= '2006-05-29 22:09:56.814897'::timestamp without time zone))
(2 rows)

Actual row count is 109071; the reason for the vast difference is
querying on two columns. I know coming up with general-purpose
multicolumn stats is extremely difficult, but can't we at least add
histograms for multi-column indexes?? In this case that would most
likely make the estimate dead-on, because there's an index on
project_id, date.

Details below for the morbidly curious/bored...

stats=# \d email_contrib
 Table "public.email_contrib"
   Column   |  Type   | Modifiers
------------+---------+-----------
 project_id | integer | not null
 id         | integer | not null
 date       | date    | not null
 team_id    | integer |
 work_units | bigint  | not null
Indexes:
    "email_contrib_pkey" PRIMARY KEY, btree (project_id, id, date), tablespace "raid10"
    "email_contrib__pk24" btree (id, date) WHERE project_id = 24, tablespace "raid10"
    "email_contrib__pk25" btree (id, date) WHERE project_id = 25, tablespace "raid10"
    "email_contrib__pk8" btree (id, date) WHERE project_id = 8, tablespace "raid10"
    "email_contrib__project_date" btree (project_id, date), tablespace "raid10"
    "email_contrib__project_id" btree (project_id), tablespace "raid10"
    "email_contrib__team_id" btree (team_id), tablespace "raid10"
Foreign-key constraints:
    "fk_email_contrib__id" FOREIGN KEY (id) REFERENCES stats_participant(id) ON UPDATE CASCADE
    "fk_email_contrib__team_id" FOREIGN KEY (team_id) REFERENCES stats_team(team) ON UPDATE CASCADE
Tablespace: "raid10"

stats=# explain analyze select * from email_contrib where project_id=8 and date >= now()-'15 days'::interval;
 Index Scan using email_contrib__project_date on email_contrib  (cost=0.01..45475.95 rows=14247 width=24) (actual time=0.294..264.345 rows=109071 loops=1)
   Index Cond: ((project_id = 8) AND (date >= (now() - '15 days'::interval)))
 Total runtime: 412.167 ms
(3 rows)

stats=# select now()-'15 days'::interval;
           ?column?
-------------------------------
 2006-05-29 22:09:56.814897+00
(1 row)

stats=# explain analyze select * from email_contrib where project_id=8 and date >= '2006-05-29 22:09:56.814897+00';
 Index Scan using email_contrib__project_date on email_contrib  (cost=0.00..48951.74 rows=15336 width=24) (actual time=0.124..229.800 rows=116828 loops=1)
   Index Cond: ((project_id = 8) AND (date >= '2006-05-29'::date))
 Total runtime: 391.240 ms
(3 rows)

stats=# explain select * from email_contrib where project_id=8 and date >= '2006-05-29 22:09:56.814897+00'::date;
 Index Scan using email_contrib__project_date on email_contrib  (cost=0.00..48951.74 rows=15336 width=24)
   Index Cond: ((project_id = 8) AND (date >= '2006-05-29'::date))
(2 rows)

So casting to date doesn't change anything, but dropping project_id from
the where clause certainly does...

stats=# explain analyze select * from email_contrib where date >= now()-'15 days'::interval;
 Bitmap Heap Scan on email_contrib  (cost=847355.98..1256538.96 rows=152552 width=24) (actual time=74886.028..75267.633 rows=148894 loops=1)
   Recheck Cond: (date >= (now() - '15 days'::interval))
   ->  Bitmap Index Scan on email_contrib__project_date  (cost=0.00..847355.98 rows=152552 width=0) (actual time=74885.690..74885.690 rows=148894 loops=1)
         Index Cond: (date >= (now() - '15 days'::interval))
 Total runtime: 75472.490 ms
(5 rows)

That estimate is dead-on. So it appears it's yet another case of
cross-column stats. :(

But there's still a difference between now()-interval and something
hard-coded:

stats=# explain analyze select * from email_contrib where date >= '2006-05-29 22:09:56.814897+00'::date;
 Bitmap Heap Scan on email_contrib  (cost=847355.98..1278756.22 rows=164256 width=24) (actual time=19356.752..19623.450 rows=159348 loops=1)
   Recheck Cond: (date >= '2006-05-29'::date)
   ->  Bitmap Index Scan on email_contrib__project_date  (cost=0.00..847355.98 rows=164256 width=0) (actual time=19356.391..19356.391 rows=159348 loops=1)
         Index Cond: (date >= '2006-05-29'::date)
 Total runtime: 19841.614 ms
(5 rows)

stats=# explain analyze select * from email_contrib where date >= (now()-'15 days'::interval)::date;
 Bitmap Heap Scan on email_contrib  (cost=847355.98..1279988.15 rows=164256 width=24) (actual time=19099.417..19372.167 rows=159348 loops=1)
   Recheck Cond: (date >= ((now() - '15 days'::interval))::date)
   ->  Bitmap Index Scan on email_contrib__project_date  (cost=0.00..847355.98 rows=164256 width=0) (actual time=19099.057..19099.057 rows=159348 loops=1)
         Index Cond: (date >= ((now() - '15 days'::interval))::date)
 Total runtime: 19589.785 ms
(5 rows)

Aha! It's the casting to date that changes things. The stats target is
100...

stats=# select attname, n_distinct from pg_stats where tablename='email_contrib';
  attname   | n_distinct
------------+------------
 project_id |          6
 team_id    |       4104
 work_units |       6795
 date       |       3034
 id         |      35301
(5 rows)

The n_distinct for project_id and date both look about right.
stats=# select * from pg_stats where tablename='email_contrib' and attname='project_id';
-[ RECORD 1 ]-----+------------------------------------------------------------
schemaname        | public
tablename         | email_contrib
attname           | project_id
null_frac         | 0
avg_width         | 4
n_distinct        | 6
most_common_vals  | {205,5,8,25,24,3}
most_common_freqs | {0.4273,0.419833,0.0933667,0.0514667,0.00506667,0.00296667}
histogram_bounds  |
correlation       | 0.605662

stats=# select relpages, reltuples from pg_class where relname='email_contrib';
 relpages |  reltuples
----------+-------------
   996524 | 1.35509e+08
(1 row)

If we look at how many rows would match project_id 8 and any 15 dates...

stats=# SELECT 1.35509e+08 * 0.0933667 / 3034 * 15;
        ?column?
------------------------
 62551.2268472313777195
(1 row)

...we come up with something much closer to reality (116828 rows). I
guess the problem is in the histogram for date, where the last 3 values
are: 2005-11-02, 2006-03-05, 2006-06-11.

--
Jim C. Nasby
>>> On 6/13/2006 at 4:54 PM, "Jim C. Nasby" <jnasby@pervasive.com> wrote:

> SELECT attname, attstattarget
> FROM pg_attribute
> WHERE attrelid='table_name'::regclass AND attnum >= 0;

-1 for all values.

> SHOW default_statistics_target;

10.

Here's something slightly annoying: I tried precalculating the value in
my stored proc, and it's still ignoring it.

    lastTime := now() - interval ''7 days'';

    UPDATE fact_credit_app
       SET activated_date_id = ad.date_id
      FROM l_event_log e
      JOIN c_event_type t ON (t.id = e.event_type_id)
      JOIN wf_date ad ON (e.event_date::date = ad.datestamp)
     WHERE e.ext_id = fact_credit_app.unique_id
       AND t.event_name = ''activation''
       AND e.event_date > lastTime
       AND fact_credit_app.activated_date_id IS NULL;

Instead of taking a handful of seconds (like when I replace lastTime
with the text equivalent), it takes 10 minutes... I can see the planner
not liking the results of a function, but a variable? That's a static
value! ::cry::

--
Shaun Thomas
On Tue, Jun 13, 2006 at 05:41:06PM -0500, Shaun Thomas wrote:
> > SELECT attname, attstattarget
> > FROM pg_attribute
> > WHERE attrelid='table_name'::regclass AND attnum >= 0;
>
> -1 for all values.
>
> > SHOW default_statistics_target;
>
> 10.

Increasing the statistics target for that table (or
default_statistics_target) might help. I'd try between 50 and 100.

> Here's something slightly annoying: I tried precalculating the value
> in my stored proc, and it's still ignoring it.
>
>     lastTime := now() - interval ''7 days'';
>
>     UPDATE fact_credit_app
>        SET activated_date_id = ad.date_id
>       FROM l_event_log e
>       JOIN c_event_type t ON (t.id = e.event_type_id)
>       JOIN wf_date ad ON (e.event_date::date = ad.datestamp)
>      WHERE e.ext_id = fact_credit_app.unique_id
>        AND t.event_name = ''activation''
>        AND e.event_date > lastTime
>        AND fact_credit_app.activated_date_id IS NULL;
>
> Instead of taking a handful of seconds (like when I replace lastTime
> with the text equivalent), it takes 10 minutes... I can see the
> planner not liking the results of a function, but a variable? That's
> a static value! ::cry::

If you're using plpgsql, it should be turning that update into a
prepared statement and then binding the variable to it. That means that
if you pass in different values in the same session, you could end up
with bad plans depending on the values, since it will cache the query
plan. Actually, come to think of it... I'm not sure if bound parameters
are used in query planning...

--
Jim C. Nasby
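[For anyone following along, the statistics target is set per column and only takes effect at the next ANALYZE; the sketch below uses the thread's table name, and 100 is just the upper end of Jim's suggested range.]

```sql
-- Collect a finer-grained histogram for the column the planner
-- is misjudging:
ALTER TABLE l_event_log ALTER COLUMN event_date SET STATISTICS 100;

-- Rebuild the stats so the new target is actually used:
ANALYZE l_event_log;
```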
"Jim C. Nasby" <jnasby@pervasive.com> writes: > On Tue, Jun 13, 2006 at 06:04:42PM -0400, Tom Lane wrote: >> It'd depend on the context, possibly, but it's easy to show that the >> current planner does fold "now() - interval_constant" when making >> estimates. Simple example: > Turns out the difference is between feeding a date vs a timestamp into the > query... I would have thought that since date is a date that the WHERE clause > would be casted to a date if it was a timestamptz, but I guess not... Hmm ... worksforme. Could you provide a complete test case? regards, tom lane
"Shaun Thomas" <sthomas@leapfrogonline.com> writes: > I can see the planner not liking the results of a function, > but a variable? That's a static value! Read what you wrote, and rethink... If you're desperate you can construct a query string with the variable value embedded as a literal, and then EXECUTE that. This isn't a great solution since it forces a re-plan on every execution. We've occasionally debated ways to do it better, but no such improvement will ever appear in 7.4 ;-) regards, tom lane
On Tue, Jun 13, 2006 at 09:50:49PM -0400, Tom Lane wrote:
> "Jim C. Nasby" <jnasby@pervasive.com> writes:
> > Turns out the difference is between feeding a date vs a timestamp
> > into the query... I would have thought that since date is a date
> > that the WHERE clause would be cast to a date if it was a
> > timestamptz, but I guess not...
>
> Hmm ... worksforme.  Could you provide a complete test case?

I can't provide the data I used for that, but I'll try and come up with
something else.

--
Jim C. Nasby
>>> On 6/13/2006 at 9:13 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Read what you wrote, and rethink...

Hah. Yes, I understand the irony of that statement, but the point is
that the value of the variable won't change during query execution.

> If you're desperate you can construct a query string with the
> variable value embedded as a literal, and then EXECUTE that. This
> isn't a great solution since it forces a re-plan on every execution.

That's so gross... but it might work. I'm not really desperate, just
frustrated. I really can't wait until we can upgrade; 7.4 is driving me
nuts. I'm not really worried about a re-plan, since this SP just
updates a fact table, so it only gets called twice a day. Cutting the
execution time of the SP down to under 20 seconds from 15 minutes would
be nice, but not absolutely required. I was just surprised at the large
difference between manual execution and the SP with the same query.

> We've occasionally debated ways to do it better, but no such
> improvement will ever appear in 7.4 ;-)

Agreed! When we finally upgrade, I fully plan on putting a symbolic
bullet into our old installation. ;)

Thanks!

--
Shaun Thomas
On Jun 13, 2006, at 8:50 PM, Tom Lane wrote:
> Hmm ... worksforme.  Could you provide a complete test case?

decibel=# create table date_test(d date not null, i int not null);
CREATE TABLE
decibel=# insert into date_test
decibel-#   select now() - x * '1 day'::interval, i
decibel-#   from generate_series(0,3000) x, generate_series(1,100000) i;
INSERT 0 300100000
decibel=# analyze verbose date_test;
INFO:  analyzing "decibel.date_test"
INFO:  "date_test": scanned 30000 of 1622163 pages, containing 5550000 live rows and 0 dead rows; 30000 rows in sample, 300100155 estimated total rows
ANALYZE
decibel=# explain select * from date_test where d >= now()-'15 days'::interval;
                             QUERY PLAN
---------------------------------------------------------------------
 Seq Scan on date_test  (cost=0.00..6873915.80 rows=1228164 width=8)
   Filter: (d >= (now() - '15 days'::interval))
(2 rows)

decibel=# explain select * from date_test where d >= (now()-'15 days'::interval)::date;
                             QUERY PLAN
---------------------------------------------------------------------
 Seq Scan on date_test  (cost=0.00..7624166.20 rows=1306467 width=8)
   Filter: (d >= ((now() - '15 days'::interval))::date)
(2 rows)

decibel=# select version();
                                             version
-------------------------------------------------------------------------------------------------
 PostgreSQL 8.1.4 on amd64-portbld-freebsd6.0, compiled by GCC cc (GCC) 3.4.4 [FreeBSD] 20050518
(1 row)

--
Jim C. Nasby
Jim Nasby <jnasby@pervasive.com> writes:
> On Jun 13, 2006, at 8:50 PM, Tom Lane wrote:
>> Hmm ... worksforme.  Could you provide a complete test case?

> decibel=# create table date_test(d date not null, i int not null);
> [etc]

Not sure what you are driving at.  The estimates are clearly not
defaults (the default estimate would be 1/3rd of the table, or about
100mil rows).  Are you expecting them to be the same?  If so why?
The comparison values are slightly different after all.

			regards, tom lane
On Wed, Jun 14, 2006 at 10:36:55PM -0400, Tom Lane wrote:
> Not sure what you are driving at.  The estimates are clearly not
> defaults (the default estimate would be 1/3rd of the table, or about
> 100mil rows).  Are you expecting them to be the same?  If so why?
> The comparison values are slightly different after all.

Yes... I was expecting that since we're looking at a date field, the
timestamp would get cast to a date. Sorry I wasn't clear on that...

--
Jim C. Nasby
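[The type rule behind this exchange: in a comparison between a date and a timestamp, the date side is promoted to a timestamp at midnight, not the other way around, so the two comparison constants in the test case really are different values. A quick sketch of the behavior:]

```sql
-- The date is promoted to 2006-06-14 00:00:00 before comparing:
SELECT '2006-06-14'::date >= '2006-06-14 10:00:00'::timestamp;        -- false

-- Explicitly casting the timestamp down to a date changes the outcome:
SELECT '2006-06-14'::date >= '2006-06-14 10:00:00'::timestamp::date;  -- true
```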