Thread: query plan question
I have a query for which postgres is generating a different plan on
different machines. The database schema is the same, the dataset is the
same, the configuration is the same (e.g., pg_autovacuum running in both
cases), and both systems are Solaris 9. The main difference between the
two systems is that one is sparc and the other is intel.

The query runs in about 40 ms on the intel box, but takes about 18 seconds
on the sparc box. Now, the intel boxes we have are certainly faster, but
I'm curious why the query plan might be different.

For the intel:

QUERY PLAN
Unique (cost=11.50..11.52 rows=2 width=131)
  ->  Sort (cost=11.50..11.50 rows=2 width=131)
        Sort Key: up.prefix, s.name, s.tuid, s.foundryversion
        ->  Hash Join (cost=10.42..11.49 rows=2 width=131)
              Hash Cond: ("outer".dbid = "inner"."schema")
              ->  Seq Scan on "schema" s (cost=0.00..1.02 rows=2 width=128)
              ->  Hash (cost=10.41..10.41 rows=4 width=11)
                    ->  Nested Loop (cost=0.00..10.41 rows=4 width=11)
                          ->  Nested Loop (cost=0.00..2.14 rows=4 width=4)
                                ->  Seq Scan on flow fl (cost=0.00..0.00 rows=1 width=4)
                                      Filter: (servicetype = 646)
                                ->  Index Scan using usage_flow_i on "usage" u (cost=0.00..2.06 rows=6 width=8)
                                      Index Cond: (u.flow = "outer".dbid)
                          ->  Index Scan using usageparameter_usage_i on usageparameter up (cost=0.00..2.06 rows=1 width=15)
                                Index Cond: (up."usage" = "outer".dbid)
                                Filter: ((prefix)::text <> 'xsd'::text)

For the sparc:

QUERY PLAN
Unique (cost=10.81..10.83 rows=1 width=167)
  ->  Sort (cost=10.81..10.82 rows=1 width=167)
        Sort Key: up.prefix, s.name, s.tuid, s.foundryversion
        ->  Nested Loop (cost=9.75..10.80 rows=1 width=167)
              Join Filter: ("outer".flow = "inner".dbid)
              ->  Hash Join (cost=9.75..10.79 rows=1 width=171)
                    Hash Cond: ("outer".dbid = "inner"."schema")
                    ->  Seq Scan on "schema" s (cost=0.00..1.02 rows=2 width=128)
                    ->  Hash (cost=9.75..9.75 rows=1 width=51)
                          ->  Nested Loop (cost=0.00..9.75 rows=1 width=51)
                                Join Filter: ("inner"."usage" = "outer".dbid)
                                ->  Index Scan using usage_flow_i on "usage" u (cost=0.00..4.78 rows=1 width=8)
                                ->  Index Scan using usageparameter_schema_i on usageparameter up (cost=0.00..4.96 rows=1 width=51)
                                      Filter: ((prefix)::text <> 'xsd'::text)
              ->  Seq Scan on flow fl (cost=0.00..0.00 rows=1 width=4)
                    Filter: (servicetype = 646)

I assume the problem with the second plan starts with doing a Nested Loop
rather than a Hash Join at the 4th line of the plan, but I don't know why
it would be different for the same schema, same dataset.

What factors go into the planner's decision to choose a nested loop over a
hash join? Should I be looking at adjusting my runtime configuration on
the sparc box somehow?

Thanks.

- DAP
----------------------------------------------------------------------------------
David Parker    Tazz Networks    (401) 709-5130
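(The join-method decision is driven by the planner's cost parameters and
its row estimates. A sketch of how one might compare the relevant knobs on
the two boxes, and force the alternate plan for testing; nothing here is
specific to this schema, and the query text itself is not shown in the
thread:)

    -- Planner settings that feed join-method costing; compare on both boxes.
    SHOW random_page_cost;        -- relative cost of a nonsequential page fetch
    SHOW effective_cache_size;    -- planner's assumption about available cache
    -- To test whether the nested loop itself is the problem, disable it for
    -- this session only and re-run EXPLAIN on the query:
    SET enable_nestloop = off;
    -- EXPLAIN <the query>;       -- placeholder: the real query isn't shown
    SET enable_nestloop = on;     -- restore the default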
On Wed, 17 Nov 2004 02:54 pm, you wrote:
> I have a query for which postgres is generating a different plan on
> different machines. The database schema is the same, the dataset is the
> same, the configuration is the same (e.g., pg_autovacuum running in both
> cases), and both systems are Solaris 9. The main difference between the
> two systems is that one is sparc and the other is intel.
>
> The query runs in about 40 ms on the intel box, but takes about 18
> seconds on the sparc box. Now, the intel boxes we have are certainly
> faster, but I'm curious why the query plan might be different.
>
> [intel and sparc query plans snipped; see the original post above]

Unique (cost=11.50..11.52 rows=2 width=131)
Unique (cost=10.81..10.83 rows=1 width=167)

The cost estimates are basically the same: around 10 cost units to get the
first row. Can you supply EXPLAIN ANALYZE output to see what it's actually
doing?

Russell Smith
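(EXPLAIN ANALYZE actually executes the statement and annotates each plan
node with real timings and row counts, so the estimates can be checked
against reality. A sketch only; the real query text isn't shown in the
thread, so a placeholder stands in for it:)

    -- Note: this runs the query for real, so expect it to take the full
    -- ~18 seconds on the sparc box. Look for nodes where "rows=" (the
    -- estimate) diverges badly from "actual ... rows=".
    EXPLAIN ANALYZE
    SELECT ...;   -- placeholder for the actual query over schema/usage/usageparameter/flow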
David Parker wrote:
> I have a query for which postgres is generating a different plan on
> different machines. The database schema is the same, the dataset is the
> same, the configuration is the same (e.g., pg_autovacuum running in both
> cases), and both systems are Solaris 9. The main difference between the
> two systems is that one is sparc and the other is intel.
>
> The query runs in about 40 ms on the intel box, but takes about 18
> seconds on the sparc box. Now, the intel boxes we have are certainly
> faster, but I'm curious why the query plan might be different.
>
> [intel and sparc query plans snipped; see the original post above]

If the schema, data, and PostgreSQL builds are the same, are the intel
machines Xeons?

Sincerely,

Joshua D. Drake

--
Command Prompt, Inc., home of Mammoth PostgreSQL - S/ODBC and S/JDBC
Postgresql support, programming shared hosting and dedicated hosting.
+1-503-667-4564 - jd@commandprompt.com - http://www.commandprompt.com
PostgreSQL Replicator -- production quality replication for PostgreSQL
Oh, I didn't realize that analyze gave that much more info. I've got a lot
to learn about this tuning stuff ;-)

I've attached the output. I see from the new output where the slow query
is taking its time (the nested loop at line 10), but I still have no idea
why this plan is getting chosen....

Thanks!

- DAP

>-----Original Message-----
>From: pgsql-performance-owner@postgresql.org
>[mailto:pgsql-performance-owner@postgresql.org] On Behalf Of Russell Smith
>Sent: Tuesday, November 16, 2004 11:36 PM
>To: pgsql-performance@postgresql.org
>Subject: Re: [PERFORM] query plan question
>
>[quoted original post and query plans snipped]
>
>The cost estimates are basically the same: around 10 cost units to get
>the first row. Can you supply EXPLAIN ANALYZE output to see what it's
>actually doing?
>
>Russell Smith
>If the schema, data, and PostgreSQL builds are the same, are the intel
>machines Xeons?

Yup, dual 3.06-GHz Intel Xeon processors. I'm not sure off the top of my
head what the sparcs are exactly. We're in the process of moving
completely to intel, but we still have to support our app on sparc, and we
are seeing these weird differences...

- DAP
On Nov 17, 2004, at 7:32 AM, David Parker wrote:
> Oh, I didn't realize that analyze gave that much more info. I've got a
> lot to learn about this tuning stuff ;-)
>
> I've attached the output. I see from the new output where the slow query
> is taking its time (the nested loop at line 10), but I still have no
> idea why this plan is getting chosen....

It looks like your stats are incorrect on the sparc. Did you forget to run
VACUUM ANALYZE on it?

Also, do both db's have the same data loaded? There are some very
different numbers in terms of actual rows floating around there...

--
Jeff Trout <jeff@jefftrout.com>
http://www.jefftrout.com/
http://www.stuarthamm.net/
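(For reference, a minimal sketch of rebuilding the optimizer statistics by
hand; the table names are taken from the plans above, and this needs to be
run as the table owner or a superuser:)

    -- VACUUM ANALYZE reclaims dead space and recomputes the statistics the
    -- planner uses for its row estimates. "schema" and "usage" need quoting
    -- because they are keywords, as the plans above show.
    VACUUM ANALYZE "schema";
    VACUUM ANALYZE "usage";
    VACUUM ANALYZE usageparameter;
    VACUUM ANALYZE flow;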
I've got pg_autovacuum running on both platforms. I've verified that the
tables involved in the query have the same number of rows on both
databases.

I'm not sure where to look to see how the stats might be different. The
"good" database's pg_statistic table has 24 more rows than that in the
"bad" database, so there's definitely a difference. The good database's
pg_statistic has rows for 2 extra tables, but they are not tables involved
in the query in question...

So something must be up with stats, but can you tell me what the most
significant columns in the pg_statistic table are for the planner making
its decision? I'm sure this has been discussed before, so if there's a
thread you can point me to, that would be great - I realize it's a big
general question.

Thanks for your time.

- DAP

>-----Original Message-----
>From: Jeff [mailto:threshar@torgo.978.org]
>Sent: Wednesday, November 17, 2004 9:01 AM
>To: David Parker
>Cc: Russell Smith; pgsql-performance@postgresql.org
>Subject: Re: [PERFORM] query plan question
>
>It looks like your stats are incorrect on the sparc. Did you forget to
>run VACUUM ANALYZE on it?
>
>Also, do both db's have the same data loaded? There are some very
>different numbers in terms of actual rows floating around there...
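(The raw pg_statistic rows are hard to read directly; the pg_stats system
view presents the same data per column in a human-readable form. A sketch
of a comparison query, with the table names taken from the plans above:)

    -- Shows, per column, the fraction of NULLs, the estimated number of
    -- distinct values, and the most common values: the main inputs to the
    -- planner's selectivity estimates. Run on both boxes and diff the output.
    SELECT tablename, attname, null_frac, n_distinct, most_common_vals
    FROM pg_stats
    WHERE tablename IN ('schema', 'usage', 'usageparameter', 'flow')
    ORDER BY tablename, attname;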
Hmm, I'm really a beginner at this... It turns out that the pg_statistic
table in my good database has records in it for the tables in the query,
while the pg_statistic table in my bad database has no records for those
tables at all! So I guess I need to figure out why pg_autovacuum isn't
analyzing those tables.

- DAP

>-----Original Message-----
>From: David Parker
>Sent: Wednesday, November 17, 2004 9:44 AM
>To: 'Jeff'
>Cc: Russell Smith; pgsql-performance@postgresql.org
>Subject: RE: [PERFORM] query plan question
>
>[earlier discussion of the pg_statistic differences snipped]
"David Parker" <dparker@tazznetworks.com> writes: > So I guess I need to figure out why pg_autovacuum isn't analyzing those > tables. Which autovacuum version are you using? The early releases had some nasty bugs that would allow it to skip tables sometimes. I think all the known problems are fixed as of recent 7.4.x updates. regards, tom lane
We're using postgresql 7.4.5. I've only recently put pg_autovacuum in
place as part of our installation, and I'm basically taking the defaults.
I doubt it's a problem with autovacuum itself, but rather with my
configuration of it. I have some reading to do, so any pointers to
existing autovacuum threads would be greatly appreciated!

Thanks.

- DAP

>-----Original Message-----
>From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
>Sent: Wednesday, November 17, 2004 10:46 AM
>To: David Parker
>Cc: Jeff; Russell Smith; pgsql-performance@postgresql.org
>Subject: Re: [PERFORM] query plan question
>
>Which autovacuum version are you using? The early releases had some
>nasty bugs that would allow it to skip tables sometimes. I think all
>the known problems are fixed as of recent 7.4.x updates.
>
> regards, tom lane
David Parker wrote:
> We're using postgresql 7.4.5. I've only recently put pg_autovacuum in
> place as part of our installation, and I'm basically taking the
> defaults. I doubt it's a problem with autovacuum itself, but rather with
> my configuration of it. I have some reading to do, so any pointers to
> existing autovacuum threads would be greatly appreciated!

Well, the first thing to do is increase the verbosity of the pg_autovacuum
logging output. If you use -d2 or higher, pg_autovacuum will print out a
lot of detail on what it thinks the thresholds are and why it is or isn't
performing vacuums and analyzes. Attach some of the log and I'll take a
look at it.
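(A sketch of how that invocation might look when starting the daemon; the
-d and -L flags are from the 7.4 contrib pg_autovacuum, and the paths here
are assumptions for the example:)

    # -d2 raises the debug level so thresholds and decisions are logged;
    # -L sends output to a log file; -D detaches and runs as a daemon.
    pg_autovacuum -D -d2 -L /var/log/pg_autovacuum.log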
Well, based on the autovacuum log that you attached, all of those tables
are insert-only (at least during the time period included in the log). Is
that correct? If so, autovacuum will never do a vacuum on those tables
(unless required by xid wraparound issues). So this doesn't appear to be
an autovacuum problem. I'm not sure about the missing pg_statistic
entries; anyone else care to field that one?

Matthew

David Parker wrote:
> Thanks. The tables I'm concerned with are named: 'schema', 'usage',
> 'usageparameter', and 'flow'. It looks like autovacuum is performing
> analyzes:
>
> % grep "Performing: " logs/.db.tazz.vacuum.log
> [2004-11-17 12:05:58 PM] Performing: ANALYZE "public"."scriptlibrary_library"
> [2004-11-17 12:15:59 PM] Performing: ANALYZE "public"."scriptlibraryparm"
> [2004-11-17 12:15:59 PM] Performing: ANALYZE "public"."usageparameter"
> [2004-11-17 12:21:00 PM] Performing: ANALYZE "public"."usageproperty"
> [2004-11-17 12:21:00 PM] Performing: ANALYZE "public"."route"
> [2004-11-17 12:21:00 PM] Performing: ANALYZE "public"."usageparameter"
> [2004-11-17 12:21:00 PM] Performing: ANALYZE "public"."scriptlibrary_library"
> [2004-11-17 12:26:01 PM] Performing: ANALYZE "public"."usage"
> [2004-11-17 12:26:01 PM] Performing: ANALYZE "public"."usageparameter"
> [2004-11-17 12:31:04 PM] Performing: ANALYZE "public"."usageproperty"
> [2004-11-17 12:36:04 PM] Performing: ANALYZE "public"."route"
> [2004-11-17 12:36:04 PM] Performing: ANALYZE "public"."service_usage"
> [2004-11-17 12:36:04 PM] Performing: ANALYZE "public"."usageparameter"
>
> But when I run the following:
>
> select * from pg_statistic where starelid in
>   (select oid from pg_class where relname in
>    ('schema','usageparameter','flow','usage'))
>
> it returns no records. Shouldn't it? It doesn't appear to be doing a
> vacuum anywhere, which makes sense because none of these tables have
> over the default threshold of 1000. Are there statistics which only get
> generated by vacuum?
>
> I've attached a gzip of the pg_autovacuum log file, with -d 3.
>
> Thanks again.
>
> - DAP
>
> [earlier quoted messages snipped]
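(pg_autovacuum makes its decisions from the row-activity counters kept by
the statistics collector. A sketch of how to see roughly what it sees, for
the tables named above:)

    -- n_tup_ins/n_tup_upd/n_tup_del are the counters pg_autovacuum compares
    -- against its thresholds. Insert-only tables can trip the ANALYZE
    -- threshold but never the VACUUM one, since vacuuming is only needed
    -- once rows have been updated or deleted.
    SELECT relname, n_tup_ins, n_tup_upd, n_tup_del
    FROM pg_stat_all_tables
    WHERE relname IN ('schema', 'usage', 'usageparameter', 'flow');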
What I think is happening with the missing pg_statistic entries:

The install of our application involves a lot of data importing (via JDBC)
in one large transaction, which can take up to 30 minutes. (I realize I
left out this key piece of info in my original post...)

The pg_autovacuum logic relies on data from pg_stat_all_tables to make the
decision about running analyze. As far as I can tell, the data in this
view gets updated outside of the transaction, because I saw the numbers
growing while I was importing.

I saw pg_autovacuum log messages for running analyze on several tables,
but no statistics data showed up for these. I assume that's because the
actual data in the tables wasn't yet visible to pg_autovacuum, since the
import transaction had not finished yet.

When the import finished, not all of the tables affected by the import
were re-visited, because they had not bumped up over the threshold again,
even though the analyze run for those tables had not generated any stats
because of the still-open transaction.

Am I making the correct assumptions about the way the various pieces work?
Does this scenario make sense?

It's easy enough for us to kick off a vacuum/analyze at the end of a long
import - but this "mysterious" behavior was bugging me!

Thanks.

- DAP

>-----Original Message-----
>From: Matthew T. O'Connor [mailto:matthew@zeut.net]
>Sent: Wednesday, November 17, 2004 2:02 PM
>To: David Parker
>Cc: Tom Lane; Jeff; Russell Smith; pgsql-performance@postgresql.org
>Subject: Re: [PERFORM] query plan question
>
>Well, based on the autovacuum log that you attached, all of those tables
>are insert-only (at least during the time period included in the log).
>Is that correct? If so, autovacuum will never do a vacuum on those
>tables (unless required by xid wraparound issues). So this doesn't
>appear to be an autovacuum problem. I'm not sure about the missing
>pg_statistic entries; anyone else care to field that one?
>
>Matthew
>
>[earlier quoted messages snipped]
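(Given that scenario, an explicit analyze issued after the import
transaction commits sidesteps the visibility problem entirely. A minimal
sketch, with the table list taken from the query above:)

    -- Run immediately after COMMIT of the bulk import, from the same JDBC
    -- connection or any other. ANALYZE alone is cheaper than VACUUM ANALYZE
    -- and is all that's needed for fresh planner statistics on tables that
    -- have only received inserts.
    ANALYZE "schema";
    ANALYZE "usage";
    ANALYZE usageparameter;
    ANALYZE flow;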