Thread: Performance Anomalies in 7.4.5
I'm seeing some weird behavior on a repurposed server that was wiped clean and set up to run as a database and application server with postgres and Apache, as well as some command-line PHP scripts.

The box itself is a quad processor (2.4 GHz Intel Xeons) Debian woody GNU/Linux (2.6.2) system.

postgres is crawling on some fairly routine queries. I'm wondering if this could somehow be related to the fact that this isn't a database-only server, but Apache is not really using any resources when postgres slows to a crawl.

Here's an example of analysis of a recent query:

    EXPLAIN ANALYZE SELECT COUNT(DISTINCT u.id)
    FROM userdata AS u, userdata_history AS h
    WHERE h.id = '18181'
    AND h.id = u.id;

                                   QUERY PLAN
    ------------------------------------------------------------------------
     Aggregate  (cost=0.02..0.02 rows=1 width=8) (actual time=298321.421..298321.422 rows=1 loops=1)
       ->  Nested Loop  (cost=0.00..0.01 rows=1 width=8) (actual time=1.771..298305.531 rows=2452 loops=1)
             Join Filter: ("inner".id = "outer".id)
             ->  Seq Scan on userdata u  (cost=0.00..0.00 rows=1 width=8) (actual time=0.026..11.869 rows=2452 loops=1)
             ->  Seq Scan on userdata_history h  (cost=0.00..0.00 rows=1 width=8) (actual time=0.005..70.519 rows=41631 loops=2452)
                   Filter: (id = 18181::bigint)
     Total runtime: 298321.926 ms
    (7 rows)

userdata has a primary/foreign key on id, which references userdata_history.id, which is a primary key.

At the time of analysis, the userdata table had < 2,500 rows. userdata_history had < 50,000 rows. I can't imagine how even a seq scan could result in a runtime of nearly 5 minutes in these circumstances.

Also, doing a count(*) from each table individually returns nearly instantly.

I can provide details of postgresql.conf and kernel settings if necessary, but I'm using some pretty well tested settings that I use any time I admin a postgres installation these days based on box resources and database size. I'm more interested in knowing if there are any bird's-eye details I should be checking immediately.

Thanks.

-tfo

--
Thomas F. O'Connell
Co-Founder, Information Architect
Sitening, LLC
http://www.sitening.com/
110 30th Avenue North, Suite 6
Nashville, TN 37203-6320
615-260-0005
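One bird's-eye check worth making in a situation like this (a sketch, not part of the original post; the table names are taken from the query above) is to compare what the planner believes about the tables with what they actually contain:

    -- What the planner currently believes about the two tables
    -- (relpages/reltuples are only updated by VACUUM or ANALYZE):
    SELECT relname, relpages, reltuples
    FROM pg_class
    WHERE relname IN ('userdata', 'userdata_history');

    -- Actual sizes, for comparison:
    SELECT count(*) FROM userdata;
    SELECT count(*) FROM userdata_history;

If reltuples and relpages are near zero while the tables clearly are not, the statistics are stale, which is exactly the condition diagnosed later in this thread.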
I know, I know: I should've done this before I posted. REINDEXing and VACUUMing mostly fixed this problem.

Which gets me back to where I was yesterday: reviewing a pre-existing import process that populates tables in this system, and that seems to let even small data sets cause simple queries like this to crawl.

Is there anything about general COPY/INSERT activity that can make small data sets so severely slow in postgres, and that can be prevented by something other than diligent VACUUMing? I was hoping that pg_autovacuum along with post-import manual VACUUMs would be sufficient, but that doesn't necessarily seem to be the case. Granted, I haven't done a methodical and complete review of the process, but I'm still surprised at how quickly it seems able to debilitate postgres with even small amounts of data. I had a similar situation crawl yesterday based on a series of COPYs involving 5 rows!

In other words, can I look for something to treat the cause rather than the symptoms? If not, should I be REINDEXing manually, as well as VACUUMing manually, after large data imports (whether via COPY or INSERT)? Or will a VACUUM FULL ANALYZE be enough?

Thanks!

-tfo

--
Thomas F. O'Connell
Co-Founder, Information Architect
Sitening, LLC
http://www.sitening.com/
110 30th Avenue North, Suite 6
Nashville, TN 37203-6320
615-260-0005
On Thu, 21 Oct 2004, Thomas F.O'Connell wrote:

>  Aggregate  (cost=0.02..0.02 rows=1 width=8) (actual time=298321.421..298321.422 rows=1 loops=1)
>    ->  Nested Loop  (cost=0.00..0.01 rows=1 width=8) (actual time=1.771..298305.531 rows=2452 loops=1)
>          Join Filter: ("inner".id = "outer".id)
>          ->  Seq Scan on userdata u  (cost=0.00..0.00 rows=1 width=8) (actual time=0.026..11.869 rows=2452 loops=1)
>          ->  Seq Scan on userdata_history h  (cost=0.00..0.00 rows=1 width=8) (actual time=0.005..70.519 rows=41631 loops=2452)
>                Filter: (id = 18181::bigint)

It looks like you have not run ANALYZE recently. Most people run VACUUM ANALYZE every night (or similar) in a cron job.

--
/Dennis Björklund
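For reference, the routine maintenance Dennis describes boils down to something like this (a sketch; it is typically scheduled nightly via cron using psql or vacuumdb, and the per-table statements are an assumption based on the tables named earlier):

    -- Reclaim dead tuples and refresh planner statistics for the whole database:
    VACUUM ANALYZE;

    -- Or target just the tables involved in the problem query:
    VACUUM ANALYZE userdata;
    VACUUM ANALYZE userdata_history;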
The irony is that I had just disabled pg_autovacuum the previous day during analysis of a wider issue affecting imports of data into the system.

-tfo

--
Thomas F. O'Connell
Co-Founder, Information Architect
Sitening, LLC
http://www.sitening.com/
110 30th Avenue North, Suite 6
Nashville, TN 37203-6320
615-260-0005

On Oct 21, 2004, at 4:05 PM, Dennis Bjorklund wrote:

> It looks like you have not run ANALYZE recently. Most people run VACUUM
> ANALYZE every night (or similar) in a cron job.
"Thomas F.O'Connell" <tfo@sitening.com> writes: > -> Nested Loop (cost=0.00..0.01 rows=1 width=8) (actual > time=1.771..298305.531 rows=2452 loops=1) > Join Filter: ("inner".id = "outer".id) > -> Seq Scan on userdata u (cost=0.00..0.00 rows=1 width=8) > (actual time=0.026..11.869 rows=2452 loops=1) > -> Seq Scan on userdata_history h (cost=0.00..0.00 rows=1 > width=8) (actual time=0.005..70.519 rows=41631 loops=2452) > Filter: (id = 18181::bigint) > Total runtime: 298321.926 ms > (7 rows) What's killing you here is that the planner thinks these tables are completely empty (notice the zero cost estimates, which implies the table has zero blocks --- the fact that the rows estimate is 1 and not 0 is the result of sanity-check clamping inside costsize.c). This leads it to choose a nestloop, which would be the best plan if there were only a few rows involved, but it degenerates rapidly when there are not. It's easy to fall into this trap when truncating and reloading tables; all you need is an "analyze" while the table is empty. The rule of thumb is to analyze just after you reload the table, not just before. I'm getting more and more convinced that we need to drop the reltuples and relpages entries in pg_class, in favor of checking the physical table size whenever we make a plan. We could derive the tuple count estimate by having ANALYZE store a tuples-per-page estimate in pg_class and then multiply by the current table size; tuples-per-page should be a much more stable figure than total tuple count. One drawback to this is that it would require an additional lseek per table while planning, but that doesn't seem like a huge penalty. Probably the most severe objection to doing things this way is that the selected plan could change unexpectedly as a result of the physical table size changing. Right now the DBA can keep tight rein on actions that might affect plan selection (ie, VACUUM and ANALYZE), but that would go by the board with this. OTOH, we seem to be moving towards autovacuum, which also takes away any guarantees in this department. In any case this is speculation for 8.1; I think it's too late for 8.0. regards, tom lane
On 21 Oct 2004 at 15:50, Thomas F.O'Connell wrote:

> If not, should I be REINDEXing manually, as well as VACUUMing manually
> after large data imports (whether via COPY or INSERT)? Or will a VACUUM
> FULL ANALYZE be enough?

It's not the vacuuming that's important here, just the analyze. If you import data into a table, Postgres often does not *know* that until you gather the statistics on the table. You are simply running into the problem of the planner not knowing how much data is in your tables, or how it is distributed.

If you have large imports, it may be faster overall to drop the indexes first, then insert the data, then put the indexes back on, then analyze.

Cheers,
Gary.
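A rough sketch of the bulk-load pattern Gary suggests (the index name and file path here are invented for illustration; note that an index backing a primary key constraint can't simply be dropped this way):

    -- Drop a (hypothetical) secondary index before the bulk load ...
    DROP INDEX userdata_history_id_idx;

    -- ... load the data ...
    COPY userdata_history FROM '/path/to/history.dat';  -- placeholder path

    -- ... then rebuild the index and refresh the statistics.
    CREATE INDEX userdata_history_id_idx ON userdata_history (id);
    ANALYZE userdata_history;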
> Probably the most severe objection to doing things this way is that the
> selected plan could change unexpectedly as a result of the physical
> table size changing. Right now the DBA can keep tight rein on actions
> that might affect plan selection (ie, VACUUM and ANALYZE), but that
> would go by the board with this. OTOH, we seem to be moving towards
> autovacuum, which also takes away any guarantees in this department.

But aren't we requiring that we can disable autovacuum on some tables?

I've actually used, more than once, the finger-on-the-scale method of thumping values in pg_class when I had a pretty good idea of how the table was changing, particularly when it would change in such a way as to confuse the planner. There are still enough cases where the planner doesn't quite get things right that I'd really prefer the ability to give it clues, at least indirectly. I can't imagine that there's going to be a lot of enthusiasm for hints, so anything that isn't a sure-fire planner helper is a potential loss, at least to me.

A

--
Andrew Sullivan  | ajs@crankycanuck.ca
I remember when computers were frustrating because they *did* exactly
what you told them to. That actually seems sort of quaint now.
		--J.D. Baldwin
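For the record, the "thumping values in pg_class" that Andrew mentions amounts to something like the following (a sketch only: it requires superuser rights, the figures shown are made up, and the next VACUUM or ANALYZE will overwrite them):

    -- Make the planner believe userdata_history occupies ~500 pages
    -- and holds ~50,000 rows, regardless of the last statistics run.
    UPDATE pg_class
    SET relpages = 500, reltuples = 50000
    WHERE relname = 'userdata_history';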
Andrew Sullivan wrote:

> But aren't we requiring that we can disable autovacuum on some
> tables?

Yes, that is the long-term goal, but the autovac in 8.0 is still all or nothing.
On Fri, Oct 22, 2004 at 05:13:18PM -0400, Matthew T. O'Connor wrote:

> Yes that is the long term goal, but the autovac in 8.0 is still all or
> nothing.

Yes, which is why I couldn't use the current iteration for production: the cost is too high.

I think this reinforces my original point, which is that preserving the ability of DBAs to thump the planner for certain tables -- even indirectly -- under certain pathological conditions is crucial for production work. In the ideal world, the wizards and genius planners and such like would work perfectly, and the DBA would never have to intervene. In practice, there are cases when you need to haul on a knob or two. While this doesn't mean we should adopt the Oracle approach of having knobs which adjust the sensitivity of the knob that tunes the main knob-tuner, I'm still pretty leery of anything which smacks of completely locking the DBA's knowledge out of the system.

A

--
Andrew Sullivan  | ajs@crankycanuck.ca
The plural of anecdote is not data.
		--Roger Brinner
Thomas F.O'Connell wrote:

> postgres is crawling on some fairly routine queries. I'm wondering if
> this could somehow be related to the fact that this isn't a
> database-only server, but Apache is not really using any resources when
> postgres slows to a crawl.

Is your enable_seqscan set to true?

Try it after issuing:

    set enable_seqscan to off;
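To try that suggestion without touching postgresql.conf, the setting can be flipped for a single session (a sketch, re-using the query from the original post; the setting reverts when the session ends):

    SET enable_seqscan TO off;

    EXPLAIN ANALYZE SELECT COUNT(DISTINCT u.id)
    FROM userdata AS u, userdata_history AS h
    WHERE h.id = '18181'
    AND h.id = u.id;

    -- Restore the default for the rest of the session:
    SET enable_seqscan TO on;

As the earlier replies point out, though, the root cause here was stale statistics rather than planner settings.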
Tom Lane wrote:

> I'm getting more and more convinced that we need to drop the reltuples
> and relpages entries in pg_class, in favor of checking the physical
> table size whenever we make a plan.

Something like this, probably available in 8.x, would be very useful for people just using postgresql as a "normal" database user.
Tom,

> One drawback to this is that it would require an additional lseek per table
> while planning, but that doesn't seem like a huge penalty.

Hmmm ... would the additional lseek take longer for larger tables, or would it be a fixed cost?

--
Josh Berkus
Aglio Database Solutions
San Francisco
Josh Berkus <josh@agliodbs.com> writes:
>> One drawback to this is that it would require an additional lseek per table
>> while planning, but that doesn't seem like a huge penalty.

> Hmmm ... would the additional lseek take longer for larger tables, or would it
> be a fixed cost?

Should be pretty much a fixed cost: one kernel call per table.

			regards, tom lane
On Thu, 2004-10-28 at 12:31, Tom Lane wrote:

> Should be pretty much a fixed cost: one kernel call per table.

Is this something that the bgwriter could periodically do and share the data?

Possibly in the future it could even force a function or prepared statement recompile if the data has changed significantly?
Rod Taylor <pg@rbt.ca> writes:
> Is this something that the bgwriter could periodically do and share the
> data?

I think a kernel call would be cheaper.

			regards, tom lane