Thread: Postgres query completion status?
Dear All,

I've just joined this list, so let me first thank you in advance for your hospitality.

I'm having lots of trouble with variously slow-running queries on a production system. I've tried all the "obvious" fixes: changing the query planner settings, checking for indexing, autovacuum, making sure the thing has heaps of memory (20GB), running on solid-state disks, etc.

1. Is there any way to debug or trace a running query? I think I have all the logging options turned on, but I'd like to see something like "Currently reading 3452 rows from table_x, at 0.2 us per row" or whatever, being really, really verbose in the logfiles. Likewise, is there any way to check whether, for example, Postgres is running out of work memory?

2. Is there any way, whatsoever, to get any kind of "progress bar" for a running query? I know these things have terrible faults, but anything monotonic would be better than running completely blind. [There's a wonderful paper here: http://pages.cs.wisc.edu/~naughton/includes/papers/multiquery.pdf which seems to have got 90% of the way there, but appears to have been abandoned because it didn't get all 100% of the way.]

The operations people in the warehouse are currently going crazy because we can't ever answer the question "when will this query complete?". I know it's hard to do accurately, but knowing the difference between "5 seconds to go" and "5 hours to go" would be so fantastically useful.

Thanks,

Richard

P.S. Sometimes, some queries seem to benefit from being cancelled and then immediately re-started. As they are being run in a transaction, I can't see how this could make a difference. Am I missing anything obvious? Occasionally, a re-start of postgresql-8.4 itself seems to help.
Richard --

You might post the results of "EXPLAIN ANALYZE <your SQL here>;" ... be sure to run it in a transaction if you want to be able to roll it back. Perhaps try "EXPLAIN <your SQL>;" first, as it is faster, but EXPLAIN ANALYZE shows what the planner is actually doing.

You wrote:
> P.S. Sometimes, some queries seem to benefit from being cancelled and then immediately
> re-started. As they are being run in a transaction, I can't see how this could make a difference.
> Am I missing anything obvious? Occasionally, a re-start of postgresql-8.4 itself seems to help.

This may be the result of caching of the desired rows, either by PostgreSQL or by your operating system. The rollback wouldn't affect this -- the rows are already in memory and not on disk waiting to be grabbed -- much faster on all subsequent queries.

HTH,

Greg Williamson

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
Thanks for your help. This issue splits into 2 bits:

1. Fixing specific queries.
2. Finding out when a specific running query is going to complete. (At the moment, this is the bit I really need to know.)

Greg Williamson wrote:
> Richard --
>
> You might post the results of "EXPLAIN ANALYZE <your SQL here>;" ... be sure to run it in a transaction if you want to be able to roll it back. Perhaps try "EXPLAIN <your SQL>;" first as it is faster, but EXPLAIN ANALYZE shows what the planner is doing.

The offending query (simplified to just do a select, which is the slow bit) is:

-------------
SELECT ( core.demand.qty - viwcs.wave_end_demand.qty_remaining )
FROM core.demand,
     viwcs.previous_wave
     LEFT OUTER JOIN viwcs.wave_end_demand USING ( wid )
WHERE core.demand.id = viwcs.wave_end_demand.demand_id;
------------

Over the last few weeks, this has gradually slowed down from 6 minutes to about 6.5, then last night it took 25, and today it's taken an hour already and still not completed. The system hasn't been doing anything special in the last 2 days.

Here's EXPLAIN (EXPLAIN ANALYZE will take too long!):
--------------------------------------------------
 Nested Loop  (cost=885367.03..1123996.87 rows=8686 width=12)
   ->  Merge Join  (cost=885367.03..1115452.17 rows=8688 width=16)
         Merge Cond: ((core.demand.target_id = wave_genreorders_map.target_id) AND (core.demand.material_id = core.material.id))
         ->  Index Scan using demand_target_id_key on demand  (cost=0.00..186520.46 rows=3800715 width=24)
         ->  Sort  (cost=885364.61..893425.30 rows=3224275 width=24)
               Sort Key: wave_genreorders_map.target_id, core.material.id
               ->  Hash Join  (cost=511934.12..536811.73 rows=3224275 width=24)
                     Hash Cond: (core.material.tag = (product_info_sku.sid)::text)
                     ->  Append  (cost=0.00..10723.27 rows=689377 width=28)
                           ->  Seq Scan on material  (cost=0.00..5474.75 rows=397675 width=21)
                           ->  Seq Scan on container material  (cost=0.00..5248.52 rows=291702 width=37)
                     ->  Hash  (cost=506657.25..506657.25 rows=422149 width=42)
                           ->  Hash Join  (cost=474610.85..506657.25 rows=422149 width=42)
                                 Hash Cond: ((wave_gol.sid)::text = (product_info_sku.sid)::text)
                                 ->  Merge Left Join  (cost=463919.35..487522.78 rows=422149 width=29)
                                       Merge Cond: (((wave_gol.wid)::text = (du_report_sku.wid)::text) AND ((wave_gol.storeorderid)::text = (du_report_sku.storeorderid)::text) AND ((wave_gol.genreorderid)::text = (du_report_sku.genreorderid)::text))
                                       Join Filter: ((wave_gol.sid)::text = (du_report_sku.sid)::text)
                                       ->  Merge Join  (cost=183717.70..197229.24 rows=422149 width=44)
                                             Merge Cond: (((wave_genreorders_map.wid)::text = (wave_gol.wid)::text) AND ((wave_genreorders_map.storeorderid)::text = (wave_gol.storeorderid)::text) AND ((wave_genreorders_map.genreorderid)::text = (wave_gol.genreorderid)::text))
                                             ->  Index Scan using "wave_genreorders_map_ERR_GENREORDERID_EXISTS" on wave_genreorders_map  (cost=0.00..4015.36 rows=116099 width=27)
                                             ->  Sort  (cost=183717.70..184818.90 rows=440483 width=47)
                                                   Sort Key: wave_gol.wid, wave_gol.storeorderid, wave_gol.genreorderid
                                                   ->  Nested Loop  (cost=9769.36..142425.22 rows=440483 width=47)
                                                         ->  Index Scan using "wave_rxw_ERR_WID_EXISTS" on wave_rxw  (cost=0.00..7.08 rows=1 width=11)
                                                               Filter: is_previous
                                                         ->  Bitmap Heap Scan on wave_gol  (cost=9769.36..136912.11 rows=440483 width=36)
                                                               Recheck Cond: ((wave_gol.wid)::text = (wave_rxw.wid)::text)
                                                               ->  Bitmap Index Scan on "wave_gol_ERR_SID_EXISTS"  (cost=0.00..9659.24 rows=440483 width=0)
                                                                     Index Cond: ((wave_gol.wid)::text = (wave_rxw.wid)::text)
                                       ->  Sort  (cost=280201.66..281923.16 rows=688602 width=300)
                                             Sort Key: du_report_sku.wid, du_report_sku.storeorderid, du_report_sku.genreorderid
                                             ->  HashAggregate  (cost=197936.75..206544.27 rows=688602 width=36)
                                                   ->  Seq Scan on du_report_sku  (cost=0.00..111861.61 rows=6886011 width=36)
                                 ->  Hash  (cost=5681.22..5681.22 rows=400822 width=13)
                                       ->  Seq Scan on product_info_sku  (cost=0.00..5681.22 rows=400822 width=13)
   ->  Index Scan using demand_pkey on demand  (cost=0.00..0.97 rows=1 width=12)
         Index Cond: (core.demand.id = core.demand.id)
(37 rows)
--------------------------------------------------

> You wrote:
>
>> P.S. Sometimes, some queries seem to benefit from being cancelled and then immediately
>> re-started. As they are being run in a transaction, I can't see how this could make a difference.
>> Am I missing anything obvious? Occasionally, a re-start of postgresql-8.4 itself seems to help.
>
> This may be the result of caching of the desired rows, either by PostgreSQL or by your operating system. The rollback wouldn't affect this -- the rows are already in memory and not on disk waiting to be grabbed -- much faster on all subsequent queries.

Yes... but the data should already be in RAM. We've got 20 GB of it (Postgres is given 5GB, and effective_cache_size is 10GB); the dataset size for the relevant part should only be about 100 MB at most.
Also, we're using solid-state disks (Intel X25-E), and iotop shows that the disk access rate isn't the problem; the CPU is pegged at 100% though. It seems that the query planner is doing something radically different.

Richard
> Greg Williamson wrote:
>> Richard --
>>
>> You might post the results of "EXPLAIN ANALYZE <your SQL here>;" ...
>> be sure to run it in a transaction if you want to be able to roll it
>> back. Perhaps try "EXPLAIN <your SQL>;" first as it is faster, but
>> EXPLAIN ANALYZE shows what the planner is doing.

Is there any way I can gather some information by tracing the query that's currently actually running? strace doesn't help much, but is there anything else I can do? As far as I know, the only tools that exist are pg_stat_activity, top, and iotop. Have I missed one?

Thanks,

Richard
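For reference, pg_stat_activity can at least show how long each query has been running, even though it says nothing about progress within the query. A minimal sketch (column names as in 8.4, where the query text lives in current_query; later versions renamed these):

```sql
-- Show non-idle backends, longest-running first (PostgreSQL 8.4 column names).
SELECT procpid,
       now() - query_start AS elapsed,
       current_query
  FROM pg_stat_activity
 WHERE current_query <> '<IDLE>'
 ORDER BY elapsed DESC;
```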
> Greg Williamson wrote:
>> Richard --
>>
>> You might post the results of "EXPLAIN ANALYZE <your SQL here>;" ...
>> be sure to run it in a transaction if you want to be able to roll it
>> back. Perhaps try "EXPLAIN <your SQL>;" first as it is faster, but
>> EXPLAIN ANALYZE shows what the planner is doing.

Here's something very, very odd. EXPLAIN ANALYZE has now run, in about 4 minutes (result below). However, I'd be willing to swear that the last time I ran EXPLAIN on this query, about half an hour ago, the final 2 lines were sequential scans.

So, I've just terminated the real job (which uses this select for an update) after 77 minutes of fruitless CPU-hogging, and re-started it... This time, the same job ran through in 24 minutes. [This is running exactly the same transaction on exactly the same data!]

Richard

--------------------------------------------------
 Nested Loop  (cost=885367.03..1123996.87 rows=8686 width=12) (actual time=248577.879..253168.466 rows=347308 loops=1)
   ->  Merge Join  (cost=885367.03..1115452.17 rows=8688 width=16) (actual time=248577.834..252092.536 rows=347308 loops=1)
         Merge Cond: ((core.demand.target_id = wave_genreorders_map.target_id) AND (core.demand.material_id = core.material.id))
         ->  Index Scan using demand_target_id_key on demand  (cost=0.00..186520.46 rows=3800715 width=24) (actual time=0.031..2692.661 rows=3800715 loops=1)
         ->  Sort  (cost=885364.61..893425.30 rows=3224275 width=24) (actual time=248577.789..248659.751 rows=347308 loops=1)
               Sort Key: wave_genreorders_map.target_id, core.material.id
               Sort Method:  quicksort  Memory: 39422kB
               ->  Hash Join  (cost=511934.12..536811.73 rows=3224275 width=24) (actual time=247444.988..248263.151 rows=347308 loops=1)
                     Hash Cond: (core.material.tag = (product_info_sku.sid)::text)
                     ->  Append  (cost=0.00..10723.27 rows=689377 width=28) (actual time=0.008..177.076 rows=690647 loops=1)
                           ->  Seq Scan on material  (cost=0.00..5474.75 rows=397675 width=21) (actual time=0.008..59.234 rows=395551 loops=1)
                           ->  Seq Scan on container material  (cost=0.00..5248.52 rows=291702 width=37) (actual time=0.008..52.844 rows=295096 loops=1)
                     ->  Hash  (cost=506657.25..506657.25 rows=422149 width=42) (actual time=247444.555..247444.555 rows=347308 loops=1)
                           ->  Hash Join  (cost=474610.85..506657.25 rows=422149 width=42) (actual time=182224.904..247282.711 rows=347308 loops=1)
                                 Hash Cond: ((wave_gol.sid)::text = (product_info_sku.sid)::text)
                                 ->  Merge Left Join  (cost=463919.35..487522.78 rows=422149 width=29) (actual time=182025.065..246638.762 rows=347308 loops=1)
                                       Merge Cond: (((wave_gol.wid)::text = (du_report_sku.wid)::text) AND ((wave_gol.storeorderid)::text = (du_report_sku.storeorderid)::text) AND ((wave_gol.genreorderid)::text = (du_report_sku.genreorderid)::text))
                                       Join Filter: ((wave_gol.sid)::text = (du_report_sku.sid)::text)
                                       ->  Merge Join  (cost=183717.70..197229.24 rows=422149 width=44) (actual time=859.551..1506.515 rows=347308 loops=1)
                                             Merge Cond: (((wave_genreorders_map.wid)::text = (wave_gol.wid)::text) AND ((wave_genreorders_map.storeorderid)::text = (wave_gol.storeorderid)::text) AND ((wave_genreorders_map.genreorderid)::text = (wave_gol.genreorderid)::text))
                                             ->  Index Scan using "wave_genreorders_map_ERR_GENREORDERID_EXISTS" on wave_genreorders_map  (cost=0.00..4015.36 rows=116099 width=27) (actual time=0.018..23.599 rows=116099 loops=1)
                                             ->  Sort  (cost=183717.70..184818.90 rows=440483 width=47) (actual time=782.102..813.753 rows=347308 loops=1)
                                                   Sort Key: wave_gol.wid, wave_gol.storeorderid, wave_gol.genreorderid
                                                   Sort Method:  quicksort  Memory: 39422kB
                                                   ->  Nested Loop  (cost=9769.36..142425.22 rows=440483 width=47) (actual time=33.668..138.668 rows=347308 loops=1)
                                                         ->  Index Scan using "wave_rxw_ERR_WID_EXISTS" on wave_rxw  (cost=0.00..7.08 rows=1 width=11) (actual time=0.021..0.031 rows=1 loops=1)
                                                               Filter: is_previous
                                                         ->  Bitmap Heap Scan on wave_gol  (cost=9769.36..136912.11 rows=440483 width=36) (actual time=33.628..75.091 rows=347308 loops=1)
                                                               Recheck Cond: ((wave_gol.wid)::text = (wave_rxw.wid)::text)
                                                               ->  Bitmap Index Scan on "wave_gol_ERR_SID_EXISTS"  (cost=0.00..9659.24 rows=440483 width=0) (actual time=33.104..33.104 rows=347308 loops=1)
                                                                     Index Cond: ((wave_gol.wid)::text = (wave_rxw.wid)::text)
                                       ->  Sort  (cost=280201.66..281923.16 rows=688602 width=300) (actual time=177511.806..183486.593 rows=41317448 loops=1)
                                             Sort Key: du_report_sku.wid, du_report_sku.storeorderid, du_report_sku.genreorderid
                                             Sort Method:  external sort  Disk: 380768kB
                                             ->  HashAggregate  (cost=197936.75..206544.27 rows=688602 width=36) (actual time=7396.426..11224.839 rows=6282564 loops=1)
                                                   ->  Seq Scan on du_report_sku  (cost=0.00..111861.61 rows=6886011 width=36) (actual time=0.006..573.419 rows=6897682 loops=1)
                                 ->  Hash  (cost=5681.22..5681.22 rows=400822 width=13) (actual time=199.422..199.422 rows=400737 loops=1)
                                       ->  Seq Scan on product_info_sku  (cost=0.00..5681.22 rows=400822 width=13) (actual time=0.004..78.357 rows=400737 loops=1)
   ->  Index Scan using demand_pkey on demand  (cost=0.00..0.97 rows=1 width=12) (actual time=0.002..0.003 rows=1 loops=347308)
         Index Cond: (core.demand.id = core.demand.id)
 Total runtime: 253455.603 ms
(41 rows)
Richard Neill wrote:
> As far as I know, the only tools that exist are
> pg_stat_activity, top, and iotop
> Have I missed one?

The UI for pgTop might be easier for what you're trying to do:
http://pgfoundry.org/projects/pgtop/

--
Greg Smith    2ndQuadrant   Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com  www.2ndQuadrant.com
2009/11/20 Richard Neill <rn214@cam.ac.uk>
It looks like your statistics are way out of sync with the real data.
> Nested Loop (cost=885367.03..1123996.87 rows=8686 width=12) (actual time=248577.879..253168.466 rows=347308 loops=1)
This shows that it thinks there will be 8,686 rows, but actually traverses 347,308.
Have you manually run a VACUUM on these tables? Preferably a full one, if you can. I notice that you appear to have multiple sorts going on. Are all of those actually necessary for your output? Also consider using partial or multicolumn indexes where useful.
And which version of PostgreSQL are you using?
Thom
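As a concrete illustration of refreshing the statistics Thom is referring to (a sketch only -- the table and column names are borrowed from the plan above, and the statistics target shown is illustrative, not a recommendation from the thread):

```sql
-- Refresh planner statistics on the tables involved in the slow plan.
VACUUM ANALYZE viwcs.wave_gol;
VACUUM ANALYZE core.demand;

-- If a join column has a skewed distribution, a larger per-column
-- statistics target can improve the row estimates; re-analyze afterwards.
ALTER TABLE viwcs.wave_gol ALTER COLUMN wid SET STATISTICS 1000;
ANALYZE viwcs.wave_gol;
```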
Sorry for top-posting -- challenged mail client.
Thom's suggestion that the estimates are off seems like a useful line of inquiry, but ANALYZE is what builds statistics. If it is not run often enough, the planner will base its idea of what a good plan is on bad data. So ANALYZE <table name>; is your friend. You may need to change the statistics targets for the tables in question if there are odd distributions of data. And, as Thom asked: which version of PostgreSQL?
Stay away from VACUUM FULL! It will block other activity and will be horribly slow on large tables. It will get rid of bloat, but there may be better ways of doing that, depending on what version you are using and what your maintenance window looks like.
HTH,
Greg W.
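One of the "better ways" commonly suggested in the 8.4 era is CLUSTER, which rewrites the table in index order and returns free space to the filesystem. A hedged sketch (the table and index names are borrowed from earlier in the thread; note CLUSTER still takes an exclusive lock, so it belongs in a maintenance window):

```sql
-- Rewrite the table in primary-key order, removing bloat without
-- VACUUM FULL's slowness; takes an ACCESS EXCLUSIVE lock while it runs.
CLUSTER core.demand USING demand_pkey;

-- Rebuild statistics after the rewrite.
ANALYZE core.demand;
```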
From: Thom Brown <thombrown@gmail.com>
To: Richard Neill <rn214@cam.ac.uk>
Cc: Greg Williamson <gwilliamson39@yahoo.com>; pgsql-performance@postgresql.org
Sent: Fri, November 20, 2009 4:13:03 AM
Subject: Re: [PERFORM] Postgres query completion status?
Richard Neill wrote:
> SELECT ( core.demand.qty - viwcs.wave_end_demand.qty_remaining )
> FROM
>   core.demand,
>   viwcs.previous_wave
>   LEFT OUTER JOIN viwcs.wave_end_demand USING ( wid )
> WHERE core.demand.id = viwcs.wave_end_demand.demand_id;

For comparison, how does this do?:

SELECT (core.demand.qty - viwcs.wave_end_demand.qty_remaining)
  FROM core.demand
  JOIN viwcs.previous_wave
    ON (core.demand.id = viwcs.wave_end_demand.demand_id)
  LEFT OUTER JOIN viwcs.wave_end_demand USING (wid);

-Kevin
Kevin Grittner wrote:
> Richard Neill wrote:
>
>> SELECT ( core.demand.qty - viwcs.wave_end_demand.qty_remaining )
>> FROM
>>   core.demand,
>>   viwcs.previous_wave
>>   LEFT OUTER JOIN viwcs.wave_end_demand USING ( wid )
>> WHERE core.demand.id = viwcs.wave_end_demand.demand_id;
>
> For comparison, how does this do?:
>
> SELECT (core.demand.qty - viwcs.wave_end_demand.qty_remaining)
>   FROM core.demand
>   JOIN viwcs.previous_wave
>     ON (core.demand.id = viwcs.wave_end_demand.demand_id)
>   LEFT OUTER JOIN viwcs.wave_end_demand USING (wid);

Thanks for your help. Unfortunately, it just complains:

  ERROR: missing FROM-clause entry for table "wave_end_demand"
  LINE 4: ON (core.demand.id = viwcs.wave_end_demand.demand_id)

Incidentally, I don't think that this particular re-ordering will make much difference: viwcs.previous_wave is a table with a single row, and 3 columns in it.

Here are the bits of schema, if they're helpful.

           View "viwcs.wave_end_demand"
    Column     |         Type          | Modifiers
---------------+-----------------------+-----------
 wid           | character varying(10) |
 storeorderid  | character varying(30) |
 genreorderid  | character varying(30) |
 target_id     | bigint                |
 sid           | character varying(30) |
 material_id   | bigint                |
 demand_id     | bigint                |
 eqa           | integer               |
 aqu           | bigint                |
 qty_remaining | bigint                |
View definition:
 SELECT wave_gol.wid, wave_gol.storeorderid, wave_gol.genreorderid,
        wave_genreorders_map.target_id, wave_gol.sid,
        product_info_sku_map.material_id, demand.id AS demand_id,
        wave_gol.eqa,
        COALESCE(du_report_sku_sum.aqu, 0::bigint) AS aqu,
        wave_gol.eqa - COALESCE(du_report_sku_sum.aqu, 0::bigint) AS qty_remaining
   FROM viwcs.wave_gol
   LEFT JOIN viwcs.wave_genreorders_map USING (wid, storeorderid, genreorderid)
   LEFT JOIN viwcs.product_info_sku_map USING (sid)
   LEFT JOIN core.demand USING (target_id, material_id)
   LEFT JOIN ( SELECT du_report_sku.wid, du_report_sku.storeorderid,
                      du_report_sku.genreorderid, du_report_sku.sid,
                      sum(du_report_sku.aqu) AS aqu
                 FROM viwcs.du_report_sku
                GROUP BY du_report_sku.wid, du_report_sku.storeorderid,
                      du_report_sku.genreorderid, du_report_sku.sid
             ) du_report_sku_sum USING (wid, storeorderid, genreorderid, sid);

 View "viwcs.previous_wave"
 Column |         Type          | Modifiers
--------+-----------------------+-----------
 wid    | character varying(10) |
View definition:
 SELECT wave_rxw.wid
   FROM viwcs.wave_rxw
  WHERE wave_rxw.is_previous;

              Table "core.demand"
   Column    |  Type   |           Modifiers
-------------+---------+--------------------------------
 id          | bigint  | not null default core.new_id()
 target_id   | bigint  | not null
 material_id | bigint  | not null
 qty         | integer | not null
 benefit     | integer | not null default 0
Indexes:
    "demand_pkey" PRIMARY KEY, btree (id)
    "demand_target_id_key" UNIQUE, btree (target_id, material_id)
    "demand_material_id" btree (material_id)
    "demand_target_id" btree (target_id)
Foreign-key constraints:
    "demand_material_id_fkey" FOREIGN KEY (material_id) REFERENCES core.__material_id(id)
    "demand_target_id_fkey" FOREIGN KEY (target_id) REFERENCES core.waypoint(id)
Referenced by:
    TABLE "core.inventory" CONSTRAINT "inventory_demand_id_fkey" FOREIGN KEY (demand_id) REFERENCES core.demand(id)

Thanks,

Richard
Thom Brown wrote:
> It looks like your statistics are way out of sync with the real data.
>
> > Nested Loop (cost=885367.03..1123996.87 rows=8686 width=12) (actual
> > time=248577.879..253168.466 rows=347308 loops=1)
>
> This shows that it thinks there will be 8,686 rows, but actually
> traverses 347,308.

Yes, I see what you mean.

> Have you manually run a VACUUM on these tables? Preferably a full one
> if you can.

Every night, it runs VACUUM VERBOSE ANALYZE on the entire database. We also have the autovacuum daemon enabled (in the default config). About 2 weeks ago, I ran CLUSTER followed by VACUUM FULL, which seemed to help more than I'd expect.

[As I understand it, the statistics shouldn't change very much from day to day, as long as the database workload remains roughly constant. What we're actually doing is running a warehouse sorting books - so from one day to the next the particular book changes, but the overall statistics basically don't.]

> I notice that you appear to have multiple sorts going on.
> Are all of those actually necessary for your output?

I think so. I didn't actually write all of this, so I can't be certain.

> Also consider using partial or multicolumn indexes where useful.

Already done that. The query was originally pretty quick, with a few weeks' worth of data, but not now (after a few months). The times don't rise gradually, but have a very sudden knee.

> And which version of PostgreSQL are you using?

8.4.1, including this patch:
http://archives.postgresql.org/pgsql-bugs/2009-10/msg00118.php

Richard
Okay, have you tried monitoring the connections to your database?
Try: select * from pg_stat_activity;
And this to see current backend connections:
SELECT pg_stat_get_backend_pid(s.backendid) AS procpid,
pg_stat_get_backend_activity(s.backendid) AS current_query
FROM (SELECT pg_stat_get_backend_idset() AS backendid) AS s;
It might also help if you posted your postgresql.conf too.
Thom
Thom Brown wrote:
> Okay, have you tried monitoring the connections to your database?
>
> Try: select * from pg_stat_activity;

Tried that - it's very useful as far as it goes. I can see that in most cases, the DB is running just the one query. What I really want to know is: how far through that query has it got? (For example, if the query is an update, then surely it knows how many rows have been updated, and how many are yet to go.)

> And this to see current backend connections:
>
> SELECT pg_stat_get_backend_pid(s.backendid) AS procpid,
>        pg_stat_get_backend_activity(s.backendid) AS current_query
> FROM (SELECT pg_stat_get_backend_idset() AS backendid) AS s;

This looks identical to just some of the columns from pg_stat_activity.

> It might also help if you posted your postgresql.conf too.

Below (have removed the really non-interesting bits).

Thanks,

Richard

#------------------------------------------------------------------------------
# CONNECTIONS AND AUTHENTICATION
#------------------------------------------------------------------------------

max_connections = 500			# (change requires restart)

#------------------------------------------------------------------------------
# RESOURCE USAGE (except WAL)
#------------------------------------------------------------------------------

# - Memory -

shared_buffers = 4500MB			# min 128kB
					# (change requires restart)
temp_buffers = 64MB			# min 800kB
#max_prepared_transactions = 0		# zero disables the feature
					# (change requires restart)
# Note: Increasing max_prepared_transactions costs ~600 bytes of shared
# memory per transaction slot, plus lock space (see max_locks_per_transaction).
# It is not advisable to set max_prepared_transactions nonzero unless you
# actively intend to use prepared transactions.
work_mem = 256MB			# min 64kB
maintenance_work_mem = 256MB		# min 1MB
max_stack_depth = 4MB			# min 100kB

# - Kernel Resource Usage -

#max_files_per_process = 1000		# min 25
					# (change requires restart)
#shared_preload_libraries = ''		# (change requires restart)

# - Cost-Based Vacuum Delay -

#vacuum_cost_delay = 0ms		# 0-100 milliseconds
#vacuum_cost_page_hit = 1		# 0-10000 credits
#vacuum_cost_page_miss = 10		# 0-10000 credits
#vacuum_cost_page_dirty = 20		# 0-10000 credits
#vacuum_cost_limit = 200		# 1-10000 credits

# - Background Writer -

#bgwriter_delay = 200ms			# 10-10000ms between rounds
#bgwriter_lru_maxpages = 100		# 0-1000 max buffers written/round
#bgwriter_lru_multiplier = 2.0		# 0-10.0 multipler on buffers scanned/round

# - Asynchronous Behavior -

#effective_io_concurrency = 1		# 1-1000. 0 disables prefetching

#------------------------------------------------------------------------------
# WRITE AHEAD LOG
#------------------------------------------------------------------------------

# - Settings -

#fsync = on				# turns forced synchronization on or off
#synchronous_commit = on		# immediate fsync at commit
#wal_sync_method = fsync		# the default is the first option
					# supported by the operating system:
					#   open_datasync
					#   fdatasync
					#   fsync
					#   fsync_writethrough
					#   open_sync
#full_page_writes = on			# recover from partial page writes
wal_buffers = 2MB			# min 32kB
					# (change requires restart)
#wal_writer_delay = 200ms		# 1-10000 milliseconds

commit_delay = 50000			# range 0-100000, in microseconds
commit_siblings = 5			# range 1-1000

# - Checkpoints -

checkpoint_segments = 64		# in logfile segments, min 1, 16MB each (was safe value of 4)
#checkpoint_timeout = 5min		# range 30s-1h
#checkpoint_completion_target = 0.5	# checkpoint target duration, 0.0 - 1.0
#checkpoint_warning = 30s		# 0 disables

# - Archiving -

#archive_mode = off			# allows archiving to be done
					# (change requires restart)
#archive_command = ''			# command to use to archive a logfile segment
#archive_timeout = 0			# force a logfile segment switch after this
					# number of seconds; 0 disables

#------------------------------------------------------------------------------
# QUERY TUNING
#------------------------------------------------------------------------------

# - Planner Method Configuration -

#enable_bitmapscan = on
#enable_hashagg = on
#enable_hashjoin = on
#enable_indexscan = on
#enable_mergejoin = on
#enable_nestloop = on
#enable_seqscan = on
#enable_sort = on
#enable_tidscan = on

# - Planner Cost Constants -

#seq_page_cost = 1			# measured on an arbitrary scale
#random_page_cost = 4			# same scale as above
#seq_page_cost = 0.25			# use 0.25, 0.75 for normal
#random_page_cost = 0.75		# but 1 and 4 for wave-deactivate.
seq_page_cost = 0.5			# It looks as though 0.5 and 2 (exactly)
random_page_cost = 2			# will work for both problems. (very brittle fix!)
#cpu_tuple_cost = 0.01			# same scale as above
#cpu_index_tuple_cost = 0.005		# same scale as above
#cpu_operator_cost = 0.0025		# same scale as above
effective_cache_size = 10000MB

# - Genetic Query Optimizer -

#geqo = on
geqo_threshold = 12
geqo_effort = 10			# range 1-10
#geqo_pool_size = 0			# selects default based on effort
#geqo_generations = 0			# selects default based on effort
#geqo_selection_bias = 2.0		# range 1.5-2.0

# - Other Planner Options -

default_statistics_target = 1000	# range 1-10000
#constraint_exclusion = partition	# on, off, or partition
#cursor_tuple_fraction = 0.1		# range 0.0-1.0
#from_collapse_limit = 8
#join_collapse_limit = 8		# 1 disables collapsing of explicit
					# JOIN clauses

#------------------------------------------------------------------------------
# ERROR REPORTING AND LOGGING
#------------------------------------------------------------------------------

# - When to Log -

#client_min_messages = notice
#log_error_verbosity = default		# terse, default, or verbose messages
#log_min_error_statement = error
log_min_duration_statement = 80

# - What to Log -

#debug_print_parse = off
#debug_print_rewritten = off
#debug_print_plan = off
#debug_pretty_print = on
#log_checkpoints = off
#log_connections = off
#log_disconnections = off
#log_duration = off
#log_hostname = off
log_line_prefix = '%t '
#log_lock_waits = off			# log lock waits >= deadlock_timeout
#log_statement = 'none'			# none, ddl, mod, all
#log_temp_files = -1			# log temporary files equal or larger
					# than the specified size in kilobytes;
					# -1 disables, 0 logs all temp files
#log_timezone = unknown			# actually, defaults to TZ environment setting

#------------------------------------------------------------------------------
# RUNTIME STATISTICS
#------------------------------------------------------------------------------

# - Query/Index Statistics Collector -

#track_activities = on
#track_counts = on
#track_functions = none			# none, pl, all
#track_activity_query_size = 1024
#update_process_title = on
#stats_temp_directory = 'pg_stat_tmp'

# - Statistics Monitoring -

#log_parser_stats = off
#log_planner_stats = off
#log_executor_stats = off
#log_statement_stats = off

#------------------------------------------------------------------------------
# AUTOVACUUM PARAMETERS
#------------------------------------------------------------------------------

#autovacuum = on			# Enable autovacuum subprocess?  'on'
autovacuum = on				# requires track_counts to also be on.
log_autovacuum_min_duration = 1000	# -1 disables, 0 logs all actions and
					# their durations, > 0 logs only
					# actions running at least this number
					# of milliseconds.
#autovacuum_max_workers = 3		# max number of autovacuum subprocesses
#autovacuum_naptime = 1min		# time between autovacuum runs
#autovacuum_vacuum_threshold = 50	# min number of row updates before vacuum
#autovacuum_analyze_threshold = 50	# min number of row updates before analyze
#autovacuum_vacuum_scale_factor = 0.2	# fraction of table size before vacuum
#autovacuum_analyze_scale_factor = 0.1	# fraction of table size before analyze
#autovacuum_freeze_max_age = 200000000	# maximum XID age before forced vacuum
					# (change requires restart)
#autovacuum_vacuum_cost_delay = 20ms	# default vacuum cost delay for
					# autovacuum, in milliseconds;
					# -1 means use vacuum_cost_delay
#autovacuum_vacuum_cost_limit = -1	# default vacuum cost limit for
					# autovacuum, -1 means use
					# vacuum_cost_limit
2009/11/20 Richard Neill <rn214@cam.ac.uk>
>> It might also help if you posted your postgresql.conf too.
>
> Below (have removed the really non-interesting bits).
>
> Thanks,
>
> Richard
I can't actually see anything in your config that would cause this problem. :/
As for seeing the progress of an update, I would have thought that, due to the atomic nature of updates, only the transaction in which the update is running would have visibility of the as-yet-uncommitted changes.
Thom
Thom Brown wrote:
> 2009/11/20 Richard Neill <rn214@cam.ac.uk <mailto:rn214@cam.ac.uk>>
>
>>> It might also help if you posted your postgresql.conf too.
>>
>> Below (have removed the really non-interesting bits).
>>
>> Thanks,
>>
>> Richard
>
> I can't actually see anything in your config that would cause this
> problem. :/
>
> As for seeing the progress of an update, I would have thought due to the
> atomic nature of updates, only the transaction in which the update is
> running would have visibility of the as-yet uncommitted updates.

Yes, but surely the postmaster itself (and any administrative user)
should be able to find this out.

What I need for slow queries is some kind of progress bar. Any estimate
(no matter how poor, or non-linear) of the query progress, or of the time
remaining, would be invaluable.

Richard
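[For reference: 8.4 can't show progress within a query, but it does expose elapsed time per backend. A minimal sketch against pg_stat_activity (on 8.4 the columns are procpid and current_query; later releases rename them to pid and query):

```sql
-- Show non-idle queries and how long each has been running.
SELECT procpid,
       now() - query_start AS runtime,
       waiting,
       current_query
FROM pg_stat_activity
WHERE current_query <> '<IDLE>'
ORDER BY runtime DESC;
```

That at least distinguishes "started 5 seconds ago" from "started 5 hours ago", even though it says nothing about how far along the plan is.]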
> -----Original Message-----
> From: Richard Neill
>
> max_connections = 500 # (change requires restart)
> work_mem = 256MB # min 64kB

Not that it has to do with your current problem, but this combination
could bog down your server if enough clients run sorted queries
simultaneously. You should probably back off work_mem by at least an
order of magnitude.
Fernando Hevia wrote:
>> -----Original Message-----
>> From: Richard Neill
>>
>> max_connections = 500 # (change requires restart)
>> work_mem = 256MB # min 64kB
>
> Not that it has to do with your current problem, but this combination
> could bog down your server if enough clients run sorted queries
> simultaneously. You should probably back off work_mem by at least an
> order of magnitude.

What's the correct way to configure this?

* We have one client which needs to run really big transactions
  (therefore needs the work memory).

* We also have about 200 clients which always run very small, short
  queries.

Richard
> -----Original Message-----
> From: Richard Neill
>
> Fernando Hevia wrote:
>>> -----Original Message-----
>>> From: Richard Neill
>>>
>>> max_connections = 500 # (change requires restart)
>>> work_mem = 256MB # min 64kB
>>
>> Not that it has to do with your current problem, but this combination
>> could bog down your server if enough clients run sorted queries
>> simultaneously. You should probably back off work_mem by at least an
>> order of magnitude.
>
> What's the correct way to configure this?
>
> * We have one client which needs to run really big transactions
>   (therefore needs the work memory).
>
> * We also have about 200 clients which always run very small, short
>   queries.
>
> Richard

Set the default value in postgresql.conf much lower, probably 4MB, and
just before running any big transaction raise it for the current session
only, by issuing:

set work_mem = '256MB';

Regards,
Fernando.
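[A sketch of that approach. SET LOCAL is the transaction-scoped variant, so the raised value reverts automatically at COMMIT or ROLLBACK rather than lingering for the session:

```sql
-- postgresql.conf keeps a modest default for the ~200 small clients:
--   work_mem = 4MB
-- The one big reporting client raises it only inside its own transaction:
BEGIN;
SET LOCAL work_mem = '256MB';
-- ... run the large sort/aggregate here ...
COMMIT;  -- work_mem reverts to the 4MB default
```

This way a crashed or misbehaving client can never leave a 256MB setting active for the pool of small clients.]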
>>> max_connections = 500 # (change requires restart)
>>> work_mem = 256MB # min 64kB
>>
>> Not that it has to do with your current problem, but this combination
>> could bog down your server if enough clients run sorted queries
>> simultaneously. You should probably back off work_mem by at least an
>> order of magnitude.
>
> What's the correct way to configure this?
>
> * We have one client which needs to run really big transactions
>   (therefore needs the work memory).

You can set the work_mem for the specific user (like "set work_mem to x")
at the beginning of the session.

Here are some things I noticed (it is more like shooting in the dark,
but still...)

The expensive part is this:

->  Sort  (cost=280201.66..281923.16 rows=688602 width=300) (actual time=177511.806..183486.593 rows=41317448 loops=1)
      Sort Key: du_report_sku.wid, du_report_sku.storeorderid, du_report_sku.genreorderid
      Sort Method:  external sort  Disk: 380768kB
      ->  HashAggregate  (cost=197936.75..206544.27 rows=688602 width=36) (actual time=7396.426..11224.839 rows=6282564 loops=1)
            ->  Seq Scan on du_report_sku  (cost=0.00..111861.61 rows=6886011 width=36) (actual time=0.006..573.419 rows=6897682 loops=1)

(It is pretty confusing that the HashAggregate reports ~6M rows, but the
sort does 41M rows; maybe I cannot read this.) Anyway, I think that if
You up the work_mem for this query to 512M, the sort will be in memory,
and thus plenty faster.

Also, You say You are experiencing unstable query plans, and this may
mean that geqo is kicking in (but Your query seems too simple for that,
even considering the views involved). A quick way to check that would be
to run explain <the query> a couple of tens of times, and check whether
the plans change. If they do, try upping geqo_threshold.

You have random_page_cost 4 times larger than seq_page_cost. You say You
are on SSD, so there is no random access penalty. Try setting them
equal.

Your plan is full of merge-joins; some indices may be in order. Merge
join is a kind of "last-chance" plan.

The query is:

SELECT ( core.demand.qty - viwcs.wave_end_demand.qty_remaining )
FROM core.demand,
     viwcs.previous_wave
     LEFT OUTER JOIN viwcs.wave_end_demand USING ( wid )
WHERE core.demand.id = viwcs.wave_end_demand.demand_id;

Isn't the left join equivalent to an inner join, since in the WHERE
clause You are comparing values from the outer side of the join? If they
come out NULL, they will get discarded anyway...

I hope You find some of this useful.

Greetings
Marcin
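[If that reading is right, the query can be written with plain inner joins, which also gives the planner more freedom to reorder them. A sketch only, since the definitions of the views involved aren't shown in the thread:

```sql
SELECT core.demand.qty - viwcs.wave_end_demand.qty_remaining
FROM core.demand
JOIN viwcs.wave_end_demand
  ON core.demand.id = viwcs.wave_end_demand.demand_id
JOIN viwcs.previous_wave
  USING (wid);
```

The WHERE clause in the original discards any row where wave_end_demand.demand_id is NULL, which is exactly the rows the LEFT OUTER JOIN was preserving.]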
Justin Pitts wrote:
> Set work_mem in postgresql.conf down to what the 200 clients need,
> which sounds to me like the default setting.
>
> In the session which needs more work_mem, execute:
> SET SESSION work_mem TO '256MB'

Isn't that terribly ugly? It seems to me less hackish to rely on the
many clients not to abuse work_mem (as we know exactly what query they
will run, we can be sure it won't happen).

It's a shame that the work_mem parameter is a per-client one, rather
than a single big pool.

Richard
Richard Neill wrote:
> Am I missing something though, or is this project dormant, without
> having released any files?

My bad--gave you the wrong URL.
http://git.postgresql.org/gitweb?p=pg_top.git;a=summary has the project
I meant to point you toward.

> What I really want to know is, how far through that query has it got?
> (For example, if the query is an update, then surely it knows how many
> rows have been updated, and how many are yet to go.)

I understand what you want. The reason you're not getting any
suggestions is that this just isn't exposed in PostgreSQL yet. Clients
ask for queries to be run, and eventually they get rows of results back,
but there's no notion of how many rows they're going to get in advance,
or of how far along the executor is in the query's execution plan.
There are a couple of academic projects that have started exposing more
of the query internals, but I'm not aware of anyone who's even started
moving in the direction of what you'd need to produce a progress bar.

--
Greg Smith  2ndQuadrant  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com  www.2ndQuadrant.com
Richard Neill wrote:
> Likewise, is there any way to check whether, for example, postgres is
> running out of work memory?

It doesn't work like that; it's not an allocation. What happens is that
the optimizer estimates how much memory a sort is going to need, and
then uses work_mem to decide whether that is something it can do in RAM
or something that needs to be done via a more expensive disk-based
sorting method. You can tell whether it's set high enough by toggling on
log_temp_files and watching when those files get created--they appear
whenever a sort bigger than work_mem needs to be done.

> commit_delay = 50000 # range 0-100000, in microseconds
> commit_siblings = 5 # range 1-1000

Random note: that is way too high a value for commit_delay. It's
unlikely to be helping you, and might be hurting sometimes. The whole
commit_delay feature is quite difficult to tune correctly, and is really
only useful for situations where there's really heavy writing going on
and you want to carefully tweak the write chunking size. The useful
range for commit_delay is very small even in that situation; 50K is way
too high. I'd recommend changing this back to the default: if you're not
at the point where you're running your own benchmarks to prove the
parameter is useful to you, it's not something you should try to adjust.

--
Greg Smith  2ndQuadrant  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com  www.2ndQuadrant.com
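[Concretely, the two changes Greg describes would look like this in postgresql.conf (the work_mem spill then shows up as "temporary file" lines in the server log):

```
# Log every temporary file, so sorts spilling past work_mem become visible:
log_temp_files = 0       # size threshold in kB; 0 = log all, -1 disables

# Back commit_delay off to its default:
commit_delay = 0         # microseconds; 0 is the default (no delay)
```

Both can be changed with a reload; no restart is required.]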
Thanks very much for your help so far.

> (It is pretty confusing that the HashAggregate reports ~6M rows, but
> the sort does 41M rows; maybe I cannot read this.) Anyway, I think
> that if You up the work_mem for this query to 512M, the sort will be
> in memory, and thus plenty faster.

Tried this (with work_mem = 2GB). It seems to make a difference, but not
enough: the query time is about halved (from 220 sec to 120 sec).

> Also, You say You are experiencing unstable query plans, and this may
> mean that geqo is kicking in (but Your query seems too simple for
> that, even considering the views involved). A quick way to check that
> would be to run explain <the query> a couple of tens of times, and
> check whether the plans change. If they do, try upping geqo_threshold.

It's not unstable from one run to the next; it's unstable from one day
to the next (more on this later).

> You have random_page_cost 4 times larger than seq_page_cost. You say
> You are on SSD, so there is no random access penalty. Try setting
> them equal.

Again, experimentally, they seem to be non-equal. I didn't benchmark
this, but the random-access tests done by TomsHardware et al. suggest a
factor-of-2.5 penalty for random access vs sequential. This is very much
better than rotational disks, but still significant.

> Your plan is full of merge-joins; some indices may be in order. Merge
> join is a kind of "last-chance" plan.

I think the fix here is going to be to do more work at write-time and
less at read-time, i.e. rather than having really complex views, we'll
generate some extra tables, and keep them synchronized with triggers.

Richard
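[A minimal sketch of that write-time approach. The summary table, column names, and function here are all hypothetical (nothing like du_report_sku_summary exists in the schema shown), and it assumes plpgsql is installed; it only illustrates the trigger-maintained-table pattern:

```sql
-- Hypothetical summary table, maintained at write time instead of
-- being recomputed by a complex view at read time.
CREATE TABLE du_report_sku_summary (
    wid integer PRIMARY KEY,
    qty bigint NOT NULL
);

CREATE OR REPLACE FUNCTION du_report_sku_sync() RETURNS trigger AS $$
BEGIN
    -- Try to fold the new row into the existing summary row...
    UPDATE du_report_sku_summary
       SET qty = qty + NEW.qty
     WHERE wid = NEW.wid;
    -- ...and create the summary row if it doesn't exist yet.
    IF NOT FOUND THEN
        INSERT INTO du_report_sku_summary (wid, qty)
        VALUES (NEW.wid, NEW.qty);
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER du_report_sku_sync_trg
    AFTER INSERT ON du_report_sku
    FOR EACH ROW EXECUTE PROCEDURE du_report_sku_sync();
```

A real version would also need UPDATE/DELETE triggers, and under concurrent writers the update-then-insert needs retry handling, but the read-time query then becomes a trivial index lookup.]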
Justin Pitts wrote:
> I don't know if I would call it "terribly" ugly. It's not especially
> pretty, but it affords the needed degree of twiddling to get the job
> done. Relying on the clients is fine - if you can. I suspect the vast
> majority of DBAs would find that notion unthinkable. The usual result
> of a memory overrun is a server crash.

It's probably OK in this context: the multiple clients are all instances
of the same perl script, running particular, pre-defined queries. So we
can trust them not to ask a really memory-intensive query.

Besides which, if you can't trust the clients to ask sensible queries,
why can you trust them to set their own work_mem values?

Richard
Greg Smith wrote:
> Richard Neill wrote:
>> Am I missing something though, or is this project dormant, without
>> having released any files?
>
> My bad--gave you the wrong URL.
> http://git.postgresql.org/gitweb?p=pg_top.git;a=summary has the
> project I meant to point you toward.

Will try that out...

>> What I really want to know is, how far through that query has it got?
>> (For example, if the query is an update, then surely it knows how
>> many rows have been updated, and how many are yet to go.)
>
> I understand what you want. The reason you're not getting any
> suggestions is because that just isn't exposed in PostgreSQL yet.
> Clients ask for queries to be run, eventually they get rows of results
> back, but there's no notion of how many they're going to get in
> advance or how far along they are in executing the query's execution
> plan. There's a couple of academic projects that have started exposing
> more of the query internals, but I'm not aware of anyone who's even
> started moving in the direction of what you'd need to produce a
> progress bar.

Is there any internal table (similar to pg_stat_activity) I can look at?

Richard
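[For the record: on 8.4 the built-in views are per-backend rather than per-plan-node, so the most one can see is what a backend is running and whether it is blocked. A sketch combining pg_stat_activity with pg_locks (8.4 column names procpid / current_query):

```sql
-- Which backends are stuck waiting on an ungranted lock, and what
-- statement each one is running.
SELECT a.procpid,
       now() - a.query_start AS runtime,
       l.mode,
       l.granted,
       a.current_query
FROM pg_stat_activity a
JOIN pg_locks l ON l.pid = a.procpid
WHERE NOT l.granted;
```

If a slow query shows up here, the problem is lock contention rather than execution speed; otherwise, the only progress signal 8.4 offers is elapsed time.]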