Thread: temp files getting me down
I'm running 9.0.3, and recently started getting temp files being created. This is a problem because it's making a bunch ofdirty buffers that have to be flushed to disk and my poor little disk isn't up to the task. I'm not sure why though, becausethis is the explain verbose for the queries that are creating them: QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Limit (cost=0.00..15180.05 rows=4000 width=109) (actual time=159.462..174.694 rows=4000 loops=1) -> Index Scan using pending_replication_items on replication_queue (cost=0.00..37247114.20 rows=9814757 width=109) (actualtime=159.459..169.061 rows=4000 loops=1) Total runtime: 177.437 ms (3 rows) My understanding with temp files is that they are generated when sorts need to happen. But given that the index is doingall the sorting for me, I'm at a loss. Why else might I get temp files?
On May 25, 2011, at 9:31 AM, Ben Chobot wrote: > I'm running 9.0.3, and recently started getting temp files being created. This is a problem because it's making a bunchof dirty buffers that have to be flushed to disk and my poor little disk isn't up to the task. I'm not sure why though,because this is the explain verbose for the queries that are creating them: > > > QUERY PLAN > ------------------------------------------------------------------------------------------------------------------------------------------------------------------------- > Limit (cost=0.00..15180.05 rows=4000 width=109) (actual time=159.462..174.694 rows=4000 loops=1) > -> Index Scan using pending_replication_items on replication_queue (cost=0.00..37247114.20 rows=9814757 width=109)(actual time=159.459..169.061 rows=4000 loops=1) > Total runtime: 177.437 ms > (3 rows) > > > My understanding with temp files is that they are generated when sorts need to happen. But given that the index is doingall the sorting for me, I'm at a loss. Why else might I get temp files? Just to close the loop, the problem was (apparently) due to table bloat. After I clustered the table, the problems went awayimmediately. I'd still like to understand what was happening, but at least my problem is solved.
On Wed, May 25, 2011 at 2:14 PM, Ben Chobot <bench@silentmedia.com> wrote: > On May 25, 2011, at 9:31 AM, Ben Chobot wrote: > >> I'm running 9.0.3, and recently started getting temp files being created. This is a problem because it's making a bunchof dirty buffers that have to be flushed to disk and my poor little disk isn't up to the task. I'm not sure why though,because this is the explain verbose for the queries that are creating them: >> >> >> QUERY PLAN >> ------------------------------------------------------------------------------------------------------------------------------------------------------------------------- >> Limit (cost=0.00..15180.05 rows=4000 width=109) (actual time=159.462..174.694 rows=4000 loops=1) >> -> Index Scan using pending_replication_items on replication_queue (cost=0.00..37247114.20 rows=9814757 width=109)(actual time=159.459..169.061 rows=4000 loops=1) >> Total runtime: 177.437 ms >> (3 rows) >> >> >> My understanding with temp files is that they are generated when sorts need to happen. But given that the index is doingall the sorting for me, I'm at a loss. Why else might I get temp files? > > Just to close the loop, the problem was (apparently) due to table bloat. After I clustered the table, the problems wentaway immediately. I'd still like to understand what was happening, but at least my problem is solved. are you sure this is the query that caused the temp files? can we see the query? merlin
On May 25, 2011, at 2:57 PM, Merlin Moncure wrote: > On Wed, May 25, 2011 at 2:14 PM, Ben Chobot <bench@silentmedia.com> wrote: >> On May 25, 2011, at 9:31 AM, Ben Chobot wrote: >> >>> I'm running 9.0.3, and recently started getting temp files being created. This is a problem because it's making a bunchof dirty buffers that have to be flushed to disk and my poor little disk isn't up to the task. I'm not sure why though,because this is the explain verbose for the queries that are creating them: >>> >>> >>> QUERY PLAN >>> ------------------------------------------------------------------------------------------------------------------------------------------------------------------------- >>> Limit (cost=0.00..15180.05 rows=4000 width=109) (actual time=159.462..174.694 rows=4000 loops=1) >>> -> Index Scan using pending_replication_items on replication_queue (cost=0.00..37247114.20 rows=9814757 width=109)(actual time=159.459..169.061 rows=4000 loops=1) >>> Total runtime: 177.437 ms >>> (3 rows) >>> >>> >>> My understanding with temp files is that they are generated when sorts need to happen. But given that the index is doingall the sorting for me, I'm at a loss. Why else might I get temp files? >> >> Just to close the loop, the problem was (apparently) due to table bloat. After I clustered the table, the problems wentaway immediately. I'd still like to understand what was happening, but at least my problem is solved. > > are you sure this is the query that caused the temp files? can we see the query? Well, the query itself was calling a plpgsql function, and the function itself was doing: DECLARE row formatted_replication_queue%ROWTYPE; BEGIN for row in select * from formatted_replication_queue where distributor_id IS NULL AND ('{{%,%}}'::varchar[] @> ARRAY[source_site, dest_site]::varchar[] OR '{{%,%}}'::varchar[] @> ARRAY['%', dest_site]::varchar[] OR '{{%,%}}'::varchar[] @> ARRAY[source_site, '%']::varchar[] OR '{{%,%}}'::varchar[] @> ARRAY['%', '%']::varchar[]) ORDER BY update_time ASC limit 4000 for update LOOP UPDATE replication_queue SET distributor_id = 't32' WHERE filehash = row.filehash; RETURN NEXT row; END LOOP; RETURN; END Doing that select manually didn't seem to be causing the same issues. formatted_replication_queue is a simple view that reformatssome columns but does no sorting.
On Wed, May 25, 2011 at 6:44 PM, Ben Chobot <bench@silentmedia.com> wrote: > On May 25, 2011, at 2:57 PM, Merlin Moncure wrote: > >> On Wed, May 25, 2011 at 2:14 PM, Ben Chobot <bench@silentmedia.com> wrote: >>> On May 25, 2011, at 9:31 AM, Ben Chobot wrote: >>> >>>> I'm running 9.0.3, and recently started getting temp files being created. This is a problem because it's making a bunchof dirty buffers that have to be flushed to disk and my poor little disk isn't up to the task. I'm not sure why though,because this is the explain verbose for the queries that are creating them: >>>> >>>> >>>> QUERY PLAN >>>> ------------------------------------------------------------------------------------------------------------------------------------------------------------------------- >>>> Limit (cost=0.00..15180.05 rows=4000 width=109) (actual time=159.462..174.694 rows=4000 loops=1) >>>> -> Index Scan using pending_replication_items on replication_queue (cost=0.00..37247114.20 rows=9814757 width=109)(actual time=159.459..169.061 rows=4000 loops=1) >>>> Total runtime: 177.437 ms >>>> (3 rows) >>>> >>>> >>>> My understanding with temp files is that they are generated when sorts need to happen. But given that the index is doingall the sorting for me, I'm at a loss. Why else might I get temp files? >>> >>> Just to close the loop, the problem was (apparently) due to table bloat. After I clustered the table, the problems wentaway immediately. I'd still like to understand what was happening, but at least my problem is solved. >> >> are you sure this is the query that caused the temp files? can we see the query? > > Well, the query itself was calling a plpgsql function, and the function itself was doing: > > DECLARE > row formatted_replication_queue%ROWTYPE; > BEGIN > for row in select * from formatted_replication_queue where > distributor_id IS NULL AND > ('{{%,%}}'::varchar[] @> ARRAY[source_site, dest_site]::varchar[] OR > '{{%,%}}'::varchar[] @> ARRAY['%', dest_site]::varchar[] OR > '{{%,%}}'::varchar[] @> ARRAY[source_site, '%']::varchar[] OR > '{{%,%}}'::varchar[] @> ARRAY['%', '%']::varchar[]) > ORDER BY update_time ASC > limit 4000 > for update > LOOP > UPDATE replication_queue SET distributor_id = 't32' WHERE filehash = row.filehash; > RETURN NEXT row; > END LOOP; > RETURN; > END > > Doing that select manually didn't seem to be causing the same issues. formatted_replication_queue is a simple view thatreformats some columns but does no sorting. um, right. how about for starters querying the view outside of the function and see what plan we get... also you still haven't posted the original query. one major gotcha with plpgsql is the function executing more times than you think. posting the calling query will help, or simple logging from inside the function (raise notice, etc). merlin
On May 25, 2011, at 7:36 PM, Merlin Moncure wrote: > On Wed, May 25, 2011 at 6:44 PM, Ben Chobot <bench@silentmedia.com> wrote: >> >> >> Well, the query itself was calling a plpgsql function, and the function itself was doing: >> >> DECLARE >> row formatted_replication_queue%ROWTYPE; >> BEGIN >> for row in select * from formatted_replication_queue where >> distributor_id IS NULL AND >> ('{{%,%}}'::varchar[] @> ARRAY[source_site, dest_site]::varchar[] OR >> '{{%,%}}'::varchar[] @> ARRAY['%', dest_site]::varchar[] OR >> '{{%,%}}'::varchar[] @> ARRAY[source_site, '%']::varchar[] OR >> '{{%,%}}'::varchar[] @> ARRAY['%', '%']::varchar[]) >> ORDER BY update_time ASC >> limit 4000 >> for update >> LOOP >> UPDATE replication_queue SET distributor_id = 't32' WHERE filehash = row.filehash; >> RETURN NEXT row; >> END LOOP; >> RETURN; >> END >> >> Doing that select manually didn't seem to be causing the same issues. formatted_replication_queue is a simple view thatreformats some columns but does no sorting. > > um, right. how about for starters querying the view outside of the > function and see what plan we get... Sorry, I thought I'd mentioned that I had run this select outside the function but was unable to replicate the temp filecreation that way. That's how I got the explain analyze plan. > also you still haven't posted the original query. one major gotcha > with plpgsql is the function executing more times than you think. > posting the calling query will help, or simple logging from inside the > function (raise notice, etc). Well the calling query was merely "select * from f(4000,'t32');" It wasn't in a join, and the args were constants. It's asimple "give me more work from the work queue" function. I really don't think it was being called more than I thought, because I saw each query was taking about 3 minutes. Duringthose 3 minutes, I observed a temp file building up. Then it would go away right as the query finished. I'll be lax about staying on top of bloat for this database and if it returns to previous levels of bloat, perhaps the problemwill return.