Thread: big transaction slows down over time - but disk seems almost unused

From: Ben
I've got a long-running, update-heavy transaction that increasingly slows
down the longer it runs. I would expect that behavior if there were some
temp file creation going on. But monitoring vmstat over the life of the
transaction shows virtually zero disk activity. Instead, the system has
its CPU pegged the whole time.

So.... why the slowdown? Is it an MVCC thing? A side effect of calling
stored procedures a couple hundred thousand times in a single
transaction? Or am I just doing something wrong?

Re: big transaction slows down over time - but disk seems almost unused

From: Heikki Linnakangas
Ben wrote:
> I've got a long-running, update-heavy transaction that increasingly
> slows down the longer it runs. I would expect that behavior, if there
> was some temp file creation going on. But monitoring vmstat over the
> life of the transaction shows virtually zero disk activity. Instead, the
> system has its CPU pegged the whole time.
>
> So.... why the slowdown? Is it an MVCC thing? A side effect of calling
> stored procedures a couple hundred thousand times in a single
> transaction? Or am I just doing something wrong?

My guess is that the updates are creating a lot of old row versions, and
a command within the transaction is doing a seq scan that has to scan
through all of them. Or something like that. It's hard to tell without
more details.
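
The effect Heikki describes can be sketched in miniature. Under MVCC, every
UPDATE writes a new row version and leaves the old one behind as a dead
tuple, and dead tuples from an open transaction cannot be reclaimed yet. A
hedged illustration, using a made-up `events` table:

```sql
-- Hypothetical illustration: each UPDATE leaves a dead row version behind.
-- VACUUM cannot reclaim them while the transaction is still open, so a
-- sequential scan later in the same transaction has to step over all of
-- them, burning CPU while the pages stay hot in shared buffers (hence
-- little visible disk activity).
BEGIN;
UPDATE events SET last_seen = now() WHERE id = 1;  -- version 2; version 1 is dead
UPDATE events SET last_seen = now() WHERE id = 1;  -- version 3; version 2 is dead
-- ... repeated a few hundred thousand times ...
SELECT count(*) FROM events;  -- must examine every dead version, too
COMMIT;
```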

Calling stored procedures repeatedly shouldn't cause a slowdown over time.

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

Memory usage remains consistent, which is to say that postgres is
using most available system memory all the time, as I configured it
to. There is no swapping going on.

It's not clear to me why forcing a WAL checkpoint would help
anything.... but it doesn't matter, as only superusers can do it, so
it's not an option for me. Unless there's a whole other meaning you
were implying....?

On Nov 1, 2006, at 1:21 AM, Andreas Kostyrka wrote:

> On Tuesday, 31.10.2006, at 21:58 -0800, Ben wrote:
>> I've got a long-running, update-heavy transaction that increasingly
>> slows down the longer it runs. I would expect that behavior if there
>> were some temp file creation going on. But monitoring vmstat over the
>> life of the transaction shows virtually zero disk activity. Instead,
>> the system has its CPU pegged the whole time.
>>
>> So.... why the slowdown? Is it an MVCC thing? A side effect of calling
>> stored procedures a couple hundred thousand times in a single
>
> Memory usage? Have you tried to checkpoint your transaction from time
> to time?
>
> Andreas
>
>> transaction? Or am I just doing something wrong?
>>
>> ---------------------------(end of broadcast)---------------------------
>> TIP 5: don't forget to increase your free space map settings


Re: big transaction slows down over time - but disk seems almost unused

From: Andreas Kostyrka
On Tuesday, 31.10.2006, at 21:58 -0800, Ben wrote:
> I've got a long-running, update-heavy transaction that increasingly slows
> down the longer it runs. I would expect that behavior, if there was some
> temp file creation going on. But monitoring vmstat over the life of the
> transaction shows virtually zero disk activity. Instead, the system has
> its CPU pegged the whole time.
>
> So.... why the slowdown? Is it an MVCC thing? A side effect of calling
> stored procedures a couple hundred thousand times in a single

Memory usage? Have you tried to checkpoint your transaction from time to
time?

Andreas

> transaction? Or am I just doing something wrong?


Re: big transaction slows down over time - but disk seems almost unused

From: Richard Huxton
Ben wrote:
> My transaction calls the same stored procedure many times over. Over the
> lifetime of the transaction, that stored procedure slows down by roughly
> 2 orders of magnitude. The procedure itself tries to look up several
> strings in dictionary tables, and if the strings aren't there (most of
> them will be) it inserts them. All those dictionary tables have indexes.
> After it has converted most of the strings into ids, it does another
> lookup on a table and if it finds a matching row (should be the common
> case) it updates a timestamp column of that row; otherwise, it inserts a
> new row.

Which would suggest Heikki's guess was pretty much right and it's dead
rows that are causing the problem.

Assuming most updates are to this timestamp, could you try a test case
that does everything *except* update the timestamp? If that runs
blazingly fast, then we've found the problem.

If that is the problem, there are three areas to look at:
1. Avoid updating the same timestamp more than once (if that's happening).
2. Update timestamps in one go at the end of the transaction (perhaps by
loading updates into a temp table).
3. Split the transaction into smaller chunks of activity.
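
Option 2 might look something like the following sketch (all table and
column names here are invented for illustration):

```sql
-- Instead of updating each matched row as it is found, queue the
-- updates in a temp table and apply them once at the end:
CREATE TEMP TABLE pending_seen (
    event_id integer PRIMARY KEY,
    seen_at  timestamptz NOT NULL
) ON COMMIT DROP;

-- Inside the stored procedure, replace the per-row
--   UPDATE events SET last_seen = now() WHERE id = ...;
-- with
--   INSERT INTO pending_seen VALUES (..., now());

-- Then, once at the end of the transaction, a single set-based UPDATE
-- touches each target row exactly once:
UPDATE events e
SET    last_seen = p.seen_at
FROM   pending_seen p
WHERE  e.id = p.event_id;
```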

> So.... there isn't much table size changing, but there are a lot of
> updates. Based on pg_stat_user_tables I suspect that the procedure is
> using indexes more than table scans. Is there a better way to know?

Not really. You can check the plans of queries within the function, but
there's no way to capture query plans of running functions.
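
You can, however, EXPLAIN the function's statements by hand outside the
function, substituting representative literals for its parameters. For
example, assuming a dictionary table called `words` (a hypothetical name):

```sql
-- Run the same lookup the procedure performs, with a literal argument,
-- to confirm the planner is using the index rather than a seq scan:
EXPLAIN ANALYZE
SELECT id FROM words WHERE word = 'example';
```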

--
   Richard Huxton
   Archonet Ltd

My transaction calls the same stored procedure many times over. Over
the lifetime of the transaction, that stored procedure slows down by
roughly 2 orders of magnitude. The procedure itself tries to look up
several strings in dictionary tables, and if the strings aren't there
(most of them will be) it inserts them. All those dictionary tables
have indexes. After it has converted most of the strings into ids, it
does another lookup on a table and if it finds a matching row (should
be the common case) it updates a timestamp column of that row;
otherwise, it inserts a new row.

So.... there isn't much table size changing, but there are a lot of
updates. Based on pg_stat_user_tables I suspect that the procedure is
using indexes more than table scans. Is there a better way to know?
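
The pattern described is a common lookup-or-insert procedure; a minimal
PL/pgSQL sketch, with all table, column, and function names invented for
illustration, might look like:

```sql
CREATE OR REPLACE FUNCTION record_event(p_word text) RETURNS void AS $$
DECLARE
    v_id integer;
BEGIN
    -- Look the string up in a dictionary table; insert it if missing
    -- (the common case is that it is already there).
    SELECT id INTO v_id FROM words WHERE word = p_word;
    IF NOT FOUND THEN
        INSERT INTO words (word) VALUES (p_word);
        SELECT id INTO v_id FROM words WHERE word = p_word;
    END IF;

    -- Update the timestamp of a matching row, or insert a new one.
    UPDATE events SET last_seen = now() WHERE word_id = v_id;
    IF NOT FOUND THEN
        INSERT INTO events (word_id, last_seen) VALUES (v_id, now());
    END IF;
END;
$$ LANGUAGE plpgsql;
```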

On Nov 1, 2006, at 1:31 AM, Richard Huxton wrote:

> Ben wrote:
>> I've got a long-running, update-heavy transaction that increasingly
>> slows down the longer it runs. I would expect that behavior if there
>> were some temp file creation going on. But monitoring vmstat over
>> the life of the transaction shows virtually zero disk activity.
>> Instead, the system has its CPU pegged the whole time.
>> So.... why the slowdown? Is it an MVCC thing? A side effect of
>> calling stored procedures a couple hundred thousand times in a
>> single transaction? Or am I just doing something wrong?
>
> You'll need to provide some more information before anyone can come
> up with something conclusive. What queries slow down, by how much,
> and after what updates (for example). It could be an
> update/vacuum-related problem, or it could be that your stored
> procedures aren't coping with changes in table size (if table(s) are
> changing size).
>
> --
>   Richard Huxton
>   Archonet Ltd


Re: big transaction slows down over time - but disk seems almost unused

From: Ben

On Wed, 1 Nov 2006, Richard Huxton wrote:

> 1. Avoid updating the same timestamp more than once (if that's happening)

Each row is updated at most once, and not all rows are updated.

> 2. Update timestamps in one go at the end of the transaction (perhaps by
> loading updates into a temp table).

Hey, that's not a bad idea. I'll give that a shot. Thanks!

> 3. Split the transaction into smaller chunks of activity.

I'd be happy to do this too, except that I need a simple way to roll back
everything, and I don't see how I can get that with this.




Re: big transaction slows down over time - but disk seems almost unused

From: Richard Huxton
Ben wrote:
>
>
> On Wed, 1 Nov 2006, Richard Huxton wrote:
>
>> 1. Avoid updating the same timestamp more than once (if that's happening)
>
> Each row is updated at most once, and not all rows are updated.
>
>> 2. Update timestamps in one go at the end of the transaction (perhaps
>> by loading updates into a temp table).
>
> Hey, that's not a bad idea. I'll give that a shot. Thanks!
>
>> 3. Split the transaction into smaller chunks of activity.
>
> I'd be happy to do this too, except that I need a simple way to roll
> back everything, and I don't see how I can get that with this.

Well, you could still roll back everything with a temp table, but
splitting probably won't be necessary if you have one. You might want to
issue a VACUUM on the updated table after the transaction completes.

Note that this idea is built on a set of assumptions that might not be
true, so do test.

Oh - if you're processing rows one at a time with your stored procedure,
see if there's not a way to process the whole set. That can make a huge
difference.
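
For instance, if the incoming strings can be staged in a table first, the
dictionary-insert step for the whole batch becomes one statement rather
than one call per row (again, `words` and `staging` are hypothetical
names):

```sql
-- Insert every string from the staging table that the dictionary
-- doesn't already contain, in a single set-based statement:
INSERT INTO words (word)
SELECT DISTINCT s.word
FROM   staging s
WHERE  NOT EXISTS (SELECT 1 FROM words w WHERE w.word = s.word);
```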

--
   Richard Huxton
   Archonet Ltd