Re: UPDATEDs slowing SELECTs in a fully cached database - Mailing list pgsql-performance

From Robert Klemme
Subject Re: UPDATEDs slowing SELECTs in a fully cached database
Date
Msg-id CAM9pMnMw=ggvg4_xyLw8uO68+BujpCLiOotnAcKVX742n0oG3w@mail.gmail.com
In response to Re: UPDATEDs slowing SELECTs in a fully cached database  ("Kevin Grittner" <Kevin.Grittner@wicourts.gov>)
List pgsql-performance
On Thu, Jul 14, 2011 at 4:05 PM, Kevin Grittner
<Kevin.Grittner@wicourts.gov> wrote:
> Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
>> It seems like we ought to distinguish heap cleanup activities from
>> user-visible semantics (IOW, users shouldn't care if a HOT cleanup
>> has to be done over after restart, so if the transaction only
>> wrote such records there's no need to flush).  This'd require more
>> process-global state than we keep now, I'm afraid.
>
> That makes sense, and seems like the right long-term fix.  It seems
> like a boolean might do it; the trick would be setting it (or not)
> in all the right places.

I also believe this is the right way to go.  I think the crucial
point is to "distinguish heap cleanup activities from user-visible
semantics" - basically this is what happens with autovacuum: it does
work in the background that you do not want to burden user
transactions with.
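
To make the idea concrete, here is a toy Python sketch of the boolean
Kevin mentions.  It is only an illustration under my own assumptions:
ToyTransaction, RecordKind and must_flush_at_commit are made-up names,
and none of this reflects how PostgreSQL actually tracks WAL records.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class RecordKind(Enum):
    USER_VISIBLE = auto()  # a change the user asked for (INSERT/UPDATE/DELETE)
    HEAP_CLEANUP = auto()  # housekeeping such as HOT cleanup during a read


@dataclass
class ToyTransaction:
    # Hypothetical per-transaction state; not PostgreSQL code.
    wrote_user_visible: bool = False
    records: list = field(default_factory=list)

    def log(self, kind: RecordKind) -> None:
        # Remember the record and note whether it carries user-visible changes.
        self.records.append(kind)
        if kind is RecordKind.USER_VISIBLE:
            self.wrote_user_visible = True

    def must_flush_at_commit(self) -> bool:
        # Cleanup can be redone after a restart, so only user-visible
        # records force a flush before commit returns.
        return self.wrote_user_visible


# A SELECT-only transaction that happened to do cleanup work.
select_only = ToyTransaction()
select_only.log(RecordKind.HEAP_CLEANUP)

# A transaction that did cleanup and also changed data.
update_tx = ToyTransaction()
update_tx.log(RecordKind.HEAP_CLEANUP)
update_tx.log(RecordKind.USER_VISIBLE)
```

In this model select_only would commit without waiting for a flush,
while update_tx would still wait.  The hard part, as Kevin says, is
setting the flag in all the right places.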

>> Another approach we could take (also nontrivial) is to prevent
>> select-only queries from doing HOT cleanups.  You said upthread
>> that there were alleged performance benefits from aggressive
>> cleanup, but IMO that can charitably be described as unproven.
>> The real reason it happens is that we didn't see a simple way for
>> page fetches to know soon enough whether a tuple update would be
>> likely to happen later, so they just do cleanups unconditionally.
>
> Hmm.  One trivial change could be to skip it when the top level
> transaction is declared to be READ ONLY.  At least that would give
> people a way to work around it for now.  Of course, that can't be
> back-patched before 9.1 because subtransactions could override READ
> ONLY before that.

What I don't like about this approach is that it a) increases
complexity for the user, b) might not work for everyone (e.g. tools
like OR mappers which do not allow setting the TX mode, or cases where
you do not know what type of TX this is when you start it) and c)
does not remove the performance penalty, which can suddenly come back
to haunt a different TX.

I can only speculate whether point c) might actually cause other
people to run into issues: their usage patterns may currently push
the cleanup activities into an unimportant TX, and with the workaround
the cleanup delay would suddenly show up in an important TX which
used to be fast.  This is also hard to debug, since you would normally
only look at the slow TX before you realize you need to look
elsewhere (the length of this thread is kind of proof of this already
:-)).

My 0.01 EUR...

Kind regards

robert

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/
