Thread: Limiting setting of hint bits by read-only queries; vacuum_delay

Limiting setting of hint bits by read-only queries; vacuum_delay

From
Simon Riggs
Date:
vacuum_delay is designed to slow down VACUUMs from writing too many
blocks. However, SELECTs also dirty data blocks but are NOT slowed
down by vacuum_delay.

So the current situation is that a large SELECT operates similarly to
a VACUUM, throwing out many dirty blocks and using additional I/O
resources but without constraint or control.

The user issuing the SELECT experiences a noticeable slow-down, which
is annoying if it wasn't them that issued any writes to that data. The
dbadmin is also annoyed because the SELECT is uncontrollable in its
write behaviour, which has a knock-on effect on replication lag and so
reduces high availability. The checksum patch highlights this
behaviour, but its been pretty annoying for years even without that.
Yes, it is that which inspires this commentary now, but its also been
the subject of much recent discussion and patch submission, which
regrettably has come to nothing.

IMHO it is time to limit the hint bit writes caused by SELECTs, or at
least larger SELECTs.

Proposal is to prevent SELECTs from causing more than N buffers from
being dirtied by hint bit setting and block cleanup. Smaller SELECTs
still clean up, but larger queries don't get swamped by background
duties which autovacuum ought to be performing. Write statements
(INSERT, UPDATE, DELETE) are not affected, nor are SELECT ... FOR
$LOCK queries, i.e. they will clean blocks if they can (because they
need to).

query_cleanup_limit = 4 (default) range -1... INT_MAX
-1 means "no limit" and is equivalent to current behaviour

Once a query has reached its query_cleanup_limit it will no longer
mark *heap* buffers dirty in MarkBufferDirtyHint, nor will it attempt
to do optional HOT block cleanup.

Patch to implement is a few hours work. The only complexity is
deciding how to handle SQL in functions.... to which I would say, as
simply as possible.

Can we do this now?

-- Simon Riggs                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



Re: Limiting setting of hint bits by read-only queries; vacuum_delay

From
Greg Stark
Date:
On Sun, Mar 24, 2013 at 11:14 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> Proposal is to prevent SELECTs from causing more than N buffers from
> being dirtied by hint bit setting and block cleanup.

I think you need to clarify what you mean by this.

I'm guessing you mean that there would be a global variable somewhere
(why not a field in the executor node for the sequential scan?)
counting how many blocks were marked dirty by the current plan (why
not the whole transaction?) and once it reaches the limit subsequent
blocks would be processed normally including setting hint bits but
wouldn't mark the block dirty? (Or do you mean the hint bits wouldn't
even be set so even if the block was dirty they wouldn't be updated?)

This means that there's no limit to the number of times an in-doubt
record would have to have its xmin/xmax looked up in the clog. Every
select would have to do the same work over and over again
indefinitely.

The whole point of the hint bits is to guarantee that each update
causes a bounded amount of work. Each update transaction currently
causes one write, followed by a limited period when the transaction
status needs to be checked, followed by another write with the hint
bit set. And that's the total work caused. After that the record can
be read without consulting the clog or anything else.


-- 
greg



Re: Limiting setting of hint bits by read-only queries; vacuum_delay

From
Tom Lane
Date:
Simon Riggs <simon@2ndQuadrant.com> writes:
> Can we do this now?

No.

We are trying to get 9.3 out the door, not undertake design, coding,
and testing of entirely new behaviors which will have complex
performance tradeoffs.  This sounds like a great project for early in
the 9.4 development cycle, but it's way too late to try to do it now.
(At least if you'd like to ship 9.3 this year.  If we're going to
re-open 9.3 for this kind of development, I have some planner fixes
I need to make ...)
        regards, tom lane



Re: Limiting setting of hint bits by read-only queries; vacuum_delay

From
Simon Riggs
Date:
On 24 March 2013 13:29, Greg Stark <stark@mit.edu> wrote:
> On Sun, Mar 24, 2013 at 11:14 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> Proposal is to prevent SELECTs from causing more than N buffers from
>> being dirtied by hint bit setting and block cleanup.

> I'm guessing you mean that there would be a global variable somewhere
> (why not a field in the executor node for the sequential scan?)
> counting how many blocks were marked dirty by the current plan (why
> not the whole transaction?) and once it reaches the limit subsequent
> blocks would be processed normally including setting hint bits but
> wouldn't mark the block dirty?

Yep, that's what I'm proposing.

> This means that there's no limit to the number of times an in-doubt
> record would have to have its xmin/xmax looked up in the clog. Every
> select would have to do the same work over and over again
> indefinitely.

> The whole point of the hint bits is to guarantee that each update
> causes a bounded amount of work. Each update transaction currently
> causes one write, followed by a limited period when the transaction
> status needs to be checked, followed by another write with the hint
> bit set. And that's the total work caused. After that the record can
> be read without consulting the clog or anything else.

...and as a result, the rest of your comments don't apply at all to
the proposal. Sorry about that confusion.

-- Simon Riggs                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



Re: Limiting setting of hint bits by read-only queries; vacuum_delay

From
Simon Riggs
Date:
On 24 March 2013 13:38, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Simon Riggs <simon@2ndQuadrant.com> writes:
>> Can we do this now?
>
> No.
>
> We are trying to get 9.3 out the door, not undertake design, coding,
> and testing of entirely new behaviors which will have complex
> performance tradeoffs.  This sounds like a great project for early in
> the 9.4 development cycle, but it's way too late to try to do it now.
> (At least if you'd like to ship 9.3 this year.  If we're going to
> re-open 9.3 for this kind of development, I have some planner fixes
> I need to make ...)

Yes, there are tradeoffs and people would like to control them rather
than being stuck with one and only one behavour that we know has major
downsides in certain use cases.

I'm proposing a parameter to control this, with the existing behaviour
being still obtainable. This is not a new automatic behaviour. If you
have a new optimiser behaviour that can be controlled by parameter and
takes a few hours to implement, please do it.

I'm greatly concerned that nobody seems to care about replication
delay; this is the second time I've raised major issues that affect
that and we have no resolution.

-- Simon Riggs                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



Re: Limiting setting of hint bits by read-only queries; vacuum_delay

From
Greg Stark
Date:
On Sun, Mar 24, 2013 at 9:02 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> ...and as a result, the rest of your comments don't apply at all to
> the proposal. Sorry about that confusion.

How do you figure that?


-- 
greg



Re: Limiting setting of hint bits by read-only queries; vacuum_delay

From
Greg Smith
Date:
On 3/24/13 7:14 AM, Simon Riggs wrote:
> Patch to implement is a few hours work. The only complexity is
> deciding how to handle SQL in functions.... to which I would say, as
> simply as possible.

Like the Page replacement ideas, the throttle on how fast something like 
this will get done depends not on development time, but on putting 
together more performance regression tests.  The idea I was thinking 
about is refactoring the background writer's role in hint bit 
maintenance.  If backends could push "this looks dirty" page numbers 
toward the BGW and keep going, it might stream those out to disk under 
its control.

I also have a larger proposal for how to refactor I/O so that both 
queries and CREATE INDEX have similar cost controls to the ones used to 
limit vacuum.  And two more based on collecting extra data for high cost 
queries before they run.

But right now I keep biting my tongue about the design on all these, and 
return to working on one of the patches already in the CF queue instead.

-- 
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com



Re: Limiting setting of hint bits by read-only queries; vacuum_delay

From
Simon Riggs
Date:
On 25 March 2013 02:50, Greg Smith <greg@2ndquadrant.com> wrote:
> On 3/24/13 7:14 AM, Simon Riggs wrote:
>>
>> Patch to implement is a few hours work. The only complexity is
>> deciding how to handle SQL in functions.... to which I would say, as
>> simply as possible.
>
>
> Like the Page replacement ideas, the throttle on how fast something like
> this will get done depends not on development time, but on putting together
> more performance regression tests.

I don't see any role for regression tests there, nor even performance
tests. This isn't vague discussion about page replacement.

There's absolutely zero point spending time on how background tasks
might work while the foreground tasks continue to do the work right in
front of our eyes. Until we agree how to limit what foreground tasks
do, we'll never move forwards.

> The idea I was thinking about is
> refactoring the background writer's role in hint bit maintenance.  If
> backends could push "this looks dirty" page numbers toward the BGW and keep
> going, it might stream those out to disk under its control.

> I also have a larger proposal for how to refactor I/O so that both queries
> and CREATE INDEX have similar cost controls to the ones used to limit
> vacuum.  And two more based on collecting extra data for high cost queries
> before they run.
>
> But right now I keep biting my tongue about the design on all these, and
> return to working on one of the patches already in the CF queue instead.

This is related to one of the patches in this CF, hence why I bring it up now.

-- Simon Riggs                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



Re: Limiting setting of hint bits by read-only queries; vacuum_delay

From
Greg Stark
Date:
On Mon, Mar 25, 2013 at 2:50 AM, Greg Smith <greg@2ndquadrant.com> wrote:
> The idea I was thinking about is refactoring the background writer's role in
> hint bit maintenance

A good first step might be to separate the "dirty" bit into two bits.
"mandatory dirty" and "optional dirty". (Or maybe "hard dirty" and
"soft dirty"). Hint bit updates could set the latter and then some
later policy decision could be made about whether to bother writing
out the buffer if it's only optionally dirty.

My personal suspicion is that the right policy will look something
like writing out 1/nth of the hint bit dirtied pages. Where n is
something like MaxTuplesPerPage or perhaps the actual number of tuples
per page. And perhaps that should only happen if there are any
mandatory dirty pages waiting to be written or something like that.


-- 
greg



Re: Limiting setting of hint bits by read-only queries; vacuum_delay

From
Merlin Moncure
Date:
On Sun, Mar 24, 2013 at 6:14 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> vacuum_delay is designed to slow down VACUUMs from writing too many
> blocks. However, SELECTs also dirty data blocks but are NOT slowed
> down by vacuum_delay.
>
> So the current situation is that a large SELECT operates similarly to
> a VACUUM, throwing out many dirty blocks and using additional I/O
> resources but without constraint or control.
>
> The user issuing the SELECT experiences a noticeable slow-down, which
> is annoying if it wasn't them that issued any writes to that data. The
> dbadmin is also annoyed because the SELECT is uncontrollable in its
> write behaviour, which has a knock-on effect on replication lag and so
> reduces high availability. The checksum patch highlights this
> behaviour, but its been pretty annoying for years even without that.
> Yes, it is that which inspires this commentary now, but its also been
> the subject of much recent discussion and patch submission, which
> regrettably has come to nothing.
>
> IMHO it is time to limit the hint bit writes caused by SELECTs, or at
> least larger SELECTs.
>
> Proposal is to prevent SELECTs from causing more than N buffers from
> being dirtied by hint bit setting and block cleanup. Smaller SELECTs
> still clean up, but larger queries don't get swamped by background
> duties which autovacuum ought to be performing. Write statements
> (INSERT, UPDATE, DELETE) are not affected, nor are SELECT ... FOR
> $LOCK queries, i.e. they will clean blocks if they can (because they
> need to).
>
> query_cleanup_limit = 4 (default) range -1... INT_MAX
> -1 means "no limit" and is equivalent to current behaviour
>
> Once a query has reached its query_cleanup_limit it will no longer
> mark *heap* buffers dirty in MarkBufferDirtyHint, nor will it attempt
> to do optional HOT block cleanup.
>
> Patch to implement is a few hours work. The only complexity is
> deciding how to handle SQL in functions.... to which I would say, as
> simply as possible.

This is pretty similar to the proposal Atri and I just recently made.
I am 100% in agreement that something must be done here...SELECT has
none of the i/o mitigation features that vacuum has.  Is your idea
better? probably (although you have to give a small penalty for a user
facing tunable) but we need testing against real world workloads, or
at least a much better synthetic one than pgbench, which per recent
discussions is probably the top objective of the project (a
performance farm, etc.).

merlin



Re: Limiting setting of hint bits by read-only queries; vacuum_delay

From
Atri Sharma
Date:
> This is pretty similar to the proposal Atri and I just recently made.
> I am 100% in agreement that something must be done here...SELECT has
> none of the i/o mitigation features that vacuum has.  Is your idea
> better? probably (although you have to give a small penalty for a user
> facing tunable) but we need testing against real world workloads, or
> at least a much better synthetic one than pgbench, which per recent
> discussions is probably the top objective of the project (a
> performance farm, etc.).
>

I have been working on some tests for improving the performance in
case of bulk INSERTs for our patch. So far, I think it has some
relation to the visibility map optimization, which our patch seems to
be affecting.

Some more testing is in place, which has been delayed due to me being
wound up in other projects. Now that they are complete, I will resume
testing next week or so.

Regards,

Atri


--
Regards,

Atri
l'apprenant



Re: Limiting setting of hint bits by read-only queries; vacuum_delay

From
Simon Riggs
Date:
On 25 March 2013 14:26, Merlin Moncure <mmoncure@gmail.com> wrote:

> This is pretty similar to the proposal Atri and I just recently made.
> I am 100% in agreement that something must be done here...SELECT has
> none of the i/o mitigation features that vacuum has.  Is your idea
> better? probably (although you have to give a small penalty for a user
> facing tunable)

I was hoping this was a new idea entirely, since I was focusing on
simply limiting foreground work rather than trying to work out how to
optimise foreground work or work out how to make background tasks work
better.

> but we need testing against real world workloads, or
> at least a much better synthetic one than pgbench, which per recent
> discussions is probably the top objective of the project (a
> performance farm, etc.).

Self-tuning the background workloads needs lots of testing. Limiting
foreground work needs very little, or none.

-- Simon Riggs                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



Re: Limiting setting of hint bits by read-only queries; vacuum_delay

From
Kevin Grittner
Date:
Simon Riggs <simon@2ndQuadrant.com> wrote:
> Merlin Moncure <mmoncure@gmail.com> wrote:

>> we need testing against real world workloads, or at least a much
>> better synthetic one than pgbench, which per recent discussions
>> is probably the top objective of the project (a performance
>> farm, etc.).
>
> Self-tuning the background workloads needs lots of testing.
> Limiting foreground work needs very little, or none.

This is absolutely a real-world problem, but I disagree that the
solution you propose is risk-free.  It would be trivial to
construct a test which would show massive performance degradation.
Consider a single largish table which fits in cache and is subject
to frequent seqscans, and a workload which burns through
transaction IDs fast enough to cause clog thrashing as these xids
get old and still lack hint bits.

I think there are ways to deal with that problem, with the
foreground select telling the bgwriter or autovacuum to pay
attention to the problem tables (or pages), but now is not the time
to start designing that.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Limiting setting of hint bits by read-only queries; vacuum_delay

From
Merlin Moncure
Date:
On Mon, Mar 25, 2013 at 1:05 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On 25 March 2013 14:26, Merlin Moncure <mmoncure@gmail.com> wrote:
>
>> This is pretty similar to the proposal Atri and I just recently made.
>> I am 100% in agreement that something must be done here...SELECT has
>> none of the i/o mitigation features that vacuum has.  Is your idea
>> better? probably (although you have to give a small penalty for a user
>> facing tunable)
>
> I was hoping this was a new idea entirely, since I was focusing on
> simply limiting foreground work rather than trying to work out how to
> optimise foreground work or work out how to make background tasks work
> better.

They are very similar, in that based on $simple_condition hint bits do
not get written out during a scan.  Also, the effect in both cases is
to push more work into vacuum.

Our $simple_condition was a little different and maybe less good than
yours, but that should be proven.  A good starting point would be to
run the battery of performance tests that Amit and Hari ran against
what Atri proposed.  After seeing the results, I hedged on pushing the
patch further -- it wasn't clear that the results were win-win and I
think your patch idea mostly has the same pros/cons (see:
http://postgresql.1045698.n5.nabble.com/WIP-patch-for-hint-bit-i-o-mitigation-td5730963i20.html).

merlin



Re: Limiting setting of hint bits by read-only queries; vacuum_delay

From
Simon Riggs
Date:
On 25 March 2013 20:44, Kevin Grittner <kgrittn@ymail.com> wrote:
> Simon Riggs <simon@2ndQuadrant.com> wrote:
>> Merlin Moncure <mmoncure@gmail.com> wrote:
>
>>> we need testing against real world workloads, or at least a much
>>> better synthetic one than pgbench, which per recent discussions
>>> is probably the top objective of the project (a performance
>>> farm, etc.).
>>
>> Self-tuning the background workloads needs lots of testing.
>> Limiting foreground work needs very little, or none.
>
> This is absolutely a real-world problem, but I disagree that the
> solution you propose is risk-free.  It would be trivial to
> construct a test which would show massive performance degradation.

It is trivial to show massive performance degredation through the
*lack* of this feature, as you know.

Since so many people have experienced the pain, doing nothing because
it isn't auto-tuned is not sensible. Preventing manual control of
problems just doesn't make sense.

> Consider a single largish table which fits in cache and is subject
> to frequent seqscans, and a workload which burns through
> transaction IDs fast enough to cause clog thrashing as these xids
> get old and still lack hint bits.

That wouldn't happen. I suggested setting hint bits but not dirtying
the data blocks.

> I think there are ways to deal with that problem, with the
> foreground select telling the bgwriter or autovacuum to pay
> attention to the problem tables (or pages), but now is not the time
> to start designing that.

We do already tell autovacuum to deal with the problem, but it wakes
up too late to do anything useful.

We've been waiting for a solution along those lines for most of a
decade (late 2004). My guess is we'll spend a whole chunk of time and
still implement something that doesn't work, just as we did with
bgwriter in 8.0

-- Simon Riggs                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



Re: Limiting setting of hint bits by read-only queries; vacuum_delay

From
Kevin Grittner
Date:
Simon Riggs <simon@2ndQuadrant.com> wrote:
> On 25 March 2013 20:44, Kevin Grittner <kgrittn@ymail.com> wrote:

>> This is absolutely a real-world problem, but I disagree that the
>> solution you propose is risk-free.  It would be trivial to
>> construct a test which would show massive performance degradation.
>
> It is trivial to show massive performance degredation through the
> *lack* of this feature, as you know.

True, but I think we have to be more careful about causing
performance *regressions* in a new release than perpetuating
*existing* performance problems.

> Since so many people have experienced the pain, doing nothing because
> it isn't auto-tuned is not sensible. Preventing manual control of
> problems just doesn't make sense.

It's a matter of not destabilizing things with brand-new ideas this
late in the release cycle.

>> Consider a single largish table which fits in cache and is subject
>> to frequent seqscans, and a workload which burns through
>> transaction IDs fast enough to cause clog thrashing as these xids
>> get old and still lack hint bits.
>
> That wouldn't happen. I suggested setting hint bits but not dirtying
> the data blocks.

[ thinks about the implications of that a bit more ]

That would make it harder to construct a degenerate case; but it's
hard to be confident that real-world applications couldn't hit some
significant performance regressions.  If we had a performance
testing farm, this would be a lot less scary.

>> I think there are ways to deal with that problem, with the
>> foreground select telling the bgwriter or autovacuum to pay
>> attention to the problem tables (or pages), but now is not the time
>> to start designing that.
>
> We do already tell autovacuum to deal with the problem, but it wakes
> up too late to do anything useful.

I was thinking about some more direct communication to nudge those
into action under certain circumstances.  I think we could come up
with heuristics to ensure that the worst-case regression was small
enough not to be noticed.

> We've been waiting for a solution along those lines for most of a
> decade (late 2004). My guess is we'll spend a whole chunk of time and
> still implement something that doesn't work, just as we did with
> bgwriter in 8.0

So submit this for the first CF in the next release.  If we start
allowing new features to be designed at this point in the 9.3
cycle, 9.3 won't be released any sooner than 9.4 otherwise would
be.  Certainly you must have some features you would like to see
released before this summer is out.  If we open the door to new
ideas for 9.3, there will be an avalanche of "this is important,
too" patches, and you just won't see the release until *much*
later.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Limiting setting of hint bits by read-only queries; vacuum_delay

From
Greg Stark
Date:
On Mon, Mar 25, 2013 at 9:53 PM, Kevin Grittner <kgrittn@ymail.com> wrote:
> That would make it harder to construct a degenerate case

I don't think it's hard at all. It's the same as the case Simon wants
to solve except that the cost is incurred in a different way. Imagine
a system where there's a huge data load to a table which is then
read-only for an OLTP system. Until vacuum comes along -- and it may
never since the table never sees deletes or updates -- every
transaction needs to do a clog lookup for every tuple it sees. That
means a significant cpu slowdown for every row lookup forever more. To
save a one-time i/o cost.

-- 
greg



Re: Limiting setting of hint bits by read-only queries; vacuum_delay

From
Merlin Moncure
Date:
On Mon, Mar 25, 2013 at 5:57 PM, Greg Stark <stark@mit.edu> wrote:
> On Mon, Mar 25, 2013 at 9:53 PM, Kevin Grittner <kgrittn@ymail.com> wrote:
>> That would make it harder to construct a degenerate case
>
> I don't think it's hard at all. It's the same as the case Simon wants
> to solve except that the cost is incurred in a different way. Imagine
> a system where there's a huge data load to a table which is then
> read-only for an OLTP system. Until vacuum comes along -- and it may
> never since the table never sees deletes or updates -- every
> transaction needs to do a clog lookup for every tuple it sees. That
> means a significant cpu slowdown for every row lookup forever more. To
> save a one-time i/o cost.

That is simply not the case, unless every tuple was created by a
unique (or at least non-sequential) transaction xid.  There is a 'last
transaction id' guard over clog lookup which is pretty effective.  The
real cost is actually i/o from writing hint bits.

merlin



Re: Limiting setting of hint bits by read-only queries; vacuum_delay

From
Tom Lane
Date:
Greg Stark <stark@mit.edu> writes:
> On Mon, Mar 25, 2013 at 9:53 PM, Kevin Grittner <kgrittn@ymail.com> wrote:
>> That would make it harder to construct a degenerate case

> I don't think it's hard at all. It's the same as the case Simon wants
> to solve except that the cost is incurred in a different way. Imagine
> a system where there's a huge data load to a table which is then
> read-only for an OLTP system. Until vacuum comes along -- and it may
> never since the table never sees deletes or updates -- every
> transaction needs to do a clog lookup for every tuple it sees. That
> means a significant cpu slowdown for every row lookup forever more. To
> save a one-time i/o cost.

If I read Simon's original mail correctly, he was proposing that
individual SELECTs would update some hint bits but eventually "get
tired" and stop doing that.  So repeated queries would eventually get
all the hint bits set in a scenario such as you describe.  It's less
clear what would happen in a scenario where there's a steady flow of
new updates.

This is clearly worth thinking about and trying to find better solutions
for.  I'm only against trying to solve it in the 9.3 timeframe.  It will
take a lot longer than that to get something that works tolerably well.
        regards, tom lane



Re: Limiting setting of hint bits by read-only queries; vacuum_delay

From
Simon Riggs
Date:
On 25 March 2013 23:18, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Greg Stark <stark@mit.edu> writes:
>> On Mon, Mar 25, 2013 at 9:53 PM, Kevin Grittner <kgrittn@ymail.com> wrote:
>>> That would make it harder to construct a degenerate case
>
>> I don't think it's hard at all. It's the same as the case Simon wants
>> to solve except that the cost is incurred in a different way. Imagine
>> a system where there's a huge data load to a table which is then
>> read-only for an OLTP system. Until vacuum comes along -- and it may
>> never since the table never sees deletes or updates -- every
>> transaction needs to do a clog lookup for every tuple it sees. That
>> means a significant cpu slowdown for every row lookup forever more. To
>> save a one-time i/o cost.
>
> If I read Simon's original mail correctly, he was proposing that
> individual SELECTs would update some hint bits but eventually "get
> tired" and stop doing that.

Yes, thank you for understanding that.

> So repeated queries would eventually get
> all the hint bits set in a scenario such as you describe.  It's less
> clear what would happen in a scenario where there's a steady flow of
> new updates.

With a steady flow of updates we rely for cleanup upon i) the updates
themselves and/or small selects, ii) VACUUM

The idea is to avoid penalising users wishing to run larger queries,
who are effectively forced to clean up everybody's else's mess. It is
important to understand that the steady state is exactly the same
between the patched/unpatched cases when there are no larger queries.
The difference is that the steady state is maintained with patch,
rather than forcing the large query to perform long and slow *early*
cleanup of the database - earlier than would have occurred naturally
in the steady state. So we cannot argue that there is "more I/O"
because actually there will be exactly the same I/O required for
steady state updates, its just that less of that I/O happens earlier
during the large user query. Think about the I/O required with/without
the large query... it seems likely that early cleanup could itself
result in higher overall I/O.

> This is clearly worth thinking about and trying to find better solutions
> for.  I'm only against trying to solve it in the 9.3 timeframe.  It will
> take a lot longer than that to get something that works tolerably well.

I'll bet you all a beer at PgCon 2014 that this remains unresolved at
that point.

-- Simon Riggs                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



Re: Limiting setting of hint bits by read-only queries; vacuum_delay

From
Tom Lane
Date:
Simon Riggs <simon@2ndQuadrant.com> writes:
> On 25 March 2013 23:18, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> This is clearly worth thinking about and trying to find better solutions
>> for.  I'm only against trying to solve it in the 9.3 timeframe.  It will
>> take a lot longer than that to get something that works tolerably well.

> I'll bet you all a beer at PgCon 2014 that this remains unresolved at
> that point.

Well, if so, then either (a) it wasn't actually important so nobody
bothered to work on it, or (b) it's a lot more difficult than you think.
This is hardly an argument for pushing a one-tenth-baked fix into 9.3.
        regards, tom lane



Re: Limiting setting of hint bits by read-only queries; vacuum_delay

From
Greg Stark
Date:
On Tue, Mar 26, 2013 at 12:00 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> I'll bet you all a beer at PgCon 2014 that this remains unresolved at
> that point.

Are you saying you're only interested in working on it now? That after
9.3 is release you won't be interested in working on it any more?

As you said we've been eyeing this particular logic since 2004, why
did it suddenly become more urgent now? Why didn't you work on it 9
months ago at the beginning of the release cycle?




-- 
greg



Re: Limiting setting of hint bits by read-only queries; vacuum_delay

From
Jeff Davis
Date:
On Mon, 2013-03-25 at 12:21 +0000, Greg Stark wrote:
> On Mon, Mar 25, 2013 at 2:50 AM, Greg Smith <greg@2ndquadrant.com> wrote:
> > The idea I was thinking about is refactoring the background writer's role in
> > hint bit maintenance
> 
> A good first step might be to separate the "dirty" bit into two bits.
> "mandatory dirty" and "optional dirty". (Or maybe "hard dirty" and
> "soft dirty"). Hint bit updates could set the latter and then some
> later policy decision could be made about whether to bother writing
> out the buffer if it's only optionally dirty.

I like this idea, but I think there's a problem. When checksums are
enabled, we may need to write WAL at the time the page is dirtied. It
would seem a waste if a full-page image was written, but the bgwriter
never wrote the page out because it was only a "soft dirty". 

One solution is to make it work as you say, unless checksums are enabled
and it needs to write WAL (which is only for the first modification
after a checkpoint). Unfortunately, it basically means that users of
checksums will still see a big impact from doing a SELECT on a large,
freshly loaded table. So that doesn't really answer Simon's use case.

Maybe it can still be made to work. Let's say that the bgwriter can't
write soft-dirty pages out, it can only ignore them or drop them on the
floor. But VACUUM would promote pages from soft-dirty to dirty at a
controlled rate, and would write out WAL if necessary for checksums.

That should still keep some distance between the WAL writing and the WAL
flushing, but still offer better control than writing it at SELECT time.
Also, it would solve a complaint I have about Simon's proposal: that we
lose the information that we have a changed buffer in shared memory.
Maybe that's not a practical problem, but it seems like we should keep
track of that and have a way to make sure a page gets cleaned if we want
to.

The downside is that it doesn't allow anyone to get the current
behavior, unless we provide an additional GUC to say whether SELECT
causes a page to be marked soft-dirty or dirty. At least that would be a
boolean though, and the finer-grained control can be over VACUUM.

Regards,Jeff Davis





Re: Limiting setting of hint bits by read-only queries; vacuum_delay

From
Simon Riggs
Date:
On 26 March 2013 01:35, Greg Stark <stark@mit.edu> wrote:
> On Tue, Mar 26, 2013 at 12:00 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> I'll bet you all a beer at PgCon 2014 that this remains unresolved at
>> that point.
>
> Are you saying you're only interested in working on it now? That after
> 9.3 is release you won't be interested in working on it any more?
>
> As you said we've been eyeing this particular logic since 2004, why
> did it suddenly become more urgent now? Why didn't you work on it 9
> months ago at the beginning of the release cycle?

I'm not sure why your comments are so confrontational here, but I
don't think it helps much. I'm happy to buy you a beer too.

As I explained clearly in my first post, this idea came about trying
to improve on the negative aspects of the checksum patch. People were
working on ideas 9 months ago to resolve this, but they have come to
nothing. I regret that; Merlin and others have worked hard to find a
way: Respect to them.

My suggestion is to implement a feature that takes 1 day to write and
needs little testing to show it works. I'm happy to pursue that path
now, or later. Deciding we need an all-singing, all-dancing solution
that will take our best men (another) 6 months of hard arguing and
implementation is by far the best way I know of killing anything and I
won't be pursuing that route. If we did have 6 months funding for
any-feature-you-like, I wouldn't spend it all on this. My bet that
nobody else will have enough patience, time and skill, let alone
unpaid leave to follow that path, stands.

-- Simon Riggs                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



Re: Limiting setting of hint bits by read-only queries; vacuum_delay

From
Robert Haas
Date:
On Tue, Mar 26, 2013 at 5:27 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On 26 March 2013 01:35, Greg Stark <stark@mit.edu> wrote:
>> On Tue, Mar 26, 2013 at 12:00 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>>> I'll bet you all a beer at PgCon 2014 that this remains unresolved at
>>> that point.
>>
>> Are you saying you're only interested in working on it now? That after
>> 9.3 is release you won't be interested in working on it any more?
>>
>> As you said we've been eyeing this particular logic since 2004, why
>> did it suddenly become more urgent now? Why didn't you work on it 9
>> months ago at the beginning of the release cycle?
>
> I'm not sure why your comments are so confrontational here, but I
> don't think it helps much. I'm happy to buy you a beer too.
>
> As I explained clearly in my first post, this idea came about trying
> to improve on the negative aspects of the checksum patch. People were
> working on ideas 9 months ago to resolve this, but they have come to
> nothing. I regret that; Merlin and others have worked hard to find a
> way: Respect to them.
>
> My suggestion is to implement a feature that takes 1 day to write and
> needs little testing to show it works.

Any patch in this area isn't likely to take much testing to establish
whether it improves some particular case.  The problem is what happens
to all of the other cases - and I don't believe that part needs little
testing, hence the objections (with which I agree) to doing anything
about this now.

If we want to change something in this area, we might consider
resurrecting the patch I worked on for this last year, which had, I
believe, a fairly similar mechanism of operation to what you're
proposing, and some other nice properties as well:

http://www.postgresql.org/message-id/AANLkTik5QzR8wTs0MqCWwmNp-qHGrdKY5Av5aOB7W4Dp@mail.gmail.com
http://www.postgresql.org/message-id/AANLkTimGKaG7wdu-x77GNV2Gh6_Qo5Ss1u5b6Q1MsPUy@mail.gmail.com

...but I think the main reason why that never went anywhere is because
we never really had any confidence that the upsides were worth the
downsides.  Fundamentally, postponing hint bit setting (or hint bit
I/O) increases the total amount of work done by the system.  You still
end up writing the hint bits eventually, and in the meantime you do
more CLOG lookups.  Now, as a compensating benefit, you can spread the
work of writing the hint-bit updated pages out over a longer period of
time, so that no single query carries too much of the burden of
getting the bits set.  The worst-case-latency vs. aggregate-throughput
tradeoff is one with a long history and I think it's appropriate to
view this problem through that lens also.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Limiting setting of hint bits by read-only queries; vacuum_delay

From
Simon Riggs
Date:
On 26 March 2013 11:33, Robert Haas <robertmhaas@gmail.com> wrote:
> On Tue, Mar 26, 2013 at 5:27 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> On 26 March 2013 01:35, Greg Stark <stark@mit.edu> wrote:
>>> On Tue, Mar 26, 2013 at 12:00 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>>>> I'll bet you all a beer at PgCon 2014 that this remains unresolved at
>>>> that point.
>>>
>>> Are you saying you're only interested in working on it now? That after
>>> 9.3 is release you won't be interested in working on it any more?
>>>
>>> As you said we've been eyeing this particular logic since 2004, why
>>> did it suddenly become more urgent now? Why didn't you work on it 9
>>> months ago at the beginning of the release cycle?
>>
>> I'm not sure why your comments are so confrontational here, but I
>> don't think it helps much. I'm happy to buy you a beer too.
>>
>> As I explained clearly in my first post, this idea came about trying
>> to improve on the negative aspects of the checksum patch. People were
>> working on ideas 9 months ago to resolve this, but they have come to
>> nothing. I regret that; Merlin and others have worked hard to find a
>> way: Respect to them.
>>
>> My suggestion is to implement a feature that takes 1 day to write and
>> needs little testing to show it works.
>
> Any patch in this area isn't likely to take much testing to establish
> whether it improves some particular case.  The problem is what happens
> to all of the other cases - and I don't believe that part needs little
> testing, hence the objections (with which I agree) to doing anything
> about this now.
>
> If we want to change something in this area, we might consider
> resurrecting the patch I worked on for this last year, which had, I
> believe, a fairly similar mechanism of operation to what you're
> proposing, and some other nice properties as well:
>
> http://www.postgresql.org/message-id/AANLkTik5QzR8wTs0MqCWwmNp-qHGrdKY5Av5aOB7W4Dp@mail.gmail.com
> http://www.postgresql.org/message-id/AANLkTimGKaG7wdu-x77GNV2Gh6_Qo5Ss1u5b6Q1MsPUy@mail.gmail.com
>
> ...but I think the main reason why that never went anywhere is because
> we never really had any confidence that the upsides were worth the
> downsides.  Fundamentally, postponing hint bit setting (or hint bit
> I/O) increases the total amount of work done by the system.  You still
> end up writing the hint bits eventually, and in the meantime you do
> more CLOG lookups.  Now, as a compensating benefit, you can spread the
> work of writing the hint-bit updated pages out over a longer period of
> time, so that no single query carries too much of the burden of
> getting the bits set.  The worst-case-latency vs. aggregate-throughput
> tradeoff is one with a long history and I think it's appropriate to
> view this problem through that lens also.

I hadn't realised so many patches existed that were similar. Hackers
is bigger these days.

Reviewing the patch, I'd say the problem is that it is basically
implementing a new automatic heuristic. We simply don't have any
evidence that any new heuristic will work for all cases, so we do
nothing.

Whether we apply my patch, yours or Merlin's, my main thought now is
that we need a user parameter to control it so it can be adjusted
according to need and not touched at all if there is no problem.

My washing machine has a wonderful feature "15 min wash" and it works
great for the times I know I need it; but in general, the auto wash
mode works fine since often you don't care that it takes 90 minutes.
It's much easier to see that the additional user option is beneficial,
but much harder to start arguing that the default wash cycle should be
85 or 92 minutes. It'd be great if the washing machine could work out
that I need my clothes quickly and that on-this-day-only I don't care
about the thoroughness of the wash, but it can't. I don't think the
washing machine engineers are idiots for not being able to work that
out, but if they only offered a single option because they thought
they knew better than me, I'd be less than impressed.

In the same way, we need some way to say "big queries shouldn't do
cleanup" even if autovacuum ends up doing more I/O over time (though
in fact I doubt this is the case, detailed argument on other post).

So please, lets go with a simple solution now that allows users to say
what they want.

-- Simon Riggs                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



Re: Limiting setting of hint bits by read-only queries; vacuum_delay

From
Merlin Moncure
Date:
On Tue, Mar 26, 2013 at 7:30 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On 26 March 2013 11:33, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Tue, Mar 26, 2013 at 5:27 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>>> On 26 March 2013 01:35, Greg Stark <stark@mit.edu> wrote:
>>>> On Tue, Mar 26, 2013 at 12:00 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>>>>> I'll bet you all a beer at PgCon 2014 that this remains unresolved at
>>>>> that point.
>>>>
>>>> Are you saying you're only interested in working on it now? That after
>>>> 9.3 is release you won't be interested in working on it any more?
>>>>
>>>> As you said we've been eyeing this particular logic since 2004, why
>>>> did it suddenly become more urgent now? Why didn't you work on it 9
>>>> months ago at the beginning of the release cycle?
>>>
>>> I'm not sure why your comments are so confrontational here, but I
>>> don't think it helps much. I'm happy to buy you a beer too.
>>>
>>> As I explained clearly in my first post, this idea came about trying
>>> to improve on the negative aspects of the checksum patch. People were
>>> working on ideas 9 months ago to resolve this, but they have come to
>>> nothing. I regret that; Merlin and others have worked hard to find a
>>> way: Respect to them.
>>>
>>> My suggestion is to implement a feature that takes 1 day to write and
>>> needs little testing to show it works.
>>
>> Any patch in this area isn't likely to take much testing to establish
>> whether it improves some particular case.  The problem is what happens
>> to all of the other cases - and I don't believe that part needs little
>> testing, hence the objections (with which I agree) to doing anything
>> about this now.
>>
>> If we want to change something in this area, we might consider
>> resurrecting the patch I worked on for this last year, which had, I
>> believe, a fairly similar mechanism of operation to what you're
>> proposing, and some other nice properties as well:
>>
>> http://www.postgresql.org/message-id/AANLkTik5QzR8wTs0MqCWwmNp-qHGrdKY5Av5aOB7W4Dp@mail.gmail.com
>> http://www.postgresql.org/message-id/AANLkTimGKaG7wdu-x77GNV2Gh6_Qo5Ss1u5b6Q1MsPUy@mail.gmail.com
>>
>> ...but I think the main reason why that never went anywhere is because
>> we never really had any confidence that the upsides were worth the
>> downsides.  Fundamentally, postponing hint bit setting (or hint bit
>> I/O) increases the total amount of work done by the system.  You still
>> end up writing the hint bits eventually, and in the meantime you do
>> more CLOG lookups.  Now, as a compensating benefit, you can spread the
>> work of writing the hint-bit updated pages out over a longer period of
>> time, so that no single query carries too much of the burden of
>> getting the bits set.  The worst-case-latency vs. aggregate-throughput
>> tradeoff is one with a long history and I think it's appropriate to
>> view this problem through that lens also.
>
> I hadn't realised so many patches existed that were similar. Hackers
> is bigger these days.
>
> Reviewing the patch, I'd say the problem is that it is basically
> implementing a new automatic heuristic. We simply don't have any
> evidence that any new heuristic will work for all cases, so we do
> nothing.
>
> Whether we apply my patch, yours or Merlin's, my main thought now is
> that we need a user parameter to control it so it can be adjusted
> according to need and not touched at all if there is no problem.

After a night thinking about this, I'd like to make some points:

*) my patch deliberately did not 'set bits without dirty' -- with
checksums in mind as you noted (thanks for that).  I think the upside
for marking pages in that fasion anyways is overrated.

*) Any strategy that does not approximate hint bit behavior IMNSHO is
a non-starter.  By that I mean when your $condition is met so that
hint bits are not being written out, scans need to bail out of
HeapTupleSatisfiesMVCC processing with a cheap check.  If you don't do
that and rely on the transam.c guard, you've already missed the boat:
the even without clog lookup the extra processing there I can assure
you will show up in profiling of repeated scans (until vacuum).

*) The case of sequential tuples with the same xid is far and away the
most important one.  In OLTP workloads hint bit i/o is minor compared
to everything else going on.  Also, OLTP workloads are probably better
handled with an hint bit check just before eviction via bgwriter vs
during scan.

*) The budget for extra work inside HeapTupleSatisfiesMVCC is
exceptionally low.  For this reason, I think your idea would be better
framed at the page level and the bailout should be measured in the
number of pages, not tuples (that way the page can send in a single
boolean to control hint bit behavior).

*) The upside of optimizing xmax processing is fairly low for most
workloads I've seen

*) The benchmarking Amit and Hari did needs analysis.

*) For off-cycle release work that would help enable patches with
complex performance trade-offs (I'm working up another patch that has
even more compelling benefits and risks in the buffer allocator),  We
desperately need a standard battery of comprehensive performance tests
and doner machines.

merlin



Re: Limiting setting of hint bits by read-only queries; vacuum_delay

From
Tom Lane
Date:
Simon Riggs <simon@2ndQuadrant.com> writes:
> So please, lets go with a simple solution now that allows users to say
> what they want.

Simon, this is just empty posturing, as your arguments have nothing
whatsoever to do with whether the above description applies to your
patch.

More generally, the fact that a patch has some user-frobbable knob
does not mean that it's actually a good or even usable solution.  As
everybody keeps saying, testing on a wide range of use-cases would be
needed to prove that, and we don't have enough time left for such
testing in the 9.3 timeframe.  This problem needs to be attacked in
an organized and deliberate fashion, not by hacking something up under
time pressure and shipping it with minimal testing.
        regards, tom lane



Re: Limiting setting of hint bits by read-only queries; vacuum_delay

From
Simon Riggs
Date:
On 26 March 2013 14:44, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Simon Riggs <simon@2ndQuadrant.com> writes:
>> So please, lets go with a simple solution now that allows users to say
>> what they want.
>
> Simon, this is just empty posturing, as your arguments have nothing
> whatsoever to do with whether the above description applies to your
> patch.

Waiting for an auto-tuned solution to *every* problem means we just
sit and watch bad things happen, knowing how to fix them for
particular cases yet not being able to do anything at all.

> More generally, the fact that a patch has some user-frobbable knob
> does not mean that it's actually a good or even usable solution.  As
> everybody keeps saying, testing on a wide range of use-cases would be
> needed to prove that, and we don't have enough time left for such
> testing in the 9.3 timeframe.  This problem needs to be attacked in
> an organized and deliberate fashion, not by hacking something up under
> time pressure and shipping it with minimal testing.

Well, it has been tackled like that and we've *all* got nowhere. No
worries, I can wait a year for that beer.

-- Simon Riggs                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



Re: Limiting setting of hint bits by read-only queries; vacuum_delay

From
Bruce Momjian
Date:
On Tue, Mar 26, 2013 at 03:06:30PM +0000, Simon Riggs wrote:
> > More generally, the fact that a patch has some user-frobbable knob
> > does not mean that it's actually a good or even usable solution.  As
> > everybody keeps saying, testing on a wide range of use-cases would be
> > needed to prove that, and we don't have enough time left for such
> > testing in the 9.3 timeframe.  This problem needs to be attacked in
> > an organized and deliberate fashion, not by hacking something up under
> > time pressure and shipping it with minimal testing.
> 
> Well, it has been tackled like that and we've *all* got nowhere. No
> worries, I can wait a year for that beer.

This was the obvious result of this discussion --- it is a shame we had
to discuss this rather than working on more pressing 9.3 issues.  I also
think someone saying "I would like to apply this now" is more disruptive
than casual discussion about things like buffer count locking.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +



Re: Limiting setting of hint bits by read-only queries; vacuum_delay

From
Kevin Grittner
Date:
Merlin Moncure <mmoncure@gmail.com> wrote:

> *) For off-cycle release work that would help enable patches with
> complex performance trade-offs (I'm working up another patch that has
> even more compelling benefits and risks in the buffer allocator),  We
> desperately need a standard battery of comprehensive performance tests
> and doner machines.

Such a thing would vastly reduce the time needed to work on
something like this with confidence that it would not be a disaster
for some unidentified workload.  Sure, something could still "slip
though the cracks" -- but they would *be* cracks, not a wide gaping
hole.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company