Thread: Turning off HOT/Cleanup sometimes

Turning off HOT/Cleanup sometimes

From
Simon Riggs
Date:
VACUUM cleans up blocks, which is nice because it happens offline in a
lazy manner.

We also make SELECT clean up blocks as it goes. That is useful in OLTP
workloads, but it means that large SQL queries and pg_dump effectively
do much the same work as VACUUM, generating huge amounts of I/O and
WAL on the master, the cost and annoyance of which is experienced
directly by the user. That is avoided on standbys.

Effects of that are that long running statements often run much longer
than we want, increasing bloat as a result. It also produces wildly
varying response times, depending upon extent of cleanup required.

It is a simple task to make that behaviour optional on the master.

I propose a USERSET parameter, prune_cost_limit (<---insert better name here)
which will make the behaviour optional, default -1, in normal user
processes. VACUUM will ignore this parameter and so its actions will
never be deferred.

In detail, this parameter would disable pruning for any scan larger
than the cost limit. So large scans will disable the behaviour. The
default, -1, means never disable pruning, which is the current
behavour.

We track the number of pages dirtied by the current statement. When
this reaches prune_cost_limit, we will apply these behaviours to all
shared_buffer block accesses...

(1) avoid running heap_page_prune_opt()

(2) avoid dirtying the buffer for hints. (This is safe because the
hinted changes will either be lost or will be part of the full page
image when we make a logged-change).

(i.e. doesn't apply to temp tables)

For example, if we set prune_cost_limit = 4 this behaviour allows
small index lookups via bitmapheapscan to continue to cleanup, while
larger index and seq scans will avoid cleanup.



There would be a postgresql.conf parameter prune_cost_limit, as well
as a table level parameter that would prevent pruning except via
VACUUM.

This will help in these ways
* Reduce write I/O from SELECTs and pg_dump - improving backups and BI queries
* Allow finer grained control over Hot Standby conflicts
* Potentially allow diagnostic inspection of older data via SeqScan

Prototype patch shows this is possible and simple enough for 9.4.
Major objections? Or should I polish up and submit?

-- Simon Riggs                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



Re: Turning off HOT/Cleanup sometimes

From
Robert Haas
Date:
On Wed, Jan 8, 2014 at 3:33 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> VACUUM cleans up blocks, which is nice because it happens offline in a
> lazy manner.
>
> We also make SELECT clean up blocks as it goes. That is useful in OLTP
> workloads, but it means that large SQL queries and pg_dump effectively
> do much the same work as VACUUM, generating huge amounts of I/O and
> WAL on the master, the cost and annoyance of which is experienced
> directly by the user. That is avoided on standbys.

On a pgbench workload, though, essentially all page cleanup happens as
a result of HOT cleanups, like >99.9%.  It might be OK to have that
happen for write operations, but it would be a performance disaster if
updates didn't try to HOT-prune.  Our usual argument for doing HOT
pruning even on SELECT cleanups is that not doing so pessimizes
repeated scans, but there are clearly cases that end up worse off as a
result of that decision.

I'm not entirely wild about adding a parameter in this area because it
seems that we're increasingly choosing to further expose what arguably
ought to be internal implementation details.  The recent wal_log_hints
parameter is another recent example of this that I'm not thrilled
with, but in that case, as in this one, I can see the value of it.
Still, I think it'd be loads better to restrict what you're talking
about here to the SELECT-only case; I have a strong feeling that this
will be a disaster on write workloads.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Turning off HOT/Cleanup sometimes

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Wed, Jan 8, 2014 at 3:33 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> We also make SELECT clean up blocks as it goes. That is useful in OLTP
>> workloads, but it means that large SQL queries and pg_dump effectively
>> do much the same work as VACUUM, generating huge amounts of I/O and
>> WAL on the master, the cost and annoyance of which is experienced
>> directly by the user. That is avoided on standbys.

> On a pgbench workload, though, essentially all page cleanup happens as
> a result of HOT cleanups, like >99.9%.  It might be OK to have that
> happen for write operations, but it would be a performance disaster if
> updates didn't try to HOT-prune.  Our usual argument for doing HOT
> pruning even on SELECT cleanups is that not doing so pessimizes
> repeated scans, but there are clearly cases that end up worse off as a
> result of that decision.

My recollection of the discussion when HOT was developed is that it works
that way not because anyone thought it was beneficial, but simply because
we didn't see an easy way to know when first fetching a page whether we're
going to try to UPDATE some tuple on the page.  (And we can't postpone the
pruning, because the query will have tuple pointers into the page later.)
Maybe we should work a little harder on passing that information down.
It seems reasonable to me that SELECTs shouldn't be tasked with doing
HOT pruning.

> I'm not entirely wild about adding a parameter in this area because it
> seems that we're increasingly choosing to further expose what arguably
> ought to be internal implementation details.

I'm -1 for a parameter as well, but I think that just stopping SELECTs
from doing pruning at all might well be a win.  It's at least worthy
of some investigation.
        regards, tom lane



Re: Turning off HOT/Cleanup sometimes

From
Robert Haas
Date:
On Thu, Jan 9, 2014 at 12:21 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Wed, Jan 8, 2014 at 3:33 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>>> We also make SELECT clean up blocks as it goes. That is useful in OLTP
>>> workloads, but it means that large SQL queries and pg_dump effectively
>>> do much the same work as VACUUM, generating huge amounts of I/O and
>>> WAL on the master, the cost and annoyance of which is experienced
>>> directly by the user. That is avoided on standbys.
>
>> On a pgbench workload, though, essentially all page cleanup happens as
>> a result of HOT cleanups, like >99.9%.  It might be OK to have that
>> happen for write operations, but it would be a performance disaster if
>> updates didn't try to HOT-prune.  Our usual argument for doing HOT
>> pruning even on SELECT cleanups is that not doing so pessimizes
>> repeated scans, but there are clearly cases that end up worse off as a
>> result of that decision.
>
> My recollection of the discussion when HOT was developed is that it works
> that way not because anyone thought it was beneficial, but simply because
> we didn't see an easy way to know when first fetching a page whether we're
> going to try to UPDATE some tuple on the page.  (And we can't postpone the
> pruning, because the query will have tuple pointers into the page later.)
> Maybe we should work a little harder on passing that information down.
> It seems reasonable to me that SELECTs shouldn't be tasked with doing
> HOT pruning.
>
>> I'm not entirely wild about adding a parameter in this area because it
>> seems that we're increasingly choosing to further expose what arguably
>> ought to be internal implementation details.
>
> I'm -1 for a parameter as well, but I think that just stopping SELECTs
> from doing pruning at all might well be a win.  It's at least worthy
> of some investigation.

Unfortunately, there's no categorical answer.  You can come up with
workloads where HOT pruning on selects is a win; just create a bunch
of junk and then read the same pages lots of times in a row.  And you
can also come up with workloads where it's a loss; create a bunch of
junk and then read them just once.  I don't know how easy it's going
to be to set that parameter in a useful way for some particular
environment, and I think that's possibly an argument against having
it.  But the argument that we don't need a parameter because one
behavior is best for everyone is not going to fly.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Turning off HOT/Cleanup sometimes

From
Alvaro Herrera
Date:
Robert Haas escribió:

> Unfortunately, there's no categorical answer.  You can come up with
> workloads where HOT pruning on selects is a win; just create a bunch
> of junk and then read the same pages lots of times in a row.  And you
> can also come up with workloads where it's a loss; create a bunch of
> junk and then read them just once.  I don't know how easy it's going
> to be to set that parameter in a useful way for some particular
> environment, and I think that's possibly an argument against having
> it.  But the argument that we don't need a parameter because one
> behavior is best for everyone is not going to fly.

In the above, there's the underlying assumption that it doesn't matter
*what* we do with the page after doing or not doing pruning.  But this
is not necessarily the case: in the case of an UPDATE, having the space
be freed beforehand is beneficial because there's the option of putting
the new version of the tuple in the same page, potentially saving lots
of I/O (bring up another destination page for the new tuple, write the
new tuple there, end up dirtying two pages instead of one).  But in a
SELECT, the effect is only that you will have to skip less dead tuples,
which is not as exciting.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: Turning off HOT/Cleanup sometimes

From
Simon Riggs
Date:
On 9 January 2014 17:21, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Wed, Jan 8, 2014 at 3:33 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>>> We also make SELECT clean up blocks as it goes. That is useful in OLTP
>>> workloads, but it means that large SQL queries and pg_dump effectively
>>> do much the same work as VACUUM, generating huge amounts of I/O and
>>> WAL on the master, the cost and annoyance of which is experienced
>>> directly by the user. That is avoided on standbys.
>
>> On a pgbench workload, though, essentially all page cleanup happens as
>> a result of HOT cleanups, like >99.9%.  It might be OK to have that
>> happen for write operations, but it would be a performance disaster if
>> updates didn't try to HOT-prune.  Our usual argument for doing HOT
>> pruning even on SELECT cleanups is that not doing so pessimizes
>> repeated scans, but there are clearly cases that end up worse off as a
>> result of that decision.
>
> My recollection of the discussion when HOT was developed is that it works
> that way not because anyone thought it was beneficial, but simply because
> we didn't see an easy way to know when first fetching a page whether we're
> going to try to UPDATE some tuple on the page.  (And we can't postpone the
> pruning, because the query will have tuple pointers into the page later.)
> Maybe we should work a little harder on passing that information down.
> It seems reasonable to me that SELECTs shouldn't be tasked with doing
> HOT pruning.
>
>> I'm not entirely wild about adding a parameter in this area because it
>> seems that we're increasingly choosing to further expose what arguably
>> ought to be internal implementation details.
>
> I'm -1 for a parameter as well, but I think that just stopping SELECTs
> from doing pruning at all might well be a win.  It's at least worthy
> of some investigation.

Turning HOT off completely would be an absolute disaster for OLTP on
high update use cases against medium-large tables. That scenario is
well represented by pgbench and TPC-C.  I am *not* suggesting we
recommend that and would look for very large caveats in the docs.
(That may not have been clear, I guess I just assumed people would
know I was heavily involved in the HOT project and understood its
benefits).

As stated, I am interested in turning off HOT in isolated, user
specified situations, perhaps just for isolated tables.

I'm not crazy about exposing magic parameters either but then I'm not
crazy about either automatic settings or deferring things because we
don't know how to set it. In general, I prefer the idea of having a
user settable parameter in one release then automating it in a later
release if clear settings emerge from usage. I'll submit a patch with
parameter, to allow experimentation, for possible removal at commit or
beta.

If I had to suggest a value for an internal parameter, I would say
that each SELECT statement should clean no more than 4 blocks. That
way current OLTP behaviour is mostly preserved while the big queries
and pg_dump don't suck in unpredictable ways.

I'll submit the patch and we can talk some more.

-- Simon Riggs                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



Re: Turning off HOT/Cleanup sometimes

From
Tom Lane
Date:
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> Robert Haas escribi�:
>> But the argument that we don't need a parameter because one
>> behavior is best for everyone is not going to fly.

> In the above, there's the underlying assumption that it doesn't matter
> *what* we do with the page after doing or not doing pruning.  But this
> is not necessarily the case: in the case of an UPDATE, having the space
> be freed beforehand is beneficial because there's the option of putting
> the new version of the tuple in the same page, potentially saving lots
> of I/O (bring up another destination page for the new tuple, write the
> new tuple there, end up dirtying two pages instead of one).  But in a
> SELECT, the effect is only that you will have to skip less dead tuples,
> which is not as exciting.

Yeah.  Once they're hinted dead, it doesn't cost that much to skip over
them.  Not to mention that you might well never visit them at all, if
this is an indexscan that knows which TIDs it needs to look at.

It's possible that it can be shown that different use-cases have
sufficiently different behaviors that we really do need a user-visible
parameter.  I don't want to start from that position though.  If we
did have a simple GUC parameter, it'd likely end up in the same boat
as, say, enable_seqscan, which is way too blunt an instrument for real
world use --- so I'm afraid this would soon bloat into a request for
per-table settings, planner hints, or god knows what to try to confine
the effects to the queries where it's appropriate.  Let's not go there
without proof that we have to.  It's a much better thing if we can get
the system's native behavior to be tuned well enough by depending on
things it already knows.
        regards, tom lane



Re: Turning off HOT/Cleanup sometimes

From
Stephen Frost
Date:
* Alvaro Herrera (alvherre@2ndquadrant.com) wrote:
> But in a
> SELECT, the effect is only that you will have to skip less dead tuples,
> which is not as exciting.

Agreed.  There's also the option to have it be done based on some
expectation of future work- that is, if we have to traverse X number of
dead tuples during a select, then don't bother with HOT pruning, but if
we get up to X+Y dead tuples, then do HOT pruning.

That said, I'm not entirely convinced that traversing these dead tuples
is all *that* painful during SELECT.  If there's that many levels then
hopefully it's not long til an UPDATE comes along and cleans them up.
Thanks,
    Stephen

Re: Turning off HOT/Cleanup sometimes

From
Stephen Frost
Date:
* Simon Riggs (simon@2ndQuadrant.com) wrote:
> > I'm -1 for a parameter as well, but I think that just stopping SELECTs
> > from doing pruning at all might well be a win.  It's at least worthy
> > of some investigation.
>
> Turning HOT off completely would be an absolute disaster for OLTP on
> high update use cases against medium-large tables. That scenario is
> well represented by pgbench and TPC-C.  I am *not* suggesting we
> recommend that and would look for very large caveats in the docs.

This is true even if we're only talking about turning it off for the
SELECT case...?  That's what's under discussion here, after all.
Certainly, we wouldn't turn it off completely...

> (That may not have been clear, I guess I just assumed people would
> know I was heavily involved in the HOT project and understood its
> benefits).

I'm certainly aware that you were heavily involved in HOT but I don't
think anyone is argueing to turn it off for everything.

> As stated, I am interested in turning off HOT in isolated, user
> specified situations, perhaps just for isolated tables.

I tend to agree w/ Tom on this point- having this be a per-table
configurable doesn't sound very appealing to me and it wouldn't address
the case you mentioned around pg_dump, but I'm sure that'd be the next
step for this and a per-session GUC wouldn't be sufficient.

> I'm not crazy about exposing magic parameters either but then I'm not
> crazy about either automatic settings or deferring things because we
> don't know how to set it. In general, I prefer the idea of having a
> user settable parameter in one release then automating it in a later
> release if clear settings emerge from usage. I'll submit a patch with
> parameter, to allow experimentation, for possible removal at commit or
> beta.

Ugh, adding GUCs is bad *because* we end up never being able to remove
them.

> If I had to suggest a value for an internal parameter, I would say
> that each SELECT statement should clean no more than 4 blocks. That
> way current OLTP behaviour is mostly preserved while the big queries
> and pg_dump don't suck in unpredictable ways.

Right, this was one idea that I had also, as noted in the other
subthread.  I'm not convinced that it's a great idea and it'd probably
be good to do a bit of testing to see just what the cost is; perhaps
even just come up with a "worst-case" example to see the difference
between a "clean" table and one with HOT chains as deep as they can go..

> I'll submit the patch and we can talk some more.

Neat.
Thanks!
    Stephen

Re: Turning off HOT/Cleanup sometimes

From
Tom Lane
Date:
Stephen Frost <sfrost@snowman.net> writes:
> That said, I'm not entirely convinced that traversing these dead tuples
> is all *that* painful during SELECT.  If there's that many levels then
> hopefully it's not long til an UPDATE comes along and cleans them up.

There's always VACUUM ;-)

If you take about ten steps back, what's happening here is that
maintenance work that we'd originally delegated to VACUUM, precisely so
that it wouldn't have to be done by foreground queries, is now being done
by foreground queries.  And oddly enough, people don't like that.

There is a reasonable argument for forcing UPDATE queries to do it anyway,
to improve the odds they can do same-page updates (whether HOT or
otherwise).  And probably an INSERT should do it on a page that it's
selected as an insertion target.  But I think the argument that the
original do-maintenance-in-background-whenever-possible design was wrong
is a lot harder to sustain for SELECT or even DELETE queries.  As I said
upthread, I think the current behavior was *not* chosen for performance
reasons but just to limit the scope of what we had to change for HOT.
        regards, tom lane



Re: Turning off HOT/Cleanup sometimes

From
Jim Nasby
Date:
On 1/9/14, 12:54 PM, Tom Lane wrote:
> Stephen Frost <sfrost@snowman.net> writes:
>> That said, I'm not entirely convinced that traversing these dead tuples
>> is all *that* painful during SELECT.  If there's that many levels then
>> hopefully it's not long til an UPDATE comes along and cleans them up.
>
> There's always VACUUM ;-)
>
> If you take about ten steps back, what's happening here is that
> maintenance work that we'd originally delegated to VACUUM, precisely so
> that it wouldn't have to be done by foreground queries, is now being done
> by foreground queries.  And oddly enough, people don't like that.
>
> There is a reasonable argument for forcing UPDATE queries to do it anyway,
> to improve the odds they can do same-page updates (whether HOT or
> otherwise).  And probably an INSERT should do it on a page that it's
> selected as an insertion target.  But I think the argument that the
> original do-maintenance-in-background-whenever-possible design was wrong
> is a lot harder to sustain for SELECT or even DELETE queries.  As I said
> upthread, I think the current behavior was *not* chosen for performance
> reasons but just to limit the scope of what we had to change for HOT.

Instead of looking at how to avoid this work in SELECTs maybe it'd be more useful to look at how we can get it done
morequickly in the background. The VSM is already a step in the right direction, but it seems the big use case here is
whensome bulk operation comes through and touches a sizeable number of blocks (but perhaps not enough to hit autovac
thresholds).

ISTM it wouldn't be too difficult for a backend to track how many blocks in a relation it's dirtied (keep in mind that
countdoesn't have to be perfect). If we tracked that info, it could be put into a maintenance queue (LISTEN/NOTIFY?)
alongwith our XID. That gives us a list of relations to vacuum and exactly when to vacuum them. Thanks to the VSM we
wouldn'tneed to track individual pages (though it might be useful to track the minimum and maximum block IDs we hit,
perrelation).
 
-- 
Jim C. Nasby, Data Architect                       jim@nasby.net
512.569.9461 (cell)                         http://jim.nasby.net



Re: Turning off HOT/Cleanup sometimes

From
Robert Haas
Date:
On Thu, Jan 9, 2014 at 1:54 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Stephen Frost <sfrost@snowman.net> writes:
>> That said, I'm not entirely convinced that traversing these dead tuples
>> is all *that* painful during SELECT.  If there's that many levels then
>> hopefully it's not long til an UPDATE comes along and cleans them up.
>
> There's always VACUUM ;-)
>
> If you take about ten steps back, what's happening here is that
> maintenance work that we'd originally delegated to VACUUM, precisely so
> that it wouldn't have to be done by foreground queries, is now being done
> by foreground queries.  And oddly enough, people don't like that.

People *think* they don't like that, because that's the way it works
right now.  If it worked some other way, there's a good chance people
would be complaining about that behavior, too.  I submitted a patch a
few years back to limit the setting of hint bits by foreground
processes to approximately 5% of the buffers they touched in a large
scan, so that no single scan would incur all the cost of setting the
hint bits; instead, the cost would be amortized over the first 20 or
so scans.  However, nobody was very enthusiastic about that patch,
because while it greatly softened the blow for the first scan,
subsequent scans were slower, because now they had to carry part of
the burden, too.  And you know what?  People didn't like *that*
either.

The problem with saying that we should let VACUUM do this work is the
same as the problem with saying that if you're late for your Concorde
flight, you should go running across the tarmac and try to catch it.
The cost of dead tuples is related in a linear fashion to the rate at
which pages are accessed.  Not coincidentally, the number of
opportunities for HOT pruning is *also* related in a linear fashion to
the rate at which pages are accessed.  This is why it works so well.
The rate at which vacuuming happens does not ramp up in the same way;
it's limited by autovacuum cost settings (which people tend not have
set correctly, and don't adjust themselves on the fly) or by their
hardware capabilities.  If autovacuum can't keep up, foreground
activity doesn't slow down to compensate; instead, the system just
bloats out of control.  While people may not like having this
maintenance activity in the foreground, they like not having it at all
even less.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Turning off HOT/Cleanup sometimes

From
Andres Freund
Date:
On 2014-01-09 16:27:23 -0500, Robert Haas wrote:
> People *think* they don't like that, because that's the way it works
> right now.  If it worked some other way, there's a good chance people
> would be complaining about that behavior, too.

I think on of the primary reason why it's causing huge slowdowns is that
the ring buffer of scan strategies causes dirty buffer writes pretty
much immediately, when a buffer is reused.

Not that delaying the writeout would work all that effectively right
now, with the current bgwriter...

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: Turning off HOT/Cleanup sometimes

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> The problem with saying that we should let VACUUM do this work is the
> same as the problem with saying that if you're late for your Concorde
> flight, you should go running across the tarmac and try to catch it.
> The cost of dead tuples is related in a linear fashion to the rate at
> which pages are accessed.  Not coincidentally, the number of
> opportunities for HOT pruning is *also* related in a linear fashion to
> the rate at which pages are accessed.  This is why it works so well.

That seems like a large oversimplification.  Some (most?) of the costs of
dead tuples are proportional to the rate of dead tuple creation.  I grant
that there are also some costs proportional to the rate at which scans
visit dead tuples, but I really don't believe that the latter are
dominant.  So I think it's bogus to claim that the current behavior is
somehow optimal.

One more time: the sole reason it works the way it does now is that that
was the path of least resistance back in 2007, and we never yet got around
to trying to optimize that.  I'm glad to see someone wanting to revisit
the issue, but I don't think that we necessarily have to go as far as
creating user-visible knobs in order to make it better.

> The rate at which vacuuming happens does not ramp up in the same way;
> it's limited by autovacuum cost settings (which people tend not have
> set correctly, and don't adjust themselves on the fly)

True, but that seems like a pretty well-defined improvement project right
there (as well as an argument against user-visible knobs in general ;-)).
Nasby's speculations just upthread could be useful here, too.
        regards, tom lane



Re: Turning off HOT/Cleanup sometimes

From
Robert Haas
Date:
On Thu, Jan 9, 2014 at 4:43 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> The problem with saying that we should let VACUUM do this work is the
>> same as the problem with saying that if you're late for your Concorde
>> flight, you should go running across the tarmac and try to catch it.
>> The cost of dead tuples is related in a linear fashion to the rate at
>> which pages are accessed.  Not coincidentally, the number of
>> opportunities for HOT pruning is *also* related in a linear fashion to
>> the rate at which pages are accessed.  This is why it works so well.
>
> That seems like a large oversimplification.  Some (most?) of the costs of
> dead tuples are proportional to the rate of dead tuple creation.  I grant
> that there are also some costs proportional to the rate at which scans
> visit dead tuples, but I really don't believe that the latter are
> dominant.  So I think it's bogus to claim that the current behavior is
> somehow optimal.

This doesn't make any sense to me at all.  What costs are proportional
to the rate of dead tuple creation?  I'm referring specifically to the
performance penalty that scans incur for having to skip over dead
tuples, and those costs aren't incurred when the tuples are created,
but rather when you try to access the still-live data afterwards.  If
anything, our system is a big WIN at the time tuples are created,
precisely because we leave the old tuples around to be cleaned up
later rather than getting rid of them at once.  That's why, for
example, we tend to win delete-heavy benchmarks vs. other database
systems.

> One more time: the sole reason it works the way it does now is that that
> was the path of least resistance back in 2007, and we never yet got around
> to trying to optimize that.  I'm glad to see someone wanting to revisit
> the issue, but I don't think that we necessarily have to go as far as
> creating user-visible knobs in order to make it better.

Sure, I'm not denying that.  The fact that it was the path of least
resistance doesn't mean it was a bad idea.  I'm happy to see it
improved, too, but I think it's important to understand what happens
now.  And at least on the pgbench tests I've done, what happens is
that VACUUM makes no significant contribution to pruning; IIRC, it
would have to visit pages at least 1000 times more often to be
relevant.  So when somebody says "relying on vacuum instead of doing
HOT pruning" what I hear is "flush performance down the toilet"... but
of course the real way to resolve this is to test whatever patch Simon
or someone else eventually posts, not to speculate without data.

>> The rate at which vacuuming happens does not ramp up in the same way;
>> it's limited by autovacuum cost settings (which people tend not have
>> set correctly, and don't adjust themselves on the fly)
>
> True, but that seems like a pretty well-defined improvement project right
> there (as well as an argument against user-visible knobs in general ;-)).
> Nasby's speculations just upthread could be useful here, too.

Fair point.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Turning off HOT/Cleanup sometimes

From
Stephen Frost
Date:
* Robert Haas (robertmhaas@gmail.com) wrote:
> So when somebody says "relying on vacuum instead of doing
> HOT pruning" what I hear is "flush performance down the toilet"... but
> of course the real way to resolve this is to test whatever patch Simon
> or someone else eventually posts, not to speculate without data.

I don't think anyone was seriously proposing that (certainly not with
today's VACUUM).  What I've heard speculated about is doing HOT pruning
during UPDATE and/or INSERT but specifically not during SELECT.  I
concur that we need data to really understand the difference, hopefully
there'll be a patch posted which we can play with.
Thanks,
    Stephen

Re: Turning off HOT/Cleanup sometimes

From
Simon Riggs
Date:
On 8 January 2014 08:33, Simon Riggs <simon@2ndquadrant.com> wrote:
> VACUUM cleans up blocks, which is nice because it happens offline in a
> lazy manner.
>
> We also make SELECT clean up blocks as it goes. That is useful in OLTP
> workloads, but it means that large SQL queries and pg_dump effectively
> do much the same work as VACUUM, generating huge amounts of I/O and
> WAL on the master, the cost and annoyance of which is experienced
> directly by the user. That is avoided on standbys.
>
> Effects of that are that long running statements often run much longer
> than we want, increasing bloat as a result. It also produces wildly
> varying response times, depending upon extent of cleanup required.
>
> It is a simple task to make that behaviour optional on the master.
>
> I propose a USERSET parameter, prune_cost_limit (<---insert better name here)
> which will make the behaviour optional, default -1, in normal user
> processes. VACUUM will ignore this parameter and so its actions will
> never be deferred.
>
> In detail, this parameter would disable pruning for any scan larger
> than the cost limit. So large scans will disable the behaviour. The
> default, -1, means never disable pruning, which is the current
> behavour.
>
> We track the number of pages dirtied by the current statement. When
> this reaches prune_cost_limit, we will apply these behaviours to all
> shared_buffer block accesses...
>
> (1) avoid running heap_page_prune_opt()
>
> (2) avoid dirtying the buffer for hints. (This is safe because the
> hinted changes will either be lost or will be part of the full page
> image when we make a logged-change).
>
> (i.e. doesn't apply to temp tables)
>
> For example, if we set prune_cost_limit = 4 this behaviour allows
> small index lookups via bitmapheapscan to continue to cleanup, while
> larger index and seq scans will avoid cleanup.
>
>
>
> There would be a postgresql.conf parameter prune_cost_limit, as well
> as a table level parameter that would prevent pruning except via
> VACUUM.
>
> This will help in these ways
> * Reduce write I/O from SELECTs and pg_dump - improving backups and BI queries
> * Allow finer grained control over Hot Standby conflicts
> * Potentially allow diagnostic inspection of older data via SeqScan
>
> Prototype patch shows this is possible and simple enough for 9.4.
> Major objections? Or should I polish up and submit?

Patch attached, implemented to reduce writes by SELECTs only.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachment

Re: Turning off HOT/Cleanup sometimes

From
Robert Haas
Date:
On Tue, Jan 14, 2014 at 4:13 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On 8 January 2014 08:33, Simon Riggs <simon@2ndquadrant.com> wrote:
>> VACUUM cleans up blocks, which is nice because it happens offline in a
>> lazy manner.
>>
>> We also make SELECT clean up blocks as it goes. That is useful in OLTP
>> workloads, but it means that large SQL queries and pg_dump effectively
>> do much the same work as VACUUM, generating huge amounts of I/O and
>> WAL on the master, the cost and annoyance of which is experienced
>> directly by the user. That is avoided on standbys.
>>
>> Effects of that are that long running statements often run much longer
>> than we want, increasing bloat as a result. It also produces wildly
>> varying response times, depending upon extent of cleanup required.
>>
>> It is a simple task to make that behaviour optional on the master.
>>
>> I propose a USERSET parameter, prune_cost_limit (<---insert better name here)
>> which will make the behaviour optional, default -1, in normal user
>> processes. VACUUM will ignore this parameter and so its actions will
>> never be deferred.
>>
>> In detail, this parameter would disable pruning for any scan larger
>> than the cost limit. So large scans will disable the behaviour. The
>> default, -1, means never disable pruning, which is the current
>> behavour.
>>
>> We track the number of pages dirtied by the current statement. When
>> this reaches prune_cost_limit, we will apply these behaviours to all
>> shared_buffer block accesses...
>>
>> (1) avoid running heap_page_prune_opt()
>>
>> (2) avoid dirtying the buffer for hints. (This is safe because the
>> hinted changes will either be lost or will be part of the full page
>> image when we make a logged-change).
>>
>> (i.e. doesn't apply to temp tables)
>>
>> For example, if we set prune_cost_limit = 4 this behaviour allows
>> small index lookups via bitmapheapscan to continue to cleanup, while
>> larger index and seq scans will avoid cleanup.
>>
>>
>>
>> There would be a postgresql.conf parameter prune_cost_limit, as well
>> as a table level parameter that would prevent pruning except via
>> VACUUM.
>>
>> This will help in these ways
>> * Reduce write I/O from SELECTs and pg_dump - improving backups and BI queries
>> * Allow finer grained control over Hot Standby conflicts
>> * Potentially allow diagnostic inspection of older data via SeqScan
>>
>> Prototype patch shows this is possible and simple enough for 9.4.
>> Major objections? Or should I polish up and submit?
>
> Patch attached, implemented to reduce writes by SELECTs only.

I am still not sure whether we want this, but I think it's definitely
an improvement over the previous version.  Assorted comments:

- Naming consistency seems to me to dictate that there should be more
similarity between the reloption name (allow_buffer_cleanup) and the
GUC (prune_page_dirty_limit).

- The documentation doesn't describe the use case where suppressing
cleanup on a per-table basis would be desirable, and I can't think of
one, either.

- There are a variety of ways to limit pruning; here, you've chosen to
limit it to a particular number of pruning operations per executor
invocation.  But the flag is global, not part of the executor state,
so a query that calls a PL/pgsql function during execution will reset
the counter for the parent query also, which doesn't seem very
principled.

In a patch I posted a few years ago to set hint bits only sometimes, I
settled on an algorithm where I dirtied the first 50 pages per scan
and then skipped the next 950, or something like that.  The idea was
that you wanted the pages that did get dirtied to be clustered
together to avoid random I/O; and also that you wanted table of
arbitrary size to get hinted within a certain number of scans (e.g.
20).  The limiting here is much more aggressive, so on large tables it
will amount to basically no pruning at all.  I dunno whether that's a
good idea or not.  But if the idea of making this an integer rather
than a boolean is to allow some pruning to still happen while keeping
it checked within reasonable bounds, I'm not sure it will succeed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Turning off HOT/Cleanup sometimes

From
Simon Riggs
Date:
On 15 January 2014 16:47, Robert Haas <robertmhaas@gmail.com> wrote:

>>> There would be a postgresql.conf parameter prune_cost_limit, as well
>>> as a table level parameter that would prevent pruning except via
>>> VACUUM.
>>>
>>> This will help in these ways
>>> * Reduce write I/O from SELECTs and pg_dump - improving backups and BI queries
>>> * Allow finer grained control over Hot Standby conflicts
>>> * Potentially allow diagnostic inspection of older data via SeqScan

>> Patch attached, implemented to reduce writes by SELECTs only.
>
> I am still not sure whether we want this, but I think it's definitely
> an improvement over the previous version.  Assorted comments:
>
> - Naming consistency seems to me to dictate that there should be more
> similarity between the reloption name (allow_buffer_cleanup) and the
> GUC (prune_page_dirty_limit).

Now that I've written the patch, I'm seeing those as two different
things, but YMMV and I am very open to naming suggestions.


> - The documentation doesn't describe the use case where suppressing
> cleanup on a per-table basis would be desirable, and I can't think of
> one, either.

We already know that HOT is ineffective in areas of high contention
(previous thread by me). Prior experience was that smaller tables
didn't show much apparent benefit from using HOT either; its
effectiveness was limited to medium and large tables being updated.
The two already stated use cases that would apply are these ones
   * Allow finer grained control over Hot Standby conflicts   * Potentially allow diagnostic inspection of older data
viaSeqScan
 

So the use cases for the two parameters seem quite different and we
may decide we want one but not the other.

> - There are a variety of ways to limit pruning; here, you've chosen to
> limit it to a particular number of pruning operations per executor
> invocation.  But the flag is global, not part of the executor state,
> so a query that calls a PL/pgsql function during execution will reset
> the counter for the parent query also, which doesn't seem very
> principled.

That is subtle thing in this patch and I agree that potential problem
exists. The current limit is set according to the current executing
statement, but the current total is not reset until start of the top
level statement. So the behaviour is not reset during statements
executed within PL/pgSQL function.

> In a patch I posted a few years ago to set hint bits only sometimes, I
> settled on an algorithm where I dirtied the first 50 pages per scan
> and then skipped the next 950, or something like that.  The idea was
> that you wanted the pages that did get dirtied to be clustered
> together to avoid random I/O; and also that you wanted table of
> arbitrary size to get hinted within a certain number of scans (e.g.
> 20).  The limiting here is much more aggressive, so on large tables it
> will amount to basically no pruning at all.  I dunno whether that's a
> good idea or not.  But if the idea of making this an integer rather
> than a boolean is to allow some pruning to still happen while keeping
> it checked within reasonable bounds, I'm not sure it will succeed.

It sounds like you're in favour of the overall concept of limiting
writes, which is good.

The behaviour I think we need, based on listening to everybody so far is

* OLTP is unaffected
* Large SELECTs and pg_dump don't cause lots of write I/O.

and hence why "prune_page_dirty_limit" offers a change in behaviour at
a certain point.

Reducing cleanup to "only 5%" just reduces but doesn't remove the
problem. If the data is stored on very poor I/O infrastructure, any
significant volume of writes can adversely affect performance. As we
reduce the percentage, we also reduce the benefit from inducing writes
in the first place and so I would question why bother at all using a
percentage. For me, a parameter that gives you absolute rather than
relative control is more desirable.

The current behaviour assumes it is OK for the first/next user to
touch the data to be the one that won't mind re-writing everything. In
time critical applications, the first/next user could well have a very
urgent need to access the data quickly and doesn't want to have to pay
this price. In seldom-accessed data applications, VACUUM has lots of
time to run out of hours, so users are OK to defer this work. Some
applications exist where we literally want zero I/O.

-- Simon Riggs                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



Re: Turning off HOT/Cleanup sometimes

From
Robert Haas
Date:
On Wed, Jan 15, 2014 at 5:14 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> We already know that HOT is ineffective in areas of high contention
> (previous thread by me). Prior experience was that smaller tables
> didn't show much apparent benefit from using HOT either; its
> effectiveness was limited to medium and large tables being updated.
> The two already stated use cases that would apply are these ones

Do you have a link to that previous thread?  I don't happen to recall
that conversation.

I've found that HOT can be very important on smaller tables, so I'm
skeptical of that as a general conclusion.  What I think might be true
is that if VACUUM is going to hit the table often enough to make you
happy, then you don't really need HOT.  In other words, if the update
rate is non-zero but low, not too much cruft will accumulate before
the table gets vacuumed, and you may be OK.  If the update rate is
high, though, I think disabling HOT will be painful on a table of any
size.  There might be exceptions, but I can't think of what the are
off-hand.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Turning off HOT/Cleanup sometimes

From
Amit Kapila
Date:
On Wed, Jan 15, 2014 at 2:43 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On 8 January 2014 08:33, Simon Riggs <simon@2ndquadrant.com> wrote:
>
> Patch attached, implemented to reduce writes by SELECTs only.

This is really a valuable improvement over current SELECT behaviour
w.r.t Writes.

While going though patch, I observed few points, so thought of
sharing with you:

+ /*
+ * If we are tracking pruning in SELECTs then we can only get
+ * here by heap_page_prune_opt() call that cleans a block,
+ * so in that case, register it as a pruning operation.
+ * Make sure we don't double count during VACUUMs.
+ */
+ if (PrunePageDirtyLimit > -1)
+ PrunePageDirty++;

a. As PrunePageDirtyLimit variable is not initialized for DDL flow,  any statement like Create Function().. will have
valueof  PrunePageDirtyLimit as 4 (default) and in such cases MarkBufferDirty()  will increment the wrong counter.
 

b. For DDL statements like Create Materialized view, it will behave as  Select statement.  Ex.  Create Materialized
viewmv1 as select * from t1;
 
  Now here I think it might not be a problem, because for t1 anyway there  will be no write, so skipping pruning should
notbe a problem and for  materialized views also there will no dead rows, so skipping should be  okay, but I think it
isnot strictly adhering to statement "to reduce writes  by SELECTs only" and purpose of patch which is to avoid only
when Top level statement is SELECT.  Do you think it's better to consider such cases and optimize for them  or should
weavoid it by following thumb rule that pruning will be avoided  only for top level SELECT?
 

2. + "Allow cleanup of shared buffers by foreground processes, allowing       later cleanup by VACUUM",
This line is not clear, what do you mean to say by "allowing later cleanup
by VACUUM", if already foreground process has done cleanup, then it
should save effort of Vacuum.


In general, though both the optimisations (allow_buffer_cleanup and
prune_page_dirty_limit )  used in patch have similarity in the sense
that they will be used to avoid pruning, but still I feel they are for different
cases (READ ONLY OP and WRITE ON SMALL TABLES) and also as there
are more people inclined to do this for only SELECT operations, do you think
it will be a good idea to make them as separate patches?

I think there can be some applications or use cases which can be benefited
by avoiding pruning for WRITE ON SMALL TABLES, but the case for SELECT
is more general and more applications can get benefit with this optimisation,so
it would be better if we first try to accomplish that case.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: Turning off HOT/Cleanup sometimes

From
Michael Paquier
Date:
On Mon, Feb 3, 2014 at 3:42 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Wed, Jan 15, 2014 at 2:43 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> On 8 January 2014 08:33, Simon Riggs <simon@2ndquadrant.com> wrote:
>>
>> Patch attached, implemented to reduce writes by SELECTs only.

This patch is registered in this CF. It does not apply anymore and
needs a rebase. Robert and Amit have provided as well some comments
but they have not been addressed. Is it fair to mark it as "returned
with feedback" even if it has not been reviewed within the last month?
-- 
Michael



Re: Turning off HOT/Cleanup sometimes

From
Michael Paquier
Date:
On Fri, Sep 12, 2014 at 3:19 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Mon, Feb 3, 2014 at 3:42 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>> On Wed, Jan 15, 2014 at 2:43 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>>> On 8 January 2014 08:33, Simon Riggs <simon@2ndquadrant.com> wrote:
>>>
>>> Patch attached, implemented to reduce writes by SELECTs only.
>
> This patch is registered in this CF. It does not apply anymore and
> needs a rebase. Robert and Amit have provided as well some comments
> but they have not been addressed. Is it fair to mark it as "returned
> with feedback" even if it has not been reviewed within the last month?
For the time being, status has been changed to "waiting on author".
-- 
Michael



Re: Turning off HOT/Cleanup sometimes

From
Alvaro Herrera
Date:
Michael Paquier wrote:
> On Fri, Sep 12, 2014 at 3:19 PM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
> > On Mon, Feb 3, 2014 at 3:42 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> >> On Wed, Jan 15, 2014 at 2:43 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> >>> On 8 January 2014 08:33, Simon Riggs <simon@2ndquadrant.com> wrote:
> >>>
> >>> Patch attached, implemented to reduce writes by SELECTs only.
> >
> > This patch is registered in this CF. It does not apply anymore and
> > needs a rebase. Robert and Amit have provided as well some comments
> > but they have not been addressed. Is it fair to mark it as "returned
> > with feedback" even if it has not been reviewed within the last month?
> For the time being, status has been changed to "waiting on author".

As it happens, I was studying this patch yesterday on the flight back
home.  I gave it a quick look; I noticed it was in the commitfest and
hadn't seen any review activity for many months, which seemed odd.

Anyway I first read the whole thread to know what to focus on, before
going over the patch itself.  Once I finished reading the emails, I had
a vague idea of how I thought it would work: my thinking was that
heap/index scans would either call heap_page_prune_opt, or not,
depending on whether they were part of a read-only executor node.  So if
you have a query that updates a certain table, and while doing so scans
another table in read-only mode, then the HOT updates would be enabled
for the table being written, but disabled for the one being read.

As it turns out, the patch as written is nothing at all like that, and
TBH I don't think I like it very much.

My idea is that we would have a new executor flag, say
EXEC_FLAG_READ_ONLY; we would set it on nodes that are known to be
read-only, and reset it on those that aren't, such as LockRows and
ModifyTable (obviously we need to pass it down correctly from parent to
children).  Then in ExecInitSeqScan and ExecInitIndexScan, if we see the
flag set, we call heap/index_set_allow_prune(false) for the heap scan;
same thing in index scans.  (I envisioned it as a boolean rather than
enabling a certain number of cleanups per scan.)

I tried to code this but I think it doesn't work correctly, and no time
for debug currently.  Anyway let me know what you think of this general
idea.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment

Re: Turning off HOT/Cleanup sometimes

From
Simon Riggs
Date:
On 12 September 2014 14:54, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

> My idea is that we would have a new executor flag, say
> EXEC_FLAG_READ_ONLY; we would set it on nodes that are known to be
> read-only, and reset it on those that aren't, such as LockRows and
> ModifyTable (obviously we need to pass it down correctly from parent to
> children).  Then in ExecInitSeqScan and ExecInitIndexScan, if we see the
> flag set, we call heap/index_set_allow_prune(false) for the heap scan;
> same thing in index scans.  (I envisioned it as a boolean rather than
> enabling a certain number of cleanups per scan.)
>
> I tried to code this but I think it doesn't work correctly, and no time
> for debug currently.  Anyway let me know what you think of this general
> idea.

Thanks for looking at this.

My concern was to ensure that UPDATEs and DELETEs continue to call
heap_page_prune_opt while larger SELECTs do not.

This is achieved without a counter, so after some thought like it
better; simple is good. Happy to progress from here, or you can?

-- Simon Riggs                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



Re: Turning off HOT/Cleanup sometimes

From
Alvaro Herrera
Date:
Simon Riggs wrote:
> On 12 September 2014 14:54, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> 
> > My idea is that we would have a new executor flag, say
> > EXEC_FLAG_READ_ONLY; we would set it on nodes that are known to be
> > read-only, and reset it on those that aren't, such as LockRows and
> > ModifyTable (obviously we need to pass it down correctly from parent to
> > children).  Then in ExecInitSeqScan and ExecInitIndexScan, if we see the
> > flag set, we call heap/index_set_allow_prune(false) for the heap scan;
> > same thing in index scans.  (I envisioned it as a boolean rather than
> > enabling a certain number of cleanups per scan.)
> >
> > I tried to code this but I think it doesn't work correctly, and no time
> > for debug currently.  Anyway let me know what you think of this general
> > idea.
> 
> Thanks for looking at this.
> 
> My concern was to ensure that UPDATEs and DELETEs continue to call
> heap_page_prune_opt while larger SELECTs do not.
> 
> This is achieved without a counter, so after some thought like it
> better; simple is good. Happy to progress from here, or you can?

Please feel free to take over.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: Turning off HOT/Cleanup sometimes

From
Tom Lane
Date:
Simon Riggs <simon@2ndquadrant.com> writes:
> On 12 September 2014 14:54, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
>> My idea is that we would have a new executor flag, say
>> EXEC_FLAG_READ_ONLY; we would set it on nodes that are known to be
>> read-only, and reset it on those that aren't, such as LockRows and
>> ModifyTable (obviously we need to pass it down correctly from parent to
>> children).  Then in ExecInitSeqScan and ExecInitIndexScan, if we see the
>> flag set, we call heap/index_set_allow_prune(false) for the heap scan;
>> same thing in index scans.  (I envisioned it as a boolean rather than
>> enabling a certain number of cleanups per scan.)
>> 
>> I tried to code this but I think it doesn't work correctly, and no time
>> for debug currently.  Anyway let me know what you think of this general
>> idea.

> Thanks for looking at this.

> My concern was to ensure that UPDATEs and DELETEs continue to call
> heap_page_prune_opt while larger SELECTs do not.

I think there's another way to think about it: what about saying that
the query's target relation(s) are subject to pruning, while others
are not?  Then you do not need an executor flag, you just need to
look at the estate->es_result_relations array (or maybe even only at
estate->es_result_relation_info).  This would have the advantage of
doing what-I-think-is-the-right-thing for updates/deletes involving
joins to other tables.  The mechanism Alvaro describes would probably
have to prune all tables involved in such a query; do we really want
that?
        regards, tom lane



Re: Turning off HOT/Cleanup sometimes

From
Tom Lane
Date:
I wrote:
> I think there's another way to think about it: what about saying that
> the query's target relation(s) are subject to pruning, while others
> are not?  Then you do not need an executor flag, you just need to
> look at the estate->es_result_relations array (or maybe even only at
> estate->es_result_relation_info).

After a little bit I remembered there was already a function for this.
So specifically, I'd suggest using ExecRelationIsTargetRelation()
to decide whether to mark the scan as requiring pruning.
        regards, tom lane



Re: Turning off HOT/Cleanup sometimes

From
Simon Riggs
Date:
On 12 September 2014 15:30, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I wrote:
>> I think there's another way to think about it: what about saying that
>> the query's target relation(s) are subject to pruning, while others
>> are not?  Then you do not need an executor flag, you just need to
>> look at the estate->es_result_relations array (or maybe even only at
>> estate->es_result_relation_info).
>
> After a little bit I remembered there was already a function for this.
> So specifically, I'd suggest using ExecRelationIsTargetRelation()
> to decide whether to mark the scan as requiring pruning.

Sounds cool. Thanks both, this is sounding like a viable route now.

-- Simon Riggs                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



Re: Turning off HOT/Cleanup sometimes

From
Simon Riggs
Date:
On 12 September 2014 18:19, Simon Riggs <simon@2ndquadrant.com> wrote:
> On 12 September 2014 15:30, Tom Lane <tgl@sss.pgh.pa.us> wrote:

>> After a little bit I remembered there was already a function for this.
>> So specifically, I'd suggest using ExecRelationIsTargetRelation()
>> to decide whether to mark the scan as requiring pruning.
>
> Sounds cool. Thanks both, this is sounding like a viable route now.

Yes, this is viable.

Patch attached, using Alvaro's idea of use-case specific pruning and
Tom's idea of aiming at target relations. Patch uses or extends
existing infrastructure, so its shorter than it might have been, yet
with all that bufmgr yuck removed.

This is very, very good because while going through this I notice the
dozen or more places where we were pruning blocks in annoying places I
didn't even know about such as about 4-5 constraint checks. In more
than a few DDL commands like ALTER TABLE and CLUSTER we were even
pruning the old relation prior to rewrite.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachment

Re: Turning off HOT/Cleanup sometimes

From
Robert Haas
Date:
On Sun, Sep 14, 2014 at 4:37 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On 12 September 2014 18:19, Simon Riggs <simon@2ndquadrant.com> wrote:
>> On 12 September 2014 15:30, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
>>> After a little bit I remembered there was already a function for this.
>>> So specifically, I'd suggest using ExecRelationIsTargetRelation()
>>> to decide whether to mark the scan as requiring pruning.
>>
>> Sounds cool. Thanks both, this is sounding like a viable route now.
>
> Yes, this is viable.
>
> Patch attached, using Alvaro's idea of use-case specific pruning and
> Tom's idea of aiming at target relations. Patch uses or extends
> existing infrastructure, so its shorter than it might have been, yet
> with all that bufmgr yuck removed.
>
> This is very, very good because while going through this I notice the
> dozen or more places where we were pruning blocks in annoying places I
> didn't even know about such as about 4-5 constraint checks. In more
> than a few DDL commands like ALTER TABLE and CLUSTER we were even
> pruning the old relation prior to rewrite.

Do we really want to disable HOT for all catalog scans?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Turning off HOT/Cleanup sometimes

From
Simon Riggs
Date:
On 15 September 2014 17:09, Robert Haas <robertmhaas@gmail.com> wrote:

> Do we really want to disable HOT for all catalog scans?

The intention of the patch is that catalog scans are treated
identically to non-catalog scans. The idea here is that HOT cleanup
only occurs on scans on target relations, so only INSERT, UPDATE and
DELETE do HOT cleanup.

It's possible that many catalog scans don't follow the normal target
relation logic, so we might argue we should use HOT every time. OTOH,
since we now have separate catalog xmins we may find that using HOT on
catalogs is no longer effective. So I could go either way on how to
proceed; its an easy change either way.

-- Simon Riggs                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



Re: Turning off HOT/Cleanup sometimes

From
Emanuel Calvo
Date:
El 14/09/14 17:37, Simon Riggs escribió:
> On 12 September 2014 18:19, Simon Riggs <simon@2ndquadrant.com> wrote:
>> On 12 September 2014 15:30, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> After a little bit I remembered there was already a function for this.
>>> So specifically, I'd suggest using ExecRelationIsTargetRelation()
>>> to decide whether to mark the scan as requiring pruning.
>> Sounds cool. Thanks both, this is sounding like a viable route now.
> Yes, this is viable.
>
> Patch attached, using Alvaro's idea of use-case specific pruning and
> Tom's idea of aiming at target relations. Patch uses or extends
> existing infrastructure, so its shorter than it might have been, yet
> with all that bufmgr yuck removed.
>
> This is very, very good because while going through this I notice the
> dozen or more places where we were pruning blocks in annoying places I
> didn't even know about such as about 4-5 constraint checks. In more
> than a few DDL commands like ALTER TABLE and CLUSTER we were even
> pruning the old relation prior to rewrite.
>

A simple performance test with the following variables:
LOOP=50
CONN=60
TXSS=500
SCALE=30


Select only:
WITH PATCH
Average:  20716.1 tps

NO PATCH
Average:  19141.7 tps


With writes:
WITH PATCH
Average:  2602.65

NO PATCH
Average:  2565.32


TODO:
- Consistency check.
- ALTER and CLUSTER test.







Re: Turning off HOT/Cleanup sometimes

From
Emanuel Calvo
Date:
El 15/09/14 18:13, Simon Riggs escribió:
> On 15 September 2014 17:09, Robert Haas <robertmhaas@gmail.com> wrote:
>
>> Do we really want to disable HOT for all catalog scans?
> The intention of the patch is that catalog scans are treated
> identically to non-catalog scans. The idea here is that HOT cleanup
> only occurs on scans on target relations, so only INSERT, UPDATE and
> DELETE do HOT cleanup.
>
> It's possible that many catalog scans don't follow the normal target
> relation logic, so we might argue we should use HOT every time. OTOH,
> since we now have separate catalog xmins we may find that using HOT on
> catalogs is no longer effective. So I could go either way on how to
> proceed; its an easy change either way.
>

I setup a more concurrent scenario and the difference is quite larger:

Without patch:
1st concurrent with writes:
tps = 5705.261620 (including connections establishing)
tps = 5945.338557 (excluding connections establishing)
2nd no writes being executed:
tps = 9988.792575 (including connections establishing)
tps = 11059.671351 (excluding connections establishing)


Patched version:
1st concurrent with writes:
tps = 9476.741707 (including connections establishing)
tps = 10274.831185 (excluding connections establishing)
2nd no writes being executed:
tps = 12993.644808 (including connections establishing)
tps = 15171.214744 (excluding connections establishing)


Stats (writes have been run with a time limit, not by tx):

hotcleanup=# select relname ,n_live_tup, n_dead_tup, n_tup_hot_upd from
pg_stat_user_tables where relname ~ 'pgbench';    relname      | n_live_tup | n_dead_tup | n_tup_hot_upd
------------------+------------+------------+---------------pgbench_tellers  |        500 |          0 |
2044192pgbench_accounts|    5109728 |     310842 |       1969264pgbench_history  |    2265882 |          0 |
0pgbench_branches |         50 |          0 |       2237167 
(4 rows)

hotcleanup=# select relname ,n_live_tup, n_dead_tup, n_tup_hot_upd from
pg_stat_user_tables where relname ~ 'pgbench';    relname      | n_live_tup | n_dead_tup | n_tup_hot_upd
------------------+------------+------------+---------------pgbench_history  |    2148946 |          0 |
0pgbench_tellers |        500 |          0 |       1969675pgbench_branches |         50 |          0 |
2150655pgbench_accounts|    5098774 |     300123 |       1897484 
(4 rows)


I ran the regression tests over the patched version and they passed ok.


--
--
Emanuel Calvo                 http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services





Re: Turning off HOT/Cleanup sometimes

From
Simon Riggs
Date:
On 15 September 2014 22:13, Simon Riggs <simon@2ndquadrant.com> wrote:
> On 15 September 2014 17:09, Robert Haas <robertmhaas@gmail.com> wrote:
>
>> Do we really want to disable HOT for all catalog scans?
>
> The intention of the patch is that catalog scans are treated
> identically to non-catalog scans. The idea here is that HOT cleanup
> only occurs on scans on target relations, so only INSERT, UPDATE and
> DELETE do HOT cleanup.

Since INSERT, UPDATE and DELETE can only be called when
!RecoveryInProgress(), we can completely avoid making this test at the
top of each heap_page_prune_opt() call.

I very much like the simplicity of saying "no target, no cleanup".

-- Simon Riggs                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



Re: Turning off HOT/Cleanup sometimes

From
Robert Haas
Date:
On Mon, Sep 15, 2014 at 5:13 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On 15 September 2014 17:09, Robert Haas <robertmhaas@gmail.com> wrote:
>> Do we really want to disable HOT for all catalog scans?
>
> The intention of the patch is that catalog scans are treated
> identically to non-catalog scans. The idea here is that HOT cleanup
> only occurs on scans on target relations, so only INSERT, UPDATE and
> DELETE do HOT cleanup.
>
> It's possible that many catalog scans don't follow the normal target
> relation logic, so we might argue we should use HOT every time. OTOH,
> since we now have separate catalog xmins we may find that using HOT on
> catalogs is no longer effective. So I could go either way on how to
> proceed; its an easy change either way.

What I'm thinking about is that the smarts to enable pruning is all in
the executor nodes.  So anything that updates the catalog without
going through the executor will never be subject to pruning.  That
includes nearly all catalog-modifying code throughout the backend.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Turning off HOT/Cleanup sometimes

From
Simon Riggs
Date:
On 19 September 2014 13:04, Robert Haas <robertmhaas@gmail.com> wrote:

> What I'm thinking about is that the smarts to enable pruning is all in
> the executor nodes.  So anything that updates the catalog without
> going through the executor will never be subject to pruning.  That
> includes nearly all catalog-modifying code throughout the backend.

Are you saying this is a problem or a benefit? (and please explain why).

-- Simon Riggs                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



Re: Turning off HOT/Cleanup sometimes

From
Andres Freund
Date:
On September 19, 2014 10:16:35 PM CEST, Simon Riggs <simon@2ndquadrant.com> wrote:
>On 19 September 2014 13:04, Robert Haas <robertmhaas@gmail.com> wrote:
>
>> What I'm thinking about is that the smarts to enable pruning is all
>in
>> the executor nodes.  So anything that updates the catalog without
>> going through the executor will never be subject to pruning.  That
>> includes nearly all catalog-modifying code throughout the backend.
>
>Are you saying this is a problem or a benefit? (and please explain
>why).

I have no idea what Robert is thinking of, but I'd imagine its horrible for workloads with catalog bloat. Like ones
involvingtemp tables.
 

I generally have serious doubts about disabling it generally for read workloads. I imagine it e.g. will significantly
penalizeworkloads where its likely that a cleanup lock can't be acquired every time...
 

Andres

--- 
Please excuse brevity and formatting - I am writing this on my mobile phone.



Re: Turning off HOT/Cleanup sometimes

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> On September 19, 2014 10:16:35 PM CEST, Simon Riggs <simon@2ndquadrant.com> wrote:
>> Are you saying this is a problem or a benefit? (and please explain
>> why).

> I have no idea what Robert is thinking of, but I'd imagine its horrible for workloads with catalog bloat. Like ones
involvingtemp tables.
 

Yeah.  But it's also the case that we know a good deal more about the
access patterns for system-driven catalog updates than we do about user
queries.  ISTM we could probably suppress HOT pruning during catalog
*scans* and instead try to do it when a system-driven heap_update
occurs.

Having said that, this could reasonably be considered outside the scope
of a patch that's trying to improve the behavior for user queries.
But if the patch author doesn't want to expand the scope like that,
ISTM he ought to ensure that the behavior *doesn't* change for system
accesses, rather than trying to convince us that disabling HOT for
system updates is a good idea.
        regards, tom lane



Re: Turning off HOT/Cleanup sometimes

From
Robert Haas
Date:
On Fri, Sep 19, 2014 at 4:30 PM, Andres Freund <andres@anarazel.de> wrote:
> On September 19, 2014 10:16:35 PM CEST, Simon Riggs <simon@2ndquadrant.com> wrote:
>>On 19 September 2014 13:04, Robert Haas <robertmhaas@gmail.com> wrote:
>>
>>> What I'm thinking about is that the smarts to enable pruning is all
>>in
>>> the executor nodes.  So anything that updates the catalog without
>>> going through the executor will never be subject to pruning.  That
>>> includes nearly all catalog-modifying code throughout the backend.
>>
>>Are you saying this is a problem or a benefit? (and please explain
>>why).
>
> I have no idea what Robert is thinking of, but I'd imagine its horrible for workloads with catalog bloat. Like ones
involvingtemp tables.
 

Right, that's what I was going for.

> I generally have serious doubts about disabling it generally for read workloads. I imagine it e.g. will significantly
penalizeworkloads where its likely that a cleanup lock can't be acquired every time...
 

I share that doubt.  But I understand why Simon wants to do something,
too, because the current situation is not great either.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Turning off HOT/Cleanup sometimes

From
Andres Freund
Date:
On 2014-09-19 17:29:08 -0400, Robert Haas wrote:
> > I generally have serious doubts about disabling it generally for
> > read workloads. I imagine it e.g. will significantly penalize
> > workloads where its likely that a cleanup lock can't be acquired
> > every time...
> 
> I share that doubt.  But I understand why Simon wants to do something,
> too, because the current situation is not great either.

Right, I totally agree. I doubt a simple approach like this will work in
the general case, but I think something needs to be done.

I think limiting the amount of HOT cleanup for readonly queries is a
good idea, but I think it has to be gradual. Say after a single cleaned
up page at least another 500 pages need to have been touched till the
next hot cleanup. That way a single query won't be penalized with
cleaning up everything, but there'll be some progress.

The other thing I think might be quite worthwile would be to abort hot
cleanup when the gain is only minimal. If e.g. only 1 small tuple is
removed from a half full page it's not worth the cost of the wal logging
et al.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: Turning off HOT/Cleanup sometimes

From
Simon Riggs
Date:
On 19 September 2014 15:35, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Having said that, this could reasonably be considered outside the scope
> of a patch that's trying to improve the behavior for user queries.
> But if the patch author doesn't want to expand the scope like that,
> ISTM he ought to ensure that the behavior *doesn't* change for system
> accesses, rather than trying to convince us that disabling HOT for
> system updates is a good idea.

As I said, I could make an argument to go either way, so I was unsure.

I'm happy to avoid changing behaviour for catalog scans in this patch.

-- Simon Riggs                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



Re: Turning off HOT/Cleanup sometimes

From
Andres Freund
Date:
On 2014-09-19 16:35:19 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On September 19, 2014 10:16:35 PM CEST, Simon Riggs <simon@2ndquadrant.com> wrote:
> >> Are you saying this is a problem or a benefit? (and please explain
> >> why).
> 
> > I have no idea what Robert is thinking of, but I'd imagine its horrible for workloads with catalog bloat. Like ones
involvingtemp tables.
 
> 
> Yeah.  But it's also the case that we know a good deal more about the
> access patterns for system-driven catalog updates than we do about user
> queries.  ISTM we could probably suppress HOT pruning during catalog
> *scans* and instead try to do it when a system-driven heap_update
> occurs.
> 
> Having said that, this could reasonably be considered outside the scope
> of a patch that's trying to improve the behavior for user queries.
> But if the patch author doesn't want to expand the scope like that,
> ISTM he ought to ensure that the behavior *doesn't* change for system
> accesses, rather than trying to convince us that disabling HOT for
> system updates is a good idea.

I think it'd have to change for anything not done via the
executor. There definitely is user defined code out there doing manual
heap_* stuff. I know because i've written some. And I know I'm not the
only one.

If such paths suddenly stop doing HOT cleanup we'll cause a noticeable
amount of pain.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: Turning off HOT/Cleanup sometimes

From
Robert Haas
Date:
On Fri, Sep 19, 2014 at 5:42 PM, Andres Freund <andres@anarazel.de> wrote:
> On 2014-09-19 17:29:08 -0400, Robert Haas wrote:
>> > I generally have serious doubts about disabling it generally for
>> > read workloads. I imagine it e.g. will significantly penalize
>> > workloads where its likely that a cleanup lock can't be acquired
>> > every time...
>>
>> I share that doubt.  But I understand why Simon wants to do something,
>> too, because the current situation is not great either.
>
> Right, I totally agree. I doubt a simple approach like this will work in
> the general case, but I think something needs to be done.
>
> I think limiting the amount of HOT cleanup for readonly queries is a
> good idea, but I think it has to be gradual. Say after a single cleaned
> up page at least another 500 pages need to have been touched till the
> next hot cleanup. That way a single query won't be penalized with
> cleaning up everything, but there'll be some progress.

I tried this kind of thing several years ago with hint-bit-setting and
was unimpressed by the results.

http://www.postgresql.org/message-id/AANLkTik5QzR8wTs0MqCWwmNp-qHGrdKY5Av5aOB7W4Dp@mail.gmail.com
http://www.postgresql.org/message-id/AANLkTimGKaG7wdu-x77GNV2Gh6_Qo5Ss1u5b6Q1MsPUy@mail.gmail.com

Granted, I never tried a ratio as low as 500:1, and HOT pruning is not
the same thing as setting hint bits, but I think the basic problems
are similar, namely:

1. You can't know how many times the page is going to be referenced in
the future before it again gets modified.  If that number is small,
then you shouldn't bother with hint bits, or HOT-pruning, or freezing.
But if it's big, you should do all of those things as soon as possible
because the benefits are quite significant.  Therefore, any change in
this area is  guaranteed to lose on some easy-to-construct workload,
because I just described two of them that want opposing things.

2. Dirtying every N'th page is a great way to generate lots of random
I/O that will quite possibly make your disk almost as sad - or even
sadder - than dirtying all of them, but without anywhere as near as
much performance benefit.

Variations on this idea have been proposed so many times over the
years that I'm tempted to give some credence to the theory that we
ought to adopt one of them.  But there will certainly be losers, as
well as winners.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Turning off HOT/Cleanup sometimes

From
Heikki Linnakangas
Date:
This patch has gotten a fair amount of review, and has been rewritten 
once during the commitfest. I think it's pretty close to being 
committable, the only remaining question seems to be what to do with 
system catalogs. I'm marking this as "Returned with feedback", I take it 
that Simon can proceed from here, outside the commitfest.

- Heikki




Re: Turning off HOT/Cleanup sometimes

From
Andres Freund
Date:
On 2014-09-27 10:23:33 +0300, Heikki Linnakangas wrote:
> This patch has gotten a fair amount of review, and has been rewritten once
> during the commitfest. I think it's pretty close to being committable, the
> only remaining question seems to be what to do with system catalogs. I'm
> marking this as "Returned with feedback", I take it that Simon can proceed
> from here, outside the commitfest.

FWIW, I don't think it is, even with that. As is it seems very likely
that it's going to regress a fair share of workloads. At the very least
it needs a fair amount of benchmarking beforehand.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: Turning off HOT/Cleanup sometimes

From
Simon Riggs
Date:
On 27 September 2014 09:29, Andres Freund <andres@anarazel.de> wrote:
> On 2014-09-27 10:23:33 +0300, Heikki Linnakangas wrote:
>> This patch has gotten a fair amount of review, and has been rewritten once
>> during the commitfest. I think it's pretty close to being committable, the
>> only remaining question seems to be what to do with system catalogs. I'm
>> marking this as "Returned with feedback", I take it that Simon can proceed
>> from here, outside the commitfest.
>
> FWIW, I don't think it is, even with that. As is it seems very likely
> that it's going to regress a fair share of workloads. At the very least
> it needs a fair amount of benchmarking beforehand.

There is some doubt there. We've not seen a workload that does
actually exhibit a negative behaviour. I'm not saying one doesn't
exist, but it does matter how common/likely it is. If anyone can
present a performance test case that demonstrates a regression, I
think it will make it easier to discuss how wide that case is and what
we should do about it. Discussing whether to do various kinds of
limited pruning are moot until that is clear.

My memory was that it took months for people to understand the
frequent update use case, since catching it in flagrante delicto was
hard. That may be the case here, or not, but negative-benefit
experimental results very welcome.

Updated patch attached to address earlier comments.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachment

Re: Turning off HOT/Cleanup sometimes

From
Andres Freund
Date:
On 2014-09-28 19:51:36 +0100, Simon Riggs wrote:
> On 27 September 2014 09:29, Andres Freund <andres@anarazel.de> wrote:
> > On 2014-09-27 10:23:33 +0300, Heikki Linnakangas wrote:
> >> This patch has gotten a fair amount of review, and has been rewritten once
> >> during the commitfest. I think it's pretty close to being committable, the
> >> only remaining question seems to be what to do with system catalogs. I'm
> >> marking this as "Returned with feedback", I take it that Simon can proceed
> >> from here, outside the commitfest.
> >
> > FWIW, I don't think it is, even with that. As is it seems very likely
> > that it's going to regress a fair share of workloads. At the very least
> > it needs a fair amount of benchmarking beforehand.
>
> There is some doubt there. We've not seen a workload that does
> actually exhibit a negative behaviour.

Neither is there much data about the magnitude of positive effect the
patch has...

> I'm not saying one doesn't exist, but it does matter how common/likely
> it is. If anyone can present a performance test case that demonstrates
> a regression, I think it will make it easier to discuss how wide that
> case is and what we should do about it. Discussing whether to do
> various kinds of limited pruning are moot until that is clear.

I doubt it'll be hard to construct a case where it'll show. My first try
of using a pgbench scale 100, -M prepared, -cj8 with a custom file with
1 write and 5 read transaction yielded the following on my laptop:

Baseline:relname                | pgbench_tellerspg_total_relation_size | 458752relname                |
pgbench_accountspg_total_relation_size| 1590337536relname                | pgbench_branchespg_total_relation_size |
286720relname               | pgbench_historypg_total_relation_size | 49979392
 
Patched:relname                | pgbench_tellerspg_total_relation_size | 516096relname                |
pgbench_accountspg_total_relation_size| 1590337536relname                | pgbench_branchespg_total_relation_size |
360448relname               | pgbench_historypg_total_relation_size | 49528832
 

So, there's a noticeable increase in size. Mostly on the smaller tables,
so probably HOT cleanup was sometimes skipped during UPDATEs due to
locks.

Baseline was:
tps = 9655.486532 (excluding connections establishing)
Patched was:
tps = 9466.158701 (including connections establishing)

That's not a unrealistic testcase.

I'm pretty sure this could be made quite a bit more pronounced by not
using a uniform distribution in the pgbench runs. And selecting a test
that's more vulnerable to the change (e.g. using a wider distribution
for the read only statements than the modifying ones) would make the the
CPU overhead of the additional heap_hot_search_buffer() overhead
heavier.

>
> My memory was that it took months for people to understand the
> frequent update use case, since catching it in flagrante delicto was
> hard. That may be the case here, or not, but negative-benefit
> experimental results very welcome.
>
> Updated patch attached to address earlier comments.

contrib (at least pgstattuple) doesn't currently compile with
this... Easily patched up tho.

Greetings,

Andres Freund

--Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: Turning off HOT/Cleanup sometimes

From
Andres Freund
Date:
On 2014-09-29 11:13:43 +0200, Andres Freund wrote:
> I doubt it'll be hard to construct a case where it'll show. My first try
> of using a pgbench scale 100, -M prepared, -cj8 with a custom file with
> 1 write and 5 read transaction yielded the following on my laptop:

Hm. On second thought that testcase probably *decreased* the bad effects
of this because pgbench's readonly statements only touch one table...

patched:

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: Turning off HOT/Cleanup sometimes

From
Andres Freund
Date:
On 2014-09-29 11:31:11 +0200, Andres Freund wrote:
> On 2014-09-29 11:13:43 +0200, Andres Freund wrote:
> > I doubt it'll be hard to construct a case where it'll show. My first try
> > of using a pgbench scale 100, -M prepared, -cj8 with a custom file with
> > 1 write and 5 read transaction yielded the following on my laptop:
>
> Hm. On second thought that testcase probably *decreased* the bad effects
> of this because pgbench's readonly statements only touch one table...

I've attached the file nonetheless, for posterities sake.

Note it has the scale hardcoded...

Greetings,

Andres Freund

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachment

Re: Turning off HOT/Cleanup sometimes

From
Alvaro Herrera
Date:
What happened to this patch?  I'm going over something that could use
the concept of "clean some stuff up when reading this page, but only if
we're already writing" or similar.

I see some cases were presented that had a performance decrease.  Did we
get any numbers for the increase in performance in some other
interesting cases?

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: Turning off HOT/Cleanup sometimes

From
Simon Riggs
Date:
On 17 November 2014 21:09, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> What happened to this patch?  I'm going over something that could use
> the concept of "clean some stuff up when reading this page, but only if
> we're already writing" or similar.
>
> I see some cases were presented that had a performance decrease.  Did we
> get any numbers for the increase in performance in some other
> interesting cases?

It's not dead; it just needs more work. Maybe for next CF, or you can now.

-- Simon Riggs                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



Re: Turning off HOT/Cleanup sometimes

From
Simon Riggs
Date:
On 17 November 2014 at 22:08, Simon Riggs <simon@2ndquadrant.com> wrote:
> On 17 November 2014 21:09, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
>> What happened to this patch?  I'm going over something that could use
>> the concept of "clean some stuff up when reading this page, but only if
>> we're already writing" or similar.
>>
>> I see some cases were presented that had a performance decrease.  Did we
>> get any numbers for the increase in performance in some other
>> interesting cases?
>
> It's not dead; it just needs more work. Maybe for next CF, or you can now.

Latest version attached for next CF

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachment

Re: Turning off HOT/Cleanup sometimes

From
Jeff Janes
Date:
On Thu, Dec 11, 2014 at 7:24 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
On 17 November 2014 at 22:08, Simon Riggs <simon@2ndquadrant.com> wrote:
> On 17 November 2014 21:09, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
>> What happened to this patch?  I'm going over something that could use
>> the concept of "clean some stuff up when reading this page, but only if
>> we're already writing" or similar.
>>
>> I see some cases were presented that had a performance decrease.  Did we
>> get any numbers for the increase in performance in some other
>> interesting cases?
>
> It's not dead; it just needs more work. Maybe for next CF, or you can now.

Latest version attached for next CF


I still get the compiler error in contrib:

pgstattuple.c: In function 'pgstat_heap':
pgstattuple.c:279: error: too few arguments to function 'heap_beginscan_strat'

Should it pass false for the always_prune?  

Cheers,

Jeff

Re: Turning off HOT/Cleanup sometimes

From
Simon Riggs
Date:
On 15 December 2014 at 20:26, Jeff Janes <jeff.janes@gmail.com> wrote:

> I still get the compiler error in contrib:
>
> pgstattuple.c: In function 'pgstat_heap':
> pgstattuple.c:279: error: too few arguments to function
> 'heap_beginscan_strat'
>
> Should it pass false for the always_prune?

Yes.

New version attached.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachment

Re: Turning off HOT/Cleanup sometimes

From
Michael Paquier
Date:
<div dir="ltr"><br /><div class="gmail_extra"><br /><div class="gmail_quote">On Wed, Dec 17, 2014 at 5:39 PM, Simon
Riggs<span dir="ltr"><<a href="mailto:simon@2ndquadrant.com" target="_blank">simon@2ndquadrant.com</a>></span>
wrote:<br/><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span
class="">On15 December 2014 at 20:26, Jeff Janes <<a href="mailto:jeff.janes@gmail.com">jeff.janes@gmail.com</a>>
wrote:<br/><br /> > I still get the compiler error in contrib:<br /> ><br /> > pgstattuple.c: In function
'pgstat_heap':<br/> > pgstattuple.c:279: error: too few arguments to function<br /> > 'heap_beginscan_strat'<br
/>><br /> > Should it pass false for the always_prune?<br /><br /></span>Yes.<br /><br /> New version
attached.<br/></blockquote></div><br clear="all" /></div><div class="gmail_extra">Moved patch to CF 2015-02 with same
status"Needs review". It visibly needs more work, and numbers to show increase in performance while only cases showing
upperformance decrease showed up.<br /></div><div class="gmail_extra">-- <br /><div class="gmail_signature">Michael<br
/></div></div></div>

Re: Turning off HOT/Cleanup sometimes

From
Jeff Janes
Date:
On Wed, Dec 17, 2014 at 12:39 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
On 15 December 2014 at 20:26, Jeff Janes <jeff.janes@gmail.com> wrote:

> I still get the compiler error in contrib:
>
> pgstattuple.c: In function 'pgstat_heap':
> pgstattuple.c:279: error: too few arguments to function
> 'heap_beginscan_strat'
>
> Should it pass false for the always_prune?

Yes.

New version attached.

This no longer applies directly against head, but if I apply to an older checkout and then do "git checkout -m origin" it rolls forward cleanly.

Did versions 7 and 8 of this patch address Andres' concern about performance regressions?

Thanks,

Jeff

Re: Turning off HOT/Cleanup sometimes

From
Peter Eisentraut
Date:
On 12/17/14 3:39 AM, Simon Riggs wrote:
> On 15 December 2014 at 20:26, Jeff Janes <jeff.janes@gmail.com> wrote:
> 
>> I still get the compiler error in contrib:
>>
>> pgstattuple.c: In function 'pgstat_heap':
>> pgstattuple.c:279: error: too few arguments to function
>> 'heap_beginscan_strat'
>>
>> Should it pass false for the always_prune?
> 
> Yes.
> 
> New version attached.

README.HOT section "When can/should we prune or defragment?" needs a
major update as a result of this patch.



Re: Turning off HOT/Cleanup sometimes

From
Peter Eisentraut
Date:
On 3/9/15 1:36 PM, Jeff Janes wrote:
> Did versions 7 and 8 of this patch address Andres' concern about
> performance regressions?

I don't think so.  Andres basically wanted a nontrival algorithm to
determine how much pruning to do during a read-only scan.  And Robert
basically said, that's not really possible.

The presented patch actually has a hardcoded prune limit of 4 per scan,
which I don't see mentioned in the discussion anywhere (except in very
early versions, where this was exposed as a knob).

I think most people were of the opinion that scans on system catalogs
should not be affected by this behavior change.  Makes sense to me:
System catalog bloat is likely a bigger problem than speeding up queries
on catalogs with large live data.

And then there is still some disagreement whether just turning this on
is tolerable for all uses.  Andres mentioned workloads that have trouble
getting a cleanup lock.  README.HOT seems to think that cleaning up
during reads is important because skipping over dead tuples is
expensive.  Nobody seems to like the idea of (implicitly) pushing more
responsibility on VACUUM.  We have seen some benchmarks that show
significant improvements.  We have seen some (constructed ones) that
show problems.

I don't know how to move forward.  We could give users a knob: This
might make your queries faster or not -- good luck.  But of course
nobody will like that either.




Re: Turning off HOT/Cleanup sometimes

From
Andres Freund
Date:
On 2015-03-11 20:55:18 -0400, Peter Eisentraut wrote:
> I don't think so.  Andres basically wanted a nontrival algorithm to
> determine how much pruning to do during a read-only scan.  And Robert
> basically said, that's not really possible.

I don't think either of us made really strong statements.

> We have seen some benchmarks that show significant improvements.  We
> have seen some (constructed ones) that show problems.

FWIW, it's not that constructed. It's just a mixture of read with write
load.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: Turning off HOT/Cleanup sometimes

From
Peter Eisentraut
Date:
On 3/12/15 5:41 AM, Andres Freund wrote:
> On 2015-03-11 20:55:18 -0400, Peter Eisentraut wrote:
>> I don't think so.  Andres basically wanted a nontrival algorithm to
>> determine how much pruning to do during a read-only scan.  And Robert
>> basically said, that's not really possible.
> 
> I don't think either of us made really strong statements.

I didn't mean to put words in your mouth.  I just wanted to summarize
the thread as, Andres wanted more fine-tuning on the behavior, Robert
expressed serious doubts that that will lead to an acceptable result.




Re: Turning off HOT/Cleanup sometimes

From
Robert Haas
Date:
On Thu, Mar 12, 2015 at 3:48 PM, Peter Eisentraut <peter_e@gmx.net> wrote:
> On 3/12/15 5:41 AM, Andres Freund wrote:
>> On 2015-03-11 20:55:18 -0400, Peter Eisentraut wrote:
>>> I don't think so.  Andres basically wanted a nontrival algorithm to
>>> determine how much pruning to do during a read-only scan.  And Robert
>>> basically said, that's not really possible.
>>
>> I don't think either of us made really strong statements.
>
> I didn't mean to put words in your mouth.  I just wanted to summarize
> the thread as, Andres wanted more fine-tuning on the behavior, Robert
> expressed serious doubts that that will lead to an acceptable result.

Or to put that another way, I'm not sure there's one behavior here
that will please everybody.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Turning off HOT/Cleanup sometimes

From
Simon Riggs
Date:
On 11 March 2015 at 20:55, Peter Eisentraut <peter_e@gmx.net> wrote:

> I don't know how to move forward.  We could give users a knob: This
> might make your queries faster or not -- good luck.  But of course
> nobody will like that either.

What is clear is that large SELECT queries are doing the work VACUUM
should do. We should not be doing large background tasks (block
cleanup) during long running foreground tasks. But there is no need
for changing behaviour during small SELECTs. So the setting of 4 gives
current behaviour for small SELECTs and new behaviour for larger
SELECTs.

The OP said this...
<op>
We also make SELECT clean up blocks as it goes. That is useful in OLTP
workloads, but it means that large SQL queries and pg_dump effectively
do much the same work as VACUUM, generating huge amounts of I/O and
WAL on the master, the cost and annoyance of which is experienced
directly by the user. That is avoided on standbys.

Effects of that are that long running statements often run much longer
than we want, increasing bloat as a result. It also produces wildly
varying response times, depending upon extent of cleanup required.
</op>

This is not a performance patch. This is about one user doing the
cleanup work for another. People running large SELECTs should not be
penalised. The patch has been shown to avoid that and no further
discussion should be required.

I don't really care whether we have a parameter for this or not. As
long as we have the main feature.

It's trivial to add/remove a parameter to control this. Currently
there isn't one.

I'd like to commit this.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, RemoteDBA, Training & Services

Attachment

Re: Turning off HOT/Cleanup sometimes

From
David Steele
Date:
On 4/14/15 6:07 PM, Simon Riggs wrote:
> On 11 March 2015 at 20:55, Peter Eisentraut <peter_e@gmx.net> wrote:
>
>> I don't know how to move forward.  We could give users a knob: This
>> might make your queries faster or not -- good luck.  But of course
>> nobody will like that either.
>
> What is clear is that large SELECT queries are doing the work VACUUM
> should do. We should not be doing large background tasks (block
> cleanup) during long running foreground tasks. But there is no need
> for changing behaviour during small SELECTs. So the setting of 4 gives
> current behaviour for small SELECTs and new behaviour for larger
> SELECTs.
>
> The OP said this...
> <op>
> We also make SELECT clean up blocks as it goes. That is useful in OLTP
> workloads, but it means that large SQL queries and pg_dump effectively
> do much the same work as VACUUM, generating huge amounts of I/O and
> WAL on the master, the cost and annoyance of which is experienced
> directly by the user. That is avoided on standbys.
>
> Effects of that are that long running statements often run much longer
> than we want, increasing bloat as a result. It also produces wildly
> varying response times, depending upon extent of cleanup required.
> </op>
>
> This is not a performance patch. This is about one user doing the
> cleanup work for another. People running large SELECTs should not be
> penalised. The patch has been shown to avoid that and no further
> discussion should be required.
>
> I don't really care whether we have a parameter for this or not. As
> long as we have the main feature.
>
> It's trivial to add/remove a parameter to control this. Currently
> there isn't one.
>
> I'd like to commit this.

+1 from me.  One of the last databases I worked on had big raw
partitions that were written to and then sequentially scanned exactly
once before being dropped.  It was painful to see all those writes
happening for nothing.

In other cases there were sequential scans that happened directly after
the main writes, but then the next read might be days in the future (if
ever) and the system was basically idle for a while which would have
allowed vacuum to come in and do the job without affecting performance
of the main job.

I think that in batch-oriented databases this patch will definitely be a
boon to performance.

--
- David Steele
david@pgmasters.net


Re: Turning off HOT/Cleanup sometimes

From
Robert Haas
Date:
On Tue, Apr 14, 2015 at 6:07 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On 11 March 2015 at 20:55, Peter Eisentraut <peter_e@gmx.net> wrote:
>> I don't know how to move forward.  We could give users a knob: This
>> might make your queries faster or not -- good luck.  But of course
>> nobody will like that either.
>
> What is clear is that large SELECT queries are doing the work VACUUM
> should do. We should not be doing large background tasks (block
> cleanup) during long running foreground tasks. But there is no need
> for changing behaviour during small SELECTs. So the setting of 4 gives
> current behaviour for small SELECTs and new behaviour for larger
> SELECTs.

Peter commented previously that README.HOT should get an update.  The
relevant section seems to be "When can/should we prune or
defragment?".

I wonder if it would be a useful heuristic to still prune pages if
those pages are already dirty.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Turning off HOT/Cleanup sometimes

From
Simon Riggs
Date:
On 14 April 2015 at 21:53, Robert Haas <robertmhaas@gmail.com> wrote:

> Peter commented previously that README.HOT should get an update.  The
> relevant section seems to be "When can/should we prune or
> defragment?".

That's easy enough to change once we agree to commit.

> I wonder if it would be a useful heuristic to still prune pages if
> those pages are already dirty.

Useful for who? This is about responsibility. Why should someone
performing a large SELECT take the responsibility for cleaning pages?

-- Simon Riggs                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, RemoteDBA, Training &
Services



Re: Turning off HOT/Cleanup sometimes

From
Robert Haas
Date:
On Wed, Apr 15, 2015 at 3:37 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On 14 April 2015 at 21:53, Robert Haas <robertmhaas@gmail.com> wrote:
>> Peter commented previously that README.HOT should get an update.  The
>> relevant section seems to be "When can/should we prune or
>> defragment?".
>
> That's easy enough to change once we agree to commit.
>
>> I wonder if it would be a useful heuristic to still prune pages if
>> those pages are already dirty.
>
> Useful for who? This is about responsibility. Why should someone
> performing a large SELECT take the responsibility for cleaning pages?

Because it makes it subsequent accesses to the page cheaper.  Of
course, that applies in all cases, but when the page is already dirty,
the cost of pruning it is probably quite small - we're going to have
to write the page anyway, and pruning it before it gets evicted
(perhaps even by our scan) will be cheaper than writing it now and
writing it again after it's pruned.  When the page is clean, the cost
of pruning is significantly higher.

I won't take responsibility for paying my neighbor's tax bill, but I
might take responsibility for picking up his mail while he's on
holiday.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Turning off HOT/Cleanup sometimes

From
Simon Riggs
Date:
On 15 April 2015 at 08:04, Robert Haas <robertmhaas@gmail.com> wrote:
> On Wed, Apr 15, 2015 at 3:37 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> On 14 April 2015 at 21:53, Robert Haas <robertmhaas@gmail.com> wrote:
>>> Peter commented previously that README.HOT should get an update.  The
>>> relevant section seems to be "When can/should we prune or
>>> defragment?".
>>
>> That's easy enough to change once we agree to commit.
>>
>>> I wonder if it would be a useful heuristic to still prune pages if
>>> those pages are already dirty.
>>
>> Useful for who? This is about responsibility. Why should someone
>> performing a large SELECT take the responsibility for cleaning pages?
>
> Because it makes it subsequent accesses to the page cheaper.

Cheaper for whom?

> Of
> course, that applies in all cases, but when the page is already dirty,
> the cost of pruning it is probably quite small - we're going to have
> to write the page anyway, and pruning it before it gets evicted
> (perhaps even by our scan) will be cheaper than writing it now and
> writing it again after it's pruned.  When the page is clean, the cost
> of pruning is significantly higher.

"We" aren't going to have to write the page, but someone will.

In a single workload, the mix of actions can be useful. In separate
workloads, where some guy just wants to run a report or a backup, its
not right that we slow them down because of someone else's actions.

> I won't take responsibility for paying my neighbor's tax bill, but I
> might take responsibility for picking up his mail while he's on
> holiday.

That makes it sound like this is an occasional, non-annoying thing.

It's more like, whoever fetches the mail needs to fetch it for
everybody. So we are slowing down one person disproportionately, while
others fly through without penalty. There is no argument that one
workload necessarily needs to perform that on behalf of the other
workload.

The actions you suggest are reasonable and should ideally be the role
of a background process. But that doesn't mean in the absence of that
we should pay the cost in the foreground.

Let me apply this patch.

-- Simon Riggs                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, RemoteDBA, Training &
Services



Re: Turning off HOT/Cleanup sometimes

From
Andres Freund
Date:
On 2015-04-15 08:42:33 -0400, Simon Riggs wrote:
> > Because it makes it subsequent accesses to the page cheaper.
> 
> Cheaper for whom?

Everyone. Including further readers. Following HOT chains in read mostly
workloads can be really expensive. If you have workloads with a 'hot'
value range that's frequently updated, but that range moves you can
easily end up with heavily chained tuples which won't soon be touched by
a writer again.

And writers will often not yet be able to prune the page because there's
still live readers for the older versions (like other updaters).

> > Of
> > course, that applies in all cases, but when the page is already dirty,
> > the cost of pruning it is probably quite small - we're going to have
> > to write the page anyway, and pruning it before it gets evicted
> > (perhaps even by our scan) will be cheaper than writing it now and
> > writing it again after it's pruned.  When the page is clean, the cost
> > of pruning is significantly higher.
> 
> "We" aren't going to have to write the page, but someone will.

If it's already dirty that doesn't change at all. *Not* pruning in that
moment actually will often *increase* the total amount of writes to the
OS. Because now the pruning will happen on the next write access or
vacuum - when the page already might have been undirtied.

I don't really see the downside to this suggestion.

> The actions you suggest are reasonable and should ideally be the role
> of a background process. But that doesn't mean in the absence of that
> we should pay the cost in the foreground.

I'm not sure that's true. A background process will either cause
additional read IO to find worthwhile pages, or it'll not find
worthwhile pages because they're already paged out.

Greetings,

Andres Freund



Re: Turning off HOT/Cleanup sometimes

From
Simon Riggs
Date:
On 15 April 2015 at 09:10, Andres Freund <andres@anarazel.de> wrote:
> On 2015-04-15 08:42:33 -0400, Simon Riggs wrote:
>> > Because it makes it subsequent accesses to the page cheaper.
>>
>> Cheaper for whom?
>
> Everyone.

I think what you mean is "Everyone else". It is demonstrably quicker
and more consistent for a process when it limits the amount of pruning
it does, as well as the fact that it causes additional WAL traffic
when it does so, causing replication lag.

I love it when someone cleans up for me. I just don't think they'll
accept the argument that they should clean up for me because it makes
their life easier.   Certainly doesn't work with my kids.


> I don't really see the downside to this suggestion.

The suggestion makes things better than they are now but is still less
than I have proposed.

If what you both mean is "IMHO this is an acceptable compromise", I
can accept it also, at this point in the CF.

-- Simon Riggs                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, RemoteDBA, Training &
Services



Re: Turning off HOT/Cleanup sometimes

From
Alvaro Herrera
Date:
Simon Riggs wrote:
> On 15 April 2015 at 09:10, Andres Freund <andres@anarazel.de> wrote:

> > I don't really see the downside to this suggestion.
> 
> The suggestion makes things better than they are now but is still less
> than I have proposed.
> 
> If what you both mean is "IMHO this is an acceptable compromise", I
> can accept it also, at this point in the CF.

Let me see if I understand things.

What we have now is: when reading a page, we also HOT-clean it.  This
runs HOT-cleanup a large number of times, and causes many pages to
become dirty.

Your patch is "when reading a page, HOT-clean it, but only 5 times in
each scan".  This runs HOT-cleanup at most 5 times, and causes at most 5
pages to become dirty.

Robert's proposal is "when reading a page, if dirty HOT-clean it; if not
dirty, also HOT-clean it but only 5 times in each scan".  This runs
HOT-cleanup some number of times (as many as there are dirty), and
causes at most 5 pages to become dirty.


Am I right in thinking that HOT-clean in a dirty page is something that
runs completely within CPU cache?  If so, it would be damn fast and
would have benefits for future readers, for very little cost.

Dirtying a page is very different; if buffer reads are common, the
system is later bogged down trying to find clean pages to read uncached
buffers (including the read-only scan itself, so it becomes slower.)


If I have understood things correctly, then I stand behind Robert's
suggestion.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Turning off HOT/Cleanup sometimes

From
Heikki Linnakangas
Date:
On 04/15/2015 05:44 PM, Alvaro Herrera wrote:
> Simon Riggs wrote:
>> On 15 April 2015 at 09:10, Andres Freund <andres@anarazel.de> wrote:
>
>>> I don't really see the downside to this suggestion.
>>
>> The suggestion makes things better than they are now but is still less
>> than I have proposed.
>>
>> If what you both mean is "IMHO this is an acceptable compromise", I
>> can accept it also, at this point in the CF.
>
> Let me see if I understand things.
>
> What we have now is: when reading a page, we also HOT-clean it.  This
> runs HOT-cleanup a large number of times, and causes many pages to
> become dirty.
>
> Your patch is "when reading a page, HOT-clean it, but only 5 times in
> each scan".  This runs HOT-cleanup at most 5 times, and causes at most 5
> pages to become dirty.
>
> Robert's proposal is "when reading a page, if dirty HOT-clean it; if not
> dirty, also HOT-clean it but only 5 times in each scan".  This runs
> HOT-cleanup some number of times (as many as there are dirty), and
> causes at most 5 pages to become dirty.
>
>
> Am I right in thinking that HOT-clean in a dirty page is something that
> runs completely within CPU cache?  If so, it would be damn fast and
> would have benefits for future readers, for very little cost.

If there are many tuples on the page, it takes some CPU effort to scan 
all the HOT chains and move tuples around. Also, it creates a WAL 
record, which isn't free.

Another question is whether the patch can reliably detect whether it's 
doing a "read-only" scan or not. I haven't tested, but I suspect it'd 
not do pruning when you do something like "INSERT INTO foo SELECT * FROM 
foo WHERE blah". I.e. when the target relation is referenced twice in 
the same statement: once as the target, and second time as a source. 
Maybe that's OK, though.

- Heikki




Re: Turning off HOT/Cleanup sometimes

From
Alvaro Herrera
Date:
Heikki Linnakangas wrote:
> On 04/15/2015 05:44 PM, Alvaro Herrera wrote:

> >Robert's proposal is "when reading a page, if dirty HOT-clean it; if not
> >dirty, also HOT-clean it but only 5 times in each scan".  This runs
> >HOT-cleanup some number of times (as many as there are dirty), and
> >causes at most 5 pages to become dirty.
> >
> >
> >Am I right in thinking that HOT-clean in a dirty page is something that
> >runs completely within CPU cache?  If so, it would be damn fast and
> >would have benefits for future readers, for very little cost.
> 
> If there are many tuples on the page, it takes some CPU effort to scan all
> the HOT chains and move tuples around. Also, it creates a WAL record, which
> isn't free.

But if the page is in CPU cache, the CPU effort shouldn't be all that
noticeable, should it?  That's my point, but then maybe I'm wrong.  Now,
the WAL logging is annoying, so let's limit that too -- do it at most
for, say, 20 dirty pages and at most 5 clean pages.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Turning off HOT/Cleanup sometimes

From
Heikki Linnakangas
Date:
On 04/15/2015 07:11 PM, Alvaro Herrera wrote:
> Heikki Linnakangas wrote:
>> On 04/15/2015 05:44 PM, Alvaro Herrera wrote:
>
>>> Robert's proposal is "when reading a page, if dirty HOT-clean it; if not
>>> dirty, also HOT-clean it but only 5 times in each scan".  This runs
>>> HOT-cleanup some number of times (as many as there are dirty), and
>>> causes at most 5 pages to become dirty.
>>>
>>>
>>> Am I right in thinking that HOT-clean in a dirty page is something that
>>> runs completely within CPU cache?  If so, it would be damn fast and
>>> would have benefits for future readers, for very little cost.
>>
>> If there are many tuples on the page, it takes some CPU effort to scan all
>> the HOT chains and move tuples around. Also, it creates a WAL record, which
>> isn't free.
>
> But if the page is in CPU cache, the CPU effort shouldn't be all that
> noticeable, should it?  That's my point, but then maybe I'm wrong.  Now,
> the WAL logging is annoying, so let's limit that too -- do it at most
> for, say, 20 dirty pages and at most 5 clean pages.

There isn't much difference between that and just doing it on first 5 
pages. Both of those numbers were pulled out of thin air, anyway. I'd 
rather just keep it simple.

- Heikki



Re: Turning off HOT/Cleanup sometimes

From
Robert Haas
Date:
On Wed, Apr 15, 2015 at 8:42 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> I won't take responsibility for paying my neighbor's tax bill, but I
>> might take responsibility for picking up his mail while he's on
>> holiday.
>
> That makes it sound like this is an occasional, non-annoying thing.
>
> It's more like, whoever fetches the mail needs to fetch it for
> everybody. So we are slowing down one person disproportionately, while
> others fly through without penalty. There is no argument that one
> workload necessarily needs to perform that on behalf of the other
> workload.

Sure there is.  It's called a tragedy of the commons - everybody acts
in their own selfish interest (it's not *my* responsibility to limit
grazing on public land, or prune this page that I'm not modifying) and
as a result some resource that everybody cares about (grass,
system-wide I/O) gets trashed to everyone's detriment.  Purely selfish
behavior can only be justified here if we assume that the selfish
actor intends to participate in the system only once: I'm going to run
one big reporting query which must run as fast as possible, and then
I'm getting on a space ship to Mars.  So if my refusal to do any
pruning during that reporting query causes lots of extra I/O on the
system ten minutes from now, I don't care, because I'll have left the
playing field forever at that point.

As Heikki points out, any HOT pruning operation generates WAL and has
a CPU cost.  However, pruning a page that is currently dirty
*decreases* the total volume of writes to the data files, whereas
pruning a page that is currently clean *increases* the total volume of
writes to the data files.  In the first case, if we prune the page
right now while it's still dirty, we can't possibly cause any
additional data-file writes, and we may save one, because it's
possible that someone else would later have pruned it when it was
clean and there was no other reason to dirty it.  In the second case,
if we prune the page that is currently clean, it will become dirty.
That will cost us no additional I/O if the page is again modified
before it's written out, but otherwise it costs an additional data
file write.  I think there's a big difference between those two cases.
Sure, from the narrow point of view of how much work it takes this
scan to process this page, it's always better not to prune.  But if
you make the more realistic assumption that you will keep on issuing
queries on the system, then what you're doing to the overall system
I/O load is pretty important.

By the way, was anything ever done about this:

http://www.postgresql.org/message-id/20140929091343.GA4716@alap3.anarazel.de

That's just a workload that is 5/6th pgbench -S and 1/6th pgbench,
which is in no way an unrealistic workload, and showed a significant
regression with an earlier version of the patch.  You seem very eager
to commit this patch after four months of inactivity, but I think this
is a pretty massive behavior change that deserves careful scrutiny
before it goes in.  If we push something that changes longstanding
behavior and can't even be turned off, and it regresses behavior for a
use case that common, our users are going to come after us with
pitchforks.  That's not to say some people won't be happy, but in my
experience it takes a lot of happy users to make up for getting
stabbed with even one pitchfork.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Turning off HOT/Cleanup sometimes

From
Simon Riggs
Date:
On 15 April 2015 at 12:39, Robert Haas <robertmhaas@gmail.com> wrote:
> On Wed, Apr 15, 2015 at 8:42 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>>> I won't take responsibility for paying my neighbor's tax bill, but I
>>> might take responsibility for picking up his mail while he's on
>>> holiday.
>>
>> That makes it sound like this is an occasional, non-annoying thing.
>>
>> It's more like, whoever fetches the mail needs to fetch it for
>> everybody. So we are slowing down one person disproportionately, while
>> others fly through without penalty. There is no argument that one
>> workload necessarily needs to perform that on behalf of the other
>> workload.
>
> Sure there is.  It's called a tragedy of the commons - everybody acts
> in their own selfish interest (it's not *my* responsibility to limit
> grazing on public land, or prune this page that I'm not modifying) and
> as a result some resource that everybody cares about (grass,
> system-wide I/O) gets trashed to everyone's detriment.  Purely selfish
> behavior can only be justified here if we assume that the selfish
> actor intends to participate in the system only once: I'm going to run
> one big reporting query which must run as fast as possible, and then
> I'm getting on a space ship to Mars.  So if my refusal to do any
> pruning during that reporting query causes lots of extra I/O on the
> system ten minutes from now, I don't care, because I'll have left the
> playing field forever at that point.

It all depends upon who is being selfish. Why is a user "selfish" for
not wanting to clean every single block they scan, when the people
that made the mess do nothing and go faster 10 minutes from now?
Randomly and massively penalising large SELECTs makes no sense. Some
cleanup is OK, with reasonable limits, which is why that is proposed.

On 04/15/2015 05:44 PM, Alvaro Herrera wrote:
> Robert's proposal is "when reading a page, if dirty HOT-clean it; if not
> dirty, also HOT-clean it but only 5 times in each scan".  This runs
> HOT-cleanup some number of times (as many as there are dirty), and
> causes at most 5 pages to become dirty.

My understanding of Robert's proposal was "when reading a page,
HOT-clean it, but only do this up to 5 times on clean pages, but
continue to do this indefinitely when the page is already dirty.".
Andres said that was the only way and I have agreed to it.

Are you now saying not to commit your proposal at all?

-- Simon Riggs                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, RemoteDBA, Training &
Services



Re: Turning off HOT/Cleanup sometimes

From
Alvaro Herrera
Date:
Simon Riggs wrote:
> On 15 April 2015 at 12:39, Robert Haas <robertmhaas@gmail.com> wrote:

> On 04/15/2015 05:44 PM, Alvaro Herrera wrote:
> > Robert's proposal is "when reading a page, if dirty HOT-clean it; if not
> > dirty, also HOT-clean it but only 5 times in each scan".  This runs
> > HOT-cleanup some number of times (as many as there are dirty), and
> > causes at most 5 pages to become dirty.
> 
> My understanding of Robert's proposal was "when reading a page,
> HOT-clean it, but only do this up to 5 times on clean pages, but
> continue to do this indefinitely when the page is already dirty.".

To me, both statements look identical.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Turning off HOT/Cleanup sometimes

From
Simon Riggs
Date:
On 15 April 2015 at 16:01, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> Simon Riggs wrote:
>> On 15 April 2015 at 12:39, Robert Haas <robertmhaas@gmail.com> wrote:
>
>> On 04/15/2015 05:44 PM, Alvaro Herrera wrote:
>> > Robert's proposal is "when reading a page, if dirty HOT-clean it; if not
>> > dirty, also HOT-clean it but only 5 times in each scan".  This runs
>> > HOT-cleanup some number of times (as many as there are dirty), and
>> > causes at most 5 pages to become dirty.
>>
>> My understanding of Robert's proposal was "when reading a page,
>> HOT-clean it, but only do this up to 5 times on clean pages, but
>> continue to do this indefinitely when the page is already dirty.".
>
> To me, both statements look identical.

I didn't read it that way, apologies for any confusion. But that is good news.

-- Simon Riggs                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, RemoteDBA, Training &
Services



Re: Turning off HOT/Cleanup sometimes

From
Peter Geoghegan
Date:
On Wed, Apr 15, 2015 at 6:10 AM, Andres Freund <andres@anarazel.de> wrote:
>> > Of
>> > course, that applies in all cases, but when the page is already dirty,
>> > the cost of pruning it is probably quite small - we're going to have
>> > to write the page anyway, and pruning it before it gets evicted
>> > (perhaps even by our scan) will be cheaper than writing it now and
>> > writing it again after it's pruned.  When the page is clean, the cost
>> > of pruning is significantly higher.
>>
>> "We" aren't going to have to write the page, but someone will.
>
> If it's already dirty that doesn't change at all. *Not* pruning in that
> moment actually will often *increase* the total amount of writes to the
> OS. Because now the pruning will happen on the next write access or
> vacuum - when the page already might have been undirtied.
>
> I don't really see the downside to this suggestion.

+1. I think, in general, the opportunity cost of not pruning when a
page is already dirty is likely to be rather high. In general, it's
likely to be worth it.


-- 
Peter Geoghegan



Re: Turning off HOT/Cleanup sometimes

From
Greg Stark
Date:
<p dir="ltr"><br /> On 15 Apr 2015 15:43, "Simon Riggs" <<a
href="mailto:simon@2ndquadrant.com">simon@2ndquadrant.com</a>>wrote:<br /> ><br /> > It all depends upon who
isbeing selfish. Why is a user "selfish" for<br /> > not wanting to clean every single block they scan, when the
people<br/> > that made the mess do nothing and go faster 10 minutes from now?<br /> > Randomly and massively
penalisinglarge SELECTs makes no sense. Some<br /> > cleanup is OK, with reasonable limits, which is why that is
proposed.<pdir="ltr">I don't think it's productive to think of a query as a different actor with only an interest in
itsown performance and no interest in overall system performance.<p dir="ltr">From a holistic point of view the
questionis how many times is a given hit chain going to need to be followed before it's pruned. Or to put it another
way,how expensive is creating a hot chain. Does it cause a single prune? a fixed number of chain readers followed by a
prune?Does the amount of work depend on the workload or is it consistent?<p dir="ltr">My intuition is that a fixed
cutofflike "five pages" is dangerous because if you update many pages there's no limit to the number of times they'll
beread before they're all pruned. The steady state could easily be that every query is having to read hot chains
forever.<pdir="ltr">My intuition, again, is that what we need is a percentage such as "do 10 prunes then ignore the
next1000 clean pages with hot chains. That guarantees that after 100 selects the hot chains will all be pruned but each
selectwill only prune 1% of the clean pages it sees. 

Re: Turning off HOT/Cleanup sometimes

From
Pavan Deolasee
Date:


On Thu, Apr 16, 2015 at 2:47 PM, Greg Stark <stark@mit.edu> wrote:


On 15 Apr 2015 15:43, "Simon Riggs" <simon@2ndquadrant.com> wrote:
>
> It all depends upon who is being selfish. Why is a user "selfish" for
> not wanting to clean every single block they scan, when the people
> that made the mess do nothing and go faster 10 minutes from now?
> Randomly and massively penalising large SELECTs makes no sense. Some
> cleanup is OK, with reasonable limits, which is why that is proposed.

I don't think it's productive to think of a query as a different actor with only an interest in its own performance and no interest in overall system performance.

From a holistic point of view the question is how many times is a given hit chain going to need to be followed before it's pruned. Or to put it another way, how expensive is creating a hot chain. Does it cause a single prune? a fixed number of chain readers followed by a prune? Does the amount of work depend on the workload or is it consistent?


IMO the size or traversal of the HOT chain is not that expensive compared to the cost of either pruning too frequently, which generates WAL as well as makes buffers dirty. OTOH cost of less frequent pruning could also be very high. It can cause severe table bloat which may just stay for a very long time. Even if dead space is recovered within a page, truncating a bloated heap is not always possible. In such cases, even SELECTs would be slowed down just because they need to read/scan far more pages than they otherwise would have. IOW its probably wrong to assume that not-pruning quickly enough will have impact only on the non-SELECT queries.

I also concur with arguments upthread that this change needs to be carefully calibrated because it can lead to significant degradation for certain workloads.

My intuition, again, is that what we need is a percentage such as "do 10 prunes then ignore the next 1000 clean pages with hot chains. That guarantees that after 100 selects the hot chains will all be pruned but each select will only prune 1% of the clean pages it sees.

I think some such proposal was made in the last. There could be knob to control how much a read-only query (or may be a read-only transaction) should do HOT cleanup, say as a percentage of pages it looks at. The default can be left at 100% in the first release so that the current behaviour is not suddenly disrupted. But it will allow others to play with the percentages and then based on field reports, we can change defaults in the next releases.

Thanks,
Pavan

--

Re: Turning off HOT/Cleanup sometimes

From
Alvaro Herrera
Date:
Pavan Deolasee wrote:
> On Thu, Apr 16, 2015 at 2:47 PM, Greg Stark <stark@mit.edu> wrote:

> > From a holistic point of view the question is how many times is a given
> > hit chain going to need to be followed before it's pruned. Or to put it
> > another way, how expensive is creating a hot chain. Does it cause a single
> > prune? a fixed number of chain readers followed by a prune? Does the amount
> > of work depend on the workload or is it consistent?
> 
> IMO the size or traversal of the HOT chain is not that expensive compared
> to the cost of either pruning too frequently, which generates WAL as well
> as makes buffers dirty. OTOH cost of less frequent pruning could also be
> very high. It can cause severe table bloat which may just stay for a very
> long time. Even if dead space is recovered within a page, truncating a
> bloated heap is not always possible.

I think you're failing to consider that in the patch there is a
distinction between read-only page accesses and page updates.  During a
page update, HOT cleanup is always done even with the patch, so there
won't be any additional bloat that would not be there without the patch.
It's only the read-only accesses to the patch that skip the HOT pruning.

Of course, as Greg says there will be some additional scans of the HOT
chain by read-only processes.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Turning off HOT/Cleanup sometimes

From
Pavan Deolasee
Date:


On Thu, Apr 16, 2015 at 6:50 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
Pavan Deolasee wrote:
> On Thu, Apr 16, 2015 at 2:47 PM, Greg Stark <stark@mit.edu> wrote:

> > From a holistic point of view the question is how many times is a given
> > hit chain going to need to be followed before it's pruned. Or to put it
> > another way, how expensive is creating a hot chain. Does it cause a single
> > prune? a fixed number of chain readers followed by a prune? Does the amount
> > of work depend on the workload or is it consistent?
>
> IMO the size or traversal of the HOT chain is not that expensive compared
> to the cost of either pruning too frequently, which generates WAL as well
> as makes buffers dirty. OTOH cost of less frequent pruning could also be
> very high. It can cause severe table bloat which may just stay for a very
> long time. Even if dead space is recovered within a page, truncating a
> bloated heap is not always possible.

I think you're failing to consider that in the patch there is a
distinction between read-only page accesses and page updates.  During a
page update, HOT cleanup is always done even with the patch, so there
won't be any additional bloat that would not be there without the patch.
It's only the read-only accesses to the patch that skip the HOT pruning.


Ah, Ok. I'd not read the patch. But now that I do, I feel much more comfortable with the change. In fact, I wonder if its just enough to either do full HOT prune for target relations and not at all for all other relations involved in the query. My apologies if this is done based on discussions upthread. I haven't read the entire thread yet.

Thanks,

Re: Turning off HOT/Cleanup sometimes

From
Andres Freund
Date:
On 2015-04-16 10:20:20 -0300, Alvaro Herrera wrote:
> I think you're failing to consider that in the patch there is a
> distinction between read-only page accesses and page updates.  During a
> page update, HOT cleanup is always done even with the patch, so there
> won't be any additional bloat that would not be there without the
> patch.

That's not really true (and my benchmark upthread proves it). The fact
that hot pruning only happens when we can get a cleanup lock means that
we can end up with more pages that are full, if we prune on select less
often. Especially if SELECTs are more frequent than write accesses -
pretty darn common - the likelihood of SELECTs getting the lock is
correspondingly higher.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: Turning off HOT/Cleanup sometimes

From
Simon Riggs
Date:
On 16 April 2015 at 15:21, Andres Freund <andres@anarazel.de> wrote:
On 2015-04-16 10:20:20 -0300, Alvaro Herrera wrote:
> I think you're failing to consider that in the patch there is a
> distinction between read-only page accesses and page updates.  During a
> page update, HOT cleanup is always done even with the patch, so there
> won't be any additional bloat that would not be there without the
> patch.

That's not really true (and my benchmark upthread proves it). The fact
that hot pruning only happens when we can get a cleanup lock means that
we can end up with more pages that are full, if we prune on select less
often. Especially if SELECTs are more frequent than write accesses -
pretty darn common - the likelihood of SELECTs getting the lock is
correspondingly higher.

Your point that we *must* do *some* HOT cleanup on SELECTs is proven beyond question. Alvaro has not disputed that, ISTM you misread that. Pavan has questioned that point but the results upthread are there, he explains he hasn't read that yet. 

The only question is "how much cleanup on SELECT"? Having one SELECT hit 10,000 cleanups while another hits 0 creates an unfairness and unpredictability in the way we work. Maybe some people running a backup actually like the fact it cleans the database; others think that is a bad thing. Few people issuing large queries think it is good behaviour. Anybody running replication also knows that this causes a huge slam of WAL which can increase replication delay, which is a concern for HA.

That is how we arrive at the idea of a cleanup limit, further enhanced by a limit that applies only to dirtying clean blocks, which we have 4? recent votes in favour of.

I would personally be in favour of a parameter to control the limit, since whatever we chose is right/wrong depending upon circumstances. I am however comfortable with not having a parameter if people think it is hard to tune that, which I agree it would be, hence no parameter in the patch.
 

Re: Turning off HOT/Cleanup sometimes

From
Alvaro Herrera
Date:
Andres Freund wrote:
> On 2015-04-16 10:20:20 -0300, Alvaro Herrera wrote:
> > I think you're failing to consider that in the patch there is a
> > distinction between read-only page accesses and page updates.  During a
> > page update, HOT cleanup is always done even with the patch, so there
> > won't be any additional bloat that would not be there without the
> > patch.
> 
> That's not really true (and my benchmark upthread proves it). The fact
> that hot pruning only happens when we can get a cleanup lock means that
> we can end up with more pages that are full, if we prune on select less
> often. Especially if SELECTs are more frequent than write accesses -
> pretty darn common - the likelihood of SELECTs getting the lock is
> correspondingly higher.

Interesting point.  Of course, this code should count HOT cleanups
against the total limit when they are effectively carried out, and
ignore those that are skipped because of inability to acquire the
cleanup lock.  Not sure whether the submitted code does that.

Can we keep stats on how many pages we don't clean in the updating
process due to failure to acquire cleanup lock?  My intuition says that
it should be similar to the number of backends running concurrently,
but that might be wrong.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Turning off HOT/Cleanup sometimes

From
Jeff Janes
Date:
<p dir="ltr"><br /> On Mon, Sep 29, 2014 at 2:13 AM, Andres Freund <<a
href="mailto:andres@anarazel.de">andres@anarazel.de</a>>wrote:<br /> ><br /> > On 2014-09-28 19:51:36 +0100,
SimonRiggs wrote:<br /> > > On 27 September 2014 09:29, Andres Freund <<a
href="mailto:andres@anarazel.de">andres@anarazel.de</a>>wrote:<br /> > > > On 2014-09-27 10:23:33 +0300,
HeikkiLinnakangas wrote:<br /> > > >> This patch has gotten a fair amount of review, and has been rewritten
once<br/> > > >> during the commitfest. I think it's pretty close to being committable, the<br /> > >
>>only remaining question seems to be what to do with system catalogs. I'm<br /> > > >> marking this
as"Returned with feedback", I take it that Simon can proceed<br /> > > >> from here, outside the
commitfest.<br/> > > ><br /> > > > FWIW, I don't think it is, even with that. As is it seems very
likely<br/> > > > that it's going to regress a fair share of workloads. At the very least<br /> > > >
itneeds a fair amount of benchmarking beforehand.<br /> > ><br /> > > There is some doubt there. We've not
seena workload that does<br /> > > actually exhibit a negative behaviour.<br /> ><br /> > Neither is there
muchdata about the magnitude of positive effect the<br /> > patch has...<br /> ><br /> > > I'm not saying
onedoesn't exist, but it does matter how common/likely<br /> > > it is. If anyone can present a performance test
casethat demonstrates<br /> > > a regression, I think it will make it easier to discuss how wide that<br /> >
>case is and what we should do about it. Discussing whether to do<br /> > > various kinds of limited pruning
aremoot until that is clear.<br /> ><br /> > I doubt it'll be hard to construct a case where it'll show. My first
try<br/> > of using a pgbench scale 100, -M prepared, -cj8 with a custom file with<br /> > 1 write and 5 read
transactionyielded the following on my laptop:<br /> ><br /> > Baseline:<br /> >  relname                |
pgbench_tellers<br/> >  pg_total_relation_size | 458752<br /> >  relname                | pgbench_accounts<br />
> pg_total_relation_size | 1590337536<br /> >  relname                | pgbench_branches<br /> >
 pg_total_relation_size| 286720<br /> >  relname                | pgbench_history<br /> >  pg_total_relation_size
|49979392<br /> > Patched:<br /> >  relname                | pgbench_tellers<br /> >  pg_total_relation_size |
516096<br/> >  relname                | pgbench_accounts<br /> >  pg_total_relation_size | 1590337536<br /> >
 relname               | pgbench_branches<br /> >  pg_total_relation_size | 360448<br /> >  relname             
 | pgbench_history<br /> >  pg_total_relation_size | 49528832<br /> ><br /> > So, there's a noticeable
increasein size. Mostly on the smaller tables,<br /> > so probably HOT cleanup was sometimes skipped during UPDATEs
dueto<br /> > locks.<br /> ><br /> > Baseline was:<br /> > tps = 9655.486532 (excluding connections
establishing)<br/> > Patched was:<br /> > tps = 9466.158701 (including connections establishing)<br /><p
dir="ltr">Wasthis reproducible?  I've run your custom sql file with 4 clients (that is how many CPUs I have) on a
machine<br/> with a BBU.  I had wal_level = hot_standby, but the archive_command just returned true without archiving
anything.And using the latest patch.<p dir="ltr">The size of the pgbench_tellers and pgbench_branches relations were
surprisinglyvariable in both patched and unpatched, but there was no reliable difference between them, just within
them.<pdir="ltr">On the TPS front, there was a hint that patched one was slightly slower but the within sample
variationwas also high, and the p-val for difference was only 0.214 on n of 66.<br />  <br /> test case attached.<p
dir="ltr">>That's not a unrealistic testcase.<br /> ><br /> > I'm pretty sure this could be made quite a bit
morepronounced by not<br /> > using a uniform distribution in the pgbench runs. And selecting a test<br /> >
that'smore vulnerable to the change (e.g. using a wider distribution<br /> > for the read only statements than the
modifyingones) would make the the<br /> > CPU overhead of the additional heap_hot_search_buffer() overhead<br />
>heavier.<br /><p dir="ltr">Sorry I don't understand this description.  Why would queries selecting data that is not
changinghave any extra overhead?<p dir="ltr">Is the idea that the hot part of the table for updates would move around
overtime, but the hot part for selects would be even throughout?  I'm not sure how to put that to the test.<p
dir="ltr">Cheers,<pdir="ltr">Jeff<p dir="ltr">  

Re: Turning off HOT/Cleanup sometimes

From
Andres Freund
Date:
On 2015-04-20 01:04:18 -0700, Jeff Janes wrote:
> Was this reproducible?

Yes, at least with an old version of the patch.

I don't think you could see a difference using exactly that with the
newer versions which have the 5 page limit. After all it'll pretty much
never reach it.

> > That's not a unrealistic testcase.
> >
> > I'm pretty sure this could be made quite a bit more pronounced by not
> > using a uniform distribution in the pgbench runs. And selecting a test
> > that's more vulnerable to the change (e.g. using a wider distribution
> > for the read only statements than the modifying ones) would make the the
> > CPU overhead of the additional heap_hot_search_buffer() overhead
> > heavier.
>
> Sorry I don't understand this description.  Why would queries selecting
> data that is not changing have any extra overhead?

The idea, I think, was that by having a uniform (or just wider)
distribution of the reads they'd be more likely to land on values that
have been updated at some point, but not been pruned since (because at
that point the patch IIRC didn't prune during reads at all). I.e. ones
wer

> Is the idea that the hot part of the table for updates would move around
> over time, but the hot part for selects would be even throughout?

Pretty much.

> I'm not sure how to put that to the test.

That pretty much was what I'd tried to model, yea. I guess it'd be
possible to model this by inserting NOW()/updating values NOW() - 5 and
selecting values up to NOW() - 60. That'd roughly model some realistic
insert/update/select patterns I've seen.

To possibly see any difference with the new patch this would have to be
done in a way that regularly a couple of pages would be touched, with
not that many selected tuples on each.

Greetings,

Andres Freund



Re: Turning off HOT/Cleanup sometimes

From
Bruce Momjian
Date:
On Thu, Apr 16, 2015 at 03:41:54PM +0100, Simon Riggs wrote:
> That is how we arrive at the idea of a cleanup limit, further enhanced by a
> limit that applies only to dirtying clean blocks, which we have 4? recent votes
> in favour of.
> 
> I would personally be in favour of a parameter to control the limit, since
> whatever we chose is right/wrong depending upon circumstances. I am however
> comfortable with not having a parameter if people think it is hard to tune
> that, which I agree it would be, hence no parameter in the patch.

I think the limit has to be in terms of a percentage of the table size. 
For example, if we do one SELECT on a table with all non-dirty pages, it
would be good to know that 5% of the pages were pruned --- that tells me
that another 19 SELECTs will totally prune the table, assuming no future
writes.  If there are future writes, they would dirty the pages and
cause even more pruning, but the 5% gives me the maximum pruning number
of SELECTs.  If there aren't another 19 SELECTs, do I care if the table
is pruned or not?  Probably not.  Measuring in page count doesn't do
that, and a large table could receive millions of selects before being
fully cleaned.

Also, I am also not sure we should be designing features at this stage
in our release process.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + Everyone has their own god. +



Re: Turning off HOT/Cleanup sometimes

From
Simon Riggs
Date:
On 20 April 2015 at 18:33, Bruce Momjian <bruce@momjian.us> wrote:
 
Also, I am also not sure we should be designing features at this stage
in our release process.

I see this more as a process of gaining approval. I don't think patches at the back of the queue should get the "its too late treatment" just because they are at the back of the queue. They are logically all at the same stage. There should be a way to allow people that show patience and respect for the process to get the same timeshare as those that push their patches daily.

Anyway, in this case, the patch conflicts with other things going in now, so changing things isn't really sensible for this release, in terms of my time.

We clearly need to do a better job of piggybacking actions on a dirty block. The discussion has been about whether to do early cleanup, but we should also consider post-access cleanup if we set hint bits or dirtied the block in other ways.

Since we have many votes in favour of change in this area I'll post a new version and look for an early review/commit for next release.

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Turning off HOT/Cleanup sometimes

From
Alvaro Herrera
Date:
Bruce Momjian wrote:

> I think the limit has to be in terms of a percentage of the table size. 
> For example, if we do one SELECT on a table with all non-dirty pages, it
> would be good to know that 5% of the pages were pruned --- that tells me
> that another 19 SELECTs will totally prune the table, assuming no future
> writes.

This seems simple to implement: keep two counters, where the second one
is pages we skipped cleanup in.  Once that counter hits SOME_MAX_VALUE,
reset the first counter so that further 5 pages will get HOT pruned.  5%
seems a bit high though.  (In Simon's design, SOME_MAX_VALUE is
essentially +infinity.)

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Turning off HOT/Cleanup sometimes

From
Jeff Janes
Date:
On Mon, Apr 20, 2015 at 10:33 AM, Bruce Momjian <bruce@momjian.us> wrote:
On Thu, Apr 16, 2015 at 03:41:54PM +0100, Simon Riggs wrote:
> That is how we arrive at the idea of a cleanup limit, further enhanced by a
> limit that applies only to dirtying clean blocks, which we have 4? recent votes
> in favour of.
>
> I would personally be in favour of a parameter to control the limit, since
> whatever we chose is right/wrong depending upon circumstances. I am however
> comfortable with not having a parameter if people think it is hard to tune
> that, which I agree it would be, hence no parameter in the patch.

I think the limit has to be in terms of a percentage of the table size.
For example, if we do one SELECT on a table with all non-dirty pages, it
would be good to know that 5% of the pages were pruned --- that tells me
that another 19 SELECTs will totally prune the table, assuming no future
writes. 

But why should 1 SELECT or 20 SELECTs or 200 SELECTs have to do a job, while the user waits, which is fundamentally VACUUM's duty to do in the background?  If there are a handful of very hot pages, then it makes sense not to wait for vacuum to get to them.  And that is what a block-count limit does.  

But if the entire table is very hot, I think that that is just another of way of saying that autovacuum is horribly misconfigured.  I think the purpose of this patch is to fix something that can't be fixed through configuration alone.
 
If there are future writes, they would dirty the pages and
cause even more pruning, but the 5% gives me the maximum pruning number
of SELECTs.  If there aren't another 19 SELECTs, do I care if the table
is pruned or not? 

The use case I see for this is when there is a mixed workload.  There is one select which reads the entire table, and hundreds of thousands of selects/updates/insert that don't, and of course vacuum comes along every now and then and does it thing.  Why should the one massive SELECT have horrible performance just because it was run right before autovacuum would have kicked in instead of right after if finished?
 
Cheers,

Jeff

Re: Turning off HOT/Cleanup sometimes

From
Bruce Momjian
Date:
On Mon, Apr 20, 2015 at 04:19:22PM -0300, Alvaro Herrera wrote:
> Bruce Momjian wrote:
> 
> > I think the limit has to be in terms of a percentage of the table size. 
> > For example, if we do one SELECT on a table with all non-dirty pages, it
> > would be good to know that 5% of the pages were pruned --- that tells me
> > that another 19 SELECTs will totally prune the table, assuming no future
> > writes.
> 
> This seems simple to implement: keep two counters, where the second one
> is pages we skipped cleanup in.  Once that counter hits SOME_MAX_VALUE,
> reset the first counter so that further 5 pages will get HOT pruned.  5%
> seems a bit high though.  (In Simon's design, SOME_MAX_VALUE is
> essentially +infinity.)

Oh, I pulled 5% out of the air.  Thinking of a SELECT-only workload,
which would be our worse case, I was thinking how many SELECTS running
through HOT update chains would it take to be slower than generating the
WAL to prune the page.  I see the percentage as something that we could
reasonably balance, while a fixed page count couldn't be analyzed in
that way.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + Everyone has their own god. +



Re: Turning off HOT/Cleanup sometimes

From
Bruce Momjian
Date:
On Mon, Apr 20, 2015 at 12:28:11PM -0700, Jeff Janes wrote:
> But why should 1 SELECT or 20 SELECTs or 200 SELECTs have to do a job, while
> the user waits, which is fundamentally VACUUM's duty to do in the background? 
> If there are a handful of very hot pages, then it makes sense not to wait for
> vacuum to get to them.  And that is what a block-count limit does.  
> 
> But if the entire table is very hot, I think that that is just another of way
> of saying that autovacuum is horribly misconfigured.  I think the purpose of

Well, we have to assume there are many misconfigured configurations ---
autovacuum isn't super-easy to configure, so we can't just blame the
user if this makes things worse.  In fact, page pruning was designed
spefically for cases where autovacuum wasn't running our couldn't keep
up.

> this patch is to fix something that can't be fixed through configuration alone.
>  
> 
>     If there are future writes, they would dirty the pages and
>     cause even more pruning, but the 5% gives me the maximum pruning number
>     of SELECTs.  If there aren't another 19 SELECTs, do I care if the table
>     is pruned or not? 
> 
> 
> The use case I see for this is when there is a mixed workload.  There is one
> select which reads the entire table, and hundreds of thousands of selects/
> updates/insert that don't, and of course vacuum comes along every now and then
> and does it thing.  Why should the one massive SELECT have horrible performance
> just because it was run right before autovacuum would have kicked in instead of
> right after if finished?

I see your point, but what about the read-only workload after a big
update?  Do we leave large tables to be non-pruned for a long time? 
Also, consider cases where you did a big update, the autovacuum
thresh-hold was not met, so autovacuum doesn't run on that table ---
again, do we keep those non-pruned rows around for millions of scans?

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + Everyone has their own god. +



Re: Turning off HOT/Cleanup sometimes

From
Simon Riggs
Date:
On 20 April 2015 at 20:28, Jeff Janes <jeff.janes@gmail.com> wrote:
 
But why should 1 SELECT or 20 SELECTs or 200 SELECTs have to do a job, while the user waits, which is fundamentally VACUUM's duty to do in the background? 

Agreed. I don't see a % as giving us anything at all.

The idea is that we want to turn an O(N) problem for one query into an O(1) task.
 
The use case I see for this is when there is a mixed workload.  There is one select which reads the entire table, and hundreds of thousands of selects/updates/insert that don't, and of course vacuum comes along every now and then and does it thing.  Why should the one massive SELECT have horrible performance just because it was run right before autovacuum would have kicked in instead of right after if finished?

+1

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Turning off HOT/Cleanup sometimes

From
Robert Haas
Date:
On Mon, Apr 20, 2015 at 3:28 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
> But why should 1 SELECT or 20 SELECTs or 200 SELECTs have to do a job, while
> the user waits, which is fundamentally VACUUM's duty to do in the
> background?  If there are a handful of very hot pages, then it makes sense
> not to wait for vacuum to get to them.  And that is what a block-count limit
> does.

I think that's a fundamental mischaracterization of the problem.  As
soon as you define this as "vacuum's problem", then of course it makes
no sense to prune in the foreground, ever.  But if you define the
problem as "get the best overall system performance", then it clearly
DOES sometimes make sense to prune in the foreground, as benchmark
results upthread demonstrate.

The fact is that on a workload like pgbench - and it doesn't have to
be exactly pgbench, just any kind of workload where there are lots of
changes to the table - vacuum can at any given time be pruning at most
one page of the table.  That is because only one vacuum process can be
running in a given table at one time, and it can't be doing two things
at once.  But there can be many processes doing inserts, updates, or
deletes on that table, as many as whatever you have max_connections
set to.  There can easily be dozens even on a well-configured system;
on a poorly configured system, there could be hundreds.  It seems
obvious that if you can have dozens or hundreds of processes creating
garbage and at most one process cleaning it up, there will be cases
where you get further and further behind.

Now, it might well be that the right solution to that problem is to
allow multiple vacuum processes in the same database, or add
background workers to help with opportunistic HOT-pruning of pages so
it doesn't get done in the foreground.  Fine.  But as of today, on a
heavily-modified table, the ONLY way that we can possibly remove junk
from the table as fast as we're creating junk is if the backends
touching the table do some of the work.  Now, Simon is making the
argument that it should be good enough to have people *modifying* the
table help with the cleanup rather than imposing that load on the
people who are only *reading* it, and that's not a dumb argument, but
there are still cases where that strategy loses - specifically, where
the table churn has stopped or paused, by autovacuum hasn't run yet.
If you're going to do 1 sequential scan of the table and then go home
for the day, HOT-pruning is dumb even in that case.  If you're going
to do 1000 sequential scans of that table in a row, HOT-pruning may
very well be smart.  There's no guarantee that the table has met the
autovacuum threshold, but HOT-pruning it could well be a win anyway.
Or it might be a loss.  You can make any policy here look smart or
dumb by picking a particular workload, and you don't even have to
invent crazy things that will never happen in real life to do it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Turning off HOT/Cleanup sometimes

From
Bruce Momjian
Date:
On Mon, Apr 20, 2015 at 04:19:22PM -0300, Alvaro Herrera wrote:
> Bruce Momjian wrote:
> 
> > I think the limit has to be in terms of a percentage of the table size. 
> > For example, if we do one SELECT on a table with all non-dirty pages, it
> > would be good to know that 5% of the pages were pruned --- that tells me
> > that another 19 SELECTs will totally prune the table, assuming no future
> > writes.
> 
> This seems simple to implement: keep two counters, where the second one
> is pages we skipped cleanup in.  Once that counter hits SOME_MAX_VALUE,
> reset the first counter so that further 5 pages will get HOT pruned.  5%
> seems a bit high though.  (In Simon's design, SOME_MAX_VALUE is
> essentially +infinity.)

This would tend to dirty non-sequential heap pages --- it seems best to
just clean as many as we are supposed to, then skip the rest, so we can
write sequential dirty pages to storage.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + Everyone has their own god. +



Re: Turning off HOT/Cleanup sometimes

From
Bruce Momjian
Date:
On Mon, Apr 20, 2015 at 09:56:20PM +0100, Simon Riggs wrote:
> On 20 April 2015 at 20:28, Jeff Janes <jeff.janes@gmail.com> wrote:
>  
> 
>     But why should 1 SELECT or 20 SELECTs or 200 SELECTs have to do a job,
>     while the user waits, which is fundamentally VACUUM's duty to do in the
>     background? 
> 
> 
> Agreed. I don't see a % as giving us anything at all.
> 
> The idea is that we want to turn an O(N) problem for one query into an O(1)
> task.
>  
> 
>     The use case I see for this is when there is a mixed workload.  There is
>     one select which reads the entire table, and hundreds of thousands of
>     selects/updates/insert that don't, and of course vacuum comes along every
>     now and then and does it thing.  Why should the one massive SELECT have
>     horrible performance just because it was run right before autovacuum would
>     have kicked in instead of right after if finished?
> 
> 
> +1

You can +1 all you want, but if you ignore the specific workloads I
mentioned, you are not going to get much traction.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + Everyone has their own god. +



Re: Turning off HOT/Cleanup sometimes

From
Alvaro Herrera
Date:
Bruce Momjian wrote:
> On Mon, Apr 20, 2015 at 04:19:22PM -0300, Alvaro Herrera wrote:
> > Bruce Momjian wrote:
> > 
> > This seems simple to implement: keep two counters, where the second one
> > is pages we skipped cleanup in.  Once that counter hits SOME_MAX_VALUE,
> > reset the first counter so that further 5 pages will get HOT pruned.  5%
> > seems a bit high though.  (In Simon's design, SOME_MAX_VALUE is
> > essentially +infinity.)
> 
> This would tend to dirty non-sequential heap pages --- it seems best to
> just clean as many as we are supposed to, then skip the rest, so we can
> write sequential dirty pages to storage.

Keep in mind there's a disconnect between dirtying a page and writing it
to storage.  A page could remain dirty for a long time in the buffer
cache.  This writing of sequential pages would occur at checkpoint time
only, which seems the wrong thing to optimize.  If some other process
needs to evict pages to make room to read some other page in, surely
it's going to try one page at a time, not write "many sequential dirty
pages."

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Turning off HOT/Cleanup sometimes

From
Bruce Momjian
Date:
On Mon, Apr 20, 2015 at 07:13:38PM -0300, Alvaro Herrera wrote:
> Bruce Momjian wrote:
> > On Mon, Apr 20, 2015 at 04:19:22PM -0300, Alvaro Herrera wrote:
> > > Bruce Momjian wrote:
> > > 
> > > This seems simple to implement: keep two counters, where the second one
> > > is pages we skipped cleanup in.  Once that counter hits SOME_MAX_VALUE,
> > > reset the first counter so that further 5 pages will get HOT pruned.  5%
> > > seems a bit high though.  (In Simon's design, SOME_MAX_VALUE is
> > > essentially +infinity.)
> > 
> > This would tend to dirty non-sequential heap pages --- it seems best to
> > just clean as many as we are supposed to, then skip the rest, so we can
> > write sequential dirty pages to storage.
> 
> Keep in mind there's a disconnect between dirtying a page and writing it
> to storage.  A page could remain dirty for a long time in the buffer
> cache.  This writing of sequential pages would occur at checkpoint time
> only, which seems the wrong thing to optimize.  If some other process
> needs to evict pages to make room to read some other page in, surely
> it's going to try one page at a time, not write "many sequential dirty
> pages."

Yes, it might be too much optimization to try to get the checkpoint to
flush all those pages sequentially, but I was thinking of our current
behavior where, after an update of all rows, we effectively write out
the entire table because we have dirtied every page.  I guess with later
prune-based writes, we aren't really writing all the pages as we have
the pattern where pages with prunable content is kind of random. I guess
I was just wondering what value there is to your write-then-skip idea,
vs just writing the first X% of pages we find?  Your idea certainly
spreads out the pruning, and doesn't require knowing the size of the
table, though I though that information was easily determined.

One thing to consider is how we handle pruning of index scans that hit
multiple heap pages.  Do we still write X% of the pages in the table, or
%X of the heap pages we actually access via SELECT?  With the
write-then-skip approach, we would do X% of the pages we access, while
with the first-X% approach, we would probably prune all of them as we
would not be accessing most of the table.  I don't think we can do the
first first-X% of pages and have the percentage based on the number of
pages accessed as we have no way to know how many heap pages we will
access from the index.  (We would know for bitmap scans, but that
complexity doesn't seem worth it.)  That would argue, for consistency
with sequential and index-based heap access, that your approach is best.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + Everyone has their own god. +



Re: Turning off HOT/Cleanup sometimes

From
Robert Haas
Date:
On Mon, Apr 20, 2015 at 6:13 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
> Bruce Momjian wrote:
>> On Mon, Apr 20, 2015 at 04:19:22PM -0300, Alvaro Herrera wrote:
>> > Bruce Momjian wrote:
>> > This seems simple to implement: keep two counters, where the second one
>> > is pages we skipped cleanup in.  Once that counter hits SOME_MAX_VALUE,
>> > reset the first counter so that further 5 pages will get HOT pruned.  5%
>> > seems a bit high though.  (In Simon's design, SOME_MAX_VALUE is
>> > essentially +infinity.)
>>
>> This would tend to dirty non-sequential heap pages --- it seems best to
>> just clean as many as we are supposed to, then skip the rest, so we can
>> write sequential dirty pages to storage.
>
> Keep in mind there's a disconnect between dirtying a page and writing it
> to storage.  A page could remain dirty for a long time in the buffer
> cache.  This writing of sequential pages would occur at checkpoint time
> only, which seems the wrong thing to optimize.  If some other process
> needs to evict pages to make room to read some other page in, surely
> it's going to try one page at a time, not write "many sequential dirty
> pages."

Well, for a big sequential scan, we use a ring buffer, so we will
typically be evicting the pages that we ourselves read in moments
before.  So in this case we would do a lot of sequential writes of
dirty pages.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Turning off HOT/Cleanup sometimes

From
Jim Nasby
Date:
On 4/21/15 10:04 AM, Bruce Momjian wrote:
> One thing to consider is how we handle pruning of index scans that hit
> multiple heap pages.  Do we still write X% of the pages in the table, or
> %X of the heap pages we actually access via SELECT?  With the
> write-then-skip approach, we would do X% of the pages we access, while
> with the first-X% approach, we would probably prune all of them as we
> would not be accessing most of the table.  I don't think we can do the
> first first-X% of pages and have the percentage based on the number of
> pages accessed as we have no way to know how many heap pages we will
> access from the index.

This comment made me wonder... has anyone considered handing the pruning 
work off to a bgworker, at least for SELECTs? That means the selects 
themselves wouldn't be burdened by the actual prune work, only in 
notifying the bgworker. While that's not going to be free, presumably 
it's a lot cheaper...
-- 
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com



Re: Turning off HOT/Cleanup sometimes

From
Robert Haas
Date:
On Tue, Apr 21, 2015 at 11:04 AM, Bruce Momjian <bruce@momjian.us> wrote:
> Yes, it might be too much optimization to try to get the checkpoint to
> flush all those pages sequentially, but I was thinking of our current
> behavior where, after an update of all rows, we effectively write out
> the entire table because we have dirtied every page.  I guess with later
> prune-based writes, we aren't really writing all the pages as we have
> the pattern where pages with prunable content is kind of random. I guess
> I was just wondering what value there is to your write-then-skip idea,
> vs just writing the first X% of pages we find?  Your idea certainly
> spreads out the pruning, and doesn't require knowing the size of the
> table, though I though that information was easily determined.
>
> One thing to consider is how we handle pruning of index scans that hit
> multiple heap pages.  Do we still write X% of the pages in the table, or
> %X of the heap pages we actually access via SELECT?  With the
> write-then-skip approach, we would do X% of the pages we access, while
> with the first-X% approach, we would probably prune all of them as we
> would not be accessing most of the table.  I don't think we can do the
> first first-X% of pages and have the percentage based on the number of
> pages accessed as we have no way to know how many heap pages we will
> access from the index.  (We would know for bitmap scans, but that
> complexity doesn't seem worth it.)  That would argue, for consistency
> with sequential and index-based heap access, that your approach is best.

I actually implemented something like this for setting hint bits a few
years ago:

http://www.postgresql.org/message-id/AANLkTik5QzR8wTs0MqCWwmNp-qHGrdKY5Av5aOB7W4Dp@mail.gmail.com
http://www.postgresql.org/message-id/AANLkTimGKaG7wdu-x77GNV2Gh6_Qo5Ss1u5b6Q1MsPUy@mail.gmail.com

At least in later versions, the patch writes a certain number of
hinted pages, then skips writing a run of pages, then writes another
run of hinted pages.  The basic problem here is that, after the fsync
queue compaction patch went in, the benefits on my tests were pretty
modest.  Yeah, it costs something to write out lots of dirty pages,
but before the fsync queue compaction stuff, the initial scan of an
unhinted table took like 6x the time on the machine I tested on, but
after that, it was like 1.5x the time.  Blunting that spike just
wasn't exciting enough.

It strikes me that it would be better to have an integrated strategy
for this problem.  It doesn't make sense to have one strategy for
deciding whether to set hint bits and a separate strategy for deciding
whether to HOT-prune.  And if we decide to set hint bits and
HOT-prune, it might be smart to try to mark the page all-visible, too,
if it is and we're not about to update it.  I believe we're losing a
lot of performance on OLTP workloads by re-dirtying the same pages
over and over again.  We've probably all hit cases where there is an
obvious loss of performance because of this sort of thing, but I'm
starting to think it's hurting us in a lot of less-obvious ways.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Turning off HOT/Cleanup sometimes

From
Peter Eisentraut
Date:
On 4/21/15 4:45 PM, Jim Nasby wrote:
> This comment made me wonder... has anyone considered handing the pruning
> work off to a bgworker, at least for SELECTs? That means the selects
> themselves wouldn't be burdened by the actual prune work, only in
> notifying the bgworker. While that's not going to be free, presumably
> it's a lot cheaper...

The nice thing about having foreground queries to the light cleanup is
that they can work in parallel and naturally hit the interesting parts
of the table first.

In order for a background worker to keep up with some of the workloads
that have been presented as counterexamples, you'd need multiple
background workers operating in parallel and preferring to work on
certain parts of a table.  That would require a lot more sophisticated
job management than we currently have for, say, autovacuum.




Re: Turning off HOT/Cleanup sometimes

From
Greg Stark
Date:
On Mon, Apr 20, 2015 at 8:48 PM, Bruce Momjian <bruce@momjian.us> wrote:
>
>> But if the entire table is very hot, I think that that is just another of way
>> of saying that autovacuum is horribly misconfigured.  I think the purpose of
>
> Well, we have to assume there are many misconfigured configurations ---
> autovacuum isn't super-easy to configure, so we can't just blame the
> user if this makes things worse.  In fact, page pruning was designed
> spefically for cases where autovacuum wasn't running our couldn't keep
> up.

Well autovacuum isn't currently considering HOT pruning part of its
job at all. It's hard to call it "misconfigured" when there's
literally *no* way to configure it "correctly".

If you update less than autovacuum_vacuum_scale_factor fraction of the
table and then never update another row autovacuum will never run.
Ever. Every select will forevermore need to follow hot chains on that
table. Until eventually transaction wraparound forces a vacuum on that
table if that ever happens.

Possibly autovacuum could be adjusted to count how many selects are
happening on the table and decide to vacuum it when the cost of the
selects following the dead tuples is balanced by the cost of doing a
vacuum. But that's not something included in the design of autovacuum
today.

The original design of tuple storage was aimed at optimizing the
steady state where most tuples were not recently updated. It
guaranteed that except for tuples that were in the process of being
updated or were recently updated a tuple read didn't have to read the
CLOG, didn't have to follow any chains, didn't have to do any I/O or
other work other than to read the bits on the tuple itself. When a
tuple is updated it's put into a state where everyone who comes along
has to do extra work but as soon as practical the hint bits get set
and that extra work stops.

We had similar discussions about setting hint bits in the past. I'm
not sure why HOT pruning is the focus now because I actually think
hint bit setting is a larger source of I/O in innocent looking selects
even today. And it's a major headache, people are always being
surprised that their selects cause lots of I/O and slow down
dramatically after a big update or data load has finished. It's
characterized as "why is the database writing everything twice" (and
saying it's actually writing everything three times doesn't make
people feel better). In the new age of checksums with hint bit logging
I wonder if it's even a bigger issue.

It occurs to me that generating these dirty pages isn't really that
expensive individually. It's only that there's a sudden influx of a
large number of dirty pages that causes them to get translated
immediately into filesystem I/O. Perhaps we should dirty pages on hint
bit updates and do HOT pruning only to the extent it can be done
without causing I/O. Of course it's hard to tell that in advance  but
maybe something like "if the current buffer had to be fetched and
caused a dirty buffer to be evicted then skip hot pruning and don't
dirty it for any hint bit updates" would at least mean that once the
select fills up its share of buffers with dirty buffers it stops
dirtying more. It would dirty pages only as fast as bgwriter or
checkpoints manage to write them out.

That sounds a bit weird but I think the right solution should have
that combination of properties. It should guarantee that hint bits get
set and hot chains pruned within some length of time but that no one
select causes a storm of dirty buffers that then need to be flushed to
disk.


-- 
greg



Re: Turning off HOT/Cleanup sometimes

From
Jim Nasby
Date:
On 4/21/15 4:07 PM, Peter Eisentraut wrote:
> On 4/21/15 4:45 PM, Jim Nasby wrote:
> In order for a background worker to keep up with some of the workloads
> that have been presented as counterexamples, you'd need multiple
> background workers operating in parallel and preferring to work on
> certain parts of a table.  That would require a lot more sophisticated
> job management than we currently have for, say, autovacuum.

My thought was that the foreground queries would send page IDs to the 
bgworker via a shmq. If the queries have to do much waiting at all on IO 
then I'd expect the bgworker to be able to keep pace with a bunch of 
them since it's just grabbing buffers that are already in the pool (and 
only those in the pool; it wouldn't make sense for it to pull it back 
from the kernel, let alone disk).

We'd need to code this so that if a queue fills up the query doesn't 
block; we just skip that opportunity to prune. I think that'd be fine.
-- 
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com



Re: Turning off HOT/Cleanup sometimes

From
Bruce Momjian
Date:
On Tue, Apr 21, 2015 at 04:36:53PM -0400, Robert Haas wrote:
> > Keep in mind there's a disconnect between dirtying a page and writing it
> > to storage.  A page could remain dirty for a long time in the buffer
> > cache.  This writing of sequential pages would occur at checkpoint time
> > only, which seems the wrong thing to optimize.  If some other process
> > needs to evict pages to make room to read some other page in, surely
> > it's going to try one page at a time, not write "many sequential dirty
> > pages."
> 
> Well, for a big sequential scan, we use a ring buffer, so we will
> typically be evicting the pages that we ourselves read in moments
> before.  So in this case we would do a lot of sequential writes of
> dirty pages.

Ah, yes, this again supports the prune-then-skip approach, rather than
doing the first X% pruneable pages seen.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + Everyone has their own god. +



Re: Turning off HOT/Cleanup sometimes

From
Peter Eisentraut
Date:
On 4/22/15 11:37 AM, Jim Nasby wrote:
> On 4/21/15 4:07 PM, Peter Eisentraut wrote:
>> On 4/21/15 4:45 PM, Jim Nasby wrote:
>> In order for a background worker to keep up with some of the workloads
>> that have been presented as counterexamples, you'd need multiple
>> background workers operating in parallel and preferring to work on
>> certain parts of a table.  That would require a lot more sophisticated
>> job management than we currently have for, say, autovacuum.
> 
> My thought was that the foreground queries would send page IDs to the
> bgworker via a shmq. If the queries have to do much waiting at all on IO
> then I'd expect the bgworker to be able to keep pace with a bunch of
> them since it's just grabbing buffers that are already in the pool (and
> only those in the pool; it wouldn't make sense for it to pull it back
> from the kernel, let alone disk).
> 
> We'd need to code this so that if a queue fills up the query doesn't
> block; we just skip that opportunity to prune. I think that'd be fine.

I think a "to-clean-up map" would work better.  But basically we need a
way to remember where to clean up later if we're not going to do it in
the foreground.




Re: Turning off HOT/Cleanup sometimes

From
Bruce Momjian
Date:
On Tue, Apr 21, 2015 at 05:07:52PM -0400, Peter Eisentraut wrote:
> On 4/21/15 4:45 PM, Jim Nasby wrote:
> > This comment made me wonder... has anyone considered handing the pruning
> > work off to a bgworker, at least for SELECTs? That means the selects
> > themselves wouldn't be burdened by the actual prune work, only in
> > notifying the bgworker. While that's not going to be free, presumably
> > it's a lot cheaper...
> 
> The nice thing about having foreground queries to the light cleanup is
> that they can work in parallel and naturally hit the interesting parts
> of the table first.
> 
> In order for a background worker to keep up with some of the workloads
> that have been presented as counterexamples, you'd need multiple
> background workers operating in parallel and preferring to work on
> certain parts of a table.  That would require a lot more sophisticated
> job management than we currently have for, say, autovacuum.

Well, the visibility map tells us where _not_ to clean up, so using
another map to tell use _where_ to cleanup might make sense.  However,
the density of the map might be low enough that a list makes more sense,
as you suggested.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + Everyone has their own god. +



Re: Turning off HOT/Cleanup sometimes

From
Kevin Grittner
Date:
Greg Stark <stark@mit.edu> wrote:

> And it's a major headache, people are always being surprised that
> their selects cause lots of I/O and slow down dramatically after
> a big update or data load has finished. It's characterized as
> "why is the database writing everything twice" (and saying it's
> actually writing everything three times doesn't make people feel
> better).

When I looked at the life-cycle of a heap tuple in a database I was
using, I found that (ignoring related index access and ignoring
WAL-file copying, etc., for our backups), each tuple that existed
long enough to freeze and be eventually deleted caused a lot of
writes.

(1) WAL log the insert.
(2) Write the tuple.
(3) Hint and rewrite the tuple.
(4) WAL log the freeze of the tuple.
(5) Rewrite the frozen tuple.
(6) WAL-log the delete.
(7) Rewrite the deleted tuple.
(8) Prune and rewrite the page.
(9) Free line pointers and rewrite the page.

If I was lucky some of the writes could be combined in cache
because they happened close enough together.  Also, one could hope
that not too much of the WAL-logging involved full page writes to
the WAL -- again, keeping steps close together in time helps with
that.  If all of (1) through (5) are done in quick succession, you
save two physical writes of the heap page and save one full page

write to WAL.  If steps (7) through (9) are done in quick
succession, you save two more physical writes to the heap.  This is
part of what makes the aggressive incremental freezing being
discussed on a nearby thread appealing -- at least for some
workloads.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Turning off HOT/Cleanup sometimes

From
Bruce Momjian
Date:
On Wed, Apr 22, 2015 at 04:36:17PM +0100, Greg Stark wrote:
> On Mon, Apr 20, 2015 at 8:48 PM, Bruce Momjian <bruce@momjian.us> wrote:
> > Well, we have to assume there are many misconfigured configurations ---
> > autovacuum isn't super-easy to configure, so we can't just blame the
> > user if this makes things worse.  In fact, page pruning was designed
> > spefically for cases where autovacuum wasn't running our couldn't keep
> > up.
> 
> Well autovacuum isn't currently considering HOT pruning part of its
> job at all. It's hard to call it "misconfigured" when there's
> literally *no* way to configure it "correctly".

Good point, but doesn't vacuum remove the need for pruning as it removes
all the old rows?

> If you update less than autovacuum_vacuum_scale_factor fraction of the
> table and then never update another row autovacuum will never run.
> Ever. Every select will forevermore need to follow hot chains on that
> table. Until eventually transaction wraparound forces a vacuum on that
> table if that ever happens.

Yes, that is a very good point, and it matches my concerns.  Of course,
Simon's concern is to avoid overly-aggressive pruning where the row is
being pruned but will soon be modified, making the prune, and its WAL
volume, undesirable.  We have to consider both cases in any final
solution.

> Possibly autovacuum could be adjusted to count how many selects are
> happening on the table and decide to vacuum it when the cost of the
> selects following the dead tuples is balanced by the cost of doing a
> vacuum. But that's not something included in the design of autovacuum
> today.

Well, autovacuum is also going to clean indexes, which seem like
overkill for pruning HOT updates.

> The original design of tuple storage was aimed at optimizing the
> steady state where most tuples were not recently updated. It
> guaranteed that except for tuples that were in the process of being
> updated or were recently updated a tuple read didn't have to read the
> CLOG, didn't have to follow any chains, didn't have to do any I/O or
> other work other than to read the bits on the tuple itself. When a
> tuple is updated it's put into a state where everyone who comes along
> has to do extra work but as soon as practical the hint bits get set
> and that extra work stops.

Yes, Simon is right that doing everything as-soon-as-possible is not
optimal.  I think the trick is knowing when we should give up waiting
for something else to dirty the page and prune it.

> We had similar discussions about setting hint bits in the past. I'm
> not sure why HOT pruning is the focus now because I actually think
> hint bit setting is a larger source of I/O in innocent looking selects
> even today. And it's a major headache, people are always being
> surprised that their selects cause lots of I/O and slow down
> dramatically after a big update or data load has finished. It's
> characterized as "why is the database writing everything twice" (and
> saying it's actually writing everything three times doesn't make
> people feel better). In the new age of checksums with hint bit logging
> I wonder if it's even a bigger issue.

What would be the downside of only doing pruning during SELECT hint bit
setting?  Hinting is delayed by long-running transactions, but so is
pruning.  I assume you can do more pruning than setting all_visible
hints because the old prunable rows are older by definition, but I am
unclear how much older they are.

FYI, while hint bit setting causes page writes, it does not cause WAL
writes unless you have wal_log_hints set or page-level checksums are
enabled.  By doing pruning at the same time as hint bit setting, you are
sharing the same page write, but are generating more WAL.  Of course, if
you are setting all-visible, then you are by definition waiting longer
to prune than before, and this might be enough to make it a win for all
use cases.  You wouldn't never-prune in a read-only workload because
your hint bits would eventually cause the pruning.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + Everyone has their own god. +



Re: Turning off HOT/Cleanup sometimes

From
Alvaro Herrera
Date:
Bruce Momjian wrote:
> On Wed, Apr 22, 2015 at 04:36:17PM +0100, Greg Stark wrote:
> > On Mon, Apr 20, 2015 at 8:48 PM, Bruce Momjian <bruce@momjian.us> wrote:
> > > Well, we have to assume there are many misconfigured configurations ---
> > > autovacuum isn't super-easy to configure, so we can't just blame the
> > > user if this makes things worse.  In fact, page pruning was designed
> > > spefically for cases where autovacuum wasn't running our couldn't keep
> > > up.
> > 
> > Well autovacuum isn't currently considering HOT pruning part of its
> > job at all. It's hard to call it "misconfigured" when there's
> > literally *no* way to configure it "correctly".
> 
> Good point, but doesn't vacuum remove the need for pruning as it removes
> all the old rows?

Sure.  The point, I think, is to make autovacuum runs of some sort that
don't actually vacuum but only do HOT-pruning.  Maybe this is a
reasonable solution to the problem that queries don't prune anymore
after Simon's patch.  If we made autovac HOT-prune periodically, we
could have read-only queries prune only already-dirty pages.  Of course,
that would need further adjustments to default number of autovac
workers, I/O allocation, etc.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Turning off HOT/Cleanup sometimes

From
Bruce Momjian
Date:
On Wed, Apr 22, 2015 at 06:07:00PM -0300, Alvaro Herrera wrote:
> > Good point, but doesn't vacuum remove the need for pruning as it removes
> > all the old rows?
> 
> Sure.  The point, I think, is to make autovacuum runs of some sort that
> don't actually vacuum but only do HOT-pruning.  Maybe this is a
> reasonable solution to the problem that queries don't prune anymore
> after Simon's patch.  If we made autovac HOT-prune periodically, we
> could have read-only queries prune only already-dirty pages.  Of course,
> that would need further adjustments to default number of autovac
> workers, I/O allocation, etc.

Do we really want to make vacuum more complex for this?  vacuum does
have the delay settings we would need though.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + Everyone has their own god. +



Re: Turning off HOT/Cleanup sometimes

From
Jim Nasby
Date:
On 4/22/15 1:51 PM, Kevin Grittner wrote:
> (1) WAL log the insert.
> (2) Write the tuple.
> (3) Hint and rewrite the tuple.
> (4) WAL log the freeze of the tuple.
> (5) Rewrite the frozen tuple.
> (6) WAL-log the delete.
> (7) Rewrite the deleted tuple.
> (8) Prune and rewrite the page.
> (9) Free line pointers and rewrite the page.
>
> If I was lucky some of the writes could be combined in cache
> because they happened close enough together. Also, one could hope
> that not too much of the WAL-logging involved full page writes to
> the WAL -- again, keeping steps close together in time helps with
> that.

This is why I like the idea of methods that tell us where we need to do 
cleanup... they provide us with a rough ability to track what tuples are 
in what part of their lifecycle. The VM helps with this a small amount, 
but really it only applies after 1 and 6; it doesn't help us with any 
other portions.

Having a way to track recently created tuples would allow us to be much 
more efficient with 1-3, and with aggressive freezing, 1-5. A way to 
track recently deleted tuples would help with 6-7, possibly 6-9 if no 
indexes.

If we doubled the size of the VM, that would let us track 4 states for 
each page:

- Page has newly inserted tuples
- Page has newly deleted tuples
- Page is all visible
- Page is frozen

though as discussed elsewhere, we could probably combine all visible and 
frozen.

The win from doing this would be easily knowing what pages need hinting 
(newly inserted) and pruning (newly deleted). Unfortunately we still 
wouldn't know whether we could do real work without visiting the page 
itself, but I suspect that for many workloads just having newly 
inserted/deleted would be a serious win.
-- 
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com



Re: Turning off HOT/Cleanup sometimes

From
Robert Haas
Date:
On Wed, Apr 22, 2015 at 5:17 PM, Bruce Momjian <bruce@momjian.us> wrote:
> On Wed, Apr 22, 2015 at 06:07:00PM -0300, Alvaro Herrera wrote:
>> > Good point, but doesn't vacuum remove the need for pruning as it removes
>> > all the old rows?
>>
>> Sure.  The point, I think, is to make autovacuum runs of some sort that
>> don't actually vacuum but only do HOT-pruning.  Maybe this is a
>> reasonable solution to the problem that queries don't prune anymore
>> after Simon's patch.  If we made autovac HOT-prune periodically, we
>> could have read-only queries prune only already-dirty pages.  Of course,
>> that would need further adjustments to default number of autovac
>> workers, I/O allocation, etc.
>
> Do we really want to make vacuum more complex for this?  vacuum does
> have the delay settings we would need though.

I think it's abundantly clear that, as wonderful as autovacuum is
compared with what we had before autovacuum, it's not good enough.
This is one area where I think improvement is definitely needed, and
I've suggested it before.  Discussion began here:

http://www.postgresql.org/message-id/AANLkTimd3ieGCm9pXV39ci6-owy3rX0mzz_N1tL=0ZLm@mail.gmail.com

Some of the things I suggested then seem dumb in hindsight, but I
think the basic concept is still valid: if we scan the heap and find
only a few dead tuples, the expense of scanning all of the indexes may
not be justified.  Also, the fact that a relation can currently only
be vacuumed by one process at a time is coming to seem like a major
limitation.  Some users are partitioning tables just so that each
partition can be autovac'd separately.  That really shouldn't be
required.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Turning off HOT/Cleanup sometimes

From
Jim Nasby
Date:
On 4/23/15 8:25 AM, Robert Haas wrote:
> Some users are partitioning tables just so that each
> partition can be autovac'd separately.  That really shouldn't be
> required.

Are they doing this for improved heap scan performance? Index scan 
performance? If the table wasn't partitioned, would they need more than 
one pass through the indexes due to exhausting maintenance_work_mem?

There's probably some fairly low-hanging fruit for parallelizing vacuum, 
but it really depends on what problems people are running into.
-- 
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com



Re: Turning off HOT/Cleanup sometimes

From
Robert Haas
Date:
On Thu, Apr 23, 2015 at 10:44 AM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
> On 4/23/15 8:25 AM, Robert Haas wrote:
>> Some users are partitioning tables just so that each
>> partition can be autovac'd separately.  That really shouldn't be
>> required.
>
> Are they doing this for improved heap scan performance? Index scan
> performance? If the table wasn't partitioned, would they need more than one
> pass through the indexes due to exhausting maintenance_work_mem?

I don't know of anyone with a properly-configured system who needs
more than one pass through the indexes due to exhausting
maintenance_work_mem.  The issue is that you have to vacuum a table
frequently enough to avoid accumulating bloat.  The frequency with
which you need to vacuum varies depending on the size of the table and
how frequently it's updated.  However, a large, heavily-updated table
can take long enough to vacuum that, by the time you get done, it's
already overdue to be vacuumed again.  That's a problem.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Turning off HOT/Cleanup sometimes

From
Andres Freund
Date:
On 2015-04-23 15:40:36 -0400, Robert Haas wrote:
> The issue is that you have to vacuum a table frequently enough to
> avoid accumulating bloat.  The frequency with which you need to vacuum
> varies depending on the size of the table and how frequently it's
> updated.  However, a large, heavily-updated table can take long enough
> to vacuum that, by the time you get done, it's already overdue to be
> vacuumed again.  That's a problem.

Especially because the indexes are scanned fully. In many cases I've
observed the heap scans themselves being fast; but scanning hundreds
(yes) of gigabytes of indexes taking ages.

Andres



Re: Turning off HOT/Cleanup sometimes

From
Andres Freund
Date:
Hi Simon,

On 2015-04-20 19:21:24 +0100, Simon Riggs wrote:
> Since we have many votes in favour of change in this area I'll post a new
> version and look for an early review/commit for next release.

If I see correctly there's been no new patch version since, right? The
patch is included in the current commitfest as needing review
nonetheless - that's possibly because it wasn't possible to "return with
feedback" for a while?

I marked it as RWF for now, if I missed something we can change it back.

Greetings,

Andres Freund