Re: Turning off HOT/Cleanup sometimes - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Turning off HOT/Cleanup sometimes
Msg-id CA+TgmoZ7T+KQyGYwYqFEGkwK3SnYMM3bqQzj1+=0CiKAS8YF4A@mail.gmail.com
In response to Re: Turning off HOT/Cleanup sometimes  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Turning off HOT/Cleanup sometimes  (Alvaro Herrera <alvherre@2ndquadrant.com>)
List pgsql-hackers
On Thu, Jan 9, 2014 at 12:21 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Wed, Jan 8, 2014 at 3:33 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>>> We also make SELECT clean up blocks as it goes. That is useful in OLTP
>>> workloads, but it means that large SQL queries and pg_dump effectively
>>> do much the same work as VACUUM, generating huge amounts of I/O and
>>> WAL on the master, the cost and annoyance of which is experienced
>>> directly by the user. That is avoided on standbys.
>
>> On a pgbench workload, though, essentially all page cleanup happens as
>> a result of HOT cleanups, like >99.9%.  It might be OK to have that
>> happen for write operations, but it would be a performance disaster if
>> updates didn't try to HOT-prune.  Our usual argument for doing HOT
>> pruning even on SELECTs is that not doing so pessimizes
>> repeated scans, but there are clearly cases that end up worse off as a
>> result of that decision.
>
> My recollection of the discussion when HOT was developed is that it works
> that way not because anyone thought it was beneficial, but simply because
> we didn't see an easy way to know when first fetching a page whether we're
> going to try to UPDATE some tuple on the page.  (And we can't postpone the
> pruning, because the query will have tuple pointers into the page later.)
> Maybe we should work a little harder on passing that information down.
> It seems reasonable to me that SELECTs shouldn't be tasked with doing
> HOT pruning.
>
>> I'm not entirely wild about adding a parameter in this area because it
>> seems that we're increasingly choosing to further expose what arguably
>> ought to be internal implementation details.
>
> I'm -1 for a parameter as well, but I think that just stopping SELECTs
> from doing pruning at all might well be a win.  It's at least worthy
> of some investigation.

Unfortunately, there's no categorical answer.  You can come up with
workloads where HOT pruning on selects is a win; just create a bunch
of junk and then read the same pages lots of times in a row.  And you
can also come up with workloads where it's a loss; create a bunch of
junk and then read it just once.  I don't know how easy it's going
to be to set that parameter in a useful way for some particular
environment, and I think that's possibly an argument against having
it.  But the argument that we don't need a parameter because one
behavior is best for everyone is not going to fly.
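To make the trade-off concrete, here is a toy cost model for a single heap page. It is only an illustration under stated assumptions, not how PostgreSQL actually accounts for pruning: it assumes pruning costs a fixed one-time amount (dirtying the page, writing WAL, eventual write-back), that every unpruned read pays a small extra cost per dead tuple it steps over, and that a pruned page is read at the base cost. All names and units (`PageModel`, `prune_cost`, etc.) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PageModel:
    """Toy cost model for one heap page; all fields and units are hypothetical."""
    base_read_cost: float   # cost to read a page with no junk on it
    dead_tuple_cost: float  # extra cost per dead tuple, paid on every unpruned read
    dead_tuples: int        # junk left behind by earlier updates/deletes
    prune_cost: float       # one-time cost of pruning (dirty page, WAL, write-back)


def cost_without_pruning(m: PageModel, reads: int) -> float:
    # Every read steps over the same dead tuples again.
    return reads * (m.base_read_cost + m.dead_tuples * m.dead_tuple_cost)


def cost_with_pruning(m: PageModel, reads: int) -> float:
    # The first read pays to prune; every read then sees a clean page.
    if reads == 0:
        return 0.0
    return m.prune_cost + reads * m.base_read_cost


def pruning_wins(m: PageModel, reads: int) -> bool:
    return cost_with_pruning(m, reads) < cost_without_pruning(m, reads)


def break_even_reads(m: PageModel) -> float:
    # Pruning pays off once the accumulated per-read savings exceed its one-time cost.
    savings_per_read = m.dead_tuples * m.dead_tuple_cost
    if savings_per_read == 0:
        return float("inf")
    return m.prune_cost / savings_per_read


if __name__ == "__main__":
    page = PageModel(base_read_cost=1.0, dead_tuple_cost=0.5,
                     dead_tuples=10, prune_cost=20.0)
    print(break_even_reads(page))   # 4.0
    print(pruning_wins(page, 1))    # False: junk created, then read once
    print(pruning_wins(page, 10))   # True: same pages read many times
```

With these made-up numbers pruning only pays off after the page has been read more than four times, which is the point above: whether a SELECT should prune depends on how often the same page will be read again, and nothing in a single read tells you that.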

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
