Re: HOT latest patch - version 8 - Mailing list pgsql-patches

From Simon Riggs
Subject Re: HOT latest patch - version 8
Msg-id 1184537041.4512.402.camel@ebony.site
In response to Re: HOT latest patch - version 8  (Heikki Linnakangas <heikki@enterprisedb.com>)
Responses Re: HOT latest patch - version 8  (Greg Smith <gsmith@gregsmith.com>)
List pgsql-patches
On Fri, 2007-07-13 at 16:22 +0100, Heikki Linnakangas wrote:
> Heikki Linnakangas wrote:
> > I have some suggestions which I'll post separately,

> I'm looking for ways to make the patch simpler and less invasive. We may
> want to put back some of this stuff, or come up with a more clever
> solution, in future releases, but right now let's keep it simple.

I believe we're all trying to do that, but I would like to see some
analysis of which techniques are truly effective and which are not.
Simpler code may not give the behaviour we want, and then the whole lot
of code is pointless. Let's make it effective by making it complex
enough. I'm not clear where the optimum lies. (cf. flying buttresses).

> A significant chunk of the complexity and new code in the patch comes
> from pruning hot chains and reusing the space for new updates. Because
> we can't reclaim dead space in the page like a VACUUM does, without
> holding the vacuum lock, we have to deal with pages that contain deleted
> tuples, and be able to reuse them, and keep track of the changes in
> tuple length etc.
>
> A much simpler approach would be to try to acquire the vacuum lock, and
> compact the page the usual way, and fall back to a cold update if we
> can't get the lock immediately.
>
> The obvious downside of that is that if a page is continuously pinned,
> we can't HOT update tuples on it. Keeping in mind that the primary use
> case for HOT is largeish tables (small tables are handled pretty well
> by autovacuum), chances are pretty good that you can get a vacuum lock
> when you need it.
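
[For illustration, the simpler approach quoted above can be sketched as
a toy model in Python. This is not PostgreSQL code: the Page class, its
byte counters and the other_pins field are hypothetical names made up
for this example, and "pinned by someone else" is reduced to a simple
counter. It only shows the control flow: try the vacuum lock without
waiting, compact if we get it, otherwise fall back to a cold update.]

```python
from dataclasses import dataclass


@dataclass
class Page:
    """Toy stand-in for a heap page (hypothetical, not the real layout)."""
    capacity: int        # usable bytes on the page
    live: int = 0        # bytes held by live tuple versions
    dead: int = 0        # bytes held by dead versions not yet reclaimed
    other_pins: int = 0  # pins held by other backends (assumed model)

    def free(self) -> int:
        return self.capacity - self.live - self.dead

    def try_vacuum_lock(self) -> bool:
        # Conditional acquire: never waits, and succeeds only when
        # nobody else has the page pinned.
        return self.other_pins == 0

    def compact(self) -> None:
        # Reclaim all dead space, as a page-level vacuum would.
        self.dead = 0


def update(page: Page, size: int) -> str:
    """Update one tuple of `size` bytes; return "hot" or "cold"."""
    if page.free() < size and page.try_vacuum_lock():
        page.compact()
    if page.free() >= size:
        # New version fits on the same page: live bytes are unchanged,
        # the old version becomes dead space.
        page.dead += size
        return "hot"
    # No room and no lock: the new version goes to another page.
    page.live -= size
    page.dead += size
    return "cold"
```

[With a 100-byte page holding one 40-byte tuple, the first two updates
are hot (the second one only after compacting), but once other_pins is
non-zero the next update goes cold. That last case is the continuously
pinned page discussed in this thread.]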

The main problem HOT seeks to avoid is wasted inserts into indexes, and
the subsequent VACUUMing that requires. Small tables have smaller
indexes, so the VACUUMing is less of a problem. If we have hot
spots in larger tables, DSM (the Dead Space Map) would allow us to avoid
the I/O on the main table, but we would still need to scan the indexes.
So HOT *can* be
better than DSM. I'm worried that requiring the vacuum lock in all cases
will mean that HOT will be ineffective where it is needed most - in the
hot spots - i.e. the blocks that contain frequently updated rows. [As an
aside, in OLTP it is frequently the index blocks that become hot spots,
so reducing index inserts because of UPDATEs will also reduce block
contention.]

Our main test case for OLTP is DBT-2 which follows TPC-C in being
perfectly scalable with no hot spots in the heap and limited hot spots
in the indexes. As such it's a poor test of real world applications,
where Benfold's Law holds true. Requiring the vacuum lock in all cases
would allow good benchmark performance but would probably fail in the
real world at providing good long term performance.

I'm interested in some numbers that show which we need. I'm thinking of
some pg_stats output that shows how many vacuum locks were taken and how
many prunes were made. Something general that allows some beta testers
to provide feedback on the efficacy of the patch.
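
[A rough sketch of the kind of counters meant here, in Python for
brevity. None of these names exist in PostgreSQL or in the patch;
HotStats and all of its fields are hypothetical, and the real thing
would be counters exposed through the stats views. The point is only
which ratios a beta tester would want to report.]

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HotStats:
    """Hypothetical per-table counters for judging HOT's efficacy."""
    vacuum_lock_attempts: int = 0  # conditional lock attempts
    vacuum_locks_taken: int = 0    # attempts that succeeded
    prunes: int = 0                # HOT chain prunes performed
    hot_updates: int = 0           # updates kept on the same page
    cold_updates: int = 0          # updates that needed index inserts

    def lock_success_rate(self) -> Optional[float]:
        # Fraction of vacuum lock attempts that succeeded.
        if self.vacuum_lock_attempts == 0:
            return None
        return self.vacuum_locks_taken / self.vacuum_lock_attempts

    def hot_fraction(self) -> Optional[float]:
        # Fraction of all updates that avoided index inserts.
        total = self.hot_updates + self.cold_updates
        if total == 0:
            return None
        return self.hot_updates / total
```

[A low lock_success_rate() together with a low hot_fraction() on a
frequently updated table would suggest that relying on the vacuum lock
alone is not enough there; a high prune count alongside a high
hot_fraction() would suggest the pruning logic is doing useful work.]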

That leads to the suggestion that we should make the HOT pruning logic
into an add-on patch, commit it, but evaluate its performance during
beta. If we have no clear evidence of additional benefit, we remove it
again.

I'm not in favour of background retail vacuuming by the bgwriter. The
timeliness of that is (similarly) in question and I think bgwriter has
enough work to do already.

[Just as a note to all performance testers: HOT is designed to show
long-term steady performance. Short performance tests frequently show no
benefit if sufficient RAM is available to avoid the table bloat and we
avoid hitting the point where autovacuums kick in. I know Heikki knows
this, just not sure we actually said it.]

--
  Simon Riggs
  EnterpriseDB  http://www.enterprisedb.com

