Thread: WIP patch for hint bit i/o mitigation
Following the sig is a first cut at a patch (written by Atri) that attempts
to mitigate the hint bit i/o penalty when many pages worth of tuples are
sequentially written out with the same transaction id. There have been
other attempts to deal with this problem that fit niche cases (especially
those that create the table in the same transaction as the one inserting)
that work but don't really solve the problem generally.

I previously attacked this problem ([1], [2]) and came up with a patch that
cached hint bits inside tqual.c. The patch was pulled for a few reasons:

1) a lot of complexity without proper justification
2) sketchy cache replacement algorithm
3) I managed to misspell 'committed' just about everywhere
4) invalidation?

Issues 1-3 could have been worked out, but #4 was making me think the
problem was a nonstarter, or at least 'too much too soon'. The tuple
visibility routines are in a very tight code path, and having to deal with
various things in the backend that could cause the xid to become stale was
making me nervous. A smaller, simpler patch might be the ticket.

What's happening here is that we're putting in a 'last xid' cache in the
same manner and style as in transam.c. Unlike transam.c though, we're
caching the infomask bit only, and only if we're safe per XLogNeedsFlush().
While useful in its own right, the transam.c 'last xid' cache is not
sufficient because:

1) While it does guard against a clog fetch, it does nothing to mitigate
hint bit penalties, which are just as bad if not worse since you get a page
write + page header lock during a broad scan.
2) In the very tight loops that crank HeapTupleSatisfiesMVCC, removing the
minimal non-clog based processing (TransactionIdIsInProgress, etc.) helps a
little bit in cpu terms when scanning the table for the very first time.
So given that -- the patch simply adds an extra check when/where hint bit
status is checked in the visibility routines (currently only
HeapTupleSatisfiesMVCC is done, but all the applicable visibility routines
should be done). Basically, the way it works is like this:

*) is the hint bit set?
*) if not, does the examined xid match the last examined one?
*) if so, and the cached hint bit matches the one we want, proceed as if
the hint bit was set

Basically, this adds two tests to the visibility check in the worst case
(zero if the hint bit was set, one if the xid does not match). Pretty
cheap. If you fault through the cached xid and dirty the page, then your
performance shouldn't be worse than the status quo. Also, dealing with a
'last xid' is immune from invalidation issues [3], which is a nice
property.

The most logical place to *set* the cached xid/bit seemed to be in
SetHintBits(). Possible improvements might be to:

*) set the hint bit if the page is dirty, and/or
*) set the hint bit but do not dirty the page upon cache (although this
might have bad interplay with the 'all hint bits must be WAL logged' page
checksum proposal)

Atri benched the patch and came up with the following:

atri@atri-Aspire-4740:~/postgresql-9.2.1/contrib/pgbench$ ./pgbench atri -n -f bench.sql -c 8 -T 120
transaction type: Custom query
number of transactions actually processed: 1433
tps = 11.900653 (including connections establishing)
tps = 11.903127 (excluding connections establishing)

atri@atri-Aspire-4740:~/postgresql-9.2.1/contrib/pgbench$ ./pgbench atri -n -f bench.sql -c 8 -T 120
transaction type: Custom query
number of transactions actually processed: 1769
tps = 14.669422 (including connections establishing)
tps = 14.673167 (excluding connections establishing)

bench.sql:
create temp table foo as select v from generate_series(1,100000) v;
select * from foo;
drop table foo;

merlin notes:
[1] http://archives.postgresql.org/pgsql-hackers/2011-03/msg01765.php
[2]
http://archives.postgresql.org/message-id/BANLkTinr1h1+rKX1UOukn-rAONBTEHQvew@mail.gmail.com
[3] http://postgresql.1045698.n5.nabble.com/Is-cachedFetchXidStatus-provably-valid-td5712454.html

patch:

*** tqual1.c	2012-09-20 03:17:58.000000000 +0530
--- tqual.c	2012-11-06 00:50:39.769409077 +0530
***************
*** 72,81 ****
  SnapshotData SnapshotAnyData = {HeapTupleSatisfiesAny};
  SnapshotData SnapshotToastData = {HeapTupleSatisfiesToast};

  /* local functions */
  static bool XidInMVCCSnapshot(TransactionId xid, Snapshot snapshot);
-
  /*
   * SetHintBits()
   *
--- 72,83 ----
  SnapshotData SnapshotAnyData = {HeapTupleSatisfiesAny};
  SnapshotData SnapshotToastData = {HeapTupleSatisfiesToast};

+ static TransactionId cachedVisibilityXid;
+ static uint16 cachedVisibilityXidStatus;
+
  /* local functions */
  static bool XidInMVCCSnapshot(TransactionId xid, Snapshot snapshot);

  /*
   * SetHintBits()
   *
***************
*** 117,122 ****
--- 119,127 ----
  		if (XLogNeedsFlush(commitLSN) && BufferIsPermanent(buffer))
  			return;				/* not flushed yet, so don't set hint */
+
+ 		cachedVisibilityXid = xid;
+ 		cachedVisibilityXidStatus = infomask;
  	}

  	tuple->t_infomask |= infomask;
***************
*** 904,910 ****
  HeapTupleSatisfiesMVCC(HeapTupleHeader tuple, Snapshot snapshot,
  					   Buffer buffer)
  {
! 	if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED))
  	{
  		if (tuple->t_infomask & HEAP_XMIN_INVALID)
  			return false;
--- 909,917 ----
  HeapTupleSatisfiesMVCC(HeapTupleHeader tuple, Snapshot snapshot,
  					   Buffer buffer)
  {
! 	if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED)
! 		&& !(cachedVisibilityXid == HeapTupleHeaderGetXmin(tuple)
! 			 && cachedVisibilityXidStatus == HEAP_XMIN_COMMITTED))
  	{
  		if (tuple->t_infomask & HEAP_XMIN_INVALID)
  			return false;
***************
*** 1008,1014 ****
  		return true;
  	}

! 	if (!(tuple->t_infomask & HEAP_XMAX_COMMITTED))
  	{
  		if (TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmax(tuple)))
  		{
--- 1015,1023 ----
  		return true;
  	}

! 	if (!(tuple->t_infomask & HEAP_XMAX_COMMITTED)
! 		&& !(cachedVisibilityXid == HeapTupleHeaderGetXmax(tuple)
! 			 && cachedVisibilityXidStatus == HEAP_XMAX_COMMITTED))
  	{
  		if (TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmax(tuple)))
  		{
> From: pgsql-hackers-owner@postgresql.org [mailto:pgsql-hackers-
> owner@postgresql.org] On Behalf Of Merlin Moncure
> Sent: Wednesday, November 07, 2012 5:26 AM
> To: PostgreSQL-development
> Cc: Atri Sharma
> Subject: [HACKERS] WIP patch for hint bit i/o mitigation
>
> Following the sig is a first cut at a patch (written by Atri) that
> attempts to mitigate hint bit i/o penalty when many pages worth of
> tuples are sequentially written out with the same transaction id.
> There have been other attempts to deal with this problem that fit
> niche cases (especially those that create the table in the same
> transaction as the one inserting) that work but don't really solve the
> problem generally.
>
> I previously attacked this problem ([1], [2]) and came up with a patch
> that cached hint bits inside tqual.c. The patch was pulled for a few
> reasons:
>
> 1) a lot of complexity without proper justification
> 2) sketchy cache replacement algorithm
> 3) I managed to misspell 'committed' just about everywhere
> 4) invalidation?
>
> Issues 1-3 could have been worked out but #4 was making me think the
> problem was a nonstarter, or at least, 'too much too soon'. The tuple
> visibility routines are in a very tight code path and having to deal
> with various things in the backend that could cause the xid to become
> stale were making me nervous. A smaller, simpler patch might be the
> ticket.

About invalidation, I think the cached xid can become invalid due to xid
wraparound. So for that, one way could be to invalidate it through vacuum.

Though I am not sure what other things can make the cached xid invalid, I
think once we work out the other ways it can become invalid, we can see if
there is a solution to address them.

With Regards,
Amit Kapila.
On 07-Nov-2012, at 15:46, Amit Kapila <amit.kapila@huawei.com> wrote:

>> From: pgsql-hackers-owner@postgresql.org [mailto:pgsql-hackers-
>> owner@postgresql.org] On Behalf Of Merlin Moncure
>> Sent: Wednesday, November 07, 2012 5:26 AM
>> To: PostgreSQL-development
>> Cc: Atri Sharma
>> Subject: [HACKERS] WIP patch for hint bit i/o mitigation
>>
>> Following the sig is a first cut at a patch (written by Atri) that
>> attempts to mitigate hint bit i/o penalty when many pages worth of
>> tuples are sequentially written out with the same transaction id.
>> There have been other attempts to deal with this problem that fit
>> niche cases (especially those that create the table in the same
>> transaction as the one inserting) that work but don't really solve the
>> problem generally.
>>
>> I previously attacked this problem ([1], [2]) and came up with a patch
>> that cached hint bits inside tqual.c. The patch was pulled for a few
>> reasons:
>>
>> 1) a lot of complexity without proper justification
>> 2) sketchy cache replacement algorithm
>> 3) I managed to misspell 'committed' just about everywhere
>> 4) invalidation?
>>
>> Issues 1-3 could have been worked out but #4 was making me think the
>> problem was a nonstarter, or at least, 'too much too soon'. The tuple
>> visibility routines are in a very tight code path and having to deal
>> with various things in the backend that could cause the xid to become
>> stale were making me nervous. A smaller, simpler patch might be the
>> ticket.
>
> About invalidation, I think the cached xid can become invalid due to xid
> wraparound.
> So for that one way could be to invalidate it through Vacuum.
>
> Though I am not sure what all other things can make cached id as invalid,
> but I think once we can think what other ways can make cached id invalid,
> then we can see if there is a solution to address them.
>
> With Regards,
> Amit Kapila.

As we are now dealing with only the last xid (please refer to the details
of the patch attached), the invalidation issues are not significant any
more.

Regards,

Atri
> -----Original Message-----
> From: Atri Sharma [mailto:atri.jiit@gmail.com]
> Sent: Wednesday, November 07, 2012 4:02 PM
> To: Amit Kapila
> Cc: Merlin Moncure; PostgreSQL-development
> Subject: Re: [HACKERS] WIP patch for hint bit i/o mitigation
>
> On 07-Nov-2012, at 15:46, Amit Kapila <amit.kapila@huawei.com> wrote:
>
> >> From: pgsql-hackers-owner@postgresql.org [mailto:pgsql-hackers-
> >> owner@postgresql.org] On Behalf Of Merlin Moncure
> >> Sent: Wednesday, November 07, 2012 5:26 AM
> >> To: PostgreSQL-development
> >> Cc: Atri Sharma
> >> Subject: [HACKERS] WIP patch for hint bit i/o mitigation
> >>
> >> Following the sig is a first cut at a patch (written by Atri) that
> >> attempts to mitigate hint bit i/o penalty when many pages worth of
> >> tuples are sequentially written out with the same transaction id.
> >> There have been other attempts to deal with this problem that fit
> >> niche cases (especially those that create the table in the same
> >> transaction as the one inserting) that work but don't really solve
> the
> >> problem generally.
> As we are now dealing with only the last xid (please refer to the details
> of the patch attached), the invalidation issues are not significant any
> more.

I think you are right, but please check below the scenario I have in mind
due to which I got confused:
Session -1
1. Let's say for tup-1 on a page, cachedVisibilityXid is not set, and we go
inside SetHintBits() and set it to the xid of the tuple, which is, let's
say, 708.
2. Now, for all consecutive tuples which have the same xmin (708), we can
directly refer to the cached xid and cached status, even if the hint bit is
not set.
Other Sessions
3. Now, from other sessions, large operations happen on all other tables
except this table.
4. The situation can be reached where the xid wraps around.
Session -1
5. It again tries to query the same table; at this point it will compare
the xid in the tuple with cachedVisibilityXid.

Now, if all the tuples' xids for that particular table are frozen in all
cases, then it seems to be okay; otherwise it might be a problem.

I am not fully aware of the wraparound and frozen-xid concepts; that's why
I doubted that it might create a problem. On further thought, it appears
that I was wrong.
With Regards,
Amit Kapila.
AFAIK, xids are managed relative to reference xids, which gives a range of
+/- 2 billion xids. Once this limit is reached, the reference xids are
moved forward, and the xids that do not fall within the reference xid +/- 2
billion range are frozen.

Hence, in the given scenario, I believe that once the wraparound happens,
since the xmin is the same for all the tuples in session-1's table, there
should be no problem, and all tuple xids for that particular table will be
frozen.
Atri
--
On Wed, Nov 7, 2012 at 6:01 AM, Amit Kapila <amit.kapila@huawei.com> wrote:
>> >> Following the sig is a first cut at a patch (written by Atri) that
>> >> attempts to mitigate hint bit i/o penalty when many pages worth of
>> >> tuples are sequentially written out with the same transaction id.
>> >> There have been other attempts to deal with this problem that fit
>> >> niche cases (especially those that create the table in the same
>> >> transaction as the one inserting) that work but don't really solve
>> the
>> >> problem generally.
>>
>> As we are now dealing with only the last xid (please refer to the details
>> of the patch attached), the invalidation issues are not significant any
>> more.
>
> I think you are right, but please check below the scenario I have in mind
> due to which I got confused:
>
> Session -1
> 1. let's say for tup-1 on page, cachedVisibilityXid is not set, and it go
> inside SetHintBits and set it to xid of tuple which is let's say = 708
> 2. now for all consecutive tuples which have same xmin (708), it can
> directly refer cached id and cached status, even if hint bit is not set.
>
> Other Sessions
> 3. now from other sessions, large operations happened on all other tables
> except this table.
> 4. The situation can reach where xid can wrap around.
>
> Session -1
> 5. It again tries to query the same table, at this point it will compare
> the xid in tuple with cachedVisibilityXid.
>
> Now if all tuple's xid's for that particular table are frozen in all cases
> then it seems to be okay, otherwise it might be problem.
> I am not fully aware about this wrap around and frozen xid concept, thats
> why I had doubted that it might create problem, on further thought, it
> appears that I was wrong.

Well there's that. But more to the point, for the cache to fail you'd have
to have a scenario where a table didn't scan any records for 1 billion+
transactions. See note [3] above for reasoning why this is implausible.
Also, we're already relying on this effect in transam.c.

merlin
PFA below the sig the updated patch for the same.

It maintains a cache, cachedVisibilityXid, which holds the result of the
clog visibility check; cachedVisibilityXidStatus holds the commit status of
the XID in cachedVisibilityXid.

In each visibility function (except HeapTupleSatisfiesVacuum()), an
additional check has been added to see whether the commit status of the
Xmin or Xmax of a tuple can be retrieved from the cache.
So, in place of
if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED))
the condition is now
if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED)
&& !(cachedVisibilityXid == HeapTupleHeaderGetXmin(tuple)
&& cachedVisibilityXidStatus == HEAP_XMIN_COMMITTED))
This checks if the commit status can be known from the cache or not before proceeding.
I will be posting the patch to commit fest.
Thoughts/Feedback?
Atri
--
patch:
*** tqual--unchanged.c 2012-09-20 03:17:58.000000000 +0530
--- tqual.c 2012-11-14 23:27:30.470499857 +0530
***************
*** 75,80 ****
--- 75,88 ----
/* local functions */
static bool XidInMVCCSnapshot(TransactionId xid, Snapshot snapshot);
+ /*
+ * Single-item cache for results of clog visibility check. It's worth having
+ * such a cache to help reduce the amount of hint bit traffic when
+ * many sequentially touched tuples have the same XID.
+ */
+ static TransactionId cachedVisibilityXid;
+ /* Visibility status cache stores the commit status of the XID in cachedVisibilityXid */
+ static uint16 cachedVisibilityXidStatus;
/*
* SetHintBits()
***************
*** 117,122 ****
--- 125,133 ----
if (XLogNeedsFlush(commitLSN) && BufferIsPermanent(buffer))
return; /* not flushed yet, so don't set hint */
+
+ cachedVisibilityXid = xid;
+ cachedVisibilityXidStatus = infomask;
}
tuple->t_infomask |= infomask;
***************
*** 164,170 ****
bool
HeapTupleSatisfiesSelf(HeapTupleHeader tuple, Snapshot snapshot, Buffer buffer)
{
! if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED))
{
if (tuple->t_infomask & HEAP_XMIN_INVALID)
return false;
--- 175,183 ----
bool
HeapTupleSatisfiesSelf(HeapTupleHeader tuple, Snapshot snapshot, Buffer buffer)
{
! if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED)
! && !(cachedVisibilityXid == HeapTupleHeaderGetXmin(tuple)
! && cachedVisibilityXidStatus == HEAP_XMIN_COMMITTED))
{
if (tuple->t_infomask & HEAP_XMIN_INVALID)
return false;
***************
*** 247,253 ****
if (tuple->t_infomask & HEAP_XMAX_INVALID) /* xid invalid or aborted */
return true;
! if (tuple->t_infomask & HEAP_XMAX_COMMITTED)
{
if (tuple->t_infomask & HEAP_IS_LOCKED)
return true;
--- 260,268 ----
if (tuple->t_infomask & HEAP_XMAX_INVALID) /* xid invalid or aborted */
return true;
! if ((tuple->t_infomask & HEAP_XMAX_COMMITTED)
! || (cachedVisibilityXid == HeapTupleHeaderGetXmax(tuple)
! && cachedVisibilityXidStatus == HEAP_XMAX_COMMITTED))
{
if (tuple->t_infomask & HEAP_IS_LOCKED)
return true;
***************
*** 327,333 ****
bool
HeapTupleSatisfiesNow(HeapTupleHeader tuple, Snapshot snapshot, Buffer buffer)
{
! if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED))
{
if (tuple->t_infomask & HEAP_XMIN_INVALID)
return false;
--- 342,350 ----
bool
HeapTupleSatisfiesNow(HeapTupleHeader tuple, Snapshot snapshot, Buffer buffer)
{
! if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED)
! && !(cachedVisibilityXid == HeapTupleHeaderGetXmin(tuple)
! && cachedVisibilityXidStatus == HEAP_XMIN_COMMITTED))
{
if (tuple->t_infomask & HEAP_XMIN_INVALID)
return false;
***************
*** 416,422 ****
if (tuple->t_infomask & HEAP_XMAX_INVALID) /* xid invalid or aborted */
return true;
! if (tuple->t_infomask & HEAP_XMAX_COMMITTED)
{
if (tuple->t_infomask & HEAP_IS_LOCKED)
return true;
--- 433,441 ----
if (tuple->t_infomask & HEAP_XMAX_INVALID) /* xid invalid or aborted */
return true;
! if ((tuple->t_infomask & HEAP_XMAX_COMMITTED)
! || (cachedVisibilityXid == HeapTupleHeaderGetXmax(tuple)
! && cachedVisibilityXidStatus == HEAP_XMAX_COMMITTED))
{
if (tuple->t_infomask & HEAP_IS_LOCKED)
return true;
***************
*** 493,499 ****
HeapTupleSatisfiesToast(HeapTupleHeader tuple, Snapshot snapshot,
Buffer buffer)
{
! if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED))
{
if (tuple->t_infomask & HEAP_XMIN_INVALID)
return false;
--- 512,520 ----
HeapTupleSatisfiesToast(HeapTupleHeader tuple, Snapshot snapshot,
Buffer buffer)
{
! if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED)
! && !(cachedVisibilityXid == HeapTupleHeaderGetXmin(tuple)
! && cachedVisibilityXidStatus == HEAP_XMIN_COMMITTED))
{
if (tuple->t_infomask & HEAP_XMIN_INVALID)
return false;
***************
*** 574,580 ****
HeapTupleSatisfiesUpdate(HeapTupleHeader tuple, CommandId curcid,
Buffer buffer)
{
! if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED))
{
if (tuple->t_infomask & HEAP_XMIN_INVALID)
return HeapTupleInvisible;
--- 595,603 ----
HeapTupleSatisfiesUpdate(HeapTupleHeader tuple, CommandId curcid,
Buffer buffer)
{
! if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED)
! && !(cachedVisibilityXid == HeapTupleHeaderGetXmin(tuple)
! && cachedVisibilityXidStatus == HEAP_XMIN_COMMITTED))
{
if (tuple->t_infomask & HEAP_XMIN_INVALID)
return HeapTupleInvisible;
***************
*** 663,669 ****
if (tuple->t_infomask & HEAP_XMAX_INVALID) /* xid invalid or aborted */
return HeapTupleMayBeUpdated;
! if (tuple->t_infomask & HEAP_XMAX_COMMITTED)
{
if (tuple->t_infomask & HEAP_IS_LOCKED)
return HeapTupleMayBeUpdated;
--- 686,694 ----
if (tuple->t_infomask & HEAP_XMAX_INVALID) /* xid invalid or aborted */
return HeapTupleMayBeUpdated;
! if ((tuple->t_infomask & HEAP_XMAX_COMMITTED)
! || (cachedVisibilityXid == HeapTupleHeaderGetXmax(tuple)
! && cachedVisibilityXidStatus == HEAP_XMAX_COMMITTED))
{
if (tuple->t_infomask & HEAP_IS_LOCKED)
return HeapTupleMayBeUpdated;
***************
*** 743,749 ****
{
snapshot->xmin = snapshot->xmax = InvalidTransactionId;
! if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED))
{
if (tuple->t_infomask & HEAP_XMIN_INVALID)
return false;
--- 768,776 ----
{
snapshot->xmin = snapshot->xmax = InvalidTransactionId;
! if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED)
! && !(cachedVisibilityXid == HeapTupleHeaderGetXmin(tuple)
! && cachedVisibilityXidStatus == HEAP_XMIN_COMMITTED))
{
if (tuple->t_infomask & HEAP_XMIN_INVALID)
return false;
***************
*** 830,836 ****
if (tuple->t_infomask & HEAP_XMAX_INVALID) /* xid invalid or aborted */
return true;
! if (tuple->t_infomask & HEAP_XMAX_COMMITTED)
{
if (tuple->t_infomask & HEAP_IS_LOCKED)
return true;
--- 857,865 ----
if (tuple->t_infomask & HEAP_XMAX_INVALID) /* xid invalid or aborted */
return true;
! if ((tuple->t_infomask & HEAP_XMAX_COMMITTED)
! || (cachedVisibilityXid == HeapTupleHeaderGetXmax(tuple)
! && cachedVisibilityXidStatus == HEAP_XMAX_COMMITTED))
{
if (tuple->t_infomask & HEAP_IS_LOCKED)
return true;
***************
*** 904,910 ****
HeapTupleSatisfiesMVCC(HeapTupleHeader tuple, Snapshot snapshot,
Buffer buffer)
{
! if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED))
{
if (tuple->t_infomask & HEAP_XMIN_INVALID)
return false;
--- 933,941 ----
HeapTupleSatisfiesMVCC(HeapTupleHeader tuple, Snapshot snapshot,
Buffer buffer)
{
! if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED)
! && !(cachedVisibilityXid == HeapTupleHeaderGetXmin(tuple)
! && cachedVisibilityXidStatus == HEAP_XMIN_COMMITTED))
{
if (tuple->t_infomask & HEAP_XMIN_INVALID)
return false;
***************
*** 1008,1014 ****
return true;
}
! if (!(tuple->t_infomask & HEAP_XMAX_COMMITTED))
{
if (TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmax(tuple)))
{
--- 1039,1047 ----
return true;
}
! if (!(tuple->t_infomask & HEAP_XMAX_COMMITTED)
! && !(cachedVisibilityXid == HeapTupleHeaderGetXmax(tuple)
! && cachedVisibilityXidStatus == HEAP_XMAX_COMMITTED))
{
if (TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmax(tuple)))
{
***************
*** 1240,1246 ****
* invalid, then we assume it's still alive (since the presumption is that
* all relevant hint bits were just set moments ago).
*/
! if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED))
return (tuple->t_infomask & HEAP_XMIN_INVALID) != 0 ? true : false;
/*
--- 1273,1281 ----
* invalid, then we assume it's still alive (since the presumption is that
* all relevant hint bits were just set moments ago).
*/
! if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED)
! && !(cachedVisibilityXid == HeapTupleHeaderGetXmin(tuple)
! && cachedVisibilityXidStatus == HEAP_XMIN_COMMITTED))
return (tuple->t_infomask & HEAP_XMIN_INVALID) != 0 ? true : false;
/*
***************
*** 1253,1259 ****
return false;
/* If deleter isn't known to have committed, assume it's still running. */
! if (!(tuple->t_infomask & HEAP_XMAX_COMMITTED))
return false;
/* Deleter committed, so tuple is dead if the XID is old enough. */
--- 1288,1296 ----
return false;
/* If deleter isn't known to have committed, assume it's still running. */
! if (!(tuple->t_infomask & HEAP_XMAX_COMMITTED)
! && !(cachedVisibilityXid == HeapTupleHeaderGetXmax(tuple)
! && cachedVisibilityXidStatus == HEAP_XMAX_COMMITTED))
return false;
/* Deleter committed, so tuple is dead if the XID is old enough. */
PFA below the sig the updated patch for the same.It maintains a cache cachedVisibilityXid which holds the results of clog visibility check. cachedVisibilityXidStatus holds the commit status of the XID in cachedVisibilityXid.On Wed, Nov 7, 2012 at 9:47 PM, Merlin Moncure <mmoncure@gmail.com> wrote:Well there's that. But more to the point for the cache to fail you'dOn Wed, Nov 7, 2012 at 6:01 AM, Amit Kapila <amit.kapila@huawei.com> wrote:
>> >> Following the sig is a first cut at a patch (written by Atri) that
>> >> attempts to mitigate hint bit i/o penalty when many pages worth of
>> >> tuples are sequentially written out with the same transaction id.
>> >> There have been other attempts to deal with this problem that fit
>> >> niche cases (especially those that create the table in the same
>> >> transaction as the one inserting) that work but don't really solve
>> the
>> >> problem generally.
>>
>> As we are now dealing with only the last xid(please refer to the details
>> of the patch attached), the invalidation issues are not significant any
>> more.
>
> I think you are right, but please check below the scenario I have in mind
> due to which I got confused:
>
> Session -1
> 1. let's say for tup-1 on page, cachedVisibilityXid is not set, and it go
> inside SetHinbits and set it to xid of tuple which is let's say = 708
> 2. now for all consecutive tuples which have same xmin (708), it can
> directly refer
> cached id and cached status, even if hint bit is not set.
>
> Other Sessions
> 3. now from other sessions, large operations happened on all other tables
> except this table.
> 4. The situation can reach where xid can wrap around.
>
> Session -1
> 5. It again tries to query the same table, at this point it will compare
> the xid in tuple with cachedVisibilityXid.
>
> Now if all tuple's xid's for that particular table are frozen in all cases
> then it seems to be okay, otherwise it might be problem.
> I am not fully aware about this wrap around and frozed xid concept, thats
> why I had doubted
> that it might create problem, on further thought, it appears that I was
> wrong.
have to have a scenario where a table didn't scan any records for 1
billion+ transactions. See note [3] above for reasoning why this is
implausible. Also we're already relying on this effect in transam.c.
merlin
In each visibility function (except HeapTupleSatisfiesVacuum() ), an addition check has been added to check if the commit status of Xmin or Xmax of a tuple can be retrieved from the cache.
So, in place ofthe condition is now
if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED))&& !(cachedVisibilityXid == HeapTupleHeaderGetXmin(tuple)
if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED)
&& cachedVisibilityXidStatus == HEAP_XMIN_COMMITTED))
This checks if the commit status can be known from the cache or not before proceeding.
I will be posting the patch to commit fest.
Thoughts/Feedback?
Atri
--Regards,Atril'apprenant
patch:
*** tqual--unchanged.c 2012-09-20 03:17:58.000000000 +0530
--- tqual.c 2012-11-14 23:27:30.470499857 +0530
***************
*** 75,80 ****
--- 75,88 ----+ /*
/* local functions */
static bool XidInMVCCSnapshot(TransactionId xid, Snapshot snapshot);
+ * Single-item cache for results of clog visibility check. It's worth having
+ * such a cache to help reduce the amount of hint bit traffic when
+ * many sequentially touched tuples have the same XID.
+ */
+ static TransactionId cachedVisibilityXid;
+ /* Visibility status cache stores the commit status of the XID in cachedVisibilityXid */
+ static uint16 cachedVisibilityXidStatus;
/*
* SetHintBits()
***************
*** 117,122 ****
--- 125,133 ----
      if (XLogNeedsFlush(commitLSN) && BufferIsPermanent(buffer))
          return;             /* not flushed yet, so don't set hint */
+
+     cachedVisibilityXid = xid;
+     cachedVisibilityXidStatus = infomask;
  }

  tuple->t_infomask |= infomask;
***************
*** 164,170 ****
  bool
  HeapTupleSatisfiesSelf(HeapTupleHeader tuple, Snapshot snapshot, Buffer buffer)
  {
!     if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED))
      {
          if (tuple->t_infomask & HEAP_XMIN_INVALID)
              return false;
--- 175,183 ----
  bool
  HeapTupleSatisfiesSelf(HeapTupleHeader tuple, Snapshot snapshot, Buffer buffer)
  {
!     if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED)
!         && !(cachedVisibilityXid == HeapTupleHeaderGetXmin(tuple)
!              && cachedVisibilityXidStatus == HEAP_XMIN_COMMITTED))
      {
          if (tuple->t_infomask & HEAP_XMIN_INVALID)
              return false;
***************
*** 247,253 ****
if (tuple->t_infomask & HEAP_XMAX_INVALID) /* xid invalid or aborted */
return true;
! if (tuple->t_infomask & HEAP_XMAX_COMMITTED)
{
if (tuple->t_infomask & HEAP_IS_LOCKED)
return true;
--- 260,268 ----
if (tuple->t_infomask & HEAP_XMAX_INVALID) /* xid invalid or aborted */
return true;
! if ((tuple->t_infomask & HEAP_XMAX_COMMITTED)
!     || (cachedVisibilityXid == HeapTupleHeaderGetXmax(tuple)
!         && cachedVisibilityXidStatus == HEAP_XMAX_COMMITTED))
  {
      if (tuple->t_infomask & HEAP_IS_LOCKED)
          return true;
***************
*** 327,333 ****
bool
  HeapTupleSatisfiesNow(HeapTupleHeader tuple, Snapshot snapshot, Buffer buffer)
  {
!     if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED))
      {
          if (tuple->t_infomask & HEAP_XMIN_INVALID)
              return false;
--- 342,350 ----
  bool
  HeapTupleSatisfiesNow(HeapTupleHeader tuple, Snapshot snapshot, Buffer buffer)
  {
!     if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED)
!         && !(cachedVisibilityXid == HeapTupleHeaderGetXmin(tuple)
!              && cachedVisibilityXidStatus == HEAP_XMIN_COMMITTED))
      {
          if (tuple->t_infomask & HEAP_XMIN_INVALID)
              return false;
***************
*** 416,422 ****
if (tuple->t_infomask & HEAP_XMAX_INVALID) /* xid invalid or aborted */
return true;
! if (tuple->t_infomask & HEAP_XMAX_COMMITTED)
{
if (tuple->t_infomask & HEAP_IS_LOCKED)
return true;
--- 433,441 ----
if (tuple->t_infomask & HEAP_XMAX_INVALID) /* xid invalid or aborted */
return true;
! if ((tuple->t_infomask & HEAP_XMAX_COMMITTED)
!     || (cachedVisibilityXid == HeapTupleHeaderGetXmax(tuple)
!         && cachedVisibilityXidStatus == HEAP_XMAX_COMMITTED))
  {
      if (tuple->t_infomask & HEAP_IS_LOCKED)
          return true;
***************
*** 493,499 ****
  HeapTupleSatisfiesToast(HeapTupleHeader tuple, Snapshot snapshot,
                          Buffer buffer)
  {
!     if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED))
      {
          if (tuple->t_infomask & HEAP_XMIN_INVALID)
              return false;
--- 512,520 ----
  HeapTupleSatisfiesToast(HeapTupleHeader tuple, Snapshot snapshot,
                          Buffer buffer)
  {
!     if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED)
!         && !(cachedVisibilityXid == HeapTupleHeaderGetXmin(tuple)
!              && cachedVisibilityXidStatus == HEAP_XMIN_COMMITTED))
      {
          if (tuple->t_infomask & HEAP_XMIN_INVALID)
              return false;
***************
*** 574,580 ****
  HeapTupleSatisfiesUpdate(HeapTupleHeader tuple, CommandId curcid,
                           Buffer buffer)
  {
!     if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED))
      {
          if (tuple->t_infomask & HEAP_XMIN_INVALID)
              return HeapTupleInvisible;
--- 595,603 ----
  HeapTupleSatisfiesUpdate(HeapTupleHeader tuple, CommandId curcid,
                           Buffer buffer)
  {
!     if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED)
!         && !(cachedVisibilityXid == HeapTupleHeaderGetXmin(tuple)
!              && cachedVisibilityXidStatus == HEAP_XMIN_COMMITTED))
      {
          if (tuple->t_infomask & HEAP_XMIN_INVALID)
              return HeapTupleInvisible;
***************
*** 663,669 ****
if (tuple->t_infomask & HEAP_XMAX_INVALID) /* xid invalid or aborted */
return HeapTupleMayBeUpdated;
! if (tuple->t_infomask & HEAP_XMAX_COMMITTED)
{
if (tuple->t_infomask & HEAP_IS_LOCKED)
return HeapTupleMayBeUpdated;
--- 686,694 ----
if (tuple->t_infomask & HEAP_XMAX_INVALID) /* xid invalid or aborted */
return HeapTupleMayBeUpdated;
! if ((tuple->t_infomask & HEAP_XMAX_COMMITTED)
!     || (cachedVisibilityXid == HeapTupleHeaderGetXmax(tuple)
!         && cachedVisibilityXidStatus == HEAP_XMAX_COMMITTED))
  {
      if (tuple->t_infomask & HEAP_IS_LOCKED)
          return HeapTupleMayBeUpdated;
***************
*** 743,749 ****
  {
      snapshot->xmin = snapshot->xmax = InvalidTransactionId;

!     if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED))
      {
          if (tuple->t_infomask & HEAP_XMIN_INVALID)
              return false;
--- 768,776 ----
  {
      snapshot->xmin = snapshot->xmax = InvalidTransactionId;

!     if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED)
!         && !(cachedVisibilityXid == HeapTupleHeaderGetXmin(tuple)
!              && cachedVisibilityXidStatus == HEAP_XMIN_COMMITTED))
      {
          if (tuple->t_infomask & HEAP_XMIN_INVALID)
              return false;
***************
*** 830,836 ****
if (tuple->t_infomask & HEAP_XMAX_INVALID) /* xid invalid or aborted */
return true;
! if (tuple->t_infomask & HEAP_XMAX_COMMITTED)
{
if (tuple->t_infomask & HEAP_IS_LOCKED)
return true;
--- 857,865 ----
if (tuple->t_infomask & HEAP_XMAX_INVALID) /* xid invalid or aborted */
return true;
! if ((tuple->t_infomask & HEAP_XMAX_COMMITTED)
!     || (cachedVisibilityXid == HeapTupleHeaderGetXmax(tuple)
!         && cachedVisibilityXidStatus == HEAP_XMAX_COMMITTED))
  {
      if (tuple->t_infomask & HEAP_IS_LOCKED)
          return true;
--- 933,941 ----
***************
*** 904,910 ****
HeapTupleSatisfiesMVCC(HeapTupleHeader tuple, Snapshot snapshot,
Buffer buffer)
{
! if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED))
{
if (tuple->t_infomask & HEAP_XMIN_INVALID)
          return false;
--- 1039,1047 ----
HeapTupleSatisfiesMVCC(HeapTupleHeader tuple, Snapshot snapshot,
Buffer buffer)
{
! if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED)
! && !(cachedVisibilityXid == HeapTupleHeaderGetXmin(tuple)
! && cachedVisibilityXidStatus == HEAP_XMIN_COMMITTED))
{
if (tuple->t_infomask & HEAP_XMIN_INVALID)
return false;
***************
*** 1008,1014 ****
return true;
}
! if (!(tuple->t_infomask & HEAP_XMAX_COMMITTED))
{
if (TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmax(tuple)))
      {
***************
return true;
}
! if (!(tuple->t_infomask & HEAP_XMAX_COMMITTED)
! && !(cachedVisibilityXid == HeapTupleHeaderGetXmax(tuple)
! && cachedVisibilityXidStatus == HEAP_XMAX_COMMITTED))
{
if (TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmax(tuple)))
{
*** 1240,1246 ****
* invalid, then we assume it's still alive (since the presumption is that
* all relevant hint bits were just set moments ago).
   */
!     if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED))
          return (tuple->t_infomask & HEAP_XMIN_INVALID) != 0 ? true : false;
/*
--- 1273,1281 ----
* invalid, then we assume it's still alive (since the presumption is that
* all relevant hint bits were just set moments ago).
   */
!     if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED)
!         && !(cachedVisibilityXid == HeapTupleHeaderGetXmin(tuple)
!              && cachedVisibilityXidStatus == HEAP_XMIN_COMMITTED))
          return (tuple->t_infomask & HEAP_XMIN_INVALID) != 0 ? true : false;
/*
***************
*** 1253,1259 ****
          return false;

      /* If deleter isn't known to have committed, assume it's still running. */
!     if (!(tuple->t_infomask & HEAP_XMAX_COMMITTED))
          return false;

      /* Deleter committed, so tuple is dead if the XID is old enough. */
--- 1288,1296 ----
          return false;

      /* If deleter isn't known to have committed, assume it's still running. */
!     if (!(tuple->t_infomask & HEAP_XMAX_COMMITTED)
!         && !(cachedVisibilityXid == HeapTupleHeaderGetXmax(tuple)
!              && cachedVisibilityXidStatus == HEAP_XMAX_COMMITTED))
          return false;

      /* Deleter committed, so tuple is dead if the XID is old enough. */
Attached is the patch for review.
Atri
--
On Thursday, November 15, 2012 2:02 AM Atri Sharma wrote:
On Thu, Nov 15, 2012 at 12:42 AM, Atri Sharma <atri.jiit@gmail.com> wrote:
On Wed, Nov 7, 2012 at 9:47 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
On Wed, Nov 7, 2012 at 6:01 AM, Amit Kapila <amit.kapila@huawei.com> wrote:
>> >> Following the sig is a first cut at a patch (written by Atri) that
> PFA below the sig the updated patch for the same. It maintains a cache cachedVisibilityXid which holds the results of the clog visibility check. cachedVisibilityXidStatus
> holds the commit status of the XID in cachedVisibilityXid.
> In each visibility function (except HeapTupleSatisfiesVacuum()), an additional check has been added to check if the commit status of Xmin or Xmax of a tuple can be retrieved from the cache.
1. From your explanation and code, it is quite clear that it will certainly give performance benefits in the scenarios mentioned by you.
I can validate the performance numbers again and do the code review for this patch during CF-3.
However, I am just not very sure whether the use case is sufficient.
So I would like to ask opinion of other people as well.
2. After this patch, the tuple hint bit is not set by SELECT operations after data is populated by one transaction.
This appears to be good as it will save many ops (page dirty followed by flush, clog inquiry).
Though it is not apparent, we should see whether it can cause any other impact, such as:
a. Now VACUUM may need to set the hint bit, which may cause more I/O during vacuum.
Hackers, any opinion/suggestions about the use case?
With Regards,
Amit Kapila.
On Thu, Nov 15, 2012 at 4:39 AM, Amit Kapila <amit.kapila@huawei.com> wrote:
>> In each visibility function (except HeapTupleSatisfiesVacuum()), an
>> additional check has been added to check if the commit status of Xmin or Xmax
>> of a tuple can be retrieved from the cache.
>
> 1. From your explanation and code, it is quite clear that it will
> certainly give performance benefits in the scenarios mentioned by you.
> I can validate the performance numbers again and do the code
> review for this patch during CF-3.
> However I am just not very sure whether the use case is sufficient.
> So I would like to ask opinion of other people as well.

sure. I'd like to note though that hint bit i/o is a somewhat common
complaint. it tends to most affect OLAP style workloads. in
pathological workloads, it can really burn you -- it's not fun when
you are i/o starved via sequential scan. This can still happen when
sweeping dead records (which this patch doesn't deal with, though
maybe it should).

> 2. After this patch, the tuple hint bit is not set by Select operations after
> data is populated by one transaction.
> This appears to be good as it will save many ops (page dirty followed
> by flush, clog inquiry).

Technically it does not save the clog fetch, as transam.c has a very
similar cache mechanism. However, it does save a page write i/o and a
lock on the page header, as well as a couple of other minor things.
In the best case, the page write is completely masked as the page gets
dirty for other reasons. I think this is going to become more
important in checksum enabled scenarios.

> Though it is not apparent, we should see whether it can cause
> any other impact due to this:
> a. like maybe now VACUUM needs to set the hint bit, which may cause more
> I/O during Vacuum.

IMNSHO, deferring non-critical i/o from foreground process to
background process is generally good. VACUUM has nice features like
i/o throttling and in-place cancel, so latent management of internal
page optimization flags really belongs there ideally. Also, the longer
you defer such I/O the more opportunity there is for it to be masked
off by some other page dirtying operation (again, this is more
important in the face of having to log hint bit changes).

There could be some good rebuttal analysis though.

merlin
On Thursday, November 15, 2012 9:27 PM Merlin Moncure wrote: > On Thu, Nov 15, 2012 at 4:39 AM, Amit Kapila <amit.kapila@huawei.com> > wrote: > >>In each visibility function (except HeapTupleSatisfiesVacuum() ), an > >> addition check has been added to check if the commit status of Xmin > or Xmax > >> of a tuple can be >retrieved from the cache. > > > > > > > > 1. From your explanation and code, it is quite clear that it will > > certainly give performance benefits in the scenario's mentioned by > you. > > > > I can once validate the performance numbers again and do the > code > > review for this patch during CF-3. > > > > However I am just not very sure about the use case, such that > whether > > it is a sufficient use case. > > > > So I would like to ask opinion of other people as well. > > sure. I'd like to note though that hint bit i/o is a somewhat common > complaint. it tends to most affect OLAP style workloads. in > pathological workloads, it can really burn you -- it's not fun when > you are i/o starved via sequential scan. This can still happen when > sweeping dead records (which this patch doesn't deal with, though > maybe it should). > > > 2. After this patch, tuple hint bit is not set by Select operations > after > > data populated by one transaction. > > > > This appears to be good as it will save many ops (page dirty > followed > > by flush , clog inquiry). > > Technically it does not save clog fetch as transam.c has a very > similar cache mechanism. However, it does save a page write i/o and a > lock on the page header, as well as a couple of other minor things. > In the best case, the page write is completely masked as the page gets > dirty for other reasons. I think this is going to become more > important in checksum enabled scenarios. > > > Though it is no apparent, however we should see whether it can > cause > > any other impact due to this: > > > > a. like may be now VACUUM needs set the hint bit which may > cause more > > I/O during Vacuum. 
>> IMNSHO, deferring non-critical i/o from foreground process to
>> background process is generally good.

Yes, in regard of deferring you are right.
However, in this case, when the foreground process has to mark the page dirty
due to a hint bit, the page may already be dirty, so there is no extra I/O; but
when it is done by VACUUM, the page may not be dirty.
Also, due to the below points, doing it in VACUUM may cost more:
a. VACUUM has a ring-buffer of fixed size, and if there are many such pages then
VACUUM needs to write out a page to replace an existing page in the ring.
b. Considering that sometimes people want VACUUM to run when the system is not
busy, the chances of generating more overall I/O in the system can be higher.
Why can't we avoid setting hint bits in VACUUM? Is it because it has to be done
somewhere, so that hint bits are properly set? Or maybe I am missing something
trivial? Though the VACUUM case I am talking about occurs rarely in practice,
the point came to my mind, so I thought of sharing it with you.

> VACUUM has nice features like
> i/o throttling and in place cancel so latent management of internal
> page optimization flags really belong there ideally. Also, the longer
> you defer such I/O the more opportunity there is for it to be masked
> off by some other page dirtying operation (again, this is more
> important in the face of having to log hint bit changes).
>
> There could be some good rebuttal analysis though.

With Regards,
Amit Kapila.
On Thu, Nov 15, 2012 at 10:25 AM, Amit Kapila <amit.kapila@huawei.com> wrote:
>> IMNSHO. deferring non-critical i/o from foreground process to
>> background process is generally good.
>
> Yes, in regard of deferring you are right.
> However in this case may be when foreground process has to mark page dirty
> due to hint-bit, it was already dirty so no extra I/O, but when it is done
> by VACUUM, page may not be dirty.

Yeah. We can try to be smart and set the hint bits in that case. Not
sure that will work out with checksum having to wal log hint bits
though (which by reading the checksum threads seems likely to happen).

> Also due to below points, doing it in VACUUM may cost more:
> a. VACUUM has ring-buffer of fixed size and if such pages are many then
> write of page needs to be done by VACUUM to replace existing page
> in ring.

Sure, although in scans we are using ring buffer as well so in
practical sense the results are pretty close.

> b. Considering sometimes people want VACUUM to run when system is not busy,
> the chances of generating more overall I/O in system can be
> more.

It's hard to imagine overall i/o load increasing. However, longer
vacuum times should be considered. A bigger issue is that we are
missing an early opportunity to set the all visible bit. vacuum is
doing:

if (all_visible)
{
    TransactionId xmin;

    if (!(tuple.t_data->t_infomask & HEAP_XMIN_COMMITTED))
    {
        all_visible = false;
        break;
    }

assuming the cache is working and vacuum rolls around after a scan,
you lost the opportunity to set the all_visible flag where previously
it would have been set, thereby dismantling the positive effect of an
index only scan. AFAICT, this is the only case where vacuum is
directly interfacing with hint bits. This could be construed as a
violation of the heapam API? Maybe if that's an issue we could proxy
that check to a heaptuple/tqual.c maintained function (in the same
manner as HeapTupleSetHintBits) so that the cache bit could be
uniformly checked.

All other *setting* of hint bits is running through SetHintBits(), so
I think we are safe from the vacuum point of view. That's another
thing to test for though.

merlin
On Tue, 2012-11-06 at 17:55 -0600, Merlin Moncure wrote:
> So given that -- the patch simply adds an extra check when/where hint
> bit status is checked in the visibility routines (currently, only
> HeapTupleSatisfiesMVCC is done but all the applicable visibility
> routines should be done). Basically, the way it works is like this:
>
> *) is hint bit set?
> *) if not, does the examined xid match the last examined one?
> *) if so, and the cached hint bit matches the one we want, proceed as if
> the hint bit was set

Can you clarify the difference between this and
cachedFetchXid/cachedFetchXidStatus? Do we need to keep those if your
patch is accepted?

Regards,
Jeff Davis
On Thursday, November 15, 2012 10:19 PM Merlin Moncure wrote:
> On Thu, Nov 15, 2012 at 10:25 AM, Amit Kapila <amit.kapila@huawei.com>
> wrote:
>
> Sure, although in scans we are using ring buffer as well so in
> practical sense the results are pretty close.
>
> > b. Considering sometimes people want VACUUM to run when system is not
> > busy, the chances of generating more overall I/O in system can be
> > more.
>
> It's hard to imagine overall i/o load increasing. However, longer
> vacuum times should be considered. A bigger issue is that we are
> missing an early opportunity to set the all visible bit. vacuum is
> doing:
>
> if (all_visible)
> {
>     TransactionId xmin;
>
>     if (!(tuple.t_data->t_infomask & HEAP_XMIN_COMMITTED))
>     {
>         all_visible = false;
>         break;
>     }
>
> assuming the cache is working and vacuum rolls around after a scan,
> you lost the opportunity to set all_visible flag where previously it
> would have been set, thereby dismantling the positive effect of an
> index only scan. AFAICT, this is the only case where vacuum is
> directly interfacing with hint bits. This could be construed as a
> violation of the heapam API? Maybe if that's an issue we could proxy
> that check to a heaptuple/tqual.c maintained function (in the same
> manner as HeapTupleSetHintBits) so that the cache bit could be
> uniformly checked.

I think we need to devise some tests to check that no impact on Vacuum
or anything else has been created due to this change.
I will devise tests during the review of this patch.
However, if you have more ideas then please share them, which will make
the tests of this patch stronger.
For the functional/performance testing of this patch, one of my
colleagues, Hari Babu, will also work along with me on it.

With Regards,
Amit Kapila.
On Thu, Nov 15, 2012 at 6:42 PM, Jeff Davis <pgsql@j-davis.com> wrote:
> On Tue, 2012-11-06 at 17:55 -0600, Merlin Moncure wrote:
>> So given that -- the patch simple adds an extra check when/where hint
>> bit status is checked in the visibility routines (currently, only
>> HeapTupleSatisfiesMVCC is done but all the applicable visibility
>> routines should be done). Basically, the way it works is like this:
>>
>> *) is hint bit set?
>> *) if not? does the examined xid match the last examined one?
>> *) if so, and the cached hint bit matches the one want, proceeed as if
>> hint bit was set
>
> Can you clarify the difference between this and
> cachedFetchXid/cachedFetchXidStatus? Do we need to keep those if your
> patch is accepted?

Very little, except:

*) transam.c managed cache is unable to influence the specific code
path through the visibility routines, at least not without significant
refactoring -- everything that happens in tqual.c should be inlined.
I played with doing it all in transam.c a while back and didn't much
like how it turned out. That doesn't mean it can't work though.

*) There are a couple of important looking code paths that communicate
directly with transam.c. For example, in procarray.c
ProcArrayApplyRecoveryInfo(). Removing the transam.c cache could turn
into a fairly significant regression if that code is performance
sensitive -- that would have to be studied before doing that.

Maybe abstracting 'last xid cache' along with hint bit management out
of both transam.c and tqual.c into something like 'hints.c' is
appropriate, but that's a more invasive change.

merlin
On Fri, Nov 16, 2012 at 4:32 AM, Amit Kapila <amit.kapila@huawei.com> wrote:
> On Thursday, November 15, 2012 10:19 PM Merlin Moncure wrote:
>> On Thu, Nov 15, 2012 at 10:25 AM, Amit Kapila <amit.kapila@huawei.com>
>> wrote:
>
>>
>> Sure, although in scans we are using ring buffer as well so in
>> practical sense the results are pretty close.
>>
>> > b. Considering sometimes people want VACUUM to run when system is not
>> busy,
>> > the chances of generating more overall I/O in system can be
>> > more.
>>
>> It's hard to imagine overall i/o load increasing. However, longer
>> vacuum times should be considered. A bigger issue is that we are
>> missing an early opportunity to set the all visible bit. vacuum is
>> doing:
>>
>> if (all_visible)
>> {
>> TransactionId xmin;
>>
>> if (!(tuple.t_data->t_infomask & HEAP_XMIN_COMMITTED))
>> {
>> all_visible = false;
>> break;
>> }
>>
>> assuming the cache is working and vacuum rolls around after a scan,
>> you lost the opportunity to set all_visible flag where previously it
>> would have been set, thereby dismantling the positive effect of an
>> index only scan. AFAICT, this is the only case where vaccum is
>> directly interfacing with hint bits. This could be construed as a
>> violation of heapam API? Maybe if that's an issue we could proxy that
>> check to a heaptuple/tqual.c maintained function (in the same manner
>> as HeapTupleSetHintBits) so that the cache bit could be uniformly
>> checked.
>
> I think we need to think of some tests to check if Vacuum or any other
> impact has not been created due to this change.
> I will devise tests during review of this patch.
> However if you have more ideas then share the same which will make tests of
> this patch more strong.
> For functional/performance test of this patch, one of my colleague Hari Babu
> will also work along with me on it.
Thanks. So far, Atri ran some quick n dirty tests to see if there
were any regressions. He benched a large scan followed by vacuum. So
far, results are inconclusive so better testing methodologies will
definitely be greatly appreciated. One of the challenges with working
in this part of the code is that it's quite difficult to make changes
without impacting at least some workloads.
merlin
Thanks a ton Amit and your colleague Hari for volunteering to review the patch.
I ran some benchmarks and came up with the following results:
With our code
atri@atri-Aspire-4740:~/postgresql-9.2.1/contrib/pgbench$ ./pgbench atri1 -n -f bench2.sql -c 8 -T 300
transaction type: Custom query
scaling factor: 1
query mode: simple
number of clients: 8
number of threads: 1
duration: 300 s
number of transactions actually processed: 412
tps = 1.366142 (including connections establishing)
tps = 1.366227 (excluding connections establishing)
Without our code
atri@atri-Aspire-4740:~/postgresql-9.2.1/contrib/pgbench$ ./pgbench atri1 -n -f bench2.sql -c 8 -T 300
transaction type: Custom query
scaling factor: 1
query mode: simple
number of clients: 8
number of threads: 1
duration: 300 s
number of transactions actually processed: 378
tps = 1.244333 (including connections establishing)
tps = 1.244447 (excluding connections establishing)
The SQL file is attached.
Please let us know if you need any more details.
Atri
--
On 11/16/12 9:03 AM, Merlin Moncure wrote:
> Atri ran some quick n dirty tests to see if there
> were any regressions. He benched a large scan followed by vacuum. So
> far, results are inconclusive so better testing methodologies will
> definitely be greatly appreciated. One of the challenges with working
> in this part of the code is that it's quite difficult to make changes
> without impacting at least some workloads.

I'm adding something to pgbench-tools to test for some types of vacuum
regressions that I've seen before. And the checksum benchmarking I've
already signed up for overlaps with this one quite a bit. I'd suggest
reviewers here focus on code quality, correctness, and targeted
optimization testing. I'm working heavily on a larger benchmarking
framework that both this and checksums will fit into right now.

--
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.com
Merlin Moncure wrote:
> Maybe abstracting 'last xid cache' along with hint bit management out
> of both transam.c and tqual.c into something like 'hints.c' is
> appropriate, but that's a more invasive change.

It would be good to have such a patch to measure/compare performance of
both approaches. It does seem like the more invasive change might be
more expensive for both the transam.c and the tqual.c uses, though; and
it's not clear that there's anything to be gained by having them be
together in a single module.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Thursday, November 22, 2012 3:00 AM Greg Smith wrote:
> On 11/16/12 9:03 AM, Merlin Moncure wrote:
> > Atri ran some quick n dirty tests to see if there
> > were any regressions. He benched a large scan followed by vacuum. So
> > far, results are inconclusive so better testing methodologies will
> > definitely be greatly appreciated. One of the challenges with working
> > in this part of the code is that it's quite difficult to make changes
> > without impacting at least some workloads.
>
> I'm adding something to pgbench-tools to test for some types of vacuum
> regressions that I've seen before. And the checksum benchmarking I've
> already signed up for overlaps with this one quite a bit. I'd suggest
> reviewers here focus on code quality, correctness, and targeted
> optimization testing. I'm working heavily on a larger benchmarking
> framework that both this and checksums will fit into right now.

We are planning the below performance tests for the hint-bit I/O mitigation patch:

Test case 1: Select performance in sequential scan and vacuum operation with I/O
statistics; bulk operations happen in multiple transactions.
1. Stop the auto-vacuum.
2. Create table.
3. Insert 10000 records in one transaction, 1000 times.
4A. Use pgbench to select all the records using sequential scan for 5 min - 8 threads.
4B. Record the IO statistics.
5. After completion of the test case, check VACUUM performance.

Test case 2: Select performance in index scan and vacuum operation with I/O statistics.
Same as test case 1, changing 4A as follows:
4A. Use pgbench with the -F option to select random records using index scan for 5 min - 8 threads.

Test case 3: Select performance in sequential scan and vacuum operation with I/O
statistics; bulk operations happen in one transaction.
1. Stop the auto-vacuum.
2. Create table.
3. Insert 10,000,000 times.
4A. Use pgbench to select all the records using sequential scan for 5 min - 8 threads.
4B. Record the IO statistics.
5. After completion of the test case, check VACUUM performance.

Test case 4: Select performance in index scan and vacuum operation with I/O statistics.
Same as test case 3, changing 4A as follows:
4A. Use pgbench to select random records using index scan for 5 min - 8 threads.

Test case 5: Check original pgbench performance & vacuum operation.
1. For select-only and tpc_b performance suites with scale factors of 75 & 150, threads 8 & 16.

Test case 6: (Vacuum I/O may increase if vacuum needs to make the page dirty only
for setting the hint bits.)
1. Session 1: Open some long transaction.
2. Session 2: Insert some records & commit.
   Select all the tuples.
   Checkpoint;
   Vacuum the table;
   Checkpoint;
3. Record the IO statistics & time taken for Vacuum & the 2nd Checkpoint.

Test case 7: (This is also to check Vacuum I/O.)
1. Have a replication setup.
2. Insert some records & commit.
3. Vacuum the table.
4. Upgrade the standby.
5. Select all the tuples on the new master.
6. Vacuum the table on the new master.
6B. Record the IO statistics & time taken for Vacuum on the new master.

Suggestions/Feedback?

With Regards,
Amit Kapila.
On Thu, Dec 6, 2012 at 3:59 AM, Amit Kapila <amit.kapila@huawei.com> wrote:
> On Thursday, November 22, 2012 3:00 AM Greg Smith wrote:
>> On 11/16/12 9:03 AM, Merlin Moncure wrote:
>> > Atri ran some quick n dirty tests to see if there
>> > were any regressions. He benched a large scan followed by vacuum. So
>> > far, results are inconclusive so better testing methodologies will
>> > definitely be greatly appreciated. One of the challenges with working
>> > in this part of the code is that it's quite difficult to make changes
>> > without impacting at least some workloads.
>>
>> I'm adding something to pgbench-tools to test for some types of vacuum
>> regressions that I've seen before. And the checksum benchmarking I've
>> already signed up for overlaps with this one quite a bit. I'd suggest
>> reviewers here focus on code quality, correctness, and targeted
>> optimization testing. I'm working heavily on a larger benchmarking
>> framework that both this and checksums will fit into right now.
>
> We are planning the below performance tests for the hint-bit I/O mitigation patch:
>
> Test case - 1: Select performance in sequential scan and vacuum operation with I/O statistics
> Bulk operations happen in multiple transactions.
> 1. Stop the auto-vacuum.
> 2. Create table
> 3. Insert 10000 records in one transaction for 1000 times.
> 4A. Use pgbench to select all the records using sequential scan for 5 min - 8 threads.
> 4B. Record the IO statistics.
> 5. After completion of test-case check VACUUM performance.
>
> Test case - 2: Select performance in index scan and vacuum operation with I/O statistics
> Same as test case - 1, change 4A as follows:
> 4A. Use pgbench with -F option to select random records using index scan for 5 min - 8 threads.
>
> Test case - 3: Select performance in sequential scan and vacuum operation with I/O statistics
> When bulk operations happen in one transaction.
> 1. Stop the auto-vacuum.
> 2. Create table
> 3. Insert 10,000,000 times.
> 4A. Use pgbench to select all the records using sequential scan for 5 min - 8 threads.
> 4B. Record the IO statistics.
> 5. After completion of test-case check VACUUM performance.
>
> Test case - 4: Select performance in index scan and vacuum operation with I/O statistics
> Same as test case - 3, change 4A as follows:
> 4A. Use pgbench to select random records using index scan for 5 min - 8 threads.
>
> Test case - 5: Check original pgbench performance & vacuum operation
> 1. For select only and tpc_b performance suites with scale factor of 75 & 150, threads 8 & 16.
>
> Test case - 6: (Vacuum I/O may increase if vacuum needs to make the page dirty only for setting the hint bits.)
> 1. Session - 1 : Open some long transaction
> 2. Session - 2 : Insert some records & commit
>    Do the select - all the tuples.
>    Checkpoint;
>    Vacuum the table
>    Checkpoint;
> 3. Record the IO statistics & time taken for Vacuum & 2nd Checkpoint.
>
> Test case - 7: (This is also to check Vacuum I/O)
> 1. Have replication setup.
> 2. Insert some records & commit
> 3. Vacuum the table
> 4. Upgrade the standby.
> 5. Select all the tuples on the new master
> 6. Vacuum the table on the new master.
> 6B. Record the IO statistics & time taken for Vacuum on the new master.

Thanks for that -- that's fairly comprehensive I'd say. I'm quite interested in that benchmarking framework as well. Do you need help setting up the scripts?

merlin
On Thu, Dec 6, 2012 at 8:52 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
>Thanks for that -- that's fairly comprehensive I'd say. I'm quite
>interested in that benchmarking framework as well. Do you need help
>setting up the scripts?

Presently I am testing with the pgbench custom query option, taking IO & VM statistics in parallel.
The following is the test script I wrote for case 1:

./psql -f bench_test_1_init.sql postgres
iostat -t 1 -d > hint.test1.iostat.reading_3.txt &
vmstat -n 1 > hint.test1.vmstat.reading_3.txt &
./pgbench -f bench_test_1.sql -T 300 -c 8 -j 8 -n postgres
killall -s SIGINT iostat
killall -s SIGINT vmstat

Where the sql files are as follows:

-- bench_test_1.sql
select count(*) from bench where f1 is not null;

-- bench_test_1_init.sql
drop table if exists bench;
create table bench(f0 int primary key, f1 char(50));
insert into bench values (generate_series(1, 100000), 'a');
insert into bench values (generate_series(100001, 200000), 'a');
...
insert into bench values (generate_series(9800001, 9900000), 'a');
insert into bench values (generate_series(9900001, 10000000), 'a');
checkpoint;

I will provide the test results later. Any suggestions/comments?

Regards,
Hari babu.
On Thu, Dec 7, 2012 at 7:56 PM, Hari babu <haribabu(dot)kommi(at)Huawei(dot)com> wrote:
>>On Thu, Dec 6, 2012 at 8:52 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
>>Thanks for that -- that's fairly comprehensive I'd say. I'm quite
>>interested in that benchmarking framework as well. Do you need help
>>setting up the scripts?
>Presently I am testing with pgbench custom query option & taking IO & VM statistics in parallel.
>Following way I had written the test script for case -1.
>./psql -f bench_test_1_init.sql postgres
>iostat -t 1 -d > hint.test1.iostat.reading_3.txt &
>vmstat -n 1 > hint.test1.vmstat.reading_3.txt &
>./pgbench -f bench_test_1.sql -T 300 -c 8 -j 8 -n postgres
>killall -s SIGINT iostat
>killall -s SIGINT vmstat
>Where the sql files are as follows:
>-- bench_test_1.sql
>select count(*) from bench where f1 is not null;
>--bench_test_1_init.sql
>drop table if exists bench;
>create table bench(f0 int primary key, f1 char(50));
>insert into bench values (generate_series(1, 100000), 'a');
>insert into bench values (generate_series(100001, 200000), 'a');
>...
>insert into bench values (generate_series(9800001, 9900000), 'a');
>insert into bench values (generate_series(9900001, 10000000), 'a');
>checkpoint;
>I will provide the test results later.
Please find the review of the patch.
Basic stuff:
------------
- Patch applies with offsets.
- Compiles cleanly with no warnings
- Regression Test pass.
Code Review:
-------------
1. Better to set the hint bits for the tuples in a page, if the page is already dirty.
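The suggestion in point 1, combined with the patch's single-entry xid cache, can be sketched as a small standalone model. All names here are hypothetical (this is not the actual SetHintBits()/tqual.c code), and the patch's XLogNeedsFlush() safety check is omitted for brevity:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t xid_t;

/* Single-entry cache, in the style of the patch's 'last xid' cache. */
static xid_t    cached_xid = 0;       /* 0 stands in for InvalidTransactionId */
static uint16_t cached_infomask = 0;

void cache_remember(xid_t xid, uint16_t infomask)
{
    cached_xid = xid;
    cached_infomask = infomask;
}

/* Decide whether a hint bit for 'xid' may be treated as set:
 * either the tuple carries it, or the cache says so for the same xid. */
bool hint_bit_known(uint16_t tuple_infomask, uint16_t bit, xid_t xid)
{
    if (tuple_infomask & bit)
        return true;                  /* hint bit already set on tuple */
    if (xid == cached_xid && (cached_infomask & bit))
        return true;                  /* cache hit: same xid, same bit */
    return false;
}

/* Review point 1: a dirty page will be written anyway, so stamping the
 * hint bit there costs nothing extra; only clean pages use the cache. */
void set_hint_bit(uint16_t *tuple_infomask, bool page_is_dirty,
                  uint16_t bit, xid_t xid)
{
    if (page_is_dirty)
        *tuple_infomask |= bit;       /* page gets written regardless */
    else
        cache_remember(xid, bit);     /* avoid dirtying a clean page */
}
```

The idea is that the dirty-page case gets the permanent hint bit for free, while the clean-page case falls back to the cache and so avoids the extra page write the patch is trying to eliminate.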
Test cases:
----------
Test cases are already described in the following link.
http://archives.postgresql.org/message-id/00d301cdd398$6e3fff30$4abffd90$@kapila@huawei.com
Test Results:
------------
Machine details:
CPU cores : 4
RAM : 24GB
OS : Suse Linux 10 SP2

Configuration:
shared_buffers = 500MB
autovacuum = off
checkpoint_segments = 256
checkpoint_timeout = 10min
The following results are the average of 3 runs; each run is 5 min through a pgbench custom query.
Test case - 1:
Init:
None
Run:
create temp table atri1 as select v from generate_series(1,1000000) v;
select * from atri1;
VACUUM atri1;
DROP TABLE atri1;
Test case - 2:
Init:
create table atri1 as select v from generate_series(1,1000000) v;
Run:
select * from atri1;
Test case - 3: (without pgbench)
connect two sessions s1, s2
s1 : start transaction;
s2 :
create table atri1 as select v from generate_series(1,1000000) v;
\timing
select * from atri1;
VACUUM atri1;
Results:
9.3devel(Tps) HintbitIO(Tps)
---------------------------------------------------
Test case - 1: 1.762946 1.922219
Test case - 2: 4.038673 4.044356
Pgbench without vacuum, scale factor 75, 8 threads & 8 clients (refer attached reports)
Default tables select : 64980.736149 64550.118693
Unlogged tables select: 64874.974334 64550.118693
Multiple transaction bulk inserts: Select performance (refer scripts 1 & 2, attached)
sequential scan: 6.492680 6.382014
Index scan: 1.386851 1.36234
Single transaction bulk inserts: Select performance (refer scripts 3 & 4, attached)
sequential scan: 6.49319 6.3800147
Index scan: 1.384121 1.3615277
Long transaction open, then Vacuum & select performance in milliseconds (refer reports output)
Testcase - 3:
Single Vacuum Perf : 128.302 ms 181.292 ms
Single select perf : 214.107 ms 177.057 ms
Total : 342.409 ms 358.349 ms
I was not able to find the reason why in some cases the results are lower, so please use the attached scripts to validate them.
I was not able to provide the IO statistics, as iostat & vmstat were giving the current snapshot
but not the cumulative values from the start to the end of the test execution.
Documentation:
-------------
No user visible changes, so no documentation update is needed.

Regards,
Hari babu.
Thanks for the review and tests.
The remarkable difference between 9.3devel and Hint Bit IO is present only in test 3, right? I have a feeling that the original case we discussed (vacuum setting the hint bits) is taking place here, hence the decrease in performance.
Atri
--
On Thu, Dec 13, 2012 at 7:06 AM, Hari Babu <haribabu.kommi@huawei.com> wrote:
> Please find the review of the patch.

Thanks for the detailed review!

> Basic stuff:
> ------------
> - Patch applies with offsets.
> - Compiles cleanly with no warnings
> - Regression Test pass.
>
> Code Review:
> -------------
> 1. Better to set the hint bits for the tuples in a page, if the page
> is already dirty.

This is true today but likely less true if/when page checksums come out. Also it complicates the code a little bit.

> Default tables select : 64980.736149 64550.118693
> Unlogged tables select: 64874.974334 64550.118693

So it looks like the extra tests in the visibility routines are causing a 0.7% performance hit.

> Multiple transaction bulk inserts: Select performance (refer scripts 1 & 2, attached)
> sequential scan: 6.492680 6.382014
> Index scan: 1.386851 1.36234
>
> Single transaction bulk inserts: Select performance (refer scripts 3 & 4, attached)
> sequential scan: 6.49319 6.3800147
> Index scan: 1.384121 1.3615277

The performance hit is higher here. Almost 2%. This is troubling.

> Long transaction open, then Vacuum & select performance in milliseconds (refer reports output)
> Testcase - 3:
> Single Vacuum Perf : 128.302 ms 181.292 ms
> Single select perf : 214.107 ms 177.057 ms
> Total : 342.409 ms 358.349 ms
>
> I was not able to find the reason why in some cases the results are lower, so
> please use the attached scripts to validate them.

I need to validate the vacuum results. It's possible that this is solvable by tweaking the xmin check inside vacuum. Assuming that's fixed, the question stands: do the results justify the change? I'd argue 'maybe' -- I'd like to see the bulk insert performance hit reduced if possible. Let's see what we can do in the short term (and, if no improvement can be found, I think this patch should be marked 'returned with feedback').

merlin
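As a sanity check on the percentages quoted above, the regressions can be recomputed from the reported tps numbers (a throwaway calculation, not part of the patch):

```c
/* Percentage slowdown of the patched build relative to the baseline,
 * computed from the tps figures in Hari Babu's report. */
double regression_pct(double base_tps, double patched_tps)
{
    return (base_tps - patched_tps) / base_tps * 100.0;
}
```

Plugging in the report's figures gives about 0.66% for the default-table select (64980.736149 vs 64550.118693 tps) and about 1.70% for the multiple-transaction bulk-insert sequential scan (6.492680 vs 6.382014 tps), consistent with the "0.7%" and "almost 2%" above.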
On Thursday, December 13, 2012 8:02 PM Merlin Moncure wrote:
> On Thu, Dec 13, 2012 at 7:06 AM, Hari Babu <haribabu.kommi@huawei.com> wrote:
> > Please find the review of the patch.
>
> Thanks for detailed review!
>
> > Basic stuff:
> > ------------
> > - Patch applies with offsets.
> > - Compiles cleanly with no warnings
> > - Regression Test pass.
> >
> > Code Review:
> > -------------
> > 1. Better to set the hint bits for the tuples in a page, if the page
> > is already dirty.
>
> This is true today but likely less true if/when page checksums come
> out. Also it complicates the code a little bit.
>
> > Default tables select : 64980.736149 64550.118693
> > Unlogged tables select: 64874.974334 64550.118693
>
> So it looks like the extra tests in the visibility routines are causing
> a 0.7% performance hit.
>
> > Multiple transaction bulk inserts: Select performance (refer scripts 1 & 2,
> > attached)
> > sequential scan: 6.492680 6.382014
> > Index scan: 1.386851 1.36234
> >
> > Single transaction bulk inserts: Select performance (refer scripts 3 & 4,
> > attached)
> > sequential scan: 6.49319 6.3800147
> > Index scan: 1.384121 1.3615277
>
> The performance hit is higher here. Almost 2%. This is troubling.
>
> > Long transaction open, then Vacuum & select performance in milliseconds
> > (refer reports output)
> > Testcase - 3:
> > Single Vacuum Perf : 128.302 ms 181.292 ms
> > Single select perf : 214.107 ms 177.057 ms
> > Total : 342.409 ms 358.349 ms
> >
> > I was not able to find the reason why in some cases the results are lower,
> > so please use the attached scripts to validate them.
>
> I need to validate the vacuum results. It's possible that this is
> solvable by tweaking the xmin check inside vacuum. Assuming that's fixed,
> the question stands: do the results justify the change? I'd argue
> 'maybe'

We can try with the change (assuming the change is small) and see if the performance gain is good, then discuss whether it really justifies it.
I think the main reason for the Vacuum performance hit is that, in the test, pages are getting dirty only due to the setting of hint bits by Vacuum.

> -- I'd like to see the bulk insert performance hit reduced if
> possible.

I think if we can improve performance for the bulk-insert case, then this patch has much more value.

With Regards,
Amit Kapila.
On 12/14/2012 09:57 PM, Amit Kapila wrote:
>> I need to validate the vacuum results. It's possible that this is
>> solvable by tweaking the xmin check inside vacuum. Assuming that's fixed,
>> the question stands: do the results justify the change? I'd argue
>> 'maybe'
> We can try with the change (assuming the change is small) and see if the
> performance gain is good, then discuss whether it really justifies it.
> I think the main reason for the Vacuum performance hit is that in the test
> pages are getting dirty only due to the setting of hint bits by Vacuum.
>
>> -- I'd like to see the bulk insert performance hit reduced if
>> possible.
> I think if we can improve performance for the bulk-insert case, then this
> patch has much more value.

Has there been any movement on this - more benchmarks and data showing that it really does improve performance, or that it really isn't helpful?

--
Craig Ringer    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Fri, Jan 18, 2013 at 5:34 AM, Craig Ringer <craig@2ndquadrant.com> wrote:
> On 12/14/2012 09:57 PM, Amit Kapila wrote:
>>> I need to validate the vacuum results. It's possible that this is
>>> solvable by tweaking the xmin check inside vacuum. Assuming that's fixed,
>>> the question stands: do the results justify the change? I'd argue
>>> 'maybe'
>> We can try with the change (assuming the change is small) and see if the
>> performance gain is good, then discuss whether it really justifies it.
>> I think the main reason for the Vacuum performance hit is that in the test
>> pages are getting dirty only due to the setting of hint bits by Vacuum.
>>
>>> -- I'd like to see the bulk insert performance hit reduced if
>>> possible.
>> I think if we can improve performance for the bulk-insert case, then this
>> patch has much more value.
> Has there been any movement on this - more benchmarks and data showing
> that it really does improve performance, or that it really isn't helpful?

Atri is working on that. I don't know if he's going to pull it out in time though, so we may have to pull the patch from this fest. My take on the current patch is that the upside case is pretty clear, but the bulk insert performance impact needs to be figured out and mitigated (that's what Atri is working on).

merlin
On 18-Jan-2013, at 17:04, Craig Ringer <craig@2ndQuadrant.com> wrote:
> On 12/14/2012 09:57 PM, Amit Kapila wrote:
>>> I need to validate the vacuum results. It's possible that this is
>>> solvable by tweaking the xmin check inside vacuum. Assuming that's fixed,
>>> the question stands: do the results justify the change? I'd argue
>>> 'maybe'
>> We can try with the change (assuming the change is small) and see if the
>> performance gain is good, then discuss whether it really justifies it.
>> I think the main reason for the Vacuum performance hit is that in the test
>> pages are getting dirty only due to the setting of hint bits by Vacuum.
>>
>>> -- I'd like to see the bulk insert performance hit reduced if
>>> possible.
>> I think if we can improve performance for the bulk-insert case, then this
>> patch has much more value.
> Has there been any movement on this - more benchmarks and data showing
> that it really does improve performance, or that it really isn't helpful?
>
> --
> Craig Ringer    http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services

Hello all,

Sorry for the delay in updating the hackers list with the current status.

I recently did some profiling using perf on PostgreSQL 9.2 with and without our patch.

I noticed that maximum time is being spent in the heapgettup function. Further investigation and a bit of a hunch lead me to believe that we may be adversely affecting the visibility map optimisation, which directly interacts with the visibility functions that our patch affects.

If this is the case, we may really need to get back down to the design of our patch, see which visibility function/functions we are affecting, and see if we can mitigate the effect.

Please let me know your inputs on this.

Regards,
Atri
On Fri, Jan 18, 2013 at 8:57 AM, Atri Sharma <atri.jiit@gmail.com> wrote:
> Hello all,
>
> Sorry for the delay in updating the hackers list with the current status.
>
> I recently did some profiling using perf on PostgreSQL 9.2 with and without our patch.
>
> I noticed that maximum time is being spent in the heapgettup function. Further
> investigation and a bit of a hunch lead me to believe that we may be adversely
> affecting the visibility map optimisation, which directly interacts with the
> visibility functions that our patch affects.
>
> If this is the case, we may really need to get back down to the design of our
> patch, see which visibility function/functions we are affecting, and see if we
> can mitigate the effect.
>
> Please let me know your inputs on this.

Any scenario that involves a non-trivial amount of investigation or development should result in us pulling the patch for rework and resubmission in a later 'fest... it's closing time as they say :-).

merlin
On Fri, Jan 18, 2013 at 10:36 AM, Merlin Moncure <mmoncure@gmail.com> wrote:
> Any scenario that involves a non-trivial amount of investigation or
> development should result in us pulling the patch for rework and
> resubmission in a later 'fest... it's closing time as they say :-).

Amen.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 01/18/2013 11:50 PM, Robert Haas wrote:
> On Fri, Jan 18, 2013 at 10:36 AM, Merlin Moncure <mmoncure@gmail.com> wrote:
>> Any scenario that involves a non-trivial amount of investigation or
>> development should result in us pulling the patch for rework and
>> resubmission in a later 'fest... it's closing time as they say :-).
> Amen.

OK, bumped to the next CF.

--
Craig Ringer    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services