Thread: WIP patch for hint bit i/o mitigation

WIP patch for hint bit i/o mitigation

From
Merlin Moncure
Date:
Following the sig is a first cut at a patch (written by Atri) that
attempts to mitigate the hint bit i/o penalty when many pages' worth of
tuples are sequentially written out with the same transaction id.
There have been other attempts to deal with this problem that fit
niche cases (especially those where the table is created in the same
transaction as the one inserting into it); they work but don't really
solve the problem generally.

I previously attacked this problem ([1], [2]) and came up with a patch
that cached hint bits inside tqual.c.  The patch was pulled for a few
reasons:

1) a lot of complexity without proper justification
2) sketchy cache replacement algorithm
3) I managed to misspell 'committed' just about everywhere
4) invalidation?

Issues 1-3 could have been worked out, but #4 was making me think the
problem was a nonstarter, or at least 'too much too soon'.  The tuple
visibility routines are in a very tight code path, and having to deal
with various things in the backend that could cause the xid to become
stale was making me nervous.  A smaller, simpler patch might be the
ticket.

What's happening here is that we're putting in a 'last xid' cache in the
same manner and style as in transam.c (a sketch of that cache follows the
list below).  Unlike transam.c, though, we're caching only the infomask
bit, and only if we're safe per XLogNeedsFlush().  While useful in its
own right, the transam.c 'last xid' cache is not sufficient because:

1) While it does guard against clog fetch, it does nothing to mitigate
hint bit penalties which are just as bad if not worse since you get a
page write+page header lock during a broad scan
2) In the very tight loops that crank HeapTupleSatisfiesMVCC, removing
the minimal non-clog based processing (TransactionIdIsInProgress, etc)
helps a little bit in cpu terms when scanning the table for the very
first time.
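
For reference, the transam.c 'last xid' cache being mimicked looks roughly
like this (reconstructed from memory of transam.c and abridged; not part of
this patch):

    /* single-item cache consulted before doing a real clog lookup */
    static TransactionId cachedFetchXid = InvalidTransactionId;
    static XidStatus     cachedFetchXidStatus;

    static XidStatus
    TransactionLogFetch(TransactionId transactionId)
    {
        /* same xid as last time?  reuse its remembered commit status */
        if (TransactionIdEquals(transactionId, cachedFetchXid))
            return cachedFetchXidStatus;

        /* ... otherwise fall through to the real clog lookup ... */
    }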

So given that -- the patch simply adds an extra check when/where hint
bit status is checked in the visibility routines (currently, only
HeapTupleSatisfiesMVCC is done, but all the applicable visibility
routines should be done).  Basically, the way it works is like this:

*) is the hint bit set?
*) if not, does the examined xid match the last examined one?
*) if so, and the cached hint bit matches the one we want, proceed as if
the hint bit was set

Basically, this adds two tests to the visibility check in the worst case
(zero if the hint bit was set, one if the xid does not match).  Pretty cheap.
If you fault through the cached xid and dirty the page, then your
performance shouldn't be worse than the status quo.  Also, dealing
with only the 'last xid' is immune from invalidation issues [3], which is
a nice property.
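
Put concretely, the xmin test in HeapTupleSatisfiesMVCC ends up reading
roughly like this (same shape as the hunk in the patch below; only the two
cachedVisibility* comparisons are new):

    /* hint bit already set?  or did we just look this xid up a moment ago? */
    if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED)
        && !(cachedVisibilityXid == HeapTupleHeaderGetXmin(tuple)
             && cachedVisibilityXidStatus == HEAP_XMIN_COMMITTED))
    {
        /* slow path: TransactionIdIsCurrentTransactionId, clog fetch, etc. */
    }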

The most logical place to *set* the cached xid/bit seemed to be in
SetHintBits().  Possible improvements might be to:
*) set the hint bit if the page is dirty
and/or
*) set the hint bit but do not dirty the page upon cache (although
this might have bad interplay with 'all hint bits must be WAL logged'
page checksum proposal)

Atri benched the patch and came up with the following:

atri@atri-Aspire-4740:~/postgresql-9.2.1/contrib/pgbench$ ./pgbench
atri -n -f bench.sql -c 8 -T 120
transaction type: Custom query
number of transactions actually processed: 1433
tps = 11.900653 (including connections establishing)
tps = 11.903127 (excluding connections establishing)

atri@atri-Aspire-4740:~/postgresql-9.2.1/contrib/pgbench$ ./pgbench
atri -n -f bench.sql -c 8 -T 120
transaction type: Custom query
number of transactions actually processed: 1769
tps = 14.669422 (including connections establishing)
tps = 14.673167 (excluding connections establishing)

bench.sql:
create temp table foo as select v from generate_series(1,100000) v;
select * from foo;
drop table foo;

merlin

notes:
[1] http://archives.postgresql.org/pgsql-hackers/2011-03/msg01765.php
[2] http://archives.postgresql.org/message-id/BANLkTinr1h1+rKX1UOukn-rAONBTEHQvew@mail.gmail.com
[3] http://postgresql.1045698.n5.nabble.com/Is-cachedFetchXidStatus-provably-valid-td5712454.html


patch:

*** tqual1.c    2012-09-20 03:17:58.000000000 +0530
--- tqual.c    2012-11-06 00:50:39.769409077 +0530
***************
*** 72,81 ****
  SnapshotData SnapshotAnyData = {HeapTupleSatisfiesAny};
  SnapshotData SnapshotToastData = {HeapTupleSatisfiesToast};
  
  /* local functions */
  static bool XidInMVCCSnapshot(TransactionId xid, Snapshot snapshot);
  
- 
  /*
   * SetHintBits()
   *
--- 72,83 ----
  SnapshotData SnapshotAnyData = {HeapTupleSatisfiesAny};
  SnapshotData SnapshotToastData = {HeapTupleSatisfiesToast};
  
+ static TransactionId     cachedVisibilityXid;
+ static uint16         cachedVisibilityXidStatus;
+ 
  /* local functions */
  static bool XidInMVCCSnapshot(TransactionId xid, Snapshot snapshot);
  
  /*
   * SetHintBits()
   *
***************
*** 117,122 ****
--- 119,127 ----
  
          if (XLogNeedsFlush(commitLSN) && BufferIsPermanent(buffer))
              return;                /* not flushed yet, so don't set hint */
+ 
+         cachedVisibilityXid = xid;
+         cachedVisibilityXidStatus = infomask;
      }
  
      tuple->t_infomask |= infomask;
***************
*** 904,910 ****
  HeapTupleSatisfiesMVCC(HeapTupleHeader tuple, Snapshot snapshot,
                         Buffer buffer)
  {
!     if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED))
      {
          if (tuple->t_infomask & HEAP_XMIN_INVALID)
              return false;
--- 909,917 ----
  HeapTupleSatisfiesMVCC(HeapTupleHeader tuple, Snapshot snapshot,
                         Buffer buffer)
  {
!     if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED)
!         && !(cachedVisibilityXid == HeapTupleHeaderGetXmin(tuple)
!         && cachedVisibilityXidStatus == HEAP_XMIN_COMMITTED))
      {
          if (tuple->t_infomask & HEAP_XMIN_INVALID)
              return false;
***************
*** 1008,1014 ****
          return true;
      }
  
!     if (!(tuple->t_infomask & HEAP_XMAX_COMMITTED))
      {
          if (TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmax(tuple)))
          {
--- 1015,1023 ----
          return true;
      }
  
!     if (!(tuple->t_infomask & HEAP_XMAX_COMMITTED)
!         && !(cachedVisibilityXid == HeapTupleHeaderGetXmax(tuple)
!         && cachedVisibilityXidStatus == HEAP_XMAX_COMMITTED))
      {
          if (TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmax(tuple)))
          {


Re: WIP patch for hint bit i/o mitigation

From
Amit Kapila
Date:
> From: pgsql-hackers-owner@postgresql.org [mailto:pgsql-hackers-
> owner@postgresql.org] On Behalf Of Merlin Moncure
> Sent: Wednesday, November 07, 2012 5:26 AM
> To: PostgreSQL-development
> Cc: Atri Sharma
> Subject: [HACKERS] WIP patch for hint bit i/o mitigation
> 
> Following the sig is a first cut at a patch (written by Atri) that
> attempts to mitigate hint bit i/o penalty when many pages worth of
> tuples are sequentially written out with the same transaction id.
> There have been other attempts to deal with this problem that fit
> niche cases (especially those that create the table in the same
> transaction as the one inserting) that work but don't really solve the
> problem generally.
> 
> I previously attacked this problem ([1], [2]) and came up with a patch
> that cached hint bits inside tqual.c.  The patch was pulled for a few
> reasons:
> 
> 1) a lot of complexity without proper justification
> 2) sketchy cache replacement algorithm
> 3) I manged to misspell 'committed' just about everywhere
> 4) invalidation?
> 
> Issues 1-3 could have been worked out but #4 was making me think the
> problem was a nonstarter, or at least, 'too much too soon'.  The tuple
> visibility routines are in a very tight code path and having to deal
> with various things in the backend that could cause the xid to become
> stale were making me nervous.   A smaller, simpler patch might be the
> ticket.

About invalidation, I think the cached xid can become invalid due to xid
wraparound.  One way to handle that could be to invalidate it through
VACUUM.

Though I am not sure what other things can make the cached xid invalid, I
think once we work out the ways the cached xid can become invalid, we can
see whether there is a solution to address them.


With Regards,
Amit Kapila.




Re: WIP patch for hint bit i/o mitigation

From
Atri Sharma
Date:
On 07-Nov-2012, at 15:46, Amit Kapila <amit.kapila@huawei.com> wrote:

>> From: pgsql-hackers-owner@postgresql.org [mailto:pgsql-hackers-
>> owner@postgresql.org] On Behalf Of Merlin Moncure
>> Sent: Wednesday, November 07, 2012 5:26 AM
>> To: PostgreSQL-development
>> Cc: Atri Sharma
>> Subject: [HACKERS] WIP patch for hint bit i/o mitigation
>>
>> Following the sig is a first cut at a patch (written by Atri) that
>> attempts to mitigate hint bit i/o penalty when many pages worth of
>> tuples are sequentially written out with the same transaction id.
>> There have been other attempts to deal with this problem that fit
>> niche cases (especially those that create the table in the same
>> transaction as the one inserting) that work but don't really solve the
>> problem generally.
>>
>> I previously attacked this problem ([1], [2]) and came up with a patch
>> that cached hint bits inside tqual.c.  The patch was pulled for a few
>> reasons:
>>
>> 1) a lot of complexity without proper justification
>> 2) sketchy cache replacement algorithm
>> 3) I manged to misspell 'committed' just about everywhere
>> 4) invalidation?
>>
>> Issues 1-3 could have been worked out but #4 was making me think the
>> problem was a nonstarter, or at least, 'too much too soon'.  The tuple
>> visibility routines are in a very tight code path and having to deal
>> with various things in the backend that could cause the xid to become
>> stale were making me nervous.   A smaller, simpler patch might be the
>> ticket.
>
> About invalidation, I think the cached xid can become invalid due to xid
> wraparound.
> So for that one way could be to invalidate it through Vacuum.
>
> Though I am not sure what all other things can make cached id as invalid,
> but I think once we
> can think what other ways can make cached id invalid, then we can see if
> there is a solution to address
> them.
>
>
> With Regards,
> Amit Kapila.
>
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers

As we are now dealing with only the last xid (please refer to the details of the attached patch), the invalidation
issues are not significant any more.

Regards,

Atri


Re: WIP patch for hint bit i/o mitigation

From
Amit Kapila
Date:

> -----Original Message-----
> From: Atri Sharma [mailto:atri.jiit@gmail.com]
> Sent: Wednesday, November 07, 2012 4:02 PM
> To: Amit Kapila
> Cc: Merlin Moncure; PostgreSQL-development
> Subject: Re: [HACKERS] WIP patch for hint bit i/o mitigation
> 
> On 07-Nov-2012, at 15:46, Amit Kapila <amit.kapila@huawei.com> wrote:
> 
> >> From: pgsql-hackers-owner@postgresql.org [mailto:pgsql-hackers-
> >> owner@postgresql.org] On Behalf Of Merlin Moncure
> >> Sent: Wednesday, November 07, 2012 5:26 AM
> >> To: PostgreSQL-development
> >> Cc: Atri Sharma
> >> Subject: [HACKERS] WIP patch for hint bit i/o mitigation
> >>
> >> Following the sig is a first cut at a patch (written by Atri) that
> >> attempts to mitigate hint bit i/o penalty when many pages worth of
> >> tuples are sequentially written out with the same transaction id.
> >> There have been other attempts to deal with this problem that fit
> >> niche cases (especially those that create the table in the same
> >> transaction as the one inserting) that work but don't really solve
> the
> >> problem generally.
> 
> As we are now dealing with only the last xid(please refer to the details
> of the patch attached), the invalidation issues are not significant any
> more.

I think you are right, but please check the scenario below that I have in
mind, due to which I got confused:

Session 1
1. Let's say for tup-1 on a page, cachedVisibilityXid is not set, and it goes
inside SetHintBits and sets it to the xid of the tuple, which is, let's say,
708.

2. Now, for all consecutive tuples which have the same xmin (708), it can
directly refer to the cached xid and cached status, even if the hint bit is
not set.

Other sessions
3. Now, from other sessions, large operations happen on all other tables
except this table.
4. The situation can be reached where the xid wraps around.

Session 1
5. It again tries to query the same table; at this point it will compare
the xid in the tuple with cachedVisibilityXid.

Now, if all tuple xids for that particular table are frozen in all cases,
then it seems to be okay; otherwise it might be a problem.
I am not fully aware of this wraparound and frozen xid concept, and that's
why I doubted that it might create a problem; on further thought, it appears
that I was wrong.

With Regards,
Amit Kapila.




Re: WIP patch for hint bit i/o mitigation

From
Atri Sharma
Date:
On Wed, Nov 7, 2012 at 5:31 PM, Amit Kapila <amit.kapila@huawei.com> wrote:


> -----Original Message-----
> From: Atri Sharma [mailto:atri.jiit@gmail.com]
> Sent: Wednesday, November 07, 2012 4:02 PM
> To: Amit Kapila
> Cc: Merlin Moncure; PostgreSQL-development
> Subject: Re: [HACKERS] WIP patch for hint bit i/o mitigation
>
> On 07-Nov-2012, at 15:46, Amit Kapila <amit.kapila@huawei.com> wrote:
>
> >> From: pgsql-hackers-owner@postgresql.org [mailto:pgsql-hackers-
> >> owner@postgresql.org] On Behalf Of Merlin Moncure
> >> Sent: Wednesday, November 07, 2012 5:26 AM
> >> To: PostgreSQL-development
> >> Cc: Atri Sharma
> >> Subject: [HACKERS] WIP patch for hint bit i/o mitigation
> >>
> >> Following the sig is a first cut at a patch (written by Atri) that
> >> attempts to mitigate hint bit i/o penalty when many pages worth of
> >> tuples are sequentially written out with the same transaction id.
> >> There have been other attempts to deal with this problem that fit
> >> niche cases (especially those that create the table in the same
> >> transaction as the one inserting) that work but don't really solve
> the
> >> problem generally.
>
> As we are now dealing with only the last xid(please refer to the details
> of the patch attached), the invalidation issues are not significant any
> more.

I think you are right, but please check below the scenario I have in mind
due to which I got confused:

Session -1
1. let's say for tup-1 on page, cachedVisibilityXid is not set, and it go
   inside SetHinbits and set it to xid of tuple which is let's say = 708
2. now for all consecutive tuples which have same xmin (708), it can
directly refer
   cached id and cached status, even if hint bit is not set.

Other Sessions
3. now from other sessions, large operations happened on all other tables
except this table.
4. The situation can reach where xid can wrap around.

Session -1
5. It again tries to query the same table, at this point it will compare
   the xid in tuple with cachedVisibilityXid.

Now if all tuple's xid's for that particular table are frozen in all cases
then it seems to be okay, otherwise it might be problem.
I am not fully aware about this wrap around and frozed xid concept, thats
why I had doubted
that it might create problem, on further thought, it appears that I was
wrong.

With Regards,
Amit Kapila.


AFAIK, xids are managed relative to reference xids, which give a range of +/- 2 billion xids.  Once this limit is reached, the reference xids are moved forward, and the xids that do not fall within the reference xid +/- 2 billion are frozen.  Hence, in the given scenario, I believe that once the wraparound happens, since the xmin is the same for all the tuples in session-1's table, there should be no problem and all tuple xids for that particular table will be frozen.
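
For context, that +/- 2 billion window comes from the circular xid comparison
the backend uses; a sketch of it, along the lines of TransactionIdPrecedes in
transam.c (written here from memory, not part of the patch):

    bool
    TransactionIdPrecedes(TransactionId id1, TransactionId id2)
    {
        int32       diff;

        /* permanent (special) xids compare by plain unsigned value */
        if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
            return (id1 < id2);

        /* normal xids compare modulo 2^32: "precedes" means within 2^31 behind */
        diff = (int32) (id1 - id2);
        return (diff < 0);
    }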

Atri

--
Regards,
 
Atri
l'apprenant

Re: WIP patch for hint bit i/o mitigation

From
Merlin Moncure
Date:
On Wed, Nov 7, 2012 at 6:01 AM, Amit Kapila <amit.kapila@huawei.com> wrote:
>> >> Following the sig is a first cut at a patch (written by Atri) that
>> >> attempts to mitigate hint bit i/o penalty when many pages worth of
>> >> tuples are sequentially written out with the same transaction id.
>> >> There have been other attempts to deal with this problem that fit
>> >> niche cases (especially those that create the table in the same
>> >> transaction as the one inserting) that work but don't really solve
>> the
>> >> problem generally.
>>
>> As we are now dealing with only the last xid(please refer to the details
>> of the patch attached), the invalidation issues are not significant any
>> more.
>
> I think you are right, but please check below the scenario I have in mind
> due to which I got confused:
>
> Session -1
> 1. let's say for tup-1 on page, cachedVisibilityXid is not set, and it go
>    inside SetHinbits and set it to xid of tuple which is let's say = 708
> 2. now for all consecutive tuples which have same xmin (708), it can
> directly refer
>    cached id and cached status, even if hint bit is not set.
>
> Other Sessions
> 3. now from other sessions, large operations happened on all other tables
> except this table.
> 4. The situation can reach where xid can wrap around.
>
> Session -1
> 5. It again tries to query the same table, at this point it will compare
>    the xid in tuple with cachedVisibilityXid.
>
> Now if all tuple's xid's for that particular table are frozen in all cases
> then it seems to be okay, otherwise it might be problem.
> I am not fully aware about this wrap around and frozed xid concept, thats
> why I had doubted
> that it might create problem, on further thought, it appears that I was
> wrong.

Well, there's that.  But more to the point, for the cache to fail you'd
have to have a scenario where a table's records weren't scanned for 1
billion+ transactions.  See note [3] above for reasoning on why this is
implausible.  Also, we're already relying on this effect in transam.c.

merlin



Re: WIP patch for hint bit i/o mitigation

From
Atri Sharma
Date:



On Wed, Nov 7, 2012 at 9:47 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
On Wed, Nov 7, 2012 at 6:01 AM, Amit Kapila <amit.kapila@huawei.com> wrote:
>> >> Following the sig is a first cut at a patch (written by Atri) that
>> >> attempts to mitigate hint bit i/o penalty when many pages worth of
>> >> tuples are sequentially written out with the same transaction id.
>> >> There have been other attempts to deal with this problem that fit
>> >> niche cases (especially those that create the table in the same
>> >> transaction as the one inserting) that work but don't really solve
>> the
>> >> problem generally.
>>
>> As we are now dealing with only the last xid(please refer to the details
>> of the patch attached), the invalidation issues are not significant any
>> more.
>
> I think you are right, but please check below the scenario I have in mind
> due to which I got confused:
>
> Session -1
> 1. let's say for tup-1 on page, cachedVisibilityXid is not set, and it go
>    inside SetHinbits and set it to xid of tuple which is let's say = 708
> 2. now for all consecutive tuples which have same xmin (708), it can
> directly refer
>    cached id and cached status, even if hint bit is not set.
>
> Other Sessions
> 3. now from other sessions, large operations happened on all other tables
> except this table.
> 4. The situation can reach where xid can wrap around.
>
> Session -1
> 5. It again tries to query the same table, at this point it will compare
>    the xid in tuple with cachedVisibilityXid.
>
> Now if all tuple's xid's for that particular table are frozen in all cases
> then it seems to be okay, otherwise it might be problem.
> I am not fully aware about this wrap around and frozed xid concept, thats
> why I had doubted
> that it might create problem, on further thought, it appears that I was
> wrong.

Well there's that.  But more to the point for the cache to fail you'd
have to have a scenario where a table didn't scan any records for 1
billion+ transactions.   See note [3] above for reasoning why this is
implausible.  Also we're already relying on this effect in transam.c.

merlin

PFA below the sig the updated patch for the same.  It maintains a cache, cachedVisibilityXid, which holds the result of the clog visibility check; cachedVisibilityXidStatus holds the commit status of the XID in cachedVisibilityXid.

In each visibility function (except HeapTupleSatisfiesVacuum()), an additional check has been added to see whether the commit status of the xmin or xmax of a tuple can be retrieved from the cache.

So, in place of

      if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED))

the condition is now

if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED)
        && !(cachedVisibilityXid == HeapTupleHeaderGetXmin(tuple)
        && cachedVisibilityXidStatus == HEAP_XMIN_COMMITTED))

This checks if the commit status can be known from the cache or not before proceeding.

I will be posting the patch to commit fest.

Thoughts/Feedback?

Atri

--
Regards,
 
Atri
l'apprenant


patch:

*** tqual--unchanged.c    2012-09-20 03:17:58.000000000 +0530
--- tqual.c    2012-11-14 23:27:30.470499857 +0530
***************
*** 75,80 ****
--- 75,88 ----
  /* local functions */
  static bool XidInMVCCSnapshot(TransactionId xid, Snapshot snapshot);
 
+ /*
+  * Single-item cache for results of clog visibility check.  It's worth having
+  * such a cache to help reduce the amount of hint bit traffic when
+  * many sequentially touched tuples have the same XID.
+  */
+ static TransactionId     cachedVisibilityXid; 
+ /* Visibility status cache stores the commit status of the XID in cachedVisibilityXid */
+ static uint16         cachedVisibilityXidStatus; 
 
  /*
   * SetHintBits()
***************
*** 117,122 ****
--- 125,133 ----
 
          if (XLogNeedsFlush(commitLSN) && BufferIsPermanent(buffer))
              return;                /* not flushed yet, so don't set hint */
+
+         cachedVisibilityXid = xid;
+         cachedVisibilityXidStatus = infomask;
      }
 
      tuple->t_infomask |= infomask;
***************
*** 164,170 ****
  bool
  HeapTupleSatisfiesSelf(HeapTupleHeader tuple, Snapshot snapshot, Buffer buffer)
  {
!     if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED))
      {
          if (tuple->t_infomask & HEAP_XMIN_INVALID)
              return false;
--- 175,183 ----
  bool
  HeapTupleSatisfiesSelf(HeapTupleHeader tuple, Snapshot snapshot, Buffer buffer)
  {
!     if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED)
!         && !(cachedVisibilityXid == HeapTupleHeaderGetXmin(tuple)
!         && cachedVisibilityXidStatus == HEAP_XMIN_COMMITTED))
      {
          if (tuple->t_infomask & HEAP_XMIN_INVALID)
              return false;
***************
*** 247,253 ****
      if (tuple->t_infomask & HEAP_XMAX_INVALID)    /* xid invalid or aborted */
          return true;
 
!     if (tuple->t_infomask & HEAP_XMAX_COMMITTED)
      {
          if (tuple->t_infomask & HEAP_IS_LOCKED)
              return true;
--- 260,268 ----
      if (tuple->t_infomask & HEAP_XMAX_INVALID)    /* xid invalid or aborted */
          return true;
 
!     if ((tuple->t_infomask & HEAP_XMAX_COMMITTED)
!         || (cachedVisibilityXid == HeapTupleHeaderGetXmax(tuple)
!         && cachedVisibilityXidStatus == HEAP_XMAX_COMMITTED))
      {
          if (tuple->t_infomask & HEAP_IS_LOCKED)
              return true;
***************
*** 327,333 ****
  bool
  HeapTupleSatisfiesNow(HeapTupleHeader tuple, Snapshot snapshot, Buffer buffer)
  {
!     if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED))
      {
          if (tuple->t_infomask & HEAP_XMIN_INVALID)
              return false;
--- 342,350 ----
  bool
  HeapTupleSatisfiesNow(HeapTupleHeader tuple, Snapshot snapshot, Buffer buffer)
  {
!     if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED)
!         && !(cachedVisibilityXid == HeapTupleHeaderGetXmin(tuple)
!         && cachedVisibilityXidStatus == HEAP_XMIN_COMMITTED))
      {
          if (tuple->t_infomask & HEAP_XMIN_INVALID)
              return false;
***************
*** 416,422 ****
      if (tuple->t_infomask & HEAP_XMAX_INVALID)    /* xid invalid or aborted */
          return true;
 
!     if (tuple->t_infomask & HEAP_XMAX_COMMITTED)
      {
          if (tuple->t_infomask & HEAP_IS_LOCKED)
              return true;
--- 433,441 ----
      if (tuple->t_infomask & HEAP_XMAX_INVALID)    /* xid invalid or aborted */
          return true;
 
!     if ((tuple->t_infomask & HEAP_XMAX_COMMITTED)
!         || (cachedVisibilityXid == HeapTupleHeaderGetXmax(tuple)
!         && cachedVisibilityXidStatus == HEAP_XMAX_COMMITTED))
      {
          if (tuple->t_infomask & HEAP_IS_LOCKED)
              return true;
***************
*** 493,499 ****
  HeapTupleSatisfiesToast(HeapTupleHeader tuple, Snapshot snapshot,
                          Buffer buffer)
  {
!     if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED))
      {
          if (tuple->t_infomask & HEAP_XMIN_INVALID)
              return false;
--- 512,520 ----
  HeapTupleSatisfiesToast(HeapTupleHeader tuple, Snapshot snapshot,
                          Buffer buffer)
  {
!     if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED)
!         && !(cachedVisibilityXid == HeapTupleHeaderGetXmin(tuple)
!         && cachedVisibilityXidStatus == HEAP_XMIN_COMMITTED))
      {
          if (tuple->t_infomask & HEAP_XMIN_INVALID)
              return false;
***************
*** 574,580 ****
  HeapTupleSatisfiesUpdate(HeapTupleHeader tuple, CommandId curcid,
                           Buffer buffer)
  {
!     if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED))
      {
          if (tuple->t_infomask & HEAP_XMIN_INVALID)
              return HeapTupleInvisible;
--- 595,603 ----
  HeapTupleSatisfiesUpdate(HeapTupleHeader tuple, CommandId curcid,
                           Buffer buffer)
  {
!     if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED)
!         && !(cachedVisibilityXid == HeapTupleHeaderGetXmin(tuple)
!         && cachedVisibilityXidStatus == HEAP_XMIN_COMMITTED))
      {
          if (tuple->t_infomask & HEAP_XMIN_INVALID)
              return HeapTupleInvisible;
***************
*** 663,669 ****
      if (tuple->t_infomask & HEAP_XMAX_INVALID)    /* xid invalid or aborted */
          return HeapTupleMayBeUpdated;
 
!     if (tuple->t_infomask & HEAP_XMAX_COMMITTED)
      {
          if (tuple->t_infomask & HEAP_IS_LOCKED)
              return HeapTupleMayBeUpdated;
--- 686,694 ----
      if (tuple->t_infomask & HEAP_XMAX_INVALID)    /* xid invalid or aborted */
          return HeapTupleMayBeUpdated;
 
!     if ((tuple->t_infomask & HEAP_XMAX_COMMITTED)
!         || (cachedVisibilityXid == HeapTupleHeaderGetXmax(tuple)
!         && cachedVisibilityXidStatus == HEAP_XMAX_COMMITTED))
      {
          if (tuple->t_infomask & HEAP_IS_LOCKED)
              return HeapTupleMayBeUpdated;
***************
*** 743,749 ****
  {
      snapshot->xmin = snapshot->xmax = InvalidTransactionId;
 
!     if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED))
      {
          if (tuple->t_infomask & HEAP_XMIN_INVALID)
              return false;
--- 768,776 ----
  {
      snapshot->xmin = snapshot->xmax = InvalidTransactionId;
 
!     if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED)
!         && !(cachedVisibilityXid == HeapTupleHeaderGetXmin(tuple)
!         && cachedVisibilityXidStatus == HEAP_XMIN_COMMITTED))
      {
          if (tuple->t_infomask & HEAP_XMIN_INVALID)
              return false;
***************
*** 830,836 ****
      if (tuple->t_infomask & HEAP_XMAX_INVALID)    /* xid invalid or aborted */
          return true;
 
!     if (tuple->t_infomask & HEAP_XMAX_COMMITTED)
      {
          if (tuple->t_infomask & HEAP_IS_LOCKED)
              return true;
--- 857,865 ----
      if (tuple->t_infomask & HEAP_XMAX_INVALID)    /* xid invalid or aborted */
          return true;
 
!     if ((tuple->t_infomask & HEAP_XMAX_COMMITTED)
!         || (cachedVisibilityXid == HeapTupleHeaderGetXmax(tuple)
!         && cachedVisibilityXidStatus == HEAP_XMAX_COMMITTED))
      {
          if (tuple->t_infomask & HEAP_IS_LOCKED)
              return true;
***************
*** 904,910 ****
  HeapTupleSatisfiesMVCC(HeapTupleHeader tuple, Snapshot snapshot,
                         Buffer buffer)
  {
!     if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED))
      {
          if (tuple->t_infomask & HEAP_XMIN_INVALID)
              return false;
--- 933,941 ----
  HeapTupleSatisfiesMVCC(HeapTupleHeader tuple, Snapshot snapshot,
                         Buffer buffer)
  {
!     if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED)
!         && !(cachedVisibilityXid == HeapTupleHeaderGetXmin(tuple)
!         && cachedVisibilityXidStatus == HEAP_XMIN_COMMITTED))
      {
          if (tuple->t_infomask & HEAP_XMIN_INVALID)
              return false;
***************
*** 1008,1014 ****
          return true;
      }
 
!     if (!(tuple->t_infomask & HEAP_XMAX_COMMITTED))
      {
          if (TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmax(tuple)))
          {
--- 1039,1047 ----
          return true;
      }
 
!     if (!(tuple->t_infomask & HEAP_XMAX_COMMITTED)
!         && !(cachedVisibilityXid == HeapTupleHeaderGetXmax(tuple)
!         && cachedVisibilityXidStatus == HEAP_XMAX_COMMITTED))
      {
          if (TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmax(tuple)))
          {
***************
*** 1240,1246 ****
       * invalid, then we assume it's still alive (since the presumption is that
       * all relevant hint bits were just set moments ago).
       */
!     if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED))
          return (tuple->t_infomask & HEAP_XMIN_INVALID) != 0 ? true : false;
 
      /*
--- 1273,1281 ----
       * invalid, then we assume it's still alive (since the presumption is that
       * all relevant hint bits were just set moments ago).
       */
!     if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED)
!         && !(cachedVisibilityXid == HeapTupleHeaderGetXmin(tuple)
!         && cachedVisibilityXidStatus == HEAP_XMIN_COMMITTED))
          return (tuple->t_infomask & HEAP_XMIN_INVALID) != 0 ? true : false;
 
      /*
***************
*** 1253,1259 ****
          return false;
 
      /* If deleter isn't known to have committed, assume it's still running. */
!     if (!(tuple->t_infomask & HEAP_XMAX_COMMITTED))
          return false;
 
      /* Deleter committed, so tuple is dead if the XID is old enough. */
--- 1288,1296 ----
          return false;
 
      /* If deleter isn't known to have committed, assume it's still running. */
!     if (!(tuple->t_infomask & HEAP_XMAX_COMMITTED)
!         && !(cachedVisibilityXid == HeapTupleHeaderGetXmax(tuple)
!         && cachedVisibilityXidStatus == HEAP_XMAX_COMMITTED))
          return false;
 
      /* Deleter committed, so tuple is dead if the XID is old enough. */

Re: WIP patch for hint bit i/o mitigation

From
Atri Sharma
Date:



On Thu, Nov 15, 2012 at 12:42 AM, Atri Sharma <atri.jiit@gmail.com> wrote:



On Wed, Nov 7, 2012 at 9:47 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
On Wed, Nov 7, 2012 at 6:01 AM, Amit Kapila <amit.kapila@huawei.com> wrote:
>> >> Following the sig is a first cut at a patch (written by Atri) that
>> >> attempts to mitigate hint bit i/o penalty when many pages worth of
>> >> tuples are sequentially written out with the same transaction id.
>> >> There have been other attempts to deal with this problem that fit
>> >> niche cases (especially those that create the table in the same
>> >> transaction as the one inserting) that work but don't really solve
>> the
>> >> problem generally.
>>
>> As we are now dealing with only the last xid(please refer to the details
>> of the patch attached), the invalidation issues are not significant any
>> more.
>
> I think you are right, but please check below the scenario I have in mind
> due to which I got confused:
>
> Session -1
> 1. let's say for tup-1 on page, cachedVisibilityXid is not set, and it go
>    inside SetHinbits and set it to xid of tuple which is let's say = 708
> 2. now for all consecutive tuples which have same xmin (708), it can
> directly refer
>    cached id and cached status, even if hint bit is not set.
>
> Other Sessions
> 3. now from other sessions, large operations happened on all other tables
> except this table.
> 4. The situation can reach where xid can wrap around.
>
> Session -1
> 5. It again tries to query the same table, at this point it will compare
>    the xid in tuple with cachedVisibilityXid.
>
> Now if all tuple's xid's for that particular table are frozen in all cases
> then it seems to be okay, otherwise it might be problem.
> I am not fully aware about this wrap around and frozed xid concept, thats
> why I had doubted
> that it might create problem, on further thought, it appears that I was
> wrong.

Well there's that.  But more to the point for the cache to fail you'd
have to have a scenario where a table didn't scan any records for 1
billion+ transactions.   See note [3] above for reasoning why this is
implausible.  Also we're already relying on this effect in transam.c.

merlin

PFA below the sig the updated patch for the same.It maintains a cache cachedVisibilityXid which holds the results of clog visibility check. cachedVisibilityXidStatus holds the commit status of the XID in cachedVisibilityXid.

In each visibility function (except HeapTupleSatisfiesVacuum() ), an addition check has been added to check if the commit status of Xmin or Xmax of a tuple can be retrieved from the cache.

So, in place of

      if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED))

the condition is now


if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED)
        && !(cachedVisibilityXid == HeapTupleHeaderGetXmin(tuple)
        && cachedVisibilityXidStatus == HEAP_XMIN_COMMITTED))

This checks if the commit status can be known from the cache or not before proceeding.

I will be posting the patch to commit fest.

Thoughts/Feedback?

Atri

--
Regards,
 
Atri
l'apprenant


patch:

*** tqual--unchanged.c    2012-09-20 03:17:58.000000000 +0530
--- tqual.c    2012-11-14 23:27:30.470499857 +0530
***************
*** 75,80 ****
--- 75,88 ----

  /* local functions */
  static bool XidInMVCCSnapshot(TransactionId xid, Snapshot snapshot);
 
+ /*
+  * Single-item cache for results of clog visibility check.  It's worth having
+  * such a cache to help reduce the amount of hint bit traffic when
+  * many sequentially touched tuples have the same XID.
+  */
+ static TransactionId     cachedVisibilityXid; 
+ /* Visibility status cache stores the commit status of the XID in cachedVisibilityXid */
+ static uint16         cachedVisibilityXidStatus; 
 
  /*
   * SetHintBits()
***************
*** 117,122 ****
--- 125,133 ----

 
          if (XLogNeedsFlush(commitLSN) && BufferIsPermanent(buffer))
              return;                /* not flushed yet, so don't set hint */
+
+         cachedVisibilityXid = xid;
+         cachedVisibilityXidStatus = infomask;
      }
 
      tuple->t_infomask |= infomask;
***************
*** 164,170 ****
  bool
  HeapTupleSatisfiesSelf(HeapTupleHeader tuple, Snapshot snapshot, Buffer buffer)

  {
!     if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED))
      {
          if (tuple->t_infomask & HEAP_XMIN_INVALID)
              return false;
--- 175,183 ----
  bool
  HeapTupleSatisfiesSelf(HeapTupleHeader tuple, Snapshot snapshot, Buffer buffer)

  {
!     if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED)
!         && !(cachedVisibilityXid == HeapTupleHeaderGetXmin(tuple)
!         && cachedVisibilityXidStatus == HEAP_XMIN_COMMITTED))
      {
          if (tuple->t_infomask & HEAP_XMIN_INVALID)
              return false;
***************
*** 247,253 ****
      if (tuple->t_infomask & HEAP_XMAX_INVALID)    /* xid invalid or aborted */
          return true;
 
!     if (tuple->t_infomask & HEAP_XMAX_COMMITTED)
      {
          if (tuple->t_infomask & HEAP_IS_LOCKED)
              return true;
--- 260,268 ----
      if (tuple->t_infomask & HEAP_XMAX_INVALID)    /* xid invalid or aborted */
          return true;
 
!     if ((tuple->t_infomask & HEAP_XMAX_COMMITTED)
!         || (cachedVisibilityXid == HeapTupleHeaderGetXmax(tuple)
!         && cachedVisibilityXidStatus == HEAP_XMAX_COMMITTED))
      {
          if (tuple->t_infomask & HEAP_IS_LOCKED)
              return true;
***************
*** 327,333 ****
  bool
  HeapTupleSatisfiesNow(HeapTupleHeader tuple, Snapshot snapshot, Buffer buffer)

  {
!     if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED))
      {
          if (tuple->t_infomask & HEAP_XMIN_INVALID)
              return false;
--- 342,350 ----
  bool
  HeapTupleSatisfiesNow(HeapTupleHeader tuple, Snapshot snapshot, Buffer buffer)

  {
!     if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED)
!         && !(cachedVisibilityXid == HeapTupleHeaderGetXmin(tuple)
!         && cachedVisibilityXidStatus == HEAP_XMIN_COMMITTED))
      {
          if (tuple->t_infomask & HEAP_XMIN_INVALID)
              return false;
***************
*** 416,422 ****
      if (tuple->t_infomask & HEAP_XMAX_INVALID)    /* xid invalid or aborted */
          return true;
 
!     if (tuple->t_infomask & HEAP_XMAX_COMMITTED)
      {
          if (tuple->t_infomask & HEAP_IS_LOCKED)
              return true;
--- 433,441 ----
      if (tuple->t_infomask & HEAP_XMAX_INVALID)    /* xid invalid or aborted */
          return true;
 
!     if ((tuple->t_infomask & HEAP_XMAX_COMMITTED)
!         || (cachedVisibilityXid == HeapTupleHeaderGetXmax(tuple)
!         && cachedVisibilityXidStatus == HEAP_XMAX_COMMITTED))
      {
          if (tuple->t_infomask & HEAP_IS_LOCKED)
              return true;
***************
*** 493,499 ****
  HeapTupleSatisfiesToast(HeapTupleHeader tuple, Snapshot snapshot,

                          Buffer buffer)
  {
!     if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED))
      {
          if (tuple->t_infomask & HEAP_XMIN_INVALID)
              return false;
--- 512,520 ----
  HeapTupleSatisfiesToast(HeapTupleHeader tuple, Snapshot snapshot,

                          Buffer buffer)
  {
!     if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED)
!         && !(cachedVisibilityXid == HeapTupleHeaderGetXmin(tuple)
!         && cachedVisibilityXidStatus == HEAP_XMIN_COMMITTED))
      {
          if (tuple->t_infomask & HEAP_XMIN_INVALID)
              return false;
***************
*** 574,580 ****
  HeapTupleSatisfiesUpdate(HeapTupleHeader tuple, CommandId curcid,

                           Buffer buffer)
  {
!     if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED))
      {
          if (tuple->t_infomask & HEAP_XMIN_INVALID)
              return HeapTupleInvisible;
--- 595,603 ----
  HeapTupleSatisfiesUpdate(HeapTupleHeader tuple, CommandId curcid,

                           Buffer buffer)
  {
!     if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED)
!         && !(cachedVisibilityXid == HeapTupleHeaderGetXmin(tuple)
!         && cachedVisibilityXidStatus == HEAP_XMIN_COMMITTED))
      {
          if (tuple->t_infomask & HEAP_XMIN_INVALID)
              return HeapTupleInvisible;
***************
*** 663,669 ****
      if (tuple->t_infomask & HEAP_XMAX_INVALID)    /* xid invalid or aborted */
          return HeapTupleMayBeUpdated;
 
!     if (tuple->t_infomask & HEAP_XMAX_COMMITTED)
      {
          if (tuple->t_infomask & HEAP_IS_LOCKED)
              return HeapTupleMayBeUpdated;
--- 686,694 ----
      if (tuple->t_infomask & HEAP_XMAX_INVALID)    /* xid invalid or aborted */
          return HeapTupleMayBeUpdated;
 
!     if ((tuple->t_infomask & HEAP_XMAX_COMMITTED)
!         || (cachedVisibilityXid == HeapTupleHeaderGetXmax(tuple)
!         && cachedVisibilityXidStatus == HEAP_XMAX_COMMITTED))
      {
          if (tuple->t_infomask & HEAP_IS_LOCKED)
              return HeapTupleMayBeUpdated;
***************
*** 743,749 ****
  {
      snapshot->xmin = snapshot->xmax = InvalidTransactionId;

 
!     if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED))
      {
          if (tuple->t_infomask & HEAP_XMIN_INVALID)
              return false;
--- 768,776 ----
  {
      snapshot->xmin = snapshot->xmax = InvalidTransactionId;

 
!     if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED)
!         && !(cachedVisibilityXid == HeapTupleHeaderGetXmin(tuple)
!         && cachedVisibilityXidStatus == HEAP_XMIN_COMMITTED))
      {
          if (tuple->t_infomask & HEAP_XMIN_INVALID)
              return false;
***************
*** 830,836 ****
      if (tuple->t_infomask & HEAP_XMAX_INVALID)    /* xid invalid or aborted */
          return true;
 
!     if (tuple->t_infomask & HEAP_XMAX_COMMITTED)
      {
          if (tuple->t_infomask & HEAP_IS_LOCKED)
              return true;
--- 857,865 ----
      if (tuple->t_infomask & HEAP_XMAX_INVALID)    /* xid invalid or aborted */
          return true;
 
!     if ((tuple->t_infomask & HEAP_XMAX_COMMITTED)
!         || (cachedVisibilityXid == HeapTupleHeaderGetXmax(tuple)
!         && cachedVisibilityXidStatus == HEAP_XMAX_COMMITTED))
      {
          if (tuple->t_infomask & HEAP_IS_LOCKED)
              return true;

***************
*** 904,910 ****
  HeapTupleSatisfiesMVCC(HeapTupleHeader tuple, Snapshot snapshot,
                         Buffer buffer)
  {
!     if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED))
      {
          if (tuple->t_infomask & HEAP_XMIN_INVALID)
              return false;
--- 933,941 ----

  HeapTupleSatisfiesMVCC(HeapTupleHeader tuple, Snapshot snapshot,
                         Buffer buffer)
  {
!     if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED)
!         && !(cachedVisibilityXid == HeapTupleHeaderGetXmin(tuple)
!         && cachedVisibilityXidStatus == HEAP_XMIN_COMMITTED))
      {
          if (tuple->t_infomask & HEAP_XMIN_INVALID)
              return false;
***************
*** 1008,1014 ****
          return true;
      }
 
!     if (!(tuple->t_infomask & HEAP_XMAX_COMMITTED))
      {
          if (TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmax(tuple)))
          {
--- 1039,1047 ----

          return true;
      }
 
!     if (!(tuple->t_infomask & HEAP_XMAX_COMMITTED)
!         && !(cachedVisibilityXid == HeapTupleHeaderGetXmax(tuple)
!         && cachedVisibilityXidStatus == HEAP_XMAX_COMMITTED))
      {
          if (TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmax(tuple)))
          {
***************
*** 1240,1246 ****
       * invalid, then we assume it's still alive (since the presumption is that
       * all relevant hint bits were just set moments ago).
       */

!     if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED))
          return (tuple->t_infomask & HEAP_XMIN_INVALID) != 0 ? true : false;
 
      /*
--- 1273,1281 ----
       * invalid, then we assume it's still alive (since the presumption is that
       * all relevant hint bits were just set moments ago).
       */

!     if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED)
!         && !(cachedVisibilityXid == HeapTupleHeaderGetXmin(tuple)
!         && cachedVisibilityXidStatus == HEAP_XMIN_COMMITTED))
          return (tuple->t_infomask & HEAP_XMIN_INVALID) != 0 ? true : false;
 
      /*
***************
*** 1253,1259 ****
          return false;
 
      /* If deleter isn't known to have committed, assume it's still running. */

!     if (!(tuple->t_infomask & HEAP_XMAX_COMMITTED))
          return false;
 
      /* Deleter committed, so tuple is dead if the XID is old enough. */
--- 1288,1296 ----
          return false;
 
      /* If deleter isn't known to have committed, assume it's still running. */

!     if (!(tuple->t_infomask & HEAP_XMAX_COMMITTED)
!         && !(cachedVisibilityXid == HeapTupleHeaderGetXmax(tuple)
!         && cachedVisibilityXidStatus == HEAP_XMAX_COMMITTED))
          return false;
 
      /* Deleter committed, so tuple is dead if the XID is old enough. */


Attached is the patch for review.

Atri

--
Regards,
 
Atri
l'apprenant


Re: WIP patch for hint bit i/o mitigation

From
Amit Kapila
Date:

On Thursday, November 15, 2012 2:02 AM Atri Sharma wrote:
On Thu, Nov 15, 2012 at 12:42 AM, Atri Sharma <atri.jiit@gmail.com> wrote:

On Wed, Nov 7, 2012 at 9:47 PM, Merlin Moncure <mmoncure@gmail.com> wrote:

On Wed, Nov 7, 2012 at 6:01 AM, Amit Kapila <amit.kapila@huawei.com> wrote:
>> >> Following the sig is a first cut at a patch (written by Atri) that


>PFA below the sig the updated patch for the same.It maintains a cache cachedVisibilityXid which holds the results of clog visibility check. cachedVisibilityXidStatus

>holds the commit status of the XID in cachedVisibilityXid.

>In each visibility function (except HeapTupleSatisfiesVacuum() ), an addition check has been added to check if the commit status of Xmin or Xmax of a tuple can be >retrieved from the cache.

 

1.  From your explanation and code, it is quite clear that it will certainly give performance benefits in the scenarios mentioned by you.

    I can validate the performance numbers again and do the code review for this patch during CF-3.

    However, I am just not very sure whether the use case is a sufficient one.

    So I would like to ask the opinion of other people as well.

2.  After this patch, tuple hint bits are not set by SELECT operations after data is populated by one transaction.

    This appears to be good as it will save many ops (page dirty followed by flush, clog inquiry).

    Though it is not apparent, we should see whether it can cause any other impact due to this:

    a.  like maybe now VACUUM needs to set the hint bits, which may cause more I/O during VACUUM.

Hackers, any opinions/suggestions about the use case?

With Regards,

Amit Kapila.

 

Re: WIP patch for hint bit i/o mitigation

From
Merlin Moncure
Date:
On Thu, Nov 15, 2012 at 4:39 AM, Amit Kapila <amit.kapila@huawei.com> wrote:
>>In each visibility function (except HeapTupleSatisfiesVacuum() ), an
>> addition check has been added to check if the commit status of Xmin or Xmax
>> of a tuple can be >retrieved from the cache.
>
>
>
> 1.      From your explanation and code, it is quite clear that it will
> certainly give performance benefits in the scenario’s mentioned by you.
>
>        I can once validate the performance numbers again and do the code
> review for this patch during CF-3.
>
>        However I am just not very sure about the use case, such that whether
> it is a sufficient use case.
>
>        So I would like to ask opinion of other people as well.

Sure.  I'd like to note, though, that hint bit i/o is a somewhat common
complaint.  It tends to most affect OLAP-style workloads.  In
pathological workloads it can really burn you -- it's not fun when
you are i/o starved via sequential scan.  This can still happen when
sweeping dead records (which this patch doesn't deal with, though
maybe it should).

> 2.  After this patch, tuple hint bit is not set by Select operations after
> data populated by one transaction.
>
>       This appears to be good as it will save many ops (page dirty followed
> by flush , clog inquiry).

Technically it does not save clog fetch as transam.c has a very
similar cache mechanism.  However, it does save a page write i/o and a
lock on the page header, as well as a couple of other minor things.
In the best case, the page write is completely masked as the page gets
dirty for other reasons.  I think this is going to become more
important in checksum enabled scenarios.

>       Though it is no apparent, however we should see whether it can cause
> any other impact due to this:
>
> a.        like may be now VACUUM needs set the hint bit which may cause more
> I/O during Vacuum.

IMNSHO, deferring non-critical i/o from a foreground process to a
background process is generally good.  VACUUM has nice features like
i/o throttling and in-place cancel, so latent management of internal
page optimization flags really belongs there, ideally.  Also, the longer
you defer such I/O, the more opportunity there is for it to be masked
off by some other page-dirtying operation (again, this is more
important in the face of having to log hint bit changes).

There could be some good rebuttal analysis though.

merlin



Re: WIP patch for hint bit i/o mitigation

From
Amit Kapila
Date:
On Thursday, November 15, 2012 9:27 PM Merlin Moncure wrote:
> On Thu, Nov 15, 2012 at 4:39 AM, Amit Kapila <amit.kapila@huawei.com>
> wrote:
> >>In each visibility function (except HeapTupleSatisfiesVacuum() ), an
> >> addition check has been added to check if the commit status of Xmin
> or Xmax
> >> of a tuple can be >retrieved from the cache.
> >
> >
> >
> > 1.      From your explanation and code, it is quite clear that it will
> > certainly give performance benefits in the scenario's mentioned by
> you.
> >
> >        I can once validate the performance numbers again and do the
> code
> > review for this patch during CF-3.
> >
> >        However I am just not very sure about the use case, such that
> whether
> > it is a sufficient use case.
> >
> >        So I would like to ask opinion of other people as well.
> 
> sure.  I'd like to note though that hint bit i/o is a somewhat common
> complaint.  it tends to most affect OLAP style workloads.  in
> pathological workloads, it can really burn you -- it's not fun when
> you are i/o starved via sequential scan.  This can still happen when
> sweeping dead records (which this patch doesn't deal with, though
> maybe it should).
> 
> > 2.  After this patch, tuple hint bit is not set by Select operations
> after
> > data populated by one transaction.
> >
> >       This appears to be good as it will save many ops (page dirty
> followed
> > by flush , clog inquiry).
> 
> Technically it does not save clog fetch as transam.c has a very
> similar cache mechanism.  However, it does save a page write i/o and a
> lock on the page header, as well as a couple of other minor things.
> In the best case, the page write is completely masked as the page gets
> dirty for other reasons.  I think this is going to become more
> important in checksum enabled scenarios.
> 
> >       Though it is no apparent, however we should see whether it can
> cause
> > any other impact due to this:
> >
> > a.        like may be now VACUUM needs set the hint bit which may
> cause more
> > I/O during Vacuum.
> 
> IMNSHO. deferring non-critical i/o from foreground process to
> background process is generally good.  

Yes, with regard to deferring you are right.
However, in this case, when the foreground process has to mark the page
dirty due to a hint bit, the page may already have been dirty, so there is
no extra I/O; but when it is done by VACUUM, the page may not be dirty.

Also, due to the points below, doing it in VACUUM may cost more:
a. VACUUM has a ring buffer of fixed size, and if there are many such pages
then VACUUM needs to write out pages to replace existing pages in the ring.
b. Considering that people sometimes want VACUUM to run when the system is
not busy, the chance of generating more overall I/O in the system can be
higher.

Why can't we avoid setting hint bits in VACUUM?
Is it because it has to be done somewhere, so that hint bits are properly
set?  Or maybe I am missing something trivial?

Though the VACUUM case I am talking about occurs rarely in practice, the
point came to my mind, so I thought of sharing it with you.

> VACUUM has nice features like
> i/o throttling and in place cancel so latent management of internal
> page optimization flags really belong there ideally.  Also, the longer
> you defer such I/O the more opportunity there is for it to be masked
> off by some other page dirtying operation (again, this is more
> important in the face of having to log hint bit changes).
> 
> There could be some good rebuttal analysis though.

With Regards,
Amit Kapila.




Re: WIP patch for hint bit i/o mitigation

From
Merlin Moncure
Date:
On Thu, Nov 15, 2012 at 10:25 AM, Amit Kapila <amit.kapila@huawei.com> wrote:
>> IMNSHO. deferring non-critical i/o from foreground process to
>> background process is generally good.
>
> Yes, in regard of deferring you are right.
> However in this case may be when foreground process has to mark page dirty
> due to hint-bit, it was already dirty so no extra I/O, but when it is done
> by VACUUM, page may not be dirty.

Yeah.  We can try to be smart and set the hint bits in that case.  Not
sure that will work out with checksum having to wal log hint bits
though (which by reading the checksum threads seems likely to happen).

> Also due to below points, doing it in VACUUM may cost more:
> a. VACUUM has ring-buffer of fixed size and if such pages are many then
> write of page needs to be done by VACUUM to replace existing page
>    in ring.

Sure, although in scans we are using ring buffer as well so in
practical sense the results are pretty close.

> b. Considering sometimes people want VACUUM to run when system is not busy,
> the chances of generating more overall I/O in system can be
>    more.

It's hard to imagine overall i/o load increasing.  However, longer
vacuum times should be considered.  A bigger issue is that we are
missing an early opportunity to set the all-visible bit.  vacuum is
doing:

if (all_visible)
  {
          TransactionId xmin;

          if (!(tuple.t_data->t_infomask & HEAP_XMIN_COMMITTED))
          {
                  all_visible = false;
                  break;
          }

assuming the cache is working and vacuum rolls around after a scan,
you lose the opportunity to set the all_visible flag where previously it
would have been set, thereby dismantling the positive effect of an
index-only scan.  AFAICT, this is the only case where vacuum is
directly interfacing with hint bits.  This could be construed as a
violation of the heapam API?  Maybe if that's an issue we could proxy that
check to a heaptuple/tqual.c-maintained function (in the same manner
as HeapTupleSetHintBits) so that the cache bit could be uniformly
checked.
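
Something like the following is what that proxying could look like (purely a
sketch; the function name is invented here and is not in the patch), so that
lazy vacuum's all_visible test would see the cached status too:

    /* hypothetical tqual.c helper: is xmin known committed, by hint bit or cache? */
    bool
    HeapTupleHeaderXminCommittedOrCached(HeapTupleHeader tuple)
    {
        if (tuple->t_infomask & HEAP_XMIN_COMMITTED)
            return true;

        return (cachedVisibilityXid == HeapTupleHeaderGetXmin(tuple)
                && cachedVisibilityXidStatus == HEAP_XMIN_COMMITTED);
    }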

All other *setting* of hint bits runs through SetHintBits(), so
I think we are safe from the vacuum point of view.  That's another thing
to test for, though.

merlin



Re: WIP patch for hint bit i/o mitigation

From
Jeff Davis
Date:
On Tue, 2012-11-06 at 17:55 -0600, Merlin Moncure wrote:
> So given that -- the patch simple adds an extra check when/where hint
> bit status is checked in the visibility routines (currently, only
> HeapTupleSatisfiesMVCC is done but all the applicable visibility
> routines should be done).  Basically, the way it works is like this:
> 
> *) is hint bit set?
> *) if not? does the examined xid match the last examined one?
> *) if so, and the cached hint bit matches the one want, proceeed as if
> hint bit was set

Can you clarify the difference between this and
cachedFetchXid/cachedFetchXidStatus? Do we need to keep those if your
patch is accepted?

Regards,
    Jeff Davis




Re: WIP patch for hint bit i/o mitigation

From
Amit Kapila
Date:
On Thursday, November 15, 2012 10:19 PM Merlin Moncure wrote:
> On Thu, Nov 15, 2012 at 10:25 AM, Amit Kapila <amit.kapila@huawei.com>
> wrote:

> 
> Sure, although in scans we are using ring buffer as well so in
> practical sense the results are pretty close.
> 
> > b. Considering sometimes people want VACUUM to run when system is not
> busy,
> > the chances of generating more overall I/O in system can be
> >    more.
> 
> It's hard to imagine overall i/o load increasing.  However, longer
> vacuum times should be considered.   A bigger  issue is that we are
> missing an early opportunity to set the all visible bit. vacuum is
> doing:
> 
> if (all_visible)
>   {
>           TransactionId xmin;
> 
>           if (!(tuple.t_data->t_infomask & HEAP_XMIN_COMMITTED))
>           {
>                   all_visible = false;
>                   break;
>           }
> 
> assuming the cache is working and vacuum rolls around after a scan,
> you lost the opportunity to set all_visible flag where previously it
> would have been set, thereby dismantling the positive effect of an
> index only scan.  AFAICT, this is the only case where vaccum is
> directly interfacing with hint bits.  This could be construed as a
> violation of heapam API?  Maybe if that's an issue we could proxy that
> check to a heaptuple/tqual.c maintained function (in the same manner
> as HeapTupleSetHintBits) so that the cache bit could be uniformly
> checked.

I think we need to devise some tests to check that this change does not
create any impact on VACUUM or anything else.
I will devise tests during review of this patch.
However, if you have more ideas, please share them; they will make the
testing of this patch stronger.
For functional/performance testing of this patch, one of my colleagues,
Hari Babu, will also work along with me on it.

With Regards,
Amit Kapila.




Re: WIP patch for hint bit i/o mitigation

From
Merlin Moncure
Date:
On Fri, Nov 16, 2012 at 4:32 AM, Amit Kapila <amit.kapila@huawei.com> wrote:
> On Thursday, November 15, 2012 10:19 PM Merlin Moncure wrote:
>> On Thu, Nov 15, 2012 at 10:25 AM, Amit Kapila <amit.kapila@huawei.com>
>> wrote:
>
>>
>> Sure, although in scans we are using ring buffer as well so in
>> practical sense the results are pretty close.
>>
>> > b. Considering sometimes people want VACUUM to run when system is not
>> busy,
>> > the chances of generating more overall I/O in system can be
>> >    more.
>>
>> It's hard to imagine overall i/o load increasing.  However, longer
>> vacuum times should be considered.   A bigger  issue is that we are
>> missing an early opportunity to set the all visible bit. vacuum is
>> doing:
>>
>> if (all_visible)
>>   {
>>           TransactionId xmin;
>>
>>           if (!(tuple.t_data->t_infomask & HEAP_XMIN_COMMITTED))
>>           {
>>                   all_visible = false;
>>                   break;
>>           }
>>
>> assuming the cache is working and vacuum rolls around after a scan,
>> you lost the opportunity to set all_visible flag where previously it
>> would have been set, thereby dismantling the positive effect of an
>> index only scan.  AFAICT, this is the only case where vaccum is
>> directly interfacing with hint bits.  This could be construed as a
>> violation of heapam API?  Maybe if that's an issue we could proxy that
>> check to a heaptuple/tqual.c maintained function (in the same manner
>> as HeapTupleSetHintBits) so that the cache bit could be uniformly
>> checked.
>
> I think we need to think of some tests to check if Vacuum or any other
> impact has not been created due to this change.
> I will devise tests during review of this patch.
> However if you have more ideas then share the same which will make tests of
> this patch more strong.
> For functional/performance test of this patch, one of my colleague Hari Babu
> will also work along with me on it.

Thanks.  So far, Atri ran some quick n dirty tests to see if there
were any regressions.  He benched a large scan followed by vacuum.  So
far, results are inconclusive so better testing methodologies will
definitely be greatly appreciated.  One of the challenges with working
in this part of the code is that it's quite difficult to make changes
without impacting at least some workloads.

merlin



Re: WIP patch for hint bit i/o mitigation

From
Merlin Moncure
Date:
On Thu, Nov 15, 2012 at 6:42 PM, Jeff Davis <pgsql@j-davis.com> wrote:
> On Tue, 2012-11-06 at 17:55 -0600, Merlin Moncure wrote:
>> So given that -- the patch simple adds an extra check when/where hint
>> bit status is checked in the visibility routines (currently, only
>> HeapTupleSatisfiesMVCC is done but all the applicable visibility
>> routines should be done).  Basically, the way it works is like this:
>>
>> *) is hint bit set?
>> *) if not? does the examined xid match the last examined one?
>> *) if so, and the cached hint bit matches the one want, proceeed as if
>> hint bit was set
>
> Can you clarify the difference between this and
> cachedFetchXid/cachedFetchXidStatus? Do we need to keep those if your
> patch is accepted?

Very little, except:
*) the transam.c-managed cache is unable to influence the specific code
path through the visibility routines, at least not without significant
refactoring -- everything that happens in tqual.c should be inlined.
I played with doing it all in transam.c a while back and didn't much
like how it turned out.  That doesn't mean it can't work though.

*) There are a couple of important-looking code paths that communicate
directly with transam.c.  For example, in procarray.c,
ProcArrayApplyRecoveryInfo().  Removing the transam.c cache could turn
into a fairly significant regression if that code is performance
sensitive -- that would have to be studied before doing that.

Maybe abstracting 'last xid cache'  along with hint bit management out
of both transam.c and tqual.c into something like 'hints.c' is
appropriate, but that's a more invasive change.
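
For anyone following along: the transam.c cache Jeff mentioned is the
single-entry cache consulted in TransactionLogFetch() before any clog
access.  The pattern is roughly the following -- a simplified sketch
from memory, not a verbatim excerpt; inside transam.c the required
headers are already included:

static TransactionId cachedFetchXid = InvalidTransactionId;
static XidStatus cachedFetchXidStatus = TRANSACTION_STATUS_IN_PROGRESS;

static XidStatus
TransactionLogFetch_sketch(TransactionId xid)
{
    XLogRecPtr  lsn;
    XidStatus   status;

    /* hit: we just looked this xid up, so skip the clog entirely */
    if (TransactionIdEquals(xid, cachedFetchXid))
        return cachedFetchXidStatus;

    status = TransactionIdGetStatus(xid, &lsn);

    /* only cache states that can no longer change */
    if (status == TRANSACTION_STATUS_COMMITTED ||
        status == TRANSACTION_STATUS_ABORTED)
    {
        cachedFetchXid = xid;
        cachedFetchXidStatus = status;
    }
    return status;
}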

merlin



Re: WIP patch for hint bit i/o mitigation

From
Atri Sharma
Date:
On Fri, Nov 16, 2012 at 7:33 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
On Fri, Nov 16, 2012 at 4:32 AM, Amit Kapila <amit.kapila@huawei.com> wrote:
> On Thursday, November 15, 2012 10:19 PM Merlin Moncure wrote:
>> On Thu, Nov 15, 2012 at 10:25 AM, Amit Kapila <amit.kapila@huawei.com>
>> wrote:
>
>>
>> Sure, although in scans we are using ring buffer as well so in
>> practical sense the results are pretty close.
>>
>> > b. Considering sometimes people want VACUUM to run when system is not
>> busy,
>> > the chances of generating more overall I/O in system can be
>> >    more.
>>
>> It's hard to imagine overall i/o load increasing.  However, longer
>> vacuum times should be considered.   A bigger  issue is that we are
>> missing an early opportunity to set the all visible bit. vacuum is
>> doing:
>>
>> if (all_visible)
>>   {
>>           TransactionId xmin;
>>
>>           if (!(tuple.t_data->t_infomask & HEAP_XMIN_COMMITTED))
>>           {
>>                   all_visible = false;
>>                   break;
>>           }
>>
>> assuming the cache is working and vacuum rolls around after a scan,
>> you lost the opportunity to set all_visible flag where previously it
>> would have been set, thereby dismantling the positive effect of an
>> index only scan.  AFAICT, this is the only case where vaccum is
>> directly interfacing with hint bits.  This could be construed as a
>> violation of heapam API?  Maybe if that's an issue we could proxy that
>> check to a heaptuple/tqual.c maintained function (in the same manner
>> as HeapTupleSetHintBits) so that the cache bit could be uniformly
>> checked.
>
> I think we need to think of some tests to check if Vacuum or any other
> impact has not been created due to this change.
> I will devise tests during review of this patch.
> However if you have more ideas then share the same which will make tests of
> this patch more strong.
> For functional/performance test of this patch, one of my colleague Hari Babu
> will also work along with me on it.

Thanks.  So far, Atri ran some quick n dirty tests to see if there
were any regressions.  He benched a large scan followed by vacuum.  So
far, results are inconclusive so better testing methodologies will
definitely be greatly appreciated.  One of the challenges with working
in this part of the code is that it's quite difficult to make changes
without impacting at least some workloads.

merlin

Thanks a ton Amit and your colleague Hari for volunteering to review the patch.

I ran some benchmarks and came up with the following results:

With our code

atri@atri-Aspire-4740:~/postgresql-9.2.1/contrib/pgbench$ ./pgbench atri1 -n -f bench2.sql -c 8 -T 300
transaction type: Custom query
scaling factor: 1
query mode: simple
number of clients: 8
number of threads: 1
duration: 300 s
number of transactions actually processed: 412
tps = 1.366142 (including connections establishing)
tps = 1.366227 (excluding connections establishing)


Without our code

atri@atri-Aspire-4740:~/postgresql-9.2.1/contrib/pgbench$ ./pgbench atri1 -n -f bench2.sql -c 8 -T 300
transaction type: Custom query
scaling factor: 1
query mode: simple
number of clients: 8
number of threads: 1
duration: 300 s
number of transactions actually processed: 378
tps = 1.244333 (including connections establishing)
tps = 1.244447 (excluding connections establishing)

The SQL file is attached.

Please let us know if you need any more details.

Atri
--
Regards,
 
Atri
l'apprenant


Re: WIP patch for hint bit i/o mitigation

From
Greg Smith
Date:
On 11/16/12 9:03 AM, Merlin Moncure wrote:
> Atri ran some quick n dirty tests to see if there
> were any regressions.  He benched a large scan followed by vacuum.  So
> far, results are inconclusive so better testing methodologies will
> definitely be greatly appreciated.  One of the challenges with working
> in this part of the code is that it's quite difficult to make changes
> without impacting at least some workloads.

I'm adding something to pgbench-tools to test for some types of vacuum 
regressions that I've seen before.  And the checksum benchmarking I've 
already signed up for overlaps with this one quite a bit.  I'd suggest 
reviewers here focus on code quality, correctness, and targeted 
optimization testing.  I'm working heavily on a larger benchmarking 
framework that both this and checksums will fit into right now.

-- 
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com



Re: WIP patch for hint bit i/o mitigation

From
Alvaro Herrera
Date:
Merlin Moncure wrote:

> Maybe abstracting 'last xid cache'  along with hint bit management out
> of both transam.c and tqual.c into something like 'hints.c' is
> appropriate, but that's a more invasive change.

It would be good to have such a patch to measure/compare performance of
both approaches.  It does seem like the more invasive change might be
more expensive for both the transam.c and the tqual.c uses, though; and
it's not clear that there's anything to be gained by having them be
together in a single module.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: WIP patch for hint bit i/o mitigation

From
Amit Kapila
Date:
On Thursday, November 22, 2012 3:00 AM Greg Smith wrote:
> On 11/16/12 9:03 AM, Merlin Moncure wrote:
> > Atri ran some quick n dirty tests to see if there
> > were any regressions.  He benched a large scan followed by vacuum.  So
> > far, results are inconclusive so better testing methodologies will
> > definitely be greatly appreciated.  One of the challenges with working
> > in this part of the code is that it's quite difficult to make changes
> > without impacting at least some workloads.
> 
> I'm adding something to pgbench-tools to test for some types of vacuum
> regressions that I've seen before.  And the checksum benchmarking I've
> already signed up for overlaps with this one quite a bit.  I'd suggest
> reviewers here focus on code quality, correctness, and targeted
> optimization testing.  I'm working heavily on a larger benchmarking
> framework that both this and checksums will fit into right now.

We are planning below performance tests for hint-bit I/O mitigation patch:

Test case -1: Select performance in sequential scan and vacuum operation
with I/O statistics
    Bulk operations are happened in multiple transactions.
    1. Stop the auto-vacuum.
    2. Create table
    3. Insert 10000 records in one transaction for 1000 times.
    4A. Use pgbench to select all the records using sequential scan for 5
min  - 8 threads.
    4B. Record the IO statistics.
    5. After completion of test-case check VACUUM performance.

Test case -2:
        Select performance in index scan and vacuum operation with I/O
statistics
        Same as testcase - 1 change the 4A as follows
            4A. Use pgbench with -F option to select random records using
index scan for 5 min  - 8 threads.

Test case -3:
    Select performance in sequential scan and vacuum operation with I/O
statistics
    When bulk operations are happened in one transaction.
    1. Stop the auto-vacuum.
    2. Create table
    3. Insert 10,000,000 times.
    4A. Use pgbench to select all the records using sequential scan for 5
min  - 8 threads.
    4B. Record the IO statistics.
    5. After completion of test-case check VACUUM performance.

Test case -4:
        Select performance in index scan and vacuum operation with I/O
statistics
        Same as testcase - 3 change the 4A as follows
            4A. Use pgbench to select random records using index scan for 5
min  - 8 threads.

Test case -5:
    Check original pgbench performance & vacuum operation
    1. For select only and tcp_b performance suites with scale factor of
75 & 150, threads 8 & 16

Test case -6: (Vacuum I/O may increase if vacuum needs to make the page dirty
only for setting the hint bits.)
                1. Session - 1 : Open a long transaction
                2. Session - 2 :         Insert some records & commit
                                        Do the select - all the tuples.
                                        Checkpoint;
                                        Vacuum the table
                                        Checkpoint;
                3. Record the IO statistics & time taken for Vacuum & 2nd
Checkpoint.

Test case -7: (This is also to check Vacuum I/O)
                1. Have replication setup.
                2. Insert some records & commit
                3. Vacuum the table
                4. Upgrade the standby.
                5. Select all the tuples on the new master
                6. Vacuum the tbl on the new master.
                6B. Record the IO statistics & time taken for Vacuum on the new
master.

Suggestions/Feedback

With Regards,
Amit Kapila.




Re: WIP patch for hint bit i/o mitigation

From
Merlin Moncure
Date:
On Thu, Dec 6, 2012 at 3:59 AM, Amit Kapila <amit.kapila@huawei.com> wrote:
> On Thursday, November 22, 2012 3:00 AM Greg Smith wrote:
>> On 11/16/12 9:03 AM, Merlin Moncure wrote:
>> > Atri ran some quick n dirty tests to see if there
>> > were any regressions.  He benched a large scan followed by vacuum.  So
>> > far, results are inconclusive so better testing methodologies will
>> > definitely be greatly appreciated.  One of the challenges with working
>> > in this part of the code is that it's quite difficult to make changes
>> > without impacting at least some workloads.
>>
>> I'm adding something to pgbench-tools to test for some types of vacuum
>> regressions that I've seen before.  And the checksum benchmarking I've
>> already signed up for overlaps with this one quite a bit.  I'd suggest
>> reviewers here focus on code quality, correctness, and targeted
>> optimization testing.  I'm working heavily on a larger benchmarking
>> framework that both this and checksums will fit into right now.
>
> We are planning below performance tests for hint-bit I/O mitigation patch:
>
> Test case -1: Select performance in sequential scan and vacuum operation
> with I/O statistics
>     Bulk operations are happened in multiple transactions.
>     1. Stop the auto-vacuum.
>     2. Create table
>     3. Insert 10000 records in one transaction for 1000 times.
>     4A. Use pgbench to select all the records using sequential scan for 5
> min  - 8 threads.
>     4B. Record the IO statistics.
>     5. After completion of test-case check VACUUM performance.
>
> Test case -2:
>         Select performance in index scan and vacuum operation with I/O
> statistics
>         Same as testcase - 1 change the 4A as follows
>             4A. Use pgbench with -F option to select random records using
> index scan for 5 min  - 8 threads.
>
> Test case -3:
>     Select performance in sequential scan and vacuum operation with I/O
> statistics
>     When bulk operations are happened in one transaction.
>     1. Stop the auto-vacuum.
>     2. Create table
>     3. Insert 10,000,000 times.
>     4A. Use pgbench to select all the records using sequential scan for 5
> min  - 8 threads.
>     4B. Record the IO statistics.
>     5. After completion of test-case check VACUUM performance.
>
> Test case -4:
>         Select performance in index scan and vacuum operation with I/O
> statistics
>         Same as testcase - 3 change the 4A as follows
>             4A. Use pgbench to select random records using index scan for 5
> min  - 8 threads.
>
> Test case -5:
>     Check original pgbench performance & vacuum operation
>     1. For  select only and tcp_b  performance suites with scale factor of
> 75 & 150, threads 8 &16
>
> Test case -6:(Vacuum I/O may increase if vacuum need to make the page dirty
> only for setting the hit bits. )
>                 1. Session - 1 : Open a some long transaction
>
>                 2. Session - 2 :         Insert some records & commit
>                                         Do the select - all the tuples.
>                                         Checkpoint;
>                                         Vacuum the table
>                                         Checkpoint;
>                 3. Record the IO statistics & time taken for Vacuum & 2nd
> Checkpoint.
>
> Test case -7: (This is also to check Vacuum I/O)
>                 1. Have replication setup.
>                 2. Insert some records & commit
>                 3. Vacuum the table
>                 4. Upgrade the standby.
>                 5. Select the all the tuples on new master
>                 6. Vacuum the tbl on new master.
>                 6B. Record the IO statistics & time taken for  Vacuum on new
> master.

Thanks for that -- that's fairly comprehensive I'd say.  I'm quite
interested in that benchmarking framework as well.  Do you need help
setting up the scripts?

merlin



Re: WIP patch for hint bit i/o mitigation

From
Hari Babu
Date:
On Thu, Dec 6, 2012 at 8:52 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
>Thanks for that -- that's fairly comprehensive I'd say.  I'm quite
>interested in that benchmarking framework as well.  Do you need help
>setting up the scripts?

Presently I am testing with pgbench custom query option & taking IO & VM
statistics in parallel. 

I have written the test script for case 1 in the following way:
./psql -f bench_test_1_init.sql postgres 
iostat -t 1 -d > hint.test1.iostat.reading_3.txt & 
vmstat -n 1 > hint.test1.vmstat.reading_3.txt & 
./pgbench -f bench_test_1.sql -T 300 -c 8 -j 8 -n postgres 
killall -s SIGINT iostat 
killall -s SIGINT vmstat

Where the sql files are as follows:

-- bench_test_1.sql
select count(*) from  bench where f1 is not null;

--bench_test_1_init.sql

drop table if exists bench;
create table bench(f0 int primary key, f1 char(50));
insert into bench values (generate_series(1, 100000), 'a');
insert into bench values (generate_series(100001, 200000), 'a');
...
insert into bench values (generate_series(9800001, 9900000), 'a');
insert into bench values (generate_series(9900001, 10000000), 'a');
checkpoint;

I will provide the test results later.

Any suggestions/comments?

Regards,
Hari babu.





Re: WIP patch for hint bit i/o mitigation

From
Hari Babu
Date:
<div class="WordSection1"><p class="MsoPlainTextCxSpFirst"><span style="font-size:8.0pt;font-family:"Courier New"">On
Thu,Dec 7, 2012 at 7:56 PM, Hari babu <haribabu(dot)kommi(at)Huawei(dot)com> wrote:</span><p
class="MsoPlainTextCxSpMiddle"><spanstyle="font-size:8.0pt;font-family:"Courier New"">>>On Thu, Dec 6, 2012 at
8:52PM, Merlin Moncure <<a href="mailto:mmoncure@gmail.com">mmoncure@gmail.com</a>> wrote:</span><p
class="MsoPlainTextCxSpMiddle"><spanstyle="font-size:8.0pt;font-family:"Courier New"">>>Thanks for that -- that's
fairlycomprehensive I'd say.  I'm quite </span><p class="MsoPlainTextCxSpMiddle"><span
style="font-size:8.0pt;font-family:"CourierNew""> </span><p class="MsoPlainTextCxSpMiddle"><span
style="font-size:8.0pt;font-family:"CourierNew"">>>interested in that benchmarking framework as well.  Do you
needhelp </span><p class="MsoPlainTextCxSpMiddle"><span style="font-size:8.0pt;font-family:"Courier
New"">>>settingup the scripts?</span><p class="MsoPlainTextCxSpMiddle"><span
style="font-size:8.0pt;font-family:"CourierNew""> </span><p class="MsoPlainTextCxSpMiddle"><span
style="font-size:8.0pt;font-family:"CourierNew"">>Presently I am testing with pgbench custom query option &
takingIO & VM statistics in parallel. </span><p class="MsoPlainTextCxSpMiddle"><span
style="font-size:8.0pt;font-family:"CourierNew"">>Following way I had written the test script for case -1.</span><p
class="MsoPlainTextCxSpMiddle"><spanstyle="font-size:8.0pt;font-family:"Courier New""> </span><p
class="MsoPlainTextCxSpMiddle"><spanstyle="font-size:8.0pt;font-family:"Courier New"">>./psql -f
bench_test_1_init.sqlpostgres iostat -t 1 -d > hint.test1.iostat.reading_3.txt & vmstat -n 1 >
>hint.test1.vmstat.reading_3.txt& ./pgbench -f bench_test_1.sql -T 300 -c 8 -j 8 -n postgres killall -s SIGINT
iostat>killall -s SIGINT vmstat</span><p class="MsoPlainTextCxSpMiddle"><span
style="font-size:8.0pt;font-family:"CourierNew""> </span><p class="MsoPlainTextCxSpMiddle"><span
style="font-size:8.0pt;font-family:"CourierNew"">>Where the sql files are as follows:</span><p
class="MsoPlainTextCxSpMiddle"><spanstyle="font-size:8.0pt;font-family:"Courier New"">>-- bench_test_1.sql</span><p
class="MsoPlainTextCxSpMiddle"><spanstyle="font-size:8.0pt;font-family:"Courier New"">>select count(*) from  bench
wheref1 is not null;</span><p class="MsoPlainTextCxSpMiddle"><span style="font-size:8.0pt;font-family:"Courier
New""> </span><pclass="MsoPlainTextCxSpMiddle"><span style="font-size:8.0pt;font-family:"Courier
New"">>--bench_test_1_init.sql</span><pclass="MsoPlainTextCxSpMiddle"><span
style="font-size:8.0pt;font-family:"CourierNew"">>drop table if exists bench;</span><p
class="MsoPlainTextCxSpMiddle"><spanstyle="font-size:8.0pt;font-family:"Courier New"">>create table bench(f0 int
primarykey, f1 char(50)); insert into bench values (generate_series(1, 100000), 'a'); insert >into bench values
(generate_series(100001,200000), 'a'); ...</span><p class="MsoPlainTextCxSpMiddle"><span
style="font-size:8.0pt;font-family:"CourierNew"">>insert into bench values (generate_series(9800001, 9900000), 'a');
insertinto bench values (generate_series(9900001, >10000000), 'a'); checkpoint;</span><p
class="MsoPlainTextCxSpMiddle"><spanstyle="font-size:8.0pt;font-family:"Courier New""> </span><p
class="MsoPlainTextCxSpMiddle"><spanstyle="font-size:8.0pt;font-family:"Courier New"">>I will provide the test
resultslater.</span><p class="MsoPlainTextCxSpLast"><span
style="font-size:8.0pt;font-family:"Arial","sans-serif""> </span><pclass="MsoNormalCxSpFirst"><span
style="font-size:8.0pt;font-family:"Arial","sans-serif"">Pleasefind the review of the patch.</span><span
style="font-size:8.0pt;font-family:"Arial","sans-serif""><br/></span><span
style="font-size:9.0pt;font-family:"Arial","sans-serif""><br/></span><b><span
style="font-size:8.0pt;font-family:"Arial","sans-serif"">Basicstuff:</span></b><span
style="font-size:8.0pt;font-family:"Arial","sans-serif""><br/></span><span
style="font-size:8.0pt;font-family:"Arial","sans-serif"">------------</span><span
style="font-size:8.0pt;font-family:"Arial","sans-serif""><br/></span><span
style="font-size:8.0pt;font-family:"Arial","sans-serif"">-Patch applies with offsets. </span><span
style="font-size:8.0pt;font-family:"Arial","sans-serif""><br/></span><span
style="font-size:8.0pt;font-family:"Arial","sans-serif"">-Compiles cleanly with no warnings</span><span
style="font-size:8.0pt;font-family:"Arial","sans-serif""><br/></span><span
style="font-size:8.0pt;font-family:"Arial","sans-serif"">-Regression Test pass.</span><span
style="font-size:8.0pt;font-family:"Arial","sans-serif""><br/><br /></span><b><span
style="font-size:8.0pt;font-family:"Arial","sans-serif"">CodeReview:</span></b><span
style="font-size:8.0pt;font-family:"Arial","sans-serif""><br/></span><span
style="font-size:8.0pt;font-family:"Arial","sans-serif"">-------------</span><span
style="font-size:8.0pt;font-family:"Arial","sans-serif""><br/></span><span
style="font-size:8.0pt;font-family:"Arial","sans-serif"">       1. Better to set the hint bits for the tuples in a
page,if the page is already dirty. </span><span style="font-size:8.0pt;font-family:"Arial","sans-serif""><br /><br
/></span><b><spanstyle="font-size:8.0pt;font-family:"Arial","sans-serif"">Test cases:</span></b><span
style="font-size:8.0pt;font-family:"Arial","sans-serif""><br/></span><b><span
style="font-size:8.0pt;font-family:"Arial","sans-serif"">----------</span></b><span
style="font-size:8.0pt;font-family:"Arial","sans-serif""><br/></span><b><span
style="font-size:8.0pt;font-family:"Arial","sans-serif"">       </span></b><span
style="font-size:8.0pt;font-family:"Arial","sans-serif"">Testcases are already described in the following
link.</span><spanstyle="font-size:8.0pt;font-family:"Arial","sans-serif""><br /></span><span
style="font-size:8.0pt;font-family:"Arial","sans-serif"">       </span><span
style="font-size:8.0pt;font-family:"Arial","sans-serif""><a
href="http://archives.postgresql.org/message-id/00d301cdd398$6e3fff30$4abffd90$@kapila@huawei.com"><span
style="color:blue">http://archives.postgresql.org/message-id/00d301cdd398$6e3fff30$4abffd90$@kapila@huawei.com</span></a><br
/><br/></span><b><span style="font-size:8.0pt;font-family:"Arial","sans-serif"">Test Results:</span></b><span
style="font-size:8.0pt;font-family:"Arial","sans-serif""><br/></span><b><span
style="font-size:8.0pt;font-family:"Arial","sans-serif"">------------</span></b><span
style="font-size:8.0pt;font-family:"Arial","sans-serif""><br/></span><span
style="font-size:8.0pt;font-family:"Arial","sans-serif"">Machinedetails:</span><span
style="font-size:8.0pt;font-family:"Arial","sans-serif""><br/></span><span
style="font-size:8.0pt;font-family:"Arial","sans-serif"">       CPU cores : 4</span><span
style="font-size:8.0pt;font-family:"Arial","sans-serif""><br/></span><span
style="font-size:8.0pt;font-family:"Arial","sans-serif"">       RAM       : 24GB</span><span
style="font-size:8.0pt;font-family:"Arial","sans-serif""><br/></span><span
style="font-size:8.0pt;font-family:"Arial","sans-serif"">       OS        : Suse Linux 10 SP2</span><span
style="font-size:8.0pt;font-family:"Arial","sans-serif""><br/><br /></span><span
style="font-size:8.0pt;font-family:"Arial","sans-serif""></span><pclass="MsoNormalCxSpMiddle"><span
style="font-size:8.0pt;font-family:"Arial","sans-serif"">Configuration:</span><span
style="font-size:8.0pt;font-family:"Arial","sans-serif""><br/></span><span
style="font-size:8.0pt;font-family:"Arial","sans-serif"">   shared_buffers = 500MB</span><span
style="font-size:8.0pt;font-family:"Arial","sans-serif""><br/></span><span
style="font-size:8.0pt;font-family:"Arial","sans-serif"">   autovacuum = off</span><span
style="font-size:8.0pt;font-family:"Arial","sans-serif""><br/></span><span
style="font-size:8.0pt;font-family:"Arial","sans-serif"">   checkpoint_segments = 256</span><span
style="font-size:8.0pt;font-family:"Arial","sans-serif""><br/></span><span
style="font-size:8.0pt;font-family:"Arial","sans-serif"">   checkpoint_timeout = 10min</span><span
style="font-size:8.0pt;font-family:"Arial","sans-serif""><br/><br /></span><span
style="font-size:8.0pt;font-family:"Arial","sans-serif"">Followingresult are average of 3 runs each run is of 5min
throughpgbench custom query.</span><span style="font-size:8.0pt;font-family:"Arial","sans-serif""><br /><br
/></span><u><spanstyle="font-size:8.0pt;font-family:"Arial","sans-serif"">Test case - 1</span></u><u><span
style="font-size:8.0pt;font-family:"Arial","sans-serif"">:</span></u><span
style="font-size:8.0pt;font-family:"Arial","sans-serif""><br/></span><span
style="font-size:8.0pt;font-family:"Arial","sans-serif"">       Init:</span><span
style="font-size:8.0pt;font-family:"Arial","sans-serif""><br/></span><span
style="font-size:8.0pt;font-family:"Arial","sans-serif"">               None</span><span
style="font-size:8.0pt;font-family:"Arial","sans-serif""><br/></span><span
style="font-size:8.0pt;font-family:"Arial","sans-serif"">       Run:</span><span
style="font-size:8.0pt;font-family:"Arial","sans-serif""><br/></span><span
style="font-size:8.0pt;font-family:"Arial","sans-serif"">               create temp table atri1 as select v from
generate_series(1,1000000)v;</span><span style="font-size:8.0pt;font-family:"Arial","sans-serif""><br /></span><span
style="font-size:8.0pt;font-family:"Arial","sans-serif"">               select * from atri1;</span><span
style="font-size:8.0pt;font-family:"Arial","sans-serif""><br/></span><span
style="font-size:8.0pt;font-family:"Arial","sans-serif"">               VACUUM atri1;</span><span
style="font-size:8.0pt;font-family:"Arial","sans-serif""><br/></span><span
style="font-size:8.0pt;font-family:"Arial","sans-serif"">               DROP TABLE atri1;</span><span
style="font-size:8.0pt;font-family:"Arial","sans-serif""><br/><br /></span><u><span
style="font-size:8.0pt;font-family:"Arial","sans-serif"">Testcase - 2</span></u><u><span
style="font-size:8.0pt;font-family:"Arial","sans-serif"">:</span></u><span
style="font-size:8.0pt;font-family:"Arial","sans-serif""><br/></span><span
style="font-size:8.0pt;font-family:"Arial","sans-serif"">      Init: </span><span
style="font-size:8.0pt;font-family:"Arial","sans-serif""><br/></span><span
style="font-size:8.0pt;font-family:"Arial","sans-serif"">               create table atri1 as select v from
generate_series(1,1000000)v;  </span><span style="font-size:8.0pt;font-family:"Arial","sans-serif""><br /></span><span
style="font-size:8.0pt;font-family:"Arial","sans-serif"">       Run:</span><span
style="font-size:8.0pt;font-family:"Arial","sans-serif""><br/></span><span
style="font-size:8.0pt;font-family:"Arial","sans-serif"">               select * from atri1;</span><span
style="font-size:8.0pt;font-family:"Arial","sans-serif""><br/><br /></span><u><span
style="font-size:8.0pt;font-family:"Arial","sans-serif"">Testcase - 3</span></u><u><span
style="font-size:8.0pt;font-family:"Arial","sans-serif"">:(without pgbench)</span></u><span
style="font-size:8.0pt;font-family:"Arial","sans-serif""><br/></span><span
style="font-size:8.0pt;font-family:"Arial","sans-serif"">       connect two sessions s1, s2</span><span
style="font-size:8.0pt;font-family:"Arial","sans-serif""><br/></span><span
style="font-size:8.0pt;font-family:"Arial","sans-serif"">               s1 : start transaction;</span><span
style="font-size:8.0pt;font-family:"Arial","sans-serif""><br/></span><span
style="font-size:8.0pt;font-family:"Arial","sans-serif"">               s2 : </span><span
style="font-size:8.0pt;font-family:"Arial","sans-serif""><br/></span><span
style="font-size:8.0pt;font-family:"Arial","sans-serif"">                       </span><span
style="font-size:8.0pt;font-family:"Arial","sans-serif"">createtable atri1 as select v from generate_series(1,1000000)
v;</span><spanstyle="font-size:8.0pt;font-family:"Arial","sans-serif""><br /></span><span
style="font-size:8.0pt;font-family:"Arial","sans-serif"">                       \timing</span><span
style="font-size:8.0pt;font-family:"Arial","sans-serif""><br/></span><span
style="font-size:8.0pt;font-family:"Arial","sans-serif"">                       select * from atri1;</span><span
style="font-size:8.0pt;font-family:"Arial","sans-serif""><br/></span><span
style="font-size:8.0pt;font-family:"Arial","sans-serif"">                       VACUUM atri1;</span><span
style="font-size:8.0pt;font-family:"Arial","sans-serif""><br/><br /></span><b><u><span
style="font-size:8.0pt;font-family:"CourierNew"">Results:</span></u></b><span style="font-size:8.0pt;font-family:"Times
NewRoman","serif""><br /></span><span style="font-size:8.0pt;font-family:"Courier New"">                  9.3devel(Tps)
  HintbitIO(Tps)         </span><span style="font-size:8.0pt;font-family:"Times New Roman","serif""><br /></span><span
style="font-size:8.0pt;font-family:"CourierNew"">---------------------------------------------------</span><span
style="font-size:8.0pt;font-family:"TimesNew Roman","serif""><br /></span><span
style="font-size:8.0pt;font-family:"CourierNew"">Test case - 1:     1.762946        1.922219         </span><span
style="font-size:8.0pt;font-family:"TimesNew Roman","serif""><br /></span><span
style="font-size:8.0pt;font-family:"CourierNew"">Test case - 2:     4.038673        4.044356         </span><span
style="font-size:8.0pt;font-family:"TimesNew Roman","serif""><br /><br /></span><u><span
style="font-size:8.0pt;font-family:"CourierNew"">Pgbench without vacuum scale factor 75, 8 threads & 8 client.
(referreports attached)</span></u><span style="font-size:8.0pt;font-family:"Times New Roman","serif""><br
/></span><spanstyle="font-size:8.0pt;font-family:"Courier New"">Default tables select :      64980.736149  
 64550.118693    </span><span style="font-size:8.0pt;font-family:"Times New Roman","serif""><br /></span><span
style="font-size:8.0pt;font-family:"CourierNew"">Unlogged tables select:      64874.974334    64550.118693  
 </span><spanstyle="font-size:8.0pt;font-family:"Times New Roman","serif""><br /><br /></span><u><span
style="font-size:8.0pt;font-family:"CourierNew"">Multiple transaction bulk inserts: Select performance (refer script -1
&2 which attached)</span></u><span style="font-size:8.0pt;font-family:"Times New Roman","serif""><br /></span><span
style="font-size:8.0pt;font-family:"CourierNew"">sequential scan:      6.492680        6.382014     </span><span
style="font-size:8.0pt;font-family:"TimesNew Roman","serif""><br /></span><span
style="font-size:8.0pt;font-family:"CourierNew"">Index scan:           1.386851        1.36234              
</span><spanstyle="font-size:8.0pt;font-family:"Times New Roman","serif""><br /><br /></span><u><span
style="font-size:8.0pt;font-family:"CourierNew"">Single transaction bulk inserts: Select performance  (refer script - 3
&4 which attached)</span></u><span style="font-size:8.0pt;font-family:"Times New Roman","serif""><br /></span><span
style="font-size:8.0pt;font-family:"CourierNew"">sequential scan:      6.49319         6.3800147              
 </span><spanstyle="font-size:8.0pt;font-family:"Times New Roman","serif""><br /></span><span
style="font-size:8.0pt;font-family:"CourierNew"">Index scan:           1.384121        1.3615277                      
 </span><spanstyle="font-size:8.0pt;font-family:"Times New Roman","serif""><br /><br /></span><u><span
style="font-size:8.0pt;font-family:"CourierNew"">Long transaction open then Vacuum & select performance in milli
seconds.(refer reports output)</span></u><span style="font-size:8.0pt;font-family:"Times New Roman","serif""><br
/></span><spanstyle="font-size:8.0pt;font-family:"Courier New"">Testcase - 3:</span><span
style="font-size:8.0pt;font-family:"TimesNew Roman","serif""><br /></span><span
style="font-size:8.0pt;font-family:"CourierNew"">Single Vacuum Perf   : 128.302 ms      181.292 ms          
</span><spanstyle="font-size:8.0pt;font-family:"Times New Roman","serif""><br /></span><span
style="font-size:8.0pt;font-family:"CourierNew"">Single select perf   : 214.107 ms      177.057 ms  </span><span
style="font-size:8.0pt;font-family:"TimesNew Roman","serif""><br /></span><span
style="font-size:8.0pt;font-family:"CourierNew"">Total                : 342.409 ms      358.349 ms</span><span
style="font-size:8.0pt;font-family:"TimesNew Roman","serif""><br /><br /></span><span
style="font-size:8.0pt;font-family:"Arial","sans-serif"">Iwas not able to find the reason why in some of cases results
arelow so please use the attached scripts to validate the same.</span><span style="font-size:8.0pt;font-family:"Times
NewRoman","serif""><br /><br /></span><span style="font-size:8.0pt;font-family:"Arial","sans-serif"">I was not able to
providethe IO statistics as IOSTAT & VMSTAT was giving the current snapshot</span><span
style="font-size:8.0pt;font-family:"TimesNew Roman","serif""><br /></span><span
style="font-size:8.0pt;font-family:"Arial","sans-serif"">butnot the cumulative from start to end of test
execution.</span><spanstyle="font-size:8.0pt;font-family:"Times New Roman","serif""><br /><br /></span><b><span
style="font-size:8.0pt;font-family:"Arial","sans-serif"">Documentation:</span></b><span
style="font-size:8.0pt;font-family:"TimesNew Roman","serif""><br /></span><b><span
style="font-size:8.0pt;font-family:"Arial","sans-serif"">-------------</span></b><span
style="font-size:8.0pt;font-family:"TimesNew Roman","serif""><br /></span><span
style="font-size:8.0pt;font-family:"Arial","sans-serif"">Nouser visible changes, so no documentation is need to
update.</span><spanstyle="font-size:8.0pt;font-family:"Times New Roman","serif""><br /><br /><br /></span><span
style="font-size:8.0pt"></span><pclass="MsoNormalCxSpLast"><span style="font-size:8.0pt">Regards,</span><p
class="MsoPlainText"><spanstyle="font-size:8.0pt">Hari babu.</span></div> 

Re: WIP patch for hint bit i/o mitigation

From
Atri Sharma
Date:



On Thu, Dec 13, 2012 at 6:36 PM, Hari Babu <haribabu.kommi@huawei.com> wrote:

On Thu, Dec 7, 2012 at 7:56 PM, Hari babu <haribabu(dot)kommi(at)Huawei(dot)com> wrote:
>>On Thu, Dec 6, 2012 at 8:52 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
>>Thanks for that -- that's fairly comprehensive I'd say.  I'm quite
>>interested in that benchmarking framework as well.  Do you need help
>>setting up the scripts?

>Presently I am testing with pgbench custom query option & taking IO & VM statistics in parallel.
>Following way I had written the test script for case -1.

>./psql -f bench_test_1_init.sql postgres
>iostat -t 1 -d > hint.test1.iostat.reading_3.txt &
>vmstat -n 1 > hint.test1.vmstat.reading_3.txt &
>./pgbench -f bench_test_1.sql -T 300 -c 8 -j 8 -n postgres
>killall -s SIGINT iostat
>killall -s SIGINT vmstat

>Where the sql files are as follows:

>-- bench_test_1.sql
>select count(*) from  bench where f1 is not null;

>--bench_test_1_init.sql
>drop table if exists bench;
>create table bench(f0 int primary key, f1 char(50));
>insert into bench values (generate_series(1, 100000), 'a');
>insert into bench values (generate_series(100001, 200000), 'a');
>...
>insert into bench values (generate_series(9800001, 9900000), 'a');
>insert into bench values (generate_series(9900001, 10000000), 'a');
>checkpoint;

>I will provide the test results later.

Please find the review of the patch.

Basic stuff:
------------
- Patch applies with offsets.
- Compiles cleanly with no warnings
- Regression Test pass.

Code Review:
-------------
        1. Better to set the hint bits for the tuples in a page, if the page is already dirty.

Test cases:
----------
        Test cases are already described in the following link.
        http://archives.postgresql.org/message-id/00d301cdd398$6e3fff30$4abffd90$@kapila@huawei.com

Test Results:
------------
Machine details:
        CPU cores : 4
        RAM       : 24GB
        OS        : Suse Linux 10 SP2

Configuration:
    shared_buffers = 500MB
    autovacuum = off
    checkpoint_segments = 256
    checkpoint_timeout = 10min

Following result are average of 3 runs each run is of 5min through pgbench custom query.

Test case - 1:
        Init:
                None
        Run:
                create temp table atri1 as select v from generate_series(1,1000000) v;
                select * from atri1;
                VACUUM atri1;
                DROP TABLE atri1;

Test case - 2:
       Init:
                create table atri1 as select v from generate_series(1,1000000) v;  
        Run:
                select * from atri1;

Test case - 3: (without pgbench)
        connect two sessions s1, s2
                s1 : start transaction;
                s2 :
                        create table atri1 as select v from generate_series(1,1000000) v;
                        \timing
                        select * from atri1;
                        VACUUM atri1;

Results:
                  9.3devel(Tps)    HintbitIO(Tps)        
---------------------------------------------------
Test case - 1:     1.762946        1.922219        
Test case - 2:     4.038673        4.044356        

Pgbench without vacuum scale factor 75, 8 threads & 8 client. (refer reports attached)
Default tables select :      64980.736149    64550.118693    
Unlogged tables select:      64874.974334    64550.118693    

Multiple transaction bulk inserts: Select performance (refer script -1 & 2 which attached)
sequential scan:      6.492680        6.382014    
Index scan:           1.386851        1.36234              

Single transaction bulk inserts: Select performance  (refer script - 3 & 4 which attached)
sequential scan:      6.49319         6.3800147                
Index scan:           1.384121        1.3615277                        

Long transaction open then Vacuum & select performance in milli seconds. (refer reports output)
Testcase - 3:
Single Vacuum Perf   : 128.302 ms      181.292 ms          
Single select perf   : 214.107 ms      177.057 ms  
Total                : 342.409 ms      358.349 ms

I was not able to find the reason why in some of cases results are low so please use the attached scripts to validate the same.

I was not able to provide the IO statistics as IOSTAT & VMSTAT was giving the current snapshot
but not the cumulative from start to end of test execution.

Documentation:
-------------
No user visible changes, so no documentation is need to update.


Regards,

Hari babu.


Thanks for the review and tests.

The remarkable difference between 9.3devel and Hint Bit IO is present only in test 3, right? I have a feeling that the original case we discussed (vacuum setting the hint bits) is taking place here, and hence the decrease in performance.

Atri

--
Regards,
 
Atri
l'apprenant

Re: WIP patch for hint bit i/o mitigation

From
Merlin Moncure
Date:
On Thu, Dec 13, 2012 at 7:06 AM, Hari Babu <haribabu.kommi@huawei.com> wrote:
> Please find the review of the patch.


Thanks for detailed review!

> Basic stuff:
> ------------
> - Patch applies with offsets.
> - Compiles cleanly with no warnings
> - Regression Test pass.
>
> Code Review:
> -------------
>         1. Better to set the hint bits for the tuples in a page, if the page
> is already dirty.

This is true today but likely less true if/when page checksums come
out.  Also it complicates the code a little bit.
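
To make the trade-off concrete, here is a rough sketch of what the
reviewer's suggestion would mean inside SetHintBits() in tqual.c.  The
cache variable names are made up for this example, and note that the
caller would need some cheap way of knowing whether the buffer is
already dirty (bufmgr does not currently expose such a test), which is
part of the added complexity:

/* hypothetical backend-local cache, as sketched earlier in this thread */
static TransactionId cachedHintXid = InvalidTransactionId;
static uint16 cachedHintBits = 0;

static void
SetHintBits_sketch(HeapTupleHeader tuple, Buffer buffer,
                   uint16 infomask, TransactionId xid,
                   bool page_already_dirty)
{
    /* always remember the verdict in the backend-local cache */
    cachedHintXid = xid;
    cachedHintBits |= infomask;

    /*
     * If the page is already dirty, setting the on-page bit costs no
     * extra write, so do it; otherwise rely on the cache alone.  How
     * the caller learns page_already_dirty cheaply is the open question.
     */
    if (page_already_dirty)
    {
        tuple->t_infomask |= infomask;
        SetBufferCommitInfoNeedsSave(buffer);
    }
}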

> Default tables select :      64980.736149    64550.118693
> Unlogged tables select:      64874.974334    64550.118693

So it looks like the extra tests in the visibility routines are causing a
0.7% performance hit (64550.1 vs. 64980.7 tps).

> Multiple transaction bulk inserts: Select performance (refer script -1 & 2
> which attached)
> sequential scan:      6.492680        6.382014
> Index scan:           1.386851        1.36234
>
> Single transaction bulk inserts: Select performance  (refer script - 3 & 4
> which attached)
> sequential scan:      6.49319         6.3800147
> Index scan:           1.384121        1.3615277

The performance hit is higher here -- almost 2%.  This is troubling.

> Long transaction open then Vacuum & select performance in milli seconds.
> (refer reports output)
> Testcase - 3:
> Single Vacuum Perf   : 128.302 ms      181.292 ms
> Single select perf   : 214.107 ms      177.057 ms
> Total                : 342.409 ms      358.349 ms
>
> I was not able to find the reason why in some of cases results are low so
> please use the attached scripts to validate the same.

I need to validate the vacuum results. It's possible that this is
solvable by tweaking xmin check inside vacuum. Assuming that's fixed,
the question stands: do the results justify the change?  I'd argue
'maybe' -- I'd like to see the bulk insert performance hit reduced if
possible.  Let's see what we can do in the short term (and, if no
improvement can be found, I think this patch should be marked
'returned with feedback').

merlin



Re: WIP patch for hint bit i/o mitigation

From
Amit Kapila
Date:
On Thursday, December 13, 2012 8:02 PM Merlin Moncure wrote:
> On Thu, Dec 13, 2012 at 7:06 AM, Hari Babu <haribabu.kommi@huawei.com>
> wrote:
> > Please find the review of the patch.
> 
> 
> Thanks for detailed review!
> 
> > Basic stuff:
> > ------------
> > - Patch applies with offsets.
> > - Compiles cleanly with no warnings
> > - Regression Test pass.
> >
> > Code Review:
> > -------------
> >         1. Better to set the hint bits for the tuples in a page, if
> the page
> > is already dirty.
> 
> This is true today but likely less true if/when page checksums come
> out.  Also it complicates the code a little bit.
> 
> > Default tables select :      64980.736149    64550.118693
> > Unlogged tables select:      64874.974334    64550.118693
> 
> So it looks like the extra tests visibility routines are causing 0.7%
> performance hit.
> 
> > Multiple transaction bulk inserts: Select performance (refer script -1
> & 2
> > which attached)
> > sequential scan:      6.492680        6.382014
> > Index scan:           1.386851        1.36234
> >
> > Single transaction bulk inserts: Select performance  (refer script - 3
> & 4
> > which attached)
> > sequential scan:      6.49319         6.3800147
> > Index scan:           1.384121        1.3615277
> 
> The performance hit is higher  here.  Almost 2%.   This is troubling.
> 
> > Long transaction open then Vacuum & select performance in milli
> seconds.
> > (refer reports output)
> > Testcase - 3:
> > Single Vacuum Perf   : 128.302 ms      181.292 ms
> > Single select perf   : 214.107 ms      177.057 ms
> > Total                : 342.409 ms      358.349 ms
> >
> > I was not able to find the reason why in some of cases results are low
> so
> > please use the attached scripts to validate the same.
> 
> I need to validate the vacuum results. It's possible that this is
> solvable by tweaking xmin check inside vacuum. Assuming that's fixed,
> the question stands: do the results justify the change?  I'd argue
> 'maybe' 

We can try it with the change (assuming the change is small) and see if the
performance gain is good, then discuss whether it really justifies it.
I think the main reason for the Vacuum performance hit is that, in the test,
pages are getting dirty only due to the setting of hint bits by Vacuum.

>-- I'd like to see the bulk insert performance hit reduced if
> possible.

I think if we can improve performance for bulk-insert case, then this patch
has much more value. 

With Regards,
Amit Kapila.




Re: WIP patch for hint bit i/o mitigation

From
Craig Ringer
Date:
On 12/14/2012 09:57 PM, Amit Kapila wrote:
>>
>> I need to validate the vacuum results. It's possible that this is
>> solvable by tweaking xmin check inside vacuum. Assuming that's fixed,
>> the question stands: do the results justify the change?  I'd argue
>> 'maybe' 
> We can try with change (assuming change is small) and see if the performance
> gain is good, then discuss whether it really justifies.
> I think the main reason for Vacuum performance hit is that in the test pages
> are getting dirty only due to setting of hint bit
> by Vacuum. 
>
>> -- I'd like to see the bulk insert performance hit reduced if
>> possible.
> I think if we can improve performance for bulk-insert case, then this patch
> has much more value. 
Has there been any movement in this - more benchmarks and data showing
that it really does improve performance, or that it really isn't helpful?

--
Craig Ringer                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: WIP patch for hint bit i/o mitigation

From
Merlin Moncure
Date:
On Fri, Jan 18, 2013 at 5:34 AM, Craig Ringer <craig@2ndquadrant.com> wrote:
> On 12/14/2012 09:57 PM, Amit Kapila wrote:
>>>
>>> I need to validate the vacuum results. It's possible that this is
>>> solvable by tweaking xmin check inside vacuum. Assuming that's fixed,
>>> the question stands: do the results justify the change?  I'd argue
>>> 'maybe'
>> We can try with change (assuming change is small) and see if the performance
>> gain is good, then discuss whether it really justifies.
>> I think the main reason for Vacuum performance hit is that in the test pages
>> are getting dirty only due to setting of hint bit
>> by Vacuum.
>>
>>> -- I'd like to see the bulk insert performance hit reduced if
>>> possible.
>> I think if we can improve performance for bulk-insert case, then this patch
>> has much more value.
> Has there been any movement in this - more benchmarks and data showing
> that it really does improve performance, or that it really isn't helpful?

Atri is working on that.  I don't know if he's going to pull it off in
time though, so we may have to pull the patch from this 'fest.  My
take on the current patch is that the upside case is pretty clear, but
the bulk insert performance impact needs to be figured out and
mitigated (that's what Atri is working on).

merlin



Re: WIP patch for hint bit i/o mitigation

From
Atri Sharma
Date:

On 18-Jan-2013, at 17:04, Craig Ringer <craig@2ndQuadrant.com> wrote:

> On 12/14/2012 09:57 PM, Amit Kapila wrote:
>>>
>>> I need to validate the vacuum results. It's possible that this is
>>> solvable by tweaking xmin check inside vacuum. Assuming that's fixed,
>>> the question stands: do the results justify the change?  I'd argue
>>> 'maybe'
>> We can try with change (assuming change is small) and see if the performance
>> gain is good, then discuss whether it really justifies.
>> I think the main reason for Vacuum performance hit is that in the test pages
>> are getting dirty only due to setting of hint bit
>> by Vacuum.
>>
>>> -- I'd like to see the bulk insert performance hit reduced if
>>> possible.
>> I think if we can improve performance for bulk-insert case, then this patch
>> has much more value.
> Has there been any movement in this - more benchmarks and data showing
> that it really does improve performance, or that it really isn't helpful?
>
> --
> Craig Ringer                   http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services

Hello all,

Sorry for the delay in updating the hackers list with the current status.

I recently did some profiling using perf on PostgreSQL 9.2 with and without our patch.

I noticed that the maximum time is being spent in the heapgettup function.  Further investigation and a bit of a hunch
leads me to believe that we may be adversely affecting the visibility map optimisation that directly interacts with the
visibility functions, which our patch straight away affects.

If this is the case, we may really need to get down to the design of our patch, and maybe see which visibility
function/functions we are affecting, and see if we can mitigate the effect.

Please let me know your inputs on this.

Regards,

Atri





Re: WIP patch for hint bit i/o mitigation

From
Merlin Moncure
Date:
On Fri, Jan 18, 2013 at 8:57 AM, Atri Sharma <atri.jiit@gmail.com> wrote:
>
> Hello all,
>
> Sorry for the delay in updating the hackers list with the current status.
>
> I recently did some profiling using perf on PostgreSQL 9.2 with and without our patch.
>
> I noticed that the maximum time is being spent in the heapgettup function.  Further investigation and a bit of a
> hunch leads me to believe that we may be adversely affecting the visibility map optimisation that directly interacts
> with the visibility functions, which our patch straight away affects.
>
> If this is the case, we may really need to get down to the design of our patch, and maybe see which visibility
> function/functions we are affecting, and see if we can mitigate the effect.
>
> Please let me know your inputs on this.

Any scenario that involves a non-trivial amount of investigation or
development should result in us pulling the patch for rework and
resubmission in a later 'fest... it's closing time, as they say :-).

merlin



Re: WIP patch for hint bit i/o mitigation

From
Robert Haas
Date:
On Fri, Jan 18, 2013 at 10:36 AM, Merlin Moncure <mmoncure@gmail.com> wrote:
> Any scenario that involves non-trivial amount of investigation or
> development should result in us pulling the patch for rework and
> resubmission in later 'fest....it's closing time as they say :-).

Amen.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: WIP patch for hint bit i/o mitigation

From
Craig Ringer
Date:
On 01/18/2013 11:50 PM, Robert Haas wrote:
> On Fri, Jan 18, 2013 at 10:36 AM, Merlin Moncure <mmoncure@gmail.com> wrote:
>> Any scenario that involves non-trivial amount of investigation or
>> development should result in us pulling the patch for rework and
>> resubmission in later 'fest....it's closing time as they say :-).
> Amen.
>
OK, bumped to the next CF.

--
Craig Ringer                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services