Thread: Re: [HACKERS] rewrite HeapSatisfiesHOTAndKey

Re: [HACKERS] rewrite HeapSatisfiesHOTAndKey

From
Amit Kapila
Date:
On Thu, Dec 29, 2016 at 4:50 AM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
> Pursuant to my comments at
> https://www.postgresql.org/message-id/20161223192245.hx4rbrxbrwtgwj6i@alvherre.pgsql
> and because of a stupid bug I found in my indirect indexes patch, I
> rewrote (read: removed) HeapSatisfiesHOTAndKey.  The replacement
> function HeapDetermineModifiedColumns returns a bitmapset with a bit set
> for each modified column, for those columns that are listed as
> "interesting" -- currently that set is the ID columns, the "key"
> columns, and the indexed columns.  The new code is much simpler, at the
> expense of a few bytes of additional memory used during heap_update().
>
> All tests pass.
>
> Both WARM and indirect indexes should be able to use this infrastructure
> in a way that better serves them than the current HeapSatisfiesHOTAndKey
> (both patches modify that function in a rather ugly way).  I intend to
> get this pushed shortly unless objections are raised.
>
> The new coding prevents stopping the check early as soon as the three
> result booleans could be determined.
>

I think there is some chance that such a change could induce
regression for the cases when there are many index columns or I think
even when index is on multiple columns (consider index is on first and
eight column in a ten column table).  The reason for such a suspicion
is that heap_getattr() is not so cheap that doing it multiple times is
free.  Do you think it is worth to do few tests before committing the
patch?

Noticed below comment in interesting-attrs-2.patch
+ * are considered the "key" of rows in the table, and columns that are
+ * part of indirect indexes.

Is it right to mention about indirect indexes in above comment
considering indirect indexes are still not part of core code?


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] rewrite HeapSatisfiesHOTAndKey

From
Pavan Deolasee
Date:


On Mon, Jan 2, 2017 at 8:52 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:

I think there is some chance that such a change could induce
regression for the cases when there are many index columns or I think
even when index is on multiple columns (consider index is on first and
eight column in a ten column table).  


I don't see that as a problem because the routine only checks for columns that are passed as "interesting_cols".

Noticed below comment in interesting-attrs-2.patch
+ * are considered the "key" of rows in the table, and columns that are
+ * part of indirect indexes.

Is it right to mention about indirect indexes in above comment
considering indirect indexes are still not part of core code? 

I agree. We can add details about indirect indexes or WARM later, as and when those patches get committed.
 
Pavan, please rebase your WARM patch on top of this and let me know how
you like it.  I'll post a new version of indirect indexes later this
week.


I've rebased WARM on top of this patch and the proposed changes look fine from WARM's perspective too. I'll send rebased patches separately. 

Thanks,
Pavan

--
 Pavan Deolasee                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: [HACKERS] rewrite HeapSatisfiesHOTAndKey

From
Amit Kapila
Date:
On Mon, Jan 2, 2017 at 9:28 AM, Pavan Deolasee <pavan.deolasee@gmail.com> wrote:
>
>
> On Mon, Jan 2, 2017 at 8:52 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>>
>>
>> I think there is some chance that such a change could induce
>> regression for the cases when there are many index columns or I think
>> even when index is on multiple columns (consider index is on first and
>> eight column in a ten column table).
>>
>
> I don't see that as a problem because the routine only checks for columns
> that are passed as "interesting_cols".
>

Right, but now it will evaluate for all interesting_cols whereas
previously it would just stop at first if that column is changed.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] rewrite HeapSatisfiesHOTAndKey

From
Pavan Deolasee
Date:


On Mon, Jan 2, 2017 at 10:17 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Mon, Jan 2, 2017 at 9:28 AM, Pavan Deolasee <pavan.deolasee@gmail.com> wrote:
>
>
> On Mon, Jan 2, 2017 at 8:52 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>>
>>
>> I think there is some chance that such a change could induce
>> regression for the cases when there are many index columns or I think
>> even when index is on multiple columns (consider index is on first and
>> eight column in a ten column table).
>>
>
> I don't see that as a problem because the routine only checks for columns
> that are passed as "interesting_cols".
>

Right, but now it will evaluate for all interesting_cols whereas
previously it would just stop at first if that column is changed.


Ah ok. I read your comment "consider index is on first and
eight column in a ten column table" as s/eight/eighth. But may be you're referring to
the case where there is an index on eight or nine columns of a ten column table.

You're right. That's an additional cost as Alvaro himself explained in the original
post. But both indirect indexes and WARM needs to know information about all
modified columns. So assuming either of these patches are going to make it,
we've to bear that cost. Having said that, given that attributes are usually cached,
the cost may not be significant  since for non-HOT updates, the
attributes will be later fetched anyways while preparing index tuples. So we're
probably doing that work in advance.

Obviously, I'm not against doing additional performance tests to ensure that the
cost is not significant, especially if neither indirect indexes nor WARM gets
committed in 10.0

Thanks,
Pavan
--
 Pavan Deolasee                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: [HACKERS] rewrite HeapSatisfiesHOTAndKey

From
Amit Kapila
Date:
On Mon, Jan 2, 2017 at 10:59 AM, Pavan Deolasee
<pavan.deolasee@gmail.com> wrote:
>
> On Mon, Jan 2, 2017 at 10:17 AM, Amit Kapila <amit.kapila16@gmail.com>
> wrote:
>>
>> On Mon, Jan 2, 2017 at 9:28 AM, Pavan Deolasee <pavan.deolasee@gmail.com>
>> wrote:
>> >
>> >
>> > On Mon, Jan 2, 2017 at 8:52 AM, Amit Kapila <amit.kapila16@gmail.com>
>> > wrote:
>> >>
>> >>
>> >> I think there is some chance that such a change could induce
>> >> regression for the cases when there are many index columns or I think
>> >> even when index is on multiple columns (consider index is on first and
>> >> eight column in a ten column table).
>> >>
>> >
>> > I don't see that as a problem because the routine only checks for
>> > columns
>> > that are passed as "interesting_cols".
>> >
>>
>> Right, but now it will evaluate for all interesting_cols whereas
>> previously it would just stop at first if that column is changed.
>>
>
> Ah ok. I read your comment "consider index is on first and
> eight column in a ten column table" as s/eight/eighth. But may be you're
> referring to
> the case where there is an index on eight or nine columns of a ten column
> table.
>

I am talking about both kinds of cases.  The scenario where we can see
some performance impact is when there is variable-width column before
the index column (in above context before the eighth column) as there
will be cached offset optimization won't work for such a column.

> You're right. That's an additional cost as Alvaro himself explained in the
> original
> post. But both indirect indexes and WARM needs to know information about all
> modified columns. So assuming either of these patches are going to make it,
> we've to bear that cost.
>

Okay, but I think if we know how much is the additional cost in
average and worst case, then we can take a better call.  Also, if we
agree, then doing an update-intensive test on a unlogged table or with
asynchronous commit mode can show us the overhead if there is any.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] rewrite HeapSatisfiesHOTAndKey

From
Robert Haas
Date:
On Mon, Jan 2, 2017 at 1:36 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> Okay, but I think if we know how much is the additional cost in
> average and worst case, then we can take a better call.

Yeah.  We shouldn't just rip out optimizations that are inconvenient
without doing some test of what the impact is on the cases where those
optimizations are likely to matter.  I don't think it needs to be
anything incredibly laborious and if there's no discernable impact,
great.  But it doesn't make sense to change things for the benefit of
WARM and indirect indexes and then discover later that we regressed
other cases we care about and have no plan to fix it.  Better to find
out any potential problems of that kind now.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] rewrite HeapSatisfiesHOTAndKey

From
Pavan Deolasee
Date:


On Tue, Jan 3, 2017 at 9:33 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Mon, Jan 2, 2017 at 1:36 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > Okay, but I think if we know how much is the additional cost in
> > average and worst case, then we can take a better call.
>
> Yeah.  We shouldn't just rip out optimizations that are inconvenient
> without doing some test of what the impact is on the cases where those
> optimizations are likely to matter.  I don't think it needs to be
> anything incredibly laborious and if there's no discernable impact,
> great.  


So I performed some tests to measure if this causes any noticeable regression. I used the following simple schema:

DROP TABLE IF EXISTS testtab;
CREATE UNLOGGED TABLE testtab (
    col1 integer,
    col2 text,
    col3 float,
    col4 text,
    col5 text,
    col6 char(30),
    col7 text,
    col8 date,
    col9 text,
    col10 text
);
INSERT INTO testtab
    SELECT generate_series(1,100000),
        md5(random()::text),
        random(),
        md5(random()::text),
        md5(random()::text),
        md5(random()::text)::char(30),
        md5(random()::text),
        now(),
        md5(random()::text),
        md5(random()::text);
CREATE INDEX testindx ON testtab (col1, col2, col3, col4, col5, col6, col7, col8, col9);

I used a rather wide UNLOGGED table with an index on first 9 columns, as suggested by Amit. Also, the table has reasonable number of rows, but not more than what shared buffers (set to 512MB for these tests) can hold. This should make the test mostly CPU bound.

A transaction then updates the second column in the table. So the refactored patch will do heap_getattr() on more columns that the master while checking if HOT update is possible and before giving up. I believe we are probably testing a somewhat worst case with this setup, though may be I could have tuned some other configuration parameters.

\set value random(1, 100000)
UPDATE testtab SET col2 = md5(random()::text) WHERE col1 = :value;

I tested with -c1 and -c8 -j4 and the results are:

1-client
              Master                     Refactored
Run1 8774.089935 8979.068604
Run2 8509.2661 8943.613575
Run3 8879.484019 8950.994425


8-clients
              Master                     Refactored
Run1 22520.422448 22672.798871
Run2 21967.812303 22022.969747
Run3 22305.073223 21909.945623


So at best there is some improvement with the patch, though I don't see any reason why it should positively affect the performance. The results with more number of clients look almost identical, probably because the bottleneck shifts somewhere else. For all these tests, table was dropped and recreated in every iteration, so I don't think there was any error in testing. It might be a good idea for someone else to repeat the tests to confirm the improvement that I noticed.

Apart from this, I also ran "make check" multiple times and couldn't find any significant difference in the average time.

I will leave it to Alvaro's judgement to decide whether it's worth to commit the patch now or later when he or other committer looks at committing WARM/indirect indexes because without either of those patches this change probably does not bring up much value, if we ignore the slight improvement we see here.

Thanks,
Pavan

--
 Pavan Deolasee                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: [HACKERS] rewrite HeapSatisfiesHOTAndKey

From
Alvaro Herrera
Date:
Pavan Deolasee wrote:

> A transaction then updates the second column in the table. So the
> refactored patch will do heap_getattr() on more columns that the master
> while checking if HOT update is possible and before giving up.

Thanks.

> 1-client
>               Master                     Refactored
> Run1 8774.089935 8979.068604
> Run2 8509.2661 8943.613575
> Run3 8879.484019 8950.994425
> 
> 
> 8-clients
>               Master                     Refactored
> Run1 22520.422448 22672.798871
> Run2 21967.812303 22022.969747
> Run3 22305.073223 21909.945623

Wow, this is very surprising.  I was expecting a slight performance
decrease, not this.  I will try to reproduce these numbers.

One thing worth mentioning is that the current implementation is not
very good -- I just kept the attribute check loop as it was in the
original, which uses heap_getattr.  If we used incremental extraction
such as what we do in slot_getattr, it could be made a bit faster too.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] rewrite HeapSatisfiesHOTAndKey

From
Amit Kapila
Date:
On Wed, Jan 4, 2017 at 11:45 PM, Pavan Deolasee
<pavan.deolasee@gmail.com> wrote:
>
>
> On Tue, Jan 3, 2017 at 9:33 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>>
>> On Mon, Jan 2, 2017 at 1:36 AM, Amit Kapila <amit.kapila16@gmail.com>
>> wrote:
>> > Okay, but I think if we know how much is the additional cost in
>> > average and worst case, then we can take a better call.
>>
>> Yeah.  We shouldn't just rip out optimizations that are inconvenient
>> without doing some test of what the impact is on the cases where those
>> optimizations are likely to matter.  I don't think it needs to be
>> anything incredibly laborious and if there's no discernable impact,
>> great.
>
>
> So I performed some tests to measure if this causes any noticeable
> regression.
>

Your test and results look good, what kind of m/c you have used to
test this.  Let me see if I or one of my colleague can do this and
similar test on some high-end m/c.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] rewrite HeapSatisfiesHOTAndKey

From
Pavan Deolasee
Date:


On Thu, Jan 5, 2017 at 6:15 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:

Your test and results look good, what kind of m/c you have used to
test this.

I ran it on my Macbook Pro, so nothing fancy. The code was compiled with simple ./confgure and with no special flags. The only non-default setting was shared_buffers = 512MB to ensure the table/index fits in memory.
 
  Let me see if I or one of my colleague can do this and
similar test on some high-end m/c.


Sure. That'll be helpful given a slight unexpected result. May be it's just a noise or we may see different result on a high end m/c.

Thanks,
Pavan

--
 Pavan Deolasee                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: [HACKERS] rewrite HeapSatisfiesHOTAndKey

From
Mithun Cy
Date:
On Thu, Jan 5, 2017 at 6:15 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> Your test and results look good, what kind of m/c you have used to
> test this.  Let me see if I or one of my colleague can do this and
> similar test on some high-end m/c.

As discussed with Amit, I have tried to run the same tests with below
modification,
1. Increased the total rows to 10milion.
2. Set fsync off;
3. Changed tests as below. Updated all rows at a time.

VACUUM FULL;
BEGIN;
UPDATE testtab SET col2 = md5(random()::text);
ROLLBACK;

I have run these tests on IBM power2 which have sufficient ram. I have
set shared_buffer=32GB.

My results show after this patch there is a slight increase in
response time (psql \timing was used) for the above update statement.
Which is around 5 to 10% delay.


Runs             Response time in ms for update base.         Response
Time in ms for update new patch.                          %INC

1                                     158863.501                         167443.767                     5.4010304104

2                                     151061.793                         168908.536                    11.8142004312

3                                     153750.577                         164071.994                    6.7130915548

4                                     153639.165                         168364.225                    9.5841838245

5                                     149607.139                         166498.44                     11.2904378179


Under the same condition running original tests, that is, updating
rows which satisfy a condition col1 = :value1. I did not see any
regression.

-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] rewrite HeapSatisfiesHOTAndKey

From
Mithun Cy
Date:
On Sat, Jan 7, 2017 at 11:27 AM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:
Sorry Auto plain text setting has disturbed the table indentation.
Attaching the spreadsheet for same.

-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] rewrite HeapSatisfiesHOTAndKey

From
Amit Kapila
Date:
On Sat, Jan 7, 2017 at 11:27 AM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:
> On Thu, Jan 5, 2017 at 6:15 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>>
>> Your test and results look good, what kind of m/c you have used to
>> test this.  Let me see if I or one of my colleague can do this and
>> similar test on some high-end m/c.
>
> As discussed with Amit, I have tried to run the same tests with below
> modification,
> 1. Increased the total rows to 10milion.
> 2. Set fsync off;
> 3. Changed tests as below. Updated all rows at a time.
>
> VACUUM FULL;
> BEGIN;
> UPDATE testtab SET col2 = md5(random()::text);
> ROLLBACK;
>
> I have run these tests on IBM power2 which have sufficient ram. I have
> set shared_buffer=32GB.
>
> My results show after this patch there is a slight increase in
> response time (psql \timing was used) for the above update statement.
> Which is around 5 to 10% delay.
>

I would like to add here, that the intention of the test was to stress
the changes of the patch to see the overhead patch can incur.  Now,
surely this is a synthetic test prepared to test this patch, but I
think it indicates that the changes have some overhead which might or
might not be ignorable depending on how important is to get this
patch.  I think if Warm tuples or indirect indexes need this patch and
they can't do without this, then it is worth considering this patch
along with those patches.  OTOH, if we can reduce the overhead, then
it might be okay proceed with this patch on the basis of simplicity it
can bring.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com