Thread: Re: [HACKERS] rewrite HeapSatisfiesHOTAndKey
On Thu, Dec 29, 2016 at 4:50 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> Pursuant to my comments at
> https://www.postgresql.org/message-id/20161223192245.hx4rbrxbrwtgwj6i@alvherre.pgsql
> and because of a stupid bug I found in my indirect indexes patch, I
> rewrote (read: removed) HeapSatisfiesHOTAndKey.  The replacement
> function HeapDetermineModifiedColumns returns a bitmapset with a bit set
> for each modified column, for those columns that are listed as
> "interesting" -- currently that set is the ID columns, the "key"
> columns, and the indexed columns.  The new code is much simpler, at the
> expense of a few bytes of additional memory used during heap_update().
>
> All tests pass.
>
> Both WARM and indirect indexes should be able to use this infrastructure
> in a way that better serves them than the current HeapSatisfiesHOTAndKey
> (both patches modify that function in a rather ugly way).  I intend to
> get this pushed shortly unless objections are raised.
>
> The new coding prevents stopping the check early as soon as the three
> result booleans could be determined.
>

I think there is some chance that such a change could induce
regression for the cases when there are many index columns or I think
even when index is on multiple columns (consider index is on first and
eight column in a ten column table).  The reason for such a suspicion
is that heap_getattr() is not so cheap that doing it multiple times is
free.  Do you think it is worth doing a few tests before committing
the patch?

Noticed below comment in interesting-attrs-2.patch

+ * are considered the "key" of rows in the table, and columns that are
+ * part of indirect indexes.

Is it right to mention about indirect indexes in above comment
considering indirect indexes are still not part of core code?

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
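To make the above concrete, the shape Alvaro describes -- return a bitmapset of the modified "interesting" columns instead of three booleans -- looks roughly like the sketch below. This is an illustration only, not the code from interesting-attrs-2.patch: the function name is invented for the example, system columns are skipped, and the comparison is a plain binary one, so a logically equal but differently toasted value may simply be reported as modified (a missed optimization, not a correctness issue).

/*
 * Illustrative sketch only -- not the patch itself.  Walk the "interesting"
 * columns (id, key, and indexed columns, offset by
 * FirstLowInvalidHeapAttributeNumber as attribute bitmapsets usually are)
 * and return a bitmapset of those whose value differs between the old and
 * new tuple versions.
 */
#include "postgres.h"
#include "access/htup_details.h"
#include "access/sysattr.h"
#include "nodes/bitmapset.h"
#include "utils/datum.h"
#include "utils/rel.h"

static Bitmapset *
determine_modified_columns(Relation relation, Bitmapset *interesting_cols,
                           HeapTuple oldtup, HeapTuple newtup)
{
    TupleDesc   tupdesc = RelationGetDescr(relation);
    Bitmapset  *modified = NULL;
    int         offset = -1;

    while ((offset = bms_next_member(interesting_cols, offset)) >= 0)
    {
        int         attnum = offset + FirstLowInvalidHeapAttributeNumber;
        Form_pg_attribute att;
        Datum       value1, value2;
        bool        isnull1, isnull2;

        if (attnum <= 0)
            continue;           /* system columns ignored in this sketch */
        att = TupleDescAttr(tupdesc, attnum - 1);

        /* This is the heap_getattr() cost the thread is worried about:
         * it is paid once per interesting column, per tuple version. */
        value1 = heap_getattr(oldtup, attnum, tupdesc, &isnull1);
        value2 = heap_getattr(newtup, attnum, tupdesc, &isnull2);

        /* plain binary comparison; no detoasting attempted here */
        if (isnull1 != isnull2 ||
            (!isnull1 && !datumIsEqual(value1, value2,
                                       att->attbyval, att->attlen)))
            modified = bms_add_member(modified, offset);
    }

    return modified;
}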
On Mon, Jan 2, 2017 at 8:52 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:

> I think there is some chance that such a change could induce
> regression for the cases when there are many index columns or I think
> even when index is on multiple columns (consider index is on first and
> eight column in a ten column table).

I don't see that as a problem because the routine only checks for columns that are passed as "interesting_cols".

> Noticed below comment in interesting-attrs-2.patch
>
> + * are considered the "key" of rows in the table, and columns that are
> + * part of indirect indexes.
>
> Is it right to mention about indirect indexes in above comment
> considering indirect indexes are still not part of core code?

I agree. We can add details about indirect indexes or WARM later, as and when those patches get committed.

> Pavan, please rebase your WARM patch on top of this and let me know how
> you like it.  I'll post a new version of indirect indexes later this
> week.

I've rebased WARM on top of this patch and the proposed changes look fine from WARM's perspective too. I'll send rebased patches separately.

Thanks,
Pavan

--
Pavan Deolasee http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Mon, Jan 2, 2017 at 9:28 AM, Pavan Deolasee <pavan.deolasee@gmail.com> wrote:
>
> On Mon, Jan 2, 2017 at 8:52 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>>
>> I think there is some chance that such a change could induce
>> regression for the cases when there are many index columns or I think
>> even when index is on multiple columns (consider index is on first and
>> eight column in a ten column table).
>>
>
> I don't see that as a problem because the routine only checks for columns
> that are passed as "interesting_cols".
>

Right, but now it will evaluate for all interesting_cols whereas
previously it would just stop at first if that column is changed.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Mon, Jan 2, 2017 at 10:17 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Mon, Jan 2, 2017 at 9:28 AM, Pavan Deolasee <pavan.deolasee@gmail.com> wrote:
>>
>> On Mon, Jan 2, 2017 at 8:52 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>>>
>>> I think there is some chance that such a change could induce
>>> regression for the cases when there are many index columns or I think
>>> even when index is on multiple columns (consider index is on first and
>>> eight column in a ten column table).
>>>
>>
>> I don't see that as a problem because the routine only checks for columns
>> that are passed as "interesting_cols".
>>
>
> Right, but now it will evaluate for all interesting_cols whereas
> previously it would just stop at first if that column is changed.
Ah ok. I read your comment "consider index is on first and eight column in a ten column table" as s/eight/eighth. But maybe you're referring to the case where there is an index on eight or nine columns of a ten-column table.
You're right. That's an additional cost, as Alvaro himself explained in the original post. But both indirect indexes and WARM need to know information about all modified columns, so assuming either of these patches is going to make it, we have to bear that cost. Having said that, given that attributes are usually cached, the cost may not be significant, since for non-HOT updates the attributes will be fetched later anyway while preparing the index tuples. So we're probably just doing that work in advance.
Obviously, I'm not against doing additional performance tests to ensure that the cost is not significant, especially if neither indirect indexes nor WARM gets committed in 10.0.
Thanks,
Pavan
-- Pavan Deolasee http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Mon, Jan 2, 2017 at 10:59 AM, Pavan Deolasee <pavan.deolasee@gmail.com> wrote:
>
> On Mon, Jan 2, 2017 at 10:17 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>>
>> On Mon, Jan 2, 2017 at 9:28 AM, Pavan Deolasee <pavan.deolasee@gmail.com> wrote:
>>>
>>> On Mon, Jan 2, 2017 at 8:52 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>>>>
>>>> I think there is some chance that such a change could induce
>>>> regression for the cases when there are many index columns or I think
>>>> even when index is on multiple columns (consider index is on first and
>>>> eight column in a ten column table).
>>>>
>>>
>>> I don't see that as a problem because the routine only checks for
>>> columns that are passed as "interesting_cols".
>>>
>>
>> Right, but now it will evaluate for all interesting_cols whereas
>> previously it would just stop at first if that column is changed.
>>
>
> Ah ok. I read your comment "consider index is on first and eight column
> in a ten column table" as s/eight/eighth. But maybe you're referring to
> the case where there is an index on eight or nine columns of a ten-column
> table.
>

I am talking about both kinds of cases. The scenario where we can see some performance impact is when there is a variable-width column before the index column (in the above context, before the eighth column), as the cached-offset optimization won't work for such a column.

> You're right. That's an additional cost, as Alvaro himself explained in
> the original post. But both indirect indexes and WARM need to know
> information about all modified columns, so assuming either of these
> patches is going to make it, we have to bear that cost.
>

Okay, but I think if we know how much is the additional cost in average and worst case, then we can take a better call. Also, if we agree, then doing an update-intensive test on an unlogged table or with asynchronous commit mode can show us the overhead, if there is any.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
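To make the cached-offset point concrete, here is a deliberately simplified sketch -- not the real attribute-fetch code; NULL bitmaps, alignment padding, and cstring attributes are ignored -- of the walk implied by fetching a late column once a variable-width column appears before it. When every preceding column is fixed width, the offset is a constant that PostgreSQL can cache (attcacheoff); after a text/varlena column, the offset depends on the particular tuple, so each heap_getattr() of a later column has to repeat a walk of this shape.

#include "postgres.h"
#include "access/htup_details.h"

/*
 * Simplified illustration only -- not the real nocachegetattr().
 * Compute the byte offset of column 'target_attnum' within the data area
 * of a tuple, ignoring NULLs, alignment, and attlen == -2 columns.
 */
static long
offset_of_attribute(char *tupdata, TupleDesc tupdesc, int target_attnum)
{
    long    off = 0;
    int     i;

    for (i = 0; i < target_attnum - 1; i++)
    {
        Form_pg_attribute att = TupleDescAttr(tupdesc, i);

        if (att->attlen > 0)
        {
            /* fixed width: contributes a constant, so the offset of the
             * next column is predictable and could be cached */
            off += att->attlen;
        }
        else
        {
            /* variable width: we must read this tuple's datum header to
             * know how far to skip, so nothing after this point can be
             * answered from a cached offset */
            off += VARSIZE_ANY(tupdata + off);
        }
    }

    return off;
}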
On Mon, Jan 2, 2017 at 1:36 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> Okay, but I think if we know how much is the additional cost in
> average and worst case, then we can take a better call.

Yeah. We shouldn't just rip out optimizations that are inconvenient without doing some test of what the impact is on the cases where those optimizations are likely to matter. I don't think it needs to be anything incredibly laborious and if there's no discernable impact, great. But it doesn't make sense to change things for the benefit of WARM and indirect indexes and then discover later that we regressed other cases we care about and have no plan to fix it. Better to find out any potential problems of that kind now.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Tue, Jan 3, 2017 at 9:33 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Mon, Jan 2, 2017 at 1:36 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > Okay, but I think if we know how much is the additional cost in
> > average and worst case, then we can take a better call.
>
> Yeah. We shouldn't just rip out optimizations that are inconvenient
> without doing some test of what the impact is on the cases where those
> optimizations are likely to matter. I don't think it needs to be
> anything incredibly laborious and if there's no discernable impact,
> great.
So I performed some tests to measure if this causes any noticeable regression. I used the following simple schema:
DROP TABLE IF EXISTS testtab;
CREATE UNLOGGED TABLE testtab (
col1 integer,
col2 text,
col3 float,
col4 text,
col5 text,
col6 char(30),
col7 text,
col8 date,
col9 text,
col10 text
);
INSERT INTO testtab
SELECT generate_series(1,100000),
md5(random()::text),
random(),
md5(random()::text),
md5(random()::text),
md5(random()::text)::char(30),
md5(random()::text),
now(),
md5(random()::text),
md5(random()::text);
CREATE INDEX testindx ON testtab (col1, col2, col3, col4, col5, col6, col7, col8, col9);
I used a rather wide UNLOGGED table with an index on the first 9 columns, as suggested by Amit. Also, the table has a reasonable number of rows, but not more than what shared buffers (set to 512MB for these tests) can hold. This should make the test mostly CPU bound.
A transaction then updates the second column in the table. So the refactored patch will do heap_getattr() on more columns than master while checking if a HOT update is possible and before giving up. I believe we are probably testing a somewhat worst case with this setup, though maybe I could have tuned some other configuration parameters.
\set value random(1, 100000)
UPDATE testtab SET col2 = md5(random()::text) WHERE col1 = :value;
I tested with -c1 and -c8 -j4 and the results are:
1 client (tps):

        Master          Refactored
Run1    8774.089935     8979.068604
Run2    8509.2661       8943.613575
Run3    8879.484019     8950.994425

8 clients (tps):

        Master          Refactored
Run1    22520.422448    22672.798871
Run2    21967.812303    22022.969747
Run3    22305.073223    21909.945623
So at best there is some improvement with the patch, though I don't see any reason why it should positively affect performance. The results with more clients look almost identical, probably because the bottleneck shifts somewhere else. For all these tests, the table was dropped and recreated in every iteration, so I don't think there was any error in the testing. It might be a good idea for someone else to repeat the tests to confirm the improvement that I noticed.
Apart from this, I also ran "make check" multiple times and couldn't find any significant difference in the average time.
I will leave it to Alvaro's judgement to decide whether it's worth committing the patch now, or later when he or another committer looks at committing WARM/indirect indexes, because without either of those patches this change probably does not bring much value, if we ignore the slight improvement we see here.
Thanks,
Pavan
--
Pavan Deolasee http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Pavan Deolasee wrote:

> A transaction then updates the second column in the table. So the
> refactored patch will do heap_getattr() on more columns than master
> while checking if a HOT update is possible and before giving up.

Thanks.

> 1 client (tps):
>
>         Master          Refactored
> Run1    8774.089935     8979.068604
> Run2    8509.2661       8943.613575
> Run3    8879.484019     8950.994425
>
> 8 clients (tps):
>
>         Master          Refactored
> Run1    22520.422448    22672.798871
> Run2    21967.812303    22022.969747
> Run3    22305.073223    21909.945623

Wow, this is very surprising. I was expecting a slight performance decrease, not this. I will try to reproduce these numbers.

One thing worth mentioning is that the current implementation is not very good -- I just kept the attribute check loop as it was in the original, which uses heap_getattr. If we used incremental extraction such as what we do in slot_getattr, it could be made a bit faster too.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
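One interpretation of that incremental-extraction idea is sketched below (an illustration, not Alvaro's code): deform each tuple version once with heap_deform_tuple(), a single ordered pass over the attributes, and then compare the already-extracted Datums, instead of paying a separate heap_getattr() per interesting column. A real implementation would presumably stop deforming at the highest interesting attribute, the way slot_getsomeattrs() does for slots, and would also handle system columns, which this sketch skips.

#include "postgres.h"
#include "access/htup_details.h"
#include "access/sysattr.h"
#include "nodes/bitmapset.h"
#include "utils/datum.h"
#include "utils/rel.h"

/*
 * Illustrative sketch only: extract both tuple versions once, then compare
 * the interesting columns from the already-deformed arrays.
 */
static Bitmapset *
modified_columns_by_deforming(Relation relation, Bitmapset *interesting_cols,
                              HeapTuple oldtup, HeapTuple newtup)
{
    TupleDesc   tupdesc = RelationGetDescr(relation);
    int         natts = tupdesc->natts;
    Datum      *oldvals = (Datum *) palloc(natts * sizeof(Datum));
    Datum      *newvals = (Datum *) palloc(natts * sizeof(Datum));
    bool       *oldnulls = (bool *) palloc(natts * sizeof(bool));
    bool       *newnulls = (bool *) palloc(natts * sizeof(bool));
    Bitmapset  *modified = NULL;
    int         offset = -1;

    /* One linear walk per tuple: variable-width columns are stepped over
     * once here, rather than once per heap_getattr() call. */
    heap_deform_tuple(oldtup, tupdesc, oldvals, oldnulls);
    heap_deform_tuple(newtup, tupdesc, newvals, newnulls);

    while ((offset = bms_next_member(interesting_cols, offset)) >= 0)
    {
        int         attnum = offset + FirstLowInvalidHeapAttributeNumber;
        Form_pg_attribute att;

        if (attnum <= 0 || attnum > natts)
            continue;           /* system columns ignored in this sketch */
        att = TupleDescAttr(tupdesc, attnum - 1);

        if (oldnulls[attnum - 1] != newnulls[attnum - 1] ||
            (!oldnulls[attnum - 1] &&
             !datumIsEqual(oldvals[attnum - 1], newvals[attnum - 1],
                           att->attbyval, att->attlen)))
            modified = bms_add_member(modified, offset);
    }

    pfree(oldvals);
    pfree(newvals);
    pfree(oldnulls);
    pfree(newnulls);

    return modified;
}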
On Wed, Jan 4, 2017 at 11:45 PM, Pavan Deolasee <pavan.deolasee@gmail.com> wrote:
>
> On Tue, Jan 3, 2017 at 9:33 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>>
>> On Mon, Jan 2, 2017 at 1:36 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>> > Okay, but I think if we know how much is the additional cost in
>> > average and worst case, then we can take a better call.
>>
>> Yeah. We shouldn't just rip out optimizations that are inconvenient
>> without doing some test of what the impact is on the cases where those
>> optimizations are likely to matter. I don't think it needs to be
>> anything incredibly laborious and if there's no discernable impact,
>> great.
>
> So I performed some tests to measure if this causes any noticeable
> regression.
>

Your test and results look good. What kind of m/c did you use to test this? Let me see if I or one of my colleagues can do this and a similar test on some high-end m/c.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Thu, Jan 5, 2017 at 6:15 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> Your test and results look good. What kind of m/c did you use to test
> this?

I ran it on my MacBook Pro, so nothing fancy. The code was compiled with a plain ./configure and no special flags. The only non-default setting was shared_buffers = 512MB, to ensure the table/index fits in memory.

> Let me see if I or one of my colleagues can do this and a similar test
> on some high-end m/c.

Sure. That'll be helpful, given the slightly unexpected result. Maybe it's just noise, or we may see a different result on a high-end m/c.
Thanks,
Pavan
Pavan Deolasee http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Thu, Jan 5, 2017 at 6:15 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> Your test and results look good. What kind of m/c did you use to test
> this? Let me see if I or one of my colleagues can do this and a similar
> test on some high-end m/c.

As discussed with Amit, I have tried to run the same tests with the modifications below:
1. Increased the total rows to 10 million.
2. Set fsync off.
3. Changed the test as below, updating all rows at a time:

VACUUM FULL;
BEGIN;
UPDATE testtab SET col2 = md5(random()::text);
ROLLBACK;

I have run these tests on IBM power2, which has sufficient RAM. I have set shared_buffers = 32GB.

My results show that after this patch there is a slight increase in response time (psql \timing was used) for the above update statement, around 5 to 10%.

Run    Response time (ms), base    Response time (ms), new patch    % increase
1      158863.501                  167443.767                        5.4010304104
2      151061.793                  168908.536                       11.8142004312
3      153750.577                  164071.994                        6.7130915548
4      153639.165                  168364.225                        9.5841838245
5      149607.139                  166498.44                        11.2904378179

Under the same conditions, running the original test (that is, updating rows which satisfy the condition col1 = :value1), I did not see any regression.

--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com
On Sat, Jan 7, 2017 at 11:27 AM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:

Sorry, the auto plain-text setting has disturbed the table indentation. Attaching a spreadsheet with the same results.

--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com
On Sat, Jan 7, 2017 at 11:27 AM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:
> On Thu, Jan 5, 2017 at 6:15 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>>
>> Your test and results look good. What kind of m/c did you use to test
>> this? Let me see if I or one of my colleagues can do this and a similar
>> test on some high-end m/c.
>
> As discussed with Amit, I have tried to run the same tests with the
> modifications below:
> 1. Increased the total rows to 10 million.
> 2. Set fsync off.
> 3. Changed the test as below, updating all rows at a time:
>
> VACUUM FULL;
> BEGIN;
> UPDATE testtab SET col2 = md5(random()::text);
> ROLLBACK;
>
> I have run these tests on IBM power2, which has sufficient RAM. I have
> set shared_buffers = 32GB.
>
> My results show that after this patch there is a slight increase in
> response time (psql \timing was used) for the above update statement,
> around 5 to 10%.
>

I would like to add here that the intention of the test was to stress the changes in the patch to see the overhead the patch can incur. Now, surely this is a synthetic test prepared to test this patch, but I think it indicates that the changes have some overhead which might or might not be ignorable, depending on how important it is to get this patch. I think if WARM tuples or indirect indexes need this patch and can't do without it, then it is worth considering this patch along with those patches. OTOH, if we can reduce the overhead, then it might be okay to proceed with this patch on the basis of the simplicity it can bring.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com