Thread: Performance degradation in commit 6150a1b0
For the past few weeks, we have been seeing some performance degradation in read-only benchmarks on high-end machines. My colleague Mithun tried reverting commit ac1d794, which was previously reported [1] to degrade performance in HEAD on high-end machines, but the degradation remained. We then did some profiling to see what was causing it and found that it is mainly the spinlock taken via pin/unpin buffer. We next tried reverting commit 6150a1b0, which recently changed the structures in that area, and it turns out that with that patch reverted we see no degradation in performance. The important point to note is that the degradation doesn't occur every time, but if the tests are repeated two or three times, it is easily visible.
m/c details
IBM POWER-8, 24 cores, 192 hardware threads
RAM - 492GB
Non-default postgresql.conf settings-
shared_buffers=16GB
max_connections=200
min_wal_size=15GB
max_wal_size=20GB
checkpoint_timeout=900
maintenance_work_mem=1GB
checkpoint_completion_target=0.9
scale_factor - 300
Performance at commit 43cd468cf01007f39312af05c4c92ceb6de8afd8 is 469002 at 64 clients, and at 6150a1b08a9fe7ead2b25240be46dddeae9d98e1 it drops to 200807. These numbers are the median of three 15-minute pgbench read-only tests. Similar results are seen even when we revert the patch on the latest commit. We have yet to perform a detailed analysis of why commit 6150a1b08a9fe7ead2b25240be46dddeae9d98e1 leads to the degradation, but any ideas are welcome.
[1] -
On 24 February 2016 at 23:26, Amit Kapila <amit.kapila16@gmail.com> wrote:
Not seen that on the original patch I posted. 6150a1b0 contains multiple changes to the lwlock structures, one written by me, others by Andres.
Perhaps we should revert that patch and re-apply the various changes in multiple commits so we can see the differences.
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Thu, Feb 25, 2016 at 11:38 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
Yes, that's one choice; the other is to narrow down the root cause locally and then try to address it. The last time a similar issue came up on the list, the agreement [1] was to record it in the PostgreSQL 9.6 open items and then work on it. For this problem we haven't yet got to the root cause, so we can try to investigate it. If nobody else steps up to reproduce and look into the problem, I will look into it in a few days.
On 25 February 2016 at 18:42, Amit Kapila <amit.kapila16@gmail.com> wrote:
Don't understand this. If a problem is caused by one of two things, first you check one, then the other.
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Fri, Feb 26, 2016 at 8:41 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> Don't understand this. If a problem is caused by one of two things, first
> you check one, then the other.

I don't quite understand how you think that patch can be decomposed into
multiple, independent changes. It was one commit because every change in
there is interdependent with every other one, at least as far as I can see.
I don't really understand how you'd split it up, or what useful information
you'd hope to gain from testing a split patch.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Hi,

On 2016-02-25 12:56:39 +0530, Amit Kapila wrote:
> The important point to note is that the degradation doesn't occur every
> time, but if the tests are repeated two or three times, it is easily
> visible.

Ugh. Especially the varying performance is odd. Does it vary between
restarts, or is it just happenstance? If it's the former, we might be
dealing with some alignment issues.

If not, I wonder if the issue is massive buffer header contention. As a
LL/SC architecture acquiring the content lock might interrupt buffer
spinlock acquisition and vice versa.

Does applying the patch from
http://archives.postgresql.org/message-id/CAPpHfdu77FUi5eiNb%2BjRPFh5S%2B1U%2B8ax4Zw%3DAUYgt%2BCPsKiyWw%40mail.gmail.com
change the picture?

Regards,

Andres
On Sat, Feb 27, 2016 at 12:41 AM, Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> On 2016-02-25 12:56:39 +0530, Amit Kapila wrote:
> > From past few weeks, we were facing some performance degradation in the
> > read-only performance bench marks in high-end machines. My colleague
> > Mithun, has tried by reverting commit ac1d794 which seems to degrade the
> > performance in HEAD on high-end m/c's as reported previously[1], but still
> > we were getting degradation, then we have done some profiling to see what
> > has caused it and we found that it's mainly caused by spin lock when
> > called via pin/unpin buffer and then we tried by reverting commit 6150a1b0
> > which has recently changed the structures in that area and it turns out
> > that reverting that patch, we don't see any degradation in performance.
> > The important point to note is that the performance degradation doesn't
> > occur every time, but if the tests are repeated twice or thrice, it
> > is easily visible.
>
> > m/c details
> > IBM POWER-8
> > 24 cores,192 hardware threads
> > RAM - 492GB
> >
> > Non-default postgresql.conf settings-
> > shared_buffers=16GB
> > max_connections=200
> > min_wal_size=15GB
> > max_wal_size=20GB
> > checkpoint_timeout=900
> > maintenance_work_mem=1GB
> > checkpoint_completion_target=0.9
> >
> > scale_factor - 300
> >
> > Performance at commit 43cd468cf01007f39312af05c4c92ceb6de8afd8 is 469002 at
> > 64-client count and then at 6150a1b08a9fe7ead2b25240be46dddeae9d98e1, it
> > went down to 200807. This performance numbers are median of 3 15-min
> > pgbench read-only tests. The similar data is seen even when we revert the
> > patch on latest commit. We have yet to perform detail analysis as to why
> > the commit 6150a1b08a9fe7ead2b25240be46dddeae9d98e1 lead to degradation,
> > but any ideas are welcome.
>
> Ugh. Especially the varying performance is odd. Does it vary between
> restarts, or is it just happenstance? If it's the former, we might be
> dealing with some alignment issues.
>
It varies between restarts.
>
> If not, I wonder if the issue is massive buffer header contention. As a
> LL/SC architecture acquiring the content lock might interrupt buffer
> spinlock acquisition and vice versa.
>
> Does applying the patch from http://archives.postgresql.org/message-id/CAPpHfdu77FUi5eiNb%2BjRPFh5S%2B1U%2B8ax4Zw%3DAUYgt%2BCPsKiyWw%40mail.gmail.com
> change the picture?
>
Not tried, but if this is an alignment issue as you suspect above, then does it make sense to try this out?
On February 26, 2016 7:55:18 PM PST, Amit Kapila <amit.kapila16@gmail.com> wrote:
>> Does applying the patch from
>> http://archives.postgresql.org/message-id/CAPpHfdu77FUi5eiNb%2BjRPFh5S%2B1U%2B8ax4Zw%3DAUYgt%2BCPsKiyWw%40mail.gmail.com
>> change the picture?
>
>Not tried, but if this is an alignment issue as you suspect above, then
>does it make sense to try this out?

It's the other theory I had. And it's additionally useful testing regardless of this regression...

---
Please excuse brevity and formatting - I am writing this on my mobile phone.
Hi All,
I have been working on this issue for the last few days, trying to investigate the probable reasons for the performance degradation at commit 6150a1b0. After going through Andres' patch for moving the buffer I/O and content locks out of the main tranche, the following two things come to mind.
1. The content lock is no longer referenced through a pointer in the BufferDesc structure; instead the LWLock structure is embedded directly. This increases the overall structure size from 64 bytes to 80 bytes. To investigate this, I reverted the content-lock changes from commit 6150a1b0 and took at least 10 readings; with this change the overall performance is similar to what was observed before commit 6150a1b0.
2. Secondly, the BufferDesc structure is padded to 64 bytes, whereas the PostgreSQL cache line size (PG_CACHE_LINE_SIZE) is 128 bytes. After changing the BufferDesc padding size to 128 bytes, along with the change mentioned in point #1 above, the overall performance is again similar to what was observed before commit 6150a1b0.
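For readers not familiar with the padding arrangement being discussed, here is a minimal sketch of the idea. It is not the actual buf_internals.h definitions; field names and sizes are illustrative assumptions. It shows why a descriptor that grows past its 64-byte pad ends up straddling 128-byte cache lines, while a descriptor that fits its pad (64 or 128 bytes) does not.

```c
/*
 * Simplified sketch (not the actual PostgreSQL headers) of cache-line
 * padding for buffer descriptors.  Field names and sizes are illustrative.
 */
#include <stdint.h>
#include <stdio.h>

typedef struct BufDescSketch
{
    uint32_t    tag[5];             /* buffer tag: relation/fork/block (~20 bytes) */
    uint32_t    flags_and_counts;   /* flags, usage count, refcount */
    int32_t     wait_backend_pid;
    int32_t     buf_id;
    int32_t     freeNext;
    void       *io_lock;            /* pointer-sized lock reference */
    char        content_lock[32];   /* embedded lock -- the kind of field that grew
                                     * the struct past 64 bytes in 6150a1b0 (size
                                     * here is only an approximation) */
} BufDescSketch;

/*
 * Pad each descriptor so that an array of them stays cache-line friendly.
 * Try 64 (the value questioned above) versus 128 (the experiment in #2).
 */
#define BUFDESC_PAD_TO 64

typedef union BufDescPaddedSketch
{
    BufDescSketch desc;
    char          pad[BUFDESC_PAD_TO];
} BufDescPaddedSketch;

int
main(void)
{
    size_t      sz = sizeof(BufDescPaddedSketch);
    size_t      straddles = 0;

    /* Count descriptors whose bytes span two 128-byte cache lines. */
    for (size_t i = 0; i < 16384; i++)
    {
        size_t      start = i * sz;
        size_t      end = start + sz - 1;

        if (start / 128 != end / 128)
            straddles++;
    }
    printf("padded size = %zu bytes, %zu of 16384 descriptors straddle a line\n",
           sz, straddles);
    return 0;
}
```

With the sketch's ~80-byte struct and a 64-byte pad, the union stays 80 bytes and roughly half of the descriptors straddle a 128-byte line; padding to 128 bytes (or shrinking the struct back under 64) brings the straddle count to zero.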
Please have a look at the attached test report, which contains the performance test results for all the scenarios discussed above, and let me know your thoughts.
With Regards,
Ashutosh Sharma
EnterpriseDB: http://www.enterprisedb.com
On Wed, Mar 23, 2016 at 1:59 PM, Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
>
> Hi All,
>
> I have been working on this issue for last few days trying to investigate what could be the probable reasons for Performance degradation at commit 6150a1b0. After going through Andres patch for moving buffer I/O and content lock out of Main Tranche, the following two things come into my
> mind.
>
> 1. Content Lock is no more used as a pointer in BufferDesc structure instead it is included as LWLock structure. This basically increases the overall structure size from 64bytes to 80 bytes. Just to investigate on this, I have reverted the changes related to content lock from commit 6150a1b0 and taken at least 10 readings and with this change i can see that the overall performance is similar to what it was observed earlier i.e. before commit 6150a1b0.
>
> 2. Secondly, i can see that the BufferDesc structure padding is 64 bytes however the PG CACHE LINE ALIGNMENT is 128 bytes. Also, after changing the BufferDesc structure padding size to 128 bytes along with the changes mentioned in above point #1, I see that the overall performance is again similar to what is observed before commit 6150a1b0.
>
> Please have a look into the attached test report that contains the performance test results for all the scenarios discussed above and let me know your thoughts.
>
So this indicates that changing the content lock back to an LWLock* in BufferDesc brings back the performance, which suggests that the increase of the BufferDesc size beyond 64 bytes on this platform caused the regression. I think it is worth trying the patch [1] suggested by Andres, as it will reduce the size of BufferDesc, which may bring back the performance. Can you try the same?

[1] - http://www.postgresql.org/message-id/CAPpHfdsRoT1JmsnRnCCqpNZEU9vUT7TX6B-N1wyOuWWfhD6F+g@mail.gmail.com
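For context on the patch being recommended here: its core idea, as discussed in the thread, is to replace the buffer-header spinlock with atomic compare-and-swap operations on a single state word, which also shrinks the descriptor. The sketch below only illustrates that technique; the struct, function names, and bit layout are assumptions, not the actual pinunpin-cas patch.

```c
/*
 * Minimal sketch of lockless buffer pinning via compare-and-swap on a
 * combined state word.  Illustrative only; not the actual PostgreSQL patch.
 */
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_REFCOUNT_MASK 0x0003FFFFu   /* low 18 bits: pin count (assumed) */
#define SKETCH_FLAG_VALID    0x00400000u   /* example flag bit (assumed) */

typedef struct BufStateSketch
{
    _Atomic uint32_t state;     /* refcount + usage count + flags in one word */
} BufStateSketch;

/*
 * Pin the buffer: bump the refcount with a CAS loop instead of taking a
 * per-buffer spinlock.  Returns true if the buffer was valid when pinned.
 */
static bool
pin_buffer_sketch(BufStateSketch *buf)
{
    uint32_t    old = atomic_load_explicit(&buf->state, memory_order_relaxed);

    for (;;)
    {
        uint32_t    newval = old + 1;   /* refcount lives in the low bits */

        if (atomic_compare_exchange_weak_explicit(&buf->state, &old, newval,
                                                  memory_order_acquire,
                                                  memory_order_relaxed))
            return (old & SKETCH_FLAG_VALID) != 0;
        /* CAS failed: 'old' now holds the current value; retry. */
    }
}

/* Unpin: drop the refcount the same way, no spinlock involved. */
static void
unpin_buffer_sketch(BufStateSketch *buf)
{
    atomic_fetch_sub_explicit(&buf->state, 1, memory_order_release);
}
```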
On 2016-03-25 09:29:34 +0530, Amit Kapila wrote:
> > 2. Secondly, the BufferDesc structure is padded to 64 bytes, whereas the
> > PostgreSQL cache line size (PG_CACHE_LINE_SIZE) is 128 bytes. After
> > changing the BufferDesc padding size to 128 bytes, along with the change
> > mentioned in point #1 above, the overall performance is again similar to
> > what was observed before commit 6150a1b0.

That makes sense, as it restores alignment.

> So this indicates that changing the content lock back to an LWLock* in
> BufferDesc brings back the performance, which suggests that the increase of
> the BufferDesc size beyond 64 bytes on this platform caused the regression.
> I think it is worth trying the patch [1] suggested by Andres, as it will
> reduce the size of BufferDesc, which may bring back the performance. Can
> you try the same?
>
> [1] -
> http://www.postgresql.org/message-id/CAPpHfdsRoT1JmsnRnCCqpNZEU9vUT7TX6B-N1wyOuWWfhD6F+g@mail.gmail.com

Yes please. I'll try to review that once more ASAP.

Regards,

Andres
Hi,
I am getting some reject files while trying to apply "pinunpin-cas-5.patch", attached with the thread http://www.postgresql.org/message-id/CAPpHfdsRoT1JmsnRnCCqpNZEU9vUT7TX6B-N1wyOuWWfhD6F+g@mail.gmail.com

Note: I am applying this patch on top of commit "6150a1b08a9fe7ead2b25240be46dddeae9d98e1".

With Regards,
Ashutosh Sharma
EnterpriseDB: http://www.enterprisedb.com
Hi,
As mentioned in my earlier mail, I was not able to apply pinunpin-cas-5.patch on commit 6150a1b0; therefore I applied it on the latest commit, which succeeded. I have now taken performance readings at the latest commit, i.e. 76281aa9, with and without pinunpin-cas-5.patch, and my observations are as follows:
1. I can still see that the current performance lags by 2-3% from the expected performance when pinunpin-cas-5.patch is applied on commit 76281aa9.
2. When pinunpin-cas-5.patch is not applied and performance is measured at commit 76281aa9, the overall performance lags by 50-60% from the expected performance.
Note: Here, the expected performance is the performance observed before commit 6150a1b0 with ac1d794 reverted.

With Regards,
Ashutosh Sharma
Hi,

On 2016-03-27 02:34:32 +0530, Ashutosh Sharma wrote:
> As mentioned in my earlier mail, I was not able to apply
> pinunpin-cas-5.patch on commit 6150a1b0;

That's not surprising; that's pretty old.

> therefore I applied it on the latest commit, which succeeded. I have now
> taken performance readings at the latest commit, i.e. 76281aa9, with and
> without pinunpin-cas-5.patch, and my observations are as follows:
>
> 1. I can still see that the current performance lags by 2-3% from the
> expected performance when pinunpin-cas-5.patch is applied on commit
> 76281aa9.
> 2. When pinunpin-cas-5.patch is not applied and performance is measured at
> commit 76281aa9, the overall performance lags by 50-60% from the expected
> performance.
>
> Note: Here, the expected performance is the performance observed before
> commit 6150a1b0 with ac1d794 reverted.

Thanks for doing these benchmarks. What's the performance if you revert
6150a1b0 on top of a recent master? There've been a lot of other patches
influencing performance since 6150a1b0, so minor performance differences
aren't necessarily meaningful; especially when that older version then
had other patches reverted.

Thanks,

Andres
Hi,
I am unable to revert 6150a1b0 on top of a recent commit in the master branch. It seems that some commit made recently has a dependency on 6150a1b0.

EnterpriseDB: http://www.enterprisedb.com
On Sun, Mar 27, 2016 at 02:15:50PM +0200, Andres Freund wrote:
> Thanks for doing these benchmarks. What's the performance if you revert
> 6150a1b0 on top of a recent master? There've been a lot of other patches
> influencing performance since 6150a1b0, so minor performance differences
> aren't necessarily meaningful; especially when that older version then
> had other patches reverted.

[This is a generic notification.]

The above-described topic is currently a PostgreSQL 9.6 open item. Andres,
since you committed the patch believed to have created it, you own this open
item. If that responsibility lies elsewhere, please let us know whose
responsibility it is to fix this. Since new open items may be discovered at
any time and I want to plan to have them all fixed well in advance of the ship
date, I will appreciate your efforts toward speedy resolution. Please
present, within 72 hours, a plan to fix the defect within seven days of this
message. Thanks.
On Thu, Mar 31, 2016 at 01:10:56AM -0400, Noah Misch wrote:
> The above-described topic is currently a PostgreSQL 9.6 open item. Andres,
> since you committed the patch believed to have created it, you own this
> open item.

My attribution above was incorrect. Robert Haas is the committer and owner of
this one. I apologize.
On March 31, 2016 7:16:33 AM GMT+02:00, Noah Misch <noah@leadboat.com> wrote:
>My attribution above was incorrect. Robert Haas is the committer and owner
>of this one. I apologize.

Fine in this case I guess. I've posted a proposal nearby either way, it
appears to be a !x86 problem.

Andres

--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
On Thu, Mar 31, 2016 at 3:51 AM, Andres Freund <andres@anarazel.de> wrote:
>> My attribution above was incorrect. Robert Haas is the committer and
>> owner of this one. I apologize.
>
> Fine in this case I guess. I've posted a proposal nearby either way, it
> appears to be a !x86 problem.

To which proposal are you referring?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 2016-03-31 06:43:19 -0400, Robert Haas wrote:
> On Thu, Mar 31, 2016 at 3:51 AM, Andres Freund <andres@anarazel.de> wrote:
> > Fine in this case I guess. I've posted a proposal nearby either way, it
> > appears to be a !x86 problem.
>
> To which proposal are you referring?

1) in http://www.postgresql.org/message-id/20160328130904.4mhugvkf4f3wg4qb@awork2.anarazel.de
On Thu, Mar 31, 2016 at 6:45 AM, Andres Freund <andres@anarazel.de> wrote:
> On 2016-03-31 06:43:19 -0400, Robert Haas wrote:
>> To which proposal are you referring?
>
> 1) in http://www.postgresql.org/message-id/20160328130904.4mhugvkf4f3wg4qb@awork2.anarazel.de

OK. So, Noah, my proposed strategy is to wait and see if Andres can make
that work, and if not, then revisit the issue of what to do.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
> On Thu, Mar 31, 2016 at 6:45 AM, Andres Freund <andres@anarazel.de> wrote:
>> 1) in http://www.postgresql.org/message-id/20160328130904.4mhugvkf4f3wg4qb@awork2.anarazel.de
>
> OK. So, Noah, my proposed strategy is to wait and see if Andres can make
> that work, and if not, then revisit the issue of what to do.

I thought that proposal had already crashed and burned, on the grounds
that byte-size spinlocks require instructions that many PPC machines
don't have.

regards, tom lane
On Thu, Mar 31, 2016 at 10:13 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I thought that proposal had already crashed and burned, on the grounds
> that byte-size spinlocks require instructions that many PPC machines
> don't have.

So the current status of this issue is:

1. Andres committed a patch (008608b9d51061b1f598c197477b3dc7be9c4a64) to
reduce the size of an LWLock by an amount equal to the size of a mutex
(modulo alignment).

2. Andres also committed a patch (48354581a49c30f5757c203415aa8412d85b0f70)
to remove the spinlock from a BufferDesc, which also reduces its size, I
think, because it replaces members of types BufFlags (2 bytes), uint8,
slock_t, and unsigned with a single member of type pg_atomic_uint32.

The reason why these changes are relevant is because Andres thought the
observed regression might be related to the BufferDesc growing to more than
64 bytes on POWER, which in turn could cause buffer descriptors to get split
across cache lines. However, in the meantime, I did some performance tests
on the same machine that Amit used for testing in the email that started
this thread:

http://www.postgresql.org/message-id/CA+TgmoZJdA6K7-17K4A48rVB0UPR98HVuaNcfNNLrGsdb1uChg@mail.gmail.com

The upshot of that is that (1) the performance degradation I saw was
significant but smaller than what Amit reported in the OP, and (2) it looked
like the patches Andres gave me to test at the time got performance back to
about the same level we were at before 6150a1b0. So there's room for
optimism that this is fixed, but perhaps some retesting is in order, since
what was committed was, I think, not identical to what I tested.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
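To make point 2 above concrete, the sketch below shows, with assumed bit widths and macro names, how a refcount, usage count, and flag bits can be packed into one 32-bit word that is then manipulated atomically; the committed patch's exact layout and names may differ.

```c
/*
 * Rough sketch of packing refcount, usage count, and flags into one 32-bit
 * word.  Bit widths and names are illustrative, not the committed layout.
 */
#include <stdint.h>

#define ST_REFCOUNT_BITS    18
#define ST_USAGECOUNT_BITS  4
#define ST_REFCOUNT_MASK    ((1u << ST_REFCOUNT_BITS) - 1)
#define ST_USAGECOUNT_SHIFT ST_REFCOUNT_BITS
#define ST_USAGECOUNT_MASK  (((1u << ST_USAGECOUNT_BITS) - 1) << ST_USAGECOUNT_SHIFT)
#define ST_FLAG_SHIFT       (ST_REFCOUNT_BITS + ST_USAGECOUNT_BITS)
#define ST_FLAG_DIRTY       (1u << (ST_FLAG_SHIFT + 0))    /* example flag */
#define ST_FLAG_VALID       (1u << (ST_FLAG_SHIFT + 1))    /* example flag */

/* Accessors: decode the separate fields from a single state word. */
static inline uint32_t state_refcount(uint32_t s)   { return s & ST_REFCOUNT_MASK; }
static inline uint32_t state_usagecount(uint32_t s) { return (s & ST_USAGECOUNT_MASK) >> ST_USAGECOUNT_SHIFT; }
static inline int      state_is_dirty(uint32_t s)   { return (s & ST_FLAG_DIRTY) != 0; }
```

Packing everything into one word is what lets pin/unpin and flag updates be done with single atomic operations, so the separate BufFlags, usage count, refcount, and spinlock fields (and their bytes) can go away.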
On Tue, Apr 12, 2016 at 05:36:07PM -0400, Robert Haas wrote:
> The upshot of that is that (1) the performance degradation I saw was
> significant but smaller than what Amit reported in the OP, and (2) it
> looked like the patches Andres gave me to test at the time got performance
> back to about the same level we were at before 6150a1b0. So there's room
> for optimism that this is fixed, but perhaps some retesting is in order,
> since what was committed was, I think, not identical to what I tested.

That sounds like this open item is ready for CLOSE_WAIT status; is it?

If someone does retest this, it would be informative to see how the system
performs with 6150a1b0 reverted. Your testing showed performance of 6150a1b0
alone and of 6150a1b0 plus predecessors of 008608b and 4835458. I don't
recall seeing figures for 008608b + 4835458 - 6150a1b0, though.
On Tue, Apr 12, 2016 at 10:30 PM, Noah Misch <noah@leadboat.com> wrote:
> That sounds like this open item is ready for CLOSE_WAIT status; is it?

I just retested this on power2. Here are the results. I retested 3fed4174
and 6150a1b0 plus master as of deb71fa9. 5-minute pgbench -S runs, scale
factor 300, with predictable prewarming to minimize variation, as well as
numactl --interleave. Each result is a median of three.

1 client: 3fed4174 = 13701.014931, 6150a1b0 = 13669.626916, master = 19685.571089
8 clients: 3fed4174 = 126676.357079, 6150a1b0 = 125239.911105, master = 122940.079404
32 clients: 3fed4174 = 323989.685428, 6150a1b0 = 338638.095126, master = 333656.861590
64 clients: 3fed4174 = 495434.372578, 6150a1b0 = 457794.475129, master = 493034.922791
128 clients: 3fed4174 = 376412.090366, 6150a1b0 = 363157.294391, master = 625498.280370

On this test 8, 32, and 64 clients are coming out about the same as
3fed4174, but 1 client and 128 clients are dramatically improved with
current master. The 1-client result is a lot more surprising than the
128-client result; I don't know what's going on there. But anyway I don't
see a regression here.

So, yes, I would say this should go to CLOSE_WAIT at this point, unless
Amit or somebody else turns up further evidence of a continuing issue here.

Random points of possible interest:

1. During a 128-client run, top shows about 45% user time, 10% system time,
45% idle.

2. About 3 minutes into a 128-client run, perf looks like this
(substantially abridged):

3.55% postgres postgres [.] GetSnapshotData
2.15% postgres postgres [.] LWLockAttemptLock
  |--32.82%-- LockBuffer
  |    |--48.59%-- _bt_relandgetbuf
  |    |--44.07%-- _bt_getbuf
  |--29.81%-- ReadBuffer_common
  |--23.88%-- GetSnapshotData
  |--5.30%-- LockAcquireExtended
2.12% postgres postgres [.] LWLockRelease
2.02% postgres postgres [.] _bt_compare
1.88% postgres postgres [.] hash_search_with_hash_value
  |--47.21%-- BufTableLookup
  |--10.93%-- LockAcquireExtended
  |--5.43%-- GetPortalByName
  |--5.21%-- ReadBuffer_common
  |--4.68%-- RelationIdGetRelation
1.87% postgres postgres [.] AllocSetAlloc
1.42% postgres postgres [.] PinBuffer.isra.3
0.96% postgres libc-2.17.so [.] __memcpy_power7
0.89% postgres postgres [.] UnpinBuffer.constprop.7
0.80% postgres postgres [.] PostgresMain
0.80% postgres postgres [.] pg_encoding_mbcliplen
0.71% postgres postgres [.] hash_any
0.62% postgres postgres [.] AllocSetFree
0.59% postgres postgres [.] palloc
0.57% postgres libc-2.17.so [.] _int_free

A context-switch profile, somewhat amazingly, shows no context switches for
anything other than waiting on client read, implying that performance is
entirely constrained by memory bandwidth and CPU speed, not lock contention.

> If someone does retest this, it would be informative to see how the system
> performs with 6150a1b0 reverted. Your testing showed performance of
> 6150a1b0 alone and of 6150a1b0 plus predecessors of 008608b and 4835458.
> I don't recall seeing figures for 008608b + 4835458 - 6150a1b0, though.

That revert isn't trivial: even what exactly that would mean at this point
is somewhat subjective. I'm also not sure there is much point.
6150a1b08a9fe7ead2b25240be46dddeae9d98e1 was written in such a way that only
platforms with single-byte spinlocks were going to have a BufferDesc that
fits into 64 bytes, which in retrospect was a bit short-sighted. Because the
changes that were made to get it back down to 64 bytes might also have other
performance-relevant consequences, it's a bit hard to be sure that that was
the precise thing that caused the regression. And of course there was a fury
of other commits going in at the same time, some even on related topics,
which further adds to the difficulty of pinpointing this precisely. All that
is a bit unfortunate in some sense, but I think we're just going to have to
keep moving forward and hope for the best.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Tue, Apr 12, 2016 at 11:40:43PM -0400, Robert Haas wrote:
> On Tue, Apr 12, 2016 at 10:30 PM, Noah Misch <noah@leadboat.com> wrote:
> > That sounds like this open item is ready for CLOSE_WAIT status; is it?
>
> I just retested this on power2.
>
> So, yes, I would say this should go to CLOSE_WAIT at this point, unless
> Amit or somebody else turns up further evidence of a continuing issue
> here.

Thanks for testing again.

> That revert isn't trivial: even what exactly that would mean at this point
> is somewhat subjective. I'm also not sure there is much point.

I can live with that.
On Wed, Apr 13, 2016 at 9:10 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Tue, Apr 12, 2016 at 10:30 PM, Noah Misch <noah@leadboat.com> wrote:
> > That sounds like this open item is ready for CLOSE_WAIT status; is it?
>
> I just retested this on power2. Here are the results. I retested
> 3fed4174 and 6150a1b0 plus master as of deb71fa9. 5-minute pgbench -S
> runs, scale factor 300, with predictable prewarming to minimize
> variation, as well as numactl --interleave. Each result is a median
> of three.
>
> 1 client: 3fed4174 = 13701.014931, 6150a1b0 = 13669.626916, master =
> 19685.571089
> 8 clients: 3fed4174 = 126676.357079, 6150a1b0 = 125239.911105, master
> = 122940.079404
> 32 clients: 3fed4174 = 323989.685428, 6150a1b0 = 338638.095126, master
> = 333656.861590
> 64 clients: 3fed4174 = 495434.372578, 6150a1b0 = 457794.475129, master
> = 493034.922791
> 128 clients: 3fed4174 = 376412.090366, 6150a1b0 = 363157.294391,
> master = 625498.280370
>
> On this test 8, 32, and 64 clients are coming out about the same as
> 3fed4174, but 1 client and 128 clients are dramatically improved with
> current master. The 1-client result is a lot more surprising than the
> 128-client result; I don't know what's going on there. But anyway I
> don't see a regression here.
>
> So, yes, I would say this should go to CLOSE_WAIT at this point,
> unless Amit or somebody else turns up further evidence of a continuing
> issue here.
Yes, I also think that this particular issue can be closed. However, I feel that the observation about performance variation is still relevant, as I never needed to prewarm or do anything else to get consistent results during my work on 9.5 or early 9.6. Also, Andres, Alexander, and I are working on a similar observation (run-to-run performance variation) in a nearby thread [1].
On Wed, Apr 13, 2016 at 11:22 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> Yes, I also think that this particular issue can be closed. However, I feel
> that the observation about performance variation is still relevant, as I
> never needed to prewarm or do anything else to get consistent results
> during my work on 9.5 or early 9.6. Also, Andres, Alexander, and I are
> working on a similar observation (run-to-run performance variation) in a
> nearby thread [1].

Yeah. My own measurements do not seem to support the idea that the variance
recently increased, but I haven't tested incredibly widely. It may be that
whatever is causing the variance is something that used to be hidden by
locking bottlenecks and now no longer is.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company