Thread: FSM patch - performance test

FSM patch - performance test

From: Zdenek Kotala
Hi Heikki,

I finally performed the iGen test. I used two v490 servers with 4 dual-core SPARC 
CPUs and 32GB RAM. I have only one disk, and I did not perform any disk I/O 
optimization. I tested 105 parallel connections, and the think time was 200ms.
See the results:

Original:
---------
Actual run/snap-shot time: 3004 sec

MQThL (Maximum Qualified Throughput LIGHT): 1458.76 tpm
MQThM (Maximum Qualified Throughput MEDIUM): 3122.44 tpm
MQThH (Maximum Qualified Throughput HEAVY): 2626.70 tpm


TRANSACTION MIX

Total number of transactions = 438133
TYPE            TX. COUNT       MIX
----            ---------       ---
Light:          72938           16.65%
Medium:         156122          35.63%
DSS:            48516           11.07%
Heavy:          131335          29.98%
Connection:     29222           6.67%


RESPONSE TIMES          AVG.            MAX.            90TH

Light                   0.541           3.692           0.800
Medium                  0.542           3.702           0.800
DSS                     0.539           3.510           0.040
Heavy                   0.539           3.742           4.000
Connections             0.545           3.663           0.800
Number of users = 105
Sum of Avg. RT * TPS for all Tx. Types = 64.851454


New FSM implementation:
-----------------------
Actual run/snap-shot time: 3004 sec

MQThL (Maximum Qualified Throughput LIGHT): 1351.20 tpm
MQThM (Maximum Qualified Throughput MEDIUM): 2888.74 tpm
MQThH (Maximum Qualified Throughput HEAVY): 2428.90 tpm


TRANSACTION MIX

Total number of transactions = 405502
TYPE            TX. COUNT       MIX
----            ---------       ---
Light:          67560           16.66%
Medium:         144437          35.62%
DSS:            45028           11.10%
Heavy:          121445          29.95%
Connection:     27032           6.67%


RESPONSE TIMES          AVG.            MAX.            90TH

Light                   0.596           3.735           0.800
Medium                  0.601           3.748           0.800
DSS                     0.601           3.695           0.040
Heavy                   0.597           3.725           4.000
Connections             0.599           3.445           0.800
Number of users = 105
Sum of Avg. RT * TPS for all Tx. Types = 66.419466


----------------------------

My conclusion is that the new implementation is about 8% slower in an OLTP workload.
        Zdenek


-- 
Zdenek Kotala              Sun Microsystems
Prague, Czech Republic     http://sun.com/postgresql



Re: FSM patch - performance test

From: Heikki Linnakangas
Zdenek Kotala wrote:
> My conclusion is that new implementation is about 8% slower in OLTP 
> workload.

Thanks. That's very disappointing :-(

-- 
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com


Re: FSM patch - performance test

From: Heikki Linnakangas
Zdenek Kotala wrote:
> My conclusion is that new implementation is about 8% slower in OLTP 
> workload.

Can you do some analysis of why that is?

Looks like I need to blow the dust off my DBT-2 test rig and try to 
reproduce that as well.

-- 
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com


Re: FSM patch - performance test

From: Zdenek Kotala
Heikki Linnakangas napsal(a):
> Zdenek Kotala wrote:
>> My conclusion is that new implementation is about 8% slower in OLTP 
>> workload.
> 
> Can you do some analysis of why that is?

I'll try something, but I can't guarantee a result.
Zdenek



-- 
Zdenek Kotala              Sun Microsystems
Prague, Czech Republic     http://sun.com/postgresql



Re: FSM patch - performance test

From: Tom Lane
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
> Zdenek Kotala wrote:
>> My conclusion is that new implementation is about 8% slower in OLTP 
>> workload.

> Thanks. That's very disappointing :-(

One thing that jumped out at me is that you call FreeSpaceMapExtendRel
every time a rel is extended by even one block.  I admit I've not
studied the data structure in any detail yet, but surely most such calls
end up being a no-op?  Seems like some attention to making a fast path
for that case would be helpful.
        regards, tom lane
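
A minimal sketch of the kind of fast path suggested here: skip all FSM work
when the map already covers the newly added block, so the common one-block
extension is a cheap comparison rather than a trip into the FSM (and
smgrnblocks()). The struct, function name, and cached counter are hypothetical
stand-ins, not the patch's actual data structures, and BlockNumber is
re-declared only to keep the sketch self-contained.

#include <stdint.h>

typedef uint32_t BlockNumber;          /* stand-in for PostgreSQL's typedef */

/* Hypothetical cache; the real patch keeps its own notion of map coverage. */
typedef struct FsmExtendCache
{
    BlockNumber covered_heap_blocks;   /* heap blocks the map can already describe */
} FsmExtendCache;

static void
fsm_extend(FsmExtendCache *cache, BlockNumber new_heap_blocks)
{
    /* Fast path: the map already has a slot for the new block, do nothing. */
    if (new_heap_blocks <= cache->covered_heap_blocks)
        return;

    /* Slow path: grow the map pages here, then remember the new coverage. */
    cache->covered_heap_blocks = new_heap_blocks;
}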


Re: FSM patch - performance test

From: Heikki Linnakangas
Tom Lane wrote:
> Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
>> Zdenek Kotala wrote:
>>> My conclusion is that new implementation is about 8% slower in OLTP 
>>> workload.
> 
>> Thanks. That's very disappointing :-(
> 
> One thing that jumped out at me is that you call FreeSpaceMapExtendRel
> every time a rel is extended by even one block.  I admit I've not
> studied the data structure in any detail yet, but surely most such calls
> end up being a no-op?  Seems like some attention to making a fast path
> for that case would be helpful.

Yes, most of those calls end up being no-ops, which is exactly why I 
would be surprised if they made any difference. It does call 
smgrnblocks(), though, which isn't completely free...

Zdenek, can you say off the top of your head whether the test was I/O 
bound or CPU bound? What was the CPU utilization % during the test?

-- 
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com


Re: FSM patch - performance test

From: Tom Lane
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
> Tom Lane wrote:
>> One thing that jumped out at me is that you call FreeSpaceMapExtendRel
>> every time a rel is extended by even one block.

> Yes, most of those calls end up being no-op. Which is exactly why I 
> would be surprised if those made any difference. It does call 
> smgrnblocks(), though, which isn't completely free...

No, it's a kernel call (at least one) which makes it pretty expensive.

I wonder whether it's necessary to do FreeSpaceMapExtendRel at this
point at all?  Why not lazily extend the map when you are told to store
a nonzero space category for a page that's off the end of the map?
Whether or not this saved many cycles overall, it'd push most of the map
extension work to VACUUM instead of having it happen in foreground.

A further refinement would be to extend the map only for a space
category "significantly" greater than zero --- maybe a quarter page or
so.  For an insert-only table that would probably result in the map
never growing at all, which might be nice.  However it would go back to
the concept of FSM being lossy; I forget whether you were hoping to get
away from that.
        regards, tom lane
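
For illustration, a sketch of the lazy-extension idea with the quarter-page
threshold mentioned above: the map only grows when someone tries to record a
"large enough" amount of free space for a block beyond its current end, so
small leftovers on uncovered blocks are simply dropped (which is what makes the
map lossy again). All names here (LazyFsm, fsm_record_free_space,
LAZY_EXTEND_THRESHOLD) are hypothetical, not the patch's API.

#include <stdint.h>

typedef uint32_t BlockNumber;

#define BLCKSZ 8192
#define LAZY_EXTEND_THRESHOLD (BLCKSZ / 4)   /* "significantly" above zero */

typedef struct LazyFsm
{
    BlockNumber covered_blocks;        /* blocks currently representable */
} LazyFsm;

static void
fsm_record_free_space(LazyFsm *fsm, BlockNumber blkno, uint32_t avail_bytes)
{
    if (blkno >= fsm->covered_blocks)
    {
        /* Block is off the end of the map: only grow for sizable free space. */
        if (avail_bytes < LAZY_EXTEND_THRESHOLD)
            return;
        /* Grow the map so it covers blkno (placeholder); in practice this
         * work would mostly happen in VACUUM, which reports large amounts. */
        fsm->covered_blocks = blkno + 1;
    }
    /* Store the space category for blkno here (placeholder). */
}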


Re: FSM patch - performance test

From: Zdenek Kotala
Heikki Linnakangas napsal(a):
> Tom Lane wrote:
>> Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
>>> Zdenek Kotala wrote:
>>>> My conclusion is that new implementation is about 8% slower in OLTP 
>>>> workload.
>>
>>> Thanks. That's very disappointing :-(
>>
>> One thing that jumped out at me is that you call FreeSpaceMapExtendRel
>> every time a rel is extended by even one block.  I admit I've not
>> studied the data structure in any detail yet, but surely most such calls
>> end up being a no-op?  Seems like some attention to making a fast path
>> for that case would be helpful.
> 
> Yes, most of those calls end up being no-op. Which is exactly why I 
> would be surprised if those made any difference. It does call 
> smgrnblocks(), though, which isn't completely free...

It is not a problem. It is really strange. I'm using DTrace to count the number of 
calls, and the number of calls is really small (I monitor only one backend). I have 
also removed WAL logging, and it did not help either.

> Zdenek, can you say off the top of your head whether the test was I/O 
> bound or CPU bound? What was the CPU utilization % during the test?

The CPU is not the problem; it is mostly idle.

-bash-3.00# iostat 5
   tty        sd1           ssd0          ssd1          nfs1           cpu
 tin tout kps tps serv  kps tps serv  kps tps serv  kps tps serv   us sy wt id
   0    1   0   0    1    9   1   92    0   0    0    0   0    0    0  0  0 100
   0   47   0   0    0  894 111    7    0   0    0    0   0    0    2  1  0  97
   0   16   0   0    0  949 118    6    0   0    0    0   0    0    2  2  0  97
   0   16   0   0    0  965 120    6    0   0    0    0   0    0    2  1  0  97
   0   16   0   0    0  981 122    7    0   0    0    0   0    0    2  2  0  96
   0   16   0   0    0  944 118    6    0   0    0    0   0    0    2  1  0  97
   0   16   0   0    0 1202 149    7    0   0    0    0   0    0    3  2  0  95
   0   16   0   0    0 1261 157    9    0   0    0    0   0    0    3  2  0  95
   0   16   0   0    0 1357 168   14    0   0    0    0   0    0    3  2  0  95
   0   16   0   0    0 1631 201   33    0   0    0    0   0    0    2  2  0  96
   0   16   0   0    0 1973 246   48    0   0    0    0   0    0    2  2  0  96
   0   16   0   0    0 2008 251   50    0   0    0    0   0    0    2  2  0  97
   0   16   0   0    0 1956 241   45    0   0    0    0   0    0    2  2  0  97
   0   16   0   0    0 2003 250   49    0   0    0    0   0    0    2  2  0  97
 

-bash-3.00# vmstat 1
 kthr      memory            page            disk          faults      cpu
 r b w   swap  free  re  mf pi po fr de sr s1 sd sd --   in   sy   cs us sy id
 0 0 0 28091000 31640552  3   4  0  0  0  0  0  0  1  0  0  359   72  206  0  0 100
 0 0 0 27363144 27614576  3  28  0 16 16  0  0  0 60  0  0 1216 1134 1072  1  1  99
 0 0 0 27363144 27614568  8   0  0 16 16  0  0  0 52  0  0 1099 1029  964  0  1  98
 0 0 0 27363144 27614560  9   0  0  8  8  0  0  0 53  0  0 1143  896 1009  1  1  98
 0 0 0 27363144 27614544  1 241  0 16 16  0  0  0 46  0  0 1042 1105  895  0  1  98
 0 0 0 27363144 27614544  0   0  0 16 16  0  0  0 50  0  0 1078  860  924  0  0  99
 0 0 0 27363144 27614552 10   0  0 16 16  0  0  0 56  0  0 1177  914 1033  1  1  98
 0 0 0 27363144 27614536  0   0  0  8  8  0  0  0 25  0  0  726  554  603  0  0  99
 0 0 0 27363144 27614528  1   0  0 16 16  0  0  0 65  0  0 1206 1159 1081  1  1  98
 0 0 0 27363144 27614512 13   0  0 16 16  0  0  0 63  0  0 1256 1088 1094  1  1  99
 0 0 0 27363144 27614512  0   0  0  8  8  0  0  0 37  0  0  920  797  779  0  1  99
 0 0 0 27363144 27614504  6   0  0 16 16  0  0  0 58  0  0 1218 1074 1078  1  0  99
 0 0 0 27363144 27614488 85  91  0 16 16  0  0  0 45  0  0  973 1344  833  1  1  99
 0 0 0 27363144 27614488  2   0  0 16 16  0  0  0 57  0  0 1164 1023 1036  1  1  99
 0 0 0 27363144 27614472  4   0  0  8  8  0  0  0 47  0  0 1133  937  957  0  1  99
 


-- 
Zdenek Kotala              Sun Microsystems
Prague, Czech Republic     http://sun.com/postgresql



Re: FSM patch - performance test

From: Zdenek Kotala
Zdenek Kotala napsal(a):
> Heikki Linnakangas napsal(a):
>> Zdenek Kotala wrote:
>>> My conclusion is that new implementation is about 8% slower in OLTP 
>>> workload.
>>
>> Can you do some analysis of why that is?

I tested it several times, and the last test was a surprise for me. I ran the original 
server (with the old FSM) on the database which had been created by the new server (with 
the new FSM), and the performance is similar (maybe the new implementation is a little bit better):

MQThL (Maximum Qualified Throughput LIGHT): 1348.90 tpm
MQThM (Maximum Qualified Throughput MEDIUM): 2874.76 tpm
MQThH (Maximum Qualified Throughput HEAVY): 2422.20 tpm

The question is why? There could be two reasons for that. One is related to the 
OS/FS or HW: the filesystem could be fragmented, or the HDD could be slower in some part...

The second idea is that the new FSM creates heavily fragmented data, so an index scan needs 
to jump from one page to another too often.
    Thoughts?
        Zdenek

PS: I'm leaving now and I will be online on Monday.


-- 
Zdenek Kotala              Sun Microsystems
Prague, Czech Republic     http://sun.com/postgresql



Re: FSM patch - performance test

From: Heikki Linnakangas
Zdenek Kotala wrote:
> Zdenek Kotala napsal(a):
>> Heikki Linnakangas napsal(a):
>>> Zdenek Kotala wrote:
>>>> My conclusion is that new implementation is about 8% slower in OLTP 
>>>> workload.
>>>
>>> Can you do some analysis of why that is?
> 
> I tested it several times and last test was surprise for me. I run 
> original server (with old FSM) on the database which has been created by 
> new server (with new FSM) and performance is similar (maybe new 
> implementation is little bit better):
> 
> MQThL (Maximum Qualified Throughput LIGHT): 1348.90 tpm
> MQThM (Maximum Qualified Throughput MEDIUM): 2874.76 tpm
> MQThH (Maximum Qualified Throughput HEAVY): 2422.20 tpm
> 
> The question is why? There could be two reasons for that. One is 
> realated to OS/FS or HW. Filesystem could be defragmented or HDD is 
> slower in some part...

Ugh. Could it be autovacuum kicking in at different times? Do you get 
any other metrics than the TPM out of it?

> Second idea is that new FSM creates heavy defragmented data and index 
> scan needs to jump from one page to another too often.

Hmm. That's remotely plausible, I suppose. The old FSM only kept track 
of pages with more than the average request size of free space, but the new FSM 
tracks even the smallest free spots. Are there tables in that workload 
that are inserted into with widely varying row widths?

FWIW, I just got my first 2h DBT-2 results, and I'm seeing no 
difference at all in the overall performance or behavior during the 
test. Autovacuum doesn't kick in in those short tests, though, so I've 
scheduled a pair of 4h tests, and I might run even longer tests over 
the weekend.

-- 
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com


Re: FSM patch - performance test

From: Tom Lane
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
> Zdenek Kotala wrote:
>> Second idea is that new FSM creates heavy defragmented data and index 
>> scan needs to jump from one page to another too often.

> Hmm. That's remotely plausible, I suppose. The old FSM only kept track 
> of pages with more than avg. request size of free space, but the new FSM 
> tracks even the smallest free spots. Is there tables in that workload 
> that are inserted to, with very varying row widths?

I'm not sure I buy that either.  But after thinking a bit about how
search_avail() works, it occurs to me that it's not doing what the old
code did and that might contribute to contention.  The old FSM did a
cyclic search through the pages it knew about, so as long as there were
plenty of pages with "enough" free space, different backends would
always get pointed to different pages.  But consider what the algorithm
is now.  (For simplicity, consider only the behavior on a leaf FSM page.)

* Starting from the "next" slot, bubble up to parent nodes until finding a parent showing enough space.

* Descend to the *leftmost* leaf child of that parent that has enough space.

* Point "next" to the slot after that, and return that page.

What this means is that if we start with "next" pointing at a page
without enough space (quite likely considering that we now index all
pages not only those with free space), then it is highly possible that
the search will end on a page *before* where next was.  The most trivial
case is that we have an even-numbered page with a lot of free space and
its odd-numbered successor has none --- in this case, far from spreading
out the backends, all comers will be handed back that same page!  (Until
someone reports that it's full.)  In general it seems that this behavior
will tend to concentrate the returned pages in a small area rather than
allowing them to range over the whole FSM page as was intended.

So the bottom line is that the "next" addition doesn't actually work and
needs to be rethought.  It might be possible to salvage it by paying
attention to "next" during the descent phase and preferentially trying
to descend to the right of "next"; but I'm not quite sure how to make
that work efficiently, and even less sure how to wrap around cleanly
when the starting value of "next" is near the last slot on the page.
        regards, tom lane
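
To make that failure mode concrete, here is a toy model (not the patch's actual
search_avail() or FSM page layout): a complete binary max-tree over eight heap
pages, searched exactly as described above, bubbling up from "next" until a
node shows enough space and then descending to the leftmost child that does.
With page 0 holding plenty of space and page 1 none, every caller is handed
page 0 no matter where "next" points.

#include <stdio.h>

#define LEAVES 8                       /* toy: 8 heap pages on one FSM page */
#define NODES  (2 * LEAVES - 1)        /* complete binary tree in an array */

/* Node i has children 2i+1 and 2i+2; leaves live at LEAVES-1 .. NODES-1. */
static int tree[NODES];

static void
rebuild(const int *leaf_space)
{
    for (int i = 0; i < LEAVES; i++)
        tree[LEAVES - 1 + i] = leaf_space[i];
    for (int i = LEAVES - 2; i >= 0; i--)
        tree[i] = tree[2 * i + 1] > tree[2 * i + 2] ? tree[2 * i + 1] : tree[2 * i + 2];
}

static int
search_avail(int next, int request)
{
    int node = LEAVES - 1 + next;

    /* Bubble up from "next" until some subtree has enough space. */
    while (node > 0 && tree[node] < request)
        node = (node - 1) / 2;
    if (tree[node] < request)
        return -1;                     /* no page has enough space at all */

    /* Descend, always preferring the leftmost child with enough space. */
    while (node < LEAVES - 1)
        node = (tree[2 * node + 1] >= request) ? 2 * node + 1 : 2 * node + 2;
    return node - (LEAVES - 1);
}

int
main(void)
{
    /* Page 0 has lots of room, page 1 has none, the rest have a little. */
    int space[LEAVES] = {200, 0, 10, 10, 10, 10, 10, 10};
    rebuild(space);

    /* Every backend, wherever "next" points, gets page 0 for a 50-byte request. */
    for (int next = 0; next < LEAVES; next++)
        printf("next=%d -> page %d\n", next, search_avail(next, 50));
    return 0;
}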


Re: FSM patch - performance test

From: Zdenek Kotala
Heikki Linnakangas napsal(a):
> Zdenek Kotala wrote:
>> Zdenek Kotala napsal(a):
>>> Heikki Linnakangas napsal(a):
>>>> Zdenek Kotala wrote:
>>>>> My conclusion is that new implementation is about 8% slower in OLTP 
>>>>> workload.
>>>>
>>>> Can you do some analysis of why that is?
>>
>> I tested it several times and last test was surprise for me. I run 
>> original server (with old FSM) on the database which has been created 
>> by new server (with new FSM) and performance is similar (maybe new 
>> implementation is little bit better):
>>
>> MQThL (Maximum Qualified Throughput LIGHT): 1348.90 tpm
>> MQThM (Maximum Qualified Throughput MEDIUM): 2874.76 tpm
>> MQThH (Maximum Qualified Throughput HEAVY): 2422.20 tpm
>>
>> The question is why? There could be two reasons for that. One is 
>> realated to OS/FS or HW. Filesystem could be defragmented or HDD is 
>> slower in some part...
> 
> Ugh. Could it be autovacuum kicking in at different times? Do you get 
> any other metrics than the TPM out of it.

I don't think it is an autovacuum problem. I ran the test several times and the result 
was the same. But today I created a fresh database and got similar throughput for the 
original and new FSM implementations. It seems to me that I hit a HW/OS 
singularity. I'll verify it tomorrow.

I noticed only a slight slowdown during index creation (4:11 min vs. 
3:47 min), but I tested it only once.
Zdenek


-- 
Zdenek Kotala              Sun Microsystems
Prague, Czech Republic     http://sun.com/postgresql



Re: FSM patch - performance test

From: Heikki Linnakangas
Tom Lane wrote:
> What this means is that if we start with "next" pointing at a page
> without enough space (quite likely considering that we now index all
> pages not only those with free space), then it is highly possible that
> the search will end on a page *before* where next was.  The most trivial
> case is that we have an even-numbered page with a lot of free space and
> its odd-numbered successor has none --- in this case, far from spreading
> out the backends, all comers will be handed back that same page!  (Until
> someone reports that it's full.)  In general it seems that this behavior
> will tend to concentrate the returned pages in a small area rather than
> allowing them to range over the whole FSM page as was intended.

Good point.

> So the bottom line is that the "next" addition doesn't actually work and
> needs to be rethought.  It might be possible to salvage it by paying
> attention to "next" during the descent phase and preferentially trying
> to descend to the right of "next"; but I'm not quite sure how to make
> that work efficiently, and even less sure how to wrap around cleanly
> when the starting value of "next" is near the last slot on the page.

Yeah, I think it can be salvaged like that. See the patch I just posted 
on a separate thread.

-- 
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com