Thread: HOT - preliminary results

HOT - preliminary results

From
"Pavan Deolasee"
Date:
Hi All,

Here are some preliminary numbers with the HOT 4.0 patch that I sent
out earlier today. These are only indicative results and should not be
used to judge the performance  of HOT in general. I have intentionally
used the setup favorable to HOT. The goal here is to point out the best
usage of HOT so that we get some early feedback about its usefulness.
We still need to run several benchmark tests to see where it would
be useful and where it would add unnecessary overhead without any
real gains. This would also require lot of tuning and would be heavily
dependent on the community feedback/suggestions.

I used a modified pgbench to test HOT with just accounts and history
tables. The only operations are UPDATE on the accounts and INSERT
into the history table. So basically I removed other UPDATEs and SELECT
statements from the pgbench tpc_b tests.

The machine has 2 GB RAM, but the shared_buffers are set to 128MB
to make the tests  IO bound. fsync is turned on and autovacuum
is enabled with a naptime of 60 seconds and scale factor of 0.2.

We had runs with 90 scaling factor, 90 clients and 50000 txns / client.
The "accounts" table is created with a fillfactor of 90 so that there is
free space available for initial HOT UPDATEs.

Here are the results with current CVS HEAD.


transaction type: TPC-B (sort of)
scaling factor: 90
number of clients: 90
number of transactions per client: 50000
number of transactions actually processed: 4500000/4500000
tps = 1007.451264 (including connections establishing)
tps = 1007.512019 (excluding connections establishing)
---------------------------------------------------------------------------------


In the same setup and with the same fillfactor, HOT gave us the
following results:

transaction type: TPC-B (sort of)
scaling factor: 90
number of clients: 90
number of transactions per client: 50000
number of transactions actually processed: 4500000/4500000
tps = 2006.098739 (including connections establishing)
tps = 2006.361857 (excluding connections establishing)
---------------------------------------------------------------------------------

Thats a good jump of 100% in terms of tps. A more detail analysis
shows that HOT helps keep the size of the "accounts" table and
the index almost constant.

With CVS HEAD, the accounts relation and the index grows
considerably at the end of the test.
accounts       157895  (initial size)       49284 (increase)
accounts_pkey  19709   (initial size)       19705 (increase)

Whereas HOT keeps the table sizes stable.

accounts         157895 (initial size)    43 (increase)
accounts_pkey    19709 (initial size)      0 (increase)


This easily explains the significant jump in the tps. Of course, things 
might
not always work in favor of HOT. Few things that can easily dampen the
performance that come to my mind are:

- Index key column UPDATEs

HOT works on the premise that index column does not change often.
If that is not the case, HOT is not used and might put unnecessary
overhead in the execution path.

- Often change in tuple size between UPDATEs

This may limit our ability to reuse the heap-only and dead root tuples.
It may also lead to tuple level fragmentation when we reuse a larger
dead tuple to store a smaller new tuple. Thankfully it would be much
easier to correct row-level fragmentation without a VACUUM-strength
lock.

- Long running transactions

This can lead to very long HOT-update chains. Still my guess is
it won't be much worse than the current behavior.


- HOT-updates on very small tables

We prune the HOT-update chain in the SELECT path. This requires
releasing the SHARE lock and acquiring EXCLUSIVE lock on the page.
I am wondering for very small tables, can that be point of contention ?
Also, if the small tables can always fit completely in the buffer pool and
can be vacuumed very frequently, HOT may not very effective. I ran
the above mentioned tests with normal pgbench and HOT boosts
tps from 976 to 1024, though it shows good value in keeping the
small table size stable.

With CVS HEAD:

accounts       157895 (initial size)   41157 (increase)
accounts_pkey  19709  (initial size)   19705 (increase)

tellers        5  (initial size)       2017  (increase)
tellers_pkey   4  (initial size)       537   (increase)

branches       1  (initial size)       256   (increase)
branches_pkey  2  (initial size)       605   (increase)

With HOT:

accounts       157895  (initial size)  39 (increase)
accounts_pkey  19709 (initial size)    0 (increase)

tellers        5  (initial size)      87 (increase)
tellers_pkey   4  (initial size)      3 (increase)

branches       1  (initial size)      66 (increase)
branches_pkey  2  (initial size)      0 (increase)
I would like to emphasis again that we still need many more
tests, in different environments and setups, to measure performance
impact of HOT, good or bad. But these tests should be a good starting
point to believe that HOT is working on the expected lines.

Thanks,
Pavan


-- 

EnterpriseDB     http://www.enterprisedb.com



Re: HOT - preliminary results

From
"Pavan Deolasee"
Date:

On 3/1/07, Pavan Deolasee <pavan.deolasee@enterprisedb.com> wrote:

accounts       157895  (initial size)       49284 (increase)
accounts_pkey  19709   (initial size)       19705 (increase)


Just to clarify, the relation size and increase is in number of blocks.

Thanks,
Pavan

--

EnterpriseDB     http://www.enterprisedb.com

Re: HOT - preliminary results

From
"Zeugswetter Andreas ADI SD"
Date:
> > accounts       157895  (initial size)       49284 (increase)
> > accounts_pkey  19709   (initial size)       19705 (increase)
> >
> >
> Just to clarify, the relation size and increase is in number
> of blocks.

The numbers are quite impressive :-) Have you removed the selects on
accounts too ?
Seems that should also show improvements.

Do you prune chains during update also ? This seems important for a
scenario where only few selects but many updates happen.

Andreas



Re: HOT - preliminary results

From
"Merlin Moncure"
Date:
On 3/1/07, Pavan Deolasee <pavan.deolasee@enterprisedb.com> wrote:
>
> Hi All,
>
> Here are some preliminary numbers with the HOT 4.0 patch that I sent
> out earlier today. These are only indicative results and should not be
> used to judge the performance  of HOT in general. I have intentionally
> used the setup favorable to HOT. The goal here is to point out the best
> usage of HOT so that we get some early feedback about its usefulness.
> We still need to run several benchmark tests to see where it would
> be useful and where it would add unnecessary overhead without any
> real gains. This would also require lot of tuning and would be heavily
> dependent on the community feedback/suggestions.

I tested HOT patch (currently 3.2) over a wide variety pf pgbench runs
and found there to significant improvements (5-40%) in most cases.
Generally, I ran fsync=off and stock fillfactor.  I'm also a huge
believer in HOT once all the kinks get worked out.  I had some wierd
glitches in earlier versions of the patch which I could not quite
figure out and may have been some problems on my end...the new version
seems pretty solid except for one possible problem...at one point when
I dropped then later added the index on 'abalance', I got spammed
'WARNING:  found a HOT-updated tuple' from psql prompt.

I was also curious about the 'worst case' of HOT so I added an index
on abalance and did some runs.   The runs are not quite long enough to
remove all volatility from the results but they should be close:

**** index on abalance ****
[root@mernix ~]# pgbench -c1 -t100000 -p 5000 [HOT]
tps = 448.104436 (excluding connections establishing)

[root@mernix ~]# pgbench -c1 -t100000
tps = 448.120719 (excluding connections establishing)

[root@mernix ~]# pgbench -c10 -t10000 -p 5000 [HOT]
tps = 363.444429 (excluding connections establishing) run #1
tps = 415.854569 (excluding connections establishing) run #2

[root@mernix ~]# pgbench -c10 -t10000
tps = 416.659906 (excluding connections establishing)

**** no index on abalance ****
[root@mernix ~]# pgbench -c10 -t10000 -p 5000 [HOT]
tps = 659.870628 (excluding connections establishing)

[root@mernix ~]# pgbench -c10 -t10000
tps = 321.889840 (excluding connections establishing)

platform is intel pentium D 3ghz, 2xsata 7200rpm software raid0,
redhat fc4.  This is my dev box which I usually run fsync=off to get
comparable performance with production systems and caching raid
controller.

merlin


Re: HOT - preliminary results

From
"Pavan Deolasee"
Date:
Merlin Moncure wrote:> On 3/1/07, Pavan Deolasee <pavan.deolasee@enterprisedb.com> wrote:>>> seems pretty solid except
forone possible problem...at one point when> I dropped then later added the index on 'abalance', I got spammed>
'WARNING: found a HOT-updated tuple' from psql prompt.
 

Thats intentional. We don't yet support CREATE INDEX on a HOT-updated
table. This is one of the major unfinished TODO items before we can
consider patch feature complete. I have left the warning to catch
this case till then.

Thanks,
Pavan



Re: HOT - preliminary results

From
"Simon Riggs"
Date:
On Thu, 2007-03-01 at 22:35 +0530, Pavan Deolasee wrote:
> Merlin Moncure wrote:
>  > On 3/1/07, Pavan Deolasee <pavan.deolasee@enterprisedb.com> wrote:
>  >>
>  > seems pretty solid except for one possible problem...at one point when
>  > I dropped then later added the index on 'abalance', I got spammed
>  > 'WARNING:  found a HOT-updated tuple' from psql prompt.
> 
> Thats intentional. We don't yet support CREATE INDEX on a HOT-updated
> table. This is one of the major unfinished TODO items before we can
> consider patch feature complete. I have left the warning to catch
> this case till then.

CREATE INDEX and VACUUM FULL will require changes. Proposals for VACUUM
FULL have been posted, CREATE INDEX should be there tomorrow.

CLUSTER does not need changes for HOT, as things stand currently, mainly
because its MVCC behaviour is broken.

I've not looked in detail yet at the various ALTER TABLE modes, but will
do so in the next day or so.

--  Simon Riggs              EnterpriseDB   http://www.enterprisedb.com




Re: HOT - preliminary results

From
"Pavan Deolasee"
Date:
Zeugswetter Andreas ADI SD wrote:
>>> accounts       157895  (initial size)       49284 (increase)
>>> accounts_pkey  19709   (initial size)       19705 (increase)
>>>
>>>
>> Just to clarify, the relation size and increase is in number
>> of blocks.
>
> The numbers are quite impressive :-) Have you removed the selects on
> accounts too ?

Yes. In the first set of results, SELECT on accounts is removed.

> Seems that should also show improvements.
>
> Do you prune chains during update also ?

Yes, we do prune the chains during UPDATE, but only when we
run out of free space and LP_DELETEd items on the page.
And we prune all chains on the page in that case.


Thanks,
Pavan




Re: HOT - preliminary results

From
Tatsuo Ishii
Date:
Just for curiosity, I would like to ask you why you need to modify
pgbench. pgbench can accept custom SQL scripts...

P.S. HOT seems to be one of the greatest enhancements since PostgreSQL
was born!
--
Tatsuo Ishii
SRA OSS, Inc. Japan

> Hi All,
> 
> Here are some preliminary numbers with the HOT 4.0 patch that I sent
> out earlier today. These are only indicative results and should not be
> used to judge the performance  of HOT in general. I have intentionally
> used the setup favorable to HOT. The goal here is to point out the best
> usage of HOT so that we get some early feedback about its usefulness.
> We still need to run several benchmark tests to see where it would
> be useful and where it would add unnecessary overhead without any
> real gains. This would also require lot of tuning and would be heavily
> dependent on the community feedback/suggestions.
> 
> I used a modified pgbench to test HOT with just accounts and history
> tables. The only operations are UPDATE on the accounts and INSERT
> into the history table. So basically I removed other UPDATEs and SELECT
> statements from the pgbench tpc_b tests.
> 
> The machine has 2 GB RAM, but the shared_buffers are set to 128MB
> to make the tests  IO bound. fsync is turned on and autovacuum
> is enabled with a naptime of 60 seconds and scale factor of 0.2.
> 
> We had runs with 90 scaling factor, 90 clients and 50000 txns / client.
> The "accounts" table is created with a fillfactor of 90 so that there is
> free space available for initial HOT UPDATEs.
> 
> Here are the results with current CVS HEAD.
> 
> 
> transaction type: TPC-B (sort of)
> scaling factor: 90
> number of clients: 90
> number of transactions per client: 50000
> number of transactions actually processed: 4500000/4500000
> tps = 1007.451264 (including connections establishing)
> tps = 1007.512019 (excluding connections establishing)
> ---------------------------------------------------------------------------------
> 
> 
> In the same setup and with the same fillfactor, HOT gave us the
> following results:
> 
> transaction type: TPC-B (sort of)
> scaling factor: 90
> number of clients: 90
> number of transactions per client: 50000
> number of transactions actually processed: 4500000/4500000
> tps = 2006.098739 (including connections establishing)
> tps = 2006.361857 (excluding connections establishing)
> ---------------------------------------------------------------------------------
> 
> Thats a good jump of 100% in terms of tps. A more detail analysis
> shows that HOT helps keep the size of the "accounts" table and
> the index almost constant.
> 
> With CVS HEAD, the accounts relation and the index grows
> considerably at the end of the test.
>  
> accounts       157895  (initial size)       49284 (increase)
> accounts_pkey  19709   (initial size)       19705 (increase)
>  
> 
> Whereas HOT keeps the table sizes stable.
> 
> accounts         157895 (initial size)    43 (increase)
> accounts_pkey    19709 (initial size)      0 (increase)
> 
> 
> This easily explains the significant jump in the tps. Of course, things 
> might
> not always work in favor of HOT. Few things that can easily dampen the
> performance that come to my mind are:
> 
> - Index key column UPDATEs
> 
> HOT works on the premise that index column does not change often.
> If that is not the case, HOT is not used and might put unnecessary
> overhead in the execution path.
> 
> - Often change in tuple size between UPDATEs
> 
> This may limit our ability to reuse the heap-only and dead root tuples.
> It may also lead to tuple level fragmentation when we reuse a larger
> dead tuple to store a smaller new tuple. Thankfully it would be much
> easier to correct row-level fragmentation without a VACUUM-strength
> lock.
> 
> - Long running transactions
> 
> This can lead to very long HOT-update chains. Still my guess is
> it won't be much worse than the current behavior.
> 
> 
> - HOT-updates on very small tables
> 
> We prune the HOT-update chain in the SELECT path. This requires
> releasing the SHARE lock and acquiring EXCLUSIVE lock on the page.
> I am wondering for very small tables, can that be point of contention ?
> Also, if the small tables can always fit completely in the buffer pool and
> can be vacuumed very frequently, HOT may not very effective. I ran
> the above mentioned tests with normal pgbench and HOT boosts
> tps from 976 to 1024, though it shows good value in keeping the
> small table size stable.
> 
> With CVS HEAD:
> 
> accounts       157895 (initial size)   41157 (increase)
> accounts_pkey  19709  (initial size)   19705 (increase)
> 
> tellers        5  (initial size)       2017  (increase)
> tellers_pkey   4  (initial size)       537   (increase)
> 
> branches       1  (initial size)       256   (increase)
> branches_pkey  2  (initial size)       605   (increase)
>  
> 
> With HOT:
> 
> accounts       157895  (initial size)  39 (increase)
> accounts_pkey  19709 (initial size)    0 (increase)
> 
> tellers        5  (initial size)      87 (increase)
> tellers_pkey   4  (initial size)      3 (increase)
> 
> branches       1  (initial size)      66 (increase)
> branches_pkey  2  (initial size)      0 (increase)
>  
> I would like to emphasis again that we still need many more
> tests, in different environments and setups, to measure performance
> impact of HOT, good or bad. But these tests should be a good starting
> point to believe that HOT is working on the expected lines.
> 
> Thanks,
> Pavan
> 
> 
> -- 
> 
> EnterpriseDB     http://www.enterprisedb.com
> 
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 6: explain analyze is your friend
> 


Re: HOT - preliminary results

From
"Pavan Deolasee"
Date:

On 3/2/07, Tatsuo Ishii <ishii@postgresql.org> wrote:
Just for curiosity, I would like to ask you why you need to modify
pgbench. pgbench can accept custom SQL scripts...


Oh yes, there was no real need to modify pgbench.

Thanks,
Pavan

--

EnterpriseDB     http://www.enterprisedb.com

Re: HOT - preliminary results

From
Bruce Momjian
Date:
Simon Riggs wrote:
> On Thu, 2007-03-01 at 22:35 +0530, Pavan Deolasee wrote:
> > Merlin Moncure wrote:
> >  > On 3/1/07, Pavan Deolasee <pavan.deolasee@enterprisedb.com> wrote:
> >  >>
> >  > seems pretty solid except for one possible problem...at one point when
> >  > I dropped then later added the index on 'abalance', I got spammed
> >  > 'WARNING:  found a HOT-updated tuple' from psql prompt.
> > 
> > Thats intentional. We don't yet support CREATE INDEX on a HOT-updated
> > table. This is one of the major unfinished TODO items before we can
> > consider patch feature complete. I have left the warning to catch
> > this case till then.
> 
> CREATE INDEX and VACUUM FULL will require changes. Proposals for VACUUM
> FULL have been posted, CREATE INDEX should be there tomorrow.
> 
> CLUSTER does not need changes for HOT, as things stand currently, mainly
> because its MVCC behaviour is broken.             ------------------------

That's oddly discouraging.  :-)

--  Bruce Momjian  <bruce@momjian.us>          http://momjian.us EnterpriseDB
http://www.enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +


Re: HOT - preliminary results

From
Bruce Momjian
Date:
Pavan Deolasee wrote:
> Zeugswetter Andreas ADI SD wrote:
> >>> accounts       157895  (initial size)       49284 (increase)
> >>> accounts_pkey  19709   (initial size)       19705 (increase)
> >>>
> >>>
> >> Just to clarify, the relation size and increase is in number
> >> of blocks.
> >
> > The numbers are quite impressive :-) Have you removed the selects on
> > accounts too ?
> 
> Yes. In the first set of results, SELECT on accounts is removed.
> 
> > Seems that should also show improvements.
> >
> > Do you prune chains during update also ?
> 
> Yes, we do prune the chains during UPDATE, but only when we
> run out of free space and LP_DELETEd items on the page.
> And we prune all chains on the page in that case.

Yep, that seems the most efficient approach.

--  Bruce Momjian  <bruce@momjian.us>          http://momjian.us EnterpriseDB
http://www.enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +


Re: HOT - preliminary results

From
Bruce Momjian
Date:
Tatsuo Ishii wrote:
> Just for curiosity, I would like to ask you why you need to modify
> pgbench. pgbench can accept custom SQL scripts...
> 
> P.S. HOT seems to be one of the greatest enhancements since PostgreSQL
> was born!

Yep, I share your enthusiasm.

--  Bruce Momjian  <bruce@momjian.us>          http://momjian.us EnterpriseDB
http://www.enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +


Re: HOT - preliminary results

From
Alvaro Herrera
Date:
Bruce Momjian escribió:
> Simon Riggs wrote:
> > On Thu, 2007-03-01 at 22:35 +0530, Pavan Deolasee wrote:
> > > Merlin Moncure wrote:
> > >  > On 3/1/07, Pavan Deolasee <pavan.deolasee@enterprisedb.com> wrote:
> > >  >>
> > >  > seems pretty solid except for one possible problem...at one point when
> > >  > I dropped then later added the index on 'abalance', I got spammed
> > >  > 'WARNING:  found a HOT-updated tuple' from psql prompt.
> > > 
> > > Thats intentional. We don't yet support CREATE INDEX on a HOT-updated
> > > table. This is one of the major unfinished TODO items before we can
> > > consider patch feature complete. I have left the warning to catch
> > > this case till then.
> > 
> > CREATE INDEX and VACUUM FULL will require changes. Proposals for VACUUM
> > FULL have been posted, CREATE INDEX should be there tomorrow.
> > 
> > CLUSTER does not need changes for HOT, as things stand currently, mainly
> > because its MVCC behaviour is broken.
>               ------------------------
> 
> That's oddly discouraging.  :-)

Apparently no one has been bothered enough to fix CLUSTER.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: HOT - preliminary results

From
"Simon Riggs"
Date:
On Fri, 2007-03-02 at 18:37 -0300, Alvaro Herrera wrote:
> Bruce Momjian escribió:
> > Simon Riggs wrote:
> > > CLUSTER does not need changes for HOT, as things stand currently, mainly
> > > because its MVCC behaviour is broken.
> >               ------------------------
> >
> > That's oddly discouraging.  :-)
>
> Apparently no one has been bothered enough to fix CLUSTER.

We learned from Csaba just the other day that this is a backdoor used on
production systems. I've not seen anyone admit it before, even though
I've seen it discussed.

People know that CLUSTER works better than VACUUM FULL and they use
that. The reason it hasn't been fixed is because its useful, I observe.

The oddly discouraging bit is that VACUUM FULL is only second best at
the thing its designed to achieve (compaction). Thats why I'm not
enthralled by the prospect of adding code to make VACUUM FULL work with
HOT - ISTM a good opportunity to make it work better.

--  Simon Riggs              EnterpriseDB   http://www.enterprisedb.com




Re: HOT - preliminary results

From
Alvaro Herrera
Date:
Simon Riggs escribió:
> On Fri, 2007-03-02 at 18:37 -0300, Alvaro Herrera wrote:
> > Bruce Momjian escribió:
> > > Simon Riggs wrote:
> > > > CLUSTER does not need changes for HOT, as things stand currently, mainly
> > > > because its MVCC behaviour is broken.
> > >               ------------------------
> > > 
> > > That's oddly discouraging.  :-)
> > 
> > Apparently no one has been bothered enough to fix CLUSTER.
> 
> We learned from Csaba just the other day that this is a backdoor used on
> production systems. I've not seen anyone admit it before, even though
> I've seen it discussed.

Yeah, I had just read that and was about to mention it, but I think
(some of?) these cases would be solved by HOT.  So that usage of CLUSTER
would go away altogether, allowing us to fix the MVCC issue ... which
would require HOT to work nicely :-(

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support