Thread: Re: [COMMITTERS] pgsql: Reduce the size of memory allocations by lazy vacuum when

Re: [COMMITTERS] pgsql: Reduce the size of memory allocations by lazy vacuum when

From: "Heikki Linnakangas"

Alvaro Herrera wrote:
> Log Message:
> -----------
> Reduce the size of memory allocations by lazy vacuum when processing a small
> table, by allocating just enough for a hardcoded number of dead tuples per
> page.  The current estimate is 200 dead tuples per page.

200 sounds like a badly chosen value. With an 8k block size, that's a bit
less than MaxHeapTuplesPerPage, which means that in the worst case you
don't allocate enough space to hold all dead tuples, and you end up
doing 2 index cleanups, no matter how large you set
maintenance_work_mem. Remember that having MaxHeapTuplesPerPage dead
tuples on a page just got much more likely with HOT, and with larger
block sizes 200 tuples isn't very much anyway.

At the other end of the spectrum, with a smaller block size 200 is more
than MaxHeapTuplesPerPage, so we're still allocating more than necessary.

Note that as the patch stands, the capping is not limited to small
tables. Doing extra index passes on a relatively big table with lots of
indexes might cause a lot of real extra I/O.

How about just using MaxHeapTuplesPerPage? With the default 8K block
size, it's not that much more than 200, but makes the above gripes
completely go away. That seems like the safest option at this point.
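
For illustration, the allocation logic in question works roughly like
this (a sketch, not the exact vacuumlazy.c code; LAZY_ALLOC_TUPLES is
the hardcoded per-page estimate, shown here with the proposed value):

    #define LAZY_ALLOC_TUPLES  MaxHeapTuplesPerPage   /* the commit used 200 */

    static void
    lazy_space_alloc(LVRelStats *vacrelstats, BlockNumber relblocks)
    {
        long    maxtuples;

        /* how many TIDs fit in maintenance_work_mem */
        maxtuples = (maintenance_work_mem * 1024L) / sizeof(ItemPointerData);

        /* cap by the most dead tuples the table could contain ... */
        if ((BlockNumber) (maxtuples / LAZY_ALLOC_TUPLES) > relblocks)
            maxtuples = relblocks * LAZY_ALLOC_TUPLES;

        /* ... but never go below one page's worth, so a page full of
         * dead tuples can always be processed in a single pass */
        maxtuples = Max(maxtuples, MaxHeapTuplesPerPage);

        vacrelstats->num_dead_tuples = 0;
        vacrelstats->max_dead_tuples = (int) maxtuples;
        vacrelstats->dead_tuples = (ItemPointer)
            palloc(maxtuples * sizeof(ItemPointerData));
    }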

> Per reports from Jeff Amiel, Erik Jones and Marko Kreen, and subsequent
> discussion.

Ok, I just read that discussion in the archives. A lot of good ideas
were suggested, like reducing the space required for the tid list, or
dividing maintenance_work_mem between workers. None of that is going to
happen for 8.3, so it seems likely that we're going to revisit this in
8.4. Let's keep it simple and safe for now.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com


Re: [COMMITTERS] pgsql: Reduce the size of memory allocations by lazy vacuum when

From: Simon Riggs

On Mon, 2007-09-24 at 10:02 +0100, Heikki Linnakangas wrote:
> Alvaro Herrera wrote:
> > Log Message:
> > -----------
> > Reduce the size of memory allocations by lazy vacuum when processing a small
> > table, by allocating just enough for a hardcoded number of dead tuples per
> > page.  The current estimate is 200 dead tuples per page.
> 
> 200 sounds like a badly chosen value. With an 8k block size, that's a bit
> less than MaxHeapTuplesPerPage, which means that in the worst case you
> don't allocate enough space to hold all dead tuples, and you end up
> doing 2 index cleanups, no matter how large you set
> maintenance_work_mem.

Agreed. Tables with 2-4 columns easily fall into that category.
Association tables are often like this and can have 2 indexes on them.

> Note that as the patch stands, the capping is not limited to small
> tables. Doing extra index passes on a relatively big table with lots of
> indexes might cause a lot of real extra I/O.
> 
> How about just using MaxHeapTuplesPerPage? With the default 8K block
> size, it's not that much more than 200, but makes the above gripes
> completely go away. That seems like the safest option at this point.

It would be much better to use a value for each table. Any constant
value will be sub-optimal in many cases. 

Let's use our knowledge of the table to calculate a sensible value. We
often have average row length available from last VACUUM, don't we? Use
that, plus 10%. 
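
A hypothetical sketch of that calculation, using the relpages/reltuples
estimates pg_class keeps from the last VACUUM or ANALYZE (the helper
name and the clamping are mine, purely illustrative):

    /* Estimate dead tuples per page from table stats, plus 10% headroom. */
    static int
    est_dead_tuples_per_page(BlockNumber relpages, double reltuples)
    {
        double  tuples_per_page;

        if (relpages == 0 || reltuples <= 0)
            return MaxHeapTuplesPerPage;        /* no stats yet: be safe */

        tuples_per_page = (reltuples / relpages) * 1.10;

        /* never assume more than a page can physically hold */
        if (tuples_per_page > MaxHeapTuplesPerPage)
            return MaxHeapTuplesPerPage;

        return (int) (tuples_per_page + 0.5);   /* round to nearest */
    }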

--
Simon Riggs
2ndQuadrant   http://www.2ndQuadrant.com



Re: [COMMITTERS] pgsql: Reduce the size of memory allocations by lazy vacuum when

From: "Heikki Linnakangas"

Simon Riggs wrote:
> On Mon, 2007-09-24 at 10:02 +0100, Heikki Linnakangas wrote:
>> How about just using MaxHeapTuplesPerPage? With the default 8K block
>> size, it's not that much more than 200, but makes the above gripes
>> completely go away. That seems like the safest option at this point.
> 
> It would be much better to use a value for each table. Any constant
> value will be sub-optimal in many cases. 

Allocating extra memory doesn't usually do much harm, as long as you
don't actually use it. The reason we're now limiting it is to avoid Out
Of Memory errors if you're running with overcommit turned off, and
autovacuum triggers a vacuum on multiple tables at the same time.

Let's keep it simple. A per-table setting would be much more complex and
would be something that the DBA would need to calculate and set. If you
really do run into this problem, you can just dial down
maintenance_work_mem.

> Let's use our knowledge of the table to calculate a sensible value. We
> often have average row length available from last VACUUM, don't we? Use
> that, plus 10%.

If there are dead line pointers in the table, left behind by HOT pruning,
the average row length would be completely bogus. Using the dead_tuples
stats directly would be more reasonable, but I would like us to keep
this even more conservative than that.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com


Re: [COMMITTERS] pgsql: Reduce the size of memory allocations by lazy vacuum when

From: Simon Riggs

On Mon, 2007-09-24 at 10:39 +0100, Heikki Linnakangas wrote:
> Simon Riggs wrote:
> > On Mon, 2007-09-24 at 10:02 +0100, Heikki Linnakangas wrote:
> >> How about just using MaxHeapTuplesPerPage? With the default 8K block
> >> size, it's not that much more than 200, but makes the above gripes
> >> completely go away. That seems like the safest option at this point.
> > 
> > It would be much better to use a value for each table. Any constant
> > value will be sub-optimal in many cases. 
> 
> Allocating extra memory doesn't usually do much harm, as long as you
> don't actually use it. The reason we're now limiting it is to avoid Out
> Of Memory errors if you're running with overcommit turned off, and
> autovacuum triggers a vacuum on multiple tables at the same time.
> 
> Let's keep it simple. A per-table setting would be much more complex and
> would be something that the DBA would need to calculate and set. If you
> really do run into this problem, you can just dial down
> maintenance_work_mem.

Much more complex? RelationGetAvgFSM(relation)??

> > Let's use our knowledge of the table to calculate a sensible value. We
> > often have average row length available from last VACUUM, don't we? Use
> > that, plus 10%.
> 
> If there are dead line pointers in the table, left behind by HOT pruning,
> the average row length would be completely bogus. Using the dead_tuples
> stats directly would be more reasonable, but I would like us to keep
> this even more conservative than that.

That's a better argument.

Since we have committed HOT, I would suggest we move the default of
autovacuum_max_workers down to 2. That will limit the default amount of
memory used by VACUUMs much better than trying to get the values
precisely, and we expect vacuums to run much less frequently now anyway.
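
The arithmetic is simple enough (assuming the 8.3 defaults of 3 workers
and 16MB maintenance_work_mem; both figures are settings, not hard
limits):

    #include <stdio.h>

    int main(void)
    {
        long workers = 3;            /* autovacuum_max_workers default */
        long mwm_kb  = 16 * 1024;    /* maintenance_work_mem default, in kB */
        long tid_sz  = 6;            /* sizeof(ItemPointerData) */

        printf("TIDs per worker:       %ld\n", mwm_kb * 1024L / tid_sz);
        printf("worst case, 3 workers: %ld MB\n", workers * mwm_kb / 1024);
        printf("worst case, 2 workers: %ld MB\n", 2 * mwm_kb / 1024);
        return 0;
    }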

--
Simon Riggs
2ndQuadrant   http://www.2ndQuadrant.com



Re: [COMMITTERS] pgsql: Reduce the size of memory allocations by lazy vacuum when

From: "Heikki Linnakangas"

Simon Riggs wrote:
> On Mon, 2007-09-24 at 10:39 +0100, Heikki Linnakangas wrote:
>> Let's keep it simple. A per-table setting would be much more complex and
>> would be something that the DBA would need to calculate and set. If you
>> really do run into this problem, you can just dial down
>> maintenance_work_mem.
> 
> Much more complex? RelationGetAvgFSM(relation)??

The code isn't complex, but the resulting behavior is.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com


Re: [COMMITTERS] pgsql: Reduce the size of memory allocations by lazy vacuum when

From: Gregory Stark

"Heikki Linnakangas" <heikki@enterprisedb.com> writes:

> Simon Riggs wrote:
>> On Mon, 2007-09-24 at 10:02 +0100, Heikki Linnakangas wrote:
>>> How about just using MaxHeapTuplesPerPage? With the default 8K block
>>> size, it's not that much more than 200, but makes the above gripes
>>> completely go away. That seems like the safest option at this point.
>> 
>> It would be much better to use a value for each table. Any constant
>> value will be sub-optimal in many cases. 
>
> Allocating extra memory doesn't usually do much harm, as long as you
> don't actually use it. The reason we're now limiting it is to avoid Out
> Of Memory errors if you're running with overcommit turned off, and
> autovacuum triggers a vacuum on multiple tables at the same time.

For reference, MaxHeapTuplesPerPage on an 8k block is 291. If there are any
columns in your tuples (meaning they're not either HOT updates which have been
pruned or rows with 8 or fewer columns all of which are null) then the most
you can have is 255 rows.

For the small difference between 200 and 291 it seems safer to just use
MaxHeapTuplesPerPage.


Block size   MaxHeapTuplesPerPage   Max w/data
----------------------------------------------
      4096                    145          127
      8192                    291          255
     16384                    584          511
     32768                   1169         1023
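
Those figures fall straight out of the page layout (a sketch; the
24-byte page header and 4-byte line pointers are the 8.3 values, and
the "with data" column assumes one int4 column on a 4-byte-MAXALIGN
build):

    #include <stdio.h>

    #define PAGE_HEADER 24   /* SizeOfPageHeaderData */
    #define ITEMID       4   /* sizeof(ItemIdData): the line pointer */
    #define TUPLE_HDR   24   /* MAXALIGN(SizeofHeapTupleHeader) */

    int main(void)
    {
        int bs[] = {4096, 8192, 16384, 32768};
        int i;

        for (i = 0; i < 4; i++)
        {
            int usable = bs[i] - PAGE_HEADER;
            int mhtpp  = usable / (TUPLE_HDR + ITEMID);      /* header-only */
            int wdata  = usable / (TUPLE_HDR + 4 + ITEMID);  /* + one int4 */
            printf("%6d  %5d  %5d\n", bs[i], mhtpp, wdata);
        }
        return 0;
    }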

--
Gregory Stark
EnterpriseDB   http://www.enterprisedb.com


Re: [COMMITTERS] pgsql: Reduce the size of memory allocations by lazy vacuum when

From: Alvaro Herrera

Gregory Stark wrote:
> "Heikki Linnakangas" <heikki@enterprisedb.com> writes:
> 
> > Simon Riggs wrote:
> >> On Mon, 2007-09-24 at 10:02 +0100, Heikki Linnakangas wrote:
> >>> How about just using MaxHeapTuplesPerPage? With the default 8K block
> >>> size, it's not that much more than 200, but makes the above gripes
> >>> completely go away. That seems like the safest option at this point.
> >> 
> >> It would be much better to use a value for each table. Any constant
> >> value will be sub-optimal in many cases. 
> >
> > Allocating extra memory doesn't usually do much harm, as long as you
> > don't actually use it. The reason we're now limiting it is to avoid Out
> > Of Memory errors if you're running with overcommit turned off, and
> > autovacuum triggers a vacuum on multiple tables at the same time.
> 
> For reference, MaxHeapTuplesPerPage on an 8k block is 291. If there are any
> columns in your tuples (meaning they're not either HOT updates which have been
> pruned or rows with 8 or fewer columns all of which are null) then the most
> you can have is 255 rows.

How about we change it to MaxHeapTuplesPerPage for now?  That closes all
complaints in this thread.

Later we can think about better ways of doing the whole thing, like
using non-lossy tidbitmaps.
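
As a back-of-envelope comparison (illustrative numbers only, not the
tidbitmap API): an exact per-page bitmap needs one bit per possible
line pointer, versus six bytes per entry in the current TID array.

    #include <stdio.h>

    int main(void)
    {
        int max_tuples = 291;   /* MaxHeapTuplesPerPage with 8k pages */
        int tid_bytes  = 6;     /* sizeof(ItemPointerData) */

        int bitmap_bytes = (max_tuples + 7) / 8;   /* exact bitmap: 37 */
        int array_bytes  = max_tuples * tid_bytes; /* full page: 1746  */

        printf("per page: bitmap %d bytes vs TID array %d bytes\n",
               bitmap_bytes, array_bytes);
        return 0;
    }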

-- 
Alvaro Herrera                          Developer, http://www.PostgreSQL.org/
"Para tener más hay que desear menos"


Re: Re: [COMMITTERS] pgsql: Reduce the size of memory allocations by lazy vacuum when

From: Alvaro Herrera

Alvaro Herrera wrote:

> How about we change it to MaxHeapTuplesPerPage for now?  That closes all
> complaints in this thread.

Done, thanks for the input!

-- 
Alvaro Herrera                 http://www.amazon.com/gp/registry/CTMLCN8V17R4
"La rebeldía es la virtud original del hombre" (Arthur Schopenhauer)