Thread: Remove size limitations of vacuum's dead_tuples array

Remove size limitations of vacuum's dead_tuples array

From: Ants Aasma
Date: Wed, 9 Oct 2019 15:58:11 +0300
While dealing with a case where a 2TB table had 3 billion dead tuples, I discovered that vacuum currently can't make use of more than 1GB of maintenance_work_mem, enough to track about 179M dead tuples. This caused excessive amounts of index scanning even though there was plenty of memory available.
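
For reference, the arithmetic behind those numbers (each dead tuple is tracked as a 6-byte ItemPointerData, palloc caps a single allocation at MaxAllocSize, and the array index is an int); a standalone sketch of the math:

#include <stdio.h>
#include <limits.h>

int main(void)
{
    const size_t tid_size = 6;             /* sizeof(ItemPointerData) */
    const size_t max_alloc = 0x3fffffff;   /* MaxAllocSize, just under 1GB */

    /* palloc's 1GB cap limits the array to ~179M dead tuple TIDs */
    printf("TIDs within MaxAllocSize: %zu\n", max_alloc / tid_size);

    /* even with huge allocations, int indexing caps the array at INT_MAX
     * entries, i.e. roughly 12GB of TIDs */
    printf("bytes at INT_MAX TIDs: %zu\n", (size_t) INT_MAX * tid_size);
    return 0;
}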

I didn't see any good reason for having this limit, so here is a patch that makes use of MemoryContextAllocHuge and converts the array indexing to use size_t, lifting a second limit of INT_MAX tuples (12GB of TIDs).
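
The shape of the change, as a rough sketch only (the function name below is made up; the real code is lazy_space_alloc in vacuumlazy.c, and the attached patch is the authoritative version):

/* Sketch only; see the attached patch for the real change.
 * Assumes LVRelStats.max_dead_tuples is widened from int to size_t. */
static void
lazy_space_alloc_sketch(LVRelStats *vacrelstats, int vac_work_mem)
{
    size_t      maxtuples;

    /* vac_work_mem is in KB; each dead tuple costs one ItemPointerData */
    maxtuples = ((size_t) vac_work_mem * 1024) / sizeof(ItemPointerData);

    /* clamp at the huge-allocation ceiling instead of the old
     * MaxAllocSize (1GB) and INT_MAX limits */
    maxtuples = Min(maxtuples, MaxAllocHugeSize / sizeof(ItemPointerData));

    /* palloc() errors out above MaxAllocSize; the huge variant does not */
    vacrelstats->dead_tuples = (ItemPointer)
        MemoryContextAllocHuge(CurrentMemoryContext,
                               maxtuples * sizeof(ItemPointerData));
    vacrelstats->max_dead_tuples = maxtuples;
}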

One potential problem with allowing larger arrays is that bsearch might no longer be the best way of determining whether a ctid was marked dead. It might pay off to convert the dead tuples array to a hash table to avoid the O(n log n) runtime (a binary search per index entry) when scanning indexes. I haven't done any profiling yet to see how big a problem this is.
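
For context, the check during an index scan is currently a plain bsearch over the sorted TID array, roughly like lazy_tid_reaped() in vacuumlazy.c:

/* Called once per index entry during index vacuum: binary search of the
 * sorted dead_tuples array, so each lookup costs O(log n). */
static int
vac_cmp_itemptr(const void *left, const void *right)
{
    BlockNumber lblk,
                rblk;
    OffsetNumber loff,
                roff;

    lblk = ItemPointerGetBlockNumber((ItemPointer) left);
    rblk = ItemPointerGetBlockNumber((ItemPointer) right);

    if (lblk < rblk)
        return -1;
    if (lblk > rblk)
        return 1;

    loff = ItemPointerGetOffsetNumber((ItemPointer) left);
    roff = ItemPointerGetOffsetNumber((ItemPointer) right);

    if (loff < roff)
        return -1;
    if (loff > roff)
        return 1;
    return 0;
}

static bool
lazy_tid_reaped(ItemPointer itemptr, void *state)
{
    LVRelStats *vacrelstats = (LVRelStats *) state;

    return bsearch(itemptr,
                   vacrelstats->dead_tuples,
                   vacrelstats->num_dead_tuples,
                   sizeof(ItemPointerData),
                   vac_cmp_itemptr) != NULL;
}

A hash table keyed on the TID (simplehash.h would be the obvious building block) would make each lookup O(1) expected instead of O(log n), at the price of extra memory per entry.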

A second issue I noticed is that the dead_tuples array is always allocated at the maximum allowed size, unless the table can't possibly contain that many tuples. It may make sense to size it based on the estimated number of dead tuples and resize it if needed.
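
As a hedged sketch of what that could look like while keeping the plain array (the helper name is made up; the real recording function is lazy_record_dead_tuple):

/* Sketch only: grow the array geometrically instead of allocating the
 * maximum up front. Assumes the counters are widened to size_t. */
static void
lazy_record_dead_tuple_sketch(LVRelStats *vacrelstats, ItemPointer itemptr)
{
    if (vacrelstats->num_dead_tuples >= vacrelstats->max_dead_tuples)
    {
        size_t      newmax;

        newmax = Min(vacrelstats->max_dead_tuples * 2,
                     MaxAllocHugeSize / sizeof(ItemPointerData));

        /* repalloc_huge() may have to copy the whole array to grow it */
        vacrelstats->dead_tuples = (ItemPointer)
            repalloc_huge(vacrelstats->dead_tuples,
                          newmax * sizeof(ItemPointerData));
        vacrelstats->max_dead_tuples = newmax;
    }
    vacrelstats->dead_tuples[vacrelstats->num_dead_tuples++] = *itemptr;
}

The copy on regrowth is the obvious downside of staying with one big array; a segmented structure would avoid it, but complicates the binary search above.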

Regards,
Ants Aasma

Re: Remove size limitations of vacuum's dead_tuples array

From: Tomas Vondra
Date: Thu, 10 Oct 2019 17:05
On Wed, Oct 09, 2019 at 03:58:11PM +0300, Ants Aasma wrote:
>While dealing with a case where a 2TB table had 3 billion dead tuples, I
>discovered that vacuum currently can't make use of more than 1GB of
>maintenance_work_mem, enough to track about 179M dead tuples. This caused
>excessive amounts of index scanning even though there was plenty of memory
>available.
>
>I didn't see any good reason for having this limit, so here is a patch that
>makes use of MemoryContextAllocHuge and converts the array indexing to use
>size_t, lifting a second limit of INT_MAX tuples (12GB of TIDs).
>
>One potential problem with allowing larger arrays is that bsearch might no
>longer be the best way of determining whether a ctid was marked dead. It
>might pay off to convert the dead tuples array to a hash table to avoid the
>O(n log n) runtime (a binary search per index entry) when scanning indexes.
>I haven't done any profiling yet to see how big a problem this is.
>
>A second issue I noticed is that the dead_tuples array is always allocated
>at the maximum allowed size, unless the table can't possibly contain that
>many tuples. It may make sense to size it based on the estimated number of
>dead tuples and resize it if needed.
>

There already was an attempt to make this improvement, see [1]. There was
a fairly long discussion about how best to do that (using another data
structure, not just a simple array). It kinda died about a year ago, but
I suppose there's a lot of relevant info in that thread.

[1] https://www.postgresql.org/message-id/CAGTBQpbDCaR6vv9%3DscXzuT8fSbckf%3Da3NgZdWFWZbdVugVht6Q%40mail.gmail.com


-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Remove size limitations of vacuum's dead_tuples array

From: Ants Aasma
Date: Fri, 11 Oct 2019 16:49:11 +0300
On Thu, 10 Oct 2019 at 17:05, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
>There already was an attempt to make this improvement, see [1]. There was
>a fairly long discussion about how best to do that (using another data
>structure, not just a simple array). It kinda died about a year ago, but
>I suppose there's a lot of relevant info in that thread.
>
>[1] https://www.postgresql.org/message-id/CAGTBQpbDCaR6vv9%3DscXzuT8fSbckf%3Da3NgZdWFWZbdVugVht6Q%40mail.gmail.com

Thanks for the pointer; wow, that's a long thread. For some reason it did not consider lifting the INT_MAX tuples/12GB limitation. I'll see if I can pick up where that thread left off and push it along.

Regards,
Ants Aasma


Re: Remove size limitations of vacuum's dead_tuples array

From: Michael Paquier
Date:
On Fri, Oct 11, 2019 at 04:49:11PM +0300, Ants Aasma wrote:
> Thanks for the pointer; wow, that's a long thread. For some reason it did
> not consider lifting the INT_MAX tuples/12GB limitation. I'll see if I can
> pick up where that thread left off and push it along.

Hmm. Okay. Then I have marked this entry as returned with feedback
in this commitfest (CF).
--
Michael
