Thread: FSM versus GIN pending list bloat

FSM versus GIN pending list bloat

From
Jeff Janes
Date:
For a GIN index with fastupdate turned on, both the user backends and autoanalyze routine will clear out the pending list, pushing the entries into the normal index structure and deleting the pages used by the pending list.  But those deleted pages will not get added to the freespace map until a vacuum is done.  This leads to horrible bloat on insert only tables, as it is never vacuumed and so the pending list space is never reused.  And the pending list is very inefficient in space usage to start with, even compared to the old style posting lists and especially compared to the new compressed ones.  (If they were aggressively recycled, this inefficient use wouldn't be much of a problem.)

Even on a table receiving mostly updates after its initial population (and so being vacuumed regularly) with default autovac setting, there is a lot of bloat.

The attached proof of concept patch greatly improves the bloat for both the insert and the update cases.  You need to turn on both features: adding the pages to fsm, and vacuuming the fsm, to get the benefit (so JJ_GIN=3).  The first of those two things could probably be adopted for real, but the second probably is not acceptable.  What is the right way to do this?  Could a variant of RecordFreeIndexPage bubble the free space up the map immediately rather than waiting for a vacuum?  It would only have to move up until it found a page with freespace already recorded in it, which the vast majority of the time would mean observing up one level and then not writing to it, assuming the pending list pages remain well clustered.

Or would a completely different approach be better, like managing the vacated pending list pages directly in the index without going to the fsm?

Cheers,

Jeff
Attachment

Re: FSM versus GIN pending list bloat

From
Heikki Linnakangas
Date:
On 08/04/2015 08:03 AM, Jeff Janes wrote:
> For a GIN index with fastupdate turned on, both the user backends and
> autoanalyze routine will clear out the pending list, pushing the entries
> into the normal index structure and deleting the pages used by the pending
> list.  But those deleted pages will not get added to the freespace map
> until a vacuum is done.  This leads to horrible bloat on insert only
> tables, as it is never vacuumed and so the pending list space is never
> reused.  And the pending list is very inefficient in space usage to start
> with, even compared to the old style posting lists and especially compared
> to the new compressed ones.  (If they were aggressively recycled, this
> inefficient use wouldn't be much of a problem.)

Good point.

> The attached proof of concept patch greatly improves the bloat for both the
> insert and the update cases.  You need to turn on both features: adding the
> pages to fsm, and vacuuming the fsm, to get the benefit (so JJ_GIN=3).  The
> first of those two things could probably be adopted for real, but the
> second probably is not acceptable.  What is the right way to do this?
> Could a variant of RecordFreeIndexPage bubble the free space up the map
> immediately rather than waiting for a vacuum?  It would only have to move
> up until it found a page with freespace already recorded in it, which the
> vast majority of the time would mean observing up one level and then not
> writing to it, assuming the pending list pages remain well clustered.

Yep, that sounds good.

- Heikki




Re: FSM versus GIN pending list bloat

From
Simon Riggs
Date:
On 4 August 2015 at 06:03, Jeff Janes <jeff.janes@gmail.com> wrote:
 
The attached proof of concept patch greatly improves the bloat for both the insert and the update cases.  You need to turn on both features: adding the pages to fsm, and vacuuming the fsm, to get the benefit (so JJ_GIN=3).  The first of those two things could probably be adopted for real, but the second probably is not acceptable.  What is the right way to do this?  Could a variant of RecordFreeIndexPage bubble the free space up the map immediately rather than waiting for a vacuum?  It would only have to move up until it found a page with freespace already recorded in it, which the vast majority of the time would mean observing up one level and then not writing to it, assuming the pending list pages remain well clustered.

You make a good case for action here since insert only tables with GIN indexes on text are a common use case for GIN. 

Why would vacuuming the FSM be unacceptable? With a large gin_pending_list_limit it makes sense.

If it is unacceptable, perhaps we can avoid calling it every time, or simply have FreeSpaceMapVacuum() terminate more quickly on some kind of 80/20 heuristic for this case.

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: FSM versus GIN pending list bloat

From
Simon Riggs
Date:
On 4 August 2015 at 09:39, Simon Riggs <simon@2ndquadrant.com> wrote:
On 4 August 2015 at 06:03, Jeff Janes <jeff.janes@gmail.com> wrote:
 
The attached proof of concept patch greatly improves the bloat for both the insert and the update cases.  You need to turn on both features: adding the pages to fsm, and vacuuming the fsm, to get the benefit (so JJ_GIN=3).  The first of those two things could probably be adopted for real, but the second probably is not acceptable.  What is the right way to do this?  Could a variant of RecordFreeIndexPage bubble the free space up the map immediately rather than waiting for a vacuum?  It would only have to move up until it found a page with freespace already recorded in it, which the vast majority of the time would mean observing up one level and then not writing to it, assuming the pending list pages remain well clustered.

You make a good case for action here since insert only tables with GIN indexes on text are a common use case for GIN. 

Why would vacuuming the FSM be unacceptable? With a large gin_pending_list_limit it makes sense.

If it is unacceptable, perhaps we can avoid calling it every time, or simply have FreeSpaceMapVacuum() terminate more quickly on some kind of 80/20 heuristic for this case.

Couple of questions here...

* the docs say "it's desirable to have pending-list cleanup occur in the background", but there is no way to invoke that, except via VACUUM. I think we need a separate function to be able to call this as a background action. If we had that, we wouldn't need much else, would we?

* why do we have two parameters: gin_pending_list_limit and fastupdate? What happens if we set gin_pending_list_limit but don't set fastupdate?

* how do we know how to set that parameter? Is there a way of knowing gin_pending_list_limit has been reached?

This and the OP seem like 9.5 open items to me.

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: FSM versus GIN pending list bloat

From
Heikki Linnakangas
Date:
On 08/04/2015 04:35 PM, Simon Riggs wrote:
> This and the OP seem like 9.5 open items to me.

Why? This is nothing new in 9.5.

- Heikki




Re: FSM versus GIN pending list bloat

From
Simon Riggs
Date:
On 4 August 2015 at 14:55, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
On 08/04/2015 04:35 PM, Simon Riggs wrote:
This and the OP seem like 9.5 open items to me.

Why? This is nothing new in 9.5.

gin_pending_list_limit is new in 9.5

We're in Alpha, so if something has been added and isn't very usable, we should change it while we can.
 
--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: FSM versus GIN pending list bloat

From
Andres Freund
Date:
On 2015-08-04 14:59:11 +0100, Simon Riggs wrote:
> On 4 August 2015 at 14:55, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> 
> > On 08/04/2015 04:35 PM, Simon Riggs wrote:
> >
> >> This and the OP seem like 9.5 open items to me.
> >>
> >
> > Why? This is nothing new in 9.5.
> 
> 
> gin_pending_list_limit is new in 9.5
> 
> We're in Alpha, so if something has been added and isn't very usable, we
> should change it while we can.

The only thing that variable does is change what the pending size limit
is determined by. Previously it was work_mem, now it's
gin_pending_list_limit. Imo that has pretty much nothing to do with not
registering pages as free.

Andres



Re: FSM versus GIN pending list bloat

From
Simon Riggs
Date:
On 4 August 2015 at 15:18, Andres Freund <andres@anarazel.de> wrote:
On 2015-08-04 14:59:11 +0100, Simon Riggs wrote:
> On 4 August 2015 at 14:55, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>
> > On 08/04/2015 04:35 PM, Simon Riggs wrote:
> >
> >> This and the OP seem like 9.5 open items to me.
> >>
> >
> > Why? This is nothing new in 9.5.
>
>
> gin_pending_list_limit is new in 9.5
>
> We're in Alpha, so if something has been added and isn't very usable, we
> should change it while we can.

The only thing that variable does is change what the pending size limit
is determined by. Previously it was work_mem, now it's
gin_pending_list_limit. Imo that has pretty much nothing to do with not
registering pages as free.

We've made a change to the way GIN fast update works and that needs to be an effective change, not one that immediately triggers a secondary annoyance for the user. I've asked some questions; they may turn into actions, or not, but they are open items.

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: FSM versus GIN pending list bloat

From
Tom Lane
Date:
Simon Riggs <simon@2ndQuadrant.com> writes:
> On 4 August 2015 at 15:18, Andres Freund <andres@anarazel.de> wrote:
>> The only thing that variable does is change what the pending size limit
>> is determined by. Previously it was work_mem, now it's
>> gin_pending_list_limit. Imo that has pretty much nothing to do with not
>> registering pages as free.

> We've made a change to the way GIN fast update works and that needs to be
> an effective change, not one that immediately triggers a secondary
> annoyance for the user.

Said annoyance has been there since day one, no?

> I've asked some questions; they may turn into
> actions, or not, but they are open items.

If it's not a new problem introduced by 9.5, I don't think it's
appropriate to insist that 9.5 must solve it.
        regards, tom lane



Re: FSM versus GIN pending list bloat

From
Alvaro Herrera
Date:
Jeff Janes wrote:

> The attached proof of concept patch greatly improves the bloat for both the
> insert and the update cases.  You need to turn on both features: adding the
> pages to fsm, and vacuuming the fsm, to get the benefit (so JJ_GIN=3).  The
> first of those two things could probably be adopted for real, but the
> second probably is not acceptable.  What is the right way to do this?

I think that's the right way, only that I wouldn't call FSMVacuum()
inside the loop but only once you're out of it.

FWIW I had to add a FSMVacuum call somewhere in the BRIN code also,
which I didn't like at all.  If there's a way to optimize this "bubbling
up" of some individual page all the way to the root level, that would
probably be a lot better than what it's currently doing.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: FSM versus GIN pending list bloat

From
Jeff Janes
Date:
On Tue, Aug 4, 2015 at 1:39 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
On 4 August 2015 at 06:03, Jeff Janes <jeff.janes@gmail.com> wrote:
 
The attached proof of concept patch greatly improves the bloat for both the insert and the update cases.  You need to turn on both features: adding the pages to fsm, and vacuuming the fsm, to get the benefit (so JJ_GIN=3).  The first of those two things could probably be adopted for real, but the second probably is not acceptable.  What is the right way to do this?  Could a variant of RecordFreeIndexPage bubble the free space up the map immediately rather than waiting for a vacuum?  It would only have to move up until it found a page with freespace already recorded in it, which the vast majority of the time would mean observing up one level and then not writing to it, assuming the pending list pages remain well clustered.

You make a good case for action here since insert only tables with GIN indexes on text are a common use case for GIN. 

Why would vacuuming the FSM be unacceptable? With a large gin_pending_list_limit it makes sense.

But with a smallish gin_pending_list_limit (like the default 4MB) this could be called a lot (multiple times a second during some spurts), and would read the entire fsm each time.  
 

If it is unacceptable, perhaps we can avoid calling it every time, or simply have FreeSpaceMapVacuum() terminate more quickly on some kind of 80/20 heuristic for this case.

Or maybe it could be passed a range of blocks which need vacuuming, so it concentrated on that range.

But from the README file, it sounds like it is already supposed to be bubbling up.  I'll have to see just whats going on there when I get a chance.

Cheers,

Jeff

Re: FSM versus GIN pending list bloat

From
Jeff Janes
Date:
On Tue, Aug 4, 2015 at 6:35 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
On 4 August 2015 at 09:39, Simon Riggs <simon@2ndquadrant.com> wrote:
On 4 August 2015 at 06:03, Jeff Janes <jeff.janes@gmail.com> wrote:
 
The attached proof of concept patch greatly improves the bloat for both the insert and the update cases.  You need to turn on both features: adding the pages to fsm, and vacuuming the fsm, to get the benefit (so JJ_GIN=3).  The first of those two things could probably be adopted for real, but the second probably is not acceptable.  What is the right way to do this?  Could a variant of RecordFreeIndexPage bubble the free space up the map immediately rather than waiting for a vacuum?  It would only have to move up until it found a page with freespace already recorded in it, which the vast majority of the time would mean observing up one level and then not writing to it, assuming the pending list pages remain well clustered.

You make a good case for action here since insert only tables with GIN indexes on text are a common use case for GIN. 

Why would vacuuming the FSM be unacceptable? With a large gin_pending_list_limit it makes sense.

If it is unacceptable, perhaps we can avoid calling it every time, or simply have FreeSpaceMapVacuum() terminate more quickly on some kind of 80/20 heuristic for this case.

Couple of questions here...

* the docs say "it's desirable to have pending-list cleanup occur in the background", but there is no way to invoke that, except via VACUUM. I think we need a separate function to be able to call this as a background action. If we had that, we wouldn't need much else, would we?

I thought maybe the new bgworker framework would be a way to have a backend signal a bgworker to do the cleanup when it notices the pending list is getting large.  But that wouldn't directly fix this issue, because the bgworker still wouldn't recycle that space (without further changes), only vacuum workers do that currently.

But I don't think this could be implemented as an extension, because the signalling code has to be in core, so (not having studied the matter at all) I don't know if it is good fit for bgworker.  


* why do we have two parameters: gin_pending_list_limit and fastupdate? What happens if we set gin_pending_list_limit but don't set fastupdate?

Fastupdate is on by default.  If it were turned off, then gin_pending_list_limit would be mostly irrelevant for those tables. Fastupdate could have been implemented as a magic value (0 or -1) for gin_pending_list_limit but that would break backwards compatibility (and arguably would not be a better way of doing things, anyway).
 
* how do we know how to set that parameter? Is there a way of knowing gin_pending_list_limit has been reached?

I don't think there is an easier answer to that.  The trade offs are complex and depend on things like how well cached the parts of the index needing insertions are, how many lexemes/array elements are in an average document, and how many documents inserted near the same time as each other share lexemes in common.  And of course what you need to optimize for, latency or throughput, and if latency search latency or insert latency.

This and the OP seem like 9.5 open items to me.

I don't think so.  Freeing gin_pending_list_limit from being forcibly tied to work_mem is a good thing.  Even if I don't know exactly how to set gin_pending_list_limit, I know I don't want to be 4GB just because work_mem was set there for some temporary reason.  I'm happy to leave it at its default and let its fine tuning be a topic for people who really care about every microsecond of performance.

Cheers,

Jeff

Re: FSM versus GIN pending list bloat

From
Simon Riggs
Date:
On 4 August 2015 at 21:04, Jeff Janes <jeff.janes@gmail.com> wrote:
 
Couple of questions here...

* the docs say "it's desirable to have pending-list cleanup occur in the background", but there is no way to invoke that, except via VACUUM. I think we need a separate function to be able to call this as a background action. If we had that, we wouldn't need much else, would we?

I thought maybe the new bgworker framework would be a way to have a backend signal a bgworker to do the cleanup when it notices the pending list is getting large.  But that wouldn't directly fix this issue, because the bgworker still wouldn't recycle that space (without further changes), only vacuum workers do that currently.

But I don't think this could be implemented as an extension, because the signalling code has to be in core, so (not having studied the matter at all) I don't know if it is good fit for bgworker.  

We need to expose 2 functions:

1. a function to perform the recycling directly (BRIN has an equivalent function)

2. a function to see how big the pending list is for a particular index, i.e. do we need to run function 1? 

We can then build a bgworker that polls the pending list and issues a recycle if and when needed - which is how autovac started.
 
* why do we have two parameters: gin_pending_list_limit and fastupdate? What happens if we set gin_pending_list_limit but don't set fastupdate?

Fastupdate is on by default.  If it were turned off, then gin_pending_list_limit would be mostly irrelevant for those tables. Fastupdate could have been implemented as a magic value (0 or -1) for gin_pending_list_limit but that would break backwards compatibility (and arguably would not be a better way of doing things, anyway).
 
* how do we know how to set that parameter? Is there a way of knowing gin_pending_list_limit has been reached?

I don't think there is an easier answer to that.  The trade offs are complex and depend on things like how well cached the parts of the index needing insertions are, how many lexemes/array elements are in an average document, and how many documents inserted near the same time as each other share lexemes in common.  And of course what you need to optimize for, latency or throughput, and if latency search latency or insert latency.

So we also need a way to count the number of times the pending list is flushed. Perhaps record that on the metapage, so we can see how often it has happened - and another function to view the stats on that

This and the OP seem like 9.5 open items to me.

I don't think so.  Freeing gin_pending_list_limit from being forcibly tied to work_mem is a good thing.  Even if I don't know exactly how to set gin_pending_list_limit, I know I don't want to be 4GB just because work_mem was set there for some temporary reason.  I'm happy to leave it at its default and let its fine tuning be a topic for people who really care about every microsecond of performance.

OK, I accept this.

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: FSM versus GIN pending list bloat

From
Jeff Janes
Date:
On Tue, Aug 4, 2015 at 12:38 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
On Tue, Aug 4, 2015 at 1:39 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
On 4 August 2015 at 06:03, Jeff Janes <jeff.janes@gmail.com> wrote:
 
The attached proof of concept patch greatly improves the bloat for both the insert and the update cases.  You need to turn on both features: adding the pages to fsm, and vacuuming the fsm, to get the benefit (so JJ_GIN=3).  The first of those two things could probably be adopted for real, but the second probably is not acceptable.  What is the right way to do this?  Could a variant of RecordFreeIndexPage bubble the free space up `the map immediately rather than waiting for a vacuum?  It would only have to move up until it found a page with freespace already recorded in it, which the vast majority of the time would mean observing up one level and then not writing to it, assuming the pending list pages remain well clustered.

You make a good case for action here since insert only tables with GIN indexes on text are a common use case for GIN. 

Why would vacuuming the FSM be unacceptable? With a large gin_pending_list_limit it makes sense.

But with a smallish gin_pending_list_limit (like the default 4MB) this could be called a lot (multiple times a second during some spurts), and would read the entire fsm each time.  
 

If it is unacceptable, perhaps we can avoid calling it every time, or simply have FreeSpaceMapVacuum() terminate more quickly on some kind of 80/20 heuristic for this case.

Or maybe it could be passed a range of blocks which need vacuuming, so it concentrated on that range.

But from the README file, it sounds like it is already supposed to be bubbling up.  I'll have to see just whats going on there when I get a chance.

Before making changes to the FSM code to make immediate summarization possible, I decided to quantify the effect of vacuuming the entire fsm.  Going up to 5 GB of index size, the time taken to vacuum the entire FSM one time for each GIN_NDELETE_AT_ONCE was undetectable.

Based on that, I made this patch which vacuums it one time per completed ginInsertCleanup, which should be far less than once per GIN_NDELETE_AT_ONCE.

I would be interested in hearing what people with very large GIN indexes think of it.  It does seem like at some point the time needed must become large, but from what I can tell that point is way beyond what someone is likely to have for an index on an unpartitioned table.


I have a simple test case that inserts an array of 101 md5 digests into each row.  With 10_000 of these rows inserted into an already indexed table, I get 40MB for the table and 80MB for the index unpatched.  With the patch, I get 7.3 MB for the index.

Cheers,

Jeff


Attachment

Re: FSM versus GIN pending list bloat

From
Fujii Masao
Date:
On Wed, Aug 5, 2015 at 5:50 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On 4 August 2015 at 21:04, Jeff Janes <jeff.janes@gmail.com> wrote:
>
>>>
>>> Couple of questions here...
>>>
>>> * the docs say "it's desirable to have pending-list cleanup occur in the
>>> background", but there is no way to invoke that, except via VACUUM. I think
>>> we need a separate function to be able to call this as a background action.
>>> If we had that, we wouldn't need much else, would we?
>>
>>
>> I thought maybe the new bgworker framework would be a way to have a
>> backend signal a bgworker to do the cleanup when it notices the pending list
>> is getting large.  But that wouldn't directly fix this issue, because the
>> bgworker still wouldn't recycle that space (without further changes), only
>> vacuum workers do that currently.
>>
>> But I don't think this could be implemented as an extension, because the
>> signalling code has to be in core, so (not having studied the matter at all)
>> I don't know if it is good fit for bgworker.
>
>
> We need to expose 2 functions:
>
> 1. a function to perform the recycling directly (BRIN has an equivalent
> function)
>
> 2. a function to see how big the pending list is for a particular index,
> i.e. do we need to run function 1?

Probably you can use pgstatginindex() that pgstattuple contrib module provides
for this case.

Regards,

-- 
Fujii Masao



Re: FSM versus GIN pending list bloat

From
Robert Haas
Date:
On Mon, Aug 10, 2015 at 1:16 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
> I have a simple test case that inserts an array of 101 md5 digests into each
> row.  With 10_000 of these rows inserted into an already indexed table, I
> get 40MB for the table and 80MB for the index unpatched.  With the patch, I
> get 7.3 MB for the index.

Uh, wow.  Seems like we should do something about this.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: FSM versus GIN pending list bloat

From
Masahiko Sawada
Date:
On Thu, Sep 3, 2015 at 3:57 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Mon, Aug 10, 2015 at 1:16 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
>> I have a simple test case that inserts an array of 101 md5 digests into each
>> row.  With 10_000 of these rows inserted into an already indexed table, I
>> get 40MB for the table and 80MB for the index unpatched.  With the patch, I
>> get 7.3 MB for the index.
>
> Uh, wow.  Seems like we should do something about this.
>

I looked into this patch.
The patch is applied cleanly, and complied without error.
I think it works as we expected, but the patch has some unnecessary white space.
Also the today document says that pending list is cleaned up by only
vacuum, we need to document about this?

Regards,

--
Masahiko Sawada