FSM versus GIN pending list bloat - Mailing list pgsql-hackers

From Jeff Janes
Subject FSM versus GIN pending list bloat
Date
Msg-id CAMkU=1xfE1MnGMkv655hB8jCs3PBTb4S5H+FnQv8kcmYzyeBDQ@mail.gmail.com
Responses Re: FSM versus GIN pending list bloat  (Heikki Linnakangas <hlinnaka@iki.fi>)
Re: FSM versus GIN pending list bloat  (Simon Riggs <simon@2ndQuadrant.com>)
Re: FSM versus GIN pending list bloat  (Alvaro Herrera <alvherre@2ndquadrant.com>)
List pgsql-hackers
For a GIN index with fastupdate turned on, both user backends and the autoanalyze routine will clear out the pending list, pushing the entries into the normal index structure and deleting the pages used by the pending list.  But those deleted pages are not added to the free space map until a vacuum is done.  This leads to horrible bloat on insert-only tables: they are never vacuumed, so the pending list space is never reused.  And the pending list is very space-inefficient to start with, even compared to the old-style posting lists and especially compared to the new compressed ones.  (If its pages were aggressively recycled, this inefficiency wouldn't be much of a problem.)
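To make the effect concrete, here is a toy C model of the situation described above. It is a sketch under stated assumptions, not PostgreSQL code: every name is hypothetical, only pages consumed by the pending list are counted, and growth of the main index structure is ignored. Each cycle fills the pending list with a fixed number of pages and then cleans it up; the deleted pages are either recorded as reusable at once or stay invisible until a VACUUM that, on an insert-only table, never runs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model, not PostgreSQL code. It counts only the pages
 * consumed by the GIN pending list and ignores the main index structure. */
typedef struct {
    unsigned rel_pages;   /* pages the relation has been extended to */
    unsigned free_known;  /* deleted pages the FSM knows about (reusable) */
    unsigned free_hidden; /* deleted pages not yet recorded in the FSM */
} ToyIndex;

/* Get one page for the pending list: reuse a page the FSM knows about
 * if there is one, otherwise extend the relation. */
static void toy_get_page(ToyIndex *idx)
{
    if (idx->free_known > 0)
        idx->free_known--;
    else
        idx->rel_pages++;
}

/* One fill-and-cleanup cycle. Cleanup moves the entries into the main
 * structure and deletes the pending-list pages; whether those pages are
 * reusable right away depends on when they reach the FSM. */
static void toy_pending_cycle(ToyIndex *idx, unsigned pages, bool record_now)
{
    for (unsigned i = 0; i < pages; i++)
        toy_get_page(idx);
    if (record_now)
        idx->free_known += pages;   /* recorded in the FSM immediately */
    else
        idx->free_hidden += pages;  /* invisible until a VACUUM runs */
}

/* Run `cycles` cycles with no VACUUM at all (the insert-only case) and
 * return the final relation size in pages. */
unsigned toy_simulate(unsigned cycles, unsigned pages, bool record_now)
{
    ToyIndex idx = {0, 0, 0};

    for (unsigned c = 0; c < cycles; c++)
        toy_pending_cycle(&idx, pages, record_now);
    /* Sanity check: free pages can never outnumber allocated pages. */
    assert(idx.free_known + idx.free_hidden <= idx.rel_pages);
    return idx.rel_pages;
}
```

With 100 cycles of 10 pages each, `toy_simulate(100, 10, false)` grows to 1000 pages, while `toy_simulate(100, 10, true)` stays at 10, because every cycle after the first reuses the pages freed by the previous one.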

Even on a table that receives mostly updates after its initial population (and so is vacuumed regularly), there is a lot of bloat with the default autovacuum settings.

The attached proof-of-concept patch greatly reduces the bloat in both the insert and the update cases.  To get the benefit you need to turn on both features, adding the pages to the FSM and vacuuming the FSM (so JJ_GIN=3).  The first of those could probably be adopted for real, but the second probably is not acceptable.  What is the right way to do this?  Could a variant of RecordFreeIndexPage bubble the free space up the map immediately rather than waiting for a vacuum?  It would only have to move up until it found a page with free space already recorded in it, which the vast majority of the time would mean looking one level up and then not writing to it, assuming the pending list pages remain well clustered.
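The bubble-up idea can be illustrated with a toy max-tree in C. This is a hedged sketch, not PostgreSQL's FSM code: the real map is a hierarchy of FSM pages, each holding its own binary tree, whereas this model collapses everything into one small array-based binary tree, and all names are hypothetical. Recording only the leaf leaves the page invisible from the root until a vacuum-style pass recomputes the upper nodes; the variant walks upward and stops at the first ancestor that already advertises at least as much free space. It only handles increases in free space, which is the case for newly deleted pages.

```c
#include <assert.h>

/* Hypothetical toy free-space tree (not PostgreSQL's FSM): a complete
 * binary tree stored in an array, node i has children 2i+1 and 2i+2,
 * and every internal node should hold the maximum of its children.
 * Leaves stand for index pages. */
#define TOY_LEAVES 8
#define TOY_NODES (2 * TOY_LEAVES - 1)
#define TOY_FIRST_LEAF (TOY_LEAVES - 1)

typedef struct {
    unsigned char avail[TOY_NODES];
} ToyFsm;

/* Record a free page by writing only its leaf; upper nodes are stale
 * until toy_vacuum_fsm() runs. */
void toy_record_leaf_only(ToyFsm *fsm, int page, unsigned char avail)
{
    assert(page >= 0 && page < TOY_LEAVES);
    fsm->avail[TOY_FIRST_LEAF + page] = avail;
}

/* Proposed variant: write the leaf, then walk toward the root and stop at
 * the first ancestor that already shows at least this much free space.
 * Returns how many ancestors had to be written. */
int toy_record_and_bubble(ToyFsm *fsm, int page, unsigned char avail)
{
    int node = TOY_FIRST_LEAF + page;
    int writes = 0;

    assert(page >= 0 && page < TOY_LEAVES);
    fsm->avail[node] = avail;
    while (node > 0) {
        int parent = (node - 1) / 2;
        if (fsm->avail[parent] >= avail)
            break;                  /* already visible from above: stop */
        fsm->avail[parent] = avail;
        writes++;
        node = parent;
    }
    return writes;
}

/* Vacuum-style pass: recompute every internal node, bottom up. */
void toy_vacuum_fsm(ToyFsm *fsm)
{
    for (int i = TOY_FIRST_LEAF - 1; i >= 0; i--) {
        unsigned char l = fsm->avail[2 * i + 1];
        unsigned char r = fsm->avail[2 * i + 2];
        fsm->avail[i] = l > r ? l : r;
    }
}

/* A search from the root can find a page only if the root shows it. */
int toy_root_sees(const ToyFsm *fsm, unsigned char needed)
{
    return fsm->avail[0] >= needed;
}
```

In an empty 8-leaf tree, bubbling page 2 writes all 3 ancestors, but bubbling its neighbour page 3 afterwards writes none: the shared parent already shows the space, which mirrors the "look one level up and don't write" case for well-clustered pending list pages.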

Or would a completely different approach be better, such as managing the vacated pending list pages directly in the index without going through the FSM?
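One possible shape for the in-index alternative is a free list threaded through the deleted pages, with its head kept in the metapage. The C below is a hypothetical toy, not anything from the patch or from PostgreSQL: the structures and function names are invented for illustration, and it ignores WAL logging, buffer locking and crash safety, all of which a real implementation would need.

```c
#include <assert.h>

#define TOY_INVALID_BLOCK (-1)
#define TOY_MAX_PAGES 64

/* Hypothetical toy layout: page 0 is the metapage, which would hold the
 * free-list head; each deleted page stores the number of the next one. */
typedef struct {
    int next_free;   /* meaningful only while the page is on the free list */
    int in_use;
} ToyPage;

typedef struct {
    ToyPage pages[TOY_MAX_PAGES];
    int nblocks;     /* current relation length, including the metapage */
    int free_head;   /* would live in the metapage */
} ToyGinIndex;

void toy_init(ToyGinIndex *idx)
{
    for (int i = 0; i < TOY_MAX_PAGES; i++) {
        idx->pages[i].in_use = 0;
        idx->pages[i].next_free = TOY_INVALID_BLOCK;
    }
    idx->pages[0].in_use = 1;         /* the metapage */
    idx->nblocks = 1;
    idx->free_head = TOY_INVALID_BLOCK;
}

/* Pending-list cleanup: push a deleted page onto the free list. */
void toy_free_page(ToyGinIndex *idx, int blkno)
{
    assert(blkno > 0 && blkno < idx->nblocks && idx->pages[blkno].in_use);
    idx->pages[blkno].in_use = 0;
    idx->pages[blkno].next_free = idx->free_head;
    idx->free_head = blkno;
}

/* New pending-list page: pop from the free list, else extend. */
int toy_new_page(ToyGinIndex *idx)
{
    int blkno = idx->free_head;

    if (blkno != TOY_INVALID_BLOCK)
        idx->free_head = idx->pages[blkno].next_free;
    else {
        assert(idx->nblocks < TOY_MAX_PAGES);
        blkno = idx->nblocks++;
    }
    idx->pages[blkno].in_use = 1;
    return blkno;
}
```

A LIFO list keeps push and pop O(1) and needs only one extra metapage field, and freed pages are reusable at once without any vacuum. The trade-off is that every allocation and free touches the metapage, which could become a contention point.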

Cheers,

Jeff
Attachment
