Re: [PERFORM] Realtime VACUUM, was: performance of insert/delete/update - Mailing list pgsql-hackers

From Merlin Moncure
Subject Re: [PERFORM] Realtime VACUUM, was: performance of insert/delete/update
Date
Msg-id as2rhg$178u$1@news.hub.org
Whole thread Raw
In response to [PERFORM] Realtime VACUUM, was: performance of insert/delete/update  ("Curtis Faith" <curtis@galtair.com>)
List pgsql-hackers
How about OPTIMIZE?

eg. optimize customers

instead of analyze, could be paired with agressive

so, OPTIMIZE AGREESSIVE

very much a glass half empty, half full type thing.  vacuum is not a
problem, its a solution.

Merlin


""Curtis Faith"" <curtis@galtair.com> wrote in message
news:DMEEJMCDOJAKPPFACMPMIEIDCFAA.curtis@galtair.com...
> tom lane wrote:
> > Sure, it's just shuffling the housekeeping work from one place to
> > another.  The thing that I like about Postgres' approach is that we
> > put the housekeeping in a background task (VACUUM) rather than in the
> > critical path of foreground transaction commit.
>
> Thinking with my marketing hat on, MVCC would be a much bigger win if
VACUUM
> was not required (or was done automagically). The need for periodic VACUUM
> just gives ammunition to the PostgreSQL opponents who can claim we are
> deferring work but that it amounts to the same thing.
>
> A fully automatic background VACUUM will significantly reduce but will not
> eliminate this perceived weakness.
>
> However, it always seemed to me there should be some way to reuse the
space
> more dynamically and quickly than a background VACUUM thereby reducing the
> percentage of tuples that are expired in heavy update cases. If only a
very
> tiny number of tuples on the disk are expired this will reduce the
aggregate
> performance/space penalty of MVCC into insignificance for the majority of
> uses.
>
> Couldn't we reuse tuple and index space as soon as there are no
transactions
> that depend on the old tuple or index values. I have imagined that this
was
> always part of the long-term master plan.
>
> Couldn't we keep a list of dead tuples in shared memory and look in the
list
> first when deciding where to place new values for inserts or updates so we
> don't have to rely on VACUUM (even a background one)? If there are expired
> tuple slots in the list these would be used before allocating a new slot
from
> the tuple heap.
>
> The only issue is determining the lowest transaction ID for in-process
> transactions which seems relatively easy to do (if it's not already done
> somewhere).
>
> In the normal shutdown and startup case, a tuple VACUUM could be performed
> automatically. This would normally be very fast since there would not be
many
> tuples in the list.
>
> Index slots would be handled differently since these cannot be substituted
> one for another. However, these could be recovered as part of every index
> page update. Pages would be scanned before being written and any expired
> slots that had transaction ID's lower than the lowest active slot would be
> removed. This could be done for non-leaf pages as well and would result in
> only reorganizing a page that is already going to be written thereby not
> adding much to the overall work.
>
> I don't think that internal pages that contain pointers to values in nodes
> further down the tree that are no longer in the leaf nodes because of this
> partial expired entry elimination will cause a problem since searches and
> scans will still work fine.
>
> Does VACUUM do something that could not be handled in this realtime
manner?
>
> - Curtis
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org




pgsql-hackers by date:

Previous
From: Jim Beckstrom
Date:
Subject: Re: [PERFORM] Realtime VACUUM, was: performance of insert/delete/update
Next
From: "Merlin Moncure"
Date:
Subject: Re: Need Quote for 7.3