Re: [PERFORM] Realtime VACUUM, was: performance of insert/delete/update - Mailing list pgsql-hackers

From Nicolai Tufar
Subject Re: [PERFORM] Realtime VACUUM, was: performance of insert/delete/update
Date
Msg-id 043701c2961d$9bbd7d20$8016a8c0@apb.com.tr
Whole thread Raw
In response to Re: [PERFORM] Realtime VACUUM, was: performance of insert/delete/update  (Bruce Momjian <pgman@candle.pha.pa.us>)
Responses Re: [PERFORM] Realtime VACUUM, was: performance of insert/delete/update  (Tommi Maekitalo <t.maekitalo@epgmbh.de>)
List pgsql-hackers
I always wandered if VACUUM is the right name for the porcess. Now, when
PostgreSQL
is actively challenging in Enterprise space, it might be a good idea to give
it a more
enterprise-like name. Try to think how it is looking for an outside person
to see
us, database professionals hold lenghty discussions about the ways we
vacuum a database. Why should you need to vacuum a database? Is it
dirty? In my personal opinion, something like "space reclaiming daemon",
"free-list organizer", "tuple recyle job" or "segment coalesce process"
would
sound more business-like .

Regards,
Nick


----- Original Message -----
From: "Bruce Momjian" <pgman@candle.pha.pa.us>
To: "Curtis Faith" <curtis@galtair.com>
Cc: "Tom Lane" <tgl@sss.pgh.pa.us>; "Ron Johnson" <ron.l.johnson@cox.net>;
"PgSQL Performance ML" <pgsql-performance@postgresql.org>;
<pgsql-hackers@postgresql.org>
Sent: Tuesday, November 26, 2002 9:09 PM
Subject: Re: [PERFORM] [HACKERS] Realtime VACUUM, was: performance of
insert/delete/update


>
> Good ideas.  I think the master solution is to hook the statistics
> daemon information into an automatic vacuum that could _know_ which
> tables need attention.
>
> --------------------------------------------------------------------------
-
>
> Curtis Faith wrote:
> > tom lane wrote:
> > > Sure, it's just shuffling the housekeeping work from one place to
> > > another.  The thing that I like about Postgres' approach is that we
> > > put the housekeeping in a background task (VACUUM) rather than in the
> > > critical path of foreground transaction commit.
> >
> > Thinking with my marketing hat on, MVCC would be a much bigger win if
VACUUM
> > was not required (or was done automagically). The need for periodic
VACUUM
> > just gives ammunition to the PostgreSQL opponents who can claim we are
> > deferring work but that it amounts to the same thing.
> >
> > A fully automatic background VACUUM will significantly reduce but will
not
> > eliminate this perceived weakness.
> >
> > However, it always seemed to me there should be some way to reuse the
space
> > more dynamically and quickly than a background VACUUM thereby reducing
the
> > percentage of tuples that are expired in heavy update cases. If only a
very
> > tiny number of tuples on the disk are expired this will reduce the
aggregate
> > performance/space penalty of MVCC into insignificance for the majority
of
> > uses.
> >
> > Couldn't we reuse tuple and index space as soon as there are no
transactions
> > that depend on the old tuple or index values. I have imagined that this
was
> > always part of the long-term master plan.
> >
> > Couldn't we keep a list of dead tuples in shared memory and look in the
list
> > first when deciding where to place new values for inserts or updates so
we
> > don't have to rely on VACUUM (even a background one)? If there are
expired
> > tuple slots in the list these would be used before allocating a new slot
from
> > the tuple heap.
> >
> > The only issue is determining the lowest transaction ID for in-process
> > transactions which seems relatively easy to do (if it's not already done
> > somewhere).
> >
> > In the normal shutdown and startup case, a tuple VACUUM could be
performed
> > automatically. This would normally be very fast since there would not be
many
> > tuples in the list.
> >
> > Index slots would be handled differently since these cannot be
substituted
> > one for another. However, these could be recovered as part of every
index
> > page update. Pages would be scanned before being written and any expired
> > slots that had transaction ID's lower than the lowest active slot would
be
> > removed. This could be done for non-leaf pages as well and would result
in
> > only reorganizing a page that is already going to be written thereby not
> > adding much to the overall work.
> >
> > I don't think that internal pages that contain pointers to values in
nodes
> > further down the tree that are no longer in the leaf nodes because of
this
> > partial expired entry elimination will cause a problem since searches
and
> > scans will still work fine.
> >
> > Does VACUUM do something that could not be handled in this realtime
manner?
> >
> > - Curtis
> >
> >
> >
> > ---------------------------(end of broadcast)---------------------------
> > TIP 4: Don't 'kill -9' the postmaster
> >
>
> --
>   Bruce Momjian                        |  http://candle.pha.pa.us
>   pgman@candle.pha.pa.us               |  (610) 359-1001
>   +  If your life is a hard drive,     |  13 Roberts Road
>   +  Christ can be your backup.        |  Newtown Square, Pennsylvania
19073
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: you can get off all lists at once with the unregister command
>     (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)
>


pgsql-hackers by date:

Previous
From: "Andrew John"
Date:
Subject: Re: How can i import database from MSSQL to Postgres?(NULL BODY)
Next
From: "Dave Page"
Date:
Subject: Re: [PERFORM] Realtime VACUUM, was: performance of insert/delete/update