Re: [GENERAL] Buglist - Mailing list pgsql-hackers

From: Jan Wieck
Subject: Re: [GENERAL] Buglist
Msg-id: 3F46423A.3050205@Yahoo.com
In response to: Re: [GENERAL] Buglist ("Shridhar Daithankar" <shridhar_daithankar@persistent.co.in>)
Responses: Re: [GENERAL] Buglist (Manfred Koizar <mkoi-pg@aon.at>)
           Re: [GENERAL] Buglist (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-hackers

Okay, my proposal would be to have a VACUUM mode in which VACUUM tells
the buffer manager to return a page only if it is already in memory, and
to return a "not cached" indication if it would have to read the page
from disk; VACUUM then simply skips that page. This probably needs some
modifications in VACUUM's FSM handling, but basically that's it. It will
still cause I/O for the resulting index bulk cleaning, so I don't know
how efficient it will be after all.
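A toy model of the proposed CACHEONLY mode may make the idea more concrete. It is a sketch under stated assumptions, not PostgreSQL's buffer manager: `BufferPool`, `read_page`, the `cache_only` flag, the `NOT_CACHED` sentinel and `cacheonly_vacuum` are all hypothetical names, pages are reduced to counters of dead and free tuple slots, and the index bulk cleaning mentioned above is left out.

```python
from dataclasses import dataclass

# Hypothetical sentinel returned instead of performing a disk read.
NOT_CACHED = object()


@dataclass
class Page:
    dead: int        # dead tuples VACUUM could reclaim on this page
    free_slots: int  # tuple slots that are already free


class BufferPool:
    """Toy buffer manager (hypothetical): `pages` is the whole relation,
    `cached` is the set of page numbers currently held in shared buffers."""

    def __init__(self, pages, cached):
        self.pages = pages
        self.cached = set(cached)
        self.disk_reads = 0

    def read_page(self, page_no, cache_only=False):
        if page_no in self.cached:
            return self.pages[page_no]
        if cache_only:
            return NOT_CACHED        # refuse to do I/O for this page
        self.disk_reads += 1         # normal path: simulate a disk read
        self.cached.add(page_no)
        return self.pages[page_no]


def cacheonly_vacuum(pool, fsm):
    """Vacuum only the pages already in memory and skip the rest.

    Returns the number of reclaimed tuples (the figure autovacuum would
    look at) and records each visited page's free space in `fsm`.
    Index cleanup is not modeled.
    """
    reclaimed = 0
    for page_no in sorted(pool.pages):
        page = pool.read_page(page_no, cache_only=True)
        if page is NOT_CACHED:
            continue
        reclaimed += page.dead
        page.free_slots += page.dead
        page.dead = 0
        fsm[page_no] = page.free_slots
    return reclaimed


pages = {0: Page(5, 0), 1: Page(3, 2), 2: Page(7, 0)}
pool = BufferPool(pages, cached=[0, 2])
fsm = {}
print(cacheonly_vacuum(pool, fsm), pool.disk_reads, fsm)  # 12 0 {0: 5, 2: 7}
```

Page 1 is never touched, so its dead tuples wait for a later (or a full) scan, and the FSM only learns about the pages that happened to be cached. That is exactly the FSM coverage concern raised further down in the thread.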

The number of vacuumed tuples returned will tell autovacuum how useful
this vacuum scan was. The less useful it is, the less frequently it will
be scheduled. There is no point in vacuuming a 50M row table every hour
when the average number of tuples reclaimed is in the hundreds. I don't
intend to avoid full table scans completely; I only intend to lower
their frequency. It will require some fuzzy logic in autovacuum, though,
to figure out whether a CACHEONLY vacuum of a table needs to run more or
less often to find more tuples.
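The feedback loop described above could look roughly like the following. The message leaves the "fuzzy logic" open, so this sketch substitutes a plain multiplicative back-off; the function name `next_interval`, the 0.1% / 1% usefulness thresholds and the 1-hour to 1-week bounds are hypothetical.

```python
def next_interval(hours, reclaimed, reltuples,
                  low=0.001, high=0.01, min_hours=1.0, max_hours=168.0):
    """Return the delay (in hours) before the next CACHEONLY vacuum.

    `reclaimed` is the tuple count the last scan returned; `reltuples`
    is the table size as recorded in pg_class. Thresholds are assumptions.
    """
    usefulness = reclaimed / reltuples if reltuples else 0.0
    if usefulness < low:        # scan found little: back off
        hours *= 2
    elif usefulness > high:     # scan was productive: come back sooner
        hours /= 2
    return max(min_hours, min(max_hours, hours))


# A 50M-row table whose scans keep reclaiming only a few hundred tuples.
interval = 1.0
schedule = []
for _ in range(9):
    interval = next_interval(interval, reclaimed=300, reltuples=50_000_000)
    schedule.append(interval)
print(schedule)  # [2.0, 4.0, 8.0, 16.0, 32.0, 64.0, 128.0, 168.0, 168.0]
```

Doubling after a poor scan and halving after a productive one keeps the cost of an unproductive table low while still reacting fairly quickly once a table starts accumulating dead tuples.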

So much for what I have in mind. Now, what are you proposing down there?
Where do you intend to keep that "per page stat", and what exactly is
going to maintain it? And please don't give us any vague "some other
resident process". That only shows you don't really know what it takes
for a process to be able to read or write data in PostgreSQL.


Jan

Shridhar Daithankar wrote:

> On 22 Aug 2003 at 11:03, Jan Wieck wrote:
>
>> Tom Lane wrote:
>>
>> > Jan Wieck <JanWieck@Yahoo.com> writes:
>> >> Shridhar Daithankar wrote:
>> >>> Umm.. What does FSM do then? I was under the impression that FSM stores
>> >>> page pointers and vacuum works on FSM information only. In that case, it
>> >>> wouldn't have to waste time finding out which pages to clean.
>> >
>> >> It's the other way around! VACUUM scans the tables to find and reclaim
>> >> free space and remembers that free space in the FSM.
>> >
>> > Right.  One big question mark in my mind about these "partial vacuum"
>> > proposals is whether they'd still allow adequate FSM information to be
>> > maintained.  If VACUUM isn't looking at most of the pages, there's no
>> > very good way to acquire info about where there's free space.
>>
>> That's why I think it needs one more pg_stat column to count the number
>> of vacuumed tuples. If one does
>>
>>      tuples_updated + tuples_deleted - tuples_vacuumed
>>
>> he'll get approximately the number of tuples a regular vacuum might be
>> able to reclaim. If that number is really small, no need for autovacuum
>> to cause any big trouble by scanning the relation.
>>
>> Another way to give autovacuum some hints would be to return some number
>> as commandtuples from VACUUM, like the number of tuples actually
>> vacuumed. That, together with the new number of reltuples in pg_class,
>> will tell autovacuum how frequently a relation really needs scanning.
>
> This kind of information does not really help autovacuum. If we are talking
> about modifying the backend stats collection algorithm so that vacuum does
> minimal work, it has to translate into a cheaper vacuum analyze that
> autovacuum can fire at will, any time. In the best case, another resident
> process like the stats collector can keep cleaning up the dead tuples.
>
> This information must be in terms of pages and actually be maintained as a
> per-page stat. Looking at tuple counts gives vacuum no idea how it is going
> to flush cache lines, either in PostgreSQL or in the OS. I doubt it will
> help the vacuum command itself become any lighter or more efficient.
>
> If it is easy to do, I would favour maintaining two page maps, as I mentioned
> in another mail: one for pages in cache but not locked by any transaction, and
> another for pages which have some free space. If it is rare for a page to be
> full, we can skip the latter. I think that could be good enough.
>
>
>
>
> Bye
>  Shridhar
>
> --
> Office Automation:    The use of computers to improve efficiency in the office    by
> removing anyone you would want to talk with over coffee.
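
The estimate proposed in the quoted message above (tuples_updated + tuples_deleted - tuples_vacuumed) can be sketched as a small helper. `tuples_vacuumed` was only a proposed pg_stat column at the time, and the names `TableStats`, `estimated_reclaimable` and `worth_full_scan`, as well as the 1% / 1000-tuple thresholds, are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class TableStats:
    tuples_updated: int   # cumulative UPDATEs (each leaves roughly one dead version)
    tuples_deleted: int   # cumulative DELETEs
    tuples_vacuumed: int  # hypothetical column: dead tuples already reclaimed


def estimated_reclaimable(s: TableStats) -> int:
    """Approximate number of dead tuples a regular VACUUM could reclaim."""
    return max(0, s.tuples_updated + s.tuples_deleted - s.tuples_vacuumed)


def worth_full_scan(s: TableStats, reltuples: float,
                    fraction: float = 0.01, minimum: int = 1000) -> bool:
    """Hypothetical policy: only scan when enough dead tuples are expected."""
    return estimated_reclaimable(s) >= max(minimum, fraction * reltuples)


stats = TableStats(tuples_updated=1200, tuples_deleted=300, tuples_vacuumed=1100)
print(estimated_reclaimable(stats), worth_full_scan(stats, 50_000_000))  # 400 False
```

As the quoted reply points out, a tuple count says nothing about which pages hold the dead tuples, so this only answers "whether" to scan, not "where".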


--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #

