Re: Visibility map thoughts - Mailing list pgsql-hackers

From Gokulakannan Somasundaram
Subject Re: Visibility map thoughts
Date
Msg-id 9362e74e0711050813o7eb2a353q7a7ea6cc9b86b608@mail.gmail.com
In response to Re: Visibility map thoughts  ("Heikki Linnakangas" <heikki@enterprisedb.com>)
List pgsql-hackers


On 11/5/07, Heikki Linnakangas <heikki@enterprisedb.com> wrote:
Gokulakannan Somasundaram wrote:
> a) The inserts won't increase the size of the table. If the table does grow, it has
> to lock one full page of the visibility map, and this is not suitable for tables
> which are short-lived, like partitioned tables

The overhead of locking a page is very small.

Actually, extending a heap only needs to touch the visibility map when
we need a new visibility map page, if we initialize all bits to zero.
Like we do already anyway.

As you have pointed out, one page in the visibility map covers roughly 65,536 pages in the heap. So even if we lock the visibility map page only for a short time, that lock affects every scan which needs to access any of those pages.
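
To make the arithmetic concrete, here is a rough C sketch (not PostgreSQL source; the names are made up, and the flat bit layout ignores the page header, so the real figure is slightly smaller) of how a one-bit-per-heap-page map addresses its bits with the default 8 kB block size. One 8192-byte map page holds 8192 * 8 = 65,536 bits, so it covers about 65,536 heap pages, i.e. roughly 512 MB of heap:

#include <stdint.h>
#include <stdio.h>

#define BLCKSZ              8192            /* default page size in bytes */
#define PAGES_PER_MAP_PAGE  (BLCKSZ * 8)    /* 65,536 heap pages per map page */

int main(void)
{
    uint32_t heap_block = 1000000;          /* arbitrary example block number */

    /* which map page, byte and bit describe this heap page? */
    uint32_t map_page = heap_block / PAGES_PER_MAP_PAGE;
    uint32_t map_byte = (heap_block % PAGES_PER_MAP_PAGE) / 8;
    uint32_t map_bit  = heap_block % 8;

    printf("heap block %u -> map page %u, byte %u, bit %u\n",
           heap_block, map_page, map_byte, map_bit);
    printf("one map page covers %d heap pages (~%d MB of heap)\n",
           PAGES_PER_MAP_PAGE, PAGES_PER_MAP_PAGE * (BLCKSZ / 1024) / 1024);
    return 0;
}

So a lock on a single map page covers on the order of half a gigabyte of heap, which is why even a brief lock can be felt by many concurrent scans.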

> b) Even if the inserts don't increase the size of the table, they might make the
> DSM useless, if a lot of inserts keep converting the all-visible pages to
> uncertain ones. For that matter, even the deletes and updates are also going
> to turn a lot of pages into uncertain ones.

Sure. If you have a lot of (random) inserts/updates/deletes, it becomes
much less useful.

A small mitigating factor is that an insert/update/delete will fetch the
heap page to memory anyway. Therefore having to access it just after the
update is cheap. This helps inserts in particular, because after the
inserting transaction is < OldestXmin, we can set the bit again.

But we can set this bit only after a vacuum. The bit might not be set until the vacuum process pitches in and marks the page.
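
To illustrate that timing, here is a simplified model (not PostgreSQL source; the struct and function names are invented, and XID wraparound is ignored) of the test that decides whether a page's all-visible bit may be set. Every tuple's inserting transaction has to be committed and older than OldestXmin, and nothing may have deleted it, which is why the bit is normally set during vacuum rather than right after the inserting transaction commits:

#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

typedef struct
{
    TransactionId xmin;            /* inserting transaction */
    TransactionId xmax;            /* deleting transaction, 0 if none */
    bool          xmin_committed;  /* hint: inserter known to have committed */
} SimpleTuple;

/*
 * Return true only if every tuple on the page is visible to everyone:
 * inserted by a committed transaction older than OldestXmin and never
 * deleted.  Only then may the page's bit in the visibility map be set.
 */
bool
page_is_all_visible(const SimpleTuple *tuples, int ntuples,
                    TransactionId oldest_xmin)
{
    for (int i = 0; i < ntuples; i++)
    {
        if (!tuples[i].xmin_committed)
            return false;           /* inserter not known committed yet */
        if (tuples[i].xmin >= oldest_xmin)
            return false;           /* some snapshot may still not see it */
        if (tuples[i].xmax != 0)
            return false;           /* deleted or being deleted */
    }
    return true;
}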

> c) The visibility map gets useless when there is a long-running batch query /
> periodic background queries which run for a long time

Yeah, long running transactions are a bitch in many ways.

But these are determined by business conditions, and we need to provide efficient solutions to deal with them.

> d) More updates - more blocks of uncertainty - the space used by the DSM and the
> references made to the DSM are just an overhead
> e) A lot of the time, people may not need index-only scans. Again this gets to be
> an overhead

The beauty of this approach is that the overhead is very small.

> f) If there are scheduled reboots, the DSM is lost, and there are periodic slow-downs
> in the queries during the time the DSM gets re-constructed.

That's rubbish.

I think the DSM is not WAL-logged. So when it gets reconstructed every time for a big table, isn't that an overhead?

> I am not opposing this, as it is a redundant feature for Thick indexes.
> After all, every one of us wants Postgres to be the fastest one in the world.

And also the easiest to maintain, most space-efficient, most reliable
and so forth...

Provided it supports vacuuming and freezing.

> But because DSM has an inherent assumption that a lot of tables will become
> static and all the tuples would be visible to everyone. If there are such
> tables, then definitely the Thick index becomes an overhead in terms of space.
> But DSM should not become an overhead at any cost, as it is a memory-resident
> one at all times and also always gets into the lifecycle of a query. The only
> way to achieve it is to make it a dual-purpose one. It should help vacuum,
> freezing and visibility checks.

I don't understand this paragraph.

Because updates, inserts and deletes reduce the utility of the visibility map, it seems to be designed for mostly static tables, which don't see many of these operations.
 


--
Thanks,
Gokul.
CertoSQL Project,
Allied Solution Group.
(www.alliedgroups.com)
