
Re: Including Snapshot Info with Indexes - Mailing list pgsql-hackers

From	Gokulakannan Somasundaram
Subject	Re: Including Snapshot Info with Indexes
Date	October 20, 2007 03:54:18
Msg-id	9362e74e0710192054s666b6907l5227e96247f6ac7b@mail.gmail.com
In response to	Re: Including Snapshot Info with Indexes ("Gokulakannan Somasundaram" <gokul007@gmail.com>)
Responses	Re: Including Snapshot Info with Indexes
List	pgsql-hackers

Hi,
I think I have an initial implementation. It has some bugs, and I am working on fixing them. To show the advantages, I want to display the number of logical I/Os. To do that, I tried enabling the log_statement option in postgresql.conf, but it shows only the physical reads. What I wanted was a logical read count (the number of ReadBuffer calls, which is stored in the ReadBufferCount variable). So I have added this statistic to bufmgr.c (the function is BufferUsage, I suppose) to show both logical reads and physical reads. Is this an acceptable change?
I thought a logical read count would be helpful even for SQL tuning. If someone tunes SQL on a test system, pages might already be cached, and he would not know how much I/O the SQL is potentially capable of generating. Maybe we can also add a statistic to show how many of those ReadBuffer calls were for buffers that were already pinned.
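
To make the distinction concrete, here is a minimal sketch of a toy page cache that counts both kinds of reads. It is not PostgreSQL's bufmgr.c: the class ToyBufferPool, its read_buffer method, and the LRU eviction policy are all assumptions made for illustration only. Every call counts as a logical read; only a cache miss counts as a physical read.

```python
from collections import OrderedDict


class ToyBufferPool:
    """Toy LRU page cache (hypothetical; NOT PostgreSQL's buffer manager)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()   # page_id -> page contents
        self.logical_reads = 0       # incremented on every read_buffer() call
        self.physical_reads = 0      # incremented only on a cache miss

    def read_buffer(self, page_id):
        self.logical_reads += 1
        if page_id in self.pages:
            # Cache hit: no "disk" access, just mark as most recently used.
            self.pages.move_to_end(page_id)
        else:
            # Cache miss: simulate fetching the page from disk.
            self.physical_reads += 1
            if len(self.pages) >= self.capacity:
                self.pages.popitem(last=False)  # evict least recently used
            self.pages[page_id] = f"contents of page {page_id}"
        return self.pages[page_id]


# Scan the same three pages three times with a cache large enough to hold them.
pool = ToyBufferPool(capacity=4)
for _ in range(3):
    for page_id in (1, 2, 3):
        pool.read_buffer(page_id)

print("logical reads:", pool.logical_reads)
print("physical reads:", pool.physical_reads)
```

On a warm cache the physical count stays low while the logical count keeps growing with the work the query does, which is why the logical count is the more stable number for tuning on a test system.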

Expecting your comments.

Thanks,
Gokul.

On 10/14/07, Gokulakannan Somasundaram <gokul007@gmail.com> wrote:

On 10/14/07, Trevor Talbot <quension@gmail.com> wrote:
On 10/14/07, Gokulakannan Somasundaram <gokul007@gmail.com> wrote:

> http://www.databasecolumn.com/2007/09/one-size-fits-all.html

> > > The Vertica database (Monet is an open source version with the same
> > > principle) makes use of the very same principle. Use more disk space,
> > > since it is less costly, and optimize the data warehousing.

> What I meant there was, it has duplicated storage of certain columns of the
> table. A table with more than one projection always needs more space than a
> table with just one projection. By doing this they are reducing the number
> of disk operations. If they are duplicating columns of data to avoid reading
> unnecessary information, we are duplicating the snapshot information to
> avoid going to the table.

Was this about Vertica or MonetDB?  I saw that article a while ago,
and I didn't see anything that suggested Vertica duplicated data, just
that it organized it differently on disk.  What are you seeing as
being duplicated?

Hi Trevor,
This is a good paper to read about the basics of column-oriented databases:
http://db.lcs.mit.edu/projects/cstore/vldb.pdf
If you go to Section 2 (Data Model), the authors show the data model with a sample EMP table.

The example shows that the EMP table contains four columns: name, age, dept, salary.
From this table, projections are formed (the paper shows the creation of four projections for Example 1):
EMP1 (name, age)
EMP2 (dept, age, DEPT.floor)
EMP3 (name, salary)
DEPT1(dname, floor)

As you can see, the same column information is duplicated across different projections.
The advantage is that a query on name and age need not read the other columns. But the storage requirements go up, since there is redundancy. As you may know, increasing data redundancy helps selects at the cost of inserts, updates and deletes.

This is what I was trying to say.
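
The redundancy trade-off can be sketched with a few lines of Python. This is only an illustration of the counting argument, not how C-Store or Vertica store data: the sample rows, the DEPT_FLOOR mapping, and the helper names project and stored_values are made up for the example. It builds the three EMP projections listed above and compares how many column values are stored versus how many a name/age query has to read.

```python
# Hypothetical sample data; only the column names come from the paper's example.
EMP_COLUMNS = ("name", "age", "dept", "salary")
EMP_ROWS = [
    ("Alice", 30, "Math", 50000),
    ("Bob", 25, "EE", 40000),
    ("Carol", 41, "Math", 65000),
]
DEPT_FLOOR = {"Math": 1, "EE": 2}  # stands in for the DEPT table (dname -> floor)


def project(columns):
    """Build one projection as a dict of column name -> list of values."""
    projection = {}
    dept_idx = EMP_COLUMNS.index("dept")
    for col in columns:
        if col == "DEPT.floor":
            # Column pulled in from DEPT via the dept value of each EMP row.
            projection[col] = [DEPT_FLOOR[row[dept_idx]] for row in EMP_ROWS]
        else:
            idx = EMP_COLUMNS.index(col)
            projection[col] = [row[idx] for row in EMP_ROWS]
    return projection


def stored_values(*projections):
    """Count the individual column values held by the given projections."""
    return sum(len(values) for p in projections for values in p.values())


emp1 = project(("name", "age"))
emp2 = project(("dept", "age", "DEPT.floor"))
emp3 = project(("name", "salary"))

row_store_values = len(EMP_ROWS) * len(EMP_COLUMNS)   # one copy of each value
projection_values = stored_values(emp1, emp2, emp3)   # name and age appear twice
values_read_for_name_age = stored_values(emp1)        # a name/age query reads only EMP1

print("values in plain EMP table:", row_store_values)
print("values across EMP1..EMP3:", projection_values)
print("values read for a name/age query:", values_read_for_name_age)
```

The projections hold more values in total than the single table, but the name/age query touches fewer of them. Every insert, update or delete, on the other hand, has to maintain each projection that contains the affected columns.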

Thanks,
Gokul.
