Thread: Question about OID and TCID

Question about OID and TCID

From
Zhipan Wang
Date:
Hi,

I want to access part of a table on the disk sequentially, i,e., when I get to a tuple in the table, I need to read
severalpages of data in the table starting from this tuple. I guess CTID could be translated to physical address on the
diskto retrieve this tuple, right? If so, how do I use CTID to retrieve a particular tuple (or a page) in SQL? Can I
useOID to do this equally efficiently? 

Another question is: when I update a tuple in a table, this tuple will get a new CTID and it leaves a gap at the old
CTID,and when I insert a new tuple, it's appended to the end of the table, so the gap is always there. Does this mean
itactually inserts a new tuple and the out-dated tuple still occupies the space? How can I write the updated tuple back
toits original position to utilize disk space more efficiently? 

Thanks!

Re: Question about OID and TCID

From
Dmitriy Igrishin
Date:
Hey Zhipan,

2010/11/27 Zhipan Wang <wzhipan@soe.ucsc.edu>
Hi,

I want to access part of a table on the disk sequentially, i,e., when I get to a tuple in the table, I need to read several pages of data in the table starting from this tuple. I guess CTID could be translated to physical address on the disk to retrieve this tuple, right? If so, how do I use CTID to retrieve a particular tuple (or a page) in SQL? Can I use OID to do this equally efficiently?
Consider to use cursors to read sequentially by FETCH.
http://www.postgresql.org/docs/9.0/static/sql-fetch.html

Another question is: when I update a tuple in a table, this tuple will get a new CTID and it leaves a gap at the old CTID, and when I insert a new tuple, it's appended to the end of the table, so the gap is always there. Does this mean it actually inserts a new tuple and the out-dated tuple still occupies the space? How can I write the updated tuple back to its original position to utilize disk space more efficiently?
I believe that VACUUM works well on it.
http://www.postgresql.org/docs/9.0/static/sql-vacuum.html

Thanks!

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general



--
// Dmitriy.


Re: Question about OID and TCID

From
Guillaume Lelarge
Date:
Le 27/11/2010 09:52, Zhipan Wang a écrit :
> Hi,
>
> I want to access part of a table on the disk sequentially, i,e., when I get to a tuple in the table, I need to read
severalpages of data in the table starting from this tuple. 

You shouldn't rely on the order on disk. It will change as soon as you
update one.

> I guess CTID could be translated to physical address on the disk to retrieve this tuple, right?

Yes, first number is the page number. The second one is the record
number in that page.

> If so, how do I use CTID to retrieve a particular tuple (or a page) in SQL?

For example, SELECT * FROM yourtable WHERE ctid='(0,5)';

> Can I use OID to do this equally efficiently?

No, they don't reflect the order on disk.

> Another question is: when I update a tuple in a table, this tuple will get a new CTID and it leaves a gap at the old
CTID,and when I insert a new tuple, it's appended to the end of the table, so the gap is always there. Does this mean
itactually inserts a new tuple and the out-dated tuple still occupies the space? 

Yes. Other sessions could still need to see the old tuple values.

> How can I write the updated tuple back to its original position to utilize disk space more efficiently?

You don't need to. PostgreSQL will deal with that. As soon as you do a
VACUUM, PostgreSQL will be able to use the dead space for the next
UPDATE or INSERT query (if there is enough space of course).


--
Guillaume
 http://www.postgresql.fr
 http://dalibo.com

Re: Question about OID and TCID

From
Guillaume Lelarge
Date:
Please send your answer (also) to the list.

Le 28/11/2010 10:23, zhipan Wang a écrit :
> [...]
> Thanks a lot for your detailed reply. I am trying to implement the idea proposed
> in the paper Database Cracking. The basic idea is like this:
>
> When a query with a range predicate like "a<col" comes, a copy of the column col
> is created as a stand-alone table, and I need  to partially sort (so physically
> reorganize) the table, and insert a  node into the index to point to a tuple in
> the table, such that all  values in the new table before that tuple are less
> than a and all values in col after that tuple are greater than or equal to a. As
> more and more queries come in, the table becomes better ordered, and a qeury can
> benefit from sequential scan enabled by this order.
>

Finding a solution to your issue is quite hard, because we don't know
which issue you have. Anyways, some thoughts...

PostgreSQL can already use an index to quickly sort a table. But CLUSTER
may be a more appropriate solution for you. You have to remember that
you'll need to launch CLUSTER from time to time so that the table get
reorganized.

> So the in-place update is a vital part of the program, because a range of values
> of column col should be stored together on the disk. Is there any way to do such
> a in-place update without generating much extra overhead?
>

If you need in-place update, then it's possible that PostgreSQL is not
the right tool for you.

But, as I already said, it's difficult to know for sure as we don't know
what issue you're trying to fix.


--
Guillaume
 http://www.postgresql.fr
 http://dalibo.com

Re: Question about OID and TCID

From
Matthew Walden
Date:
> So the in-place update is a vital part of the program, because a range of values
> of column col should be stored together on the disk. Is there any way to do such
> a in-place update without generating much extra overhead?
>

Although in-place update is not possible, can he not use partitioning to at least try to store the data together based on the value of column col?

Re: Question about OID and TCID

From
Zhipan Wang
Date:
Since in-place seems not possible, I am trying to use arrays to store different ranges of that column, and it's fast to
fetcha big array, so I believe an array is stored continuously on disk. 

I can convert an array to a column easily by unnest() function, but I didn't find any function that converts a column
toan array. Is there a efficient method to do that? 

Thanks

----- Original Message -----
From: "Guillaume Lelarge" <guillaume@lelarge.info>
To: "Zhipan Wang" <wzhipan@soe.ucsc.edu>
Cc: pgsql-general@postgresql.org
Sent: Sunday, November 28, 2010 12:59:46 AM GMT -08:00 US/Canada Pacific
Subject: Re: [GENERAL] Question about OID and TCID

Le 27/11/2010 09:52, Zhipan Wang a écrit :
> Hi,
>
> I want to access part of a table on the disk sequentially, i,e., when I get to a tuple in the table, I need to read
severalpages of data in the table starting from this tuple. 

You shouldn't rely on the order on disk. It will change as soon as you
update one.

> I guess CTID could be translated to physical address on the disk to retrieve this tuple, right?

Yes, first number is the page number. The second one is the record
number in that page.

> If so, how do I use CTID to retrieve a particular tuple (or a page) in SQL?

For example, SELECT * FROM yourtable WHERE ctid='(0,5)';

> Can I use OID to do this equally efficiently?

No, they don't reflect the order on disk.

> Another question is: when I update a tuple in a table, this tuple will get a new CTID and it leaves a gap at the old
CTID,and when I insert a new tuple, it's appended to the end of the table, so the gap is always there. Does this mean
itactually inserts a new tuple and the out-dated tuple still occupies the space? 

Yes. Other sessions could still need to see the old tuple values.

> How can I write the updated tuple back to its original position to utilize disk space more efficiently?

You don't need to. PostgreSQL will deal with that. As soon as you do a
VACUUM, PostgreSQL will be able to use the dead space for the next
UPDATE or INSERT query (if there is enough space of course).


--
Guillaume
 http://www.postgresql.fr
 http://dalibo.com

Re: Question about OID and TCID

From
Raymond O'Donnell
Date:
On 29/11/2010 16:56, Zhipan Wang wrote:
> Since in-place seems not possible, I am trying to use arrays to store
> different ranges of that column, and it's fast to fetch a big array,
> so I believe an array is stored continuously on disk.
>
> I can convert an array to a column easily by unnest() function, but I
> didn't find any function that converts a column to an array. Is there
> a efficient method to do that?

I think array_agg() is what you want.

   http://www.postgresql.org/docs/9.0/static/functions-aggregate.html

Ray.


--
Raymond O'Donnell :: Galway :: Ireland
rod@iol.ie