Thread: Question about OID and TCID
Hi, I want to access part of a table on the disk sequentially, i,e., when I get to a tuple in the table, I need to read severalpages of data in the table starting from this tuple. I guess CTID could be translated to physical address on the diskto retrieve this tuple, right? If so, how do I use CTID to retrieve a particular tuple (or a page) in SQL? Can I useOID to do this equally efficiently? Another question is: when I update a tuple in a table, this tuple will get a new CTID and it leaves a gap at the old CTID,and when I insert a new tuple, it's appended to the end of the table, so the gap is always there. Does this mean itactually inserts a new tuple and the out-dated tuple still occupies the space? How can I write the updated tuple back toits original position to utilize disk space more efficiently? Thanks!
Hey Zhipan,
--
// Dmitriy.
2010/11/27 Zhipan Wang <wzhipan@soe.ucsc.edu>
Hi,
I want to access part of a table on the disk sequentially, i,e., when I get to a tuple in the table, I need to read several pages of data in the table starting from this tuple. I guess CTID could be translated to physical address on the disk to retrieve this tuple, right? If so, how do I use CTID to retrieve a particular tuple (or a page) in SQL? Can I use OID to do this equally efficiently?
Consider to use cursors to read sequentially by FETCH.
http://www.postgresql.org/docs/9.0/static/sql-fetch.html
http://www.postgresql.org/docs/9.0/static/sql-fetch.html
Another question is: when I update a tuple in a table, this tuple will get a new CTID and it leaves a gap at the old CTID, and when I insert a new tuple, it's appended to the end of the table, so the gap is always there. Does this mean it actually inserts a new tuple and the out-dated tuple still occupies the space? How can I write the updated tuple back to its original position to utilize disk space more efficiently?
Thanks!
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
--
// Dmitriy.
Le 27/11/2010 09:52, Zhipan Wang a écrit : > Hi, > > I want to access part of a table on the disk sequentially, i,e., when I get to a tuple in the table, I need to read severalpages of data in the table starting from this tuple. You shouldn't rely on the order on disk. It will change as soon as you update one. > I guess CTID could be translated to physical address on the disk to retrieve this tuple, right? Yes, first number is the page number. The second one is the record number in that page. > If so, how do I use CTID to retrieve a particular tuple (or a page) in SQL? For example, SELECT * FROM yourtable WHERE ctid='(0,5)'; > Can I use OID to do this equally efficiently? No, they don't reflect the order on disk. > Another question is: when I update a tuple in a table, this tuple will get a new CTID and it leaves a gap at the old CTID,and when I insert a new tuple, it's appended to the end of the table, so the gap is always there. Does this mean itactually inserts a new tuple and the out-dated tuple still occupies the space? Yes. Other sessions could still need to see the old tuple values. > How can I write the updated tuple back to its original position to utilize disk space more efficiently? You don't need to. PostgreSQL will deal with that. As soon as you do a VACUUM, PostgreSQL will be able to use the dead space for the next UPDATE or INSERT query (if there is enough space of course). -- Guillaume http://www.postgresql.fr http://dalibo.com
Please send your answer (also) to the list. Le 28/11/2010 10:23, zhipan Wang a écrit : > [...] > Thanks a lot for your detailed reply. I am trying to implement the idea proposed > in the paper Database Cracking. The basic idea is like this: > > When a query with a range predicate like "a<col" comes, a copy of the column col > is created as a stand-alone table, and I need to partially sort (so physically > reorganize) the table, and insert a node into the index to point to a tuple in > the table, such that all values in the new table before that tuple are less > than a and all values in col after that tuple are greater than or equal to a. As > more and more queries come in, the table becomes better ordered, and a qeury can > benefit from sequential scan enabled by this order. > Finding a solution to your issue is quite hard, because we don't know which issue you have. Anyways, some thoughts... PostgreSQL can already use an index to quickly sort a table. But CLUSTER may be a more appropriate solution for you. You have to remember that you'll need to launch CLUSTER from time to time so that the table get reorganized. > So the in-place update is a vital part of the program, because a range of values > of column col should be stored together on the disk. Is there any way to do such > a in-place update without generating much extra overhead? > If you need in-place update, then it's possible that PostgreSQL is not the right tool for you. But, as I already said, it's difficult to know for sure as we don't know what issue you're trying to fix. -- Guillaume http://www.postgresql.fr http://dalibo.com
> So the in-place update is a vital part of the program, because a range of values
> of column col should be stored together on the disk. Is there any way to do such
> a in-place update without generating much extra overhead?
>
Although in-place update is not possible, can he not use partitioning to at least try to store the data together based on the value of column col?
> of column col should be stored together on the disk. Is there any way to do such
> a in-place update without generating much extra overhead?
>
Although in-place update is not possible, can he not use partitioning to at least try to store the data together based on the value of column col?
Since in-place seems not possible, I am trying to use arrays to store different ranges of that column, and it's fast to fetcha big array, so I believe an array is stored continuously on disk. I can convert an array to a column easily by unnest() function, but I didn't find any function that converts a column toan array. Is there a efficient method to do that? Thanks ----- Original Message ----- From: "Guillaume Lelarge" <guillaume@lelarge.info> To: "Zhipan Wang" <wzhipan@soe.ucsc.edu> Cc: pgsql-general@postgresql.org Sent: Sunday, November 28, 2010 12:59:46 AM GMT -08:00 US/Canada Pacific Subject: Re: [GENERAL] Question about OID and TCID Le 27/11/2010 09:52, Zhipan Wang a écrit : > Hi, > > I want to access part of a table on the disk sequentially, i,e., when I get to a tuple in the table, I need to read severalpages of data in the table starting from this tuple. You shouldn't rely on the order on disk. It will change as soon as you update one. > I guess CTID could be translated to physical address on the disk to retrieve this tuple, right? Yes, first number is the page number. The second one is the record number in that page. > If so, how do I use CTID to retrieve a particular tuple (or a page) in SQL? For example, SELECT * FROM yourtable WHERE ctid='(0,5)'; > Can I use OID to do this equally efficiently? No, they don't reflect the order on disk. > Another question is: when I update a tuple in a table, this tuple will get a new CTID and it leaves a gap at the old CTID,and when I insert a new tuple, it's appended to the end of the table, so the gap is always there. Does this mean itactually inserts a new tuple and the out-dated tuple still occupies the space? Yes. Other sessions could still need to see the old tuple values. > How can I write the updated tuple back to its original position to utilize disk space more efficiently? You don't need to. PostgreSQL will deal with that. As soon as you do a VACUUM, PostgreSQL will be able to use the dead space for the next UPDATE or INSERT query (if there is enough space of course). -- Guillaume http://www.postgresql.fr http://dalibo.com
On 29/11/2010 16:56, Zhipan Wang wrote: > Since in-place seems not possible, I am trying to use arrays to store > different ranges of that column, and it's fast to fetch a big array, > so I believe an array is stored continuously on disk. > > I can convert an array to a column easily by unnest() function, but I > didn't find any function that converts a column to an array. Is there > a efficient method to do that? I think array_agg() is what you want. http://www.postgresql.org/docs/9.0/static/functions-aggregate.html Ray. -- Raymond O'Donnell :: Galway :: Ireland rod@iol.ie