Thread: Postgres code for a query intermediate dataset
Hi All,
I want to work on the code of intermediate dataset of select and update query.
For example.
Rohit's salary has been updated 4 times, so it has 4 different version of salary.
I want to select salary of person named Rohit. Now suppose , in intermediate result, I found 4 different versions of the data. I want to know the code portion which i need to look for working on all 4 versions in dataset. :)
Thanks in advance!!
Regards,
Rohit Goyal
Regards,
Rohit Goyal
On Sat, Sep 13, 2014 at 11:06 PM, Rohit Goyal <rhtgyl.87@gmail.com> wrote:
-- Hi All,I want to work on the code of intermediate dataset of select and update query.For example.Rohit's salary has been updated 4 times, so it has 4 different version of salary.I want to select salary of person named Rohit. Now suppose , in intermediate result, I found 4 different versions of the data. I want to know the code portion which i need to look for working on all 4 versions in dataset. :)Thanks in advance!!
Not sure what you are looking for, but each update is an insert of a new tuple with the new values and marking the old tuple as deleted.
There is no need for tracking the versions of any changes in data set. Each update operation leaves only one visible tuple. If the transaction commits, inserted tuple becomes visible and old row is marked deleted. If the transaction rollbacks, only the old tuple shall remain visible.
Regards,
Atri
l'apprenant
Atri Sharma wrote > On Sat, Sep 13, 2014 at 11:06 PM, Rohit Goyal < > rhtgyl.87@ > > wrote: > >> >> Hi All, >> >> I want to work on the code of intermediate dataset of select and update >> query. >> >> For example. >> >> Rohit's salary has been updated 4 times, so it has 4 different version of >> salary. >> >> I want to select salary of person named Rohit. Now suppose , in >> intermediate result, I found 4 different versions of the data. I want to >> know the code portion which i need to look for working on all 4 versions >> in >> dataset. :) >> >> Thanks in advance!! >> >> >> > Not sure what you are looking for, but each update is an insert of a new > tuple with the new values and marking the old tuple as deleted. > > There is no need for tracking the versions of any changes in data set. > Each > update operation leaves only one visible tuple. If the transaction > commits, > inserted tuple becomes visible and old row is marked deleted. If the > transaction rollbacks, only the old tuple shall remain visible. > -- > Regards, > > Atri > *l'apprenant* Or rather even if you want to be able to reference the older versions of that record there is nothing in PostgreSQL to facilitate that. You have to manually create and manage the data so that you know during what time period a given record is valid. David J. -- View this message in context: http://postgresql.1045698.n5.nabble.com/Postgres-code-for-a-query-intermediate-dataset-tp5818931p5818935.html Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.
On Sat, Sep 13, 2014 at 11:52 PM, David G Johnston <david.g.johnston@gmail.com> wrote:
-- Atri Sharma wrote
> On Sat, Sep 13, 2014 at 11:06 PM, Rohit Goyal <
Or rather even if you want to be able to reference the older versions of
that record there is nothing in PostgreSQL to facilitate that. You have to
manually create and manage the data so that you know during what time period
a given record is valid.
David J.
Sometimes I do miss 'time travel' we used to have :)
Regards,
Regards,
Atri
Regards,
Atri
l'apprenant
On 14/09/14 06:35, Atri Sharma wrote:
That is only because the Guild of Time Travellers was formed, and we are very selective in whom we allow to join. It was a massive undertaking to purge the knowledge of effective time travel from the general populace (H. G. Wells had to be expelled with a partial brain wipe)! :-)On Sat, Sep 13, 2014 at 11:52 PM, David G Johnston <david.g.johnston@gmail.com> wrote:--Atri Sharma wrote
> On Sat, Sep 13, 2014 at 11:06 PM, Rohit Goyal <
Or rather even if you want to be able to reference the older versions of
that record there is nothing in PostgreSQL to facilitate that. You have to
manually create and manage the data so that you know during what time period
a given record is valid.
David J.Sometimes I do miss 'time travel' we used to have :)
Regards,AtriRegards,Atril'apprenant
On a more serious note:
I did design and implement a system to allow what the original poster was after, it involved 2 tables for each logical table, and used both an EFFECTIVE_DATE & an AS_AT_DATE. This allowed insurance quotes to be valid for a given of time, even if the insurance rates were set change after the quote was given (but before the quote expired). This was about 15 years ago. It was amusing that my wife joined that team 10 years after I left, and found 2 of my original colleagues still there!
Cheers,
Gavin
On 14/09/14 05:36, Rohit Goyal wrote: > Hi All, > > I want to work on the code of intermediate dataset of select and update > query. > > For example. > > Rohit's salary has been updated 4 times, so it has 4 different version > of salary. > > I want to select salary of person named Rohit. Now suppose , in > intermediate result, I found 4 different versions of the data. I want to > know the code portion which i need to look for working on all 4 versions > in dataset. :) > > Hi Rohit, Currently in Postgres, these intermediate versions all exist - however a given session can only see one of them. Also VACUUM is allowed to destroy versions that no other transactions can see. So if I'm understanding you correctly, you would like to have some way for a session to see *all* these versions (and I guess preventing VACUUM from destroying them). It is certainly possible (or used to be via snapshot manipulation, I haven't looked at that code in a while sorry) to enable a session to see all the old versions, and is quite a cool idea (Postgres used to have this ability in older versions - called Time Travel). For pure practicality, this can be achieved without any code modifications using TRIGGERs and an extra table (as Gavin alludes to). Do tell us a bit more about what you are wanting to do! Cheers Mark
On Sunday, September 14, 2014, Mark Kirkwood <mark.kirkwood@catalyst.net.nz> wrote:
On 14/09/14 05:36, Rohit Goyal wrote:Hi All,
I want to work on the code of intermediate dataset of select and update
query.
For example.
Rohit's salary has been updated 4 times, so it has 4 different version
of salary.
I want to select salary of person named Rohit. Now suppose , in
intermediate result, I found 4 different versions of the data. I want to
know the code portion which i need to look for working on all 4 versions
in dataset. :)
Hi Rohit,
Currently in Postgres, these intermediate versions all exist - however a given session can only see one of them. Also VACUUM is allowed to destroy versions that no other transactions can see.
So if I'm understanding you correctly, you would like to have some way for a session to see *all* these versions (and I guess preventing VACUUM from destroying them).
Any modifications of that sort are bound to introduce lots of pain, not to mention performance degradation and the added responsibility of ensuring that dead tuples don't bloat up the system (prevent vacuum from running at regular intervals and you can have a xid wraparound).
Regards,
Atri
--
Regards,
Atri
l'apprenant
On 14/09/14 19:25, Atri Sharma wrote: > > > On Sunday, September 14, 2014, Mark Kirkwood > <mark.kirkwood@catalyst.net.nz <mailto:mark.kirkwood@catalyst.net.nz>> > wrote: > > On 14/09/14 05:36, Rohit Goyal wrote: > > Hi All, > > I want to work on the code of intermediate dataset of select and > update > query. > > For example. > > Rohit's salary has been updated 4 times, so it has 4 different > version > of salary. > > I want to select salary of person named Rohit. Now suppose , in > intermediate result, I found 4 different versions of the data. I > want to > know the code portion which i need to look for working on all 4 > versions > in dataset. :) > > > > Hi Rohit, > > Currently in Postgres, these intermediate versions all exist - > however a given session can only see one of them. Also VACUUM is > allowed to destroy versions that no other transactions can see. > > So if I'm understanding you correctly, you would like to have some > way for a session to see *all* these versions (and I guess > preventing VACUUM from destroying them). > > > > Any modifications of that sort are bound to introduce lots of pain, not > to mention performance degradation and the added responsibility of > ensuring that dead tuples don't bloat up the system (prevent vacuum from > running at regular intervals and you can have a xid wraparound). > > I just mentioned that in case you are planning to go in that direction. > If you only want the data, use the triggers as Gavin mentioned. > Obviously in the general case sure - but (as yet) we don't have much idea about Rohit's use case and workload. If retrieving past versions is the *primary* workload bias and high update concurrency is not required then this could well work better than a trigger based solution. And it does not seem too onerous to have the ability to switch this on as required, viz: ALTER TABLE table1 VERSIONING; (or similar syntax) which makes VACUUM leave this table alone. It might make more sense to make such a concept apply to a TABLESPACE instead mind you (i.e things in here are for archive/versioning purposes)... Clearly we'd need to see the code for any of this and evaluate if it is good or terrible, but I'm not seeing the idea as bad as stated. Cheers Mark
Hi Mark,
--
Regards,
On Sun, Sep 14, 2014 at 8:57 AM, Mark Kirkwood <mark.kirkwood@catalyst.net.nz> wrote:
On 14/09/14 05:36, Rohit Goyal wrote:Hi All,
I want to work on the code of intermediate dataset of select and update
query.
For example.
Rohit's salary has been updated 4 times, so it has 4 different version
of salary.
I want to select salary of person named Rohit. Now suppose , in
intermediate result, I found 4 different versions of the data. I want to
know the code portion which i need to look for working on all 4 versions
in dataset. :)
Hi Rohit,
Currently in Postgres, these intermediate versions all exist - however a given session can only see one of them. Also VACUUM is allowed to destroy versions that no other transactions can see.
Exactly, one visible version is there per session. But, I want to test my algorithm in which i myself want to work on all the intermediate version and find the correct one for the session.
So if I'm understanding you correctly, you would like to have some way for a session to see *all* these versions (and I guess preventing VACUUM from destroying them).
yes and I also want to know the portion of code where i can find all the historical or intermediate versions
It is certainly possible (or used to be via snapshot manipulation, I haven't looked at that code in a while sorry) to enable a session to see all the old versions, and is quite a cool idea (Postgres used to have this ability in older versions - called Time Travel).
For pure practicality, this can be achieved without any code modifications using TRIGGERs and an extra table (as Gavin alludes to).
Can you explain me more about how to starting working using trigger?
Do tell us a bit more about what you are wanting to do!
Cheers
Mark
Regards,
Rohit Goyal
On Sun, Sep 14, 2014 at 1:30 PM, Mark Kirkwood <mark.kirkwood@catalyst.net.nz> wrote:
On 14/09/14 19:25, Atri Sharma wrote:
On Sunday, September 14, 2014, Mark Kirkwood
<mark.kirkwood@catalyst.net.nz <mailto:mark.kirkwood@catalyst.net.nz>>
wrote:
On 14/09/14 05:36, Rohit Goyal wrote:
Hi All,
I want to work on the code of intermediate dataset of select and
update
query.
For example.
Rohit's salary has been updated 4 times, so it has 4 different
version
of salary.
I want to select salary of person named Rohit. Now suppose , in
intermediate result, I found 4 different versions of the data. I
want to
know the code portion which i need to look for working on all 4
versions
in dataset. :)
Hi Rohit,
Currently in Postgres, these intermediate versions all exist -
however a given session can only see one of them. Also VACUUM is
allowed to destroy versions that no other transactions can see.
So if I'm understanding you correctly, you would like to have some
way for a session to see *all* these versions (and I guess
preventing VACUUM from destroying them).
Any modifications of that sort are bound to introduce lots of pain, not
to mention performance degradation and the added responsibility of
ensuring that dead tuples don't bloat up the system (prevent vacuum from
running at regular intervals and you can have a xid wraparound).
I just mentioned that in case you are planning to go in that direction.
If you only want the data, use the triggers as Gavin mentioned.
Obviously in the general case sure - but (as yet) we don't have much idea about Rohit's use case and workload. If retrieving past versions is the *primary* workload bias and high update concurrency is not required then this could well work better than a trigger based solution.
And it does not seem too onerous to have the ability to switch this on as required, viz:
ALTER TABLE table1 VERSIONING;
(or similar syntax) which makes VACUUM leave this table alone.
How do you plan to do all that VACUUM does for this table then?
It seems to me that you are saying to VACUUM that it need not be concerned with table 'A' and you are assuming ownership of all the tasks performed by VACUUM for this table. Seems pretty broken to me, not to mention the performance degradations.
Regards,
Atri
Regards,
Atri
Regards,
Atri
Atri
On Sun, Sep 14, 2014 at 1:30 PM, Mark Kirkwood <mark.kirkwood@catalyst.net.nz> wrote:
On 14/09/14 19:25, Atri Sharma wrote:
On Sunday, September 14, 2014, Mark Kirkwood
<mark.kirkwood@catalyst.net.nz <mailto:mark.kirkwood@catalyst.net.nz>>
wrote:
On 14/09/14 05:36, Rohit Goyal wrote:
Hi All,
I want to work on the code of intermediate dataset of select and
update
query.
For example.
Rohit's salary has been updated 4 times, so it has 4 different
version
of salary.
I want to select salary of person named Rohit. Now suppose , in
intermediate result, I found 4 different versions of the data. I
want to
know the code portion which i need to look for working on all 4
versions
in dataset. :)
Hi Rohit,
Currently in Postgres, these intermediate versions all exist -
however a given session can only see one of them. Also VACUUM is
allowed to destroy versions that no other transactions can see.
So if I'm understanding you correctly, you would like to have some
way for a session to see *all* these versions (and I guess
preventing VACUUM from destroying them).
Any modifications of that sort are bound to introduce lots of pain, not
to mention performance degradation and the added responsibility of
ensuring that dead tuples don't bloat up the system (prevent vacuum from
running at regular intervals and you can have a xid wraparound).
I just mentioned that in case you are planning to go in that direction.
If you only want the data, use the triggers as Gavin mentioned.
Obviously in the general case sure - but (as yet) we don't have much idea about Rohit's use case and workload. If retrieving past versions is the *primary* workload bias and high update concurrency is not required then this could well work better than a trigger based solution.
And it does not seem too onerous to have the ability to switch this on as required, viz:
ALTER TABLE table1 VERSIONING;
(or similar syntax) which makes VACUUM leave this table alone. It might make more sense to make such a concept apply to a TABLESPACE instead mind you (i.e things in here are for archive/versioning purposes)...
What I think can be done is have a tuplestore which has the delta of updated rows i.e. only have the changes made in an update statement stored in a tuplestore (it could be a part of RelationData). It should be simple enough to have tuplestore store the oid of the inserted tuple and the difference between new tuple and the old tuple. No changes need to be done for old tuple since it can be marked as deleted and VACUUM can remove it as normal logic.
Not a clean way, but should work for what you proposed.
On 14/09/14 20:24, Atri Sharma wrote: > > > On Sun, Sep 14, 2014 at 1:30 PM, Mark Kirkwood > <mark.kirkwood@catalyst.net.nz <mailto:mark.kirkwood@catalyst.net.nz>> > wrote: > > On 14/09/14 19:25, Atri Sharma wrote: > > > > On Sunday, September 14, 2014, Mark Kirkwood > <mark.kirkwood@catalyst.net.nz > <mailto:mark.kirkwood@catalyst.net.nz> > <mailto:mark.kirkwood@__catalyst.net.nz > <mailto:mark.kirkwood@catalyst.net.nz>>> > > wrote: > > On 14/09/14 05:36, Rohit Goyal wrote: > > Hi All, > > I want to work on the code of intermediate dataset of > select and > update > query. > > For example. > > Rohit's salary has been updated 4 times, so it has 4 > different > version > of salary. > > I want to select salary of person named Rohit. Now > suppose , in > intermediate result, I found 4 different versions of > the data. I > want to > know the code portion which i need to look for working > on all 4 > versions > in dataset. :) > > > > Hi Rohit, > > Currently in Postgres, these intermediate versions all exist - > however a given session can only see one of them. Also > VACUUM is > allowed to destroy versions that no other transactions can see. > > So if I'm understanding you correctly, you would like to > have some > way for a session to see *all* these versions (and I guess > preventing VACUUM from destroying them). > > > > Any modifications of that sort are bound to introduce lots of > pain, not > to mention performance degradation and the added responsibility of > ensuring that dead tuples don't bloat up the system (prevent > vacuum from > running at regular intervals and you can have a xid wraparound). > > I just mentioned that in case you are planning to go in that > direction. > If you only want the data, use the triggers as Gavin mentioned. > > > Obviously in the general case sure - but (as yet) we don't have much > idea about Rohit's use case and workload. If retrieving past > versions is the *primary* workload bias and high update concurrency > is not required then this could well work better than a trigger > based solution. > > And it does not seem too onerous to have the ability to switch this > on as required, viz: > > ALTER TABLE table1 VERSIONING; > > (or similar syntax) which makes VACUUM leave this table alone. > > > How do you plan to do all that VACUUM does for this table then? > > It seems to me that you are saying to VACUUM that it need not be > concerned with table 'A' and you are assuming ownership of all the tasks > performed by VACUUM for this table. Seems pretty broken to me, not to > mention the performance degradations. > I think the whole point of such a modification is that nothing is done to such tables, as you want to see all the previous versions. Clearly this is less performant for standard workloads...but we are talking about non standard workloads surely... Regards Mark
On 14/09/14 20:11, Rohit Goyal wrote: > Hi Mark, > > On Sun, Sep 14, 2014 at 8:57 AM, Mark Kirkwood > <mark.kirkwood@catalyst.net.nz <mailto:mark.kirkwood@catalyst.net.nz>> > wrote: > Currently in Postgres, these intermediate versions all exist - > however a given session can only see one of them. Also VACUUM is > allowed to destroy versions that no other transactions can see. > > Exactly, one visible version is there per session. But, I want to test > my algorithm in which i myself want to work on all the intermediate > version and find the correct one for the session. > > So if I'm understanding you correctly, you would like to have some > way for a session to see *all* these versions (and I guess > preventing VACUUM from destroying them). > > yes and I also want to know the portion of code where i can find all the > historical or intermediate versions Well that's going to be a bit of a learning curve for you :-), the concept to get familiar with is snapshots (see src/backand/access/heap/heapam.c to get started). > > > It is certainly possible (or used to be via snapshot manipulation, I > haven't looked at that code in a while sorry) to enable a session to > see all the old versions, and is quite a cool idea (Postgres used to > have this ability in older versions - called Time Travel). > > For pure practicality, this can be achieved without any code > modifications using TRIGGERs and an extra table (as Gavin alludes to). > > Can you explain me more about how to starting working using trigger? I'm not sure we have specif examples in the docs for what you want to do, but generally see http://www.postgresql.org/docs/9.4/static/plpgsql-trigger.html Cheers Mark
Hi Mark & Atri, :)
Thanks for reply. But, I think i confused you. I am talking about access using indexes. So, I assume that B+ tree store key-value pair where rohit is the key and all the versions are its value.
Another way to think is I have a secondary index on emp. name and there are 4 rohit exist in DB. So, now B+ tree gives me 4 different tuple pointer for each Rohit. I want to know the code portion for this where i can see all 4 tuple pointer before each one have I/O access to fetch its tuple.
Are the suggestions still valid?
--
Regards,
On Sun, Sep 14, 2014 at 10:53 AM, Mark Kirkwood <mark.kirkwood@catalyst.net.nz> wrote:
On 14/09/14 20:11, Rohit Goyal wrote:Hi Mark,
On Sun, Sep 14, 2014 at 8:57 AM, Mark Kirkwood
<mark.kirkwood@catalyst.net.nz <mailto:mark.kirkwood@catalyst.net.nz>>
wrote:Currently in Postgres, these intermediate versions all exist -
however a given session can only see one of them. Also VACUUM is
allowed to destroy versions that no other transactions can see.
Exactly, one visible version is there per session. But, I want to test
my algorithm in which i myself want to work on all the intermediate
version and find the correct one for the session.
So if I'm understanding you correctly, you would like to have some
way for a session to see *all* these versions (and I guess
preventing VACUUM from destroying them).
yes and I also want to know the portion of code where i can find all the
historical or intermediate versions
Well that's going to be a bit of a learning curve for you :-), the concept to get familiar with is snapshots (see src/backand/access/heap/heapam.c to get started).
Thanks I will read it. :) Can you please tel me some specifics from this c file, if you already know :)
It is certainly possible (or used to be via snapshot manipulation, I
haven't looked at that code in a while sorry) to enable a session to
see all the old versions, and is quite a cool idea (Postgres used to
have this ability in older versions - called Time Travel).
For pure practicality, this can be achieved without any code
modifications using TRIGGERs and an extra table (as Gavin alludes to).
Can you explain me more about how to starting working using trigger?
I'm not sure we have specif examples in the docs for what you want to do, but generally see http://www.postgresql.org/docs/9.4/static/plpgsql-trigger.html
Cheers
Mark
Regards,
Rohit Goyal
On 14/09/14 21:18, Rohit Goyal wrote: > Hi Mark & Atri, :) > > Thanks for reply. But, I think i confused you. I am talking about access > using indexes. So, I assume that B+ tree store key-value pair where > rohit is the key and all the versions are its value. > > Another way to think is I have a secondary index on emp. name and there > are 4 rohit exist in DB. So, now B+ tree gives me 4 different tuple > pointer for each Rohit. I want to know the code portion for this where i > can see all 4 tuple pointer before each one have I/O access to fetch its > tuple. > > Are the suggestions still valid? > Visibility rules mentioned earlier apply equally to tables and indexes (strictly speaking what happens is index tuples are checked against the relevant tables to see if your session can see them), so discussion of whether tuples are retrieved via index or table scans is not really relevant (i.e query planning/optimization is separate from tuple visibility). Cheers Mark
On 14/09/14 20:43, Mark Kirkwood wrote: > On 14/09/14 20:24, Atri Sharma wrote: >> >> How do you plan to do all that VACUUM does for this table then? >> >> It seems to me that you are saying to VACUUM that it need not be >> concerned with table 'A' and you are assuming ownership of all the tasks >> performed by VACUUM for this table. Seems pretty broken to me, not to >> mention the performance degradations. >> > > I think the whole point of such a modification is that nothing is done > to such tables, as you want to see all the previous versions. > > Clearly this is less performant for standard workloads...but we are > talking about non standard workloads surely... To be fair with respect to what Atri is saying, I should have said something like: Clearly this is *horribly* less performant for standard workloads...etc :-) Also there is the good point he raised about transaction xid wrap, so some messing about with that part of VACUUM would be required too (it's the little complications that all add up)! The TRIGGER based approach is clearly a lot simpler! However for an interest project to understand Postgres internals the other approach is worthwhile. Cheers Mark
On Sun, Sep 14, 2014 at 5:18 AM, Rohit Goyal <rhtgyl.87@gmail.com> wrote: > Thanks for reply. But, I think i confused you. I am talking about access > using indexes. So, I assume that B+ tree store key-value pair where rohit is > the key and all the versions are its value. > > Another way to think is I have a secondary index on emp. name and there are > 4 rohit exist in DB. So, now B+ tree gives me 4 different tuple pointer for > each Rohit. I want to know the code portion for this where i can see all 4 > tuple pointer before each one have I/O access to fetch its tuple. You may want to look at index_getnext(), index_getnext_tid(), and/or heap_hot_search_buffer(). -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company