Thread: CTID issues and a soc student in need of help
Hi, I am a Google soc student and in need of some help with PostgreSQL internals: My C function can run (and already running) for a very very long time on some inputs and reiterate on relations using SPI. Basically, I open portals and cursors to relations. Also note that I always open the relations in READ ONLY mode using SPI. A big no no is that some data in the relations would change since that would just ruin everything. I have a great need to identify a tuple uniquely so my prototype uses the CTID field for that purpose. Btw, this identification is required by the algorithm and primary keys are just wasteful and will slow the process considerably (not to mention some relations don't have primary keys). The question is, can the CTID field change throughout the run of my function due to some other processes working on the relation? Or because of command boundaries it is pretty much secured inside an implicit transaction? The problem wasn't so great if I didn't want to exploit indices in the relations (but I do and does), since after you issue a SELECT that uses indices, all you can rely on is the CTID to uniquely identify a tuple. The other solution is to temporarily duplicate the relations but this is less appealing (plus it's already working great with CTIDs). -- Tzahi Fadida Blog: http://tzahi.blogsite.org | Home Site: http://tzahi.webhop.info WARNING TO SPAMMERS: see at http://members.lycos.co.uk/my2nis/spamwarning.html
On Thu, Jun 01, 2006 at 03:33:50PM +0300, Tzahi Fadida wrote: > The question is, can the CTID field change throughout > the run of my function due to some other processes working > on the relation? Or because of command boundaries it is > pretty much secured inside an implicit transaction? > The problem wasn't so great if I didn't want to exploit > indices in the relations (but I do and does), since > after you issue a SELECT that uses indices, all you can rely on > is the CTID to uniquely identify a tuple. The CTID is the location on disk of the tuple, so no, it doesn't change while you are running. However, if you're running in isolation mode "read committed", then someone else could update the tuple while you're looking at it. In this case the tuple will appear to vanish and the updated version will appear elsewhere with a new CTID. Whether this is a problem or not depends on what you're using it for. Hope this helps, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > From each according to his ability. To each according to his ability to litigate.
I am using CTID for the concept of a tuple set. For example, the set of t1 from relation1, t1 from relation2, t10 from relation3 will be represented in my function as a list of (TableID:CTID) pairs. For example {(1:1),(2:1),(3:10)) I then save these in bytea arrays in a tuplestore. This is essential for the full disjunction algorithm because you do not want to recompute joins for every such set using the actual attribute. >From the documentation i see that Dirty Read, Nonrepeatable Read and Phantom Read are all unacceptable. Moreover, when i said long time i meant full disjunctions can run for hours on an extremely difficult input (not that most people will want to do that, but for the general case) so it is not realistic to lock the tables for that period. So i have 2 follow up questions: 1) If i execute the function in a serializable isolation level and the function is read only to the tables, is it possible for the function to fail or other updating transactions to either fail or wait for hours/starvation? 2) Inside the function i open and reopen cursors and portals to relations, how can i set once, for all those opening within the function, to have a "serializable" isolation level. I suspect that because of command boundaries, just running SET TRANSACTION SERIALIZABLE using SPI at the start should be enough. On Thu, 2006-06-01 at 15:30 +0200, Martijn van Oosterhout wrote: > > The CTID is the location on disk of the tuple, so no, it doesn't change > while you are running. > > However, if you're running in isolation mode "read committed", then > someone else could update the tuple while you're looking at it. In this > case the tuple will appear to vanish and the updated version will > appear elsewhere with a new CTID. Whether this is a problem or not > depends on what you're using it for. > > Hope this helps,
Tzahi Fadida <tzahi.ml@gmail.com> writes: > I am using CTID for the concept of a tuple set. > For example, the set of t1 from relation1, t1 from relation2, t10 from > relation3 will be represented in my function as a list > of (TableID:CTID) pairs. > For example {(1:1),(2:1),(3:10)) > I then save these in bytea arrays in a tuplestore. > This is essential for the full disjunction algorithm because > you do not want to recompute joins for every such set using the actual > attribute. I think this is OK within the context of a single SQL command, since tuple visibility should not change for that command. If you were trying to use the CTID info across multiple statements then it'd get worrisome. regards, tom lane
I am not sure about the definition of a context of a single SQL command. Example of a run: A <- SELECT getfdr('Relation1,Relation2,Relation3'); to get the result schema (takes a few milliseconds). SELECT * FROM FullDisjunctions('Relation1,Relation2,Relation3') AS RECORD A; Can take a long time. Inside C-language FullDisjunctions() function i repeatedly call, using SPI: SELECT * FROM Relation1; SELECT * FROM Relation2; SELECT * FROM Relation1 WHERE...; SELECT * FROM Relation3; .... So i call using one SQL to FullDisjunction and subsequent SQL calls in it. I wish that the context of the overall SELECT FullDisjunctions will be perfectly unchanged for all subsequent SQL calls inside it and especially the CTID. p.s.: In a different version of the function i create a temporary relation and insert tuples in it, but it is exclusively used and destroyed by the specific instance of that function. I hope it does not affect anything in the general sense of a read only transaction. 10x! Regards,tzahi. On Thu, 2006-06-01 at 11:28 -0400, Tom Lane wrote: > > I think this is OK within the context of a single SQL command, since > tuple visibility should not change for that command. If you were trying > to use the CTID info across multiple statements then it'd get worrisome. > > regards, tom lane
Tzahi Fadida <tzahi.ml@gmail.com> writes: > I am not sure about the definition of a context of a single SQL command. Well, AFAICS selecting a disjunction ought to qualify as a single SQL command using a single snapshot. It's not that different from a JOIN or UNION operation, no? > Inside C-language FullDisjunctions() function i repeatedly call, using > SPI: > SELECT * FROM Relation1; > SELECT * FROM Relation2; > SELECT * FROM Relation1 WHERE...; > SELECT * FROM Relation3; > .... You would need to force all these operations to be done with the same snapshot; should be possible with SPI_execute_snapshot. But really the above sounds like a toy prototype implementation to me. Why aren't you building this as executor plan-tree machinery? > p.s.: In a different version of the function i create a temporary > relation and insert tuples in it, but it is exclusively used and > destroyed by the specific instance of that function. Why? You could use a tuplestore for transient data. regards, tom lane
On Thu, 2006-06-01 at 12:45 -0400, Tom Lane wrote: > Tzahi Fadida <tzahi.ml@gmail.com> writes: > > I am not sure about the definition of a context of a single SQL command. > > Well, AFAICS selecting a disjunction ought to qualify as a single SQL > command using a single snapshot. It's not that different from a JOIN > or UNION operation, no? Yes, it is (at least the current version i am implementing) a one shot computation. It is computed top-down and not bottom-up as regular joins. For example, A natural join B natural join C can be broken down to a left deep plan tree. Full disjunctions cannot be broken into such a thing (in this version) and FD('A,B,C') directly returns a set of results. > > > Inside C-language FullDisjunctions() function i repeatedly call, using > > SPI: > > SELECT * FROM Relation1; > > SELECT * FROM Relation2; > > SELECT * FROM Relation1 WHERE...; > > SELECT * FROM Relation3; > > .... > > You would need to force all these operations to be done with the same > snapshot; should be possible with SPI_execute_snapshot. But really the > above sounds like a toy prototype implementation to me. Why aren't you > building this as executor plan-tree machinery? I actually use cursors because i reiterate on the "SELECT * FROM Relation1" queries using the FETCH_ALL technique. Hopefully cursors uses something similar to SPI_execute_snapshot? (maybe on READ_ONLY that i use. i see it uses something called ActiveSnapshot) (but for WHERE queries that are intended to exploit indices in the relations i must execute repeatedly). The reason, is two fold. - At this time i don't see any big advantage (aside from the schema) in putting it in the parser and subsequently the executor. - I want to work inside the frame of time for the soc. I think that i should first have a stable contrib module that looks acceptable before i continue to something more problematic to maintain. We have a new paper that was accepted to VLDB yesterday that breaks down the problem into smaller ones + iterators + have polynomial delay that is suited for streaming, hence the possibility for implementing in the planner but it's too complex for soc. Lets have a stable something first. > > > p.s.: In a different version of the function i create a temporary > > relation and insert tuples in it, but it is exclusively used and > > destroyed by the specific instance of that function. > > Why? You could use a tuplestore for transient data. I do use tuplestore, but the other version needs an index and you can't put an index on a tuplestore. Unless, you can give me a hint on how to create a btree/hash index without a relation but that can be stored on disk like tuplestore. I.e. all data is stored in the index. The key is the whole tuple (the array of CTIDs) anyway. > > regards, tom lane