Thread: CTID issues and a soc student in need of help

CTID issues and a soc student in need of help

From
Tzahi Fadida
Date:
Hi,
I am a Google soc student and in need of some help 
with PostgreSQL internals:

My C function can run (and already running)
for a very very long time on some
inputs and reiterate on relations using SPI.
Basically, I open portals and cursors to relations. 
Also note that I always open the
relations in READ ONLY mode using SPI.
A big no no is that some data in the relations
would change since that would just ruin everything.

I have a great need to identify a tuple uniquely so
my prototype uses the CTID field for that purpose.
Btw, this identification is required by the algorithm
and primary keys are just wasteful and will slow the process
considerably (not to mention some relations don't have primary keys).

The question is, can the CTID field change throughout
the run of my function due to some other processes working
on the relation? Or because of command boundaries it is
pretty much secured inside an implicit transaction?
The problem wasn't so great if I didn't want to exploit
indices in the relations (but I do and does), since
after you issue a SELECT that uses indices, all you can rely on
is the CTID to uniquely identify a tuple.

The other solution is to temporarily duplicate the relations but
this is less appealing (plus it's already working great with CTIDs).

-- 
Tzahi Fadida
Blog: http://tzahi.blogsite.org | Home Site: http://tzahi.webhop.info
WARNING TO SPAMMERS:  see at
http://members.lycos.co.uk/my2nis/spamwarning.html



Re: CTID issues and a soc student in need of help

From
Martijn van Oosterhout
Date:
On Thu, Jun 01, 2006 at 03:33:50PM +0300, Tzahi Fadida wrote:
> The question is, can the CTID field change throughout
> the run of my function due to some other processes working
> on the relation? Or because of command boundaries it is
> pretty much secured inside an implicit transaction?
> The problem wasn't so great if I didn't want to exploit
> indices in the relations (but I do and does), since
> after you issue a SELECT that uses indices, all you can rely on
> is the CTID to uniquely identify a tuple.

The CTID is the location on disk of the tuple, so no, it doesn't change
while you are running.

However, if you're running in isolation mode "read committed", then
someone else could update the tuple while you're looking at it. In this
case the tuple will appear to vanish and the updated version will
appear elsewhere with a new CTID. Whether this is a problem or not
depends on what you're using it for.

Hope this helps,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.

Re: CTID issues and a soc student in need of help

From
Tzahi Fadida
Date:
I am using CTID for the concept of a tuple set.
For example, the set of t1 from relation1, t1 from relation2, t10 from
relation3 will be represented in my function as a list
of (TableID:CTID) pairs.
For example {(1:1),(2:1),(3:10))
I then save these in bytea arrays in a tuplestore.
This is essential for the full disjunction algorithm because
you do not want to recompute joins for every such set using the actual
attribute.

>From the documentation i see that
Dirty Read,  Nonrepeatable Read and Phantom Read
are all unacceptable.
Moreover, when i said long time i meant full disjunctions
can run for hours on an extremely difficult input 
(not that most people will want to do that, but for
the general case) so it is not
realistic to lock the tables for that period.

So i have 2 follow up questions:
1) If i execute the function in a serializable isolation level
and the function is read only to the tables, is it possible
for the function to fail or other updating transactions to
either fail or wait for hours/starvation?

2) Inside the function i open and reopen cursors and portals to  relations, how can i set once, for all those opening
within the function, to have a "serializable" isolation level.  I suspect that because of command boundaries, just
running SET TRANSACTION SERIALIZABLE using SPI at the start should be enough.
 

On Thu, 2006-06-01 at 15:30 +0200, Martijn van Oosterhout wrote:
> 
> The CTID is the location on disk of the tuple, so no, it doesn't change
> while you are running.
> 
> However, if you're running in isolation mode "read committed", then
> someone else could update the tuple while you're looking at it. In this
> case the tuple will appear to vanish and the updated version will
> appear elsewhere with a new CTID. Whether this is a problem or not
> depends on what you're using it for.
> 
> Hope this helps,



Re: CTID issues and a soc student in need of help

From
Tom Lane
Date:
Tzahi Fadida <tzahi.ml@gmail.com> writes:
> I am using CTID for the concept of a tuple set.
> For example, the set of t1 from relation1, t1 from relation2, t10 from
> relation3 will be represented in my function as a list
> of (TableID:CTID) pairs.
> For example {(1:1),(2:1),(3:10))
> I then save these in bytea arrays in a tuplestore.
> This is essential for the full disjunction algorithm because
> you do not want to recompute joins for every such set using the actual
> attribute.

I think this is OK within the context of a single SQL command, since
tuple visibility should not change for that command.  If you were trying
to use the CTID info across multiple statements then it'd get worrisome.
        regards, tom lane


Re: CTID issues and a soc student in need of help

From
Tzahi Fadida
Date:
I am not sure about the definition of a context of a single SQL command.

Example of a run:

A <- SELECT getfdr('Relation1,Relation2,Relation3');
to get the result schema (takes a few milliseconds).
SELECT * FROM FullDisjunctions('Relation1,Relation2,Relation3') AS
RECORD A;
Can take a long time.

Inside C-language FullDisjunctions() function i repeatedly call, using
SPI:
SELECT * FROM Relation1;
SELECT * FROM Relation2;
SELECT * FROM Relation1 WHERE...;
SELECT * FROM Relation3;
....

So i call using one SQL to FullDisjunction and subsequent SQL calls in
it. I wish that the context of the overall SELECT FullDisjunctions
will be perfectly unchanged for all subsequent SQL calls inside it
and especially the CTID.

p.s.: In a different version of the function i create a temporary
relation and insert tuples in it, but it is exclusively used and
destroyed by the specific instance of that function. I hope it does not
affect anything in the general sense of a read only transaction.

10x!
Regards,tzahi.

On Thu, 2006-06-01 at 11:28 -0400, Tom Lane wrote:
> 
> I think this is OK within the context of a single SQL command, since
> tuple visibility should not change for that command.  If you were trying
> to use the CTID info across multiple statements then it'd get worrisome.
> 
>             regards, tom lane



Re: CTID issues and a soc student in need of help

From
Tom Lane
Date:
Tzahi Fadida <tzahi.ml@gmail.com> writes:
> I am not sure about the definition of a context of a single SQL command.

Well, AFAICS selecting a disjunction ought to qualify as a single SQL
command using a single snapshot.  It's not that different from a JOIN
or UNION operation, no?

> Inside C-language FullDisjunctions() function i repeatedly call, using
> SPI:
> SELECT * FROM Relation1;
> SELECT * FROM Relation2;
> SELECT * FROM Relation1 WHERE...;
> SELECT * FROM Relation3;
> ....

You would need to force all these operations to be done with the same
snapshot; should be possible with SPI_execute_snapshot.  But really the
above sounds like a toy prototype implementation to me.  Why aren't you
building this as executor plan-tree machinery?

> p.s.: In a different version of the function i create a temporary
> relation and insert tuples in it, but it is exclusively used and
> destroyed by the specific instance of that function.

Why?  You could use a tuplestore for transient data.
        regards, tom lane


Re: CTID issues and a soc student in need of help

From
Tzahi Fadida
Date:
On Thu, 2006-06-01 at 12:45 -0400, Tom Lane wrote:
> Tzahi Fadida <tzahi.ml@gmail.com> writes:
> > I am not sure about the definition of a context of a single SQL command.
> 
> Well, AFAICS selecting a disjunction ought to qualify as a single SQL
> command using a single snapshot.  It's not that different from a JOIN
> or UNION operation, no?

Yes, it is (at least the current version i am implementing) a one shot
computation. It is computed top-down and not bottom-up as regular
joins. For example, A natural join B natural join C can be broken down
to a left deep plan tree. Full disjunctions cannot be broken into such a
thing (in this version) and FD('A,B,C') directly returns a set of
results.

> 
> > Inside C-language FullDisjunctions() function i repeatedly call, using
> > SPI:
> > SELECT * FROM Relation1;
> > SELECT * FROM Relation2;
> > SELECT * FROM Relation1 WHERE...;
> > SELECT * FROM Relation3;
> > ....
> 
> You would need to force all these operations to be done with the same
> snapshot; should be possible with SPI_execute_snapshot.  But really the
> above sounds like a toy prototype implementation to me.  Why aren't you
> building this as executor plan-tree machinery?

I actually use cursors because i reiterate on the
"SELECT * FROM Relation1" queries using the FETCH_ALL technique.
Hopefully cursors uses something similar to SPI_execute_snapshot?
(maybe on READ_ONLY that i use. i see it uses something called
ActiveSnapshot)
(but for WHERE queries that are intended to exploit indices in
the relations i must execute repeatedly).

The reason, is two fold.
- At this time i don't see any big advantage (aside from the schema) 
in putting it in the parser and subsequently the executor.
- I want to work inside the frame of time for the soc.

I think that i should first have a stable contrib module that looks
acceptable before i continue to something more problematic to maintain. 

We have a new paper that was accepted to VLDB yesterday that breaks down
the problem into smaller ones + iterators + have polynomial delay that
is suited for streaming, hence the possibility for implementing in
the planner but it's too complex for soc. Lets have a stable something
first.

> 
> > p.s.: In a different version of the function i create a temporary
> > relation and insert tuples in it, but it is exclusively used and
> > destroyed by the specific instance of that function.
> 
> Why?  You could use a tuplestore for transient data.

I do use tuplestore, but the other version needs an index and you can't
put an index on a tuplestore. Unless, you can give me a hint on how to
create a btree/hash index without a relation but that can be stored on
disk like tuplestore. I.e. all data is stored in the index. The key is
the whole tuple (the array of CTIDs) anyway.

> 
>             regards, tom lane