Re: CTID issues and a soc student in need of help - Mailing list pgsql-hackers

From Tzahi Fadida
Subject Re: CTID issues and a soc student in need of help
Msg-id 1149183564.4871.71.camel@llord
In response to Re: CTID issues and a soc student in need of help  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Thu, 2006-06-01 at 12:45 -0400, Tom Lane wrote:
> Tzahi Fadida <tzahi.ml@gmail.com> writes:
> > I am not sure about the definition of a context of a single SQL command.
> 
> Well, AFAICS selecting a disjunction ought to qualify as a single SQL
> command using a single snapshot.  It's not that different from a JOIN
> or UNION operation, no?

Yes, it is a one-shot computation (at least in the version I am
currently implementing). It is computed top-down rather than bottom-up
as regular joins are. For example, A natural join B natural join C can
be broken down into a left-deep plan tree. Full disjunctions cannot be
broken down that way (in this version), so FD('A,B,C') directly returns
a set of results.

> 
> > Inside C-language FullDisjunctions() function i repeatedly call, using
> > SPI:
> > SELECT * FROM Relation1;
> > SELECT * FROM Relation2;
> > SELECT * FROM Relation1 WHERE...;
> > SELECT * FROM Relation3;
> > ....
> 
> You would need to force all these operations to be done with the same
> snapshot; should be possible with SPI_execute_snapshot.  But really the
> above sounds like a toy prototype implementation to me.  Why aren't you
> building this as executor plan-tree machinery?

I actually use cursors, because I iterate repeatedly over the
"SELECT * FROM Relation1" queries using the FETCH_ALL technique.
Hopefully cursors use something similar to SPI_execute_snapshot?
(Maybe with the READ_ONLY option that I use; I see it uses something
called ActiveSnapshot.)
The WHERE queries, however, which are intended to exploit indices on
the relations, I must still execute repeatedly.

The reason is twofold:
- At this time I don't see any big advantage (aside from the schema)
in putting it in the parser and subsequently the executor.
- I want to stay within the time frame of the SoC.

I think that I should first have a stable contrib module that looks
acceptable before I continue to something more problematic to maintain.

We have a new paper, accepted to VLDB yesterday, that breaks the problem
down into smaller ones, uses iterators, and has polynomial delay, which
suits streaming. Hence the possibility of implementing it in the
planner, but that is too complex for the SoC. Let's have something
stable first.

> 
> > p.s.: In a different version of the function i create a temporary
> > relation and insert tuples in it, but it is exclusively used and
> > destroyed by the specific instance of that function.
> 
> Why?  You could use a tuplestore for transient data.

I do use tuplestore, but the other version needs an index, and you can't
put an index on a tuplestore. Unless you can give me a hint on how to
create a btree/hash index without a relation that can still be stored on
disk like a tuplestore, i.e. with all the data stored in the index. The
key is the whole tuple (the array of CTIDs) anyway.
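
To make the "key is the whole tuple" idea concrete, here is a minimal
sketch in plain C, outside PostgreSQL. It is only an in-memory stand-in
(a sorted array with binary search), not the on-disk btree/hash index
asked about above. Ctid, CtidKey, NRELS, key_cmp, key_index_build and
key_index_contains are all hypothetical names made up for illustration;
none of them is a PostgreSQL API, and the fixed count of three relations
is an assumption.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define NRELS 3  /* assumed: number of relations in the full disjunction */

/* Hypothetical stand-in for a CTID: block number plus offset in the block. */
typedef struct {
    uint32_t block;
    uint16_t offset;
} Ctid;

/* One result row, identified by one CTID per relation (the whole key).
 * A relation that contributes no tuple would need some convention of
 * its own; that case is not handled in this sketch. */
typedef struct {
    Ctid ctid[NRELS];
} CtidKey;

/* Lexicographic order over the CTID array; usable with qsort and bsearch. */
static int key_cmp(const void *pa, const void *pb)
{
    const CtidKey *a = pa;
    const CtidKey *b = pb;

    for (int i = 0; i < NRELS; i++) {
        if (a->ctid[i].block != b->ctid[i].block)
            return a->ctid[i].block < b->ctid[i].block ? -1 : 1;
        if (a->ctid[i].offset != b->ctid[i].offset)
            return a->ctid[i].offset < b->ctid[i].offset ? -1 : 1;
    }
    return 0;
}

/* Sort the keys once so that later lookups take O(log n) comparisons. */
void key_index_build(CtidKey *keys, size_t n)
{
    assert(keys != NULL);
    qsort(keys, n, sizeof *keys, key_cmp);
}

/* Return 1 if probe is present in the sorted array, 0 otherwise. */
int key_index_contains(const CtidKey *keys, size_t n, const CtidKey *probe)
{
    assert(keys != NULL && probe != NULL);
    return bsearch(probe, keys, n, sizeof *keys, key_cmp) != NULL;
}
```

Because the comparison covers the entire CTID array, nothing but the key
needs to be stored, which is the property wanted from the index; what
the sketch does not provide is spilling to disk the way a tuplestore
does.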

> 
>             regards, tom lane


