Thread: Constraint Exclusion + Joins?

Constraint Exclusion + Joins?

From
"Brandon Black"
Date:
I was wondering (for planning purposes) if anyone knew the status of
constraint exclusions moving up to query runtime and working for
joins.  Is this something that's coming down the pipe in the
foreseeable future, or just on a back-burner to-do list, or probably
never happening, or... ?

I have a painful work-around for my particular case that's good enough
for now, but it would be helpful to know whether there's a good
probability I can convert my code to do things the easy way somewhere
in the foreseeable future if/when this feature goes in, or whether I
should consider design changes now before my problems grow.

Thanks,
Brandon


Re: Constraint Exclusion + Joins?

From
Tom Lane
Date:
"Brandon Black" <blblack@gmail.com> writes:
> I was wondering (for planning purposes) if anyone knew the status of
> constraint exclusions moving up to query runtime and working for
> joins.

The latter, done; the former, not on the radar screen IMHO.
        regards, tom lane


Re: Constraint Exclusion + Joins?

From
"Brandon Black"
Date:
On 4/27/06, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Brandon Black" <blblack@gmail.com> writes:
> > I was wondering (for planning purposes) if anyone knew the status of
> > constraint exclusions moving up to query runtime and working for
> > joins.
>
> The latter, done; the former, not on the radar screen IMHO.

I didn't really care about the former, I was just under the impression
it had to happen first based on some old list posts on the subject.

I dug around in CVS to have a look for this, and I did eventually find
it (well, I found the corresponding docs patch that removed the note
about not working for joins).  I see it's in MAIN but not in
8_1_STABLE.  Does that mean it's headed for 8.2.x when that comes
about? (Sorry, I'm not terribly familiar with how you guys handle all
of this).

Thanks,
-- Brandon


Re: Constraint Exclusion + Joins?

From
Heikki Linnakangas
Date:
On Fri, 28 Apr 2006, Brandon Black wrote:

> I dug around in CVS to have a look for this, and I did eventually find
> it (well, I found the corresponding docs patch that removed the note
> about not working for joins).  I see it's in MAIN but not in
> 8_1_STABLE.  Does that mean it's headed for 8.2.x when that comes
> about? (Sorry, I'm not terribly familiar with how you guys handle all
> of this).

Yes.

- Heikki


Re: Constraint Exclusion + Joins?

From
"Brandon Black"
Date:
On 4/30/06, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> On Fri, 28 Apr 2006, Brandon Black wrote:
>
> > I dug around in CVS to have a look for this, and I did eventually find
> > it (well, I found the corresponding docs patch that removed the note
> > about not working for joins).  I see it's in MAIN but not in
> > 8_1_STABLE.  Does that mean it's headed for 8.2.x when that comes
> > about? (Sorry, I'm not terribly familiar with how you guys handle all
> > of this).
>
> Yes.
>

Perhaps I'm confused about the meaning of the removal of the
JOINs-related caveat from the constraint exclusion docs in MAIN.  What
I was intending to ask about was constraint exclusion kicking in where
the constrained column is being joined to a column of another table,
with no constants involved.

For a contrived example:

--------------

CREATE TABLE basic (  basic_id INTEGER NOT NULL PRIMARY KEY,  basic_data TEXT
);

CREATE TABLE basic_sub1 ( PRIMARY KEY (basic_id), CHECK ( basic_id >= 0 AND basic_id < 100 )
) INHERITS (basic);

CREATE TABLE basic_sub2 ( PRIMARY KEY (basic_id), CHECK ( basic_id >= 100 AND basic_id < 200 )
) INHERITS (basic);

[...]

CREATE TABLE jstuff (   jstuff_id INTEGER NOT NULL PRIMARY KEY,   jstuff_data TEXT
);

EXPLAIN ANALYZE SELECT basic.* FROM basic JOIN jstuff ON
(basic.basic_id = jstuff.jstuff_id) WHERE jstuff_data = 'foo';

------------------

I tried things like the above with small test data sets against cvs
just now on my home machine, and constraint exclusion doesn't seem to
apply here (even if all of the joined jstuff rows have ids which only
match the constraint for basic_sub3, all basic_subX's seem to get
scanned, as is the case I'm seeing in my real code against 8.1.3).  Is
this sort of dynamic constraint exclusion on the radar?

Without it, some inheritance-based partitioning constructs which could
otherwise be written as a single query have to be done as loops with
seperate nested statements (to extract constants and then manually
plug them into the next statement down the chain), which  seems to
make it more difficult (or nearly impossible) to support varying
where/grouping/ordering/count of the resultant inner query rows from
client code efficiently without writing a seperate plpgsql function
for every possible variation.

-- Brandon


Re: Constraint Exclusion + Joins?

From
Tom Lane
Date:
"Brandon Black" <blblack@gmail.com> writes:
> Is this sort of dynamic constraint exclusion on the radar?

What sort of fantasy have you got in mind?

It doesn't seem likely to me that testing constraints explicitly against
each row from the other table would be a win compared to letting the
indexes do it.
        regards, tom lane


Re: Constraint Exclusion + Joins?

From
Simon Riggs
Date:
On Tue, 2006-05-02 at 00:27 -0500, Brandon Black wrote:

> I tried things like the above with small test data sets against cvs
> just now on my home machine, and constraint exclusion doesn't seem to
> apply here (even if all of the joined jstuff rows have ids which only
> match the constraint for basic_sub3, all basic_subX's seem to get
> scanned, as is the case I'm seeing in my real code against 8.1.3).  Is
> this sort of dynamic constraint exclusion on the radar?

(You should use life size data for your testing.)

If you think the plan you get can be improved, post the EXPLAIN (maybe
EXPLAIN ANALYZE) and say how you think it can be improved. It's a lot
easier to consider concrete requirements.

--  Simon Riggs              EnterpriseDB   http://www.enterprisedb.com



Re: Constraint Exclusion + Joins?

From
"Brandon Black"
Date:
On 5/2/06, Simon Riggs <simon@2ndquadrant.com> wrote:
> On Tue, 2006-05-02 at 00:27 -0500, Brandon Black wrote:
>
> > I tried things like the above with small test data sets against cvs
> > just now on my home machine, and constraint exclusion doesn't seem to
> > apply here (even if all of the joined jstuff rows have ids which only
> > match the constraint for basic_sub3, all basic_subX's seem to get
> > scanned, as is the case I'm seeing in my real code against 8.1.3).  Is
> > this sort of dynamic constraint exclusion on the radar?
>
> (You should use life size data for your testing.)
>
> If you think the plan you get can be improved, post the EXPLAIN (maybe
> EXPLAIN ANALYZE) and say how you think it can be improved. It's a lot
> easier to consider concrete requirements.
>

Perhaps I should rewind a bit here then since clearly, as Tom pointed
out, dynamic constraint exclusion is fantasy-land :)

My real life-size situation is a bit more complicated, hence my trying
to just break things down into a simple example for the earlier email,
as I didn't want to bog you down reading all the details.  It's really
easier to explain in english than trying to post my pages upon pages
of DDL and explain analyze outputs (but I can try to summarize some
good example cases of those if neccesary later from work):

In my real code, I've got a number of seperate tables logging
time-series numerical statistics of various kinds.  The PKs on these
tables are normally composed of two fields, a resource identifier
(there's about 4,500 resources), and a timestamp (the statistics come
in once per minute).  I'm using inheritance-based partitioning to
split each stat table up into 1-hour chunks, and I keep about 72 hours
of "current" data around (dropping old subtables and adding new ones
as appropriate), which means there's about ~270k rows per subtable,
~19.4 million for the whole set at any given time.  A few of the
(top-level, seperate) tables in this scheme have a third PK component
which is a sub-resource identifier, which there might be ~2-6 of per
resource, multiplying the row counts further in those cases.  The
timestamp field I'm partitioning on is called t_stamp, and it's just
an integer (unix epoch time, easier for my application).

For a lot of my queries on these tables, I already have an explicit
constant time-span to work with, so the constraint exclusion works
well.  For certain queries however, what I really want is to join into
these statistical tables for a number of resource_ids, and for only
the rows that are at the max(t_stamp) for each resource - the last
thing inserted for each resource (they aren't neccesarily in perfect
time synch with each other).

My (quite possibly flawed) current solution for this is to maintain a
seperate singular "resource_status" table which tracks the max
timestamp for each resource as the data rows are inserted (it gets
updated with the new timestamp being inserted if the new timestamp is
greater than the max_t_stamp I had recorded earlier - sometimes data
arruves not in timestamp order), so that I could quickly determine the
current per-resource max t_stamp.

I'm happy with the design in the overall, as my problematically huge
insert (and transaction) rate is humming along nicely, and most of my
queries (some of which are very complex) are running in excellent
time.  The only big performance issue I face now is those cases
described above, where I need to get one datum per resource from a
stat table for a number of resources using each individual resource's
max t_stamp.

When I do this by joining from the table where I recorded the max
t_stamps into the top-level stat table, the plan does index scans of
every child (obviously) to match the stamps, even though in the most
likely case all of the stamps fall in a narrow window within just 1-2
child tables.

By manually rewriting the query as a loop in plpgsql which first
extracts the max t_stamp per resource from my "status" table, then
inside the loop runs an individual select statement per-resource with
the t_stamp as a constant, I see huge performance increases over the
join (in the neighboorhood of an order of magnitude, but it varies up
or down in different cases), to the point where things are acceptably
fast.

As I mentioned though, this (the plpgsql looping construct) is
difficult to work with for dynamically constructed user queries (in
terms of further where-clause constraints, joins to other tables
afterwards, limit/offset counts, ordering, etc), whereas plugging
those things in is easy if the client code is executing a single
literal dynamic sql statement for the whole thing.

There aren't any ugly sequential scans going on here or anything in
the join case, it appears it's just the difference between checking
~72 indexes versus checking 1.  I could break up the tables in larger
chunks: if I did 12 hours instead of 1 hour, I'd be looking at more
like 6 subtables/indexes, but each growing up to the vicinity of 1.5-9
million (in some rare cases, perhaps 20 million) rows per table.  It
is quite difficult and time-consuming to experiment with this, but
clearly it's an avenue I need to look into.

Another option I considered was that on each statistic insert into the
big partitioned history table, I should also update a single row
per-resource in a seperate table (or the top-level table), which
tracks only the most recent data.  This decreased my
insertion-transaction performance substantially of course (I tried a
few different logical ways of doing it), so I've been trying to work
with the max_stamp storing method instead lately.  It could turn out
that the "update a table of latest-data when doing the inserts" is the
least painful way to pay for these queries of course.

And finally, a relatively-easy option I'm going to be trying out later
today is to have the client code do a dynamic "create function" to
make a custom plpgsql loop for itself, use it once, and toss it.  For
the current schema design this may turn out to be the best combination
of fast and flexible, even though it's a bit ugly.

Anyways, thanks for your time and efforts on postgres in general, I
love it and don't mean this to sound as "complaining" as it probably
does.  I'm just finding it hard to come up with a way out of this
latest corner I've painted myself into :)

-- Brandon


Re: Constraint Exclusion + Joins?

From
Hannu Krosing
Date:
Ühel kenal päeval, T, 2006-05-02 kell 00:27, kirjutas Brandon Black:
> On 4/30/06, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> > On Fri, 28 Apr 2006, Brandon Black wrote:
> >
> > > I dug around in CVS to have a look for this, and I did eventually find
> > > it (well, I found the corresponding docs patch that removed the note
> > > about not working for joins).  I see it's in MAIN but not in
> > > 8_1_STABLE.  Does that mean it's headed for 8.2.x when that comes
> > > about? (Sorry, I'm not terribly familiar with how you guys handle all
> > > of this).
> >
> > Yes.
> >
> 
> Perhaps I'm confused about the meaning of the removal of the
> JOINs-related caveat from the constraint exclusion docs in MAIN.  What
> I was intending to ask about was constraint exclusion kicking in where
> the constrained column is being joined to a column of another table,
> with no constants involved.
> 
> For a contrived example:
> 
> --------------
> 
> CREATE TABLE basic (
>    basic_id INTEGER NOT NULL PRIMARY KEY,
>    basic_data TEXT
> );
> 
> CREATE TABLE basic_sub1 (
>   PRIMARY KEY (basic_id),
>   CHECK ( basic_id >= 0 AND basic_id < 100 )
> ) INHERITS (basic);
> 
> CREATE TABLE basic_sub2 (
>   PRIMARY KEY (basic_id),
>   CHECK ( basic_id >= 100 AND basic_id < 200 )
> ) INHERITS (basic);
> 
> [...]
> 
> CREATE TABLE jstuff (
>     jstuff_id INTEGER NOT NULL PRIMARY KEY,
>     jstuff_data TEXT
> );

try putting a constraint on jstuff.jstuff_id so the CE mechanism has
something to work on.

> 
> EXPLAIN ANALYZE SELECT basic.* FROM basic JOIN jstuff ON
> (basic.basic_id = jstuff.jstuff_id) WHERE jstuff_data = 'foo';

------------
Hannu



Re: Constraint Exclusion + Joins?

From
kris.shannon@gmail.com
Date:


On 5/2/06, Brandon Black <blblack@gmail.com> wrote:
On 4/30/06, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> On Fri, 28 Apr 2006, Brandon Black wrote:
>
> > I dug around in CVS to have a look for this, and I did eventually find
> > it (well, I found the corresponding docs patch that removed the note
> > about not working for joins).  I see it's in MAIN but not in
> > 8_1_STABLE.  Does that mean it's headed for 8.2.x when that comes
> > about? (Sorry, I'm not terribly familiar with how you guys handle all
> > of this).
>
> Yes.
>

Perhaps I'm confused about the meaning of the removal of the
JOINs-related caveat from the constraint exclusion docs in MAIN.  What
I was intending to ask about was constraint exclusion kicking in where
the constrained column is being joined to a column of another table,
with no constants involved.

For a contrived example:

--------------

CREATE TABLE basic (
   basic_id INTEGER NOT NULL PRIMARY KEY,
   basic_data TEXT
);

CREATE TABLE basic_sub1 (
  PRIMARY KEY (basic_id),
  CHECK ( basic_id >= 0 AND basic_id < 100 )
) INHERITS (basic);

CREATE TABLE basic_sub2 (
  PRIMARY KEY (basic_id),
  CHECK ( basic_id >= 100 AND basic_id < 200 )
) INHERITS (basic);

[...]

CREATE TABLE jstuff (
    jstuff_id INTEGER NOT NULL PRIMARY KEY,
    jstuff_data TEXT
);

EXPLAIN ANALYZE SELECT basic.* FROM basic JOIN jstuff ON
(basic.basic_id = jstuff.jstuff_id) WHERE jstuff_data = 'foo';

------------------

If you only need 1 column from basic, then a subquery will do it for you:

SELECT jstuff_id AS basic_id, (SELECT basic_data FROM basic WHERE
basic.basic_id = jstuff.jstuff_id) FROM jstuff WHERE jstuff_data = 'foo';

If you need more than one column you can use ROW() constructors but that
gets pretty dirty.