
From Brandon Black
Subject Re: Constraint Exclusion + Joins?
Date
Msg-id 84621a60605020553y219a85d3pa3d1734f20b70ba@mail.gmail.com
In response to Constraint Exclusion + Joins?  ("Brandon Black" <blblack@gmail.com>)
List pgsql-hackers
On 5/2/06, Simon Riggs <simon@2ndquadrant.com> wrote:
> On Tue, 2006-05-02 at 00:27 -0500, Brandon Black wrote:
>
> > I tried things like the above with small test data sets against cvs
> > just now on my home machine, and constraint exclusion doesn't seem to
> > apply here (even if all of the joined jstuff rows have ids which only
> > match the constraint for basic_sub3, all basic_subX's seem to get
> > scanned, as is the case I'm seeing in my real code against 8.1.3).  Is
> > this sort of dynamic constraint exclusion on the radar?
>
> (You should use life size data for your testing.)
>
> If you think the plan you get can be improved, post the EXPLAIN (maybe
> EXPLAIN ANALYZE) and say how you think it can be improved. It's a lot
> easier to consider concrete requirements.
>

Perhaps I should rewind a bit here, then, since clearly, as Tom pointed
out, dynamic constraint exclusion is fantasy-land :)

My real life-size situation is a bit more complicated, hence my trying
to break things down into a simple example for the earlier email, as I
didn't want to bog you down reading all the details.  It's easier to
explain in English than to post my pages upon pages of DDL and EXPLAIN
ANALYZE output (but I can try to summarize some good example cases of
those later from work if necessary):

In my real code, I've got a number of separate tables logging
time-series numerical statistics of various kinds.  The PKs on these
tables are normally composed of two fields: a resource identifier
(there are about 4,500 resources) and a timestamp (the statistics come
in once per minute).  I'm using inheritance-based partitioning to
split each stat table up into 1-hour chunks, and I keep about 72 hours
of "current" data around (dropping old subtables and adding new ones
as appropriate), which means roughly 270k rows per subtable and ~19.4
million for the whole set at any given time.  A few of the (top-level,
separate) tables in this scheme have a third PK component, a
sub-resource identifier, of which there might be ~2-6 per resource,
multiplying the row counts further in those cases.  The timestamp
field I'm partitioning on is called t_stamp, and it's just an integer
(unix epoch time, easier for my application).
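
To make that concrete, a stripped-down sketch of one of these tables
and one of its hourly children looks roughly like this (table/column
names and boundary values are made up for illustration, not my actual
DDL):

    CREATE TABLE stat_basic (
        resource_id  integer NOT NULL,   -- ~4,500 of these
        t_stamp      integer NOT NULL,   -- unix epoch seconds
        val          numeric,
        PRIMARY KEY (resource_id, t_stamp)
    );

    -- one child per hour; the CHECK constraint on t_stamp is what
    -- constraint exclusion keys off of
    CREATE TABLE stat_basic_20060502_05 (
        CHECK (t_stamp >= 1146546000 AND t_stamp < 1146549600)
    ) INHERITS (stat_basic);

    -- indexes aren't inherited, so the two-part PK gets re-declared
    -- on each child
    ALTER TABLE stat_basic_20060502_05
        ADD PRIMARY KEY (resource_id, t_stamp);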

For a lot of my queries on these tables, I already have an explicit
constant time-span to work with, so the constraint exclusion works
well.  For certain queries, however, what I really want is to join
into these statistical tables for a number of resource_ids, and for
only the rows that are at the max(t_stamp) for each resource - the
last thing inserted for each resource (they aren't necessarily in
perfect time sync with each other).
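
The first kind of query is just something like this (constants
invented for the example), and it excludes down to the one or two
matching children as expected:

    SELECT resource_id, t_stamp, val
      FROM stat_basic
     WHERE resource_id = 1234
       AND t_stamp >= 1146546000
       AND t_stamp <  1146549600;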

My (quite possibly flawed) current solution for this is to maintain a
separate singular "resource_status" table which tracks the max
timestamp for each resource as the data rows are inserted (it gets
updated with the new timestamp being inserted if that timestamp is
greater than the max_t_stamp recorded earlier - sometimes data arrives
out of timestamp order), so that I can quickly determine the current
per-resource max t_stamp.
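
Roughly (with the same simplified names as above), the status table
and the statement that maintains it alongside each stat insert look
like:

    CREATE TABLE resource_status (
        resource_id  integer PRIMARY KEY,
        max_t_stamp  integer NOT NULL
    );

    -- only ever moves the stamp forward, since rows can arrive
    -- out of timestamp order
    UPDATE resource_status
       SET max_t_stamp = 1146548940
     WHERE resource_id = 1234
       AND max_t_stamp < 1146548940;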

I'm happy with the design overall, as my problematically huge insert
(and transaction) rate is humming along nicely, and most of my queries
(some of which are very complex) are running in excellent time.  The
only big performance issue I face now is the cases described above,
where I need to get one datum per resource from a stat table for a
number of resources, using each individual resource's max t_stamp.

When I do this by joining from the table where I recorded the max
t_stamps into the top-level stat table, the plan does index scans of
every child (obviously) to match the stamps, even though in the most
likely case all of the stamps fall in a narrow window within just 1-2
child tables.
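
In other words, something along the lines of:

    -- EXPLAIN shows an index scan on every stat_basic child here,
    -- even when all the max stamps land in one or two of them
    SELECT s.resource_id, s.t_stamp, s.val
      FROM resource_status rs
      JOIN stat_basic s
        ON s.resource_id = rs.resource_id
       AND s.t_stamp     = rs.max_t_stamp
     WHERE rs.resource_id IN (1234, 1235, 1236);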

By manually rewriting the query as a loop in plpgsql which first
extracts the max t_stamp per resource from my "status" table, then
inside the loop runs an individual select statement per-resource with
the t_stamp as a constant, I see huge performance increases over the
join (in the neighborhood of an order of magnitude, though it varies
up or down in different cases), to the point where things are
acceptably fast.
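
Boiled down, the loop is shaped something like this (simplified; the
real thing has more columns and conditions, and the EXECUTE is just to
show the stamp landing in the query text as a plain constant, which is
what lets exclusion pick the one matching child):

    CREATE FUNCTION latest_stats(int[]) RETURNS SETOF stat_basic AS $$
    DECLARE
        rs  record;
        r   stat_basic%ROWTYPE;
    BEGIN
        FOR rs IN SELECT resource_id, max_t_stamp
                    FROM resource_status
                   WHERE resource_id = ANY ($1)
        LOOP
            -- per-resource select with the stamp as a constant, so
            -- only the one matching child gets scanned
            FOR r IN EXECUTE 'SELECT * FROM stat_basic'
                          || ' WHERE resource_id = ' || rs.resource_id
                          || '   AND t_stamp = '     || rs.max_t_stamp
            LOOP
                RETURN NEXT r;
            END LOOP;
        END LOOP;
        RETURN;
    END;
    $$ LANGUAGE plpgsql;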

As I mentioned, though, this (the plpgsql looping construct) is
difficult to work with for dynamically constructed user queries (in
terms of further where-clause constraints, joins to other tables
afterwards, limit/offset counts, ordering, etc.), whereas plugging
those things in is easy if the client code is executing a single
literal dynamic SQL statement for the whole thing.

There aren't any ugly sequential scans going on in the join case; it
appears to be just the difference between checking ~72 indexes versus
checking 1.  I could break the tables up into larger chunks: if I did
12 hours instead of 1 hour, I'd be looking at more like 6
subtables/indexes, but each growing to the vicinity of 1.5-9 million
(in some rare cases, perhaps 20 million) rows per table.  It's quite
difficult and time-consuming to experiment with this, but clearly
it's an avenue I need to look into.

Another option I considered was that on each statistic insert into the
big partitioned history table, I should also update a single row
per-resource in a separate table (or the top-level table) which tracks
only the most recent data.  This decreased my insertion-transaction
performance substantially of course (I tried a few different logical
ways of doing it), so lately I've been working with the max_stamp
storing method instead.  It could still turn out that "update a table
of latest-data when doing the inserts" is the least painful way to pay
for these queries.
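
That is, something like this (sketch only), where the second table
holds the single most recent row per resource and gets written
alongside every insert into the partitioned table:

    CREATE TABLE stat_basic_latest (
        resource_id  integer PRIMARY KEY,
        t_stamp      integer NOT NULL,
        val          numeric
    );

    -- issued with every stat insert, guarding against out-of-order
    -- arrivals as before
    UPDATE stat_basic_latest
       SET t_stamp = 1146548940, val = 42.0
     WHERE resource_id = 1234
       AND t_stamp < 1146548940;

Queries for the latest data would then read this table directly and
never touch the partitioned set at all - the cost just moves to the
insert path.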

And finally, a relatively easy option I'm going to try out later
today is to have the client code do a dynamic "create function" to
make a custom plpgsql loop for itself, use it once, and toss it.  For
the current schema design this may turn out to be the best combination
of fast and flexible, even though it's a bit ugly.
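
Concretely, the client would build and send something like the
following as one batch of literal SQL, with the user's extra
conditions, ordering, limits, etc. spliced into the strings before
sending (sketch only, made-up names and values):

    -- same loop shape as the hand-written version above, but
    -- generated per query
    CREATE FUNCTION tmp_latest_q42() RETURNS SETOF stat_basic AS $$
    DECLARE
        rs  record;
        r   stat_basic%ROWTYPE;
    BEGIN
        FOR rs IN SELECT resource_id, max_t_stamp
                    FROM resource_status
                   WHERE resource_id IN (1234, 1235, 1236)
        LOOP
            FOR r IN EXECUTE 'SELECT * FROM stat_basic'
                          || ' WHERE resource_id = ' || rs.resource_id
                          || '   AND t_stamp = '     || rs.max_t_stamp
                          || '   AND val > 100'   -- user-supplied filter
            LOOP
                RETURN NEXT r;
            END LOOP;
        END LOOP;
        RETURN;
    END;
    $$ LANGUAGE plpgsql;

    SELECT * FROM tmp_latest_q42() ORDER BY t_stamp DESC LIMIT 50;

    DROP FUNCTION tmp_latest_q42();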

Anyways, thanks for your time and efforts on postgres in general, I
love it and don't mean this to sound as "complaining" as it probably
does.  I'm just finding it hard to come up with a way out of this
latest corner I've painted myself into :)

-- Brandon

