Re: In progress INSERT wrecks plans on table - Mailing list pgsql-performance

From Gavin Flower
Subject Re: In progress INSERT wrecks plans on table
Date
Msg-id 5182D53B.4050708@archidevsys.co.nz
Whole thread Raw
In response to Re: In progress INSERT wrecks plans on table  (Simon Riggs <simon@2ndQuadrant.com>)
List pgsql-performance
On 03/05/13 00:27, Simon Riggs wrote:
On 2 May 2013 01:49, Mark Kirkwood <mark.kirkwood@catalyst.net.nz> wrote:
On 02/05/13 02:06, Tom Lane wrote:
Mark Kirkwood <mark.kirkwood@catalyst.net.nz> writes:
I am concerned that the deafening lack of any replies to my original
message is a result of folk glancing at your original quick reply and
thinking... incomplete problem spec...ignore... when that is not that
case - yes I should have muttered "9.2" in the original email, but we
have covered that now.
No, I think it's more that we're trying to get to beta, and so anything
that looks like new development is getting shuffled to folks' "to
look at later" queues.  The proposed patch is IMO a complete nonstarter
anyway; but I'm not sure what a less bogus solution would look like.

Yeah, I did think that beta might be consuming everyone's attention (of
course immediately *after* sending the email)!

And yes, the patch was merely to illustrate the problem rather than any
serious attempt at a solution.
I think we need a problem statement before we attempt a solution,
which is what Tom is alluding to.

ISTM that you've got a case where the plan is very sensitive to a
table load. Which is a pretty common situation and one that can be
solved in various ways. I don't see much that Postgres can do because
it can't know ahead of time you're about to load rows. We could
imagine an optimizer that set thresholds on plans that caused the
whole plan to be recalculated half way thru a run, but that would be a
lot of work to design and implement and even harder to test. Having
static plans at least allows us to discuss what it does after the fact
with some ease.

The plan is set using stats that are set when there are very few
non-NULL rows, and those increase massively on load. The way to cope
is to run the ANALYZE immediately after the load and then don't allow
auto-ANALYZE to reset them later.

--Simon Riggs                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Would be practicable to have a facility for telling Postgres in advance what you intend to do, so it can create plans accordingly?

I won't try and invent syntax, but it would be good to tell the system that you intend to:
insert a million rows sequentially
or
insert ten million rows randomly
or
update two million rows in the primary key range 'AA00000' to 'PP88877'
etc.

Though, sometime it may be useful to give a little more detail,especially if you have a good estimate of the distribution of primary keys, e.g.:
AA000000
        20%
AA000456
         0%
KE700999
        30%
NN400005
        35%
PA000001

        15%
PP808877


I figure that if the planner had more information about what one intends to do, then it could combine that with the statistics it knows, to come up with a more realistic plan.


Cheers,
Gavin



pgsql-performance by date:

Previous
From: Anne Rosset
Date:
Subject: Deterioration in performance when query executed in multi threads
Next
From: Tom Lane
Date:
Subject: Re: RT3.4 query needed a lot more tuning with 9.2 than it did with 8.1