Re: Hash Anti Join performance degradation - Mailing list pgsql-performance

From Craig Ringer
Subject Re: Hash Anti Join performance degradation
Date
Msg-id 4DDC6109.7060708@postnewspapers.com.au
Whole thread Raw
In response to Re: Hash Anti Join performance degradation  (panam <panam@gmx.net>)
Responses Re: Hash Anti Join performance degradation
List pgsql-performance
On 24/05/11 22:34, panam wrote:

>> The usual cause is that the statistics for estimated row counts cross a
>> threshold that makes the query planner think that a different kind of
>> plan will be faster.
>
> Hm, as far as i understand the plans, they are equivalent, aren't they?

Yes, they are, and the estimates are too. This isn't the usual case
where the planner trips over a threshold and switches to a totally
different plan type that it thinks is faster, but isn't.

The estimates are actually IDENTICAL for the hash anti-join node of
interest, and so are the actual loop count and row count. Temp file
activity is much the same across both plans too.

You can reproduce this behaviour consistently? It's _seriously_ weird,
and the sort of thing that when I encounter myself I tend to ask "what
else is going on that I'm missing?".

What happens if you DELETE more rows? Or fewer? What's the threshold?

What happens if you DELETE rows from the start not the end, or a random
selection?

Does the problem persist if you DELETE the rows then CLUSTER the table
before running the query?

Does the problem persist if you DELETE the rows then REINDEX?

>> If the query planner is using bad information about the performance of
>> the storage, then it will be making bad decisions about which approach
>> is faster. [snip]
>
> Will this make any difference even when the plans are equivalent as assumed
> above?

Nope. It doesn't seem to be a problem with plan selection.

> This speeds up the query by a factor of ~27. (207033.081 (bad) vs. 7683.978
> (good)).

That's a serious WTF.

> @Cédric
>> did you have log of vacuum and checkpoint activity ?
>> (no vacuum full/cluster or such thing running ?)
> There is no clustering involved here, its a pretty basic setup.

He means 'CLUSTER', the SQL command that tells PostgreSQL to re-organize
a table.

The answer from the rest of your post would appear to be 'no, it's being
run in an otherwise-idle stand-alone test environment'. Right?

> How can I obtain the information you require here? I could send you the
> output of the analyse vacuum command from pgAdmin, but is there a way to
> make it output the information in English (rather than German)?

It's easy enough to read familiar output like that in German, if needs be.

--
Craig Ringer

pgsql-performance by date:

Previous
From: Stefan Keller
Date:
Subject: Re: FW: KVP table vs. hstore - hstore performance (Was: Postgres NoSQL emulation)
Next
From: Santhakumaran
Date:
Subject: Re: Performance degradation of inserts when database size grows