Thanks for the quick reply, Tom.
Tom Lane wrote:
>"Phil Endecott" <spam_from_postgresql_general ( at ) chezphil ( dot ) org> writes:
>> I was not patient enough to wait for the remaining explain-analyse results,
>> but I feel that there is a linear slowdown of about 60x between the raw
>> query and the explain-analyse version.
>
> Slow gettimeofday() ... fairly common on desktop-grade PC hardware :-(.
It's actually a virtual machine, and I seem to recall reading something about the
virtualised gettimeofday() being slow. OK, that explains it. Thanks.
>> But the peculiar behaviour of explain-analyse is really a distraction from
>> the fact that the query is slow, especially when the limit value is large.
>
> You need a "hashed subplan" for NOT IN to work reasonably fast. The
> fact you're not getting one suggests you either have to raise work_mem,
> or you're using some weird datatype that doesn't support hashing.
It's an int, and yes, increasing work_mem makes it use a hashed subplan:
                                                           QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=4209.76..4213.61 rows=10 width=4) (actual time=5432.840..5461.518 rows=10 loops=1)
   ->  Seq Scan on messages  (cost=4209.76..11608.23 rows=19218 width=4) (actual time=5432.776..5460.859 rows=10 loops=1)
         Filter: (NOT (hashed subplan))
         SubPlan
           ->  Seq Scan on part_tsearch  (cost=0.00..4115.01 rows=37901 width=4) (actual time=0.390..2984.783 rows=37907 loops=1)
 Total runtime: 5468.817 ms
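(For the curious: work_mem can be raised for just the current session rather than in postgresql.conf, along these lines, where the value is illustrative rather than tuned:

    SET work_mem = 32768;   -- in kB; session-local, so other backends are unaffected

)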
So presumably work_mem must exceed some function of the size of the
table in the subquery. Is there some way to work that out? This (virtual)
machine doesn't have an enormous amount of RAM, so I like to keep settings
like this "as high as necessary but no higher".
If I understand it correctly, it is still doing a full sequential scan on
part_tsearch, which the LIMIT clause does nothing to cut short. So I'm
still seeing run times rather worse than I think should be possible.
Can it not step through the indexes, the way it does for a Merge Join,
until it has enough rows to satisfy the LIMIT, and then terminate?
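For comparison, that is the plan shape I would hope for from an anti-join
formulation along these lines (column names are illustrative guesses, not
my real schema):

    SELECT m.id
    FROM messages m
    LEFT JOIN part_tsearch p ON p.id = m.id
    WHERE p.id IS NULL      -- keep messages with no matching part_tsearch row
    LIMIT 10;

i.e. walk both ids in index order, emit the non-matches, and stop after ten.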
Thanks,
Phil.