On Tue, Apr 23, 2019 at 03:43:48PM -0500, Justin Pryzby wrote:
>On Tue, Apr 23, 2019 at 04:37:50PM -0400, Gunther wrote:
>> On 4/21/2019 23:09, Tomas Vondra wrote:
>> >What I think might work better is the attached v2 of the patch, with a
>> Thanks for this, and I am trying this now.
>...
>> Aaaaaand, it's a winner!
>>
>> Unique (cost=5551524.36..5554207.33 rows=34619 width=1197) (actual time=6150303.060..6895451.210 rows=435274
loops=1)
>> -> Sort (cost=5551524.36..5551610.91 rows=34619 width=1197) (actual time=6150303.058..6801372.192 rows=113478386
loops=1)
>> Sort Method: external merge Disk: 40726720kB
>>
>> For the first time this query has succeeded now. Memory was bounded. The
>> time of nearly hours is crazy, but things sometimes take that long
>
>It wrote 40GB tempfiles - perhaps you can increase work_mem now to improve the
>query time.
>
That's unlikely to reduce the amount of data written to temporary files,
it just means there will be fewer larger files - in total it's still
going to be ~40GB. And it's not guaranteed it'll improve performance,
because work_mem=4MB might fit into CPU caches and larger values almost
certainly won't. I don't think there's much to gain, really.
>We didn't address it yet, but your issue was partially caused by a misestimate.
>It's almost certainly because these conditions are correlated, or maybe
>redundant.
>
Right. Chances are that with a bettwe estimate the optimizer would pick
merge join instead. I wonder if that would be significantly faster.
>> Merge Cond: (((documentinformationsubject.documentinternalid)::text =
>> (documentinformationsubject_1.documentinternalid)::text) AND
>> ((documentinformationsubject.documentid)::text =
>> (documentinformationsubject_1.documentid)::text) AND
>> ((documentinformationsubject.actinternalid)::text =
>> (documentinformationsubject_1.actinternalid)::text))
>
>If they're completely redundant and you can get the same result after
>dropping one or two of those conditions, then you should.
>
>Alternately, if they're correlated but not redundant, you can use PG10
>"dependency" statistics (CREATE STATISTICS) on the correlated columns
>(and ANALYZE).
>
That's not going to help, because we don't use functional dependencies
in join estimation yet.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services