
Re: [GENERAL] Query Using Massive Temp Space - Mailing list pgsql-general

From	Thomas Munro
Subject	Re: [GENERAL] Query Using Massive Temp Space
Date	November 22, 2017 00:18:40
Msg-id	CAEepm=3e9RKr3DCv0=nGWiNTPs7_SqfqLt=pc83M03kN0KuBkA@mail.gmail.com
In response to	Re: [GENERAL] Query Using Massive Temp Space (Tom Lane <tgl@sss.pgh.pa.us>)
Responses	Re: [GENERAL] Query Using Massive Temp Space
List	pgsql-general


On Wed, Nov 22, 2017 at 7:04 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Now, there's definitely something busted here; it should not have gone as
> far as 2 million batches before giving up on splitting.

I had been meaning to discuss this.  We only give up when a split
leaves a batch either entirely kept or entirely sent to a new batch
(i.e. splitting the batch resulted in one batch with the whole contents
and another empty batch).  If you have about 2 million evenly
distributed keys and an ideal hash function, and then you also have 42
billion keys that are the same (and exceed work_mem), we won't detect
extreme skew until the 2 million well behaved keys have been spread so
thin that the 42 billion keys are isolated in a batch on their own,
which we should expect to happen somewhere around 2 million batches.
I have wondered if our extreme skew detector needs to go off sooner.
I don't have a specific suggestion, but it could just be something
like 'you threw out or kept more than X% of the tuples'.
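
To make the argument above concrete, here is a rough simulation sketch in
Python.  It is not the executor's code: the function name, the
`threshold` parameter and the assumption that each doubling simply
examines the next hash bit are all made up for illustration.  It follows
only the batch that holds the repeated key and reports how many batches
exist when the split of that batch is first judged useless.

```python
import random


def batches_until_giving_up(n_distinct, heavy_count, threshold=1.0, seed=0):
    """Toy model, NOT PostgreSQL's implementation.

    n_distinct  -- well-behaved keys, one tuple each, ideal random hashes
    heavy_count -- number of tuples that all share one key
    threshold   -- hypothetical 'X%' knob: give up once at least this
                   fraction of the batch lands on the same side of a split;
                   1.0 models "only when one side is empty"
    Returns the number of batches at the moment splitting is abandoned.
    """
    hash_bits = 64
    rng = random.Random(seed)
    heavy_hash = rng.getrandbits(hash_bits)
    # Hashes of the well-behaved keys still sharing a batch with the heavy key.
    sharing = [rng.getrandbits(hash_bits) for _ in range(n_distinct)]

    nbatch = 1
    while nbatch < (1 << hash_bits):
        bit = nbatch  # assumed: doubling nbatch looks at the next hash bit
        # Tuples whose bit matches the heavy key stay in its batch.
        stay = [h for h in sharing if (h & bit) == (heavy_hash & bit)]
        total = heavy_count + len(sharing)
        same_side = heavy_count + len(stay)
        nbatch *= 2
        if same_side >= threshold * total:
            return nbatch  # split judged useless: stop growing
        sharing = stay
    return nbatch


if __name__ == "__main__":
    # Scaled down from the thread's numbers so it runs quickly.
    print(batches_until_giving_up(1 << 12, 10**6))
    print(batches_until_giving_up(1 << 12, 10**6, threshold=0.95))
```

With `threshold=1.0` the result should come out in the same rough
neighbourhood as the number of well-behaved keys, which is the effect
described above.  With a lower threshold such as 0.95, and a heavy key
that dominates the batch, the very first split already trips the check.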

-- 
Thomas Munro
http://www.enterprisedb.com
