Re: Re: Reusing abbreviated keys during second pass of ordered [set] aggregates - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: Re: Reusing abbreviated keys during second pass of ordered [set] aggregates
Date
Msg-id CAM3SWZSxakL6Uynep+sMXOPKVM23BEvQ37kzap1rhetwHdRkfg@mail.gmail.com
In response to Re: Re: Reusing abbreviated keys during second pass of ordered [set] aggregates  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Re: Reusing abbreviated keys during second pass of ordered [set] aggregates  (Peter Geoghegan <pg@heroku.com>)
Re: Re: Reusing abbreviated keys during second pass of ordered [set] aggregates  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On Wed, Dec 9, 2015 at 11:31 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> I find the references to a "void" representation in this patch to be
> completely opaque.  I see that there are some such references in
> tuplesort.c already, and most likely they were put there by commits
> that I did, so I guess I have nobody but myself to blame, but I don't
> know what this means, and I don't think we should let this terminology
> proliferate.
>
> My understanding is that the "void" representation is intended to be
> whatever Datum we originally got, which might be a pointer.  Why not
> just say that instead of referring to it this way?

That isn't what is intended. "void" is the state in which macros like
index_getattr() leave a leading attribute (the one that goes in the
SortTuple.datum1 field) when that attribute is NULL. However,
tuplesort_putdatum() now requires callers to initialize their Datum to
0, which is a new requirement. A "void" representation is a would-be
NULL pointer in the case of pass-by-value types, and a NULL pointer for
pass-by-reference types.
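As a rough illustration of the "void" state (a standalone sketch, not
code from the patch or from tuplesort.c): the Datum typedef below is a
stand-in for PostgreSQL's, and DemoSortTuple and demo_make_tuple() are
hypothetical names that only mirror the datum1/isnull1 idea.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for PostgreSQL's Datum: an integer wide enough to hold a pointer. */
typedef uintptr_t Datum;

/* Hypothetical, cut-down analogue of SortTuple: leading-key fields only. */
typedef struct DemoSortTuple
{
    Datum datum1;   /* leading key value, or the "void" value when NULL */
    bool  isnull1;  /* is the leading key NULL? */
} DemoSortTuple;

/*
 * Store a leading key. A NULL key always gets the all-zeros "void" Datum,
 * whatever the caller happened to pass in as the value.
 */
static DemoSortTuple
demo_make_tuple(Datum value, bool isnull)
{
    DemoSortTuple tup;

    /* The sketch assumes a Datum can round-trip a pointer. */
    assert(sizeof(Datum) >= sizeof(void *));

    tup.isnull1 = isnull;
    tup.datum1 = isnull ? (Datum) 0 : value;
    return tup;
}
```

With this, a NULL key reads as 0 when treated as a pass-by-value Datum
and as a NULL pointer when treated as pass-by-reference, so the field
never holds uninitialized data.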

> My understanding is also that it's OK if the abbreviated key stays the
> same even though the value has changed, but that the reverse could
> cause queries to return wrong answers.  The first part of that
> justifies why this is safe when no abbreviation is available: we'll
> return an abbreviated value of 0 for everything, but that's fine.
> However, using the original Datum (which might be a pointer) seems
> unsafe, because two binary-identical values could be stored at
> different addresses and thus have different pointer representations.
>
> I'm probably missing something here, so clue me in...

I think that you're missing that patch 0001 formally forbids
abbreviated keys that are pass-by-value, by revising the contract
(this is proposed for backpatch to 9.5 -- only comments are changed).
This is already something that is all but forbidden, although the
datum case does tacitly acknowledge the possibility by not allowing
abbreviation to work with the pass-by-value-and-yet-abbreviated case.
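To make the contract under discussion concrete, here is a minimal
sketch, assuming a simple prefix-packing abbreviation. It is not the
patch's code; demo_abbreviate() and demo_compare() are hypothetical
names. Equal abbreviated keys only mean "no decision yet" and fall
through to the authoritative comparator, while differing abbreviated
keys are trusted. That is why an address would be unsafe as an
abbreviated key: two binary-identical values stored at different
addresses would compare as unequal.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for PostgreSQL's Datum. */
typedef uintptr_t Datum;

/*
 * Hypothetical abbreviation: pack the first sizeof(Datum) bytes of the
 * string, most significant byte first and zero-padded, so that unsigned
 * integer order agrees with byte-wise order on that prefix.
 */
static Datum
demo_abbreviate(const char *s)
{
    Datum  key = 0;
    size_t len = strlen(s);

    for (size_t i = 0; i < sizeof(Datum); i++)
    {
        unsigned char c = (i < len) ? (unsigned char) s[i] : 0;

        key = (key << 8) | c;
    }
    return key;
}

/* Compare via abbreviated keys, falling back to the full comparison on ties. */
static int
demo_compare(const char *a, const char *b)
{
    Datum ka = demo_abbreviate(a);
    Datum kb = demo_abbreviate(b);

    if (ka != kb)
        return (ka < kb) ? -1 : 1;  /* keys differ: result is trusted */
    return strcmp(a, b);            /* tie: authoritative comparison decides */
}
```

Two copies of the same string in separate buffers get the same
abbreviated key here even though their addresses differ, and strings
that share a long prefix are resolved by the fallback comparison.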

I think that this revision is also useful for putting abbreviated keys
in indexes, something that may happen yet.

-- 
Peter Geoghegan


