Re: Re: Reusing abbreviated keys during second pass of ordered [set] aggregates - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Re: Reusing abbreviated keys during second pass of ordered [set] aggregates
Date
Msg-id CA+Tgmob8C_gDeOywGqmpf2tghYR6qhGhf9Va54Ct4j=MMpQHZA@mail.gmail.com
Whole thread Raw
In response to Re: Re: Reusing abbreviated keys during second pass of ordered [set] aggregates  (Peter Geoghegan <pg@heroku.com>)
Responses Re: Re: Reusing abbreviated keys during second pass of ordered [set] aggregates  (Peter Geoghegan <pg@heroku.com>)
List pgsql-hackers
On Sat, Oct 10, 2015 at 9:03 PM, Peter Geoghegan <pg@heroku.com> wrote:
> On Fri, Sep 25, 2015 at 2:39 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
>> This needs a rebase, there are several conflicts in src/backend/executor/nodeAgg.c
>
> I attached a revised version of the second patch in the series, fixing
> this bitrot.
>
> I also noticed a bug in tuplesort's TSS_SORTEDONTAPE case with the
> previous patch, where no final on-the-fly merge step is required (no
> merge step is required whatsoever, because replacement selection
> managed to produce only one run). The function mergeruns() previously
> only "abandoned" abbreviated ahead of any merge step iff there was
> one. When there was only one run (not requiring a merge) it happened
> to continue to keep its state consistent with abbreviated keys still
> being in use. It didn't matter before, because abbreviated keys were
> only for tuplesort to compare, but that's different now.
>
> That bug is fixed in this revision by reordering things within
> mergeruns(). The previous order of the two things that were switched
> is not at all significant (I should know, I wrote that code).

I find the references to a "void" representation in this patch to be
completely opaque.  I see that there are some such references in
tuplesort.c already, and most likely they were put there by commits
that I did, so I guess I have nobody but myself to blame, but I don't
know what this means, and I don't think we should let this terminology
proliferate.

My understanding is that the "void" representation is intended to
whatever Datum we originally got, which might be a pointer.  Why not
just say that instead of referring to it this way?

My understanding is also that it's OK if the abbreviated key stays the
same even though the value has changed, but that the reverse could
cause queries to return wrong answers.  The first part of that
justifies why this is safe when no abbreviation is available: we'll
return an abbreviated value of 0 for everything, but that's fine.
However, using the original Datum (which might be a pointer) seems
unsafe, because two binary-identical values could be stored at
different addresses and thus have different pointer representations.

I'm probably missing something here, so clue me in...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: Re: Logical replication and multimaster
Next
From: Robert Haas
Date:
Subject: Re: [PROPOSAL] VACUUM Progress Checker.