On Sun, 4 Jan 2026 at 18:05, Jacob Jackson <jej.jackson.08@gmail.com> wrote:
> I have seen the issue pop up a few times when the unique constraint is
> across multiple columns and the join is only on one of those columns
> (e.g. https://www.postgresql.org/message-id/CAAiQw3yBPrCw6ZLeTwVS4QhKDWgJkmmp9LnGPdodxeQmn=kqVg@mail.gmail.com),
> and a constant filter is on the other column. I think what happens is
> that this introduces potential for error into the sample because
> Postgres can now come across more duplicates of the join key than
> expected, reducing n_distinct, or could come across more rows with the
> constant filtered value, thus increasing its predicted frequency (in
> the case I linked, the constant's frequency was stored in the columns
> MCV list), leading to a nonzero hit ratio as the cardinality
> estimation/estcalls > ndistinct. However, I am not certain this is the
> case (while there probably wouldn't need to be much of an asymmetry to
> cause memorization given the high cost in the planner for extra index
> scans, it still seems odd that stats could be off enough to enable
> this because of sampling alone). Maybe there is a statistics bug at
> play? I am not certain.
I've managed to reverse engineer some tables for this base on the
query you posted in the other thread. I didn't spend enough time to
figure out the exact row counts to insert to get the Memoize plan by
default, but I can get it from disabling merge and hash joins. This
doesn't mean it's not an issue as ideally the Memoize costing would
realise all lookups are unique and opt to not Memoize due to no repeat
lookups.
What seems to be happening is in estimate_num_groups(), we don't set
isunique == true because the unique index search in has_unique_index()
only considers very simplistic cases where there's a single column
unique index on the Var that's being looked up. Later in
estimate_num_groups(), because we have some other base quals doing
some filtering, the "if (reldistinct > 0 && rel->rows < rel->tuples)"
kicks in to attempt to estimate the number of distinct values
accounting for the other qual (user = 0, in this case). That's fairly
generic code that only does calculations based on the distinction
between rel->tuples and rel->rows, so it doesn't know about the
guarantees of uniqueness.
For a fix, I suppose has_unique_index() could be expanded to find a
unique index which mentions the particular Var and ensures that all
other columns that exist in the index have EquivalenceClasses with an
ec_has_const, but that would make that function much more expensive
than it is today and it still wouldn't cover cases where there is
another parameter in the Memoize lookup that matches to one of the
other unique index's columns. That makes me think that the UniqueKey
stuff might be the solution to these problems, as we'd easily be able
to determine which unique properties hold true at any level of the
join. That's a very complex patch, however.
David