Thread: ResultCache cache error: "cache entry already complete" in 14beta1

ResultCache cache error: "cache entry already complete" in 14beta1

From: David Christensen
Date:
Hi -list,

There is an issue with the new ResultCache machinery that we found in the wild, with a reproducible test case here:

CREATE TEMP TABLE fail (a INT PRIMARY KEY);

SELECT att.attname
FROM pg_index i
JOIN pg_class d ON d.oid = i.indrelid
JOIN generate_series(1, 32) col(i) ON col.i <= i.indnatts
LEFT JOIN pg_attribute att ON att.attrelid = d.oid AND att.attnum = i.indkey[col.i - 1]
WHERE d.relname = 'fail';

ERROR:  cache entry already complete

The EXPLAIN plan in question:

                                                      QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
 Nested Loop Left Join  (cost=8.59..55.26 rows=11 width=64)
   Join Filter: (att.attnum = i.indkey[(col.i - 1)])
   ->  Nested Loop  (cost=8.30..14.98 rows=11 width=35)
         Join Filter: (col.i <= i.indnatts)
         ->  Hash Join  (cost=8.30..14.26 rows=1 width=33)
               Hash Cond: (i.indrelid = d.oid)
               ->  Seq Scan on pg_index i  (cost=0.00..5.55 rows=155 width=33)
               ->  Hash  (cost=8.29..8.29 rows=1 width=4)
                     ->  Index Scan using pg_class_relname_nsp_index on pg_class d  (cost=0.27..8.29 rows=1 width=4)
                           Index Cond: (relname = 'fail'::name)
         ->  Function Scan on generate_series col  (cost=0.00..0.32 rows=32 width=4)
   ->  Result Cache  (cost=0.29..20.34 rows=7 width=70)
         Cache Key: d.oid
         ->  Index Scan using pg_attribute_relid_attnum_index on pg_attribute att  (cost=0.28..20.33 rows=7 width=70)
               Index Cond: (attrelid = d.oid)
(15 rows)

Git bisect identified commit 9eacee2e62d89cab7b004f97c206c4fba4f1d745, where the ResultCache machinery was introduced, as the first bad commit.  I have not investigated this further, but can help look into it.

Best,

David

Re: ResultCache cache error: "cache entry already complete" in 14beta1

From: David Rowley
Date:
On Fri, 21 May 2021 at 09:57, David Christensen
<david.christensen@crunchydata.com> wrote:
> CREATE TEMP TABLE fail (a INT PRIMARY KEY);
>
> SELECT att.attname
> FROM pg_index i
> JOIN pg_class d ON d.oid = i.indrelid
> JOIN generate_series(1, 32) col(i) ON col.i <= i.indnatts
> LEFT JOIN pg_attribute att ON att.attrelid = d.oid AND att.attnum = i.indkey[col.i - 1]
> WHERE d.relname = 'fail';
>
> ERROR:  cache entry already complete

Thanks for reporting this.

I've just had a quick look so far. EXPLAIN VERBOSE shows that for some
reason the join to pg_attribute.attrelid is unique.

 Nested Loop Left Join  (cost=8.59..55.40 rows=11 width=64)
   Output: att.attname
   Inner Unique: true

I'm not quite sure why it came to that conclusion; there's certainly
no unique index on that column.

I'll keep looking to see what the problem is there.

David



Re: ResultCache cache error: "cache entry already complete" in 14beta1

From: David Rowley
Date:
On Fri, 21 May 2021 at 10:22, David Rowley <dgrowleyml@gmail.com> wrote:
> I've just had a quick look so far. EXPLAIN VERBOSE shows that for some
> reason the join to pg_attribute.attrelid is unique.
>
>  Nested Loop Left Join  (cost=8.59..55.40 rows=11 width=64)
>    Output: att.attname
>    Inner Unique: true
>
> I'm not quite sure why it came to that conclusion, there's certainly
> no unique index on that column.

It looks like I've incorrectly assumed that if the join is unique then
we can always mark cache entries as complete after getting 1 row.

The problem is that the join condition to pg_attribute is:
att.attrelid = d.oid AND att.attnum = i.indkey[col.i - 1] but we only
parameterize the attrelid, as you can see in:

       ->  Index Scan using pg_attribute_relid_attnum_index on
pg_attribute att  (cost=0.28..20.33 rows=7 width=70)
               Index Cond: (attrelid = d.oid)

This means that the cache can receive multiple records, despite the
join being inner_unique == true.  The additional rows are filtered out
later by the join filter.  (Join Filter: (att.attnum =
i.indkey[(col.i - 1)]))

I'll need to add a check to ensure the parameters include the entire
join condition when determining if the cache can be used in
"singlerow" mode.
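[Editorial aside, not part of the original message: the failure mode described above can be sketched with a toy cache in Python. This is an illustrative simulation, not PostgreSQL source; the names `CacheEntry` and `fetch_inner` are invented. Like the pre-fix code, it marks an entry complete after one row whenever the join is unique, even though the cache key (attrelid) covers only part of the join condition, so a second inner row for the same key trips the error.]

```python
# Toy simulation (hypothetical names) of the pre-fix Result Cache behavior.

class CacheEntry:
    def __init__(self):
        self.rows = []
        self.complete = False

def fetch_inner(cache, key, inner_rows, singlerow):
    """Feed inner rows for one parameter value into the cache."""
    entry = cache.setdefault(key, CacheEntry())
    for row in inner_rows:
        if entry.complete:
            # Mirrors the "cache entry already complete" error in the report
            raise RuntimeError("cache entry already complete")
        entry.rows.append(row)
        if singlerow:
            # Premature: the join filter (att.attnum = ...) hasn't run yet,
            # so more rows for this key may still arrive.
            entry.complete = True
    return entry.rows

cache = {}
# Two pg_attribute rows share the same attrelid but differ on attnum;
# only attrelid is the cache key, so the second row trips the error.
try:
    fetch_inner(cache, 16384, [("a", 1), ("b", 2)], singlerow=True)
except RuntimeError as e:
    print(e)   # cache entry already complete
```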

I've attached a very quick patch which appears to fix the problem.
I'll need to take a bit more time to check if that's a safe way to
check if the entire join clause is parameterized or not.

(It would also be nice if the remaining join condition could be moved
into the index filter condition as this would save having to cache
rows we'll never need. I'm not going to fix that here though)

David


Re: ResultCache cache error: "cache entry already complete" in 14beta1

From: David Rowley
Date:
On Fri, 21 May 2021 at 11:05, David Rowley <dgrowleyml@gmail.com> wrote:
> I'll need to take a bit more time to check if that's a safe way to
> check if the entire join clause is parameterized or not.

I think it's fine to check the list_length() to see if any quals are
missing from the param info. This is very similar to what's done
already in generate_mergejoin_paths() to check if any join quals are
missing from the merge quals.
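[Editorial aside, not part of the original message: the list_length()-style check described above can be sketched as follows. This is an illustrative Python sketch with invented names, not the actual planner structures: the idea is that single-row mode is only safe when every join qual is covered by the cache parameterization.]

```python
# Hypothetical sketch of the completeness check: compare the number of
# parameterized quals against the number of join quals, analogous to the
# list_length() comparison in generate_mergejoin_paths().

def safe_for_singlerow(join_quals, param_quals):
    # If any join qual is missing from the parameterization, extra inner
    # rows may reach the cache for one key, so single-row mode is unsafe.
    return len(param_quals) == len(join_quals)

join_quals  = ["att.attrelid = d.oid", "att.attnum = i.indkey[col.i - 1]"]
param_quals = ["att.attrelid = d.oid"]   # only attrelid is parameterized

print(safe_for_singlerow(join_quals, param_quals))   # False -> no singlerow
```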

I've attached a cleaned-up patch.

David


Re: ResultCache cache error: "cache entry already complete" in 14beta1

From: David Christensen
Date:
On Thu, May 20, 2021 at 6:35 PM David Rowley <dgrowleyml@gmail.com> wrote:
> On Fri, 21 May 2021 at 11:05, David Rowley <dgrowleyml@gmail.com> wrote:
> > I'll need to take a bit more time to check if that's a safe way to
> > check if the entire join clause is parameterized or not.
>
> I think it's fine to check the list_length() to see if any quals are
> missing from the param info. This is very similar to what's done
> already in generate_mergejoin_paths() to check if any join quals are
> missing from the merge quals.
>
> I've attached a cleaned-up patch.

I can confirm that this fixes the issue in our case (both in the actual query and in the minimal reproduction case).

Thanks!
 

Re: ResultCache cache error: "cache entry already complete" in 14beta1

From: David Rowley
Date:
On Sat, 22 May 2021 at 02:00, David Christensen
<david.christensen@crunchydata.com> wrote:
> I can confirm that this fixes the issue in our case (both in the actual query and in the minimal reproduction case).

Thank you for checking that.

I looked at the patch again and realised that if we don't make the
result cache singlerow == true for a unique join, then, since Nested
Loop just skips to the next outer tuple after it matches an inner
tuple, the only chance we'll get to mark a cache entry as complete
will be for outer tuples that have no matching inner tuple.  That's
the only time we'll run the inner scan to completion with unique
joins.  That would mean we'd never get any cache hits for sets of
parameters that do have some matching inner rows.  Remember, a cache
hit can only use the cache entry if the entry is marked as complete;
otherwise, there might be missing tuples.  Such a scenario might be
common with ANTI joins, but since we don't currently detect unique
joins for those, that's a case we'll never hit.
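[Editorial aside, not part of the original message: the trade-off described above can be simulated with a toy probe loop. This is an illustrative Python sketch with invented names: with a unique join the nested loop returns after the first matching inner row, so without singlerow mode an entry is only ever marked complete when the inner scan runs to exhaustion, i.e. when there was no match at all.]

```python
# Toy simulation (hypothetical names): a unique-join probe that stops at
# the first match, showing which cache entries ever become complete.

def probe(cache, key, inner_rows, match):
    entry = cache.setdefault(key, {"rows": [], "complete": False})
    for row in inner_rows:
        entry["rows"].append(row)
        if match(row):
            return entry          # unique join: skip to next outer tuple
    entry["complete"] = True      # scan exhausted: safe to mark complete
    return entry

cache = {}
e1 = probe(cache, 1, [10, 20], match=lambda r: r == 10)   # match found
e2 = probe(cache, 2, [10, 20], match=lambda r: False)     # no match at all
print(e1["complete"], e2["complete"])   # False True
```

So only keys with no matching inner rows (e2) ever yield a complete, hit-capable entry, which is why the pushed fix avoids Result Cache entirely for such plans.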

I ended up pushing a patch that just does not consider a Result Cache
path for unique joins where the parameterisation is not the entire
join condition.  That'll mean the plan will change in your test case
when beta2 comes out.

David