Thread: Getting "cache lookup failed for aggregate" error

Getting "cache lookup failed for aggregate" error

From
Patrick Krecker
Date:
Hello everyone -- We received a strange error today on our production write master. During a routine maintenance script, we got the following error:

"ERROR:  cache lookup failed for aggregate 5953992"

I think I found the culprit. We have a cron script that (among other things) recreates the aggregate function array_cat_aggregate()once every minute. My guess is that a) transactions do not guarantee a snapshot of custom functions and b) we got unlucky and the aggregate was deleted during the execution of the UPDATE.

I was just wondering if my analysis is correct, so I can make the necessary changes to production to prevent this from happening again.

Here is the query that produced the error:

UPDATE marbury_case SET
  components_vector = (
    SELECT array_cat_aggregate(component) FROM (
      SELECT ARRAY[
        id,
        type_id,
        "offset",
        length,
        internal_id,
        parent_id,
        right_sibling_id] AS component
      FROM marbury_component WHERE case_id = marbury_case.id ORDER BY id)
    AS foo),
  attributes_json = (
    SELECT array_to_json(array_agg(attributes || hstore('_ind', ind::text))) FROM (
      SELECT (rank() OVER (ORDER BY id)) - 1 AS ind, attributes
      FROM marbury_component
      WHERE case_id = marbury_case.id ORDER BY id)
    AS foo WHERE attributes IS NOT NULL
                 AND array_length(akeys(attributes), 1) > 0),
  vectors_updated = timeofday()::timestamp
WHERE id = 71865

Re: Getting "cache lookup failed for aggregate" error

From
Tom Lane
Date:
Patrick Krecker <patrick@judicata.com> writes:
> Hello everyone -- We received a strange error today on our production write
> master. During a routine maintenance script, we got the following error:

> "ERROR:  cache lookup failed for aggregate 5953992"

> I think I found the culprit. We have a cron script that (among other
> things) recreates the aggregate function array_cat_aggregate()once every
> minute.

Um.  Why's it do that?

> My guess is that a) transactions do not guarantee a snapshot of
> custom functions and b) we got unlucky and the aggregate was deleted during
> the execution of the UPDATE.

Yeah, there's no guarantee that dropping the aggregate wouldn't leave a
window for this type of failure.

9.4 might be a little better about this because it doesn't use SnapshotNow
for catalog fetches anymore, but I think you'd still be at some risk.

> Here is the query that produced the error:

It'd be more interesting to see what the cron script was doing to the
aggregate definition.

            regards, tom lane


Re: Getting "cache lookup failed for aggregate" error

From
Patrick Krecker
Date:



On Tue, Jun 24, 2014 at 4:25 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Patrick Krecker <patrick@judicata.com> writes:
> Hello everyone -- We received a strange error today on our production write
> master. During a routine maintenance script, we got the following error:

> "ERROR:  cache lookup failed for aggregate 5953992"

> I think I found the culprit. We have a cron script that (among other
> things) recreates the aggregate function array_cat_aggregate()once every
> minute.

Um.  Why's it do that?

Well, it's admittedly a hack. The cron script uses the function just after creating it. It was an easy way of guaranteeing that the database will have the function when it's necessary. However, I suppose that assumption is wrong because you have confirmed that aggregate functions are not part of the snapshot provided by beginning a transaction.
 

> My guess is that a) transactions do not guarantee a snapshot of
> custom functions and b) we got unlucky and the aggregate was deleted during
> the execution of the UPDATE.

Yeah, there's no guarantee that dropping the aggregate wouldn't leave a
window for this type of failure.

9.4 might be a little better about this because it doesn't use SnapshotNow
for catalog fetches anymore, but I think you'd still be at some risk.

> Here is the query that produced the error:

It'd be more interesting to see what the cron script was doing to the
aggregate definition.

FWIW the SQL is 

DROP AGGREGATE IF EXISTS array_cat_aggregate(anyarray);
CREATE AGGREGATE array_cat_aggregate(anyarray)  (
    SFUNC     = array_cat,
    STYPE     = anyarray,
    INITCOND  = '{}'
);

Followed by the other statement given in my previous email. But, I think you've thoroughly answered by question. Thanks!
 

                        regards, tom lane


Re: Getting "cache lookup failed for aggregate" error

From
Igor Neyman
Date:

 

 

 

FWIW the SQL is 

 

DROP AGGREGATE IF EXISTS array_cat_aggregate(anyarray);

CREATE AGGREGATE array_cat_aggregate(anyarray)  (

    SFUNC     = array_cat,

    STYPE     = anyarray,

    INITCOND  = '{}'

);

 

Followed by the other statement given in my previous email. But, I think you've thoroughly answered by question. Thanks!

 

 

So, instead of dropping aggregate “if exists” why not check pg_catalog for aggregate existence, and create it only if it does NOT exist?

 

Regards,

Igor Neyman