Different results from identical matviews - Mailing list pgsql-general

From Anders Steinlein
Subject Different results from identical matviews
Date
Msg-id CAC35HNnNGavaZ=P=rUcwTwYEhfoyXDg32REXCRDgxBmC3No3nA@mail.gmail.com
Whole thread Raw
Responses Re: Different results from identical matviews
List pgsql-general
Hi folks,

We have a materialized view from which a customer reported some confusing/invalid results, leading us to inspect the query and not finding anything wrong. Running the query defining the matview manually, or creating a new (identical) materialized view returns the correct result. Obviously, we've done REFRESH MATERIALIZED VIEW just before doing the comparison, and all runs are in the same schema.

It's a pretty big query, but let's describe the two matviews to see that they are identical. The first is the original returning invalid results, the one with _2 name postfix is the re-created one.

mm_prod=> \d+ segments_with_contacts
                       Materialized view "aakpnews.segments_with_contacts"
 Column |     Type      | Collation | Nullable | Default | Storage  | Stats target | Description
--------+---------------+-----------+----------+---------+----------+--------------+-------------
 lid    | integer       |           |          |         | plain    |              |
 sid    | integer       |           |          |         | plain    |              |
 email  | public.citext |           |          |         | extended |              |
Indexes:
    "segments_with_contacts_sid_lid_email_idx" UNIQUE, btree (sid, lid, email)
View definition:
 WITH tagged_contacts AS (
         SELECT cl.lid,
            cl.email,
            cl.skip_preexisting_campaigns AS skip_subscribed,
            ct.skip_preexisting_campaigns AS skip_tags,
            cl.ladded,
            ct.tagname,
            ct.created
           FROM contacts_lists cl
             LEFT JOIN contacts_tags ct USING (email)
          WHERE cl.lstatus::bpchar = 'a'::bpchar
        ), tagged_segments AS (
         SELECT s.lid,
            s.cid,
            s.sid,
            sp.type,
            sp.mid,
            sp.matchdelay,
            sp.tagname,
            sp.event,
            count(*) OVER (PARTITION BY s.sid) AS requirements,
            campaigns.activated_at
           FROM segments s
             LEFT JOIN campaigns USING (cid)
             JOIN segments_predicates sp USING (sid)
          WHERE s.archived_at IS NULL AND (s.cid IS NULL OR campaigns.activated_at IS NOT NULL)
        ), segments_contacts AS (
         SELECT s.lid,
            s.sid,
            s.requirements,
                CASE
                    WHEN s.type = 'subscribed'::public.predicate THEN ( SELECT array_agg(DISTINCT tagged_contacts.email::public.citext) AS array_agg
                       FROM tagged_contacts
                      WHERE tagged_contacts.lid = s.lid AND tagged_contacts.ladded >= s.activated_at AND (s.matchdelay IS NULL OR (tagged_contacts.ladded + s.matchdelay) < now()) AND NOT COALESCE(tagged_contacts.skip_subscribed, false))
                    WHEN s.type = 'has_tag'::public.predicate THEN ( SELECT array_agg(DISTINCT tagged_contacts.email::public.citext) AS array_agg
                       FROM tagged_contacts
                      WHERE tagged_contacts.lid = s.lid AND tagged_contacts.tagname OPERATOR(public.=) s.tagname AND (s.matchdelay IS NULL OR (tagged_contacts.created + s.matchdelay) < now()) AND (s.cid IS NULL OR tagged_contacts.created >= s.activated_at AND NOT COALESCE(tagged_contacts.skip_tags, false)))
                    WHEN s.type = 'not_has_tag'::public.predicate THEN ( SELECT array_agg(aggregated_tags.email::public.citext) AS array_agg
                       FROM ( SELECT tagged_contacts.email,
                                array_agg(tagged_contacts.tagname) AS tags
                               FROM tagged_contacts
                              WHERE tagged_contacts.lid = s.lid AND (s.cid IS NULL OR tagged_contacts.ladded >= s.activated_at AND NOT COALESCE(tagged_contacts.skip_subscribed, false))
                              GROUP BY tagged_contacts.email) aggregated_tags
                      WHERE NOT aggregated_tags.tags @> ARRAY[s.tagname])
                    WHEN s.type = 'received'::public.predicate THEN ( SELECT array_agg(DISTINCT mails_contacts_sent.email::public.citext) AS array_agg
                       FROM mails_contacts_sent
                      WHERE mails_contacts_sent.mid = s.mid AND (s.matchdelay IS NULL OR (mails_contacts_sent.senttime + s.matchdelay) < now()))
                    WHEN s.type = 'not_received'::public.predicate THEN ( SELECT array_agg(x.email::public.citext) AS array_agg
                       FROM ( SELECT tagged_contacts.email
                               FROM tagged_contacts
                              WHERE tagged_contacts.lid = s.lid
                            EXCEPT
                             SELECT DISTINCT mails_contacts_sent.email
                               FROM mails_contacts_sent
                              WHERE mails_contacts_sent.mid = s.mid AND (s.matchdelay IS NULL OR (mails_contacts_sent.senttime + s.matchdelay) < now())) x)
                    WHEN s.type = 'opened'::public.predicate THEN ( SELECT array_agg(DISTINCT mails_contacts_opens.email::public.citext) AS array_agg
                       FROM mails_contacts_opens
                      WHERE mails_contacts_opens.mid = s.mid AND (s.matchdelay IS NULL OR (mails_contacts_opens.opentime + s.matchdelay) < now()))
                    WHEN s.type = 'not_opened'::public.predicate THEN ( SELECT array_agg(x.email::public.citext) AS array_agg
                       FROM ( SELECT tagged_contacts.email
                               FROM tagged_contacts
                                 JOIN mails_contacts_sent USING (email)
                              WHERE tagged_contacts.lid = s.lid AND mails_contacts_sent.mid = s.mid AND (s.matchdelay IS NULL OR (mails_contacts_sent.senttime + s.matchdelay) < now())
                            EXCEPT
                             SELECT DISTINCT mails_contacts_opens.email
                               FROM mails_contacts_opens
                              WHERE mails_contacts_opens.mid = s.mid) x)
                    WHEN s.type = 'clicked'::public.predicate THEN ( SELECT array_agg(DISTINCT mails_contacts_clicks.email::public.citext) AS array_agg
                       FROM mails_contacts_clicks
                      WHERE mails_contacts_clicks.mid = s.mid AND (s.matchdelay IS NULL OR (mails_contacts_clicks.clicktime + s.matchdelay) < now()))
                    WHEN s.type = 'not_clicked'::public.predicate THEN ( SELECT array_agg(x.email::public.citext) AS array_agg
                       FROM ( SELECT tagged_contacts.email
                               FROM tagged_contacts
                                 JOIN mails_contacts_sent USING (email)
                              WHERE tagged_contacts.lid = s.lid AND mails_contacts_sent.mid = s.mid AND (s.matchdelay IS NULL OR (mails_contacts_sent.senttime + s.matchdelay) < now())
                            EXCEPT
                             SELECT DISTINCT mails_contacts_clicks.email
                               FROM mails_contacts_clicks
                              WHERE mails_contacts_clicks.mid = s.mid) x)
                    WHEN s.type = 'event_triggered_first'::public.predicate THEN ( SELECT array_agg(DISTINCT x.email::public.citext) AS array_agg
                       FROM ( SELECT e.email
                               FROM events e
                                 JOIN tagged_contacts USING (email)
                              WHERE tagged_contacts.lid = s.lid AND e.event OPERATOR(public.=) s.event
                              GROUP BY e.email
                             HAVING (s.matchdelay IS NULL OR (min(e.triggered_at) + s.matchdelay) < now()) AND (s.cid IS NULL OR min(e.triggered_at) >= s.activated_at)) x)
                    WHEN s.type = 'event_triggered_last'::public.predicate THEN ( SELECT array_agg(DISTINCT x.email::public.citext) AS array_agg
                       FROM ( SELECT e.email
                               FROM events e
                                 JOIN tagged_contacts USING (email)
                              WHERE tagged_contacts.lid = s.lid AND e.event OPERATOR(public.=) s.event
                              GROUP BY e.email
                             HAVING (s.matchdelay IS NULL OR (max(e.triggered_at) + s.matchdelay) < now()) AND (s.cid IS NULL OR max(e.triggered_at) >= s.activated_at)) x)
                    ELSE NULL::public.citext[]
                END AS emails
           FROM tagged_segments s
        ), unnested AS (
         SELECT segments_contacts.lid,
            segments_contacts.sid,
            segments_contacts.requirements,
            unnest(segments_contacts.emails) AS email
           FROM segments_contacts
        )
 SELECT unnested.lid,
    unnested.sid,
    unnested.email
   FROM unnested
  GROUP BY unnested.lid, unnested.sid, unnested.email, unnested.requirements
 HAVING count(unnested.email) = unnested.requirements;
Access method: heap


mm_prod=> \d+ segments_with_contacts_2
                      Materialized view "aakpnews.segments_with_contacts_2"
 Column |     Type      | Collation | Nullable | Default | Storage  | Stats target | Description
--------+---------------+-----------+----------+---------+----------+--------------+-------------
 lid    | integer       |           |          |         | plain    |              |
 sid    | integer       |           |          |         | plain    |              |
 email  | public.citext |           |          |         | extended |              |
View definition:
 WITH tagged_contacts AS (
         SELECT cl.lid,
            cl.email,
            cl.skip_preexisting_campaigns AS skip_subscribed,
            ct.skip_preexisting_campaigns AS skip_tags,
            cl.ladded,
            ct.tagname,
            ct.created
           FROM contacts_lists cl
             LEFT JOIN contacts_tags ct USING (email)
          WHERE cl.lstatus::bpchar = 'a'::bpchar
        ), tagged_segments AS (
         SELECT s.lid,
            s.cid,
            s.sid,
            sp.type,
            sp.mid,
            sp.matchdelay,
            sp.tagname,
            sp.event,
            count(*) OVER (PARTITION BY s.sid) AS requirements,
            campaigns.activated_at
           FROM segments s
             LEFT JOIN campaigns USING (cid)
             JOIN segments_predicates sp USING (sid)
          WHERE s.archived_at IS NULL AND (s.cid IS NULL OR campaigns.activated_at IS NOT NULL)
        ), segments_contacts AS (
         SELECT s.lid,
            s.sid,
            s.requirements,
                CASE
                    WHEN s.type = 'subscribed'::public.predicate THEN ( SELECT array_agg(DISTINCT tagged_contacts.email::public.citext) AS array_agg
                       FROM tagged_contacts
                      WHERE tagged_contacts.lid = s.lid AND tagged_contacts.ladded >= s.activated_at AND (s.matchdelay IS NULL OR (tagged_contacts.ladded + s.matchdelay) < now()) AND NOT COALESCE(tagged_contacts.skip_subscribed, false))
                    WHEN s.type = 'has_tag'::public.predicate THEN ( SELECT array_agg(DISTINCT tagged_contacts.email::public.citext) AS array_agg
                       FROM tagged_contacts
                      WHERE tagged_contacts.lid = s.lid AND tagged_contacts.tagname OPERATOR(public.=) s.tagname AND (s.matchdelay IS NULL OR (tagged_contacts.created + s.matchdelay) < now()) AND (s.cid IS NULL OR tagged_contacts.created >= s.activated_at AND NOT COALESCE(tagged_contacts.skip_tags, false)))
                    WHEN s.type = 'not_has_tag'::public.predicate THEN ( SELECT array_agg(aggregated_tags.email::public.citext) AS array_agg
                       FROM ( SELECT tagged_contacts.email,
                                array_agg(tagged_contacts.tagname) AS tags
                               FROM tagged_contacts
                              WHERE tagged_contacts.lid = s.lid AND (s.cid IS NULL OR tagged_contacts.ladded >= s.activated_at AND NOT COALESCE(tagged_contacts.skip_subscribed, false))
                              GROUP BY tagged_contacts.email) aggregated_tags
                      WHERE NOT aggregated_tags.tags @> ARRAY[s.tagname])
                    WHEN s.type = 'received'::public.predicate THEN ( SELECT array_agg(DISTINCT mails_contacts_sent.email::public.citext) AS array_agg
                       FROM mails_contacts_sent
                      WHERE mails_contacts_sent.mid = s.mid AND (s.matchdelay IS NULL OR (mails_contacts_sent.senttime + s.matchdelay) < now()))
                    WHEN s.type = 'not_received'::public.predicate THEN ( SELECT array_agg(x.email::public.citext) AS array_agg
                       FROM ( SELECT tagged_contacts.email
                               FROM tagged_contacts
                              WHERE tagged_contacts.lid = s.lid
                            EXCEPT
                             SELECT DISTINCT mails_contacts_sent.email
                               FROM mails_contacts_sent
                              WHERE mails_contacts_sent.mid = s.mid AND (s.matchdelay IS NULL OR (mails_contacts_sent.senttime + s.matchdelay) < now())) x)
                    WHEN s.type = 'opened'::public.predicate THEN ( SELECT array_agg(DISTINCT mails_contacts_opens.email::public.citext) AS array_agg
                       FROM mails_contacts_opens
                      WHERE mails_contacts_opens.mid = s.mid AND (s.matchdelay IS NULL OR (mails_contacts_opens.opentime + s.matchdelay) < now()))
                    WHEN s.type = 'not_opened'::public.predicate THEN ( SELECT array_agg(x.email::public.citext) AS array_agg
                       FROM ( SELECT tagged_contacts.email
                               FROM tagged_contacts
                                 JOIN mails_contacts_sent USING (email)
                              WHERE tagged_contacts.lid = s.lid AND mails_contacts_sent.mid = s.mid AND (s.matchdelay IS NULL OR (mails_contacts_sent.senttime + s.matchdelay) < now())
                            EXCEPT
                             SELECT DISTINCT mails_contacts_opens.email
                               FROM mails_contacts_opens
                              WHERE mails_contacts_opens.mid = s.mid) x)
                    WHEN s.type = 'clicked'::public.predicate THEN ( SELECT array_agg(DISTINCT mails_contacts_clicks.email::public.citext) AS array_agg
                       FROM mails_contacts_clicks
                      WHERE mails_contacts_clicks.mid = s.mid AND (s.matchdelay IS NULL OR (mails_contacts_clicks.clicktime + s.matchdelay) < now()))
                    WHEN s.type = 'not_clicked'::public.predicate THEN ( SELECT array_agg(x.email::public.citext) AS array_agg
                       FROM ( SELECT tagged_contacts.email
                               FROM tagged_contacts
                                 JOIN mails_contacts_sent USING (email)
                              WHERE tagged_contacts.lid = s.lid AND mails_contacts_sent.mid = s.mid AND (s.matchdelay IS NULL OR (mails_contacts_sent.senttime + s.matchdelay) < now())
                            EXCEPT
                             SELECT DISTINCT mails_contacts_clicks.email
                               FROM mails_contacts_clicks
                              WHERE mails_contacts_clicks.mid = s.mid) x)
                    WHEN s.type = 'event_triggered_first'::public.predicate THEN ( SELECT array_agg(DISTINCT x.email::public.citext) AS array_agg
                       FROM ( SELECT e.email
                               FROM events e
                                 JOIN tagged_contacts USING (email)
                              WHERE tagged_contacts.lid = s.lid AND e.event OPERATOR(public.=) s.event
                              GROUP BY e.email
                             HAVING (s.matchdelay IS NULL OR (min(e.triggered_at) + s.matchdelay) < now()) AND (s.cid IS NULL OR min(e.triggered_at) >= s.activated_at)) x)
                    WHEN s.type = 'event_triggered_last'::public.predicate THEN ( SELECT array_agg(DISTINCT x.email::public.citext) AS array_agg
                       FROM ( SELECT e.email
                               FROM events e
                                 JOIN tagged_contacts USING (email)
                              WHERE tagged_contacts.lid = s.lid AND e.event OPERATOR(public.=) s.event
                              GROUP BY e.email
                             HAVING (s.matchdelay IS NULL OR (max(e.triggered_at) + s.matchdelay) < now()) AND (s.cid IS NULL OR max(e.triggered_at) >= s.activated_at)) x)
                    ELSE NULL::public.citext[]
                END AS emails
           FROM tagged_segments s
        ), unnested AS (
         SELECT segments_contacts.lid,
            segments_contacts.sid,
            segments_contacts.requirements,
            unnest(segments_contacts.emails) AS email
           FROM segments_contacts
        )
 SELECT unnested.lid,
    unnested.sid,
    unnested.email
   FROM unnested
  GROUP BY unnested.lid, unnested.sid, unnested.email, unnested.requirements
 HAVING count(unnested.email) = unnested.requirements;
Access method: heap

Here you can see the results are different:

mm_prod=> begin;
BEGIN
mm_prod=> refresh materialized view segments_with_contacts;
REFRESH MATERIALIZED VIEW
mm_prod=> select count(*) from segments_with_contacts where sid = 42259;
 count
-------
    91
(1 row)

mm_prod=> refresh materialized view segments_with_contacts_2;
REFRESH MATERIALIZED VIEW
mm_prod=> select count(*) from segments_with_contacts_2 where sid = 42259;
 count
-------
    98
(1 row)


How can we figure out what's wrong here? Since it's a materialized view using EXPLAIN doesn't give me much to go on:

mm_prod=> explain select count(*) from segments_with_contacts where sid = 42259;
                                  QUERY PLAN                                  
------------------------------------------------------------------------------
 Aggregate  (cost=83.52..83.53 rows=1 width=8)
   ->  Seq Scan on segments_with_contacts  (cost=0.00..83.29 rows=91 width=0)
         Filter: (sid = 42259)
(3 rows)

mm_prod=> explain select count(*) from segments_with_contacts_2 where sid = 42259;
                                   QUERY PLAN                                    
---------------------------------------------------------------------------------
 Aggregate  (cost=87.46..87.47 rows=1 width=8)
   ->  Seq Scan on segments_with_contacts_2  (cost=0.00..87.20 rows=105 width=0)
         Filter: (sid = 42259)
(3 rows)

mm_prod=> select count(*) from segments_with_contacts_2 where sid = 42259;
 count
-------
    98
(1 row)


Any insights into this, and how to debug this further, is much appreciated.

Best,
-- a.

pgsql-general by date:

Previous
From: FOUTE K. Jaurès
Date:
Subject: Re: PostgreSQL 12 - ERROR: could not rename file "pg_logical/snapshots/1A-7C00D890.snap.13594.tmp" to "pg_logical/snapshots/1A-7C00D890.snap": No space left on device
Next
From: Adrian Klaver
Date:
Subject: Re: PostgreSQL 12 - ERROR: could not rename file "pg_logical/snapshots/1A-7C00D890.snap.13594.tmp" to "pg_logical/snapshots/1A-7C00D890.snap": No space left on device