Thread: [HACKERS] Something is rotten in publication drop

[HACKERS] Something is rotten in publication drop

From
Tom Lane
Date:
I'm looking at this recent failure:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mandrill&dt=2017-06-08%2023%3A54%3A12
which is

*** /home/nm/farm/xlc32/HEAD/pgsql.build/src/test/regress/expected/matview.out    Thu Jun  8 23:55:50 2017
--- /home/nm/farm/xlc32/HEAD/pgsql.build/src/test/regress/results/matview.out    Fri Jun  9 00:18:12 2017
***************
*** 155,171 ****  SET search_path = mvtest_mvschema, public; \d+ mvtest_tvm
!                       Materialized view "mvtest_mvschema.mvtest_tvm"
!  Column |  Type   | Collation | Nullable | Default | Storage  | Stats target | Description
! --------+---------+-----------+----------+---------+----------+--------------+-------------
!  type   | text    |           |          |         | extended |              |
!  totamt | numeric |           |          |         | main     |              |
! View definition:
!  SELECT mvtest_tv.type,
!     mvtest_tv.totamt
!    FROM mvtest_tv
!   ORDER BY mvtest_tv.type;
!  -- modify the underlying table data INSERT INTO mvtest_t VALUES (6, 'z', 13); -- confirm pre- and post-refresh
contentsof fairly simple materialized views 
--- 155,161 ----  SET search_path = mvtest_mvschema, public; \d+ mvtest_tvm
! ERROR:  publication "addr_pub" does not exist -- modify the underlying table data INSERT INTO mvtest_t VALUES (6,
'z',13); -- confirm pre- and post-refresh contents of fairly simple materialized views 

This appears to have something to do with the concurrently-running
object_address test script, which creates and then drops a publication
named "addr_pub".  However, there is no visible connection between
mvtest_tvm (or any of the objects it depends on) and addr_pub or any
of the objects it is told to publish.  So what happened here, and
isn't this a bug?
        regards, tom lane



Re: [HACKERS] Something is rotten in publication drop

From
Peter Eisentraut
Date:
On 6/8/17 23:53, Tom Lane wrote:
> --- 155,161 ----
>   
>   SET search_path = mvtest_mvschema, public;
>   \d+ mvtest_tvm
> ! ERROR:  publication "addr_pub" does not exist
>   -- modify the underlying table data
>   INSERT INTO mvtest_t VALUES (6, 'z', 13);
>   -- confirm pre- and post-refresh contents of fairly simple materialized views
> 
> This appears to have something to do with the concurrently-running
> object_address test script, which creates and then drops a publication
> named "addr_pub".  However, there is no visible connection between
> mvtest_tvm (or any of the objects it depends on) and addr_pub or any
> of the objects it is told to publish.  So what happened here, and
> isn't this a bug?

The \d+ command attempts to print out any publications that the relation
is part of.  To find the publications it is part of, it runs this query:
   "SELECT pub.pubname\n"   " FROM pg_catalog.pg_publication pub,\n"   "
pg_catalog.pg_get_publication_tables(pub.pubname)\n"  "WHERE relid = '%s'\n"   "ORDER BY 1;",
 

pg_get_publication_tables() calls GetPublicationByName(), which throws
this error.

So I suppose that if a publication is dropped between the time
pg_publication is read and the function is called, you could get this error.

How could we improve that?

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Something is rotten in publication drop

From
Tom Lane
Date:
Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:
> On 6/8/17 23:53, Tom Lane wrote:
>> ! ERROR:  publication "addr_pub" does not exist

> The \d+ command attempts to print out any publications that the relation
> is part of.  To find the publications it is part of, it runs this query:

>     "SELECT pub.pubname\n"
>     " FROM pg_catalog.pg_publication pub,\n"
>     "      pg_catalog.pg_get_publication_tables(pub.pubname)\n"
>     "WHERE relid = '%s'\n"
>     "ORDER BY 1;",

> pg_get_publication_tables() calls GetPublicationByName(), which throws
> this error.

> So I suppose that if a publication is dropped between the time
> pg_publication is read and the function is called, you could get this error.

Yeah, I'd suspected as much but not tracked it down yet.

> How could we improve that?

What we've done in many comparable situations is to allow a
catalog-probing function to return NULL instead of failing
when handed an OID or other identifier that it can't locate.
Here it seems like pg_get_publication_tables() needs to use
missing_ok = TRUE and then return zero rows for a null result.
Arguably, a nonexistent publication publishes no tables, so
it's not clear that's not the Right Thing anyway.

BTW, isn't the above command a hugely inefficient way of finding
the publications for the target rel?  Unless you've got a rather
small number of rather restricted publications, seems like it's
going to take a long time.  Maybe we don't care too much about
manual invocations of \d+, but I bet somebody will carp if there's
not a better way to find this out.  Maybe a better answer is to
define a more suitable function pg_publications_for_table(relid)
and let it have the no-error-for-bad-OID behavior.
        regards, tom lane



Re: [HACKERS] Something is rotten in publication drop

From
Noah Misch
Date:
On Fri, Jun 09, 2017 at 11:45:58AM -0400, Tom Lane wrote:
> Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:
> > On 6/8/17 23:53, Tom Lane wrote:
> >> ! ERROR:  publication "addr_pub" does not exist
> 
> > The \d+ command attempts to print out any publications that the relation
> > is part of.  To find the publications it is part of, it runs this query:
> 
> >     "SELECT pub.pubname\n"
> >     " FROM pg_catalog.pg_publication pub,\n"
> >     "      pg_catalog.pg_get_publication_tables(pub.pubname)\n"
> >     "WHERE relid = '%s'\n"
> >     "ORDER BY 1;",
> 
> > pg_get_publication_tables() calls GetPublicationByName(), which throws
> > this error.
> 
> > So I suppose that if a publication is dropped between the time
> > pg_publication is read and the function is called, you could get this error.
> 
> Yeah, I'd suspected as much but not tracked it down yet.
> 
> > How could we improve that?
> 
> What we've done in many comparable situations is to allow a
> catalog-probing function to return NULL instead of failing
> when handed an OID or other identifier that it can't locate.
> Here it seems like pg_get_publication_tables() needs to use
> missing_ok = TRUE and then return zero rows for a null result.
> Arguably, a nonexistent publication publishes no tables, so
> it's not clear that's not the Right Thing anyway.
> 
> BTW, isn't the above command a hugely inefficient way of finding
> the publications for the target rel?  Unless you've got a rather
> small number of rather restricted publications, seems like it's
> going to take a long time.  Maybe we don't care too much about
> manual invocations of \d+, but I bet somebody will carp if there's
> not a better way to find this out.  Maybe a better answer is to
> define a more suitable function pg_publications_for_table(relid)
> and let it have the no-error-for-bad-OID behavior.

[Action required within three days.  This is a generic notification.]

The above-described topic is currently a PostgreSQL 10 open item.  Peter,
since you committed the patch believed to have created it, you own this open
item.  If some other commit is more relevant or if this does not belong as a
v10 open item, please let us know.  Otherwise, please observe the policy on
open item ownership[1] and send a status update within three calendar days of
this message.  Include a date for your subsequent status update.  Testers may
discover new open items at any time, and I want to plan to get them all fixed
well in advance of shipping v10.  Consequently, I will appreciate your efforts
toward speedy resolution.  Thanks.

[1] https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com



Re: [HACKERS] Something is rotten in publication drop

From
Peter Eisentraut
Date:
On 6/9/17 11:45, Tom Lane wrote:
> What we've done in many comparable situations is to allow a
> catalog-probing function to return NULL instead of failing
> when handed an OID or other identifier that it can't locate.
> Here it seems like pg_get_publication_tables() needs to use
> missing_ok = TRUE and then return zero rows for a null result.

Why is it that dropping a publication in another session makes our local
view of things change in middle of a single SQL statement?  Is there
something we can change to address that?

> BTW, isn't the above command a hugely inefficient way of finding
> the publications for the target rel?  Unless you've got a rather
> small number of rather restricted publications, seems like it's
> going to take a long time.  Maybe we don't care too much about
> manual invocations of \d+, but I bet somebody will carp if there's
> not a better way to find this out.  Maybe a better answer is to
> define a more suitable function pg_publications_for_table(relid)
> and let it have the no-error-for-bad-OID behavior.

That would possibly be better (the current function was written for a
different use case), but it could have the same concurrency problem as
above.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Something is rotten in publication drop

From
Tom Lane
Date:
Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:
> On 6/9/17 11:45, Tom Lane wrote:
>> What we've done in many comparable situations is to allow a
>> catalog-probing function to return NULL instead of failing
>> when handed an OID or other identifier that it can't locate.
>> Here it seems like pg_get_publication_tables() needs to use
>> missing_ok = TRUE and then return zero rows for a null result.

> Why is it that dropping a publication in another session makes our local
> view of things change in middle of a single SQL statement?  Is there
> something we can change to address that?

Not easily.  The syscaches run on CatalogSnapshot time and may therefore
see changes not yet visible to the surrounding query's snapshot.  It's
possible that you could reimplement pg_get_publication_tables() to do
its own catalog access with the outer query's snapshot, but it'd be
pretty tedious and duplicative.

It strikes me that you could rewrite psql's query to just do its own
catalog search and not bother with the function at all.  It would have
to know a bit more about the catalog structure than it does now, but not
that much:

select pub.pubname from pg_catalog.pg_publication pub
where puballtables or exists(select 1 from pg_catalog.pg_publication_rel r        where r.prpubid = pub.oid and
r.prrelid= '%s');
 
        regards, tom lane



Re: [HACKERS] Something is rotten in publication drop

From
Peter Eisentraut
Date:
On 6/15/17 12:23, Tom Lane wrote:
> It strikes me that you could rewrite psql's query to just do its own
> catalog search and not bother with the function at all.  It would have
> to know a bit more about the catalog structure than it does now, but not
> that much:
> 
> select pub.pubname from pg_catalog.pg_publication pub
> where puballtables or
>   exists(select 1 from pg_catalog.pg_publication_rel r
>          where r.prpubid = pub.oid and r.prrelid = '%s');

We used to do something like that, but then people complained that that
was not absolutely accurate, because it did not exclude catalog tables
and related things properly.  See commit
2d460179baa8744e9e2a183a5121306596c53fba.  To do this properly, you need
to filter pg_class using is_publishable_class() (hitherto internal C
function).

The way this was originally written was for use by subscriptioncmds.c
fetch_table_list(), which generally only deals with a small number of
publications as the search key and wants to find all the relations.  The
psql use case is exactly the opposite: We start with a relation and want
to find all the publications.  The third use case is that we document
the view pg_publication_tables for general use, so depending on which
search key you start with, you might get terrible performance if you
have a lot of tables.

An academically nice way to write a general query for this would be:

CREATE VIEW pg_publication_tables AS
     SELECT
         p.pubname AS pubname,
         n.nspname AS schemaname,
         c.relname AS tablename,
         c.oid AS relid
     FROM pg_publication p
          JOIN pg_publication_rel pr ON p.oid = pr.prpubid
          JOIN pg_class c ON pr.prrelid = c.oid
          JOIN pg_namespace n ON c.relnamespace = n.oid
     UNION
     SELECT
         p.pubname AS pubname,
         n.nspname AS schemaname,
         c.relname AS tablename,
         c.oid AS relid
     FROM pg_publication p
          JOIN pg_class c ON p.puballtables AND
pg_is_relation_publishable(c.oid)
          JOIN pg_namespace n ON c.relnamespace = n.oid;

But looking at the plans this generates, it will do a sequential scan of
pg_class even if you look for a publication that is not puballtables,
which would suck for the subscriptioncmds.c use case.

We could use the above definition for the documented view and the psql
use case.

We could then create second view that uses the existing definition

CREATE VIEW pg_publication_tables AS
    SELECT
        P.pubname AS pubname,
        N.nspname AS schemaname,
        C.relname AS tablename
    FROM pg_publication P, pg_class C
         JOIN pg_namespace N ON (N.oid = C.relnamespace)
    WHERE C.oid IN (SELECT relid FROM pg_get_publication_tables(P.pubname));

for use in subscriptioncmds.c.  Or don't use a view for that.  But the
view is useful because we should preserve this interface across versions.

Or we throw away all the views and use custom code everywhere.

Patch for experimentation attached.

Any ideas?


-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] Something is rotten in publication drop

From
Noah Misch
Date:
On Thu, Jun 15, 2017 at 02:04:04AM +0000, Noah Misch wrote:
> On Fri, Jun 09, 2017 at 11:45:58AM -0400, Tom Lane wrote:
> > Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:
> > > On 6/8/17 23:53, Tom Lane wrote:
> > >> ! ERROR:  publication "addr_pub" does not exist
> > 
> > > The \d+ command attempts to print out any publications that the relation
> > > is part of.  To find the publications it is part of, it runs this query:
> > 
> > >     "SELECT pub.pubname\n"
> > >     " FROM pg_catalog.pg_publication pub,\n"
> > >     "      pg_catalog.pg_get_publication_tables(pub.pubname)\n"
> > >     "WHERE relid = '%s'\n"
> > >     "ORDER BY 1;",
> > 
> > > pg_get_publication_tables() calls GetPublicationByName(), which throws
> > > this error.
> > 
> > > So I suppose that if a publication is dropped between the time
> > > pg_publication is read and the function is called, you could get this error.
> > 
> > Yeah, I'd suspected as much but not tracked it down yet.
> > 
> > > How could we improve that?
> > 
> > What we've done in many comparable situations is to allow a
> > catalog-probing function to return NULL instead of failing
> > when handed an OID or other identifier that it can't locate.
> > Here it seems like pg_get_publication_tables() needs to use
> > missing_ok = TRUE and then return zero rows for a null result.
> > Arguably, a nonexistent publication publishes no tables, so
> > it's not clear that's not the Right Thing anyway.
> > 
> > BTW, isn't the above command a hugely inefficient way of finding
> > the publications for the target rel?  Unless you've got a rather
> > small number of rather restricted publications, seems like it's
> > going to take a long time.  Maybe we don't care too much about
> > manual invocations of \d+, but I bet somebody will carp if there's
> > not a better way to find this out.  Maybe a better answer is to
> > define a more suitable function pg_publications_for_table(relid)
> > and let it have the no-error-for-bad-OID behavior.
> 
> [Action required within three days.  This is a generic notification.]
> 
> The above-described topic is currently a PostgreSQL 10 open item.  Peter,
> since you committed the patch believed to have created it, you own this open
> item.  If some other commit is more relevant or if this does not belong as a
> v10 open item, please let us know.  Otherwise, please observe the policy on
> open item ownership[1] and send a status update within three calendar days of
> this message.  Include a date for your subsequent status update.  Testers may
> discover new open items at any time, and I want to plan to get them all fixed
> well in advance of shipping v10.  Consequently, I will appreciate your efforts
> toward speedy resolution.  Thanks.
> 
> [1] https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com

This PostgreSQL 10 open item is past due for your status update.  Kindly send
a status update within 24 hours, and include a date for your subsequent status
update.  Refer to the policy on open item ownership:
https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com



Re: [HACKERS] Something is rotten in publication drop

From
Peter Eisentraut
Date:
If there are no new insights, I plan to proceed with the attached patch
tomorrow.  This leaves the existing view and function alone, adds
pg_relation_is_publishable() and uses that in psql.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] Something is rotten in publication drop

From
Tom Lane
Date:
Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:
> If there are no new insights, I plan to proceed with the attached patch
> tomorrow.  This leaves the existing view and function alone, adds
> pg_relation_is_publishable() and uses that in psql.

Hm, patch looks okay, but while eyeballing it I started to wonder
why in the world is pg_get_publication_tables marked prosecdef?
If that has any consequences at all, they're probably bad.
There are exactly no other built-in functions that have that set.
        regards, tom lane



Re: [HACKERS] Something is rotten in publication drop

From
Peter Eisentraut
Date:
On 6/19/17 21:57, Tom Lane wrote:
> but while eyeballing it I started to wonder
> why in the world is pg_get_publication_tables marked prosecdef?
> If that has any consequences at all, they're probably bad.
> There are exactly no other built-in functions that have that set.

That was quite likely a mistake.  I've fixed it.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Something is rotten in publication drop

From
Peter Eisentraut
Date:
On 6/19/17 21:57, Tom Lane wrote:
> Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:
>> If there are no new insights, I plan to proceed with the attached patch
>> tomorrow.  This leaves the existing view and function alone, adds
>> pg_relation_is_publishable() and uses that in psql.
> 
> Hm, patch looks okay,

committed

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Something is rotten in publication drop

From
Robert Haas
Date:
On Mon, Jun 19, 2017 at 9:57 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:
>> If there are no new insights, I plan to proceed with the attached patch
>> tomorrow.  This leaves the existing view and function alone, adds
>> pg_relation_is_publishable() and uses that in psql.
>
> Hm, patch looks okay, but while eyeballing it I started to wonder
> why in the world is pg_get_publication_tables marked prosecdef?
> If that has any consequences at all, they're probably bad.
> There are exactly no other built-in functions that have that set.

Should we add that to the opr_sanity tests?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Something is rotten in publication drop

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Mon, Jun 19, 2017 at 9:57 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Hm, patch looks okay, but while eyeballing it I started to wonder
>> why in the world is pg_get_publication_tables marked prosecdef?
>> If that has any consequences at all, they're probably bad.
>> There are exactly no other built-in functions that have that set.

> Should we add that to the opr_sanity tests?

Yeah, I was wondering about that too.  I can imagine that someday
there will be prosecdef built-in functions ... but probably, there
would never be so many that maintaining the expected-results list
would be hard.
        regards, tom lane



Re: [HACKERS] Something is rotten in publication drop

From
Robert Haas
Date:
On Tue, Jun 20, 2017 at 3:18 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Mon, Jun 19, 2017 at 9:57 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Hm, patch looks okay, but while eyeballing it I started to wonder
>>> why in the world is pg_get_publication_tables marked prosecdef?
>>> If that has any consequences at all, they're probably bad.
>>> There are exactly no other built-in functions that have that set.
>
>> Should we add that to the opr_sanity tests?
>
> Yeah, I was wondering about that too.  I can imagine that someday
> there will be prosecdef built-in functions ... but probably, there
> would never be so many that maintaining the expected-results list
> would be hard.

And if it is, then we remove the test.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Something is rotten in publication drop

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Tue, Jun 20, 2017 at 3:18 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Robert Haas <robertmhaas@gmail.com> writes:
>>> Should we add that to the opr_sanity tests?

>> Yeah, I was wondering about that too.  I can imagine that someday
>> there will be prosecdef built-in functions ... but probably, there
>> would never be so many that maintaining the expected-results list
>> would be hard.

> And if it is, then we remove the test.

Right.  I'll go make it so.
        regards, tom lane