Thread: Test "tablespace" fails during `make installcheck` on master-replica setup

From
Aleksander Alekseev
Date:
Hello.

I noticed that `make installcheck` fails on my laptop with the following
errors:

http://afiskon.ru/s/98/6f94ce2cfa_regression.out.txt
http://afiskon.ru/s/b3/d0da05597e_regression.diffs.txt

My first idea was to use `git bisect`. It turned out that this issue
reproduces on commits going back to 2015 as well (older versions don't
compile on my laptop). However, it reproduces rarely and with different
errors:

http://afiskon.ru/s/8e/1ad2c8ed2b_regression.diffs-8c48375.txt

Here are scripts I use to compile and test PostgreSQL:

https://github.com/afiskon/pgscripts

Exact steps to reproduce are:

```
./quick-build.sh && ./install.sh && make installcheck
```

Completely removing all `configure` flags doesn't make any difference.
The issue reproduces only on a master-replica setup, i.e. if you run
./single-install.sh instead of ./install.sh, all tests pass.

I'm using Arch Linux and GCC 6.2.1.

Any ideas what can cause this issue?

--
Best regards,
Aleksander Alekseev

Re: Test "tablespace" fails during `make installcheck` on master-replica setup

From
Michael Paquier
Date:
On Wed, Dec 07, 2016 at 03:18:59PM +0300, Aleksander Alekseev wrote:
> I noticed that `make installcheck` fails on my laptop with the following
> errors:
>
> http://afiskon.ru/s/98/6f94ce2cfa_regression.out.txt
> http://afiskon.ru/s/b3/d0da05597e_regression.diffs.txt

The interesting bit for the archives:

*** /home/eax/work/postgrespro/postgresql-src/src/test/regress/expected/tablespace.out    2016-12-07 13:53:44.000728436 +0300
--- /home/eax/work/postgrespro/postgresql-src/src/test/regress/results/tablespace.out    2016-12-07 13:53:46.150728558 +0300
***************
*** 66,71 ****
--- 66,72 ----
INSERT INTO testschema.test_default_tab VALUES (1);
CREATE INDEX test_index1 on testschema.test_default_tab (id);
CREATE INDEX test_index2 on testschema.test_default_tab (id) TABLESPACE regress_tblspace;
+ ERROR:  could not open file "pg_tblspc/16395/PG_10_201612061/16393/16407": No such file or directory
\d testschema.test_index1

> Any ideas what can cause this issue?

On the same host, the primary and the standby will try to use the
tablespace at the same path. That's the origin of this breakage.
--
Michael

Re: Test "tablespace" fails during `make installcheck` on master-replica setup

From
Aleksander Alekseev
Date:
> On the same host, the primary and the standby will try to use the
> tablespace at the same path. That's the origin of this breakage.

Sorry, I don't follow. Don't master and replica use different
directories to store _all_ data? Particularly in my case:

```
$ find path/to/postgresql-install/ -type d -name pg_tblspc
/home/eax/work/postgrespro/postgresql-install/data-slave/pg_tblspc
/home/eax/work/postgrespro/postgresql-install/data-master/pg_tblspc
```

Where exactly does a collision happen?

--
Best regards,
Aleksander Alekseev

Re: Test "tablespace" fails during `make installcheck` on master-replica setup

From
Michael Paquier
Date:
On Wed, Dec 07, 2016 at 03:42:53PM +0300, Aleksander Alekseev wrote:
> > On the same host, the primary and the standby will try to use the
> > tablespace at the same path. That's the origin of this breakage.
>
> Sorry, I don't follow. Don't master and replica use different
> directories to store _all_ data? Particularly in my case:
>
> ```
> $ find path/to/postgresql-install/ -type d -name pg_tblspc
> /home/eax/work/postgrespro/postgresql-install/data-slave/pg_tblspc
> /home/eax/work/postgrespro/postgresql-install/data-master/pg_tblspc
> ```
>
> Where exactly does a collision happen?

At the location of the tablespaces: pg_tblspc just stores symlinks to
the place where the data is stored, and both point to the same path,
because that same path is streamed to the standby and used when it
replays the CREATE TABLESPACE record.
--
Michael
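
A minimal sketch of the collision described above, in Python, assuming a POSIX system where every entry in `pg_tblspc` is a symlink named after the tablespace OID. The helper names `tablespace_targets` and `shared_targets` are hypothetical and not part of PostgreSQL; the sketch only reports which target directories two data directories have in common.

```python
import os
from pathlib import Path


def tablespace_targets(pgdata):
    """Map each symlink name in <pgdata>/pg_tblspc to its resolved target."""
    targets = {}
    for entry in (Path(pgdata) / "pg_tblspc").iterdir():
        if entry.is_symlink():
            # The link name is the tablespace OID; the target is where
            # the data actually lives.
            targets[entry.name] = os.path.realpath(entry)
    return targets


def shared_targets(pgdata_a, pgdata_b):
    """Return the sorted target directories referenced by both data directories."""
    a = set(tablespace_targets(pgdata_a).values())
    b = set(tablespace_targets(pgdata_b).values())
    return sorted(a & b)
```

Calling `shared_targets` on the `data-master` and `data-slave` directories from the `find` output would list any tablespace location that both instances write to. A non-empty result on one host is the situation that makes the "tablespace" test fail.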

Re: Test "tablespace" fails during `make installcheck` on master-replica setup

From
Stephen Frost
Date:
Michael, all,

* Michael Paquier (michael.paquier@gmail.com) wrote:
> On Wed, Dec 07, 2016 at 03:42:53PM +0300, Aleksander Alekseev wrote:
> > > On the same host, the primary and the standby will try to use the
> > > tablespace at the same path. That's the origin of this breakage.
> >
> > Sorry, I don't follow. Don't master and replica use different
> > directories to store _all_ data? Particularly in my case:
> >
> > ```
> > $ find path/to/postgresql-install/ -type d -name pg_tblspc
> > /home/eax/work/postgrespro/postgresql-install/data-slave/pg_tblspc
> > /home/eax/work/postgrespro/postgresql-install/data-master/pg_tblspc
> > ```
> >
> > Where exactly does a collision happen?
>
> At the location of the tablespaces: pg_tblspc just stores symlinks to
> the place where the data is stored, and both point to the same path,
> because that same path is streamed to the standby and used when it
> replays the CREATE TABLESPACE record.

It would be really nice if we detected that some other postmaster is
already using a given tablespace directory and threw an error rather
than starting up thinking everything is fine.

We do that already for $PGDATA, of course, but not tablespaces.

Thanks!

Stephen

Re: Test "tablespace" fails during `make installcheck` on master-replica setup

From
Tom Lane
Date:

Stephen Frost <sfrost@snowman.net> writes:
> It would be really nice if we detected that some other postmaster is
> already using a given tablespace directory and threw an error rather
> than starting up thinking everything is fine.

In principle, we could have the postmaster run through $PGDATA/pg_tblspc
and drop a lockfile into each referenced directory.  But the devil is in
the details --- in particular, not sure how to get the right thing to
happen during a CREATE TABLESPACE.  Also, I kinda doubt that this is going
to fix anything for the replica-on-same-machine problem.
        regards, tom lane
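
The lockfile idea above can be sketched roughly as follows, in Python rather than in the postmaster's C. Everything here is an assumption made for illustration: PostgreSQL has no such tablespace lockfile, and the file name `tablespace.lock` and the function `lock_tablespaces` are hypothetical. The sketch walks `pg_tblspc` once at startup and uses an exclusive create to claim each referenced directory. It does not handle stale lockfiles, and it does not address the CREATE TABLESPACE case raised above.

```python
import os
from pathlib import Path

LOCK_NAME = "tablespace.lock"  # hypothetical name, not a PostgreSQL file


def lock_tablespaces(pgdata):
    """Claim every directory referenced from <pgdata>/pg_tblspc.

    Returns the lockfiles created. Raises RuntimeError if a directory is
    already claimed, after removing the lockfiles created by this call.
    """
    created = []
    for link in sorted((Path(pgdata) / "pg_tblspc").iterdir()):
        if not link.is_symlink():
            continue
        lock_path = Path(os.path.realpath(link)) / LOCK_NAME
        try:
            # O_EXCL makes the create fail if the lockfile already exists.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o600)
        except FileExistsError:
            for path in created:
                path.unlink()
            raise RuntimeError(f"{lock_path.parent} is already claimed")
        with os.fdopen(fd, "w") as f:
            # Record who holds the claim, for diagnostics only.
            f.write(f"{os.getpid()}\n{pgdata}\n")
        created.append(lock_path)
    return created
```

With two data directories whose symlinks point at the same location, the first call succeeds and the second raises, which is the kind of startup complaint asked for above.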



Re: Test "tablespace" fails during `make installcheck` on master-replica setup

From
Michael Paquier
Date:
On Thu, Dec 8, 2016 at 12:06 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Stephen Frost <sfrost@snowman.net> writes:
>> It would be really nice if we detected that some other postmaster is
>> already using a given tablespace directory and threw an error rather
>> than starting up thinking everything is fine.
>
> In principle, we could have the postmaster run through $PGDATA/pg_tblspc
> and drop a lockfile into each referenced directory.  But the devil is in
> the details --- in particular, not sure how to get the right thing to
> happen during a CREATE TABLESPACE.  Also, I kinda doubt that this is going
> to fix anything for the replica-on-same-machine problem.

That's where having a node-based ID would become helpful, which is
different from the global system ID. Ages ago when working on
Postgres-XC, we took care of this problem by appending a suffix based
on the node name to the tablespace folder name, the one prefixed with
PGXX. When applying this concept to PG, we could have standbys set up
this node ID each time recovery is done using a backup_label.
This won't solve the problem of tablespaces already created; that
should be handled by users when taking the base backup by remapping
them. But it would address the problem for newly-created ones.
-- 
Michael
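
A minimal sketch of the node-suffix idea, in Python. The `PG_<major>_<catversion>` directory name follows the `PG_10_201612061` component visible in the error message earlier in the thread. The `_<node_id>` suffix format and the function names `version_dir` and `database_dir` are assumptions made for illustration, not the Postgres-XC or PostgreSQL implementation.

```python
import os


def version_dir(major, catversion, node_id=None):
    """Build the per-version subdirectory name inside a tablespace location."""
    name = f"PG_{major}_{catversion}"
    if node_id:
        # Hypothetical suffix: one subdirectory per node.
        name += f"_{node_id}"
    return name


def database_dir(location, major, catversion, db_oid, node_id=None):
    """Directory that would hold one database's files in the tablespace."""
    return os.path.join(location, version_dir(major, catversion, node_id), str(db_oid))
```

With distinct node IDs, a primary and a standby sharing one tablespace location would write under different subdirectories and no longer touch each other's files. For tablespaces that already exist, remapping at base backup time is available through the `--tablespace-mapping=olddir=newdir` option of `pg_basebackup`.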