Thread: vacuumdb parallel has a deadlock detected in 9.5.4

vacuumdb parallel has a deadlock detected in 9.5.4

From
huang
Date:
Hi friends,

There is a error deadlock detected in vacuumdb parallel .
but if I do not use parallel , it has not deadlock detected .


[postgres@vdedu1 ~]$ vacuumdb -fazev -j 10
SELECT datname FROM pg_database WHERE datallowconn ORDER BY 1;
vacuumdb: vacuuming database "mydb"
SELECT c.relname, ns.nspname FROM pg_class c, pg_namespace ns
  WHERE relkind IN ('r', 'm') AND c.relnamespace = ns.oid
  ORDER BY c.relpages DESC;
VACUUM (FULL, VERBOSE, ANALYZE) pg_catalog.pg_proc;
VACUUM (FULL, VERBOSE, ANALYZE) pg_catalog.pg_depend;
VACUUM (FULL, VERBOSE, ANALYZE) pg_catalog.pg_attribute;
VACUUM (FULL, VERBOSE, ANALYZE) pg_catalog.pg_description;
VACUUM (FULL, VERBOSE, ANALYZE) pg_catalog.pg_collation;
VACUUM (FULL, VERBOSE, ANALYZE) pg_catalog.pg_statistic;
VACUUM (FULL, VERBOSE, ANALYZE) pg_catalog.pg_class;
VACUUM (FULL, VERBOSE, ANALYZE) pg_catalog.pg_operator;
VACUUM (FULL, VERBOSE, ANALYZE) pg_catalog.pg_rewrite;
VACUUM (FULL, VERBOSE, ANALYZE) pg_catalog.pg_type;
ERROR:  deadlock detected
DETAIL:  Process 19840 waits for AccessShareLock on relation 1249 of
database 16387; blocked by process 19841.
     Process 19841 waits for RowExclusiveLock on relation 2608 of
database 16387; blocked by process 19840.
     Process 19840: VACUUM (FULL, VERBOSE, ANALYZE) pg_catalog.pg_depend;
     Process 19841: VACUUM (FULL, VERBOSE, ANALYZE) pg_catalog.pg_attribute;
HINT:  See server log for query details.
STATEMENT:  VACUUM (FULL, VERBOSE, ANALYZE) pg_catalog.pg_depend;
vacuumdb: vacuuming of database "mydb" failed: ERROR:  deadlock detected
DETAIL:  Process 19840 waits for AccessShareLock on relation 1249 of
database 16387; blocked by process 19841.
Process 19841 waits for RowExclusiveLock on relation 2608 of database
16387; blocked by process 19840.
HINT:  See server log for query details.
ERROR:  canceling statement due to user request
STATEMENT:  VACUUM (FULL, VERBOSE, ANALYZE) pg_catalog.pg_proc;
ERROR:  canceling statement due to user request
STATEMENT:  VACUUM (FULL, VERBOSE, ANALYZE) pg_catalog.pg_attribute;
LOG:  could not send data to client: Broken pipe
FATAL:  connection to client lost
ERROR:  canceling statement due to user request
STATEMENT:  VACUUM (FULL, VERBOSE, ANALYZE) pg_catalog.pg_description;
ERROR:  canceling statement due to user request
STATEMENT:  VACUUM (FULL, VERBOSE, ANALYZE) pg_catalog.pg_collation;
ERROR:  canceling statement due to user request
STATEMENT:  VACUUM (FULL, VERBOSE, ANALYZE) pg_catalog.pg_statistic;
ERROR:  canceling statement due to user request
STATEMENT:  VACUUM (FULL, VERBOSE, ANALYZE) pg_catalog.pg_operator;
ERROR:  canceling statement due to user request
STATEMENT:  VACUUM (FULL, VERBOSE, ANALYZE) pg_catalog.pg_rewrite;
ERROR:  canceling statement due to user request
STATEMENT:  VACUUM (FULL, VERBOSE, ANALYZE) pg_catalog.pg_type;
[postgres@vdedu1 ~]$ LOG:  could not send data to client: Broken pipe
FATAL:  connection to client lost
LOG:  could not send data to client: Broken pipe
STATEMENT:  VACUUM (FULL, VERBOSE, ANALYZE) pg_catalog.pg_class;
ERROR:  canceling statement due to user request
FATAL:  connection to client lost
LOG:  could not send data to client: Broken pipe
ERROR:  canceling statement due to user request
FATAL:  connection to client lost

[postgres@vdedu1 ~]$
[postgres@vdedu1 ~]$ cat /proc/v
version      vmallocinfo  vmstat
[postgres@vdedu1 ~]$ cat /proc/version
Linux version 2.6.32-504.el6.x86_64 (mockbuild@c6b9.bsys.dev.centos.org)
(gcc version 4.4.7 20120313 (Red Hat 4.4.7-11) (GCC) ) #1 SMP Wed Oct 15
04:27:16 UTC 2014
[postgres@vdedu1 ~]$ vacuumdb --version
vacuumdb (PostgreSQL) 9.5.4
[postgres@vdedu1 ~]$ psql -c"select version();"
version

--------------------------------------------------------------------------------
--------------------------
  PostgreSQL 9.5.4 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.4.7
20120313 (
Red Hat 4.4.7-11), 64-bit
(1 row)

[postgres@vdedu1 ~]$


2016-09-28
TT

Re: vacuumdb parallel has a deadlock detected in 9.5.4

From
hubert depesz lubaczewski
Date:
On Wed, Sep 28, 2016 at 09:46:22PM +0800, huang wrote:
>  Hi friends,
>
> There is a error deadlock detected in vacuumdb parallel .
> but if I do not use parallel , it has not deadlock detected .

man vacuumdb says about -j option:

>> Note that using this mode together with the -f (FULL) option might cause
>> deadlock failures if certain system catalogs are processed in parallel.

so, while I understand it's not pretty, it's well documented.

Best regards,

depesz

Re: vacuumdb parallel has a deadlock detected in 9.5.4

From
Alvaro Herrera
Date:
hubert depesz lubaczewski wrote:
> On Wed, Sep 28, 2016 at 09:46:22PM +0800, huang wrote:
> >  Hi friends,
> >
> > There is a error deadlock detected in vacuumdb parallel .
> > but if I do not use parallel , it has not deadlock detected .
>
> man vacuumdb says about -j option:
>
> >> Note that using this mode together with the -f (FULL) option might cause
> >> deadlock failures if certain system catalogs are processed in parallel.
>
> so, while I understand it's not pretty, it's well documented.

Yeah.  However I think this behavior is rather unhelpful.  Maybe we
should try to fix it somehow, but I'm not sure how.  We could say that
pg_catalog tables can only be processed one at a time, instead of trying
to paralelize them?  For example: have vacuumdb fill two lists of
tables, one for pg_catalog and one for tables in other schemas.  Each
worker chooses a table from the pg_catalog list first if it's not empty
and there's no other worker doing one of these, otherwise one from the
other table.

I think that'd fix it, while not destroying paralelizability too badly.

I'm not volunteering, though.  Patches welcome.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: vacuumdb parallel has a deadlock detected in 9.5.4

From
Francisco Olarte
Date:
Hi:

On Wed, Sep 28, 2016 at 10:39 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
>> > There is a error deadlock detected in vacuumdb parallel .
>> > but if I do not use parallel , it has not deadlock detected .
>> man vacuumdb says about -j option:
>>
>> >> Note that using this mode together with the -f (FULL) option might cause
>> >> deadlock failures if certain system catalogs are processed in parallel.
>> so, while I understand it's not pretty, it's well documented.
>
> Yeah.  However I think this behavior is rather unhelpful.  Maybe we
> should try to fix it somehow, but I'm not sure how.  We could say that
> pg_catalog tables can only be processed one at a time, instead of trying
> to paralelize them?  For example: have vacuumdb fill two lists of
> tables, one for pg_catalog and one for tables in other schemas.  Each
> worker chooses a table from the pg_catalog list first if it's not empty
> and there's no other worker doing one of these, otherwise one from the
> other table.
>
> I think that'd fix it, while not destroying paralelizability too badly.

I would propose another behaviour, which I think can solve the problem
without introducing more complex code. Put a couple of flags to vacuum
only catalog tables / non catalog tables ( I believe this can be
served by include/exclude schemas too, which will be even more useful
for other things ). This way one can do a full paralell vacuum of
non-catalog objects followed by a serial one on catalog objects (
which should not be too slow on 'normal' databases ) ( I propose this
order because IIRC normal vacuum updates catalog tables so they get
vacuumed after to tidy them )

Francisco Olarte

Re: vacuumdb parallel has a deadlock detected in 9.5.4

From
huang
Date:
CgpBdCAyMDE2LTA5LTI5IDE1OjU3OjU4LCAiRnJhbmNpc2NvIE9sYXJ0ZSIgPGZvbGFydGVAcGVv
cGxlY2FsbC5jb20+IHdyb3RlOgo+SGk6Cj4KPk9uIFdlZCwgU2VwIDI4LCAyMDE2IGF0IDEwOjM5
IFBNLCBBbHZhcm8gSGVycmVyYQo+PGFsdmhlcnJlQDJuZHF1YWRyYW50LmNvbT4gd3JvdGU6Cj4+
PiA+IFRoZXJlIGlzIGEgZXJyb3IgZGVhZGxvY2sgZGV0ZWN0ZWQgaW4gdmFjdXVtZGIgcGFyYWxs
ZWwgLgo+Pj4gPiBidXQgaWYgSSBkbyBub3QgdXNlIHBhcmFsbGVsICwgaXQgaGFzIG5vdCBkZWFk
bG9jayBkZXRlY3RlZCAuCj4+PiBtYW4gdmFjdXVtZGIgc2F5cyBhYm91dCAtaiBvcHRpb246Cj4+
Pgo+Pj4gPj4gTm90ZSB0aGF0IHVzaW5nIHRoaXMgbW9kZSB0b2dldGhlciB3aXRoIHRoZSAtZiAo
RlVMTCkgb3B0aW9uIG1pZ2h0IGNhdXNlCj4+PiA+PiBkZWFkbG9jayBmYWlsdXJlcyBpZiBjZXJ0
YWluIHN5c3RlbSBjYXRhbG9ncyBhcmUgcHJvY2Vzc2VkIGluIHBhcmFsbGVsLgo+Pj4gc28sIHdo
aWxlIEkgdW5kZXJzdGFuZCBpdCdzIG5vdCBwcmV0dHksIGl0J3Mgd2VsbCBkb2N1bWVudGVkLgo+
Pgo+PiBZZWFoLiAgSG93ZXZlciBJIHRoaW5rIHRoaXMgYmVoYXZpb3IgaXMgcmF0aGVyIHVuaGVs
cGZ1bC4gIE1heWJlIHdlCj4+IHNob3VsZCB0cnkgdG8gZml4IGl0IHNvbWVob3csIGJ1dCBJJ20g
bm90IHN1cmUgaG93LiAgV2UgY291bGQgc2F5IHRoYXQKPj4gcGdfY2F0YWxvZyB0YWJsZXMgY2Fu
IG9ubHkgYmUgcHJvY2Vzc2VkIG9uZSBhdCBhIHRpbWUsIGluc3RlYWQgb2YgdHJ5aW5nCj4+IHRv
IHBhcmFsZWxpemUgdGhlbT8gIEZvciBleGFtcGxlOiBoYXZlIHZhY3V1bWRiIGZpbGwgdHdvIGxp
c3RzIG9mCj4+IHRhYmxlcywgb25lIGZvciBwZ19jYXRhbG9nIGFuZCBvbmUgZm9yIHRhYmxlcyBp
biBvdGhlciBzY2hlbWFzLiAgRWFjaAo+PiB3b3JrZXIgY2hvb3NlcyBhIHRhYmxlIGZyb20gdGhl
IHBnX2NhdGFsb2cgbGlzdCBmaXJzdCBpZiBpdCdzIG5vdCBlbXB0eQo+PiBhbmQgdGhlcmUncyBu
byBvdGhlciB3b3JrZXIgZG9pbmcgb25lIG9mIHRoZXNlLCBvdGhlcndpc2Ugb25lIGZyb20gdGhl
Cj4+IG90aGVyIHRhYmxlLgo+Pgo+PiBJIHRoaW5rIHRoYXQnZCBmaXggaXQsIHdoaWxlIG5vdCBk
ZXN0cm95aW5nIHBhcmFsZWxpemFiaWxpdHkgdG9vIGJhZGx5Lgo+Cj5JIHdvdWxkIHByb3Bvc2Ug
YW5vdGhlciBiZWhhdmlvdXIsIHdoaWNoIEkgdGhpbmsgY2FuIHNvbHZlIHRoZSBwcm9ibGVtCj53
aXRob3V0IGludHJvZHVjaW5nIG1vcmUgY29tcGxleCBjb2RlLiBQdXQgYSBjb3VwbGUgb2YgZmxh
Z3MgdG8gdmFjdXVtCj5vbmx5IGNhdGFsb2cgdGFibGVzIC8gbm9uIGNhdGFsb2cgdGFibGVzICgg
SSBiZWxpZXZlIHRoaXMgY2FuIGJlCj5zZXJ2ZWQgYnkgaW5jbHVkZS9leGNsdWRlIHNjaGVtYXMg
dG9vLCB3aGljaCB3aWxsIGJlIGV2ZW4gbW9yZSB1c2VmdWwKPmZvciBvdGhlciB0aGluZ3MgKS4g
VGhpcyB3YXkgb25lIGNhbiBkbyBhIGZ1bGwgcGFyYWxlbGwgdmFjdXVtIG9mCj5ub24tY2F0YWxv
ZyBvYmplY3RzIGZvbGxvd2VkIGJ5IGEgc2VyaWFsIG9uZSBvbiBjYXRhbG9nIG9iamVjdHMgKAo+
d2hpY2ggc2hvdWxkIG5vdCBiZSB0b28gc2xvdyBvbiAnbm9ybWFsJyBkYXRhYmFzZXMgKSAoIEkg
cHJvcG9zZSB0aGlzCj5vcmRlciBiZWNhdXNlIElJUkMgbm9ybWFsIHZhY3V1bSB1cGRhdGVzIGNh
dGFsb2cgdGFibGVzIHNvIHRoZXkgZ2V0Cj52YWN1dW1lZCBhZnRlciB0byB0aWR5IHRoZW0gKQo+
Cgo+RnJhbmNpc2NvIE9sYXJ0ZQoKCgoKaGkgLCAKSSBqdXN0ICBob3BlICB0aGUgIHZhY3V1bWRi
ICBwYXJhbGxlbCBjYW4gcXVpY2tseSBmaW5pc2hpbmcgd29yayAgLiAgCm91ciBkYXRhYmFzZSBo
YXMgYSBsb3Qgb2YgdGFibGVzIC4KdGhleSBuZWVkIHZhY3V1bWRiIC4gYnV0IHRoZSB0aW1lIGlz
IHRvbyBsb25nIC4gCnNvICwgSSB1c2VkIHRoZSB2YWN1dW1kYiBwYXJhbGxlbCAuIAogCnRoYW5r
cyB0byBldmVyeW9uZSAuICAgOiApCgoKQmVzdCByZWdhcmRzLApUVAoyOC0wOS0yMDE2CgoKCg==

Re: vacuumdb parallel has a deadlock detected in 9.5.4

From
Alvaro Herrera
Date:
Francisco Olarte wrote:

> I would propose another behaviour, which I think can solve the problem
> without introducing more complex code. Put a couple of flags to vacuum
> only catalog tables / non catalog tables ( I believe this can be
> served by include/exclude schemas too, which will be even more useful
> for other things ). This way one can do a full paralell vacuum of
> non-catalog objects followed by a serial one on catalog objects (
> which should not be too slow on 'normal' databases )

OK, that sounds easier to implement.  Are you going to submit a patch?

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: vacuumdb parallel has a deadlock detected in 9.5.4

From
Francisco Olarte
Date:
Hello all:

On Thu, Sep 29, 2016 at 7:10 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
> Francisco Olarte wrote:
>
>> I would propose another behaviour, which I think can solve the problem
>> without introducing more complex code. Put a couple of flags to vacuum
>> only catalog tables / non catalog tables ( I believe this can be
>> served by include/exclude schemas too, which will be even more useful
>> for other things ). This way one can do a full paralell vacuum of
>> non-catalog objects followed by a serial one on catalog objects (
>> which should not be too slow on 'normal' databases )
>
> OK, that sounds easier to implement.  Are you going to submit a patch?

After looking at vacuumdb source I'm trying to develop and submit a
patch implementing the following:

Two new flags --include-schema=NAME, --exclude-schema=NAME

Two new flags --system-schemas --non-system-schemas, as alias to
--include-schema=pg_catalog and --exclude-schema=pg_catalog

The intended behaviour will be:

- if a table list (--table) is specified, include / exclude schema
will be ignored ( potentially with a warning or may be an error if
more knowledgeable people steer me that way ).

- if both include and exclude are specified, include will take
preference (with warning or error ? ).

- if not the already existing code used to enumerate tables for
paralell vacuum will be augmented to either use only tables in the
include schemas or not use tables in the exclude schemas. If, after
enumeration, the table list becomes empty an error will be produced (
think on --include-schema=oops-I-mistyped-it ). The filtering will be
done client side and will be case sensitive against the schema name
name returned by the queries to the catalogs ( no case folding, to
make it simple ).

- After that vacuumdb will proceed as it would if the table list where
given in the command line, either in serial or paralell mode.

This does not intend to solve the paralell vacuum deadlocks, but it
can let people avoid them by just doing

vacuumdb -f --exclude-schema=pg_catalog -j 20
vacuumdb -f --include-schema=pg_catalog

And vaccuuming catalogs is not going to benefit that much from
paralellism as it is normally fast.

Any comment ?

Francisco Olarte.

Re: vacuumdb parallel has a deadlock detected in 9.5.4

From
Euler Taveira
Date:
On 01-10-2016 12:57, Francisco Olarte wrote:
> On Thu, Sep 29, 2016 at 7:10 PM, Alvaro Herrera
> <alvherre@2ndquadrant.com> wrote:
>> Francisco Olarte wrote:
>>
>>> I would propose another behaviour, which I think can solve the problem
>>> without introducing more complex code. Put a couple of flags to vacuum
>>> only catalog tables / non catalog tables ( I believe this can be
>>> served by include/exclude schemas too, which will be even more useful
>>> for other things ). This way one can do a full paralell vacuum of
>>> non-catalog objects followed by a serial one on catalog objects (
>>> which should not be too slow on 'normal' databases )
>>
>> OK, that sounds easier to implement.  Are you going to submit a patch?
>
> After looking at vacuumdb source I'm trying to develop and submit a
> patch implementing the following:
>
> Two new flags --include-schema=NAME, --exclude-schema=NAME
>
> Two new flags --system-schemas --non-system-schemas, as alias to
> --include-schema=pg_catalog and --exclude-schema=pg_catalog
>
This is another feature. Also, it will *sometimes* solve the problem.
Why don't you implement your parallel + serialization idea or Alvaro's
two lists idea? I'm not against adding schema options, I just think it's
not the right way to solve the deadlock problem.


--
   Euler Taveira                   Timbira - http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento

Re: vacuumdb parallel has a deadlock detected in 9.5.4

From
Francisco Olarte
Date:
Hi Euler:

On Sun, Oct 2, 2016 at 5:33 AM, Euler Taveira <euler@timbira.com.br> wrote:

> This is another feature.

Yes, which allows an easier workaround around the problem. Personally
I do not need it because I do not vacuumm paralell and I can easily
write a simple dozen lines in perl which achieves it, but I feel it
can help people like  the OP.

> Also, it will *sometimes* solve the problem.

No, it does not solve the problem ( paralell vacuum of system catalogs
can still deadlock ). What it does is let you transform a deadlockish
operation ( full paralell database wide vacuum ) into a non
deadlockish one ( full paralell non-system plus linear system )
without too much effort.

> Why don't you implement your parallel + serialization idea or Alvaro's
> two lists idea?

Well, I want a servant, not a nanny. If I command vacuum db to do full
paralell system catalog vacuum, I want it to do it ( although I WILL
tolerate a warning and MAY tolerate a confirmation request ).

Besides, I proposed the schema solution because I thought that
paralell vacuum would need some code building a table list, and the -t
option will need code to process tables from a list. Code examination
revealed I was correct in both. Adding the options is a simple matter
of adding some simple code ( a couple options, a couple string lists
to hold their data, a couple ifs in the table scanning loop used when
no table list given, a modification to make it happen when schema
optins are given ( presently it plays only when no -t options and
paralell is requested, I need to activate it when schema options are
present too, either in serial or paralell mode ), and the working code
remains untouched). I could refactor a bit to make several runs
easier, have a built in list of system data, and many other things,
but this is a deeper modification which I do not feel presently
prepared to attack. I may do it in the future, but I feel anyone
needing to run full paralell vacuums has a big system and should have
this kind of ops scripted, so it is not going to be that useful.

Bear in mind my work is not specifically in postgres like yours, Its
just a tool. I've been using it since before it had SQL, and have some
experience with C and libpq, but I do not feel confortable adding
complex code even to a simple client program.

> I'm not against adding schema options, I just think it's
> not the right way to solve the deadlock problem.

As I said, it does not solve the problem, just builds an easier path
than the present ones around it ( you can do exactly the same by
querying the catalog on a script and emitting two vacuumdb commands
with a table list ).

Francisco Olarte.