Thread: DBLink: interesting issue
I am using dbLink library to synchronize tables in two databases. Below are two identical in terms of their results queries, one is using NOT EXISTS and another NOT IN. The later one is 100 time slower then the former one according to EXPLAIN. However, the former one throws an interesting exception every time I run it. Interestingly, it works fine if the local table is empty. Any ideas why it happens?
replica=> SELECT objectid
FROM activity
WHERE
NOT EXISTS(SELECT remoteid
FROM (SELECT CAST (dblink_tok(t1.dblink_p, 0) AS int8) AS remoteid
FROM (SELECT CAST (dblink_tok(t1.dblink_p, 0) AS int8) AS remoteid
FROM (SELECT dblink(dblink_settings,'SELECT objectid FROM activity') AS dblink_p
) t1
) a1
WHERE remoteid=activity.objectid);
ERROR: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
replica=> SELECT objectid
FROM activity
WHERE
objectid NOT IN (SELECT remoteid
FROM (SELECT CAST (dblink_tok(t1.dblink_p, 0) AS int8) AS remoteid
FROM (SELECT dblink(dblink_settings,'SELECT objectid FROM activity') AS dblink_p
) t1
) a1
);
objectid
----------
(0 rows)
objectid
----------
(0 rows)
Oleg Lebedev wrote: > I am using dbLink library to synchronize tables in two databases. Below > are two identical in terms of their results queries, one is using > NOT EXISTS and another NOT IN. The later one is 100 time slower then the > former one according to EXPLAIN. However, the former one throws an > interesting exception every time I run it. Interestingly, it works fine > if the local table is empty. Any ideas why it happens? I need more information. What version of PostgreSQL, and is dblink the version distributed with Postgres, or is it a later copy from CVS? I have a machine running 7.2.2 and a CVS (Version 0.4, ~mid April, 2002) copy of dblink, and cannot reproduce the failure: testmst=# select version(); version ------------------------------------------------------------- PostgreSQL 7.2.2 on i686-pc-linux-gnu, compiled by GCC 2.96 (1 row) testslv=# SELECT objectid FROM activity WHERE NOT EXISTS (SELECT remoteid FROM (SELECT CAST (dblink_tok(t1.dblink_p, 0) AS int8) AS remoteid FROM (SELECT dblink('dbname=testmst','SELECT objectid FROM activity') AS dblink_p) t1) a1 WHERE remoteid=activity.objectid); objectid ---------- (0 rows) Early in 7.3 development I updated dblink in a 7.2 backward compatible way. Unfortunately, it wasn't practical to keep it backward compatible to 7.2.x, so the current CVS HEAD version will no longer work with 7.2.x. If you would like to try dblink Version 0.4, let me know and I'll send you a patch. As a side note, 7.3 (currently in beta) has much improved semantics for use of dblink, made possible by the new table function capability. E.g.: regression_slave=# select version(); version ---------------------------------------------------------------------------------------------------------------- PostgreSQL 7.3b1 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.1 20020205 (Red Hat Linux Rawhide 3.1-0.21) (1 row) regression_slave=# SELECT objectid FROM activity WHERE NOT EXISTS (SELECT remoteid FROM dblink('dbname=regression','SELECT objectid FROM activity') AS t1(remoteid int8) WHERE remoteid=activity.objectid); objectid ---------- 5 (1 row) Or even better performance-wise: regression_slave=# SELECT objectid FROM activity WHERE NOT EXISTS (SELECT remoteid FROM dblink('dbname=regression','SELECT objectid FROM activity WHERE objectid = ' || activity.objectid) AS t1(remoteid int8)); objectid ---------- 5 (1 row) If you can, please give the beta (Postgres and dblink) a try (not for production yet, of course, but for feedback). Thanks, Joe
Joe, I am running PostgreSQL 7.2.1 on Linux 7.2 I installed Postgres7.3b and a version of DBLink that comes with it in a tar.gz archive. I tried to use the optimized code you sent me and got the same error. Here is what I got: DELETE FROM activity WHERE NOT EXISTS (SELECT remoteid FROM (SELECT remoteid FROM dblink('dbname=replica','SELECT objectid FROM activity WHERE objectid = ' || activity.objectid) AS dblink_rec(remoteid int8)) AS t1 ) server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. The connection to the server was lost. Attempting reset: Failed. Interstingly, this "haults" Postgres7.3b postmaster and I have to restart it again. When I run a modified version of this delete statement on Postgres7.2.1 I still get the same message, but am able to reconnect to the database without restarting the postmaster. -----Original Message----- From: Joe Conway [mailto:mail@joeconway.com] Sent: Friday, September 20, 2002 5:37 PM To: Oleg Lebedev Cc: pgsql-general@postgresql.org Subject: Re: [GENERAL] DBLink: interesting issue Oleg Lebedev wrote: > I am using dbLink library to synchronize tables in two databases. > Below > are two identical in terms of their results queries, one is using > NOT EXISTS and another NOT IN. The later one is 100 time slower then the > former one according to EXPLAIN. However, the former one throws an > interesting exception every time I run it. Interestingly, it works fine > if the local table is empty. Any ideas why it happens? I need more information. What version of PostgreSQL, and is dblink the version distributed with Postgres, or is it a later copy from CVS? I have a machine running 7.2.2 and a CVS (Version 0.4, ~mid April, 2002) copy of dblink, and cannot reproduce the failure: testmst=# select version(); version ------------------------------------------------------------- PostgreSQL 7.2.2 on i686-pc-linux-gnu, compiled by GCC 2.96 (1 row) testslv=# SELECT objectid FROM activity WHERE NOT EXISTS (SELECT remoteid FROM (SELECT CAST (dblink_tok(t1.dblink_p, 0) AS int8) AS remoteid FROM (SELECT dblink('dbname=testmst','SELECT objectid FROM activity') AS dblink_p) t1) a1 WHERE remoteid=activity.objectid); objectid ---------- (0 rows) Early in 7.3 development I updated dblink in a 7.2 backward compatible way. Unfortunately, it wasn't practical to keep it backward compatible to 7.2.x, so the current CVS HEAD version will no longer work with 7.2.x. If you would like to try dblink Version 0.4, let me know and I'll send you a patch. As a side note, 7.3 (currently in beta) has much improved semantics for use of dblink, made possible by the new table function capability. E.g.: regression_slave=# select version(); version ------------------------------------------------------------------------ ---------------------------------------- PostgreSQL 7.3b1 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.1 20020205 (Red Hat Linux Rawhide 3.1-0.21) (1 row) regression_slave=# SELECT objectid FROM activity WHERE NOT EXISTS (SELECT remoteid FROM dblink('dbname=regression','SELECT objectid FROM activity') AS t1(remoteid int8) WHERE remoteid=activity.objectid); objectid ---------- 5 (1 row) Or even better performance-wise: regression_slave=# SELECT objectid FROM activity WHERE NOT EXISTS (SELECT remoteid FROM dblink('dbname=regression','SELECT objectid FROM activity WHERE objectid = ' || activity.objectid) AS t1(remoteid int8)); objectid ---------- 5 (1 row) If you can, please give the beta (Postgres and dblink) a try (not for production yet, of course, but for feedback). Thanks, Joe
Oleg Lebedev wrote: > Joe, > I am running PostgreSQL 7.2.1 on Linux 7.2 > I installed Postgres7.3b and a version of DBLink that comes with it in a > tar.gz archive. > I tried to use the optimized code you sent me and got the same error. Interesting. I'm using Red Hat 7.3 with Postgres7.3b and don't get that result. Can I get more details, i.e. maybe a pg_dump of the two databases in question along with a script to reproduce the problem? If so, please tar it all up and send to me off list. I'd also be interested in your configure command line and the core file left as residue from the crash, if there is one. Do you think you can compile with debug symbols and assertion checking (i.e. configure --enable-debug --enable-cassert), if you haven't already, and attach a debugger to see what's going on? Thanks, Joe
Oleg Lebedev wrote: > Ok, here are all the files. > I'm now seeing the problem you reported. It is a bug in the new table function code. Basically, you are trying to do this: DELETE FROM tablea WHERE NOT EXISTS ( SELECT remoteid FROM ( SELECT remoteid FROM dblink('hostaddr=1.23.45.6 port=5432 dbname=webspec user=user password=pass', 'SELECT objectid FROM tablea WHERE objectid = ' || tablea.objectid) AS dblink_rec(remoteid int8) ) AS t1 ); But if you try: SELECT remoteid FROM ( SELECT remoteid FROM dblink('hostaddr=1.23.45.6 port=5432 dbname=webspec user=user password=pass', 'SELECT objectid FROM tablea WHERE objectid = ' || tablea.objectid) AS dblink_rec(remoteid int8) ) AS t1; you'll get: ERROR: FROM function expression may not refer to other relations of same query level which is what you're supposed to get. Apparently the error is not getting generated as it should when this query is run as a subquery. What you should actually be doing is: DELETE FROM tablea WHERE NOT EXISTS ( SELECT remoteid FROM dblink('hostaddr=1.23.45.6 port=5432 dbname=webspec user=user password=pass', 'SELECT objectid FROM tablea WHERE objectid = ' || tablea.objectid) AS dblink_rec(remoteid int8) ); DELETE 0 This should make your function work on 7.3beta, but I still need to track down a fix for the bug. Thanks for the report! Joe