Thread: How different is AWS-RDS postgres?
I have what purports to be Postgres 12 ( PostgreSQL 12.5 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-11), 64-bit ) running on RDS. I'm testing a new release of my service with an admittedly large data set (which may be swamping AWS bandwidth). But my app logs tell me a table was created and loaded (and, later, read).
From my tomcat log
00:09:58.567 [https-jsse-nio-10.0.2.28-15002-exec-3] DEBUG edu.utah.camplab.jx.PayloadFromMux - STAGING TABLE CREATED: bulk."rjs_GEV15_15_ff3ba73c_d927_431c_bd29_9687a47f1594"
00:09:58.585 [https-jsse-nio-10.0.2.28-15002-exec-3] INFO edu.utah.camplab.jx.PayloadFromMux - ff3ba73c-d927-431c-bd29-9687a47f1594: started COPY work at 1621987798585
00:09:58.586 [ff3ba73c-d927-431c-bd29-9687a47f1594] INFO edu.utah.camplab.jx.PayloadWriterThread - bulk."rjs_GEV15_15_ff3ba73c_d927_431c_bd29_9687a47f1594": Begin bulk copy segment
00:10:01.597 [https-jsse-nio-10.0.2.28-15002-exec-11] ERROR edu.utah.camplab.servlet.PayloadSaveServlet - received payload
00:10:01.598 [https-jsse-nio-10.0.2.28-15002-exec-11] DEBUG org.jooq.impl.DefaultConnectionProvider - setting auto commit : false
00:10:01.599 [https-jsse-nio-10.0.2.28-15002-exec-11] DEBUG edu.utah.camplab.jx.PayloadFromMux - 074d449b-c3ba-499f-83e3-f48427fe0156: start transaction at 1621987801599
00:10:01.599 [https-jsse-nio-10.0.2.28-15002-exec-11] DEBUG org.jooq.impl.DefaultConnectionProvider - set savepoint
00:10:07.511 [ff3ba73c-d927-431c-bd29-9687a47f1594] INFO edu.utah.camplab.jx.PayloadWriterThread - bulk transfer of 2528447 took 8.925s
00:10:07.511 [ff3ba73c-d927-431c-bd29-9687a47f1594] DEBUG edu.utah.camplab.jx.PayloadWriterThread - staged in 8925 ms
00:10:07.567 [https-jsse-nio-10.0.2.28-15002-exec-3] INFO edu.utah.camplab.jx.PayloadFromMux - ff3ba73c-d927-431c-bd29-9687a47f1594: Total segment save took 9486 ms
00:10:07.567 [https-jsse-nio-10.0.2.28-15002-exec-3] INFO edu.utah.camplab.jx.AbstractPayload - ff3ba73c-d927-431c-bd29-9687a47f1594: closing process ff3ba73c-d927-431c-bd29-9687a47f1594
00:10:07.608 [https-jsse-nio-10.0.2.28-15002-exec-3] DEBUG org.jooq.impl.DefaultConnectionProvider - release savepoint
00:10:07.609 [https-jsse-nio-10.0.2.28-15002-exec-3] DEBUG edu.utah.camplab.jx.PayloadFromMux - ff3ba73c-d927-431c-bd29-9687a47f1594: end transaction at 1621987807609

Which claims to have written 2,528,447 records in roughly 9 seconds into the newly created table bulk."rjs_GEV15_15_ff3ba73c_d927_431c_bd29_9687a47f1594". Nice.
However, no such table exists, though later processing renames it by appending "_done" to the name (being careful to remain under 64 chars).
My middleware does receive an exception notice
00:10:55.101 [https-jsse-nio-10.0.2.28-15002-exec-3] ERROR edu.utah.camplab.jx.AbstractPayload - run ff3ba73c-d927-431c-bd29-9687a47f1594: Exception from db write: SQL [insert into segment select * from bulk."rjs_GEV15_15_ff3ba73c_d927_431c_bd29_9687a47f1594" as s where s.probandset_id >= 'a0000000-0000-0000-0000-000000000000' and s.probandset_id < 'b0000000-0000-0000-0000-000000000000' ]; An I/O error occurred while sending to the backend.: {}
which confirms(?) that the table /was/ there and had been read from 10 times prior, since I copy from this temp, bulk-loaded table to the actual target in 16 chunks by diddling the first hex digit from 0 through f. Here zero through nine apparently didn't have a problem. These 16 inserts are in a single transaction, separate from the bulk copy. (There are a dozen more of these files that were processed and then disappeared.)
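For concreteness, the chunked copy looks roughly like the following (a simplified sketch in plain JDBC rather than the jOOQ my app actually uses; the table and column names come from the logs, everything else, including the method name and the handling of the final 'f' slice, is illustrative):

    import java.sql.Connection;
    import java.sql.PreparedStatement;

    // Sketch: drain the bulk-loaded staging table into "segment" in 16 slices,
    // stepping the first hex digit of the probandset_id bounds from 0 through f.
    // All 16 inserts run in one transaction, separate from the earlier bulk COPY.
    class ChunkedCopySketch {
        static void copyStagingToSegment(Connection conn, String stagingTable) throws Exception {
            String sql = "insert into segment select * from bulk.\"" + stagingTable + "\" as s"
                       + " where s.probandset_id >= ?::uuid and s.probandset_id < ?::uuid";
            String hex = "0123456789abcdef";
            conn.setAutoCommit(false);
            try (PreparedStatement ps = conn.prepareStatement(sql)) {
                for (int i = 0; i < 16; i++) {
                    String lo = hex.charAt(i) + "0000000-0000-0000-0000-000000000000";
                    // Upper bound is the next hex digit; how the real code caps the
                    // final 'f' slice is not shown in the thread, so this is a guess.
                    String hi = i < 15 ? hex.charAt(i + 1) + "0000000-0000-0000-0000-000000000000"
                                       : "ffffffff-ffff-ffff-ffff-ffffffffffff";
                    ps.setString(1, lo);
                    ps.setString(2, hi);
                    ps.execute();
                }
                conn.commit();
            } catch (Exception e) {
                conn.rollback();
                throw e;
            }
        }
    }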
My question is:
Should I be surprised that the initial bulk loaded table is nowhere to be seen, given the thumbs up from the logs? Is this frailty inherent in AWS/RDS infrastructure?
Since this is an academic exercise, I have minimal AWS support, which has yet to chime in on this matter. My access to the logs is um, er, constrained.
On 5/26/21 12:50 PM, Rob Sargent wrote:
> Should I be surprised that the initial bulk loaded table is nowhere to be seen, given the thumbs up from the logs? Is this frailty inherent in AWS/RDS infrastructure?
The big differences I notice are:
1. "postgres" is not a superuser,
2. viewing logs is a hassle.
Otherwise, they're really similar. We've pumped about 6TB of data into an instance, and it's been rock solid. JBoss is quite happy, and there haven't been any problems.
--
Angular momentum makes the world go 'round.
Good to hear.
I pushed this same data set into our previous server (version 10), so I was surprised it blew up this time, though in that case these separate files were more spread out in arrival time. Nothing in the (available) logs other than lots of auto-vacuuming and the suggestion to increase wal size (which I thought a little surprising). I really would like to know just what went south, but that isn't obvious yet (OOM? network? disk full?); I can't find any smoking gun on the AWS/RDS pages.
rjs
On 5/26/21 1:24 PM, Rob Sargent wrote:
> Nothing in the (available) logs other than lots of auto-vacuuming and the suggestion to increase wal size (which I thought a little surprising).

The difficulty in viewing logs needs to be emphasized.
--
Angular momentum makes the world go 'round.
There is an option to send the logs to cloudwatch which makes it less awful to look at them.
On May 26, 2021, at 4:37 PM, Ian Harding <harding.ian@gmail.com> wrote:
> There is an option to send the logs to cloudwatch which makes it less awful to look at them.

I have that but precious little of interest there. Lots of autovac, a smattering of hints to increase wal size!? I have yet to spot anything which corresponds to the “I/O failure” which the middleware gets.
I don’t have query logging on, but I do see reports from my psql session fat-fingering.
As to the logs UI, the search is pretty feeble; I don’t understand why there are four channels of logs; and the graphs are wearing the same rose-coloured glasses as the logs.

And 24 hours without a peep from AWS support. (I don’t call mailing me back what I sent them “contact”.)

My guess right now is that the entire tomcat connection pool is in a single transaction? That’s the only way the tables could disappear. I am making separate calls to JDBC getConnection() for each doPost.
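Each doPost borrows its own connection from the shared Tomcat JDBC pool, roughly like this (an illustrative sketch only, not my actual servlet code):

    import java.io.IOException;
    import java.sql.Connection;
    import javax.servlet.http.HttpServlet;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;
    import org.apache.tomcat.jdbc.pool.DataSource;

    // Sketch: one pool shared across requests, one pooled connection per doPost.
    public class PayloadSaveSketch extends HttpServlet {
        private final DataSource pool;  // the embedded tomcat JDBC pool

        public PayloadSaveSketch(DataSource pool) {
            this.pool = pool;
        }

        @Override
        protected void doPost(HttpServletRequest req, HttpServletResponse resp) throws IOException {
            try (Connection conn = pool.getConnection()) {  // a separate connection for this request
                conn.setAutoCommit(false);
                // ... create the staging table, COPY the payload in, run the 16 chunked inserts ...
                conn.commit();
            } catch (Exception e) {
                resp.sendError(HttpServletResponse.SC_INTERNAL_SERVER_ERROR, e.getMessage());
            }
        }
    }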
We used Aurora (AWS hosted Postgres) and I agree that CloudWatch search is pretty limited. I wrote a Python script to download CloudWatch logs to my laptop where I can use proper tools like grep to search them. It's attached to this email. It's hacky but not too terrible. I hope you find it useful.

Cheers
Philip
Attachment
Thank you.
I may have found another difference: JDBC connections are not logged?!
I just reproduced my report, and the CloudWatch view of the logs shows some psql interaction from before and after the test, but no mention of losing 7.5M records.
The same JDBC connection that is resulting in lost data? Sounds to me like you aren't connecting to the DB you think you are connecting to.
On 5/27/21 3:08 PM, Sam Gendler wrote:
> The same JDBC connection that is resulting in lost data? Sounds to me like you aren't connecting to the DB you think you are connecting to.

I almost wish that were true.
However, looking at AWS "Performance Insights" I see the SQL statements generated by my app being executed on the server. Not coincidentally, this is from the "Top SQL (10)/Load by waits" view. Now that view does NOT, in point of fact, name the database in which the SQL is running, but the rest of my environment pretty much rules out silently switching tracks on myself. I have to read from the correct database, using a UUID, to get data to analyze, then save the results of the analysis back. I'm using my wonderful webapp to run this, and I've successfully analyzed and stored results for small starting data sets.
I just noticed the "Top database" tab on that screen: there is only one and it's the correct one.
I've reproduced the behaviour. I'm pretty convinced it's a size thing, but which part of the system is being maxed out is not clear. Not CPU, but that's the only resource "Performance Insights" mentions (suggesting this UI wasn't designed by a database person).
The loss of the staging table is most spectacular. It was filled from a file which has 7.5M records. It's clear in the tomcat logs that it was created and written to, and one record read. Then the "top sql" happens:

insert into segment select * from bulk."rjs_GEV15_01_c9e224ca_85d2_40b4_ad46_327cfb9f0ac6" as s where s.probandset_id >= ? and s.probandset_id < ?

The "bulk" table is the staging table. The params are filled in with a quasi-UUID which grabs roughly 1/16th of the data. In the stack trace on my tomcat server I get:
Caused by: org.jooq.exception.DataAccessException: SQL [insert into segment select * from bulk."rjs_GEV15_01_c9e224ca_85d2_40b4_ad46_327cfb9f0ac6" as s where s.probandset_id >= '30000000-0000-0000-0000-000000000000' and s.probandset_id < '40000000-0000-0000-0000-000000000000' ]; An I/O error occurred while sending to the backend.

So this would have been the fourth such insert statement read from that staging table.
That table is not deleted by my code. It is renamed after the last insert into segment, by appending "_done" to the name. But that table, by either name, is nowhere to be seen on the server.
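The rename itself is nothing fancy; something along these lines (a sketch, with the only subtlety being the length trim, since PostgreSQL truncates identifiers longer than 63 bytes):

    import java.sql.Connection;
    import java.sql.SQLException;
    import java.sql.Statement;

    // Sketch: mark the staging table done by appending "_done", trimming first so
    // the new name stays inside PostgreSQL's 63-byte identifier limit.
    class StagingRenameSketch {
        static void markDone(Connection conn, String stagingTable) throws SQLException {
            String base = stagingTable;
            if (base.length() + "_done".length() > 63) {
                base = base.substring(0, 63 - "_done".length());
            }
            try (Statement st = conn.createStatement()) {
                // RENAME TO takes an unqualified name; the table stays in the bulk schema.
                st.executeUpdate("alter table bulk.\"" + stagingTable + "\" rename to \"" + base + "_done\"");
            }
        }
    }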
For those scoring at home, the trouble in the tomcat log starts with:
0:02:11.452 [https-jsse-nio-10.0.2.28-15002-exec-7] INFO edu.utah.camplab.jx.PayloadFromMux - bulk."rjs_GEV15_01_c9e224ca_85d2_40b4_ad46_327cfb9f0ac6": Begin transfer from bulk to segment
27-May-2021 20:02:50.338 FINE [Catalina-utility-2] org.apache.catalina.session.ManagerBase.processExpires Start expire sessions StandardManager at 1622145770338 sessioncount 0
27-May-2021 20:02:50.338 FINE [Catalina-utility-2] org.apache.catalina.session.ManagerBase.processExpires End expire sessions StandardManager processingTime 0 expired sessions: 0
27-May-2021 20:02:50.476 FINE [Catalina-utility-2] org.apache.catalina.session.ManagerBase.processExpires Start expire sessions StandardManager at 1622145770476 sessioncount 1
27-May-2021 20:02:50.476 FINE [Catalina-utility-2] org.apache.catalina.session.ManagerBase.processExpires End expire sessions StandardManager processingTime 0 expired sessions: 0
27-May-2021 20:02:51.847 WARNING [Tomcat JDBC Pool Cleaner[1731185718:1622133381802]] org.apache.tomcat.jdbc.pool.ConnectionPool.abandon Connection has been abandoned PooledConnection[org.postgresql.jdbc.PgCo\
nnection@1622ead9]:java.lang.Exception
at org.apache.tomcat.jdbc.pool.ConnectionPool.getThreadDump(ConnectionPool.java:1163)
at org.apache.tomcat.jdbc.pool.ConnectionPool.borrowConnection(ConnectionPool.java:816)
at org.apache.tomcat.jdbc.pool.ConnectionPool.borrowConnection(ConnectionPool.java:660)
at org.apache.tomcat.jdbc.pool.ConnectionPool.getConnection(ConnectionPool.java:198)
at org.apache.tomcat.jdbc.pool.DataSourceProxy.getConnection(DataSourceProxy.java:132)
at org.apache.tomcat.jdbc.pool.DataSourceProxy.getConnection(DataSourceProxy.java:90)
at edu.utah.camplab.servlet.AbstractSGSServlet.getDbConn(AbstractSGSServlet.java:123)
at edu.utah.camplab.servlet.AbstractSGSServlet.getDbConn(AbstractSGSServlet.java:105)
at edu.utah.camplab.servlet.PayloadSaveServlet.doPost(PayloadSaveServlet.java:50)
....
20:02:51.854 [https-jsse-nio-10.0.2.28-15002-exec-7] DEBUG org.jooq.impl.DefaultConnectionProvider - rollback to savepoint
20:02:51.855 [https-jsse-nio-10.0.2.28-15002-exec-7] DEBUG org.jooq.impl.DefaultConnectionProvider - rollback
20:02:51.855 [https-jsse-nio-10.0.2.28-15002-exec-7] DEBUG org.jooq.impl.DefaultConnectionProvider - setting auto commit : true
20:02:51.855 [https-jsse-nio-10.0.2.28-15002-exec-7] ERROR edu.utah.camplab.jx.AbstractPayload - run c9e224ca-85d2-40b4-ad46-327cfb9f0ac6: Exception from db write: SQL [insert into segment select * from bulk.\
"rjs_GEV15_01_c9e224ca_85d2_40b4_ad46_327cfb9f0ac6" as s where s.probandset_id >= '30000000-0000-0000-0000-000000000000' and s.probandset_id < '40000000-0000-0000-0000-000000000000' ]; An I/O error occurred \
while sending to the backend.: {}
org.jooq.exception.DataAccessException: SQL [insert into segment select * from bulk."rjs_GEV15_01_c9e224ca_85d2_40b4_ad46_327cfb9f0ac6" as s where s.probandset_id >= '30000000-0000-0000-0000-000000000000' an\
d s.probandset_id < '40000000-0000-0000-0000-000000000000' ]; An I/O error occurred while sending to the backend.
at org.jooq_3.14.7.POSTGRES.debug(Unknown Source)
at org.jooq.impl.Tools.translate(Tools.java:2880)
at org.jooq.impl.DefaultExecuteContext.sqlException(DefaultExecuteContext.java:757)
at org.jooq.impl.AbstractQuery.execute(AbstractQuery.java:389)
at edu.utah.camplab.jx.PayloadFromMux.insertAllSegments(PayloadFromMux.java:177)
at edu.utah.camplab.jx.PayloadFromMux.lambda$completeSegmentSave$3(PayloadFromMux.java:165)
at org.jooq.impl.DefaultDSLContext$3.run(DefaultDSLContext.java:615)
at org.jooq.impl.DefaultDSLContext$3.run(DefaultDSLContext.java:612)
at org.jooq.impl.DefaultDSLContext.lambda$transactionResult0$0(DefaultDSLContext.java:543)
at org.jooq.impl.Tools$35$1.block(Tools.java:5203)
at java.base/java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3137)
at org.jooq.impl.Tools$35.get(Tools.java:5200)
at org.jooq.impl.DefaultDSLContext.transactionResult0(DefaultDSLContext.java:595)
at org.jooq.impl.DefaultDSLContext.transactionResult(DefaultDSLContext.java:512)
at org.jooq.impl.DefaultDSLContext.transaction(DefaultDSLContext.java:612)
at edu.utah.camplab.jx.PayloadFromMux.completeSegmentSave(PayloadFromMux.java:164)
at edu.utah.camplab.jx.PayloadFromMux.writedb(PayloadFromMux.java:45)
at edu.utah.camplab.jx.AbstractPayload.lambda$write$0(AbstractPayload.java:77)
at org.jooq.impl.DefaultDSLContext$3.run(DefaultDSLContext.java:615)
...
Suppressed: org.jooq.exception.DataAccessException: Cannot rollback transaction
at org.jooq.impl.DefaultConnectionProvider.rollback(DefaultConnectionProvider.java:142)
at org.jooq.impl.DefaultTransactionProvider.rollback(DefaultTransactionProvider.java:223)
at org.jooq.impl.DefaultDSLContext.lambda$transactionResult0$0(DefaultDSLContext.java:567)
... 45 common frames omitted
Caused by: java.sql.SQLException: Connection has already been closed.
at org.apache.tomcat.jdbc.pool.ProxyConnection.invoke(ProxyConnection.java:117)
at org.apache.tomcat.jdbc.pool.JdbcInterceptor.invoke(JdbcInterceptor.java:108)
at org.apache.tomcat.jdbc.pool.interceptor.AbstractCreateStatementInterceptor.invoke(AbstractCreateStatementInterceptor.java:79)
at org.apache.tomcat.jdbc.pool.JdbcInterceptor.invoke(JdbcInterceptor.java:108)
at org.apache.tomcat.jdbc.pool.DisposableConnectionFacade.invoke(DisposableConnectionFacade.java:81)
at com.sun.proxy.$Proxy4.rollback(Unknown Source)
at org.jooq.impl.DefaultConnectionProvider.rollback(DefaultConnectionProvider.java:139)
... 47 common frames omitted
Suppressed: org.jooq.exception.DataAccessException: Cannot set autoCommit
at org.jooq.impl.DefaultConnectionProvider.setAutoCommit(DefaultConnectionProvider.java:222)
at org.jooq.impl.DefaultTransactionProvider.brace(DefaultTransactionProvider.java:246)
at org.jooq.impl.DefaultTransactionProvider.rollback(DefaultTransactionProvider.java:229)
at org.jooq.impl.DefaultDSLContext.lambda$transactionResult0$0(DefaultDSLContext.java:567)
... 33 common frames omitted
Caused by: java.sql.SQLException: Connection has already been closed.
at org.apache.tomcat.jdbc.pool.ProxyConnection.invoke(ProxyConnection.java:117)
at org.apache.tomcat.jdbc.pool.JdbcInterceptor.invoke(JdbcInterceptor.java:108)
at org.apache.tomcat.jdbc.pool.interceptor.AbstractCreateStatementInterceptor.invoke(AbstractCreateStatementInterceptor.java:79)
at org.apache.tomcat.jdbc.pool.JdbcInterceptor.invoke(JdbcInterceptor.java:108)
at org.apache.tomcat.jdbc.pool.DisposableConnectionFacade.invoke(DisposableConnectionFacade.java:81)
at com.sun.proxy.$Proxy4.setAutoCommit(Unknown Source)
at org.jooq.impl.DefaultConnectionProvider.setAutoCommit(DefaultConnectionProvider.java:219)
... 36 common frames omitted
Caused by: org.postgresql.util.PSQLException: An I/O error occurred while sending to the backend.
at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:350)
at org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:481)
at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:401)
at org.postgresql.jdbc.PgPreparedStatement.executeWithFlags(PgPreparedStatement.java:164)
at org.postgresql.jdbc.PgPreparedStatement.execute(PgPreparedStatement.java:153)
at jdk.internal.reflect.GeneratedMethodAccessor25.invoke(Unknown Source)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:564)
at org.apache.tomcat.jdbc.pool.StatementFacade$StatementProxy.invoke(StatementFacade.java:114)
at com.sun.proxy.$Proxy14.execute(Unknown Source)
at org.jooq.tools.jdbc.DefaultPreparedStatement.execute(DefaultPreparedStatement.java:214)
at org.jooq.impl.AbstractQuery.execute(AbstractQuery.java:458)
at org.jooq.impl.AbstractQuery.execute(AbstractQuery.java:375)
... 50 common frames omitted
Caused by: javax.net.ssl.SSLException: Socket closed
at java.base/sun.security.ssl.Alert.createSSLException(Alert.java:127)
at java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:369)
at java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:312)
at java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:307)
at java.base/sun.security.ssl.SSLSocketImpl.handleException(SSLSocketImpl.java:1680)
at java.base/sun.security.ssl.SSLSocketImpl$AppInputStream.read(SSLSocketImpl.java:1054)
at org.postgresql.core.VisibleBufferedInputStream.readMore(VisibleBufferedInputStream.java:161)
at org.postgresql.core.VisibleBufferedInputStream.ensureBytes(VisibleBufferedInputStream.java:128)
On 5/27/21 4:58 PM, Rob Sargent wrote:
> I've reproduced the behaviour. I'm pretty convinced it's a size thing, but which part of the system is being maxed out is not clear.

Is there any way to replicate this in psql?
--
Angular momentum makes the world go 'round.
On 5/27/21 4:10 PM, Ron wrote:
> Is there any way to replicate this in psql?

I have the json file which feeds the staging table and I have code which is designed to load such files. I suppose an sql file with those 16 insert statements would approximate the app's calls. Let me give that a shot.
That sure looks like something is causing your connection to have a transaction rollback. I haven't worked in Java in far too long, but it seems like your connection pool is under the impression your connection was abandoned, so it reclaims it and rolls back the transaction, which would explain why you aren't seeing the table when all is said and done - all of the work is being undone at the end.
One possibility, based on the catalina log you provided - if you have either end of the connection set up to automatically close idle connections after a period of time, then you might receive a closed connection from the pool, which will just error out when you attempt to run a query. In which case, you need to set up your connection pool to test a connection before it returns it to the requester. Usually something as simple as "select 2" will be sufficient to determine if the database connection is open. I can just about guarantee that your connection pool has a parameter which allows you to specify a query to execute when a connection is requested.
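With the Tomcat JDBC pool your stack traces show, that boils down to something like this (property names are from the pool's documentation; the values are only examples):

    import org.apache.tomcat.jdbc.pool.DataSource;
    import org.apache.tomcat.jdbc.pool.PoolProperties;

    // Example: validate pooled connections before handing them out.
    class PoolValidationSketch {
        static DataSource buildPool(String url, String user, String password) {
            PoolProperties p = new PoolProperties();
            p.setUrl(url);
            p.setDriverClassName("org.postgresql.Driver");
            p.setUsername(user);
            p.setPassword(password);
            p.setValidationQuery("SELECT 1");  // any cheap query will do
            p.setTestOnBorrow(true);           // run it before a connection is handed out
            p.setValidationInterval(30000);    // ms; don't re-validate more often than this
            return new DataSource(p);
        }
    }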
The docs for the connection pool you are using are at https://tomcat.apache.org/tomcat-7.0-doc/jdbc-pool.html; a screenshot of the relevant configuration params is attached.
Attachment
On 5/27/21 4:25 PM, Sam Gendler wrote:
> I can just about guarantee that your connection pool has a parameter which allows you to specify a query to execute when a connection is requested.
Well I /was/ doing
contextResource.setProperty("validationQuery", "SELECT 1");
but I see that I lost that when I switched to using a properties file. Thanks for pointing me there.
The loop of 16 insert statements is in a single transaction on a single connection, so I'm not sure who's choking first. Is the connection idle after the I/O error, or is the I/O error from a dead connection? (Small disclaimer: there is no catalina involved here, just an embedded tomcat instance.)
Unless something has changed in recent years, the core servlet engine of tomcat IS catalina. Embedded tomcat is embedded catalina. It looks like the I/O error is a result of attempting to send a query on an already dead connection. I'd look for something that is limiting how long a connection can be open - either an explicit or default value for a timeout in the connection pool or on the server side. If you don't get the same behaviour when running against a database locally, I'd definitely look at the default settings in RDS. It may be automatically closing connections if they are idle for even a brief period.
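On the pool side, the knobs I'd check first are the abandonment settings (again, Tomcat JDBC pool property names from its docs; the values are illustrative). The "Connection has been abandoned" warning in your earlier catalina log looks like the pool cleaner enforcing removeAbandonedTimeout, which defaults to 60 seconds, well under what a multi-million-row load can take:

    import org.apache.tomcat.jdbc.pool.PoolProperties;

    // Example values only; size the timeout to how long a bulk load may legitimately hold a connection.
    class PoolTimeoutSketch {
        static PoolProperties abandonSettings() {
            PoolProperties p = new PoolProperties();
            p.setRemoveAbandoned(true);          // reclaim connections the pool believes are leaked
            p.setRemoveAbandonedTimeout(3600);   // seconds a borrowed connection may be held (default is 60)
            p.setLogAbandoned(true);             // log a stack trace when a connection is reclaimed
            p.setMaxWait(30000);                 // ms to wait for a free connection before giving up
            return p;
        }
    }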
Not important who does what with whom re Catalina/Tomcat ;)
I will indeed re-examine timeouts and such, as inserting hundreds of thousands of records is not instantaneous by any stretch. Tomcat is the new kid on my block, as prior to this release I managed a naked port with a Selector, and that had no trouble with this same test data. My biggest concern is the lack of server-side indications. I also need to confirm that the table create/bulk-copy and the update to the target table (segment) are in separate transactions. Doesn’t look like it, but they’re supposed to be. I want the data written to the db in bulk, then come back round and load it into the final table in chunks.
Found it. Though there are indeed separate transactions for the bulk copy vs. the target update, I had missed that there is an over-arching transaction in play, and of course when things go south, /all/ things go south. I get away with this, for now, by bumping the tomcat idle time to an hour, which is serious overkill.