Re: Query hitting empty tables taking 48 minutes - Mailing list pgsql-general

From Robert Creager
Subject Re: Query hitting empty tables taking 48 minutes
Date
Msg-id 0F6EE4A2-A97E-48B4-BF80-BC801238F102@logicalchaos.org
Whole thread Raw
In response to Query hitting empty tables taking 48 minutes  (Robert Creager <robert@logicalchaos.org>)
List pgsql-general

Re-sending from a subscribed address (sigh).

On Jun 7, 2018, at 4:18 PM, Robert <domain@logicalchaos.org> wrote:


On Jun 7, 2018, at 2:15 PM, Laurenz Albe <laurenz.albe@cybertec.at> wrote:

Robert Creager wrote:
I have a system running FreeBSD 11.1-STABLE, PostgreSQL 9.6.8,Java OpenJDK 1.8.0_131,
jdbc 9.3-1104-jdbc41 which is exhibiting very bizarre behavior.
A query is executing against a couple of tables that have 1 and 0 records in them.
ds3.job_entry has 0 rows, ds3.blob has 1 row.
If I execute the query manually via command line, it executes fine.  There are no other active queries, no locks.
The system is idle, in our Dev Test group, so I can leave it this way for a bit of time.
The general software setup is in production and I’ve seen nothing like this before.
Even a system with 300M ds3.blob entries executes this query fine.

Jun  7 17:24:21 blackpearl postgres[10670]: [7737-1] db=tapesystem,user=Administrator,app=[unknown],client=127.0.0.1 LOG:  duration: 2903612.206 ms  execute fetch from S_2037436/C_2037437: SELECT * FROM ds3.blob WHERE EXISTS (SELECT * FROM ds3.job_entry WHERE blob_id = ds3.blob.id AND (job_id = $1))

tapesystem=# explain analyze SELECT * FROM ds3.blob WHERE EXISTS (SELECT * FROM ds3.job_entry WHERE blob_id = ds3.blob.id AND (job_id = 'b51357cd-e07a-4c87-a50b-999c347a5c71'));                       
                                                             QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------                                                                
Nested Loop  (cost=0.84..6.89 rows=1 width=77) (actual time=0.044..0.044 rows=0 loops=1)
  ->  Index Scan using job_entry_job_id_idx on job_entry  (cost=0.42..2.44 rows=1 width=16) (actual time=0.042..0.042 rows=0 loops=1)
        Index Cond: (job_id = 'b51357cd-e07a-4c87-a50b-999c347a5c71'::uuid)
  ->  Index Scan using blob_pkey on blob  (cost=0.42..4.44 rows=1 width=77) (never executed)
        Index Cond: (id = job_entry.blob_id)                                  
Planning time: 0.388 ms
Execution time: 0.118 ms                                                                                                                                                                


pid client_port runtime query_start datname state query usename
10670 11211 0 years 0 mons 0 days 0 hours 43 mins 28.852273 secs 2018-06-07 17:24:22.026384 tapesystem active SELECT * FROM ds3.blob WHERE EXISTS (SELECT * FROM ds3.job_entry WHERE blob_id = ds3.blob.id AND (job_id = $1)) Administrator


From the system with 300M ds3.blob entries, which works fine (along with every other system in house):

QUERY PLAN
Nested Loop  (cost=0.57..738052.90 rows=164379 width=75) (actual time=1.001..1947.029 rows=50000 loops=1)
 ->  Seq Scan on job_entry  (cost=0.00..10039.50 rows=164379 width=16) (actual time=0.871..56.442 rows=50000 loops=1)
       Filter: (job_id = 'ef27d2fa-2727-424e-8f44-da9e33a5ca05'::uuid)
       Rows Removed by Filter: 60001
 ->  Index Scan using blob_pkey on blob  (cost=0.57..4.42 rows=1 width=75) (actual time=0.037..0.037 rows=1 loops=50000)
       Index Cond: (id = job_entry.blob_id)
Planning time: 6.330 ms
Execution time: 1951.564 ms

The JDBC query is using a cursor since you are using setFetchSize(10000).

I can see two reasons why the plan might be different:

1. It is a prepared statement that has been executed often enough
  for the generic plan to kick in, and the generic plan is bad.

Tom brought up that, see my response to him.  It is a bad plan, but on a table with no entries.

2. PostgreSQL chooses a fast startup plan because a cursor is used,
  and that plan performs much worse.

The parameterized plan Tom had me look at starts with sequence scans, which would be bad on the job_entry table (topping out at ~9M entries worst case), horrendous on the blob table (open ended size, testing with 300M entries on one of our systems).

To see if 1. causes the problem, you could set
 m_statement.setPrepareThreshold(0);
and see if that makes a difference.

If I can keep this machine and spin some code, I could try this.  And, if this is a JDBC/app problem, re-starting the app would ‘fix’ the problem I’m guessing.

To test if 2. has an influence, try setting cursor_tuple_fraction to 1.0
and see if it makes a difference.

You can capture the execution plan of the bad statement by using auto_explain,
that would certainly shed more light on the problem.

I’ll look at enabling that setting.  I’d be unable to do it easily without restarting postgres (via changing the postgresql.conf file) without spinning new code (possible, not ideal).

Best,
Robert

pgsql-general by date:

Previous
From: "Ahmed, Nawaz"
Date:
Subject: RE: Service pgpool
Next
From: "David G. Johnston"
Date:
Subject: Re: Query hitting empty tables taking 48 minutes