Home > mailing lists

[GENERAL] Weird performance difference - Mailing list pgsql-general

From	Israel Brewster
Subject	[GENERAL] Weird performance difference
Date	October 21, 2017 02:08:26
Msg-id	AAD8DF38-32CE-437F-B2D2-C7BB0D5B5BA3@ravnalaska.net Whole thread Raw
Responses	Re: [GENERAL] Weird performance difference
List	pgsql-general

Tree view

Summary: the following query takes around 12 seconds on my test machine. On my production machine, it's at half an hour and counting. What's going on?

Details:

As a first stab at getting some data I need, I've developed the following SQL query:

SELECT

legfrom,

count(*) as totaldeps,

count(*) filter (where depdly<='00:00:00'::interval) as d0,

count(*) filter (where depdly<='00:05:00'::interval) as d5,

count(*) filter (where depdly<='00:14:00'::interval) as d14,

count(*) filter (where depdly<='00:30:00'::interval) as d30,

count(*) filter (where legto=destination) as totalarriv,

count(*) filter (where arrdly<='00:00:00'::interval AND legto=destination) as a0,

count(*) filter (where arrdly<='00:05:00'::interval AND legto=destination) as a5,

count(*) filter (where arrdly<='00:14:00'::interval AND legto=destination) as a14,

count(*) filter (where arrdly<='00:30:00'::interval AND legto=destination) as a30,

count(*) filter (where legto!=destination) as div

FROM

(SELECT distinct on (legdetail.flightnum, legfrom, outtime)

legdetail.flightnum,

legfrom,

legto,

outtime,

intime,

depart_time,

arrival_time,

destination,

outtime::time-oag_schedules.depart_time as depdly,

intime::time-oag_Schedules.arrival_time as arrdly

FROM oag_batches

NATURAL JOIN oag_schedules

INNER JOIN legdetail

ON outtime::date BETWEEN oag_schedules.startdate AND coalesce(oag_schedules.enddate,'infinity')

AND outtime::date BETWEEN oag_batches.eff_from AND coalesce(oag_batches.eff_to,'infinity')

AND extract(isodow from outtime::date)=ANY(frequency)

AND oag_schedules.flightnum::text=legdetail.flightnum

AND oag_schedules.origin=legdetail.legfrom

WHERE

outtime::date>='2017-01-01'

AND legdetail.flightnum~E'^[0-9]*$'

AND legdetail.flightnum::integer < 8000

AND csom='S'

ORDER BY legdetail.flightnum, legfrom, outtime, oag_batches.eff_from DESC) s1

GROUP BY legfrom

ORDER BY legfrom;

which, when run on my test server, has this explain analyze output: https://explain.depesz.com/s/4piv. Around 12 second runtime, which isn't too bad (in the grand scheme of things), although there is probably room for improvement.

oag_batches and oag_schedules are postgres_fdw foreign tables in the same database cluster (different DB same host/server instance). oag_schedules has about 80,000 rows, legdetail has about 60,000 rows, and oag_batches has 125 rows, so while we are dealing with a fair amount of data, it's nothing extreme - I have tables with billions of rows that it handles just fine.

With the query comfortably running on my test machine, I went ahead and tried it out on my production machine. And I waited. And waited. And waited some more. As of the writing of this line, pg_stat_activity shows that the query has been running for over 30 minutes. It also shows a status of "active", and nothing under the wait_event column. A second entry in pg_stat_activity shows the foreign data wrapper connection, with a state of "idle in transaction" and again nothing under the wait_event column. Top shows the two processes using around 62% and 45% CPU, with the rest of machine being 96% idle, so they are doing *something*, but not even enough to max out one of the cores, nor is the machine as a whole maxed out. The PostgreSQL log shows plenty of activity (and none of the users of the database are complaining), so apparently the database as a whole is working fine - it's just this one query that's not completing. Further, iotop shows no significant disk access - read is a solid 0.00 B/s, and write is only around 80 K/s, so it's not like it's waiting on disk i/o.

Both test server and production are running PostgreSQL 9.6, although my test server is at 9.6.1 while production is at 9.6.5 (huh, production got ahead of test. I'll have to fix that. Oh well), and I did a dump from production to test using -C just before starting work on the query, so the datasets are pretty closely matched. The production server is configured to take full(er) advantage of the hardware than my test database, so if anything I would expect it to be faster. Hardware differences obviously can have an effect on the query speed, but I wouldn't think a difference of 12 seconds to over half an hour (and counting). So what could explain this huge discrepancy?

One difference I can think of is that my production database is replicating (streaming replication), but since this is just a select, I can't think why that would make a difference... any thoughts?

-----------------------------------------------

Israel Brewster

Systems Analyst II

Ravn Alaska

5245 Airport Industrial Rd

Fairbanks, AK 99709

(907) 450-7293

-----------------------------------------------

Attachment

Israel Brewster.vcf

pgsql-general by date:

From: Joe Conway
Date: 21 October 2017, 00:16:23
Subject: Re: [GENERAL] How to find out extension directory

From: Justin Pryzby
Date: 21 October 2017, 02:31:40
Subject: Re: [GENERAL] Weird performance difference

[GENERAL] Weird performance difference - Mailing list pgsql-general

Attachment

Previous

Next