Thread: Consecutive Query Executions with Increasing Execution Time
Hi all,
I am running TPC-H on recent postgresql (12.0 and 12.1).
On some of the queries (that may involve parallel scans) I see this interesting behavior:
When these queries are executed back-to-back (sent from psql interactive terminal), the total execution time of them increase monotonically.
I simplified query-1 to demonstrate this effect:
``` example.sql
explain (analyze, buffers) select
max(l_shipdate) as max_data,
count(*) as count_order
from
lineitem
where
l_shipdate <= date '1998-12-01' - interval '20' day;
```
max(l_shipdate) as max_data,
count(*) as count_order
from
lineitem
where
l_shipdate <= date '1998-12-01' - interval '20' day;
```
When I execute (from fish) following command:
`for i in (seq 1 20); psql tpch < example.sql | grep Execution; end`
The results I get are as follows:
"
Execution Time: 184.864 ms
Execution Time: 192.758 ms
Execution Time: 197.380 ms
Execution Time: 200.384 ms
Execution Time: 202.950 ms
Execution Time: 205.695 ms
Execution Time: 208.082 ms
Execution Time: 209.108 ms
Execution Time: 212.428 ms
Execution Time: 214.539 ms
Execution Time: 215.799 ms
Execution Time: 219.057 ms
Execution Time: 222.102 ms
Execution Time: 223.779 ms
Execution Time: 227.819 ms
Execution Time: 229.710 ms
Execution Time: 239.439 ms
Execution Time: 237.649 ms
Execution Time: 249.178 ms
Execution Time: 261.268 ms
"
`for i in (seq 1 20); psql tpch < example.sql | grep Execution; end`
The results I get are as follows:
"
Execution Time: 184.864 ms
Execution Time: 192.758 ms
Execution Time: 197.380 ms
Execution Time: 200.384 ms
Execution Time: 202.950 ms
Execution Time: 205.695 ms
Execution Time: 208.082 ms
Execution Time: 209.108 ms
Execution Time: 212.428 ms
Execution Time: 214.539 ms
Execution Time: 215.799 ms
Execution Time: 219.057 ms
Execution Time: 222.102 ms
Execution Time: 223.779 ms
Execution Time: 227.819 ms
Execution Time: 229.710 ms
Execution Time: 239.439 ms
Execution Time: 237.649 ms
Execution Time: 249.178 ms
Execution Time: 261.268 ms
"
In addition, if the repeated more times, the total execution time can end up being 10X and more!!!
When there a wait period in-between queries, (e.g. sleep 10) in the above for loop, this increasing execution time behavior goes a way.
For more complex queries, the "wait period" needs to be longer to avoid the increase in execution time.
Some metadata about this table "lineitem":
tpch=# \d lineitem
Table "public.lineitem"
Column | Type | Collation | Nullable | Default
-----------------+-----------------------+-----------+----------+---------
l_orderkey | integer | | not null |
l_partkey | integer | | not null |
l_suppkey | integer | | not null |
l_linenumber | integer | | not null |
l_quantity | numeric(15,2) | | not null |
l_extendedprice | numeric(15,2) | | not null |
l_discount | numeric(15,2) | | not null |
l_tax | numeric(15,2) | | not null |
l_returnflag | character(1) | | not null |
l_linestatus | character(1) | | not null |
l_shipdate | date | | not null |
l_commitdate | date | | not null |
l_receiptdate | date | | not null |
l_shipinstruct | character(25) | | not null |
l_shipmode | character(10) | | not null |
l_comment | character varying(44) | | not null |
Indexes:
"i_l_commitdate" btree (l_commitdate)
"i_l_orderkey" btree (l_orderkey)
"i_l_orderkey_quantity" btree (l_orderkey, l_quantity)
"i_l_partkey" btree (l_partkey)
"i_l_receiptdate" btree (l_receiptdate)
"i_l_shipdate" btree (l_shipdate)
"i_l_suppkey" btree (l_suppkey)
"i_l_suppkey_partkey" btree (l_partkey, l_suppkey)
Table "public.lineitem"
Column | Type | Collation | Nullable | Default
-----------------+-----------------------+-----------+----------+---------
l_orderkey | integer | | not null |
l_partkey | integer | | not null |
l_suppkey | integer | | not null |
l_linenumber | integer | | not null |
l_quantity | numeric(15,2) | | not null |
l_extendedprice | numeric(15,2) | | not null |
l_discount | numeric(15,2) | | not null |
l_tax | numeric(15,2) | | not null |
l_returnflag | character(1) | | not null |
l_linestatus | character(1) | | not null |
l_shipdate | date | | not null |
l_commitdate | date | | not null |
l_receiptdate | date | | not null |
l_shipinstruct | character(25) | | not null |
l_shipmode | character(10) | | not null |
l_comment | character varying(44) | | not null |
Indexes:
"i_l_commitdate" btree (l_commitdate)
"i_l_orderkey" btree (l_orderkey)
"i_l_orderkey_quantity" btree (l_orderkey, l_quantity)
"i_l_partkey" btree (l_partkey)
"i_l_receiptdate" btree (l_receiptdate)
"i_l_shipdate" btree (l_shipdate)
"i_l_suppkey" btree (l_suppkey)
"i_l_suppkey_partkey" btree (l_partkey, l_suppkey)
tpch=# SELECT relname, relpages, reltuples, relallvisible, relkind, relnatts, relhassubclass, reloptions, pg_table_size(oid) FROM pg_class WHERE relname='lineitem';
relname | relpages | reltuples | relallvisible | relkind | relnatts | relhassubclass | reloptions | pg_table_size
----------+----------+--------------+---------------+---------+----------+----------------+------------+---------------
lineitem | 112503 | 6.001167e+06 | 112503 | r | 16 | f | | 921903104
(1 row)
relname | relpages | reltuples | relallvisible | relkind | relnatts | relhassubclass | reloptions | pg_table_size
----------+----------+--------------+---------------+---------+----------+----------------+------------+---------------
lineitem | 112503 | 6.001167e+06 | 112503 | r | 16 | f | | 921903104
(1 row)
Postgresql 12.0 and 12.1 are all manually installed from source.
Both are running on Ubuntu 16.04 kernel 4.4.0-142-generic, on Intel(R) Core(TM) i7-6700K.
Any help greatly appreciated!
Shijia
Hi Shijia,
It sounds like concurrency on the queries: the second starts before the first ends, and so on. With a short wait in between you ensure sequential execution. Notice that you also have the overhead of concurrent psql...
It sounds like concurrency on the queries: the second starts before the first ends, and so on. With a short wait in between you ensure sequential execution. Notice that you also have the overhead of concurrent psql...
Sounds normal to me.
Best regards
Olivier
On Mon, Dec 16, 2019, 07:00 Shijia Wei <shijiawei@utexas.edu> wrote:
Hi all,I am running TPC-H on recent postgresql (12.0 and 12.1).On some of the queries (that may involve parallel scans) I see this interesting behavior:When these queries are executed back-to-back (sent from psql interactive terminal), the total execution time of them increase monotonically.I simplified query-1 to demonstrate this effect:``` example.sqlexplain (analyze, buffers) select
max(l_shipdate) as max_data,
count(*) as count_order
from
lineitem
where
l_shipdate <= date '1998-12-01' - interval '20' day;
```When I execute (from fish) following command:
`for i in (seq 1 20); psql tpch < example.sql | grep Execution; end`
The results I get are as follows:
"
Execution Time: 184.864 ms
Execution Time: 192.758 ms
Execution Time: 197.380 ms
Execution Time: 200.384 ms
Execution Time: 202.950 ms
Execution Time: 205.695 ms
Execution Time: 208.082 ms
Execution Time: 209.108 ms
Execution Time: 212.428 ms
Execution Time: 214.539 ms
Execution Time: 215.799 ms
Execution Time: 219.057 ms
Execution Time: 222.102 ms
Execution Time: 223.779 ms
Execution Time: 227.819 ms
Execution Time: 229.710 ms
Execution Time: 239.439 ms
Execution Time: 237.649 ms
Execution Time: 249.178 ms
Execution Time: 261.268 ms
"In addition, if the repeated more times, the total execution time can end up being 10X and more!!!When there a wait period in-between queries, (e.g. sleep 10) in the above for loop, this increasing execution time behavior goes a way.For more complex queries, the "wait period" needs to be longer to avoid the increase in execution time.
Some metadata about this table "lineitem":tpch=# \d lineitem
Table "public.lineitem"
Column | Type | Collation | Nullable | Default
-----------------+-----------------------+-----------+----------+---------
l_orderkey | integer | | not null |
l_partkey | integer | | not null |
l_suppkey | integer | | not null |
l_linenumber | integer | | not null |
l_quantity | numeric(15,2) | | not null |
l_extendedprice | numeric(15,2) | | not null |
l_discount | numeric(15,2) | | not null |
l_tax | numeric(15,2) | | not null |
l_returnflag | character(1) | | not null |
l_linestatus | character(1) | | not null |
l_shipdate | date | | not null |
l_commitdate | date | | not null |
l_receiptdate | date | | not null |
l_shipinstruct | character(25) | | not null |
l_shipmode | character(10) | | not null |
l_comment | character varying(44) | | not null |
Indexes:
"i_l_commitdate" btree (l_commitdate)
"i_l_orderkey" btree (l_orderkey)
"i_l_orderkey_quantity" btree (l_orderkey, l_quantity)
"i_l_partkey" btree (l_partkey)
"i_l_receiptdate" btree (l_receiptdate)
"i_l_shipdate" btree (l_shipdate)
"i_l_suppkey" btree (l_suppkey)
"i_l_suppkey_partkey" btree (l_partkey, l_suppkey)tpch=# SELECT relname, relpages, reltuples, relallvisible, relkind, relnatts, relhassubclass, reloptions, pg_table_size(oid) FROM pg_class WHERE relname='lineitem';
relname | relpages | reltuples | relallvisible | relkind | relnatts | relhassubclass | reloptions | pg_table_size
----------+----------+--------------+---------------+---------+----------+----------------+------------+---------------
lineitem | 112503 | 6.001167e+06 | 112503 | r | 16 | f | | 921903104
(1 row)Postgresql 12.0 and 12.1 are all manually installed from source.Both are running on Ubuntu 16.04 kernel 4.4.0-142-generic, on Intel(R) Core(TM) i7-6700K.Any help greatly appreciated!Shijia
Hi Olivier,
I do not think that the queries are executed concurrently. The bash for loop ensures that the next command fires only after the first returns.
Also for some 'complex' queries, even a wait-period that is longer than the total execution time does not completely avoid this effect.
For example, a wait-period of 5-second in between queries that take 2-second to run, does not help avoid the increasing runtime problem completely.
Thanks,
Shijia
On Mon, Dec 16, 2019 at 2:04 AM Olivier Gautherot <ogautherot@gautherot.net> wrote:
Hi Shijia,
It sounds like concurrency on the queries: the second starts before the first ends, and so on. With a short wait in between you ensure sequential execution. Notice that you also have the overhead of concurrent psql...Sounds normal to me.Best regardsOlivierOn Mon, Dec 16, 2019, 07:00 Shijia Wei <shijiawei@utexas.edu> wrote:Hi all,I am running TPC-H on recent postgresql (12.0 and 12.1).On some of the queries (that may involve parallel scans) I see this interesting behavior:When these queries are executed back-to-back (sent from psql interactive terminal), the total execution time of them increase monotonically.I simplified query-1 to demonstrate this effect:``` example.sqlexplain (analyze, buffers) select
max(l_shipdate) as max_data,
count(*) as count_order
from
lineitem
where
l_shipdate <= date '1998-12-01' - interval '20' day;
```When I execute (from fish) following command:
`for i in (seq 1 20); psql tpch < example.sql | grep Execution; end`
The results I get are as follows:
"
Execution Time: 184.864 ms
Execution Time: 192.758 ms
Execution Time: 197.380 ms
Execution Time: 200.384 ms
Execution Time: 202.950 ms
Execution Time: 205.695 ms
Execution Time: 208.082 ms
Execution Time: 209.108 ms
Execution Time: 212.428 ms
Execution Time: 214.539 ms
Execution Time: 215.799 ms
Execution Time: 219.057 ms
Execution Time: 222.102 ms
Execution Time: 223.779 ms
Execution Time: 227.819 ms
Execution Time: 229.710 ms
Execution Time: 239.439 ms
Execution Time: 237.649 ms
Execution Time: 249.178 ms
Execution Time: 261.268 ms
"In addition, if the repeated more times, the total execution time can end up being 10X and more!!!When there a wait period in-between queries, (e.g. sleep 10) in the above for loop, this increasing execution time behavior goes a way.For more complex queries, the "wait period" needs to be longer to avoid the increase in execution time.
Some metadata about this table "lineitem":tpch=# \d lineitem
Table "public.lineitem"
Column | Type | Collation | Nullable | Default
-----------------+-----------------------+-----------+----------+---------
l_orderkey | integer | | not null |
l_partkey | integer | | not null |
l_suppkey | integer | | not null |
l_linenumber | integer | | not null |
l_quantity | numeric(15,2) | | not null |
l_extendedprice | numeric(15,2) | | not null |
l_discount | numeric(15,2) | | not null |
l_tax | numeric(15,2) | | not null |
l_returnflag | character(1) | | not null |
l_linestatus | character(1) | | not null |
l_shipdate | date | | not null |
l_commitdate | date | | not null |
l_receiptdate | date | | not null |
l_shipinstruct | character(25) | | not null |
l_shipmode | character(10) | | not null |
l_comment | character varying(44) | | not null |
Indexes:
"i_l_commitdate" btree (l_commitdate)
"i_l_orderkey" btree (l_orderkey)
"i_l_orderkey_quantity" btree (l_orderkey, l_quantity)
"i_l_partkey" btree (l_partkey)
"i_l_receiptdate" btree (l_receiptdate)
"i_l_shipdate" btree (l_shipdate)
"i_l_suppkey" btree (l_suppkey)
"i_l_suppkey_partkey" btree (l_partkey, l_suppkey)tpch=# SELECT relname, relpages, reltuples, relallvisible, relkind, relnatts, relhassubclass, reloptions, pg_table_size(oid) FROM pg_class WHERE relname='lineitem';
relname | relpages | reltuples | relallvisible | relkind | relnatts | relhassubclass | reloptions | pg_table_size
----------+----------+--------------+---------------+---------+----------+----------------+------------+---------------
lineitem | 112503 | 6.001167e+06 | 112503 | r | 16 | f | | 921903104
(1 row)Postgresql 12.0 and 12.1 are all manually installed from source.Both are running on Ubuntu 16.04 kernel 4.4.0-142-generic, on Intel(R) Core(TM) i7-6700K.Any help greatly appreciated!Shijia
Shijia Wei
ECE, UT Austin | ACSES | 3rd Year PhD
ECE, UT Austin | ACSES | 3rd Year PhD
Hi Shijia,
If you're using fish, I suspect you're on a Mac - I don't have experience on this platform.
Can you check with pgAdmin (3 or 4) what the server is busy doing after a few iterations? Check for locks, as it could be a cause. Also, do you have concurrent INSERTs?
Olivier
On Mon, Dec 16, 2019, 10:52 Shijia Wei <shijiawei@utexas.edu> wrote:
Hi Olivier,I do not think that the queries are executed concurrently. The bash for loop ensures that the next command fires only after the first returns.Also for some 'complex' queries, even a wait-period that is longer than the total execution time does not completely avoid this effect.For example, a wait-period of 5-second in between queries that take 2-second to run, does not help avoid the increasing runtime problem completely.Thanks,ShijiaOn Mon, Dec 16, 2019 at 2:04 AM Olivier Gautherot <ogautherot@gautherot.net> wrote:Hi Shijia,
It sounds like concurrency on the queries: the second starts before the first ends, and so on. With a short wait in between you ensure sequential execution. Notice that you also have the overhead of concurrent psql...Sounds normal to me.Best regardsOlivierOn Mon, Dec 16, 2019, 07:00 Shijia Wei <shijiawei@utexas.edu> wrote:Hi all,I am running TPC-H on recent postgresql (12.0 and 12.1).On some of the queries (that may involve parallel scans) I see this interesting behavior:When these queries are executed back-to-back (sent from psql interactive terminal), the total execution time of them increase monotonically.I simplified query-1 to demonstrate this effect:``` example.sqlexplain (analyze, buffers) select
max(l_shipdate) as max_data,
count(*) as count_order
from
lineitem
where
l_shipdate <= date '1998-12-01' - interval '20' day;
```When I execute (from fish) following command:
`for i in (seq 1 20); psql tpch < example.sql | grep Execution; end`
The results I get are as follows:
"
Execution Time: 184.864 ms
Execution Time: 192.758 ms
Execution Time: 197.380 ms
Execution Time: 200.384 ms
Execution Time: 202.950 ms
Execution Time: 205.695 ms
Execution Time: 208.082 ms
Execution Time: 209.108 ms
Execution Time: 212.428 ms
Execution Time: 214.539 ms
Execution Time: 215.799 ms
Execution Time: 219.057 ms
Execution Time: 222.102 ms
Execution Time: 223.779 ms
Execution Time: 227.819 ms
Execution Time: 229.710 ms
Execution Time: 239.439 ms
Execution Time: 237.649 ms
Execution Time: 249.178 ms
Execution Time: 261.268 ms
"In addition, if the repeated more times, the total execution time can end up being 10X and more!!!When there a wait period in-between queries, (e.g. sleep 10) in the above for loop, this increasing execution time behavior goes a way.For more complex queries, the "wait period" needs to be longer to avoid the increase in execution time.
Some metadata about this table "lineitem":tpch=# \d lineitem
Table "public.lineitem"
Column | Type | Collation | Nullable | Default
-----------------+-----------------------+-----------+----------+---------
l_orderkey | integer | | not null |
l_partkey | integer | | not null |
l_suppkey | integer | | not null |
l_linenumber | integer | | not null |
l_quantity | numeric(15,2) | | not null |
l_extendedprice | numeric(15,2) | | not null |
l_discount | numeric(15,2) | | not null |
l_tax | numeric(15,2) | | not null |
l_returnflag | character(1) | | not null |
l_linestatus | character(1) | | not null |
l_shipdate | date | | not null |
l_commitdate | date | | not null |
l_receiptdate | date | | not null |
l_shipinstruct | character(25) | | not null |
l_shipmode | character(10) | | not null |
l_comment | character varying(44) | | not null |
Indexes:
"i_l_commitdate" btree (l_commitdate)
"i_l_orderkey" btree (l_orderkey)
"i_l_orderkey_quantity" btree (l_orderkey, l_quantity)
"i_l_partkey" btree (l_partkey)
"i_l_receiptdate" btree (l_receiptdate)
"i_l_shipdate" btree (l_shipdate)
"i_l_suppkey" btree (l_suppkey)
"i_l_suppkey_partkey" btree (l_partkey, l_suppkey)tpch=# SELECT relname, relpages, reltuples, relallvisible, relkind, relnatts, relhassubclass, reloptions, pg_table_size(oid) FROM pg_class WHERE relname='lineitem';
relname | relpages | reltuples | relallvisible | relkind | relnatts | relhassubclass | reloptions | pg_table_size
----------+----------+--------------+---------------+---------+----------+----------------+------------+---------------
lineitem | 112503 | 6.001167e+06 | 112503 | r | 16 | f | | 921903104
(1 row)Postgresql 12.0 and 12.1 are all manually installed from source.Both are running on Ubuntu 16.04 kernel 4.4.0-142-generic, on Intel(R) Core(TM) i7-6700K.Any help greatly appreciated!Shijia--Shijia Wei
ECE, UT Austin | ACSES | 3rd Year PhD
On Sun, 2019-12-15 at 23:59 -0600, Shijia Wei wrote: > I am running TPC-H on recent postgresql (12.0 and 12.1). > On some of the queries (that may involve parallel scans) I see this interesting behavior: > When these queries are executed back-to-back (sent from psql interactive terminal), the total execution time of them increasemonotonically. > > I simplified query-1 to demonstrate this effect: > ``` example.sql > explain (analyze, buffers) select > max(l_shipdate) as max_data, > count(*) as count_order > from > lineitem > where > l_shipdate <= date '1998-12-01' - interval '20' day; > ``` > > When I execute (from fish) following command: > `for i in (seq 1 20); psql tpch < example.sql | grep Execution; end` > The results I get are as follows: > " > Execution Time: 184.864 ms > Execution Time: 192.758 ms > Execution Time: 197.380 ms > Execution Time: 200.384 ms > Execution Time: 202.950 ms > Execution Time: 205.695 ms > Execution Time: 208.082 ms > Execution Time: 209.108 ms > Execution Time: 212.428 ms > Execution Time: 214.539 ms > Execution Time: 215.799 ms > Execution Time: 219.057 ms > Execution Time: 222.102 ms > Execution Time: 223.779 ms > Execution Time: 227.819 ms > Execution Time: 229.710 ms > Execution Time: 239.439 ms > Execution Time: 237.649 ms > Execution Time: 249.178 ms > Execution Time: 261.268 ms I don't know TPC-H, but the slowdown is not necessarily surprising: If the number of rows that satisfy the condition keeps growing over time, counting those rows will necessarily take longer. Maybe you can provide more details, for example EXPLAIN (ANALYZE, BUFFERS) output for the query when it is fast and when it is slow. Yours, Laurenz Albe -- +43-670-6056265 Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26, A-2700 Wiener Neustadt Web: https://www.cybertec-postgresql.com
Shijia Wei <shijiawei@utexas.edu> writes: > I am running TPC-H on recent postgresql (12.0 and 12.1). > On some of the queries (that may involve parallel scans) I see this > interesting behavior: > When these queries are executed back-to-back (sent from psql interactive > terminal), the total execution time of them increase monotonically. FWIW, I can't reproduce this here. Using a different chosen-at-random query, I tried $ for i in `seq 1 20`; do > psql -c 'explain (analyze) select * from tenk1 a,tenk1 b where a.hundred=b.hundred;' regression | grep Execution > done Execution Time: 468.548 ms Execution Time: 467.905 ms Execution Time: 467.634 ms Execution Time: 465.852 ms Execution Time: 463.328 ms Execution Time: 462.541 ms Execution Time: 463.922 ms Execution Time: 466.171 ms Execution Time: 464.778 ms Execution Time: 464.474 ms Execution Time: 466.087 ms Execution Time: 463.092 ms Execution Time: 463.700 ms Execution Time: 468.924 ms Execution Time: 464.970 ms Execution Time: 464.844 ms Execution Time: 464.665 ms Execution Time: 465.247 ms Execution Time: 465.931 ms Execution Time: 466.722 ms > When there a wait period in-between queries, (e.g. sleep 10) in the above > for loop, this increasing execution time behavior goes a way. A conceivable theory is that the previous backends haven't exited yet and the extra runtime represents overhead due to having lots of active PGPROC entries. This is pretty hard to credit on a multi-core machine, however. I think it'd require assuming that the old backends have some seconds' worth of cleanup work to do before they can exit, which makes little sense. (Unless, perhaps, you have turned on coverage instrumentation, or some other expensive debug monitoring?) I concur with the suggestion to try to pin down where the cycles are going. I'd suggest using perf or some similar tool. regards, tom lane
Hi Laurenz,
Each time the identical query executes, the total number of rows selected stays the same. The table is actually not modified between/during runs.
The query plan stays the same between fast and slow runs. Please find two copied here:
The first one is the output of the first query in the loop 1-to-20; The second one is the output of the last query (20th).
The first one is the output of the first query in the loop 1-to-20; The second one is the output of the last query (20th).
1st Query:
Finalize Aggregate (cost=126840.37..126840.38 rows=1 width=12) (actual time=178.825..178.826 rows=1 loops=1)
Buffers: shared hit=17074 read=16388
-> Gather (cost=126839.73..126840.34 rows=6 width=12) (actual time=178.786..180.064 rows=7 loops=1)
Workers Planned: 6
Workers Launched: 6
Buffers: shared hit=17074 read=16388
-> Partial Aggregate (cost=125839.73..125839.74 rows=1 width=12) (actual time=176.781..176.781 rows=1 loops=7)
Buffers: shared hit=17074 read=16388
-> Parallel Index Only Scan using i_l_shipdate on lineitem (cost=0.43..120842.46 rows=999455 width=4) (actual time=0.045..114.871 rows=856704 loops=7)
Index Cond: (l_shipdate <= '1998-11-11 00:00:00'::timestamp without time zone)
Heap Fetches: 0
Buffers: shared hit=17074 read=16388
Planning Time: 0.458 ms
Execution Time: 180.111 ms
(14 rows)
Buffers: shared hit=17074 read=16388
-> Gather (cost=126839.73..126840.34 rows=6 width=12) (actual time=178.786..180.064 rows=7 loops=1)
Workers Planned: 6
Workers Launched: 6
Buffers: shared hit=17074 read=16388
-> Partial Aggregate (cost=125839.73..125839.74 rows=1 width=12) (actual time=176.781..176.781 rows=1 loops=7)
Buffers: shared hit=17074 read=16388
-> Parallel Index Only Scan using i_l_shipdate on lineitem (cost=0.43..120842.46 rows=999455 width=4) (actual time=0.045..114.871 rows=856704 loops=7)
Index Cond: (l_shipdate <= '1998-11-11 00:00:00'::timestamp without time zone)
Heap Fetches: 0
Buffers: shared hit=17074 read=16388
Planning Time: 0.458 ms
Execution Time: 180.111 ms
(14 rows)
20th Query:
Finalize Aggregate (cost=126840.37..126840.38 rows=1 width=12) (actual time=223.928..223.929 rows=1 loops=1)
Buffers: shared hit=17037 read=16390
-> Gather (cost=126839.73..126840.34 rows=6 width=12) (actual time=223.856..225.474 rows=7 loops=1)
Workers Planned: 6
Workers Launched: 6
Buffers: shared hit=17037 read=16390
-> Partial Aggregate (cost=125839.73..125839.74 rows=1 width=12) (actual time=221.918..221.918 rows=1 loops=7)
Buffers: shared hit=17037 read=16390
-> Parallel Index Only Scan using i_l_shipdate on lineitem (cost=0.43..120842.46 rows=999455 width=4) (actual time=0.062..143.808 rows=856704 loops=7)
Index Cond: (l_shipdate <= '1998-11-11 00:00:00'::timestamp without time zone)
Heap Fetches: 0
Buffers: shared hit=17037 read=16390
Planning Time: 0.552 ms
Execution Time: 225.529 ms
(14 rows)
Buffers: shared hit=17037 read=16390
-> Gather (cost=126839.73..126840.34 rows=6 width=12) (actual time=223.856..225.474 rows=7 loops=1)
Workers Planned: 6
Workers Launched: 6
Buffers: shared hit=17037 read=16390
-> Partial Aggregate (cost=125839.73..125839.74 rows=1 width=12) (actual time=221.918..221.918 rows=1 loops=7)
Buffers: shared hit=17037 read=16390
-> Parallel Index Only Scan using i_l_shipdate on lineitem (cost=0.43..120842.46 rows=999455 width=4) (actual time=0.062..143.808 rows=856704 loops=7)
Index Cond: (l_shipdate <= '1998-11-11 00:00:00'::timestamp without time zone)
Heap Fetches: 0
Buffers: shared hit=17037 read=16390
Planning Time: 0.552 ms
Execution Time: 225.529 ms
(14 rows)
One difference I noticed here is that "actual time" of the Parallel Index Only Scan increased from 114ms to 143ms.
The same holds for other examples that involve Parallel Seq Scan.
Thanks,
Shijia
On Mon, Dec 16, 2019 at 7:25 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
On Sun, 2019-12-15 at 23:59 -0600, Shijia Wei wrote:
> I am running TPC-H on recent postgresql (12.0 and 12.1).
> On some of the queries (that may involve parallel scans) I see this interesting behavior:
> When these queries are executed back-to-back (sent from psql interactive terminal), the total execution time of them increase monotonically.
>
> I simplified query-1 to demonstrate this effect:
> ``` example.sql
> explain (analyze, buffers) select
> max(l_shipdate) as max_data,
> count(*) as count_order
> from
> lineitem
> where
> l_shipdate <= date '1998-12-01' - interval '20' day;
> ```
>
> When I execute (from fish) following command:
> `for i in (seq 1 20); psql tpch < example.sql | grep Execution; end`
> The results I get are as follows:
> "
> Execution Time: 184.864 ms
> Execution Time: 192.758 ms
> Execution Time: 197.380 ms
> Execution Time: 200.384 ms
> Execution Time: 202.950 ms
> Execution Time: 205.695 ms
> Execution Time: 208.082 ms
> Execution Time: 209.108 ms
> Execution Time: 212.428 ms
> Execution Time: 214.539 ms
> Execution Time: 215.799 ms
> Execution Time: 219.057 ms
> Execution Time: 222.102 ms
> Execution Time: 223.779 ms
> Execution Time: 227.819 ms
> Execution Time: 229.710 ms
> Execution Time: 239.439 ms
> Execution Time: 237.649 ms
> Execution Time: 249.178 ms
> Execution Time: 261.268 ms
I don't know TPC-H, but the slowdown is not necessarily surprising:
If the number of rows that satisfy the condition keeps growing over time,
counting those rows will necessarily take longer.
Maybe you can provide more details, for example EXPLAIN (ANALYZE, BUFFERS)
output for the query when it is fast and when it is slow.
Yours,
Laurenz Albe
--
+43-670-6056265
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26, A-2700 Wiener Neustadt
Web: https://www.cybertec-postgresql.com
On Mon, Dec 16, 2019 at 9:28 AM Shijia Wei <shijiawei@utexas.edu> wrote: > 1st Query: > Buffers: shared hit=17074 read=16388 > 20th Query: > Buffers: shared hit=17037 read=16390 Why do the first and the twentieth executions of the query have almost identical "buffers shared/read" numbers? That seems odd. -- Peter Geoghegan
Peter Geoghegan <pg@bowt.ie> writes: > Why do the first and the twentieth executions of the query have almost > identical "buffers shared/read" numbers? That seems odd. It's repeat execution of the same query, so that doesn't seem odd to me. This last set of numbers suggests that there's some issue with the parallel execution infrastructure in particular, though I don't see what it would be. Doesn't execParallel wait for the workers to exit before the leader finishes its query? If so, how is there any persistent state that would interfere with a later query? regards, tom lane
Could it be that your CPUs is warming and throttling? You didn't mention the platform used, so I'm not sure whether it'sa server or a laptop Nicolas Le 16 décembre 2019 21:50:17 GMT+01:00, Tom Lane <tgl@sss.pgh.pa.us> a écrit : >Peter Geoghegan <pg@bowt.ie> writes: >> Why do the first and the twentieth executions of the query have >almost >> identical "buffers shared/read" numbers? That seems odd. > >It's repeat execution of the same query, so that doesn't seem odd to >me. > >This last set of numbers suggests that there's some issue with the >parallel execution infrastructure in particular, though I don't see >what >it would be. Doesn't execParallel wait for the workers to exit before >the leader finishes its query? If so, how is there any persistent >state >that would interfere with a later query? > > regards, tom lane -- Envoyé de mon appareil Android avec Courriel K-9 Mail. Veuillez excuser ma brièveté.
Nicolas Charles <nicolas.charles@normation.com> writes: > Could it be that your CPUs is warming and throttling? You didn't mention the platform used, so I'm not sure whether it'sa server or a laptop Hmm, that's an interesting thought. The OP did say the CPU type, but according to Intel's spec page for it [1] the difference between base and turbo frequency is only 4.0 vs 4.2 GHz, which doesn't seem like enough to explain the results ... unless you suppose it actually throttled to below base freq, which surely shouldn't happen that fast. Might be worth watching the CPU frequency while doing the test though. I was speculating about some OS-level problem myself. Plain old "top" might be enough to show relevant info if it's in that area. regards, tom lane [1] https://ark.intel.com/content/www/us/en/ark/products/88195/intel-core-i7-6700k-processor-8m-cache-up-to-4-20-ghz.html
On Mon, Dec 16, 2019 at 2:48 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
unless you suppose it actually
throttled to below base freq, which surely shouldn't happen that fast.
Might be worth watching the CPU frequency while doing the test though.
Wouldn't expect to see such linear progression if that were the case. Steps, over a relatively long period of time, would be the likely pattern, no? Same goes for some other process fighting for resources. Every iteration requiring what appears to be a fairly constant increase in execution time (2-5ms on every iteration) seems an unlikely pattern unless the two processes are linked in some way, I would think.
Hi, On 2019-12-16 17:48:16 -0500, Tom Lane wrote: > Hmm, that's an interesting thought. The OP did say the CPU type, > but according to Intel's spec page for it [1] the difference between > base and turbo frequency is only 4.0 vs 4.2 GHz, which doesn't seem > like enough to explain the results ... unless you suppose it actually > throttled to below base freq, which surely shouldn't happen that fast. > Might be worth watching the CPU frequency while doing the test though. FWIW, it takes about 3s for my laptop CPU to throttle way below non-turbo when I put it under strenuous load. Obviously that's a laptop, and caused by a firmware bug leading to fans not spinning fast enough automatically. But it'd not take that much for insufficient cooling to cause problems in a desktop either. Been there, done that. But: I don't see that causing a 10x slowdown as reported in the first mail in this thread. I think we need a system-wide perf profile during a few initial "good" runs and then later from a few "really bad" runs. For that you'd have to make sure you compiled postgres with debug symbols (--enable-debug to configure), and then run something like perf record -o fast.data --call-graph dwarf -a sleep 3 while running repeated "fast" queries and then perf record -o slow.data --call-graph dwarf -a sleep 3 and then show us the results of something like perf report -i fast.data -g folded --percent-limit 1 > fast.txt perf report -i slow.data -g folded --percent-limit 1 > slow.txt and also, if your perf is new enough: perf diff fast.data slow.data > diff.txt - Andres
On Mon, 2019-12-16 at 15:50 -0500, Tom Lane wrote: > Peter Geoghegan <pg@bowt.ie> writes: > > Why do the first and the twentieth executions of the query have almost > > identical "buffers shared/read" numbers? That seems odd. > > It's repeat execution of the same query, so that doesn't seem odd to me. Really? Shouldn't the blocks be in shared buffers after a couple of executions? Yours, Laurenz Albe -- Cybertec | https://www.cybertec-postgresql.com
On Tue, Dec 17, 2019 at 8:08 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
On Mon, 2019-12-16 at 15:50 -0500, Tom Lane wrote:
> Peter Geoghegan <pg@bowt.ie> writes:
> > Why do the first and the twentieth executions of the query have almost
> > identical "buffers shared/read" numbers? That seems odd.
>
> It's repeat execution of the same query, so that doesn't seem odd to me.
Really? Shouldn't the blocks be in shared buffers after a couple
of executions?
If it is doing a seq scan (I don't know if it is) they intentionally use a small ring buffer to, so they evict their own recently used blocks, rather than evicting other people's blocks. So these blocks won't build up in shared_buffers very rapidly just on the basis of repeated seq scans.
Cheers,
Jeff
On Tue, 2019-12-17 at 11:11 -0500, Jeff Janes wrote: > On Tue, Dec 17, 2019 at 8:08 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote: > > On Mon, 2019-12-16 at 15:50 -0500, Tom Lane wrote: > > > Peter Geoghegan <pg@bowt.ie> writes: > > > > Why do the first and the twentieth executions of the query have almost > > > > identical "buffers shared/read" numbers? That seems odd. > > > > > > It's repeat execution of the same query, so that doesn't seem odd to me. > > > > Really? Shouldn't the blocks be in shared buffers after a couple > > of executions? > > If it is doing a seq scan (I don't know if it is) they intentionally use a > small ring buffer to, so they evict their own recently used blocks, rather > than evicting other people's blocks. So these blocks won't build up in > shared_buffers very rapidly just on the basis of repeated seq scans. Sure, but according to the execution plans it is doing a Parallel Index Only Scan. Yours, Laurenz Albe -- Cybertec | https://www.cybertec-postgresql.com
Laurenz Albe <laurenz.albe@cybertec.at> writes: > On Tue, 2019-12-17 at 11:11 -0500, Jeff Janes wrote: >> If it is doing a seq scan (I don't know if it is) they intentionally use a >> small ring buffer to, so they evict their own recently used blocks, rather >> than evicting other people's blocks. So these blocks won't build up in >> shared_buffers very rapidly just on the basis of repeated seq scans. > Sure, but according to the execution plans it is doing a Parallel Index Only Scan. Nonetheless, the presented test case consists of repeatedly doing the same query, in a fresh session each time. If there's not other activity then this should reach some sort of steady state. The table is apparently fairly large, so I don't find it surprising that the steady state fails to be 100% cached. regards, tom lane
Hi everyone!
Thanks a ton for this brilliant discussion here!
It turned out that Nicolas was correct! I found that the CPU was broken and not spinning at all.
With consecutive parallel query execution, the CPU temperature hits 100C almost immediately after 1 or 2 iterations.
So the processor starts throttling way below baseline clk frequency to something like 1.2G or even 1G.
So the processor starts throttling way below baseline clk frequency to something like 1.2G or even 1G.
I waited until the new Fan came to report back, and now this weird behavior went away.
Thanks,
Shijia
Laurenz Albe <laurenz.albe@cybertec.at> writes:
> On Tue, 2019-12-17 at 11:11 -0500, Jeff Janes wrote:
>> If it is doing a seq scan (I don't know if it is) they intentionally use a
>> small ring buffer to, so they evict their own recently used blocks, rather
>> than evicting other people's blocks. So these blocks won't build up in
>> shared_buffers very rapidly just on the basis of repeated seq scans.
> Sure, but according to the execution plans it is doing a Parallel Index Only Scan.
Nonetheless, the presented test case consists of repeatedly doing
the same query, in a fresh session each time. If there's not other
activity then this should reach some sort of steady state. The
table is apparently fairly large, so I don't find it surprising
that the steady state fails to be 100% cached.
regards, tom lane