Thread: explain analyze faster than query
Hi,
I'm using postgres 9.6.
I have a table with 100M+ records which consumes about 8GB on disk. In addition, I have an index on the table's id column.
When I run in psql: explain analyze select id from my_table order by id
The query returns output after 130 seconds, which is great. The plan that is chosen is an index-only scan.
However, when I run the query without the explain analyze it takes forever to run (more than two hours).
All the statistics are accurate and work_mem is set to 4MB. Why is there so much difference between running the query with explain analyze and without?
Is there a possibility that it is related to fetching or something like that ?
Thanks.
Cc: pgsql-performance@lists.postgresql.org, pgsql-admin@lists.postgresql.org

Please avoid simultaneously sending the same question to multiple lists. It means that people can't see each other's replies, and everything that implies.

On Sun, Nov 25, 2018 at 03:08:33PM +0200, Mariel Cherkassky wrote:
> However when I run the query without the explain analyze it takes forever
> to run it (more than two hours).
> Is there a possibility that it is related to fetching or something like
> that ?

If it's a remote database, I expect that's why. Maybe you can test by running the query on the DB server. Or by running another variant of the query, such as:

WITH x AS (QUERY GOES HERE) SELECT 1;

which returns only one row, but only after having executed the query behind the CTE, which acts as an optimization fence.

Justin
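[Editor's note: spelled out against the query from the first message, the fence test might look like the sketch below. The table and column names come from the original post; count(*) is used instead of a bare SELECT 1 so the CTE is actually scanned, while still returning a single row to the client.]

WITH x AS (
    SELECT id FROM my_table ORDER BY id
)
SELECT count(*) FROM x;

If this finishes in roughly the same 130 seconds as the explain analyze, the missing two hours are being spent on transferring and formatting 100M rows, not on the server.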
I run it from inside the machine on the local database.
For example :
db=# create table rule_test as select generate_series(1,100000000);
SELECT 100000000
db=# explain analyze select generate_series from rule_test order by generate_series asc;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
Sort (cost=17763711.32..18045791.04 rows=112831890 width=4) (actual time=62677.752..100928.829 rows=100000000 loops=1)
Sort Key: generate_series
Sort Method: external merge Disk: 1367624kB
-> Seq Scan on rule_test (cost=0.00..1570796.90 rows=112831890 width=4) (actual time=0.019..36098.463 rows=100000000 loops=1)
Planning time: 0.072 ms
Execution time: 107025.113 ms
(6 rows)
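[Editor's note: the "Sort Method: external merge  Disk: 1367624kB" line above means the sort spilled to disk because it cannot fit in the 4MB work_mem mentioned earlier. A rough experiment, hedged (the value is illustrative only and must fit in available RAM):]

db=# set work_mem = '1GB';
db=# explain analyze select generate_series from rule_test order by generate_series asc;

A larger work_mem lets the sort use fewer, larger merge runs (or sort entirely in memory if it fits), though it does nothing about the cost of shipping 100M rows to the client.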
db=# create index on rule_test(generate_series);
CREATE INDEX
db=# explain analyze select generate_series from rule_test order by generate_series asc;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Index Only Scan using rule_test_generate_series_idx on rule_test (cost=0.57..2490867.57 rows=100000000 width=4) (actual time=0.103..63122.906 rows=100000000 loops=1)
Heap Fetches: 100000000
Planning time: 6.682 ms
Execution time: 69265.311 ms
(4 rows)
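[Editor's note, not raised in the thread: "Heap Fetches: 100000000" means every row's visibility still had to be checked in the heap, so this index-only scan is not actually skipping the table. Vacuuming the freshly loaded table updates the visibility map:]

db=# vacuum rule_test;
db=# explain analyze select generate_series from rule_test order by generate_series asc;

After the vacuum, Heap Fetches should drop to (near) zero and the index-only scan becomes noticeably cheaper.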
db=# select generate_series from rule_test order by generate_series asc;
stuck for more than an hour
On Sun, Nov 25, 2018 at 15:30 Justin Pryzby <pryzby@telsasoft.com> wrote:
On Sun, Nov 25, 2018 at 03:37:46PM +0200, Mariel Cherkassky wrote:
> I run it from inside the machine on the local database.
> For example :
> db=# create table rule_test as select generate_series(1,100000000);
> SELECT 100000000
> db=# explain analyze select generate_series from rule_test order by
> generate_series asc;

So it's returning 100M rows to the client, which nominally will require moving 400MB. And psql is formatting the output.

I did a test with 10M rows:

[pryzbyj@database ~]$ command time -v psql postgres -c 'SELECT * FROM rule_test' |wc -c&
Command being timed: "psql postgres -c SELECT * FROM rule_test"
User time (seconds): 11.52
Percent of CPU this job got: 78%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:17.25
Maximum resident set size (kbytes): 396244
...
170000053

Explain analyze takes 0.8sec, but returning the query results uses 11sec of CPU time on the *client*, needs 400MB of RAM (ints now being represented as strings instead of machine types), and writes 170MB to stdout. Also, if the output is being piped to less, the data is going to be buffered there, which means your query is perhaps using 4GB of RAM in psql + 4GB in less.

Is the server swapping? Check "si" and "so" in the output of "vmstat -w 1".

Justin
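[Editor's note, a psql-native follow-up not mentioned in the thread: psql's FETCH_COUNT variable makes it retrieve the result via a cursor in batches instead of buffering all rows in memory before printing:]

db=# \set FETCH_COUNT 1000
db=# select generate_series from rule_test order by generate_series asc;

This doesn't make the query itself cheaper, but it caps psql's memory use and starts streaming output immediately, which is usually what you want when selecting 100M rows interactively.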