Thread: explain analyze faster then query

explain analyze faster then query

From

Mariel Cherkassky

Date:

25 November 2018, 16:08:33

Hi,

I'm using postgres 9.6.

I have a table with 100M+ records which consume on disk about 8GB. In addition I have an index on the id column of the table.

When I run in psql : explain analyze select id from my_table order by id

The query returns output after 130 seconds which is great. The plan that is chosen is Index only scan.

However when I run the query without the explain analyze it takes forever to run it(More then two hours).

All the statistics are accurate and work_mem set to 4MB. What there is so much difference between running the query with explain analyze and without ?

Is there a possibility that it is related to fetching or something like that ?

Thanks.

Re: explain analyze faster then query

From

Justin Pryzby

Date:

25 November 2018, 16:30:02

Cc: pgsql-performance@lists.postgresql.org,
    pgsql-admin@lists.postgresql.org

Please avoid simultaneously sending the same question to multiple lists.

It means that people can't see each others replies and everything that implies.

On Sun, Nov 25, 2018 at 03:08:33PM +0200, Mariel Cherkassky wrote:
> However when I run the query without the explain analyze it takes forever
> to run it(More then two hours).
> Is there a possibility that it is related to fetching or something like
> that ?

If it's a remote database, I expect that's why.
Maybe you can test by running the query on the DB server.
Or by running another variant of the query, such as:

WITH x AS (QUERY GOES HERE) SELECT 1;

which returns only one row but after having executed the query behind CTE, as
optimization fence.

Justin

Re: explain analyze faster then query

From

Mariel Cherkassky

Date:

25 November 2018, 16:37:46

I run it from inside the machine on the local database.

For example :

db=# create table rule_test as select generate_series(1,100000000);

SELECT 100000000

db=# explain analyze select generate_series from rule_test order by generate_series asc;

QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------

Sort (cost=17763711.32..18045791.04 rows=112831890 width=4) (actual time=62677.752..100928.829 rows=100000000 loops=1)

Sort Key: generate_series

Sort Method: external merge Disk: 1367624kB

-> Seq Scan on rule_test (cost=0.00..1570796.90 rows=112831890 width=4) (actual time=0.019..36098.463 rows=100000000 loops=1)

Planning time: 0.072 ms

Execution time: 107025.113 ms

(6 rows)

db=# create index on rule_test(generate_series);

CREATE INDEX

db=# select generate_series from rule_test order by generate_series asc;

db=# explain analyze select generate_series from rule_test order by generate_series asc;

QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Index Only Scan using rule_test_generate_series_idx on rule_test (cost=0.57..2490867.57 rows=100000000 width=4) (actual time=0.103..63122.906 rows=100000000 loops=1)

Heap Fetches: 100000000

Planning time: 6.682 ms

Execution time: 69265.311 ms

(4 rows)

db=# select generate_series from rule_test order by generate_series asc;

stuck for more then a hour

‫בתאריך יום א׳, 25 בנוב׳ 2018 ב-15:30 מאת ‪Justin Pryzby‬‏ <‪pryzby@telsasoft.com‬‏>:‬

Cc: pgsql-performance@lists.postgresql.org,
pgsql-admin@lists.postgresql.org

Please avoid simultaneously sending the same question to multiple lists.

It means that people can't see each others replies and everything that implies.

On Sun, Nov 25, 2018 at 03:08:33PM +0200, Mariel Cherkassky wrote:
> However when I run the query without the explain analyze it takes forever
> to run it(More then two hours).
> Is there a possibility that it is related to fetching or something like
> that ?

If it's a remote database, I expect that's why.
Maybe you can test by running the query on the DB server.
Or by running another variant of the query, such as:

WITH x AS (QUERY GOES HERE) SELECT 1;

which returns only one row but after having executed the query behind CTE, as
optimization fence.

Justin

Re: explain analyze faster then query

From

Justin Pryzby

Date:

25 November 2018, 17:12:25

On Sun, Nov 25, 2018 at 03:37:46PM +0200, Mariel Cherkassky wrote:
> I run it from inside the machine on the local database.
> For example :
> 
> db=# create table rule_test as select generate_series(1,100000000);
> SELECT 100000000

> db=# explain analyze select generate_series from rule_test order by
> generate_series asc;

So it's returning 100M rows to the client, which nominally will require moving
400MB.

And pgsql is formatting the output.

I did a test with 10M rows:

[pryzbyj@database ~]$ command time -v psql postgres -c 'SELECT * FROM rule_test' |wc -c&
Command being timed: "psql postgres -c SELECT * FROM rule_test"
        User time (seconds): 11.52
        Percent of CPU this job got: 78%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:17.25
        Maximum resident set size (kbytes): 396244
...
170000053

Explain analyze takes 0.8sec, but returning query results uses 11sec CPU time
on the *client*, needed 400MB RAM (ints now being represented as strings
instead of machine types), and wrote 170MB to stdout, Also, if the output is
being piped to less, the data is going to be buffered there, which means your
query is perhaps using 4GB RAM in psql + 4GB in less..

Is the server swapping ?   check "si" and "so" in output of "vmstat -w 1"

Justin