Re: FETCH FIRST clause PERCENT option - Mailing list pgsql-hackers

From Ryan Lambert
Subject Re: FETCH FIRST clause PERCENT option
Date
Msg-id CAN-V+g-KwCFhBJoFGyoqBGE+A-+4fvgF23SPEiX62UPVH9+grA@mail.gmail.com
Whole thread Raw
In response to Re: FETCH FIRST clause PERCENT option  (Surafel Temesgen <surafel3000@gmail.com>)
Responses Re: FETCH FIRST clause PERCENT option  (Surafel Temesgen <surafel3000@gmail.com>)
List pgsql-hackers
Hi Surafel,

The v5 patch [1] applies cleanly and passes make installcheck-world.

My concern with this is its performance.  As a user I would expect using this feature to enable queries to run faster than they would simply pulling the full table.  I tested on tables ranging from 10k rows up to 10 million with the same basic result that using FETCH FIRST N PERCENT is slower than using the full table.  At best it ran slightly slower than querying the full table, at worst it increased execution times by 1400% when using a large high percentage (95%).

Testing was on a DigitalOcean 2 CPU, 2GB RAM droplet.  All queries were executed multiple times (6+) with EXPLAIN (ANALYZE, COSTS) and /timing enabled, query plans provided are from the faster end of each query.

Create the test table:

DROP TABLE IF EXISTS r10mwide;
CREATE TABLE r10mwide AS
SELECT generate_series(1, 10000000)::INT AS id,
    random() AS v1,
    random() AS v2,
    random() AS v3,
    random() AS v4,
    random() AS v5,
    random() AS v6,
    random() AS v7,
    random() AS v8,
    random() AS v9,
    random() AS v10
    ;
VACUUM ANALYZE;


Start with a baseline query from the full table, the limiting will be done within the CTE as I would typically do in a production query.  The outer query does basic aggregates.  Below I provide each full query for easy copy/paste but the only real change is how the inner results are limited.  This runs in 2.2s off the full table.

EXPLAIN (ANALYZE, COSTS)
WITH t AS (
SELECT id, v1, v2
    FROM r10mwide
) SELECT AVG(v1), MIN(v1), AVG(v1 + v2) FROM t
;
       
                                                                   QUERY PLAN                                                                    
-------------------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=227192.07..227192.08 rows=1 width=24) (actual time=2230.419..2230.419 rows=1 loops=1)
   ->  Gather  (cost=227191.84..227192.05 rows=2 width=72) (actual time=2228.321..2231.225 rows=3 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         ->  Partial Aggregate  (cost=226191.84..226191.85 rows=1 width=72) (actual time=2218.754..2218.754 rows=1 loops=3)
               ->  Parallel Seq Scan on r10mwide  (cost=0.00..184524.92 rows=4166692 width=16) (actual time=0.032..934.652 rows=3333333 loops=3)
 Planning Time: 0.116 ms
 Execution Time: 2231.935 ms
(8 rows)

Time: 2232.459 ms (00:02.232)


It did use parallel, since the FETCH FIRST queries apparently won't I turned that off and ran it again.

SET max_parallel_workers_per_gather = 0;

New query plan for full table, no parallel, just over 4 seconds.

                                                          QUERY PLAN                                                          
-------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=342859.21..342859.22 rows=1 width=24) (actual time=4253.861..4253.862 rows=1 loops=1)
   ->  Seq Scan on r10mwide  (cost=0.00..242858.60 rows=10000060 width=16) (actual time=0.062..1728.346 rows=10000000 loops=1)
 Planning Time: 0.082 ms
 Execution Time: 4253.918 ms
(4 rows)


The following uses the explicit row count to pull 100k rows (1%) and gives a massive improvement in speed, this query ranged from 55- 120ms.  This is the type of speedup I would expect, and it beats the full-table query hands down, regardless of parallel query.

EXPLAIN (ANALYZE, COSTS)
WITH t AS (
SELECT id, v1, v2
    FROM r10mwide
    FETCH FIRST 100000 ROWS ONLY
) SELECT AVG(v1), MIN(v1), AVG(v1 + v2) FROM t
;
       
                                                           QUERY PLAN                                                            
---------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=4428.58..4428.59 rows=1 width=24) (actual time=55.963..55.963 rows=1 loops=1)
   ->  Limit  (cost=0.00..2428.57 rows=100000 width=20) (actual time=0.041..35.725 rows=100000 loops=1)
         ->  Seq Scan on r10mwide  (cost=0.00..242858.60 rows=10000060 width=20) (actual time=0.039..24.796 rows=100000 loops=1)
 Planning Time: 0.134 ms
 Execution Time: 56.032 ms
(5 rows)

Time: 57.262 ms



Using FETCH FIRST 1 PERCENT it takes over 7 seconds, 71% slower than querying the full table.

EXPLAIN (ANALYZE, COSTS)
WITH t AS (
SELECT id, v1, v2
    FROM r10mwide
    FETCH FIRST 1 PERCENT ROWS ONLY
) SELECT AVG(v1), MIN(v1), AVG(v1 + v2) FROM t
;
                                                             QUERY PLAN                                                              
-------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=6857.22..6857.23 rows=1 width=24) (actual time=7112.205..7112.206 rows=1 loops=1)
   ->  Limit  (cost=2428.60..4857.19 rows=100001 width=20) (actual time=0.036..7047.394 rows=100000 loops=1)
         ->  Seq Scan on r10mwide  (cost=0.00..242858.60 rows=10000060 width=20) (actual time=0.033..3072.881 rows=10000000 loops=1)
 Planning Time: 0.132 ms
 Execution Time: 7214.302 ms
(5 rows)

Time: 7215.278 ms (00:07.215)


Cranking the percentage up to 95% took 59-75 seconds.

EXPLAIN (ANALYZE, COSTS)
WITH t AS (
SELECT id, v1, v2
    FROM r10mwide
    FETCH FIRST 95 PERCENT ROWS ONLY
) SELECT AVG(v1), MIN(v1), AVG(v1 + v2) FROM t
;
                                                               QUERY PLAN                                                              
-------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=651432.48..651432.49 rows=1 width=24) (actual time=58981.043..58981.044 rows=1 loops=1)
   ->  Limit  (cost=230715.67..461431.34 rows=9500057 width=20) (actual time=0.017..55799.389 rows=9500000 loops=1)
         ->  Seq Scan on r10mwide  (cost=0.00..242858.60 rows=10000060 width=20) (actual time=0.014..3847.146 rows=10000000 loops=1)
 Planning Time: 0.117 ms
 Execution Time: 59079.680 ms
(5 rows)


Even taking it way down to .001% (100 rows!) didn't run fast by any measure, more than 6 seconds.

EXPLAIN (ANALYZE, COSTS)
WITH t AS (
SELECT id, v1, v2
    FROM r10mwide
    FETCH FIRST .001 PERCENT ROWS ONLY
) SELECT AVG(v1), MIN(v1), AVG(v1 + v2) FROM t
;


                                                             QUERY PLAN                                                              
-------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=6.86..6.87 rows=1 width=24) (actual time=6414.406..6414.406 rows=1 loops=1)
   ->  Limit  (cost=2.43..4.86 rows=100 width=20) (actual time=0.021..6413.981 rows=100 loops=1)
         ->  Seq Scan on r10mwide  (cost=0.00..242858.60 rows=10000060 width=20) (actual time=0.017..3086.504 rows=10000000 loops=1)
 Planning Time: 0.168 ms
 Execution Time: 6495.686 ms



[1] https://www.postgresql.org/message-id/attachment/102333/percent-incremental-v5.patch


Ryan Lambert
RustProof Labs


On Mon, Jul 8, 2019 at 1:09 AM Surafel Temesgen <surafel3000@gmail.com> wrote:
Hi Thomas,
Thank you for informing me

Hi Surafel,

There's a call to adjust_limit_rows_costs() hiding under
contrib/postgres_fdw, so this fails check-world.

Fixed . I also include the review given by Ryan in attached patch

regards
Surafel

pgsql-hackers by date:

Previous
From: Joe Conway
Date:
Subject: Re: [Proposal] Table-level Transparent Data Encryption (TDE) and KeyManagement Service (KMS)
Next
From: Alexander Korotkov
Date:
Subject: Re: [PATCH] Incremental sort (was: PoC: Partial sort)