A performance issue in ROW_NUMBER() OVER(ORDER BY NULL) [27 times slow than OVER()] V14.5 - Mailing list pgsql-general

From Kirk Wolak
Subject A performance issue in ROW_NUMBER() OVER(ORDER BY NULL) [27 times slow than OVER()] V14.5
Date
Msg-id CACLU5mS7Wc=xANexj7KSPMX2PRnv6FwDDHQn2DW+MY1UDBJm3A@mail.gmail.com
Whole thread Raw
Responses Re: A performance issue in ROW_NUMBER() OVER(ORDER BY NULL) [27 times slow than OVER()] V14.5
List pgsql-general
Okay,
  I have some converted code that uses this syntax.
  For 20 Million rows it was taking 15-20 minutes!  (versus 3 minutes) on live data.
  See here: https://explain.depesz.com/s/VQFJ  [There are 2 optimizations, removing the ORDER BY NULL, and just using a sequence]
  (The above is a live data run)  The IO with the ORDER BY NULL is crazy high.  10x the READ in GB!

  The solution is to remove the ORDER BY NULL.  [since that is not sortable, should it be ignored?]

  This does NOT SHOW UP with 1 million rows.

simple example (slow):
explain (verbose, buffers, analyze) select row_number() over(ORDER BY NULL) as rn, x, random() from generate_series(1,20000000) x order by x desc, random();
                                                                           QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=3365975.67..3415975.67 rows=20000000 width=52) (actual time=816050.580..817984.447 rows=20000000 loops=1)
   Output: (row_number() OVER (?)), x, (random()), NULL::text
   Sort Key: x.x DESC, (random())
   Sort Method: external merge  Disk: 665376kB
   Buffers: temp read=20136158 written=161301
   ->  WindowAgg  (cost=0.00..550000.00 rows=20000000 width=52) (actual time=12585.463..791440.272 rows=20000000 loops=1)
         Output: row_number() OVER (?), x, random(), (NULL::text)
         Buffers: temp read=20052986 written=78126
         ->  Function Scan on pg_catalog.generate_series x  (cost=0.00..200000.00 rows=20000000 width=36) (actual time=3282.882..6343.470 rows=20000000 loops=1)
               Output: NULL::text, x
               Function Call: generate_series(1, 20000000)
               Buffers: temp read=34180 written=34180
 Query Identifier: 7415410443092007025
 Planning Time: 0.036 ms
 JIT:
   Functions: 7
   Options: Inlining true, Optimization true, Expressions true, Deforming true
   Timing: Generation 0.231 ms, Inlining 5.021 ms, Optimization 18.850 ms, Emission 12.934 ms, Total 37.036 ms
 Execution Time: 819003.415 ms
(19 rows)

Time: 819004.041 ms (13:39.004)



the fixed version:
explain (verbose, buffers, analyze) select row_number() over() as rn, x, random() from generate_series(1,20000000) x order by x desc, random();
                                                                           QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=3159725.67..3209725.67 rows=20000000 width=20) (actual time=23361.306..25466.728 rows=20000000 loops=1)
   Output: (row_number() OVER (?)), x, (random())
   Sort Key: x.x DESC, (random())
   Sort Method: external merge  Disk: 665376kB
   Buffers: temp read=117352 written=117355
   ->  WindowAgg  (cost=0.00..500000.00 rows=20000000 width=20) (actual time=3409.882..9789.217 rows=20000000 loops=1)
         Output: row_number() OVER (?), x, random()
         Buffers: temp read=34180 written=34180
         ->  Function Scan on pg_catalog.generate_series x  (cost=0.00..200000.00 rows=20000000 width=4) (actual time=3409.853..6768.737 rows=20000000 loops=1)
               Output: x
               Function Call: generate_series(1, 20000000)
               Buffers: temp read=34180 written=34180
 Query Identifier: -8739706385719272689
 Planning Time: 0.067 ms
 JIT:
   Functions: 4
   Options: Inlining true, Optimization true, Expressions true, Deforming true
   Timing: Generation 0.180 ms, Inlining 32.829 ms, Optimization 5.890 ms, Emission 3.899 ms, Total 42.798 ms
 Execution Time: 26070.514 ms

Time: 26071.112 ms (00:26.071)



pgsql-general by date:

Previous
From: Christophe Pettus
Date:
Subject: Re: Who adds the "start transaction" and "commit" to the intended SQL statement in "autocommit" mode?
Next
From: Tom Lane
Date:
Subject: Re: A performance issue in ROW_NUMBER() OVER(ORDER BY NULL) [27 times slow than OVER()] V14.5