Thread: performance of SELECT * much faster than SELECT with large offset

performance of SELECT * much faster than SELECT with large offset

From
Marc Slemko
Date:
I ran into this oddity lately that goes against everything I thought I
understood and was wondering if anyone had any insight.  Version/env
details at the end.

The root of it is these query times:

marcs=# select * from ccrimes offset 5140000 limit 1;
[...data omitted...]
(1 row)
Time: 650.280 ms
marcs=# select description from ccrimes offset 5140000 limit 1;
             description
-------------------------------------
 FINANCIAL IDENTITY THEFT OVER $ 300
(1 row)

Time: 1298.672 ms

These times are all from data that is cached and are very repeatable.
Yes, I know that offset and limit without an order by isn't useful for
paging through data.

And an explain on them both... everything looks the same other than
the width and actual times:

marcs=# explain (analyze,buffers) select * from ccrimes offset 5140000 limit 1;
                                                         QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=204146.73..204146.73 rows=1 width=202) (actual
time=1067.901..1067.901 rows=1 loops=1)
   Buffers: shared hit=152743
   ->  Seq Scan on ccrimes  (cost=0.00..204146.73 rows=5139873
width=202) (actual time=0.014..810.672 rows=5140001 loops=1)
         Buffers: shared hit=152743
 Total runtime: 1067.951 ms
(5 rows)

Time: 1068.612 ms
marcs=# explain (analyze,buffers) select description from ccrimes
offset 5140000 limit 1;
                                                         QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=204146.73..204146.73 rows=1 width=17) (actual
time=1713.027..1713.027 rows=1 loops=1)
   Buffers: shared hit=152743
   ->  Seq Scan on ccrimes  (cost=0.00..204146.73 rows=5139873
width=17) (actual time=0.013..1457.521 rows=5140001 loops=1)
         Buffers: shared hit=152743
 Total runtime: 1713.053 ms
(5 rows)

Time: 1713.612 ms

When I run the query and capture a profile using "perf" and compare
the two, the thing that stands out is the slot_getsomeattrs call that
dominates the trace in the slow query but not in the faster "SELECT *"
version:

-  39.25%  postgres  postgres           [.] _start
   - _start
      - 99.47% slot_getsomeattrs
           ExecProject
           ExecScan
           ExecProcNode
           ExecLimit
           ExecProcNode
           standard_ExecutorRun
           0x7f4315f7c427
           PortalRun
           PostgresMain
           PostmasterMain
           main
           __libc_start_main
      + 0.53% ExecProject
+  18.82%  postgres  postgres           [.] HeapTupleSatisfiesMVCC
+  12.01%  postgres  postgres           [.] 0xb6353
+   9.47%  postgres  postgres           [.] ExecProject


 The table is defined as:

        Column        |              Type              | Modifiers
----------------------+--------------------------------+-----------
 s_updated_at_0       | timestamp(3) with time zone    |
 s_version_1          | bigint                         |
 s_id_2               | bigint                         |
 s_created_at_3       | timestamp(3) with time zone    |
 id                   | numeric                        |
 case_number          | text                           |
 date                 | timestamp(3) without time zone |
 block                | text                           |
 iucr                 | text                           |
 primary_type         | text                           |
 description          | text                           |
 location_description | text                           |
 arrest               | boolean                        |
 domestic             | boolean                        |
 beat                 | text                           |
 district             | text                           |
 ward                 | numeric                        |
 community_area       | text                           |
 fbi_code             | text                           |
 x_coordinate         | numeric                        |
 y_coordinate         | numeric                        |
 year                 | numeric                        |
 updated_on           | timestamp(3) without time zone |
 latitude             | numeric                        |
 longitude            | numeric                        |
 location_lat         | double precision               |
 location_long        | double precision               |


I've been testing this against Postgres 9.3.5 on Ubuntu 12.04 LTS
running with a 3.2.0 kernel, and get similar results on both raw
hardware and in Azure VMs.  This repros on boxes with no other load.

Any suggestions about what is going on or where to dig further would
be appreciated.  I can make a pgdump of the data I'm using if anyone
is interested.

Thanks.


Re: performance of SELECT * much faster than SELECT with large offset

From
Tom Lane
Date:
Marc Slemko <marcs@znep.com> writes:
> I ran into this oddity lately that goes against everything I thought I
> understood and was wondering if anyone had any insight.

SELECT * avoids a projection step ... see ExecAssignScanProjectionInfo.

            regards, tom lane


Re: performance of SELECT * much faster than SELECT with large offset

From
Marti Raudsepp
Date:
On Fri, Oct 3, 2014 at 5:39 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Marc Slemko <marcs@znep.com> writes:
>> I ran into this oddity lately that goes against everything I thought I
>> understood and was wondering if anyone had any insight.
>
> SELECT * avoids a projection step ... see ExecAssignScanProjectionInfo.

It would be cool if OFFSET could somehow signal the child nodes "don't
bother constructing the actual tuple". Not sure if that could work in
more complex queries. But this is just one of many performance
problems with large OFFSETs.

Of course you can always work around this using a subquery...
select description from (
  select * from ccrimes offset 5140000 limit 1
) subq;

But most of the time it's better to use scalable paging techniques:
http://use-the-index-luke.com/sql/partial-results/fetch-next-page

Regards,
Marti