Re: BUG #12202: json operator ->>with offset - Mailing list pgsql-bugs

From Alvaro Herrera
Subject Re: BUG #12202: json operator ->>with offset
Date
Msg-id 20141211200657.GO1768@alvh.no-ip.org
Whole thread Raw
In response to BUG #12202: json operator ->>with offset  (matt@nonuby.com)
List pgsql-bugs
Matt Freeman wrote:

> SELECT blob->>'firstName' FROM couchcontacts OFFSET 10000 LIMIT 10
>
> Takes upwards of 10 seconds!! Noted inefficiencies of offset aside why is
> this presumably causing 10,010 json decode ops? As the projection has no
> side-effects I don't understand the reason this can't be fast?

With respect to the tuples emitted, an Offset node runs later than
whatever is emitting the tuples in the first place.  So the expressions
in the target list (blob->>'firstName' in this example) are executed for
all tuples, not just the ones past the 10000 offset.  I guess we could
have an optimization to skip evaluating expressions for tuples that are
not going to be emitted, if none of them are volatile (bonus points if
only the volatile ones are evaluated, I guess).  I don't think we have
that today; it certainly doesn't sound trivial.

This is not a JSON thing: you would probably see the same behavior with
expensive expressions involving any other type.

As a workaround you could grab the rows you want in a subselect in FROM,
and evaluate the expression in the outer query:

SELECT blob->>'firstName' FROM
    (SELECT blob
       FROM couchcontacts
   ORDER BY something
   OFFSET 10000 LIMIT 10) f;

The OFFSET in the subquery prevents the optimizer from flattening this
into a simple scan.

Note that if you don't have an ORDER BY clause, there is no guarantee
about which rows are going to be returned; if you have many of these
queries running concurrently, two executions in the same session could
get completely disparate result sets (i.e. if you first ask for OFFSET
9990 and then OFFSET 10000, you might not get consecutive rows as you
probably expect), for example because of synchronized seqscans.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

pgsql-bugs by date:

Previous
From: Tom Lane
Date:
Subject: Re: BUG #12203: No password dialog when access the postgresql server
Next
From: Andrew Sackville-West
Date:
Subject: Re: regression, deadlock in high frequency single-row UPDATE