Re: BUG #12202: json operator ->>with offset - Mailing list pgsql-bugs
| From | Matt Freeman |
|---|---|
| Subject | Re: BUG #12202: json operator ->>with offset |
| Date | |
| Msg-id | CAHEc+CDpuQKWcr-EiXGdbzGqDZ2zckHKzS8wYpca4huRvqnk0w@mail.gmail.com |
| In response to | Re: BUG #12202: json operator ->>with offset (Alvaro Herrera <alvherre@2ndquadrant.com>) |
| List | pgsql-bugs |
No problem. Using PostgreSQL 9.4, we have a simple contacts table with (id text not null (as pk), blob json) to experiment with porting a CouchDB CRM database. We will eventually split the data out into more columns and handle it more idiomatically for an RDBMS, but that's beside the point for the time being. There are approximately 100k rows. I am aware that hardcore PostgreSQL performance experts advise against using OFFSET, but I can accept a small performance penalty (happy with anything under 100 ms).

    SELECT id FROM couchcontacts OFFSET 10000 LIMIT 10

As expected, this takes < 10 ms.

    SELECT blob->>'firstName' FROM couchcontacts LIMIT 10

Also takes < 10 ms (presumably 10 JSON decode operations on the blob column here).

    SELECT blob->>'firstName' FROM couchcontacts OFFSET 10000 LIMIT 10

Takes upwards of 10 seconds! Noted inefficiencies of OFFSET aside, why is this presumably causing 10,010 JSON decode operations? As the projection has no side effects, I don't understand why this can't be fast. Is this a limitation of JSON functionality being relatively new to Postgres, and it is thus unable to determine that the ->> operator doesn't yield side effects?

Interestingly, rewriting the query like this brings it back under 10 ms:

    SELECT blob->>'firstName' FROM couchcontacts
    WHERE id IN (SELECT id FROM couchcontacts OFFSET 10000 LIMIT 10)

Is there a way to ensure OFFSET doesn't JSON-decode the offsetted records (i.e., doesn't execute the SELECT projection)?

    "Limit  (cost=1680.31..1681.99 rows=10 width=32) (actual time=12634.674..12634.842 rows=10 loops=1)"
    "  ->  Seq Scan on couchcontacts  (cost=0.00..17186.53 rows=102282 width=32) (actual time=0.088..12629.401 rows=10010 loops=1)"
    "Planning time: 0.194 ms"
    "Execution time: 12634.895 ms"

– Matt Freeman - nonuby (asked 13 hours ago, edited 11 hours ago)

That does look like a rough corner of this new feature. Please report it as a bug to PostgreSQL.
I guess the workaround is manual paging, just like you showed yourself. – Thilo (13 hours ago)

EXPLAIN ANALYZE please? I'm not totally convinced by the explanation of the discrepancy. Did you profile / perf top / etc. to see if your hypothesized explanation fits the observed behaviour? Though on second thought... I think that if you request a result set with an offset, PostgreSQL should evaluate expressions in discarded rows unless it can prove they have no side effects. So maybe it is evaluating the JSON expressions, and arguably it should be, unless it can prove they can't abort the query with an ERROR or change database state. – Craig Ringer (11 hours ago)

Given that ->> is a built-in operator, shouldn't PostgreSQL know it isn't causing side effects? Is there a way to hint that it is side-effect free? Updated with EXPLAIN ANALYZE. – Matt Freeman - nonuby (11 hours ago)

@CraigRinger: I just ran a few tests, and I actually found similar effects. If you select a field, or if you select *, the performance is roughly the same. But when you select foo->>'bar', it becomes slightly slower with empty JSON data ('{}'), slower when the field actually exists ('{"bar":0}'), and increasingly slower as you make the JSON larger. It's basically behaving as if it's unserializing the JSON for each row when the operator is used. – Denis (10 hours ago)

Someone asked (and then removed the answer, possibly @Denis?) whether a non-indexed non-JSON field behaves the same. There is another non-indexed field on the table called version (int not null). SELECT couchcontacts."version" FROM couchcontacts OFFSET 10000 LIMIT 10 returns in 14 ms, vs. blob->>'field' taking 4158 ms to 12,000 ms. – Matt Freeman - nonuby (10 hours ago)

@MattFreeman-nonuby: Yeah, that was me. I added a new answer, after running a few tests.
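The per-row decode hypothesis in the comments above can be illustrated outside of Postgres. The following is a hypothetical Python sketch (it is not PostgreSQL internals; the row data and helper names are invented): it counts how many JSON decodes happen when the projection is evaluated on every visited row versus only on the rows that survive the offset.

```python
import json

decodes = {"count": 0}

def decode(blob):
    # Stand-in for evaluating blob->>'firstName' on one row.
    decodes["count"] += 1
    return json.loads(blob)["firstName"]

# 10,000 rows to skip plus 10 rows to return, mirroring OFFSET 10000 LIMIT 10.
rows = ['{"firstName": "user%d"}' % i for i in range(10010)]

def project_then_offset(rows, offset, limit):
    # Evaluate the projection on every visited row, then discard the offset
    # (what the thread suspects the executor is doing).
    return [decode(r) for r in rows][offset:offset + limit]

def offset_then_project(rows, offset, limit):
    # Discard first, then decode only the surviving rows
    # (the behavior the id IN (...) rewrite achieves).
    return [decode(r) for r in rows[offset:offset + limit]]
```

With this sketch, `project_then_offset(rows, 10000, 10)` performs 10,010 decodes while `offset_then_project(rows, 10000, 10)` performs only 10, even though both return the same ten values — the same ratio the question's timings suggest.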
– Denis (9 hours ago)

1 Answer

I ran a few tests, and I'm seeing similar behaviors. Each of these has immaterial differences in performance:

    select id ...
    select indexed_field ...
    select unindexed_field ...
    select json_field ...
    select * ...

This one, however, does show a difference in performance:

    select json_field->>'key' ...

When json_field is null, the performance impact is negligible. When it's empty, it degrades things very slightly. When it's filled in, it degrades noticeably. And when the field is loaded with larger data, it degrades materially. In other words, Postgres seems to want to unserialize the JSON data for every row it visits. (Which is probably a bug, and one that massively affects RoR developers, seeing how they use JSON.) FWIW, I noted that rearranging the query so it uses a CTE will work around the problem:

    with data as (
      select * from table offset 10000 limit 10
    )
    select json_field->>'key' from data;

(It might get an only-very-slightly better plan than the id IN (...) query that you highlighted.) – Denis (answered 10 hours ago, edited 9 hours ago)

Thanks for the tests. Just to add, we are not using RoR but Clojure; as many developers are now pushing Postgres as a NoSQL replacement (blogs, conferences, microbenchmarks), I'm sure these scenarios will become more common across a multitude of stacks. – Matt Freeman - nonuby (9 hours ago)

Yeah. Just in case, please take a minute to report the issue to the pgsql-performance or pgsql-bugs list. My guess is this is an oversight in the code base: the various JSON functions related to reading are immutable insofar as I can tell, so I fail to see any reason to bother with evaluating them when discarding rows.
– Denis (9 hours ago)

I have reported it to pgsql-bugs and will wait to see what happens (it's a moderated list). I'll leave this marked unanswered for a day or two in case, for some odd reason, it's by design.

On Fri, Dec 12, 2014 at 12:34 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
>
> matt@nonuby.com wrote:
>
> > I am just going to link to the stackoverflow post, as it contains both the
> > question and an answer which helps clarify the problem, I fear pasting as-is
> > here might be confusing.
> >
> > http://stackoverflow.com/questions/27415340/postgresql-offset-behavior-with-json-column?noredirect=1#comment43279635_27415340
> >
> > tl;dr When using offset, postgres seems to want to unserialize the json data
> > for every row it's visiting.
>
> Please don't tl;dr bug reports. Paste the contents here. We're not in
> love with links to external sites.
>
> --
> Álvaro Herrera http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Matt Freeman
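[Editor's sketch] The workaround the thread converges on — paginate on the cheap key first, then apply the JSON projection only to the surviving rows — is not Postgres-specific. Here is a minimal, self-contained demonstration of the same query shape using Python's sqlite3 module and SQLite's json_extract (this assumes an SQLite build with the JSON1 functions; the table and data are invented for illustration and are not the poster's schema):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE couchcontacts (id INTEGER PRIMARY KEY, blob TEXT)")
con.executemany(
    "INSERT INTO couchcontacts (blob) VALUES (?)",
    [('{"firstName": "user%d"}' % i,) for i in range(100)],
)

# Paginate inside the CTE on the bare rows, then project the JSON field
# only over the ten rows that survive the LIMIT/OFFSET.
rows = con.execute("""
    WITH data AS (
        SELECT * FROM couchcontacts ORDER BY id LIMIT 10 OFFSET 50
    )
    SELECT json_extract(blob, '$.firstName') FROM data
""").fetchall()

print(rows[0][0])  # first firstName after skipping 50 rows
```

The same rewrite is what the question's `id IN (SELECT id ... OFFSET ... LIMIT ...)` form achieves in Postgres: the expensive projection is pushed outside the pagination step.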