Re: pgsql: Clean up jsonb code. - Mailing list pgsql-committers

From Peter Geoghegan
Subject Re: pgsql: Clean up jsonb code.
Date
Msg-id CAM3SWZTVWYaMRWpvRfrDXO_U6V6-ciK1eb=TOoX19_o8UaJ-HQ@mail.gmail.com
Whole thread Raw
In response to pgsql: Clean up jsonb code.  (Heikki Linnakangas <heikki.linnakangas@iki.fi>)
List pgsql-committers
Thanks for cleaning this up.

On Wed, May 7, 2014 at 1:18 PM, Heikki Linnakangas
<heikki.linnakangas@iki.fi> wrote:
> The jsonb_exists_any and jsonb_exists_all functions no longer sort the input
> array. That was a premature optimization, the idea being that if there are
> duplicates in the input array, you only need to check them once. Also,
> sorting the array saves some effort in the binary search used to find a key
> within an object. But there were drawbacks too: the sorting and
> deduplicating obviously isn't free, and in the typical case there are no
> duplicates to remove, and the gain in the binary search was minimal. Remove
> all that, which makes the code simpler too.

This is not the reason why the code did that. De-duplication was not
the point at all. findJsonbValueFromSuperHeader()'s lowbound argument
previously served to establish a low bound for searching when
searching for multiple keys (so the second and subsequent
user-supplied key could skip much of the object). In the case of
jsonb_exists_any(), say, if you only have a reasonable expectation
that about 1 key exists, and that happens to be the last key that the
user passed to the text[] argument (to the existence/? operator), then
n - 1 calls to what is now findJsonbValueFromContainer() (which now
does not accept a lowbound) are wasted.  That's elem_count - 1
top-level binary searches of the entire jsonb. Or elem_count such
calls rather than 1 call (plus 1 sort of the supplied array) in the
common case where jsonb_exists_any() will return false.

Granted, that might not be that bad right now, given that it's only
ever (say) elem_count or elem_count - 1 wasted binary searches through
the *top* level, but that might not always be true. And even today,
sorting a presumably much smaller user-passed lookup array once has to
be cheaper than searching through the entire jsonb perhaps elem_count
times per call.

--
Peter Geoghegan


pgsql-committers by date:

Previous
From: Robert Haas
Date:
Subject: pgsql: When a background worker exists with code 0, unregister it.
Next
From: Tom Lane
Date:
Subject: pgsql: Avoid buffer bloat in libpq when server is consistently faster t