Thread: JSON path

JSON path

From
Jesper Pedersen
Date:
Hi,

We have a table which has a jsonb column in it. Each row contains a lot
of data in that column, so TOASTed.

We have to extract data from that column at different levels, so an
example query could look like

select
   col1,
   col2,
jsonb_path_query_first(data,'$.lvl1.lvl2.lvl3.lvl4.lvl5.val1."text()"')
as val1,
jsonb_path_query_first(data,'$.lvl1.lvl2.lvl3.lvl4.lvl5.val2."text()"')
as val2,
jsonb_path_query_first(data,'$.lvl1.lvl2.lvl3.lvl4.lvl5.val3."text()"')
as val3
from tbl
where
   id = 1;

I tried to rewrite it to

WITH foo AS (select
   id,
   col1,
   col2,
jsonb_path_query_first(data,'$.lvl1.lvl2.lvl3.lvl4.lvl5') as jsondata,
   from tbl )
select
   col1,
   col2,
   jsondata->val1->'text()' as val1,
   jsondata->val2->'text()' as val2,
   jsondata->val3->'text()' as val3
from foo
where
   id = 1;

However, WITH has the same run-time profile - most of the time is spent
in pglz_decompress. Using the -> notation has the same profile.

The more data I extract from the JSON object the slower the query gets.

Of course, if I change the column to EXTERNAL we see a ~3.5 x speedup in
the queries but disk space requirements goes up by too much.

(We need to use a jsonb column as the data is unstructured, and may
differ in structure between rows. Yes, yes, I know...)

PostgreSQL 12.x on RHEL.

If anybody has some good ideas it would be appreciated.

Thanks in advance !

Best regards,
  Jesper




Re: JSON path

From
Tom Lane
Date:
Jesper Pedersen <jesper.pedersen@redhat.com> writes:
> We have a table which has a jsonb column in it. Each row contains a lot
> of data in that column, so TOASTed.
> We have to extract data from that column at different levels, so an
> example query could look like

> select
>    col1,
>    col2,
> jsonb_path_query_first(data,'$.lvl1.lvl2.lvl3.lvl4.lvl5.val1."text()"')
> as val1,
> jsonb_path_query_first(data,'$.lvl1.lvl2.lvl3.lvl4.lvl5.val2."text()"')
> as val2,
> jsonb_path_query_first(data,'$.lvl1.lvl2.lvl3.lvl4.lvl5.val3."text()"')
> as val3
> from tbl
> where
>    id = 1;

Right ...

> I tried to rewrite it to

> WITH foo AS (select
>    id,
>    col1,
>    col2,
> jsonb_path_query_first(data,'$.lvl1.lvl2.lvl3.lvl4.lvl5') as jsondata,
>    from tbl )
> select
>    col1,
>    col2,
>    jsondata->val1->'text()' as val1,
>    jsondata->val2->'text()' as val2,
>    jsondata->val3->'text()' as val3
> from foo
> where
>    id = 1;

This has got syntax errors, but I get the point.

> However, WITH has the same run-time profile - most of the time is spent
> in pglz_decompress. Using the -> notation has the same profile.

As of v12, that WITH will get flattened, so that you still end up
with three invocations of jsonb_path_query_first, as EXPLAIN VERBOSE
will show you.  You could write "WITH foo AS MATERIALIZED ..." to
prevent that, but then you'll need to stick the WHERE clause inside
the WITH or you'll end up running jsonb_path_query_first for every
row of tbl.

With

explain verbose WITH foo AS materialized (select
   id,
   col1,
   col2,
jsonb_path_query_first(data,'$.lvl1.lvl2.lvl3.lvl4.lvl5') as jsondata
   from tbl where id = 1 )
select
   col1,
   col2,
   jsondata->'val1'->'text()' as val1,
   jsondata->'val2'->'text()' as val2,
   jsondata->'val3'->'text()' as val3
from foo;

I get a plan that does what you're looking for:

        QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 CTE Scan on foo  (cost=24.14..24.35 rows=6 width=104)
   Output: foo.col1, foo.col2, ((foo.jsondata -> 'val1'::text) -> 'text()'::text), ((foo.jsondata -> 'val2'::text) ->
'text()'::text),((foo.jsondata -> 'val3'::text) -> 'text()'::text) 
   CTE foo
     ->  Seq Scan on public.tbl  (cost=0.00..24.14 rows=6 width=44)
           Output: tbl.id, tbl.col1, tbl.col2, jsonb_path_query_first(tbl.data,
'$."lvl1"."lvl2"."lvl3"."lvl4"."lvl5"'::jsonpath,'{}'::jsonb, false) 
           Filter: (tbl.id = 1)
(6 rows)

            regards, tom lane



Re: JSON path

From
Jesper Pedersen
Date:
Hi,

On 11/14/19 1:04 PM, Tom Lane wrote:
> As of v12, that WITH will get flattened, so that you still end up
> with three invocations of jsonb_path_query_first, as EXPLAIN VERBOSE
> will show you.  You could write "WITH foo AS MATERIALIZED ..." to
> prevent that, but then you'll need to stick the WHERE clause inside
> the WITH or you'll end up running jsonb_path_query_first for every
> row of tbl.
>
> With
>
> explain verbose WITH foo AS materialized (select
>     id,
>     col1,
>     col2,
> jsonb_path_query_first(data,'$.lvl1.lvl2.lvl3.lvl4.lvl5') as jsondata
>     from tbl where id = 1 )
> select
>     col1,
>     col2,
>     jsondata->'val1'->'text()' as val1,
>     jsondata->'val2'->'text()' as val2,
>     jsondata->'val3'->'text()' as val3
> from foo;
>

Thanks Tom ! This works :)

I owe you one.

Best regards,
  Jesper