Thread: problem with partitioned table and indexed json field

problem with partitioned table and indexed json field

From
Raphael Bauduin
Date:

Hi,

I have a partitioned table events, with one partition for each month, eg events_2013_03. The partition is done on the field timestamp, and constraints are set, but insertion of data is done in the partition directly (so not with a trigger on the events table)
The field event is of type json, and has a field '_id', which I can access:

=> select event->>'_id' from events limit 1;
         ?column?        
--------------------------
 4f9a786f44650105b50aafc9

I created an index on each partition of the table, but not on the events table itself:
create index events_${y}_${m}_event_id_index on events_${y}_${m} ((event->>'_id'));

Querying the max event_id from a partition works fine:
=> select max(event->>'_id') from events_2013_03;
           max           
--------------------------
 5158cdfe4465012cff522b74


However, requesting on the parent table does return the whole json field, and not only the '_id':
=> select max(event->>'_id') from events;
{"_id":"526eb3ad4465013e3e131a43","origin":..... }

An explain returns an error:
=> explain select max(event->>'_id') from events;
ERROR:  no tlist entry for key 2

This problem appeared when I created the indexes, and removing the index make the explain work fine, but the plan implies a sequential scan on the tables which is exactly what I wanted to avoid with the indexes.

Does someone have an explanation, and possibly a way to solve this problem?

thanks

Raph


Re: problem with partitioned table and indexed json field

From
Merlin Moncure
Date:
On Thu, Oct 31, 2013 at 5:46 AM, Raphael Bauduin <rblists@gmail.com> wrote:
>
> Hi,
>
> I have a partitioned table events, with one partition for each month, eg
> events_2013_03. The partition is done on the field timestamp, and
> constraints are set, but insertion of data is done in the partition directly
> (so not with a trigger on the events table)
> The field event is of type json, and has a field '_id', which I can access:
>
> => select event->>'_id' from events limit 1;
>          ?column?
> --------------------------
>  4f9a786f44650105b50aafc9
>
> I created an index on each partition of the table, but not on the events
> table itself:
> create index events_${y}_${m}_event_id_index on events_${y}_${m}
> ((event->>'_id'));
>
> Querying the max event_id from a partition works fine:
> => select max(event->>'_id') from events_2013_03;
>            max
> --------------------------
>  5158cdfe4465012cff522b74
>
>
> However, requesting on the parent table does return the whole json field,
> and not only the '_id':
> => select max(event->>'_id') from events;
> {"_id":"526eb3ad4465013e3e131a43","origin":..... }
>
> An explain returns an error:
> => explain select max(event->>'_id') from events;
> ERROR:  no tlist entry for key 2
>
> This problem appeared when I created the indexes, and removing the index
> make the explain work fine, but the plan implies a sequential scan on the
> tables which is exactly what I wanted to avoid with the indexes.
>
> Does someone have an explanation, and possibly a way to solve this problem?

wow, that looks like a bug.  Can you post the specific postgres version?

merlin


Re: problem with partitioned table and indexed json field

From
Raphael Bauduin
Date:
It's postgresql 9.3, from the pgdg apt repository:
9.3.0-2.pgdg10.4+1

Raph


On Thu, Oct 31, 2013 at 1:48 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
On Thu, Oct 31, 2013 at 5:46 AM, Raphael Bauduin <rblists@gmail.com> wrote:
>
> Hi,
>
> I have a partitioned table events, with one partition for each month, eg
> events_2013_03. The partition is done on the field timestamp, and
> constraints are set, but insertion of data is done in the partition directly
> (so not with a trigger on the events table)
> The field event is of type json, and has a field '_id', which I can access:
>
> => select event->>'_id' from events limit 1;
>          ?column?
> --------------------------
>  4f9a786f44650105b50aafc9
>
> I created an index on each partition of the table, but not on the events
> table itself:
> create index events_${y}_${m}_event_id_index on events_${y}_${m}
> ((event->>'_id'));
>
> Querying the max event_id from a partition works fine:
> => select max(event->>'_id') from events_2013_03;
>            max
> --------------------------
>  5158cdfe4465012cff522b74
>
>
> However, requesting on the parent table does return the whole json field,
> and not only the '_id':
> => select max(event->>'_id') from events;
> {"_id":"526eb3ad4465013e3e131a43","origin":..... }
>
> An explain returns an error:
> => explain select max(event->>'_id') from events;
> ERROR:  no tlist entry for key 2
>
> This problem appeared when I created the indexes, and removing the index
> make the explain work fine, but the plan implies a sequential scan on the
> tables which is exactly what I wanted to avoid with the indexes.
>
> Does someone have an explanation, and possibly a way to solve this problem?

wow, that looks like a bug.  Can you post the specific postgres version?

merlin



--
Web database: http://www.myowndb.com
Free Software Developers Meeting: http://www.fosdem.org

Re: problem with partitioned table and indexed json field

From
Tom Lane
Date:
Raphael Bauduin <rblists@gmail.com> writes:
> An explain returns an error:
> => explain select max(event->>'_id') from events;
> ERROR:  no tlist entry for key 2

This is certainly a bug.  Can we see a self-contained example that
triggers that?

            regards, tom lane


Re: problem with partitioned table and indexed json field

From
Raphael Bauduin
Date:
I'll look at providing such an example later this week.

Raph


On Thu, Oct 31, 2013 at 3:23 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Raphael Bauduin <rblists@gmail.com> writes:
> An explain returns an error:
> => explain select max(event->>'_id') from events;
> ERROR:  no tlist entry for key 2

This is certainly a bug.  Can we see a self-contained example that
triggers that?

                        regards, tom lane



--
Web database: http://www.myowndb.com
Free Software Developers Meeting: http://www.fosdem.org

Re: problem with partitioned table and indexed json field

From
Raphael Bauduin
Date:
Hi,

I have narrowed it a bit. It happens when I create said index on an empty field. Here's the scenario to reproduce it:

Let me know if you need more info

Cheers

Raph


create table events(id SERIAL,
                    timestamp timestamp,
                    event json);


create table events_2012_01( CHECK (timestamp>='2012-01-01' and timestamp<'2012-2-01' )) inherits (events) ;
create table events_2012_02( CHECK (timestamp>='2012-02-01' and timestamp<'2012-3-01' )) inherits (events) ;
insert into events_2012_01 (timestamp, event) values ('2012-01-22 08:38:56', '{"_id":"4f93c3a044650105b5074c9a","type":"t1"}');
insert into events_2012_02 (timestamp, event) values ('2012-02-22 08:38:56', '{"_id":"5f93c3a044650105b5074c9a","type":"t2"}');


-- create empty table
create table events_2012_03( CHECK (timestamp>='2012-03-01' and timestamp<'2012-4-01' )) inherits (events) ;
explain select max(event->>'_id') from events where event is not null;
--OK

--create index
create index events_2012_03_event_id_index on events_2012_03 ((event->>'_id'));
explain select max(event->>'_id') from events where event is not null;
--BANG

drop index events_2012_03_event_id_index;
explain select max(event->>'_id') from events where event is not null;
--OK



On Mon, Nov 4, 2013 at 8:39 AM, Raphael Bauduin <rblists@gmail.com> wrote:
I'll look at providing such an example later this week.

Raph


On Thu, Oct 31, 2013 at 3:23 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Raphael Bauduin <rblists@gmail.com> writes:
> An explain returns an error:
> => explain select max(event->>'_id') from events;
> ERROR:  no tlist entry for key 2

This is certainly a bug.  Can we see a self-contained example that
triggers that?

                        regards, tom lane



--
Web database: http://www.myowndb.com
Free Software Developers Meeting: http://www.fosdem.org



--
Web database: http://www.myowndb.com
Free Software Developers Meeting: http://www.fosdem.org

Re: problem with partitioned table and indexed json field

From
Raphael Bauduin
Date:
Correction: It happens when I create said index on an empty *table*.

Raph


On Thu, Nov 7, 2013 at 11:10 AM, Raphael Bauduin <rblists@gmail.com> wrote:
Hi,

I have narrowed it a bit. It happens when I create said index on an empty field. Here's the scenario to reproduce it:

Let me know if you need more info

Cheers

Raph


create table events(id SERIAL,
                    timestamp timestamp,
                    event json);


create table events_2012_01( CHECK (timestamp>='2012-01-01' and timestamp<'2012-2-01' )) inherits (events) ;
create table events_2012_02( CHECK (timestamp>='2012-02-01' and timestamp<'2012-3-01' )) inherits (events) ;
insert into events_2012_01 (timestamp, event) values ('2012-01-22 08:38:56', '{"_id":"4f93c3a044650105b5074c9a","type":"t1"}');
insert into events_2012_02 (timestamp, event) values ('2012-02-22 08:38:56', '{"_id":"5f93c3a044650105b5074c9a","type":"t2"}');


-- create empty table
create table events_2012_03( CHECK (timestamp>='2012-03-01' and timestamp<'2012-4-01' )) inherits (events) ;
explain select max(event->>'_id') from events where event is not null;
--OK

--create index
create index events_2012_03_event_id_index on events_2012_03 ((event->>'_id'));
explain select max(event->>'_id') from events where event is not null;
--BANG

drop index events_2012_03_event_id_index;
explain select max(event->>'_id') from events where event is not null;
--OK



On Mon, Nov 4, 2013 at 8:39 AM, Raphael Bauduin <rblists@gmail.com> wrote:
I'll look at providing such an example later this week.

Raph


On Thu, Oct 31, 2013 at 3:23 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Raphael Bauduin <rblists@gmail.com> writes:
> An explain returns an error:
> => explain select max(event->>'_id') from events;
> ERROR:  no tlist entry for key 2

This is certainly a bug.  Can we see a self-contained example that
triggers that?

                        regards, tom lane



--
Web database: http://www.myowndb.com
Free Software Developers Meeting: http://www.fosdem.org



--
Web database: http://www.myowndb.com
Free Software Developers Meeting: http://www.fosdem.org



--
Web database: http://www.myowndb.com
Free Software Developers Meeting: http://www.fosdem.org

Re: problem with partitioned table and indexed json field

From
Tom Lane
Date:
Raphael Bauduin <rblists@gmail.com> writes:
> I have narrowed it a bit. It happens when I create said index on an empty
> field. Here's the scenario to reproduce it:

Thanks, I've reproduced the problem here.  The query still seems to run OK,
it's just EXPLAIN that's falling over --- do you see the same?

            regards, tom lane


Re: problem with partitioned table and indexed json field

From
Raphael Bauduin
Date:
The query is also problematic here, because it returns the full json, and not only the data I selected in the json.
Below, it should return only '_id', and not the whole json stored in event:

test3=> select max(event->>'_id') from events where event is not null;
                      max                      
------------------------------------------------
 {"_id":"5f93c3a044650105b5074c9a","type":"t2"}

Thanks

raph



On Thu, Nov 7, 2013 at 4:32 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Raphael Bauduin <rblists@gmail.com> writes:
> I have narrowed it a bit. It happens when I create said index on an empty
> field. Here's the scenario to reproduce it:

Thanks, I've reproduced the problem here.  The query still seems to run OK,
it's just EXPLAIN that's falling over --- do you see the same?

                        regards, tom lane



--
Web database: http://www.myowndb.com
Free Software Developers Meeting: http://www.fosdem.org

Re: problem with partitioned table and indexed json field

From
Tom Lane
Date:
Raphael Bauduin <rblists@gmail.com> writes:
> The query is also problematic here, because it returns the full json, and
> not only the data I selected in the json.

Doh, right, you mentioned that in the original bug report, and now that
I'm paying a bit more attention I see it too.  I was looking for
some sort of error from running the query, not just wrong data.

It looks like the problem is we're building a MergeAppend plan and not
getting the targetlist for the MergeAppend node right.  I hacked EXPLAIN
very quickly to not fall over when it fails to find a sort key in the
node's targetlist, and here's what I see:

regression=# explain verbose select max(event->>'_id') from events where event is not null;
                                                                 QUERY PLAN
                     

---------------------------------------------------------------------------------------------------------------------------------------------
 Result  (cost=58.75..58.76 rows=1 width=0)
   Output: $0
   InitPlan 1 (returns $0)
     ->  Limit  (cost=58.70..58.75 rows=1 width=32)
           Output: events.event
           ->  Merge Append  (cost=58.70..200.88 rows=3268 width=32)
                 Sort Key: [no tlist entry for key 2]
                 ->  Sort  (cost=0.01..0.02 rows=1 width=32)
                       Output: events.event, ((events.event ->> '_id'::text))
                       Sort Key: ((events.event ->> '_id'::text))
                       ->  Seq Scan on public.events  (cost=0.00..0.00 rows=1 width=32)
                             Output: events.event, (events.event ->> '_id'::text)
                             Filter: ((events.event IS NOT NULL) AND ((events.event ->> '_id'::text) IS NOT NULL))
                 ->  Sort  (cost=29.20..31.92 rows=1089 width=32)
                       Output: events_2012_01.event, ((events_2012_01.event ->> '_id'::text))
                       Sort Key: ((events_2012_01.event ->> '_id'::text))
                       ->  Seq Scan on public.events_2012_01  (cost=0.00..23.75 rows=1089 width=32)
                             Output: events_2012_01.event, (events_2012_01.event ->> '_id'::text)
                             Filter: ((events_2012_01.event IS NOT NULL) AND ((events_2012_01.event ->> '_id'::text) IS
NOTNULL)) 
                 ->  Sort  (cost=29.20..31.92 rows=1089 width=32)
                       Output: events_2012_02.event, ((events_2012_02.event ->> '_id'::text))
                       Sort Key: ((events_2012_02.event ->> '_id'::text))
                       ->  Seq Scan on public.events_2012_02  (cost=0.00..23.75 rows=1089 width=32)
                             Output: events_2012_02.event, (events_2012_02.event ->> '_id'::text)
                             Filter: ((events_2012_02.event IS NOT NULL) AND ((events_2012_02.event ->> '_id'::text) IS
NOTNULL)) 
                 ->  Index Scan Backward using events_2012_03_event_id_index on public.events_2012_03
(cost=0.15..63.30rows=1089 width=32) 
                       Output: events_2012_03.event, (events_2012_03.event ->> '_id'::text)
                       Index Cond: ((events_2012_03.event ->> '_id'::text) IS NOT NULL)
                       Filter: (events_2012_03.event IS NOT NULL)
(29 rows)

So everything looks right for the individual table-scan subplans, but
something's going badly wrong when making the MergeAppend ...
dunno what yet.

            regards, tom lane


Re: problem with partitioned table and indexed json field

From
Tom Lane
Date:
I wrote:
> It looks like the problem is we're building a MergeAppend plan and not
> getting the targetlist for the MergeAppend node right.

Found it --- simple oversight in building optimized min/max plans.
If you need a patch now, see
http://git.postgresql.org/gitweb/?p=postgresql.git;a=patch;h=5d0731da521f090f80ea39529fe274ac6d6bffa1

            regards, tom lane


Re: problem with partitioned table and indexed json field

From
Raphael Bauduin
Date:



On Thu, Nov 7, 2013 at 7:24 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
I wrote:
> It looks like the problem is we're building a MergeAppend plan and not
> getting the targetlist for the MergeAppend node right.

Found it --- simple oversight in building optimized min/max plans.
If you need a patch now, see
http://git.postgresql.org/gitweb/?p=postgresql.git;a=patch;h=5d0731da521f090f80ea39529fe274ac6d6bffa1


Wow, the patch is available thes same day I supplied the steps to reproduce the bug! I don't think it's possible to be faster :-)
Thanks a alot!

Raph
 

                        regards, tom lane



--
Web database: http://www.myowndb.com
Free Software Developers Meeting: http://www.fosdem.org