Thread: problem with partitioned table and indexed json field
Hi,
I have a partitioned table events, with one partition for each month, eg events_2013_03. The partition is done on the field timestamp, and constraints are set, but insertion of data is done in the partition directly (so not with a trigger on the events table)
The field event is of type json, and has a field '_id', which I can access:
=> select event->>'_id' from events limit 1;
?column?
--------------------------
4f9a786f44650105b50aafc9
I created an index on each partition of the table, but not on the events table itself:
create index events_${y}_${m}_event_id_index on events_${y}_${m} ((event->>'_id'));
Querying the max event_id from a partition works fine:
=> select max(event->>'_id') from events_2013_03;
max
--------------------------
5158cdfe4465012cff522b74
However, requesting on the parent table does return the whole json field, and not only the '_id':
=> select max(event->>'_id') from events;
{"_id":"526eb3ad4465013e3e131a43","origin":..... }
An explain returns an error:
=> explain select max(event->>'_id') from events;
ERROR: no tlist entry for key 2
This problem appeared when I created the indexes, and removing the index make the explain work fine, but the plan implies a sequential scan on the tables which is exactly what I wanted to avoid with the indexes.
Does someone have an explanation, and possibly a way to solve this problem?
thanks
Raph
thanks
Raph
On Thu, Oct 31, 2013 at 5:46 AM, Raphael Bauduin <rblists@gmail.com> wrote: > > Hi, > > I have a partitioned table events, with one partition for each month, eg > events_2013_03. The partition is done on the field timestamp, and > constraints are set, but insertion of data is done in the partition directly > (so not with a trigger on the events table) > The field event is of type json, and has a field '_id', which I can access: > > => select event->>'_id' from events limit 1; > ?column? > -------------------------- > 4f9a786f44650105b50aafc9 > > I created an index on each partition of the table, but not on the events > table itself: > create index events_${y}_${m}_event_id_index on events_${y}_${m} > ((event->>'_id')); > > Querying the max event_id from a partition works fine: > => select max(event->>'_id') from events_2013_03; > max > -------------------------- > 5158cdfe4465012cff522b74 > > > However, requesting on the parent table does return the whole json field, > and not only the '_id': > => select max(event->>'_id') from events; > {"_id":"526eb3ad4465013e3e131a43","origin":..... } > > An explain returns an error: > => explain select max(event->>'_id') from events; > ERROR: no tlist entry for key 2 > > This problem appeared when I created the indexes, and removing the index > make the explain work fine, but the plan implies a sequential scan on the > tables which is exactly what I wanted to avoid with the indexes. > > Does someone have an explanation, and possibly a way to solve this problem? wow, that looks like a bug. Can you post the specific postgres version? merlin
It's postgresql 9.3, from the pgdg apt repository:
9.3.0-2.pgdg10.4+1
Raph
9.3.0-2.pgdg10.4+1
Raph
On Thu, Oct 31, 2013 at 1:48 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
wow, that looks like a bug. Can you post the specific postgres version?On Thu, Oct 31, 2013 at 5:46 AM, Raphael Bauduin <rblists@gmail.com> wrote:
>
> Hi,
>
> I have a partitioned table events, with one partition for each month, eg
> events_2013_03. The partition is done on the field timestamp, and
> constraints are set, but insertion of data is done in the partition directly
> (so not with a trigger on the events table)
> The field event is of type json, and has a field '_id', which I can access:
>
> => select event->>'_id' from events limit 1;
> ?column?
> --------------------------
> 4f9a786f44650105b50aafc9
>
> I created an index on each partition of the table, but not on the events
> table itself:
> create index events_${y}_${m}_event_id_index on events_${y}_${m}
> ((event->>'_id'));
>
> Querying the max event_id from a partition works fine:
> => select max(event->>'_id') from events_2013_03;
> max
> --------------------------
> 5158cdfe4465012cff522b74
>
>
> However, requesting on the parent table does return the whole json field,
> and not only the '_id':
> => select max(event->>'_id') from events;
> {"_id":"526eb3ad4465013e3e131a43","origin":..... }
>
> An explain returns an error:
> => explain select max(event->>'_id') from events;
> ERROR: no tlist entry for key 2
>
> This problem appeared when I created the indexes, and removing the index
> make the explain work fine, but the plan implies a sequential scan on the
> tables which is exactly what I wanted to avoid with the indexes.
>
> Does someone have an explanation, and possibly a way to solve this problem?
merlin
--
Web database: http://www.myowndb.com
Free Software Developers Meeting: http://www.fosdem.org
Raphael Bauduin <rblists@gmail.com> writes: > An explain returns an error: > => explain select max(event->>'_id') from events; > ERROR: no tlist entry for key 2 This is certainly a bug. Can we see a self-contained example that triggers that? regards, tom lane
I'll look at providing such an example later this week.
Raph
Raph
On Thu, Oct 31, 2013 at 3:23 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Raphael Bauduin <rblists@gmail.com> writes:This is certainly a bug. Can we see a self-contained example that
> An explain returns an error:
> => explain select max(event->>'_id') from events;
> ERROR: no tlist entry for key 2
triggers that?
regards, tom lane
--
Web database: http://www.myowndb.com
Free Software Developers Meeting: http://www.fosdem.org
Hi,
I have narrowed it a bit. It happens when I create said index on an empty field. Here's the scenario to reproduce it:
Let me know if you need more info
Cheers
Raph
create table events(id SERIAL,
timestamp timestamp,
event json);
create table events_2012_01( CHECK (timestamp>='2012-01-01' and timestamp<'2012-2-01' )) inherits (events) ;
create table events_2012_02( CHECK (timestamp>='2012-02-01' and timestamp<'2012-3-01' )) inherits (events) ;
insert into events_2012_01 (timestamp, event) values ('2012-01-22 08:38:56', '{"_id":"4f93c3a044650105b5074c9a","type":"t1"}');
insert into events_2012_02 (timestamp, event) values ('2012-02-22 08:38:56', '{"_id":"5f93c3a044650105b5074c9a","type":"t2"}');
-- create empty table
create table events_2012_03( CHECK (timestamp>='2012-03-01' and timestamp<'2012-4-01' )) inherits (events) ;
explain select max(event->>'_id') from events where event is not null;
--OK
--create index
create index events_2012_03_event_id_index on events_2012_03 ((event->>'_id'));
explain select max(event->>'_id') from events where event is not null;
--BANG
drop index events_2012_03_event_id_index;
explain select max(event->>'_id') from events where event is not null;
--OK
I have narrowed it a bit. It happens when I create said index on an empty field. Here's the scenario to reproduce it:
Let me know if you need more info
Cheers
Raph
create table events(id SERIAL,
timestamp timestamp,
event json);
create table events_2012_01( CHECK (timestamp>='2012-01-01' and timestamp<'2012-2-01' )) inherits (events) ;
create table events_2012_02( CHECK (timestamp>='2012-02-01' and timestamp<'2012-3-01' )) inherits (events) ;
insert into events_2012_01 (timestamp, event) values ('2012-01-22 08:38:56', '{"_id":"4f93c3a044650105b5074c9a","type":"t1"}');
insert into events_2012_02 (timestamp, event) values ('2012-02-22 08:38:56', '{"_id":"5f93c3a044650105b5074c9a","type":"t2"}');
-- create empty table
create table events_2012_03( CHECK (timestamp>='2012-03-01' and timestamp<'2012-4-01' )) inherits (events) ;
explain select max(event->>'_id') from events where event is not null;
--OK
--create index
create index events_2012_03_event_id_index on events_2012_03 ((event->>'_id'));
explain select max(event->>'_id') from events where event is not null;
--BANG
drop index events_2012_03_event_id_index;
explain select max(event->>'_id') from events where event is not null;
--OK
On Mon, Nov 4, 2013 at 8:39 AM, Raphael Bauduin <rblists@gmail.com> wrote:
I'll look at providing such an example later this week.
RaphOn Thu, Oct 31, 2013 at 3:23 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:Raphael Bauduin <rblists@gmail.com> writes:This is certainly a bug. Can we see a self-contained example that
> An explain returns an error:
> => explain select max(event->>'_id') from events;
> ERROR: no tlist entry for key 2
triggers that?
regards, tom lane
--
Web database: http://www.myowndb.com
Free Software Developers Meeting: http://www.fosdem.org
Correction: It happens when I create said index on an empty *table*.
Raph
Raph
On Thu, Nov 7, 2013 at 11:10 AM, Raphael Bauduin <rblists@gmail.com> wrote:
Hi,
I have narrowed it a bit. It happens when I create said index on an empty field. Here's the scenario to reproduce it:
Let me know if you need more info
Cheers
Raph
create table events(id SERIAL,
timestamp timestamp,
event json);
create table events_2012_01( CHECK (timestamp>='2012-01-01' and timestamp<'2012-2-01' )) inherits (events) ;
create table events_2012_02( CHECK (timestamp>='2012-02-01' and timestamp<'2012-3-01' )) inherits (events) ;
insert into events_2012_01 (timestamp, event) values ('2012-01-22 08:38:56', '{"_id":"4f93c3a044650105b5074c9a","type":"t1"}');
insert into events_2012_02 (timestamp, event) values ('2012-02-22 08:38:56', '{"_id":"5f93c3a044650105b5074c9a","type":"t2"}');
-- create empty table
create table events_2012_03( CHECK (timestamp>='2012-03-01' and timestamp<'2012-4-01' )) inherits (events) ;
explain select max(event->>'_id') from events where event is not null;
--OK
--create index
create index events_2012_03_event_id_index on events_2012_03 ((event->>'_id'));
explain select max(event->>'_id') from events where event is not null;
--BANG
drop index events_2012_03_event_id_index;
explain select max(event->>'_id') from events where event is not null;
--OKOn Mon, Nov 4, 2013 at 8:39 AM, Raphael Bauduin <rblists@gmail.com> wrote:I'll look at providing such an example later this week.
RaphOn Thu, Oct 31, 2013 at 3:23 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:Raphael Bauduin <rblists@gmail.com> writes:This is certainly a bug. Can we see a self-contained example that
> An explain returns an error:
> => explain select max(event->>'_id') from events;
> ERROR: no tlist entry for key 2
triggers that?
regards, tom lane
--
Web database: http://www.myowndb.com
Free Software Developers Meeting: http://www.fosdem.org
--
Web database: http://www.myowndb.com
Free Software Developers Meeting: http://www.fosdem.org
Raphael Bauduin <rblists@gmail.com> writes: > I have narrowed it a bit. It happens when I create said index on an empty > field. Here's the scenario to reproduce it: Thanks, I've reproduced the problem here. The query still seems to run OK, it's just EXPLAIN that's falling over --- do you see the same? regards, tom lane
The query is also problematic here, because it returns the full json, and not only the data I selected in the json.
Below, it should return only '_id', and not the whole json stored in event:
test3=> select max(event->>'_id') from events where event is not null;
max
------------------------------------------------
{"_id":"5f93c3a044650105b5074c9a","type":"t2"}
Thanks
raph
Below, it should return only '_id', and not the whole json stored in event:
test3=> select max(event->>'_id') from events where event is not null;
max
------------------------------------------------
{"_id":"5f93c3a044650105b5074c9a","type":"t2"}
Thanks
raph
On Thu, Nov 7, 2013 at 4:32 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Raphael Bauduin <rblists@gmail.com> writes:Thanks, I've reproduced the problem here. The query still seems to run OK,
> I have narrowed it a bit. It happens when I create said index on an empty
> field. Here's the scenario to reproduce it:
it's just EXPLAIN that's falling over --- do you see the same?
regards, tom lane
--
Web database: http://www.myowndb.com
Free Software Developers Meeting: http://www.fosdem.org
Raphael Bauduin <rblists@gmail.com> writes: > The query is also problematic here, because it returns the full json, and > not only the data I selected in the json. Doh, right, you mentioned that in the original bug report, and now that I'm paying a bit more attention I see it too. I was looking for some sort of error from running the query, not just wrong data. It looks like the problem is we're building a MergeAppend plan and not getting the targetlist for the MergeAppend node right. I hacked EXPLAIN very quickly to not fall over when it fails to find a sort key in the node's targetlist, and here's what I see: regression=# explain verbose select max(event->>'_id') from events where event is not null; QUERY PLAN --------------------------------------------------------------------------------------------------------------------------------------------- Result (cost=58.75..58.76 rows=1 width=0) Output: $0 InitPlan 1 (returns $0) -> Limit (cost=58.70..58.75 rows=1 width=32) Output: events.event -> Merge Append (cost=58.70..200.88 rows=3268 width=32) Sort Key: [no tlist entry for key 2] -> Sort (cost=0.01..0.02 rows=1 width=32) Output: events.event, ((events.event ->> '_id'::text)) Sort Key: ((events.event ->> '_id'::text)) -> Seq Scan on public.events (cost=0.00..0.00 rows=1 width=32) Output: events.event, (events.event ->> '_id'::text) Filter: ((events.event IS NOT NULL) AND ((events.event ->> '_id'::text) IS NOT NULL)) -> Sort (cost=29.20..31.92 rows=1089 width=32) Output: events_2012_01.event, ((events_2012_01.event ->> '_id'::text)) Sort Key: ((events_2012_01.event ->> '_id'::text)) -> Seq Scan on public.events_2012_01 (cost=0.00..23.75 rows=1089 width=32) Output: events_2012_01.event, (events_2012_01.event ->> '_id'::text) Filter: ((events_2012_01.event IS NOT NULL) AND ((events_2012_01.event ->> '_id'::text) IS NOTNULL)) -> Sort (cost=29.20..31.92 rows=1089 width=32) Output: events_2012_02.event, ((events_2012_02.event ->> '_id'::text)) Sort Key: ((events_2012_02.event ->> '_id'::text)) -> Seq Scan on public.events_2012_02 (cost=0.00..23.75 rows=1089 width=32) Output: events_2012_02.event, (events_2012_02.event ->> '_id'::text) Filter: ((events_2012_02.event IS NOT NULL) AND ((events_2012_02.event ->> '_id'::text) IS NOTNULL)) -> Index Scan Backward using events_2012_03_event_id_index on public.events_2012_03 (cost=0.15..63.30rows=1089 width=32) Output: events_2012_03.event, (events_2012_03.event ->> '_id'::text) Index Cond: ((events_2012_03.event ->> '_id'::text) IS NOT NULL) Filter: (events_2012_03.event IS NOT NULL) (29 rows) So everything looks right for the individual table-scan subplans, but something's going badly wrong when making the MergeAppend ... dunno what yet. regards, tom lane
I wrote: > It looks like the problem is we're building a MergeAppend plan and not > getting the targetlist for the MergeAppend node right. Found it --- simple oversight in building optimized min/max plans. If you need a patch now, see http://git.postgresql.org/gitweb/?p=postgresql.git;a=patch;h=5d0731da521f090f80ea39529fe274ac6d6bffa1 regards, tom lane
On Thu, Nov 7, 2013 at 7:24 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Wow, the patch is available thes same day I supplied the steps to reproduce the bug! I don't think it's possible to be faster :-)
Thanks a alot!
Raph
I wrote:Found it --- simple oversight in building optimized min/max plans.
> It looks like the problem is we're building a MergeAppend plan and not
> getting the targetlist for the MergeAppend node right.
If you need a patch now, see
http://git.postgresql.org/gitweb/?p=postgresql.git;a=patch;h=5d0731da521f090f80ea39529fe274ac6d6bffa1
Wow, the patch is available thes same day I supplied the steps to reproduce the bug! I don't think it's possible to be faster :-)
Thanks a alot!
Raph
regards, tom lane
--
Web database: http://www.myowndb.com
Free Software Developers Meeting: http://www.fosdem.org