Thread: json datatype and table bloat?

json datatype and table bloat?

From
Gregory Haase
Date:
I have a table that is triggering my nagios database bloat alert regularly. Usually, I have to give it the vacuum full more than once to get it under the threshold. Today I tried repeatedly and cannot get the alert to resolve.

I had a discussion with one of the primary developers about how the table is utilized, and it turns out they are basically only ever inserting into it. This sort of flies in the face of conventional wisdom about bloat being caused by frequent updates and deletes.

We were looking at it, and one of the things that struck me is that this table has a column with a json datatype. I looked through information_schema.columns and there is only one other table with a json datatype, and I recall having bloat issues with this table in the past as well.

I'm wondering if the json datatype is just naturally more bloated than other types, or if the query in the check_postgres.pl nagios script is not accurate, or if maybe my thresholds are simply too low?

The table design itself is pretty simple:
 id         | integer                     | not null default nextval('table_schema.table_name_id_seq'::regclass)
 type       | character varying(255)      | 
 criteria   | json                        | 
 created_at | timestamp without time zone | not null
 updated_at | timestamp without time zone | not null
Indexes:
    "table_name_pkey" PRIMARY KEY, btree (id)


The nagios output looks like this (sanitized):
POSTGRES_BLOAT WARNING: DB "db_name" (host:host.domain.com) (db db_name) table table_schema.table_name rows:29305 pages:1733 shouldbe:330 (5.3X) wasted size:11493376 (10 MB) 

Thoughts?

Greg Haase

Re: json datatype and table bloat?

From
Chris Travers
Date:



On Mon, Oct 28, 2013 at 4:17 PM, Gregory Haase <haaseg@onefreevoice.com> wrote:
I have a table that is triggering my nagios database bloat alert regularly. Usually, I have to give it the vacuum full more than once to get it under the threshold. Today I tried repeatedly and cannot get the alert to resolve.

I had a discussion with one of the primary developers about how the table is utilized, and it turns out they are basically only ever inserting into it. This sort of flies in the face of conventional wisdom about bloat being caused by frequent updates and deletes.

As I understand it, vacuuming only removes the tuples removed by delete and update operations. 

The question is how this is being inserted and whether there is anything that ever updates the rows in any way.  Maybe this is a manual process?  But for 5x bloat, you have to have it be a repeated process.

Maybe there was bad data that had to be corrected? 

We were looking at it, and one of the things that struck me is that this table has a column with a json datatype. I looked through information_schema.columns and there is only one other table with a json datatype, and I recall having bloat issues with this table in the past as well.

I'm wondering if the json datatype is just naturally more bloated than other types, or if the query in the check_postgres.pl nagios script is not accurate, or if maybe my thresholds are simply too low?

Next time this happens it would be worth seeing what VACUUM FULL VERBOSE output is for that table.
 
--
Best Wishes,
Chris Travers

Efficito:  Hosted Accounting and ERP.  Robust and Flexible.  No vendor lock-in.

Re: json datatype and table bloat?

From
Merlin Moncure
Date:
On Tue, Oct 29, 2013 at 5:38 AM, Chris Travers <chris.travers@gmail.com> wrote:
>
>
>
> On Mon, Oct 28, 2013 at 4:17 PM, Gregory Haase <haaseg@onefreevoice.com>
> wrote:
>>
>> I have a table that is triggering my nagios database bloat alert
>> regularly. Usually, I have to give it the vacuum full more than once to get
>> it under the threshold. Today I tried repeatedly and cannot get the alert to
>> resolve.
>>
>> I had a discussion with one of the primary developers about how the table
>> is utilized, and it turns out they are basically only ever inserting into
>> it. This sort of flies in the face of conventional wisdom about bloat being
>> caused by frequent updates and deletes.
>
>
> As I understand it, vacuuming only removes the tuples removed by delete and
> update operations.

well, or by rolled back transactions.  we have to wonder if the OP has
a lot of queries trying to insert and failing.  maybe check the log?
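
For reference, a quick way to sanity-check the rollback theory without trawling
the logs (just a sketch; pg_stat_database counts rollbacks per database, not
per table):

select datname, xact_commit, xact_rollback
  from pg_stat_database
 where datname = current_database();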

merlin


Re: json datatype and table bloat?

From
Gregory Haase
Date:
Following up.

I don't see any rolled back transactions in the logs. 

The part that really troubles me is that vacuum full doesn't actually fix the problem. If there were bad data that had been corrected via mass updates, I'd expect the bloat issue to be fixed by a vacuum full. 

When I run the vacuum back to back, this is what I get:

db_name=# VACUUM FULL VERBOSE table_schema.table_name;
INFO:  vacuuming "table_schema.table_name"
INFO:  "table_name": found 2 removable, 29663 nonremovable row versions in 1754 pages
DETAIL:  0 dead row versions cannot be removed yet.
CPU 0.07s/0.10u sec elapsed 0.30 sec.
VACUUM
db_name=# VACUUM FULL VERBOSE table_schema.table_name;
INFO:  vacuuming "table_schema.table_name"
INFO:  "table_name": found 0 removable, 29663 nonremovable row versions in 1754 pages
DETAIL:  0 dead row versions cannot be removed yet.
CPU 0.09s/0.09u sec elapsed 0.32 sec.
VACUUM

I think the question to address may be: "Why does the check_postgres query think there should only be 334 pages instead of 1754?"

The tbloat and wastedbytes calculations provided in the query from this page: http://wiki.postgresql.org/wiki/Show_database_bloat seem to correlate with the wasted bytes reported by nagios, though.
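
For anyone following along, the "should be" estimate in those checks boils down
to something like this (a simplified sketch of the idea, not the exact
check_postgres query; the 24 bytes of per-tuple overhead is an approximation):

SELECT c.relpages,
       ceil(c.reltuples * (24
              + (SELECT sum((1 - s.null_frac) * s.avg_width)
                   FROM pg_stats s
                  WHERE s.schemaname = 'table_schema'
                    AND s.tablename  = 'table_name'))
            / current_setting('block_size')::int) AS estimated_pages
  FROM pg_class c
 WHERE c.oid = 'table_schema.table_name'::regclass;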

Greg Haase




On Tue, Oct 29, 2013 at 7:06 AM, Merlin Moncure <mmoncure@gmail.com> wrote:
On Tue, Oct 29, 2013 at 5:38 AM, Chris Travers <chris.travers@gmail.com> wrote:
>
>
>
> On Mon, Oct 28, 2013 at 4:17 PM, Gregory Haase <haaseg@onefreevoice.com>
> wrote:
>>
>> I have a table that is triggering my nagios database bloat alert
>> regularly. Usually, I have to give it the vacuum full more than once to get
>> it under the threshold. Today I tried repeatedly and cannot get the alert to
>> resolve.
>>
>> I had a discussion with one of the primary developers about how the table
>> is utilized, and it turns out they are basically only ever inserting into
>> it. This sort of flies in the face of conventional wisdom about bloat being
>> caused by frequent updates and deletes.
>
>
> As I understand it, vacuuming only removes the tuples removed by delete and
> update operations.

well, or by rolled back transactions.  we have to wonder if the OP has
a lot of queries trying to insert and failing.  maybe check the log?

merlin

Re: json datatype and table bloat?

From
John R Pierce
Date:
On 10/29/2013 12:41 PM, Gregory Haase wrote:
db_name=# VACUUM FULL VERBOSE table_schema.table_name;
INFO:  vacuuming "table_schema.table_name"
INFO:  "table_name": found 2 removable, 29663 nonremovable row versions in 1754 pages
DETAIL:  0 dead row versions cannot be removed yet.
CPU 0.07s/0.10u sec elapsed 0.30 sec.

is there an old transaction pending?   that 'masks' vacuum from touching any tuples newer than the start of that transaction.
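
(something like this would show it; a sketch that assumes 9.2 or later, where
pg_stat_activity has "state" and "query" columns:)

select pid, state, xact_start, query
  from pg_stat_activity
 where xact_start is not null
 order by xact_start
 limit 5;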



-- 
john r pierce                                      37N 122W
somewhere on the middle of the left coast

Re: json datatype and table bloat?

From
Gregory Haase
Date:
So, between yesterday and today we actually failed over to our hot-standby instance and the issue hasn't changed. I don't think you can have a pending transaction across streaming replication.


On Tue, Oct 29, 2013 at 12:49 PM, John R Pierce <pierce@hogranch.com> wrote:
On 10/29/2013 12:41 PM, Gregory Haase wrote:
db_name=# VACUUM FULL VERBOSE table_schema.table_name;
INFO:  vacuuming "table_schema.table_name"
INFO:  "table_name": found 2 removable, 29663 nonremovable row versions in 1754 pages
DETAIL:  0 dead row versions cannot be removed yet.
CPU 0.07s/0.10u sec elapsed 0.30 sec.

is there an old transaction pending?   that 'masks' vacuum from touching any tuples newer than the start of that transaction.



-- 
john r pierce                                      37N 122W
somewhere on the middle of the left coast

Re: json datatype and table bloat?

From
Gregory Haase
Date:
One more thing I just tried:

create table table_schema.table_name_new (like table_schema.table_name);
insert into table_schema.table_name_new select * from table_schema.table_name;

The new table shows the same amount of wasted bytes and pages as the old.

So, based on that, I think I'm going to throw out any notion of updates or deletes as the cause of bloat on this particular table.
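
If the pgstattuple extension is available, it gives an exact live/dead/free
space breakdown as a cross-check against the estimate. Just a sketch, assuming
the contrib module can be installed:

CREATE EXTENSION IF NOT EXISTS pgstattuple;
SELECT * FROM pgstattuple('table_schema.table_name');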

-G


On Tue, Oct 29, 2013 at 12:53 PM, Gregory Haase <haaseg@onefreevoice.com> wrote:
So, between yesterday and today we actually failed over to our hot-standby instance and the issue hasn't changed. I don't think you can have a pending transaction across streaming replication.


On Tue, Oct 29, 2013 at 12:49 PM, John R Pierce <pierce@hogranch.com> wrote:
On 10/29/2013 12:41 PM, Gregory Haase wrote:
db_name=# VACUUM FULL VERBOSE table_schema.table_name;
INFO:  vacuuming "table_schema.table_name"
INFO:  "table_name": found 2 removable, 29663 nonremovable row versions in 1754 pages
DETAIL:  0 dead row versions cannot be removed yet.
CPU 0.07s/0.10u sec elapsed 0.30 sec.

is there an old transaction pending?   that 'masks' vacuum from touching any tuples newer than the start of that transaction.



-- 
john r pierce                                      37N 122W
somewhere on the middle of the left coast


Re: json datatype and table bloat?

From
Tom Lane
Date:
John R Pierce <pierce@hogranch.com> writes:
> On 10/29/2013 12:41 PM, Gregory Haase wrote:
>> db_name=# VACUUM FULL VERBOSE table_schema.table_name;
>> INFO:  vacuuming "table_schema.table_name"
>> INFO:  "table_name": found 2 removable, 29663 nonremovable row
>> versions in 1754 pages
>> DETAIL:  0 dead row versions cannot be removed yet.
>> CPU 0.07s/0.10u sec elapsed 0.30 sec.

> is there an old transaction pending?   that 'masks' vacuum from touching
> any tuples newer than the start of that transaction.

If old transactions were the problem, vacuum would be reporting that
some-large-number of dead row versions couldn't be removed yet.

There doesn't seem to be anything obviously wrong here.

            regards, tom lane


Re: json datatype and table bloat?

From
Gregory Haase
Date:
I've isolated the problem to the json field not showing up in pg_stats, which affects the calculation of the avg row size in the bloat query.

I'm not sure if this is a json issue or some other kind of issue.

db_name=# select c.column_name, c.data_type from information_schema.columns c where table_name = 'table_name' and not exists (select 1 from pg_stats s where c.table_name = s.tablename and c.column_name = s.attname);
 column_name | data_type 
-------------+-----------
 criteria    | json
(1 row)

-G


On Tue, Oct 29, 2013 at 1:51 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
John R Pierce <pierce@hogranch.com> writes:
> On 10/29/2013 12:41 PM, Gregory Haase wrote:
>> db_name=# VACUUM FULL VERBOSE table_schema.table_name;
>> INFO:  vacuuming "table_schema.table_name"
>> INFO:  "table_name": found 2 removable, 29663 nonremovable row
>> versions in 1754 pages
>> DETAIL:  0 dead row versions cannot be removed yet.
>> CPU 0.07s/0.10u sec elapsed 0.30 sec.

> is there an old transaction pending?   that 'masks' vacuum from touching
> any tuples newer than the start of that transaction.

If old transactions were the problem, vacuum would be reporting that
some-large-number of dead row versions couldn't be removed yet.

There doesn't seem to be anything obviously wrong here.

                        regards, tom lane



Re: json datatype and table bloat?

From
Tom Lane
Date:
Gregory Haase <haaseg@onefreevoice.com> writes:
> I've isolated the problem to the json field not showing up in pg_stats,
> which affects the calculation of the avg row size in the bloat query.

> I'm not sure if this is a json issue or some other kind of issue.

Possibly your "bloat query" is failing to consider the toast table
associated with this table?  If the json values are large they'd
mostly be in the toast table not the main table.

(It's unfortunate that VACUUM FULL doesn't tell you about what's
in the toast table.  I'd try just VACUUM VERBOSE here, without the
FULL, to get more info.)

            regards, tom lane


Re: json datatype and table bloat?

From
Gregory Haase
Date:
Tom is correct: VACUUM VERBOSE shows that there is an associated toast table. Neither the check_postgres.pl script nor the query on http://wiki.postgresql.org/wiki/Show_database_bloat appears to take this into consideration. Both rely on null_frac and avg_width from pg_stats to estimate how big the table should be. I'm not sure how you would factor the toast table into that estimate.
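
A crude way to at least see how much of the relation lives in TOAST (a sketch;
the TOAST figure here also includes the toast table's index):

SELECT pg_relation_size('table_schema.table_name')    AS heap_bytes,
       pg_indexes_size('table_schema.table_name')     AS index_bytes,
       pg_total_relation_size('table_schema.table_name')
         - pg_relation_size('table_schema.table_name')
         - pg_indexes_size('table_schema.table_name') AS toast_bytes;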

-G


On Tue, Oct 29, 2013 at 2:05 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Gregory Haase <haaseg@onefreevoice.com> writes:
> I've isolated the problem to the json field not showing up in pg_stats,
> which affects the calculation of the avg row size in the bloat query.

> I'm not sure if this is a json issue or some other kind of issue.

Possibly your "bloat query" is failing to consider the toast table
associated with this table?  If the json values are large they'd
mostly be in the toast table not the main table.

(It's unfortunate that VACUUM FULL doesn't tell you about what's
in the toast table.  I'd try just VACUUM VERBOSE here, without the
FULL, to get more info.)

                        regards, tom lane

Re: json datatype and table bloat?

From
Gregory Haase
Date:
I spent some more time on this today, and I realized that the issue isn't that there are records in the toast table. The issue is that there are NO records in the toast table. Apparently, all the json values we are inserting are too small to be toasted.

I set up a separate benchmark locally:
create table test_json_stats
(
    test_json_stats_id serial,
    json_data json,
    insert_timestamp timestamp default now() not null
);

created a file called "json_bench" with the following:
BEGIN;
insert into test_json_stats (json_data) values ('{"counters": [ { "first":"1","second":"2"}, { "first":"3","second":"4"}, { "first":"5","second":"6"}, { "first":"7","second":"8"}, { "first":"9","second":"10"}, { "first":"11","second":"12"}, { "first":"13","second":"14"}, { "first":"15","second":"16"}, { "first":"17","second":"18"}, { "first":"19","second":"20"}, { "first":"21","second":"22"}, { "first":"23","second":"24"}, { "first":"25","second":"26"}, { "first":"27","second":"28"}, { "first":"29","second":"30"}, { "first":"31","second":"32"}, { "first":"33","second":"34"}, { "first":"35","second":"36"}, { "first":"37","second":"38"}, { "first":"39","second":"40"}, { "first":"41","second":"42"}, { "first":"43","second":"44"}, { "first":"45","second":"46"} ] }');
END;

Then ran pgbench: 
pgbench -c 5 -t 2000000 -f json_bench greg

VACUUM VERBOSE shows that the test_json_stats table has over a million pages and that the toast table exists with zero pages:
INFO:  vacuuming "public.test_json_stats"
INFO:  "test_json_stats": found 0 removable, 0 nonremovable row versions in 0 out of 1010011 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
0 pages are entirely empty.
CPU 0.00s/0.01u sec elapsed 0.01 sec.
INFO:  vacuuming "pg_toast.pg_toast_51822"
INFO:  index "pg_toast_51822_index" now contains 0 row versions in 1 pages
DETAIL:  0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  "pg_toast_51822": found 0 removable, 0 nonremovable row versions in 0 out of 0 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
VACUUM

The json_data column is not accounted for in pg_stats:
select attname, null_frac, avg_width from pg_stats where tablename = 'test_json_stats';
      attname       | null_frac | avg_width 
--------------------+-----------+-----------
 test_json_stats_id |         0 |         4
 insert_timestamp   |         0 |         8
(2 rows)

So I'm not sure if I'd actually qualify this as a "bug", but it appears that there is no way to currently get stats on a json data type.

I subsequently inserted a very large json value into the table that consumed 2 pages in pg_toast_51822, but there still doesn't appear to be any way to get stats on the column.
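
As a manual workaround, the column's average stored width can at least be
computed directly (a sketch; pg_column_size reports the on-disk, possibly
compressed, size of each value):

select avg(pg_column_size(json_data))::int             as avg_width,
       sum((json_data is null)::int)::float / count(*) as null_frac
  from test_json_stats;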

Greg Haase




On Tue, Oct 29, 2013 at 2:55 PM, Gregory Haase <haaseg@onefreevoice.com> wrote:
Tom is correct: VACUUM VERBOSE shows that there is an associated toast table. Neither the check_postgres.pl script nor the query on http://wiki.postgresql.org/wiki/Show_database_bloat appears to take this into consideration. Both rely on null_frac and avg_width from pg_stats to estimate how big the table should be. I'm not sure how you would factor the toast table into that estimate.

-G


On Tue, Oct 29, 2013 at 2:05 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Gregory Haase <haaseg@onefreevoice.com> writes:
> I've isolated the problem to the json field not showing up in pg_stats,
> which affects the calculation of the avg row size in the bloat query.

> I'm not sure if this is a json issue or some other kind of issue.

Possibly your "bloat query" is failing to consider the toast table
associated with this table?  If the json values are large they'd
mostly be in the toast table not the main table.

(It's unfortunate that VACUUM FULL doesn't tell you about what's
in the toast table.  I'd try just VACUUM VERBOSE here, without the
FULL, to get more info.)

                        regards, tom lane


Re: json datatype and table bloat?

From
Tom Lane
Date:
Gregory Haase <haaseg@onefreevoice.com> writes:
> The json_data column is not accounted for in pg_stats:

Ah!  I hadn't twigged to the fact that your bloat measurement approach
assumed you had pg_stats entries for all the columns.

> So I'm not sure if I'd actually qualify this as a "bug", but it appears
> that there is no way to currently get stats on a json data type.

ANALYZE currently punts on columns that don't have an equality operator,
which json does not.  There isn't that much in the way of stats that we
could collect, though I suppose we could still compute average datum width
and null fraction.

I'm not sure whether there are plans to invent an equality operator for
json.
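
(For what it's worth, the absence is easy to confirm from the catalogs; this
illustrative check comes back with zero for json:)

SELECT count(*)
  FROM pg_operator
 WHERE oprname  = '='
   AND oprleft  = 'json'::regtype
   AND oprright = 'json'::regtype;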

            regards, tom lane


Re: json datatype and table bloat?

From
Merlin Moncure
Date:
On Fri, Nov 1, 2013 at 1:50 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Gregory Haase <haaseg@onefreevoice.com> writes:
>> The json_data column is not accounted for in pg_stats:
>
> Ah!  I hadn't twigged to the fact that your bloat measurement approach
> assumed you had pg_stats entries for all the columns.
>
>> So I'm not sure if I'd actually qualify this as a "bug", but it appears
>> that there is no way to currently get stats on a json data type.
>
> ANALYZE currently punts on columns that don't have an equality operator,
> which json does not.  There isn't that much in the way of stats that we
> could collect, though I suppose we could still compute average datum width
> and null fraction.
>
> I'm not sure whether there are plans to invent an equality operator for
> json.

IMNSHO, this may not be a bug, but it's pretty close.  All base types
should have an equality operator as well as other supporting
infrastructure that the database itself depends on (in/out and
send/receive for example).  This is a pretty good example of why.

merlin


json datatype and table bloat?

From
Adam Jelinek
Date:
Along the lines of the equality operator; I have run into issues trying to pivot a table/result set with a json type due to what seemed to be no equality operator.
 
 

On Nov 4, 2013 10:14 AM, "Merlin Moncure" <mmoncure@gmail.com> wrote:
On Fri, Nov 1, 2013 at 1:50 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Gregory Haase <haaseg@onefreevoice.com> writes:
>> The json_data column is not accounted for in pg_stats:
>
> Ah!  I hadn't twigged to the fact that your bloat measurement approach
> assumed you had pg_stats entries for all the columns.
>
>> So I'm not sure if I'd actually qualify this as a "bug", but it appears
>> that there is no way to currently get stats on a json data type.
>
> ANALYZE currently punts on columns that don't have an equality operator,
> which json does not.  There isn't that much in the way of stats that we
> could collect, though I suppose we could still compute average datum width
> and null fraction.
>
> I'm not sure whether there are plans to invent an equality operator for
> json.

IMNSHO, this may not be a bug, but it's pretty close.  All base types
should have an equality operator as well as other supporting
infrastructure that the database itself depends on (in/out and
send/receive for example).  This is a pretty good example of why.

merlin





Re: json datatype and table bloat?

From
David Johnston
Date:
ajelinek@gmail.com wrote
> Along the lines of the equality operator; I have run into issues trying to
> pivot a table/result set with a json type due to what seemed to be no
> equality operator.

For the curious, and also use-case considerations for development, would you
be able to share what it is you are doing (and how) that combines full json
documents with pivoting?

Compound types holding source data for a pivot seems problematic since
generally all the pivot components are single-valued and, for data, often
numerical.

David J.







Re: json datatype and table bloat?

From
Merlin Moncure
Date:
On Mon, Nov 4, 2013 at 12:14 PM, David Johnston <polobo@yahoo.com> wrote:
> ajelinek@gmail.com wrote
>> Along the lines of the equality operator; I have run into issues trying to
>> pivot a table/result set with a json type due to what seemed to be no
>> equality operator.
>
> For the curious, and also use-case considerations for development, would you
> be able to share what it is you are doing (and how) that combines full json
> documents with pivoting?
>
> Compound types holding source data for a pivot seems problematic since
> generally all the pivot components are single-valued and, for data, often
> numerical.

would also like to see this. json type has completely displaced
crosstab in my usage. I don't typically pivot json though: I pivot the
raw data then transform to json.  With limited exceptions I consider
storing json in actual table rows to be an anti-pattern (but it should
still work if you do it).
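
Something like this, for example (a sketch against a hypothetical
raw_details(id, detail_type, detail_value) table: pivot the scalar columns
first, then wrap each result row with row_to_json):

SELECT row_to_json(t)
  FROM (SELECT id,
               max(CASE WHEN detail_type = 'a' THEN detail_value END) AS a,
               max(CASE WHEN detail_type = 'b' THEN detail_value END) AS b,
               max(CASE WHEN detail_type = 'c' THEN detail_value END) AS c
          FROM raw_details
         GROUP BY id) t;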

merlin


Re: json datatype and table bloat?

From
"ajelinek@gmail.com"
Date:
>>> Along the lines of the equality operator; I have run into issues trying to
>>> pivot a table/result set with a json type due to what seemed to be no
>>> equality operator.
>>
>> For the curious, and also use-case considerations for development, would
>> you
>> be able to share what it is you are doing (and how) that combines full
>> json
>> documents with pivoting?
>>
>> Compound types holding source data for a pivot seems problematic since
>> generally all the pivot components are single-valued and, for data, often
>> numerical.

>would also like to see this. json type has completely displaced
>crosstab in my usage. I don't typically pivot json though: I pivot the
>raw data then transform to json.  With limited exceptions I consider
>storing json in actual table rows to be an anti-pattern (but it should
>still work if you do it).

I could not figure out what I was doing last month to reproduce this. So I
did a small pivot proof of concept, and it is erroring on the MAX function, so
it is probably not the same issue. My guess is that I tried using the GREATEST
function as a Hail Mary (which would not have worked), got the following
message: ERROR:  could not identify a comparison function for type json, and
then thought/hoped it was the same thing when reading the emails.

CREATE TABLE bad_table_json(id int, detail_type text, details json);
INSERT INTO bad_table_json values(1, 'a', '{"a":1}'::json);
INSERT INTO bad_table_json values(1, 'b', '{"a":1}'::json);
INSERT INTO bad_table_json values(1, 'c', '{"a":1}'::json);

SELECT id
      ,MAX(CASE WHEN detail_type = 'a' THEN details END) AS a
      ,MAX(CASE WHEN detail_type = 'b' THEN details END) AS b
      ,MAX(CASE WHEN detail_type = 'c' THEN details END) AS c
  FROM bad_table_json
 GROUP BY id
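
One workaround that sidesteps the missing comparison function is to aggregate
the text form and cast back at the end (a sketch, at the cost of a round trip
through text):

SELECT id
      ,MAX(CASE WHEN detail_type = 'a' THEN details::text END)::json AS a
      ,MAX(CASE WHEN detail_type = 'b' THEN details::text END)::json AS b
      ,MAX(CASE WHEN detail_type = 'c' THEN details::text END)::json AS c
  FROM bad_table_json
 GROUP BY id;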





Re: json datatype and table bloat?

From
Gregory Haase
Date:

While I agree that an equality operator doesn't really make sense for json operationally, there are certain maintenance reasons why it may come in handy. Removing duplicate records comes to mind.
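
For example, without json equality the practical way to even find duplicates is
to compare the text representation (purely illustrative, reusing the
bad_table_json example quoted below):

SELECT id, detail_type, details::text AS details_text, count(*) AS copies
  FROM bad_table_json
 GROUP BY id, detail_type, details::text
HAVING count(*) > 1;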

Other than adding basic stats to necessary columns, I would say that the equality operator is really one of the most basic tenets of a relational database and should probably exist for any data type, regardless of its usefulness.

Greg Haase

On Nov 4, 2013 6:31 PM, "ajelinek@gmail.com" <ajelinek@gmail.com> wrote:
>>> Along the lines of the equality operator; I have run into issues trying to
>>> pivot a table/result set with a json type due to what seemed to be no
>>> equality operator.
>>
>> For the curious, and also use-case considerations for development, would
>> you
>> be able to share what it is you are doing (and how) that combines full
>> json
>> documents with pivoting?
>>
>> Compound types holding source data for a pivot seems problematic since
>> generally all the pivot components are single-valued and, for data, often
>> numerical.

>would also like to see this. json type has completely displaced
>crosstab in my usage. I don't typically pivot json though: I pivot the
>raw data then transform to json.  With limited exceptions I consider
>storing json in actual table rows to be an anti-pattern (but it should
>still work if you do it).

I could not figure out what I was doing last month to reproduce this. So I
did a small pivot proof of concept, and it is erroring on the MAX function, so
it is probably not the same issue. My guess is that I tried using the GREATEST
function as a Hail Mary (which would not have worked), got the following
message: ERROR:  could not identify a comparison function for type json, and
then thought/hoped it was the same thing when reading the emails.

CREATE TABLE bad_table_json(id int, detail_type text, details json);
INSERT INTO bad_table_json values(1, 'a', '{"a":1}'::json);
INSERT INTO bad_table_json values(1, 'b', '{"a":1}'::json);
INSERT INTO bad_table_json values(1, 'c', '{"a":1}'::json);

SELECT id
      ,MAX(CASE WHEN detail_type = 'a' THEN details END) AS a
      ,MAX(CASE WHEN detail_type = 'b' THEN details END) AS b
      ,MAX(CASE WHEN detail_type = 'c' THEN details END) AS c
  FROM bad_table_json
 GROUP BY id






Re: json datatype and table bloat?

From
Merlin Moncure
Date:
On Mon, Nov 4, 2013 at 8:31 PM, ajelinek@gmail.com <ajelinek@gmail.com> wrote:
>>>> Along the lines of the equality operator; I have run into issues trying to
>>>> pivot a table/result set with a json type due to what seemed to be no
>>>> equality operator.
>>>
>>> For the curious, and also use-case considerations for development, would
>>> you
>>> be able to share what it is you are doing (and how) that combines full
>>> json
>>> documents with pivoting?
>>>
>>> Compound types holding source data for a pivot seems problematic since
>>> generally all the pivot components are single-valued and, for data, often
>>> numerical.
>
>>would also like to see this. json type has completely displaced
>>crosstab in my usage. I don't typically pivot json though: I pivot the
>>raw data then transform to json.  With limited exceptions I consider
>>storing json in actual table rows to be an anti-pattern (but it should
>>still work if you do it).
>
> I could not figure out what I was doing last month to reproduce this. So I
> did a small pivot proof of concept, and it is erroring on the MAX function, so
> it is probably not the same issue. My guess is that I tried using the GREATEST
> function as a Hail Mary (which would not have worked), got the following
> message: ERROR:  could not identify a comparison function for type json, and
> then thought/hoped it was the same thing when reading the emails.
>
> CREATE TABLE bad_table_json(id int, detail_type text, details json);
> INSERT INTO bad_table_json values(1, 'a', '{"a":1}'::json);
> INSERT INTO bad_table_json values(1, 'b', '{"a":1}'::json);
> INSERT INTO bad_table_json values(1, 'c', '{"a":1}'::json);
>
> SELECT id
>       ,MAX(CASE WHEN detail_type = 'a' THEN details END) AS a
>       ,MAX(CASE WHEN detail_type = 'b' THEN details END) AS b
>       ,MAX(CASE WHEN detail_type = 'c' THEN details END) AS c
>   FROM bad_table_json
>  GROUP BY id

Aside: here's a way to do those types of things.  It's not faster
necessarily but seems cleaner to me.  This will bypass the need for json
comparison.  IMMUTABLE plpgsql is generally the fastest way to
implement highly iterated trivial functions.

CREATE OR REPLACE FUNCTION PickInternal(State anyelement, WantValue TEXT,
                                        PickValue TEXT, Value anyelement)
  RETURNS anyelement AS
$$
BEGIN
  RETURN CASE WHEN WantValue = PickValue THEN Value ELSE State END;
END;
$$ LANGUAGE PLPGSQL IMMUTABLE;

CREATE AGGREGATE Pick(TEXT, TEXT, anyelement) (
  SFUNC=PickInternal,
  SType=anyelement);

SELECT id
      ,Pick('a', detail_type, details) AS a
      ,Pick('b', detail_type, details) AS b
      ,Pick('c', detail_type, details) AS c
  FROM bad_table_json
  GROUP BY id;

merlin