Re: json datatype and table bloat? - Mailing list pgsql-general

From Gregory Haase
Subject Re: json datatype and table bloat?
Date
Msg-id CAHA6QFRADbCMhyJKnrQxmyzVNa1kDmy8E2F0qTUJ2PsGJeXnMg@mail.gmail.com
Whole thread Raw
In response to Re: json datatype and table bloat?  (Gregory Haase <haaseg@onefreevoice.com>)
Responses Re: json datatype and table bloat?
List pgsql-general
I spent some more time on this today, and I realized that the issue isn't that there are records in the toast table. The issue is that there are NO records in the toast table. Apparently, all the json we are inserting are too small to get toasted.

I setup a separate benchmark locally:
create table test_json_stats
(
    test_json_stats_id serial,
    json_data json,
    insert_timestamp timestamp default now() not null
);

created a file called "json bench" with the following:
BEGIN;
insert into test_json_stats (json_data) values ('{"counters": [ { "first":"1","second":"2"}, { "first":"3","second":"4"}, { "first":"5","second":"6"}, { "first":"7","second":"8"}, { "first":"9","second":"10"}, { "first":"11","second":"12"}, { "first":"13","second":"14"}, { "first":"15","second":"16"}, { "first":"17","second":"18"}, { "first":"19","second":"20"}, { "first":"21","second":"22"}, { "first":"23","second":"24"}, { "first":"25","second":"26"}, { "first":"27","second":"28"}, { "first":"29","second":"30"}, { "first":"31","second":"32"}, { "first":"33","second":"34"}, { "first":"35","second":"36"}, { "first":"37","second":"38"}, { "first":"39","second":"40"}, { "first":"41","second":"42"}, { "first":"43","second":"44"}, { "first":"45","second":"46"} ] }');
END;

Then ran pgbench: 
pgbench -c 5 -t 2000000 -f json_bench greg

vacuum vebose shows the test_json_stats table has over a million pages and the toast table exists with zero pages:
INFO:  vacuuming "public.test_json_stats"
INFO:  "test_json_stats": found 0 removable, 0 nonremovable row versions in 0 out of 1010011 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
0 pages are entirely empty.
CPU 0.00s/0.01u sec elapsed 0.01 sec.
INFO:  vacuuming "pg_toast.pg_toast_51822"
INFO:  index "pg_toast_51822_index" now contains 0 row versions in 1 pages
DETAIL:  0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  "pg_toast_51822": found 0 removable, 0 nonremovable row versions in 0 out of 0 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
VACUUM

The json_data column is not accounted for in pg_stats:
select attname, null_frac, avg_width from pg_stats where tablename = 'test_json_stats';
      attname       | null_frac | avg_width 
--------------------+-----------+-----------
 test_json_stats_id |         0 |         4
 insert_timestamp   |         0 |         8
(2 rows)

So I'm not sure if I'd actually qualify this as a "bug", but it appears that there is no way to currently get stats on a json data type.

I subsequently inserted a very large json into the table that consumed 2 pages in pg_toast_51822, but there still doesn't appear to me any way to get stats on the column.

Greg Haase




On Tue, Oct 29, 2013 at 2:55 PM, Gregory Haase <haaseg@onefreevoice.com> wrote:
Tom is correct: Vacuum verbose shows that their is an associated toast table. Neither the check_postgres.pl script or the query on http://wiki.postgresql.org/wiki/Show_database_bloat appear to take this into consideration. Both rely on null_frac and avg_width from pg_stats to estimate how big the table should be. I'm not sure how you would factor the toast table into that estimate.

-G


On Tue, Oct 29, 2013 at 2:05 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Gregory Haase <haaseg@onefreevoice.com> writes:
> I've isolated the problem to the json field not showing up in pg_stats,
> which affects the calculation of the avg row size in the bloat query.

> I'm not sure if this is a json issue or some other kind of issue.

Possibly your "bloat query" is failing to consider the toast table
associated with this table?  If the json values are large they'd
mostly be in the toast table not the main table.

(It's unfortunate that VACUUM FULL doesn't tell you about what's
in the toast table.  I'd try just VACUUM VERBOSE here, without the
FULL, to get more info.)

                        regards, tom lane


pgsql-general by date:

Previous
From: Kevin Grittner
Date:
Subject: Re: Cannot create matview when referencing another not-populated-yet matview in subquery
Next
From: Tom Lane
Date:
Subject: Re: json datatype and table bloat?