[PoC] pgstattuple2: block sampling to reduce physical read - Mailing list pgsql-hackers

From Satoshi Nagayasu
Subject [PoC] pgstattuple2: block sampling to reduce physical read
Date
Msg-id 51EE1FBB.2050502@uptime.jp
Whole thread Raw
Responses Re: [PoC] pgstattuple2: block sampling to reduce physical read
List pgsql-hackers
Hi,

I've been working on new pgstattuple function to allow
block sampling [1] in order to reduce block reads while
scanning a table. A PoC patch is attached.

[1] Re: [RFC] pgstattuple/pgstatindex enhancement

http://www.postgresql.org/message-id/CA+TgmoaxJhGZ2c4AYfbr9muUVNhGWU4co-cthqpZRwwDtamvhw@mail.gmail.com

This new function, pgstattuple2(), samples only 3,000 blocks
(which accounts 24MB) from the table randomly, and estimates
several parameters of the entire table.

The function calculates the averages of the samples, estimates
the parameters (averages and SDs), and shows "standard errors
(SE)" to allow estimating status of the table with statistical
approach.

And, of course, it reduces number of physical block reads
while scanning a bigger table.

The following example shows that new pgstattuple2 function
runs x100 faster than the original pgstattuple function with
well-estimated results.

----------------------------------------------
postgres=# select * from pgstattuple('pgbench_accounts');
-[ RECORD 1 ]------+-----------
table_len          | 1402642432
tuple_count        | 10000000
tuple_len          | 1210000000
tuple_percent      | 86.27
dead_tuple_count   | 182895
dead_tuple_len     | 22130295
dead_tuple_percent | 1.58
free_space         | 21012328
free_percent       | 1.5

Time: 1615.651 ms
postgres=# select * from pgstattuple2('pgbench_accounts');
NOTICE:  pgstattuple2: SE tuple_count 2376.47, tuple_len 287552.58,
dead_tuple_count 497.63, dead_tuple_len 60213.08, free_space 289752.38
-[ RECORD 1 ]------+-----------
table_len          | 1402642432
tuple_count        | 9978074
tuple_len          | 1207347074
tuple_percent      | 86.08
dead_tuple_count   | 187315
dead_tuple_len     | 22665208
dead_tuple_percent | 1.62
free_space         | 23400431
free_percent       | 1.67

Time: 15.026 ms
postgres=#
----------------------------------------------

In addition to that, see attached chart to know how pgstattuple2
estimates well during repeating (long-running) pgbench.

I understand that pgbench would generate "random" transactions,
and those update operations might not have any skew over the table,
so estimating table status seems to be easy in this test.

However, I'm still curious to know whether it would work in
"real-world" worklaod.

Is it worth having this? Any comment or suggestion?

Regards,
--
Satoshi Nagayasu <snaga@uptime.jp>
Uptime Technologies, LLC. http://www.uptime.jp

Attachment

pgsql-hackers by date:

Previous
From: Craig Ringer
Date:
Subject: Re: Auto explain target tables
Next
From: Peter Geoghegan
Date:
Subject: Re: Design proposal: fsync absorb linear slider