Thread: [GENERAL] vacuumdb --analyze-only scans all pages?
Hello,

PostgreSQL 9.6.1: after a pg_dump/restore procedure it scans all pages (at least for some of the tables; the --analyze-only switch is specified).

I would expect that only the sample rows are scanned.

"log_details": scanned 2133350 of 2133350 pages

vacuumdb --analyze-only --all --verbose
INFO: analyzing "public.log"
INFO: "log": scanned 30000 of 30851 pages, containing 3599899 live rows and 0 dead rows; 30000 rows in sample, 3702016 estimated total rows
INFO: analyzing "public.log_details"
INFO: "log_details": scanned 2133350 of 2133350 pages, containing 334935843 live rows and 0 dead rows; 3000000 rows in sample, 334935843 estimated total rows
INFO: analyzing "public.log_details_str"
INFO: "log_details_str": scanned 30000 of 521126 pages, containing 3601451 live rows and 0 dead rows; 30000 rows in sample, 62560215 estimated total rows

Any ideas why?

Thnx.

Ciao,
Gerhard
On 12/28/2016 11:54 PM, Gerhard Wiesinger wrote:
> Hello,
>
> PostgreSQL 9.6.1: after a pg_dump/restore procedure it scans all pages
> (at least for some of the tables; the --analyze-only switch is specified).
>
> I would expect that only the sample rows are scanned.
>
> "log_details": scanned 2133350 of 2133350 pages
>
> vacuumdb --analyze-only --all --verbose
> INFO: analyzing "public.log"
> INFO: "log": scanned 30000 of 30851 pages, containing 3599899 live rows
> and 0 dead rows; 30000 rows in sample, 3702016 estimated total rows
> INFO: analyzing "public.log_details"
> INFO: "log_details": scanned 2133350 of 2133350 pages, containing
> 334935843 live rows and 0 dead rows; 3000000 rows in sample, 334935843
> estimated total rows
> INFO: analyzing "public.log_details_str"
> INFO: "log_details_str": scanned 30000 of 521126 pages, containing
> 3601451 live rows and 0 dead rows; 30000 rows in sample, 62560215
> estimated total rows
>
> Any ideas why?

I would say because the '3000000 rows in sample' were spread out over all 2133350 pages.

> Thnx.
>
> Ciao,
>
> Gerhard

-- 
Adrian Klaver
adrian.klaver@aklaver.com
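A quick way to see the arithmetic behind that point is to compare the table's row density to the requested sample. This is a minimal sketch against pg_class, using the table name from the thread; as far as the block-based sampling goes, ANALYZE selects up to one block per requested sample row, so a 3000000-row sample asks for more blocks than this table has:

SELECT relname,
       relpages,
       reltuples::bigint AS estimated_rows,
       round(reltuples / NULLIF(relpages, 0)) AS avg_rows_per_page
FROM pg_class
WHERE relname = 'log_details';
-- Roughly 157 rows per page here (334935843 / 2133350), and the sampler wants
-- 3000000 blocks for a 3000000-row sample, which is why the verbose output
-- shows all 2133350 pages being read.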
Adrian Klaver <adrian.klaver@aklaver.com> writes:
> On 12/28/2016 11:54 PM, Gerhard Wiesinger wrote:
>> vacuumdb --analyze-only --all --verbose
>> INFO: analyzing "public.log"
>> INFO: "log": scanned 30000 of 30851 pages, containing 3599899 live rows
>> and 0 dead rows; 30000 rows in sample, 3702016 estimated total rows
>> INFO: analyzing "public.log_details"
>> INFO: "log_details": scanned 2133350 of 2133350 pages, containing
>> 334935843 live rows and 0 dead rows; 3000000 rows in sample, 334935843
>> estimated total rows
>> INFO: analyzing "public.log_details_str"
>> INFO: "log_details_str": scanned 30000 of 521126 pages, containing
>> 3601451 live rows and 0 dead rows; 30000 rows in sample, 62560215
>> estimated total rows
>>
>> Any ideas why?

> I would say because the '3000000 rows in sample' were spread out over
> all 2133350 pages.

Worth pointing out here is that you must have a custom statistics target set on log_details to make it want a sample so much larger than the default. If you feel ANALYZE is taking too long, you should reconsider whether you need such a large target.

			regards, tom lane
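To check whether such a per-column override exists on a single table, a minimal sketch against pg_attribute (attstattarget = -1 means the column falls back to default_statistics_target; the table name is the one from this thread):

SELECT attname, attstattarget
FROM pg_attribute
WHERE attrelid = 'public.log_details'::regclass
  AND attnum > 0              -- regular columns only
  AND NOT attisdropped
  AND attstattarget <> -1;    -- columns with a custom statistics target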
On 29.12.2016 16:10, Tom Lane wrote:
> Adrian Klaver <adrian.klaver@aklaver.com> writes:
>> On 12/28/2016 11:54 PM, Gerhard Wiesinger wrote:
>>> vacuumdb --analyze-only --all --verbose
>>> INFO: analyzing "public.log"
>>> INFO: "log": scanned 30000 of 30851 pages, containing 3599899 live rows
>>> and 0 dead rows; 30000 rows in sample, 3702016 estimated total rows
>>> INFO: analyzing "public.log_details"
>>> INFO: "log_details": scanned 2133350 of 2133350 pages, containing
>>> 334935843 live rows and 0 dead rows; 3000000 rows in sample, 334935843
>>> estimated total rows
>>> INFO: analyzing "public.log_details_str"
>>> INFO: "log_details_str": scanned 30000 of 521126 pages, containing
>>> 3601451 live rows and 0 dead rows; 30000 rows in sample, 62560215
>>> estimated total rows
>>>
>>> Any ideas why?
>> I would say because the '3000000 rows in sample' were spread out over
>> all 2133350 pages.
> Worth pointing out here is that you must have a custom statistics target
> set on log_details to make it want a sample so much larger than the
> default. If you feel ANALYZE is taking too long, you should reconsider
> whether you need such a large target.

Thanx Tom and Adrian.

Yes, there is a custom statistics target of 10000 set, I guess for some reason some time ago to overcome a performance problem after the upgrade from 8.3 to 8.4. Thanx Tom for pointing that out.

Good query to find it out:

SELECT
  n.nspname AS schemaname,
  CASE
    WHEN cl.relkind = 'r' THEN CAST('TABLE' AS VARCHAR(20))
    WHEN cl.relkind = 'i' THEN CAST('INDEX' AS VARCHAR(20))
    WHEN cl.relkind = 'S' THEN CAST('SEQUENCE' AS VARCHAR(20))
    WHEN cl.relkind = 's' THEN CAST('SPECIAL' AS VARCHAR(20))
    WHEN cl.relkind = 'v' THEN CAST('VIEW' AS VARCHAR(20))
    WHEN cl.relkind = 't' THEN CAST('TOAST TABLE' AS VARCHAR(18))
    ELSE NULL
  END AS object_type,
  cl.relname,
  attname,
  attstattarget
FROM pg_attribute a
LEFT OUTER JOIN pg_class cl ON a.attrelid = cl.oid
LEFT OUTER JOIN pg_namespace n ON n.oid = cl.relnamespace
WHERE (cl.relkind = 'r' OR cl.relkind = 'i' OR cl.relkind = 't')
  AND attnum > 0                          -- only regular columns
  AND n.nspname = 'public'                -- public schema only
  AND NOT (relname ILIKE 'pgstatspack_%')
  AND cl.relkind = 'r'                    -- TABLE
  AND attstattarget <> -1                 -- non-default values only
ORDER BY n.nspname, cl.relname, attnum;

BTW: It looks like the statistics target is multiplied by 300 to get the number of sample rows, is that true (didn't find any documentation about that)?
https://www.postgresql.org/docs/current/static/planner-stats.html

-- ALTER TABLE public.log_details ALTER COLUMN fk_id SET STATISTICS 10000;
-- ALTER TABLE public.log_details ALTER COLUMN fk_keyid SET STATISTICS 10000;
-- Default is 100, means 300*100=30000 rows (30k)
-- Max is 10000, means 300*10000=3000000 rows (3 Mio)
ALTER TABLE public.log_details ALTER COLUMN fk_id SET STATISTICS -1;
ALTER TABLE public.log_details ALTER COLUMN fk_keyid SET STATISTICS -1;

https://www.postgresql.org/docs/current/static/runtime-config-query.html#GUC-DEFAULT-STATISTICS-TARGET

Thnx.

Ciao,
Gerhard
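The 300 multiplier does match the factor ANALYZE uses internally: the sample size is 300 times the effective statistics target. One way to confirm the reset took effect is a sketch like the following, assuming default_statistics_target is still at its default of 100:

ANALYZE VERBOSE public.log_details;
-- With the per-column overrides removed, the INFO line should report
-- "30000 rows in sample" (300 * 100) and far fewer pages scanned than before.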