Re: [GENERAL] Queries on very big table - Mailing list pgsql-general

From Merlin Moncure
Subject Re: [GENERAL] Queries on very big table
Date
Msg-id CAHyXU0zXuw01aSVpTyx0KiKdKg60UZNJ920n7Q0pX_FNaNHQVw@mail.gmail.com
Whole thread Raw
In response to [GENERAL] Queries on very big table  (Job <Job@colliniconsulting.it>)
List pgsql-general
On Mon, Jan 2, 2017 at 5:23 AM, Job <Job@colliniconsulting.it> wrote:
> Hello guys and very good new year to everybody!
>
> We are now approaching some queries and statistics on very big table (about 180 millions of record).
> The table is partitioned by day (about ~3 Gb of data for every partition/day).
> We use Postgresql 9.6.1
>
> I am experiencing quite important slowdown on queries.
> I manually made a "vacuum full" and a "reindex" on every partition in order to clean free space and reorder records.
>
> I have a BRIN index on timestamp and index on other field (btree)
>
> Starting by a simple query: explain analyze select count(domain) from webtraffic_archive:
>
>                                                                                    QUERY PLAN
>
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>  Finalize Aggregate  (cost=3220451.94..3220451.95 rows=1 width=8) (actual time=36912.624..36912.624 rows=1 loops=1)
>    ->  Gather  (cost=3220451.52..3220451.93 rows=4 width=8) (actual time=36911.600..36912.614 rows=5 loops=1)
>          Workers Planned: 4
>          Workers Launched: 4
>          ->  Partial Aggregate  (cost=3219451.52..3219451.53 rows=1 width=8) (actual time=36906.804..36906.804 rows=1
loops=5)
>                ->  Append  (cost=0.00..3094635.41 rows=49926443 width=0) (actual time=4.716..31331.229 rows=39853988
loops=5)
>                      ->  Parallel Seq Scan on webtraffic_archive  (cost=0.00..0.00 rows=1 width=0) (actual
time=0.001..0.001rows=0 loops=5) 
>                      ->  Parallel Seq Scan on webtraffic_archive_day_2016_01_01  (cost=0.00..10.47 rows=47 width=0)
(actualtime=0.000..0.000 rows=0 loops=5) 
>                      ->  Parallel Seq Scan on webtraffic_archive_day_2016_12_16  (cost=0.00..213728.26 rows=3498026
width=0)(actual time=4.713..2703.458 rows=2798421 loops=5) 
>                      ->  Parallel Seq Scan on webtraffic_archive_day_2016_12_17  (cost=0.00..201379.39 rows=3247739
width=0)(actual time=6.334..2364.726 rows=2598191 loops=5) 
>                      ->  Parallel Seq Scan on webtraffic_archive_day_2016_12_18  (cost=0.00..176248.86 rows=2824986
width=0)(actual time=7.437..2014.812 rows=2259989 loops=5) 
>                      ->  Parallel Seq Scan on webtraffic_archive_day_2016_12_19  (cost=0.00..177493.33 rows=2866433
width=0)(actual time=9.951..2145.958 rows=2293146 loops=5) 
>                      ->  Parallel Seq Scan on webtraffic_archive_day_2016_12_20  (cost=0.00..120271.83 rows=1960883
width=0)(actual time=0.011..372.092 rows=1568706 loops=5) 
>                      ->  Parallel Seq Scan on webtraffic_archive_day_2016_12_21  (cost=0.00..276391.94 rows=4485294
width=0)(actual time=5.386..3111.589 rows=3588235 loops=5) 
>                      ->  Parallel Seq Scan on webtraffic_archive_day_2016_12_22  (cost=0.00..287611.68 rows=4630668
width=0)(actual time=6.598..3335.834 rows=3704535 loops=5) 
>                      ->  Parallel Seq Scan on webtraffic_archive_day_2016_12_23  (cost=0.00..249047.61 rows=4014361
width=0)(actual time=7.206..2628.884 rows=3211489 loops=5) 
>                      ->  Parallel Seq Scan on webtraffic_archive_day_2016_12_24  (cost=0.00..192008.70 rows=3097370
width=0)(actual time=9.870..1882.826 rows=2477896 loops=5) 
>                      ->  Parallel Seq Scan on webtraffic_archive_day_2016_12_25  (cost=0.00..87385.16 rows=1405616
width=0)(actual time=0.018..427.248 rows=1124493 loops=5) 
>                      ->  Parallel Seq Scan on webtraffic_archive_day_2016_12_26  (cost=0.00..88262.80 rows=1436080
width=0)(actual time=0.014..277.327 rows=1148864 loops=5) 
>                      ->  Parallel Seq Scan on webtraffic_archive_day_2016_12_27  (cost=0.00..222607.43 rows=3557243
width=0)(actual time=8.497..1232.210 rows=2845795 loops=5) 
>                      ->  Parallel Seq Scan on webtraffic_archive_day_2016_12_28  (cost=0.00..210414.76 rows=3365676
width=0)(actual time=0.033..548.878 rows=2692541 loops=5) 
>                      ->  Parallel Seq Scan on webtraffic_archive_day_2016_12_29  (cost=0.00..185065.72 rows=2955872
width=0)(actual time=0.031..498.079 rows=2364697 loops=5) 
>                      ->  Parallel Seq Scan on webtraffic_archive_day_2016_12_30  (cost=0.00..149139.55 rows=2382656
width=0)(actual time=0.011..501.351 rows=1906124 loops=5) 
>                      ->  Parallel Seq Scan on webtraffic_archive_day_2016_12_31  (cost=0.00..166991.89 rows=2664288
width=0)(actual time=0.041..437.631 rows=2131431 loops=5) 
>                      ->  Parallel Seq Scan on webtraffic_archive_day_2017_01_01  (cost=0.00..79197.29 rows=1260930
width=0)(actual time=0.018..254.124 rows=1008744 loops=5) 
>                      ->  Parallel Seq Scan on webtraffic_archive_day_2017_01_02  (cost=0.00..11378.74 rows=272274
width=0)(actual time=0.017..34.352 rows=130691 loops=5) 
>  Planning time: 313.907 ms
>  Execution time: 36941.700 ms
>
> Other more complex queries are slower.
>
> How can i improve it?
> Records number can raise up until 1.000 millions.
> Do i need a third-part tool for big data?

What storage do you have?

merlin


pgsql-general by date:

Previous
From: Pierre Ducroquet
Date:
Subject: Re: [GENERAL] PostgreSQL not reusing free space in table ?
Next
From: Alban Hertroys
Date:
Subject: Re: [GENERAL] Postgres 9.6.1 big slowdown by upgrading 8.4.22