Thread: why select count(*) consumes wal logs

why select count(*) consumes wal logs

From
Ravi Krishna
Date:
PG 10.5

I loaded 133 million rows to a wide table (more than 100 cols) via COPY. The table has
no index at this time. Since I am the only user I don't see any other activity.
Now when I run select count(*) on the table where I just loaded data, it runs for ever, 
more than 10min and still running. Intrigued, I checked locks and saw nothing.  Then I noticed something
strange.  When select count(*) runs, PG is writing to wal_logs, and that too a large amount. Why?  
I suspect vaccum is getting triggered, but this is a brand new table with no updates. So it should not.

Is there a SQL to peek into what PG is doing to write so much to WAL logs ?



Re: why select count(*) consumes wal logs

From
Ravi Krishna
Date:
Must be something to do with Vaccum as the second time I ran the SQL, it did not consume WAL logs.


Re: why select count(*) consumes wal logs

From
Michael Nolan
Date:


On Tue, Nov 6, 2018 at 11:08 AM Ravi Krishna <srkrishna1@aol.com> wrote:
PG 10.5

I loaded 133 million rows to a wide table (more than 100 cols) via COPY.

It's always a good idea after doing a large scale data load to do a vacuum analyze on the table (or the entire database.)
--
Mike Nolan

Re: why select count(*) consumes wal logs

From
Ron
Date:
On 11/06/2018 11:12 AM, Michael Nolan wrote:
On Tue, Nov 6, 2018 at 11:08 AM Ravi Krishna <srkrishna1@aol.com> wrote:
PG 10.5

I loaded 133 million rows to a wide table (more than 100 cols) via COPY.

It's always a good idea after doing a large scale data load to do a vacuum analyze on the table (or the entire database.)


I understand the need to ANALYZE (populate the histograms needed by the dynamic optimizer), but why VACUUM (which is recommended after updates and deletes).

Thanks

--
Angular momentum makes the world go 'round.

RE: why select count(*) consumes wal logs

From
"Kumar, Virendra"
Date:

I concord.

Why VACUUM when there is no update or deletes.

 

Regards,

Virendra

 

From: Ron [mailto:ronljohnsonjr@gmail.com]
Sent: Tuesday, November 06, 2018 12:20 PM
To: pgsql-general@lists.postgresql.org
Subject: Re: why select count(*) consumes wal logs

 

On 11/06/2018 11:12 AM, Michael Nolan wrote:

On Tue, Nov 6, 2018 at 11:08 AM Ravi Krishna <srkrishna1@aol.com> wrote:

PG 10.5

I loaded 133 million rows to a wide table (more than 100 cols) via COPY.

 

It's always a good idea after doing a large scale data load to do a vacuum analyze on the table (or the entire database.)

 


I understand the need to ANALYZE (populate the histograms needed by the dynamic optimizer), but why VACUUM (which is recommended after updates and deletes).

Thanks

--
Angular momentum makes the world go 'round.




This message is intended only for the use of the addressee and may contain
information that is PRIVILEGED AND CONFIDENTIAL.

If you are not the intended recipient, you are hereby notified that any
dissemination of this communication is strictly prohibited. If you have
received this communication in error, please erase all copies of the message
and its attachments and notify the sender immediately. Thank you.

Re: why select count(*) consumes wal logs

From
Tom Lane
Date:
Ravi Krishna <srkrishna1@aol.com> writes:
> I loaded 133 million rows to a wide table (more than 100 cols) via COPY. The table has
> no index at this time. Since I am the only user I don't see any other activity.
> Now when I run select count(*) on the table where I just loaded data, it runs for ever,
> more than 10min and still running. Intrigued, I checked locks and saw nothing.  Then I noticed something
> strange.  When select count(*) runs, PG is writing to wal_logs, and that too a large amount. Why?

That represents setting the yes-this-row-is-committed hint bits on the
newly loaded rows.  The first access to any such row will set that bit,
whether it's a select or a VACUUM or whatever.

            regards, tom lane


Re: why select count(*) consumes wal logs

From
Ravi Krishna
Date:
>That represents setting the yes-this-row-is-committed hint bits on the
>newly loaded rows.  The first access to any such row will set that bit,
>whether it's a select or a VACUUM or whatever.

yes now I recollect reading this in a blog.  Thanks Tom.


Re: why select count(*) consumes wal logs

From
Michael Nolan
Date:



On Tue, Nov 6, 2018 at 11:40 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

That represents setting the yes-this-row-is-committed hint bits on the
newly loaded rows.  The first access to any such row will set that bit,
whether it's a select or a VACUUM or whatever.

Tom, does that include ANALYZE?
--
Mike Nolan

Re: why select count(*) consumes wal logs

From
Tom Lane
Date:
Michael Nolan <htfoot@gmail.com> writes:
> On Tue, Nov 6, 2018 at 11:40 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> That represents setting the yes-this-row-is-committed hint bits on the
>> newly loaded rows.  The first access to any such row will set that bit,
>> whether it's a select or a VACUUM or whatever.

> Tom, does that include ANALYZE?

Yes, but remember that ANALYZE doesn't scan the whole table; it'll only
set the bit on rows it visits.

(I forget at the moment if it's guaranteed to set the bit on all rows
in each page it examines, or only on the rows it selects to sample.
But in any case it will not examine every page in the table.)

            regards, tom lane


Re: why select count(*) consumes wal logs

From
Bruno Lavoie
Date:


Le mar. 6 nov. 2018 12:40 PM, Tom Lane <tgl@sss.pgh.pa.us> a écrit :
Ravi Krishna <srkrishna1@aol.com> writes:
> I loaded 133 million rows to a wide table (more than 100 cols) via COPY. The table has
> no index at this time. Since I am the only user I don't see any other activity.
> Now when I run select count(*) on the table where I just loaded data, it runs for ever,
> more than 10min and still running. Intrigued, I checked locks and saw nothing.  Then I noticed something
> strange.  When select count(*) runs, PG is writing to wal_logs, and that too a large amount. Why? 

That represents setting the yes-this-row-is-committed hint bits on the
newly loaded rows.  The first access to any such row will set that bit,
whether it's a select or a VACUUM or whatever.

                        regards, tom lane


And IIRC, it can generate a high WAL traffic since the first page change after a checkpoint is done with full page write. And you said that it's happening on a big table with wide rows....

RE: why select count(*) consumes wal logs

From
Steven Winfield
Date:

As long as you don’t have page checksums turned on, you can prevent this by turning off wal_log_hints.

 

Steve.





This email is confidential. If you are not the intended recipient, please advise us immediately and delete this message. The registered name of Cantab- part of GAM Systematic is Cantab Capital Partners LLP. See - http://www.gam.com/en/Legal/Email+disclosures+EU for further information on confidentiality, the risks of non-secure electronic communication, and certain disclosures which we are required to make in accordance with applicable legislation and regulations. If you cannot access this link, please notify us by reply message and we will send the contents to you.

GAM Holding AG and its subsidiaries (Cantab – GAM Systematic) will collect and use information about you in the course of your interactions with us. Full details about the data types we collect and what we use this for and your related rights is set out in our online privacy policy at https://www.gam.com/en/legal/privacy-policy. Please familiarise yourself with this policy and check it from time to time for updates as it supplements this notice

Re: why select count(*) consumes wal logs

From
Ravi Krishna
Date:
> As long as you don’t have page checksums turned on,
> you can prevent this by turning off wal_log_hints.

  


I did not run initdb. How to find out which parameter were used with initdb. For page checksums
to be on, it must have been run with -k option.

Our wal_log_hints is left at default which means off.

thanks


RE: why select count(*) consumes wal logs

From
Steven Winfield
Date:
> How to find out which parameter were used with initdb

pg_controldata -D <datadir> | grep sum
...should give you something like:
Data page checksum version: 0
...and 0 means off.

Similarly, from SQL:
select data_page_checksum_version from pg_control_init()

Steve.




This email is confidential. If you are not the intended recipient, please advise us immediately and delete this message. The registered name of Cantab- part of GAM Systematic is Cantab Capital Partners LLP. See - http://www.gam.com/en/Legal/Email+disclosures+EU for further information on confidentiality, the risks of non-secure electronic communication, and certain disclosures which we are required to make in accordance with applicable legislation and regulations. If you cannot access this link, please notify us by reply message and we will send the contents to you.

GAM Holding AG and its subsidiaries (Cantab – GAM Systematic) will collect and use information about you in the course of your interactions with us. Full details about the data types we collect and what we use this for and your related rights is set out in our online privacy policy at https://www.gam.com/en/legal/privacy-policy. Please familiarise yourself with this policy and check it from time to time for updates as it supplements this notice

Re: why select count(*) consumes wal logs

From
Ravi Krishna
Date:
> select data_page_checksum_version from pg_control_init()

returned 1.  So we have page_checksum turned on, and wal_log_hints off.


Re: why select count(*) consumes wal logs

From
Thomas Kellerer
Date:
Ravi Krishna schrieb am 07.11.2018 um 15:10:
> 
>> select data_page_checksum_version from pg_control_init()
> 
> returned 1.  So we have page_checksum turned on, and wal_log_hints off.

If page_checksum is enabled, then wal_log_hints is ignored (or actually always assumed "on")