Thread: Why does this query write to the disk?
List,
I'm a bit confused as to why this query writes to the disk:
SELECT count(*)
FROM bigbigtable
WHERE customerid IN (SELECT customerid FROM smallcustomertable)
AND x != 'special'
AND y IS NULL
It writes a whole bunch of data to the disk that holds the tablespace where bigbigtable lives, as well as a little data to the main disk. It looks like it is actually WAL-logging these writes.
Here is the EXPLAIN ANALYZE:
Aggregate (cost=46520194.16..46520194.17 rows=1 width=0) (actual time=4892191.995..4892191.995 rows=1 loops=1)
-> Hash IN Join (cost=58.56..46203644.01 rows=126620058 width=0) (actual time=2.938..4840349.573 rows=79815986 loops=1)
Hash Cond: ((bigbigtable.customerid)::text = (smallcustomertable.customerid)::text)
-> Seq Scan on bigbigtable (cost=0.00..43987129.60 rows=126688839 width=11) (actual time=0.011..4681248.143 rows=128087340 loops=1)
Filter: ((y IS NULL) AND ((x)::text <> 'special'::text))
-> Hash (cost=35.47..35.47 rows=1847 width=18) (actual time=2.912..2.912 rows=1847 loops=1)
-> Seq Scan on smallcustomertable (cost=0.00..35.47 rows=1847 width=18) (actual time=0.006..1.301 rows=1847 loops=1)
Total runtime: 4892192.086 ms
Can someone point me to some documentation as to why this writes to disk?
Thanks,
Nik
>>> "Nikolas Everett" <nik9000@gmail.com> wrote:
> I'm a bit confused as to why this query writes to the disk:
> SELECT count(*)
> FROM bigbigtable
> WHERE customerid IN (SELECT customerid FROM smallcustomertable)
> AND x != 'special'
> AND y IS NULL
>
> It writes a whole bunch of data to the disk that has the tablespace where
> bigbigtable lives as well as writes a little data to the main disk. It
> looks like it is actually WAL logging these writes.

It's probably writing hint bits to improve performance of subsequent
access to the table. The issue is discussed here:

http://wiki.postgresql.org/wiki/Hint_Bits

-Kevin
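If hint bits are indeed the cause, a common mitigation (a sketch, assuming the table was recently bulk-loaded; VACUUM is standard PostgreSQL) is to vacuum once after loading, so the hint bits get set in a single pass instead of by the first big read:

```sql
-- After a bulk load, a manual VACUUM visits every page and sets hint
-- bits (and can freeze old tuples), so the first large SELECT does not
-- have to dirty and rewrite the pages it reads.
VACUUM VERBOSE bigbigtable;

-- Subsequent scans should now read without rewriting the table's pages:
SELECT count(*)
FROM bigbigtable
WHERE customerid IN (SELECT customerid FROM smallcustomertable)
  AND x != 'special'
  AND y IS NULL;
```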
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes: > "Nikolas Everett" <nik9000@gmail.com> wrote: >> I'm a bit confused as to why this query writes to the disk: > It's probably writing hint bits to improve performance of subsequent > access to the table. The issue is discussed here: > http://wiki.postgresql.org/wiki/Hint_Bits Hint-bit updates wouldn't be WAL-logged. If the table has been around a long time, it might be freezing old tuples, which *would* be WAL-logged (since 8.2 or so) --- but that would be a one-time, non-repeatable behavior. How sure are you that there was WAL output? What I was thinking was more likely was that the hash table for the hash join was spilling out to temp files. That wouldn't be WAL-logged either, but depending on your tablespace setup it might result in I/O on some other disk than the table proper. regards, tom lane
How big is your work_mem setting, and is this behavior affected by its size?
You can increase work_mem on an individual connection before running the test.
Simply:
set work_mem = '100MB'
to set it to 100 megabytes. If your issue is data spilling out of work_mem to temp storage, this setting will affect that.
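A minimal sketch of trying this on one session (table and column names taken from the original query; SHOW and RESET are standard SQL commands in PostgreSQL):

```sql
SHOW work_mem;           -- current value for this session
SET work_mem = '100MB';  -- affects only this connection, no restart needed

SELECT count(*)
FROM bigbigtable
WHERE customerid IN (SELECT customerid FROM smallcustomertable)
  AND x != 'special'
  AND y IS NULL;

RESET work_mem;          -- back to the server default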
On Thu, Sep 18, 2008 at 10:30 AM, Nikolas Everett <nik9000@gmail.com> wrote:
> [original question and EXPLAIN ANALYZE snipped]
Under what conditions does EXPLAIN ANALYZE report spilling work_mem to disk? When does it not report work_mem or other overflow to disk?
I know that a planned disk sort shows up. I have also seen it report a hash-agg spilling to disk, though that was a while ago and rather difficult to reproduce. I'm also somewhat confident I have seen it spill to temp disk without reporting it in EXPLAIN ANALYZE, but I could be wrong.
On Thu, Sep 18, 2008 at 11:13 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> [quoted text snipped]
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
> http://wiki.postgresql.org/wiki/Hint_BitsOn Thu, Sep 18, 2008 at 2:13 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
freezing old tuples
hash join was spilling out to temp files
Since this was a new table and the writes to the table's disk were very large, it was probably the hint bits.
The small table was about 1300 rows and my work_mem was 100MB, so the writes to the main disk probably were not hash spillage. They were tiny, so I'm not worried about them.
Thanks very much,
Nik
"Scott Carey" <scott@richrelevance.com> writes: > Under what conditions does EXPLAIN ANALYZE report spilling work_mem to > disk? For hash joins, it doesn't. You might be thinking of the additional reporting we added for sorts recently; but there's no comparable logging for hash ... regards, tom lane