Thread: Unlinking Parallel Hash Join inner batch files sooner
Hi, One complaint about PHJ is that it can, in rare cases, use a surprising amount of temporary disk space where non-parallel HJ would not. When it decides that it needs to double the number of batches to try to fit each inner batch into memory, and then again and again depending on your level of bad luck, it leaves behind all the earlier generations of inner batch files to be cleaned up at the end of the query. That's stupid. Here's a patch to unlink them sooner, as a small improvement. The reason I didn't do this earlier is that sharedtuplestore.c continues the pre-existing tradition where each parallel process counts what it writes against its own temp_file_limit. At the time I thought I'd need to have one process unlink all the files, but if a process were to unlink files that it didn't create, that accounting system would break. Without some new kind of shared temp_file_limit mechanism that doesn't currently exist, per-process counters could go negative, creating free money. In the attached patch, I realised something that I'd missed before: there is a safe point for each backend to unlink just the files that it created, and there is no way for a process that created files not to reach that point. Here's an example query that tries 8, 16 and then 32 batches on my machine, because reltuples is clobbered with a bogus value. Pathological cases can try many more rounds than that, but 3 is enough to demonstrate. Using truss and shell tricks I spat out the list of create and unlink operations from master and the attached draft/POC patch. See below. set work_mem = '1MB'; CREATE TABLE t (i int); INSERT INTO t SELECT generate_series(1, 1000000); ANALYZE t; UPDATE pg_class SET reltuples = reltuples / 4 WHERE relname = 't'; EXPLAIN ANALYZE SELECT COUNT(*) FROM t t1 JOIN t t2 USING (i); This code is also exercised by the existing "bad" case in join_hash.sql. This is the second of two experimental patches investigating increased resource usage in PHJ compared to HJ based on user complaints, this one being per-batch temp files, and the other[1] being per-batch buffer memory. [1] https://www.postgresql.org/message-id/flat/CA%2BhUKGKCnU9NjFfzO219V-YeyWr8mZe4JRrf%3Dx_uv6qsePBcOw%40mail.gmail.com ===== master: 99861: create i3of8.p0.0 99861: create i6of8.p0.0 99861: create i4of8.p0.0 99861: create i5of8.p0.0 99861: create i7of8.p0.0 99861: create i2of8.p0.0 99861: create i1of8.p0.0 99863: create i2of8.p1.0 99862: create i7of8.p2.0 99863: create i5of8.p1.0 99862: create i1of8.p2.0 99863: create i6of8.p1.0 99862: create i5of8.p2.0 99863: create i1of8.p1.0 99862: create i3of8.p2.0 99863: create i7of8.p1.0 99862: create i4of8.p2.0 99863: create i3of8.p1.0 99862: create i2of8.p2.0 99863: create i4of8.p1.0 99862: create i6of8.p2.0 99863: create i8of16.p1.0 99861: create i8of16.p0.0 99862: create i8of16.p2.0 99863: create i9of16.p1.0 99862: create i1of16.p2.0 99863: create i1of16.p1.0 99862: create i9of16.p2.0 99861: create i9of16.p0.0 99861: create i1of16.p0.0 99862: create i10of16.p2.0 99863: create i2of16.p1.0 99862: create i2of16.p2.0 99863: create i10of16.p1.0 99861: create i2of16.p0.0 99861: create i10of16.p0.0 99862: create i11of16.p2.0 99863: create i3of16.p1.0 99862: create i3of16.p2.0 99861: create i3of16.p0.0 99863: create i11of16.p1.0 99861: create i11of16.p0.0 99863: create i4of16.p1.0 99863: create i12of16.p1.0 99862: create i12of16.p2.0 99862: create i4of16.p2.0 99861: create i12of16.p0.0 99861: create i4of16.p0.0 99863: create i13of16.p1.0 99863: create i5of16.p1.0 99862: create i5of16.p2.0 99862: create i13of16.p2.0 99861: create i5of16.p0.0 99861: create i13of16.p0.0 99862: create i6of16.p2.0 99863: create i6of16.p1.0 99861: create i14of16.p0.0 99862: create i14of16.p2.0 99863: create i14of16.p1.0 99861: create i6of16.p0.0 99863: create i15of16.p1.0 99861: create i7of16.p0.0 99863: create i7of16.p1.0 99862: create i15of16.p2.0 99861: create i15of16.p0.0 99862: create i7of16.p2.0 99863: create i16of32.p1.0 99862: create i16of32.p2.0 99861: create i16of32.p0.0 99863: create i17of32.p1.0 99863: create i1of32.p1.0 99861: create i1of32.p0.0 99862: create i17of32.p2.0 99862: create i1of32.p2.0 99861: create i17of32.p0.0 99863: create i18of32.p1.0 99863: create i2of32.p1.0 99862: create i2of32.p2.0 99862: create i18of32.p2.0 99861: create i2of32.p0.0 99861: create i18of32.p0.0 99862: create i3of32.p2.0 99862: create i19of32.p2.0 99861: create i19of32.p0.0 99861: create i3of32.p0.0 99863: create i19of32.p1.0 99863: create i3of32.p1.0 99863: create i20of32.p1.0 99863: create i4of32.p1.0 99861: create i20of32.p0.0 99861: create i4of32.p0.0 99862: create i20of32.p2.0 99862: create i4of32.p2.0 99861: create i21of32.p0.0 99863: create i21of32.p1.0 99861: create i5of32.p0.0 99863: create i5of32.p1.0 99862: create i5of32.p2.0 99862: create i21of32.p2.0 99863: create i22of32.p1.0 99863: create i6of32.p1.0 99861: create i22of32.p0.0 99862: create i22of32.p2.0 99861: create i6of32.p0.0 99862: create i6of32.p2.0 99863: create i7of32.p1.0 99863: create i23of32.p1.0 99861: create i7of32.p0.0 99862: create i23of32.p2.0 99862: create i7of32.p2.0 99861: create i23of32.p0.0 99862: create i24of32.p2.0 99862: create i8of32.p2.0 99863: create i24of32.p1.0 99863: create i8of32.p1.0 99861: create i24of32.p0.0 99861: create i8of32.p0.0 99863: create i9of32.p1.0 99863: create i25of32.p1.0 99862: create i9of32.p2.0 99862: create i25of32.p2.0 99861: create i9of32.p0.0 99861: create i25of32.p0.0 99861: create i26of32.p0.0 99862: create i26of32.p2.0 99863: create i10of32.p1.0 99862: create i10of32.p2.0 99861: create i10of32.p0.0 99863: create i26of32.p1.0 99862: create i11of32.p2.0 99861: create i11of32.p0.0 99862: create i27of32.p2.0 99861: create i27of32.p0.0 99863: create i27of32.p1.0 99863: create i11of32.p1.0 99862: create i12of32.p2.0 99861: create i28of32.p0.0 99862: create i28of32.p2.0 99861: create i12of32.p0.0 99863: create i12of32.p1.0 99863: create i28of32.p1.0 99863: create i29of32.p1.0 99863: create i13of32.p1.0 99862: create i29of32.p2.0 99862: create i13of32.p2.0 99861: create i13of32.p0.0 99861: create i29of32.p0.0 99863: create i14of32.p1.0 99862: create i14of32.p2.0 99863: create i30of32.p1.0 99861: create i30of32.p0.0 99862: create i30of32.p2.0 99861: create i14of32.p0.0 99863: create i15of32.p1.0 99863: create i31of32.p1.0 99862: create i15of32.p2.0 99861: create i31of32.p0.0 99862: create i31of32.p2.0 99861: create i15of32.p0.0 99863: create o19of32.p1.0 99861: create o19of32.p0.0 99863: create o30of32.p1.0 99861: create o20of32.p0.0 99863: create o28of32.p1.0 99862: create o23of32.p2.0 99861: create o25of32.p0.0 99863: create o21of32.p1.0 99862: create o26of32.p2.0 99861: create o15of32.p0.0 99863: create o8of32.p1.0 99862: create o7of32.p2.0 99863: create o20of32.p1.0 99861: create o2of32.p0.0 99863: create o14of32.p1.0 99862: create o3of32.p2.0 99863: create o12of32.p1.0 99861: create o8of32.p0.0 99862: create o18of32.p2.0 99863: create o7of32.p1.0 99862: create o24of32.p2.0 99861: create o24of32.p0.0 99863: create o24of32.p1.0 99862: create o10of32.p2.0 99863: create o11of32.p1.0 99861: create o23of32.p0.0 99862: create o22of32.p2.0 99863: create o31of32.p1.0 99862: create o12of32.p2.0 99861: create o10of32.p0.0 99863: create o2of32.p1.0 99862: create o30of32.p2.0 99861: create o6of32.p0.0 99863: create o22of32.p1.0 99862: create o14of32.p2.0 99861: create o17of32.p0.0 99863: create o0of32.p1.0 99862: create o29of32.p2.0 99861: create o4of32.p0.0 99863: create o6of32.p1.0 99862: create o8of32.p2.0 99861: create o11of32.p0.0 99863: create o18of32.p1.0 99862: create o15of32.p2.0 99861: create o1of32.p0.0 99863: create o5of32.p1.0 99862: create o2of32.p2.0 99861: create o12of32.p0.0 99863: create o4of32.p1.0 99862: create o28of32.p2.0 99861: create o13of32.p0.0 99863: create o9of32.p1.0 99862: create o31of32.p2.0 99861: create o21of32.p0.0 99863: create o27of32.p1.0 99862: create o0of32.p2.0 99861: create o16of32.p0.0 99863: create o26of32.p1.0 99862: create o13of32.p2.0 99861: create o29of32.p0.0 99863: create o3of32.p1.0 99862: create o5of32.p2.0 99861: create o3of32.p0.0 99863: create o25of32.p1.0 99862: create o21of32.p2.0 99861: create o5of32.p0.0 99863: create o1of32.p1.0 99862: create o20of32.p2.0 99861: create o30of32.p0.0 99863: create o17of32.p1.0 99862: create o1of32.p2.0 99861: create o14of32.p0.0 99863: create o23of32.p1.0 99862: create o16of32.p2.0 99861: create o0of32.p0.0 99863: create o13of32.p1.0 99862: create o19of32.p2.0 99861: create o28of32.p0.0 99863: create o16of32.p1.0 99862: create o6of32.p2.0 99861: create o26of32.p0.0 99863: create o15of32.p1.0 99862: create o9of32.p2.0 99861: create o18of32.p0.0 99863: create o29of32.p1.0 99862: create o11of32.p2.0 99861: create o31of32.p0.0 99862: create o4of32.p2.0 99863: create o10of32.p1.0 99861: create o27of32.p0.0 99862: create o27of32.p2.0 99861: create o7of32.p0.0 99862: create o17of32.p2.0 99861: create o22of32.p0.0 99862: create o25of32.p2.0 99861: create o9of32.p0.0 99861: unlink i20of32.p0.0 99861: unlink o24of32.p0.0 99861: unlink i29of32.p2.0 99861: unlink i7of8.p1.0 99861: unlink i26of32.p2.0 99861: unlink i3of8.p1.0 99861: unlink o7of32.p1.0 99861: unlink o22of32.p2.0 99861: unlink o8of32.p1.0 99861: unlink i30of32.p0.0 99861: unlink i1of32.p2.0 99861: unlink i7of32.p0.0 99861: unlink i18of32.p1.0 99861: unlink i8of32.p0.0 99861: unlink i17of32.p1.0 99861: unlink i6of16.p0.0 99861: unlink o13of32.p1.0 99861: unlink i9of16.p0.0 99861: unlink o0of32.p0.0 99861: unlink o6of32.p2.0 99861: unlink o9of32.p2.0 99861: unlink o23of32.p1.0 99861: unlink i28of32.p1.0 99861: unlink i27of32.p1.0 99861: unlink i1of16.p1.0 99861: unlink i11of16.p0.0 99861: unlink o14of32.p0.0 99861: unlink i10of32.p0.0 99861: unlink o12of32.p2.0 99861: unlink i19of32.p2.0 99861: unlink i16of32.p2.0 99861: unlink i6of8.p2.0 99861: unlink i9of32.p2.0 99861: unlink i6of32.p2.0 99861: unlink i8of16.p2.0 99861: unlink i2of8.p2.0 99861: unlink i7of16.p2.0 99861: unlink i10of32.p1.0 99861: unlink i31of32.p2.0 99861: unlink i11of16.p1.0 99861: unlink o14of32.p1.0 99861: unlink i1of16.p0.0 99861: unlink i27of32.p0.0 99861: unlink i28of32.p0.0 99861: unlink o23of32.p0.0 99861: unlink i21of32.p2.0 99861: unlink o25of32.p2.0 99861: unlink o0of32.p1.0 99861: unlink o13of32.p0.0 99861: unlink i9of16.p1.0 99861: unlink i6of16.p1.0 99861: unlink i8of32.p1.0 99861: unlink i17of32.p0.0 99861: unlink i7of32.p1.0 99861: unlink i18of32.p0.0 99861: unlink o15of32.p2.0 99861: unlink i10of16.p2.0 99861: unlink i11of32.p2.0 99861: unlink i30of32.p1.0 99861: unlink o8of32.p0.0 99861: unlink i3of8.p0.0 99861: unlink o7of32.p0.0 99861: unlink i7of8.p0.0 99861: unlink o24of32.p1.0 99861: unlink o1of32.p2.0 99861: unlink i20of32.p1.0 99861: unlink o1of32.p0.0 99861: unlink i7of8.p2.0 99861: unlink i26of32.p1.0 99861: unlink i29of32.p1.0 99861: unlink o22of32.p1.0 99861: unlink o8of32.p2.0 99861: unlink i3of8.p2.0 99861: unlink o7of32.p2.0 99861: unlink i11of32.p0.0 99861: unlink i1of32.p1.0 99861: unlink o15of32.p0.0 99861: unlink i10of16.p0.0 99861: unlink i17of32.p2.0 99861: unlink i18of32.p2.0 99861: unlink o13of32.p2.0 99861: unlink o25of32.p0.0 99861: unlink i21of32.p0.0 99861: unlink o9of32.p1.0 99861: unlink o23of32.p2.0 99861: unlink o6of32.p1.0 99861: unlink i27of32.p2.0 99861: unlink i28of32.p2.0 99861: unlink i1of16.p2.0 99861: unlink i31of32.p0.0 99861: unlink i8of16.p0.0 99861: unlink o12of32.p1.0 99861: unlink i2of8.p0.0 99861: unlink i7of16.p0.0 99861: unlink i6of8.p0.0 99861: unlink i16of32.p1.0 99861: unlink i9of32.p0.0 99861: unlink i19of32.p1.0 99861: unlink i6of32.p0.0 99861: unlink i19of32.p0.0 99861: unlink i6of32.p1.0 99861: unlink i6of8.p1.0 99861: unlink i16of32.p0.0 99861: unlink i9of32.p1.0 99861: unlink i7of16.p1.0 99861: unlink i2of8.p1.0 99861: unlink i8of16.p1.0 99861: unlink o12of32.p0.0 99861: unlink i31of32.p1.0 99861: unlink i10of32.p2.0 99861: unlink i11of16.p2.0 99861: unlink o14of32.p2.0 99861: unlink o6of32.p0.0 99861: unlink o9of32.p0.0 99861: unlink i21of32.p1.0 99861: unlink o0of32.p2.0 99861: unlink o25of32.p1.0 99861: unlink i6of16.p2.0 99861: unlink i9of16.p2.0 99861: unlink i7of32.p2.0 99861: unlink i8of32.p2.0 99861: unlink o15of32.p1.0 99861: unlink i10of16.p1.0 99861: unlink i30of32.p2.0 99861: unlink i1of32.p0.0 99861: unlink i11of32.p1.0 99861: unlink o22of32.p0.0 99861: unlink i29of32.p0.0 99861: unlink i26of32.p0.0 99861: unlink o1of32.p1.0 99861: unlink o24of32.p2.0 99861: unlink i20of32.p2.0 99861: unlink i3of16.p0.0 99861: unlink o19of32.p1.0 99861: unlink o16of32.p1.0 99861: unlink i13of16.p1.0 99861: unlink i2of32.p0.0 99861: unlink i12of32.p1.0 99861: unlink i5of16.p2.0 99861: unlink o31of32.p0.0 99861: unlink i4of32.p2.0 99861: unlink o2of32.p1.0 99861: unlink o28of32.p2.0 99861: unlink o27of32.p2.0 99861: unlink i23of32.p2.0 99861: unlink i5of8.p2.0 99861: unlink o21of32.p0.0 99861: unlink i25of32.p0.0 99861: unlink i1of8.p2.0 99861: unlink i13of32.p2.0 99861: unlink o18of32.p2.0 99861: unlink i12of16.p2.0 99861: unlink o17of32.p2.0 99861: unlink i5of32.p1.0 99861: unlink i15of32.p0.0 99861: unlink i4of16.p1.0 99861: unlink i4of8.p0.0 99861: unlink i14of16.p0.0 99861: unlink o11of32.p0.0 99861: unlink i22of32.p1.0 99861: unlink o29of32.p1.0 99861: unlink o3of32.p2.0 99861: unlink o26of32.p1.0 99861: unlink o5of32.p0.0 99861: unlink o20of32.p2.0 99861: unlink o5of32.p1.0 99861: unlink i24of32.p2.0 99861: unlink o26of32.p0.0 99861: unlink o29of32.p0.0 99861: unlink i22of32.p0.0 99861: unlink i14of16.p1.0 99861: unlink o11of32.p1.0 99861: unlink o30of32.p2.0 99861: unlink i4of16.p0.0 99861: unlink i4of8.p1.0 99861: unlink i15of32.p1.0 99861: unlink i5of32.p0.0 99861: unlink i2of16.p2.0 99861: unlink i3of32.p2.0 99861: unlink i25of32.p1.0 99861: unlink o21of32.p1.0 99861: unlink o4of32.p2.0 99861: unlink o2of32.p0.0 99861: unlink i14of32.p2.0 99861: unlink o10of32.p2.0 99861: unlink i15of16.p2.0 99861: unlink o31of32.p1.0 99861: unlink i12of32.p0.0 99861: unlink i2of32.p1.0 99861: unlink o16of32.p0.0 99861: unlink i13of16.p0.0 99861: unlink i3of16.p1.0 99861: unlink o19of32.p0.0 99861: unlink o16of32.p2.0 99861: unlink i13of16.p2.0 99861: unlink o19of32.p2.0 99861: unlink i12of32.p2.0 99861: unlink o10of32.p0.0 99861: unlink i15of16.p0.0 99861: unlink i5of16.p1.0 99861: unlink i14of32.p0.0 99861: unlink i4of32.p1.0 99861: unlink o27of32.p1.0 99861: unlink o2of32.p2.0 99861: unlink o28of32.p1.0 99861: unlink i23of32.p1.0 99861: unlink o4of32.p0.0 99861: unlink i5of8.p1.0 99861: unlink i1of8.p1.0 99861: unlink i13of32.p1.0 99861: unlink i3of32.p0.0 99861: unlink i12of16.p1.0 99861: unlink o17of32.p1.0 99861: unlink o18of32.p1.0 99861: unlink i2of16.p0.0 99861: unlink i5of32.p2.0 99861: unlink o30of32.p0.0 99861: unlink i4of16.p2.0 99861: unlink i22of32.p2.0 99861: unlink o26of32.p2.0 99861: unlink o29of32.p2.0 99861: unlink o3of32.p1.0 99861: unlink i24of32.p0.0 99861: unlink o20of32.p0.0 99861: unlink o5of32.p2.0 99861: unlink o20of32.p1.0 99861: unlink i24of32.p1.0 99861: unlink o3of32.p0.0 99861: unlink o30of32.p1.0 99861: unlink i4of8.p2.0 99861: unlink i14of16.p2.0 99861: unlink o11of32.p2.0 99861: unlink i15of32.p2.0 99861: unlink o18of32.p0.0 99861: unlink i2of16.p1.0 99861: unlink i12of16.p0.0 99861: unlink o17of32.p0.0 99861: unlink i3of32.p1.0 99861: unlink i13of32.p0.0 99861: unlink i25of32.p2.0 99861: unlink i1of8.p0.0 99861: unlink o4of32.p1.0 99861: unlink i5of8.p0.0 99861: unlink o21of32.p2.0 99861: unlink i23of32.p0.0 99861: unlink o28of32.p0.0 99861: unlink o27of32.p0.0 99861: unlink i4of32.p0.0 99861: unlink i14of32.p1.0 99861: unlink i5of16.p0.0 99861: unlink o31of32.p2.0 99861: unlink o10of32.p1.0 99861: unlink i15of16.p1.0 99861: unlink i2of32.p2.0 99861: unlink i3of16.p2.0 Patched: 93662: create i3of8.p0.0 93662: create i6of8.p0.0 93662: create i4of8.p0.0 93662: create i5of8.p0.0 93662: create i7of8.p0.0 93662: create i2of8.p0.0 93662: create i1of8.p0.0 93664: create i4of8.p1.0 93663: create i2of8.p2.0 93664: create i6of8.p1.0 93663: create i7of8.p2.0 93664: create i1of8.p1.0 93663: create i3of8.p2.0 93664: create i2of8.p1.0 93663: create i4of8.p2.0 93664: create i7of8.p1.0 93664: create i5of8.p1.0 93663: create i5of8.p2.0 93664: create i3of8.p1.0 93663: create i6of8.p2.0 93663: create i1of8.p2.0 93664: create i8of16.p1.0 93662: create i8of16.p0.0 93663: create i8of16.p2.0 93662: create i9of16.p0.0 93664: create i1of16.p1.0 93663: create i1of16.p2.0 93664: create i9of16.p1.0 93662: create i1of16.p0.0 93663: create i9of16.p2.0 93663: create i10of16.p2.0 93664: create i10of16.p1.0 93663: create i2of16.p2.0 93664: create i2of16.p1.0 93662: create i2of16.p0.0 93662: create i10of16.p0.0 93663: create i11of16.p2.0 93663: create i3of16.p2.0 93664: create i3of16.p1.0 93664: create i11of16.p1.0 93662: create i3of16.p0.0 93662: create i11of16.p0.0 93662: create i12of16.p0.0 93664: create i12of16.p1.0 93663: create i12of16.p2.0 93664: create i4of16.p1.0 93662: create i4of16.p0.0 93663: create i4of16.p2.0 93664: create i5of16.p1.0 93663: create i5of16.p2.0 93662: create i5of16.p0.0 93664: create i13of16.p1.0 93662: create i13of16.p0.0 93663: create i13of16.p2.0 93664: create i6of16.p1.0 93664: create i14of16.p1.0 93663: create i6of16.p2.0 93663: create i14of16.p2.0 93662: create i14of16.p0.0 93662: create i6of16.p0.0 93662: create i7of16.p0.0 93663: create i15of16.p2.0 93662: create i15of16.p0.0 93663: create i7of16.p2.0 93664: create i15of16.p1.0 93664: create i7of16.p1.0 93664: unlink i1of8.p1.0 93663: unlink i1of8.p2.0 93662: unlink i1of8.p0.0 93664: unlink i2of8.p1.0 93663: unlink i2of8.p2.0 93662: unlink i2of8.p0.0 93664: unlink i3of8.p1.0 93663: unlink i3of8.p2.0 93664: unlink i4of8.p1.0 93662: unlink i3of8.p0.0 93664: unlink i5of8.p1.0 93663: unlink i4of8.p2.0 93662: unlink i4of8.p0.0 93664: unlink i6of8.p1.0 93663: unlink i5of8.p2.0 93664: unlink i7of8.p1.0 93662: unlink i5of8.p0.0 93663: unlink i6of8.p2.0 93662: unlink i6of8.p0.0 93663: unlink i7of8.p2.0 93662: unlink i7of8.p0.0 93664: create i16of32.p1.0 93663: create i16of32.p2.0 93662: create i16of32.p0.0 93663: create i1of32.p2.0 93664: create i17of32.p1.0 93663: create i17of32.p2.0 93664: create i1of32.p1.0 93662: create i1of32.p0.0 93662: create i17of32.p0.0 93663: create i18of32.p2.0 93663: create i2of32.p2.0 93664: create i18of32.p1.0 93664: create i2of32.p1.0 93662: create i2of32.p0.0 93662: create i18of32.p0.0 93663: create i3of32.p2.0 93663: create i19of32.p2.0 93662: create i19of32.p0.0 93664: create i19of32.p1.0 93662: create i3of32.p0.0 93664: create i3of32.p1.0 93663: create i4of32.p2.0 93663: create i20of32.p2.0 93664: create i4of32.p1.0 93662: create i20of32.p0.0 93664: create i20of32.p1.0 93662: create i4of32.p0.0 93664: create i5of32.p1.0 93664: create i21of32.p1.0 93662: create i21of32.p0.0 93663: create i21of32.p2.0 93663: create i5of32.p2.0 93662: create i5of32.p0.0 93664: create i22of32.p1.0 93663: create i6of32.p2.0 93664: create i6of32.p1.0 93663: create i22of32.p2.0 93662: create i22of32.p0.0 93662: create i6of32.p0.0 93664: create i7of32.p1.0 93664: create i23of32.p1.0 93663: create i7of32.p2.0 93662: create i7of32.p0.0 93663: create i23of32.p2.0 93662: create i23of32.p0.0 93664: create i24of32.p1.0 93662: create i24of32.p0.0 93664: create i8of32.p1.0 93663: create i8of32.p2.0 93662: create i8of32.p0.0 93663: create i24of32.p2.0 93663: create i9of32.p2.0 93664: create i25of32.p1.0 93663: create i25of32.p2.0 93662: create i9of32.p0.0 93664: create i9of32.p1.0 93662: create i25of32.p0.0 93663: create i26of32.p2.0 93663: create i10of32.p2.0 93664: create i26of32.p1.0 93664: create i10of32.p1.0 93662: create i26of32.p0.0 93662: create i10of32.p0.0 93662: create i11of32.p0.0 93664: create i11of32.p1.0 93662: create i27of32.p0.0 93663: create i27of32.p2.0 93664: create i27of32.p1.0 93663: create i11of32.p2.0 93663: create i28of32.p2.0 93664: create i28of32.p1.0 93663: create i12of32.p2.0 93664: create i12of32.p1.0 93662: create i28of32.p0.0 93662: create i12of32.p0.0 93664: create i29of32.p1.0 93664: create i13of32.p1.0 93663: create i13of32.p2.0 93663: create i29of32.p2.0 93662: create i13of32.p0.0 93662: create i29of32.p0.0 93663: create i30of32.p2.0 93664: create i30of32.p1.0 93663: create i14of32.p2.0 93664: create i14of32.p1.0 93662: create i30of32.p0.0 93662: create i14of32.p0.0 93664: create i31of32.p1.0 93663: create i31of32.p2.0 93664: create i15of32.p1.0 93663: create i15of32.p2.0 93662: create i31of32.p0.0 93662: create i15of32.p0.0 93664: unlink i1of16.p1.0 93663: unlink i1of16.p2.0 93662: unlink i1of16.p0.0 93664: unlink i2of16.p1.0 93663: unlink i2of16.p2.0 93664: unlink i3of16.p1.0 93662: unlink i2of16.p0.0 93663: unlink i3of16.p2.0 93664: unlink i4of16.p1.0 93662: unlink i3of16.p0.0 93663: unlink i4of16.p2.0 93664: unlink i5of16.p1.0 93663: unlink i5of16.p2.0 93664: unlink i6of16.p1.0 93663: unlink i6of16.p2.0 93662: unlink i4of16.p0.0 93664: unlink i7of16.p1.0 93663: unlink i7of16.p2.0 93662: unlink i5of16.p0.0 93664: unlink i8of16.p1.0 93663: unlink i8of16.p2.0 93664: unlink i9of16.p1.0 93662: unlink i6of16.p0.0 93663: unlink i9of16.p2.0 93664: unlink i10of16.p1.0 93662: unlink i7of16.p0.0 93663: unlink i10of16.p2.0 93664: unlink i11of16.p1.0 93663: unlink i11of16.p2.0 93664: unlink i12of16.p1.0 93662: unlink i8of16.p0.0 93664: unlink i13of16.p1.0 93663: unlink i12of16.p2.0 93662: unlink i9of16.p0.0 93664: unlink i14of16.p1.0 93663: unlink i13of16.p2.0 93662: unlink i10of16.p0.0 93664: unlink i15of16.p1.0 93663: unlink i14of16.p2.0 93662: unlink i11of16.p0.0 93663: unlink i15of16.p2.0 93662: unlink i12of16.p0.0 93662: unlink i13of16.p0.0 93662: unlink i14of16.p0.0 93662: unlink i15of16.p0.0 93664: create o19of32.p1.0 93663: create o19of32.p2.0 93662: create o23of32.p0.0 93664: create o30of32.p1.0 93663: create o20of32.p2.0 93662: create o26of32.p0.0 93664: create o28of32.p1.0 93663: create o25of32.p2.0 93664: create o21of32.p1.0 93662: create o7of32.p0.0 93663: create o15of32.p2.0 93664: create o8of32.p1.0 93663: create o2of32.p2.0 93662: create o3of32.p0.0 93664: create o20of32.p1.0 93663: create o8of32.p2.0 93662: create o18of32.p0.0 93664: create o14of32.p1.0 93663: create o24of32.p2.0 93662: create o24of32.p0.0 93664: create o12of32.p1.0 93663: create o23of32.p2.0 93664: create o7of32.p1.0 93662: create o10of32.p0.0 93663: create o10of32.p2.0 93664: create o24of32.p1.0 93662: create o22of32.p0.0 93663: create o6of32.p2.0 93664: create o11of32.p1.0 93663: create o17of32.p2.0 93662: create o12of32.p0.0 93664: create o31of32.p1.0 93663: create o4of32.p2.0 93664: create o2of32.p1.0 93662: create o30of32.p0.0 93663: create o11of32.p2.0 93664: create o22of32.p1.0 93663: create o1of32.p2.0 93662: create o14of32.p0.0 93664: create o0of32.p1.0 93663: create o12of32.p2.0 93662: create o29of32.p0.0 93664: create o6of32.p1.0 93663: create o13of32.p2.0 93664: create o18of32.p1.0 93663: create o21of32.p2.0 93662: create o8of32.p0.0 93664: create o5of32.p1.0 93663: create o16of32.p2.0 93662: create o15of32.p0.0 93664: create o4of32.p1.0 93663: create o29of32.p2.0 93662: create o2of32.p0.0 93664: create o9of32.p1.0 93663: create o3of32.p2.0 93662: create o28of32.p0.0 93664: create o27of32.p1.0 93663: create o5of32.p2.0 93662: create o31of32.p0.0 93664: create o26of32.p1.0 93663: create o30of32.p2.0 93662: create o0of32.p0.0 93664: create o3of32.p1.0 93663: create o14of32.p2.0 93664: create o25of32.p1.0 93663: create o0of32.p2.0 93662: create o13of32.p0.0 93664: create o1of32.p1.0 93663: create o28of32.p2.0 93662: create o5of32.p0.0 93664: create o17of32.p1.0 93663: create o26of32.p2.0 93664: create o23of32.p1.0 93662: create o21of32.p0.0 93663: create o18of32.p2.0 93664: create o13of32.p1.0 93662: create o20of32.p0.0 93663: create o31of32.p2.0 93664: create o16of32.p1.0 93662: create o1of32.p0.0 93663: create o27of32.p2.0 93664: create o15of32.p1.0 93662: create o16of32.p0.0 93663: create o7of32.p2.0 93664: create o29of32.p1.0 93662: create o19of32.p0.0 93663: create o22of32.p2.0 93664: create o10of32.p1.0 93662: create o6of32.p0.0 93663: create o9of32.p2.0 93662: create o9of32.p0.0 93662: create o11of32.p0.0 93662: create o4of32.p0.0 93662: create o27of32.p0.0 93662: create o17of32.p0.0 93662: create o25of32.p0.0 93662: unlink i9of32.p0.0 93662: unlink i6of32.p0.0 93662: unlink o23of32.p0.0 93662: unlink i28of32.p0.0 93662: unlink i27of32.p0.0 93662: unlink o25of32.p2.0 93662: unlink i21of32.p2.0 93662: unlink o9of32.p1.0 93662: unlink o6of32.p1.0 93662: unlink o14of32.p1.0 93662: unlink i31of32.p2.0 93662: unlink i10of32.p1.0 93662: unlink i20of32.p1.0 93662: unlink i1of32.p1.0 93662: unlink o24of32.p1.0 93662: unlink o8of32.p2.0 93662: unlink i18of32.p0.0 93662: unlink o7of32.p2.0 93662: unlink i17of32.p0.0 93662: unlink o13of32.p0.0 93662: unlink i30of32.p1.0 93662: unlink i11of32.p2.0 93662: unlink o1of32.p0.0 93662: unlink o15of32.p2.0 93662: unlink o1of32.p1.0 93662: unlink i30of32.p0.0 93662: unlink o13of32.p1.0 93662: unlink i17of32.p1.0 93662: unlink i18of32.p1.0 93662: unlink i1of32.p0.0 93662: unlink o24of32.p0.0 93662: unlink i20of32.p0.0 93662: unlink o22of32.p2.0 93662: unlink i7of32.p2.0 93662: unlink i8of32.p2.0 93662: unlink i26of32.p2.0 93662: unlink i29of32.p2.0 93662: unlink o0of32.p2.0 93662: unlink i10of32.p0.0 93662: unlink o14of32.p0.0 93662: unlink i16of32.p2.0 93662: unlink o6of32.p0.0 93662: unlink i19of32.p2.0 93662: unlink o9of32.p0.0 93662: unlink o12of32.p2.0 93662: unlink i27of32.p1.0 93662: unlink i28of32.p1.0 93662: unlink i6of32.p1.0 93662: unlink o23of32.p1.0 93662: unlink i9of32.p1.0 93662: unlink o25of32.p1.0 93662: unlink i21of32.p1.0 93662: unlink o12of32.p0.0 93662: unlink i16of32.p0.0 93662: unlink o6of32.p2.0 93662: unlink i19of32.p0.0 93662: unlink o9of32.p2.0 93662: unlink o14of32.p2.0 93662: unlink o0of32.p0.0 93662: unlink i10of32.p2.0 93662: unlink i31of32.p1.0 93662: unlink i26of32.p0.0 93662: unlink i29of32.p0.0 93662: unlink o22of32.p0.0 93662: unlink i7of32.p0.0 93662: unlink i8of32.p0.0 93662: unlink i20of32.p2.0 93662: unlink i1of32.p2.0 93662: unlink o24of32.p2.0 93662: unlink o7of32.p1.0 93662: unlink o8of32.p1.0 93662: unlink i11of32.p1.0 93662: unlink i30of32.p2.0 93662: unlink o15of32.p1.0 93662: unlink o15of32.p0.0 93662: unlink i11of32.p0.0 93662: unlink o1of32.p2.0 93662: unlink o13of32.p2.0 93662: unlink o8of32.p0.0 93662: unlink i18of32.p2.0 93662: unlink o7of32.p0.0 93662: unlink i17of32.p2.0 93662: unlink i8of32.p1.0 93662: unlink o22of32.p1.0 93662: unlink i7of32.p1.0 93662: unlink i29of32.p1.0 93662: unlink i26of32.p1.0 93662: unlink i31of32.p0.0 93662: unlink o0of32.p1.0 93662: unlink i19of32.p1.0 93662: unlink i16of32.p1.0 93662: unlink o12of32.p1.0 93662: unlink i21of32.p0.0 93662: unlink o25of32.p0.0 93662: unlink i28of32.p2.0 93662: unlink i27of32.p2.0 93662: unlink i9of32.p2.0 93662: unlink i6of32.p2.0 93662: unlink o23of32.p2.0 93662: unlink i15of32.p1.0 93662: unlink o30of32.p2.0 93662: unlink o11of32.p1.0 93662: unlink o3of32.p1.0 93662: unlink i24of32.p2.0 93662: unlink i5of32.p2.0 93662: unlink o20of32.p2.0 93662: unlink i22of32.p0.0 93662: unlink o29of32.p0.0 93662: unlink o26of32.p0.0 93662: unlink i3of32.p0.0 93662: unlink o31of32.p1.0 93662: unlink o10of32.p2.0 93662: unlink o4of32.p0.0 93662: unlink i14of32.p2.0 93662: unlink o19of32.p0.0 93662: unlink o16of32.p0.0 93662: unlink i12of32.p0.0 93662: unlink o2of32.p2.0 93662: unlink o21of32.p1.0 93662: unlink i4of32.p1.0 93662: unlink i25of32.p1.0 93662: unlink i23of32.p2.0 93662: unlink i2of32.p2.0 93662: unlink o27of32.p2.0 93662: unlink o28of32.p2.0 93662: unlink i25of32.p0.0 93662: unlink o21of32.p0.0 93662: unlink i4of32.p0.0 93662: unlink i12of32.p1.0 93662: unlink o16of32.p1.0 93662: unlink o19of32.p1.0 93662: unlink o4of32.p1.0 93662: unlink o31of32.p0.0 93662: unlink o26of32.p1.0 93662: unlink i3of32.p1.0 93662: unlink o29of32.p1.0 93662: unlink i22of32.p1.0 93662: unlink o17of32.p2.0 93662: unlink o18of32.p2.0 93662: unlink o3of32.p0.0 93662: unlink i13of32.p2.0 93662: unlink o11of32.p0.0 93662: unlink i15of32.p0.0 93662: unlink o5of32.p2.0 93662: unlink i15of32.p2.0 93662: unlink o5of32.p0.0 93662: unlink o11of32.p2.0 93662: unlink o30of32.p1.0 93662: unlink o3of32.p2.0 93662: unlink i13of32.p0.0 93662: unlink o17of32.p0.0 93662: unlink o18of32.p0.0 93662: unlink i24of32.p1.0 93662: unlink i5of32.p1.0 93662: unlink o20of32.p1.0 93662: unlink o10of32.p1.0 93662: unlink o31of32.p2.0 93662: unlink i14of32.p1.0 93662: unlink o2of32.p1.0 93662: unlink o21of32.p2.0 93662: unlink i4of32.p2.0 93662: unlink i25of32.p2.0 93662: unlink i2of32.p0.0 93662: unlink o27of32.p0.0 93662: unlink o28of32.p0.0 93662: unlink i23of32.p0.0 93662: unlink i23of32.p1.0 93662: unlink o28of32.p1.0 93662: unlink i2of32.p1.0 93662: unlink o27of32.p1.0 93662: unlink i12of32.p2.0 93662: unlink o2of32.p0.0 93662: unlink o19of32.p2.0 93662: unlink o16of32.p2.0 93662: unlink o4of32.p2.0 93662: unlink i14of32.p0.0 93662: unlink o10of32.p0.0 93662: unlink o29of32.p2.0 93662: unlink o26of32.p2.0 93662: unlink i3of32.p2.0 93662: unlink i22of32.p2.0 93662: unlink i5of32.p0.0 93662: unlink o20of32.p0.0 93662: unlink i24of32.p0.0 93662: unlink o18of32.p1.0 93662: unlink o17of32.p1.0 93662: unlink i13of32.p1.0 93662: unlink o30of32.p0.0 93662: unlink o5of32.p1.0
Attachment
Hi, Thanks for working on this! On Wed, 10 May 2023 15:11:20 +1200 Thomas Munro <thomas.munro@gmail.com> wrote: > One complaint about PHJ is that it can, in rare cases, use a > surprising amount of temporary disk space where non-parallel HJ would > not. When it decides that it needs to double the number of batches to > try to fit each inner batch into memory, and then again and again > depending on your level of bad luck, it leaves behind all the earlier > generations of inner batch files to be cleaned up at the end of the > query. That's stupid. Here's a patch to unlink them sooner, as a > small improvement. This patch can indeed save a decent amount of temporary disk space. Considering its complexity is (currently?) quite low, it worth it. > The reason I didn't do this earlier is that sharedtuplestore.c > continues the pre-existing tradition where each parallel process > counts what it writes against its own temp_file_limit. At the time I > thought I'd need to have one process unlink all the files, but if a > process were to unlink files that it didn't create, that accounting > system would break. Without some new kind of shared temp_file_limit > mechanism that doesn't currently exist, per-process counters could go > negative, creating free money. In the attached patch, I realised > something that I'd missed before: there is a safe point for each > backend to unlink just the files that it created, and there is no way > for a process that created files not to reach that point. Indeed. For what it worth, from my new and non-experienced understanding of the parallel mechanism, waiting for all workers to reach WAIT_EVENT_HASH_GROW_BATCHES_REPARTITION, after re-dispatching old batches in new ones, seems like a safe place to instruct each workers to clean their old temp files. > Here's an example query that tries 8, 16 and then 32 batches on my > machine, because reltuples is clobbered with a bogus value. Nice! Regards,
On 11/05/2023 00:00, Jehan-Guillaume de Rorthais wrote: > On Wed, 10 May 2023 15:11:20 +1200 > Thomas Munro <thomas.munro@gmail.com> wrote: >> The reason I didn't do this earlier is that sharedtuplestore.c >> continues the pre-existing tradition where each parallel process >> counts what it writes against its own temp_file_limit. At the time I >> thought I'd need to have one process unlink all the files, but if a >> process were to unlink files that it didn't create, that accounting >> system would break. Without some new kind of shared temp_file_limit >> mechanism that doesn't currently exist, per-process counters could go >> negative, creating free money. In the attached patch, I realised >> something that I'd missed before: there is a safe point for each >> backend to unlink just the files that it created, and there is no way >> for a process that created files not to reach that point. > > Indeed. > > For what it worth, from my new and non-experienced understanding of the > parallel mechanism, waiting for all workers to reach > WAIT_EVENT_HASH_GROW_BATCHES_REPARTITION, after re-dispatching old batches in > new ones, seems like a safe place to instruct each workers to clean their old > temp files. Looks good to me too at a quick glance. There's this one "XXX free" comment though: > for (int i = 1; i < old_nbatch; ++i) > { > ParallelHashJoinBatch *shared = > NthParallelHashJoinBatch(old_batches, i); > SharedTuplestoreAccessor *accessor; > > accessor = sts_attach(ParallelHashJoinBatchInner(shared), > ParallelWorkerNumber + 1, > &pstate->fileset); > sts_dispose(accessor); > /* XXX free */ > } I think that's referring to the fact that sts_dispose() doesn't free the 'accessor', or any of the buffers etc. that it contains. That's a pre-existing problem, though: ExecParallelHashRepartitionRest() already leaks the SharedTuplestoreAccessor structs and their buffers etc. of the old batches. I'm a little surprised there isn't aready an sts_free() function. Another thought is that it's a bit silly to have to call sts_attach() just to delete the files. Maybe sts_dispose() should take the same three arguments that sts_attach() does, instead. So that freeing would be nice to tidy up, although the amount of memory leaked is tiny so might not be worth it, and it's a pre-existing issue. I'm marking this as Ready for Committer. -- Heikki Linnakangas Neon (https://neon.tech)
On Wed, Sep 27, 2023 at 11:42 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote: > > Looks good to me too at a quick glance. There's this one "XXX free" > comment though: > > > for (int i = 1; i < old_nbatch; ++i) > > { > > ParallelHashJoinBatch *shared = > > NthParallelHashJoinBatch(old_batches, i); > > SharedTuplestoreAccessor *accessor; > > > > accessor = sts_attach(ParallelHashJoinBatchInner(shared), > > ParallelWorkerNumber + 1, > > &pstate->fileset); > > sts_dispose(accessor); > > /* XXX free */ > > } > > I think that's referring to the fact that sts_dispose() doesn't free the > 'accessor', or any of the buffers etc. that it contains. That's a > pre-existing problem, though: ExecParallelHashRepartitionRest() already > leaks the SharedTuplestoreAccessor structs and their buffers etc. of the > old batches. I'm a little surprised there isn't aready an sts_free() > function. > > Another thought is that it's a bit silly to have to call sts_attach() > just to delete the files. Maybe sts_dispose() should take the same three > arguments that sts_attach() does, instead. > > So that freeing would be nice to tidy up, although the amount of memory > leaked is tiny so might not be worth it, and it's a pre-existing issue. > I'm marking this as Ready for Committer. (I thought I'd go around and nudge CF entries where both author and reviewer are committers.) Hi Thomas, do you have any additional thoughts on the above?
Hi, I see in [1] that the reporter mentioned a delay between the error message in parallel HashJoin and the return control back from PSQL. Your patch might reduce this delay. Also, I have the same complaint from users who processed gigabytes of data in parallel HashJoin. Presumably, they also stuck into the unlink of tons of temporary files. So, are you going to do something with this code? [1] https://www.postgresql.org/message-id/18349-83d33dd3d0c855c3%40postgresql.org -- regards, Andrei Lepikhov Postgres Professional
On Wed, Feb 21, 2024 at 7:34 PM Andrei Lepikhov <a.lepikhov@postgrespro.ru> wrote: > I see in [1] that the reporter mentioned a delay between the error > message in parallel HashJoin and the return control back from PSQL. Your > patch might reduce this delay. > Also, I have the same complaint from users who processed gigabytes of > data in parallel HashJoin. Presumably, they also stuck into the unlink > of tons of temporary files. So, are you going to do something with this > code? Yeah, right. I will aim to get this into the tree next week. First, there are a couple of minor issues to resolve around freeing that Heikki mentioned. Then there is the question of whether we think this might be a candidate for back-patching, given the complaints you mention. Opinions? I would add that the problems you reach when you get to very large number of partitions are hard (see several very long threads about extreme skew for one version of the problem, but even with zero/normal skewness and perfect estimation of the number of partitions, if you ask a computer to partition 42TB of data into partitions that fit in a work_mem suitable for a Commodore 64, it's gonna hurt on several levels) and this would only slightly improve one symptom. One idea that might improve just the directory entry and file descriptor aspect, would be to scatter the partitions into (say) 1MB chunks within the file, and hope that the file system supports holes (a bit like logtape.c's multiplexing but I wouldn't do it quite like that).
On 22/2/2024 06:42, Thomas Munro wrote: > On Wed, Feb 21, 2024 at 7:34 PM Andrei Lepikhov > <a.lepikhov@postgrespro.ru> wrote: >> I see in [1] that the reporter mentioned a delay between the error >> message in parallel HashJoin and the return control back from PSQL. Your >> patch might reduce this delay. >> Also, I have the same complaint from users who processed gigabytes of >> data in parallel HashJoin. Presumably, they also stuck into the unlink >> of tons of temporary files. So, are you going to do something with this >> code? > > Yeah, right. I will aim to get this into the tree next week. First, > there are a couple of minor issues to resolve around freeing that > Heikki mentioned. Then there is the question of whether we think this > might be a candidate for back-patching, given the complaints you > mention. Opinions? The code is related to performance, not a bug. Also, it adds one external function into the 'sharedtuplestore.h'. IMO, it isn't worth it to make back-patches. > > I would add that the problems you reach when you get to very large > number of partitions are hard (see several very long threads about > extreme skew for one version of the problem, but even with zero/normal > skewness and perfect estimation of the number of partitions, if you > ask a computer to partition 42TB of data into partitions that fit in a > work_mem suitable for a Commodore 64, it's gonna hurt on several > levels) and this would only slightly improve one symptom. One idea > that might improve just the directory entry and file descriptor > aspect, would be to scatter the partitions into (say) 1MB chunks > within the file, and hope that the file system supports holes (a bit > like logtape.c's multiplexing but I wouldn't do it quite like that). Thanks, I found in [1] good entry point to dive into this issue. [1] https://www.postgresql.org/message-id/CA+hUKGKDbv+5uiJZDdB1wttkMPFs9CDb6=02Qxitq4am-KBM_A@mail.gmail.com -- regards, Andrei Lepikhov Postgres Professional
On Thu, Feb 22, 2024 at 5:37 PM Andrei Lepikhov <a.lepikhov@postgrespro.ru> wrote: > On 22/2/2024 06:42, Thomas Munro wrote: > > extreme skew for one version of the problem, but even with zero/normal > > skewness and perfect estimation of the number of partitions, if you Sorry, I meant to write "but even with no duplicates" there (mention of "normal" was brain fade).
On Wed, Feb 21, 2024 at 6:42 PM Thomas Munro <thomas.munro@gmail.com> wrote: > Yeah, right. I will aim to get this into the tree next week. First, > there are a couple of minor issues to resolve around freeing that > Heikki mentioned. Then there is the question of whether we think this > might be a candidate for back-patching, given the complaints you > mention. Opinions? It doesn't appear to me that this got committed. On the procedural question, I would personally treat it as a non-back-patchable bug fix i.e. master-only but without regard to feature freeze. However, I can see arguments for either treating it as a back-patchable fix or for waiting until v18 development opens. What would you like to do? -- Robert Haas EDB: http://www.enterprisedb.com