Re: [PATCH] Incremental sort (was: PoC: Partial sort) - Mailing list pgsql-hackers

From Justin Pryzby
Subject Re: [PATCH] Incremental sort (was: PoC: Partial sort)
Date
Msg-id 20200312234014.GI29065@telsasoft.com
Whole thread Raw
In response to Re: [PATCH] Incremental sort (was: PoC: Partial sort)  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses Re: [PATCH] Incremental sort (was: PoC: Partial sort)
List pgsql-hackers
Thanks for working on this.  I have some minor comments.

In 0005:

+                /* Restore the input path (we might have addes Sort on top). */

=> added?  There's at least two more of the same typo.

+                /* also ignore already sorted paths */

=> You say that in a couple places, but I don't think "also" makes sense since
there's nothing preceding it ?

In 0004:

+             * end up resorting the entire data set.  So, unless we can push

=> re-sorting

+ * Unlike generate_gather_paths, this does not look just as pathkeys of the

=> look just AT ?

+            /* now we know is_sorted == false */

=> I would just spell that "Assert", as I think you already do elsewhere.

+                /* continue */

=> Please consider saying "fall through", since "continue" means exactly the
opposite.


+generate_useful_gather_paths(PlannerInfo *root, RelOptInfo *rel, bool override_rows)
...
+            /* finally, consider incremental sort */
...
+                /* Also consider incremental sort. */

=> I think it's more confusing than useful with two comments - one is adequate.

In 0002:

+ * If it's EXPLAIN ANALYZE, show tuplesort stats for a incremental sort node
...
+ * make_incrementalsort --- basic routine to build a IncrementalSort plan node

=> AN incremental

+ * Initial size of memtuples array.  We're trying to select this size so that
+ * array don't exceed ALLOCSET_SEPARATE_THRESHOLD and overhead of allocation
+ * be possible less.  However, we don't cosider array sizes less than 1024

Four typos (?)
that array DOESN'T
and THE overhead
CONSIDER
I'm not sure, but "be possible less" should maybe say "possibly be less" ?

+    bool        maxSpaceOnDisk;    /* true when maxSpace is value for on-disk

I suggest to call it IsMaxSpaceDisk

+    MemoryContext maincontext;    /* memory context for tuple sort metadata
+                       that persist across multiple batches */

persists

+ *    a new sort.  It allows evade recreation of tuple sort (and save resources)
+ *    when sorting multiple small batches.

allows to avoid?  Or allows avoiding?

+ *     When performing sorting by multiple keys input dataset could be already
+ *     presorted by some prefix of these keys.  We call them "presorted keys".

"already presorted" sounds redundant

+    int64        fullsort_group_count;    /* number of groups with equal presorted keys */
+    int64        prefixsort_group_count;    /* number of groups with equal presorted keys */

I guess these should have different comments

-- 
Justin



pgsql-hackers by date:

Previous
From: Justin Pryzby
Date:
Subject: Re: Additional size of hash table is alway zero for hash aggregates
Next
From: Michael Paquier
Date:
Subject: Re: [Patch] pg_rewind: options to use restore_command fromrecovery.conf or command line