Re: Optimizing ResouceOwner to speed up COPY - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: Optimizing ResouceOwner to speed up COPY
Date
Msg-id c8ade815-127b-4a78-a481-a3f47ebbeec0@vondra.me
Whole thread Raw
In response to Re: Optimizing ResouceOwner to speed up COPY  (Heikki Linnakangas <hlinnaka@iki.fi>)
Responses Re: Optimizing ResouceOwner to speed up COPY
List pgsql-hackers

On 10/17/25 10:31, Heikki Linnakangas wrote:
> On 17/10/2025 06:13, Chao Li wrote:
>> ```
>> @@ -250,12 +257,21 @@ ResourceOwnerAddToHash(ResourceOwner owner,
>> Datum value, const ResourceOwnerDesc
>>       idx = hash_resource_elem(value, kind) & mask;
>>       for (;;)
>>       {
>> +        /* found an exact match - just increment the counter */
>> +        if ((owner->hash[idx].kind == kind) &&
>> +            (owner->hash[idx].item == value))
>> +        {
>> +            owner->hash[idx].count += count;
>> +            return;
>> +        }
>> +
>>           if (owner->hash[idx].kind == NULL)
>>               break;                /* found a free slot */
>>           idx = (idx + 1) & mask;
>>       }
>> ```
>>
>> I think this is the “trade-off” you mention in your email: if a free
>> slot found earlier, then still gets duplicated entries. I have no
>> concern to this “trade-off”, but please add a comment here, otherwise
>> future readers may be confused, and might potentially consider that
>> were a bug, without reading your original design email.
> 
> +1 on such a comment. Here or maybe on the ResourceElem struct itself.
> 

Will do. It's probably appropriate to mention this deduplication in the
comment at the beginning of the file.

>>  typedef struct ResourceElem
>>  {
>>      Datum        item;
>> +    uint32        count;                /* number of occurrences */
>>      const ResourceOwnerDesc *kind;    /* NULL indicates a free hash
>> table slot */
>>  } ResourceElem;
> 
> Hmm, the 'count' is not used when the entry is stored in the array.
> Perhaps we should have a separate struct for array and hash elements
> now. Keeping the array small helps it to fit in CPU caches.
> 

Agreed. I had the same idea yesterday, but I haven't done it yet.

>> /*
>>  * Forget that an object is owned by a ResourceOwner
>>  *
>>  * Note: If same resource ID is associated with the ResourceOwner more
>> than
>>  * once, one instance is removed.
>>  *
>>  * Note: Forgetting a resource does not guarantee that there is room to
>>  * remember a new resource.  One exception is when you forget the most
>>  * recently remembered resource; that does make room for a new
>> remember call.
>>  * Some code callers rely on that exception.
>>  */
>> void
>> ResourceOwnerForget(ResourceOwner owner, Datum value, const
>> ResourceOwnerDesc *kind)
> 
> Does this break the exception noted in the above comment? I guess it
> still holds because we don't deduplicate entries in the array. That's
> very subtle, needs a comment at least.
> 

Right, it doesn't break the exception - as you say, it only applies to
the hash, not to the array. I'll add a comment.

Thanks

-- 
Tomas Vondra




pgsql-hackers by date:

Previous
From: Fujii Masao
Date:
Subject: Re: pg_restore --no-policies should not restore policies' comment
Next
From: Dmitry Dolgov
Date:
Subject: Re: Changing shared_buffers without restart