Re: pgsql: Introduce pg_shmem_allocations_numa view - Mailing list pgsql-hackers

From Christoph Berg
Subject Re: pgsql: Introduce pg_shmem_allocations_numa view
Date
Msg-id aFq5HQj016rVS2lm@msg.df7cb.de
Whole thread Raw
In response to Re: pgsql: Introduce pg_shmem_allocations_numa view  (Bertrand Drouvot <bertranddrouvot.pg@gmail.com>)
Responses Re: pgsql: Introduce pg_shmem_allocations_numa view
List pgsql-hackers
Re: Bertrand Drouvot
> Yes, I think compat_uptr_t usage is missing in do_pages_stat() (while it's used
> in do_pages_move()).

I was also reading the kernel source around that place but you spotted
the problem before me. This patch resolves the issue here:

diff --git a/mm/migrate.c b/mm/migrate.c
index 8cf0f9c9599..542c81ec3ed 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -2444,7 +2444,13 @@ static int do_pages_stat(struct mm_struct *mm, unsigned long nr_pages,
         if (copy_to_user(status, chunk_status, chunk_nr * sizeof(*status)))
             break;

-        pages += chunk_nr;
+        if (in_compat_syscall()) {
+            compat_uptr_t __user *pages32 = (compat_uptr_t __user *)pages;
+
+            pages32 += chunk_nr;
+            pages = (const void __user * __user *) pages32;
+        } else
+            pages += chunk_nr;
         status += chunk_nr;
         nr_pages -= chunk_nr;
     }


> Having a chunk size <= DO_PAGES_STAT_CHUNK_NR ensures we are not affected
> by the wrong pointer arithmetic.

Good idea. Buggy kernels will be around for some time.

> +               #define NUMA_QUERY_CHUNK_SIZE 16  /* has to be <= DO_PAGES_STAT_CHUNK_NR (do_pages_stat())*/
> +
> +               for (uint64 chunk_start = 0; chunk_start < shm_ent_page_count; chunk_start += NUMA_QUERY_CHUNK_SIZE)
{

Perhaps optimize it to this:

#if sizeof(void *) == 4
#define NUMA_QUERY_CHUNK_SIZE 16  /* has to be <= DO_PAGES_STAT_CHUNK_NR (do_pages_stat())*/
#else
#define NUMA_QUERY_CHUNK_SIZE 1024
#endif

... or some other bigger number.

The loop could also include CHECK_FOR_INTERRUPTS();

> > I don't have much
> > experience with the kernel code, so don't want to rely too much on my
> > interpretation of it.
> 
> I don't have that much experience too but I think the issue is in do_pages_stat()
> and that "pages += chunk_nr" should be advanced by sizeof(compat_uptr_t) instead.

Me neither, but I'll try submit this fix.

Christoph



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Safeguards against incorrect fd flags for fsync()
Next
From: Tomas Vondra
Date:
Subject: Re: pgsql: Introduce pg_shmem_allocations_numa view