Re: pg_dump versus hash partitioning - Mailing list pgsql-hackers
From | Robert Haas
---|---
Subject | Re: pg_dump versus hash partitioning
Date |
Msg-id | CA+TgmoaZ8kDKiQMtDq82Qqqgc5JAfF_UEvc2qa46QwVVLvtMDw@mail.gmail.com
In response to | pg_dump versus hash partitioning (Tom Lane <tgl@sss.pgh.pa.us>)
Responses | Re: pg_dump versus hash partitioning; Re: pg_dump versus hash partitioning
List | pgsql-hackers
On Wed, Feb 1, 2023 at 11:18 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Over at [1] we have a complaint that dump-and-restore fails for
> hash-partitioned tables if a partitioning column is an enum,
> because the enum values are unlikely to receive the same OIDs
> in the destination database as they had in the source, and the
> hash codes are dependent on those OIDs.

It seems to me that this is the root of the problem. We can't expect to hash on something that's not present in the dump file and have anything work.

> So here's what I think we should actually do: make
> --load-via-partition-root the default. We can provide a
> switch to turn it off, for those who are brave or foolish
> enough to risk that in the name of saving a few cycles,
> but it ought to be the default.
>
> Furthermore, I think we should make this happen in time for
> next week's releases. I can write the patch easily enough,
> but we need a consensus PDQ that this is what to do.

This seems extremely precipitous to me and I'm against it.

I like the fact that we have --load-via-partition-root, but it is a bit of a hack. You don't get a single COPY into the partition root; you get one per child table -- and those COPY statements are listed as data for the partitions where the data lives now, not for the parent table. I am not completely sure whether there is a scenario where that's an issue, but it's certainly an oddity. Also, and I think pretty significantly, using --load-via-partition-root forces you to pay the overhead of rerouting every tuple to the target partition whether you need it or not, which is potentially a large unnecessary expense. I don't think we should just foist that kind of overhead onto everyone in every situation for every data type because somebody had a problem in a certain case. And even if we do decide to do that at some point, I don't think it is right at all to rush such a change out on a short time scale, with little time to mull over consequences and alternative fixes. I think that could easily hurt more people than it helps.

I think that not all of the cases that you list are of the same type. Loading a dump under a different encoding or on a machine of a different endianness is surely a corner case. It might come up for some people occasionally, but it's not typical. In the case of endianness, that's because little-endian has pretty much taken over the world; in the case of encoding, that's because converting data between encodings is a real pain, and combining that with a database dump and restore is likely to be very little fun. It's hard to argue that collation changes fall into the same category: we know that they get changed all the time, often silently. But none of us database geeks think that's a good thing: just that it's a thing that we have to deal with.

The enum case seems completely different to me. That's not the result of trying to migrate your data to another architecture, or of the glibc maintainers not believing in sorting working the same way on Tuesday that it did on Monday. That's the result of the PostgreSQL project hashing data in a way that does not make any real sense for the application at hand. Any hash function that we use for partitioning has to work based on data that is preserved by the dump-and-restore process.
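To make that concrete, a minimal sketch of the failure mode under discussion (object names here are illustrative, not taken from the thread):

```sql
-- Hash partitioning on an enum column.  The hash code for an enum
-- value is computed from its OID, which is assigned at CREATE TYPE
-- time and is not preserved by pg_dump.
CREATE TYPE mood AS ENUM ('sad', 'ok', 'happy');

CREATE TABLE t (m mood) PARTITION BY HASH (m);
CREATE TABLE t_p0 PARTITION OF t FOR VALUES WITH (MODULUS 2, REMAINDER 0);
CREATE TABLE t_p1 PARTITION OF t FOR VALUES WITH (MODULUS 2, REMAINDER 1);

INSERT INTO t VALUES ('happy');  -- routed by the hash of the OID of 'happy'

-- A default-mode dump emits COPY directly into whichever leaf holds
-- the row in the source database.  After restore, the re-created enum
-- values generally have different OIDs, so 'happy' may now belong in
-- the other partition, and the COPY fails the leaf's partition
-- constraint.
```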
I would argue that the float case is not of the same kind: yes, if you round your data off, then the values are going to hash differently, but if you truncate your strings, those will hash differently too. Duh. Intentionally changing the value is supposed to change the hash code; that's kind of the point of hashing.

So I think we should be asking ourselves what we could do about the enum case specifically, rather than resorting to a bazooka.
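As a sketch of the distinction at the SQL level (illustrative only, not a proposal from this message): the label is what pg_dump preserves, while the hash is computed from the locally assigned OID.

```sql
-- hashenum() hashes the enum value's OID, so its result is not stable
-- across dump and restore; a hash of the label text would be.
-- Assumes the 'mood' type from the earlier sketch.
SELECT hashenum('happy'::mood);        -- depends on the local OID
SELECT hashtext('happy'::mood::text);  -- depends only on the label
```

Any label-based scheme would of course raise its own compatibility questions for existing hash-partitioned tables; the snippet only illustrates what hashing on dump-preserved data would mean.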
--
Robert Haas
EDB: http://www.enterprisedb.com