BUG #17757: Not honoring huge_pages setting during initdb causes DB crash in Kubernetes

Lists: pgsql-bugs
From: PG Bug reporting form <noreply(at)postgresql(dot)org>
To: pgsql-bugs(at)lists(dot)postgresql(dot)org
Cc: david_sisson(at)dell(dot)com
Subject: BUG #17757: Not honoring huge_pages setting during initdb causes DB crash in Kubernetes
Date: 2023-01-20 22:48:09
Message-ID: 17757-dbdfc1f1c954a6db@postgresql.org

The following bug has been logged on the website:

Bug reference: 17757
Logged by: David Angel
Email address: david_sisson(at)dell(dot)com
PostgreSQL version: 14.5
Operating system: Linux
Description:

On an OS where huge pages are enabled, if no hugepages resources are assigned
in Kubernetes and the postgres instance is set to huge_pages = off in the
config, then one would assume that the DB would not use huge pages.
However, because the initdb process uses postgresql.conf.sample or
postgresql.conf.template instead of the actual specified configuration, the
applied setting is actually huge_pages = try during initdb.
In these cases, the initdb phase will attempt to allocate huge pages that
are available in the OS, but it will be denied access by Kubernetes and
fail.

Here is a PR with a possible fix:
http://github.com/postgres/postgres/pull/114/files

Here are some links for further information
Ours: http://github.com/CrunchyData/postgres-operator/issues/3477

Others with the same problem, having no way to disable huge pages:
http://github.com/CrunchyData/postgres-operator/issues/3039
http://github.com/CrunchyData/postgres-operator/issues/2258
http://github.com/CrunchyData/postgres-operator/issues/3126
http://github.com/CrunchyData/postgres-operator/issues/3421

Bitnami
http://github.com/bitnami/charts/issues/7901


From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: david_sisson(at)dell(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org, PG Bug reporting form <noreply(at)postgresql(dot)org>
Subject: Re: BUG #17757: Not honoring huge_pages setting during initdb causes DB crash in Kubernetes
Date: 2023-01-21 23:10:29
Message-ID: 4ce40288-5b88-f9fb-ea11-fc932c61472b@enterprisedb.com

On 1/20/23 23:48, PG Bug reporting form wrote:
> The following bug has been logged on the website:
>
> Bug reference: 17757
> Logged by: David Angel
> Email address: david_sisson(at)dell(dot)com
> PostgreSQL version: 14.5
> Operating system: Linux
> Description:
>
> On an OS where huge pages are enabled, if no hugepages resources are assigned
> in Kubernetes and the postgres instance is set to huge_pages = off in the
> config, then one would assume that the DB would not use huge pages.

There's no config at that point - it's initdb that creates it, by
copying the .sample file, IIRC. So not sure which file you're modifying.

> However, because the initdb process uses postgresql.conf.sample or
> postgresql.conf.template instead of the actual specified configuration, the
> applied setting is actually huge_pages = try during initdb.

Specified how?

> In these cases, the initdb phase will attempt to allocate huge pages that
> are available in the OS, but it will be denied access by Kubernetes and
> fail.

Well, so how exactly does this fail? Does that mean Kubernetes broke mmap()
with MAP_HUGETLB so that it doesn't return MAP_FAILED when hugepages are
not available, or what? Because that's the only explanation I can see,
looking at the code.

Or it just does not realize there are no hugepages, returns something
and then crashes with SIGBUS later when trying to access it?

>
> Here is a PR with a possible fix:
> http://github.com/postgres/postgres/pull/114/files
>

I doubt we want to just go straight to changing the default value for
everyone. IMHO if the "try" logic is somehow broken, we should fix the
try logic, not mess with the defaults.

In the worst case, the operator can probably tweak the .sample config
before calling initdb.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Andres Freund <andres(at)anarazel(dot)de>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: david_sisson(at)dell(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org, PG Bug reporting form <noreply(at)postgresql(dot)org>
Subject: Re: BUG #17757: Not honoring huge_pages setting during initdb causes DB crash in Kubernetes
Date: 2023-01-21 23:29:22
Message-ID: 20230121232922.juo7t3fhaso7qh3s@awork3.anarazel.de

Hi,

On 2023-01-22 00:10:29 +0100, Tomas Vondra wrote:
> On 1/20/23 23:48, PG Bug reporting form wrote:
> > In these cases, the initdb phase will attempt to allocate huge pages that
> > are available in the OS, but it will be denied access by Kubernetes and
> > fail.
>
> Well, so how exactly does this fail? Does that mean Kubernetes broke mmap()
> with MAP_HUGETLB so that it doesn't return MAP_FAILED when hugepages are
> not available, or what? Because that's the only explanation I can see,
> looking at the code.

Yea, that's what I was wondering about as well.

> Or it just does not realize there are no hugepages, returns something
> and then crashes with SIGBUS later when trying to access it?

I assume that that's the case. There are references to bus errors in a bunch of
the linked issues. E.g.
http://github.com/CrunchyData/postgres-operator/issues/413

selecting default max_connections ... sh: line 1: 60 Bus error (core dumped) "/usr/pgsql-10/bin/postgres" --boot -x0 -F -c max_connections=100 -c shared_buffers=1000 -c dynamic_shared_memory_type=none < "/dev/null" > "/dev/null" 2>&1

It's possible that the problem would go away if we used MAP_POPULATE for the
allocation.

I'd guess that this is annoying cgroups stuff :(

> I doubt we want to just go straight to changing the default value for
> everyone. IMHO if the "try" logic is somehow broken, we should fix the
> try logic, not mess with the defaults.

Agreed. But we could disable huge pages explicitly inside initdb - there's
really no point in using it there...

Greetings,

Andres Freund


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: david_sisson(at)dell(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #17757: Not honoring huge_pages setting during initdb causes DB crash in Kubernetes
Date: 2023-01-21 23:30:27
Message-ID: 1113115.1674343827@sss.pgh.pa.us

Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com> writes:
> On 1/20/23 23:48, PG Bug reporting form wrote:
>> Here is a PR with a possible fix:
>> http://github.com/postgres/postgres/pull/114/files

> I doubt we want to just go straight to changing the default value for
> everyone.

Yeah, that proposal is a non-starter. I could see providing an
initdb option to adjust the value applied during initdb, though.

Ideally, maybe what we want is a generalized switch that could
replace any variable in the sample config, along the lines of
the server's "-c foo=bar". I recall having tried to do that and
having run into quoting hazards, but I did not try very hard.
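
A minimal sketch of what such a generalized switch could do to the sample
config text, written in Python for brevity. The names apply_overrides and
quote_value are hypothetical and not part of initdb. The quoting covers only
single quotes and backslashes, so it illustrates the quoting hazards rather
than settling them.

```python
import re


def quote_value(value: str) -> str:
    # postgresql.conf string values are single-quoted; backslashes and
    # embedded single quotes are doubled so they survive parsing.
    return "'" + value.replace("\\", "\\\\").replace("'", "''") + "'"


def apply_overrides(sample_text: str, overrides: dict[str, str]) -> str:
    """Return sample_text with each overridden setting uncommented and set."""
    remaining = dict(overrides)
    out = []
    for line in sample_text.splitlines():
        # Match "name = value" with an optional leading "#", the way
        # settings appear in postgresql.conf.sample.
        m = re.match(r"^\s*#?\s*([A-Za-z_][A-Za-z0-9_.]*)\s*=", line)
        if m and m.group(1) in remaining:
            name = m.group(1)
            out.append(f"{name} = {quote_value(remaining.pop(name))}")
        else:
            out.append(line)
    # Settings that never appeared in the sample are appended at the end.
    for name, value in remaining.items():
        out.append(f"{name} = {quote_value(value)}")
    return "\n".join(out) + "\n"
```

Calling apply_overrides(sample_text, {"huge_pages": "off"}) replaces the
commented-out "#huge_pages = try" line with an active "huge_pages = 'off'"
line and leaves every other line untouched.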

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, david_sisson(at)dell(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #17757: Not honoring huge_pages setting during initdb causes DB crash in Kubernetes
Date: 2023-01-21 23:33:03
Message-ID: 1113361.1674343983@sss.pgh.pa.us

Andres Freund <andres(at)anarazel(dot)de> writes:
> On 2023-01-22 00:10:29 +0100, Tomas Vondra wrote:
>> I doubt we want to just go straight to changing the default value for
>> everyone. IMHO if the "try" logic is somehow broken, we should fix the
>> try logic, not mess with the defaults.

> Agreed. But we could disable huge pages explicitly inside initdb - there's
> really no point in using it there...

One of the things initdb is trying to do is establish a set of values
that is known to allow the server to start. Not using the same settings
that the server is expected to use would break that idea completely.

regards, tom lane


From: Andres Freund <andres(at)anarazel(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, david_sisson(at)dell(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #17757: Not honoring huge_pages setting during initdb causes DB crash in Kubernetes
Date: 2023-01-21 23:45:01
Message-ID: 20230121234501.b7b2skgbdw73ujvh@awork3.anarazel.de

Hi,

On 2023-01-21 18:33:03 -0500, Tom Lane wrote:
> Andres Freund <andres(at)anarazel(dot)de> writes:
> > On 2023-01-22 00:10:29 +0100, Tomas Vondra wrote:
> >> I doubt we want to just go straight to changing the default value for
> >> everyone. IMHO if the "try" logic is somehow broken, we should fix the
> >> try logic, not mess with the defaults.

> > Agreed. But we could disable huge pages explicitly inside initdb - there's
> > really no point in using it there...
>
> One of the things initdb is trying to do is establish a set of values
> that is known to allow the server to start. Not using the same settings
> that the server is expected to use would break that idea completely.

Yea, I'm not saying I like the approach. OTOH, we don't provide a proper way to
influence the configuration, which is bad, as this issue shows.

Perhaps we should add an option to force MAP_POPULATE being used? I'm fairly
certain that'd avoid the SIGBUS in this case. And it'd make sense to ensure
that we can actually use the memory in initdb.

Unfortunately it's not unproblematic to use it in general, because with large
shared_buffers values it can be quite slow, because the kernel initializes the
memory in a single thread. I've seen ~3GB/s on multi-socket machines.

Greetings,

Andres Freund


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, david_sisson(at)dell(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #17757: Not honoring huge_pages setting during initdb causes DB crash in Kubernetes
Date: 2023-01-22 00:08:01
Message-ID: 1116926.1674346081@sss.pgh.pa.us

Andres Freund <andres(at)anarazel(dot)de> writes:
> Perhaps we should add an option to force MAP_POPULATE being used? I'm fairly
> certain that'd avoid the SIGBUS in this case. And it'd make sense to ensure
> that we can actually use the memory in initdb.

> Unfortunately it's not unproblematic to use it in general, because with large
> shared_buffers values it can be quite slow, because the kernel initializes the
> memory in a single thread. I've seen ~3GB/s on multi-socket machines.

Hmm ... but if we can't use it by default, we're still back to the
problem of needing a way to tell initdb to do things differently.
I'd just as soon keep that to "set huge_pages = off" rather than
inventing whole new things.

regards, tom lane


From: Andres Freund <andres(at)anarazel(dot)de>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: david_sisson(at)dell(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org, PG Bug reporting form <noreply(at)postgresql(dot)org>
Subject: Re: BUG #17757: Not honoring huge_pages setting during initdb causes DB crash in Kubernetes
Date: 2023-01-22 00:27:04
Message-ID: 20230122002704.yoskrrfkbgi7xcfs@awork3.anarazel.de

Hi,

On 2023-01-21 15:29:22 -0800, Andres Freund wrote:
> On 2023-01-22 00:10:29 +0100, Tomas Vondra wrote:
> > On 1/20/23 23:48, PG Bug reporting form wrote:
> > > In these cases, the initdb phase will attempt to allocate huge pages that
> > > are available in the OS, but it will be denied access by Kubernetes and
> > > fail.
> >
> > Well, so how exactly does this fail? Does that mean Kubernetes broke mmap()
> > with MAP_HUGETLB so that it doesn't return MAP_FAILED when hugepages are
> > not available, or what? Because that's the only explanation I can see,
> > looking at the code.
>
> Yea, that's what I was wondering about as well.
>
>
> > Or it just does not realize there are no hugepages, returns something
> > and then crashes with SIGBUS later when trying to access it?
>
> I assume that that's the case. There are references to bus errors in a bunch of
> the linked issues. E.g.
> http://github.com/CrunchyData/postgres-operator/issues/413
>
> selecting default max_connections ... sh: line 1: 60 Bus error (core dumped) "/usr/pgsql-10/bin/postgres" --boot -x0 -F -c max_connections=100 -c shared_buffers=1000 -c dynamic_shared_memory_type=none < "/dev/null" > "/dev/null" 2>&1
>
> It's possible that the problem would go away if we used MAP_POPULATE for the
> allocation.

> I'd guess that this is annoying cgroups stuff :(

Ah, the fun:
http://www.kernel.org/doc/html/latest/admin-guide/cgroup-v1/hugetlb.html

The HugeTLB controller allows users to limit the HugeTLB usage (page fault) per
control group and enforces the limit during page fault. Since HugeTLB
doesn't support page reclaim, enforcing the limit at page fault time implies
that, the application will get SIGBUS signal if it tries to fault in HugeTLB
pages beyond its limit. Therefore the application needs to know exactly how many
HugeTLB pages it uses before hand, and the sysadmin needs to make sure that
there are enough available on the machine for all the users to avoid processes
getting SIGBUS.

but there's also

Reservation accounting

hugetlb.<hugepagesize>.rsvd.limit_in_bytes
hugetlb.<hugepagesize>.rsvd.max_usage_in_bytes
hugetlb.<hugepagesize>.rsvd.usage_in_bytes
hugetlb.<hugepagesize>.rsvd.failcnt

The HugeTLB controller allows to limit the HugeTLB reservations per control
group and enforces the controller limit at reservation time and at the fault
of HugeTLB memory for which no reservation exists. Since reservation limits
are enforced at reservation time (on mmap or shmget), reservation limits
never cause the application to get SIGBUS signal if the memory was reserved
before hand. For MAP_NORESERVE allocations, the reservation limit behaves
the same as the fault limit, enforcing memory usage at fault time and
causing the application to receive a SIGBUS if it's crossing its limit.

Reservation limits are superior to page fault limits described above, since
reservation limits are enforced at reservation time (on mmap or shmget), and
never cause the application to get SIGBUS signal if the memory was reserved
before hand. This allows for easier fallback to alternatives such as
non-HugeTLB memory for example. In the case of page fault accounting, it's
very hard to avoid processes getting SIGBUS since the sysadmin needs to
precisely know the HugeTLB usage of all the tasks in the system and make
sure there are enough pages to satisfy all requests. Avoiding tasks getting
SIGBUS on overcommitted systems is practically impossible with page fault
accounting.

So the problem is that the wrong type of cgroup limit is used. I don't know
if that's a Kubernetes or a postgres-operator issue.
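
To make the distinction concrete, here is a rough Python sketch that reads
the two cgroup v1 limits and flags the dangerous combination. The function
names are hypothetical, the cgroup directory (for example
/sys/fs/cgroup/hugetlb) is an assumption about how the controller is mounted,
and hugetlb.<hugepagesize>.limit_in_bytes is the fault-limit file from the
kernel documentation rather than from the excerpt above. It ignores current
usage, so it is only a first approximation.

```python
from pathlib import Path


def hugetlb_limits(cgroup_dir: str, page_size: str = "2MB") -> dict:
    """Read the cgroup v1 HugeTLB fault and reservation limits, if present."""
    base = Path(cgroup_dir)

    def read(name):
        try:
            return int((base / name).read_text().strip())
        except (OSError, ValueError):
            return None  # file missing or unreadable: treat as "no limit known"

    return {
        "fault_limit": read(f"hugetlb.{page_size}.limit_in_bytes"),
        "rsvd_limit": read(f"hugetlb.{page_size}.rsvd.limit_in_bytes"),
    }


def sigbus_risk(limits: dict, request_bytes: int) -> bool:
    """True if mmap may succeed but touching the memory could raise SIGBUS."""
    fault, rsvd = limits["fault_limit"], limits["rsvd_limit"]
    # A reservation limit below the request makes mmap itself fail,
    # which lets the caller fall back to regular pages.
    rejected_at_mmap = rsvd is not None and rsvd < request_bytes
    # A fault limit below the request is only enforced on first touch.
    over_fault_limit = fault is not None and fault < request_bytes
    return over_fault_limit and not rejected_at_mmap
```

A fault limit of 0 with no reservation limit is the risky case: the mapping
can be created, but the first access to it may be answered with SIGBUS.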

Greetings,

Andres Freund


From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: david_sisson(at)dell(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #17757: Not honoring huge_pages setting during initdb causes DB crash in Kubernetes
Date: 2023-01-22 00:55:01
Message-ID: 127328f2-4e01-e9ec-c9b1-76aef967343f@enterprisedb.com

On 1/22/23 00:30, Tom Lane wrote:
> Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com> writes:
>> On 1/20/23 23:48, PG Bug reporting form wrote:
>>> Here is a PR with a possible fix:
>>> http://github.com/postgres/postgres/pull/114/files
>
>> I doubt we want to just go straight to changing the default value for
>> everyone.
>
> Yeah, that proposal is a non-starter. I could see providing an
> initdb option to adjust the value applied during initdb, though.
>
> Ideally, maybe what we want is a generalized switch that could
> replace any variable in the sample config, along the lines of
> the server's "-c foo=bar". I recall having tried to do that and
> having run into quoting hazards, but I did not try very hard.
>

Yeah, I was looking for something like "-c" in initdb, only to realize
there's nothing like that. The main "problem" with adding that is that
we're unlikely to backpatch that (I guess), and thus it does not really
solve the issue for the OP.

I'm not sure we'd be keen to backpatch a change of the default, but
maybe we would ...

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: david_sisson(at)dell(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #17757: Not honoring huge_pages setting during initdb causes DB crash in Kubernetes
Date: 2023-01-22 01:01:08
Message-ID: 1122629.1674349268@sss.pgh.pa.us

Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com> writes:
> On 1/22/23 00:30, Tom Lane wrote:
>> Yeah, that proposal is a non-starter. I could see providing an
>> initdb option to adjust the value applied during initdb, though.
>> Ideally, maybe what we want is a generalized switch that could
>> replace any variable in the sample config, along the lines of
>> the server's "-c foo=bar". I recall having tried to do that and
>> having run into quoting hazards, but I did not try very hard.

> Yeah, I was looking for something like "-c" in initdb, only to realize
> there's nothing like that. The main "problem" with adding that is that
> we're unlikely to backpatch that (I guess), and thus it does not really
> solve the issue for the OP.

> I'm not sure we'd be keen to backpatch a change of the default, but
> maybe we would ...

Back-patching a change of default seems like REALLY a non-starter.
Perhaps adding a switch (which would break nothing if not used)
could be discussed, though.

regards, tom lane


From: Andres Freund <andres(at)anarazel(dot)de>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, david_sisson(at)dell(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #17757: Not honoring huge_pages setting during initdb causes DB crash in Kubernetes
Date: 2023-01-22 02:08:26
Message-ID: 20230122020826.x6geac6qjiinr66a@awork3.anarazel.de

Hi,

On 2023-01-22 01:55:01 +0100, Tomas Vondra wrote:
> I'm not sure we'd be keen to backpatch a change of the default, but
> maybe we would ...

After figuring out that it's clearly a configuration issue *somewhere* outside
of postgres's remit, I'm not that sure it's worth doing something concretely
to avoid the SIGBUS issue.

But if we end up doing something, I think a parameter triggering use of
MAP_POPULATE would be a good idea. It's actually useful outside of the SIGBUS
issue, because benchmarks reach a steady state noticeably more quickly when
using it.

OTOH, in a production scenario with large shared_buffers I'd probably not want
to use it, because getting up more quickly and distributing the memory
initialization across cores is more important.

I think it'd be ok to explicitly specify such an option in initdb - after all,
initdb does do work to determine the correct shared buffers size etc, and
MAP_POPULATE will lead to a more reliable determination. Not just with huge
pages, but also with "small" pages and system-level memory overcommit.

Greetings,

Andres Freund


From: "Sisson, David" <David(dot)Sisson(at)dell(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org>, "Sisson, David" <David(dot)Sisson(at)dell(dot)com>
Subject: RE: BUG #17757: Not honoring huge_pages setting during initdb causes DB crash in Kubernetes
Date: 2023-01-23 19:26:09
Message-ID: LV2PR19MB5765BF10D8D3FD015C6011468EC89@LV2PR19MB5765.namprd19.prod.outlook.com

I believe something should be done with PostgreSQL because we are configuring huge_pages = off in the standard "postgresql.conf" file.
huge_pages can be turned on through outside manipulation but it can't be turned off.
Not without altering the sample config file.

Thanks,
David Angel 😊


From: Christophe Pettus <xof(at)thebuild(dot)com>
To: "Sisson, David" <David(dot)Sisson(at)dell(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: BUG #17757: Not honoring huge_pages setting during initdb causes DB crash in Kubernetes
Date: 2023-01-23 19:38:09
Message-ID: C0176E91-8B71-4CA2-9FCE-B27944040D6D@thebuild.com

> On Jan 23, 2023, at 11:26, Sisson, David <David(dot)Sisson(at)dell(dot)com> wrote:
>
> I believe something should be done with PostgreSQL because we are configuring huge_pages = off in the standard "postgresql.conf" file.

We are? I believe the default is "huge_pages = try", not off.


From: "Sisson, David" <David(dot)Sisson(at)dell(dot)com>
To: Christophe Pettus <xof(at)thebuild(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org>, "Sisson, David" <David(dot)Sisson(at)dell(dot)com>
Subject: RE: BUG #17757: Not honoring huge_pages setting during initdb causes DB crash in Kubernetes
Date: 2023-01-23 19:51:14
Message-ID: LV2PR19MB5765325D85ADB08967725D058EC89@LV2PR19MB5765.namprd19.prod.outlook.com

The default is "huge_pages = try" which is commented out in the "postgresql.conf.sample" file.
When a consumer like myself turns it off in the standard "postgresql.conf" file, it should not be turned on when initdb runs.
There is no way to turn it off without altering the sample config file.

Altering the "postgresql.conf.sample" file through a 3rd party controller ranges from quite difficult to nearly impossible.
The file is read-only at runtime within Kubernetes.
Only some controllers let you modify the sample file without rebuilding their code.

You guys are awesome with truly outstanding responses.
I certainly didn't expect my initial solution to be used but to help in finding a good solution. 😊

Thanks,
David Angel


From: Andres Freund <andres(at)anarazel(dot)de>
To: "Sisson, David" <David(dot)Sisson(at)dell(dot)com>
Cc: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: BUG #17757: Not honoring huge_pages setting during initdb causes DB crash in Kubernetes
Date: 2023-01-23 19:55:04
Message-ID: 20230123195504.uvlcd3aytn3jg744@awork3.anarazel.de

Hi,

On 2023-01-23 19:26:09 +0000, Sisson, David wrote:
> I believe something should be done with PostgreSQL because we are configuring huge_pages = off in the standard "postgresql.conf" file.
> huge_pages can be turned on through outside manipulation but it can't be
> turned off.

It's a fault of the environment if mmap(MAP_HUGETLB) causes a SIGBUS. Normally
huge_pages = try is harmless, because it'll just fall back. That source of
SIGBUSes needs to be fixed regardless of anything else - plenty of allocators try
to use huge pages for example, so you'll run into problems regardless of
postgres' default.
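
For illustration, the "try" behaviour amounts to roughly the following,
sketched in Python rather than the server's C. The name map_shared is
hypothetical, the 2 MiB huge page size is an assumption, and 0x40000 is the
Linux MAP_HUGETLB value on x86 and arm64, used only if Python's mmap module
does not export the constant.

```python
import mmap
import sys

# Assumption: Linux value on x86/arm64 when the module lacks the constant.
MAP_HUGETLB = getattr(mmap, "MAP_HUGETLB", 0x40000)


def map_shared(size: int, huge_page_size: int = 2 * 1024 * 1024):
    """Anonymous shared mapping that tries huge pages first, then falls back.

    Returns (mapping, used_huge_pages).
    """
    base_flags = mmap.MAP_SHARED | mmap.MAP_ANONYMOUS
    prot = mmap.PROT_READ | mmap.PROT_WRITE
    if sys.platform.startswith("linux"):
        # Round the request up to a multiple of the huge page size.
        rounded = -(-size // huge_page_size) * huge_page_size
        try:
            huge = mmap.mmap(-1, rounded, flags=base_flags | MAP_HUGETLB, prot=prot)
            return huge, True
        except OSError:
            pass  # e.g. ENOMEM: no huge pages available, fall back below
    return mmap.mmap(-1, size, flags=base_flags, prot=prot), False
```

The fallback only helps when mmap() itself reports the failure. Under a
cgroup fault limit the huge-page mmap() can succeed here and the SIGBUS
arrives later, on first access to the memory, which this sketch does not
detect.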

That said, I'm in favor of allowing options to be specified to initdb.

Greetings,

Andres Freund


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Sisson, David" <David(dot)Sisson(at)dell(dot)com>
Cc: Christophe Pettus <xof(at)thebuild(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: BUG #17757: Not honoring huge_pages setting during initdb causes DB crash in Kubernetes
Date: 2023-01-23 19:55:50
Message-ID: 1882151.1674503750@sss.pgh.pa.us

"Sisson, David" <David(dot)Sisson(at)dell(dot)com> writes:
> The default is "huge_pages = try" which is commented out in the "postgresql.conf.sample" file.
> When a consumer like myself turns it off in the standard "postgresql.conf" file, it should not be turned on when initdb runs.

What "standard postgresql.conf file"? There is no such thing until
initdb creates it.

> There is no way to turn it off without altering the sample config file.

Yup, that's exactly why we are having this discussion.

regards, tom lane


From: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
To: "Sisson, David" <David(dot)Sisson(at)dell(dot)com>
Cc: Christophe Pettus <xof(at)thebuild(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: BUG #17757: Not honoring huge_pages setting during initdb causes DB crash in Kubernetes
Date: 2023-01-23 20:00:45
Message-ID: CAKFQuwZD+iVWJ4-hko+CzOx3JMK8E87cwYbqPcGiOr9stau+qw@mail.gmail.com

On Mon, Jan 23, 2023 at 12:51 PM Sisson, David <David(dot)Sisson(at)dell(dot)com>
wrote:

> The default is "huge_pages = try" which is commented out in the
> "postgresql.conf.sample" file.
> When a consumer like myself turns it off in the standard "postgresql.conf"
> file, it should not be turned on when initdb runs.
> There is no way to turn it off without altering the sample config file.
>
>
Right, the present way to control what is seen by initdb is
postgresql.conf.sample since that is the template that initdb uses to then
produce an actual postgresql.conf for the newly created instance.
postgresql.conf is only ever a per-instance configuration file. It doesn't
make sense to "change postgresql.conf in hopes of influencing some future
initdb run."

David J.


From: "Sisson, David" <David(dot)Sisson(at)dell(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Christophe Pettus <xof(at)thebuild(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org>, "Sisson, David" <David(dot)Sisson(at)dell(dot)com>, "Howell, Stephen" <Stephen(dot)Howell(at)dell(dot)com>
Subject: RE: BUG #17757: Not honoring huge_pages setting during initdb causes DB crash in Kubernetes
Date: 2023-01-23 20:12:18
Message-ID: LV2PR19MB5765584F8202DAD0D3F58C778EC89@LV2PR19MB5765.namprd19.prod.outlook.com

That makes sense: the PostgreSQL controllers are calling initdb to create the "postgresql.conf" file before they apply customizations to it.
To the consumer, it is just yaml to be added to the "postgresql.conf" file.

That makes it much harder to fix and means it is really the controllers at fault.

This probably needs to be explicitly documented when creating a HA cluster or within initdb docs.
http://www.postgresql.org/docs/15/app-initdb.html

Maybe something about how initdb uses the sample file and which configuration settings must be pre-configured.

Thanks,
David Angel


From: "Sisson, David" <David(dot)Sisson(at)dell(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Christophe Pettus <xof(at)thebuild(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org>, "Howell, Stephen" <Stephen(dot)Howell(at)dell(dot)com>, "Sisson, David" <David(dot)Sisson(at)dell(dot)com>
Subject: RE: BUG #17757: Not honoring huge_pages setting during initdb causes DB crash in Kubernetes
Date: 2023-01-23 20:35:17
Message-ID: LV2PR19MB57655767D956E1C71F0E4CFA8EC89@LV2PR19MB5765.namprd19.prod.outlook.com
Lists: pgsql-bugs

A quick and dirty solution could be to alter initdb to catch the exception and retry using a copy of the sample with "huge_pages=false".
Would that be acceptable?

Passing in a config setting into initdb would still require a rebuild of all controllers.
That could take months to years at best.

Thanks,
David Angel

Internal Use - Confidential


From: Andres Freund <andres(at)anarazel(dot)de>
To: "Sisson, David" <David(dot)Sisson(at)dell(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Christophe Pettus <xof(at)thebuild(dot)com>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org>, "Howell, Stephen" <Stephen(dot)Howell(at)dell(dot)com>
Subject: Re: BUG #17757: Not honoring huge_pages setting during initdb causes DB crash in Kubernetes
Date: 2023-01-23 21:10:13
Message-ID: 20230123211013.uebdheyxgfakxuiv@awork3.anarazel.de
Lists: pgsql-bugs

Hi,

On 2023-01-23 20:35:17 +0000, Sisson, David wrote:
> A quick and dirty solution could be to alter initdb to catch the exception and retry using a copy of the sample with "huge_pages=false".
> Would that be acceptable?

This is a kubernetes or postgres-operator bug (setting up the wrong cgroup
limit, which the docs explicitly warn against doing). I don't think we want to
accumulate workarounds like that in postgres.

> Passing in a config setting into initdb would still require a rebuild of all controllers.
> That could take months to years at best.

Huh. I don't know anything about the controller, but that seems problematic
independent of this specific issue. And you'd still need to deploy a new
version of postgres to get such changes...

> Internal Use - Confidential

Hardly.

Greetings,

Andres Freund


From: "Sisson, David" <David(dot)Sisson(at)dell(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Christophe Pettus <xof(at)thebuild(dot)com>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org>, "Howell, Stephen" <Stephen(dot)Howell(at)dell(dot)com>, "Sisson, David" <David(dot)Sisson(at)dell(dot)com>
Subject: RE: BUG #17757: Not honoring huge_pages setting during initdb causes DB crash in Kubernetes
Date: 2023-01-23 21:41:15
Message-ID: LV2PR19MB576597174A60B2687D79534B8EC89@LV2PR19MB5765.namprd19.prod.outlook.com
Lists: pgsql-bugs

The controllers generally pull in the latest PostgreSQL, so it is easy to pick up a new PostgreSQL release.

Unfortunately, getting a bug fix into a controller is a lot harder.
One controller has had this defect for over a year with no end in sight.

Found this:
http://github.com/opencontainers/runtime-spec/issues/1050

Looks like a PR exists for it but the solution is invalid.
http://github.com/kailun-qin/runtime-spec/commit/a6505339204535150260d8e4f0bc112628f1fa87

More info:
http://www.postgresql.org/message-id/flat/20200218093240.jd3lgoxmisyl2tt5%40localhost#61c2c7fc3d3dd80512c9130b6967be16

It would be nice if "try" worked as expected.
I fully understand that it is not a PostgreSQL issue, but any assistance would be much appreciated.

Thanks,
David Angel

Internal Use - Confidential


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: "Sisson, David" <David(dot)Sisson(at)dell(dot)com>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: BUG #17757: Not honoring huge_pages setting during initdb causes DB crash in Kubernetes
Date: 2023-01-23 22:51:46
Message-ID: 2139743.1674514306@sss.pgh.pa.us
Lists: pgsql-bugs

Andres Freund <andres(at)anarazel(dot)de> writes:
> It's a fault of the environment if mmap(MAP_HUGETLB) causes a SIGBUS. Normally
> huge_pages = try is harmless, because it'll just fall back. That source of
> SIGBUSes needs to be fixed regardless of anything else - plenty allocators try
> to use huge pages for example, so you'll run into problems regardless of
> postgres' default.

That seems likely to me too.
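
For readers following along, the fallback behaviour described in the quoted text can be sketched roughly as below. This is only an illustration in Python, not PostgreSQL's actual C code. The function name map_with_try, the 2 MB huge page size, and the numeric value 0x40000 for MAP_HUGETLB (the usual value on Linux x86-64 and arm64) are assumptions made for the example, and it is Linux-only.

```python
import mmap

# Assumption: 0x40000 is the common Linux value of MAP_HUGETLB; it is used
# only if the mmap module does not export the constant itself.
MAP_HUGETLB = getattr(mmap, "MAP_HUGETLB", 0x40000)
HUGE_PAGE_SIZE = 2 * 1024 * 1024  # assumption: 2 MB huge pages


def map_with_try(size):
    """Return (mapping, used_huge_pages), loosely mimicking huge_pages = try."""
    flags = mmap.MAP_SHARED | mmap.MAP_ANONYMOUS
    # Huge page mappings must be a multiple of the huge page size: round up.
    rounded = -(-size // HUGE_PAGE_SIZE) * HUGE_PAGE_SIZE
    try:
        return mmap.mmap(-1, rounded, flags=flags | MAP_HUGETLB), True
    except OSError:
        # No huge pages available (or not permitted): use regular pages.
        return mmap.mmap(-1, size, flags=flags), False
```

Note that a fallback of this kind only helps when the mmap call itself fails. In the situation discussed in this thread the process is killed by SIGBUS instead, which an error check like the one above cannot intercept.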

> That said, I'm for allowing to specify options to initdb.

Yeah, I think that has enough other potential applications to be worth
doing. Here's a quick draft patch (sans user-facing docs as yet).
It injects any given values into postgresql.auto.conf, not
postgresql.conf proper. I did that mainly because the latter looked
beyond the abilities of the primitive string-munging code we have in
there, but I think it can be argued to be a reasonable choice anyway.

regards, tom lane

Attachment Content-Type Size
override-config-options-during-initdb-0.1.patch text/x-diff 8.3 KB

From: Andres Freund <andres(at)anarazel(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Sisson, David" <David(dot)Sisson(at)dell(dot)com>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: BUG #17757: Not honoring huge_pages setting during initdb causes DB crash in Kubernetes
Date: 2023-01-24 00:37:37
Message-ID: 20230124003737.owezrb6ffen6dhb3@awork3.anarazel.de
Lists: pgsql-bugs

Hi,

On 2023-01-23 17:51:46 -0500, Tom Lane wrote:
> Andres Freund <andres(at)anarazel(dot)de> writes:
> > That said, I'm for allowing to specify options to initdb.
>
> Yeah, I think that has enough other potential applications to be worth
> doing. Here's a quick draft patch (sans user-facing docs as yet).
> It injects any given values into postgresql.auto.conf, not
> postgresql.conf proper. I did that mainly because the latter looked
> beyond the abilities of the primitive string-munging code we have in
> there, but I think it can be argued to be a reasonable choice anyway.

Oh, I had thought we'd just pass them on with -c to the processes that initdb
starts. But perhaps just persisting them isn't a bad idea...
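
A minimal sketch of the pass-through alternative, in Python for illustration only: each override is forwarded as a "-c name=value" argument to a single-user postgres invocation. The function name bootstrap_argv is hypothetical, and the argument list is simplified compared with what initdb really runs.

```python
def bootstrap_argv(postgres_bin, datadir, overrides):
    """Build a single-user postgres command line with -c overrides."""
    argv = [postgres_bin, "--single", "-D", datadir]
    for item in overrides:
        # Each override is a "name=value" string, passed through unchanged.
        argv += ["-c", item]
    argv.append("template1")  # database name goes last
    return argv
```

With this approach nothing is written to the configuration files, so the same setting would have to be supplied again when the server is started later.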

- Andres


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: "Sisson, David" <David(dot)Sisson(at)dell(dot)com>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: BUG #17757: Not honoring huge_pages setting during initdb causes DB crash in Kubernetes
Date: 2023-01-24 00:45:19
Message-ID: 2151514.1674521119@sss.pgh.pa.us
Lists: pgsql-bugs

Andres Freund <andres(at)anarazel(dot)de> writes:
> On 2023-01-23 17:51:46 -0500, Tom Lane wrote:
>> Yeah, I think that has enough other potential applications to be worth
>> doing. Here's a quick draft patch (sans user-facing docs as yet).
>> It injects any given values into postgresql.auto.conf, not
>> postgresql.conf proper. I did that mainly because the latter looked
>> beyond the abilities of the primitive string-munging code we have in
>> there, but I think it can be argued to be a reasonable choice anyway.

> Oh, I had thought we'd just pass them on with -c to the processes that initdb
> starts. But perhaps just persisting them isn't a bad idea...

It certainly seems to me that that would be the mainstream use-case,
so why not fill in the file as the user probably wants? They can
always change it. Also, as I mentioned, the expectation is that
initdb will set up a known-working combination of settings; and
we don't really know that if we leave off whatever was injected by
"-c". In the case at hand, if we don't propagate "huge_pages = off"
to the installed configuration, the server still won't work.

regards, tom lane


From: Andres Freund <andres(at)anarazel(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Sisson, David" <David(dot)Sisson(at)dell(dot)com>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: BUG #17757: Not honoring huge_pages setting during initdb causes DB crash in Kubernetes
Date: 2023-01-24 01:00:20
Message-ID: 20230124010020.v6jyajnxypcdv644@awork3.anarazel.de
Lists: pgsql-bugs

Hi,

On 2023-01-23 19:45:19 -0500, Tom Lane wrote:
> Andres Freund <andres(at)anarazel(dot)de> writes:
> > On 2023-01-23 17:51:46 -0500, Tom Lane wrote:
> >> Yeah, I think that has enough other potential applications to be worth
> >> doing. Here's a quick draft patch (sans user-facing docs as yet).
> >> It injects any given values into postgresql.auto.conf, not
> >> postgresql.conf proper. I did that mainly because the latter looked
> >> beyond the abilities of the primitive string-munging code we have in
> >> there, but I think it can be argued to be a reasonable choice anyway.
>
> > Oh, I had thought we'd just pass them on with -c to the processes that initdb
> > starts. But perhaps just persisting them isn't a bad idea...
>
> It certainly seems to me that that would be the mainstream use-case,
> so why not fill in the file as the user probably wants? They can
> always change it. Also, as I mentioned, the expectation is that
> initdb will set up a known-working combination of settings; and
> we don't really know that if we leave off whatever was injected by
> "-c". In the case at hand, if we don't propagate "huge_pages = off"
> to the installed configuration, the server still won't work.

Yea, makes sense.

Greetings,

Andres Freund