#OCPBUGS-31857 | issue | 2 days ago | Internal Registry does not recognize the `ca-west-1` AWS Region Verified |
Issue 15925889: Internal Registry does not recognize the `ca-west-1` AWS Region Description: This is a clone of issue OCPBUGS-31641. The following is the description of the original issue: — This is a clone of issue OCPBUGS-29233. The following is the description of the original issue: — Description of problem: {code:none} Internal registry Pods will panic while deploying OCP on `ca-west-1` AWS Region{code} Version-Release number of selected component (if applicable): {code:none} 4.14.2 {code} How reproducible: {code:none} Every time {code} Steps to Reproduce: {code:none} 1. Deploy OCP on `ca-west-1` AWS Region {code} Actual results: {code:none} $ oc logs image-registry-85b69cd9fc-b78sb -n openshift-image-registry time="2024-02-08T11:43:09.287006584Z" level=info msg="start registry" distribution_version=v3.0.0+unknown go.version="go1.20.10 X:strictfipsruntime" openshift_version=4.14.0-202311021650.p0.g5e7788a.assembly.stream-5e7788a time="2024-02-08T11:43:09.287365337Z" level=info msg="caching project quota objects with TTL 1m0s" go.version="go1.20.10 X:strictfipsruntime" panic: invalid region provided: ca-west-1goroutine 1 [running]: github.com/distribution/distribution/v3/registry/handlers.NewApp({0x2873f40?, 0xc00005c088?}, 0xc000581800) /go/src/github.com/openshift/image-registry/vendor/github.com/distribution/distribution/v3/registry/handlers/app.go:130 +0x2bf1 github.com/openshift/image-registry/pkg/dockerregistry/server/supermiddleware.NewApp({0x2873f40, 0xc00005c088}, 0x0?, {0x2876820?, 0xc000676cf0}) /go/src/github.com/openshift/image-registry/pkg/dockerregistry/server/supermiddleware/app.go:96 +0xb9 github.com/openshift/image-registry/pkg/dockerregistry/server.NewApp({0x2873f40?, 0xc00005c088}, {0x285ffd0?, 0xc000916070}, 0xc000581800, 0xc00095c000, {0x0?, 0x0}) /go/src/github.com/openshift/image-registry/pkg/dockerregistry/server/app.go:138 +0x485 github.com/openshift/image-registry/pkg/cmd/dockerregistry.NewServer({0x2873f40, 0xc00005c088}, 
0xc000581800, 0xc00095c000) /go/src/github.com/openshift/image-registry/pkg/cmd/dockerregistry/dockerregistry.go:212 +0x38a github.com/openshift/image-registry/pkg/cmd/dockerregistry.Execute({0x2858b60, 0xc000916000}) /go/src/github.com/openshift/image-registry/pkg/cmd/dockerregistry/dockerregistry.go:166 +0x86b main.main() /go/src/github.com/openshift/image-registry/cmd/dockerregistry/main.go:93 +0x496 {code} Expected results: {code:none} The internal registry is deployed with no issues {code} Additional info: {code:none} This is a new AWS Region that we are adding support for. The support will be backported to 4.14.z {code} Status: Verified | |||
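The panic ("invalid region provided: ca-west-1") comes from region validation against a compiled-in region list that predates the new region. A minimal sketch of the failure mode and the defensive alternative — the region table and function names here are illustrative stand-ins, not the registry's actual code, which vendors its region list:

```go
package main

import (
	"fmt"
)

// validRegions stands in for a vendored, compiled-in AWS region list that
// predates ca-west-1. Until the vendored list is updated, new regions are
// simply absent from it.
var validRegions = map[string]bool{
	"us-east-1":    true,
	"ca-central-1": true,
	// "ca-west-1" is missing.
}

// resolveRegion returns an error instead of panicking, so the caller can
// surface a readable condition rather than crash-loop the registry pod.
func resolveRegion(region string) (string, error) {
	if !validRegions[region] {
		return "", fmt.Errorf("invalid region provided: %s", region)
	}
	return region, nil
}

func main() {
	for _, r := range []string{"ca-central-1", "ca-west-1"} {
		if _, err := resolveRegion(r); err != nil {
			fmt.Println("rejected:", err)
			continue
		}
		fmt.Println("accepted:", r)
	}
}
```

The underlying fix for the bug itself is updating the vendored region data so `ca-west-1` is recognized; the sketch only shows why an error return degrades more gracefully than a panic while that support lands.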
#OCPBUGS-30892 | issue | 6 weeks ago | Misformatted node labels causing origin-tests to panic MODIFIED |
Issue 15876335: Misformatted node labels causing origin-tests to panic Description: This is a clone of issue OCPBUGS-30604. The following is the description of the original issue: --- Description of problem: {code:none} Panic thrown by origin-tests{code} Version-Release number of selected component (if applicable): {code:none} {code} How reproducible: {code:none} always{code} Steps to Reproduce: {code:none} 1. Create aws or rosa 4.15 cluster 2. run origin tests 3. {code} Actual results: {code:none} time="2024-03-07T17:03:50Z" level=info msg="resulting interval message" message="{RegisteredNode Node ip-10-0-8-83.ec2.internal event: Registered Node ip-10-0-8-83.ec2.internal in Controller map[reason:RegisteredNode roles:worker]}" E0307 17:03:50.319617 71 runtime.go:79] Observed a panic: runtime.boundsError{x:24, y:23, signed:true, code:0x3} (runtime error: slice bounds out of range [24:23]) goroutine 310 [running]: k8s.io/apimachinery/pkg/util/runtime.logPanic({0x84c6f20?, 0xc006fdc588}) k8s.io/apimachinery@v0.29.0/pkg/util/runtime/runtime.go:75 +0x99 k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0xc008c38120?}) k8s.io/apimachinery@v0.29.0/pkg/util/runtime/runtime.go:49 +0x75 panic({0x84c6f20, 0xc006fdc588}) runtime/panic.go:884 +0x213 github.com/openshift/origin/pkg/monitortests/testframework/watchevents.nodeRoles(0x0?) github.com/openshift/origin/pkg/monitortests/testframework/watchevents/event.go:251 +0x1e5 github.com/openshift/origin/pkg/monitortests/testframework/watchevents.recordAddOrUpdateEvent({0x96bcc00, 0xc0076e3310}, {0x7f2a0e47a1b8, 0xc007732330}, {0x281d36d?, 0x0?}, {0x9710b50, 0xc000c5e000}, {0x9777af, 0xedd7be6b7, ...}, ...) 
github.com/openshift/origin/pkg/monitortests/testframework/watchevents/event.go:116 +0x41b github.com/openshift/origin/pkg/monitortests/testframework/watchevents.startEventMonitoring.func2({0x8928f00?, 0xc00b528c80}) github.com/openshift/origin/pkg/monitortests/testframework/watchevents/event.go:65 +0x185 k8s.io/client-go/tools/cache.(*FakeCustomStore).Add(0x8928f00?, {0x8928f00?, 0xc00b528c80?}) k8s.io/client-go@v0.29.0/tools/cache/fake_custom_store.go:35 +0x31 k8s.io/client-go/tools/cache.watchHandler({0x0?, 0x0?, 0xe16d020?}, {0x9694a10, 0xc006b00180}, {0x96d2780, 0xc0078afe00}, {0x96f9e28?, 0x8928f00}, 0x0, ...) k8s.io/client-go@v0.29.0/tools/cache/reflector.go:756 +0x603 k8s.io/client-go/tools/cache.(*Reflector).watch(0xc0005dcc40, {0x0?, 0x0?}, 0xc005cdeea0, 0xc005bf8c40?) k8s.io/client-go@v0.29.0/tools/cache/reflector.go:437 +0x53b k8s.io/client-go/tools/cache.(*Reflector).ListAndWatch(0xc0005dcc40, 0xc005cdeea0) k8s.io/client-go@v0.29.0/tools/cache/reflector.go:357 +0x453 k8s.io/client-go/tools/cache.(*Reflector).Run.func1() k8s.io/client-go@v0.29.0/tools/cache/reflector.go:291 +0x26 k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x10?) k8s.io/apimachinery@v0.29.0/pkg/util/wait/backoff.go:226 +0x3e k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc007974ec0?, {0x9683f80, 0xc0078afe50}, 0x1, 0xc005cdeea0) k8s.io/apimachinery@v0.29.0/pkg/util/wait/backoff.go:227 +0xb6 k8s.io/client-go/tools/cache.(*Reflector).Run(0xc0005dcc40, 0xc005cdeea0) k8s.io/client-go@v0.29.0/tools/cache/reflector.go:290 +0x17d created by github.com/openshift/origin/pkg/monitortests/testframework/watchevents.startEventMonitoring github.com/openshift/origin/pkg/monitortests/testframework/watchevents/event.go:83 +0x6a5 panic: runtime error: slice bounds out of range [24:23] [recovered] panic: runtime error: slice bounds out of range [24:23]{code} Expected results: {code:none} execution of tests{code} Additional info: {code:none} {code} Status: MODIFIED | |||
#OCPBUGS-30239 | issue | 3 weeks ago | kdump doesn't create the dumpfile via ssh with OVN Verified |
KDUMP_COMMANDLINE_REMOVE="ostree hugepages hugepagesz slub_debug quiet log_buf_len swiotlb cma hugetlb_cma ignition.firstboot" KDUMP_COMMANDLINE_APPEND="irqpoll nr_cpus=1 reset_devices cgroup_disable=memory mce=off numa=off udev.children-max=2 panic=10 acpi_no_memhotplug transparent_hugepage=never nokaslr hest_disable novmcoredd cma=0 hugetlb_cma=0" KDUMP_IMG="vmlinuz" | |||
#OCPBUGS-29927 | issue | 2 months ago | origin needs workaround for ROSA's infra labels Verified |
Issue 15836332: origin needs workaround for ROSA's infra labels Description: This is a clone of issue OCPBUGS-29858. The following is the description of the original issue: --- The convention is a format like {{node-role.kubernetes.io/role: ""}}, not {{node-role.kubernetes.io: role}}; however, ROSA uses the latter format to indicate the {{infra}} role. This changes the node watch code to ignore it, as well as other potential variations like {{node-role.kubernetes.io/}}. The current code panics when run against a ROSA cluster: {{ E0209 18:10:55.533265 78 runtime.go:79] Observed a panic: runtime.boundsError{x:24, y:23, signed:true, code:0x3} (runtime error: slice bounds out of range [24:23]) goroutine 233 [running]: k8s.io/apimachinery/pkg/util/runtime.logPanic({0x7a71840?, 0xc0018e2f48}) k8s.io/apimachinery@v0.27.2/pkg/util/runtime/runtime.go:75 +0x99 k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0x1000251f9fe?}) k8s.io/apimachinery@v0.27.2/pkg/util/runtime/runtime.go:49 +0x75 panic({0x7a71840, 0xc0018e2f48}) runtime/panic.go:884 +0x213 github.com/openshift/origin/pkg/monitortests/node/watchnodes.nodeRoles(0x7ecd7b3?) github.com/openshift/origin/pkg/monitortests/node/watchnodes/node.go:187 +0x1e5 github.com/openshift/origin/pkg/monitortests/node/watchnodes.startNodeMonitoring.func1(0}} Status: Verified | |||
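The slice bounds in the panic follow directly from the two label formats: `len("node-role.kubernetes.io/")` is 24, while the bare ROSA key `node-role.kubernetes.io` is only 23 characters, so an unguarded `label[len(prefix):]` produces `slice bounds out of range [24:23]`. A hedged sketch of the failure and the guarded parse (the function name and shape are illustrative, not origin's exact code):

```go
package main

import (
	"fmt"
	"strings"
)

const rolePrefix = "node-role.kubernetes.io/" // len == 24

// nodeRoles extracts role names from node labels, tolerating ROSA's bare
// "node-role.kubernetes.io" key and the trailing-slash variant.
func nodeRoles(labels map[string]string) []string {
	var roles []string
	for label := range labels {
		// Without this guard, label[len(rolePrefix):] panics on the
		// 23-character bare key: slice bounds out of range [24:23].
		if !strings.HasPrefix(label, rolePrefix) {
			continue
		}
		if role := label[len(rolePrefix):]; role != "" {
			roles = append(roles, role)
		}
	}
	return roles
}

func main() {
	labels := map[string]string{
		"node-role.kubernetes.io/worker": "",      // conventional form
		"node-role.kubernetes.io":        "infra", // ROSA's form: must not panic
		"node-role.kubernetes.io/":       "",      // empty-role variant
	}
	fmt.Println(nodeRoles(labels)) // only "worker" survives
}
```

`strings.HasPrefix` plus a non-empty check covers all three variations the issue mentions in one pass.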
#OCPBUGS-27959 | issue | 2 months ago | Panic: send on closed channel Verified |
Issue 15743646: Panic: send on closed channel Description: In a CI run of etcd-operator-e2e I've found the following panic in the operator logs: {code:java} E0125 11:04:58.158222 1 health.go:135] health check for member (ip-10-0-85-12.us-west-2.compute.internal) failed: err(context deadline exceeded) panic: send on closed channel goroutine 15608 [running]: github.com/openshift/cluster-etcd-operator/pkg/etcdcli.getMemberHealth.func1() github.com/openshift/cluster-etcd-operator/pkg/etcdcli/health.go:58 +0xd2 created by github.com/openshift/cluster-etcd-operator/pkg/etcdcli.getMemberHealth github.com/openshift/cluster-etcd-operator/pkg/etcdcli/health.go:54 +0x2a5 {code} which unfortunately is an incomplete log file. The operator recovered itself by restarting, but we should fix the panic nonetheless. Job run for reference: https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_cluster-etcd-operator/1186/pull-ci-openshift-cluster-etcd-operator-master-e2e-operator/1750466468031500288 Status: Verified | |||
#OCPBUGS-32953 | issue | 2 days ago | operator panics in hosted cluster with OVN when obfuscation is enabled MODIFIED |
Issue 15966958: operator panics in hosted cluster with OVN when obfuscation is enabled Description: This is a clone of issue OCPBUGS-32702. The following is the description of the original issue: --- Description of problem: {code:none} The operator panics in HyperShift hosted cluster with OVN and with enabled networking obfuscation: {code} {noformat} 1 runtime.go:79] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference) goroutine 858 [running]: k8s.io/apimachinery/pkg/util/runtime.logPanic({0x26985e0?, 0x454d700}) /go/src/github.com/openshift/insights-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:75 +0x99 k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0xc0010d67e0?}) /go/src/github.com/openshift/insights-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:49 +0x75 panic({0x26985e0, 0x454d700}) /usr/lib/golang/src/runtime/panic.go:884 +0x213 github.com/openshift/insights-operator/pkg/anonymization.getNetworksFromClusterNetworksConfig(...) /go/src/github.com/openshift/insights-operator/pkg/anonymization/anonymizer.go:292 github.com/openshift/insights-operator/pkg/anonymization.getNetworksForAnonymizer(0xc000556700, 0xc001154ea0, {0x0, 0x0, 0x0?}) /go/src/github.com/openshift/insights-operator/pkg/anonymization/anonymizer.go:253 +0x202 github.com/openshift/insights-operator/pkg/anonymization.(*Anonymizer).readNetworkConfigs(0xc0005be640) /go/src/github.com/openshift/insights-operator/pkg/anonymization/anonymizer.go:180 +0x245 github.com/openshift/insights-operator/pkg/anonymization.(*Anonymizer).AnonymizeMemoryRecord.func1() /go/src/github.com/openshift/insights-operator/pkg/anonymization/anonymizer.go:354 +0x25 sync.(*Once).doSlow(0xc0010d6c70?, 0x21a9006?) /usr/lib/golang/src/sync/once.go:74 +0xc2 sync.(*Once).Do(...) 
/usr/lib/golang/src/sync/once.go:65 github.com/openshift/insights-operator/pkg/anonymization.(*Anonymizer).AnonymizeMemoryRecord(0xc0005be640, 0xc000cf0dc0) /go/src/github.com/openshift/insights-operator/pkg/anonymization/anonymizer.go:353 +0x78 github.com/openshift/insights-operator/pkg/recorder.(*Recorder).Record(0xc00075c4b0, {{0x2add75b, 0xc}, {0x0, 0x0, 0x0}, {0x2f38d28, 0xc0009c99c0}}) /go/src/github.com/openshift/insights-operator/pkg/recorder/recorder.go:87 +0x49f github.com/openshift/insights-operator/pkg/gather.recordGatheringFunctionResult({0x2f255c0, 0xc00075c4b0}, 0xc0010d7260, {0x2adf900, 0xd}) /go/src/github.com/openshift/insights-operator/pkg/gather/gather.go:157 +0xb9c github.com/openshift/insights-operator/pkg/gather.collectAndRecordGatherer({0x2f50058?, 0xc001240c90?}, {0x2f30880?, 0xc000994240}, {0x2f255c0, 0xc00075c4b0}, {0x0?, 0x8dcb80?, 0xc000a673a2?}) /go/src/github.com/openshift/insights-operator/pkg/gather/gather.go:113 +0x296 github.com/openshift/insights-operator/pkg/gather.CollectAndRecordGatherer({0x2f50058, 0xc001240c90}, {0x2f30880, 0xc000994240?}, {0x2f255c0, 0xc00075c4b0}, {0x0, 0x0, 0x0}) /go/src/github.com/openshift/insights-operator/pkg/gather/gather.go:89 +0xe5 github.com/openshift/insights-operator/pkg/controller/periodic.(*Controller).Gather.func2(0xc000a678a0, {0x2f50058, 0xc001240c90}, 0xc000796b60, 0x26f0460?) 
/go/src/github.com/openshift/insights-operator/pkg/controller/periodic/periodic.go:206 +0x1a8 github.com/openshift/insights-operator/pkg/controller/periodic.(*Controller).Gather(0xc000796b60) /go/src/github.com/openshift/insights-operator/pkg/controller/periodic/periodic.go:222 +0x450 github.com/openshift/insights-operator/pkg/controller/periodic.(*Controller).periodicTrigger(0xc000796b60, 0xc000236a80) /go/src/github.com/openshift/insights-operator/pkg/controller/periodic/periodic.go:265 +0x2c5 github.com/openshift/insights-operator/pkg/controller/periodic.(*Controller).Run.func1() /go/src/github.com/openshift/insights-operator/pkg/controller/periodic/periodic.go:161 +0x25 k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x30?) /go/src/github.com/openshift/insights-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:226 +0x3e k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc00007d7c0?, {0x2f282a0, 0xc0012cd800}, 0x1, 0xc000236a80) /go/src/github.com/openshift/insights-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:227 +0xb6 k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc001381fb0?, 0x3b9aca00, 0x0, 0x0?, 0x449705?) /go/src/github.com/openshift/insights-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:204 +0x89 k8s.io/apimachinery/pkg/util/wait.Until(0xabfaca?, 0x88d6e6?, 0xc00078a360?) /go/src/github.com/openshift/insights-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:161 +0x25 created by github.com/openshift/insights-operator/pkg/controller/periodic.(*Controller).Run /go/src/github.com/openshift/insights-operator/pkg/controller/periodic/periodic.go:161 +0x1ea {noformat} Version-Release number of selected component (if applicable): {code:none} {code} How reproducible: {code:none} Enable networking obfuscation for the Insights Operator and wait for gathering to happen in the operator. You will see the above stacktrace. {code} Steps to Reproduce: {code:none} 1. Create a HyperShift hosted cluster with OVN 2. 
Enable networking obfuscation for the Insights Operator 3. Wait for data gathering to happen in the operator {code} Actual results: {code:none} operator panics{code} Expected results: {code:none} there's no panic{code} Additional info: {code:none} {code} Status: ON_QA | |||
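The nil dereference in `getNetworksFromClusterNetworksConfig` comes from reading cluster-network entries out of a config object that a HyperShift hosted cluster never populates. A hedged sketch of the guard — the types and field names below are simplified stand-ins for the insights-operator's config structs, not its real API:

```go
package main

import (
	"errors"
	"fmt"
)

type clusterNetworkEntry struct {
	CIDR string
}

type networkConfig struct {
	// In a HyperShift hosted cluster this can legitimately be nil or empty,
	// because the management side owns the network configuration.
	ClusterNetworks []*clusterNetworkEntry
}

// networksForAnonymizer collects CIDRs for obfuscation, tolerating a nil
// config and nil entries instead of dereferencing them unconditionally.
func networksForAnonymizer(cfg *networkConfig) ([]string, error) {
	if cfg == nil {
		return nil, errors.New("network config not available; skipping obfuscation setup")
	}
	var cidrs []string
	for _, e := range cfg.ClusterNetworks {
		if e == nil {
			continue
		}
		cidrs = append(cidrs, e.CIDR)
	}
	return cidrs, nil
}

func main() {
	if _, err := networksForAnonymizer(nil); err != nil {
		fmt.Println("handled:", err)
	}
	cidrs, _ := networksForAnonymizer(&networkConfig{
		ClusterNetworks: []*clusterNetworkEntry{{CIDR: "10.128.0.0/14"}, nil},
	})
	fmt.Println(cidrs)
}
```

Returning an error lets the gatherer record a degraded-but-informative result instead of taking down the periodic gather goroutine.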
#OCPBUGS-29124 | issue | 7 weeks ago | [IBMCloud] Unhandled response during destroy disks POST |
Issue 15788246: [IBMCloud] Unhandled response during destroy disks Description: This is a clone of issue OCPBUGS-20085. The following is the description of the original issue: --- Description of problem: {code:none} During the destroy cluster operation, unexpected results from the IBM Cloud API calls for Disks can result in panics when response data (or responses) are missing, resulting in unexpected failures during destroy.{code} Version-Release number of selected component (if applicable): {code:none} 4.15{code} How reproducible: {code:none} Unknown, dependent on IBM Cloud API responses{code} Steps to Reproduce: {code:none} 1. Successfully create IPI cluster on IBM Cloud 2. Attempt to cleanup (destroy) the cluster {code} Actual results: {code:none} Golang panic attempting to parse a HTTP response that is missing or lacking data. level=info msg=Deleted instance "ci-op-97fkzvv2-e6ed7-5n5zg-master-0" E0918 18:03:44.787843 33 runtime.go:79] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference) goroutine 228 [running]: k8s.io/apimachinery/pkg/util/runtime.logPanic({0x6a3d760?, 0x274b5790}) /go/src/github.com/openshift/installer/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:75 +0x99 k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0xfffffffe?}) /go/src/github.com/openshift/installer/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:49 +0x75 panic({0x6a3d760, 0x274b5790}) /usr/lib/golang/src/runtime/panic.go:884 +0x213 github.com/openshift/installer/pkg/destroy/ibmcloud.(*ClusterUninstaller).waitForDiskDeletion.func1() /go/src/github.com/openshift/installer/pkg/destroy/ibmcloud/disk.go:84 +0x12a github.com/openshift/installer/pkg/destroy/ibmcloud.(*ClusterUninstaller).Retry(0xc000791ce0, 0xc000573700) /go/src/github.com/openshift/installer/pkg/destroy/ibmcloud/ibmcloud.go:99 +0x73 
github.com/openshift/installer/pkg/destroy/ibmcloud.(*ClusterUninstaller).waitForDiskDeletion(0xc000791ce0, {{0xc00160c060, 0x29}, {0xc00160c090, 0x28}, {0xc0016141f4, 0x9}, {0x82b9f0d, 0x4}, {0xc00160c060, ...}}) /go/src/github.com/openshift/installer/pkg/destroy/ibmcloud/disk.go:78 +0x14f github.com/openshift/installer/pkg/destroy/ibmcloud.(*ClusterUninstaller).destroyDisks(0xc000791ce0) /go/src/github.com/openshift/installer/pkg/destroy/ibmcloud/disk.go:118 +0x485 github.com/openshift/installer/pkg/destroy/ibmcloud.(*ClusterUninstaller).executeStageFunction.func1() /go/src/github.com/openshift/installer/pkg/destroy/ibmcloud/ibmcloud.go:201 +0x3f k8s.io/apimachinery/pkg/util/wait.ConditionFunc.WithContext.func1({0x7f7801e503c8, 0x18}) /go/src/github.com/openshift/installer/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:109 +0x1b k8s.io/apimachinery/pkg/util/wait.runConditionWithCrashProtectionWithContext({0x227a2f78?, 0xc00013c000?}, 0xc000a9b690?) /go/src/github.com/openshift/installer/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:154 +0x57 k8s.io/apimachinery/pkg/util/wait.poll({0x227a2f78, 0xc00013c000}, 0xd0?, 0x146fea5?, 0x7f7801e503c8?) /go/src/github.com/openshift/installer/vendor/k8s.io/apimachinery/pkg/util/wait/poll.go:245 +0x38 k8s.io/apimachinery/pkg/util/wait.PollImmediateInfiniteWithContext({0x227a2f78, 0xc00013c000}, 0x4136e7?, 0x28?) /go/src/github.com/openshift/installer/vendor/k8s.io/apimachinery/pkg/util/wait/poll.go:229 +0x49 k8s.io/apimachinery/pkg/util/wait.PollImmediateInfinite(0x100000000000000?, 0x806f00?) /go/src/github.com/openshift/installer/vendor/k8s.io/apimachinery/pkg/util/wait/poll.go:214 +0x46 github.com/openshift/installer/pkg/destroy/ibmcloud.(*ClusterUninstaller).executeStageFunction(0xc000791ce0, {{0x82bb9a3?, 0xc000a9b7d0?}, 0xc000111de0?}, 0x840366?, 0xc00054e900?) 
/go/src/github.com/openshift/installer/pkg/destroy/ibmcloud/ibmcloud.go:198 +0x108 created by github.com/openshift/installer/pkg/destroy/ibmcloud.(*ClusterUninstaller).destroyCluster /go/src/github.com/openshift/installer/pkg/destroy/ibmcloud/ibmcloud.go:172 +0xa87 panic: runtime error: invalid memory address or nil pointer dereference [recovered] panic: runtime error: invalid memory address or nil pointer dereference{code} Expected results: {code:none} Destroy IBM Cloud Disks during cluster destroy, or provide a useful error message to follow up on.{code} Additional info: {code:none} The ability to reproduce is relatively low, as it requires the IBM Cloud APIs to return specific data (or lack thereof), and it is currently unknown why the HTTP response and/or data is missing. IBM Cloud already has a PR to attempt to mitigate this issue, as was done with other destroy resource calls. Potentially follow up for additional resources as necessary. https://github.com/openshift/installer/pull/7515{code} Status: POST | |||
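The destroy-path panic is the classic SDK-caller pattern of dereferencing a response that may be nil when the call errors out or returns no payload. A defensive sketch of how the disk-deletion poll can check the `(response, error)` pair before touching it — the `diskResponse` type and `diskDeleted` function are illustrative, not the installer's or the IBM Cloud SDK's actual API:

```go
package main

import (
	"errors"
	"fmt"
)

type diskResponse struct {
	StatusCode int
	Disk       *struct{ Status string }
}

// diskDeleted inspects a (response, error) pair the way SDK callers should:
// never touching resp or resp.Disk before proving they are non-nil.
func diskDeleted(resp *diskResponse, err error) (bool, error) {
	if err != nil {
		return false, fmt.Errorf("disk lookup failed: %w", err)
	}
	if resp == nil {
		return false, errors.New("disk lookup returned no response")
	}
	if resp.StatusCode == 404 {
		return true, nil // gone: deletion finished
	}
	if resp.Disk == nil {
		return false, errors.New("disk lookup response missing payload")
	}
	return resp.Disk.Status == "deleted", nil
}

func main() {
	// Each of these would have been a nil-pointer panic without the guards.
	fmt.Println(diskDeleted(nil, nil))
	fmt.Println(diskDeleted(&diskResponse{StatusCode: 404}, nil))
	fmt.Println(diskDeleted(&diskResponse{StatusCode: 200}, nil))
}
```

The retry loop in the real uninstaller can then surface these errors per attempt instead of crashing the whole destroy operation, which is what the linked PR aims at.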
#OCPBUGS-4466 | issue | 2 days ago | [gcp][CORS-2420] deploying compact 3-nodes cluster on GCP, by setting mastersSchedulable as true and removing worker machineset YAMLs, got panic POST |
Issue 14967075: [gcp][CORS-2420] deploying compact 3-nodes cluster on GCP, by setting mastersSchedulable as true and removing worker machineset YAMLs, got panic Description: Description of problem: {code:none} deploying compact 3-nodes cluster on GCP, by setting mastersSchedulable as true and removing worker machineset YAMLs, got panic{code} Version-Release number of selected component (if applicable): {code:none} $ openshift-install version openshift-install 4.13.0-0.nightly-2022-12-04-194803 built from commit cc689a21044a76020b82902056c55d2002e454bd release image registry.ci.openshift.org/ocp/release@sha256:9e61cdf7bd13b758343a3ba762cdea301f9b687737d77ef912c6788cbd6a67ea release architecture amd64 {code} How reproducible: {code:none} Always{code} Steps to Reproduce: {code:none} 1. create manifests 2. set 'spec.mastersSchedulable' as 'true', in <installation dir>/manifests/cluster-scheduler-02-config.yml 3. remove the worker machineset YAML file from <installation dir>/openshift directory 4. create cluster {code} Actual results: {code:none} Got "panic: runtime error: index out of range [0] with length 0".{code} Expected results: {code:none} The installation should succeed, or give clear error messages. {code} Additional info: {code:none} $ openshift-install version openshift-install 4.13.0-0.nightly-2022-12-04-194803 built from commit cc689a21044a76020b82902056c55d2002e454bd release image registry.ci.openshift.org/ocp/release@sha256:9e61cdf7bd13b758343a3ba762cdea301f9b687737d77ef912c6788cbd6a67ea release architecture amd64 $ $ openshift-install create manifests --dir test1 ? SSH Public Key /home/fedora/.ssh/openshift-qe.pub ? Platform gcp INFO Credentials loaded from file "/home/fedora/.gcp/osServiceAccount.json" ? Project ID OpenShift QE (openshift-qe) ? Region us-central1 ? Base Domain qe.gcp.devcluster.openshift.com ? Cluster Name jiwei-1205a ? Pull Secret [?
for help] ****** INFO Manifests created in: test1/manifests and test1/openshift $ $ vim test1/manifests/cluster-scheduler-02-config.yml $ yq-3.3.0 r test1/manifests/cluster-scheduler-02-config.yml spec.mastersSchedulable true $ $ rm -f test1/openshift/99_openshift-cluster-api_worker-machineset-?.yaml $ $ tree test1 test1 ├── manifests │ ├── cloud-controller-uid-config.yml │ ├── cloud-provider-config.yaml │ ├── cluster-config.yaml │ ├── cluster-dns-02-config.yml │ ├── cluster-infrastructure-02-config.yml │ ├── cluster-ingress-02-config.yml │ ├── cluster-network-01-crd.yml │ ├── cluster-network-02-config.yml │ ├── cluster-proxy-01-config.yaml │ ├── cluster-scheduler-02-config.yml │ ├── cvo-overrides.yaml │ ├── kube-cloud-config.yaml │ ├── kube-system-configmap-root-ca.yaml │ ├── machine-config-server-tls-secret.yaml │ └── openshift-config-secret-pull-secret.yaml └── openshift ├── 99_cloud-creds-secret.yaml ├── 99_kubeadmin-password-secret.yaml ├── 99_openshift-cluster-api_master-machines-0.yaml ├── 99_openshift-cluster-api_master-machines-1.yaml ├── 99_openshift-cluster-api_master-machines-2.yaml ├── 99_openshift-cluster-api_master-user-data-secret.yaml ├── 99_openshift-cluster-api_worker-user-data-secret.yaml ├── 99_openshift-machineconfig_99-master-ssh.yaml ├── 99_openshift-machineconfig_99-worker-ssh.yaml ├── 99_role-cloud-creds-secret-reader.yaml └── openshift-install-manifests.yaml2 directories, 26 files $ $ openshift-install create cluster --dir test1 INFO Consuming Openshift Manifests from target directory INFO Consuming Master Machines from target directory INFO Consuming Worker Machines from target directory INFO Consuming OpenShift Install (Manifests) from target directory INFO Consuming Common Manifests from target directory INFO Credentials loaded from file "/home/fedora/.gcp/osServiceAccount.json" panic: runtime error: index out of range [0] with length 0goroutine 1 [running]: github.com/openshift/installer/pkg/tfvars/gcp.TFVars({{{0xc000cf6a40, 0xc}, 
{0x0, 0x0}, {0xc0011d4a80, 0x91d}}, 0x1, 0x1, {0xc0010abda0, 0x58}, ...}) /go/src/github.com/openshift/installer/pkg/tfvars/gcp/gcp.go:70 +0x66f github.com/openshift/installer/pkg/asset/cluster.(*TerraformVariables).Generate(0x1daff070, 0xc000cef530?) /go/src/github.com/openshift/installer/pkg/asset/cluster/tfvars.go:479 +0x6bf8 github.com/openshift/installer/pkg/asset/store.(*storeImpl).fetch(0xc000c78870, {0x1a777f40, 0x1daff070}, {0x0, 0x0}) /go/src/github.com/openshift/installer/pkg/asset/store/store.go:226 +0x5fa github.com/openshift/installer/pkg/asset/store.(*storeImpl).Fetch(0x7ffc4c21413b?, {0x1a777f40, 0x1daff070}, {0x1dadc7e0, 0x8, 0x8}) /go/src/github.com/openshift/installer/pkg/asset/store/store.go:76 +0x48 main.runTargetCmd.func1({0x7ffc4c21413b, 0x5}) /go/src/github.com/openshift/installer/cmd/openshift-install/create.go:259 +0x125 main.runTargetCmd.func2(0x1dae27a0?, {0xc000c702c0?, 0x2?, 0x2?}) /go/src/github.com/openshift/installer/cmd/openshift-install/create.go:289 +0xe7 github.com/spf13/cobra.(*Command).execute(0x1dae27a0, {0xc000c70280, 0x2, 0x2}) /go/src/github.com/openshift/installer/vendor/github.com/spf13/cobra/command.go:876 +0x67b github.com/spf13/cobra.(*Command).ExecuteC(0xc000c3a500) /go/src/github.com/openshift/installer/vendor/github.com/spf13/cobra/command.go:990 +0x3bd github.com/spf13/cobra.(*Command).Execute(...) /go/src/github.com/openshift/installer/vendor/github.com/spf13/cobra/command.go:918 main.installerMain() /go/src/github.com/openshift/installer/cmd/openshift-install/main.go:61 +0x2b0 main.main() /go/src/github.com/openshift/installer/cmd/openshift-install/main.go:38 +0xff $ {code} Status: POST Comment 21560887 by Patrick Dillon at 2023-01-17T15:10:46.762+0000 This issue is orthogonal to compact clusters, i.e. this happens any time the worker machineset manifests are deleted on these platforms. A meaningful error message would be more helpful to customers and we should try to fix any panics. 
Reopening and moving to ASSIGNED because I think this should be pretty straightforward. If there are bigger concerns we can reevaluate but I think we should fix rather than close. Comment 24458427 by UNKNOWN at 2024-04-02T12:55:56.937+0000 | |||
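The `index out of range [0] with length 0` panic is an unguarded `workerConfigs[0]` in the Terraform-variable generation once the worker machineset manifests have been deleted. A sketch of the kind of guard that turns the panic into the clear error Patrick asks for — the `tfVars` function and `machineConfig` type here are simplified stand-ins, not the installer's actual signatures:

```go
package main

import (
	"fmt"
)

type machineConfig struct {
	MachineType string
}

// tfVars derives Terraform variables from the parsed machine manifests.
// With the worker machineset YAMLs removed, workers is empty, and indexing
// workers[0] unconditionally is exactly the reported panic.
func tfVars(masters, workers []machineConfig) (map[string]string, error) {
	if len(masters) == 0 {
		return nil, fmt.Errorf("no master machine manifests found")
	}
	vars := map[string]string{"master_type": masters[0].MachineType}
	if len(workers) == 0 {
		// Compact/3-node cluster: fall back to the master shape instead
		// of indexing an empty slice.
		vars["worker_type"] = masters[0].MachineType
		return vars, nil
	}
	vars["worker_type"] = workers[0].MachineType
	return vars, nil
}

func main() {
	vars, err := tfVars([]machineConfig{{MachineType: "n2-standard-4"}}, nil)
	fmt.Println(vars, err)
}
```

Whether the right behavior is a fallback or a hard error with a readable message is a product decision; either way the length check has to come before the index.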
#OCPBUGS-25372 | issue | 3 months ago | vsphere-problem-detector-operator pod CrashLoopBackOff with panic CLOSED |
Issue 15678085: vsphere-problem-detector-operator pod CrashLoopBackOff with panic Description: Description of problem: {code:none} Found in QE's CI (with the vsphere-agent profile): the storage CO is not available and the vsphere-problem-detector-operator pod is in CrashLoopBackOff with a panic. (Find the must-gather here: https://gcsweb-qe-private-deck-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/qe-private-deck/logs/periodic-ci-openshift-openshift-tests-private-release-4.15-amd64-nightly-vsphere-agent-disconnected-ha-f14/1734850632575094784/artifacts/vsphere-agent-disconnected-ha-f14/gather-must-gather/) The storage CO reports "unable to find VM by UUID": - lastTransitionTime: "2023-12-13T09:15:27Z" message: "VSphereCSIDriverOperatorCRAvailable: VMwareVSphereControllerAvailable: unable to find VM ci-op-782gwsbd-b3d4e-master-2 by UUID \nVSphereProblemDetectorDeploymentControllerAvailable: Waiting for Deployment" reason: VSphereCSIDriverOperatorCR_VMwareVSphereController_vcenter_api_error::VSphereProblemDetectorDeploymentController_Deploying status: "False" type: Available (But I did not see "unable to find VM by UUID" in the vsphere-problem-detector-operator log in the must-gather) The vsphere-problem-detector-operator log: 2023-12-13T10:10:56.620216117Z I1213 10:10:56.620159 1 vsphere_check.go:149] Connected to vcenter.devqe.ibmc.devcluster.openshift.com as ci_user_01@devqe.ibmc.devcluster.openshift.com 2023-12-13T10:10:56.625161719Z I1213 10:10:56.625108 1 vsphere_check.go:271] CountVolumeTypes passed 2023-12-13T10:10:56.625291631Z I1213 10:10:56.625258 1 zones.go:124] Checking tags for multi-zone support. 2023-12-13T10:10:56.625449771Z I1213 10:10:56.625433 1 zones.go:202] No FailureDomains configured. Skipping check.
2023-12-13T10:10:56.625497726Z I1213 10:10:56.625487 1 vsphere_check.go:271] CheckZoneTags passed 2023-12-13T10:10:56.625531795Z I1213 10:10:56.625522 1 info.go:44] vCenter version is 8.0.2, apiVersion is 8.0.2.0 and build is 22617221 2023-12-13T10:10:56.625562833Z I1213 10:10:56.625555 1 vsphere_check.go:271] ClusterInfo passed 2023-12-13T10:10:56.625603236Z I1213 10:10:56.625594 1 datastore.go:312] checking datastore /DEVQEdatacenter/datastore/vsanDatastore for permissions 2023-12-13T10:10:56.669205822Z panic: runtime error: invalid memory address or nil pointer dereference 2023-12-13T10:10:56.669338411Z [signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x23096cb] 2023-12-13T10:10:56.669565413Z 2023-12-13T10:10:56.669591144Z goroutine 550 [running]: 2023-12-13T10:10:56.669838383Z github.com/openshift/vsphere-problem-detector/pkg/operator.getVM(0xc0005da6c0, 0xc0002d3b80) 2023-12-13T10:10:56.669991749Z github.com/openshift/vsphere-problem-detector/pkg/operator/vsphere_check.go:319 +0x3eb 2023-12-13T10:10:56.670212441Z github.com/openshift/vsphere-problem-detector/pkg/operator.(*vSphereChecker).enqueueSingleNodeChecks.func1() 2023-12-13T10:10:56.670289644Z github.com/openshift/vsphere-problem-detector/pkg/operator/vsphere_check.go:238 +0x55 2023-12-13T10:10:56.670490453Z github.com/openshift/vsphere-problem-detector/pkg/operator.(*CheckThreadPool).worker.func1(0xc000c88760?, 0x0?) 2023-12-13T10:10:56.670702592Z github.com/openshift/vsphere-problem-detector/pkg/operator/pool.go:40 +0x55 2023-12-13T10:10:56.671142070Z github.com/openshift/vsphere-problem-detector/pkg/operator.(*CheckThreadPool).worker(0xc000c78660, 0xc000c887a0?) 
2023-12-13T10:10:56.671331852Z github.com/openshift/vsphere-problem-detector/pkg/operator/pool.go:41 +0xe7 2023-12-13T10:10:56.671529761Z github.com/openshift/vsphere-problem-detector/pkg/operator.NewCheckThreadPool.func1() 2023-12-13T10:10:56.671589925Z github.com/openshift/vsphere-problem-detector/pkg/operator/pool.go:28 +0x25 2023-12-13T10:10:56.671776328Z created by github.com/openshift/vsphere-problem-detector/pkg/operator.NewCheckThreadPool 2023-12-13T10:10:56.671847478Z github.com/openshift/vsphere-problem-detector/pkg/operator/pool.go:27 +0x73 {code} Version-Release number of selected component (if applicable): {code:none} 4.15.0-0.nightly-2023-12-11-033133{code} How reproducible: {code:none} {code} Steps to Reproduce: {code:none} 1. See description 2. 3. {code} Actual results: {code:none} vpd panics{code} Expected results: {code:none} vpd should not panic{code} Additional info: {code:none} I suspect it is a privileges issue, but our pod should not panic.{code} Status: CLOSED Comment 23674939 by Manoj Hans at 2023-12-18T09:35:05.367+0000 I observed that cluster deployment succeeded when I added the uninitialized taint to the node (oc adm taint node "$NODE" node.cloudprovider.kubernetes.io/uninitialized=true:NoSchedule) after the bootstrap completed. However, I believe the vsphere-problem-detector-operator should provide meaningful information instead of causing a panic. I recommend addressing this issue on both the agent and storage sides. It's not a blocker for version 4.15, as we have a workaround for installing the cluster using the agent-based installation. Comment 23674965 by UNKNOWN at 2023-12-18T10:00:30.932+0000 | |||
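The stack points at `getVM` dereferencing a vCenter lookup result that is nil when the find-by-UUID query returns nothing (here, plausibly because the CI user's privileges hide the VM). A hedged sketch of the guard — the inventory map and function names are illustrative, not the operator's govmomi calls:

```go
package main

import (
	"fmt"
)

type vm struct{ Name string }

// findByUUID stands in for the vCenter SearchIndex query; like the real
// call, it can return (nil, nil) when nothing matches or when privileges
// hide the VM from the searching user.
func findByUUID(inventory map[string]*vm, uuid string) (*vm, error) {
	return inventory[uuid], nil
}

// checkNode reports a readable error instead of dereferencing a nil result,
// matching the message the storage CO already surfaces.
func checkNode(inventory map[string]*vm, uuid string) (string, error) {
	obj, err := findByUUID(inventory, uuid)
	if err != nil {
		return "", err
	}
	if obj == nil {
		return "", fmt.Errorf("unable to find VM by UUID %s (check vCenter privileges)", uuid)
	}
	return obj.Name, nil
}

func main() {
	inventory := map[string]*vm{"4230-aaaa": {Name: "master-0"}}
	fmt.Println(checkNode(inventory, "4230-aaaa"))
	fmt.Println(checkNode(inventory, "4230-ffff")) // missing: error, not panic
}
```

With the guard, the check thread pool records a failed check for that node and the operator keeps running instead of crash-looping.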
#OCPBUGS-26402 | issue | 7 weeks ago | operator-sdk panic "missing method DeprecationWarning" New |
Issue 15702358: operator-sdk panic "missing method DeprecationWarning" Description: Description of problem: {code:none} operator-sdk panic: interface conversion: *v1.Plugin is not plugin.Deprecated: missing method DeprecationWarning{code} Version-Release number of selected component (if applicable): {code:none} [cloud-user@preserve-olm-image-rhel9 build]$ ./operator-sdk version operator-sdk version: "v1.33.0", commit: "542966812906456a8d67cf7284fc6410b104e118", kubernetes version: "v1.27.0", go version: "go1.20.10", GOOS: "linux", GOARCH: "amd64"{code} How reproducible: {code:none} always{code} Steps to Reproduce: {code:none} 1. git clone git@github.com:operator-framework/operator-sdk.git git checkout master make build 2. run command "operator-sdk init --plugins=quarkus --domain=example.com" [cloud-user@preserve-olm-image-rhel9 build]$ ./operator-sdk init --plugins=quarkus --domain=example.com panic: interface conversion: *v1.Plugin is not plugin.Deprecated: missing method DeprecationWarning goroutine 1 [running]: sigs.k8s.io/kubebuilder/v3/pkg/cli.CLI.printDeprecationWarnings({{0x29230e3, 0xc}, {0xc0000be4d0, 0xab}, {0x29b4089, 0x37}, 0xc00061c570, 0xc00061c5a0, {0x3, 0x0}, ...}) /home/cloud-user/go/pkg/mod/sigs.k8s.io/kubebuilder/v3@v3.11.1/pkg/cli/cli.go:446 +0x7f sigs.k8s.io/kubebuilder/v3/pkg/cli.New({0xc00071f750?, 0xa?, 0xc0003188f0?}) /home/cloud-user/go/pkg/mod/sigs.k8s.io/kubebuilder/v3@v3.11.1/pkg/cli/cli.go:116 +0x198 github.com/operator-framework/operator-sdk/internal/cmd/operator-sdk/cli.GetPluginsCLIAndRoot() operator-sdk/internal/cmd/operator-sdk/cli/cli.go:161 +0x1a4a github.com/operator-framework/operator-sdk/internal/cmd/operator-sdk/cli.Run() operator-sdk/internal/cmd/operator-sdk/cli/cli.go:75 +0x27 main.main() operator-sdk/cmd/operator-sdk/main.go:28 +0x19 3. {code} Actual results: {code:none} {code} Expected results: {code:none} {code} Additional info: {code:none} {code} Status: New | |||
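The crash is a bare interface type assertion, `p.(plugin.Deprecated)`, applied to a plugin that does not implement that interface; Go panics with exactly the reported "missing method DeprecationWarning". The comma-ok form is the standard guard. A minimal sketch with stand-in interfaces (not kubebuilder's actual `plugin` package):

```go
package main

import (
	"fmt"
)

type Plugin interface{ Name() string }

// Deprecated marks plugins that carry a deprecation notice.
type Deprecated interface{ DeprecationWarning() string }

type quarkusPlugin struct{} // implements Plugin only, not Deprecated

func (quarkusPlugin) Name() string { return "quarkus.example.com" }

// printDeprecationWarnings uses the comma-ok assertion. The bare form
// p.(Deprecated) panics here with:
//   interface conversion: main.quarkusPlugin is not main.Deprecated:
//   missing method DeprecationWarning
func printDeprecationWarnings(plugins []Plugin) []string {
	var warnings []string
	for _, p := range plugins {
		if d, ok := p.(Deprecated); ok {
			warnings = append(warnings, d.DeprecationWarning())
		}
	}
	return warnings
}

func main() {
	warnings := printDeprecationWarnings([]Plugin{quarkusPlugin{}})
	fmt.Println("warnings:", len(warnings))
}
```

Plugins that do implement the interface still get their warning printed; everything else is silently skipped, which is the behavior a CLI wants here.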
#OCPBUGS-31702 | issue | 2 days ago | Autoscaler should scale from zero when taints do not have a "value" field CLOSED |
Issue 15917540: Autoscaler should scale from zero when taints do not have a "value" field Description: This is a clone of issue OCPBUGS-31464. The following is the description of the original issue: --- This is a clone of issue OCPBUGS-31421. The following is the description of the original issue: --- Description of problem:{code:none} When scaling from zero replicas, the cluster autoscaler can panic if there are taints on the machineset with no "value" field defined. {code} Version-Release number of selected component (if applicable):{code:none} 4.16/master {code} How reproducible:{code:none} always {code} Steps to Reproduce:{code:none} 1. create a machineset with a taint that has no value field and 0 replicas 2. enable the cluster autoscaler 3. force a workload to scale the tainted machineset {code} Actual results:{code:none} a panic like this is observed I0325 15:36:38.314276 1 clusterapi_provider.go:68] discovered node group: MachineSet/openshift-machine-api/k8hmbsmz-c2483-9dnddr4sjc (min: 0, max: 2, replicas: 0) panic: interface conversion: interface {} is nil, not string goroutine 79 [running]: k8s.io/autoscaler/cluster-autoscaler/cloudprovider/clusterapi.unstructuredToTaint(...) 
/go/src/k8s.io/autoscaler/cluster-autoscaler/cloudprovider/clusterapi/clusterapi_unstructured.go:246 k8s.io/autoscaler/cluster-autoscaler/cloudprovider/clusterapi.unstructuredScalableResource.Taints({0xc000103d40?, 0xc000121360?, 0xc002386f98?, 0x2?}) /go/src/k8s.io/autoscaler/cluster-autoscaler/cloudprovider/clusterapi/clusterapi_unstructured.go:214 +0x8a5 k8s.io/autoscaler/cluster-autoscaler/cloudprovider/clusterapi.(*nodegroup).TemplateNodeInfo(0xc002675930) /go/src/k8s.io/autoscaler/cluster-autoscaler/cloudprovider/clusterapi/clusterapi_nodegroup.go:266 +0x2ea k8s.io/autoscaler/cluster-autoscaler/core/utils.GetNodeInfoFromTemplate({0x276b230, 0xc002675930}, {0xc001bf2c00, 0x10, 0x10}, {0xc0023ffe60?, 0xc0023ffe90?}) /go/src/k8s.io/autoscaler/cluster-autoscaler/core/utils/utils.go:41 +0x9d k8s.io/autoscaler/cluster-autoscaler/processors/nodeinfosprovider.(*MixedTemplateNodeInfoProvider).Process(0xc00084f848, 0xc0023f7680, {0xc001dcdb00, 0x3, 0x0?}, {0xc001bf2c00, 0x10, 0x10}, {0xc0023ffe60, 0xc0023ffe90}, ...) /go/src/k8s.io/autoscaler/cluster-autoscaler/processors/nodeinfosprovider/mixed_nodeinfos_processor.go:155 +0x599 k8s.io/autoscaler/cluster-autoscaler/core.(*StaticAutoscaler).RunOnce(0xc000617550, {0x4?, 0x0?, 0x3a56f60?}) /go/src/k8s.io/autoscaler/cluster-autoscaler/core/static_autoscaler.go:352 +0xcaa main.run(0x0?, {0x2761b48, 0xc0004c04e0}) /go/src/k8s.io/autoscaler/cluster-autoscaler/main.go:529 +0x2cd main.main.func2({0x0?, 0x0?}) /go/src/k8s.io/autoscaler/cluster-autoscaler/main.go:617 +0x25 created by k8s.io/client-go/tools/leaderelection.(*LeaderElector).Run /go/src/k8s.io/autoscaler/cluster-autoscaler/vendor/k8s.io/client-go/tools/leaderelection/leaderelection.go:213 +0x105 {code} Expected results:{code:none} expect the machineset to scale up {code} Additional info: i think the e2e test that exercises this is only running on periodic jobs and as such we missed this error in OCPBUGS-27509 . 
[this search shows some failed results | https://search.dptools.openshift.org/?search=It+scales+from%2Fto+zero&maxAge=48h&context=1&type=bug%2Bissue%2Bjunit&name=&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job] Status: CLOSED Comment 24532843 by Zhaohua Sun at 2024-04-15T07:11:31.588+0000 Verified. clusterversion 4.13.0-0.nightly-2024-04-12-143349, autoscaler pod will not panic, machineset can scale up. | |||
#OCPBUGS-31051 | issue | 2 days ago | numaresources-controller-manager in CrashLoopBackOff - invalid memory address or nil pointer dereference CLOSED |
Issue 15886278: numaresources-controller-manager in CrashLoopBackOff - invalid memory address or nil pointer dereference Description: This is a clone of issue OCPBUGS-30923. The following is the description of the original issue: --- This is a clone of issue OCPBUGS-30342. The following is the description of the original issue: --- This is a clone of issue OCPBUGS-30236. The following is the description of the original issue: --- Description of problem: {code:none} Pod numaresources-controller-manager is in CrashLoopBackOff state{code} {code:none} oc get po -n openshift-numaresources NAME READY STATUS RESTARTS AGE numaresources-controller-manager-766c55596b-9nb6b 0/1 CrashLoopBackOff 163 (3m52s ago) 14h secondary-scheduler-85959757db-dvpdj 1/1 Running 0 14h{code} {code:none} oc logs -n openshift-numaresources numaresources-controller-manager-766c55596b-9nb6b ... I0305 07:32:51.102133 1 shared_informer.go:341] caches populated I0305 07:32:51.102210 1 controller.go:220] "Starting workers" controller="kubeletconfig" controllerGroup="machineconfiguration.openshift.io" controllerKind="KubeletConfig" worker count=1 I0305 07:32:51.102295 1 kubeletconfig_controller.go:69] "Starting KubeletConfig reconcile loop" object="/autosizing-master" I0305 07:32:51.102412 1 panic.go:884] "Finish KubeletConfig reconcile loop" object="/autosizing-master" I0305 07:32:51.102448 1 controller.go:115] "Observed a panic in reconciler: runtime error: invalid memory address or nil pointer dereference" controller="kubeletconfig" controllerGroup="machineconfiguration.openshift.io" controllerKind="KubeletConfig" KubeletConfig="autosizing-master" namespace="" name="autosizing-master" reconcileID="91d2c547-993c-4ae1-beab-1afc0a72af68" panic: runtime error: invalid memory address or nil pointer dereference [recovered] panic: runtime error: invalid memory address or nil pointer dereference [signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x1778a1c] goroutine 481 [running]: 
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile.func1() /remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:116 +0x1fa panic({0x19286e0, 0x2d16fc0}) /usr/lib/golang/src/runtime/panic.go:884 +0x213 github.com/openshift-kni/numaresources-operator/pkg/kubeletconfig.MCOKubeletConfToKubeletConf(...) /remote-source/app/pkg/kubeletconfig/kubeletconfig.go:29 github.com/openshift-kni/numaresources-operator/controllers.(*KubeletConfigReconciler).reconcileConfigMap(0xc0004d29c0, {0x1e475f0, 0xc000e31260}, 0xc000226c40, {{0x0?, 0xc000e31260?}, {0xc000b98498?, 0x2de08f8?}}) /remote-source/app/controllers/kubeletconfig_controller.go:126 +0x11c github.com/openshift-kni/numaresources-operator/controllers.(*KubeletConfigReconciler).Reconcile(0xc0004d29c0, {0x1e475f0, 0xc000e31260}, {{{0x0, 0x0}, {0xc000b98498, 0x11}}}) /remote-source/app/controllers/kubeletconfig_controller.go:90 +0x3cd sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile(0x1e4a2e0?, {0x1e475f0?, 0xc000e31260?}, {{{0x0?, 0xb?}, {0xc000b98498?, 0x0?}}}) /remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:119 +0xc8 sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0xc0004446e0, {0x1e47548, 0xc0003520f0}, {0x19b9940?, 0xc00093a1a0?}) /remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:316 +0x3ca sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc0004446e0, {0x1e47548, 0xc0003520f0}) /remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:266 +0x1d9 sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2() /remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:227 +0x85 created by 
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2 /remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:223 +0x587 {code} Version-Release number of selected component (if applicable): {code:none} numaresources-operator.v4.15.0 {code} How reproducible: {code:none} so far 100% {code} Steps to Reproduce: {code:none} 1. Create a KubeletConfig that configures autosizing: apiVersion: machineconfiguration.openshift.io/v1 kind: KubeletConfig metadata: name: autosizing-master spec: autoSizingReserved: true machineConfigPoolSelector: matchLabels: pools.operator.machineconfiguration.openshift.io/master: "" 2. Create a performance profile that targets subset of nodes 3. Proceed with numaresources-operator installation {code} Actual results: {code:none} Pod in CrashLoopBackOff state {code} Expected results: {code:none} numaresources-operator is successfully installed {code} Additional info: {code:none} Baremetal dualstack cluster deployed with GitOps-ZTP {code} Status: CLOSED | |||
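The SIGSEGV above originates in `MCOKubeletConfToKubeletConf`, which dereferences the embedded kubelet configuration even when the KubeletConfig (here, an autosizing-only one) carries none. A minimal sketch of the guard, using illustrative stand-in types rather than the operator's real ones:

```go
package main

import (
	"errors"
	"fmt"
)

// KubeletConfigSpec stands in for the MCO KubeletConfig spec; a config
// that only sets autoSizingReserved carries no embedded kubelet
// configuration, so the payload pointer is legitimately nil.
type KubeletConfigSpec struct {
	KubeletConfig *string
}

// toKubeletConf guards the nil case that crashed the reconcile loop,
// returning an error the controller can log and skip instead.
func toKubeletConf(spec *KubeletConfigSpec) (string, error) {
	if spec == nil || spec.KubeletConfig == nil {
		return "", errors.New("kubeletconfig payload is empty, skipping")
	}
	return *spec.KubeletConfig, nil
}

func main() {
	if _, err := toKubeletConf(&KubeletConfigSpec{}); err != nil {
		fmt.Println("handled:", err)
	}
}
```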
#OCPBUGS-27782 | issue | 3 weeks ago | CrashLoopBackOff Issue in machine-api-controllers Pod New |
{code:none} 2024-01-23T14:52:13.719668149Z panic: flag logtostderr set at /go/src/sigs.k8s.io/cluster-api-provider-openstack/cmd/manager/main.go:59 before being defined 2024-01-23T14:52:13.719668149Z | |||
#OCPBUGS-29125 | issue | 2 months ago | [IBMCloud] Unhandled response during destroy disks New |
Issue 15788247: [IBMCloud] Unhandled response during destroy disks Description: This is a clone of issue OCPBUGS-20085. The following is the description of the original issue: --- Description of problem: {code:none} During the destroy cluster operation, unexpected results from the IBM Cloud API calls for Disks can result in panics when response data (or responses) are missing, resulting in unexpected failures during destroy.{code} Version-Release number of selected component (if applicable): {code:none} 4.15{code} How reproducible: {code:none} Unknown, dependent on IBM Cloud API responses{code} Steps to Reproduce: {code:none} 1. Successfully create IPI cluster on IBM Cloud 2. Attempt to cleanup (destroy) the cluster {code} Actual results: {code:none} Golang panic attempting to parse a HTTP response that is missing or lacking data. level=info msg=Deleted instance "ci-op-97fkzvv2-e6ed7-5n5zg-master-0" E0918 18:03:44.787843 33 runtime.go:79] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference) goroutine 228 [running]: k8s.io/apimachinery/pkg/util/runtime.logPanic({0x6a3d760?, 0x274b5790}) /go/src/github.com/openshift/installer/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:75 +0x99 k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0xfffffffe?}) /go/src/github.com/openshift/installer/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:49 +0x75 panic({0x6a3d760, 0x274b5790}) /usr/lib/golang/src/runtime/panic.go:884 +0x213 github.com/openshift/installer/pkg/destroy/ibmcloud.(*ClusterUninstaller).waitForDiskDeletion.func1() /go/src/github.com/openshift/installer/pkg/destroy/ibmcloud/disk.go:84 +0x12a github.com/openshift/installer/pkg/destroy/ibmcloud.(*ClusterUninstaller).Retry(0xc000791ce0, 0xc000573700) /go/src/github.com/openshift/installer/pkg/destroy/ibmcloud/ibmcloud.go:99 +0x73 
github.com/openshift/installer/pkg/destroy/ibmcloud.(*ClusterUninstaller).waitForDiskDeletion(0xc000791ce0, {{0xc00160c060, 0x29}, {0xc00160c090, 0x28}, {0xc0016141f4, 0x9}, {0x82b9f0d, 0x4}, {0xc00160c060, ...}}) /go/src/github.com/openshift/installer/pkg/destroy/ibmcloud/disk.go:78 +0x14f github.com/openshift/installer/pkg/destroy/ibmcloud.(*ClusterUninstaller).destroyDisks(0xc000791ce0) /go/src/github.com/openshift/installer/pkg/destroy/ibmcloud/disk.go:118 +0x485 github.com/openshift/installer/pkg/destroy/ibmcloud.(*ClusterUninstaller).executeStageFunction.func1() /go/src/github.com/openshift/installer/pkg/destroy/ibmcloud/ibmcloud.go:201 +0x3f k8s.io/apimachinery/pkg/util/wait.ConditionFunc.WithContext.func1({0x7f7801e503c8, 0x18}) /go/src/github.com/openshift/installer/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:109 +0x1b k8s.io/apimachinery/pkg/util/wait.runConditionWithCrashProtectionWithContext({0x227a2f78?, 0xc00013c000?}, 0xc000a9b690?) /go/src/github.com/openshift/installer/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:154 +0x57 k8s.io/apimachinery/pkg/util/wait.poll({0x227a2f78, 0xc00013c000}, 0xd0?, 0x146fea5?, 0x7f7801e503c8?) /go/src/github.com/openshift/installer/vendor/k8s.io/apimachinery/pkg/util/wait/poll.go:245 +0x38 k8s.io/apimachinery/pkg/util/wait.PollImmediateInfiniteWithContext({0x227a2f78, 0xc00013c000}, 0x4136e7?, 0x28?) /go/src/github.com/openshift/installer/vendor/k8s.io/apimachinery/pkg/util/wait/poll.go:229 +0x49 k8s.io/apimachinery/pkg/util/wait.PollImmediateInfinite(0x100000000000000?, 0x806f00?) /go/src/github.com/openshift/installer/vendor/k8s.io/apimachinery/pkg/util/wait/poll.go:214 +0x46 github.com/openshift/installer/pkg/destroy/ibmcloud.(*ClusterUninstaller).executeStageFunction(0xc000791ce0, {{0x82bb9a3?, 0xc000a9b7d0?}, 0xc000111de0?}, 0x840366?, 0xc00054e900?) 
/go/src/github.com/openshift/installer/pkg/destroy/ibmcloud/ibmcloud.go:198 +0x108 created by github.com/openshift/installer/pkg/destroy/ibmcloud.(*ClusterUninstaller).destroyCluster /go/src/github.com/openshift/installer/pkg/destroy/ibmcloud/ibmcloud.go:172 +0xa87 panic: runtime error: invalid memory address or nil pointer dereference [recovered] panic: runtime error: invalid memory address or nil pointer dereference{code} Expected results: {code:none} Destroy IBM Cloud Disks during cluster destroy, or provide a useful error message to follow up on.{code} Additional info: {code:none} The ability to reproduce is relatively low, as it requires the IBM Cloud APIs to return specific data (or lack thereof), and it is currently unknown why the HTTP response and/or data is missing. IBM Cloud already has a PR to attempt to mitigate this issue, as was done with other destroy resource calls. Potential follow-up for additional resources as necessary. https://github.com/openshift/installer/pull/7515{code} Status: New | |||
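The panic in `waitForDiskDeletion` comes from dereferencing a response that the cloud API never populated. A minimal sketch of the guard pattern, with an illustrative `disk` type and result triple standing in for what the IBM Cloud VPC SDK actually returns:

```go
package main

import "fmt"

// disk and the (result, status, err) triple below are illustrative
// stand-ins; the installer's panic came from dereferencing a result
// that was never populated.
type disk struct{ Name string }

// describeDisk checks every layer before dereferencing: a nil result
// alongside a nil error is still possible and must be reported, not
// dereferenced.
func describeDisk(d *disk, httpStatus int, err error) string {
	if err != nil {
		return fmt.Sprintf("request failed: %v", err)
	}
	if d == nil {
		return fmt.Sprintf("no disk in response (HTTP %d)", httpStatus)
	}
	return "disk " + d.Name
}

func main() {
	fmt.Println(describeDisk(nil, 404, nil))
}
```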
#OCPBUGS-26931 | issue | 2 months ago | siteconfig-generator panics when handling a node level NMStateConfig override Verified |
Issue 15710531: siteconfig-generator panics when handling a node level NMStateConfig override Description: Description of problem: {code:none} When using a node level crTemplate to over-ride the NMStateconfig CR, the siteconfig-generator panics while trying to import the CR.{code} Version-Release number of selected component (if applicable): {code:none} {code} How reproducible: {code:none} 100%{code} Steps to Reproduce: {code:none} 1. Use these siteconfig and nmstateconfig yaml files --- apiVersion: ran.openshift.io/v1 kind: SiteConfig metadata: name: "ex-ocp1" namespace: "ex-ocp1" spec: baseDomain: "example.com" pullSecretRef: name: "assisted-deployment-pull-secret" clusterImageSetNameRef: "ocp-4.12" sshPublicKey: "ssh-key" clusters: - clusterName: "ex-ocp1" #extraManifestPath: testSiteConfig/testUserExtraManifest networkType: "OVNKubernetes" clusterLabels: common: true sites: "ex-ocp1" group: "ex1" clusterNetwork: - cidr: 10.10.11.0/14 hostPrefix: 31 apiVIP: 10.10.10.5 ingressVIP: 10.10.10.6 serviceNetwork: - 10.10.12.0/16 additionalNTPSources: - 10.10.11.50 nodes: - hostName: master01 crTemplates: NMStateConfig: "nmstateconfig.yaml" role: "master" bmcAddress: redfish-virtualmedia://10.10.10.25/redfish/v1/Systems/1 bmcCredentialsName: name: "ex-ocp1-secret" bootMACAddress: "00:00:00:00:00:80" bootMode: "UEFI" rootDeviceHints: deviceName: "/dev/disk/by-path/pci-0000" apiVersion: agent-install.openshift.io/v1beta1 --- kind: NMStateConfig metadata: annotations: argocd.argoproj.io/sync-wave: "1" name: "master01" namespace: "ex-ocp1" labels: nmstate-label: "ex-ocp1" spec: interfaces: - name: eno5 macAddress: 00:00:00:00:00:18 config: dns-resolver: config: server: - 10.10.11.10 - 10.10.11.11 2. Build the siteconfig-generator 3. 
./siteconfig-generator -manifestPath ../source-crs/extra-manifest sc.yaml {code} Actual results: {code:none} [dahir@thinkpad siteconfig-generator]$ ./siteconfig-generator -manifestPath ../source-crs/extra-manifest sc.yaml 2024/01/10 15:59:39 Overriding NMStateConfig with "nmstateconfig.yaml" for node master01 in cluster ex-ocp1 panic: interface conversion: interface {} is string, not map[string]interface {}goroutine 1 [running]: github.com/openshift-kni/cnf-features-deploy/ztp/siteconfig-generator/siteConfig.(*SiteConfigBuilder).getClusterCR(0xc00068f200?, 0x28000000014e9f20?, {{0xc000046360, 0x13}, {0xc00044c2e4, 0xa}, {{0xc00044c310, 0x7}, {0xc00044c320, 0x7}, ...}, ...}, ...) /home/dahir/src/cnf-features-deploy/ztp/siteconfig-generator/siteConfig/siteConfigBuilder.go:365 +0x7f7 github.com/openshift-kni/cnf-features-deploy/ztp/siteconfig-generator/siteConfig.(*SiteConfigBuilder).getClusterCR(0x19f39c8?, 0x56?, {{0xc000046360, 0x13}, {0xc00044c2e4, 0xa}, {{0xc00044c310, 0x7}, {0xc00044c320, 0x7}, ...}, ...}, ...) /home/dahir/src/cnf-features-deploy/ztp/siteconfig-generator/siteConfig/siteConfigBuilder.go:357 +0x505 github.com/openshift-kni/cnf-features-deploy/ztp/siteconfig-generator/siteConfig.(*SiteConfigBuilder).getClusterCR(0x15693c0?, 0xc000486868?, {{0xc000046360, 0x13}, {0xc00044c2e4, 0xa}, {{0xc00044c310, 0x7}, {0xc00044c320, 0x7}, ...}, ...}, ...) /home/dahir/src/cnf-features-deploy/ztp/siteconfig-generator/siteConfig/siteConfigBuilder.go:357 +0x505 github.com/openshift-kni/cnf-features-deploy/ztp/siteconfig-generator/siteConfig.(*SiteConfigBuilder).getClusterCR(0xc000486868?, 0x0?, {{0xc000046360, 0x13}, {0xc00044c2e4, 0xa}, {{0xc00044c310, 0x7}, {0xc00044c320, 0x7}, ...}, ...}, ...) 
/home/dahir/src/cnf-features-deploy/ztp/siteconfig-generator/siteConfig/siteConfigBuilder.go:357 +0x505 github.com/openshift-kni/cnf-features-deploy/ztp/siteconfig-generator/siteConfig.(*SiteConfigBuilder).getClusterCR(0x590a20?, 0xc00015ab58?, {{0xc000046360, 0x13}, {0xc00044c2e4, 0xa}, {{0xc00044c310, 0x7}, {0xc00044c320, 0x7}, ...}, ...}, ...) /home/dahir/src/cnf-features-deploy/ztp/siteconfig-generator/siteConfig/siteConfigBuilder.go:357 +0x505 github.com/openshift-kni/cnf-features-deploy/ztp/siteconfig-generator/siteConfig.(*SiteConfigBuilder).getClusterCRs.func2(0x15693c0?) /home/dahir/src/cnf-features-deploy/ztp/siteconfig-generator/siteConfig/siteConfigBuilder.go:173 +0x4d github.com/openshift-kni/cnf-features-deploy/ztp/siteconfig-generator/siteConfig.(*SiteConfigBuilder).instantiateCR(0x1762d62?, {0xc0003678a0, 0x20}, 0x2?, 0xc000032fd0, 0xc000033010) /home/dahir/src/cnf-features-deploy/ztp/siteconfig-generator/siteConfig/siteConfigBuilder.go:310 +0x602 github.com/openshift-kni/cnf-features-deploy/ztp/siteconfig-generator/siteConfig.(*SiteConfigBuilder).getClusterCRs(0xc000033de8, 0x0, {{0xc000046360, 0x13}, {0xc00044c2e4, 0xa}, {{0xc00044c310, 0x7}, {0xc00044c320, 0x7}, ...}, ...}) /home/dahir/src/cnf-features-deploy/ztp/siteconfig-generator/siteConfig/siteConfigBuilder.go:167 +0xcb0 github.com/openshift-kni/cnf-features-deploy/ztp/siteconfig-generator/siteConfig.(*SiteConfigBuilder).Build(0xc0001fbc38?, {{0xc000046360, 0x13}, {0xc00044c2e4, 0xa}, {{0xc00044c310, 0x7}, {0xc00044c320, 0x7}, 0x0}, ...}) /home/dahir/src/cnf-features-deploy/ztp/siteconfig-generator/siteConfig/siteConfigBuilder.go:79 +0x265 main.main() /home/dahir/src/cnf-features-deploy/ztp/siteconfig-generator/main.go:49 +0x565 {code} Expected results: {code:none} The resulting CRs for the AI{code} Additional info: {code:none} {code} Status: Verified | |||
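The "interface {} is string, not map[string]interface {}" panic at siteConfigBuilder.go:365 is an unchecked assertion on a YAML node: the crTemplate override supplies a scalar (the file name) where a mapping was expected. A minimal sketch of turning that into an error instead of a panic (function name and signature are illustrative, not the generator's real code):

```go
package main

import "fmt"

// getMapField is the defensive version of the assertion: a YAML scalar
// where a mapping was expected surfaces as an error, not a panic.
func getMapField(v interface{}) (map[string]interface{}, error) {
	m, ok := v.(map[string]interface{})
	if !ok {
		return nil, fmt.Errorf("expected a mapping, got %T", v)
	}
	return m, nil
}

func main() {
	// A string where the builder expected a parsed CR mapping, as in
	// the node-level NMStateConfig override above.
	if _, err := getMapField("nmstateconfig.yaml"); err != nil {
		fmt.Println("override rejected:", err)
	}
}
```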
#OCPBUGS-31641 | issue | 2 days ago | Internal Registry does not recognize the `ca-west-1` AWS Region CLOSED |
Issue 15914720: Internal Registry does not recognize the `ca-west-1` AWS Region Description: This is a clone of issue OCPBUGS-29233. The following is the description of the original issue: --- Description of problem: {code:none} Internal registry Pods will panic while deploying OCP on `ca-west-1` AWS Region{code} Version-Release number of selected component (if applicable): {code:none} 4.14.2 {code} How reproducible: {code:none} Every time {code} Steps to Reproduce: {code:none} 1. Deploy OCP on `ca-west-1` AWS Region {code} Actual results: {code:none} $ oc logs image-registry-85b69cd9fc-b78sb -n openshift-image-registry time="2024-02-08T11:43:09.287006584Z" level=info msg="start registry" distribution_version=v3.0.0+unknown go.version="go1.20.10 X:strictfipsruntime" openshift_version=4.14.0-202311021650.p0.g5e7788a.assembly.stream-5e7788a time="2024-02-08T11:43:09.287365337Z" level=info msg="caching project quota objects with TTL 1m0s" go.version="go1.20.10 X:strictfipsruntime" panic: invalid region provided: ca-west-1goroutine 1 [running]: github.com/distribution/distribution/v3/registry/handlers.NewApp({0x2873f40?, 0xc00005c088?}, 0xc000581800) /go/src/github.com/openshift/image-registry/vendor/github.com/distribution/distribution/v3/registry/handlers/app.go:130 +0x2bf1 github.com/openshift/image-registry/pkg/dockerregistry/server/supermiddleware.NewApp({0x2873f40, 0xc00005c088}, 0x0?, {0x2876820?, 0xc000676cf0}) /go/src/github.com/openshift/image-registry/pkg/dockerregistry/server/supermiddleware/app.go:96 +0xb9 github.com/openshift/image-registry/pkg/dockerregistry/server.NewApp({0x2873f40?, 0xc00005c088}, {0x285ffd0?, 0xc000916070}, 0xc000581800, 0xc00095c000, {0x0?, 0x0}) /go/src/github.com/openshift/image-registry/pkg/dockerregistry/server/app.go:138 +0x485 github.com/openshift/image-registry/pkg/cmd/dockerregistry.NewServer({0x2873f40, 0xc00005c088}, 0xc000581800, 0xc00095c000) 
/go/src/github.com/openshift/image-registry/pkg/cmd/dockerregistry/dockerregistry.go:212 +0x38a github.com/openshift/image-registry/pkg/cmd/dockerregistry.Execute({0x2858b60, 0xc000916000}) /go/src/github.com/openshift/image-registry/pkg/cmd/dockerregistry/dockerregistry.go:166 +0x86b main.main() /go/src/github.com/openshift/image-registry/cmd/dockerregistry/main.go:93 +0x496 {code} Expected results: {code:none} The internal registry is deployed with no issues {code} Additional info: {code:none} This is a new AWS Region we are adding support to. The support will be backported to 4.14.z {code} Status: CLOSED | |||
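The "invalid region provided: ca-west-1" panic is a static region table in the vendored storage driver that predates the new AWS Region; the real fix is updating that table. A minimal sketch of the validation logic involved, with a hypothetical, abbreviated region map and the common escape hatch of tolerating unknown regions when a custom endpoint is configured:

```go
package main

import "fmt"

// knownRegions is a hypothetical, abbreviated copy of the static region
// table the S3 storage driver validates against; the real fix is to
// vendor a release whose table includes ca-west-1.
var knownRegions = map[string]bool{
	"us-east-1":    true,
	"ca-central-1": true,
	// "ca-west-1" missing -> "invalid region provided" upstream
}

// validateRegion returns an error for unknown regions unless a custom
// regionEndpoint is configured, rather than panicking at startup.
func validateRegion(region, endpoint string) error {
	if knownRegions[region] || endpoint != "" {
		return nil
	}
	return fmt.Errorf("invalid region provided: %s", region)
}

func main() {
	fmt.Println(validateRegion("ca-west-1", ""))
	fmt.Println(validateRegion("ca-west-1", "https://s3.ca-west-1.amazonaws.com"))
}
```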
#OCPBUGS-31893 | issue | 3 weeks ago | The SR-IOV operator pod crashes if 'enableInjector' is set to nil New |
Issue 15927300: The SR-IOV operator pod crashes if 'enableInjector' is set to nil Description: Description of problem: {code:java} apiVersion: sriovnetwork.openshift.io/v1 kind: SriovOperatorConfig metadata: creationTimestamp: "2024-04-07T13:58:37Z" generation: 37 name: default namespace: openshift-sriov-network-operator resourceVersion: "1209105" uid: 468944a1-0d98-4e92-9de0-9f763b49fd85 spec: enableOperatorWebhook: true logLevel: 2{code} {code:java} NAME READY STATUS RESTARTS AGE network-resources-injector-2gc5t 1/1 Running 0 8m35s network-resources-injector-rp429 1/1 Running 0 8m35s network-resources-injector-v9w5g 1/1 Running 0 8m34s operator-webhook-gpx8x 1/1 Running 0 8m34s operator-webhook-n8dxh 1/1 Running 0 8m34s operator-webhook-zgvmr 1/1 Running 0 8m34s sriov-network-config-daemon-7pv5q 1/1 Running 0 8m33s sriov-network-config-daemon-8wxb7 1/1 Running 0 8m33s sriov-network-operator-55f99d5b9-h5gnd 0/1 CrashLoopBackOff 2 (16s ago) 8m33s{code} {code:java} 2024-04-08T15:43:59.462468201Z INFO syncWebhookObjs controllers/sriovoperatorconfig_controller.go:114 Start to sync webhook objects 2024-04-08T15:43:59.465324559Z INFO runtime/panic.go:884 Observed a panic in reconciler: runtime error: invalid memory address or nil pointer dereference {"controller": "sriovoperatorconfig", "controllerGroup": "sriovnetwork.openshift.io", "controllerKind": "SriovOperatorConfig", "SriovOperatorConfig": {"name":"default","namespace":"openshift-sriov-network-operator"}, "namespace": "openshift-sriov-network-operator", "name": "default", "reconcileID": "c2c342a2-3afc-436e-bfd4-b513a5bbaef4"} panic: runtime error: invalid memory address or nil pointer dereference [recovered] panic: runtime error: invalid memory address or nil pointer dereference [signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x19cf6a7] goroutine 404 [running]: sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile.func1() 
/go/src/github.com/k8snetworkplumbingwg/sriov-network-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:116 +0x1fa panic({0x1c41460, 0x307ae30}) /usr/lib/golang/src/runtime/panic.go:884 +0x213 github.com/k8snetworkplumbingwg/sriov-network-operator/controllers.(*SriovOperatorConfigReconciler).syncWebhookObjs(0xc000054640, {0x2200588, 0xc000b11200}, 0xc000005500) /go/src/github.com/k8snetworkplumbingwg/sriov-network-operator/controllers/sriovoperatorconfig_controller.go:268 +0x6e7 github.com/k8snetworkplumbingwg/sriov-network-operator/controllers.(*SriovOperatorConfigReconciler).Reconcile(0xc000054640, {0x2200588, 0xc000b11200}, {{{0xc000715460, 0x20}, {0xc0009f9f80, 0x7}}}) /go/src/github.com/k8snetworkplumbingwg/sriov-network-operator/controllers/sriovoperatorconfig_controller.go:114 +0x2cf sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile(0x2202de0?, {0x2200588?, 0xc000b11200?}, {{{0xc000715460?, 0xb?}, {0xc0009f9f80?, 0x0?}}}) /go/src/github.com/k8snetworkplumbingwg/sriov-network-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:119 +0xc8 sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0xc000498000, {0x22004e0, 0xc00069e0f0}, {0x1d0ad00?, 0xc0006a7440?}) /go/src/github.com/k8snetworkplumbingwg/sriov-network-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:316 +0x3ca sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc000498000, {0x22004e0, 0xc00069e0f0}) /go/src/github.com/k8snetworkplumbingwg/sriov-network-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:266 +0x1d9 sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2() /go/src/github.com/k8snetworkplumbingwg/sriov-network-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:227 +0x85 created by 
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2 /go/src/github.com/k8snetworkplumbingwg/sriov-network-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:223 +0x587 {code} Version-Release number of selected component (if applicable): 4.15.6 How reproducible: 100% Steps to Reproduce: 1. 'enableInjector' is set to nil 2. 3. Actual results: Expected results: Additional info: test link: [https://github.com/k8snetworkplumbingwg/sriov-network-operator/blob/master/test/conformance/tests/test_sriov_operator.go#L1803] Status: New | |||
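The reported SriovOperatorConfig omits `enableInjector`, so the corresponding optional field decodes to a nil pointer, and the reconciler dereferences it. A minimal sketch of the usual guard for optional `*bool` CR fields (the default chosen here is illustrative, not the operator's documented default):

```go
package main

import "fmt"

// boolOrDefault safely dereferences an optional *bool CR field such as
// SriovOperatorConfig.Spec.EnableInjector, which is nil whenever the
// field is omitted from the manifest (as in this bug).
func boolOrDefault(b *bool, def bool) bool {
	if b == nil {
		return def
	}
	return *b
}

func main() {
	var enableInjector *bool // omitted from the CR
	fmt.Println("injector enabled:", boolOrDefault(enableInjector, false))
}
```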
#OCPBUGS-31808 | issue | 3 days ago | control-plane-machine-set operator pod stuck into crashloopbackoff state with the nil pointer dereference runtime error Verified |
Issue 15923264: control-plane-machine-set operator pod stuck into crashloopbackoff state with the nil pointer dereference runtime error Description: Description of problem:{code:none} control-plane-machine-set operator pod stuck into crashloopbackoff state with panic: runtime error: invalid memory address or nil pointer dereference while extracting the failureDomain from the controlplanemachineset. Below is the error trace for reference. ~~~ 2024-04-04T09:32:23.594257072Z I0404 09:32:23.594176 1 controller.go:146] "msg"="Finished reconciling control plane machine set" "controller"="controlplanemachinesetgenerator" "name"="cluster" "namespace"="openshift-machine-api" "reconcileID"="c282f3e3-9f9d-40df-a24e-417ba2ea4106" 2024-04-04T09:32:23.594257072Z I0404 09:32:23.594221 1 controller.go:125] "msg"="Reconciling control plane machine set" "controller"="controlplanemachinesetgenerator" "name"="cluster" "namespace"="openshift-machine-api" "reconcileID"="7f03c05f-2717-49e0-95f8-3e8b2ce2fc55" 2024-04-04T09:32:23.594274974Z I0404 09:32:23.594257 1 controller.go:146] "msg"="Finished reconciling control plane machine set" "controller"="controlplanemachinesetgenerator" "name"="cluster" "namespace"="openshift-machine-api" "reconcileID"="7f03c05f-2717-49e0-95f8-3e8b2ce2fc55" 2024-04-04T09:32:23.597509741Z I0404 09:32:23.597426 1 watch_filters.go:179] reconcile triggered by infrastructure change 2024-04-04T09:32:23.606311553Z I0404 09:32:23.606243 1 controller.go:220] "msg"="Starting workers" "controller"="controlplanemachineset" "worker count"=1 2024-04-04T09:32:23.606360950Z I0404 09:32:23.606340 1 controller.go:169] "msg"="Reconciling control plane machine set" "controller"="controlplanemachineset" "name"="cluster" "namespace"="openshift-machine-api" "reconcileID"="5dac54f4-57ab-419b-b258-79136ca8b400" 2024-04-04T09:32:23.609322467Z I0404 09:32:23.609217 1 panic.go:884] "msg"="Finished reconciling control plane machine set" "controller"="controlplanemachineset" 
"name"="cluster" "namespace"="openshift-machine-api" "reconcileID"="5dac54f4-57ab-419b-b258-79136ca8b400" 2024-04-04T09:32:23.609322467Z I0404 09:32:23.609271 1 controller.go:115] "msg"="Observed a panic in reconciler: runtime error: invalid memory address or nil pointer dereference" "controller"="controlplanemachineset" "reconcileID"="5dac54f4-57ab-419b-b258-79136ca8b400" 2024-04-04T09:32:23.612540681Z panic: runtime error: invalid memory address or nil pointer dereference [recovered] 2024-04-04T09:32:23.612540681Z panic: runtime error: invalid memory address or nil pointer dereference 2024-04-04T09:32:23.612540681Z [signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x1a5911c] 2024-04-04T09:32:23.612540681Z 2024-04-04T09:32:23.612540681Z goroutine 255 [running]: 2024-04-04T09:32:23.612540681Z sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile.func1() 2024-04-04T09:32:23.612571624Z /go/src/github.com/openshift/cluster-control-plane-machine-set-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:116 +0x1fa 2024-04-04T09:32:23.612571624Z panic({0x1c8ac60, 0x31c6ea0}) 2024-04-04T09:32:23.612571624Z /usr/lib/golang/src/runtime/panic.go:884 +0x213 2024-04-04T09:32:23.612571624Z github.com/openshift/cluster-control-plane-machine-set-operator/pkg/machineproviders/providers/openshift/machine/v1beta1/providerconfig.VSphereProviderConfig.ExtractFailureDomain(...) 
2024-04-04T09:32:23.612571624Z /go/src/github.com/openshift/cluster-control-plane-machine-set-operator/pkg/machineproviders/providers/openshift/machine/v1beta1/providerconfig/vsphere.go:120 2024-04-04T09:32:23.612571624Z github.com/openshift/cluster-control-plane-machine-set-operator/pkg/machineproviders/providers/openshift/machine/v1beta1/providerconfig.providerConfig.ExtractFailureDomain({{0x1f2a71a, 0x7}, {{{{...}, {...}}, {{...}, {...}, {...}, {...}, {...}, {...}, ...}, ...}}, ...}) 2024-04-04T09:32:23.612588145Z /go/src/github.com/openshift/cluster-control-plane-machine-set-operator/pkg/machineproviders/providers/openshift/machine/v1beta1/providerconfig/providerconfig.go:212 +0x23c ~~~ {code} Version-Release number of selected component (if applicable):{code:none} {code} How reproducible:{code:none} {code} Steps to Reproduce:{code:none} 1. 2. 3. {code} Actual results:{code:none} control-plane-machine-set operator stuck into crashloopback off state while cluster upgrade. {code} Expected results:{code:none} control-plane-machine-set operator should be upgraded without any errors. {code} Additional info:{code:none} This is happening during the cluster upgrade of Vsphere IPI cluster from OCP version 4.14.z to 4.15.6 and may impact other z stream releases. from the official docs[1] I see providing the failure domain for the Vsphere platform is tech preview feature. [1] https://docs.openshift.com/container-platform/4.15/machine_management/control_plane_machine_management/cpmso-configuration.html#cpmso-yaml-failure-domain-vsphere_cpmso-configuration {code} Status: Verified ... 
I0424 07:02:35.617844 1 panic.go:884] "msg"="Finished reconciling control plane machine set" "controller"="controlplanemachineset" "name"="cluster" "namespace"="openshift-machine-api" "reconcileID"="394fd0b0-8f9f-4680-a2a0-00e407b3c5e6" I0424 07:02:35.617968 1 controller.go:115] "msg"="Observed a panic in reconciler: runtime error: invalid memory address or nil pointer dereference" "controller"="controlplanemachineset" "reconcileID"="394fd0b0-8f9f-4680-a2a0-00e407b3c5e6" panic: runtime error: invalid memory address or nil pointer dereference [recovered] panic: runtime error: invalid memory address or nil pointer dereference [signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x1a5911c] /go/src/github.com/openshift/cluster-control-plane-machine-set-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:116 +0x1fa panic({0x1c8ac60, 0x31c6ea0}) /usr/lib/golang/src/runtime/panic.go:884 +0x213 github.com/openshift/cluster-control-plane-machine-set-operator/pkg/machineproviders/providers/openshift/machine/v1beta1/providerconfig.VSphereProviderConfig.ExtractFailureDomain(...) | |||
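The panic at vsphere.go:120 is `ExtractFailureDomain` dereferencing failure-domain data that a cluster upgraded from 4.14.z may simply not have. A minimal sketch of degrading gracefully instead, under illustrative stand-in types (not the CPMSO's real provider-config API):

```go
package main

import (
	"errors"
	"fmt"
)

// failureDomain and vsphereConfig are illustrative stand-ins; the point
// is that a provider config with no failure domains must be a handled
// case, not a dereference.
type failureDomain struct{ Name string }

type vsphereConfig struct {
	failureDomains []failureDomain
}

// extractFailureDomain returns an error the reconciler can report as a
// degraded condition instead of panicking.
func (c vsphereConfig) extractFailureDomain() (failureDomain, error) {
	if len(c.failureDomains) == 0 {
		return failureDomain{}, errors.New("no failure domains configured")
	}
	return c.failureDomains[0], nil
}

func main() {
	if _, err := (vsphereConfig{}).extractFailureDomain(); err != nil {
		fmt.Println("degraded instead of panicking:", err)
	}
}
```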
#OCPBUGS-32176 | issue | 5 days ago | cluster-etcd-operator panic in CI CLOSED |
Issue 15938764: cluster-etcd-operator panic in CI Description: Seen [in CI|https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_cluster-network-operator/2318/pull-ci-openshift-cluster-network-operator-master-e2e-metal-ipi-ovn-ipv6/1777622844193116160]: {code:none} I0409 09:52:54.280834 1 builder.go:299] openshift-cluster-etcd-operator version v0.0.0-alpha.0-1430-g3d5483e-3d5483e1 ... E0409 10:08:08.921203 1 runtime.go:79] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference) goroutine 1581 [running]: k8s.io/apimachinery/pkg/util/runtime.logPanic({0x28cd3c0?, 0x4b191e0}) k8s.io/apimachinery@v0.29.0/pkg/util/runtime/runtime.go:75 +0x85 k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0xc0016eccd0, 0x1, 0x27036c0?}) k8s.io/apimachinery@v0.29.0/pkg/util/runtime/runtime.go:49 +0x6b panic({0x28cd3c0?, 0x4b191e0?}) runtime/panic.go:914 +0x21f github.com/openshift/cluster-etcd-operator/pkg/operator/etcdcertsigner.addCertSecretToMap(0x0?, 0x0) github.com/openshift/cluster-etcd-operator/pkg/operator/etcdcertsigner/etcdcertsignercontroller.go:341 +0x27 github.com/openshift/cluster-etcd-operator/pkg/operator/etcdcertsigner.(*EtcdCertSignerController).syncAllMasterCertificates(0xc000521ea0, {0x32731e8, 0xc0006fd1d0}, {0x3280cb0, 0xc000194ee0}) github.com/openshift/cluster-etcd-operator/pkg/operator/etcdcertsigner/etcdcertsignercontroller.go:252 +0xa65 ...{code} It looks like {{syncAllMasterCertificates}} needs to skip the {{addCertSecretToMap}} calls for certs where {{EnsureTargetCertKeyPair}} returned an error. Status: CLOSED | |||
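The fix suggested above — never passing a nil cert into {{addCertSecretToMap}} when {{EnsureTargetCertKeyPair}} failed — is the standard Go rule of not using a result alongside a non-nil error. A minimal sketch of the pattern (function and type names here are illustrative stand-ins, not the operator's actual API):

```go
package main

import (
	"errors"
	"fmt"
)

type secret struct{ Name string }

// ensureCert stands in for EnsureTargetCertKeyPair: on failure it
// returns a nil secret together with an error.
func ensureCert(name string) (*secret, error) {
	if name == "broken" {
		return nil, errors.New("signing failed for " + name)
	}
	return &secret{Name: name}, nil
}

// syncCerts collects successfully created certs, recording and skipping
// (instead of dereferencing) any nil result whose creation errored.
func syncCerts(names []string) (map[string]*secret, []error) {
	out := map[string]*secret{}
	var errs []error
	for _, n := range names {
		s, err := ensureCert(n)
		if err != nil {
			errs = append(errs, err) // s is nil here: skip, don't map it
			continue
		}
		out[n] = s
	}
	return out, errs
}

func main() {
	certs, errs := syncCerts([]string{"etcd-peer", "broken"})
	fmt.Println(len(certs), len(errs)) // 1 1
}
```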
#OCPBUGS-15500 | issue | 2 weeks ago | openshift-tests panics when retrieving etcd logs MODIFIED |
Issue 15342416: openshift-tests panics when retrieving etcd logs Description: Description of problem:{code:none} Since we migrated some our jobs to OCP 4.14, we are experiencing a lot of flakiness with the "openshift-tests" binary which panics when trying to retrieve the logs of etcd: https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_assisted-test-infra/2212/pull-ci-openshift-assisted-test-infra-master-e2e-metal-assisted/1673615526967906304#1:build-log.txt%3A161-191 Here's the impact on our jobs: https://search.ci.openshift.org/?search=error+reading+pod+logs&maxAge=48h&context=1&type=build-log&name=.*assisted.*&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job {code} Version-Release number of selected component (if applicable):{code:none} N/A {code} How reproducible:{code:none} Happens from time to time against OCP 4.14 {code} Steps to Reproduce:{code:none} 1. Provision an OCP cluster 4.14 2. Run the conformance tests on it with "openshift-tests" {code} Actual results:{code:none} The binary "openshift-tests" panics from time to time: [2023-06-27 10:12:07] time="2023-06-27T10:12:07Z" level=error msg="error reading pod logs" error="container \"etcd\" in pod \"etcd-test-infra-cluster-a1729bd4-master-2\" is not available" pod=etcd-test-infra-cluster-a1729bd4-master-2 [2023-06-27 10:12:07] panic: runtime error: invalid memory address or nil pointer dereference [2023-06-27 10:12:07] [signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x26eb9b5] [2023-06-27 10:12:07] [2023-06-27 10:12:07] goroutine 1 [running]: [2023-06-27 10:12:07] bufio.(*Scanner).Scan(0xc005954250) [2023-06-27 10:12:07] bufio/scan.go:214 +0x855 [2023-06-27 10:12:07] github.com/openshift/origin/pkg/monitor/intervalcreation.IntervalsFromPodLogs({0x8d91460, 0xc004a43d40}, {0xc8b83c0?, 0xc006138000?, 0xc8b83c0?}, {0x8d91460?, 0xc004a43d40?, 0xc8b83c0?}) [2023-06-27 10:12:07] github.com/openshift/origin/pkg/monitor/intervalcreation/podlogs.go:130 +0x8cd [2023-06-27 
10:12:07] github.com/openshift/origin/pkg/monitor/intervalcreation.InsertIntervalsFromCluster({0x8d441e0, 0xc000ffd900}, 0xc0008b4000?, {0xc005f88000?, 0x539, 0x0?}, 0x25e1e39?, {0xc11ecb5d446c4f2c, 0x4fb99e6af, 0xc8b83c0}, ...) [2023-06-27 10:12:07] github.com/openshift/origin/pkg/monitor/intervalcreation/types.go:65 +0x274 [2023-06-27 10:12:07] github.com/openshift/origin/pkg/test/ginkgo.(*MonitorEventsOptions).End(0xc001083050, {0x8d441e0, 0xc000ffd900}, 0x1?, {0x7fff15b2ccde, 0x16}) [2023-06-27 10:12:07] github.com/openshift/origin/pkg/test/ginkgo/options_monitor_events.go:170 +0x225 [2023-06-27 10:12:07] github.com/openshift/origin/pkg/test/ginkgo.(*Options).Run(0xc0013e2000, 0xc00012e380, {0x8126d1e, 0xf}) [2023-06-27 10:12:07] github.com/openshift/origin/pkg/test/ginkgo/cmd_runsuite.go:506 +0x2d9a [2023-06-27 10:12:07] main.newRunCommand.func1.1() [2023-06-27 10:12:07] github.com/openshift/origin/cmd/openshift-tests/openshift-tests.go:330 +0x2d4 [2023-06-27 10:12:07] main.mirrorToFile(0xc0013e2000, 0xc0014cdb30) [2023-06-27 10:12:07] github.com/openshift/origin/cmd/openshift-tests/openshift-tests.go:476 +0x5f2 [2023-06-27 10:12:07] main.newRunCommand.func1(0xc0013e0300?, {0xc000862ea0?, 0x6?, 0x6?}) [2023-06-27 10:12:07] github.com/openshift/origin/cmd/openshift-tests/openshift-tests.go:311 +0x5c [2023-06-27 10:12:07] github.com/spf13/cobra.(*Command).execute(0xc0013e0300, {0xc000862e40, 0x6, 0x6}) [2023-06-27 10:12:07] github.com/spf13/cobra@v1.6.0/command.go:916 +0x862 [2023-06-27 10:12:07] github.com/spf13/cobra.(*Command).ExecuteC(0xc0013e0000) [2023-06-27 10:12:07] github.com/spf13/cobra@v1.6.0/command.go:1040 +0x3bd [2023-06-27 10:12:07] github.com/spf13/cobra.(*Command).Execute(...) [2023-06-27 10:12:07] github.com/spf13/cobra@v1.6.0/command.go:968 [2023-06-27 10:12:07] main.main.func1(0xc00011b300?) 
[2023-06-27 10:12:07] github.com/openshift/origin/cmd/openshift-tests/openshift-tests.go:96 +0x8a [2023-06-27 10:12:07] main.main() [2023-06-27 10:12:07] github.com/openshift/origin/cmd/openshift-tests/openshift-tests.go:97 +0x516 {code} Expected results:{code:none} No panics {code} Additional info:{code:none} The source of the panic has been pin-pointed here: https://github.com/openshift/origin/pull/27772#discussion_r1243600596 {code} Status: MODIFIED | |||
#OCPBUGS-31320 | issue | 2 weeks ago | Upgrading baremetal UPI cluster with different CPUs failed, the node won't boot with new kernel CLOSED |
RHEL-32263 tracks a related, seemingly benign, kernel panic. Since that panic is not believed to cause issues for OpenShift, we will not track it as an OCPBUGS bug. | |||
#OCPBUGS-32678 | issue | 11 days ago | Panic in cluster version operator code New |
Issue 15959289: Panic in cluster version operator code Description: This [payload run|https://amd64.ocp.releases.ci.openshift.org/releasestream/4.16.0-0.ci/release/4.16.0-0.ci-2024-04-21-112241] detects a panic in CVO code. The following payloads did not see the same panic. Bug should be prioritized by CVO team accordingly. Relevant Job run: [https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-release-master-ci-4.16-e2e-gcp-ovn-upgrade/1782008003688402944] Panic trace as showed [in this log|https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/logs/periodic-ci-openshift-release-master-ci-4.16-e2e-gcp-ovn-upgrade/1782008003688402944/artifacts/e2e-gcp-ovn-upgrade/gather-extra/artifacts/pods/openshift-cluster-version_cluster-version-operator-6c54b9b56c-kbwjm_cluster-version-operator_previous.log]: {noformat} I0421 13:06:29.113325 1 availableupdates.go:61] First attempt to retrieve available updates I0421 13:06:29.119731 1 cvo.go:721] Finished syncing available updates "openshift-cluster-version/version" (6.46969ms) I0421 13:06:29.120687 1 sync_worker.go:229] Notify the sync worker: Cluster operator etcd changed Degraded from "False" to "True" I0421 13:06:29.120697 1 sync_worker.go:579] Cluster operator etcd changed Degraded from "False" to "True" E0421 13:06:29.121014 1 runtime.go:79] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference) goroutine 185 [running]: k8s.io/apimachinery/pkg/util/runtime.logPanic({0x1bbc580?, 0x30cdc90}) /go/src/github.com/openshift/cluster-version-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:75 +0x85 k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0x1e3efe0?}) /go/src/github.com/openshift/cluster-version-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:49 +0x6b panic({0x1bbc580?, 0x30cdc90?}) /usr/lib/golang/src/runtime/panic.go:914 +0x21f 
github.com/openshift/cluster-version-operator/pkg/cvo.(*SyncWork).calculateNextFrom(0xc002944000, 0x0) /go/src/github.com/openshift/cluster-version-operator/pkg/cvo/sync_worker.go:725 +0x58 github.com/openshift/cluster-version-operator/pkg/cvo.(*SyncWorker).Start.func1() /go/src/github.com/openshift/cluster-version-operator/pkg/cvo/sync_worker.go:584 +0x2f2 k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x30?) /go/src/github.com/openshift/cluster-version-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:226 +0x33 k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc000101800?, {0x2194c80, 0xc0026245d0}, 0x1, 0xc000118120) /go/src/github.com/openshift/cluster-version-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:227 +0xaf k8s.io/apimachinery/pkg/util/wait.JitterUntil(0x0?, 0x989680, 0x0, 0x0?, 0x0?) /go/src/github.com/openshift/cluster-version-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:204 +0x7f k8s.io/apimachinery/pkg/util/wait.Until(...) /go/src/github.com/openshift/cluster-version-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:161 github.com/openshift/cluster-version-operator/pkg/cvo.(*SyncWorker).Start(0xc002398c80, {0x21b41b8, 0xc0004be230}, 0x10) /go/src/github.com/openshift/cluster-version-operator/pkg/cvo/sync_worker.go:564 +0x135 github.com/openshift/cluster-version-operator/pkg/cvo.(*Operator).Run.func2() /go/src/github.com/openshift/cluster-version-operator/pkg/cvo/cvo.go:431 +0x5d created by github.com/openshift/cluster-version-operator/pkg/cvo.(*Operator).Run in goroutine 118 /go/src/github.com/openshift/cluster-version-operator/pkg/cvo/cvo.go:429 +0x49d E0421 13:06:29.121188 1 runtime.go:79] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference) goroutine 185 [running]: k8s.io/apimachinery/pkg/util/runtime.logPanic({0x1bbc580?, 0x30cdc90}) 
/go/src/github.com/openshift/cluster-version-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:75 +0x85 k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0xc000000002?}) /go/src/github.com/openshift/cluster-version-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:49 +0x6b panic({0x1bbc580?, 0x30cdc90?}) /usr/lib/golang/src/runtime/panic.go:914 +0x21f k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0x1e3efe0?}) /go/src/github.com/openshift/cluster-version-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:56 +0xcd panic({0x1bbc580?, 0x30cdc90?}) /usr/lib/golang/src/runtime/panic.go:914 +0x21f github.com/openshift/cluster-version-operator/pkg/cvo.(*SyncWork).calculateNextFrom(0xc002944000, 0x0) /go/src/github.com/openshift/cluster-version-operator/pkg/cvo/sync_worker.go:725 +0x58 github.com/openshift/cluster-version-operator/pkg/cvo.(*SyncWorker).Start.func1() /go/src/github.com/openshift/cluster-version-operator/pkg/cvo/sync_worker.go:584 +0x2f2 k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x30?) /go/src/github.com/openshift/cluster-version-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:226 +0x33 k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc000101800?, {0x2194c80, 0xc0026245d0}, 0x1, 0xc000118120) /go/src/github.com/openshift/cluster-version-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:227 +0xaf k8s.io/apimachinery/pkg/util/wait.JitterUntil(0x0?, 0x989680, 0x0, 0x0?, 0x0?) /go/src/github.com/openshift/cluster-version-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:204 +0x7f k8s.io/apimachinery/pkg/util/wait.Until(...) 
/go/src/github.com/openshift/cluster-version-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:161 github.com/openshift/cluster-version-operator/pkg/cvo.(*SyncWorker).Start(0xc002398c80, {0x21b41b8, 0xc0004be230}, 0x10) /go/src/github.com/openshift/cluster-version-operator/pkg/cvo/sync_worker.go:564 +0x135 github.com/openshift/cluster-version-operator/pkg/cvo.(*Operator).Run.func2() /go/src/github.com/openshift/cluster-version-operator/pkg/cvo/cvo.go:431 +0x5d created by github.com/openshift/cluster-version-operator/pkg/cvo.(*Operator).Run in goroutine 118 /go/src/github.com/openshift/cluster-version-operator/pkg/cvo/cvo.go:429 +0x49d I0421 13:06:29.120720 1 cvo.go:738] Started syncing upgradeable "openshift-cluster-version/version" I0421 13:06:29.123165 1 upgradeable.go:69] Upgradeability last checked 5.274200045s ago, will not re-check until 2024-04-21T13:08:23Z I0421 13:06:29.123195 1 cvo.go:740] Finished syncing upgradeable "openshift-cluster-version/version" (2.469943ms) panic: runtime error: invalid memory address or nil pointer dereference [recovered] panic: runtime error: invalid memory address or nil pointer dereference [recovered] panic: runtime error: invalid memory address or nil pointer dereference [signal SIGSEGV: segmentation violation code=0x1 addr=0x58 pc=0x195c018] goroutine 185 [running]: k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0xc000000002?}) /go/src/github.com/openshift/cluster-version-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:56 +0xcd panic({0x1bbc580?, 0x30cdc90?}) /usr/lib/golang/src/runtime/panic.go:914 +0x21f k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0x1e3efe0?}) /go/src/github.com/openshift/cluster-version-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:56 +0xcd panic({0x1bbc580?, 0x30cdc90?}) /usr/lib/golang/src/runtime/panic.go:914 +0x21f github.com/openshift/cluster-version-operator/pkg/cvo.(*SyncWork).calculateNextFrom(0xc002944000, 0x0) 
/go/src/github.com/openshift/cluster-version-operator/pkg/cvo/sync_worker.go:725 +0x58 github.com/openshift/cluster-version-operator/pkg/cvo.(*SyncWorker).Start.func1() /go/src/github.com/openshift/cluster-version-operator/pkg/cvo/sync_worker.go:584 +0x2f2 k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x30?) /go/src/github.com/openshift/cluster-version-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:226 +0x33 k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc000101800?, {0x2194c80, 0xc0026245d0}, 0x1, 0xc000118120) /go/src/github.com/openshift/cluster-version-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:227 +0xaf k8s.io/apimachinery/pkg/util/wait.JitterUntil(0x0?, 0x989680, 0x0, 0x0?, 0x0?) /go/src/github.com/openshift/cluster-version-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:204 +0x7f k8s.io/apimachinery/pkg/util/wait.Until(...) /go/src/github.com/openshift/cluster-version-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:161 github.com/openshift/cluster-version-operator/pkg/cvo.(*SyncWorker).Start(0xc002398c80, {0x21b41b8, 0xc0004be230}, 0x10) /go/src/github.com/openshift/cluster-version-operator/pkg/cvo/sync_worker.go:564 +0x135 github.com/openshift/cluster-version-operator/pkg/cvo.(*Operator).Run.func2() /go/src/github.com/openshift/cluster-version-operator/pkg/cvo/cvo.go:431 +0x5d created by github.com/openshift/cluster-version-operator/pkg/cvo.(*Operator).Run in goroutine 118 /go/src/github.com/openshift/cluster-version-operator/pkg/cvo/cvo.go:429 +0x49d {noformat} Status: New So commit {{5e73debd}}. And the first line in the panic stack that isn't a library is: if w.Empty() { w.State = desired.State <- this panics w.Attempt = 0 | |||
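The second argument in {{calculateNextFrom(0xc002944000, 0x0)}} is a nil pointer, so reading {{desired.State}} dereferences nil, which is consistent with the code comment above. A minimal reproduction of the pattern and its guard (types simplified; not the CVO's actual definitions):

```go
package main

import "fmt"

type syncWork struct {
	State   string
	Attempt int
}

// Empty only reads fields of the receiver, so it succeeds right before
// the access to the *argument's* fields panics.
func (w *syncWork) Empty() bool { return w.State == "" }

// calculateNextFrom copies state from desired, guarding the nil case
// that produced the SIGSEGV in the trace above.
func (w *syncWork) calculateNextFrom(desired *syncWork) {
	if desired == nil {
		return // nothing to copy; previously this path panicked
	}
	if w.Empty() {
		w.State = desired.State
		w.Attempt = 0
	}
}

func main() {
	w := &syncWork{}
	w.calculateNextFrom(nil) // now a no-op instead of a panic
	w.calculateNextFrom(&syncWork{State: "Updating"})
	fmt.Println(w.State) // Updating
}
```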
#OCPBUGS-30236 | issue | 4 weeks ago | numaresources-controller-manager in CrashLoopBackOff - invalid memory address or nil pointer dereference Verified |
Issue 15854922: numaresources-controller-manager in CrashLoopBackOff - invalid memory address or nil pointer dereference Description: Description of problem: {code:none} Pod numaresources-controller-manager is in CrashLoopBackOff state{code} {code:none} oc get po -n openshift-numaresources NAME READY STATUS RESTARTS AGE numaresources-controller-manager-766c55596b-9nb6b 0/1 CrashLoopBackOff 163 (3m52s ago) 14h secondary-scheduler-85959757db-dvpdj 1/1 Running 0 14h{code} {code:none} oc logs -n openshift-numaresources numaresources-controller-manager-766c55596b-9nb6b ... I0305 07:32:51.102133 1 shared_informer.go:341] caches populated I0305 07:32:51.102210 1 controller.go:220] "Starting workers" controller="kubeletconfig" controllerGroup="machineconfiguration.openshift.io" controllerKind="KubeletConfig" worker count=1 I0305 07:32:51.102295 1 kubeletconfig_controller.go:69] "Starting KubeletConfig reconcile loop" object="/autosizing-master" I0305 07:32:51.102412 1 panic.go:884] "Finish KubeletConfig reconcile loop" object="/autosizing-master" I0305 07:32:51.102448 1 controller.go:115] "Observed a panic in reconciler: runtime error: invalid memory address or nil pointer dereference" controller="kubeletconfig" controllerGroup="machineconfiguration.openshift.io" controllerKind="KubeletConfig" KubeletConfig="autosizing-master" namespace="" name="autosizing-master" reconcileID="91d2c547-993c-4ae1-beab-1afc0a72af68" panic: runtime error: invalid memory address or nil pointer dereference [recovered] panic: runtime error: invalid memory address or nil pointer dereference [signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x1778a1c] goroutine 481 [running]: sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile.func1() /remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:116 +0x1fa panic({0x19286e0, 0x2d16fc0}) /usr/lib/golang/src/runtime/panic.go:884 +0x213 
github.com/openshift-kni/numaresources-operator/pkg/kubeletconfig.MCOKubeletConfToKubeletConf(...) /remote-source/app/pkg/kubeletconfig/kubeletconfig.go:29 github.com/openshift-kni/numaresources-operator/controllers.(*KubeletConfigReconciler).reconcileConfigMap(0xc0004d29c0, {0x1e475f0, 0xc000e31260}, 0xc000226c40, {{0x0?, 0xc000e31260?}, {0xc000b98498?, 0x2de08f8?}}) /remote-source/app/controllers/kubeletconfig_controller.go:126 +0x11c github.com/openshift-kni/numaresources-operator/controllers.(*KubeletConfigReconciler).Reconcile(0xc0004d29c0, {0x1e475f0, 0xc000e31260}, {{{0x0, 0x0}, {0xc000b98498, 0x11}}}) /remote-source/app/controllers/kubeletconfig_controller.go:90 +0x3cd sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile(0x1e4a2e0?, {0x1e475f0?, 0xc000e31260?}, {{{0x0?, 0xb?}, {0xc000b98498?, 0x0?}}}) /remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:119 +0xc8 sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0xc0004446e0, {0x1e47548, 0xc0003520f0}, {0x19b9940?, 0xc00093a1a0?}) /remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:316 +0x3ca sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc0004446e0, {0x1e47548, 0xc0003520f0}) /remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:266 +0x1d9 sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2() /remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:227 +0x85 created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2 /remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:223 +0x587 {code} Version-Release number of selected component (if applicable): {code:none} numaresources-operator.v4.15.0 {code} How reproducible: {code:none} so far 100% 
{code} Steps to Reproduce: {code:none} 1. Create a KubeletConfig that configures autosizing: apiVersion: machineconfiguration.openshift.io/v1 kind: KubeletConfig metadata: name: autosizing-master spec: autoSizingReserved: true machineConfigPoolSelector: matchLabels: pools.operator.machineconfiguration.openshift.io/master: "" 2. Create a performance profile that targets subset of nodes 3. Proceed with numaresources-operator installation {code} Actual results: {code:none} Pod in CrashLoopBackOff state {code} Expected results: {code:none} numaresources-operator is successfully installed {code} Additional info: {code:none} Baremetal dualstack cluster deployed with GitOps-ZTP {code} Status: Verified | |||
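The reproducer's KubeletConfig sets only {{autoSizingReserved}}, so its optional {{spec.kubeletConfig}} payload is absent; converting it without a nil check is the likely dereference in {{MCOKubeletConfToKubeletConf}}. A sketch of the defensive conversion, with types simplified from the machineconfiguration.openshift.io API (stand-ins, not the real structs):

```go
package main

import (
	"errors"
	"fmt"
)

// rawConfig stands in for the embedded kubelet configuration payload.
type rawConfig struct{ CPUManagerPolicy string }

// kubeletConfigSpec mirrors the shape that matters here: KubeletConfig
// is optional and is nil when only autoSizingReserved is set.
type kubeletConfigSpec struct {
	AutoSizingReserved bool
	KubeletConfig      *rawConfig
}

type kubeletConfig struct{ Spec kubeletConfigSpec }

// toKubeletConf returns an error instead of panicking when the optional
// payload is absent, so the reconciler can skip such objects.
func toKubeletConf(mco *kubeletConfig) (*rawConfig, error) {
	if mco == nil || mco.Spec.KubeletConfig == nil {
		return nil, errors.New("kubeletconfig carries no kubelet payload; skipping")
	}
	return mco.Spec.KubeletConfig, nil
}

func main() {
	autosizing := &kubeletConfig{Spec: kubeletConfigSpec{AutoSizingReserved: true}}
	if _, err := toKubeletConf(autosizing); err != nil {
		fmt.Println("skipped:", err)
	}
}
```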
#OCPBUGS-29676 | issue | 6 weeks ago | namespace "openshift-cluster-api" not found in CustomNoUpgrade Verified |
Last Transition Time: 2024-02-20T10:58:04Z Message: Panic detected: feature "AdminNetworkPolicy" is not registered in FeatureGates [] {code} | |||
#OCPBUGS-29858 | issue | 2 months ago | origin needs workaround for ROSA's infra labels Verified |
Issue 15833167: origin needs workaround for ROSA's infra labels Description: The convention is a format like {{{}node-role.kubernetes.io/role: ""{}}}, not {{{}node-role.kubernetes.io: role{}}}, however ROSA uses the latter format to indicate the {{infra}} role. This changes the node watch code to ignore it, as well as other potential variations like {{{}node-role.kubernetes.io/{}}}. The current code panics when run against a ROSA cluster: {{ E0209 18:10:55.533265 78 runtime.go:79] Observed a panic: runtime.boundsError\{x:24, y:23, signed:true, code:0x3} (runtime error: slice bounds out of range [24:23]) goroutine 233 [running]: k8s.io/apimachinery/pkg/util/runtime.logPanic(\{0x7a71840?, 0xc0018e2f48}) k8s.io/apimachinery@v0.27.2/pkg/util/runtime/runtime.go:75 +0x99 k8s.io/apimachinery/pkg/util/runtime.HandleCrash(\{0x0, 0x0, 0x1000251f9fe?}) k8s.io/apimachinery@v0.27.2/pkg/util/runtime/runtime.go:49 +0x75 panic(\{0x7a71840, 0xc0018e2f48}) runtime/panic.go:884 +0x213 github.com/openshift/origin/pkg/monitortests/node/watchnodes.nodeRoles(0x7ecd7b3?) github.com/openshift/origin/pkg/monitortests/node/watchnodes/node.go:187 +0x1e5 github.com/openshift/origin/pkg/monitortests/node/watchnodes.startNodeMonitoring.func1(0}} Status: Verified | |||
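The {{runtime.boundsError{x:24, y:23}}} fits the label shapes described: {{node-role.kubernetes.io}} is exactly 23 bytes, so slicing the role out with fixed {{label[24:]}}-style arithmetic fails on ROSA's bare form. Safe extraction checks for the {{/}} before slicing; a sketch under that assumption (not origin's actual code):

```go
package main

import (
	"fmt"
	"strings"
)

const rolePrefix = "node-role.kubernetes.io"

// roleFromLabel handles the shapes described above:
//   node-role.kubernetes.io/worker: ""  -> "worker" (convention)
//   node-role.kubernetes.io: infra      -> "infra"  (ROSA)
//   node-role.kubernetes.io/: ""        -> ignored  (degenerate)
func roleFromLabel(key, value string) (string, bool) {
	if key == rolePrefix {
		return value, value != "" // ROSA style: role lives in the value
	}
	if rest, ok := strings.CutPrefix(key, rolePrefix+"/"); ok {
		return rest, rest != "" // conventional style: role is the suffix
	}
	return "", false
}

func main() {
	r, _ := roleFromLabel("node-role.kubernetes.io/worker", "")
	fmt.Println(r) // worker
	r, _ = roleFromLabel("node-role.kubernetes.io", "infra")
	fmt.Println(r) // infra
}
```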
#OCPBUGS-29637 | issue | 5 days ago | image-registry co is degraded on Azure MAG, Azure Stack Hub cloud or with azure workload identity Verified |
Issue 15822717: image-registry co is degraded on Azure MAG, Azure Stack Hub cloud or with azure workload identity Description: Description of problem: {code:none} Install IPI cluster against 4.15 nightly build on Azure MAG and Azure Stack Hub or with Azure workload identity, image-registry co is degraded with different errors. On MAG: $ oc get co image-registry NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE MESSAGE image-registry 4.15.0-0.nightly-2024-02-16-235514 True False True 5h44m AzurePathFixControllerDegraded: Migration failed: panic: Get "https://imageregistryjima41xvvww.blob.core.windows.net/jima415a-hfxfh-image-registry-vbibdmawmsvqckhvmmiwisebryohfbtm?comp=list&prefix=docker&restype=container": dial tcp: lookup imageregistryjima41xvvww.blob.core.windows.net on 172.30.0.10:53: no such host... $ oc get pod -n openshift-image-registry NAME READY STATUS RESTARTS AGE azure-path-fix-ssn5w 0/1 Error 0 5h47m cluster-image-registry-operator-86cdf775c7-7brn6 1/1 Running 1 (5h50m ago) 5h58m image-registry-5c6796b86d-46lvx 1/1 Running 0 5h47m image-registry-5c6796b86d-9st5d 1/1 Running 0 5h47m node-ca-48lsh 1/1 Running 0 5h44m node-ca-5rrsl 1/1 Running 0 5h47m node-ca-8sc92 1/1 Running 0 5h47m node-ca-h6trz 1/1 Running 0 5h47m node-ca-hm7s2 1/1 Running 0 5h47m node-ca-z7tv8 1/1 Running 0 5h44m $ oc logs azure-path-fix-ssn5w -n openshift-image-registry panic: Get "https://imageregistryjima41xvvww.blob.core.windows.net/jima415a-hfxfh-image-registry-vbibdmawmsvqckhvmmiwisebryohfbtm?comp=list&prefix=docker&restype=container": dial tcp: lookup imageregistryjima41xvvww.blob.core.windows.net on 172.30.0.10:53: no such hostgoroutine 1 [running]: main.main() /go/src/github.com/openshift/cluster-image-registry-operator/cmd/move-blobs/main.go:49 +0x125 The blob storage endpoint seems not correct, should be: $ az storage account show -n imageregistryjima41xvvww -g jima415a-hfxfh-rg --query primaryEndpoints { "blob": 
"https://imageregistryjima41xvvww.blob.core.usgovcloudapi.net/", "dfs": "https://imageregistryjima41xvvww.dfs.core.usgovcloudapi.net/", "file": "https://imageregistryjima41xvvww.file.core.usgovcloudapi.net/", "internetEndpoints": null, "microsoftEndpoints": null, "queue": "https://imageregistryjima41xvvww.queue.core.usgovcloudapi.net/", "table": "https://imageregistryjima41xvvww.table.core.usgovcloudapi.net/", "web": "https://imageregistryjima41xvvww.z2.web.core.usgovcloudapi.net/" } On Azure Stack Hub: $ oc get co image-registry NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE MESSAGE image-registry 4.15.0-0.nightly-2024-02-16-235514 True False True 3h32m AzurePathFixControllerDegraded: Migration failed: panic: open : no such file or directory... $ oc get pod -n openshift-image-registry NAME READY STATUS RESTARTS AGE azure-path-fix-8jdg7 0/1 Error 0 3h35m cluster-image-registry-operator-86cdf775c7-jwnd4 1/1 Running 1 (3h38m ago) 3h54m image-registry-658669fbb4-llv8z 1/1 Running 0 3h35m image-registry-658669fbb4-lmfr6 1/1 Running 0 3h35m node-ca-2jkjx 1/1 Running 0 3h35m node-ca-dcg2v 1/1 Running 0 3h35m node-ca-q6xmn 1/1 Running 0 3h35m node-ca-r46r2 1/1 Running 0 3h35m node-ca-s8jkb 1/1 Running 0 3h35m node-ca-ww6ql 1/1 Running 0 3h35m $ oc logs azure-path-fix-8jdg7 -n openshift-image-registry panic: open : no such file or directorygoroutine 1 [running]: main.main() /go/src/github.com/openshift/cluster-image-registry-operator/cmd/move-blobs/main.go:36 +0x145 On cluster with Azure workload identity: Some operator's PROGRESSING is True image-registry 4.15.0-0.nightly-2024-02-16-235514 True True False 43m Progressing: The deployment has not completed... pod azure-path-fix is in CreateContainerConfigError status, and get error in its Event. 
"state": { "waiting": { "message": "couldn't find key REGISTRY_STORAGE_AZURE_ACCOUNTKEY in Secret openshift-image-registry/image-registry-private-configuration", "reason": "CreateContainerConfigError" } } {code} Version-Release number of selected component (if applicable): {code:none} 4.15.0-0.nightly-2024-02-16-235514 {code} How reproducible: {code:none} Always{code} Steps to Reproduce: {code:none} 1. Install IPI cluster on MAG or Azure Stack Hub or config Azure workload identity 2. 3. {code} Actual results: {code:none} Installation failed and image-registry operator is degraded{code} Expected results: {code:none} Installation is successful.{code} Additional info: {code:none} Seems that issue is related with https://github.com/openshift/image-registry/pull/393{code} Status: Verified Comment 24192970 by Stephen Benjamin at 2024-02-19T21:48:10.864+0000 We're seeing similar errors on regular Azure jobs in 4.16 payloads: [https://search.ci.openshift.org/?search=AzurePathFixControllerDegraded%3A+Migration+failed%3A+panic%3A&maxAge=48h&context=1&type=junit&name=&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job] Comment 24194013 by Wen Wang at 2024-02-20T02:20:27.030+0000 [~fmissi] [~stbenjam] meet the issues:[AzurePathFixControllerDegraded|https://search.ci.openshift.org/, search=AzurePathFixControllerDegraded%3A+Migration+failed%3A+panic%3A&maxAge=48h&context=1&type=junit&name=&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job] could you help to check, thanks Comment 24194860 by Flavian Missi at 2024-02-20T07:39:03.018+0000 [https://search.ci.openshift.org/?search=AzurePathFixControllerDegraded%3A+Migration+failed%3A+panic%3A&maxAge=48h&context=1&type=junit&name=.*4.16.*azure.*&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job] Comment 24197026 by Flavian Missi at 2024-02-20T12:06:26.787+0000 | |||
#OCPBUGS-32396 | issue | 2 days ago | Azure upgrades to 4.14.15+ fail with UPI storage account CLOSED |
Issue 15951280: Azure upgrades to 4.14.15+ fail with UPI storage account Description: This is a clone of issue OCPBUGS-32328. The following is the description of the original issue: --- Description of problem: {code:none} Cluster with user provisioned image registry storage accounts fails to upgrade to 4.14.20 due to image-registry-operator being degraded. message: "Progressing: The registry is ready\nNodeCADaemonProgressing: The daemon set node-ca is deployed\nAzurePathFixProgressing: Migration failed: panic: AZURE_CLIENT_ID is required for authentication\nAzurePathFixProgressing: \nAzurePathFixProgressing: goroutine 1 [running]:\nAzurePathFixProgressing: main.main()\nAzurePathFixProgressing: \t/go/src/github.com/openshift/cluster-image-registry-operator/cmd/move-blobs/main.go:25 +0x15c\nAzurePathFixProgressing: " cmd/move-blobs was introduced due to https://issues.redhat.com/browse/OCPBUGS-29003. {code} Version-Release number of selected component (if applicable): {code:none} 4.14.15+{code} How reproducible: {code:none} I have not reproduced myself but I imagine you would hit this every time when upgrading from 4.13->4.14.15+ with Azure UPI image registry{code} Steps to Reproduce: {code:none} 1.Starting on version 4.13, Configuring the registry for Azure user-provisioned infrastructure - https://docs.openshift.com/container-platform/4.14/registry/configuring_registry_storage/configuring-registry-storage-azure-user-infrastructure.html. 2. Upgrade to 4.14.15+ 3. {code} Actual results: {code:none} Upgrade does not complete succesfully $ oc get co .... image-registry 4.14.20 True False True 617d AzurePathFixControllerDegraded: Migration failed: panic: AZURE_CLIENT_ID is required for authentication... 
$ oc get clusterversion NAME VERSION AVAILABLE PROGRESSING SINCE STATUS version 4.13.38 True True 7h41m Unable to apply 4.14.20: wait has exceeded 40 minutes for these operators: image-registry{code} Expected results: {code:none} Upgrade to complete successfully{code} Additional info: {code:none} {code} Status: CLOSED | |||
#OCPBUGS-29932 | issue | 7 weeks ago | image registry operator displays panic in status from move-blobs command Verified |
Issue 15836470: image registry operator displays panic in status from move-blobs command Description: Description of problem: {code:none} Sample job: https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-qe-ocp-qe-perfscale-ci-main-azure-4.15-nightly-x86-data-path-9nodes/1760228008968327168{code} Version-Release number of selected component (if applicable): {code:none} {code} How reproducible: {code:none} Anytime there is an error from the move-blobs command{code} Steps to Reproduce: {code:none} 1. 2. 3. {code} Actual results: {code:none} A panic is shown followed by the error message{code} Expected results: {code:none} An error message is shown{code} Additional info: {code:none} {code} Status: Verified The only way I know is to deploy an Azure cluster, copy the azure-path-fix job yaml from the image-registry namespace to a local file, change the job name, remove a required env var (like the one containing the account name), then run the new job - this way the move-blobs command will fail at validation, which should exit with status 1 and log the failure (but no longer panic). You want to be looking at the logs for the pods that have status "Error" - the word "panic" should not be present there. | |||
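The behavior asked for here — exit with status 1 and log the validation failure rather than panic — is the usual Go CLI shape of keeping {{main}} a thin wrapper over a function that returns an error. A sketch under that assumption ({{moveBlobs}} and the env var name are hypothetical stand-ins, not the real command's API):

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// moveBlobs is a hypothetical stand-in for the real command body: it
// validates its inputs up front and returns an error instead of panicking.
func moveBlobs(accountName string) error {
	if accountName == "" {
		return errors.New("storage account name is required")
	}
	// ... perform the blob migration ...
	return nil
}

func run() error {
	// AZURE_ACCOUNT_NAME is an illustrative variable name.
	return moveBlobs(os.Getenv("AZURE_ACCOUNT_NAME"))
}

func main() {
	if err := run(); err != nil {
		// Logged failure, exit 1: the word "panic" never reaches the logs.
		fmt.Fprintln(os.Stderr, "error:", err)
		os.Exit(1)
	}
}
```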
#OCPBUGS-31021 | issue | 10 days ago | Hit panic error when trying to prepare subcommand via oc-mirror v2 CLOSED |
Issue 15885210: Hit panic error when trying to prepare subcommand via oc-mirror v2 Description: Description of problem: {code:none} When using oc-mirror v2 to mirror operators, the command panics: oc-mirror --v2 prepare --from file://outfilter --config config.yaml file://out -p 50015 --v2 flag identified, flow redirected to the oc-mirror v2 version. PLEASE DO NOT USE that. V2 is still under development and it is not ready to be used. panic: runtime error: invalid memory address or nil pointer dereference[signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x3c1ba6d] goroutine 1 [running]:github.com/openshift/oc-mirror/v2/pkg/cli.(*ExecutorSchema).setupLogsLevelAndDir(0xc000898a80) /go/src/github.com/openshift/oc-mirror/vendor/github.com/openshift/oc-mirror/v2/pkg/cli/executor.go:685 +0xcdgithub.com/openshift/oc-mirror/v2/pkg/cli.(*ExecutorSchema).CompletePrepare(0xc000898a80, {0xc000a8c1a0?, 0x0?, 0x0?}) /go/src/github.com/openshift/oc-mirror/vendor/github.com/openshift/oc-mirror/v2/pkg/cli/executor.go:864 +0x413github.com/openshift/oc-mirror/v2/pkg/cli.NewPrepareCommand.func1(0xc0009a5800?, {0xc00089d300, 0x1, 0x8}) /go/src/github.com/openshift/oc-mirror/vendor/github.com/openshift/oc-mirror/v2/pkg/cli/executor.go:795 +0x18agithub.com/spf13/cobra.(*Command).execute(0xc000005500, {0xc00089d280, 0x8, 0x8}) /go/src/github.com/openshift/oc-mirror/vendor/github.com/spf13/cobra/command.go:944 +0x863github.com/spf13/cobra.(*Command).ExecuteC(0xc000004f00) /go/src/github.com/openshift/oc-mirror/vendor/github.com/spf13/cobra/command.go:1068 +0x3a5github.com/spf13/cobra.(*Command).Execute(0x6c9dd68?) /go/src/github.com/openshift/oc-mirror/vendor/github.com/spf13/cobra/command.go:992 +0x13main.main() /go/src/github.com/openshift/oc-mirror/cmd/oc-mirror/main.go:10 +0x18 {code} Version-Release number of selected component (if applicable): {code:none} oc-mirror version WARNING: This version information is deprecated and will be replaced with the output from --short. 
Use --output=yaml|json to get the full version. Client Version: version.Info{Major:"", Minor:"", GitVersion:"4.16.0-202403070215.p0.gc4f8295.assembly.stream.el9-c4f8295", GitCommit:"c4f829512107f7d0f52a057cd429de2030b9b3b3", GitTreeState:"clean", BuildDate:"2024-03-07T03:46:24Z", GoVersion:"go1.21.7 (Red Hat 1.21.7-1.el9) X:strictfipsruntime", Compiler:"gc", Platform:"linux/amd64"}{code} How reproducible: {code:none} always{code} Steps to Reproduce: {code:none} run command : oc-mirror --v2 prepare --from file://outfilter --config config.yaml file://out -p 50015{code} Actual results: {code:java} command panic {code} Expected results: {code:none} No panic{code} Status: CLOSED | |||
#OCPBUGS-13589 | issue | 2 weeks ago | Rule upstream-ocp4-kubelet-enable-protect-kernel-sysctl-file-exist fail for rhel9 based RHCOS systems Verified |
{{% set kernel_root_maxkeys_val = 1000000 %}} {{% set kernel_panic_val = 10 %}} {{% set kernel_panic_on_oops_val = 1 %}} {{% set vm_overcommit_memory_val = 1 %}} {{% set vm_panic_on_oom_val = 0 %}}{noformat} And these are the default sysctls values on the node: kernel.keys.root_maxkeys = 1000000 sh-5.1# sysctl kernel.panic kernel.panic = 10 sh-5.1# sysctl kernel.panic_on_oops kernel.panic_on_oops = 1 sh-5.1# sysctl vm.overcommit_memory | |||
#OCPBUGS-31421 | issue | 3 weeks ago | Autoscaler should scale from zero when taints do not have a "value" field Verified |
Issue 15902545: Autoscaler should scale from zero when taints do not have a "value" field Description: Description of problem:{code:none} When scaling from zero replicas, the cluster autoscaler can panic if there are taints on the machineset with no "value" field defined. {code} Version-Release number of selected component (if applicable):{code:none} 4.16/master {code} How reproducible:{code:none} always {code} Steps to Reproduce:{code:none} 1. create a machineset with a taint that has no value field and 0 replicas 2. enable the cluster autoscaler 3. force a workload to scale the tainted machineset {code} Actual results:{code:none} a panic like this is observed I0325 15:36:38.314276 1 clusterapi_provider.go:68] discovered node group: MachineSet/openshift-machine-api/k8hmbsmz-c2483-9dnddr4sjc (min: 0, max: 2, replicas: 0) panic: interface conversion: interface {} is nil, not string goroutine 79 [running]: k8s.io/autoscaler/cluster-autoscaler/cloudprovider/clusterapi.unstructuredToTaint(...) 
/go/src/k8s.io/autoscaler/cluster-autoscaler/cloudprovider/clusterapi/clusterapi_unstructured.go:246 k8s.io/autoscaler/cluster-autoscaler/cloudprovider/clusterapi.unstructuredScalableResource.Taints({0xc000103d40?, 0xc000121360?, 0xc002386f98?, 0x2?}) /go/src/k8s.io/autoscaler/cluster-autoscaler/cloudprovider/clusterapi/clusterapi_unstructured.go:214 +0x8a5 k8s.io/autoscaler/cluster-autoscaler/cloudprovider/clusterapi.(*nodegroup).TemplateNodeInfo(0xc002675930) /go/src/k8s.io/autoscaler/cluster-autoscaler/cloudprovider/clusterapi/clusterapi_nodegroup.go:266 +0x2ea k8s.io/autoscaler/cluster-autoscaler/core/utils.GetNodeInfoFromTemplate({0x276b230, 0xc002675930}, {0xc001bf2c00, 0x10, 0x10}, {0xc0023ffe60?, 0xc0023ffe90?}) /go/src/k8s.io/autoscaler/cluster-autoscaler/core/utils/utils.go:41 +0x9d k8s.io/autoscaler/cluster-autoscaler/processors/nodeinfosprovider.(*MixedTemplateNodeInfoProvider).Process(0xc00084f848, 0xc0023f7680, {0xc001dcdb00, 0x3, 0x0?}, {0xc001bf2c00, 0x10, 0x10}, {0xc0023ffe60, 0xc0023ffe90}, ...) /go/src/k8s.io/autoscaler/cluster-autoscaler/processors/nodeinfosprovider/mixed_nodeinfos_processor.go:155 +0x599 k8s.io/autoscaler/cluster-autoscaler/core.(*StaticAutoscaler).RunOnce(0xc000617550, {0x4?, 0x0?, 0x3a56f60?}) /go/src/k8s.io/autoscaler/cluster-autoscaler/core/static_autoscaler.go:352 +0xcaa main.run(0x0?, {0x2761b48, 0xc0004c04e0}) /go/src/k8s.io/autoscaler/cluster-autoscaler/main.go:529 +0x2cd main.main.func2({0x0?, 0x0?}) /go/src/k8s.io/autoscaler/cluster-autoscaler/main.go:617 +0x25 created by k8s.io/client-go/tools/leaderelection.(*LeaderElector).Run /go/src/k8s.io/autoscaler/cluster-autoscaler/vendor/k8s.io/client-go/tools/leaderelection/leaderelection.go:213 +0x105 {code} Expected results:{code:none} expect the machineset to scale up {code} Additional info: i think the e2e test that exercises this is only running on periodic jobs and as such we missed this error in OCPBUGS-27509 . 
[this search shows some failed results | https://search.dptools.openshift.org/?search=It+scales+from%2Fto+zero&maxAge=48h&context=1&type=bug%2Bissue%2Bjunit&name=&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job] Status: Verified 2. the user could disable autoscaling for that group, temporarily 3. the user could manually scale the replicas to 1 and avoid the autoscaler panicking 4. the user could remove the taints from the MachineSet temporarily I could reproduce this in 4.15.0-0.nightly-2024-03-29-144548. With clusterversion 4.16.0-0.nightly-2024-04-01-213440, the autoscaler pod no longer panics and the machineset can scale up. | |||
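The panic above (`interface conversion: interface {} is nil, not string`) comes from converting an unstructured taint whose optional "value" field is absent: a direct `.(string)` assertion on a missing map entry panics. A minimal sketch of the defensive comma-ok pattern, using local stand-in types and a hypothetical `taintFromMap` helper rather than the autoscaler's actual code:

```go
package main

import "fmt"

// taint mirrors the shape of a taint decoded from an unstructured MachineSet
// spec: every field lands in a map[string]interface{}, and an omitted field
// (such as "value") is simply absent, i.e. a nil interface value.
type taint struct {
	Key    string
	Value  string
	Effect string
}

// taintFromMap converts one unstructured taint entry. A direct assertion like
// m["value"].(string) panics when the field is missing; the comma-ok form
// below falls back to the zero value instead.
func taintFromMap(m map[string]interface{}) taint {
	t := taint{}
	if k, ok := m["key"].(string); ok {
		t.Key = k
	}
	if v, ok := m["value"].(string); ok { // missing "value" -> ok == false, no panic
		t.Value = v
	}
	if e, ok := m["effect"].(string); ok {
		t.Effect = e
	}
	return t
}

func main() {
	// A taint with no "value" field, as in the bug report.
	m := map[string]interface{}{"key": "node-role", "effect": "NoSchedule"}
	fmt.Printf("%+v\n", taintFromMap(m)) // -> {Key:node-role Value: Effect:NoSchedule}
}
```

The same comma-ok idiom applies to any optional field pulled out of `unstructured.Unstructured` content.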
#OCPBUGS-32514 | issue | 8 days ago | IBU upgrade fails during the upgrade stage when running recert: `Err` value: could not remove key: f:etcd-peer-sno.kni-qe-2.lab.eng.rdu2.redhat.com.crt' MODIFIED |
Issue 15957081: IBU upgrade fails during the upgrade stage when running recert: `Err` value: could not remove key: f:etcd-peer-sno.kni-qe-2.lab.eng.rdu2.redhat.com.crt' Description: Description of problem: {code:none} IBU upgrade fails during the upgrade stage when running recert with the following error: Rollback due to postpivot failure: failed to run once recert for post pivot: failed recert full flow: failed to run recert tool container: 2024-04-19 15:30:24 - INFO - src/cluster_crypto/crypto_utils.rs:245: using openssl: OpenSSL 3.0.7 1 Nov 2022 (Library: OpenSSL 3.0.7 1 Nov 2022) 2024-04-19 15:30:36 - WARN - src/cluster_crypto/crypto_objects.rs:81: ignoring error from processing pem-looking text at location k8s:ConfigMap/kube-system:cluster-config-v1:/data/install-config, without encoding, unknown: processing pem bundle 2024-04-19 15:30:39 - WARN - src/cluster_crypto/crypto_objects.rs:81: ignoring error from processing pem-looking text at location k8s:ConfigMap/openshift-etcd:cluster-config-v1:/data/install-config, without encoding, unknown: processing pem bundle 2024-04-19 15:30:46 - INFO - src/cluster_crypto/cert_key_pair.rs:173: Using custom private key for CN kube-apiserver-localhost-signer 2024-04-19 15:30:47 - INFO - src/cluster_crypto/cert_key_pair.rs:173: Using custom private key for CN kube-apiserver-service-network-signer 2024-04-19 15:30:47 - INFO - src/cluster_crypto/cert_key_pair.rs:173: Using custom private key for CN kube-apiserver-lb-signer thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: could not remove key: f:etcd-peer-sno.kni-qe-2.lab.eng.rdu2.redhat.com.crt', src/ocp_postprocess/hostname_rename/etcd_rename.rs:69:101 note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace {code} Version-Release number of selected component (if applicable): {code:none} 4.16.0-0.nightly-2024-04-16-015315 lifecycle-agent.v4.16.0(lifecycle-agent-operator-bundle-container-v4.16.0-32, 
brew.registry.redhat.io/openshift4/recert-rhel9@sha256:f1b6f75ce508ee25b496caa833e0be70abb1dd5f2dc3dbae1e0e11f5f8600f68){code} How reproducible: {code:none} 2/2 times so far{code} Steps to Reproduce: {code:none} 1. Generate a seed image on a 4.16 SNO 2. Run the IBU upgrade process on a 4.14 SNO with 4.16 LCA installed 3. Check Upgrade stage result {code} Actual results: {code:none} Upgrade failed while running recert with the following error: Rollback due to postpivot failure: failed to run once recert for post pivot: failed recert full flow: failed to run recert tool container: 2024-04-19 15:30:24 - INFO - src/cluster_crypto/crypto_utils.rs:245: using openssl: OpenSSL 3.0.7 1 Nov 2022 (Library: OpenSSL 3.0.7 1 Nov 2022) 2024-04-19 15:30:36 - WARN - src/cluster_crypto/crypto_objects.rs:81: ignoring error from processing pem-looking text at location k8s:ConfigMap/kube-system:cluster-config-v1:/data/install-config, without encoding, unknown: processing pem bundle 2024-04-19 15:30:39 - WARN - src/cluster_crypto/crypto_objects.rs:81: ignoring error from processing pem-looking text at location k8s:ConfigMap/openshift-etcd:cluster-config-v1:/data/install-config, without encoding, unknown: processing pem bundle 2024-04-19 15:30:46 - INFO - src/cluster_crypto/cert_key_pair.rs:173: Using custom private key for CN kube-apiserver-localhost-signer 2024-04-19 15:30:47 - INFO - src/cluster_crypto/cert_key_pair.rs:173: Using custom private key for CN kube-apiserver-service-network-signer 2024-04-19 15:30:47 - INFO - src/cluster_crypto/cert_key_pair.rs:173: Using custom private key for CN kube-apiserver-lb-signer thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: could not remove key: f:etcd-peer-sno.kni-qe-2.lab.eng.rdu2.redhat.com.crt', src/ocp_postprocess/hostname_rename/etcd_rename.rs:69:101 note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace : exit status 101 {code} Expected results: {code:none} No failure{code} Additional info: 
{code:none} seed image can be pulled from registry.ztp-hub-00.mobius.lab.eng.rdu2.redhat.com:5000/ibu/seed:4.16.0-0.nightly-2024-04-16-015315{code} Status: MODIFIED Comment 24611914 by GitLab CEE Bot at 2024-04-25T21:05:51.450+0000 [CPaaS Service Account|https://gitlab.cee.redhat.com/cpaas-bot] mentioned this issue in [merge request !45|https://gitlab.cee.redhat.com/cpaas-midstream/ibu/recert/-/merge_requests/45] of [cpaas-midstream / ibu / recert|https://gitlab.cee.redhat.com/cpaas-midstream/ibu/recert] on branch [rhaos-4.16-rhel-9__upstream__c05662c735a0db45615b2c3ac9d84cba|https://gitlab.cee.redhat.com/cpaas-midstream/ibu/recert/-/tree/rhaos-4.16-rhel-9__upstream__c05662c735a0db45615b2c3ac9d84cba]:{quote}Updated US source to: 467c0ee Merge pull request #131 from mresvanis/fix-etcd-peer-panic{quote} Comment 24612317 by UNKNOWN at 2024-04-25T23:39:35.847+0000 | |||
#OCPBUGS-30923 | issue | 2 weeks ago | numaresources-controller-manager in CrashLoopBackOff - invalid memory address or nil pointer dereference CLOSED |
Issue 15877795: numaresources-controller-manager in CrashLoopBackOff - invalid memory address or nil pointer dereference Description: This is a clone of issue OCPBUGS-30342. The following is the description of the original issue: --- This is a clone of issue OCPBUGS-30236. The following is the description of the original issue: --- Description of problem: {code:none} Pod numaresources-controller-manager is in CrashLoopBackOff state{code} {code:none} oc get po -n openshift-numaresources NAME READY STATUS RESTARTS AGE numaresources-controller-manager-766c55596b-9nb6b 0/1 CrashLoopBackOff 163 (3m52s ago) 14h secondary-scheduler-85959757db-dvpdj 1/1 Running 0 14h{code} {code:none} oc logs -n openshift-numaresources numaresources-controller-manager-766c55596b-9nb6b ... I0305 07:32:51.102133 1 shared_informer.go:341] caches populated I0305 07:32:51.102210 1 controller.go:220] "Starting workers" controller="kubeletconfig" controllerGroup="machineconfiguration.openshift.io" controllerKind="KubeletConfig" worker count=1 I0305 07:32:51.102295 1 kubeletconfig_controller.go:69] "Starting KubeletConfig reconcile loop" object="/autosizing-master" I0305 07:32:51.102412 1 panic.go:884] "Finish KubeletConfig reconcile loop" object="/autosizing-master" I0305 07:32:51.102448 1 controller.go:115] "Observed a panic in reconciler: runtime error: invalid memory address or nil pointer dereference" controller="kubeletconfig" controllerGroup="machineconfiguration.openshift.io" controllerKind="KubeletConfig" KubeletConfig="autosizing-master" namespace="" name="autosizing-master" reconcileID="91d2c547-993c-4ae1-beab-1afc0a72af68" panic: runtime error: invalid memory address or nil pointer dereference [recovered] panic: runtime error: invalid memory address or nil pointer dereference [signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x1778a1c] goroutine 481 [running]: sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile.func1() 
/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:116 +0x1fa panic({0x19286e0, 0x2d16fc0}) /usr/lib/golang/src/runtime/panic.go:884 +0x213 github.com/openshift-kni/numaresources-operator/pkg/kubeletconfig.MCOKubeletConfToKubeletConf(...) /remote-source/app/pkg/kubeletconfig/kubeletconfig.go:29 github.com/openshift-kni/numaresources-operator/controllers.(*KubeletConfigReconciler).reconcileConfigMap(0xc0004d29c0, {0x1e475f0, 0xc000e31260}, 0xc000226c40, {{0x0?, 0xc000e31260?}, {0xc000b98498?, 0x2de08f8?}}) /remote-source/app/controllers/kubeletconfig_controller.go:126 +0x11c github.com/openshift-kni/numaresources-operator/controllers.(*KubeletConfigReconciler).Reconcile(0xc0004d29c0, {0x1e475f0, 0xc000e31260}, {{{0x0, 0x0}, {0xc000b98498, 0x11}}}) /remote-source/app/controllers/kubeletconfig_controller.go:90 +0x3cd sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile(0x1e4a2e0?, {0x1e475f0?, 0xc000e31260?}, {{{0x0?, 0xb?}, {0xc000b98498?, 0x0?}}}) /remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:119 +0xc8 sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0xc0004446e0, {0x1e47548, 0xc0003520f0}, {0x19b9940?, 0xc00093a1a0?}) /remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:316 +0x3ca sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc0004446e0, {0x1e47548, 0xc0003520f0}) /remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:266 +0x1d9 sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2() /remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:227 +0x85 created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2 
/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:223 +0x587 {code} Version-Release number of selected component (if applicable): {code:none} numaresources-operator.v4.15.0 {code} How reproducible: {code:none} so far 100% {code} Steps to Reproduce: {code:none} 1. Create a KubeletConfig that configures autosizing: apiVersion: machineconfiguration.openshift.io/v1 kind: KubeletConfig metadata: name: autosizing-master spec: autoSizingReserved: true machineConfigPoolSelector: matchLabels: pools.operator.machineconfiguration.openshift.io/master: "" 2. Create a performance profile that targets subset of nodes 3. Proceed with numaresources-operator installation {code} Actual results: {code:none} Pod in CrashLoopBackOff state {code} Expected results: {code:none} numaresources-operator is successfully installed {code} Additional info: {code:none} Baremetal dualstack cluster deployed with GitOps-ZTP {code} Status: CLOSED | |||
#OCPBUGS-33172 | issue | 32 hours ago | nil pointer dereference in AzurePathFix controller ON_QA |
Issue 15979049: nil pointer dereference in AzurePathFix controller Description: Seeing this in hypershift e2e. I think it is racing with the Infrastructure status being populated and {{PlatformStatus}} being nil. https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/logs/periodic-ci-openshift-hypershift-release-4.16-periodics-e2e-aws-ovn/1785458059246571520/artifacts/e2e-aws-ovn/run-e2e/artifacts/TestAutoscaling_Teardown/namespaces/e2e-clusters-rjhhw-example-g6tsn/core/pods/logs/cluster-image-registry-operator-5597f9f4d4-dfvc6-cluster-image-registry-operator-previous.log {code} I0501 00:13:11.951062 1 azurepathfixcontroller.go:324] Started AzurePathFixController I0501 00:13:11.951056 1 base_controller.go:73] Caches are synced for LoggingSyncer I0501 00:13:11.951072 1 imageregistrycertificates.go:214] Started ImageRegistryCertificatesController I0501 00:13:11.951077 1 base_controller.go:110] Starting #1 worker of LoggingSyncer controller ... E0501 00:13:11.951369 1 runtime.go:79] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference) goroutine 534 [running]: k8s.io/apimachinery/pkg/util/runtime.logPanic({0x2d6bd00?, 0x57a60e0}) /go/src/github.com/openshift/cluster-image-registry-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:75 +0x85 k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0x3bcb370?}) /go/src/github.com/openshift/cluster-image-registry-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:49 +0x6b panic({0x2d6bd00?, 0x57a60e0?}) /usr/lib/golang/src/runtime/panic.go:914 +0x21f github.com/openshift/cluster-image-registry-operator/pkg/operator.(*AzurePathFixController).sync(0xc000003d40) /go/src/github.com/openshift/cluster-image-registry-operator/pkg/operator/azurepathfixcontroller.go:171 +0x97 
github.com/openshift/cluster-image-registry-operator/pkg/operator.(*AzurePathFixController).processNextWorkItem(0xc000003d40) /go/src/github.com/openshift/cluster-image-registry-operator/pkg/operator/azurepathfixcontroller.go:154 +0x292 github.com/openshift/cluster-image-registry-operator/pkg/operator.(*AzurePathFixController).runWorker(...) /go/src/github.com/openshift/cluster-image-registry-operator/pkg/operator/azurepathfixcontroller.go:133 k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x30?) /go/src/github.com/openshift/cluster-image-registry-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:226 +0x33 k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc001186820?, {0x3bd1320, 0xc000cace40}, 0x1, 0xc000ca2540) /go/src/github.com/openshift/cluster-image-registry-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:227 +0xaf k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc0011bac00?, 0x3b9aca00, 0x0, 0xd0?, 0x447f9c?) /go/src/github.com/openshift/cluster-image-registry-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:204 +0x7f k8s.io/apimachinery/pkg/util/wait.Until(0x0?, 0xc001385f68?, 0xc001385f78?) /go/src/github.com/openshift/cluster-image-registry-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:161 +0x1e created by github.com/openshift/cluster-image-registry-operator/pkg/operator.(*AzurePathFixController).Run in goroutine 248 /go/src/github.com/openshift/cluster-image-registry-operator/pkg/operator/azurepathfixcontroller.go:322 +0x1a6 panic: runtime error: invalid memory address or nil pointer dereference [recovered] panic: runtime error: invalid memory address or nil pointer dereference [signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0x2966e97] {code} https://github.com/openshift/cluster-image-registry-operator/blob/master/pkg/operator/azurepathfixcontroller.go#L171 Status: ON_QA Comment 24657379 by Flavian Missi at 2024-05-02T08:10:30.356+0000 Hi [~sjenning], thanks for reporting this. 
Does this eventually resolve itself? I got a PR with a fix out but wonder how we should backport it. If the panic eventually resolves itself (once the platform status is populated) I think we might be okay with fixing this on 4.16 only. WDYT? Comment 24660182 by Seth Jennings at 2024-05-02T13:29:35.566+0000 | |||
#OCPBUGS-33193 | issue | 2 days ago | operator panics in hosted cluster with OVN when obfuscation is enabled New |
Issue 15980132: operator panics in hosted cluster with OVN when obfuscation is enabled Description: This is a clone of issue OCPBUGS-32702. The following is the description of the original issue: --- Description of problem: {code:none} The operator panics in HyperShift hosted cluster with OVN and with enabled networking obfuscation: {code} {noformat} 1 runtime.go:79] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference) goroutine 858 [running]: k8s.io/apimachinery/pkg/util/runtime.logPanic({0x26985e0?, 0x454d700}) /go/src/github.com/openshift/insights-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:75 +0x99 k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0xc0010d67e0?}) /go/src/github.com/openshift/insights-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:49 +0x75 panic({0x26985e0, 0x454d700}) /usr/lib/golang/src/runtime/panic.go:884 +0x213 github.com/openshift/insights-operator/pkg/anonymization.getNetworksFromClusterNetworksConfig(...) /go/src/github.com/openshift/insights-operator/pkg/anonymization/anonymizer.go:292 github.com/openshift/insights-operator/pkg/anonymization.getNetworksForAnonymizer(0xc000556700, 0xc001154ea0, {0x0, 0x0, 0x0?}) /go/src/github.com/openshift/insights-operator/pkg/anonymization/anonymizer.go:253 +0x202 github.com/openshift/insights-operator/pkg/anonymization.(*Anonymizer).readNetworkConfigs(0xc0005be640) /go/src/github.com/openshift/insights-operator/pkg/anonymization/anonymizer.go:180 +0x245 github.com/openshift/insights-operator/pkg/anonymization.(*Anonymizer).AnonymizeMemoryRecord.func1() /go/src/github.com/openshift/insights-operator/pkg/anonymization/anonymizer.go:354 +0x25 sync.(*Once).doSlow(0xc0010d6c70?, 0x21a9006?) /usr/lib/golang/src/sync/once.go:74 +0xc2 sync.(*Once).Do(...) 
/usr/lib/golang/src/sync/once.go:65 github.com/openshift/insights-operator/pkg/anonymization.(*Anonymizer).AnonymizeMemoryRecord(0xc0005be640, 0xc000cf0dc0) /go/src/github.com/openshift/insights-operator/pkg/anonymization/anonymizer.go:353 +0x78 github.com/openshift/insights-operator/pkg/recorder.(*Recorder).Record(0xc00075c4b0, {{0x2add75b, 0xc}, {0x0, 0x0, 0x0}, {0x2f38d28, 0xc0009c99c0}}) /go/src/github.com/openshift/insights-operator/pkg/recorder/recorder.go:87 +0x49f github.com/openshift/insights-operator/pkg/gather.recordGatheringFunctionResult({0x2f255c0, 0xc00075c4b0}, 0xc0010d7260, {0x2adf900, 0xd}) /go/src/github.com/openshift/insights-operator/pkg/gather/gather.go:157 +0xb9c github.com/openshift/insights-operator/pkg/gather.collectAndRecordGatherer({0x2f50058?, 0xc001240c90?}, {0x2f30880?, 0xc000994240}, {0x2f255c0, 0xc00075c4b0}, {0x0?, 0x8dcb80?, 0xc000a673a2?}) /go/src/github.com/openshift/insights-operator/pkg/gather/gather.go:113 +0x296 github.com/openshift/insights-operator/pkg/gather.CollectAndRecordGatherer({0x2f50058, 0xc001240c90}, {0x2f30880, 0xc000994240?}, {0x2f255c0, 0xc00075c4b0}, {0x0, 0x0, 0x0}) /go/src/github.com/openshift/insights-operator/pkg/gather/gather.go:89 +0xe5 github.com/openshift/insights-operator/pkg/controller/periodic.(*Controller).Gather.func2(0xc000a678a0, {0x2f50058, 0xc001240c90}, 0xc000796b60, 0x26f0460?) 
/go/src/github.com/openshift/insights-operator/pkg/controller/periodic/periodic.go:206 +0x1a8 github.com/openshift/insights-operator/pkg/controller/periodic.(*Controller).Gather(0xc000796b60) /go/src/github.com/openshift/insights-operator/pkg/controller/periodic/periodic.go:222 +0x450 github.com/openshift/insights-operator/pkg/controller/periodic.(*Controller).periodicTrigger(0xc000796b60, 0xc000236a80) /go/src/github.com/openshift/insights-operator/pkg/controller/periodic/periodic.go:265 +0x2c5 github.com/openshift/insights-operator/pkg/controller/periodic.(*Controller).Run.func1() /go/src/github.com/openshift/insights-operator/pkg/controller/periodic/periodic.go:161 +0x25 k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x30?) /go/src/github.com/openshift/insights-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:226 +0x3e k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc00007d7c0?, {0x2f282a0, 0xc0012cd800}, 0x1, 0xc000236a80) /go/src/github.com/openshift/insights-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:227 +0xb6 k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc001381fb0?, 0x3b9aca00, 0x0, 0x0?, 0x449705?) /go/src/github.com/openshift/insights-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:204 +0x89 k8s.io/apimachinery/pkg/util/wait.Until(0xabfaca?, 0x88d6e6?, 0xc00078a360?) /go/src/github.com/openshift/insights-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:161 +0x25 created by github.com/openshift/insights-operator/pkg/controller/periodic.(*Controller).Run /go/src/github.com/openshift/insights-operator/pkg/controller/periodic/periodic.go:161 +0x1ea {noformat} Version-Release number of selected component (if applicable): {code:none} {code} How reproducible: {code:none} Enable networking obfuscation for the Insights Operator and wait for gathering to happen in the operator. You will see the above stacktrace. {code} Steps to Reproduce: {code:none} 1. Create a HyperShift hosted cluster with OVN 2. 
Enable networking obfuscation for the Insights Operator 3. Wait for data gathering to happen in the operator {code} Actual results: {code:none} operator panics{code} Expected results: {code:none} there's no panic{code} Additional info: {code:none} {code} Status: New | |||
#OCPBUGS-33208 | issue | 2 days ago | nil pointer dereference in AzurePathFix controller New |
Issue 15981031: nil pointer dereference in AzurePathFix controller Description: This is a clone of issue OCPBUGS-33172. The following is the description of the original issue: --- Seeing this in hypershift e2e. I think it is racing with the Infrastructure status being populated and {{PlatformStatus}} being nil. https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/logs/periodic-ci-openshift-hypershift-release-4.16-periodics-e2e-aws-ovn/1785458059246571520/artifacts/e2e-aws-ovn/run-e2e/artifacts/TestAutoscaling_Teardown/namespaces/e2e-clusters-rjhhw-example-g6tsn/core/pods/logs/cluster-image-registry-operator-5597f9f4d4-dfvc6-cluster-image-registry-operator-previous.log {code} I0501 00:13:11.951062 1 azurepathfixcontroller.go:324] Started AzurePathFixController I0501 00:13:11.951056 1 base_controller.go:73] Caches are synced for LoggingSyncer I0501 00:13:11.951072 1 imageregistrycertificates.go:214] Started ImageRegistryCertificatesController I0501 00:13:11.951077 1 base_controller.go:110] Starting #1 worker of LoggingSyncer controller ... 
E0501 00:13:11.951369 1 runtime.go:79] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference) goroutine 534 [running]: k8s.io/apimachinery/pkg/util/runtime.logPanic({0x2d6bd00?, 0x57a60e0}) /go/src/github.com/openshift/cluster-image-registry-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:75 +0x85 k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0x3bcb370?}) /go/src/github.com/openshift/cluster-image-registry-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:49 +0x6b panic({0x2d6bd00?, 0x57a60e0?}) /usr/lib/golang/src/runtime/panic.go:914 +0x21f github.com/openshift/cluster-image-registry-operator/pkg/operator.(*AzurePathFixController).sync(0xc000003d40) /go/src/github.com/openshift/cluster-image-registry-operator/pkg/operator/azurepathfixcontroller.go:171 +0x97 github.com/openshift/cluster-image-registry-operator/pkg/operator.(*AzurePathFixController).processNextWorkItem(0xc000003d40) /go/src/github.com/openshift/cluster-image-registry-operator/pkg/operator/azurepathfixcontroller.go:154 +0x292 github.com/openshift/cluster-image-registry-operator/pkg/operator.(*AzurePathFixController).runWorker(...) /go/src/github.com/openshift/cluster-image-registry-operator/pkg/operator/azurepathfixcontroller.go:133 k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x30?) /go/src/github.com/openshift/cluster-image-registry-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:226 +0x33 k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc001186820?, {0x3bd1320, 0xc000cace40}, 0x1, 0xc000ca2540) /go/src/github.com/openshift/cluster-image-registry-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:227 +0xaf k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc0011bac00?, 0x3b9aca00, 0x0, 0xd0?, 0x447f9c?) 
/go/src/github.com/openshift/cluster-image-registry-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:204 +0x7f k8s.io/apimachinery/pkg/util/wait.Until(0x0?, 0xc001385f68?, 0xc001385f78?) /go/src/github.com/openshift/cluster-image-registry-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:161 +0x1e created by github.com/openshift/cluster-image-registry-operator/pkg/operator.(*AzurePathFixController).Run in goroutine 248 /go/src/github.com/openshift/cluster-image-registry-operator/pkg/operator/azurepathfixcontroller.go:322 +0x1a6 panic: runtime error: invalid memory address or nil pointer dereference [recovered] panic: runtime error: invalid memory address or nil pointer dereference [signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0x2966e97] {code} https://github.com/openshift/cluster-image-registry-operator/blob/master/pkg/operator/azurepathfixcontroller.go#L171 Status: New | |||
#OCPBUGS-26605 | issue | 2 months ago | e2e-gcp-op-layering CI job continuously failing Verified |
Issue 15709382: e2e-gcp-op-layering CI job continuously failing Description: Description of problem: The e2e-gcp-op-layering CI job seems to be continuously and consistently failing during the teardown process. In particular, it appears to be the {{TestOnClusterBuildRollsOutImage}} test that is failing whenever it attempts to tear down the node. See: [https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_machine-config-operator/4060/pull-ci-openshift-machine-config-operator-master-e2e-gcp-op-layering/1744805949165539328] for an example of a failing job. Version-Release number of selected component (if applicable): {code:none} {code} How reproducible: {code:none} Always{code} Steps to Reproduce: {code:none} Open a PR to the GitHub MCO repository.{code} Actual results: {code:none} The teardown portion of the TestOnClusterBuildRollsOutImage test fails thusly: utils.go:1097: Deleting machine ci-op-v5qcditr-46b3f-bh29c-worker-c-fcl9f / node ci-op-v5qcditr-46b3f-bh29c-worker-c-fcl9f utils.go:1098: Error Trace: /go/src/github.com/openshift/machine-config-operator/test/helpers/utils.go:1098 /go/src/github.com/openshift/machine-config-operator/test/e2e-layering/onclusterbuild_test.go:103 /go/src/github.com/openshift/machine-config-operator/test/e2e-layering/helpers_test.go:149 /go/src/github.com/openshift/machine-config-operator/test/helpers/utils.go:79 /usr/lib/golang/src/testing/testing.go:1150 /usr/lib/golang/src/testing/testing.go:1328 /usr/lib/golang/src/testing/testing.go:1570 Error: Received unexpected error: exit status 1 Test: TestOnClusterBuildRollsOutImage utils.go:1097: Deleting machine ci-op-v5qcditr-46b3f-bh29c-worker-c-fcl9f / node ci-op-v5qcditr-46b3f-bh29c-worker-c-fcl9f utils.go:1098: Error Trace: /go/src/github.com/openshift/machine-config-operator/test/helpers/utils.go:1098 /go/src/github.com/openshift/machine-config-operator/test/e2e-layering/onclusterbuild_test.go:103
/go/src/github.com/openshift/machine-config-operator/test/e2e-layering/helpers_test.go:149 /go/src/github.com/openshift/machine-config-operator/test/helpers/utils.go:79 /usr/lib/golang/src/testing/testing.go:1150 /usr/lib/golang/src/testing/testing.go:1328 /usr/lib/golang/src/testing/testing.go:1312 /usr/lib/golang/src/runtime/panic.go:522 /usr/lib/golang/src/testing/testing.go:980 /go/src/github.com/openshift/machine-config-operator/test/helpers/utils.go:1098 /go/src/github.com/openshift/machine-config-operator/test/e2e-layering/onclusterbuild_test.go:103 /go/src/github.com/openshift/machine-config-operator/test/e2e-layering/helpers_test.go:149 /go/src/github.com/openshift/machine-config-operator/test/helpers/utils.go:79 /usr/lib/golang/src/testing/testing.go:1150 /usr/lib/golang/src/testing/testing.go:1328 /usr/lib/golang/src/testing/testing.go:1570 Error: Received unexpected error: exit status 1 Test: TestOnClusterBuildRollsOutImage{code} Expected results: {code:none} This part of the test should pass.{code} Additional info: {code:none} The way the test teardown process currently works is that it shells out to the oc command to delete the underlying Machine and Node. We delete the underlying machine and node so that the cloud provider will provision us a new one due to issues with opting out of on-cluster builds that have yet to be resolved. At the time this test was written, it was implemented in this way to avoid having to vendor the Machine client and API into the MCO codebase, which has since happened. I suspect the issue is that oc is failing in some way since we get an exit status 1 from where it is invoked. Now that the Machine client and API are vendored into the MCO codebase, it makes more sense for us to use those directly instead of shelling out to oc, since we would get more verbose error messages.{code} Status: Verified | |||
#TRT-1632 | issue | 3 days ago | openshift-controller-manager pod panic due to type assertion CLOSED |
Issue 15973057: openshift-controller-manager pod panic due to type assertion Description: Caught by the test: Undiagnosed panic detected in pod Sample job run: [https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-release-master-ci-4.16-e2e-azure-ovn-upgrade/1783981854974545920] Error message {code} { pods/openshift-controller-manager_controller-manager-6b66bf5587-6ghjk_controller-manager.log.gz:E0426 23:06:02.367266 1 runtime.go:79] Observed a panic: &runtime.TypeAssertionError{_interface:(*abi.Type)(0x3c6a2a0), concrete:(*abi.Type)(0x3e612c0), asserted:(*abi.Type)(0x419cdc0), missingMethod:""} (interface conversion: interface {} is cache.DeletedFinalStateUnknown, not *v1.Secret) pods/openshift-controller-manager_controller-manager-6b66bf5587-6ghjk_controller-manager.log.gz:E0426 23:06:03.368403 1 runtime.go:79] Observed a panic: &runtime.TypeAssertionError{_interface:(*abi.Type)(0x3c6a2a0), concrete:(*abi.Type)(0x3e612c0), asserted:(*abi.Type)(0x419cdc0), missingMethod:""} (interface conversion: interface {} is cache.DeletedFinalStateUnknown, not *v1.Secret) pods/openshift-controller-manager_controller-manager-6b66bf5587-6ghjk_controller-manager.log.gz:E0426 23:06:04.370157 1 runtime.go:79] Observed a panic: &runtime.TypeAssertionError{_interface:(*abi.Type)(0x3c6a2a0), concrete:(*abi.Type)(0x3e612c0), asserted:(*abi.Type)(0x419cdc0), missingMethod:""} (interface conversion: interface {} is cache.DeletedFinalStateUnknown, not *v1.Secret)} {code} [Sippy 
indicates|https://sippy.dptools.openshift.org/sippy-ng/tests/4.16/analysis?test=Undiagnosed%20panic%20detected%20in%20pod&filters=%7B%22items%22%3A%5B%7B%22columnField%22%3A%22name%22%2C%22operatorValue%22%3A%22equals%22%2C%22value%22%3A%22Undiagnosed%20panic%20detected%20in%20pod%22%7D%2C%7B%22columnField%22%3A%22variants%22%2C%22not%22%3Atrue%2C%22operatorValue%22%3A%22contains%22%2C%22value%22%3A%22never-stable%22%7D%2C%7B%22columnField%22%3A%22variants%22%2C%22not%22%3Atrue%2C%22operatorValue%22%3A%22contains%22%2C%22value%22%3A%22aggregated%22%7D%5D%2C%22linkOperator%22%3A%22and%22%7D] it's happening a small percentage of the time since around Apr 25th. Took out the last payload so labeling trt-incident for now. See the linked OCPBUG for the actual component. Status: CLOSED | |||
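The panic above ("interface conversion: interface {} is cache.DeletedFinalStateUnknown, not *v1.Secret") is the classic informer delete-handler bug: after a watch disconnect, the object handed to a delete handler can be a tombstone wrapping the last known state rather than the expected typed object, and a direct type assertion panics. A minimal sketch of the defensive comma-ok pattern, using a local `tombstone` struct as a simplified stand-in for client-go's `cache.DeletedFinalStateUnknown`:

```go
package main

import "fmt"

// tombstone stands in for client-go's cache.DeletedFinalStateUnknown,
// which wraps the last known state of an object deleted while the
// watch was disconnected.
type tombstone struct {
	Key string
	Obj interface{}
}

// secret stands in for *v1.Secret.
type secret struct{ Name string }

// secretFromDelete mirrors the defensive unwrap a delete handler should
// do: comma-ok assertions instead of the panicking obj.(*secret) form.
func secretFromDelete(obj interface{}) (*secret, bool) {
	if s, ok := obj.(*secret); ok {
		return s, true
	}
	if t, ok := obj.(tombstone); ok {
		s, ok := t.Obj.(*secret)
		return s, ok
	}
	return nil, false
}

func main() {
	direct := &secret{Name: "signing-key"}
	wrapped := tombstone{Key: "ns/signing-key", Obj: direct}
	for _, obj := range []interface{}{direct, wrapped, "garbage"} {
		if s, ok := secretFromDelete(obj); ok {
			fmt.Println("deleted secret:", s.Name)
		} else {
			fmt.Println("unexpected object in delete handler")
		}
	}
}
```

The real fix lives in the controller's event handler; this sketch only shows the assertion discipline that avoids the repeated `runtime.TypeAssertionError` in the log above.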
#OCPBUGS-32328 | issue | 3 days ago | Azure upgrades to 4.14.15+ fail with UPI storage account Verified |
Issue 15946396: Azure upgrades to 4.14.15+ fail with UPI storage account Description: Description of problem: {code:none} Cluster with user-provisioned image registry storage accounts fails to upgrade to 4.14.20 due to image-registry-operator being degraded. message: "Progressing: The registry is ready\nNodeCADaemonProgressing: The daemon set node-ca is deployed\nAzurePathFixProgressing: Migration failed: panic: AZURE_CLIENT_ID is required for authentication\nAzurePathFixProgressing: \nAzurePathFixProgressing: goroutine 1 [running]:\nAzurePathFixProgressing: main.main()\nAzurePathFixProgressing: \t/go/src/github.com/openshift/cluster-image-registry-operator/cmd/move-blobs/main.go:25 +0x15c\nAzurePathFixProgressing: " cmd/move-blobs was introduced due to https://issues.redhat.com/browse/OCPBUGS-29003. {code} Version-Release number of selected component (if applicable): {code:none} 4.14.15+{code} How reproducible: {code:none} I have not reproduced it myself, but you would likely hit this every time when upgrading from 4.13 -> 4.14.15+ with an Azure UPI image registry{code} Steps to Reproduce: {code:none} 1. Starting on version 4.13, configure the registry for Azure user-provisioned infrastructure - https://docs.openshift.com/container-platform/4.14/registry/configuring_registry_storage/configuring-registry-storage-azure-user-infrastructure.html. 2. Upgrade to 4.14.15+ 3. {code} Actual results: {code:none} Upgrade does not complete successfully $ oc get co .... image-registry 4.14.20 True False True 617d AzurePathFixControllerDegraded: Migration failed: panic: AZURE_CLIENT_ID is required for authentication...
$ oc get clusterversion NAME VERSION AVAILABLE PROGRESSING SINCE STATUS version 4.13.38 True True 7h41m Unable to apply 4.14.20: wait has exceeded 40 minutes for these operators: image-registry{code} Expected results: {code:none} Upgrade to complete successfully{code} Additional info: {code:none} {code} Status: Verified {code:none} 2024-04-16T12:33:48.474442188Z panic: AZURE_CLIENT_ID is required for authentication 2024-04-16T12:33:48.474442188Z | |||
#OCPBUGS-30067 | issue | 7 weeks ago | [4.14] Panic: send on closed channel Verified |
Issue 15844608: [4.14] Panic: send on closed channel Description: This is a clone of issue OCPBUGS-28628. The following is the description of the original issue: --- This is a clone of issue OCPBUGS-27959. The following is the description of the original issue: --- In a CI run of etcd-operator-e2e I've found the following panic in the operator logs: {code:java} E0125 11:04:58.158222 1 health.go:135] health check for member (ip-10-0-85-12.us-west-2.compute.internal) failed: err(context deadline exceeded) panic: send on closed channel goroutine 15608 [running]: github.com/openshift/cluster-etcd-operator/pkg/etcdcli.getMemberHealth.func1() github.com/openshift/cluster-etcd-operator/pkg/etcdcli/health.go:58 +0xd2 created by github.com/openshift/cluster-etcd-operator/pkg/etcdcli.getMemberHealth github.com/openshift/cluster-etcd-operator/pkg/etcdcli/health.go:54 +0x2a5 {code} which unfortunately is an incomplete log file. The operator recovered itself by restarting, but we should fix the panic nonetheless. Job run for reference: https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_cluster-etcd-operator/1186/pull-ci-openshift-cluster-etcd-operator-master-e2e-operator/1750466468031500288 Status: Verified | |||
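A "send on closed channel" panic of this shape typically means the result channel was closed while a slow health-check goroutine (here, one that had just hit a context deadline) was still about to send. A self-contained sketch of the safe fan-out shape, not the operator's actual code: the channel is closed only after a WaitGroup confirms every sender has finished.

```go
package main

import (
	"fmt"
	"sync"
)

// collect fans out one goroutine per member and closes the results
// channel only after every sender has returned, so even a slow
// (timed-out) check can never send on a closed channel.
func collect(members []string) []string {
	results := make(chan string, len(members))
	var wg sync.WaitGroup
	for _, m := range members {
		wg.Add(1)
		go func(m string) {
			defer wg.Done()
			results <- m + ": ok" // safe: close happens after wg.Wait
		}(m)
	}
	go func() {
		wg.Wait()
		close(results)
	}()
	var out []string
	for r := range results {
		out = append(out, r)
	}
	return out
}

func main() {
	fmt.Println(len(collect([]string{"a", "b", "c"})), "results")
}
```

The buggy variant closes `results` from the collector after a timeout while senders are still live; the WaitGroup-gated close removes that race entirely.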
#OCPBUGS-28535 | issue | 2 months ago | CCO Pod crashes on BM cluster when AWS Root Credential exists Verified |
Issue 15751578: CCO Pod crashes on BM cluster when AWS Root Credential exists Description: Description of problem: {code:none} Similar to https://bugzilla.redhat.com/show_bug.cgi?id=1996624, when the AWS root credential (must possess the "iam:SimulatePrincipalPolicy" permission) exists on a BM cluster, the CCO Pod crashes when running the secretannotator controller. {code} Steps to Reproduce: {code:none} 1. Install a BM cluster fxie-mac:cloud-credential-operator fxie$ oc get infrastructures.config.openshift.io cluster -o yaml apiVersion: config.openshift.io/v1 kind: Infrastructure metadata: creationTimestamp: "2024-01-28T19:50:05Z" generation: 1 name: cluster resourceVersion: "510" uid: 45bc2a29-032b-4c74-8967-83c73b0141c4 spec: cloudConfig: name: "" platformSpec: type: None status: apiServerInternalURI: https://api-int.fxie-bm1.qe.devcluster.openshift.com:6443 apiServerURL: https://api.fxie-bm1.qe.devcluster.openshift.com:6443 controlPlaneTopology: SingleReplica cpuPartitioning: None etcdDiscoveryDomain: "" infrastructureName: fxie-bm1-x74wn infrastructureTopology: SingleReplica platform: None platformStatus: type: None 2. Create an AWS user with IAMReadOnlyAccess permissions: { "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Action": [ "iam:GenerateCredentialReport", "iam:GenerateServiceLastAccessedDetails", "iam:Get*", "iam:List*", "iam:SimulateCustomPolicy", "iam:SimulatePrincipalPolicy" ], "Resource": "*" } ] } 3. Create AWS root credentials with a set of access keys of the user above 4. Trigger a reconcile of the secretannotator controller, e.g.
via editing cloudcredential/cluster {code} Logs: {quote}time="2024-01-29T04:47:27Z" level=warning msg="Action not allowed with tested creds" action="iam:CreateAccessKey" controller=secretannotator time="2024-01-29T04:47:27Z" level=warning msg="Action not allowed with tested creds" action="iam:CreateUser" controller=secretannotator time="2024-01-29T04:47:27Z" level=warning msg="Action not allowed with tested creds" action="iam:DeleteAccessKey" controller=secretannotator time="2024-01-29T04:47:27Z" level=warning msg="Action not allowed with tested creds" action="iam:DeleteUser" controller=secretannotator time="2024-01-29T04:47:27Z" level=warning msg="Action not allowed with tested creds" action="iam:DeleteUserPolicy" controller=secretannotator time="2024-01-29T04:47:27Z" level=warning msg="Action not allowed with tested creds" action="iam:PutUserPolicy" controller=secretannotator time="2024-01-29T04:47:27Z" level=warning msg="Action not allowed with tested creds" action="iam:TagUser" controller=secretannotator time="2024-01-29T04:47:27Z" level=warning msg="Tested creds not able to perform all requested actions" controller=secretannotator I0129 04:47:27.988535 1 reflector.go:289] Starting reflector *v1.Infrastructure (10h37m20.569091933s) from sigs.k8s.io/controller-runtime/pkg/cache/internal/informers.go:233 I0129 04:47:27.988546 1 reflector.go:325] Listing and watching *v1.Infrastructure from sigs.k8s.io/controller-runtime/pkg/cache/internal/informers.go:233 I0129 04:47:27.989503 1 reflector.go:351] Caches populated for *v1.Infrastructure from sigs.k8s.io/controller-runtime/pkg/cache/internal/informers.go:233 panic: runtime error: invalid memory address or nil pointer dereference [recovered] panic: runtime error: invalid memory address or nil pointer dereference [signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x1a964a0] goroutine 341 [running]: sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile.func1()
/go/src/github.com/openshift/cloud-credential-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:115 +0x1e5 panic({0x3fe72a0?, 0x809b9e0?}) /usr/lib/golang/src/runtime/panic.go:914 +0x21f github.com/openshift/cloud-credential-operator/pkg/operator/utils/aws.LoadInfrastructureRegion({0x562e1c0?, 0xc002c99a70?}, {0x5639ef0, 0xc0001b6690}) /go/src/github.com/openshift/cloud-credential-operator/pkg/operator/utils/aws/utils.go:72 +0x40 github.com/openshift/cloud-credential-operator/pkg/operator/secretannotator/aws.(*ReconcileCloudCredSecret).validateCloudCredsSecret(0xc0008c2000, 0xc002586000) /go/src/github.com/openshift/cloud-credential-operator/pkg/operator/secretannotator/aws/reconciler.go:206 +0x1a5 github.com/openshift/cloud-credential-operator/pkg/operator/secretannotator/aws.(*ReconcileCloudCredSecret).Reconcile(0xc0008c2000, {0x30?, 0xc000680c00?}, {{{0x4f38a3d?, 0x0?}, {0x4f33a20?, 0x416325?}}}) /go/src/github.com/openshift/cloud-credential-operator/pkg/operator/secretannotator/aws/reconciler.go:166 +0x605 sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile(0x561ff20?, {0x561ff20?, 0xc002ff3b00?}, {{{0x4f38a3d?, 0x3b180c0?}, {0x4f33a20?, 0x55eea08?}}}) /go/src/github.com/openshift/cloud-credential-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:118 +0xb7 sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0xc000189360, {0x561ff58, 0xc0007e5040}, {0x4589f00?, 0xc000570b40?}) /go/src/github.com/openshift/cloud-credential-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:314 +0x365 sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc000189360, {0x561ff58, 0xc0007e5040}) /go/src/github.com/openshift/cloud-credential-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:265 +0x1c9
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2() /go/src/github.com/openshift/cloud-credential-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:226 +0x79 created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2 in goroutine 183 /go/src/github.com/openshift/cloud-credential-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:222 +0x565 {quote} Actual results: {code:none} CCO Pod crashes and restarts in a loop: fxie-mac:cloud-credential-operator fxie$ oc get po -n openshift-cloud-credential-operator -w NAME READY STATUS RESTARTS AGE cloud-credential-operator-657bdffdff-9wzrs 2/2 Running 3 (2m35s ago) 8h{code} Status: Verified {quote} This is indeed a bit bizarre but a customer has encountered a very similar problem before. In [https://bugzilla.redhat.com/show_bug.cgi?id=1996624], an OpenStack cluster incorporates the AWS root credential secret, causing CCO to panic. {quote} The actuator is selected based on the platform [here|https://github.com/openshift/cloud-credential-operator/blob/5efa618d93cad5c44c9e92844833156176b827a4/pkg/operator/controller.go#L87]. A BM cluster (Platform: None) should have a dummy actuator, which should never reach this code. {quote} The panic happens in the secretannotator controller, which adopts the AWS implementation in the "default" case, see [https://github.com/openshift/cloud-credential-operator/blob/2b5a6b6176695b9632ff4c6b85931f6c6d408961/pkg/operator/secretannotator/secretannotator_controller.go#L55]. {quote}Is the platform being changed as part of the reproduction of this bug? | |||
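The crash in `LoadInfrastructureRegion` is a nil-pointer dereference: on a bare-metal cluster (`platform: None`, as in the Infrastructure dump above) the AWS stanza of the platform status is nil, but the AWS secretannotator path dereferences it anyway. A self-contained sketch of the guard pattern, using trimmed stand-in types rather than the real `configv1` API:

```go
package main

import (
	"errors"
	"fmt"
)

// Trimmed stand-ins for the Infrastructure config types; on a
// bare-metal cluster (Platform: None) the AWS field is nil.
type AWSPlatformStatus struct{ Region string }

type PlatformStatus struct {
	Type string
	AWS  *AWSPlatformStatus
}

type Infrastructure struct {
	Status struct{ PlatformStatus *PlatformStatus }
}

// regionFromInfra guards every pointer on the path before
// dereferencing, instead of assuming the AWS stanza is populated.
func regionFromInfra(infra *Infrastructure) (string, error) {
	if infra == nil || infra.Status.PlatformStatus == nil || infra.Status.PlatformStatus.AWS == nil {
		return "", errors.New("no AWS platform status on this cluster")
	}
	return infra.Status.PlatformStatus.AWS.Region, nil
}

func main() {
	bm := &Infrastructure{}
	bm.Status.PlatformStatus = &PlatformStatus{Type: "None"} // AWS is nil
	if _, err := regionFromInfra(bm); err != nil {
		fmt.Println("skipping region lookup:", err)
	}
}
```

This complements the comment above about actuator selection: even if the platform switch picks the AWS implementation by default, the region lookup itself should fail with an error rather than segfault.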
#OCPBUGS-27892 | issue | 3 months ago | panic in poller CLOSED |
Issue 15741466: panic in poller Description: Description of problem: {code:none} {code} Version-Release number of selected component (if applicable): {code:none} {code} How reproducible: {code:none} {code} Steps to Reproduce: {code:none} 1. 2. 3. {code} Actual results: {code:none} {code} Expected results: {code:none} {code} Additional info: {code:none} {code} | |||
#OCPBUGS-18138 | issue | 5 days ago | CI sees: pods/openshift-console-operator_console-operator-6bf69485c8-2s498_console-operator.log.gz:E0821 06:29:09.418195 1 runtime.go:79] Observed a panic: "invalid memory address or nil pointer dereference" Verified |
Issue 15443283: CI sees: pods/openshift-console-operator_console-operator-6bf69485c8-2s498_console-operator.log.gz:E0821 06:29:09.418195 1 runtime.go:79] Observed a panic: "invalid memory address or nil pointer dereference" Description: Description of problem: {code:none} example job: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-jboss-fuse-camel-k-test-container-main-camel-k-ocp4.14-lp-interop-camel-k-interop-aws/1693503447845834752 : Undiagnosed panic detected in pod expand_less0s{ pods/openshift-console-operator_console-operator-6bf69485c8-2s498_console-operator.log.gz:E0821 06:29:09.418195 1 runtime.go:79] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)}{code} Version-Release number of selected component (if applicable): {code:none} {code} How reproducible: {code:none} intermittent. But seems to happen every few days or so: $ podman run -it corbinu/alpine-w3m -dump -cols 200 "https://search.ci.openshift.org/?search=openshift-console-operator_console-operator.*invalid+memory+address+or+nil+pointer+dereference&maxAge=336h&context=1&type=junit&name=.*4.14.*&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job" [openshift-console-op] [14d] [Links ] [junit ] [Search] Job: [.*4.14.* ] [ ] [5 ] [20971520 ] [job ] [ ] Wrap lines periodic-ci-jboss-fuse-camel-k-test-container-main-camel-k-ocp4.14-lp-interop-camel-k-interop-aws (all) - 2 runs, 0% failed, 50% of runs match #1693503447845834752 junit 4 days ago # Undiagnosed panic detected in pod pods/openshift-console-operator_console-operator-6bf69485c8-2s498_console-operator.log.gz:E0821 06:29:09.418195 1 runtime.go:79] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference) periodic-ci-openshift-release-master-nightly-4.14-e2e-aws-ovn-cpu-partitioning (all) - 54 runs, 7% failed, 25% of failures match = 2% impact 
#1692398384507260928 junit 7 days ago # Undiagnosed panic detected in pod pods/openshift-console-operator_console-operator-6bf69485c8-tppxt_console-operator.log.gz:E0818 05:14:35.489038 1 runtime.go:79] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference) periodic-ci-openshift-release-master-nightly-4.14-e2e-aws-ovn-fips-serial (all) - 2 runs, 0% failed, 50% of runs match #1692131892385550336 junit 8 days ago # Undiagnosed panic detected in pod pods/openshift-console-operator_console-operator-6bf69485c8-lr54b_console-operator.log.gz:E0817 11:36:17.381961 1 runtime.go:79] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference) periodic-ci-openshift-release-master-nightly-4.14-e2e-gcp-ovn-upi (all) - 2 runs, 50% failed, 100% of failures match = 50% impact #1692132131523792896 junit 8 days ago # Undiagnosed panic detected in pod pods/openshift-console-operator_console-operator-6bf69485c8-ztc7h_console-operator.log.gz:E0817 11:48:41.058274 1 runtime.go:79] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference) periodic-ci-openshift-release-master-nightly-4.14-e2e-aws-ovn-shared-vpc-phz-techpreview (all) - 42 runs, 12% failed, 20% of failures match = 2% impact #1691881846972878848 junit 8 days ago # Undiagnosed panic detected in pod pods/openshift-console-operator_console-operator-6bf69485c8-ppmnm_console-operator.log.gz:E0816 19:12:20.716903 1 runtime.go:79] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference) periodic-ci-openshift-knative-serverless-operator-main-ocp4.14-lp-interop-operator-e2e-interop-aws-ocp414 (all) - 4 runs, 50% failed, 50% of failures match = 25% impact #1691835547812630528 junit 9 days ago # Undiagnosed panic detected in 
pod pods/openshift-console-operator_console-operator-6bf69485c8-rm6fk_console-operator.log.gz:E0816 16:04:44.717384 1 runtime.go:79] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference) periodic-ci-openshift-release-master-nightly-4.14-e2e-gcp-ovn-rt (all) - 24 runs, 21% failed, 20% of failures match = 4% impact #1691779459859877888 junit 9 days ago # Undiagnosed panic detected in pod pods/openshift-console-operator_console-operator-6df6498794-wqjk6_console-operator.log.gz:E0816 12:19:02.860700 1 runtime.go:79] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference) periodic-ci-openshift-cluster-etcd-operator-release-4.14-periodics-e2e-aws-etcd-recovery (all) - 14 runs, 100% failed, 7% of failures match = 7% impact #1691303819960389632 junit 10 days ago # Undiagnosed panic detected in pod pods/openshift-console-operator_console-operator-9f9fc5c5f-54xpb_console-operator_previous.log.gz:E0815 04:49:27.747727 1 runtime.go:79] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference) periodic-ci-openshift-release-master-nightly-4.14-e2e-aws-ovn-proxy (all) - 34 runs, 24% failed, 13% of failures match = 3% impact #1691023830476132352 junit 11 days ago # Undiagnosed panic detected in pod pods/openshift-console-operator_console-operator-6849dd7f6b-4r246_console-operator.log.gz:E0814 10:21:13.247983 1 runtime.go:79] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference) periodic-ci-openshift-release-master-ci-4.14-e2e-aws-ovn (all) - 29 runs, 28% failed, 13% of failures match = 3% impact #1690908928604377088 junit 11 days ago # Undiagnosed panic detected in pod 
pods/openshift-console-operator_console-operator-bfcc5bfc7-qndlf_console-operator.log.gz:E0814 03:03:21.007675 1 runtime.go:79] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference) periodic-ci-openshift-release-master-nightly-4.14-e2e-aws-csi (all) - 24 runs, 13% failed, 33% of failures match = 4% impact #1690909013211877376 junit 11 days ago # Undiagnosed panic detected in pod pods/openshift-console-operator_console-operator-bfcc5bfc7-6g886_console-operator.log.gz:E0814 02:43:39.815176 1 runtime.go:79] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference) periodic-ci-openshift-release-master-nightly-4.14-e2e-vsphere-zones (all) - 2 runs, 0% failed, 50% of runs match #1690604928210309120 junit 12 days ago # Undiagnosed panic detected in pod pods/openshift-console-operator_console-operator-6bf69485c8-sfnjf_console-operator.log.gz:E0813 06:29:07.539267 1 runtime.go:79] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference) Found in 0.10% of runs (0.24% of failures) across 12026 total runs and 737 jobs (41.81% failed) in 1.057s - clear search | chart view - source code located on github {code} Steps to Reproduce: {code:none} 1. 2. 3. 
{code} Actual results: {code:none} {code} Expected results: {code:none} {code} Additional info: {code:none} https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-jboss-fuse-camel-k-test-container-main-camel-k-ocp4.14-lp-interop-camel-k-interop-aws/1693503447845834752/artifacts/camel-k-interop-aws/gather-extra/artifacts/pods/openshift-console-operator_console-operator-6bf69485c8-2s498_console-operator.log has the stack trace: E0821 06:29:09.418195 1 runtime.go:79] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference) goroutine 262 [running]: k8s.io/apimachinery/pkg/util/runtime.logPanic({0x26aba20?, 0x475f720}) /go/src/github.com/openshift/console-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:75 +0x99 k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0x733dad?}) /go/src/github.com/openshift/console-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:49 +0x75 panic({0x26aba20, 0x475f720}) /usr/lib/golang/src/runtime/panic.go:884 +0x213 github.com/openshift/console-operator/pkg/console/controllers/util.IncludeNamesFilter.func1({0x27daba0?, 0xc00045a900?}) /go/src/github.com/openshift/console-operator/pkg/console/controllers/util/util.go:34 +0x16f k8s.io/client-go/tools/cache.FilteringResourceEventHandler.OnDelete({0xc0005e82d0?, {0x2f66350?, 0xc000d21b18?}}, {0x27daba0, 0xc00045a900}) /go/src/github.com/openshift/console-operator/vendor/k8s.io/client-go/tools/cache/controller.go:327 +0x46 k8s.io/client-go/tools/cache.(*processorListener).run.func1() /go/src/github.com/openshift/console-operator/vendor/k8s.io/client-go/tools/cache/shared_informer.go:978 +0xaf k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x30?) 
/go/src/github.com/openshift/console-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:226 +0x3e k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0x475d67?, {0x2f3e8e0, 0xc000da8b40}, 0x1, 0xc000df8060) /go/src/github.com/openshift/console-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:227 +0xb6 k8s.io/apimachinery/pkg/util/wait.JitterUntil(0x0?, 0x3b9aca00, 0x0, 0x0?, 0xc000fe8788?) /go/src/github.com/openshift/console-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:204 +0x89 k8s.io/apimachinery/pkg/util/wait.Until(...) /go/src/github.com/openshift/console-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:161 k8s.io/client-go/tools/cache.(*processorListener).run(0xc000309710) /go/src/github.com/openshift/console-operator/vendor/k8s.io/client-go/tools/cache/shared_informer.go:967 +0x6b k8s.io/apimachinery/pkg/util/wait.(*Group).Start.func1() /go/src/github.com/openshift/console-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:72 +0x5a created by k8s.io/apimachinery/pkg/util/wait.(*Group).Start /go/src/github.com/openshift/console-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:70 +0x85{code} Status: Verified Comment 22890188 by Yanping Zhang at 2023-08-30T09:36:59.205+0000 [~jhadvig@redhat.com] The panic error seems not to be the same as in bug OCPBUGS-17422, as I still see many jobs with the panic error after searching with: [https://search.ci.openshift.org/?search=openshift-console-operator_console-operator.*invalid+memory+address+or+nil+pointer+dereference&maxAge=336h&context=1&type=junit&name=.*4.14.*&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job] Comment 22906813 by Yanping Zhang at 2023-09-01T09:27:08.721+0000 Jakub, thanks. I clicked into each job with the panic error; they were all using payload builds from before Aug 16. The jobs using builds with the fixed code no longer show this panic error. Moving the bug to Verified. Comment 24627487 by OpenShift Jira Automation Bot at 2024-04-29T16:48:33.620+0000 | |||
#OCPBUGS-29123 | issue | 7 weeks ago | [IBMCloud] Unhandled response during destroy disks Verified |
Issue 15788238: [IBMCloud] Unhandled response during destroy disks Description: This is a clone of issue OCPBUGS-20085. The following is the description of the original issue: --- Description of problem: {code:none} During the destroy cluster operation, unexpected results from the IBM Cloud API calls for Disks can result in panics when response data (or responses) are missing, resulting in unexpected failures during destroy.{code} Version-Release number of selected component (if applicable): {code:none} 4.15{code} How reproducible: {code:none} Unknown, dependent on IBM Cloud API responses{code} Steps to Reproduce: {code:none} 1. Successfully create IPI cluster on IBM Cloud 2. Attempt to cleanup (destroy) the cluster {code} Actual results: {code:none} Golang panic attempting to parse a HTTP response that is missing or lacking data. level=info msg=Deleted instance "ci-op-97fkzvv2-e6ed7-5n5zg-master-0" E0918 18:03:44.787843 33 runtime.go:79] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference) goroutine 228 [running]: k8s.io/apimachinery/pkg/util/runtime.logPanic({0x6a3d760?, 0x274b5790}) /go/src/github.com/openshift/installer/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:75 +0x99 k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0xfffffffe?}) /go/src/github.com/openshift/installer/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:49 +0x75 panic({0x6a3d760, 0x274b5790}) /usr/lib/golang/src/runtime/panic.go:884 +0x213 github.com/openshift/installer/pkg/destroy/ibmcloud.(*ClusterUninstaller).waitForDiskDeletion.func1() /go/src/github.com/openshift/installer/pkg/destroy/ibmcloud/disk.go:84 +0x12a github.com/openshift/installer/pkg/destroy/ibmcloud.(*ClusterUninstaller).Retry(0xc000791ce0, 0xc000573700) /go/src/github.com/openshift/installer/pkg/destroy/ibmcloud/ibmcloud.go:99 +0x73 
github.com/openshift/installer/pkg/destroy/ibmcloud.(*ClusterUninstaller).waitForDiskDeletion(0xc000791ce0, {{0xc00160c060, 0x29}, {0xc00160c090, 0x28}, {0xc0016141f4, 0x9}, {0x82b9f0d, 0x4}, {0xc00160c060, ...}}) /go/src/github.com/openshift/installer/pkg/destroy/ibmcloud/disk.go:78 +0x14f github.com/openshift/installer/pkg/destroy/ibmcloud.(*ClusterUninstaller).destroyDisks(0xc000791ce0) /go/src/github.com/openshift/installer/pkg/destroy/ibmcloud/disk.go:118 +0x485 github.com/openshift/installer/pkg/destroy/ibmcloud.(*ClusterUninstaller).executeStageFunction.func1() /go/src/github.com/openshift/installer/pkg/destroy/ibmcloud/ibmcloud.go:201 +0x3f k8s.io/apimachinery/pkg/util/wait.ConditionFunc.WithContext.func1({0x7f7801e503c8, 0x18}) /go/src/github.com/openshift/installer/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:109 +0x1b k8s.io/apimachinery/pkg/util/wait.runConditionWithCrashProtectionWithContext({0x227a2f78?, 0xc00013c000?}, 0xc000a9b690?) /go/src/github.com/openshift/installer/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:154 +0x57 k8s.io/apimachinery/pkg/util/wait.poll({0x227a2f78, 0xc00013c000}, 0xd0?, 0x146fea5?, 0x7f7801e503c8?) /go/src/github.com/openshift/installer/vendor/k8s.io/apimachinery/pkg/util/wait/poll.go:245 +0x38 k8s.io/apimachinery/pkg/util/wait.PollImmediateInfiniteWithContext({0x227a2f78, 0xc00013c000}, 0x4136e7?, 0x28?) /go/src/github.com/openshift/installer/vendor/k8s.io/apimachinery/pkg/util/wait/poll.go:229 +0x49 k8s.io/apimachinery/pkg/util/wait.PollImmediateInfinite(0x100000000000000?, 0x806f00?) /go/src/github.com/openshift/installer/vendor/k8s.io/apimachinery/pkg/util/wait/poll.go:214 +0x46 github.com/openshift/installer/pkg/destroy/ibmcloud.(*ClusterUninstaller).executeStageFunction(0xc000791ce0, {{0x82bb9a3?, 0xc000a9b7d0?}, 0xc000111de0?}, 0x840366?, 0xc00054e900?) 
/go/src/github.com/openshift/installer/pkg/destroy/ibmcloud/ibmcloud.go:198 +0x108 created by github.com/openshift/installer/pkg/destroy/ibmcloud.(*ClusterUninstaller).destroyCluster /go/src/github.com/openshift/installer/pkg/destroy/ibmcloud/ibmcloud.go:172 +0xa87 panic: runtime error: invalid memory address or nil pointer dereference [recovered] panic: runtime error: invalid memory address or nil pointer dereference{code} Expected results: {code:none} Destroy IBM Cloud Disks during cluster destroy, or provide a useful error message to follow up on.{code} Additional info: {code:none} The ability to reproduce is relatively low, as it requires the IBM Cloud APIs to return specific data (or lack thereof); it is currently unknown why the HTTP response and/or data is missing. IBM Cloud already has a PR to attempt to mitigate this issue, as was done with other destroy resource calls. Potential follow-up for additional resources as necessary. https://github.com/openshift/installer/pull/7515{code} Status: Verified | |||
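The mitigation pattern referenced above amounts to checking SDK results for nil before dereferencing. A self-contained sketch with stand-in types (not the actual IBM Cloud SDK structs): return an actionable error for a missing response or missing payload so the destroy loop can retry or report, rather than panic.

```go
package main

import (
	"errors"
	"fmt"
)

// Disk and DiskResponse are stand-ins for SDK result types whose
// pointers may be nil when the API returns an unexpected or empty body.
type Disk struct{ Status string }

type DiskResponse struct {
	StatusCode int
	Disk       *Disk
}

// diskStatus guards the call error, the response, and the payload
// before dereferencing, instead of panicking mid-destroy.
func diskStatus(resp *DiskResponse, err error) (string, error) {
	if err != nil {
		return "", err
	}
	if resp == nil || resp.Disk == nil {
		return "", errors.New("disk lookup returned no data; retrying")
	}
	return resp.Disk.Status, nil
}

func main() {
	// Simulate the failure mode: a response arrived but carried no disk.
	if _, err := diskStatus(&DiskResponse{StatusCode: 404}, nil); err != nil {
		fmt.Println("handled:", err)
	}
}
```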
#OCPBUGS-30795 | issue | 10 days ago | cpu_util test prometheus queries intermittently return empty query results ASSIGNED |
Issue 15869699: cpu_util test prometheus queries intermittently return empty query results Description: Description of problem:{code:none} when cpu_util test runs, sometimes prometheus starts returning empty query results. It appears to be influenced by the workload size (percentage of isolated CPUs). For SPR-EE bm, 85% workload seems to work fine For Ice Lake bm, prometheus often works properly, but not always, at a much lower 40% workload Once prometheus starts returning empty results, the pods must be restarted for subsequent queries to return results, given new data has been gathered since restarting the pods. This seems to start with the must-gather stage of the test suite most of the time. Symptom in the Jenkins job log: 2024/03/08 17:58:52 run command 'oc [adm must-gather]' 2024/03/08 18:08:09 Command in prom pod: [bash -c curl "-s" 'http://localhost:9090/api/v1/query' --data-urlencode 'query=max_over_time((sum(namedprocess_namegroup_cpu_rate{groupname!~"conmon"})+sum(pod:container_cpu_usage:sum{pod!~"process-exp.*",pod!~"oslat.*",pod!~"stress.*",pod!~"cnfgotestpriv.*"}))[9m26s:30s])'; echo] 2024/03/08 18:08:09 output: {"status":"success","data":{"resultType":"vector","result":[]}} 2024/03/08 18:08:09 Query max over time total mgmt cpu usage 2024/03/08 18:08:09 Must-gather dirs to be removed: [/var/lib/jenkins/workspace/ocp-far-edge-vran-tests/cnf-gotests/test/ran/cpu/must-gather.local.4674322353555706431] [PANICKED] Test Panicked In [It] at: /usr/local/go/src/runtime/panic.go:113 @ 03/08/24 18:08:09.407 runtime error: index out of range [0] with length 0 Full Stack Trace gitlab.cee.redhat.com/cnf/cnf-gotests/test/ran/cpu/tests.checkCPUUsage(0x83fb46d7e8, 0x4, {0x3699cc0, 0xa}) /var/lib/jenkins/workspace/ocp-far-edge-vran-tests/cnf-gotests/test/ran/cpu/tests/sno_cpu_utilization.go:200 +0x16af gitlab.cee.redhat.com/cnf/cnf-gotests/test/ran/cpu/tests.glob..func1.5.4.2() 
/var/lib/jenkins/workspace/ocp-far-edge-vran-tests/cnf-gotests/test/ran/cpu/tests/sno_cpu_utilization.go:156 +0x165 < Exit [It] should use less than 2 core(s) - /var/lib/jenkins/workspace/ocp-far-edge-vran-tests/cnf-gotests/test/ran/cpu/tests/sno_cpu_utilization.go:144 @ 03/08/24 18:08:09.408 (9m27.397s) > Enter [AfterEach] Management CPU utilization with workload pods running - /var/lib/jenkins/workspace/ocp-far-edge-vran-tests/cnf-gotests/test/ran/cpu/tests/sno_cpu_utilization.go:115 @ 03/08/24 18:08:09.408 < Exit [AfterEach] Management CPU utilization with workload pods running - /var/lib/jenkins/workspace/ocp-far-edge-vran-tests/cnf-gotests/test/ran/cpu/tests/sno_cpu_utilization.go:115 @ 03/08/24 18:08:09.408 (0s) > Enter [ReportAfterEach] TOP-LEVEL - /var/lib/jenkins/workspace/ocp-far-edge-vran-tests/cnf-gotests/test/ran/cpu/cpu_suite_test.go:66 @ 03/08/24 18:08:09.408 < Exit [ReportAfterEach] TOP-LEVEL - /var/lib/jenkins/workspace/ocp-far-edge-vran-tests/cnf-gotests/test/ran/cpu/cpu_suite_test.go:66 @ 03/08/24 18:08:09.408 (0s) • [PANICKED] [567.732 seconds] Empty results can be confirmed when running the query manually in the prometheus-k8s pod. {code} Version-Release number of selected component (if applicable):{code:none} As far back as at least 4.15.0-rc.2, and forward into current 4.16 nightlies {code} How reproducible:{code:none} Always or often, depending on workload percentage and bm used {code} Steps to Reproduce:{code:none} 1. Deploy SNO with Telco DU profile 2. Run cpu_util test 3. Observe test logs to monitor for error conditions that occur after prometheus on the spoke starts returning empty results. {code} Actual results:{code:none} prometheus queries stop responding, which prevents metrics gathering by the test suite {code} Expected results:{code:none} prometheus queries should always work with a large workload, ~85% or so {code} Additional info:{code:none} {code} Status: ASSIGNED | |||
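The `index out of range [0] with length 0` panic above comes from indexing the first sample of an empty Prometheus instant-vector result. A minimal Go sketch of the defensive pattern (the `firstSampleValue` helper is hypothetical, not part of the cnf-gotests suite; it only assumes the documented `/api/v1/query` response shape shown in the log):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// promResponse mirrors only the fields of a Prometheus /api/v1/query
// instant-vector response that this sketch needs.
type promResponse struct {
	Status string `json:"status"`
	Data   struct {
		ResultType string `json:"resultType"`
		Result     []struct {
			Value []interface{} `json:"value"` // [timestamp, "value"]
		} `json:"result"`
	} `json:"data"`
}

// firstSampleValue returns the first sample's value, or an error when
// the result vector is empty -- the case the test indexed blindly.
func firstSampleValue(body []byte) (string, error) {
	var r promResponse
	if err := json.Unmarshal(body, &r); err != nil {
		return "", err
	}
	if r.Status != "success" || len(r.Data.Result) == 0 {
		return "", fmt.Errorf("empty or failed query result")
	}
	if len(r.Data.Result[0].Value) < 2 {
		return "", fmt.Errorf("unexpected sample format")
	}
	v, ok := r.Data.Result[0].Value[1].(string)
	if !ok {
		return "", fmt.Errorf("unexpected sample format")
	}
	return v, nil
}

func main() {
	// The exact empty payload from the Jenkins log.
	empty := []byte(`{"status":"success","data":{"resultType":"vector","result":[]}}`)
	if _, err := firstSampleValue(empty); err != nil {
		fmt.Println("guarded:", err)
	}
	ok := []byte(`{"status":"success","data":{"resultType":"vector","result":[{"value":[1709913489,"1.5"]}]}}`)
	v, _ := firstSampleValue(ok)
	fmt.Println("value:", v)
}
```

Guarding the vector length before indexing would turn the test panic into a reportable query failure, which is easier to correlate with the must-gather stage.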
#OCPBUGS-32750 | issue | 10 days ago | OCB os-builder and controller pods panic when we opt in and wait for the new image to be built POST |
Issue 15962201: OCB os-builder and controller pods panic when we opt in and wait for the new image to be built Description: Description of problem:{code:none} When we enable techpreview and we create a MachineOSConfig resource for the worker pool, the controller pod and the os-builder pod panic. {code} Version-Release number of selected component (if applicable):{code:none} pre-merge https://github.com/openshift/machine-config-operator/pull/4327 {code} How reproducible:{code:none} Always {code} Steps to Reproduce:{code:none} 1. Enable techpreview oc patch featuregate cluster --type=merge -p '{"spec":{"featureSet": "TechPreviewNoUpgrade"}}' 2. Create a MachineOSConfig resource for the worker pool oc create -f - << EOF apiVersion: machineconfiguration.openshift.io/v1alpha1 kind: MachineOSConfig metadata: name: worker spec: machineConfigPool: name: worker buildInputs: imageBuilder: imageBuilderType: PodImageBuilder baseImagePullSecret: name: $(oc get secret -n openshift-config pull-secret -o json | jq "del(.metadata.namespace, .metadata.creationTimestamp, .metadata.resourceVersion, .metadata.uid, .metadata.name)" | jq '.metadata.name="pull-copy"' | oc -n openshift-machine-config-operator create -f - &> /dev/null; echo -n "pull-copy") renderedImagePushSecret: name: $(oc get -n openshift-machine-config-operator sa builder -ojsonpath='{.secrets[0].name}') renderedImagePushspec: "image-registry.openshift-image-registry.svc:5000/openshift-machine-config-operator/ocb-image:latest" EOF {code} Actual results:{code:none} The build pod is triggered, and once it finishes, the os-builder pod and the controller pod panic {code} Expected results:{code:none} No pod should panic {code} Additional info:{code:none} {code} Status: POST Comment 24598036 by Sergio Regidor de la Rosa at 2024-04-24T12:01:10.419+0000 In the latest commit in the PR the panic has been fixed https://github.com/openshift/machine-config-operator/commit/fbbb0ab69ff5f8412279e9b061ea741f7b3d298d | |||
#OCPBUGS-32185 | issue | 4 days ago | ipv6 installs timing out in CI CLOSED |
{ pods/openshift-etcd-operator_etcd-operator-c6d6b8b-czln9_etcd-operator_previous.log.gz:E0409 10:08:08.921203 1 runtime.go:79] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)} Comment 24599951 by Dan Winship at 2024-04-24T14:31:23.537+0000 Yeah, I filed a bug about the panic too (OCPBUGS-32176), but cluster-etcd-operator recovers from it anyway. Comment 24635350 by Mahnoor Asghar at 2024-04-30T10:15:47.321+0000 | |||
#OCPBUGS-22474 | issue | 3 weeks ago | When tuned pod gets restarted tuning gets reapplied even though no configuration changes are required ASSIGNED |
Comment 23359802 by Marius Cornea at 2023-10-27T09:26:48.562+0000 I initially noticed an issue when making a sysctl (kernel.hung_task_timeout_secs) change, which resulted in the node locking up due to RCU stalls. Then I tried restarting the pod without making any change, which led to the same result of a kernel panic. | |||
#OCPBUGS-30342 | issue | 2 weeks ago | numaresources-controller-manager in CrashLoopBackOff - invalid memory address or nil pointer dereference CLOSED |
Issue 15860467: numaresources-controller-manager in CrashLoopBackOff - invalid memory address or nil pointer dereference Description: This is a clone of issue OCPBUGS-30236. The following is the description of the original issue: --- Description of problem: {code:none} Pod numaresources-controller-manager is in CrashLoopBackOff state{code} {code:none} oc get po -n openshift-numaresources NAME READY STATUS RESTARTS AGE numaresources-controller-manager-766c55596b-9nb6b 0/1 CrashLoopBackOff 163 (3m52s ago) 14h secondary-scheduler-85959757db-dvpdj 1/1 Running 0 14h{code} {code:none} oc logs -n openshift-numaresources numaresources-controller-manager-766c55596b-9nb6b ... I0305 07:32:51.102133 1 shared_informer.go:341] caches populated I0305 07:32:51.102210 1 controller.go:220] "Starting workers" controller="kubeletconfig" controllerGroup="machineconfiguration.openshift.io" controllerKind="KubeletConfig" worker count=1 I0305 07:32:51.102295 1 kubeletconfig_controller.go:69] "Starting KubeletConfig reconcile loop" object="/autosizing-master" I0305 07:32:51.102412 1 panic.go:884] "Finish KubeletConfig reconcile loop" object="/autosizing-master" I0305 07:32:51.102448 1 controller.go:115] "Observed a panic in reconciler: runtime error: invalid memory address or nil pointer dereference" controller="kubeletconfig" controllerGroup="machineconfiguration.openshift.io" controllerKind="KubeletConfig" KubeletConfig="autosizing-master" namespace="" name="autosizing-master" reconcileID="91d2c547-993c-4ae1-beab-1afc0a72af68" panic: runtime error: invalid memory address or nil pointer dereference [recovered] panic: runtime error: invalid memory address or nil pointer dereference [signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x1778a1c] goroutine 481 [running]: sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile.func1() /remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:116 +0x1fa panic({0x19286e0, 
0x2d16fc0}) /usr/lib/golang/src/runtime/panic.go:884 +0x213 github.com/openshift-kni/numaresources-operator/pkg/kubeletconfig.MCOKubeletConfToKubeletConf(...) /remote-source/app/pkg/kubeletconfig/kubeletconfig.go:29 github.com/openshift-kni/numaresources-operator/controllers.(*KubeletConfigReconciler).reconcileConfigMap(0xc0004d29c0, {0x1e475f0, 0xc000e31260}, 0xc000226c40, {{0x0?, 0xc000e31260?}, {0xc000b98498?, 0x2de08f8?}}) /remote-source/app/controllers/kubeletconfig_controller.go:126 +0x11c github.com/openshift-kni/numaresources-operator/controllers.(*KubeletConfigReconciler).Reconcile(0xc0004d29c0, {0x1e475f0, 0xc000e31260}, {{{0x0, 0x0}, {0xc000b98498, 0x11}}}) /remote-source/app/controllers/kubeletconfig_controller.go:90 +0x3cd sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile(0x1e4a2e0?, {0x1e475f0?, 0xc000e31260?}, {{{0x0?, 0xb?}, {0xc000b98498?, 0x0?}}}) /remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:119 +0xc8 sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0xc0004446e0, {0x1e47548, 0xc0003520f0}, {0x19b9940?, 0xc00093a1a0?}) /remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:316 +0x3ca sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc0004446e0, {0x1e47548, 0xc0003520f0}) /remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:266 +0x1d9 sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2() /remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:227 +0x85 created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2 /remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:223 +0x587 {code} Version-Release number of selected component (if applicable): {code:none} 
numaresources-operator.v4.15.0 {code} How reproducible: {code:none} so far 100% {code} Steps to Reproduce: {code:none} 1. Create a KubeletConfig that configures autosizing: apiVersion: machineconfiguration.openshift.io/v1 kind: KubeletConfig metadata: name: autosizing-master spec: autoSizingReserved: true machineConfigPoolSelector: matchLabels: pools.operator.machineconfiguration.openshift.io/master: "" 2. Create a performance profile that targets a subset of nodes 3. Proceed with numaresources-operator installation {code} Actual results: {code:none} Pod in CrashLoopBackOff state {code} Expected results: {code:none} numaresources-operator is successfully installed {code} Additional info: {code:none} Baremetal dualstack cluster deployed with GitOps-ZTP {code} Status: CLOSED | |||
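The trace above shows MCOKubeletConfToKubeletConf dereferencing a nil field when the KubeletConfig only sets autoSizingReserved and carries no embedded kubelet configuration. A minimal sketch of the defensive pattern, using hypothetical stand-in types rather than the real MCO API (in the actual CRD the embedded config is a raw JSON extension that may legitimately be nil):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// rawExtension and mcoKubeletConfigSpec are simplified stand-ins for
// the MCO types involved; they are not the real API definitions.
type rawExtension struct{ Raw []byte }

type mcoKubeletConfigSpec struct {
	AutoSizingReserved bool
	KubeletConfig      *rawExtension // nil for autosizing-only configs
}

// toKubeletConf sketches a defensive version of the conversion that
// panicked: it returns an error instead of dereferencing a nil field.
func toKubeletConf(spec *mcoKubeletConfigSpec) (map[string]interface{}, error) {
	if spec == nil || spec.KubeletConfig == nil || spec.KubeletConfig.Raw == nil {
		return nil, fmt.Errorf("no embedded kubelet configuration")
	}
	var out map[string]interface{}
	if err := json.Unmarshal(spec.KubeletConfig.Raw, &out); err != nil {
		return nil, err
	}
	return out, nil
}

func main() {
	// Mirrors the reproducer: autosizing is set, KubeletConfig is left nil.
	autosizing := &mcoKubeletConfigSpec{AutoSizingReserved: true}
	if _, err := toKubeletConf(autosizing); err != nil {
		fmt.Println("handled:", err)
	}
}
```

Returning an error lets the reconciler surface a degraded condition for such objects instead of crash-looping the controller pod.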
#OCPBUGS-27422 | issue | 3 days ago | Invalid memory address or nil pointer dereference in Cloud Network Config Controller Verified |
Issue 15731260: Invalid memory address or nil pointer dereference in Cloud Network Config Controller Description: Description of problem: {code:none} Invalid memory address or nil pointer dereference in Cloud Network Config Controller {code} Version-Release number of selected component (if applicable): {code:none} 4.12{code} How reproducible: {code:none} sometimes{code} Steps to Reproduce: {code:none} 1. Happens by itself sometimes 2. 3. {code} Actual results: {code:none} Panic and pod restarts{code} Expected results: {code:none} Panics due to Invalid memory address or nil pointer dereference should not occur{code} Additional info: {code:none} E0118 07:54:18.703891 1 runtime.go:79] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference) goroutine 93 [running]: k8s.io/apimachinery/pkg/util/runtime.logPanic({0x203c8c0?, 0x3a27b20}) /go/src/github.com/openshift/cloud-network-config-controller/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:75 +0x99 k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0xc0003bd090?}) /go/src/github.com/openshift/cloud-network-config-controller/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:49 +0x75 panic({0x203c8c0, 0x3a27b20}) /usr/lib/golang/src/runtime/panic.go:884 +0x212 github.com/openshift/cloud-network-config-controller/pkg/cloudprovider.(*Azure).AssignPrivateIP(0xc0001ce700, {0xc000696540, 0x10, 0x10}, 0xc000818ec0) /go/src/github.com/openshift/cloud-network-config-controller/pkg/cloudprovider/azure.go:146 +0xcf0 github.com/openshift/cloud-network-config-controller/pkg/controller/cloudprivateipconfig.(*CloudPrivateIPConfigController).SyncHandler(0xc000986000, {0xc000896a90, 0xe}) /go/src/github.com/openshift/cloud-network-config-controller/pkg/controller/cloudprivateipconfig/cloudprivateipconfig_controller.go:327 +0x1013 
github.com/openshift/cloud-network-config-controller/pkg/controller.(*CloudNetworkConfigController).processNextWorkItem.func1(0xc000720d80, {0x1e640c0?, 0xc0003bd090?}) /go/src/github.com/openshift/cloud-network-config-controller/pkg/controller/controller.go:152 +0x11c github.com/openshift/cloud-network-config-controller/pkg/controller.(*CloudNetworkConfigController).processNextWorkItem(0xc000720d80) /go/src/github.com/openshift/cloud-network-config-controller/pkg/controller/controller.go:162 +0x46 github.com/openshift/cloud-network-config-controller/pkg/controller.(*CloudNetworkConfigController).runWorker(0xc000504ea0?) /go/src/github.com/openshift/cloud-network-config-controller/pkg/controller/controller.go:113 +0x25 k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x0?) /go/src/github.com/openshift/cloud-network-config-controller/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:157 +0x3e k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0x0?, {0x27b3220, 0xc000894480}, 0x1, 0xc0000aa540) /go/src/github.com/openshift/cloud-network-config-controller/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:158 +0xb6 k8s.io/apimachinery/pkg/util/wait.JitterUntil(0x0?, 0x3b9aca00, 0x0, 0x0?, 0x0?) /go/src/github.com/openshift/cloud-network-config-controller/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:135 +0x89 k8s.io/apimachinery/pkg/util/wait.Until(0x0?, 0x0?, 0x0?) 
/go/src/github.com/openshift/cloud-network-config-controller/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:92 +0x25 created by github.com/openshift/cloud-network-config-controller/pkg/controller.(*CloudNetworkConfigController).Run /go/src/github.com/openshift/cloud-network-config-controller/pkg/controller/controller.go:99 +0x3aa panic: runtime error: invalid memory address or nil pointer dereference [recovered] panic: runtime error: invalid memory address or nil pointer dereference [signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0x1a40b30] goroutine 93 [running]: k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0xc0003bd090?}) /go/src/github.com/openshift/cloud-network-config-controller/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:56 +0xd7 panic({0x203c8c0, 0x3a27b20}) /usr/lib/golang/src/runtime/panic.go:884 +0x212 github.com/openshift/cloud-network-config-controller/pkg/cloudprovider.(*Azure).AssignPrivateIP(0xc0001ce700, {0xc000696540, 0x10, 0x10}, 0xc000818ec0) /go/src/github.com/openshift/cloud-network-config-controller/pkg/cloudprovider/azure.go:146 +0xcf0 github.com/openshift/cloud-network-config-controller/pkg/controller/cloudprivateipconfig.(*CloudPrivateIPConfigController).SyncHandler(0xc000986000, {0xc000896a90, 0xe}) /go/src/github.com/openshift/cloud-network-config-controller/pkg/controller/cloudprivateipconfig/cloudprivateipconfig_controller.go:327 +0x1013 github.com/openshift/cloud-network-config-controller/pkg/controller.(*CloudNetworkConfigController).processNextWorkItem.func1(0xc000720d80, {0x1e640c0?, 0xc0003bd090?}) /go/src/github.com/openshift/cloud-network-config-controller/pkg/controller/controller.go:152 +0x11c github.com/openshift/cloud-network-config-controller/pkg/controller.(*CloudNetworkConfigController).processNextWorkItem(0xc000720d80) /go/src/github.com/openshift/cloud-network-config-controller/pkg/controller/controller.go:162 +0x46 
github.com/openshift/cloud-network-config-controller/pkg/controller.(*CloudNetworkConfigController).runWorker(0xc000504ea0?) /go/src/github.com/openshift/cloud-network-config-controller/pkg/controller/controller.go:113 +0x25 k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x0?) /go/src/github.com/openshift/cloud-network-config-controller/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:157 +0x3e k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0x0?, {0x27b3220, 0xc000894480}, 0x1, 0xc0000aa540) /go/src/github.com/openshift/cloud-network-config-controller/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:158 +0xb6 k8s.io/apimachinery/pkg/util/wait.JitterUntil(0x0?, 0x3b9aca00, 0x0, 0x0?, 0x0?) /go/src/github.com/openshift/cloud-network-config-controller/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:135 +0x89 k8s.io/apimachinery/pkg/util/wait.Until(0x0?, 0x0?, 0x0?) /go/src/github.com/openshift/cloud-network-config-controller/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:92 +0x25 created by github.com/openshift/cloud-network-config-controller/pkg/controller.(*CloudNetworkConfigController).Run /go/src/github.com/openshift/cloud-network-config-controller/pkg/controller/controller.go:99 +0x3aa{code} Status: Verified | |||
#OCPBUGS-24417 | issue | 2 weeks ago | Installer panic when passing non-existent OPENSHIFT_INSTALL_OS_IMAGE_OVERRIDE image New |
Issue 15657078: Installer panic when passing non-existent OPENSHIFT_INSTALL_OS_IMAGE_OVERRIDE image Description: Description of problem: I'm overriding the OS image; however, in this case the "rhcos-4.15" image doesn't exist in the cloud: {code:none} export OPENSHIFT_INSTALL_OS_IMAGE_OVERRIDE="rhcos-4.15" {code} {code:none} nerc-dev ❯ openstack image show rhcos-4.15 No Image found for rhcos-4.15 {code} The installer will fail with a stack trace {code:none} ERROR ERROR Error: Your query returned no results. Please change your search criteria and try again. ERROR ERROR with data.openstack_images_image_v2.base_image, ERROR on main.tf line 29, in data "openstack_images_image_v2" "base_image": ERROR 29: data "openstack_images_image_v2" "base_image" { ERROR panic: runtime error: invalid memory address or nil pointer dereference [signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x194e88c] goroutine 1 [running]: github.com/openshift/installer/pkg/asset.PersistToFile({0x229da6c0?, 0x2799e520?}, {0x7fff7c1d994f, 0xf}) /go/src/github.com/openshift/installer/pkg/asset/asset.go:57 +0xac github.com/openshift/installer/pkg/asset.(*fileWriterAdapter).PersistToFile(0x22971e40?, {0x7fff7c1d994f?, 0x2799e520?}) /go/src/github.com/openshift/installer/pkg/asset/filewriter.go:19 +0x31 main.runTargetCmd.func1({0x7fff7c1d994f, 0xf}) /go/src/github.com/openshift/installer/cmd/openshift-install/create.go:277 +0x24a main.runTargetCmd.func2(0x27830180?, {0xc000d92c30?, 0x3?, 0x3?}) /go/src/github.com/openshift/installer/cmd/openshift-install/create.go:302 +0xe7 github.com/spf13/cobra.(*Command).execute(0x27830180, {0xc000d92ba0, 0x3, 0x3}) /go/src/github.com/openshift/installer/vendor/github.com/spf13/cobra/command.go:944 +0x847 github.com/spf13/cobra.(*Command).ExecuteC(0xc000157200) /go/src/github.com/openshift/installer/vendor/github.com/spf13/cobra/command.go:1068 +0x3bd github.com/spf13/cobra.(*Command).Execute(...) 
/go/src/github.com/openshift/installer/vendor/github.com/spf13/cobra/command.go:992 main.installerMain() /go/src/github.com/openshift/installer/cmd/openshift-install/main.go:56 +0x2b0 main.main() /go/src/github.com/openshift/installer/cmd/openshift-install/main.go:33 +0xff {code} Version-Release number of selected component (if applicable): Seen with openshift-install-4.15 4.15.0-ec.2. This probably affects older versions too. How reproducible:{code:none} {code} Steps to Reproduce:{code:none} 1. 2. 3. {code} Actual results:{code:none} {code} Expected results:{code:none} {code} Additional info:{code:none} {code} Status: New In neither case does `create cluster` result in a panic, as of today. I am not overly concerned about older versions of Installer, because this has never been a supported option as far as I know. | |||
#OCPBUGS-27934 | issue | 3 months ago | panic in poller CLOSED |
Issue 15742717: panic in poller Description: This is a clone of issue OCPBUGS-27892. The following is the description of the original issue: --- Description of problem: {code:none} {code} Version-Release number of selected component (if applicable): {code:none} {code} How reproducible: {code:none} {code} Steps to Reproduce: {code:none} 1. 2. 3. {code} Actual results: {code:none} {code} Expected results: {code:none} {code} Additional info: {code:none} {code} | |||
#OCPBUGS-25364 | issue | 4 weeks ago | route-controller-manager pod panics creating cache New |
Issue 15677423: route-controller-manager pod panics creating cache Description: Description of problem: Observed in 4.12.19 ROSA cluster and raised as a ClusterOperatorDown alert for the openshift-controller-manager. Issue appears to have begun shortly after installation completed; the cluster was potentially never healthy. Upon investigating, it was found that all {{route-controller-manager}} pods were in a CrashLoopBackOff state. Each of their logs contained only the following: {code:java} $ oc logs route-controller-manager-7c6d8d8b66-nqxhk -n openshift-route-controller-manager -p I1213 20:30:25.500136 1 controller_manager.go:26] Starting controllers on 0.0.0.0:8443 (4.12.0-202305101515.p0.g9e74d17.assembly.stream-9e74d17) unexpected fault address 0xcc001427540 fatal error: fault [signal SIGSEGV: segmentation violation code=0x1 addr=0xcc001427540 pc=0x221fad4] goroutine 1 [running]: runtime.throw({0x2f287c5?, 0xc00091ed78?}) runtime/panic.go:1047 +0x5d fp=0xc00091ed60 sp=0xc00091ed30 pc=0x109821d runtime.sigpanic() runtime/signal_unix.go:842 +0x2c5 fp=0xc00091edb0 sp=0xc00091ed60 pc=0x10af465 k8s.io/apiserver/pkg/authentication/token/cache.newStripedCache(0x20, 0x3089620, 0xc00091ee78) k8s.io/apiserver@v0.25.2/pkg/authentication/token/cache/cache_striped.go:37 +0xf4 fp=0xc00091ee30 sp=0xc00091edb0 pc=0x221fad4 k8s.io/apiserver/pkg/authentication/token/cache.newWithClock({0x3298a60?, 0xc000226380}, 0x0, 0x45d964b800, 0x45d964b800, {0x32be670, 0x4756188}) k8s.io/apiserver@v0.25.2/pkg/authentication/token/cache/cached_token_authenticator.go:112 +0xd4 fp=0xc00091eea0 sp=0xc00091ee30 pc=0x221ff94 k8s.io/apiserver/pkg/authentication/token/cache.New(...) k8s.io/apiserver@v0.25.2/pkg/authentication/token/cache/cached_token_authenticator.go:91 github.com/openshift/route-controller-manager/pkg/cmd/controller/route.newRemoteAuthenticator({0x32a3f28, 0xc00043e600}, 0xc0005c4ea0, 0x30872f0?) 
github.com/openshift/route-controller-manager/pkg/cmd/controller/route/apiserver_authenticator.go:33 +0x1b9 fp=0xc00091f0f8 sp=0xc00091eea0 pc=0x28e8f79 github.com/openshift/route-controller-manager/pkg/cmd/controller/route.RunControllerServer({{{0x2f32887, 0xc}, {0x2f27140, 0x3}, {{0x2f62ce1, 0x25}, {0x2f62d06, 0x25}}, {0x2f7010f, 0x2b}, ...}, ...}, ...) github.com/openshift/route-controller-manager/pkg/cmd/controller/route/standalone_apiserver.go:45 +0x18d fp=0xc00091f5f8 sp=0xc00091f0f8 pc=0x28ebfcd github.com/openshift/route-controller-manager/pkg/cmd/route-controller-manager.RunRouteControllerManager(0xc00052ed80, 0x4?, {0x32b8bd0, 0xc00043c1c0}) github.com/openshift/route-controller-manager/pkg/cmd/route-controller-manager/controller_manager.go:28 +0x24b fp=0xc00091f8a0 sp=0xc00091f5f8 pc=0x28f6a4b github.com/openshift/route-controller-manager/pkg/cmd/route-controller-manager.(*RouteControllerManager).StartControllerManager(0xc0008dc940, {0x32b8bd0, 0xc00043c1c0}) github.com/openshift/route-controller-manager/pkg/cmd/route-controller-manager/cmd.go:119 +0x3f2 fp=0xc00091fa00 sp=0xc00091f8a0 pc=0x28f6752 github.com/openshift/route-controller-manager/pkg/cmd/route-controller-manager.NewRouteControllerManagerCommand.func1(0xc0008bd900?, {0x2f279cd?, 0x2?, 0x2?}) github.com/openshift/route-controller-manager/pkg/cmd/route-controller-manager/cmd.go:49 +0xc5 fp=0xc00091fb20 sp=0xc00091fa00 pc=0x28f5f65 github.com/spf13/cobra.(*Command).execute(0xc0008bd900, {0xc0008dcce0, 0x2, 0x2}) github.com/spf13/cobra@v1.4.0/command.go:860 +0x663 fp=0xc00091fbf8 sp=0xc00091fb20 pc=0x14a7483 github.com/spf13/cobra.(*Command).ExecuteC(0xc0008bc000) github.com/spf13/cobra@v1.4.0/command.go:974 +0x3bd fp=0xc00091fcb0 sp=0xc00091fbf8 pc=0x14a7b9d github.com/spf13/cobra.(*Command).Execute(...) 
github.com/spf13/cobra@v1.4.0/command.go:902 k8s.io/component-base/cli.run(0xc0008bc000) k8s.io/component-base@v0.25.2/cli/run.go:146 +0x317 fp=0xc00091fd70 sp=0xc00091fcb0 pc=0x24c8917 k8s.io/component-base/cli.Run(0x32b8bd0?) k8s.io/component-base@v0.25.2/cli/run.go:46 +0x1d fp=0xc00091fdf0 sp=0xc00091fd70 pc=0x24c84fd main.main() github.com/openshift/route-controller-manager/cmd/route-controller-manager/main.go:28 +0x17f fp=0xc00091ff80 sp=0xc00091fdf0 pc=0x28f735f runtime.main() runtime/proc.go:250 +0x212 fp=0xc00091ffe0 sp=0xc00091ff80 pc=0x109ae52 runtime.goexit() runtime/asm_amd64.s:1594 +0x1 fp=0xc00091ffe8 sp=0xc00091ffe0 pc=0x10cdb41 goroutine 2 [force gc (idle)]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) runtime/proc.go:363 +0xd6 fp=0xc000084fb0 sp=0xc000084f90 pc=0x109b216 runtime.goparkunlock(...) runtime/proc.go:369 runtime.forcegchelper() runtime/proc.go:302 +0xad fp=0xc000084fe0 sp=0xc000084fb0 pc=0x109b0ad runtime.goexit() runtime/asm_amd64.s:1594 +0x1 fp=0xc000084fe8 sp=0xc000084fe0 pc=0x10cdb41 created by runtime.init.6 runtime/proc.go:290 +0x25 goroutine 3 [GC sweep wait]: runtime.gopark(0x1?, 0x0?, 0x0?, 0x0?, 0x0?) runtime/proc.go:363 +0xd6 fp=0xc000085790 sp=0xc000085770 pc=0x109b216 runtime.goparkunlock(...) runtime/proc.go:369 runtime.bgsweep(0x0?) runtime/mgcsweep.go:297 +0xd7 fp=0xc0000857c8 sp=0xc000085790 pc=0x1083df7 runtime.gcenable.func1() runtime/mgc.go:178 +0x26 fp=0xc0000857e0 sp=0xc0000857c8 pc=0x1078986 runtime.goexit() runtime/asm_amd64.s:1594 +0x1 fp=0xc0000857e8 sp=0xc0000857e0 pc=0x10cdb41 created by runtime.gcenable runtime/mgc.go:178 +0x6b goroutine 4 [GC scavenge wait]: runtime.gopark(0xc0000b2000?, 0x328cad0?, 0x0?, 0x0?, 0x0?) runtime/proc.go:363 +0xd6 fp=0xc000085f70 sp=0xc000085f50 pc=0x109b216 runtime.goparkunlock(...) runtime/proc.go:369 runtime.(*scavengerState).park(0x47246a0) runtime/mgcscavenge.go:389 +0x53 fp=0xc000085fa0 sp=0xc000085f70 pc=0x1081dd3 runtime.bgscavenge(0x0?) 
runtime/mgcscavenge.go:622 +0x65 fp=0xc000085fc8 sp=0xc000085fa0 pc=0x10823e5 runtime.gcenable.func2() runtime/mgc.go:179 +0x26 fp=0xc000085fe0 sp=0xc000085fc8 pc=0x1078926 runtime.goexit() runtime/asm_amd64.s:1594 +0x1 fp=0xc000085fe8 sp=0xc000085fe0 pc=0x10cdb41 created by runtime.gcenable runtime/mgc.go:179 +0xaa goroutine 5 [finalizer wait]: runtime.gopark(0x4725820?, 0xc000009860?, 0x0?, 0x0?, 0xc000084770?) runtime/proc.go:363 +0xd6 fp=0xc000084628 sp=0xc000084608 pc=0x109b216 runtime.goparkunlock(...) runtime/proc.go:369 runtime.runfinq() runtime/mfinal.go:180 +0x10f fp=0xc0000847e0 sp=0xc000084628 pc=0x1077a0f runtime.goexit() runtime/asm_amd64.s:1594 +0x1 fp=0xc0000847e8 sp=0xc0000847e0 pc=0x10cdb41 created by runtime.createfing runtime/mfinal.go:157 +0x45 goroutine 7 [GC worker (idle)]: runtime.gopark(0x1061c3d?, 0xc0003af980?, 0xa0?, 0x67?, 0xc0000867a8?) runtime/proc.go:363 +0xd6 fp=0xc000086750 sp=0xc000086730 pc=0x109b216 runtime.gcBgMarkWorker() runtime/mgc.go:1235 +0xf1 fp=0xc0000867e0 sp=0xc000086750 pc=0x107aad1 runtime.goexit() runtime/asm_amd64.s:1594 +0x1 fp=0xc0000867e8 sp=0xc0000867e0 pc=0x10cdb41 created by runtime.gcBgMarkStartWorkers runtime/mgc.go:1159 +0x25 goroutine 8 [GC worker (idle)]: runtime.gopark(0xd92381ca853?, 0x0?, 0x0?, 0x0?, 0x0?) runtime/proc.go:363 +0xd6 fp=0xc000086f50 sp=0xc000086f30 pc=0x109b216 runtime.gcBgMarkWorker() runtime/mgc.go:1235 +0xf1 fp=0xc000086fe0 sp=0xc000086f50 pc=0x107aad1 runtime.goexit() runtime/asm_amd64.s:1594 +0x1 fp=0xc000086fe8 sp=0xc000086fe0 pc=0x10cdb41 created by runtime.gcBgMarkStartWorkers runtime/mgc.go:1159 +0x25 goroutine 22 [GC worker (idle)]: runtime.gopark(0xd92381ca42e?, 0x0?, 0x0?, 0x0?, 0x0?) 
runtime/proc.go:363 +0xd6 fp=0xc000080750 sp=0xc000080730 pc=0x109b216 runtime.gcBgMarkWorker() runtime/mgc.go:1235 +0xf1 fp=0xc0000807e0 sp=0xc000080750 pc=0x107aad1 runtime.goexit() runtime/asm_amd64.s:1594 +0x1 fp=0xc0000807e8 sp=0xc0000807e0 pc=0x10cdb41 created by runtime.gcBgMarkStartWorkers runtime/mgc.go:1159 +0x25 goroutine 23 [GC worker (idle)]: runtime.gopark(0xd9238f02c33?, 0x0?, 0x0?, 0x0?, 0x0?) runtime/proc.go:363 +0xd6 fp=0xc000080f50 sp=0xc000080f30 pc=0x109b216 runtime.gcBgMarkWorker() runtime/mgc.go:1235 +0xf1 fp=0xc000080fe0 sp=0xc000080f50 pc=0x107aad1 runtime.goexit() runtime/asm_amd64.s:1594 +0x1 fp=0xc000080fe8 sp=0xc000080fe0 pc=0x10cdb41 created by runtime.gcBgMarkStartWorkers runtime/mgc.go:1159 +0x25 goroutine 50 [GC worker (idle)]: runtime.gopark(0xd9238f09c5d?, 0x0?, 0x0?, 0x0?, 0x0?) runtime/proc.go:363 +0xd6 fp=0xc000586750 sp=0xc000586730 pc=0x109b216 runtime.gcBgMarkWorker() runtime/mgc.go:1235 +0xf1 fp=0xc0005867e0 sp=0xc000586750 pc=0x107aad1 runtime.goexit() runtime/asm_amd64.s:1594 +0x1 fp=0xc0005867e8 sp=0xc0005867e0 pc=0x10cdb41 created by runtime.gcBgMarkStartWorkers runtime/mgc.go:1159 +0x25 goroutine 9 [GC worker (idle)]: runtime.gopark(0xd9238f03015?, 0x0?, 0x0?, 0x0?, 0x0?) runtime/proc.go:363 +0xd6 fp=0xc000087750 sp=0xc000087730 pc=0x109b216 runtime.gcBgMarkWorker() runtime/mgc.go:1235 +0xf1 fp=0xc0000877e0 sp=0xc000087750 pc=0x107aad1 runtime.goexit() runtime/asm_amd64.s:1594 +0x1 fp=0xc0000877e8 sp=0xc0000877e0 pc=0x10cdb41 created by runtime.gcBgMarkStartWorkers runtime/mgc.go:1159 +0x25 goroutine 10 [GC worker (idle)]: runtime.gopark(0xd9238f0d9c7?, 0xc00058a000?, 0x18?, 0x14?, 0x0?) 
runtime/proc.go:363 +0xd6 fp=0xc000087f50 sp=0xc000087f30 pc=0x109b216 runtime.gcBgMarkWorker() runtime/mgc.go:1235 +0xf1 fp=0xc000087fe0 sp=0xc000087f50 pc=0x107aad1 runtime.goexit() runtime/asm_amd64.s:1594 +0x1 fp=0xc000087fe8 sp=0xc000087fe0 pc=0x10cdb41 created by runtime.gcBgMarkStartWorkers runtime/mgc.go:1159 +0x25 goroutine 11 [GC worker (idle)]: runtime.gopark(0xd9238f0392d?, 0x0?, 0x0?, 0x0?, 0x0?) runtime/proc.go:363 +0xd6 fp=0xc000582750 sp=0xc000582730 pc=0x109b216 runtime.gcBgMarkWorker() runtime/mgc.go:1235 +0xf1 fp=0xc0005827e0 sp=0xc000582750 pc=0x107aad1 runtime.goexit() runtime/asm_amd64.s:1594 +0x1 fp=0xc0005827e8 sp=0xc0005827e0 pc=0x10cdb41 created by runtime.gcBgMarkStartWorkers runtime/mgc.go:1159 +0x25 goroutine 29 [select, locked to thread]: runtime.gopark(0xc0005897a8?, 0x2?, 0x0?, 0x0?, 0xc0005897a4?) runtime/proc.go:363 +0xd6 fp=0xc000589618 sp=0xc0005895f8 pc=0x109b216 runtime.selectgo(0xc0005897a8, 0xc0005897a0, 0x0?, 0x0, 0x1?, 0x1) runtime/select.go:328 +0x7bc fp=0xc000589758 sp=0xc000589618 pc=0x10ab53c runtime.ensureSigM.func1() runtime/signal_unix.go:991 +0x1b0 fp=0xc0005897e0 sp=0xc000589758 pc=0x10af9b0 runtime.goexit() runtime/asm_amd64.s:1594 +0x1 fp=0xc0005897e8 sp=0xc0005897e0 pc=0x10cdb41 created by runtime.ensureSigM runtime/signal_unix.go:974 +0xbd goroutine 30 [syscall]: runtime.notetsleepg(0x0?, 0x0?) runtime/lock_futex.go:236 +0x34 fp=0xc000588fa0 sp=0xc000588f68 pc=0x10687b4 os/signal.signal_recv() runtime/sigqueue.go:152 +0x2f fp=0xc000588fc0 sp=0xc000588fa0 pc=0x10ca0ef os/signal.loop() os/signal/signal_unix.go:23 +0x19 fp=0xc000588fe0 sp=0xc000588fc0 pc=0x144cd59 runtime.goexit() runtime/asm_amd64.s:1594 +0x1 fp=0xc000588fe8 sp=0xc000588fe0 pc=0x10cdb41 created by os/signal.Notify.func1.1 os/signal/signal.go:151 +0x2a goroutine 31 [chan receive]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) 
runtime/proc.go:363 +0xd6 fp=0xc000582f00 sp=0xc000582ee0 pc=0x109b216 runtime.chanrecv(0xc0008e7680, 0x0, 0x1) runtime/chan.go:583 +0x49b fp=0xc000582f90 sp=0xc000582f00 pc=0x1062e9b runtime.chanrecv1(0x0?, 0x0?) runtime/chan.go:442 +0x18 fp=0xc000582fb8 sp=0xc000582f90 pc=0x1062998 k8s.io/apiserver/pkg/server.SetupSignalContext.func1() k8s.io/apiserver@v0.25.2/pkg/server/signal.go:48 +0x2b fp=0xc000582fe0 sp=0xc000582fb8 pc=0x24c82eb runtime.goexit() runtime/asm_amd64.s:1594 +0x1 fp=0xc000582fe8 sp=0xc000582fe0 pc=0x10cdb41 created by k8s.io/apiserver/pkg/server.SetupSignalContext k8s.io/apiserver@v0.25.2/pkg/server/signal.go:47 +0xe5 goroutine 32 [select]: runtime.gopark(0xc0005837a0?, 0x2?, 0x0?, 0x0?, 0xc000583764?) runtime/proc.go:363 +0xd6 fp=0xc0005835e0 sp=0xc0005835c0 pc=0x109b216 runtime.selectgo(0xc0005837a0, 0xc000583760, 0x0?, 0x0, 0x0?, 0x1) runtime/select.go:328 +0x7bc fp=0xc000583720 sp=0xc0005835e0 pc=0x10ab53c k8s.io/klog/v2.(*flushDaemon).run.func1() k8s.io/klog/v2@v2.80.1/klog.go:1135 +0x11e fp=0xc0005837e0 sp=0xc000583720 pc=0x118f27e runtime.goexit() runtime/asm_amd64.s:1594 +0x1 fp=0xc0005837e8 sp=0xc0005837e0 pc=0x10cdb41 created by k8s.io/klog/v2.(*flushDaemon).run k8s.io/klog/v2@v2.80.1/klog.go:1131 +0x17b {code} Attempting to reproduce this in other workloads by deploying images which also invoke {{cache.New}} has failed: so far no other pod on the cluster is known to be crashlooping. SRE attempted to restart the nodes running the affected pods. After booting, the {{route-controller-manager}} pod was able to run for a short time, but eventually re-entered a CrashLoopBackoff state. Logs did not change after rebooting. Version-Release number of selected component (if applicable): {code:none} 4.12.19 {code} How reproducible: {code:none} Unsure: only observed on a single cluster, but very reproducible on that cluster {code} Steps to Reproduce: {code:none} 1. Install cluster 2. 
Observe route-controller-manager pods crashlooping {code} Actual results: {code:none} $ oc get po -n openshift-route-controller-manager NAME READY STATUS RESTARTS AGE route-controller-manager-7c6d8d8b66-nqxhk 0/1 CrashLoopBackOff 26 (4m13s ago) 113m route-controller-manager-7c6d8d8b66-qspnx 0/1 CrashLoopBackOff 26 (4m18s ago) 113m route-controller-manager-7c6d8d8b66-twtm8 0/1 CrashLoopBackOff 26 (4m7s ago) 113m {code} Expected results: {code:none} All route-controller-manager pods are running {code} Status: New | |||
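The startup failure in the registry issue above reduces to a region allow-list that predates the ca-west-1 launch: the vendored storage driver rejects any AWS Region it does not know about, and the registry turns that into a panic in handlers.NewApp. A minimal Go sketch of this failure mode (the function, the region list, and the names here are illustrative assumptions, not the registry's actual code):

```go
package main

import "fmt"

// knownRegions stands in for a static region list compiled into the binary.
// Regions launched after that list was generated (such as ca-west-1) are
// absent, so validation rejects them even though they are real AWS Regions.
var knownRegions = map[string]bool{
	"us-east-1":    true,
	"ca-central-1": true,
	// ca-west-1 is missing from the list vendored into the affected build.
}

// validateRegion mimics the check whose failure the registry escalates to a panic.
func validateRegion(region string) error {
	if !knownRegions[region] {
		return fmt.Errorf("invalid region provided: %s", region)
	}
	return nil
}

func main() {
	// The registry aborts startup on this error instead of serving traffic.
	fmt.Println(validateRegion("ca-west-1")) // invalid region provided: ca-west-1
}
```

A fix along these lines would typically come from rebuilding against an SDK or driver revision that knows the new Region, or from letting the endpoint be configured instead of validated against a baked-in list.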
#OCPBUGS-32702 | issue | 5 days ago | operator panics in hosted cluster with OVN when obfuscation is enabled ON_QA |
Issue 15960547: operator panics in hosted cluster with OVN when obfuscation is enabled Description: Description of problem: {code:none} The operator panics in HyperShift hosted cluster with OVN and with enabled networking obfuscation: {code} {noformat} 1 runtime.go:79] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference) goroutine 858 [running]: k8s.io/apimachinery/pkg/util/runtime.logPanic({0x26985e0?, 0x454d700}) /go/src/github.com/openshift/insights-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:75 +0x99 k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0xc0010d67e0?}) /go/src/github.com/openshift/insights-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:49 +0x75 panic({0x26985e0, 0x454d700}) /usr/lib/golang/src/runtime/panic.go:884 +0x213 github.com/openshift/insights-operator/pkg/anonymization.getNetworksFromClusterNetworksConfig(...) /go/src/github.com/openshift/insights-operator/pkg/anonymization/anonymizer.go:292 github.com/openshift/insights-operator/pkg/anonymization.getNetworksForAnonymizer(0xc000556700, 0xc001154ea0, {0x0, 0x0, 0x0?}) /go/src/github.com/openshift/insights-operator/pkg/anonymization/anonymizer.go:253 +0x202 github.com/openshift/insights-operator/pkg/anonymization.(*Anonymizer).readNetworkConfigs(0xc0005be640) /go/src/github.com/openshift/insights-operator/pkg/anonymization/anonymizer.go:180 +0x245 github.com/openshift/insights-operator/pkg/anonymization.(*Anonymizer).AnonymizeMemoryRecord.func1() /go/src/github.com/openshift/insights-operator/pkg/anonymization/anonymizer.go:354 +0x25 sync.(*Once).doSlow(0xc0010d6c70?, 0x21a9006?) /usr/lib/golang/src/sync/once.go:74 +0xc2 sync.(*Once).Do(...) 
/usr/lib/golang/src/sync/once.go:65 github.com/openshift/insights-operator/pkg/anonymization.(*Anonymizer).AnonymizeMemoryRecord(0xc0005be640, 0xc000cf0dc0) /go/src/github.com/openshift/insights-operator/pkg/anonymization/anonymizer.go:353 +0x78 github.com/openshift/insights-operator/pkg/recorder.(*Recorder).Record(0xc00075c4b0, {{0x2add75b, 0xc}, {0x0, 0x0, 0x0}, {0x2f38d28, 0xc0009c99c0}}) /go/src/github.com/openshift/insights-operator/pkg/recorder/recorder.go:87 +0x49f github.com/openshift/insights-operator/pkg/gather.recordGatheringFunctionResult({0x2f255c0, 0xc00075c4b0}, 0xc0010d7260, {0x2adf900, 0xd}) /go/src/github.com/openshift/insights-operator/pkg/gather/gather.go:157 +0xb9c github.com/openshift/insights-operator/pkg/gather.collectAndRecordGatherer({0x2f50058?, 0xc001240c90?}, {0x2f30880?, 0xc000994240}, {0x2f255c0, 0xc00075c4b0}, {0x0?, 0x8dcb80?, 0xc000a673a2?}) /go/src/github.com/openshift/insights-operator/pkg/gather/gather.go:113 +0x296 github.com/openshift/insights-operator/pkg/gather.CollectAndRecordGatherer({0x2f50058, 0xc001240c90}, {0x2f30880, 0xc000994240?}, {0x2f255c0, 0xc00075c4b0}, {0x0, 0x0, 0x0}) /go/src/github.com/openshift/insights-operator/pkg/gather/gather.go:89 +0xe5 github.com/openshift/insights-operator/pkg/controller/periodic.(*Controller).Gather.func2(0xc000a678a0, {0x2f50058, 0xc001240c90}, 0xc000796b60, 0x26f0460?) 
/go/src/github.com/openshift/insights-operator/pkg/controller/periodic/periodic.go:206 +0x1a8 github.com/openshift/insights-operator/pkg/controller/periodic.(*Controller).Gather(0xc000796b60) /go/src/github.com/openshift/insights-operator/pkg/controller/periodic/periodic.go:222 +0x450 github.com/openshift/insights-operator/pkg/controller/periodic.(*Controller).periodicTrigger(0xc000796b60, 0xc000236a80) /go/src/github.com/openshift/insights-operator/pkg/controller/periodic/periodic.go:265 +0x2c5 github.com/openshift/insights-operator/pkg/controller/periodic.(*Controller).Run.func1() /go/src/github.com/openshift/insights-operator/pkg/controller/periodic/periodic.go:161 +0x25 k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x30?) /go/src/github.com/openshift/insights-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:226 +0x3e k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc00007d7c0?, {0x2f282a0, 0xc0012cd800}, 0x1, 0xc000236a80) /go/src/github.com/openshift/insights-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:227 +0xb6 k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc001381fb0?, 0x3b9aca00, 0x0, 0x0?, 0x449705?) /go/src/github.com/openshift/insights-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:204 +0x89 k8s.io/apimachinery/pkg/util/wait.Until(0xabfaca?, 0x88d6e6?, 0xc00078a360?) /go/src/github.com/openshift/insights-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:161 +0x25 created by github.com/openshift/insights-operator/pkg/controller/periodic.(*Controller).Run /go/src/github.com/openshift/insights-operator/pkg/controller/periodic/periodic.go:161 +0x1ea {noformat} Version-Release number of selected component (if applicable): {code:none} {code} How reproducible: {code:none} Enable networking obfuscation for the Insights Operator and wait for gathering to happen in the operator. You will see the above stacktrace. {code} Steps to Reproduce: {code:none} 1. Create a HyperShift hosted cluster with OVN 2. 
Enable networking obfuscation for the Insights Operator 3. Wait for data gathering to happen in the operator {code} Actual results: {code:none} operator panics{code} Expected results: {code:none} there's no panic{code} Additional info: {code:none} {code} Status: Verified | |||
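The nil pointer dereference in getNetworksFromClusterNetworksConfig above suggests the anonymizer walks a cluster network configuration object that can be absent on a HyperShift hosted cluster. A hedged Go sketch of that pattern and of the guard that prevents the panic (the types and names are simplified stand-ins, not the operator's real API types):

```go
package main

import "fmt"

// clusterNetworkEntry and networkConfig are illustrative stand-ins for the
// network configuration the anonymizer reads when obfuscation is enabled.
type clusterNetworkEntry struct{ CIDR string }

type networkConfig struct{ ClusterNetworks []clusterNetworkEntry }

// networksFor collects the cluster network CIDRs. The nil guard is the step
// the affected build appears to be missing: on a hosted cluster the config
// object can be nil, and iterating it without the check panics.
func networksFor(cfg *networkConfig) ([]string, error) {
	if cfg == nil {
		return nil, fmt.Errorf("cluster network config not available")
	}
	nets := make([]string, 0, len(cfg.ClusterNetworks))
	for _, e := range cfg.ClusterNetworks {
		nets = append(nets, e.CIDR)
	}
	return nets, nil
}

func main() {
	_, err := networksFor(nil)
	fmt.Println(err) // an error instead of a nil-pointer panic
}
```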
#OCPBUGS-13207 | issue | 10 months ago | Kernel panic after rebuilding the initrd in OS layer (waiting on Dracut) CLOSED |
Issue 15245137: Kernel panic after rebuilding the initrd in OS layer (waiting on Dracut) Description: The high-level objective we are trying to achieve is to update kernel modules for storage and network with OS layering. We are able to install a driver RPM as an OS layer, but *rebuilding the initrd fails with a kernel panic when the OS is booted.* To isolate the issue, we made a CoreOS layer that does nothing but rebuild the initrd, without adding any kernel module, and we still observed a kernel panic after reboot. *Detailed steps* *--------------------* (1) Created a Containerfile. Here's its content: {code:java} # oc adm release info --image-for rhel-coreos-8 4.12.14 FROM quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:90625d9f3713874b50b91b59b61d3a9d7c39c61ef38da8147c5d9203edfe50a9 RUN KERNEL=$(rpm -q kernel | cut -c 8-); dracut --reproducible -v --add 'ostree' -f --no-hostonly --omit-drivers 'nouveau' --omit 'nfs' --add 'iscsi' --add 'ifcfg' --add 'fips' --omit 'network-legacy' /lib/modules/$KERNEL/initramfs.img $KERNEL {code} (2) Built the image using the Containerfile. We got some warning/error messages while building; the build log is attached. (3) Created a MachineConfig: {quote}apiVersion: machineconfiguration.openshift.io/v1 kind: MachineConfig metadata: labels: machineconfiguration.openshift.io/role: worker name: initrd-trials spec: osImageURL: quay.io/pkoppa0/initrd_trials {quote} (4) Kernel panic was seen after reboot. Screenshot attached. Status: CLOSED Comment 22226781 by Paniraja Koppa at 2023-05-08T06:19:11.174+0000 Kernel Panic messages ------------------------------ [ 5.200823] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b [ 5.264065] panic+0xe?/0xZac [ 5.273079] [ 5.515603] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b Comment 22231239 by Joseph Marrero Corchado at 2023-05-08T17:33:33.580+0000
#OCPBUGS-32732 | issue | 11 days ago | inactivityTimeoutSeconds parameter intermittently does not work during oauthaccesstoken creation New |
Issue 15960888: inactivityTimeoutSeconds parameter intermittently does not work during oauthaccesstoken creation Description: *Description of problem:* The {color:#de350b}inactivityTimeoutSeconds{color} parameter intermittently does not work during oauthaccesstoken creation, which results in the parameter being missing from the token. *Version-Release number of selected component (if applicable):* 4.16.0-0.nightly-2024-04-22-023835 *How reproducible:* Intermittent *Steps to Reproduce:* *1.* Update the OAuth cluster configuration to {color:#de350b}accessTokenInactivityTimeout: 300{color} seconds and wait for the cluster to restart. *2.* Log in with a normal user and get a user token. *3.* Check if the token includes {color:#de350b}inactivityTimeoutSeconds: 300{color}. The issue is observed intermittently, on 1 or 2 out of 10 token creations. Further commands and logs are under the Additional Info part. *Actual results:* The {color:#de350b}inactivityTimeoutSeconds: 300{color} parameter is missing from the created user token after the timeout is enabled. *Expected results:* The user token should include the {color:#de350b}inactivityTimeoutSeconds: 300{color} parameter.
*Additional Info:* Observed {color:#de350b}"APIServer panic'd: net/http: abort Handler"{color} error in audit failure logs during the same timestamps. Audit logs: [attached|https://gcsweb-qe-private-deck-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/qe-private-deck/logs/periodic-ci-openshift-openshift-tests-private-release-4.16-multi-nightly-aws-ipi-disc-priv-tp-amd-f9-destructive/1777161570032291840/artifacts/aws-ipi-disc-priv-tp-amd-f9-destructive/gather-must-gather/artifacts/must-gather.tar] Must-gather logs: [attached|https://gcsweb-qe-private-deck-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/qe-private-deck/logs/periodic-ci-openshift-openshift-tests-private-release-4.16-multi-nightly-aws-ipi-disc-priv-tp-amd-f9-destructive/1777161570032291840/artifacts/aws-ipi-disc-priv-tp-amd-f9-destructive/gather-must-gather/artifacts/must-gather.tar] CI failure logs: [Attached|https://reportportal-openshift.apps.ocp-c1.prod.psi.redhat.com/ui/#prow/launches/1260/538576/65717790/65718204/log?item0Params=filter.eq.hasStats%3Dtrue%26filter.eq.hasChildren%3Dfalse%26filter.in.type%3DSTEP%26launchesLimit%3D500%26isLatest%3Dfalse%26filter.in.status%3DFAILED%252CINTERRUPTED%26page.sort%3DstartTime%252CASC%26filter.cnt.name%3DAuthentication%26page.page%3D1%26filter.%2521cnt.issueComment%3DAnalyzedDone%26filter.btw.startTime%3D-10080%253B1440%253B%252B0800%26filter.in.issueType%3Dti_s4scyws6guht%C2%A0] Token creation logs do not have {color:#de350b}inactivityTimeoutSeconds{color}: {code:java} # grep -hir 'sha256~YW0..<snip>..AnZo3jSbM' . 
| jq { "kind": "Event", "apiVersion": "audit.k8s.io/v1", "level": "RequestResponse", "auditID": "99b8a965-f9d1-474c-a55d-d383d7229131", "stage": "ResponseComplete", "requestURI": "/apis/oauth.openshift.io/v1/oauthaccesstokens", "verb": "create", "user": { "username": "system:serviceaccount:openshift-authentication:oauth-openshift", "groups": [ "system:serviceaccounts", "system:serviceaccounts:openshift-authentication", "system:authenticated" ], "extra": { "authentication.kubernetes.io/node-name": [ "ip-10-0-63-74.ec2.internal" ], "authentication.kubernetes.io/node-uid": [ "b30ae22f-a47d-4e1c-90e5-e6a752e9152a" ], "authentication.kubernetes.io/pod-name": [ "oauth-openshift-5dfdb8498b-2mwnp" ], "authentication.kubernetes.io/pod-uid": [ "7ec7803a-7172-4fff-9a47-f634b9d638ab" ] } }, "sourceIPs": [ "10.0.63.74", "10.130.0.2" ], "userAgent": "oauth-server/v0.0.0 (linux/amd64) kubernetes/$Format", "objectRef": { "resource": "oauthaccesstokens", "name": "sha256~YW0..<snip>..AnZo3jSbM", "apiGroup": "oauth.openshift.io", "apiVersion": "v1" }, "responseStatus": { "metadata": {}, "code": 201 }, "requestObject": { "kind": "OAuthAccessToken", "apiVersion": "oauth.openshift.io/v1", "metadata": { "name": "sha256~YW0..<snip>..AnZo3jSbM", "creationTimestamp": null }, "clientName": "openshift-challenging-client", "expiresIn": 86400, "scopes": [ "user:full" ], "redirectURI": "https://oauth-openshift.apps.ci-op-2r89prs7-68d86.qe.devcluster.openshift.com/oauth/token/implicit", "userName": "testuser-1", "userUID": "496de345-40c3-4f22-afcb-223c8ea9cfe0", "authorizeToken": "sha256~d7LPmIBv-BhG6D76PDTYDY3LF76pFcJUzv29emH3F3A" }, "responseObject": { "kind": "OAuthAccessToken", "apiVersion": "oauth.openshift.io/v1", "metadata": { "name": "sha256~YW0..<snip>..AnZo3jSbM", "uid": "484cc097-0434-4d4f-a356-3c2bb0902116", "resourceVersion": "739908", "creationTimestamp": "2024-04-08T16:33:16Z" }, "clientName": "openshift-challenging-client", "expiresIn": 86400, "scopes": [ "user:full" ], 
"redirectURI": "https://oauth-openshift.apps.ci-op-2r89prs7-68d86.qe.devcluster.openshift.com/oauth/token/implicit", "userName": "testuser-1", "userUID": "496de345-40c3-4f22-afcb-223c8ea9cfe0", "authorizeToken": "sha256~d7LPmIBv-BhG6D76PDTYDY3LF76pFcJUzv29emH3F3A" }, "requestReceivedTimestamp": "2024-04-08T16:33:16.973025Z", "stageTimestamp": "2024-04-08T16:33:16.985994Z", "annotations": { "authorization.k8s.io/decision": "allow", "authorization.k8s.io/reason": "RBAC: allowed by ClusterRoleBinding \"system:openshift:openshift-authentication\" of ClusterRole \"cluster-admin\" to ServiceAccount \"oauth-openshift/openshift-authentication\"" } } {code} User useroauthaccesstokens Apiserver Panic logs during that timestamp: {code:java} # grep -hir '2024-04-08T16:33' . | egrep -i 'fail|error' | grep oauthaccesstokens | jq { "kind": "Event", "apiVersion": "audit.k8s.io/v1", "level": "Metadata", "auditID": "05bfb2a0-41fa-470a-a052-f5d6d1d537ba", "stage": "Panic", "requestURI": "/apis/oauth.openshift.io/v1/useroauthaccesstokens?allowWatchBookmarks=true&resourceVersion=705601&timeout=8m19s&timeoutSeconds=499&watch=true", "verb": "watch", "user": { "username": "system:kube-controller-manager", "groups": [ "system:authenticated" ] }, "sourceIPs": [ "10.0.69.141" ], "userAgent": "kube-controller-manager/v1.29.3+e994e5d (linux/amd64) kubernetes/9ebebe1/kube-controller-manager", "objectRef": { "resource": "useroauthaccesstokens", "apiGroup": "oauth.openshift.io", "apiVersion": "v1" }, "responseStatus": { "metadata": {}, "status": "Failure", "message": "APIServer panic'd: net/http: abort Handler", "reason": "InternalError", "code": 500 }, "requestReceivedTimestamp": "2024-04-08T16:30:30.435578Z", "stageTimestamp": "2024-04-08T16:33:14.327166Z", "annotations": { "authorization.k8s.io/decision": "allow", "authorization.k8s.io/reason": "RBAC: allowed by ClusterRoleBinding \"system:kube-controller-manager\" of ClusterRole \"system:kube-controller-manager\" to User 
\"system:kube-controller-manager\"" } } { "kind": "Event", "apiVersion": "audit.k8s.io/v1", "level": "Metadata", "auditID": "6b9a0d14-3605-4f70-9e3f-b87d89428bc2", "stage": "Panic", "requestURI": "/apis/oauth.openshift.io/v1/useroauthaccesstokens?allowWatchBookmarks=true&resourceVersion=739837&timeout=8m59s&timeoutSeconds=539&watch=true", "verb": "watch", "user": { "username": "system:kube-controller-manager", "groups": [ "system:authenticated" ] }, "sourceIPs": [ "10.0.69.141" ], "userAgent": "kube-controller-manager/v1.29.3+e994e5d (linux/amd64) kubernetes/9ebebe1/kube-controller-manager", "objectRef": { "resource": "useroauthaccesstokens", "apiGroup": "oauth.openshift.io", "apiVersion": "v1" }, "responseStatus": { "metadata": {}, "status": "Failure", "message": "APIServer panic'd: net/http: abort Handler", "reason": "InternalError", "code": 500 }, "requestReceivedTimestamp": "2024-04-08T16:33:15.661370Z", "stageTimestamp": "2024-04-08T16:34:26.580751Z", "annotations": { "authorization.k8s.io/decision": "allow", "authorization.k8s.io/reason": "RBAC: allowed by ClusterRoleBinding \"system:kube-controller-manager\" of ClusterRole \"system:kube-controller-manager\" to User \"system:kube-controller-manager\"" } } {code} *Executed commands:* {code:java} [16:31:57] INFO> Shell Commands: oc new-project 59bni --kubeconfig=/alabama/workdir/aws-ipi-disc-priv-tp-amd-f9-destructive-cucushift-ex/ocp4_testuser-1.kubeconfig Now using project "59bni" on server "https://api.ci-op-2r89prs7-68d86.qe.devcluster.openshift.com:6443". You can add applications to this project with the 'new-app' command. For example, try: oc new-app rails-postgresql-example to build a new example application in Ruby. 
Or use kubectl to deploy a simple Kubernetes application: kubectl create deployment hello-node --image=registry.k8s.io/e2e-test-images/agnhost:2.43 -- /agnhost serve-hostname [16:31:58] INFO> Exit Status: 0 [16:32:06] INFO> Shell Commands: oc get oauth cluster -o yaml --kubeconfig=/alabama/workdir/aws-ipi-disc-priv-tp-amd-f9-destructive-cucushift-ex/ocp4_admin.kubeconfig apiVersion: config.openshift.io/v1 kind: OAuth metadata: annotations: include.release.openshift.io/ibm-cloud-managed: "true" include.release.openshift.io/self-managed-high-availability: "true" release.openshift.io/create-only: "true" creationTimestamp: "2024-04-08T03:16:27Z" generation: 12 name: cluster ownerReferences: - apiVersion: config.openshift.io/v1 kind: ClusterVersion name: version uid: dc192b4d-95aa-492d-a823-112e839bab11 resourceVersion: "736369" uid: 149155f1-69b4-45ac-99a6-6518ebbd9836 spec: identityProviders: - htpasswd: fileData: name: cucushift-htpass-secret mappingMethod: claim name: cucushift-htpasswd-provider type: HTPasswd [16:32:07] INFO> Shell Commands: oc patch oauth.config cluster -p \{\"spec\":\{\"tokenConfig\":\{\"accessTokenInactivityTimeout\":\ \"300s\"\}\}\} --type=merge --kubeconfig=/alabama/workdir/aws-ipi-disc-priv-tp-amd-f9-destructive-cucushift-ex/ocp4_admin.kubeconfig oauth.config.openshift.io/cluster patched [16:33:15] INFO> #### Operator kube-apiserver Expected conditions: {"Available"=>"True", "Progressing"=>"False", "Degraded"=>"False"} [16:33:15] INFO> #### After 1.004526632001216 seconds and 1 iterations operator kube-apiserver becomes: {"Available"=>"True", "Progressing"=>"False", "Degraded"=>"False"} And I wait up to 180 seconds for the steps to pass: # features/step_definitions/meta_steps.rb:33 """ When I run the :get admin command with: | resource | pod | | l | app=oauth-openshift | | n | openshift-authentication | Then the step should succeed And the output should not contain "Terminating" """ [16:33:15] INFO> Shell Commands: oc get pod -l 
app\=oauth-openshift --kubeconfig=/alabama/workdir/aws-ipi-disc-priv-tp-amd-f9-destructive-cucushift-ex/ocp4_admin.kubeconfig -n openshift-authentication NAME READY STATUS RESTARTS AGE oauth-openshift-5dfdb8498b-2k428 1/1 Running 0 3m7s oauth-openshift-5dfdb8498b-2mwnp 1/1 Running 0 2m11s oauth-openshift-5dfdb8498b-bz96z 1/1 Running 0 2m39s [16:33:16] INFO> Exit Status: 0 When I run the :login client command with: # features/step_definitions/cli.rb:13 | server | <%= env.api_endpoint_url %> | | username | <%= user.name %> | | password | <%= user.password %> | | config | test.kubeconfig | | skip_tls_verify | true | WARNING: Using insecure TLS client config. Setting this option is not supported! Login successful. You have one project on this server: "59bni" Using project "59bni". [16:33:17] INFO> Exit Status: 0 Then the step should succeed # features/step_definitions/common.rb:4 When I run the :whoami client command with: # features/step_definitions/cli.rb:13 | t | | | config | test.kubeconfig | Then the step should succeed # features/step_definitions/common.rb:4 And evaluation of `@result[:stdout].chomp` is stored in the :tokenval clipboard # features/step_definitions/common.rb:128 When I run the :get admin command with: # features/step_definitions/cli.rb:37 | resource | oauthaccesstoken | | resource_name | <%= get_oauthaccesstoken(cb.tokenval) %> | | o | yaml | [16:33:18] INFO> Shell Commands: oc get oauthaccesstoken sha256\~YW0oKa..<snip>..o3jSbM -o yaml --kubeconfig=/alabama/workdir/aws-ipi-disc-priv-tp-amd-f9-destructive-cucushift-ex/ocp4_admin.kubeconfig apiVersion: oauth.openshift.io/v1 authorizeToken: sha256~d7LPmIBv-BhG6D76PDTYDY3LF76pFcJUzv29emH3F3A clientName: openshift-challenging-client expiresIn: 86400 kind: OAuthAccessToken metadata: creationTimestamp: "2024-04-08T16:33:16Z" name: sha256~YW0oKa..<snip>..o3jSbM resourceVersion: "739908" uid: 484cc097-0434-4d4f-a356-3c2bb0902116 redirectURI: 
https://oauth-openshift.apps.ci-op-2r89prs7-68d86.qe.devcluster.openshift.com/oauth/token/implicit scopes: - user:full userName: testuser-1 userUID: 496de345-40c3-4f22-afcb-223c8ea9cfe0 [16:33:19] INFO> Exit Status: 0 Then the output should contain: # features/step_definitions/common.rb:33 | inactivityTimeoutSeconds: 300 | pattern not found: inactivityTimeoutSeconds: 300 (RuntimeError) /verification-tests/features/step_definitions/common.rb:103:in `block (3 levels) in <top (required)>' /verification-tests/features/step_definitions/common.rb:97:in `each' /verification-tests/features/step_definitions/common.rb:97:in `block (2 levels) in <top (required)>' /verification-tests/features/step_definitions/common.rb:56:in `each' /verification-tests/features/step_definitions/common.rb:56:in `/^(the|all)? outputs?( by order)? should( not)? (contain|match)(?: ([0-9]+|<%=.+?%>) times)?:$/' features/tierN/apiserver/auth/token.feature:103:in `the output should contain:' Given 60 seconds have passed # features/step_definitions/common.rb:124 {code} Status: New | |||
#OCPBUGS-31620 | issue | 8 days ago | numaresources-controller-manager in CrashLoopBackOff - invalid memory address or nil pointer dereference CLOSED |
Issue 15913487: numaresources-controller-manager in CrashLoopBackOff - invalid memory address or nil pointer dereference Description: This is a clone of issue OCPBUGS-31051. The following is the description of the original issue: --- This is a clone of issue OCPBUGS-30923. The following is the description of the original issue: --- This is a clone of issue OCPBUGS-30342. The following is the description of the original issue: --- This is a clone of issue OCPBUGS-30236. The following is the description of the original issue: --- Description of problem: {code:none} Pod numaresources-controller-manager is in CrashLoopBackOff state{code} {code:none} oc get po -n openshift-numaresources NAME READY STATUS RESTARTS AGE numaresources-controller-manager-766c55596b-9nb6b 0/1 CrashLoopBackOff 163 (3m52s ago) 14h secondary-scheduler-85959757db-dvpdj 1/1 Running 0 14h{code} {code:none} oc logs -n openshift-numaresources numaresources-controller-manager-766c55596b-9nb6b ... I0305 07:32:51.102133 1 shared_informer.go:341] caches populated I0305 07:32:51.102210 1 controller.go:220] "Starting workers" controller="kubeletconfig" controllerGroup="machineconfiguration.openshift.io" controllerKind="KubeletConfig" worker count=1 I0305 07:32:51.102295 1 kubeletconfig_controller.go:69] "Starting KubeletConfig reconcile loop" object="/autosizing-master" I0305 07:32:51.102412 1 panic.go:884] "Finish KubeletConfig reconcile loop" object="/autosizing-master" I0305 07:32:51.102448 1 controller.go:115] "Observed a panic in reconciler: runtime error: invalid memory address or nil pointer dereference" controller="kubeletconfig" controllerGroup="machineconfiguration.openshift.io" controllerKind="KubeletConfig" KubeletConfig="autosizing-master" namespace="" name="autosizing-master" reconcileID="91d2c547-993c-4ae1-beab-1afc0a72af68" panic: runtime error: invalid memory address or nil pointer dereference [recovered] panic: runtime error: invalid memory address or nil pointer dereference [signal 
SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x1778a1c] goroutine 481 [running]: sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile.func1() /remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:116 +0x1fa panic({0x19286e0, 0x2d16fc0}) /usr/lib/golang/src/runtime/panic.go:884 +0x213 github.com/openshift-kni/numaresources-operator/pkg/kubeletconfig.MCOKubeletConfToKubeletConf(...) /remote-source/app/pkg/kubeletconfig/kubeletconfig.go:29 github.com/openshift-kni/numaresources-operator/controllers.(*KubeletConfigReconciler).reconcileConfigMap(0xc0004d29c0, {0x1e475f0, 0xc000e31260}, 0xc000226c40, {{0x0?, 0xc000e31260?}, {0xc000b98498?, 0x2de08f8?}}) /remote-source/app/controllers/kubeletconfig_controller.go:126 +0x11c github.com/openshift-kni/numaresources-operator/controllers.(*KubeletConfigReconciler).Reconcile(0xc0004d29c0, {0x1e475f0, 0xc000e31260}, {{{0x0, 0x0}, {0xc000b98498, 0x11}}}) /remote-source/app/controllers/kubeletconfig_controller.go:90 +0x3cd sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile(0x1e4a2e0?, {0x1e475f0?, 0xc000e31260?}, {{{0x0?, 0xb?}, {0xc000b98498?, 0x0?}}}) /remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:119 +0xc8 sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0xc0004446e0, {0x1e47548, 0xc0003520f0}, {0x19b9940?, 0xc00093a1a0?}) /remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:316 +0x3ca sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc0004446e0, {0x1e47548, 0xc0003520f0}) /remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:266 +0x1d9 sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2() /remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:227 +0x85 
created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2 /remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:223 +0x587 {code} Version-Release number of selected component (if applicable): {code:none} numaresources-operator.v4.15.0 {code} How reproducible: {code:none} so far 100% {code} Steps to Reproduce: {code:none} 1. Create a KubeletConfig that configures autosizing: apiVersion: machineconfiguration.openshift.io/v1 kind: KubeletConfig metadata: name: autosizing-master spec: autoSizingReserved: true machineConfigPoolSelector: matchLabels: pools.operator.machineconfiguration.openshift.io/master: "" 2. Create a performance profile that targets subset of nodes 3. Proceed with numaresources-operator installation {code} Actual results: {code:none} Pod in CrashLoopBackOff state {code} Expected results: {code:none} numaresources-operator is successfully installed {code} Additional info: {code:none} Baremetal dualstack cluster deployed with GitOps-ZTP {code} Status: CLOSED | |||
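The trigger in the steps above is a KubeletConfig that sets only autoSizingReserved, so its embedded kubeletConfig payload is empty; the panic inside MCOKubeletConfToKubeletConf is consistent with converting that payload without a nil check. A simplified Go sketch under that assumption (these types are illustrative stand-ins, not the real MCO API):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// rawKubeletConfig and kubeletConfigSpec mirror the shape of the MCO
// KubeletConfig spec only loosely: KubeletConfig holds the raw JSON payload,
// and it is nil when the object sets only autoSizingReserved.
type rawKubeletConfig struct{ Raw []byte }

type kubeletConfigSpec struct {
	AutoSizingReserved *bool
	KubeletConfig      *rawKubeletConfig
}

// kubeletConfFrom decodes the embedded payload. The nil guard is the step
// whose absence would explain the observed nil pointer dereference.
func kubeletConfFrom(spec kubeletConfigSpec) (map[string]interface{}, error) {
	if spec.KubeletConfig == nil {
		return nil, fmt.Errorf("kubeletConfig payload is empty")
	}
	var out map[string]interface{}
	if err := json.Unmarshal(spec.KubeletConfig.Raw, &out); err != nil {
		return nil, err
	}
	return out, nil
}

func main() {
	auto := true
	// An autosizing-only config, like the autosizing-master object above.
	_, err := kubeletConfFrom(kubeletConfigSpec{AutoSizingReserved: &auto})
	fmt.Println(err) // kubeletConfig payload is empty
}
```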
#TRT-1643 | issue | 35 hours ago | Another controller manager undiagnosed panic capable of failing payloads In Progress |
Issue 15982864: Another controller manager undiagnosed panic capable of failing payloads Description: Similar to https://issues.redhat.com/browse/TRT-1632, this job failed a 4.17 nightly payload last night when one of ten jobs hit this crash: https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-release-master-ci-4.17-e2e-aws-ovn-upgrade/1786172902819762176 {code} : Undiagnosed panic detected in pod expand_less 0s { pods/openshift-controller-manager_controller-manager-6f79ddc977-fcnm7_controller-manager.log.gz:E0503 01:49:29.881473 1 runtime.go:79] Observed a panic: &runtime.TypeAssertionError{_interface:(*abi.Type)(0x3c65060), concrete:(*abi.Type)(0x3e5bc20), asserted:(*abi.Type)(0x418ee20), missingMethod:""} (interface conversion: interface {} is cache.DeletedFinalStateUnknown, not *v1.BuildConfig)} {code} Status: In Progress | |||
#OCPBUGS-28965 | issue | 6 weeks ago | [IBMCloud] Cluster install failed and machine-api-controllers gets stuck in CrashLoopBackOff Verified |
Issue 15775584: [IBMCloud] Cluster install failed and machine-api-controllers gets stuck in CrashLoopBackOff Description: Description of problem: {code:none} Cluster install failed on IBM Cloud and machine-api-controllers gets stuck in CrashLoopBackOff {code} Version-Release number of selected component (if applicable): {code:none} from 4.16.0-0.nightly-2024-02-02-224339{code} How reproducible: {code:none} Always{code} Steps to Reproduce: {code:none} 1. Install cluster on IBMCloud 2. 3. {code} Actual results: {code:none} Cluster install failed $ oc get node NAME STATUS ROLES AGE VERSION maxu-16-gp2vp-master-0 Ready control-plane,master 7h11m v1.29.1+2f773e8 maxu-16-gp2vp-master-1 Ready control-plane,master 7h11m v1.29.1+2f773e8 maxu-16-gp2vp-master-2 Ready control-plane,master 7h11m v1.29.1+2f773e8 $ oc get machine -n openshift-machine-api NAME PHASE TYPE REGION ZONE AGE maxu-16-gp2vp-master-0 7h15m maxu-16-gp2vp-master-1 7h15m maxu-16-gp2vp-master-2 7h15m maxu-16-gp2vp-worker-1-xfvqq 7h5m maxu-16-gp2vp-worker-2-5hn7c 7h5m maxu-16-gp2vp-worker-3-z74z2 7h5m openshift-machine-api machine-api-controllers-6cb7fcdcdb-k6sv2 6/7 CrashLoopBackOff 92 (31s ago) 7h1m $ oc logs -n openshift-machine-api -c machine-controller machine-api-controllers-6cb7fcdcdb-k6sv2 I0204 10:53:34.336338 1 main.go:120] Watching machine-api objects only in namespace "openshift-machine-api" for reconciliation.panic: runtime error: invalid memory address or nil pointer dereference[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x285fe72] goroutine 25 [running]:k8s.io/klog/v2/textlogger.(*tlogger).Enabled(0x0?, 0x0?) /go/src/github.com/openshift/machine-api-provider-ibmcloud/vendor/k8s.io/klog/v2/textlogger/textlogger.go:81 +0x12sigs.k8s.io/controller-runtime/pkg/log.(*delegatingLogSink).Enabled(0xc000438100, 0x0?) 
/go/src/github.com/openshift/machine-api-provider-ibmcloud/vendor/sigs.k8s.io/controller-runtime/pkg/log/deleg.go:114 +0x92github.com/go-logr/logr.Logger.Info({{0x3232210?, 0xc000438100?}, 0x0?}, {0x2ec78f3, 0x17}, {0x0, 0x0, 0x0}) /go/src/github.com/openshift/machine-api-provider-ibmcloud/vendor/github.com/go-logr/logr/logr.go:276 +0x72sigs.k8s.io/controller-runtime/pkg/metrics/server.(*defaultServer).Start(0xc0003bd2c0, {0x322e350?, 0xc00058a140}) /go/src/github.com/openshift/machine-api-provider-ibmcloud/vendor/sigs.k8s.io/controller-runtime/pkg/metrics/server/server.go:185 +0x75sigs.k8s.io/controller-runtime/pkg/manager.(*runnableGroup).reconcile.func1(0xc0002c4540) /go/src/github.com/openshift/machine-api-provider-ibmcloud/vendor/sigs.k8s.io/controller-runtime/pkg/manager/runnable_group.go:223 +0xc8created by sigs.k8s.io/controller-runtime/pkg/manager.(*runnableGroup).reconcile in goroutine 24 /go/src/github.com/openshift/machine-api-provider-ibmcloud/vendor/sigs.k8s.io/controller-runtime/pkg/manager/runnable_group.go:207 +0x19d{code} Expected results: {code:none} Cluster install succeed{code} Additional info: {code:none} may relate to this pr https://github.com/openshift/machine-api-provider-ibmcloud/pull/34{code} Status: Verified $ oc logs -n openshift-machine-api -c machine-controller machine-api-controllers-f4dbbfffd-kl6kw /machine-controller-manager flag redefined: vpanic: /machine-controller-manager flag redefined: v goroutine 1 [running]:flag.(*FlagSet).Var(0xc0001ee000, {0x7fdb2a288658, 0xc000035ce8}, {0x3202f70, 0x1}, {0x2f1c2bd, 0x38}) /usr/lib/golang/src/flag/flag.go:1028 +0x3a5k8s.io/klog/v2/textlogger.(*Config).AddFlags(0xc000035d10, 0x0?) /go/src/github.com/openshift/machine-api-provider-ibmcloud/vendor/k8s.io/klog/v2/textlogger/options.go:131 +0x69main.main() /go/src/github.com/openshift/machine-api-provider-ibmcloud/cmd/manager/main.go:96 +0x1ea{code} | |||
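The second crash quoted above ("/machine-controller-manager flag redefined: v") is the Go flag package's documented behavior: registering the same flag name twice on one FlagSet panics, which is what happens when two vendored components both try to add a -v verbosity flag. A small self-contained illustration of the mechanism (not the operator's code):

```go
package main

import (
	"flag"
	"fmt"
	"io"
)

// register adds a -v flag and converts the flag package's "flag redefined"
// panic into an ordinary error so the caller can observe it.
func register(fs *flag.FlagSet) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("%v", r)
		}
	}()
	fs.Int("v", 0, "verbosity level")
	return nil
}

// demo registers -v twice on the same FlagSet and returns both outcomes:
// the first registration succeeds, the second panics inside the flag package.
func demo() []error {
	fs := flag.NewFlagSet("manager", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // the flag package also prints the panic message
	return []error{register(fs), register(fs)}
}

func main() {
	for _, err := range demo() {
		fmt.Println(err)
	}
}
```

This matches the stack trace ending in flag.(*FlagSet).Var: the fix is to ensure each component registers its logging flags on a fresh FlagSet, or only once.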
#OCPBUGS-30604 | issue | 6 weeks ago | Misformatted node labels causing origin-tests to panic Verified |
Issue 15864053: Misformatted node labels causing origin-tests to panic Description: Description of problem: {code:none} Panic thrown by origin-tests{code} Version-Release number of selected component (if applicable): {code:none} {code} How reproducible: {code:none} always{code} Steps to Reproduce: {code:none} 1. Create aws or rosa 4.15 cluster 2. run origin tests 3. {code} Actual results: {code:none} time="2024-03-07T17:03:50Z" level=info msg="resulting interval message" message="{RegisteredNode Node ip-10-0-8-83.ec2.internal event: Registered Node ip-10-0-8-83.ec2.internal in Controller map[reason:RegisteredNode roles:worker]}" E0307 17:03:50.319617 71 runtime.go:79] Observed a panic: runtime.boundsError{x:24, y:23, signed:true, code:0x3} (runtime error: slice bounds out of range [24:23]) goroutine 310 [running]: k8s.io/apimachinery/pkg/util/runtime.logPanic({0x84c6f20?, 0xc006fdc588}) k8s.io/apimachinery@v0.29.0/pkg/util/runtime/runtime.go:75 +0x99 k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0xc008c38120?}) k8s.io/apimachinery@v0.29.0/pkg/util/runtime/runtime.go:49 +0x75 panic({0x84c6f20, 0xc006fdc588}) runtime/panic.go:884 +0x213 github.com/openshift/origin/pkg/monitortests/testframework/watchevents.nodeRoles(0x0?) github.com/openshift/origin/pkg/monitortests/testframework/watchevents/event.go:251 +0x1e5 github.com/openshift/origin/pkg/monitortests/testframework/watchevents.recordAddOrUpdateEvent({0x96bcc00, 0xc0076e3310}, {0x7f2a0e47a1b8, 0xc007732330}, {0x281d36d?, 0x0?}, {0x9710b50, 0xc000c5e000}, {0x9777af, 0xedd7be6b7, ...}, ...) 
github.com/openshift/origin/pkg/monitortests/testframework/watchevents/event.go:116 +0x41b github.com/openshift/origin/pkg/monitortests/testframework/watchevents.startEventMonitoring.func2({0x8928f00?, 0xc00b528c80}) github.com/openshift/origin/pkg/monitortests/testframework/watchevents/event.go:65 +0x185 k8s.io/client-go/tools/cache.(*FakeCustomStore).Add(0x8928f00?, {0x8928f00?, 0xc00b528c80?}) k8s.io/client-go@v0.29.0/tools/cache/fake_custom_store.go:35 +0x31 k8s.io/client-go/tools/cache.watchHandler({0x0?, 0x0?, 0xe16d020?}, {0x9694a10, 0xc006b00180}, {0x96d2780, 0xc0078afe00}, {0x96f9e28?, 0x8928f00}, 0x0, ...) k8s.io/client-go@v0.29.0/tools/cache/reflector.go:756 +0x603 k8s.io/client-go/tools/cache.(*Reflector).watch(0xc0005dcc40, {0x0?, 0x0?}, 0xc005cdeea0, 0xc005bf8c40?) k8s.io/client-go@v0.29.0/tools/cache/reflector.go:437 +0x53b k8s.io/client-go/tools/cache.(*Reflector).ListAndWatch(0xc0005dcc40, 0xc005cdeea0) k8s.io/client-go@v0.29.0/tools/cache/reflector.go:357 +0x453 k8s.io/client-go/tools/cache.(*Reflector).Run.func1() k8s.io/client-go@v0.29.0/tools/cache/reflector.go:291 +0x26 k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x10?) k8s.io/apimachinery@v0.29.0/pkg/util/wait/backoff.go:226 +0x3e k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc007974ec0?, {0x9683f80, 0xc0078afe50}, 0x1, 0xc005cdeea0) k8s.io/apimachinery@v0.29.0/pkg/util/wait/backoff.go:227 +0xb6 k8s.io/client-go/tools/cache.(*Reflector).Run(0xc0005dcc40, 0xc005cdeea0) k8s.io/client-go@v0.29.0/tools/cache/reflector.go:290 +0x17d created by github.com/openshift/origin/pkg/monitortests/testframework/watchevents.startEventMonitoring github.com/openshift/origin/pkg/monitortests/testframework/watchevents/event.go:83 +0x6a5 panic: runtime error: slice bounds out of range [24:23] [recovered] panic: runtime error: slice bounds out of range [24:23]{code} Expected results: {code:none} execution of tests{code} Additional info: {code:none} {code} Status: Verified | |||
#OCPBUGS-32414 | issue | 2 days ago | control-plane-machine-set operator pod stuck into crashloopbackoff state with the nil pointer dereference runtime error CLOSED |
Issue 15952002: control-plane-machine-set operator pod stuck into crashloopbackoff state with the nil pointer dereference runtime error Description: Backport for 4.15 - Manually Cloned from https://issues.redhat.com/browse/OCPBUGS-31808 Description of problem: {code:none} control-plane-machine-set operator pod stuck into crashloopbackoff state with panic: runtime error: invalid memory address or nil pointer dereference while extracting the failureDomain from the controlplanemachineset. Below is the error trace for reference. ~~~ 2024-04-04T09:32:23.594257072Z I0404 09:32:23.594176 1 controller.go:146] "msg"="Finished reconciling control plane machine set" "controller"="controlplanemachinesetgenerator" "name"="cluster" "namespace"="openshift-machine-api" "reconcileID"="c282f3e3-9f9d-40df-a24e-417ba2ea4106" 2024-04-04T09:32:23.594257072Z I0404 09:32:23.594221 1 controller.go:125] "msg"="Reconciling control plane machine set" "controller"="controlplanemachinesetgenerator" "name"="cluster" "namespace"="openshift-machine-api" "reconcileID"="7f03c05f-2717-49e0-95f8-3e8b2ce2fc55" 2024-04-04T09:32:23.594274974Z I0404 09:32:23.594257 1 controller.go:146] "msg"="Finished reconciling control plane machine set" "controller"="controlplanemachinesetgenerator" "name"="cluster" "namespace"="openshift-machine-api" "reconcileID"="7f03c05f-2717-49e0-95f8-3e8b2ce2fc55" 2024-04-04T09:32:23.597509741Z I0404 09:32:23.597426 1 watch_filters.go:179] reconcile triggered by infrastructure change 2024-04-04T09:32:23.606311553Z I0404 09:32:23.606243 1 controller.go:220] "msg"="Starting workers" "controller"="controlplanemachineset" "worker count"=1 2024-04-04T09:32:23.606360950Z I0404 09:32:23.606340 1 controller.go:169] "msg"="Reconciling control plane machine set" "controller"="controlplanemachineset" "name"="cluster" "namespace"="openshift-machine-api" "reconcileID"="5dac54f4-57ab-419b-b258-79136ca8b400" 2024-04-04T09:32:23.609322467Z I0404 09:32:23.609217 1 panic.go:884] "msg"="Finished 
reconciling control plane machine set" "controller"="controlplanemachineset" "name"="cluster" "namespace"="openshift-machine-api" "reconcileID"="5dac54f4-57ab-419b-b258-79136ca8b400" 2024-04-04T09:32:23.609322467Z I0404 09:32:23.609271 1 controller.go:115] "msg"="Observed a panic in reconciler: runtime error: invalid memory address or nil pointer dereference" "controller"="controlplanemachineset" "reconcileID"="5dac54f4-57ab-419b-b258-79136ca8b400" 2024-04-04T09:32:23.612540681Z panic: runtime error: invalid memory address or nil pointer dereference [recovered] 2024-04-04T09:32:23.612540681Z panic: runtime error: invalid memory address or nil pointer dereference 2024-04-04T09:32:23.612540681Z [signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x1a5911c] 2024-04-04T09:32:23.612540681Z 2024-04-04T09:32:23.612540681Z goroutine 255 [running]: 2024-04-04T09:32:23.612540681Z sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile.func1() 2024-04-04T09:32:23.612571624Z /go/src/github.com/openshift/cluster-control-plane-machine-set-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:116 +0x1fa 2024-04-04T09:32:23.612571624Z panic({0x1c8ac60, 0x31c6ea0}) 2024-04-04T09:32:23.612571624Z /usr/lib/golang/src/runtime/panic.go:884 +0x213 2024-04-04T09:32:23.612571624Z github.com/openshift/cluster-control-plane-machine-set-operator/pkg/machineproviders/providers/openshift/machine/v1beta1/providerconfig.VSphereProviderConfig.ExtractFailureDomain(...) 
2024-04-04T09:32:23.612571624Z /go/src/github.com/openshift/cluster-control-plane-machine-set-operator/pkg/machineproviders/providers/openshift/machine/v1beta1/providerconfig/vsphere.go:120 2024-04-04T09:32:23.612571624Z github.com/openshift/cluster-control-plane-machine-set-operator/pkg/machineproviders/providers/openshift/machine/v1beta1/providerconfig.providerConfig.ExtractFailureDomain({{0x1f2a71a, 0x7}, {{{{...}, {...}}, {{...}, {...}, {...}, {...}, {...}, {...}, ...}, ...}}, ...}) 2024-04-04T09:32:23.612588145Z /go/src/github.com/openshift/cluster-control-plane-machine-set-operator/pkg/machineproviders/providers/openshift/machine/v1beta1/providerconfig/providerconfig.go:212 +0x23c ~~~ {code} Version-Release number of selected component (if applicable): {code:none} {code} How reproducible: {code:none} {code} Steps to Reproduce: {code:none} 1. 2. 3. {code} Actual results: {code:none} control-plane-machine-set operator stuck into crashloopback off state while cluster upgrade. {code} Expected results: {code:none} control-plane-machine-set operator should be upgraded without any errors. {code} Additional info: {code:none} This is happening during the cluster upgrade of Vsphere IPI cluster from OCP version 4.14.z to 4.15.6 and may impact other z stream releases. from the official docs[1] I see providing the failure domain for the Vsphere platform is tech preview feature. 
[1] https://docs.openshift.com/container-platform/4.15/machine_management/control_plane_machine_management/cpmso-configuration.html#cpmso-yaml-failure-domain-vsphere_cpmso-configuration {code} Status: CLOSED | |||
#OCPBUGS-17671 | issue | 5 days ago | e2e-azure-operator fails on TestManagedDNSToUnmanagedDNSIngressController: "DNSRecord zone expected to have status=Unknown but got status=True" ASSIGNED |
Issue 15420933: e2e-azure-operator fails on TestManagedDNSToUnmanagedDNSIngressController: "DNSRecord zone expected to have status=Unknown but got status=True" Description: h2. Description of problem CI is flaky because of test failures such as the following: {noformat} TestAll/parallel/TestManagedDNSToUnmanagedDNSIngressController === RUN TestAll/parallel/TestManagedDNSToUnmanagedDNSIngressController util_test.go:106: retrying client call due to: Get "http://168.61.75.99": context deadline exceeded (Client.Timeout exceeded while awaiting headers) util_test.go:106: retrying client call due to: Get "http://168.61.75.99": context deadline exceeded (Client.Timeout exceeded while awaiting headers) util_test.go:106: retrying client call due to: Get "http://168.61.75.99": context deadline exceeded (Client.Timeout exceeded while awaiting headers) util_test.go:106: retrying client call due to: Get "http://168.61.75.99": context deadline exceeded (Client.Timeout exceeded while awaiting headers) util_test.go:106: retrying client call due to: Get "http://168.61.75.99": context deadline exceeded (Client.Timeout exceeded while awaiting headers) util_test.go:551: verified connectivity with workload with req http://168.61.75.99 and response 200 unmanaged_dns_test.go:148: Updating ingresscontroller managed-migrated to dnsManagementPolicy=Unmanaged unmanaged_dns_test.go:161: Waiting for stable conditions on ingresscontroller managed-migrated after dnsManagementPolicy=Unmanaged unmanaged_dns_test.go:177: verifying conditions on DNSRecord zone {ID:/subscriptions/d38f1e38-4bed-438e-b227-833f997adf6a/resourceGroups/ci-op-k8s8zfit-04a70-rdnbw-rg/providers/Microsoft.Network/privateDnsZo nes/ci-op-k8s8zfit-04a70.ci.azure.devcluster.openshift.com Tags:map[]} unmanaged_dns_test.go:177: DNSRecord zone expected to have status=Unknown but got status=True panic.go:522: deleted ingresscontroller managed-migrated {noformat} This particular failure comes from 
[https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_cluster-ingress-operator/970/pull-ci-openshift-cluster-ingress-operator-master-e2e-azure-operator/1690101593501863936]. Search.ci has other similar failures. h2. Version-Release number of selected component (if applicable) I have seen this in recent 4.14 CI job runs. I also found a failure from February 2023, which precedes the 4.13 branch cut in March 2023, which means these failures go back at least to 4.13: https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_cluster-ingress-operator/874/pull-ci-openshift-cluster-ingress-operator-master-e2e-azure-operator/1626100610514292736 h2. How reproducible Presently, [search.ci shows|https://search.ci.openshift.org/?search=FAIL%3A+TestAll%2Fparallel%2FTestManagedDNSToUnmanagedDNSIngressController&maxAge=336h&context=1&type=build-log&name=cluster-ingress-operator-master-e2e-azure-operator&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job] the following stats for the past 14 days: {noformat} Found in 6.98% of runs (14.29% of failures) across 43 total runs and 1 jobs (48.84% failed) pull-ci-openshift-cluster-ingress-operator-master-e2e-azure-operator (all) - 43 runs, 49% failed, 14% of failures match = 7% impact {noformat} h2. Steps to Reproduce 1. Post a PR and have bad luck. 2. Check search.ci using the link above. h2. Actual results CI fails. h2. Expected results CI passes, or fails on some other test failure. Status: ASSIGNED | |||
#OCPBUGS-31779 | issue | 2 days ago | Sriov network operator pod is crashing due to panic: runtime error: invalid memory address or nil pointer dereference New |
Issue 15922499: Sriov network operator pod is crashing due to panic: runtime error: invalid memory address or nil pointer dereference Description: Description of problem: {code:none} The SRIOV network operator pod is crashing and going into Crash Loop Back off state. Upon Checking the pod logs panic error messages can be seen : 2024-04-03T17:48:33.008379552Z panic: runtime error: invalid memory address or nil pointer dereference 2024-04-03T17:48:33.008379552Z [signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x176c03f]{code} {code:none} Also on checking the Sriov Operator Config everything looks fine : apiVersion: v1 items: - apiVersion: sriovnetwork.openshift.io/v1 kind: SriovOperatorConfig metadata: creationTimestamp: "2024-03-26T18:36:10Z" generation: 2 name: default namespace: openshift-sriov-network-operator resourceVersion: "18324322" uid: 61f07940-ac92-437d-bacb-41fb1224cc34 spec: disableDrain: true enableInjector: true enableOperatorWebhook: true logLevel: 2 {code} Version-Release number of selected component (if applicable): {code:none} {code} How reproducible: {code:none} {code} Steps to Reproduce: {code:none} 1. 2. 3. {code} Actual results: {code:none} {code} Expected results: {code:none} {code} Additional info: {code:none} {code} Status: New | |||
#OCPBUGS-30631 | issue | 4 days ago | SNO (RT kernel) sosreport crash the SNO node CLOSED |
Issue 15865131: SNO (RT kernel) sosreport crash the SNO node Description: Description of problem: {code:none} sosreport collection causes SNO XR11 node crash. {code} Version-Release number of selected component (if applicable): {code:none} - RHOCP : 4.12.30 - kernel : 4.18.0-372.69.1.rt7.227.el8_6.x86_64 - platform : x86_64{code} How reproducible: {code:none} sh-4.4# chrt -rr 99 toolbox .toolboxrc file detected, overriding defaults... Checking if there is a newer version of ocpdalmirror.xxx.yyy:8443/rhel8/support-tools-zzz-feb available... Container 'toolbox-root' already exists. Trying to start... (To remove the container and start with a fresh toolbox, run: sudo podman rm 'toolbox-root') toolbox-root Container started successfully. To exit, type 'exit'. [root@node /]# which sos /usr/sbin/sos logger: socket /dev/log: No such file or directory [root@node /]# taskset -c 29-31,61-63 sos report --batch -n networking,kernel,processor -k crio.all=on -k crio.logs=on -k podman.all=on -kpodman.logs=on sosreport (version 4.5.6) This command will collect diagnostic and configuration information from this Red Hat CoreOS system. An archive containing the collected information will be generated in /host/var/tmp/sos.c09e4f7z and may be provided to a Red Hat support representative. Any information provided to Red Hat will be treated in accordance with the published support policies at: Distribution Website : https://www.redhat.com/ Commercial Support : https://access.redhat.com/ The generated archive may contain data considered sensitive and its content should be reviewed by the originating organization before being passed to any third party. No changes will be made to system configuration. Setting up archive ... Setting up plugins ... 
[plugin:auditd] Could not open conf file /etc/audit/auditd.conf: [Errno 2] No such file or directory: '/etc/audit/auditd.conf' caught exception in plugin method "system.setup()" writing traceback to sos_logs/system-plugin-errors.txt [plugin:systemd] skipped command 'resolvectl status': required services missing: systemd-resolved. [plugin:systemd] skipped command 'resolvectl statistics': required services missing: systemd-resolved. Running plugins. Please wait ... Starting 1/91 alternatives [Running: alternatives] Starting 2/91 atomichost [Running: alternatives atomichost] Starting 3/91 auditd [Running: alternatives atomichost auditd] Starting 4/91 block [Running: alternatives atomichost auditd block] Starting 5/91 boot [Running: alternatives auditd block boot] Starting 6/91 cgroups [Running: auditd block boot cgroups] Starting 7/91 chrony [Running: auditd block cgroups chrony] Starting 8/91 cifs [Running: auditd block cgroups cifs] Starting 9/91 conntrack [Running: auditd block cgroups conntrack] Starting 10/91 console [Running: block cgroups conntrack console] Starting 11/91 container_log [Running: block cgroups conntrack container_log] Starting 12/91 containers_common [Running: block cgroups conntrack containers_common] Starting 13/91 crio [Running: block cgroups conntrack crio] Starting 14/91 crypto [Running: cgroups conntrack crio crypto] Starting 15/91 date [Running: cgroups conntrack crio date] Starting 16/91 dbus [Running: cgroups conntrack crio dbus] Starting 17/91 devicemapper [Running: cgroups conntrack crio devicemapper] Starting 18/91 devices [Running: cgroups conntrack crio devices] Starting 19/91 dracut [Running: cgroups conntrack crio dracut] Starting 20/91 ebpf [Running: cgroups conntrack crio ebpf] Starting 21/91 etcd [Running: cgroups crio ebpf etcd] Starting 22/91 filesys [Running: cgroups crio ebpf filesys] Starting 23/91 firewall_tables [Running: cgroups crio filesys firewall_tables] Starting 24/91 fwupd [Running: cgroups crio filesys fwupd] 
Starting 25/91 gluster [Running: cgroups crio filesys gluster] Starting 26/91 grub2 [Running: cgroups crio filesys grub2] Starting 27/91 gssproxy [Running: cgroups crio grub2 gssproxy] Starting 28/91 hardware [Running: cgroups crio grub2 hardware] Starting 29/91 host [Running: cgroups crio hardware host] Starting 30/91 hts [Running: cgroups crio hardware hts] Starting 31/91 i18n [Running: cgroups crio hardware i18n] Starting 32/91 iscsi [Running: cgroups crio hardware iscsi] Starting 33/91 jars [Running: cgroups crio hardware jars] Starting 34/91 kdump [Running: cgroups crio hardware kdump] Starting 35/91 kernelrt [Running: cgroups crio hardware kernelrt] Starting 36/91 keyutils [Running: cgroups crio hardware keyutils] Starting 37/91 krb5 [Running: cgroups crio hardware krb5] Starting 38/91 kvm [Running: cgroups crio hardware kvm] Starting 39/91 ldap [Running: cgroups crio kvm ldap] Starting 40/91 libraries [Running: cgroups crio kvm libraries] Starting 41/91 libvirt [Running: cgroups crio kvm libvirt] Starting 42/91 login [Running: cgroups crio kvm login] Starting 43/91 logrotate [Running: cgroups crio kvm logrotate] Starting 44/91 logs [Running: cgroups crio kvm logs] Starting 45/91 lvm2 [Running: cgroups crio logs lvm2] Starting 46/91 md [Running: cgroups crio logs md] Starting 47/91 memory [Running: cgroups crio logs memory] Starting 48/91 microshift_ovn [Running: cgroups crio logs microshift_ovn] Starting 49/91 multipath [Running: cgroups crio logs multipath] Starting 50/91 networkmanager [Running: cgroups crio logs networkmanager] Removing debug pod ... 
error: unable to delete the debug pod "ransno1ransnomavdallabcom-debug": Delete "https://api.ransno.mavdallab.com:6443/api/v1/namespaces/openshift-debug-mt82m/pods/ransno1ransnomavdallabcom-debug": dial tcp 10.71.136.144:6443: connect: connection refused {code} Steps to Reproduce: {code:none} Launch a debug pod and the procedure above and it crash the node{code} Actual results: {code:none} Node crash{code} Expected results: {code:none} Node does not crash{code} Additional info: {code:none} We have two vmcore on the associated SFDC ticket. This system use a RT kernel. Using an out of tree ice driver 1.13.7 (probably from 22 dec 2023) [ 103.681608] ice: module unloaded [ 103.830535] ice: loading out-of-tree module taints kernel. [ 103.831106] ice: module verification failed: signature and/or required key missing - tainting kernel [ 103.841005] ice: Intel(R) Ethernet Connection E800 Series Linux Driver - version 1.13.7 [ 103.841017] ice: Copyright (C) 2018-2023 Intel Corporation With the following kernel command line Command line: BOOT_IMAGE=(hd0,gpt3)/ostree/rhcos-f2c287e549b45a742b62e4f748bc2faae6ca907d24bb1e029e4985bc01649033/vmlinuz-4.18.0-372.69.1.rt7.227.el8_6.x86_64 ignition.platform.id=metal ostree=/ostree/boot.1/rhcos/f2c287e549b45a742b62e4f748bc2faae6ca907d24bb1e029e4985bc01649033/0 root=UUID=3e8bda80-5cf4-4c46-b139-4c84cb006354 rw rootflags=prjquota boot=UUID=1d0512c2-3f92-42c5-b26d-709ff9350b81 intel_iommu=on iommu=pt firmware_class.path=/var/lib/firmware skew_tick=1 nohz=on rcu_nocbs=3-31,35-63 tuned.non_isolcpus=00000007,00000007 systemd.cpu_affinity=0,1,2,32,33,34 intel_iommu=on iommu=pt isolcpus=managed_irq,3-31,35-63 nohz_full=3-31,35-63 tsc=nowatchdog nosoftlockup nmi_watchdog=0 mce=off rcutree.kthread_prio=11 default_hugepagesz=1G rcupdate.rcu_normal_after_boot=0 efi=runtime module_blacklist=irdma intel_pstate=passive intel_idle.max_cstate=0 crashkernel=256M vmcore1 show issue with the ice driver crash vmcore tmp/vmlinux KERNEL: tmp/vmlinux 
[TAINTED] DUMPFILE: vmcore [PARTIAL DUMP] CPUS: 64 DATE: Thu Mar 7 17:16:57 CET 2024 UPTIME: 02:44:28 LOAD AVERAGE: 24.97, 25.47, 25.46 TASKS: 5324 NODENAME: aaa.bbb.ccc RELEASE: 4.18.0-372.69.1.rt7.227.el8_6.x86_64 VERSION: #1 SMP PREEMPT_RT Fri Aug 4 00:21:46 EDT 2023 MACHINE: x86_64 (1500 Mhz) MEMORY: 127.3 GB PANIC: "Kernel panic - not syncing:" PID: 693 COMMAND: "khungtaskd" TASK: ff4d1890260d4000 [THREAD_INFO: ff4d1890260d4000] CPU: 0 STATE: TASK_RUNNING (PANIC) crash> ps|grep sos 449071 363440 31 ff4d189005f68000 IN 0.2 506428 314484 sos 451043 363440 63 ff4d188943a9c000 IN 0.2 506428 314484 sos 494099 363440 29 ff4d187f941f4000 UN 0.2 506428 314484 sos 8457.517696] ------------[ cut here ]------------ [ 8457.517698] NETDEV WATCHDOG: ens3f1 (ice): transmit queue 35 timed out [ 8457.517711] WARNING: CPU: 33 PID: 349 at net/sched/sch_generic.c:472 dev_watchdog+0x270/0x300 [ 8457.517718] Modules linked in: binfmt_misc macvlan pci_pf_stub iavf vfio_pci vfio_virqfd vfio_iommu_type1 vfio vhost_net vhost vhost_iotlb tap tun xt_addrtype nf_conntrack_netlink ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_nat xt_CT tcp_diag inet_diag ip6t_MASQUERADE xt_mark ice(OE) xt_conntrack ipt_MASQUERADE nft_counter xt_comment nft_compat veth nft_chain_nat nf_tables overlay bridge 8021q garp mrp stp llc nfnetlink_cttimeout nfnetlink openvswitch nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ext4 mbcache jbd2 intel_rapl_msr iTCO_wdt iTCO_vendor_support dell_smbios wmi_bmof dell_wmi_descriptor dcdbas kvm_intel kvm irqbypass intel_rapl_common i10nm_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp rapl ipmi_ssif intel_cstate intel_uncore dm_thin_pool pcspkr isst_if_mbox_pci dm_persistent_data dm_bio_prison dm_bufio isst_if_mmio isst_if_common mei_me i2c_i801 joydev mei intel_pmt wmi acpi_ipmi ipmi_si acpi_power_meter sctp ip6_udp_tunnel [ 8457.517770] udp_tunnel ip_tables xfs libcrc32c i40e sd_mod t10_pi sg bnxt_re ib_uverbs ib_core 
crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel bnxt_en ahci libahci libata dm_multipath dm_mirror dm_region_hash dm_log dm_mod ipmi_devintf ipmi_msghandler fuse [last unloaded: ice] [ 8457.517784] Red Hat flags: eBPF/rawtrace [ 8457.517787] CPU: 33 PID: 349 Comm: ktimers/33 Kdump: loaded Tainted: G OE --------- - - 4.18.0-372.69.1.rt7.227.el8_6.x86_64 #1 [ 8457.517789] Hardware name: Dell Inc. PowerEdge XR11/0P2RNT, BIOS 1.12.1 09/13/2023 [ 8457.517790] RIP: 0010:dev_watchdog+0x270/0x300 [ 8457.517793] Code: 17 00 e9 f0 fe ff ff 4c 89 e7 c6 05 c6 03 34 01 01 e8 14 43 fa ff 89 d9 4c 89 e6 48 c7 c7 90 37 98 9a 48 89 c2 e8 1d be 88 ff <0f> 0b eb ad 65 8b 05 05 13 fb 65 89 c0 48 0f a3 05 1b ab 36 01 73 [ 8457.517795] RSP: 0018:ff7aeb55c73c7d78 EFLAGS: 00010286 [ 8457.517797] RAX: 0000000000000000 RBX: 0000000000000023 RCX: 0000000000000001 [ 8457.517798] RDX: 0000000000000000 RSI: ffffffff9a908557 RDI: 00000000ffffffff [ 8457.517799] RBP: 0000000000000021 R08: ffffffff9ae6b3a0 R09: 00080000000000ff [ 8457.517800] R10: 000000006443a462 R11: 0000000000000036 R12: ff4d187f4d1f4000 [ 8457.517801] R13: ff4d187f4d20df00 R14: ff4d187f4d1f44a0 R15: 0000000000000080 [ 8457.517803] FS: 0000000000000000(0000) GS:ff4d18967a040000(0000) knlGS:0000000000000000 [ 8457.517804] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 8457.517805] CR2: 00007fc47c649974 CR3: 00000019a441a005 CR4: 0000000000771ea0 [ 8457.517806] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 8457.517807] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 8457.517808] PKRU: 55555554 [ 8457.517810] Call Trace: [ 8457.517813] ? test_ti_thread_flag.constprop.50+0x10/0x10 [ 8457.517816] ? test_ti_thread_flag.constprop.50+0x10/0x10 [ 8457.517818] call_timer_fn+0x32/0x1d0 [ 8457.517822] ? test_ti_thread_flag.constprop.50+0x10/0x10 [ 8457.517825] run_timer_softirq+0x1fc/0x640 [ 8457.517828] ? _raw_spin_unlock_irq+0x1d/0x60 [ 8457.517833] ? 
finish_task_switch+0xea/0x320 [ 8457.517836] ? __switch_to+0x10c/0x4d0 [ 8457.517840] __do_softirq+0xa5/0x33f [ 8457.517844] run_timersd+0x61/0xb0 [ 8457.517848] smpboot_thread_fn+0x1c1/0x2b0 [ 8457.517851] ? smpboot_register_percpu_thread_cpumask+0x140/0x140 [ 8457.517853] kthread+0x151/0x170 [ 8457.517856] ? set_kthread_struct+0x50/0x50 [ 8457.517858] ret_from_fork+0x1f/0x40 [ 8457.517861] ---[ end trace 0000000000000002 ]--- [ 8458.520445] ice 0000:8a:00.1 ens3f1: tx_timeout: VSI_num: 14, Q 35, NTC: 0x99, HW_HEAD: 0x14, NTU: 0x15, INT: 0x0 [ 8458.520451] ice 0000:8a:00.1 ens3f1: tx_timeout recovery level 1, txqueue 35 [ 8506.139246] ice 0000:8a:00.1: PTP reset successful [ 8506.437047] ice 0000:8a:00.1: VSI rebuilt. VSI index 0, type ICE_VSI_PF [ 8506.445482] ice 0000:8a:00.1: VSI rebuilt. VSI index 1, type ICE_VSI_CTRL [ 8540.459707] ice 0000:8a:00.1 ens3f1: tx_timeout: VSI_num: 14, Q 35, NTC: 0xe3, HW_HEAD: 0xe7, NTU: 0xe8, INT: 0x0 [ 8540.459714] ice 0000:8a:00.1 ens3f1: tx_timeout recovery level 1, txqueue 35 [ 8563.891356] ice 0000:8a:00.1: PTP reset successful ~~~ Second vmcore on the same node show issue with the SSD drive $ crash vmcore-2 tmp/vmlinux KERNEL: tmp/vmlinux [TAINTED] DUMPFILE: vmcore-2 [PARTIAL DUMP] CPUS: 64 DATE: Thu Mar 7 14:29:31 CET 2024 UPTIME: 1 days, 07:19:52 LOAD AVERAGE: 25.55, 26.42, 28.30 TASKS: 5409 NODENAME: aaa.bbb.ccc RELEASE: 4.18.0-372.69.1.rt7.227.el8_6.x86_64 VERSION: #1 SMP PREEMPT_RT Fri Aug 4 00:21:46 EDT 2023 MACHINE: x86_64 (1500 Mhz) MEMORY: 127.3 GB PANIC: "Kernel panic - not syncing:" PID: 696 COMMAND: "khungtaskd" TASK: ff2b35ed48d30000 [THREAD_INFO: ff2b35ed48d30000] CPU: 34 STATE: TASK_RUNNING (PANIC) crash> ps |grep sos 719784 718369 62 ff2b35ff00830000 IN 0.4 1215636 563388 sos 721740 718369 61 ff2b3605579f8000 IN 0.4 1215636 563388 sos 721742 718369 63 ff2b35fa5eb9c000 IN 0.4 1215636 563388 sos 721744 718369 30 ff2b3603367fc000 IN 0.4 1215636 563388 sos 721746 718369 29 ff2b360557944000 IN 0.4 1215636 563388 
sos 743356 718369 62 ff2b36042c8e0000 IN 0.4 1215636 563388 sos 743818 718369 29 ff2b35f6186d0000 IN 0.4 1215636 563388 sos 748518 718369 61 ff2b3602cfb84000 IN 0.4 1215636 563388 sos 748884 718369 62 ff2b360713418000 UN 0.4 1215636 563388 sos crash> dmesg [111871.309883] ata3.00: exception Emask 0x0 SAct 0x3ff8 SErr 0x0 action 0x6 frozen [111871.309889] ata3.00: failed command: WRITE FPDMA QUEUED [111871.309891] ata3.00: cmd 61/40:18:28:47:4b/00:00:00:00:00/40 tag 3 ncq dma 32768 out res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout) [111871.309895] ata3.00: status: { DRDY } [111871.309897] ata3.00: failed command: WRITE FPDMA QUEUED [111871.309904] ata3.00: cmd 61/40:20:68:47:4b/00:00:00:00:00/40 tag 4 ncq dma 32768 out res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout) [111871.309908] ata3.00: status: { DRDY } [111871.309909] ata3.00: failed command: WRITE FPDMA QUEUED [111871.309910] ata3.00: cmd 61/40:28:a8:47:4b/00:00:00:00:00/40 tag 5 ncq dma 32768 out res 40/00:ff:00:00:00/00:00:00:00:00/40 Emask 0x4 (timeout) [111871.309913] ata3.00: status: { DRDY } [111871.309914] ata3.00: failed command: WRITE FPDMA QUEUED [111871.309915] ata3.00: cmd 61/40:30:e8:47:4b/00:00:00:00:00/40 tag 6 ncq dma 32768 out res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout) [111871.309918] ata3.00: status: { DRDY } [111871.309919] ata3.00: failed command: WRITE FPDMA QUEUED [111871.309919] ata3.00: cmd 61/70:38:48:37:2b/00:00:1c:00:00/40 tag 7 ncq dma 57344 out res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout) [111871.309922] ata3.00: status: { DRDY } [111871.309923] ata3.00: failed command: WRITE FPDMA QUEUED [111871.309924] ata3.00: cmd 61/20:40:78:29:0c/00:00:19:00:00/40 tag 8 ncq dma 16384 out res 40/00:ff:00:00:00/00:00:00:00:00/40 Emask 0x4 (timeout) [111871.309927] ata3.00: status: { DRDY } [111871.309928] ata3.00: failed command: WRITE FPDMA QUEUED [111871.309929] ata3.00: cmd 61/08:48:08:0c:c0/00:00:1c:00:00/40 tag 9 ncq dma 4096 out res 
40/00:ff:00:00:00/00:00:00:00:00/40 Emask 0x4 (timeout) [111871.309932] ata3.00: status: { DRDY } [111871.309933] ata3.00: failed command: WRITE FPDMA QUEUED [111871.309934] ata3.00: cmd 61/40:50:28:48:4b/00:00:00:00:00/40 tag 10 ncq dma 32768 out res 40/00:01:00:00:00/00:00:00:00:00/40 Emask 0x4 (timeout) [111871.309937] ata3.00: status: { DRDY } [111871.309938] ata3.00: failed command: WRITE FPDMA QUEUED [111871.309939] ata3.00: cmd 61/40:58:68:48:4b/00:00:00:00:00/40 tag 11 ncq dma 32768 out res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout) [111871.309942] ata3.00: status: { DRDY } [111871.309943] ata3.00: failed command: WRITE FPDMA QUEUED [111871.309944] ata3.00: cmd 61/40:60:a8:48:4b/00:00:00:00:00/40 tag 12 ncq dma 32768 out res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout) [111871.309946] ata3.00: status: { DRDY } [111871.309947] ata3.00: failed command: WRITE FPDMA QUEUED [111871.309948] ata3.00: cmd 61/40:68:e8:48:4b/00:00:00:00:00/40 tag 13 ncq dma 32768 out res 40/00:01:00:00:00/00:00:00:00:00/40 Emask 0x4 (timeout) [111871.309951] ata3.00: status: { DRDY } [111871.309953] ata3: hard resetting link ... ... ... [112789.787310] INFO: task sos:748884 blocked for more than 600 seconds. [112789.787314] Tainted: G OE --------- - - 4.18.0-372.69.1.rt7.227.el8_6.x86_64 #1 [112789.787316] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [112789.787316] task:sos state:D stack: 0 pid:748884 ppid:718369 flags:0x00084080 [112789.787320] Call Trace: [112789.787323] __schedule+0x37b/0x8e0 [112789.787330] schedule+0x6c/0x120 [112789.787333] schedule_timeout+0x2b7/0x410 [112789.787336] ? enqueue_entity+0x130/0x790 [112789.787340] wait_for_completion+0x84/0xf0 [112789.787343] flush_work+0x120/0x1d0 [112789.787347] ? 
flush_workqueue_prep_pwqs+0x130/0x130 [112789.787350] schedule_on_each_cpu+0xa7/0xe0 [112789.787353] vmstat_refresh+0x22/0xa0 [112789.787357] proc_sys_call_handler+0x174/0x1d0 [112789.787361] vfs_read+0x91/0x150 [112789.787364] ksys_read+0x52/0xc0 [112789.787366] do_syscall_64+0x87/0x1b0 [112789.787369] entry_SYSCALL_64_after_hwframe+0x61/0xc6 [112789.787372] RIP: 0033:0x7f2dca8c2ab4 [112789.787378] Code: Unable to access opcode bytes at RIP 0x7f2dca8c2a8a. [112789.787378] RSP: 002b:00007f2dbbffc5e0 EFLAGS: 00000246 ORIG_RAX: 0000000000000000 [112789.787380] RAX: ffffffffffffffda RBX: 0000000000000008 RCX: 00007f2dca8c2ab4 [112789.787382] RDX: 0000000000004000 RSI: 00007f2db402b5a0 RDI: 0000000000000008 [112789.787383] RBP: 00007f2db402b5a0 R08: 0000000000000000 R09: 00007f2dcace27bb [112789.787383] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000004000 [112789.787384] R13: 0000000000000008 R14: 00007f2db402b5a0 R15: 00007f2da4001a90 [112789.787418] NMI backtrace for cpu 34 {code} Status: CLOSED | |||
#OCPBUGS-31467 | issue | 3 days ago | az.EnsureHostInPool panic when Azure VM instance not found ASSIGNED |
Issue 15905234: az.EnsureHostInPool panic when Azure VM instance not found Description: Description of problem: {code:none} on Azure, when kube-controller-manager verify whether a machine exists or not, if the machine was already deleted, the code may panic with sigsegv I0320 12:02:55.806321 1 azure_backoff.go:91] GetVirtualMachineWithRetry(worker-e32ads-westeurope2-f72dr): backoff success I0320 12:02:56.028287 1 azure_wrap.go:201] Virtual machine "worker-e16as-westeurope1-hpz2t" is under deleting I0320 12:02:56.028328 1 azure_standard.go:752] GetPrimaryInterface(worker-e16as-westeurope1-hpz2t, ) abort backoff E0320 12:02:56.028334 1 azure_standard.go:825] error: az.EnsureHostInPool(worker-e16as-westeurope1-hpz2t), az.VMSet.GetPrimaryInterface.Get(worker-e16as-westeurope1-hpz2t, ), err=instance not found panic: runtime error: invalid memory address or nil pointer dereference [signal SIGSEGV: segmentation violation code=0x1 addr=0x60 pc=0x33d21f6]goroutine 240642 [running]: k8s.io/legacy-cloud-providers/azure.(*availabilitySet).EnsureHostInPool(0xc000016580, 0xc0262fb400, {0xc02d8a5080, 0x32}, {0xc021c1bc70, 0xc4}, {0x0, 0x0}, 0xa8?) vendor/k8s.io/legacy-cloud-providers/azure/azure_standard.go:831 +0x4f6 k8s.io/legacy-cloud-providers/azure.(*availabilitySet).EnsureHostsInPool.func2() vendor/k8s.io/legacy-cloud-providers/azure/azure_standard.go:928 +0x5f k8s.io/apimachinery/pkg/util/errors.AggregateGoroutines.func1(0xc0159d0788?) {code} Version-Release number of selected component (if applicable): {code:none} 4.12.48 {code} (ships [https://github.com/openshift/kubernetes/commit/6df21776c7879727ab53895df8a03e53fb725d74]) issue introduced by [https://github.com/kubernetes/kubernetes/pull/111428/files#diff-0414c3aba906b2c0cdb2f09da32bd45c6bf1df71cbb2fc55950743c99a4a5fe4] How reproducible: {code:java} was unable to reproduce, happens occasionally {code} Steps to Reproduce: {code:none} 1. 2. 3. 
{code} Actual results: {code:none} panic{code} Expected results: {code:none} no panic{code} Additional info: {code:none} internal case 03772590{code} Status: ASSIGNED | |||
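The SIGSEGV above comes from using the primary NIC after `GetPrimaryInterface` has already failed with "instance not found". A minimal sketch of the guard that avoids it, with `networkInterface`, `errInstanceNotFound`, and both lookup functions as illustrative stand-ins for the real `k8s.io/legacy-cloud-providers/azure` types (not the actual fix shipped upstream):

```go
package main

import (
	"errors"
	"fmt"
)

// networkInterface and errInstanceNotFound are minimal stand-ins for
// the Azure NIC type and "instance not found" error in
// k8s.io/legacy-cloud-providers/azure (names are illustrative).
type networkInterface struct{ ID string }

var errInstanceNotFound = errors.New("instance not found")

// getPrimaryInterface models az.VMSet.GetPrimaryInterface for a node
// whose VM is mid-deletion: it returns a nil NIC plus an error.
func getPrimaryInterface(node string) (*networkInterface, error) {
	if node == "worker-e16as-westeurope1-hpz2t" {
		return nil, errInstanceNotFound
	}
	return &networkInterface{ID: "nic-" + node}, nil
}

// ensureHostInPool shows the guard that avoids the SIGSEGV: treat
// "instance not found" as a no-op, and never touch the NIC on an
// error path, where it is nil.
func ensureHostInPool(node string) (string, error) {
	nic, err := getPrimaryInterface(node)
	if errors.Is(err, errInstanceNotFound) {
		return "", nil // machine already deleted: nothing to add to the pool
	}
	if err != nil {
		return "", err
	}
	return nic.ID, nil // nic is guaranteed non-nil here
}

func main() {
	fmt.Println(ensureHostInPool("worker-e32ads-westeurope2-f72dr"))
	fmt.Println(ensureHostInPool("worker-e16as-westeurope1-hpz2t"))
}
```

The key property is that the nil pointer is only reachable on the error path, so checking the error first makes the dereference safe.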
#OCPBUGS-20173 | issue | 2 days ago | The console handler panics on baremetal 4.14.0-rc.0 ipv6 sno cluster Verified |
Issue 15542650: The console handler panics on baremetal 4.14.0-rc.0 ipv6 sno cluster Description: This is a clone of issue OCPBUGS-19367. The following is the description of the original issue: --- Description of problem: baremetal 4.14.0-rc.0 ipv6 sno cluster, login as admin user to admin console, there is not Observe menu on the left navigation bar, see picture, [https://drive.google.com/file/d/13RAXPxtKhAElN9xf8bAmLJa0GI8pP0fH/view?usp=sharing,] monitoring-plugin status is Failed, see: [https://drive.google.com/file/d/1YsSaGdLT4bMn-6E-WyFWbOpwvDY4t6na/view?usp=sharing,] error is {code:java} Failed to get a valid plugin manifest from /api/plugins/monitoring-plugin/ r: Bad Gateway {code} checked console logs, 9443: connect: connection refused {code:java} $ oc -n openshift-console logs console-6869f8f4f4-56mbj ... E0915 12:50:15.498589 1 handlers.go:164] GET request for "monitoring-plugin" plugin failed: Get "https://monitoring-plugin.openshift-monitoring.svc.cluster.local:9443/plugin-manifest.json": dial tcp [fd02::f735]:9443: connect: connection refused 2023/09/15 12:50:15 http: panic serving [fd01:0:0:1::2]:39156: runtime error: invalid memory address or nil pointer dereference goroutine 183760 [running]: net/http.(*conn).serve.func1() /usr/lib/golang/src/net/http/server.go:1854 +0xbf panic({0x3259140, 0x4fcc150}) /usr/lib/golang/src/runtime/panic.go:890 +0x263 github.com/openshift/console/pkg/plugins.(*PluginsHandler).proxyPluginRequest(0xc0003b5760, 0x2?, {0xc0009bc7d1, 0x11}, {0x3a41fa0, 0xc0002f6c40}, 0xb?) /go/src/github.com/openshift/console/pkg/plugins/handlers.go:165 +0x582 github.com/openshift/console/pkg/plugins.(*PluginsHandler).HandlePluginAssets(0xaa00000000000010?, {0x3a41fa0, 0xc0002f6c40}, 0xc0001f7500) /go/src/github.com/openshift/console/pkg/plugins/handlers.go:147 +0x26d github.com/openshift/console/pkg/server.(*Server).HTTPHandler.func23({0x3a41fa0?, 0xc0002f6c40?}, 0x7?) 
/go/src/github.com/openshift/console/pkg/server/server.go:604 +0x33 net/http.HandlerFunc.ServeHTTP(...) /usr/lib/golang/src/net/http/server.go:2122 github.com/openshift/console/pkg/server.authMiddleware.func1(0xc0001f7500?, {0x3a41fa0?, 0xc0002f6c40?}, 0xd?) /go/src/github.com/openshift/console/pkg/server/middleware.go:25 +0x31 github.com/openshift/console/pkg/server.authMiddlewareWithUser.func1({0x3a41fa0, 0xc0002f6c40}, 0xc0001f7500) /go/src/github.com/openshift/console/pkg/server/middleware.go:81 +0x46c net/http.HandlerFunc.ServeHTTP(0x5120938?, {0x3a41fa0?, 0xc0002f6c40?}, 0x7ffb6ea27f18?) /usr/lib/golang/src/net/http/server.go:2122 +0x2f net/http.StripPrefix.func1({0x3a41fa0, 0xc0002f6c40}, 0xc0001f7400) /usr/lib/golang/src/net/http/server.go:2165 +0x332 net/http.HandlerFunc.ServeHTTP(0xc001102c00?, {0x3a41fa0?, 0xc0002f6c40?}, 0xc000655a00?) /usr/lib/golang/src/net/http/server.go:2122 +0x2f net/http.(*ServeMux).ServeHTTP(0x34025e0?, {0x3a41fa0, 0xc0002f6c40}, 0xc0001f7400) /usr/lib/golang/src/net/http/server.go:2500 +0x149 github.com/openshift/console/pkg/server.securityHeadersMiddleware.func1({0x3a41fa0, 0xc0002f6c40}, 0x3305040?) /go/src/github.com/openshift/console/pkg/server/middleware.go:128 +0x3af net/http.HandlerFunc.ServeHTTP(0x0?, {0x3a41fa0?, 0xc0002f6c40?}, 0x11db52e?) /usr/lib/golang/src/net/http/server.go:2122 +0x2f net/http.serverHandler.ServeHTTP({0xc0008201e0?}, {0x3a41fa0, 0xc0002f6c40}, 0xc0001f7400) /usr/lib/golang/src/net/http/server.go:2936 +0x316 net/http.(*conn).serve(0xc0009b4120, {0x3a43e70, 0xc001223500}) /usr/lib/golang/src/net/http/server.go:1995 +0x612 created by net/http.(*Server).Serve /usr/lib/golang/src/net/http/server.go:3089 +0x5ed I0915 12:50:24.267777 1 handlers.go:118] User settings ConfigMap "user-settings-4b4c2f4d-159c-4358-bba3-3d87f113cd9b" already exist, will return existing data. 
I0915 12:50:24.267813 1 handlers.go:118] User settings ConfigMap "user-settings-4b4c2f4d-159c-4358-bba3-3d87f113cd9b" already exist, will return existing data. E0915 12:50:30.155515 1 handlers.go:164] GET request for "monitoring-plugin" plugin failed: Get "https://monitoring-plugin.openshift-monitoring.svc.cluster.local:9443/plugin-manifest.json": dial tcp [fd02::f735]:9443: connect: connection refused 2023/09/15 12:50:30 http: panic serving [fd01:0:0:1::2]:42990: runtime error: invalid memory address or nil pointer dereference {code} 9443 port is Connection refused {code:java} $ oc -n openshift-monitoring get pod -o wide NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES alertmanager-main-0 6/6 Running 6 3d22h fd01:0:0:1::564 sno-2 <none> <none> cluster-monitoring-operator-6cb777d488-nnpmx 1/1 Running 4 7d16h fd01:0:0:1::12 sno-2 <none> <none> kube-state-metrics-dc5f769bc-p97m7 3/3 Running 12 7d16h fd01:0:0:1::3b sno-2 <none> <none> monitoring-plugin-85bfb98485-d4g5x 1/1 Running 4 7d16h fd01:0:0:1::55 sno-2 <none> <none> node-exporter-ndnnj 2/2 Running 8 7d16h 2620:52:0:165::41 sno-2 <none> <none> openshift-state-metrics-78df59b4d5-j6r5s 3/3 Running 12 7d16h fd01:0:0:1::3a sno-2 <none> <none> prometheus-adapter-6f86f7d8f5-ttflf 1/1 Running 0 4h23m fd01:0:0:1::b10c sno-2 <none> <none> prometheus-k8s-0 6/6 Running 6 3d22h fd01:0:0:1::566 sno-2 <none> <none> prometheus-operator-7c94855989-csts2 2/2 Running 8 7d16h fd01:0:0:1::39 sno-2 <none> <none> prometheus-operator-admission-webhook-7bb64b88cd-bvq8m 1/1 Running 4 7d16h fd01:0:0:1::37 sno-2 <none> <none> thanos-querier-5bbb764599-vlztq 6/6 Running 6 3d22h fd01:0:0:1::56a sno-2 <none> <none> $ oc -n openshift-monitoring get svc monitoring-plugin NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE monitoring-plugin ClusterIP fd02::f735 <none> 9443/TCP 7d16h $ oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -v 
'https://monitoring-plugin.openshift-monitoring.svc.cluster.local:9443/plugin-manifest.json' | jq * Trying fd02::f735... * TCP_NODELAY set * connect to fd02::f735 port 9443 failed: Connection refused * Failed to connect to monitoring-plugin.openshift-monitoring.svc.cluster.local port 9443: Connection refused * Closing connection 0 curl: (7) Failed to connect to monitoring-plugin.openshift-monitoring.svc.cluster.local port 9443: Connection refused command terminated with exit code 7 {code} no such issue in other 4.14.0-rc.0 ipv4 cluster, but issue reproduced on other 4.14.0-rc.0 ipv6 cluster. 4.14.0-rc.0 ipv4 cluster, {code:none} $ oc get clusterversion NAME VERSION AVAILABLE PROGRESSING SINCE STATUS version 4.14.0-rc.0 True False 20m Cluster version is 4.14.0-rc.0 $ oc -n openshift-monitoring get pod -o wide | grep monitoring-plugin monitoring-plugin-85bfb98485-nh428 1/1 Running 0 4m 10.128.0.107 ci-ln-pby4bj2-72292-l5q8v-master-0 <none> <none> $ oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k 'https://monitoring-plugin.openshift-monitoring.svc.cluster.local:9443/plugin-manifest.json' | jq ... 
{ "name": "monitoring-plugin", "version": "1.0.0", "displayName": "OpenShift console monitoring plugin", "description": "This plugin adds the monitoring UI to the OpenShift web console", "dependencies": { "@console/pluginAPI": "*" }, "extensions": [ { "type": "console.page/route", "properties": { "exact": true, "path": "/monitoring", "component": { "$codeRef": "MonitoringUI" } } }, ...{code} meet issue "9443: Connection refused" in 4.14.0-rc.0 ipv6 cluster(launched cluster-bot cluster: launch 4.14.0-rc.0 metal,ipv6) and login console {code:java} $ oc get clusterversion NAME VERSION AVAILABLE PROGRESSING SINCE STATUS version 4.14.0-rc.0 True False 44m Cluster version is 4.14.0-rc.0 $ oc -n openshift-monitoring get pod -o wide | grep monitoring-plugin monitoring-plugin-bd6ffdb5d-b5csk 1/1 Running 0 53m fd01:0:0:4::b worker-0.ostest.test.metalkube.org <none> <none> monitoring-plugin-bd6ffdb5d-vhtpf 1/1 Running 0 53m fd01:0:0:5::9 worker-2.ostest.test.metalkube.org <none> <none> $ oc -n openshift-monitoring get svc monitoring-plugin NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE monitoring-plugin ClusterIP fd02::402d <none> 9443/TCP 59m $ oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -v 'https://monitoring-plugin.openshift-monitoring.svc.cluster.local:9443/plugin-manifest.json' | jq * Trying fd02::402d... * TCP_NODELAY set * connect to fd02::402d port 9443 failed: Connection refused * Failed to connect to monitoring-plugin.openshift-monitoring.svc.cluster.local port 9443: Connection refused * Closing connection 0 curl: (7) Failed to connect to monitoring-plugin.openshift-monitoring.svc.cluster.local port 9443: Connection refused command terminated with exit code 7$ oc -n openshift-console get pod | grep console console-5cffbc7964-7ljft 1/1 Running 0 56m console-5cffbc7964-d864q 1/1 Running 0 56m$ oc -n openshift-console logs console-5cffbc7964-7ljft ... 
E0916 14:34:16.330117 1 handlers.go:164] GET request for "monitoring-plugin" plugin failed: Get "https://monitoring-plugin.openshift-monitoring.svc.cluster.local:9443/plugin-manifest.json": dial tcp [fd02::402d]:9443: connect: connection refused 2023/09/16 14:34:16 http: panic serving [fd01:0:0:4::2]:37680: runtime error: invalid memory address or nil pointer dereference goroutine 3985 [running]: net/http.(*conn).serve.func1() /usr/lib/golang/src/net/http/server.go:1854 +0xbf panic({0x3259140, 0x4fcc150}) /usr/lib/golang/src/runtime/panic.go:890 +0x263 github.com/openshift/console/pkg/plugins.(*PluginsHandler).proxyPluginRequest(0xc0008f6780, 0x2?, {0xc000665211, 0x11}, {0x3a41fa0, 0xc0009221c0}, 0xb?) /go/src/github.com/openshift/console/pkg/plugins/handlers.go:165 +0x582 github.com/openshift/console/pkg/plugins.(*PluginsHandler).HandlePluginAssets(0xfe00000000000010?, {0x3a41fa0, 0xc0009221c0}, 0xc000d8d600) /go/src/github.com/openshift/console/pkg/plugins/handlers.go:147 +0x26d github.com/openshift/console/pkg/server.(*Server).HTTPHandler.func23({0x3a41fa0?, 0xc0009221c0?}, 0x7?) /go/src/github.com/openshift/console/pkg/server/server.go:604 +0x33 net/http.HandlerFunc.ServeHTTP(...) /usr/lib/golang/src/net/http/server.go:2122 github.com/openshift/console/pkg/server.authMiddleware.func1(0xc000d8d600?, {0x3a41fa0?, 0xc0009221c0?}, 0xd?) /go/src/github.com/openshift/console/pkg/server/middleware.go:25 +0x31 github.com/openshift/console/pkg/server.authMiddlewareWithUser.func1({0x3a41fa0, 0xc0009221c0}, 0xc000d8d600) /go/src/github.com/openshift/console/pkg/server/middleware.go:81 +0x46c net/http.HandlerFunc.ServeHTTP(0xc000653830?, {0x3a41fa0?, 0xc0009221c0?}, 0x7f824506bf18?) /usr/lib/golang/src/net/http/server.go:2122 +0x2f net/http.StripPrefix.func1({0x3a41fa0, 0xc0009221c0}, 0xc000d8d500) /usr/lib/golang/src/net/http/server.go:2165 +0x332 net/http.HandlerFunc.ServeHTTP(0xc00007e800?, {0x3a41fa0?, 0xc0009221c0?}, 0xc000b2da00?) 
/usr/lib/golang/src/net/http/server.go:2122 +0x2f net/http.(*ServeMux).ServeHTTP(0x34025e0?, {0x3a41fa0, 0xc0009221c0}, 0xc000d8d500) /usr/lib/golang/src/net/http/server.go:2500 +0x149 github.com/openshift/console/pkg/server.securityHeadersMiddleware.func1({0x3a41fa0, 0xc0009221c0}, 0x3305040?) /go/src/github.com/openshift/console/pkg/server/middleware.go:128 +0x3af net/http.HandlerFunc.ServeHTTP(0x0?, {0x3a41fa0?, 0xc0009221c0?}, 0x11db52e?) /usr/lib/golang/src/net/http/server.go:2122 +0x2f net/http.serverHandler.ServeHTTP({0xc000db9b00?}, {0x3a41fa0, 0xc0009221c0}, 0xc000d8d500) /usr/lib/golang/src/net/http/server.go:2936 +0x316 net/http.(*conn).serve(0xc000653680, {0x3a43e70, 0xc000676f30}) /usr/lib/golang/src/net/http/server.go:1995 +0x612 created by net/http.(*Server).Serve /usr/lib/golang/src/net/http/server.go:3089 +0x5ed {code} Version-Release number of selected component (if applicable): {code:none} baremetal 4.14.0-rc.0 ipv6 sno cluster, $ token=`oc create token prometheus-k8s -n openshift-monitoring` $ $ oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://thanos-querier.openshift-monitoring.svc:9091/api/v1/query?' 
--data-urlencode 'query=virt_platform' | jq { "status": "success", "data": { "resultType": "vector", "result": [ { "metric": { "__name__": "virt_platform", "baseboard_manufacturer": "Dell Inc.", "baseboard_product_name": "01J4WF", "bios_vendor": "Dell Inc.", "bios_version": "1.10.2", "container": "kube-rbac-proxy", "endpoint": "https", "instance": "sno-2", "job": "node-exporter", "namespace": "openshift-monitoring", "pod": "node-exporter-ndnnj", "prometheus": "openshift-monitoring/k8s", "service": "node-exporter", "system_manufacturer": "Dell Inc.", "system_product_name": "PowerEdge R750", "system_version": "Not Specified", "type": "none" }, "value": [ 1694785092.664, "1" ] } ] } }{code} How reproducible: {code:none} only seen on this cluster{code} Steps to Reproduce: {code:none} 1. see the description 2. 3. {code} Actual results: {code:none} no Observe menu on admin console, monitoring-plugin is failed{code} Expected results: {code:none} no error{code} Status: Verified | |||
#OCPBUGS-32450 | issue | 2 days ago | Azure upgrades to 4.14.15+ fail with UPI storage account CLOSED |
Issue 15953361: Azure upgrades to 4.14.15+ fail with UPI storage account Description: This is a clone of issue OCPBUGS-32396. The following is the description of the original issue: --- This is a clone of issue OCPBUGS-32328. The following is the description of the original issue: --- Description of problem: {code:none} Cluster with user provisioned image registry storage accounts fails to upgrade to 4.14.20 due to image-registry-operator being degraded. message: "Progressing: The registry is ready\nNodeCADaemonProgressing: The daemon set node-ca is deployed\nAzurePathFixProgressing: Migration failed: panic: AZURE_CLIENT_ID is required for authentication\nAzurePathFixProgressing: \nAzurePathFixProgressing: goroutine 1 [running]:\nAzurePathFixProgressing: main.main()\nAzurePathFixProgressing: \t/go/src/github.com/openshift/cluster-image-registry-operator/cmd/move-blobs/main.go:25 +0x15c\nAzurePathFixProgressing: " cmd/move-blobs was introduced due to https://issues.redhat.com/browse/OCPBUGS-29003. {code} Version-Release number of selected component (if applicable): {code:none} 4.14.15+{code} How reproducible: {code:none} I have not reproduced this myself, but you would likely hit this every time when upgrading from 4.13->4.14.15+ with Azure UPI image registry{code} Steps to Reproduce: {code:none} 1. Starting on version 4.13, configure the registry for Azure user-provisioned infrastructure - https://docs.openshift.com/container-platform/4.14/registry/configuring_registry_storage/configuring-registry-storage-azure-user-infrastructure.html. 2. Upgrade to 4.14.15+ 3. {code} Actual results: {code:none} Upgrade does not complete successfully $ oc get co .... image-registry 4.14.20 True False True 617d AzurePathFixControllerDegraded: Migration failed: panic: AZURE_CLIENT_ID is required for authentication... 
$ oc get clusterversion NAME VERSION AVAILABLE PROGRESSING SINCE STATUS version 4.13.38 True True 7h41m Unable to apply 4.14.20: wait has exceeded 40 minutes for these operators: image-registry{code} Expected results: {code:none} Upgrade to complete successfully{code} Additional info: {code:none} {code} Status: CLOSED | |||
#OCPBUGS-29233 | issue | 2 days ago | Internal Registry does not recognize the `ca-west-1` AWS Region Verified |
Issue 15792537: Internal Registry does not recognize the `ca-west-1` AWS Region Description: Description of problem: {code:none} Internal registry Pods will panic while deploying OCP on `ca-west-1` AWS Region{code} Version-Release number of selected component (if applicable): {code:none} 4.14.2 {code} How reproducible: {code:none} Every time {code} Steps to Reproduce: {code:none} 1. Deploy OCP on `ca-west-1` AWS Region {code} Actual results: {code:none} $ oc logs image-registry-85b69cd9fc-b78sb -n openshift-image-registry time="2024-02-08T11:43:09.287006584Z" level=info msg="start registry" distribution_version=v3.0.0+unknown go.version="go1.20.10 X:strictfipsruntime" openshift_version=4.14.0-202311021650.p0.g5e7788a.assembly.stream-5e7788a time="2024-02-08T11:43:09.287365337Z" level=info msg="caching project quota objects with TTL 1m0s" go.version="go1.20.10 X:strictfipsruntime" panic: invalid region provided: ca-west-1goroutine 1 [running]: github.com/distribution/distribution/v3/registry/handlers.NewApp({0x2873f40?, 0xc00005c088?}, 0xc000581800) /go/src/github.com/openshift/image-registry/vendor/github.com/distribution/distribution/v3/registry/handlers/app.go:130 +0x2bf1 github.com/openshift/image-registry/pkg/dockerregistry/server/supermiddleware.NewApp({0x2873f40, 0xc00005c088}, 0x0?, {0x2876820?, 0xc000676cf0}) /go/src/github.com/openshift/image-registry/pkg/dockerregistry/server/supermiddleware/app.go:96 +0xb9 github.com/openshift/image-registry/pkg/dockerregistry/server.NewApp({0x2873f40?, 0xc00005c088}, {0x285ffd0?, 0xc000916070}, 0xc000581800, 0xc00095c000, {0x0?, 0x0}) /go/src/github.com/openshift/image-registry/pkg/dockerregistry/server/app.go:138 +0x485 github.com/openshift/image-registry/pkg/cmd/dockerregistry.NewServer({0x2873f40, 0xc00005c088}, 0xc000581800, 0xc00095c000) /go/src/github.com/openshift/image-registry/pkg/cmd/dockerregistry/dockerregistry.go:212 +0x38a github.com/openshift/image-registry/pkg/cmd/dockerregistry.Execute({0x2858b60, 
0xc000916000}) /go/src/github.com/openshift/image-registry/pkg/cmd/dockerregistry/dockerregistry.go:166 +0x86b main.main() /go/src/github.com/openshift/image-registry/cmd/dockerregistry/main.go:93 +0x496 {code} Expected results: {code:none} The internal registry is deployed with no issues {code} Additional info: {code:none} This is a new AWS Region we are adding support to. The support will be backported to 4.14.z {code} Status: Verified | |||
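The `panic: invalid region provided: ca-west-1` comes from the vendored distribution S3 driver rejecting a region missing from its compiled-in table. A hedged sketch of the pattern, where `knownRegions` is a hypothetical stand-in for that table (the real fix is updating the vendored region list and backporting it to 4.14.z, as the issue notes):

```go
package main

import "fmt"

// knownRegions stands in for the region table the vendored S3
// storage driver is compiled with; ca-west-1 is absent from older
// vendor drops, which is what triggers the startup panic.
var knownRegions = map[string]bool{
	"us-east-1":    true,
	"ca-central-1": true,
	// "ca-west-1" missing until the vendored AWS region data is updated
}

// validateRegion returns an error instead of panicking, so a registry
// Pod in an unrecognized region fails with a readable message rather
// than a goroutine dump.
func validateRegion(region string) error {
	if !knownRegions[region] {
		return fmt.Errorf("invalid region provided: %s", region)
	}
	return nil
}

func main() {
	fmt.Println(validateRegion("ca-central-1"))
	fmt.Println(validateRegion("ca-west-1"))
}
```

Even with the region list updated, converting the panic into an error keeps the Pod's failure mode diagnosable the next time AWS adds a region.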
#OCPBUGS-32519 | issue | 31 hours ago | Agent appliance installs are broken ON_QA |
level=debug msg=Loading Agent Hosts... panic: interface conversion: asset.Asset is nil, not *agentconfig.AgentHosts goroutine 1 [running]: | |||
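The `interface conversion: asset.Asset is nil, not *agentconfig.AgentHosts` panic is the classic bare type assertion on a nil interface. A minimal sketch of the comma-ok form that avoids it; `asset`, `agentHosts`, and `loadHosts` are illustrative stand-ins for the installer's `asset.Asset` interface and `*agentconfig.AgentHosts`:

```go
package main

import "fmt"

// asset and agentHosts are minimal stand-ins for the installer's
// asset.Asset interface and *agentconfig.AgentHosts type.
type asset interface{ Name() string }

type agentHosts struct{}

func (*agentHosts) Name() string { return "Agent Hosts" }

// loadHosts uses the comma-ok assertion the panicking code path
// lacks: a bare a.(*agentHosts) on a nil asset panics with
// "interface conversion: asset is nil", while the comma-ok form
// lets the caller report a normal error.
func loadHosts(a asset) (*agentHosts, error) {
	h, ok := a.(*agentHosts)
	if !ok {
		return nil, fmt.Errorf("expected *agentHosts, got %T", a)
	}
	return h, nil
}

func main() {
	if _, err := loadHosts(nil); err != nil {
		fmt.Println("handled:", err)
	}
}
```

In the appliance flow the asset is nil because it was never generated for that install path, so the deeper fix is registering or fetching the dependency, with the comma-ok check as the safety net.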
#OCPBUGS-33129 | issue | 4 days ago | Panic when we remove an OCB infra MCP and we try to create new ones with different names New |
Issue 15976180: Panic when we remove an OCB infra MCP and we try to create new ones with different names Description: Description of problem: {code:none} Given that we create a new pool, and we enable OCB in this pool, and we remove the pool and the MachineOSConfig resource, and we create another new pool to enable OCB again, then the controller pod panics. {code} Version-Release number of selected component (if applicable): {code:none} pre-merge https://github.com/openshift/machine-config-operator/pull/4327 {code} How reproducible: {code:none} Always {code} Steps to Reproduce: {code:none} 1. Create a new infra MCP apiVersion: machineconfiguration.openshift.io/v1 kind: MachineConfigPool metadata: name: infra spec: machineConfigSelector: matchExpressions: - {key: machineconfiguration.openshift.io/role, operator: In, values: [worker,infra]} nodeSelector: matchLabels: node-role.kubernetes.io/infra: "" 2. Create a MachineOSConfig for infra pool oc create -f - << EOF apiVersion: machineconfiguration.openshift.io/v1alpha1 kind: MachineOSConfig metadata: name: infra spec: machineConfigPool: name: infra buildInputs: imageBuilder: imageBuilderType: PodImageBuilder baseImagePullSecret: name: $(oc get secret -n openshift-config pull-secret -o json | jq "del(.metadata.namespace, .metadata.creationTimestamp, .metadata.resourceVersion, .metadata.uid, .metadata.name)" | jq '.metadata.name="pull-copy"' | oc -n openshift-machine-config-operator create -f - &> /dev/null; echo -n "pull-copy") renderedImagePushSecret: name: $(oc get -n openshift-machine-config-operator sa builder -ojsonpath='{.secrets[0].name}') renderedImagePushspec: "image-registry.openshift-image-registry.svc:5000/openshift-machine-config-operator/ocb-image:latest" EOF 3. When the build is finished, remove the MachineOSConfig and the pool oc delete machineosconfig infra oc delete mcp infra 4. 
Create a new infra1 pool apiVersion: machineconfiguration.openshift.io/v1 kind: MachineConfigPool metadata: name: infra1 spec: machineConfigSelector: matchExpressions: - {key: machineconfiguration.openshift.io/role, operator: In, values: [worker,infra1]} nodeSelector: matchLabels: node-role.kubernetes.io/infra1: "" 5. Create a new machineosconfig for infra1 pool oc create -f - << EOF apiVersion: machineconfiguration.openshift.io/v1alpha1 kind: MachineOSConfig metadata: name: infra1 spec: machineConfigPool: name: infra1 buildInputs: imageBuilder: imageBuilderType: PodImageBuilder baseImagePullSecret: name: $(oc get secret -n openshift-config pull-secret -o json | jq "del(.metadata.namespace, .metadata.creationTimestamp, .metadata.resourceVersion, .metadata.uid, .metadata.name)" | jq '.metadata.name="pull-copy"' | oc -n openshift-machine-config-operator create -f - &> /dev/null; echo -n "pull-copy") renderedImagePushSecret: name: $(oc get -n openshift-machine-config-operator sa builder -ojsonpath='{.secrets[0].name}') renderedImagePushspec: "image-registry.openshift-image-registry.svc:5000/openshift-machine-config-operator/ocb-image:latest" containerFile: - containerfileArch: noarch content: |- RUN echo 'test image' > /etc/test-image.file EOF {code} Actual results: {code:none} The MCO controller pod panics (in updateMachineOSBuild): E0430 11:21:03.779078 1 runtime.go:79] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference) goroutine 265 [running]: k8s.io/apimachinery/pkg/util/runtime.logPanic({0x3547bc0?, 0x53ebb20}) /go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:75 +0x85 k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0xc00035e000?}) /go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:49 +0x6b panic({0x3547bc0?, 0x53ebb20?}) 
/usr/lib/golang/src/runtime/panic.go:914 +0x21f github.com/openshift/api/machineconfiguration/v1.(*MachineConfigPool).GetNamespace(0x53f6200?) <autogenerated>:1 +0x9 k8s.io/client-go/tools/cache.MetaObjectToName({0x3e2a8f8, 0x0}) /go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache/store.go:131 +0x25 k8s.io/client-go/tools/cache.ObjectToName({0x3902740?, 0x0?}) /go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache/store.go:126 +0x74 k8s.io/client-go/tools/cache.MetaNamespaceKeyFunc({0x3902740?, 0x0?}) /go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache/store.go:112 +0x3e k8s.io/client-go/tools/cache.DeletionHandlingMetaNamespaceKeyFunc({0x3902740?, 0x0?}) /go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache/controller.go:336 +0x3b github.com/openshift/machine-config-operator/pkg/controller/node.(*Controller).enqueueAfter(0xc0007097a0, 0x0, 0x0?) /go/src/github.com/openshift/machine-config-operator/pkg/controller/node/node_controller.go:761 +0x33 github.com/openshift/machine-config-operator/pkg/controller/node.(*Controller).enqueueDefault(...) /go/src/github.com/openshift/machine-config-operator/pkg/controller/node/node_controller.go:772 github.com/openshift/machine-config-operator/pkg/controller/node.(*Controller).updateMachineOSBuild(0xc0007097a0, {0xc001c37800?, 0xc000029678?}, {0x3904000?, 0xc0028361a0}) /go/src/github.com/openshift/machine-config-operator/pkg/controller/node/node_controller.go:395 +0xd1 k8s.io/client-go/tools/cache.ResourceEventHandlerFuncs.OnUpdate(...) /go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache/controller.go:246 k8s.io/client-go/tools/cache.(*processorListener).run.func1() /go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache/shared_informer.go:970 +0xea k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x30?) 
/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:226 +0x33 k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc0005e5738?, {0x3de6020, 0xc0008fe780}, 0x1, 0xc0000ac720) /go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:227 +0xaf k8s.io/apimachinery/pkg/util/wait.JitterUntil(0x6974616761706f72?, 0x3b9aca00, 0x0, 0x69?, 0xc0005e5788?) /go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:204 +0x7f k8s.io/apimachinery/pkg/util/wait.Until(...) /go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:161 k8s.io/client-go/tools/cache.(*processorListener).run(0xc000b97c20) /go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache/shared_informer.go:966 +0x69 k8s.io/apimachinery/pkg/util/wait.(*Group).Start.func1() /go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:72 +0x4f created by k8s.io/apimachinery/pkg/util/wait.(*Group).Start in goroutine 248 /go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:70 +0x73 panic: runtime error: invalid memory address or nil pointer dereference [recovered] panic: runtime error: invalid memory address or nil pointer dereference [signal SIGSEGV: segmentation violation code=0x1 addr=0x40 pc=0x210a6e9] When the controller pod is restarted, it panics again, but in a different function (addMachineOSBuild): E0430 11:26:54.753689 1 runtime.go:79] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference) goroutine 97 [running]: k8s.io/apimachinery/pkg/util/runtime.logPanic({0x3547bc0?, 0x53ebb20}) /go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:75 +0x85 
k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0x15555555aa?}) /go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:49 +0x6b panic({0x3547bc0?, 0x53ebb20?}) /usr/lib/golang/src/runtime/panic.go:914 +0x21f github.com/openshift/api/machineconfiguration/v1.(*MachineConfigPool).GetNamespace(0x53f6200?) <autogenerated>:1 +0x9 k8s.io/client-go/tools/cache.MetaObjectToName({0x3e2a8f8, 0x0}) /go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache/store.go:131 +0x25 k8s.io/client-go/tools/cache.ObjectToName({0x3902740?, 0x0?}) /go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache/store.go:126 +0x74 k8s.io/client-go/tools/cache.MetaNamespaceKeyFunc({0x3902740?, 0x0?}) /go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache/store.go:112 +0x3e k8s.io/client-go/tools/cache.DeletionHandlingMetaNamespaceKeyFunc({0x3902740?, 0x0?}) /go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache/controller.go:336 +0x3b github.com/openshift/machine-config-operator/pkg/controller/node.(*Controller).enqueueAfter(0xc000899560, 0x0, 0x0?) /go/src/github.com/openshift/machine-config-operator/pkg/controller/node/node_controller.go:761 +0x33 github.com/openshift/machine-config-operator/pkg/controller/node.(*Controller).enqueueDefault(...) /go/src/github.com/openshift/machine-config-operator/pkg/controller/node/node_controller.go:772 github.com/openshift/machine-config-operator/pkg/controller/node.(*Controller).addMachineOSBuild(0xc000899560, {0x3904000?, 0xc0006a8b60}) /go/src/github.com/openshift/machine-config-operator/pkg/controller/node/node_controller.go:386 +0xc5 k8s.io/client-go/tools/cache.ResourceEventHandlerFuncs.OnAdd(...) 
/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache/controller.go:239 k8s.io/client-go/tools/cache.(*processorListener).run.func1() /go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache/shared_informer.go:972 +0x13e k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x30?) /go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:226 +0x33 k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc00066bf38?, {0x3de6020, 0xc0008f8b40}, 0x1, 0xc000c2ea20) /go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:227 +0xaf k8s.io/apimachinery/pkg/util/wait.JitterUntil(0x0?, 0x3b9aca00, 0x0, 0x0?, 0xc00066bf88?) /go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:204 +0x7f k8s.io/apimachinery/pkg/util/wait.Until(...) /go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:161 k8s.io/client-go/tools/cache.(*processorListener).run(0xc000ba6240) /go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache/shared_informer.go:966 +0x69 k8s.io/apimachinery/pkg/util/wait.(*Group).Start.func1() /go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:72 +0x4f created by k8s.io/apimachinery/pkg/util/wait.(*Group).Start in goroutine 43 /go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:70 +0x73 panic: runtime error: invalid memory address or nil pointer dereference [recovered] panic: runtime error: invalid memory address or nil pointer dereference [signal SIGSEGV: segmentation violation code=0x1 addr=0x40 pc=0x210a6e9] {code} Expected results: {code:none} No panic should happen. Errors should be controlled. 
{code} Additional info: {code:none} In order to recover from this panic, we need to manually delete the MachineOSBuild resources that are related to the pool that does not exist anymore.{code} Status: New | |||
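Both traces above crash in `enqueueAfter` when `DeletionHandlingMetaNamespaceKeyFunc` is handed a nil `*MachineConfigPool`: the MachineOSBuild still references the deleted infra pool, the lookup returns nil, and the controller keys it anyway. A hedged sketch of the missing guard, with `pool`, `poolForBuild`, and `enqueuePool` as illustrative stand-ins for the node controller's types:

```go
package main

import "fmt"

// pool stands in for *mcfgv1.MachineConfigPool.
type pool struct{ name string }

// poolForBuild models the lookup that returns nil when a
// MachineOSBuild references a MachineConfigPool that was deleted
// (the infra -> infra1 scenario above).
func poolForBuild(pools map[string]*pool, ref string) *pool {
	return pools[ref] // nil for a deleted pool
}

// enqueuePool sketches the guard needed before computing the cache
// key: a nil pool must be skipped, because the key function is where
// the controller dereferences nil and panics.
func enqueuePool(p *pool) (string, bool) {
	if p == nil {
		return "", false // stale MachineOSBuild; nothing to enqueue
	}
	return p.name, true
}

func main() {
	pools := map[string]*pool{"infra1": {name: "infra1"}}
	for _, ref := range []string{"infra", "infra1"} {
		if key, ok := enqueuePool(poolForBuild(pools, ref)); ok {
			fmt.Println("enqueue", key)
		} else {
			fmt.Println("skip stale build for", ref)
		}
	}
}
```

Skipping the stale build also explains why the crash loop persists across restarts: the orphaned MachineOSBuild is replayed from the informer cache on every start until it is deleted by hand, as the Additional info notes.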
#OCPBUGS-15573 | issue | 5 days ago | 4.14 Regressions in periodic-ci-openshift-multiarch-master-nightly-4.14-ocp-e2e-ovn-remote-libvirt-ppc64le - Node Enters NotReady State Verified |
[ 823.897877] [ 824.897947] Kernel panic - not syncing: Fatal exception [root@C155F2U31 ~]# virsh list | |||
#OCPBUGS-29983 | issue | 2 weeks ago | image registry operator displays panic in status from move-blobs command Verified |
Issue 15838921: image registry operator displays panic in status from move-blobs command Description: This is a clone of issue OCPBUGS-29932. The following is the description of the original issue: --- Description of problem: {code:none} Sample job: https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-qe-ocp-qe-perfscale-ci-main-azure-4.15-nightly-x86-data-path-9nodes/1760228008968327168{code} Version-Release number of selected component (if applicable): {code:none} {code} How reproducible: {code:none} Anytime there is an error from the move-blobs command{code} Steps to Reproduce: {code:none} 1. 2. 3. {code} Actual results: {code:none} A panic is shown followed by the error message{code} Expected results: {code:none} An error message is shown{code} Additional info: {code:none} {code} Status: Verified | |||
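The complaint here is presentation: any move-blobs failure reaches the operator status as a panic plus goroutine dump instead of a plain error. A minimal sketch of one way to convert that (a deferred `recover` at the entry point); `runMoveBlobs` is a hypothetical wrapper, not the shipped fix:

```go
package main

import "fmt"

// runMoveBlobs converts a panic from the wrapped entry point into an
// ordinary error, so the operator's Progressing/Degraded conditions
// carry a one-line message instead of a goroutine dump.
func runMoveBlobs(f func() error) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("move-blobs failed: %v", r)
		}
	}()
	return f()
}

func main() {
	err := runMoveBlobs(func() error {
		panic("AZURE_CLIENT_ID is required for authentication")
	})
	fmt.Println(err)
}
```

The cleaner variant is of course to stop panicking at the call sites entirely; the recover wrapper only guards whatever slips through.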
#OCPBUGS-32112 | issue | 2 weeks ago | Invalid memory address or nil pointer dereference in Cloud Network Config Controller Verified |
Issue 15935943: Invalid memory address or nil pointer dereference in Cloud Network Config Controller Description: This is a clone of issue OCPBUGS-31754. The following is the description of the original issue: --- This is a clone of issue OCPBUGS-27422. The following is the description of the original issue: --- Description of problem: {code:none} Invalid memory address or nil pointer dereference in Cloud Network Config Controller {code} Version-Release number of selected component (if applicable): {code:none} 4.12{code} How reproducible: {code:none} sometimes{code} Steps to Reproduce: {code:none} 1. Happens by itself sometimes 2. 3. {code} Actual results: {code:none} Panic and pod restarts{code} Expected results: {code:none} Panics due to Invalid memory address or nil pointer dereference should not occur{code} Additional info: {code:none} E0118 07:54:18.703891 1 runtime.go:79] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference) goroutine 93 [running]: k8s.io/apimachinery/pkg/util/runtime.logPanic({0x203c8c0?, 0x3a27b20}) /go/src/github.com/openshift/cloud-network-config-controller/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:75 +0x99 k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0xc0003bd090?}) /go/src/github.com/openshift/cloud-network-config-controller/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:49 +0x75 panic({0x203c8c0, 0x3a27b20}) /usr/lib/golang/src/runtime/panic.go:884 +0x212 github.com/openshift/cloud-network-config-controller/pkg/cloudprovider.(*Azure).AssignPrivateIP(0xc0001ce700, {0xc000696540, 0x10, 0x10}, 0xc000818ec0) /go/src/github.com/openshift/cloud-network-config-controller/pkg/cloudprovider/azure.go:146 +0xcf0 github.com/openshift/cloud-network-config-controller/pkg/controller/cloudprivateipconfig.(*CloudPrivateIPConfigController).SyncHandler(0xc000986000, {0xc000896a90, 0xe}) 
/go/src/github.com/openshift/cloud-network-config-controller/pkg/controller/cloudprivateipconfig/cloudprivateipconfig_controller.go:327 +0x1013 github.com/openshift/cloud-network-config-controller/pkg/controller.(*CloudNetworkConfigController).processNextWorkItem.func1(0xc000720d80, {0x1e640c0?, 0xc0003bd090?}) /go/src/github.com/openshift/cloud-network-config-controller/pkg/controller/controller.go:152 +0x11c github.com/openshift/cloud-network-config-controller/pkg/controller.(*CloudNetworkConfigController).processNextWorkItem(0xc000720d80) /go/src/github.com/openshift/cloud-network-config-controller/pkg/controller/controller.go:162 +0x46 github.com/openshift/cloud-network-config-controller/pkg/controller.(*CloudNetworkConfigController).runWorker(0xc000504ea0?) /go/src/github.com/openshift/cloud-network-config-controller/pkg/controller/controller.go:113 +0x25 k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x0?) /go/src/github.com/openshift/cloud-network-config-controller/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:157 +0x3e k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0x0?, {0x27b3220, 0xc000894480}, 0x1, 0xc0000aa540) /go/src/github.com/openshift/cloud-network-config-controller/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:158 +0xb6 k8s.io/apimachinery/pkg/util/wait.JitterUntil(0x0?, 0x3b9aca00, 0x0, 0x0?, 0x0?) /go/src/github.com/openshift/cloud-network-config-controller/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:135 +0x89 k8s.io/apimachinery/pkg/util/wait.Until(0x0?, 0x0?, 0x0?) 
/go/src/github.com/openshift/cloud-network-config-controller/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:92 +0x25 created by github.com/openshift/cloud-network-config-controller/pkg/controller.(*CloudNetworkConfigController).Run /go/src/github.com/openshift/cloud-network-config-controller/pkg/controller/controller.go:99 +0x3aa panic: runtime error: invalid memory address or nil pointer dereference [recovered] panic: runtime error: invalid memory address or nil pointer dereference [signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0x1a40b30] goroutine 93 [running]: k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0xc0003bd090?}) /go/src/github.com/openshift/cloud-network-config-controller/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:56 +0xd7 panic({0x203c8c0, 0x3a27b20}) /usr/lib/golang/src/runtime/panic.go:884 +0x212 github.com/openshift/cloud-network-config-controller/pkg/cloudprovider.(*Azure).AssignPrivateIP(0xc0001ce700, {0xc000696540, 0x10, 0x10}, 0xc000818ec0) /go/src/github.com/openshift/cloud-network-config-controller/pkg/cloudprovider/azure.go:146 +0xcf0 github.com/openshift/cloud-network-config-controller/pkg/controller/cloudprivateipconfig.(*CloudPrivateIPConfigController).SyncHandler(0xc000986000, {0xc000896a90, 0xe}) /go/src/github.com/openshift/cloud-network-config-controller/pkg/controller/cloudprivateipconfig/cloudprivateipconfig_controller.go:327 +0x1013 github.com/openshift/cloud-network-config-controller/pkg/controller.(*CloudNetworkConfigController).processNextWorkItem.func1(0xc000720d80, {0x1e640c0?, 0xc0003bd090?}) /go/src/github.com/openshift/cloud-network-config-controller/pkg/controller/controller.go:152 +0x11c github.com/openshift/cloud-network-config-controller/pkg/controller.(*CloudNetworkConfigController).processNextWorkItem(0xc000720d80) /go/src/github.com/openshift/cloud-network-config-controller/pkg/controller/controller.go:162 +0x46 
github.com/openshift/cloud-network-config-controller/pkg/controller.(*CloudNetworkConfigController).runWorker(0xc000504ea0?) /go/src/github.com/openshift/cloud-network-config-controller/pkg/controller/controller.go:113 +0x25 k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x0?) /go/src/github.com/openshift/cloud-network-config-controller/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:157 +0x3e k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0x0?, {0x27b3220, 0xc000894480}, 0x1, 0xc0000aa540) /go/src/github.com/openshift/cloud-network-config-controller/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:158 +0xb6 k8s.io/apimachinery/pkg/util/wait.JitterUntil(0x0?, 0x3b9aca00, 0x0, 0x0?, 0x0?) /go/src/github.com/openshift/cloud-network-config-controller/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:135 +0x89 k8s.io/apimachinery/pkg/util/wait.Until(0x0?, 0x0?, 0x0?) /go/src/github.com/openshift/cloud-network-config-controller/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:92 +0x25 created by github.com/openshift/cloud-network-config-controller/pkg/controller.(*CloudNetworkConfigController).Run /go/src/github.com/openshift/cloud-network-config-controller/pkg/controller/controller.go:99 +0x3aa{code} Status: Verified | |||
#OCPBUGS-31754 | issue | 9 days ago | Invalid memory address or nil pointer dereference in Cloud Network Config Controller CLOSED |
Issue 15921462: Invalid memory address or nil pointer dereference in Cloud Network Config Controller Description: This is a clone of issue OCPBUGS-27422. The following is the description of the original issue: --- Description of problem: {code:none} Invalid memory address or nil pointer dereference in Cloud Network Config Controller {code} Version-Release number of selected component (if applicable): {code:none} 4.12{code} How reproducible: {code:none} sometimes{code} Steps to Reproduce: {code:none} 1. Happens by itself sometimes 2. 3. {code} Actual results: {code:none} Panic and pod restarts{code} Expected results: {code:none} Panics due to Invalid memory address or nil pointer dereference should not occur{code} Additional info: {code:none} E0118 07:54:18.703891 1 runtime.go:79] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference) goroutine 93 [running]: k8s.io/apimachinery/pkg/util/runtime.logPanic({0x203c8c0?, 0x3a27b20}) /go/src/github.com/openshift/cloud-network-config-controller/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:75 +0x99 k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0xc0003bd090?}) /go/src/github.com/openshift/cloud-network-config-controller/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:49 +0x75 panic({0x203c8c0, 0x3a27b20}) /usr/lib/golang/src/runtime/panic.go:884 +0x212 github.com/openshift/cloud-network-config-controller/pkg/cloudprovider.(*Azure).AssignPrivateIP(0xc0001ce700, {0xc000696540, 0x10, 0x10}, 0xc000818ec0) /go/src/github.com/openshift/cloud-network-config-controller/pkg/cloudprovider/azure.go:146 +0xcf0 github.com/openshift/cloud-network-config-controller/pkg/controller/cloudprivateipconfig.(*CloudPrivateIPConfigController).SyncHandler(0xc000986000, {0xc000896a90, 0xe}) /go/src/github.com/openshift/cloud-network-config-controller/pkg/controller/cloudprivateipconfig/cloudprivateipconfig_controller.go:327 +0x1013 
github.com/openshift/cloud-network-config-controller/pkg/controller.(*CloudNetworkConfigController).processNextWorkItem.func1(0xc000720d80, {0x1e640c0?, 0xc0003bd090?}) /go/src/github.com/openshift/cloud-network-config-controller/pkg/controller/controller.go:152 +0x11c github.com/openshift/cloud-network-config-controller/pkg/controller.(*CloudNetworkConfigController).processNextWorkItem(0xc000720d80) /go/src/github.com/openshift/cloud-network-config-controller/pkg/controller/controller.go:162 +0x46 github.com/openshift/cloud-network-config-controller/pkg/controller.(*CloudNetworkConfigController).runWorker(0xc000504ea0?) /go/src/github.com/openshift/cloud-network-config-controller/pkg/controller/controller.go:113 +0x25 k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x0?) /go/src/github.com/openshift/cloud-network-config-controller/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:157 +0x3e k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0x0?, {0x27b3220, 0xc000894480}, 0x1, 0xc0000aa540) /go/src/github.com/openshift/cloud-network-config-controller/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:158 +0xb6 k8s.io/apimachinery/pkg/util/wait.JitterUntil(0x0?, 0x3b9aca00, 0x0, 0x0?, 0x0?) /go/src/github.com/openshift/cloud-network-config-controller/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:135 +0x89 k8s.io/apimachinery/pkg/util/wait.Until(0x0?, 0x0?, 0x0?) 
/go/src/github.com/openshift/cloud-network-config-controller/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:92 +0x25 created by github.com/openshift/cloud-network-config-controller/pkg/controller.(*CloudNetworkConfigController).Run /go/src/github.com/openshift/cloud-network-config-controller/pkg/controller/controller.go:99 +0x3aa panic: runtime error: invalid memory address or nil pointer dereference [recovered] panic: runtime error: invalid memory address or nil pointer dereference [signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0x1a40b30] goroutine 93 [running]: k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0xc0003bd090?}) /go/src/github.com/openshift/cloud-network-config-controller/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:56 +0xd7 panic({0x203c8c0, 0x3a27b20}) /usr/lib/golang/src/runtime/panic.go:884 +0x212 github.com/openshift/cloud-network-config-controller/pkg/cloudprovider.(*Azure).AssignPrivateIP(0xc0001ce700, {0xc000696540, 0x10, 0x10}, 0xc000818ec0) /go/src/github.com/openshift/cloud-network-config-controller/pkg/cloudprovider/azure.go:146 +0xcf0 github.com/openshift/cloud-network-config-controller/pkg/controller/cloudprivateipconfig.(*CloudPrivateIPConfigController).SyncHandler(0xc000986000, {0xc000896a90, 0xe}) /go/src/github.com/openshift/cloud-network-config-controller/pkg/controller/cloudprivateipconfig/cloudprivateipconfig_controller.go:327 +0x1013 github.com/openshift/cloud-network-config-controller/pkg/controller.(*CloudNetworkConfigController).processNextWorkItem.func1(0xc000720d80, {0x1e640c0?, 0xc0003bd090?}) /go/src/github.com/openshift/cloud-network-config-controller/pkg/controller/controller.go:152 +0x11c github.com/openshift/cloud-network-config-controller/pkg/controller.(*CloudNetworkConfigController).processNextWorkItem(0xc000720d80) /go/src/github.com/openshift/cloud-network-config-controller/pkg/controller/controller.go:162 +0x46 
github.com/openshift/cloud-network-config-controller/pkg/controller.(*CloudNetworkConfigController).runWorker(0xc000504ea0?) /go/src/github.com/openshift/cloud-network-config-controller/pkg/controller/controller.go:113 +0x25 k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x0?) /go/src/github.com/openshift/cloud-network-config-controller/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:157 +0x3e k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0x0?, {0x27b3220, 0xc000894480}, 0x1, 0xc0000aa540) /go/src/github.com/openshift/cloud-network-config-controller/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:158 +0xb6 k8s.io/apimachinery/pkg/util/wait.JitterUntil(0x0?, 0x3b9aca00, 0x0, 0x0?, 0x0?) /go/src/github.com/openshift/cloud-network-config-controller/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:135 +0x89 k8s.io/apimachinery/pkg/util/wait.Until(0x0?, 0x0?, 0x0?) /go/src/github.com/openshift/cloud-network-config-controller/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:92 +0x25 created by github.com/openshift/cloud-network-config-controller/pkg/controller.(*CloudNetworkConfigController).Run /go/src/github.com/openshift/cloud-network-config-controller/pkg/controller/controller.go:99 +0x3aa{code} Status: CLOSED | |||
#OCPBUGS-29819 | issue | 7 weeks ago | MCO controller crashes when it tries to update the coreos-bootimage and the machineset has an invalid architecture Verified |
Issue 15831797: MCO controller crashes when it tries to update the coreos-bootimage and the machineset has an invalid architecture Description: Description of problem:{code:none} When, in an IPI on GCP cluster, a machineset is labeled with an invalid architecture and the coreos-bootimage is updated in any machineset, the MCO controller pod fails in an uncontrolled way and panics. {code} Version-Release number of selected component (if applicable):{code:none} $ oc get clusterversion NAME VERSION AVAILABLE PROGRESSING SINCE STATUS version 4.16.0-0.nightly-2024-02-17-094036 True False 71m Cluster version is 4.16.0-0.nightly-2024-02-17-094036 {code} How reproducible:{code:none} Always {code} Steps to Reproduce:{code:none} 1. Enable the TechPreview oc patch featuregate cluster --type=merge -p '{"spec":{"featureSet": "TechPreviewNoUpgrade"}}' 2. Wait for all MCP to be updated 3. Edit a machineset and use an invalid architecture in its labels apiVersion: machine.openshift.io/v1beta1 kind: MachineSet metadata: annotations: capacity.cluster-autoscaler.kubernetes.io/labels: kubernetes.io/arch=amd64-FAKE-INVALID < --- EDIT THIS machine.openshift.io/GPU: "0" machine.openshift.io/memoryMb: "16384" 4. 
Patch any machineset with a new boot image $ oc -n openshift-machine-api patch machineset.machine $(oc -n openshift-machine-api get machineset.machine -ojsonpath='{.items[0].metadata.name}') --type json -p '[{"op": "add", "path": "/spec/template/spec/providerSpec/value/disks/0/image", "value": "fake-image"}]' {code} Actual results:{code:none} The MCO controller panics I0222 09:05:50.862882 1 template_controller.go:132] Re-syncing ControllerConfig due to secret pull-secret change I0222 09:12:29.550488 1 machine_set_boot_image_controller.go:254] MachineSet sergidor-1-v4ccj-worker-a updated, reconciling all machinesets I0222 09:12:29.550919 1 machine_set_boot_image_controller.go:547] Reconciling machineset sergidor-1-v4ccj-worker-a on GCP, with arch x86_64 I0222 09:12:29.552171 1 machine_set_boot_image_controller.go:572] New target boot image: projects/rhcos-cloud/global/images/rhcos-416-94-202402130130-0-gcp-x86-64 I0222 09:12:29.552323 1 machine_set_boot_image_controller.go:547] Reconciling machineset sergidor-1-v4ccj-worker-b on GCP, with arch x86_64 I0222 09:12:29.552341 1 machine_set_boot_image_controller.go:573] Current image: fake-image I0222 09:12:29.553694 1 machine_set_boot_image_controller.go:413] Patching machineset sergidor-1-v4ccj-worker-a I0222 09:12:29.553893 1 machine_set_boot_image_controller.go:416] No patching required for machineset sergidor-1-v4ccj-worker-b I0222 09:12:29.553920 1 machine_set_boot_image_controller.go:547] Reconciling machineset sergidor-1-v4ccj-worker-c on GCP, with arch x86_64 I0222 09:12:29.555104 1 machine_set_boot_image_controller.go:416] No patching required for machineset sergidor-1-v4ccj-worker-c I0222 09:12:29.555164 1 machine_set_boot_image_controller.go:547] Reconciling machineset sergidor-1-v4ccj-worker-f on GCP, with arch amd64-FAKE-INVALID E0222 09:12:29.556282 1 runtime.go:79] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference) 
goroutine 356 [running]: k8s.io/apimachinery/pkg/util/runtime.logPanic({0x34dadc0?, 0x5522aa0}) /go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:75 +0x85 k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0xc001b4a640?}) /go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:49 +0x6b panic({0x34dadc0?, 0x5522aa0?}) /usr/lib/golang/src/runtime/panic.go:914 +0x21f github.com/openshift/machine-config-operator/pkg/controller/machine-set-boot-image.reconcileGCP(0xc000f44a00, 0xc000b8c4c3?, {0xc000b8c4c3, 0x12}) /go/src/github.com/openshift/machine-config-operator/pkg/controller/machine-set-boot-image/machine_set_boot_image_controller.go:564 +0x1cd github.com/openshift/machine-config-operator/pkg/controller/machine-set-boot-image.checkMachineSet(0x0?, 0x38f77bf?, 0x7?, {0xc000b8c4c3?, 0x132a3f8?}) /go/src/github.com/openshift/machine-config-operator/pkg/controller/machine-set-boot-image/machine_set_boot_image_controller.go:479 +0x85 github.com/openshift/machine-config-operator/pkg/controller/machine-set-boot-image.(*Controller).syncMachineSet(0xc000344000, {0xc001bc5800, 0x2f}) /go/src/github.com/openshift/machine-config-operator/pkg/controller/machine-set-boot-image/machine_set_boot_image_controller.go:406 +0x60c github.com/openshift/machine-config-operator/pkg/controller/machine-set-boot-image.(*Controller).processNextWorkItem(0xc000344000) /go/src/github.com/openshift/machine-config-operator/pkg/controller/machine-set-boot-image/machine_set_boot_image_controller.go:194 +0xcf github.com/openshift/machine-config-operator/pkg/controller/machine-set-boot-image.(*Controller).worker(...) /go/src/github.com/openshift/machine-config-operator/pkg/controller/machine-set-boot-image/machine_set_boot_image_controller.go:183 k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x30?) 
/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:226 +0x33 k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0x0?, {0x3d4d460, 0xc000b2b710}, 0x1, 0xc0006b21e0) /go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:227 +0xaf k8s.io/apimachinery/pkg/util/wait.JitterUntil(0x0?, 0x3b9aca00, 0x0, 0x0?, 0x0?) /go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:204 +0x7f k8s.io/apimachinery/pkg/util/wait.Until(0x0?, 0x0?, 0x0?) /go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:161 +0x1e created by github.com/openshift/machine-config-operator/pkg/controller/machine-set-boot-image.(*Controller).Run in goroutine 339 /go/src/github.com/openshift/machine-config-operator/pkg/controller/machine-set-boot-image/machine_set_boot_image_controller.go:174 +0x205 panic: runtime error: invalid memory address or nil pointer dereference [recovered] panic: runtime error: invalid memory address or nil pointer dereference [signal SIGSEGV: segmentation violation code=0x1 addr=0x10 pc=0x315836d] goroutine 356 [running]: k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0xc001b4a640?}) /go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:56 +0xcd panic({0x34dadc0?, 0x5522aa0?}) /usr/lib/golang/src/runtime/panic.go:914 +0x21f github.com/openshift/machine-config-operator/pkg/controller/machine-set-boot-image.reconcileGCP(0xc000f44a00, 0xc000b8c4c3?, {0xc000b8c4c3, 0x12}) /go/src/github.com/openshift/machine-config-operator/pkg/controller/machine-set-boot-image/machine_set_boot_image_controller.go:564 +0x1cd github.com/openshift/machine-config-operator/pkg/controller/machine-set-boot-image.checkMachineSet(0x0?, 0x38f77bf?, 0x7?, {0xc000b8c4c3?, 0x132a3f8?}) 
/go/src/github.com/openshift/machine-config-operator/pkg/controller/machine-set-boot-image/machine_set_boot_image_controller.go:479 +0x85 github.com/openshift/machine-config-operator/pkg/controller/machine-set-boot-image.(*Controller).syncMachineSet(0xc000344000, {0xc001bc5800, 0x2f}) /go/src/github.com/openshift/machine-config-operator/pkg/controller/machine-set-boot-image/machine_set_boot_image_controller.go:406 +0x60c github.com/openshift/machine-config-operator/pkg/controller/machine-set-boot-image.(*Controller).processNextWorkItem(0xc000344000) /go/src/github.com/openshift/machine-config-operator/pkg/controller/machine-set-boot-image/machine_set_boot_image_controller.go:194 +0xcf github.com/openshift/machine-config-operator/pkg/controller/machine-set-boot-image.(*Controller).worker(...) /go/src/github.com/openshift/machine-config-operator/pkg/controller/machine-set-boot-image/machine_set_boot_image_controller.go:183 k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x30?) /go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:226 +0x33 k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0x0?, {0x3d4d460, 0xc000b2b710}, 0x1, 0xc0006b21e0) /go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:227 +0xaf k8s.io/apimachinery/pkg/util/wait.JitterUntil(0x0?, 0x3b9aca00, 0x0, 0x0?, 0x0?) /go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:204 +0x7f k8s.io/apimachinery/pkg/util/wait.Until(0x0?, 0x0?, 0x0?) 
/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:161 +0x1e created by github.com/openshift/machine-config-operator/pkg/controller/machine-set-boot-image.(*Controller).Run in goroutine 339 /go/src/github.com/openshift/machine-config-operator/pkg/controller/machine-set-boot-image/machine_set_boot_image_controller.go:174 +0x205 Then a new controller is created; it will wait 5 minutes to acquire the leader lease and will panic again. {code} Expected results:{code:none} The MCO controller should fail in a controlled way. {code} Additional info:{code:none} {code} Status: Verified Comment 24216375 by Sergio Regidor de la Rosa at 2024-02-22T14:07:26.703+0000 This patch command will cause the MCO controller to panic as well {noformat} E0222 14:03:56.958310 1 runtime.go:79] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference) goroutine 341 [running]: k8s.io/apimachinery/pkg/util/runtime.logPanic({0x34db380?, 0x5522aa0}) /go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:75 +0x85 /go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:49 +0x6b panic({0x34db380?, 0x5522aa0?}) /usr/lib/golang/src/runtime/panic.go:914 +0x21f github.com/openshift/machine-config-operator/pkg/controller/machine-set-boot-image.reconcileGCP(0xc0018e7b80, 0xc001a12b6b?, {0x38f6118, 0x6}) | |||
#OCPBUGS-25339 | issue | 3 months ago | OLM pod panics when EnsureSecretOwnershipAnnotations runs CLOSED |
Issue 15676416: OLM pod panics when EnsureSecretOwnershipAnnotations runs Description: Description of problem: {code:none} {code} Version-Release number of selected component (if applicable): {code:none} {code} How reproducible: {code:none} {code} Steps to Reproduce: {code:none} 1. Run OLM on 4.15 cluster 2. 3. {code} Actual results: {code:none} OLM pod will panic{code} Expected results: {code:none} Should run just fine{code} Additional info: {code:none} This issue is due to a failure to initialize a nil map before writing to it{code} Status: CLOSED 3. Restart the olm-operator pod. Got the panic: assignment to entry in nil map. time="2023-12-14T01:24:40Z" level=info msg="monitoring the following components [operator-lifecycle-manager-packageserver]" monitor=clusteroperator panic: assignment to entry in nil map | |||
#OCPBUGS-29928 | issue | 2 months ago | origin needs workaround for ROSA's infra labels MODIFIED |
Issue 15836167: origin needs workaround for ROSA's infra labels Description: This is a clone of issue OCPBUGS-29858. The following is the description of the original issue: --- The convention is a format like {{{}node-role.kubernetes.io/role: ""{}}}, not {{{}node-role.kubernetes.io: role{}}}, however ROSA uses the latter format to indicate the {{infra}} role. This changes the node watch code to ignore it, as well as other potential variations like {{{}node-role.kubernetes.io/{}}}. The current code panics when run against a ROSA cluster: {{ E0209 18:10:55.533265 78 runtime.go:79] Observed a panic: runtime.boundsError\{x:24, y:23, signed:true, code:0x3} (runtime error: slice bounds out of range [24:23]) goroutine 233 [running]: k8s.io/apimachinery/pkg/util/runtime.logPanic(\{0x7a71840?, 0xc0018e2f48}) k8s.io/apimachinery@v0.27.2/pkg/util/runtime/runtime.go:75 +0x99 k8s.io/apimachinery/pkg/util/runtime.HandleCrash(\{0x0, 0x0, 0x1000251f9fe?}) k8s.io/apimachinery@v0.27.2/pkg/util/runtime/runtime.go:49 +0x75 panic(\{0x7a71840, 0xc0018e2f48}) runtime/panic.go:884 +0x213 github.com/openshift/origin/pkg/monitortests/node/watchnodes.nodeRoles(0x7ecd7b3?) github.com/openshift/origin/pkg/monitortests/node/watchnodes/node.go:187 +0x1e5 github.com/openshift/origin/pkg/monitortests/node/watchnodes.startNodeMonitoring.func1(0}} Status: MODIFIED | |||
#OCPBUGS-33186 | issue | 2 days ago | Sriov network operator pod is crashing due to panic: runtime error: invalid memory address or nil pointer dereference POST |
Issue 15979967: Sriov network operator pod is crashing due to panic: runtime error: invalid memory address or nil pointer dereference Description: This is a clone of issue OCPBUGS-31779. The following is the description of the original issue: --- Description of problem: {code:none} The SRIOV network operator pod is crashing and going into CrashLoopBackOff state. Upon checking the pod logs, panic error messages can be seen: 2024-04-03T17:48:33.008379552Z panic: runtime error: invalid memory address or nil pointer dereference 2024-04-03T17:48:33.008379552Z [signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x176c03f]{code} {code:none} Also, on checking the SriovOperatorConfig, everything looks fine: apiVersion: v1 items: - apiVersion: sriovnetwork.openshift.io/v1 kind: SriovOperatorConfig metadata: creationTimestamp: "2024-03-26T18:36:10Z" generation: 2 name: default namespace: openshift-sriov-network-operator resourceVersion: "18324322" uid: 61f07940-ac92-437d-bacb-41fb1224cc34 spec: disableDrain: true enableInjector: true enableOperatorWebhook: true logLevel: 2 {code} Version-Release number of selected component (if applicable): {code:none} {code} How reproducible: {code:none} {code} Steps to Reproduce: {code:none} 1. 2. 3. {code} Actual results: {code:none} {code} Expected results: {code:none} {code} Additional info: {code:none} {code} Status: POST | |||
#OCPBUGS-9193 | issue | 12 months ago | fatal error: concurrent map iteration and map write CLOSED |
Issue 15134151: fatal error: concurrent map iteration and map write Description: Description of problem: fatal error: concurrent map iteration and map write Version-Release number of the following components: ClusterVersionOperator 4.10.0-202203242126.p0.gf0fb0fa.assembly.stream-f0fb0fa From a CI run for Hypershift, though I don't think there is anything Hypershift specific about this error How reproducible: Rare Steps to Reproduce: 1. Unknown 2. 3. Actual results: fatal error: concurrent map iteration and map write goroutine 5622 [running]: runtime.throw({0x1b54b74, 0x1a3a5c0}) /usr/lib/golang/src/runtime/panic.go:1198 +0x71 fp=0xc002c24850 sp=0xc002c24820 pc=0x43a611 runtime.mapiternext(0xc002c24c50) /usr/lib/golang/src/runtime/map.go:858 +0x4eb fp=0xc002c248c0 sp=0xc002c24850 pc=0x41596b github.com/openshift/cluster-version-operator/pkg/cvo.(*operatorMetrics).Collect(0xc00013e910, 0xc0003a0ae0) /go/src/github.com/openshift/cluster-version-operator/pkg/cvo/metrics.go:493 +0x87f fp=0xc002c24f30 sp=0xc002c248c0 pc=0x16e9f5f github.com/prometheus/client_golang/prometheus.(*Registry).Gather.func1() /go/src/github.com/openshift/cluster-version-operator/vendor/github.com/prometheus/client_golang/prometheus/registry.go:446 +0x102 fp=0xc002c24fe0 sp=0xc002c24f30 pc=0x1209302 runtime.goexit() /usr/lib/golang/src/runtime/asm_amd64.s:1581 +0x1 fp=0xc002c24fe8 sp=0xc002c24fe0 pc=0x46aac1 created by github.com/prometheus/client_golang/prometheus.(*Registry).Gather /go/src/github.com/openshift/cluster-version-operator/vendor/github.com/prometheus/client_golang/prometheus/registry.go:538 +0xb4d https://github.com/openshift/cluster-version-operator/blob/master/pkg/cvo/metrics.go#L493 Looks like concurrent read/write on the conditions map between metrics and some other path. Attaching full log. Status: CLOSED | |||
#OCPBUGS-25785 | issue | 2 days ago | OCCM panics if Octavia returns error while polling lb status ON_QA |
Issue 15688847: OCCM panics if Octavia returns error while polling lb status Description: Description of problem:{code:none} During an e2e test, Prow detected a panic in openstack-cloud-controller-manager: https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_cloud-provider-openstack/257/pull-ci-openshift-cloud-provider-openstack-master-e2e-openstack/1737497317621108736 {code} Version-Release number of selected component (if applicable):{code:none} {code} How reproducible:{code:none} {code} Steps to Reproduce:{code:none} 1. 2. 3. {code} Actual results:{code:none} {code} Expected results:{code:none} {code} Additional info:{code:none} The panic seems to happen while "Waiting for load balancer ACTIVE". {code} Status: ON_QA Comment 23694041 by Pierre Prinetti at 2023-12-21T09:57:33.658+0000 The panic happens when the cloud provider fails to properly respond to the GET request we make in the status-polling function. Comment 24421180 by Pierre Prinetti at 2024-03-26T14:19:02.540+0000 Comment 24657690 by Pierre Prinetti at 2024-05-02T08:51:27.564+0000 you'd need Octavia to respond with an error code upon LB creation, and observe no panic on the CPO side | |||
#OCPBUGS-33088 | issue | 2 days ago | openshift-controller-manager pod panic due to type assertion New |
Issue 15973055: openshift-controller-manager pod panic due to type assertion Description: Caught by the test: Undiagnosed panic detected in pod Sample job run: [https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-release-master-ci-4.16-e2e-azure-ovn-upgrade/1783981854974545920] Error message {code} { pods/openshift-controller-manager_controller-manager-6b66bf5587-6ghjk_controller-manager.log.gz:E0426 23:06:02.367266 1 runtime.go:79] Observed a panic: &runtime.TypeAssertionError{_interface:(*abi.Type)(0x3c6a2a0), concrete:(*abi.Type)(0x3e612c0), asserted:(*abi.Type)(0x419cdc0), missingMethod:""} (interface conversion: interface {} is cache.DeletedFinalStateUnknown, not *v1.Secret) pods/openshift-controller-manager_controller-manager-6b66bf5587-6ghjk_controller-manager.log.gz:E0426 23:06:03.368403 1 runtime.go:79] Observed a panic: &runtime.TypeAssertionError{_interface:(*abi.Type)(0x3c6a2a0), concrete:(*abi.Type)(0x3e612c0), asserted:(*abi.Type)(0x419cdc0), missingMethod:""} (interface conversion: interface {} is cache.DeletedFinalStateUnknown, not *v1.Secret) pods/openshift-controller-manager_controller-manager-6b66bf5587-6ghjk_controller-manager.log.gz:E0426 23:06:04.370157 1 runtime.go:79] Observed a panic: &runtime.TypeAssertionError{_interface:(*abi.Type)(0x3c6a2a0), concrete:(*abi.Type)(0x3e612c0), asserted:(*abi.Type)(0x419cdc0), missingMethod:""} (interface conversion: interface {} is cache.DeletedFinalStateUnknown, not *v1.Secret)} {code} [Sippy 
indicates|https://sippy.dptools.openshift.org/sippy-ng/tests/4.16/analysis?test=Undiagnosed%20panic%20detected%20in%20pod&filters=%7B%22items%22%3A%5B%7B%22columnField%22%3A%22name%22%2C%22operatorValue%22%3A%22equals%22%2C%22value%22%3A%22Undiagnosed%20panic%20detected%20in%20pod%22%7D%2C%7B%22columnField%22%3A%22variants%22%2C%22not%22%3Atrue%2C%22operatorValue%22%3A%22contains%22%2C%22value%22%3A%22never-stable%22%7D%2C%7B%22columnField%22%3A%22variants%22%2C%22not%22%3Atrue%2C%22operatorValue%22%3A%22contains%22%2C%22value%22%3A%22aggregated%22%7D%5D%2C%22linkOperator%22%3A%22and%22%7D] it's happening a small percentage of the time since around Apr 25th. Took out the last payload so labeling trt-incident for now. Status: POST | |||
#OCPBUGS-31306 | issue | 3 days ago | Azure-Disk CSI Driver node pod CrashLoopBackOff in Azure Stack Verified |
Issue 15895049: Azure-Disk CSI Driver node pod CrashLoopBackOff in Azure Stack Description: Description of problem: {code:none} In Azure Stack, the Azure-Disk CSI Driver node pod CrashLoopBackOff: openshift-cluster-csi-drivers azure-disk-csi-driver-node-57rxv 1/3 CrashLoopBackOff 33 (3m55s ago) 59m 10.0.1.5 ci-op-q8b6n4iv-904ed-kp5mv-worker-mtcazs-m62cj <none> <none> openshift-cluster-csi-drivers azure-disk-csi-driver-node-8wvqm 1/3 CrashLoopBackOff 35 (29s ago) 67m 10.0.0.6 ci-op-q8b6n4iv-904ed-kp5mv-master-1 <none> <none> openshift-cluster-csi-drivers azure-disk-csi-driver-node-97ww5 1/3 CrashLoopBackOff 33 (12s ago) 67m 10.0.0.7 ci-op-q8b6n4iv-904ed-kp5mv-master-2 <none> <none> openshift-cluster-csi-drivers azure-disk-csi-driver-node-9hzw9 1/3 CrashLoopBackOff 35 (108s ago) 59m 10.0.1.4 ci-op-q8b6n4iv-904ed-kp5mv-worker-mtcazs-gjqmw <none> <none> openshift-cluster-csi-drivers azure-disk-csi-driver-node-glgzr 1/3 CrashLoopBackOff 34 (69s ago) 67m 10.0.0.8 ci-op-q8b6n4iv-904ed-kp5mv-master-0 <none> <none> openshift-cluster-csi-drivers azure-disk-csi-driver-node-hktfb 2/3 CrashLoopBackOff 48 (63s ago) 60m 10.0.1.6 ci-op-q8b6n4iv-904ed-kp5mv-worker-mtcazs-kdbpf <none> <none>{code} {code:none} The CSI-Driver container log: panic: runtime error: invalid memory address or nil pointer dereference [signal SIGSEGV: segmentation violation code=0x1 addr=0xc8 pc=0x18ff5db] goroutine 228 [running]: sigs.k8s.io/cloud-provider-azure/pkg/provider.(*Cloud).GetZone(0xc00021ec00, {0xc0002d57d0?, 0xc00005e3e0?}) /go/src/github.com/openshift/azure-disk-csi-driver/vendor/sigs.k8s.io/cloud-provider-azure/pkg/provider/azure_zones.go:182 +0x2db sigs.k8s.io/azuredisk-csi-driver/pkg/azuredisk.(*Driver).NodeGetInfo(0xc000144000, {0x21ebbf0, 0xc0002d5470}, 0x273606a?) 
/go/src/github.com/openshift/azure-disk-csi-driver/pkg/azuredisk/nodeserver.go:336 +0x13b github.com/container-storage-interface/spec/lib/go/csi._Node_NodeGetInfo_Handler.func1({0x21ebbf0, 0xc0002d5470}, {0x1d71a60?, 0xc0003b0320}) /go/src/github.com/openshift/azure-disk-csi-driver/vendor/github.com/container-storage-interface/spec/lib/go/csi/csi.pb.go:7160 +0x72 sigs.k8s.io/azuredisk-csi-driver/pkg/csi-common.logGRPC({0x21ebbf0, 0xc0002d5470}, {0x1d71a60?, 0xc0003b0320?}, 0xc0003b0340, 0xc00050ae10) /go/src/github.com/openshift/azure-disk-csi-driver/pkg/csi-common/utils.go:80 +0x409 github.com/container-storage-interface/spec/lib/go/csi._Node_NodeGetInfo_Handler({0x1ec2f40?, 0xc000144000}, {0x21ebbf0, 0xc0002d5470}, 0xc000054680, 0x20167a0) /go/src/github.com/openshift/azure-disk-csi-driver/vendor/github.com/container-storage-interface/spec/lib/go/csi/csi.pb.go:7162 +0x135 google.golang.org/grpc.(*Server).processUnaryRPC(0xc000530000, {0x21ebbf0, 0xc0002d53b0}, {0x21f5f40, 0xc00057b1e0}, 0xc00011cb40, 0xc00052c810, 0x30fa1c8, 0x0) /go/src/github.com/openshift/azure-disk-csi-driver/vendor/google.golang.org/grpc/server.go:1343 +0xe03 google.golang.org/grpc.(*Server).handleStream(0xc000530000, {0x21f5f40, 0xc00057b1e0}, 0xc00011cb40) /go/src/github.com/openshift/azure-disk-csi-driver/vendor/google.golang.org/grpc/server.go:1737 +0xc4c google.golang.org/grpc.(*Server).serveStreams.func1.1() /go/src/github.com/openshift/azure-disk-csi-driver/vendor/google.golang.org/grpc/server.go:986 +0x86 created by google.golang.org/grpc.(*Server).serveStreams.func1 in goroutine 260 /go/src/github.com/openshift/azure-disk-csi-driver/vendor/google.golang.org/grpc/server.go:997 +0x145 {code} {code:java} The registrar container log: E0321 23:08:02.679727 1 main.go:103] Registration process failed with error: RegisterPlugin error -- plugin registration failed with err: rpc error: code = Unavailable desc = error reading from server: EOF, restarting registration container. 
{code} Version-Release number of selected component (if applicable): {code:none} 4.16.0-0.nightly-2024-03-21-152650 {code} How reproducible: {code:none} See it in CI profile, and manual install failed earlier.{code} Steps to Reproduce: {code:none} See Description {code} Actual results: {code:none} Azure-Disk CSI Driver node pod CrashLoopBackOff{code} Expected results: {code:none} Azure-Disk CSI Driver node pod should be running{code} Additional info: {code:none} See gather-extra and must-gather: https://gcsweb-qe-private-deck-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/qe-private-deck/logs/periodic-ci-openshift-openshift-tests-private-release-4.16-amd64-nightly-azure-stack-ipi-proxy-fips-f2/1770921405509013504/artifacts/azure-stack-ipi-proxy-fips-f2/{code} Status: Verified | |||
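The Azure-Disk CSI driver panic above is a nil pointer dereference inside `GetZone` when a provider dependency is absent on Azure Stack. A minimal sketch of the defensive pattern (all type and field names here are illustrative stand-ins, not the real cloud-provider-azure types): check the possibly-nil dependency and return an error instead of dereferencing it.

```go
package main

import (
	"errors"
	"fmt"
)

// metadataClient is a hypothetical stand-in for an instance-metadata
// client that can be left nil on platforms (such as Azure Stack)
// where the backing service is not configured.
type metadataClient struct {
	zone string
}

func (m *metadataClient) Zone() string { return m.zone }

type cloud struct {
	metadata *metadataClient // may be nil on some platforms
}

// GetZone guards the nil dependency and degrades to an error,
// rather than panicking with SIGSEGV as in the trace above.
func (c *cloud) GetZone() (string, error) {
	if c.metadata == nil {
		return "", errors.New("instance metadata client is not initialized")
	}
	return c.metadata.Zone(), nil
}

func main() {
	c := &cloud{} // metadata left nil, as on an unsupported platform
	if _, err := c.GetZone(); err != nil {
		fmt.Println("handled gracefully:", err)
	}
}
```

A gRPC handler calling this version would return an `Unavailable`-style error to the kubelet registrar instead of killing the whole node pod.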
#OCPBUGS-31621 | issue | 2 weeks ago | Autoscaler should scale from zero when taints do not have a "value" field CLOSED |
Issue 15913608: Autoscaler should scale from zero when taints do not have a "value" field Description: This is a clone of issue OCPBUGS-31464. The following is the description of the original issue: --- This is a clone of issue OCPBUGS-31421. The following is the description of the original issue: --- Description of problem:{code:none} When scaling from zero replicas, the cluster autoscaler can panic if there are taints on the machineset with no "value" field defined. {code} Version-Release number of selected component (if applicable):{code:none} 4.16/master {code} How reproducible:{code:none} always {code} Steps to Reproduce:{code:none} 1. create a machineset with a taint that has no value field and 0 replicas 2. enable the cluster autoscaler 3. force a workload to scale the tainted machineset {code} Actual results:{code:none} a panic like this is observed I0325 15:36:38.314276 1 clusterapi_provider.go:68] discovered node group: MachineSet/openshift-machine-api/k8hmbsmz-c2483-9dnddr4sjc (min: 0, max: 2, replicas: 0) panic: interface conversion: interface {} is nil, not string goroutine 79 [running]: k8s.io/autoscaler/cluster-autoscaler/cloudprovider/clusterapi.unstructuredToTaint(...) 
/go/src/k8s.io/autoscaler/cluster-autoscaler/cloudprovider/clusterapi/clusterapi_unstructured.go:246 k8s.io/autoscaler/cluster-autoscaler/cloudprovider/clusterapi.unstructuredScalableResource.Taints({0xc000103d40?, 0xc000121360?, 0xc002386f98?, 0x2?}) /go/src/k8s.io/autoscaler/cluster-autoscaler/cloudprovider/clusterapi/clusterapi_unstructured.go:214 +0x8a5 k8s.io/autoscaler/cluster-autoscaler/cloudprovider/clusterapi.(*nodegroup).TemplateNodeInfo(0xc002675930) /go/src/k8s.io/autoscaler/cluster-autoscaler/cloudprovider/clusterapi/clusterapi_nodegroup.go:266 +0x2ea k8s.io/autoscaler/cluster-autoscaler/core/utils.GetNodeInfoFromTemplate({0x276b230, 0xc002675930}, {0xc001bf2c00, 0x10, 0x10}, {0xc0023ffe60?, 0xc0023ffe90?}) /go/src/k8s.io/autoscaler/cluster-autoscaler/core/utils/utils.go:41 +0x9d k8s.io/autoscaler/cluster-autoscaler/processors/nodeinfosprovider.(*MixedTemplateNodeInfoProvider).Process(0xc00084f848, 0xc0023f7680, {0xc001dcdb00, 0x3, 0x0?}, {0xc001bf2c00, 0x10, 0x10}, {0xc0023ffe60, 0xc0023ffe90}, ...) /go/src/k8s.io/autoscaler/cluster-autoscaler/processors/nodeinfosprovider/mixed_nodeinfos_processor.go:155 +0x599 k8s.io/autoscaler/cluster-autoscaler/core.(*StaticAutoscaler).RunOnce(0xc000617550, {0x4?, 0x0?, 0x3a56f60?}) /go/src/k8s.io/autoscaler/cluster-autoscaler/core/static_autoscaler.go:352 +0xcaa main.run(0x0?, {0x2761b48, 0xc0004c04e0}) /go/src/k8s.io/autoscaler/cluster-autoscaler/main.go:529 +0x2cd main.main.func2({0x0?, 0x0?}) /go/src/k8s.io/autoscaler/cluster-autoscaler/main.go:617 +0x25 created by k8s.io/client-go/tools/leaderelection.(*LeaderElector).Run /go/src/k8s.io/autoscaler/cluster-autoscaler/vendor/k8s.io/client-go/tools/leaderelection/leaderelection.go:213 +0x105 {code} Expected results:{code:none} expect the machineset to scale up {code} Additional info: i think the e2e test that exercises this is only running on periodic jobs and as such we missed this error in OCPBUGS-27509 . 
[this search shows some failed results | https://search.dptools.openshift.org/?search=It+scales+from%2Fto+zero&maxAge=48h&context=1&type=bug%2Bissue%2Bjunit&name=&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job] Status: CLOSED Comment 24489693 by Zhaohua Sun at 2024-04-07T05:57:14.459+0000 Verified. clusterversion 4.14.0-0.nightly-arm64-2024-04-04-113917, autoscaler pod will not panic, machineset can scale up. {code:java} | |||
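The autoscaler panic above (`interface conversion: interface {} is nil, not string`) is the classic failure of an unchecked type assertion on an unstructured map field: a taint's optional `value` key is absent, so the assertion sees a nil interface. A minimal sketch of the comma-ok pattern that avoids it (the helper name is illustrative; the real fix lives in `unstructuredToTaint`):

```go
package main

import "fmt"

// taintValue reads the optional "value" field of an unstructured
// taint. An unchecked m["value"].(string) panics when the field is
// missing or nil; the comma-ok form degrades to the zero value.
func taintValue(m map[string]interface{}) string {
	v, ok := m["value"].(string) // ok is false when "value" is absent or nil
	if !ok {
		return "" // taints may legally omit "value"
	}
	return v
}

func main() {
	withValue := map[string]interface{}{"key": "dedicated", "value": "gpu", "effect": "NoSchedule"}
	noValue := map[string]interface{}{"key": "dedicated", "effect": "NoSchedule"}
	fmt.Printf("with value: %q\n", taintValue(withValue)) // "gpu"
	fmt.Printf("no value:   %q\n", taintValue(noValue))   // ""
}
```

With the comma-ok form, scale-from-zero template construction proceeds even when the MachineSet taint omits `value`, which is exactly the reproduction case in the steps above.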
#OCPBUGS-28628 | issue | 2 weeks ago | [4.15] Panic: send on closed channel Verified |
Issue 15755249: [4.15] Panic: send on closed channel Description: This is a clone of issue OCPBUGS-27959. The following is the description of the original issue: --- In a CI run of etcd-operator-e2e I've found the following panic in the operator logs: {code:java} E0125 11:04:58.158222 1 health.go:135] health check for member (ip-10-0-85-12.us-west-2.compute.internal) failed: err(context deadline exceeded) panic: send on closed channel goroutine 15608 [running]: github.com/openshift/cluster-etcd-operator/pkg/etcdcli.getMemberHealth.func1() github.com/openshift/cluster-etcd-operator/pkg/etcdcli/health.go:58 +0xd2 created by github.com/openshift/cluster-etcd-operator/pkg/etcdcli.getMemberHealth github.com/openshift/cluster-etcd-operator/pkg/etcdcli/health.go:54 +0x2a5 {code} which unfortunately is an incomplete log file. The operator recovered itself by restarting, we should fix the panic nonetheless. Job run for reference: https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_cluster-etcd-operator/1186/pull-ci-openshift-cluster-etcd-operator-master-e2e-operator/1750466468031500288 Status: Verified Comment 24255245 by Thomas Jungblut at 2024-02-29T09:22:35.675+0000 also moving to verified, this is a rare panic that's difficult to reproduce for Ge Comment 24260537 by UNKNOWN at 2024-02-29T22:33:35.699+0000 | |||
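The `panic: send on closed channel` in `getMemberHealth` is the pattern where a collector closes a results channel (e.g. on timeout) while a slow health-check goroutine is still about to send. A minimal sketch of the conventional fix, under the assumption (names are illustrative, not the operator's actual code) that only the sender side closes, coordinated by a WaitGroup, with the channel buffered so late senders never block:

```go
package main

import (
	"fmt"
	"sync"
)

// checkMembers fans out one goroutine per member and returns a
// channel of results. The channel is buffered to len(members) so a
// slow sender can never block, and it is closed only after every
// send has completed -- never by the receiver on timeout.
func checkMembers(members []string) <-chan string {
	results := make(chan string, len(members))
	var wg sync.WaitGroup
	for _, m := range members {
		wg.Add(1)
		go func(member string) {
			defer wg.Done()
			results <- member + ": healthy" // safe: close happens after wg.Wait
		}(m)
	}
	go func() {
		wg.Wait()
		close(results) // sender-side close, after all sends are done
	}()
	return results
}

func main() {
	for r := range checkMembers([]string{"member-a", "member-b"}) {
		fmt.Println(r)
	}
}
```

A receiver that gives up early can simply stop reading; the buffered channel and sender-side close guarantee no goroutine ever sends on a closed channel.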
#OCPBUGS-31464 | issue | 2 weeks ago | Autoscaler should scale from zero when taints do not have a "value" field CLOSED |
Issue 15905084: Autoscaler should scale from zero when taints do not have a "value" field Description: This is a clone of issue OCPBUGS-31421. The following is the description of the original issue: --- Description of problem:{code:none} When scaling from zero replicas, the cluster autoscaler can panic if there are taints on the machineset with no "value" field defined. {code} Version-Release number of selected component (if applicable):{code:none} 4.16/master {code} How reproducible:{code:none} always {code} Steps to Reproduce:{code:none} 1. create a machineset with a taint that has no value field and 0 replicas 2. enable the cluster autoscaler 3. force a workload to scale the tainted machineset {code} Actual results:{code:none} a panic like this is observed I0325 15:36:38.314276 1 clusterapi_provider.go:68] discovered node group: MachineSet/openshift-machine-api/k8hmbsmz-c2483-9dnddr4sjc (min: 0, max: 2, replicas: 0) panic: interface conversion: interface {} is nil, not string goroutine 79 [running]: k8s.io/autoscaler/cluster-autoscaler/cloudprovider/clusterapi.unstructuredToTaint(...) 
/go/src/k8s.io/autoscaler/cluster-autoscaler/cloudprovider/clusterapi/clusterapi_unstructured.go:246 k8s.io/autoscaler/cluster-autoscaler/cloudprovider/clusterapi.unstructuredScalableResource.Taints({0xc000103d40?, 0xc000121360?, 0xc002386f98?, 0x2?}) /go/src/k8s.io/autoscaler/cluster-autoscaler/cloudprovider/clusterapi/clusterapi_unstructured.go:214 +0x8a5 k8s.io/autoscaler/cluster-autoscaler/cloudprovider/clusterapi.(*nodegroup).TemplateNodeInfo(0xc002675930) /go/src/k8s.io/autoscaler/cluster-autoscaler/cloudprovider/clusterapi/clusterapi_nodegroup.go:266 +0x2ea k8s.io/autoscaler/cluster-autoscaler/core/utils.GetNodeInfoFromTemplate({0x276b230, 0xc002675930}, {0xc001bf2c00, 0x10, 0x10}, {0xc0023ffe60?, 0xc0023ffe90?}) /go/src/k8s.io/autoscaler/cluster-autoscaler/core/utils/utils.go:41 +0x9d k8s.io/autoscaler/cluster-autoscaler/processors/nodeinfosprovider.(*MixedTemplateNodeInfoProvider).Process(0xc00084f848, 0xc0023f7680, {0xc001dcdb00, 0x3, 0x0?}, {0xc001bf2c00, 0x10, 0x10}, {0xc0023ffe60, 0xc0023ffe90}, ...) /go/src/k8s.io/autoscaler/cluster-autoscaler/processors/nodeinfosprovider/mixed_nodeinfos_processor.go:155 +0x599 k8s.io/autoscaler/cluster-autoscaler/core.(*StaticAutoscaler).RunOnce(0xc000617550, {0x4?, 0x0?, 0x3a56f60?}) /go/src/k8s.io/autoscaler/cluster-autoscaler/core/static_autoscaler.go:352 +0xcaa main.run(0x0?, {0x2761b48, 0xc0004c04e0}) /go/src/k8s.io/autoscaler/cluster-autoscaler/main.go:529 +0x2cd main.main.func2({0x0?, 0x0?}) /go/src/k8s.io/autoscaler/cluster-autoscaler/main.go:617 +0x25 created by k8s.io/client-go/tools/leaderelection.(*LeaderElector).Run /go/src/k8s.io/autoscaler/cluster-autoscaler/vendor/k8s.io/client-go/tools/leaderelection/leaderelection.go:213 +0x105 {code} Expected results:{code:none} expect the machineset to scale up {code} Additional info: i think the e2e test that exercises this is only running on periodic jobs and as such we missed this error in OCPBUGS-27509 . 
[this search shows some failed results | https://search.dptools.openshift.org/?search=It+scales+from%2Fto+zero&maxAge=48h&context=1&type=bug%2Bissue%2Bjunit&name=&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job] Status: CLOSED Comment 24464960 by Zhaohua Sun at 2024-04-03T06:33:01.775+0000 Verified. clusterversion 4.15.0-0.nightly-2024-04-02-181803, autoscaler pod will not panic, machineset can scale up. {code:java} | |||
#OCPBUGS-20085 | issue | 2 months ago | [IBMCloud] Unhandled response during destroy disks Verified |
Issue 15538566: [IBMCloud] Unhandled response during destroy disks Description: Description of problem: {code:none} During the destroy cluster operation, unexpected results from the IBM Cloud API calls for Disks can result in panics when response data (or responses) are missing, resulting in unexpected failures during destroy.{code} Version-Release number of selected component (if applicable): {code:none} 4.15{code} How reproducible: {code:none} Unknown, dependent on IBM Cloud API responses{code} Steps to Reproduce: {code:none} 1. Successfully create IPI cluster on IBM Cloud 2. Attempt to cleanup (destroy) the cluster {code} Actual results: {code:none} Golang panic attempting to parse a HTTP response that is missing or lacking data. level=info msg=Deleted instance "ci-op-97fkzvv2-e6ed7-5n5zg-master-0" E0918 18:03:44.787843 33 runtime.go:79] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference) goroutine 228 [running]: k8s.io/apimachinery/pkg/util/runtime.logPanic({0x6a3d760?, 0x274b5790}) /go/src/github.com/openshift/installer/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:75 +0x99 k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0xfffffffe?}) /go/src/github.com/openshift/installer/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:49 +0x75 panic({0x6a3d760, 0x274b5790}) /usr/lib/golang/src/runtime/panic.go:884 +0x213 github.com/openshift/installer/pkg/destroy/ibmcloud.(*ClusterUninstaller).waitForDiskDeletion.func1() /go/src/github.com/openshift/installer/pkg/destroy/ibmcloud/disk.go:84 +0x12a github.com/openshift/installer/pkg/destroy/ibmcloud.(*ClusterUninstaller).Retry(0xc000791ce0, 0xc000573700) /go/src/github.com/openshift/installer/pkg/destroy/ibmcloud/ibmcloud.go:99 +0x73 github.com/openshift/installer/pkg/destroy/ibmcloud.(*ClusterUninstaller).waitForDiskDeletion(0xc000791ce0, {{0xc00160c060, 0x29}, {0xc00160c090, 0x28}, {0xc0016141f4, 0x9}, 
{0x82b9f0d, 0x4}, {0xc00160c060, ...}}) /go/src/github.com/openshift/installer/pkg/destroy/ibmcloud/disk.go:78 +0x14f github.com/openshift/installer/pkg/destroy/ibmcloud.(*ClusterUninstaller).destroyDisks(0xc000791ce0) /go/src/github.com/openshift/installer/pkg/destroy/ibmcloud/disk.go:118 +0x485 github.com/openshift/installer/pkg/destroy/ibmcloud.(*ClusterUninstaller).executeStageFunction.func1() /go/src/github.com/openshift/installer/pkg/destroy/ibmcloud/ibmcloud.go:201 +0x3f k8s.io/apimachinery/pkg/util/wait.ConditionFunc.WithContext.func1({0x7f7801e503c8, 0x18}) /go/src/github.com/openshift/installer/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:109 +0x1b k8s.io/apimachinery/pkg/util/wait.runConditionWithCrashProtectionWithContext({0x227a2f78?, 0xc00013c000?}, 0xc000a9b690?) /go/src/github.com/openshift/installer/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:154 +0x57 k8s.io/apimachinery/pkg/util/wait.poll({0x227a2f78, 0xc00013c000}, 0xd0?, 0x146fea5?, 0x7f7801e503c8?) /go/src/github.com/openshift/installer/vendor/k8s.io/apimachinery/pkg/util/wait/poll.go:245 +0x38 k8s.io/apimachinery/pkg/util/wait.PollImmediateInfiniteWithContext({0x227a2f78, 0xc00013c000}, 0x4136e7?, 0x28?) /go/src/github.com/openshift/installer/vendor/k8s.io/apimachinery/pkg/util/wait/poll.go:229 +0x49 k8s.io/apimachinery/pkg/util/wait.PollImmediateInfinite(0x100000000000000?, 0x806f00?) /go/src/github.com/openshift/installer/vendor/k8s.io/apimachinery/pkg/util/wait/poll.go:214 +0x46 github.com/openshift/installer/pkg/destroy/ibmcloud.(*ClusterUninstaller).executeStageFunction(0xc000791ce0, {{0x82bb9a3?, 0xc000a9b7d0?}, 0xc000111de0?}, 0x840366?, 0xc00054e900?) 
/go/src/github.com/openshift/installer/pkg/destroy/ibmcloud/ibmcloud.go:198 +0x108 created by github.com/openshift/installer/pkg/destroy/ibmcloud.(*ClusterUninstaller).destroyCluster /go/src/github.com/openshift/installer/pkg/destroy/ibmcloud/ibmcloud.go:172 +0xa87 panic: runtime error: invalid memory address or nil pointer dereference [recovered] panic: runtime error: invalid memory address or nil pointer dereference{code} Expected results: {code:none} Destroy IBM Cloud Disks during cluster destroy, or provide a useful error message to follow up on.{code} Additional info: {code:none} The ability to reproduce is relatively low, as it requires the IBM Cloud APIs to return specific data (or lack thereof); it is currently unknown why the HTTP response and/or data is missing. IBM Cloud already has a PR to attempt to mitigate this issue, like done with other destroy resource calls. Potential followup for additional resources as necessary. https://github.com/openshift/installer/pull/7515{code} Status: Verified level=info msg=UNEXPECTED RESULT, Re-attempting execution .., attempt=9, retry-gap=10, max-retry-Attempts=30, stopRetry=false, error=<nil> E1006 23:21:22.092062 37 runtime.go:79] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference) goroutine 188 [running]: k8s.io/apimachinery/pkg/util/runtime.logPanic({0x6309820?, 0x229b65d0}) /go/src/github.com/openshift/installer/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:75 +0x99 /go/src/github.com/openshift/installer/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:49 +0x75 panic({0x6309820, 0x229b65d0}) /usr/lib/golang/src/runtime/panic.go:884 +0x212 github.com/openshift/installer/pkg/destroy/ibmcloud.(*ClusterUninstaller).waitForDiskDeletion.func1() /go/src/github.com/openshift/installer/pkg/destroy/ibmcloud/ibmcloud.go:172 +0xba5 panic: runtime error: invalid memory address or nil pointer dereference [recovered] 
panic: runtime error: invalid memory address or nil pointer dereference [signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x537bdea] | |||
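The installer panic in `waitForDiskDeletion` comes from dereferencing a response object that the SDK returned as nil. A minimal sketch of the guard the destroy retry loop needs (the types and function names below are hypothetical stand-ins, not the IBM Cloud SDK API): check both the error and the response before touching any response field.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
)

// diskResponse is a hypothetical stand-in for an SDK response; the
// real IBM Cloud SDK types differ.
type diskResponse struct {
	StatusCode int
}

// getDisk simulates an SDK call that, in rare edge cases, yields
// neither a response nor an error.
func getDisk(simulateMissing bool) (*diskResponse, error) {
	if simulateMissing {
		return nil, nil // the edge case behind the panic above
	}
	return &diskResponse{StatusCode: http.StatusNotFound}, nil
}

// diskGone guards against the nil response instead of panicking,
// returning an error the caller's Retry loop can act on.
func diskGone(simulateMissing bool) (bool, error) {
	resp, err := getDisk(simulateMissing)
	if err != nil {
		return false, err
	}
	if resp == nil {
		return false, errors.New("no response received for disk lookup; retrying")
	}
	return resp.StatusCode == http.StatusNotFound, nil
}

func main() {
	if _, err := diskGone(true); err != nil {
		fmt.Println("handled:", err)
	}
	gone, _ := diskGone(false)
	fmt.Println("disk gone:", gone)
}
```

Returning an explicit error here lets the existing `Retry`/`executeStageFunction` machinery re-attempt the lookup, which matches the "UNEXPECTED RESULT, Re-attempting execution" behavior logged after the mitigation PR.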
periodic-ci-openshift-release-master-ci-4.8-e2e-azure-upgrade (all) - 1 runs, 100% failed, 100% of failures match = 100% impact | |||
#1783353790968107008 | junit | 9 days ago | |
Apr 25 06:19:39.418 E ns/openshift-insights pod/insights-operator-7979b9f879-695s8 node/ci-op-w65hfwzc-25656-7xsch-master-0 container/insights-operator reason/ContainerExit code/2 cause/Error 06:17:00.135755 1 configobserver.go:77] Refreshing configuration from cluster pull secret\nI0425 06:17:00.141541 1 configobserver.go:102] Found cloud.openshift.com token\nI0425 06:17:00.141575 1 configobserver.go:120] Refreshing configuration from cluster secret\nI0425 06:17:00.144755 1 configobserver.go:124] Support secret does not exist\nI0425 06:17:00.196732 1 httplog.go:89] "HTTP" verb="GET" URI="/metrics" latency="9.666206ms" userAgent="Prometheus/2.26.1" srcIP="10.128.2.20:49628" resp=200\nI0425 06:17:00.266647 1 status.go:340] The operator is healthy\nI0425 06:17:00.266727 1 status.go:427] No status update necessary, objects are identical\nI0425 06:17:30.194504 1 httplog.go:89] "HTTP" verb="GET" URI="/metrics" latency="9.118025ms" userAgent="Prometheus/2.26.1" srcIP="10.128.2.20:49628" resp=200\nI0425 06:17:40.079765 1 reflector.go:530] k8s.io/apiserver/pkg/server/dynamiccertificates/configmap_cafile_content.go:206: Watch close - *v1.ConfigMap total 5 items received\nI0425 06:18:00.214536 1 httplog.go:89] "HTTP" verb="GET" URI="/metrics" latency="14.612828ms" userAgent="Prometheus/2.26.1" srcIP="10.128.2.20:49628" resp=200\nI0425 06:18:12.083526 1 reflector.go:530] k8s.io/apiserver/pkg/authentication/request/headerrequest/requestheader_controller.go:172: Watch close - *v1.ConfigMap total 8 items received\nI0425 06:18:30.194220 1 httplog.go:89] "HTTP" verb="GET" URI="/metrics" latency="8.440168ms" userAgent="Prometheus/2.26.1" srcIP="10.128.2.20:49628" resp=200\nI0425 06:19:00.206577 1 httplog.go:89] "HTTP" verb="GET" URI="/metrics" latency="20.306844ms" userAgent="Prometheus/2.26.1" srcIP="10.128.2.20:49628" resp=200\nI0425 06:19:00.263793 1 status.go:340] The operator is healthy\nI0425 06:19:00.263876 1 status.go:427] No status update necessary, objects are 
identical\nI0425 06:19:30.199528 1 httplog.go:89] "HTTP" verb="GET" URI="/metrics" latency="12.181635ms" userAgent="Prometheus/2.26.1" srcIP="10.128.2.20:49628" resp=200\n Apr 25 06:19:40.421 E ns/openshift-operator-lifecycle-manager pod/packageserver-795bc7d454-ssgvr node/ci-op-w65hfwzc-25656-7xsch-master-0 container/packageserver reason/ContainerExit code/2 cause/Error 3] Shutting down client-ca::kube-system::extension-apiserver-authentication::client-ca-file\nI0425 06:19:38.612537 1 requestheader_controller.go:183] Shutting down RequestHeaderAuthRequestController\nI0425 06:19:38.612550 1 configmap_cafile_content.go:223] Shutting down client-ca::kube-system::extension-apiserver-authentication::client-ca-file\nI0425 06:19:38.612574 1 requestheader_controller.go:183] Shutting down RequestHeaderAuthRequestController\nI0425 06:19:38.612591 1 configmap_cafile_content.go:223] Shutting down client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file\nI0425 06:19:38.612602 1 configmap_cafile_content.go:223] Shutting down client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file\nI0425 06:19:38.612679 1 secure_serving.go:241] Stopped listening on [::]:5443\nI0425 06:19:38.612707 1 tlsconfig.go:255] Shutting down DynamicServingCertificateController\nI0425 06:19:38.612746 1 dynamic_serving_content.go:145] Shutting down serving-cert::apiserver.local.config/certificates/apiserver.crt::apiserver.local.config/certificates/apiserver.key\npanic: send on closed channel\n\ngoroutine 44 [running]:\ngithub.com/operator-framework/operator-lifecycle-manager/pkg/lib/queueinformer.(*operator).processNextWorkItem(0xc0000ea8f0, 0x21371f0, 0xc000420000, 0xc00058ede0, 0x0)\n /build/vendor/github.com/operator-framework/operator-lifecycle-manager/pkg/lib/queueinformer/queueinformer_operator.go:297 +0x6fb\ngithub.com/operator-framework/operator-lifecycle-manager/pkg/lib/queueinformer.(*operator).worker(0xc0000ea8f0, 0x21371f0, 
0xc000420000, 0xc00058ede0)\n /build/vendor/github.com/operator-framework/operator-lifecycle-manager/pkg/lib/queueinformer/queueinformer_operator.go:231 +0x49\ncreated by github.com/operator-framework/operator-lifecycle-manager/pkg/lib/queueinformer.(*operator).start\n /build/vendor/github.com/operator-framework/operator-lifecycle-manager/pkg/lib/queueinformer/queueinformer_operator.go:221 +0x446\n Apr 25 06:19:41.463 E ns/openshift-cluster-storage-operator pod/cluster-storage-operator-58b4cf58d6-7kqbk node/ci-op-w65hfwzc-25656-7xsch-master-0 container/cluster-storage-operator reason/ContainerExit code/1 cause/Error 5:56:47.400444 1 event.go:282] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"openshift-cluster-storage-operator", Name:"cluster-storage-operator", UID:"17af2172-b155-4842-9253-79cd513b9d1c", APIVersion:"apps/v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'OperatorStatusChanged' Status for clusteroperator/storage changed: status.versions changed from [{"operator" "4.8.0-0.ci-2023-07-15-003738"}] to [{"operator" "4.8.0-0.ci-2023-07-28-002940"}]\nI0425 05:56:47.476456 1 base_controller.go:72] Caches are synced for SnapshotCRDController \nI0425 05:56:47.476501 1 base_controller.go:109] Starting #1 worker of SnapshotCRDController controller ...\nI0425 06:06:47.285287 1 controller.go:174] Existing StorageClass managed-premium found, reconciling\nI0425 06:16:47.285833 1 controller.go:174] Existing StorageClass managed-premium found, reconciling\nI0425 06:16:47.295403 1 controller.go:174] Existing StorageClass managed-premium found, reconciling\nI0425 06:16:47.298896 1 controller.go:174] Existing StorageClass managed-premium found, reconciling\nI0425 06:19:40.605510 1 cmd.go:88] Received SIGTERM or SIGINT signal, shutting down controller.\nI0425 06:19:40.605636 1 base_controller.go:166] Shutting down SnapshotCRDController ...\nI0425 06:19:40.605661 1 base_controller.go:166] Shutting down CSIDriverStarter ...\nI0425 06:19:40.605674 1 
base_controller.go:166] Shutting down ConfigObserver ...\nI0425 06:19:40.605743 1 base_controller.go:166] Shutting down ManagementStateController ...\nI0425 06:19:40.605745 1 reflector.go:225] Stopping reflector *v1.Role (10m0s) from k8s.io/client-go@v12.0.0+incompatible/tools/cache/reflector.go:167\nW0425 06:19:40.605771 1 builder.go:99] graceful termination failed, controllers failed with error: stopped\nI0425 06:19:40.605773 1 base_controller.go:166] Shutting down DefaultStorageClassController ...\nI0425 06:19:40.605799 1 base_controller.go:166] Shutting down StatusSyncer_storage ...\n |
Found in 100.00% of runs (100.00% of failures) across 1 total runs and 1 jobs (100.00% failed) in 96ms