#1842002 bug, 2 years ago: KubePodCrashLooping kube-controller-manager cluster-policy-controller: 6443: connect: connection refused [RELEASE_PENDING]
$ curl -s https://storage.googleapis.com/origin-ci-test/logs/release-openshift-origin-installer-e2e-gcp-4.5/2428/artifacts/e2e-gcp/events.json | jq -r '.items[] | select(.metadata.namespace == "openshift-kube-apiserver") | .firstTimestamp + " " + .lastTimestamp + " " + .message' | sort
...
2020-05-30T01:10:53Z 2020-05-30T01:10:53Z All pending requests processed
2020-05-30T01:10:53Z 2020-05-30T01:10:53Z Server has stopped listening
2020-05-30T01:10:53Z 2020-05-30T01:10:53Z The minimal shutdown duration of 1m10s finished
...
2020-05-30T01:11:58Z 2020-05-30T01:11:58Z Created container kube-apiserver-cert-regeneration-controller
2020-05-30T01:11:58Z 2020-05-30T01:11:58Z Created container kube-apiserver-cert-syncer
2020-05-30T01:11:58Z 2020-05-30T01:11:58Z Started container kube-apiserver
...
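To pull out just the graceful-termination markers, the same events.json can also be filtered on the message text (a sketch based on the query above, made defensive against null timestamps and missing reasons):
$ curl -s https://storage.googleapis.com/origin-ci-test/logs/release-openshift-origin-installer-e2e-gcp-4.5/2428/artifacts/e2e-gcp/events.json | jq -r '.items[] | select(.metadata.namespace == "openshift-kube-apiserver") | select((.message // "") | test("minimal shutdown|stopped listening|pending requests")) | (.firstTimestamp // .metadata.creationTimestamp // "") + " " + (.reason // "") + " " + .message' | sort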
#1934628 bug, 17 months ago: API server stopped reporting healthy during upgrade to 4.7.0 [ASSIGNED]
during that time the API server was restarted by kubelet due to a failed liveness probe
14:18:00	openshift-kube-apiserver	kubelet	kube-apiserver-ip-10-0-159-123.ec2.internal	Killing	Container kube-apiserver failed liveness probe, will be restarted
14:19:17	openshift-kube-apiserver	apiserver	kube-apiserver-ip-10-0-159-123.ec2.internal	TerminationMinimalShutdownDurationFinished	The minimal shutdown duration of 1m10s finished
moving to etcd team to investigate why etcd was unavailable during that time
Comment 15200626 by mfojtik@redhat.com at 2021-06-17T18:29:50Z
The LifecycleStale keyword was removed because the bug got commented on recently.
#1943804 bug, 18 months ago: API server on AWS takes disruption between 70s and 110s after pod begins termination via external LB [RELEASE_PENDING]
    "name": "kube-apiserver-ip-10-0-131-183.ec2.internal",
    "namespace": "openshift-kube-apiserver"
  },
  "kind": "Event",
  "lastTimestamp": null,
  "message": "The minimal shutdown duration of 1m10s finished",
  "metadata": {
    "creationTimestamp": "2021-03-29T12:18:04Z",
    "name": "kube-apiserver-ip-10-0-131-183.ec2.internal.1670cf61b0f72d2d",
    "namespace": "openshift-kube-apiserver",
    "resourceVersion": "89139",
#1921157 bug, 23 months ago: [sig-api-machinery] Kubernetes APIs remain available for new connections [ASSIGNED]
T2: At 06:45:58: systemd-shutdown was sending SIGTERM to remaining processes...
T3: At 06:45:58: kube-apiserver-ci-op-z52cbzhi-6d7cd-pz2jw-master-0: Received signal to terminate, becoming unready, but keeping serving (TerminationStart event)
T4: At 06:47:08 kube-apiserver-ci-op-z52cbzhi-6d7cd-pz2jw-master-0: The minimal shutdown duration of 1m10s finished (TerminationMinimalShutdownDurationFinished event)
T5: At 06:47:08 kube-apiserver-ci-op-z52cbzhi-6d7cd-pz2jw-master-0: Server has stopped listening (TerminationStoppedServing event)
T5 is the last event reported from that API server. At T5 the server may still wait up to 60s for in-flight requests to complete, and then it fires the TerminationGracefulTerminationFinished event.
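A quick check that the T3 -> T4 gap matches the configured 1m10s shutdown delay (only the time-of-day delta matters, any date works; GNU date assumed):
$ echo $(( $(date -ud '1970-01-01T06:47:08Z' +%s) - $(date -ud '1970-01-01T06:45:58Z' +%s) ))s
70s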
#1932097 bug, 18 months ago: Apiserver liveness probe is marking it as unhealthy during normal shutdown [RELEASE_PENDING]
Feb 23 20:18:04.212 - 1s    E kube-apiserver-new-connection kube-apiserver-new-connection is not responding to GET requests
Feb 23 20:18:05.318 I kube-apiserver-new-connection kube-apiserver-new-connection started responding to GET requests
Deeper detail from the node log shows that one of the instances finishes shutting down its connections right as this error happens.
Feb 23 20:18:02.505 I ns/openshift-kube-apiserver pod/kube-apiserver-ip-10-0-203-7.us-east-2.compute.internal node/ip-10-0-203-7 reason/TerminationMinimalShutdownDurationFinished The minimal shutdown duration of 1m10s finished
Feb 23 20:18:02.509 I ns/openshift-kube-apiserver pod/kube-apiserver-ip-10-0-203-7.us-east-2.compute.internal node/ip-10-0-203-7 reason/TerminationStoppedServing Server has stopped listening
Feb 23 20:18:03.148 I ns/openshift-console-operator deployment/console-operator reason/OperatorStatusChanged Status for clusteroperator/console changed: Degraded message changed from "CustomRouteSyncDegraded: the server is currently unable to handle the request (delete routes.route.openshift.io console-custom)\nSyncLoopRefreshDegraded: the server is currently unable to handle the request (get routes.route.openshift.io console)" to "SyncLoopRefreshDegraded: the server is currently unable to handle the request (get routes.route.openshift.io console)" (2 times)
Feb 23 20:18:03.880 E kube-apiserver-reused-connection kube-apiserver-reused-connection started failing: Get "https://api.ci-op-ivyvzgrr-0b477.origin-ci-int-aws.dev.rhcloud.com:6443/api/v1/namespaces/default": dial tcp 3.21.250.132:6443: connect: connection refused
This looks like the load balancer did not remove the kube-apiserver from rotation and kept sending it traffic, and the connection did not shut down cleanly - did something regress in how the apiserver hands off traffic during termination?
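During the shutdown delay (between TerminationStart and TerminationStoppedServing) the apiserver is expected to fail its readiness check so the load balancer drops it before the listener goes away. One way to watch that on the terminating instance (a sketch; the node IP and port are taken from the lines above, and /readyz is assumed to be anonymously reachable as it is on a default install):
$ while true; do echo -n "$(date -u +%T) "; curl -sk -o /dev/null -w '%{http_code}\n' https://10.0.203.7:6443/readyz; sleep 1; done
If the flip to a non-200 readyz does not happen well before TerminationStoppedServing, the LB has no window in which to drain the instance cleanly.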
#1995804 bug, 15 months ago: Rewrite carry "UPSTREAM: <carry>: create termination events" to lifecycleEvents [RELEASE_PENDING]
Use the new lifecycle event names for the events that we generate when an apiserver is gracefully terminating.
Comment 15454963 by kewang@redhat.com at 2021-09-03T09:36:37Z
$ w3m -dump -cols 200 'https://search.ci.openshift.org/?search=The+minimal+shutdown+duration&maxAge=168h&context=5&type=build-log&name=4%5C.9&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job' | grep -E 'kube-system node\/apiserver|openshift-kube-apiserver|openshift-apiserver' > test.log
$ grep 'The minimal shutdown duration of' test.log | head -2
Sep 03 05:22:37.000 I ns/openshift-kube-apiserver pod/kube-apiserver-ip-10-0-163-71.us-west-1.compute.internal node/ip-10-0-163-71 reason/AfterShutdownDelayDuration The minimal shutdown duration of 3m30s finished
Sep 03 05:22:37.000 I ns/openshift-kube-apiserver pod/kube-apiserver-ip-10-0-163-71.us-west-1.compute.internal node/ip-10-0-163-71 reason/AfterShutdownDelayDuration The minimal shutdown duration of 3m30s finished
$ grep 'Received signal to terminate' test.log | head -2
Sep 03 08:49:11.000 I ns/default namespace/kube-system node/apiserver-75cf4778cb-9zk42 reason/TerminationStart Received signal to terminate, becoming unready, but keeping serving
Sep 03 08:53:40.000 I ns/default namespace/kube-system node/apiserver-75cf4778cb-c8429 reason/TerminationStart Received signal to terminate, becoming unready, but keeping serving
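A quick tally of which lifecycle reasons dominate in the same capture (a sketch over the test.log built above):
$ grep -oE 'reason/[A-Za-z]+' test.log | sort | uniq -c | sort -rn | head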
#1955333 bug, 11 months ago: "Kubernetes APIs remain available for new connections" and similar failing on 4.8 Azure updates [NEW]
  2021-05-01T03:59:42Z 1 kube-apiserver-ip-10-0-189-59.ec2.internal Killing: Stopping container kube-apiserver-check-endpoints
  2021-05-01T03:59:42Z 1 kube-apiserver-ip-10-0-189-59.ec2.internal Killing: Stopping container kube-apiserver-insecure-readyz
  2021-05-01T03:59:43Z null kube-apiserver-ip-10-0-189-59.ec2.internal TerminationPreShutdownHooksFinished: All pre-shutdown hooks have been finished
  2021-05-01T03:59:43Z null kube-apiserver-ip-10-0-189-59.ec2.internal TerminationStart: Received signal to terminate, becoming unready, but keeping serving
  2021-05-01T03:59:49Z 1 cert-regeneration-controller-lock LeaderElection: ip-10-0-239-74_02f2b687-97f4-44c4-9516-e3fb364deb85 became leader
  2021-05-01T04:00:53Z null kube-apiserver-ip-10-0-189-59.ec2.internal TerminationMinimalShutdownDurationFinished: The minimal shutdown duration of 1m10s finished
  2021-05-01T04:00:53Z null kube-apiserver-ip-10-0-189-59.ec2.internal TerminationStoppedServing: Server has stopped listening
  2021-05-01T04:01:53Z null kube-apiserver-ip-10-0-189-59.ec2.internal TerminationGracefulTerminationFinished: All pending requests processed
  2021-05-01T04:01:55Z 1 kube-apiserver-ip-10-0-189-59.ec2.internal Pulling: Pulling image "registry.ci.openshift.org/ocp/4.8-2021-04-30-212732@sha256:e4c7be2f0e8b1e9ef1ad9161061449ec1bdc6953a58f6d456971ee945a8d3197"
  2021-05-01T04:02:05Z 1 kube-apiserver-ip-10-0-189-59.ec2.internal Created: Created container setup
  2021-05-01T04:02:05Z 1 kube-apiserver-ip-10-0-189-59.ec2.internal Pulled: Container image "registry.ci.openshift.org/ocp/4.8-2021-04-30-212732@sha256:e4c7be2f0e8b1e9ef1ad9161061449ec1bdc6953a58f6d456971ee945a8d3197" already present on machine
That really looks like the kube-apiserver rolling out a new version, and for some reason the graceful LB handoff we need to avoid connection issues is not happening. Unifying the two timelines (the gap is quantified just after this list):
* 03:59:43Z TerminationPreShutdownHooksFinished
* 03:59:43Z TerminationStart: Received signal to terminate, becoming unready, but keeping serving
* 04:00:53Z TerminationMinimalShutdownDurationFinished: The minimal shutdown duration of 1m10s finished
* 04:00:53Z TerminationStoppedServing: Server has stopped listening
* 04:00:58.307Z kube-apiserver-new-connection started failing... connection refused
* 04:00:59.314Z kube-apiserver-new-connection started responding to GET requests
* 04:01:03.307Z kube-apiserver-new-connection started failing... connection refused
* 04:01:04.313Z kube-apiserver-new-connection started responding to GET requests
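Putting a number on the gap between the listener going away and the first refused new connection (timestamps from the timeline above, sub-second part dropped; GNU date assumed):
$ echo $(( $(date -ud '2021-05-01T04:00:58Z' +%s) - $(date -ud '2021-05-01T04:00:53Z' +%s) ))s
5s
So new connections are still being handed to this instance roughly five seconds after it has stopped listening, which matches the missing LB handoff suspected above.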
#1979916 bug, 18 months ago: kube-apiserver constantly receiving signals to terminate after a fresh install, but still keeps serving [ASSIGNED]
kube-apiserver-master-0-2 - Server has stopped listening
kube-apiserver-master-0-2 - The minimal shutdown duration of 1m10s finished
redhat-operators-7p4nb - Stopping container registry-server
Successfully pulled image "registry.redhat.io/redhat/redhat-operator-index:v4.8" in 3.09180991s
periodic-ci-openshift-release-master-ci-4.9-e2e-gcp-upgrade (all) - 13 runs, 54% failed, 29% of failures match = 15% impact
#1618320365644156928 junit, 5 days ago
Jan 25 19:51:46.123 E ns/openshift-monitoring pod/alertmanager-main-2 node/ci-op-6khp7b26-875d2-fzfwl-worker-b-94n99 container/config-reloader reason/ContainerExit code/2 cause/Error level=info ts=2023-01-25T19:13:08.888053709Z caller=main.go:148 msg="Starting prometheus-config-reloader" version="(version=0.49.0, branch=release-4.9, revision=170b0686e)"\nlevel=info ts=2023-01-25T19:13:08.888231472Z caller=main.go:149 build_context="(go=go1.16.12, user=root, date=20220505-09:02:32)"\nlevel=info ts=2023-01-25T19:13:08.88839708Z caller=main.go:183 msg="Starting web server for metrics" listen=localhost:8080\nlevel=info ts=2023-01-25T19:13:08.888938841Z caller=reloader.go:219 msg="started watching config file and directories for changes" cfg= out= dirs=/etc/alertmanager/config,/etc/alertmanager/secrets/alertmanager-main-tls,/etc/alertmanager/secrets/alertmanager-main-proxy,/etc/alertmanager/secrets/alertmanager-kube-rbac-proxy\nlevel=info ts=2023-01-25T19:13:10.167520297Z caller=reloader.go:355 msg="Reload triggered" cfg_in= cfg_out= watched_dirs="/etc/alertmanager/config, /etc/alertmanager/secrets/alertmanager-main-tls, /etc/alertmanager/secrets/alertmanager-main-proxy, /etc/alertmanager/secrets/alertmanager-kube-rbac-proxy"\n
Jan 25 19:51:47.407 E ns/openshift-monitoring pod/thanos-querier-767fbffc58-nnd25 node/ci-op-6khp7b26-875d2-fzfwl-worker-c-4wsjk container/oauth-proxy reason/ContainerExit code/2 cause/Error 2023/01/25 19:13:49 provider.go:128: Defaulting client-id to system:serviceaccount:openshift-monitoring:thanos-querier\n2023/01/25 19:13:49 provider.go:133: Defaulting client-secret to service account token /var/run/secrets/kubernetes.io/serviceaccount/token\n2023/01/25 19:13:49 provider.go:351: Delegation of authentication and authorization to OpenShift is enabled for bearer tokens and client certificates.\n2023/01/25 19:13:49 oauthproxy.go:203: mapping path "/" => upstream "http://localhost:9090/"\n2023/01/25 19:13:49 oauthproxy.go:224: compiled skip-auth-regex => "^/-/(healthy|ready)$"\n2023/01/25 19:13:49 oauthproxy.go:230: OAuthProxy configured for  Client ID: system:serviceaccount:openshift-monitoring:thanos-querier\n2023/01/25 19:13:49 oauthproxy.go:240: Cookie settings: name:_oauth_proxy secure(https):true httponly:true expiry:168h0m0s domain:<default> samesite: refresh:disabled\n2023/01/25 19:13:49 main.go:156: using htpasswd file /etc/proxy/htpasswd/auth\n2023/01/25 19:13:49 http.go:107: HTTPS: listening on [::]:9091\nI0125 19:13:49.783298       1 dynamic_serving_content.go:130] Starting serving::/etc/tls/private/tls.crt::/etc/tls/private/tls.key\n
Jan 25 19:51:48.204 E ns/openshift-monitoring pod/openshift-state-metrics-5b7768c544-cnbcl node/ci-op-6khp7b26-875d2-fzfwl-worker-b-94n99 container/openshift-state-metrics reason/ContainerExit code/2 cause/Error
Jan 25 19:51:49.144 E ns/openshift-monitoring pod/telemeter-client-99c5ccdcf-hg2j9 node/ci-op-6khp7b26-875d2-fzfwl-worker-b-94n99 container/reload reason/ContainerExit code/2 cause/Error
Jan 25 19:51:49.144 E ns/openshift-monitoring pod/telemeter-client-99c5ccdcf-hg2j9 node/ci-op-6khp7b26-875d2-fzfwl-worker-b-94n99 container/telemeter-client reason/ContainerExit code/2 cause/Error
Jan 25 19:51:50.271 E ns/openshift-console-operator pod/console-operator-8595f7cccb-jmm7p node/ci-op-6khp7b26-875d2-fzfwl-master-2 container/console-operator reason/ContainerExit code/1 cause/Error type: 'Normal' reason: 'TerminationStart' Received signal to terminate, becoming unready, but keeping serving\nI0125 19:51:49.470206       1 genericapiserver.go:709] Event(v1.ObjectReference{Kind:"Pod", Namespace:"openshift-console-operator", Name:"console-operator-8595f7cccb-jmm7p", UID:"", APIVersion:"v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'TerminationMinimalShutdownDurationFinished' The minimal shutdown duration of 0s finished\nI0125 19:51:49.470229       1 genericapiserver.go:362] "[graceful-termination] shutdown event" name="AfterShutdownDelayDuration"\nI0125 19:51:49.470161       1 genericapiserver.go:709] Event(v1.ObjectReference{Kind:"Pod", Namespace:"openshift-console-operator", Name:"console-operator-8595f7cccb-jmm7p", UID:"", APIVersion:"v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'TerminationPreShutdownHooksFinished' All pre-shutdown hooks have been finished\nI0125 19:51:49.470287       1 base_controller.go:167] Shutting down UnsupportedConfigOverridesController ...\nI0125 19:51:49.470376       1 base_controller.go:167] Shutting down ManagementStateController ...\nI0125 19:51:49.470393       1 base_controller.go:167] Shutting down LoggingSyncer ...\nI0125 19:51:49.470410       1 base_controller.go:167] Shutting down StatusSyncer_console ...\nI0125 19:51:49.470425       1 base_controller.go:145] All StatusSyncer_console post start hooks have been terminated\nI0125 19:51:49.470439       1 base_controller.go:167] Shutting down ConsoleServiceController ...\nI0125 19:51:49.470453       1 base_controller.go:167] Shutting down ConsoleDownloadsDeploymentSyncController ...\nI0125 19:51:49.470471       1 base_controller.go:167] Shutting down DownloadsRouteController ...\nI0125 19:51:49.470486       1 base_controller.go:167] Shutting down ResourceSyncController ...\nI0125 19:51:49.470498       1 base_controller.go:167] Shutting down RemoveStaleConditionsController ...\nW0125 19:51:49.470770       1 builder.go:101] graceful termination failed, controllers failed with error: stopped\n
Jan 25 19:51:52.151 E ns/openshift-monitoring pod/kube-state-metrics-6855864cc7-97mb5 node/ci-op-6khp7b26-875d2-fzfwl-worker-b-94n99 container/kube-state-metrics reason/ContainerExit code/2 cause/Error
Jan 25 19:51:52.581 E ns/openshift-ingress-canary pod/ingress-canary-ghz9v node/ci-op-6khp7b26-875d2-fzfwl-worker-c-4wsjk container/serve-healthcheck-canary reason/ContainerExit code/2 cause/Error serving on 8888\nserving on 8080\nServing canary healthcheck request\nServing canary healthcheck request\nServing canary healthcheck request\nServing canary healthcheck request\nServing canary healthcheck request\nServing canary healthcheck request\nServing canary healthcheck request\nServing canary healthcheck request\nServing canary healthcheck request\nServing canary healthcheck request\nServing canary healthcheck request\nServing canary healthcheck request\nServing canary healthcheck request\n
Jan 25 19:51:54.935 E ns/openshift-operator-lifecycle-manager pod/package-server-manager-fddf49875-x95jc node/ci-op-6khp7b26-875d2-fzfwl-master-0 container/package-server-manager reason/TerminationStateCleared lastState.terminated was cleared on a pod (bug https://bugzilla.redhat.com/show_bug.cgi?id=1933760 or similar)
Jan 25 19:51:55.037 E ns/openshift-monitoring pod/node-exporter-sqsss node/ci-op-6khp7b26-875d2-fzfwl-master-0 container/node-exporter reason/ContainerExit code/143 cause/Error 5T19:05:10.104Z caller=node_exporter.go:113 collector=meminfo\nlevel=info ts=2023-01-25T19:05:10.104Z caller=node_exporter.go:113 collector=netclass\nlevel=info ts=2023-01-25T19:05:10.104Z caller=node_exporter.go:113 collector=netdev\nlevel=info ts=2023-01-25T19:05:10.104Z caller=node_exporter.go:113 collector=netstat\nlevel=info ts=2023-01-25T19:05:10.104Z caller=node_exporter.go:113 collector=nfs\nlevel=info ts=2023-01-25T19:05:10.104Z caller=node_exporter.go:113 collector=nfsd\nlevel=info ts=2023-01-25T19:05:10.104Z caller=node_exporter.go:113 collector=powersupplyclass\nlevel=info ts=2023-01-25T19:05:10.104Z caller=node_exporter.go:113 collector=pressure\nlevel=info ts=2023-01-25T19:05:10.104Z caller=node_exporter.go:113 collector=rapl\nlevel=info ts=2023-01-25T19:05:10.104Z caller=node_exporter.go:113 collector=schedstat\nlevel=info ts=2023-01-25T19:05:10.104Z caller=node_exporter.go:113 collector=sockstat\nlevel=info ts=2023-01-25T19:05:10.104Z caller=node_exporter.go:113 collector=softnet\nlevel=info ts=2023-01-25T19:05:10.104Z caller=node_exporter.go:113 collector=stat\nlevel=info ts=2023-01-25T19:05:10.104Z caller=node_exporter.go:113 collector=textfile\nlevel=info ts=2023-01-25T19:05:10.104Z caller=node_exporter.go:113 collector=thermal_zone\nlevel=info ts=2023-01-25T19:05:10.104Z caller=node_exporter.go:113 collector=time\nlevel=info ts=2023-01-25T19:05:10.104Z caller=node_exporter.go:113 collector=timex\nlevel=info ts=2023-01-25T19:05:10.104Z caller=node_exporter.go:113 collector=udp_queues\nlevel=info ts=2023-01-25T19:05:10.104Z caller=node_exporter.go:113 collector=uname\nlevel=info ts=2023-01-25T19:05:10.104Z caller=node_exporter.go:113 collector=vmstat\nlevel=info ts=2023-01-25T19:05:10.104Z caller=node_exporter.go:113 collector=xfs\nlevel=info ts=2023-01-25T19:05:10.104Z caller=node_exporter.go:113 collector=zfs\nlevel=info ts=2023-01-25T19:05:10.105Z caller=node_exporter.go:195 msg="Listening on" address=127.0.0.1:9100\nlevel=info ts=2023-01-25T19:05:10.105Z caller=tls_config.go:191 msg="TLS is disabled." http2=false\n
Jan 25 19:51:55.767 E ns/openshift-monitoring pod/grafana-65bfbf997b-xj9ck node/ci-op-6khp7b26-875d2-fzfwl-worker-a-gcjct container/grafana-proxy reason/ContainerExit code/2 cause/Error
#1615309652814204928 junit, 13 days ago
Jan 17 12:26:22.926 E ns/openshift-machine-api pod/machine-api-controllers-7847d7bdd4-cbcv8 node/ci-op-7zjl13q5-875d2-xwgkj-master-1 container/machineset-controller reason/ContainerExit code/1 cause/Error
Jan 17 12:28:32.544 E ns/openshift-kube-storage-version-migrator pod/migrator-5585cb5f85-jqkh6 node/ci-op-7zjl13q5-875d2-xwgkj-master-1 container/migrator reason/ContainerExit code/2 cause/Error I0117 11:41:15.904792       1 migrator.go:18] FLAG: --add_dir_header="false"\nI0117 11:41:15.904988       1 migrator.go:18] FLAG: --alsologtostderr="true"\nI0117 11:41:15.904995       1 migrator.go:18] FLAG: --kube-api-burst="1000"\nI0117 11:41:15.905001       1 migrator.go:18] FLAG: --kube-api-qps="40"\nI0117 11:41:15.905010       1 migrator.go:18] FLAG: --kubeconfig=""\nI0117 11:41:15.905016       1 migrator.go:18] FLAG: --log_backtrace_at=":0"\nI0117 11:41:15.905026       1 migrator.go:18] FLAG: --log_dir=""\nI0117 11:41:15.905032       1 migrator.go:18] FLAG: --log_file=""\nI0117 11:41:15.905037       1 migrator.go:18] FLAG: --log_file_max_size="1800"\nI0117 11:41:15.905042       1 migrator.go:18] FLAG: --logtostderr="true"\nI0117 11:41:15.905047       1 migrator.go:18] FLAG: --one_output="false"\nI0117 11:41:15.905052       1 migrator.go:18] FLAG: --skip_headers="false"\nI0117 11:41:15.905057       1 migrator.go:18] FLAG: --skip_log_headers="false"\nI0117 11:41:15.905062       1 migrator.go:18] FLAG: --stderrthreshold="2"\nI0117 11:41:15.905068       1 migrator.go:18] FLAG: --v="2"\nI0117 11:41:15.905073       1 migrator.go:18] FLAG: --vmodule=""\nI0117 11:41:15.907500       1 reflector.go:219] Starting reflector *v1alpha1.StorageVersionMigration (0s) from k8s.io/client-go@v0.21.0/tools/cache/reflector.go:167\nI0117 11:41:25.027676       1 kubemigrator.go:110] flowcontrol-flowschema-storage-version-migration: migration running\nI0117 11:41:25.145177       1 kubemigrator.go:127] flowcontrol-flowschema-storage-version-migration: migration succeeded\nI0117 11:41:26.232078       1 kubemigrator.go:110] flowcontrol-prioritylevel-storage-version-migration: migration running\nI0117 11:41:26.515178       1 kubemigrator.go:127] flowcontrol-prioritylevel-storage-version-migration: migration succeeded\nI0117 11:49:03.269988       1 streamwatcher.go:111] Unexpected EOF during watch stream event decoding: unexpected EOF\n
Jan 17 12:29:00.013 E ns/openshift-insights pod/insights-operator-5db4496c45-qtgtk node/ci-op-7zjl13q5-875d2-xwgkj-master-0 container/insights-operator reason/ContainerExit code/2 cause/Error t="Prometheus/2.29.2" audit-ID="d3106079-4839-453c-87d8-571a95b64117" srcIP="10.129.2.11:53314" resp=200\nI0117 12:27:09.183559       1 httplog.go:104] "HTTP" verb="GET" URI="/metrics" latency="6.499899ms" userAgent="Prometheus/2.29.2" audit-ID="95bada40-e2de-4346-95c0-987a9e61d79e" srcIP="10.128.2.15:48556" resp=200\nI0117 12:27:29.296379       1 configobserver.go:77] Refreshing configuration from cluster pull secret\nI0117 12:27:29.303265       1 configobserver.go:102] Found cloud.openshift.com token\nI0117 12:27:29.303313       1 configobserver.go:120] Refreshing configuration from cluster secret\nI0117 12:27:32.177560       1 status.go:354] The operator is healthy\nI0117 12:27:32.177855       1 status.go:441] No status update necessary, objects are identical\nI0117 12:27:36.045799       1 httplog.go:104] "HTTP" verb="GET" URI="/metrics" latency="4.494759ms" userAgent="Prometheus/2.29.2" audit-ID="12f76b47-ce41-4819-8d53-37da6f79a8ad" srcIP="10.129.2.11:53314" resp=200\nI0117 12:27:39.208802       1 httplog.go:104] "HTTP" verb="GET" URI="/metrics" latency="31.310416ms" userAgent="Prometheus/2.29.2" audit-ID="0458b18b-751e-477f-8896-a9ed73f4588f" srcIP="10.128.2.15:48556" resp=200\nI0117 12:28:06.055827       1 httplog.go:104] "HTTP" verb="GET" URI="/metrics" latency="13.139699ms" userAgent="Prometheus/2.29.2" audit-ID="9fda66b9-410e-41c9-9773-000fe525875b" srcIP="10.129.2.11:53314" resp=200\nI0117 12:28:09.184026       1 httplog.go:104] "HTTP" verb="GET" URI="/metrics" latency="6.374159ms" userAgent="Prometheus/2.29.2" audit-ID="674ce428-a1ba-4b33-94e2-58ae84441363" srcIP="10.128.2.15:48556" resp=200\nI0117 12:28:36.047815       1 httplog.go:104] "HTTP" verb="GET" URI="/metrics" latency="4.170039ms" userAgent="Prometheus/2.29.2" audit-ID="5cd7eb6b-85b9-4170-82d3-d6e40a7b3847" srcIP="10.129.2.11:53314" resp=200\nI0117 12:28:39.196504       1 httplog.go:104] "HTTP" verb="GET" URI="/metrics" latency="18.616868ms" userAgent="Prometheus/2.29.2" audit-ID="6fb4dfdf-7619-42ad-8fa6-c55486cd633e" srcIP="10.128.2.15:48556" resp=200\n
Jan 17 12:29:07.117 E ns/openshift-cluster-storage-operator pod/cluster-storage-operator-58dc5bcc9f-2b44q node/ci-op-7zjl13q5-875d2-xwgkj-master-0 container/cluster-storage-operator reason/ContainerExit code/1 cause/Error o:114] Shutting down worker of LoggingSyncer controller ...\nI0117 12:29:05.749566       1 base_controller.go:104] All LoggingSyncer workers have been terminated\nI0117 12:29:05.749573       1 base_controller.go:114] Shutting down worker of ManagementStateController controller ...\nI0117 12:29:05.749577       1 base_controller.go:104] All ManagementStateController workers have been terminated\nI0117 12:29:05.749584       1 base_controller.go:114] Shutting down worker of StaticResourceController controller ...\nI0117 12:29:05.749591       1 base_controller.go:104] All StaticResourceController workers have been terminated\nI0117 12:29:05.749595       1 controller_manager.go:54] StaticResourceController controller terminated\nI0117 12:29:05.749605       1 base_controller.go:114] Shutting down worker of DefaultStorageClassController controller ...\nI0117 12:29:05.749625       1 base_controller.go:104] All DefaultStorageClassController workers have been terminated\nI0117 12:29:05.749652       1 base_controller.go:114] Shutting down worker of CSIDriverStarter controller ...\nI0117 12:29:05.749658       1 base_controller.go:104] All CSIDriverStarter workers have been terminated\nI0117 12:29:05.749410       1 base_controller.go:167] Shutting down GCPPDCSIDriverOperator ...\nI0117 12:29:05.749853       1 base_controller.go:145] All GCPPDCSIDriverOperator post start hooks have been terminated\nI0117 12:29:05.749970       1 base_controller.go:114] Shutting down worker of GCPPDCSIDriverOperator controller ...\nI0117 12:29:05.750116       1 base_controller.go:104] All GCPPDCSIDriverOperator workers have been terminated\nI0117 12:29:05.750180       1 controller_manager.go:54] GCPPDCSIDriverOperator controller terminated\nI0117 12:29:05.750471       1 secure_serving.go:311] Stopped listening on [::]:8443\nI0117 12:29:05.752080       1 genericapiserver.go:363] "[graceful-termination] shutdown event" name="HTTPServerStoppedListening"\nW0117 12:29:05.750717       1 builder.go:101] graceful termination failed, controllers failed with error: stopped\n
Jan 17 12:29:07.117 E ns/openshift-cluster-storage-operator pod/cluster-storage-operator-58dc5bcc9f-2b44q node/ci-op-7zjl13q5-875d2-xwgkj-master-0 container/cluster-storage-operator reason/TerminationStateCleared lastState.terminated was cleared on a pod (bug https://bugzilla.redhat.com/show_bug.cgi?id=1933760 or similar)
Jan 17 12:29:17.101 E ns/openshift-console-operator pod/console-operator-6f665bcb64-pt9ng node/ci-op-7zjl13q5-875d2-xwgkj-master-2 container/console-operator reason/ContainerExit code/1 cause/Error minate, becoming unready, but keeping serving\nI0117 12:29:10.944684       1 genericapiserver.go:709] Event(v1.ObjectReference{Kind:"Pod", Namespace:"openshift-console-operator", Name:"console-operator-6f665bcb64-pt9ng", UID:"", APIVersion:"v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'TerminationMinimalShutdownDurationFinished' The minimal shutdown duration of 0s finished\nI0117 12:29:10.944749       1 genericapiserver.go:362] "[graceful-termination] shutdown event" name="AfterShutdownDelayDuration"\nI0117 12:29:10.944806       1 genericapiserver.go:709] Event(v1.ObjectReference{Kind:"Pod", Namespace:"openshift-console-operator", Name:"console-operator-6f665bcb64-pt9ng", UID:"", APIVersion:"v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'TerminationStoppedServing' Server has stopped listening\nI0117 12:29:10.944860       1 genericapiserver.go:387] "[graceful-termination] shutdown event" name="InFlightRequestsDrained"\nI0117 12:29:10.945510       1 requestheader_controller.go:183] Shutting down RequestHeaderAuthRequestController\nI0117 12:29:10.945639       1 configmap_cafile_content.go:222] "Shutting down controller" name="client-ca::kube-system::extension-apiserver-authentication::client-ca-file"\nI0117 12:29:10.945696       1 configmap_cafile_content.go:222] "Shutting down controller" name="client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file"\nI0117 12:29:10.946145       1 tlsconfig.go:255] "Shutting down DynamicServingCertificateController"\nI0117 12:29:10.946241       1 dynamic_serving_content.go:144] "Shutting down controller" name="serving-cert::/var/run/secrets/serving-cert/tls.crt::/var/run/secrets/serving-cert/tls.key"\nI0117 12:29:10.946594       1 secure_serving.go:311] Stopped listening on [::]:8443\nI0117 12:29:10.951477       1 genericapiserver.go:373] "[graceful-termination] shutdown event" name="HTTPServerStoppedListening"\nW0117 12:29:10.949401       1 builder.go:101] graceful termination failed, controllers failed with error: stopped\n
Jan 17 12:29:28.269 E ns/openshift-service-ca-operator pod/service-ca-operator-7fb558bcfd-hfb2n node/ci-op-7zjl13q5-875d2-xwgkj-master-0 container/service-ca-operator reason/ContainerExit code/1 cause/Error
Jan 17 12:29:36.673 E ns/openshift-service-ca pod/service-ca-79648dc48b-fcx96 node/ci-op-7zjl13q5-875d2-xwgkj-master-1 container/service-ca-controller reason/ContainerExit code/1 cause/Error
Jan 17 12:29:39.433 E ns/openshift-controller-manager-operator pod/openshift-controller-manager-operator-7ddddf8bff-zb5jt node/ci-op-7zjl13q5-875d2-xwgkj-master-0 container/openshift-controller-manager-operator reason/ContainerExit code/1 cause/Error 282] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"openshift-controller-manager-operator", Name:"openshift-controller-manager-operator", UID:"477eea0c-dfa9-42a6-809b-c6a61898978d", APIVersion:"apps/v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'OperatorStatusChanged' Status for clusteroperator/openshift-controller-manager changed: Progressing message changed from "Progressing: daemonset/controller-manager: observed generation is 7, desired generation is 8.\nProgressing: openshiftcontrollermanagers.operator.openshift.io/cluster: observed generation is 3, desired generation is 4." to "Progressing: daemonset/controller-manager: updated number scheduled is 0, desired number scheduled is 3"\nI0117 12:29:33.966013       1 httplog.go:104] "HTTP" verb="GET" URI="/metrics" latency="8.995349ms" userAgent="Prometheus/2.29.2" audit-ID="479a76a6-5f67-42e6-ae85-4a81b808fc1c" srcIP="10.129.2.11:43770" resp=200\nI0117 12:29:38.111606       1 cmd.go:97] Received SIGTERM or SIGINT signal, shutting down controller.\nI0117 12:29:38.112265       1 genericapiserver.go:386] [graceful-termination] RunPreShutdownHooks has completed\nI0117 12:29:38.112354       1 genericapiserver.go:349] "[graceful-termination] shutdown event" name="ShutdownInitiated"\nI0117 12:29:38.112379       1 genericapiserver.go:352] "[graceful-termination] shutdown event" name="AfterShutdownDelayDuration"\nI0117 12:29:38.112399       1 genericapiserver.go:376] "[graceful-termination] shutdown event" name="InFlightRequestsDrained"\nI0117 12:29:38.112500       1 reflector.go:225] Stopping reflector *v1.OpenShiftControllerManager (10m0s) from k8s.io/client-go@v0.22.0-rc.0/tools/cache/reflector.go:167\nI0117 12:29:38.112592       1 reflector.go:225] Stopping reflector *v1.ConfigMap (10m0s) from k8s.io/client-go@v0.22.0-rc.0/tools/cache/reflector.go:167\nI0117 12:29:38.112918       1 base_controller.go:167] Shutting down ConfigObserver ...\nW0117 12:29:38.113241       1 builder.go:101] graceful termination failed, controllers failed with error: stopped\n
Jan 17 12:29:39.504 E ns/openshift-authentication-operator pod/authentication-operator-5cc5db57d8-8rrfb node/ci-op-7zjl13q5-875d2-xwgkj-master-0 container/authentication-operator reason/TerminationStateCleared lastState.terminated was cleared on a pod (bug https://bugzilla.redhat.com/show_bug.cgi?id=1933760 or similar)
Jan 17 12:29:40.496 E ns/openshift-cluster-storage-operator pod/csi-snapshot-controller-operator-f9c895555-4sbp4 node/ci-op-7zjl13q5-875d2-xwgkj-master-0 container/csi-snapshot-controller-operator reason/ContainerExit code/1 cause/Error  syncing operator at 582.964963ms\nI0117 12:29:03.215361       1 operator.go:157] Starting syncing operator at 2023-01-17 12:29:03.215322348 +0000 UTC m=+2881.727506304\nI0117 12:29:03.298591       1 operator.go:159] Finished syncing operator at 83.2581ms\nI0117 12:29:05.721486       1 operator.go:157] Starting syncing operator at 2023-01-17 12:29:05.721401228 +0000 UTC m=+2884.233585194\nI0117 12:29:05.807205       1 operator.go:159] Finished syncing operator at 85.786083ms\nI0117 12:29:05.808859       1 operator.go:157] Starting syncing operator at 2023-01-17 12:29:05.808851281 +0000 UTC m=+2884.321035237\nI0117 12:29:05.906687       1 operator.go:159] Finished syncing operator at 97.822342ms\nI0117 12:29:19.621411       1 operator.go:157] Starting syncing operator at 2023-01-17 12:29:19.621396237 +0000 UTC m=+2898.133580203\nI0117 12:29:19.717435       1 operator.go:159] Finished syncing operator at 96.026462ms\nI0117 12:29:23.443025       1 operator.go:157] Starting syncing operator at 2023-01-17 12:29:23.443008335 +0000 UTC m=+2901.955192310\nI0117 12:29:23.499907       1 operator.go:159] Finished syncing operator at 56.884666ms\nI0117 12:29:31.430763       1 operator.go:157] Starting syncing operator at 2023-01-17 12:29:31.430746831 +0000 UTC m=+2909.942930788\nI0117 12:29:31.496403       1 operator.go:159] Finished syncing operator at 65.641064ms\nI0117 12:29:33.653700       1 operator.go:157] Starting syncing operator at 2023-01-17 12:29:33.653686135 +0000 UTC m=+2912.165870092\nI0117 12:29:33.741828       1 operator.go:159] Finished syncing operator at 88.129333ms\nI0117 12:29:39.079755       1 operator.go:157] Starting syncing operator at 2023-01-17 12:29:39.079741386 +0000 UTC m=+2917.591925342\nI0117 12:29:39.131948       1 cmd.go:97] Received SIGTERM or SIGINT signal, shutting down controller.\nI0117 12:29:39.133987       1 genericapiserver.go:386] [graceful-termination] RunPreShutdownHooks has completed\nW0117 12:29:39.136977       1 builder.go:101] graceful termination failed, controllers failed with error: stopped\n

Found in 15.38% of runs (28.57% of failures) across 13 total runs and 1 job (53.85% failed).