#1842002 | bug | 2 years ago | KubePodCrashLooping kube-controller-manager cluster-policy-controller: 6443: connect: connection refused RELEASE_PENDING |
$ curl -s https://storage.googleapis.com/origin-ci-test/logs/release-openshift-origin-installer-e2e-gcp-4.5/2428/artifacts/e2e-gcp/events.json | jq -r '.items[] | select(.metadata.namespace == "openshift-kube-apiserver") | .firstTimestamp + " " + .lastTimestamp + " " + .message' | sort
...
2020-05-30T01:10:53Z 2020-05-30T01:10:53Z All pending requests processed
2020-05-30T01:10:53Z 2020-05-30T01:10:53Z Server has stopped listening
2020-05-30T01:10:53Z 2020-05-30T01:10:53Z The minimal shutdown duration of 1m10s finished
...
2020-05-30T01:11:58Z 2020-05-30T01:11:58Z Created container kube-apiserver-cert-regeneration-controller
2020-05-30T01:11:58Z 2020-05-30T01:11:58Z Created container kube-apiserver-cert-syncer
2020-05-30T01:11:58Z 2020-05-30T01:11:58Z Started container kube-apiserver
...
#1934628 | bug | 19 months ago | API server stopped reporting healthy during upgrade to 4.7.0 ASSIGNED |
during that time the API server was restarted by kubelet due to a failed liveness probe

14:18:00 openshift-kube-apiserver kubelet kube-apiserver-ip-10-0-159-123.ec2.internal Killing Container kube-apiserver failed liveness probe, will be restarted
14:19:17 openshift-kube-apiserver apiserver kube-apiserver-ip-10-0-159-123.ec2.internal TerminationMinimalShutdownDurationFinished The minimal shutdown duration of 1m10s finished

moving to etcd team to investigate why etcd was unavailable during that time

Comment 15200626 by mfojtik@redhat.com at 2021-06-17T18:29:50Z
The LifecycleStale keyword was removed because the bug got commented on recently.
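A quick way to spot kubelet-initiated restarts like this in a job's gathered events is to list the Killing events for the namespace; a minimal sketch, assuming the artifact has been downloaded locally as events.json and uses the same event layout as the excerpt in the first entry above:

$ jq -r '.items[] | select(.metadata.namespace == "openshift-kube-apiserver" and .reason == "Killing") | (.lastTimestamp // .metadata.creationTimestamp) + " " + .involvedObject.name + " " + .message' events.json | sort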
#1943804 | bug | 20 months ago | API server on AWS takes disruption between 70s and 110s after pod begins termination via external LB RELEASE_PENDING |
"name": "kube-apiserver-ip-10-0-131-183.ec2.internal", "namespace": "openshift-kube-apiserver" }, "kind": "Event", "lastTimestamp": null, "message": "The minimal shutdown duration of 1m10s finished", "metadata": { "creationTimestamp": "2021-03-29T12:18:04Z", "name": "kube-apiserver-ip-10-0-131-183.ec2.internal.1670cf61b0f72d2d", "namespace": "openshift-kube-apiserver", "resourceVersion": "89139", | |||
#1921157 | bug | 2 years ago | [sig-api-machinery] Kubernetes APIs remain available for new connections ASSIGNED |
T2: At 06:45:58: systemd-shutdown was sending SIGTERM to remaining processes...
T3: At 06:45:58: kube-apiserver-ci-op-z52cbzhi-6d7cd-pz2jw-master-0: Received signal to terminate, becoming unready, but keeping serving (TerminationStart event)
T4: At 06:47:08: kube-apiserver-ci-op-z52cbzhi-6d7cd-pz2jw-master-0: The minimal shutdown duration of 1m10s finished (TerminationMinimalShutdownDurationFinished event)
T5: At 06:47:08: kube-apiserver-ci-op-z52cbzhi-6d7cd-pz2jw-master-0: Server has stopped listening (TerminationStoppedServing event)

T5 is the last event reported from that api server. At T5 the server might wait up to 60s for all requests to complete and then it fires the TerminationGracefulTerminationFinished event.
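For reference, that T3-T5 sequence can usually be reconstructed from the job's events.json by filtering on the Termination* reasons; a minimal sketch, assuming the artifact has been downloaded locally as events.json:

$ jq -r '.items[] | select(.reason != null and (.reason | startswith("Termination"))) | (.firstTimestamp // .metadata.creationTimestamp) + " " + .involvedObject.name + " " + .reason' events.json | sort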
#1932097 | bug | 20 months ago | Apiserver liveness probe is marking it as unhealthy during normal shutdown RELEASE_PENDING |
Feb 23 20:18:04.212 - 1s E kube-apiserver-new-connection kube-apiserver-new-connection is not responding to GET requests
Feb 23 20:18:05.318 I kube-apiserver-new-connection kube-apiserver-new-connection started responding to GET requests

Deeper detail from the node log shows that right as we get this error, one of the instances finishes its connection, which is right when the error happens.

Feb 23 20:18:02.505 I ns/openshift-kube-apiserver pod/kube-apiserver-ip-10-0-203-7.us-east-2.compute.internal node/ip-10-0-203-7 reason/TerminationMinimalShutdownDurationFinished The minimal shutdown duration of 1m10s finished
Feb 23 20:18:02.509 I ns/openshift-kube-apiserver pod/kube-apiserver-ip-10-0-203-7.us-east-2.compute.internal node/ip-10-0-203-7 reason/TerminationStoppedServing Server has stopped listening
Feb 23 20:18:03.148 I ns/openshift-console-operator deployment/console-operator reason/OperatorStatusChanged Status for clusteroperator/console changed: Degraded message changed from "CustomRouteSyncDegraded: the server is currently unable to handle the request (delete routes.route.openshift.io console-custom)\nSyncLoopRefreshDegraded: the server is currently unable to handle the request (get routes.route.openshift.io console)" to "SyncLoopRefreshDegraded: the server is currently unable to handle the request (get routes.route.openshift.io console)" (2 times)
Feb 23 20:18:03.880 E kube-apiserver-reused-connection kube-apiserver-reused-connection started failing: Get "https://api.ci-op-ivyvzgrr-0b477.origin-ci-int-aws.dev.rhcloud.com:6443/api/v1/namespaces/default": dial tcp 3.21.250.132:6443: connect: connection refused

This kind of looks like the load balancer didn't remove the kube-apiserver and kept sending traffic, and the connection didn't cleanly shut down - did something regress in the apiserver traffic connection?
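One rough way to check that theory is to interleave the apiserver termination events with the disruption samples from the same build log and compare timestamps; a sketch, assuming the job's build-log.txt has been downloaded locally (the Mon DD HH:MM:SS prefixes sort lexically within a single run):

$ grep -E 'reason/TerminationStoppedServing|kube-apiserver-(new|reused)-connection' build-log.txt | sort

If the connection-refused samples consistently start after TerminationStoppedServing for the same node, that points at the load balancer still routing to an instance that has already closed its listener.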
#1995804 | bug | 17 months ago | Rewrite carry "UPSTREAM: <carry>: create termination events" to lifecycleEvents RELEASE_PENDING |
Use the new lifecycle event names for the events that we generate when an apiserver is gracefully terminating.

Comment 15454963 by kewang@redhat.com at 2021-09-03T09:36:37Z

$ w3m -dump -cols 200 'https://search.ci.openshift.org/?search=The+minimal+shutdown+duration&maxAge=168h&context=5&type=build-log&name=4%5C.9&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job' | grep -E 'kube-system node\/apiserver|openshift-kube-apiserver|openshift-apiserver' > test.log

$ grep 'The minimal shutdown duration of' test.log | head -2
Sep 03 05:22:37.000 I ns/openshift-kube-apiserver pod/kube-apiserver-ip-10-0-163-71.us-west-1.compute.internal node/ip-10-0-163-71 reason/AfterShutdownDelayDuration The minimal shutdown duration of 3m30s finished
Sep 03 05:22:37.000 I ns/openshift-kube-apiserver pod/kube-apiserver-ip-10-0-163-71.us-west-1.compute.internal node/ip-10-0-163-71 reason/AfterShutdownDelayDuration The minimal shutdown duration of 3m30s finished

$ grep 'Received signal to terminate' test.log | head -2
Sep 03 08:49:11.000 I ns/default namespace/kube-system node/apiserver-75cf4778cb-9zk42 reason/TerminationStart Received signal to terminate, becoming unready, but keeping serving
Sep 03 08:53:40.000 I ns/default namespace/kube-system node/apiserver-75cf4778cb-c8429 reason/TerminationStart Received signal to terminate, becoming unready, but keeping serving
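As a cross-check, the same CI search can be pointed at one of the old reason names to see where it still appears; a sketch reusing the query format above (the search term and job-name filter here are just examples):

$ w3m -dump -cols 200 'https://search.ci.openshift.org/?search=reason%2FTerminationMinimalShutdownDurationFinished&maxAge=168h&type=build-log&name=4%5C.9&maxMatches=5&maxBytes=20971520&groupBy=job' | grep -E 'openshift-kube-apiserver|openshift-apiserver'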
#1955333 | bug | 13 months ago | "Kubernetes APIs remain available for new connections" and similar failing on 4.8 Azure updates NEW |
2021-05-01T03:59:42Z 1 kube-apiserver-ip-10-0-189-59.ec2.internal Killing: Stopping container kube-apiserver-check-endpoints
2021-05-01T03:59:42Z 1 kube-apiserver-ip-10-0-189-59.ec2.internal Killing: Stopping container kube-apiserver-insecure-readyz
2021-05-01T03:59:43Z null kube-apiserver-ip-10-0-189-59.ec2.internal TerminationPreShutdownHooksFinished: All pre-shutdown hooks have been finished
2021-05-01T03:59:43Z null kube-apiserver-ip-10-0-189-59.ec2.internal TerminationStart: Received signal to terminate, becoming unready, but keeping serving
2021-05-01T03:59:49Z 1 cert-regeneration-controller-lock LeaderElection: ip-10-0-239-74_02f2b687-97f4-44c4-9516-e3fb364deb85 became leader
2021-05-01T04:00:53Z null kube-apiserver-ip-10-0-189-59.ec2.internal TerminationMinimalShutdownDurationFinished: The minimal shutdown duration of 1m10s finished
2021-05-01T04:00:53Z null kube-apiserver-ip-10-0-189-59.ec2.internal TerminationStoppedServing: Server has stopped listening
2021-05-01T04:01:53Z null kube-apiserver-ip-10-0-189-59.ec2.internal TerminationGracefulTerminationFinished: All pending requests processed
2021-05-01T04:01:55Z 1 kube-apiserver-ip-10-0-189-59.ec2.internal Pulling: Pulling image "registry.ci.openshift.org/ocp/4.8-2021-04-30-212732@sha256:e4c7be2f0e8b1e9ef1ad9161061449ec1bdc6953a58f6d456971ee945a8d3197"
2021-05-01T04:02:05Z 1 kube-apiserver-ip-10-0-189-59.ec2.internal Created: Created container setup
2021-05-01T04:02:05Z 1 kube-apiserver-ip-10-0-189-59.ec2.internal Pulled: Container image "registry.ci.openshift.org/ocp/4.8-2021-04-30-212732@sha256:e4c7be2f0e8b1e9ef1ad9161061449ec1bdc6953a58f6d456971ee945a8d3197" already present on machine

That really looks like kube-apiserver is rolling out a new version, and for some reason there is not the graceful LB handoff we need to avoid connection issues. Unifying the two timelines:

* 03:59:43Z TerminationPreShutdownHooksFinished
* 03:59:43Z TerminationStart: Received signal to terminate, becoming unready, but keeping serving
* 04:00:53Z TerminationMinimalShutdownDurationFinished: The minimal shutdown duration of 1m10s finished
* 04:00:53Z TerminationStoppedServing: Server has stopped listening
* 04:00:58.307Z kube-apiserver-new-connection started failing... connection refused
* 04:00:59.314Z kube-apiserver-new-connection started responding to GET requests
* 04:01:03.307Z kube-apiserver-new-connection started failing... connection refused
* 04:01:04.313Z kube-apiserver-new-connection started responding to GET requests
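The events half of that unified timeline can be pulled straight from the job's events.json; a minimal sketch, assuming the events carry the involvedObject name shown above and falling back to creationTimestamp for entries whose lastTimestamp is null:

$ jq -r '.items[] | select(.involvedObject.name == "kube-apiserver-ip-10-0-189-59.ec2.internal") | (.lastTimestamp // .metadata.creationTimestamp) + " " + .reason + ": " + .message' events.json | sort

The kube-apiserver-new-connection samples come from the disruption monitor rather than the event stream, so they still have to be merged in by timestamp from the build log.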
#1979916 | bug | 20 months ago | kube-apiserver constantly receiving signals to terminate after a fresh install, but still keeps serving ASSIGNED |
kube-apiserver-master-0-2 Server has stopped listening
kube-apiserver-master-0-2 The minimal shutdown duration of 1m10s finished
redhat-operators-7p4nb Stopping container registry-server
Successfully pulled image "registry.redhat.io/redhat/redhat-operator-index:v4.8" in 3.09180991s
Found in 0.00% of runs (0.00% of failures) across 20 total runs and 1 job (70.00% failed)