Job:
#1842002bug2 years agoKubePodCrashLooping kube-contoller-manager cluster-policy-controller: 6443: connect: connection refused RELEASE_PENDING
$ curl -s https://storage.googleapis.com/origin-ci-test/logs/release-openshift-origin-installer-e2e-gcp-4.5/2428/artifacts/e2e-gcp/events.json | jq -r '.items[] | select(.metadata.namespace == "openshift-kube-apiserver") | .firstTimestamp + " " + .lastTimestamp + " " + .message' | sort
...
2020-05-30T01:10:53Z 2020-05-30T01:10:53Z All pending requests processed
2020-05-30T01:10:53Z 2020-05-30T01:10:53Z Server has stopped listening
2020-05-30T01:10:53Z 2020-05-30T01:10:53Z The minimal shutdown duration of 1m10s finished
...
2020-05-30T01:11:58Z 2020-05-30T01:11:58Z Created container kube-apiserver-cert-regeneration-controller
2020-05-30T01:11:58Z 2020-05-30T01:11:58Z Created container kube-apiserver-cert-syncer
2020-05-30T01:11:58Z 2020-05-30T01:11:58Z Started container kube-apiserver
...
#1934628bug19 months agoAPI server stopped reporting healthy during upgrade to 4.7.0 ASSIGNED
during that time the API server was restarted by kubelet due to a failed liveness probe
14:18:00	openshift-kube-apiserver	kubelet	kube-apiserver-ip-10-0-159-123.ec2.internal	Killing	Container kube-apiserver failed liveness probe, will be restarted
14:19:17	openshift-kube-apiserver	apiserver	kube-apiserver-ip-10-0-159-123.ec2.internal	TerminationMinimalShutdownDurationFinished	The minimal shutdown duration of 1m10s finished
moving to etcd team to investigate why etcd was unavailable during that time
Comment 15200626 by mfojtik@redhat.com at 2021-06-17T18:29:50Z
The LifecycleStale keyword was removed because the bug got commented on recently.
#1943804bug20 months agoAPI server on AWS takes disruption between 70s and 110s after pod begins termination via external LB RELEASE_PENDING
    "name": "kube-apiserver-ip-10-0-131-183.ec2.internal",
    "namespace": "openshift-kube-apiserver"
  },
  "kind": "Event",
  "lastTimestamp": null,
  "message": "The minimal shutdown duration of 1m10s finished",
  "metadata": {
    "creationTimestamp": "2021-03-29T12:18:04Z",
    "name": "kube-apiserver-ip-10-0-131-183.ec2.internal.1670cf61b0f72d2d",
    "namespace": "openshift-kube-apiserver",
    "resourceVersion": "89139",
#1921157bug2 years ago[sig-api-machinery] Kubernetes APIs remain available for new connections ASSIGNED
T2: At 06:45:58: systemd-shutdown was sending SIGTERM to remaining processes...
T3: At 06:45:58: kube-apiserver-ci-op-z52cbzhi-6d7cd-pz2jw-master-0: Received signal to terminate, becoming unready, but keeping serving (TerminationStart event)
T4: At 06:47:08 kube-apiserver-ci-op-z52cbzhi-6d7cd-pz2jw-master-0: The minimal shutdown duration of 1m10s finished (TerminationMinimalShutdownDurationFinished event)
T5: At 06:47:08 kube-apiserver-ci-op-z52cbzhi-6d7cd-pz2jw-master-0: Server has stopped listening (TerminationStoppedServing event)
T5 is the last event reported from that api server. At T5 the server might wait up to 60s for all requests to complete and then it fires TerminationGracefulTerminationFinished event.
#1932097bug20 months agoApiserver liveness probe is marking it as unhealthy during normal shutdown RELEASE_PENDING
Feb 23 20:18:04.212 - 1s    E kube-apiserver-new-connection kube-apiserver-new-connection is not responding to GET requests
Feb 23 20:18:05.318 I kube-apiserver-new-connection kube-apiserver-new-connection started responding to GET requests
Deeper detail from the node log shows that right as we get this error one of the instances finishes its connection ,which is right when the error happens.
Feb 23 20:18:02.505 I ns/openshift-kube-apiserver pod/kube-apiserver-ip-10-0-203-7.us-east-2.compute.internal node/ip-10-0-203-7 reason/TerminationMinimalShutdownDurationFinished The minimal shutdown duration of 1m10s finished
Feb 23 20:18:02.509 I ns/openshift-kube-apiserver pod/kube-apiserver-ip-10-0-203-7.us-east-2.compute.internal node/ip-10-0-203-7 reason/TerminationStoppedServing Server has stopped listening
Feb 23 20:18:03.148 I ns/openshift-console-operator deployment/console-operator reason/OperatorStatusChanged Status for clusteroperator/console changed: Degraded message changed from "CustomRouteSyncDegraded: the server is currently unable to handle the request (delete routes.route.openshift.io console-custom)\nSyncLoopRefreshDegraded: the server is currently unable to handle the request (get routes.route.openshift.io console)" to "SyncLoopRefreshDegraded: the server is currently unable to handle the request (get routes.route.openshift.io console)" (2 times)
Feb 23 20:18:03.880 E kube-apiserver-reused-connection kube-apiserver-reused-connection started failing: Get "https://api.ci-op-ivyvzgrr-0b477.origin-ci-int-aws.dev.rhcloud.com:6443/api/v1/namespaces/default": dial tcp 3.21.250.132:6443: connect: connection refused
This kind of looks like the load balancer didn't remove the kube-apiserver and kept sending traffic and the connection didn't cleanly shut down - did something regress in the apiserver traffic connection?
#1995804bug17 months agoRewrite carry "UPSTREAM: <carry>: create termination events" to lifecycleEvents RELEASE_PENDING
Use the new lifecycle event names for the events that we generate when an apiserver is gracefully terminating.
Comment 15454963 by kewang@redhat.com at 2021-09-03T09:36:37Z
$ w3m -dump -cols 200 'https://search.ci.openshift.org/?search=The+minimal+shutdown+duration&maxAge=168h&context=5&type=build-log&name=4%5C.9&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job' | grep -E 'kube-system node\/apiserver|openshift-kube-apiserver|openshift-apiserver' > test.log
$ grep 'The minimal shutdown duration of' test.log | head -2
Sep 03 05:22:37.000 I ns/openshift-kube-apiserver pod/kube-apiserver-ip-10-0-163-71.us-west-1.compute.internal node/ip-10-0-163-71 reason/AfterShutdownDelayDuration The minimal shutdown duration of 3m30s finished
Sep 03 05:22:37.000 I ns/openshift-kube-apiserver pod/kube-apiserver-ip-10-0-163-71.us-west-1.compute.internal node/ip-10-0-163-71 reason/AfterShutdownDelayDuration The minimal shutdown duration of 3m30s finished
$ grep 'Received signal to terminate' test.log | head -2
Sep 03 08:49:11.000 I ns/default namespace/kube-system node/apiserver-75cf4778cb-9zk42 reason/TerminationStart Received signal to terminate, becoming unready, but keeping serving
Sep 03 08:53:40.000 I ns/default namespace/kube-system node/apiserver-75cf4778cb-c8429 reason/TerminationStart Received signal to terminate, becoming unready, but keeping serving
#1955333bug13 months ago"Kubernetes APIs remain available for new connections" and similar failing on 4.8 Azure updates NEW
  2021-05-01T03:59:42Z 1 kube-apiserver-ip-10-0-189-59.ec2.internal Killing: Stopping container kube-apiserver-check-endpoints
  2021-05-01T03:59:42Z 1 kube-apiserver-ip-10-0-189-59.ec2.internal Killing: Stopping container kube-apiserver-insecure-readyz
  2021-05-01T03:59:43Z null kube-apiserver-ip-10-0-189-59.ec2.internal TerminationPreShutdownHooksFinished: All pre-shutdown hooks have been finished
  2021-05-01T03:59:43Z null kube-apiserver-ip-10-0-189-59.ec2.internal TerminationStart: Received signal to terminate, becoming unready, but keeping serving
  2021-05-01T03:59:49Z 1 cert-regeneration-controller-lock LeaderElection: ip-10-0-239-74_02f2b687-97f4-44c4-9516-e3fb364deb85 became leader
  2021-05-01T04:00:53Z null kube-apiserver-ip-10-0-189-59.ec2.internal TerminationMinimalShutdownDurationFinished: The minimal shutdown duration of 1m10s finished
  2021-05-01T04:00:53Z null kube-apiserver-ip-10-0-189-59.ec2.internal TerminationStoppedServing: Server has stopped listening
  2021-05-01T04:01:53Z null kube-apiserver-ip-10-0-189-59.ec2.internal TerminationGracefulTerminationFinished: All pending requests processed
  2021-05-01T04:01:55Z 1 kube-apiserver-ip-10-0-189-59.ec2.internal Pulling: Pulling image "registry.ci.openshift.org/ocp/4.8-2021-04-30-212732@sha256:e4c7be2f0e8b1e9ef1ad9161061449ec1bdc6953a58f6d456971ee945a8d3197"
  2021-05-01T04:02:05Z 1 kube-apiserver-ip-10-0-189-59.ec2.internal Created: Created container setup
  2021-05-01T04:02:05Z 1 kube-apiserver-ip-10-0-189-59.ec2.internal Pulled: Container image "registry.ci.openshift.org/ocp/4.8-2021-04-30-212732@sha256:e4c7be2f0e8b1e9ef1ad9161061449ec1bdc6953a58f6d456971ee945a8d3197" already present on machine
That really looks like kube-apiserver is rolling out a new version, and for some reason there is not the graceful LB handoff we need to avoid connection issues.  Unifying the two timelines:
* 03:59:43Z TerminationPreShutdownHooksFinished
* 03:59:43Z TerminationStart: Received signal to terminate, becoming unready, but keeping serving
* 04:00:53Z TerminationMinimalShutdownDurationFinished: The minimal shutdown duration of 1m10s finished
* 04:00:53Z TerminationStoppedServing: Server has stopped listening
* 04:00:58.307Z kube-apiserver-new-connection started failing... connection refused
* 04:00:59.314Z kube-apiserver-new-connection started responding to GET requests
* 04:01:03.307Z kube-apiserver-new-connection started failing... connection refused
* 04:01:04.313Z kube-apiserver-new-connection started responding to GET requests
#1979916bug20 months agokube-apiserver constantly receiving signals to terminate after a fresh install, but still keeps serving ASSIGNED
kube-apiserver-master-0-2
Server has stopped listening
kube-apiserver-master-0-2
The minimal shutdown duration of 1m10s finished
redhat-operators-7p4nb
Stopping container registry-server
Successfully pulled image "registry.redhat.io/redhat/redhat-operator-index:v4.8" in 3.09180991s

Found in 0.00% of runs (0.00% of failures) across 20 total runs and 1 jobs (70.00% failed) in 111ms - clear search | chart view - source code located on github