Job:
#OCPBUGS-30224 | issue | 3 weeks ago | "k8s.ovn.org/node-chassis-id annotation not found" event causing CI failures | New
for the test case:
"[sig-cluster-lifecycle][Feature:Machines][Serial] Managed cluster should grow and decrease when scaling different machineSets simultaneously [Timeout:30m][apigroup:machine.openshift.io] [Suite:openshift/conformance/serial]"
#OCPBUGS-30044 | issue | 4 weeks ago | The "Extra worker" on virtualmedia job is not working correctly. | Verified
Issue 15843630: The "Extra worker" on virtualmedia job is not working correctly.
Description: If you look at all the failures on [this page|https://sippy.dptools.openshift.org/sippy-ng/component_readiness/test_details?arch=amd64&arch=amd64&baseEndTime=2023-10-31%2023%3A59%3A59&baseRelease=4.14&baseStartTime=2023-10-04%2000%3A00%3A00&capability=Machines&component=Cloud%20Compute%20%2F%20Cluster%20Autoscaler&confidence=95&environment=ovn%20no-upgrade%20amd64%20metal-ipi%20serial&excludeArches=arm64%2Cheterogeneous%2Cppc64le%2Cs390x&excludeClouds=openstack%2Cibmcloud%2Clibvirt%2Covirt%2Cunknown&excludeVariants=hypershift%2Cosd%2Cmicroshift%2Ctechpreview%2Csingle-node%2Cassisted%2Ccompact&groupBy=cloud%2Carch%2Cnetwork&ignoreDisruption=true&ignoreMissing=false&minFail=3&network=ovn&network=ovn&pity=5&platform=metal-ipi&platform=metal-ipi&sampleEndTime=2024-02-28%2023%3A59%3A59&sampleRelease=4.15&sampleStartTime=2024-02-22%2000%3A00%3A00&testId=openshift-tests%3A9f3fb60052539c29ab66564689f616ce&testName=%5Bsig-cluster-lifecycle%5D%5BFeature%3AMachines%5D%5BSerial%5D%20Managed%20cluster%20should%20grow%20and%20decrease%20when%20scaling%20different%20machineSets%20simultaneously%20%5BTimeout%3A30m%5D%5Bapigroup%3Amachine.openshift.io%5D%20%5BSuite%3Aopenshift%2Fconformance%2Fserial%5D&upgrade=no-upgrade&upgrade=no-upgrade&variant=serial&variant=serial] you will notice that there is a problem with the "extra worker" required for the virtualmedia tests.  One way to see this is by clicking on [the camgi|https://github.com/elmiko/okd-camgi] link in the prowjob (see attached screenshot if you aren't familiar with this tool).
 
 When the MCO team investigated, they could tell that the machine had indeed successfully joined the cluster for some period of time. You can see "message: Kubelet stopped posting node status" if you click on the extraworker node in camgi. As best they can tell, this is an infrastructure problem.
 
 This problem is causing quite a bit of toil in understanding the CI signal for autoscaling on the metal platform. We need assistance from the metal team to improve this or to help find other teams to involve in the debugging process.
 
 Everything below this line is the details from Component Readiness:
 -----------------------
 Component Readiness has found a potential regression in [sig-cluster-lifecycle][Feature:Machines][Serial] Managed cluster should grow and decrease when scaling different machineSets simultaneously [Timeout:30m][apigroup:machine.openshift.io] [Suite:openshift/conformance/serial].
 
 Probability of significant regression: 99.62%
 
 Sample (being evaluated) Release: 4.15
 Start Time: 2024-02-22T00:00:00Z
 End Time: 2024-02-28T23:59:59Z
 Success Rate: 80.00%
 Successes: 16
 Failures: 4
 Flakes: 0
 
 Base (historical) Release: 4.14
 Start Time: 2023-10-04T00:00:00Z
 End Time: 2023-10-31T23:59:59Z
 Success Rate: 98.35%
 Successes: 119
 Failures: 2
 Flakes: 0
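 
 The report does not say how the regression probability is derived from these counts. As a minimal sketch, assuming it is one minus the p-value of a one-sided Fisher's exact test on the pass/fail counts (an assumption, not confirmed by the report), the figures above can be recomputed as follows. The function names are hypothetical and flakes are ignored because both are 0.

```python
from math import comb


def success_rate(successes, failures):
    """Percentage of runs that passed."""
    return 100.0 * successes / (successes + failures)


def regression_probability(sample_pass, sample_fail, base_pass, base_fail):
    """Assumed method: 100 * (1 - p), where p is the one-sided Fisher's exact
    p-value for seeing at least `sample_fail` failures in the sample if sample
    and base shared one underlying failure rate."""
    total_runs = sample_pass + sample_fail + base_pass + base_fail
    sample_runs = sample_pass + sample_fail
    total_fail = sample_fail + base_fail
    # Hypergeometric tail: the sample holds k of the total_fail failures.
    tail = sum(
        comb(total_fail, k) * comb(total_runs - total_fail, sample_runs - k)
        for k in range(sample_fail, min(total_fail, sample_runs) + 1)
    )
    p_value = tail / comb(total_runs, sample_runs)
    return 100.0 * (1.0 - p_value)


if __name__ == "__main__":
    # Counts copied from the Sample (4.15) and Base (4.14) blocks above.
    print(f"sample success rate: {success_rate(16, 4):.2f}%")
    print(f"base success rate:   {success_rate(119, 2):.2f}%")
    print(f"regression probability: {regression_probability(16, 4, 119, 2):.2f}%")
```

 Under that assumption the sketch yields the same 80.00%, 98.35%, and 99.62% values shown in the report, which is consistent with, but does not prove, that this is the test being used.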
 
 View the test details report at [https://sippy.dptools.openshift.org/sippy-ng/component_readiness/test_details?arch=amd64&arch=amd64&baseEndTime=2023-10-31%2023%3A59%3A59&baseRelease=4.14&baseStartTime=2023-10-04%2000%3A00%3A00&capability=Machines&component=Cloud%20Compute%20%2F%20Cluster%20Autoscaler&confidence=95&environment=ovn%20no-upgrade%20amd64%20metal-ipi%20serial&excludeArches=arm64%2Cheterogeneous%2Cppc64le%2Cs390x&excludeClouds=openstack%2Cibmcloud%2Clibvirt%2Covirt%2Cunknown&excludeVariants=hypershift%2Cosd%2Cmicroshift%2Ctechpreview%2Csingle-node%2Cassisted%2Ccompact&groupBy=cloud%2Carch%2Cnetwork&ignoreDisruption=true&ignoreMissing=false&minFail=3&network=ovn&network=ovn&pity=5&platform=metal-ipi&platform=metal-ipi&sampleEndTime=2024-02-28%2023%3A59%3A59&sampleRelease=4.15&sampleStartTime=2024-02-22%2000%3A00%3A00&testId=openshift-tests%3A9f3fb60052539c29ab66564689f616ce&testName=%5Bsig-cluster-lifecycle%5D%5BFeature%3AMachines%5D%5BSerial%5D%20Managed%20cluster%20should%20grow%20and%20decrease%20when%20scaling%20different%20machineSets%20simultaneously%20%5BTimeout%3A30m%5D%5Bapigroup%3Amachine.openshift.io%5D%20%5BSuite%3Aopenshift%2Fconformance%2Fserial%5D&upgrade=no-upgrade&upgrade=no-upgrade&variant=serial&variant=serial]
Status: Verified
periodic-ci-openshift-release-master-nightly-4.14-e2e-vsphere-ovn-techpreview-serial (all) - 26 runs, 65% failed, 6% of failures match = 4% impact
#1783972006459346944 | junit | 17 hours ago
[sig-cluster-lifecycle][Feature:Machines][Early] Managed cluster should have same number of Machines and Nodes [apigroup:machine.openshift.io] [Suite:openshift/conformance/parallel]
[sig-cluster-lifecycle][Feature:Machines][Serial] Managed cluster should grow and decrease when scaling different machineSets simultaneously [Timeout:30m][apigroup:machine.openshift.io] [Suite:openshift/conformance/serial]
#1783972006459346944 | junit | 17 hours ago
# [sig-cluster-lifecycle][Feature:Machines][Serial] Managed cluster should grow and decrease when scaling different machineSets simultaneously [Timeout:30m][apigroup:machine.openshift.io] [Suite:openshift/conformance/serial]
fail [github.com/openshift/origin/test/extended/machines/scale.go:280]: Timed out after 900.000s.

Found in 3.85% of runs (5.88% of failures) across 26 total runs and 1 job (65.38% failed) in 128ms
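
The percentages in the job summary follow from three counts. The page only gives the 26 total runs directly; the 17 failed runs and 1 matching failure used below are inferred from the reported percentages, and the function name is hypothetical. A minimal sketch of the arithmetic:

```python
def impact_summary(total_runs, failed_runs, matching_failures):
    """Return the percentages shown in the job summary line."""
    return {
        "failed_pct": 100.0 * failed_runs / total_runs,
        "match_pct_of_failures": 100.0 * matching_failures / failed_runs,
        "impact_pct_of_runs": 100.0 * matching_failures / total_runs,
    }


if __name__ == "__main__":
    # 26 runs is stated on the page; 17 and 1 are inferred, not stated.
    for name, value in impact_summary(26, 17, 1).items():
        print(f"{name}: {value:.2f}%")
```

Rounded to whole percentages, these are the "65% failed, 6% of failures match = 4% impact" figures in the job header.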