Intermittent v9.18 build failures on Fedora COPR buildsys, always in `netmgr_test`?

Tue Aug 30 17:38:28 UTC 2022

I run unit tests on every platform intentionally. Unlike system tests,
unit tests can be part of the build on every platform we build for. Our
testing farm does not offer a simple way to run tests the same way on
each platform.

I keep unit tests enabled on purpose, even when they sometimes fail the
whole build. It is annoying that a failure on just a single platform
fails the whole build for all other platforms. This is a limitation of
our builders on Fedora.

If reliability is required, I think a better option would be removing
unstable tests from the check_PROGRAMS variable in tests/isc/Makefile.
Because I have no reliable way to reproduce those issues, I was unable
to try fixing or skipping the less reliable test cases. It would help if
there were a detailed log for each unit test, noting which test case
failed.
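Here is a minimal sketch of such a log dump for the RPM %check section.
It assumes the unit tests run under Automake's parallel test harness,
which writes a `<test>.log` and a `<test>.trs` file for every test and
records the outcome as a `:test-result:` line in the `.trs` file. The
helper name `dump_failed_tests` is hypothetical. Automake can also print
`test-suite.log`, which collects the logs of tests that did not pass, by
itself when `VERBOSE` is set (e.g. `make check VERBOSE=1`). The function
below just makes the selection explicit.

```shell
#!/bin/sh
# Sketch for an RPM %check section; dump_failed_tests is a hypothetical name.
# Assumes Automake's parallel test harness, which writes <test>.log and
# <test>.trs next to each test and records ":test-result: ..." in the .trs.

dump_failed_tests() {
    # Search below $1 (default: current directory) for test result files.
    find "${1:-.}" -name '*.trs' -print | sort | while IFS= read -r trs; do
        # Automake treats FAIL, XPASS and ERROR as failed results.
        if grep -Eq '^:test-result: (FAIL|XPASS|ERROR)' "$trs"; then
            log="${trs%.trs}.log"
            echo "=== FAILED: ${trs%.trs} ==="
            if [ -f "$log" ]; then
                cat "$log"
            else
                echo "(log file not found)"
            fi
        fi
    done
}

# Usage in %check: still fail the build, but print the details first.
#   make check || { dump_failed_tests .; exit 1; }
```

For cmocka-based tests, the dumped `.log` should then name the failing
test case, so the information ends up in the build log, which is the only
thing that survives.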

Is there a simple way to print details of just the failed unit tests
after testing? Our builders do not allow ssh access to the host to
examine results later. Whatever is not in the log output is lost; no
artifacts are available for later download. So I guess scripting around
the logs would be needed to find out where it fails most often. Those
tests could then be patched out of production builds until a way to make
them more reliable is found.
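The log scripting could look roughly like the following sketch. It
tallies failed test cases across several saved build logs, for example
the logs of the COPR builds linked below, decompressed into local files.
It assumes cmocka's default output, where each failed case is reported on
a line starting with `[  FAILED  ]`. It also assumes that the unit test
output actually reaches the build log. With Automake's parallel harness,
that only happens if the per-test logs are printed, e.g. with
`make check VERBOSE=1`. `count_failed_cases` is a hypothetical name.

```shell
#!/bin/sh
# Sketch: count how many build logs each cmocka test case failed in.
# count_failed_cases is a hypothetical name. Assumes cmocka's default
# "[  FAILED  ] <name>" lines are present in the given (uncompressed) logs.

count_failed_cases() {
    for log in "$@"; do
        # Keep the case name after the FAILED marker, drop cmocka's
        # "N test(s), listed below:" summary line, and count each case
        # at most once per log file.
        sed -n 's/^\[  FAILED  ] //p' "$log" |
            grep -v 'test(s)' |
            sort -u
    done | sort | uniq -c | sort -rn
}

# Usage: most frequently failing test cases are listed first.
#   count_failed_cases logs/*.log
```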

But I value those tests and the effort you guys put into them. I would
like to keep them running on each build. I understand failures are less
annoying for you, because the GitLab UI allows simply restarting just
the selected runs. Unfortunately our RHEL and Fedora builds have no such
ability. But the tests are neat and we want them running anyway.

Thanks for the awesome work on those!

Cheers,
Petr

On 30. 08. 22 6:20, Ondřej Surý wrote:
> Then run only the system tests by running make check only in the bin/tests/system directory instead of the top level. Or don’t run the tests at all - these are mostly meant for development purposes where we have better control over the build environment.
>
> Ondřej
> --
> Ondřej Surý — ISC (He/Him)
>
> My working hours and your working hours may be different. Please do not feel obligated to reply outside your normal working hours.
>
>> On 30. 8. 2022, at 0:56, PGNet Dev <pgnet.dev at gmail.com> wrote:
>>
>> 
>>>> You might want to set the CI=true environment variable to reduce the set of the netmgr unit tests to just the more reliable subset.
>>> thx, trying that now @ COPR
>> with
>>
>>     export CI=true
>>
>> in .spec @
>>
>>     https://src.fedoraproject.org/fork/pgfed/rpms/bind/blob/rawhide/f/bind.spec#_357
>>
>> similarly random/intermittent FAILs,
>>
>> 3x OK, 0x FAIL  https://copr.fedorainfracloud.org/coprs/pgfed/bind-FORK/build/4784746/
>> 1x OK, 2x FAIL  https://copr.fedorainfracloud.org/coprs/pgfed/bind-FORK/build/4784745/
>> 0x OK, 3x FAIL  https://copr.fedorainfracloud.org/coprs/pgfed/bind-FORK/build/4784744/
>> 1x OK, 2x FAIL  https://copr.fedorainfracloud.org/coprs/pgfed/bind-FORK/build/4784743/
>>
>> either the export is incorrectly def'd/placed, or insufficient
>>
-- 
Petr Menšík
Software Engineer, RHEL
Red Hat, http://www.redhat.com/
PGP: DFCF908DB7C87E8E529925BC4931CA5B6C9FC5CB