After running several sets of tests on the NetBSD 10.0 beta release, I had opened about two dozen problem reports of varying severity and thought it was time to take stock. Some have been closed, many have identified solutions, and a few are head-scratchers.
Full test cycle on an ARM SoC
This image covers 48 hours and shows CPU core temperature (Celsius) during the automated test cycle.
Reviewing the test results (the failures, in other words) revealed a few issues worth opening tickets for, as did reading up on the test algorithms and reference pages. What intrigued me most were the tests that did not pass or fail consistently. A typical case is a test running out of memory while creating sample data of variable size. Side effects of this include kernel messages such as:
[ 61059.604145] UVM: pid 12801 (h_libarchive), uid 0 killed: out of swap
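Messages like this can be pulled out of a saved kernel log so they can be matched against test failures. The following is only a sketch in Python: the regular expression is inferred from the single sample line above, not from the kernel source, and the function name `parse_uvm_kills` is made up for illustration.

```python
import re

# Pattern inferred from the one sample message above (an assumption, not
# taken from the kernel source): "[ uptime] UVM: pid N (name), uid N killed: reason"
UVM_KILL = re.compile(
    r"\[\s*(?P<uptime>\d+\.\d+)\]\s+UVM: pid (?P<pid>\d+) \((?P<name>[^)]+)\), "
    r"uid (?P<uid>\d+) killed: (?P<reason>.+)"
)

def parse_uvm_kills(lines):
    """Return one dict per matching kill message found in lines."""
    kills = []
    for line in lines:
        m = UVM_KILL.search(line)
        if m:
            kills.append({
                "uptime": float(m.group("uptime")),
                "pid": int(m.group("pid")),
                "name": m.group("name"),
                "uid": int(m.group("uid")),
                "reason": m.group("reason").strip(),
            })
    return kills

# The sample line is the one quoted above.
sample = ["[ 61059.604145] UVM: pid 12801 (h_libarchive), uid 0 killed: out of swap"]
kills = parse_uvm_kills(sample)
for k in kills:
    print(k["name"], k["pid"], k["reason"])
```

The process name in parentheses (here h_libarchive) is what ties the kernel message back to a particular test helper.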
Sometimes this is visible with vmstat commands:
 0 0  7960 330868    0  0  0  0  0  0  0 8519   18   36  0  0 100
 0 0  7960 330868    0  0  0  0  0  0  0 8521   18   38  0  0 100
Mon May 1 23:24:01 UTC 2023
 procs     memory  page               disk faults         cpu
 r b   avm    fre  flt re pi po fr sr l0   in   sy   cs us sy  id
 1 0  1968 337656   47  0  0  0  4  4  1 8503  184   67  0  1  99
 1 0 10828 328948 7245  0  0  0  0  0 99 9117 5113  989  7 13  80
 2 0 19112 320688 2576  0  0  0  0  0 42 9392 4107 1979 12 16  72
 0 0  8568 331328 1534  0  0  0  0  0 89 8919 1788  949  1  5  95
 0 0  7896 332000  201  0  0  0  0  0  0 8534  254   44  0  0 100
 0 0  7896 332000    0  0  0  0  0  0  0 8527   18   36  0  0 100
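When a log holds many such samples, the memory dip is easier to find with a small parser than by eye. This is a sketch only: it assumes the 17-column layout shown in the header above, renames the two `sy` columns to `sy_faults` and `sy_cpu` (my own labels) so they can coexist in a dict, and makes no claim about the units vmstat uses.

```python
# Column names follow the header in the sample; the two "sy" columns are
# renamed here (an assumption of this sketch) to keep the dict keys unique.
VMSTAT_COLUMNS = ["r", "b", "avm", "fre", "flt", "re", "pi", "po", "fr", "sr",
                  "l0", "in", "sy_faults", "cs", "us", "sy_cpu", "id"]

def parse_vmstat(text):
    """Return a list of dicts, one per all-numeric 17-field line in text."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Header, date, and blank lines fail this check and are skipped.
        if len(fields) == len(VMSTAT_COLUMNS) and all(f.isdigit() for f in fields):
            rows.append(dict(zip(VMSTAT_COLUMNS, map(int, fields))))
    return rows

# Rows copied from the sample above, starting at the timestamp.
sample = """\
Mon May 1 23:24:01 UTC 2023
 procs memory page disk faults cpu
 r b avm fre flt re pi po fr sr l0 in sy cs us sy id
 1 0 1968 337656 47 0 0 0 4 4 1 8503 184 67 0 1 99
 1 0 10828 328948 7245 0 0 0 0 0 99 9117 5113 989 7 13 80
 2 0 19112 320688 2576 0 0 0 0 0 42 9392 4107 1979 12 16 72
 0 0 8568 331328 1534 0 0 0 0 0 89 8919 1788 949 1 5 95
 0 0 7896 332000 201 0 0 0 0 0 0 8534 254 44 0 0 100
 0 0 7896 332000 0 0 0 0 0 0 0 8527 18 36 0 0 100
"""

rows = parse_vmstat(sample)
lowest_free = min(rows, key=lambda r: r["fre"])
highest_active = max(rows, key=lambda r: r["avm"])
print("lowest fre:", lowest_free["fre"])
print("highest avm:", highest_active["avm"])
```

In this sample the lowest free-list value and the highest active-memory value fall on the same row, which is the short burst where a test was building its data.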
Given that those results are not indicative of system faults, I looked at the other test case failures and sorted them by status (fixed, workaround available, and so on). That leaves 3 whose root cause I don't know, and those are the only ones of consequence. Two of them are wifi-related and the third is a failure to compile profiling data into an executable on a 32-bit Pi 02W.
One of the intermittent failures has corresponding out-of-memory messages (57291). While it's a low-priority issue because the tests would pass with more memory or swap space, there is probably a way to avoid runaway space requests and still have useful tests.
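One possible way to bound runaway requests is to run the test under a resource limit, so that an oversized allocation fails inside the process instead of ending with the kernel killing it. The sketch below is not how the ATF tests are written (those are C and shell); it only illustrates the idea with Python's standard `resource` module, assumes the platform exposes and enforces `RLIMIT_AS`, and uses a made-up helper name, `run_with_memory_cap`.

```python
import resource
import subprocess
import sys

def run_with_memory_cap(argv, max_bytes):
    """Run argv with its address-space soft limit set to max_bytes.

    Returns the child's exit status. Assumes RLIMIT_AS is available and
    enforced on this platform.
    """
    def limit():
        # Runs in the child just before exec. Only the soft limit is lowered,
        # and never above the existing hard limit.
        _, hard = resource.getrlimit(resource.RLIMIT_AS)
        soft = max_bytes if hard == resource.RLIM_INFINITY else min(max_bytes, hard)
        resource.setrlimit(resource.RLIMIT_AS, (soft, hard))
    return subprocess.run(argv, preexec_fn=limit).returncode

# Hypothetical stand-in for a memory-hungry test: it asks for 2 GiB and
# reports a distinct exit status if the allocation is refused.
child = [sys.executable, "-c",
         "import sys\n"
         "try:\n"
         "    bytearray(2 << 30)\n"
         "except MemoryError:\n"
         "    sys.exit(42)\n"]

status = run_with_memory_cap(child, 1 << 30)  # cap at 1 GiB
print("child exit status:", status)
```

With a cap in place, the test can report a clean failure or skip when the allocation is refused, rather than showing up as a signal 9. Note that `preexec_fn` is not safe in multithreaded parent programs, so this is meant for a simple single-threaded wrapper.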
Tickets with unknown root cause:
- misc/57303 [serious/medium]: ATF unit test usr.sbin/tcpdump/t_tcpdump fails when wireless active on amd64
- toolchain/57321 [non-critical/medium]: ATF test case usr.bin/cc/t_hello:hello_profile fails on RPI02W/evbarm only
- bin/57366 [serious/medium]: Automated test usr.sbin/tcpdump/t_tcpdump:promiscuous fails on Rpi3 with wifi active
Tickets with known cause/workaround:
- kern/57185 [non-critical/medium]: Python build fails on 10_BETA due to no entropy on Atom CPU system
- misc/57286 [serious/medium]: Unit test fs/tmpfs/t_vnode_leak fails in ATF Tests suite
- misc/57291 [serious/medium]: Unit test for lib/libc/regex/t_exhaust fails in ATF Tests suite with signal 9
- lib/57314 [serious/medium]: ATF unit tests fail on 3 of 7 cases in program lib/libc/c063/t_utimensat on evbarm/Rpi 02W
- kern/57320 [serious/medium]: ATF test case kernel/t_magic_symlinks:machine_arch fails on RPI02W/evbarm only [?]
- lib/57331 [serious/medium]: Automated unit test lib/libc/net/t_servent:servent fails on amd64 only
- misc/57361 [non-critical/medium]: Automated test t_archive fails 2 test cases on an Rpi3
Tickets fixed:
- misc/57284 [serious/medium]: Unit test for envstat fails in ATF Tests suite on one machine
- kern/57319 [serious/medium]: ATF test case kernel/t_magic_symlinks fails as non-root instead of showing expected fail message
Tickets with intermittent pass/fail results:
- misc/57291 [serious/medium]: Unit test for lib/libc/regex/t_exhaust fails in ATF Tests suite with signal 9
- kern/57345 [serious/medium]: Automated test kernel/kqueue/t_empty fails intermittently on an amd64 machine
- toolchain/57351 [serious/medium]: Automated test usr.bin/c++/t_tsan_vptr_race:vptr_race fails intermittently on an amd64 machine
- kern/57371 [serious/medium]: Automated test fs/vfs/t_vnops:nfs_rename_reg_nodir fails intermittently on Rpi3 and Rpi4
- kern/57385 [serious/medium]: Automated test case for puffs file system fails intermittently on different architectures
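Telling an intermittent test apart from a consistently failing one takes results from more than one run. A minimal sketch of that comparison follows. The input format (one dict per run, mapping a test name to "passed" or "failed") and the test names in the sample are hypothetical placeholders, not actual results from these machines.

```python
from collections import defaultdict

def classify(runs):
    """Label each test by the set of outcomes it showed across all runs."""
    outcomes = defaultdict(set)
    for run in runs:
        for test, result in run.items():
            outcomes[test].add(result)
    report = {}
    for test, seen in outcomes.items():
        if seen == {"passed"}:
            report[test] = "stable pass"
        elif seen == {"failed"}:
            report[test] = "stable fail"
        else:
            report[test] = "intermittent"
    return report

# Hypothetical data: three runs of three made-up test cases.
runs = [
    {"example/t_alpha:one": "passed", "example/t_beta:two": "failed", "example/t_gamma:three": "passed"},
    {"example/t_alpha:one": "passed", "example/t_beta:two": "failed", "example/t_gamma:three": "failed"},
    {"example/t_alpha:one": "passed", "example/t_beta:two": "failed", "example/t_gamma:three": "passed"},
]

report = classify(runs)
for test, label in sorted(report.items()):
    print(test, "->", label)
```

The more runs that go into the comparison, the less likely a rarely failing test is to be mistaken for a stable one.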
Documentation tickets:
- misc/57318 [non-critical/low]: Minor typos in an automated test case - atf/tools/atf-run_test
- misc/57332 [non-critical/low]: Replace 'http' with 'https' on netbsd.org links found in man pages
- misc/57343 [non-critical/low]: Typo in automated test rumpkern/t_vm.c ('this' should say 'thus')
- misc/57344 [non-critical/low]: WIki page for evbarm port missing rpi4 mention
- misc/57347 [non-critical/low]: Several man pages have obsolete file location references under /usr/share/doc
- misc/57397 [non-critical/low]: Minor comment typos in t_vnops.c test program