2025.06.29 - Proxmox Fix

Intro

After the recent Proxmox updates, my Lenovo m920q started resetting on its own. I searched around online and the problem is diagnosed as "lousy drivers for the Intel network card".
The second problem is the logs being cluttered with firewall entries.

Network - fix from the internet

The logs show that the machine has a problem with the network card right before the reset:

Jun 29 04:24:07 pve3 ceph-osd[1173]: 2025-06-29T04:24:07.113+0200 xxx -1 osd.0 1844 get_health_metrics reporting 1 slow ops, oldest is osd_op(client.xxx.0:xxx 4.12 4:xxx:::rbd_data.xxx.xxx:head [write 1200128~4096 in=4096b] snapc 0=[] ondisk+write+known_if_redirected+supports_pool_eio e1844)
Jun 29 04:24:07 pve3 kernel: e1000e 0000:00:1f.6 eno1: Detected Hardware Unit Hang:
                               TDH                  <95>
                               TDT                  <7f>
                               next_to_use          <7f>
                               next_to_clean        <94>
                             buffer_info[next_to_clean]:
                               time_stamp           <10ce578be>
                               next_to_watch        <95>
                               jiffies              <10ce60f00>
                               next_to_watch.status <0>
                             MAC Status             <80083>
                             PHY Status             <796d>
                             PHY 1000BASE-T Status  <3800>
                             PHY Extended Status    <3000>
                             PCI Status             <10>
Jun 29 04:24:08 pve3 ceph-osd[1173]: 2025-06-29T04:24:08.090+0200 xxx -1 osd.0 1844 heartbeat_check: no reply from 192.168.2.251:6806 osd.1 since back 2025-06-29T04:23:25.862229+0200 front 2025-06-29T04:23:25.862237+0200 (oldest deadline 2025-06-29T04:23:51.161943+0200)
Jun 29 04:24:08 pve3 ceph-osd[1173]: 2025-06-29T04:24:08.090+0200 xxx -1 osd.0 1844 heartbeat_check: no reply from 192.168.2.250:6806 osd.2 since back 2025-06-29T04:23:25.862255+0200 front 2025-06-29T04:23:25.862248+0200 (oldest deadline 2025-06-29T04:23:51.161943+0200)
Jun 29 04:24:08 pve3 ceph-osd[1173]: 2025-06-29T04:24:08.090+0200 xxx -1 osd.0 1844 get_health_metrics reporting 1 slow ops, oldest is osd_op(client.xxx.0:xxx 4.12 4:xxx:::rbd_data.xxx.xxx:head [write 1200128~4096 in=4096b] snapc 0=[] ondisk+write+known_if_redirected+supports_pool_eio e1844)
-- Boot xxx --
Jun 29 04:26:41 pve3 kernel: Linux version 6.8.12-11-pve (build@proxmox) (gcc (Debian 12.2.0-14+deb12u1) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40) #1 SMP PREEMPT_DYNAMIC PMX 6.8.12-11 (2025-05-22T09:39Z) ()
Jun 29 04:26:41 pve3 kernel: Command line: BOOT_IMAGE=/boot/vmlinuz-6.8.12-11-pve root=/dev/mapper/pve-root ro quiet
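
To see how often the hang shows up before a reset, the kernel log can be filtered for the e1000e message seen above. This is a minimal sketch: the function name count_nic_hangs is made up for illustration, and it only counts lines containing the exact string from the log.

```shell
# Hypothetical helper: reads log text on stdin and prints how many lines
# contain the e1000e "Detected Hardware Unit Hang" message.
count_nic_hangs() {
    # grep -c prints 0 but exits non-zero when nothing matches,
    # so "|| true" keeps the function's exit status clean.
    grep -c 'Detected Hardware Unit Hang' || true
}
```

Example use, assuming the journal is kept across reboots: journalctl -k -b -1 | count_nic_hangs (kernel messages from the previous boot).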

The problem is described in https://bugzilla.proxmox.com/show_bug.cgi?id=6273

A potential fix is https://forum.proxmox.com/threads/e1000-driver-hang.58284/page-8#post-375919

I added a post-up line for eno1 to /etc/network/interfaces.

(...)

iface eno1 inet manual
        post-up /usr/bin/logger -p info -t ifup "Disabling offload for eno1" && /sbin/ethtool -K eno1 tso off gso off gro off && /usr/bin/logger -p info -t ifup "Disabled offload of eno1"

(...)

After running the script manually, these entries show up in the logs:

# journalctl -S "1min ago"
Jun 29 11:03:03 pve3 ifup[167278]: Disabling offload for eno1
Jun 29 11:03:03 pve3 ifup[167280]: Disabled offload of eno1
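
The logger lines only confirm that the commands ran. To check that the offloads are really off, the output of ethtool -k can be inspected. A rough sketch, assuming the usual ethtool feature names for tso, gso and gro; the function name offloads_still_on is hypothetical.

```shell
# Hypothetical helper: reads `ethtool -k <iface>` output on stdin and prints
# the top-level offload features (TSO, GSO, GRO) that are still "on".
# Indented sub-features (e.g. tx-tcp-segmentation) are ignored.
offloads_still_on() {
    awk -F': ' '
        $1 ~ /^(tcp-segmentation-offload|generic-segmentation-offload|generic-receive-offload)$/ \
            && $2 ~ /^on/ { print $1 }
    '
}
```

Example use: ethtool -k eno1 | offloads_still_on. Empty output means all three are reported as off.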

Now we wait and see whether the server becomes more stable.

Firewall - fix from the internet

The logs contain a lot of entries like this:

Jun 29 10:03:36 pve3 pve-firewall[1155]: status update error: ipset_restore_cmdlist: ipset v7.17: Error in line 4: Element cannot be added to the set: it's already added
Jun 29 10:03:46 pve3 pve-firewall[1155]: status update error: ipset_restore_cmdlist: ipset v7.17: Error in line 4: Element cannot be added to the set: it's already added
Jun 29 10:03:56 pve3 pve-firewall[1155]: status update error: ipset_restore_cmdlist: ipset v7.17: Error in line 4: Element cannot be added to the set: it's already added
Jun 29 10:04:06 pve3 pve-firewall[1155]: status update error: ipset_restore_cmdlist: ipset v7.17: Error in line 4: Element cannot be added to the set: it's already added
Jun 29 10:04:16 pve3 pve-firewall[1155]: status update error: ipset_restore_cmdlist: ipset v7.17: Error in line 4: Element cannot be added to the set: it's already added
Jun 29 10:04:26 pve3 pve-firewall[1155]: status update error: ipset_restore_cmdlist: ipset v7.17: Error in line 4: Element cannot be added to the set: it's already added
Jun 29 10:04:36 pve3 pve-firewall[1155]: status update error: ipset_restore_cmdlist: ipset v7.17: Error in line 4: Element cannot be added to the set: it's already added
Jun 29 10:04:46 pve3 pve-firewall[1155]: status update error: ipset_restore_cmdlist: ipset v7.17: Error in line 4: Element cannot be added to the set: it's already added
Jun 29 10:04:56 pve3 pve-firewall[1155]: status update error: ipset_restore_cmdlist: ipset v7.17: Error in line 4: Element cannot be added to the set: it's already added
Jun 29 10:05:06 pve3 pve-firewall[1155]: status update error: ipset_restore_cmdlist: ipset v7.17: Error in line 4: Element cannot be added to the set: it's already added
Jun 29 10:05:16 pve3 pve-firewall[1155]: status update error: ipset_restore_cmdlist: ipset v7.17: Error in line 4: Element cannot be added to the set: it's already added

The problem and the fix are described in https://forum.proxmox.com/threads/proxmox-firewall-doesnt-seem-to-work-and-errors-in-log.128359/#post-561729

When configuring the firewall I probably entered the aliases for the PVE hosts incorrectly: they had addresses of the form 192.168.2.x/24 instead of /32. Removing the mask via the GUI (Datacenter -> Firewall -> Alias) fixed it, and the errors no longer appear in the logs.
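
The same check can be done from the shell instead of clicking through the GUI. The sketch below assumes the aliases live in the [ALIASES] section of /etc/pve/firewall/cluster.fw, one "name address" pair per line; the function name aliases_with_mask is made up. It lists every alias with a mask other than /32, so legitimate network aliases get listed too and need a manual look.

```shell
# Hypothetical helper: reads a cluster.fw-style file on stdin and prints
# "name address" for every alias whose address has a mask other than /32.
aliases_with_mask() {
    awk '
        # Track which [SECTION] we are in; only [ALIASES] is of interest.
        /^\[/ { in_aliases = (toupper($0) ~ /^\[ALIASES\]/); next }
        in_aliases && $2 ~ /\/[0-9]+$/ && $2 !~ /\/32$/ { print $1, $2 }
    '
}
```

Example use: aliases_with_mask < /etc/pve/firewall/cluster.fw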

Firewall config
Jun 29 11:07:46 pve3 pvefw-logger[592]: received terminate request (signal)
Jun 29 11:07:46 pve3 pvefw-logger[592]: stopping pvefw logger
Jun 29 11:07:46 pve3 systemd[1]: Stopping pvefw-logger.service - Proxmox VE firewall logger...
Jun 29 11:07:47 pve3 systemd[1]: pvefw-logger.service: Deactivated successfully.
Jun 29 11:07:47 pve3 systemd[1]: Stopped pvefw-logger.service - Proxmox VE firewall logger.
Jun 29 11:07:47 pve3 systemd[1]: pvefw-logger.service: Consumed 1.316s CPU time.
Jun 29 11:07:47 pve3 systemd[1]: Starting pvefw-logger.service - Proxmox VE firewall logger...
Jun 29 11:07:47 pve3 pvefw-logger[169286]: starting pvefw logger
Jun 29 11:07:47 pve3 systemd[1]: Started pvefw-logger.service - Proxmox VE firewall logger.
Jun 29 11:08:16 pve3 pmxcfs[146658]: [status] notice: received log
Jun 29 11:08:45 pve3 pmxcfs[146658]: [status] notice: received log
Jun 29 11:08:45 pve3 pmxcfs[146658]: [status] notice: received log
Jun 29 11:11:13 pve3 pmxcfs[146658]: [status] notice: received log