I noticed that when I attach a dummy XDP program (that simply returns XDP_PASS) to the NIC driver, the single-core TCP throughput drops from 28 Gbps to 24 Gbps.
Inspecting the CPU call stacks, I see roughly a 35% increase in CPU usage in the following driver functions:
mlx5e_skb_from_cqe_linear (this is where the XDP program runs, before the SKB is created)
mlx5e_post_rx_wqes
What could be causing the extra CPU usage? Are there any context-switching, bookkeeping, or cache-pollution overheads associated with BPF programs?
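
For reference, a dummy program of the kind described above would look roughly like the sketch below. It is not necessarily the exact program used: the function name `xdp_pass_prog` is hypothetical, and the `SEC` macro is defined locally as a stand-in for the one libbpf normally provides in `bpf/bpf_helpers.h`.

```c
#include <linux/bpf.h>   /* struct xdp_md, XDP_PASS */

/* Local stand-in for libbpf's SEC() macro (assumption: avoids needing libbpf headers). */
#define SEC(name) __attribute__((section(name), used))

/* Hypothetical name; the program does no packet inspection at all. */
SEC("xdp")
int xdp_pass_prog(struct xdp_md *ctx)
{
    (void)ctx;          /* packet data is never touched */
    return XDP_PASS;    /* hand every packet to the normal network stack */
}

/* License string read by the kernel when the program is loaded. */
char _license[] SEC("license") = "GPL";
```

Assuming the file is saved as `xdp_pass.c`, something like `clang -O2 -g -target bpf -c xdp_pass.c -o xdp_pass.o` followed by `ip link set dev <iface> xdp obj xdp_pass.o sec xdp` should build and attach it.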