I'm running a lot of UI tests in a pipeline via a GitLab Runner. The tests run in Docker containers on VMs in GCP.
Most of the time this works without problems, but sporadically a job running the UI tests times out because one test is stuck until the global job timeout is reached.
Context:
- This happens to random tests, not always the same
- The test timeout is set via `BrowserContext...setDefaultTimeout()`, but these tests never time out
  - However, the timeout does work in general for test failures/flickers
  - `setDefaultTimeout()` will also set `setDefaultNavigationTimeout`
- The VMs and Docker containers are reachable, and there is seemingly enough RAM and CPU
- A thread dump just shows that the test is waiting at `PlaywrightAssertions.assertThat(locator).isVisible();`
  - This also happens when given copious amounts of RAM and CPU
- In the trace viewer it seems to be stuck on an "about:blank" page and the error "Timeout 10000ms exceeded." is shown
  - This always seems to happen very early in the test, basically after the first `Page.navigate` and then `Frame.click`
  - The browser (in the trace viewer) is always blank for every step
- Actively killing the running Chrome instance fails the test, which then succeeds in subsequent retries
- The only way to reproduce this behavior somewhat locally is to send a `SIGSTOP` signal to the Chrome instance
  - Sending `SIGCONT` will just resume the test; it will not immediately time out
- Setting `DEBUG=pw:api` unfortunately does not output anything for the stuck test execution
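
The `SIGSTOP`/`SIGCONT` reproduction described above could be scripted roughly as follows. This is a minimal sketch, assuming a Linux or macOS host with a POSIX `sh`; `ProcessFreezer` and its methods are hypothetical names for illustration and are not part of Playwright.

```java
import java.io.IOException;

/** Hypothetical helper that freezes and resumes a process to mimic a hung browser. */
final class ProcessFreezer {

    /** Sends a signal (e.g. "STOP" or "CONT") through the shell's kill; true on exit code 0. */
    static boolean signal(long pid, String signalName) {
        try {
            Process kill = new ProcessBuilder("sh", "-c", "kill -s " + signalName + " " + pid)
                    .redirectErrorStream(true)
                    .redirectOutput(ProcessBuilder.Redirect.DISCARD)
                    .start();
            return kill.waitFor() == 0;
        } catch (IOException e) {
            return false; // the shell could not be started
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    /** Suspends the process; it stays alive but stops responding. */
    static boolean freeze(long pid) {
        return signal(pid, "STOP");
    }

    /** Lets a previously frozen process continue. */
    static boolean resume(long pid) {
        return signal(pid, "CONT");
    }

    public static void main(String[] args) throws InterruptedException {
        if (args.length < 2) {
            System.err.println("usage: ProcessFreezer <pid> <seconds>");
            return;
        }
        long pid = Long.parseLong(args[0]);     // PID of the Chrome process to freeze
        long seconds = Long.parseLong(args[1]); // how long to keep it frozen
        System.out.println("freeze: " + freeze(pid));
        Thread.sleep(seconds * 1000);
        System.out.println("resume: " + resume(pid));
    }
}
```

Freezing the browser for longer than the configured default timeout is one way to check locally whether any timeout fires while the process is suspended.
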
Already done:
- More memory for the server
- Updated to newest Playwright version 1.49.0
- Switched to the Chrome channel via `new BrowserType...setChannel("chrome")`
Questions:
- Are there more possibilities to set timeouts besides setting a JUnit timeout for the UI tests?
- How could I further debug this?
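
On the first question, one option outside Playwright's own timeouts might be a process-level watchdog. Since killing the Chrome instance makes the stuck test fail and the retry succeed, a timer that kills the browser after a hard limit would turn a hang into an ordinary failure. The following is only a sketch: `BrowserWatchdog`, `killDescendants`, and `dumpThreads` are hypothetical names, and it assumes the browser runs as a descendant process of the test JVM and that its command path contains a recognizable marker such as "chrome".

```java
import java.time.Duration;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.stream.Collectors;

/** Hypothetical watchdog: runs a callback unless it is closed before the limit elapses. */
final class BrowserWatchdog implements AutoCloseable {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "browser-watchdog");
                t.setDaemon(true); // never keep the JVM alive just for the watchdog
                return t;
            });
    private final ScheduledFuture<?> pending;

    BrowserWatchdog(Duration limit, Runnable onTimeout) {
        pending = scheduler.schedule(onTimeout, limit.toMillis(), TimeUnit.MILLISECONDS);
    }

    /** Force-kills descendants of this JVM whose command contains the marker; returns how many matched. */
    static int killDescendants(String commandMarker) {
        List<ProcessHandle> matches = ProcessHandle.current().descendants()
                .filter(p -> p.info().command().map(c -> c.contains(commandMarker)).orElse(false))
                .collect(Collectors.toList());
        matches.forEach(ProcessHandle::destroyForcibly);
        return matches.size();
    }

    /** Prints the stack of every live thread to stderr, similar to a manual thread dump. */
    static void dumpThreads() {
        Thread.getAllStackTraces().forEach((thread, stack) -> {
            System.err.println(thread);
            for (StackTraceElement frame : stack) {
                System.err.println("    at " + frame);
            }
        });
    }

    @Override
    public void close() {
        pending.cancel(false); // the test finished in time, so disarm the watchdog
        scheduler.shutdownNow();
    }
}
```

A test body could be wrapped in a try-with-resources block that creates a `BrowserWatchdog` whose callback calls `dumpThreads()` and then `killDescendants("chrome")`. The watchdog only touches OS processes and never calls Playwright objects from its own thread, which matters because Playwright for Java is not thread-safe.
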