I'm still trying to debug the crash detailed in Crash in std::make_exception_ptr on AIX.
I suspected it had something to do with the use of std::exception_ptr, since I had refactored some code to use it shortly before the crash started to occur.
I've managed to reproduce a similar crash using the following code, but only when running with MALLOCTYPE=debug MALLOCDEBUG=postfree_checking:
#include <cstdio>
#include <exception>
#include <future>
#include <thread>
#include <vector>
std::future<std::exception_ptr> GetExceptionFromDeadThread()
{
    return std::async(std::launch::async, []() {
        try
        {
            throw std::invalid_argument("Some string long enough to allocate on the heap? ");
        }
        catch (...)
        {
            return std::current_exception();
        }
    });
}
int main()
{
    try
    {
        constexpr size_t numThreads(100);
        std::vector<std::future<std::exception_ptr>> futures;
        while (true)
        {
            for (size_t i = 0; i < numThreads; ++i)
            {
                futures.push_back(GetExceptionFromDeadThread());
            }
            while (!futures.empty())
            {
                auto& future(futures.back());
                future.wait();
                try
                {
                    std::rethrow_exception(future.get());
                }
                catch (std::invalid_argument&)
                {
                    std::fputs(".", stdout);
                }
                catch (...)
                {
                    std::fprintf(stderr, "'std::rethrow_exception(future.get())' threw unexpected exception");
                    abort();
                }
                futures.pop_back();
            }
        }
    }
    catch (...)
    {
        std::fprintf(stderr, "Caught unexpected exception");
        abort();
    }
}
I'm compiling the program using the command xlclang++ -std=c++11 -D_REENTRANT -qfullpath -qmaxmem=-1 -q32 -qroconst -DNDEBUG -O2 -g test8.cpp
bash-5.2$ xlclang++ --version
IBM XL C/C++ for AIX, V16.1.0 (5725-C72, 5765-J12)
Version: 16.01.0000.0010
bash-5.2$ MALLOCTYPE=debug MALLOCDEBUG=postfree_checking ./a.out
.Segmentation fault (core dumped)
bash-5.2$ dbx ./a.out core
Type 'help' for help.
[using memory image in core]
reading symbolic information ...
Segmentation fault in __cxa_end_catch at 0xd0ead49c ($t1)
0xd0ead49c (__cxa_end_catch+0x1fc) 801e003c lwz r0,0x3c(r30)
(dbx) where
__cxa_end_catch() at 0xd0ead49c
main(), line 46 in "test8.cpp"
(dbx)
Line 46 is the std::fputs(".", stdout); line.
Does anyone see any sort of UB/bugs in my code which would 'justify' this crash?
Edit: Hmmm, it also crashed without the debug malloc after a while:
...............................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................Segmentation fault (core dumped)
You have new mail in /usr/spool/mail/bamboo
bash-5.2$ dbx ./a.out core
Type 'help' for help.
[using memory image in core]
reading symbolic information ...
warning: Unable to access address 0x2ff21d50 from core
Segmentation fault in extend_brk at 0xd01105c4 ($t1)
0xd01105c4 (extend_brk+0x2c4) 90040004 stw r0,0x4(r4)
(dbx) where
extend_brk(internal error: assertion failed at line 3915 in file frame.c
??, internal error: assertion failed at line 3915 in file frame.c
??, internal error: assertion failed at line 3915 in file frame.c
??) at 0xd01105c4
(dbx)
What the heck? How is it running out of memory? (And ugh at how buggy dbx is.)
Edit 2: After running again with export LDR_CNTRL=MAXDATA=0x80000000 (which I had been doing before when testing with the actual crashing program), I haven't seen another crash, but it's very strange that memory usage would grow at all. (Are the std::async threads exiting too slowly?)
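To test the "threads exiting too slowly" idea, one option is a variant that joins each worker before continuing. This is only a sketch: GetExceptionFromJoinedThread is a hypothetical name, and it swaps std::async for std::promise plus std::thread, so it does not exercise the same library code path as the original.

```cpp
#include <exception>
#include <future>
#include <stdexcept>
#include <thread>

// Hypothetical variant of GetExceptionFromDeadThread(): the worker is joined
// before the result is returned, so it cannot still be running when the next
// worker is started.
std::exception_ptr GetExceptionFromJoinedThread()
{
    std::promise<std::exception_ptr> promise;
    std::future<std::exception_ptr> future(promise.get_future());

    std::thread worker([&promise]() {
        try
        {
            throw std::invalid_argument("Some string long enough to allocate on the heap? ");
        }
        catch (...)
        {
            // Hand the in-flight exception back to the launching thread.
            promise.set_value(std::current_exception());
        }
    });

    worker.join();       // the worker's function has completed once join() returns
    return future.get(); // the value is already set, so this does not block
}
```

If memory still grows when main() calls this in a loop and rethrows the result as before, slow thread exit would be a less likely explanation.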
Edit 3: It seemed to hang after a while, so I attached a debugger:
stopped in _event_sleep at 0xd0573254 ($t1)
0xd0573254 (_event_sleep+0x4f4) 80410014 lwz r2,0x14(r1)
(dbx) where
_event_sleep(??, ??, ??, ??, ??, ??) at 0xd0573254
_event_wait(??, ??) at 0xd0573f3c
_cond_wait_local(??, ??, ??) at 0xd05835dc
_cond_wait(??, ??, ??) at 0xd0583ef4
pthread_cond_wait(??, ??) at 0xd058494c
condition_variable.std::__1::condition_variable::wait(std::__1::unique_lock<std::__1::mutex>&)(??, ??) at 0xd09f2eac
future.std::__1::promise<void>::set_exception_at_thread_exit(std::exception_ptr)._ZNSt3__117__assoc_sub_state10__sub_waitERNS_11unique_lockINS_5mutexEEE@AF111_20(??, ??) at 0xd09f99a8
std::__1::__assoc_sub_state::wait()(??) at 0xd09f7a28
std::__1::__async_assoc_state<std::exception_ptr, std::__1::__async_func<GetExceptionFromDeadThread()::$_0> >::__on_zero_shared()(this = 0x42242282), line 992 in "future"
test8.std::__1::future<std::exception_ptr> std::__1::__make_async_assoc_state<std::exception_ptr, std::__1::__async_func<GetExceptionFromDeadThread()::$_0> >(std::__1::__async_func<GetExceptionFromDeadThread()::$_0>&&)(__f = @0x30013418), line 3440 in "memory"
unnamed block in std::__1::future<std::__1::__invoke_of<std::__1::decay<GetExceptionFromDeadThread()::$_0>::type, std::__1::decay<>::type>::type> std::__1::async<GetExceptionFromDeadThread()::$_0>(std::__1::launch, GetExceptionFromDeadThread()::$_0&&)(__policy = async, __f = &(...)), line 2220 in "type_traits"
std::__1::future<std::__1::__invoke_of<std::__1::decay<GetExceptionFromDeadThread()::$_0>::type, std::__1::decay<>::type>::type> std::__1::async<GetExceptionFromDeadThread()::$_0>(std::__1::launch, GetExceptionFromDeadThread()::$_0&&)(__policy = async, __f = &(...)), line 2220 in "type_traits"
unnamed block in main(), line 9 in "test8.cpp"
unnamed block in main(), line 9 in "test8.cpp"
unnamed block in main(), line 9 in "test8.cpp"
main(), line 9 in "test8.cpp"
(dbx) thread
thread state-k wchan state-u k-tid mode held scope function
>$t1 run blocked 33360255 k no sys _event_sleep
$t2148206 terminated 36507643 no sys
$t7220275 terminated 123211641 no sys
$t10994266 terminated 15139439 no sys
(dbx) q
Unfortunately, quitting dbx also killed the process (not sure if there's a way to disconnect from the process first), so I couldn't look more closely at the threads. But given that every thread other than the main thread says 'terminated', it feels like this was a deadlock?
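In the trace above, the main thread is blocked inside the std::async call itself (std::async, then __make_async_assoc_state, then __on_zero_shared, then wait) rather than in future.wait(). One thing that might be worth ruling out, as an assumption and not a confirmed cause, is thread creation starting to fail once memory runs low. A minimal sketch of such a probe, where CountLaunchableThreads is a hypothetical helper and not part of the test program:

```cpp
#include <cstddef>
#include <cstdio>
#include <exception>
#include <thread>
#include <vector>

// Hypothetical probe: tries to start `count` trivial threads and returns how
// many std::thread constructions succeeded.
std::size_t CountLaunchableThreads(std::size_t count)
{
    std::vector<std::thread> threads;
    threads.reserve(count); // emplace_back below will not reallocate

    try
    {
        while (threads.size() < count)
        {
            threads.emplace_back([]() {});
        }
    }
    catch (const std::exception& e)
    {
        // std::thread reports a failed launch as std::system_error;
        // std::bad_alloc is also possible when memory is exhausted.
        std::fprintf(stderr, "thread %zu failed to start: %s\n", threads.size(), e.what());
    }

    for (std::thread& t : threads)
    {
        t.join();
    }
    return threads.size();
}
```

Calling something like CountLaunchableThreads(100) at the point where the process is close to its memory limit would show whether plain thread creation still succeeds there.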
Edit 4: Looking at topas (AIX's version of top), it looks like my test program does have some sort of leak? I've seen it go from ~500 to ~900 megs of PgSp so far, and when I was looking at the 'hung' one, it was using exactly 2.00G, so it feels like it hangs when it runs out of memory?
Edit 5: I reproduced the hang, and topas was showing 2.00G for PgSp. I also confirmed that the other threads were actually dead (see What does '.() at 0xdeadbeef' mean in a (core file) stacktrace generated by dbx on AIX?):
stopped in _event_sleep at 0xd0573254 ($t1)
0xd0573254 (_event_sleep+0x4f4) 80410014 lwz r2,0x14(r1)
(dbx) where
_event_sleep(??, ??, ??, ??, ??, ??) at 0xd0573254
_event_wait(??, ??) at 0xd0573f3c
_cond_wait_local(??, ??, ??) at 0xd05835dc
_cond_wait(??, ??, ??) at 0xd0583ef4
pthread_cond_wait(??, ??) at 0xd058494c
condition_variable.std::__1::condition_variable::wait(std::__1::unique_lock<std::__1::mutex>&)(??, ??) at 0xd09f2eac
future.std::__1::promise<void>::set_exception_at_thread_exit(std::exception_ptr)._ZNSt3__117__assoc_sub_state10__sub_waitERNS_11unique_lockINS_5mutexEEE@AF111_20(??, ??) at 0xd09f99a8
std::__1::__assoc_sub_state::wait()(??) at 0xd09f7a28
std::__1::__async_assoc_state<std::exception_ptr, std::__1::__async_func<GetExceptionFromDeadThread()::$_0> >::__on_zero_shared()(this = 0x42242282), line 992 in "future"
test8.std::__1::future<std::exception_ptr> std::__1::__make_async_assoc_state<std::exception_ptr, std::__1::__async_func<GetExceptionFromDeadThread()::$_0> >(std::__1::__async_func<GetExceptionFromDeadThread()::$_0>&&)(__f = @0x300122e8), line 3440 in "memory"
unnamed block in std::__1::future<std::__1::__invoke_of<std::__1::decay<GetExceptionFromDeadThread()::$_0>::type, std::__1::decay<>::type>::type> std::__1::async<GetExceptionFromDeadThread()::$_0>(std::__1::launch, GetExceptionFromDeadThread()::$_0&&)(__policy = async, __f = &(...)), line 2220 in "type_traits"
std::__1::future<std::__1::__invoke_of<std::__1::decay<GetExceptionFromDeadThread()::$_0>::type, std::__1::decay<>::type>::type> std::__1::async<GetExceptionFromDeadThread()::$_0>(std::__1::launch, GetExceptionFromDeadThread()::$_0&&)(__policy = async, __f = &(...)), line 2220 in "type_traits"
unnamed block in main(), line 9 in "test8.cpp"
unnamed block in main(), line 9 in "test8.cpp"
unnamed block in main(), line 9 in "test8.cpp"
main(), line 9 in "test8.cpp"
(dbx) thread
thread state-k wchan state-u k-tid mode held scope function
>$t1 run blocked 72682831 k no sys _event_sleep
$t551804 terminated 117246667 no sys
$t971604 terminated 35326155 no sys
$t4654003 terminated 66456569 no sys
$t8224903 terminated 113575163 no sys
$t10640704 terminated 66914049 no sys
$t10991453 terminated 117574363 no sys
(dbx) thread current 551804
(dbx) where
.() at 0xdeadbeef
(dbx) thread current 971604
(dbx) where
.() at 0xdeadbeef
(dbx) thread current 4654003
(dbx) where
.() at 0xdeadbeef
(dbx) thread current 8224903
(dbx) where
.() at 0xdeadbeef
(dbx) thread current 10640704
(dbx) where
.() at 0xdeadbeef
(dbx) thread current 10991453
(dbx) where
.() at 0xdeadbeef
(dbx)
Not sure where to go from here. Unless there's some UB in my code, it seems like I've uncovered some compiler/toolchain bug(s)?
Edit 6: I tried removing the use of std::exception_ptr (see Why is this simple program which uses std::async crashing?), and I still get the hang:
stopped in _event_sleep at 0xd0573254 ($t1)
0xd0573254 (_event_sleep+0x4f4) 80410014 lwz r2,0x14(r1)
(dbx) where
_event_sleep(??, ??, ??, ??, ??, ??) at 0xd0573254
_event_wait(??, ??) at 0xd0573f3c
_cond_wait_local(??, ??, ??) at 0xd05835dc
_cond_wait(??, ??, ??) at 0xd0583ef4
pthread_cond_wait(??, ??) at 0xd058494c
condition_variable.std::__1::condition_variable::wait(std::__1::unique_lock<std::__1::mutex>&)(??, ??) at 0xd09f2eac
future.std::__1::promise<void>::set_exception_at_thread_exit(std::exception_ptr)._ZNSt3__117__assoc_sub_state10__sub_waitERNS_11unique_lockINS_5mutexEEE@AF111_20(??, ??) at 0xd09f99a8
std::__1::__assoc_sub_state::wait()(??) at 0xd09f7a28
std::__1::__async_assoc_state<std::invalid_argument, std::__1::__async_func<GetExceptionFromDeadThread()::$_0> >::__on_zero_shared()(this = (nil)), line 992 in "future"
test9.std::__1::future<std::invalid_argument> std::__1::__make_async_assoc_state<std::invalid_argument, std::__1::__async_func<GetExceptionFromDeadThread()::$_0> >(std::__1::__async_func<GetExceptionFromDeadThread()::$_0>&&)(__f = @0x2ff225e0), line 3440 in "memory"
unnamed block in std::__1::future<std::__1::__invoke_of<std::__1::decay<GetExceptionFromDeadThread()::$_0>::type, std::__1::decay<>::type>::type> std::__1::async<GetExceptionFromDeadThread()::$_0>(std::__1::launch, GetExceptionFromDeadThread()::$_0&&)(__policy = async, __f = &(...)), line 2220 in "type_traits"
std::__1::future<std::__1::__invoke_of<std::__1::decay<GetExceptionFromDeadThread()::$_0>::type, std::__1::decay<>::type>::type> std::__1::async<GetExceptionFromDeadThread()::$_0>(std::__1::launch, GetExceptionFromDeadThread()::$_0&&)(__policy = async, __f = &(...)), line 2220 in "type_traits"
unnamed block in main(), line 9 in "test9.cpp"
unnamed block in main(), line 9 in "test9.cpp"
unnamed block in main(), line 9 in "test9.cpp"
main(), line 9 in "test9.cpp"
(dbx) thread
thread state-k wchan state-u k-tid mode held scope function
>$t1 run blocked 114165241 k no sys _event_sleep
$t243103 terminated 4853817 no sys
$t1366707 terminated 128123751 no sys
(dbx)
Edit 7: I don't know what to think anymore...
............................................................................................................................................................................................................................Segmentation fault (core dumped)
bash-5.2$ dbx ./a.out core
Type 'help' for help.
[using memory image in core]
reading symbolic information ...
Segmentation fault in extend_brk at 0xd01105c4 ($t1)
0xd01105c4 (extend_brk+0x2c4) 90040004 stw r0,0x4(r4)
(dbx) where
extend_brk(internal error: assertion failed at line 3915 in file frame.c
??, internal error: assertion failed at line 3915 in file frame.c
??, internal error: assertion failed at line 3915 in file frame.c
??) at 0xd01105c4
(dbx) q
bash-5.2$ cat test16.cpp
#include <cassert>
#include <cstdio>
#include <exception>
#include <future>
#include <functional>
#include <thread>
#include <vector>
#include <string>
int main()
{
    try
    {
        while (true)
        {
            constexpr auto longStr(
                "Some string long enough to allocate on the heap? ");
            (void)std::invalid_argument(longStr);
            std::fputs(".", stdout);
        }
    }
    catch (...)
    {
        std::fprintf(stderr, "Caught unexpected exception");
        abort();
    }
}
My understanding is that Segmentation fault in extend_brk indicates we've hit the virtual address limit for the process (...instead of returning NULL from malloc?), but I don't see how this latest test program could possibly cause the heap to grow past ~256 megs, unless there was a leak in the implementation of std::invalid_argument itself?
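A rough way to check the leak suspicion from inside the process, rather than by watching topas, is to compare the program break before and after a fixed number of iterations. This is a sketch under assumptions: BreakGrowthAfter is a hypothetical helper, and the number is only meaningful if the allocator in use grows the heap through the break (an allocator that gets its memory some other way would show no movement even with a leak).

```cpp
#include <cstddef>
#include <stdexcept>
#include <unistd.h> // sbrk()

// Hypothetical helper: returns how many bytes the program break moved while
// constructing and destroying `iterations` std::invalid_argument temporaries.
std::ptrdiff_t BreakGrowthAfter(std::size_t iterations)
{
    // sbrk(0) returns the current program break without changing it.
    const char* const before = static_cast<const char*>(sbrk(0));

    for (std::size_t i = 0; i < iterations; ++i)
    {
        // Same expression as in test16.cpp: the temporary is destroyed
        // at the end of the statement.
        (void)std::invalid_argument("Some string long enough to allocate on the heap? ");
    }

    const char* const after = static_cast<const char*>(sbrk(0));
    return after - before;
}
```

If the result keeps scaling with the iteration count (say, comparing 100000 against 1000000 iterations), that would point at memory not being released per exception object; a small or constant value would point elsewhere.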
Edit 8: When I run it w/ LDR_CNTRL=MAXDATA=0x80000000, it aborts instead:
IOT/Abort trap in pthread_kill at 0xd057b12c ($t1)
0xd057b12c (pthread_kill+0xac) 80410014 lwz r2,0x14(r1)
(dbx) where
pthread_kill(??, ??) at 0xd057b12c
_p_raise(??) at 0xd057a508
raise.raise(??) at 0xd0123344
abort() at 0xd0189918
std::myabort()() at 0xd2722c0c
cxa_handlers.std::terminate()() at 0xd0eade40
__cxa_throw(??, ??, ??) at 0xd0ead054
stdlib_new_delete.operator new(unsigned long)(??) at 0xd0eb3abc
stdexcept.std::logic_error::logic_error(char const*).std::logic_error::logic_error(char const*)(??, ??) at 0xd09ed04c
unnamed block in main(), line 129 in "stdexcept"
unnamed block in main(), line 129 in "stdexcept"
main(), line 129 in "stdexcept"
(dbx) thread
thread state-k wchan state-u k-tid mode held scope function
>$t1 run running 45549741 k no sys pthread_kill
(dbx)