
ID: 0000461
Project: mercury
Category: Bug
View Status: public
Last Update: 2020-04-20 15:06
Assigned To: wangp
Product Version:
Target Version:
Fixed in Version:
Summary: 0000461: MR_verify_final_engine_sleep_sync assertion failure (parallel conjunction)
Description: Tests in par_conj fail intermittently in the asm_fast.par.gc grade with an assertion failure of the form:

dep_par_17: mercury_context.c:1839: MR_verify_final_engine_sleep_sync: Assertion `esync->d.es_action == MR_ENGINE_ACTION_NONE' failed.

(As I recall, it affects not only parallel conjunction but threads as well.)
Tags: No tags attached.

wangp (developer)

I found that I can reproduce this without parallel conjunction. The following command fails reliably on my machine. (parallel is GNU parallel; closeable_channel_test is from tests/hard_coded)

parallel ./closeable_channel_test >/dev/null ::: `seq 1 1000`

When the assertion in MR_verify_final_engine_sleep_sync fails, the value of esync->d.es_action is always MR_ENGINE_ACTION_SHUTDOWN.


wangp (developer)

In MR_do_idle_worksteal we have:

        switch (esync->d.es_action) {

Then the call sequence goes:

action_shutdown_ws_engine -> MR_finalize_thread_engine -> MR_shutdown_engine_for_threads -> MR_verify_final_engine_sleep_sync

Nothing changes esync->d.es_action before MR_verify_final_engine_sleep_sync asserts its value:

    assert(esync->d.es_action == MR_ENGINE_ACTION_NONE);

So the fix is either to widen the assertion, or to set esync->d.es_action = MR_ENGINE_ACTION_NONE somewhere after switching on it.
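
A minimal model of the sequence above, as a sketch only. The enum, struct and function names are simplified stand-ins, not the runtime's real definitions, and the reset_action flag is hypothetical. It illustrates the second option (resetting the action after switching on it); nothing here says which option the committed fix actually took.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-ins for the runtime's types (assumption: the real
   definitions in mercury_context.c have more actions and more fields). */
typedef enum { ACTION_NONE, ACTION_SHUTDOWN } engine_action;

typedef struct {
    engine_action es_action;
} engine_sleep_sync;

/* The condition that MR_verify_final_engine_sleep_sync asserts. */
static bool sleep_sync_is_final(const engine_sleep_sync *esync)
{
    return esync->es_action == ACTION_NONE;
}

/* Models the final check: aborts if the action was never cleared. */
static void verify_final_sleep_sync(const engine_sleep_sync *esync)
{
    assert(sleep_sync_is_final(esync));
}

/* Models the idle loop switching on the pending action and then shutting
   down. Returns false where the real assertion would fail, so that both
   paths can be exercised without aborting. */
static bool run_idle_step(engine_sleep_sync *esync, bool reset_action)
{
    switch (esync->es_action) {
    case ACTION_SHUTDOWN:
        if (reset_action) {
            /* Option two from the note: clear the action once consumed. */
            esync->es_action = ACTION_NONE;
        }
        if (!sleep_sync_is_final(esync)) {
            return false;   /* the real runtime would abort here */
        }
        verify_final_sleep_sync(esync);
        return true;
    case ACTION_NONE:
        break;
    }
    return true;
}
```

Calling run_idle_step with es_action set to ACTION_SHUTDOWN and reset_action false leaves the action as ACTION_SHUTDOWN and reports the failure, matching the value observed when the assertion fires; with reset_action true the final check passes.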


wangp (developer)

Fix committed 2020-04-20

Issue History
Date Modified     Username  Field                Change
2018-05-17 11:15  wangp     New Issue
2020-04-17 13:40  wangp     Note Added: 0001081
2020-04-17 17:34  wangp     Note Added: 0001082
2020-04-20 15:06  wangp     Assigned To          => wangp
2020-04-20 15:06  wangp     Status               new => resolved
2020-04-20 15:06  wangp     Resolution           open => fixed
2020-04-20 15:06  wangp     Note Added: 0001083