[Yt-dev] RMS mass overdensity method
Matthew Turk
matthewturk at gmail.com
Wed Mar 4 08:33:54 PST 2009
> Yes, in fact, I can see that after the first MPI error, other threads can still go and try to get spheres before they get killed by task manager.
Okay, hm, interesting.
>
> There are no core dumps. I ssh-ed in and ran 'top' and looked at the memory per process and total usage on the node, and it wasn't approaching the limits of the machine when it crashed. I think the fact that it crashes at different places in the run cycle means something else, but I don't know what.
This is suspicious. In fact, it makes me think there *could* be a
problem with processes hanging while they wait at barriers. Do you
have debug logging turned on? That should notify you whenever a
barrier is entered, as long as it's done via the standard
barrierization. (That's one of the reasons I try to avoid any raw MPI
calls.)
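To illustrate the idea of a barrier that announces itself in the debug log, here is a minimal sketch. It is not yt's actual wrapper: the names logged_barrier and SerialComm are hypothetical, and the communicator is assumed to expose Get_rank() and Barrier() the way mpi4py's MPI.COMM_WORLD does.

```python
import logging

log = logging.getLogger("barrier_sketch")


def logged_barrier(comm, label=""):
    """Log entry to and exit from a barrier on `comm`.

    `comm` is assumed to be any object with Get_rank() and Barrier()
    methods, such as mpi4py's MPI.COMM_WORLD. This is an illustrative
    helper, not part of yt.
    """
    rank = comm.Get_rank()
    log.debug("P%03d entering barrier %s", rank, label)
    comm.Barrier()  # blocks until every process in comm has arrived
    log.debug("P%03d left barrier %s", rank, label)


class SerialComm:
    """Hypothetical stand-in communicator so the sketch runs without MPI."""

    def Get_rank(self):
        return 0

    def Barrier(self):
        pass
```

If a process logs "entering barrier" without a matching "left barrier", it is stuck waiting, and the ranks that never logged the entry are the ones that did not reach it. With debug logging enabled, calling logged_barrier(SerialComm(), "spheres") emits one message on each side of the barrier.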
I'll see if I can write up a long-overdue mechanism for distinguishing
logs by processor and paste that.
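One possible shape for distinguishing logs by processor, sketched with the standard logging module only: a filter stamps each record with the process rank and the formatter prints it as a prefix. The names RankFilter and make_rank_logger are hypothetical, and the rank is assumed to be passed in by the caller (for example, from the MPI communicator); this is not the mechanism that ended up in yt.

```python
import logging


class RankFilter(logging.Filter):
    """Attach the process rank to every record that passes through."""

    def __init__(self, rank):
        super().__init__()
        self.rank = rank

    def filter(self, record):
        record.rank = self.rank
        return True  # never drop records, only annotate them


def make_rank_logger(rank, name="rank_sketch", stream=None):
    """Return a logger whose output lines are prefixed with the rank.

    `stream` defaults to stderr when None (StreamHandler's default).
    """
    logger = logging.getLogger("%s.%d" % (name, rank))
    logger.setLevel(logging.DEBUG)
    logger.propagate = False  # avoid duplicate lines via the root logger
    if not logger.handlers:
        handler = logging.StreamHandler(stream)
        handler.addFilter(RankFilter(rank))
        handler.setFormatter(
            logging.Formatter("P%(rank)03d %(levelname)s %(message)s")
        )
        logger.addHandler(handler)
    return logger
```

With this, make_rank_logger(3).debug("hello") writes a line starting with P003, so interleaved output from several processes can be separated with a simple grep on the prefix.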
-Matt