[yt-users] parallel_objects with projection hanging

Semyeong Oh semyeong.oh at gmail.com
Sat Dec 6 16:39:49 PST 2014


Hi Matt,

My output from using 2 processors on 3 objects is here https://gist.github.com/smoh/6e396a7606a3bbff3450
I’ve set loglevel to 1.
The first two objects ran fine, printing some output:
500 …
501 …
Then the third object starts running on rank 0 while rank 1 sleeps. The projection happens, but after that, at this line,
P000 yt : [DEBUG    ] 2014-12-06 19:03:48,494 Opening MPI Barrier on 0
the process never ends.
The rest of the output is from sending SIGUSR1 to each process, which I can hardly make sense of.
(There are two projections involved in my calculation)
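
As an aside, here is a minimal sketch of getting a plain Python traceback on SIGUSR1, assuming the faulthandler module is available (it is in the standard library on Python 3; for Python 2 there is a separate faulthandler package on PyPI). This is only an illustration, separate from whatever handler yt itself installs:

import signal
import faulthandler

# After this call, `kill -USR1 <pid>` prints a readable traceback for every
# thread to stderr without stopping the process.
faulthandler.register(signal.SIGUSR1, all_threads=True)

# ... the rest of the yt script would follow here ...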

Any clues?

Thanks,
Semyeong


> On Dec 6, 2014, at 11:53 AM, Matthew Turk <matthewturk at gmail.com> wrote:
> 
> Hi Semyeong,
> 
> This is somewhat odd.  When you say the process hangs, do you mean
> that the process of projection hangs, or the yt script as a whole
> hangs?  You should be able to send SIGUSR1 to the processes to get a
> stack trace, which may help with debugging.  Or, if you Ctrl-C, it may
> output a stack trace, which will help see where it's hanging.
> 
> -Matt
> 
> On Sat, Dec 6, 2014 at 5:14 AM, Semyeong Oh <semyeong.oh at gmail.com> wrote:
>> Hi yt,
>> 
>> I have two questions on using parallel_objects. I am using yt 2.6.
>> 
>> 1. I have a problem with parallel_objects hanging at the end.
>> 
>> def do(i, pf):
>>   cube = pf.h.region(..)
>>   proj = pf.h.proj(…., source=cube)
>>   frb = proj.to_frb(..)
>>   ….
>> 
>> objects = [list of indices]
>> pf = load(..)
>> for i in parallel_objects(objects):
>>    do(i, pf)
>> 
>> and I run the script as
>> mpirun -np Nprocs python myscript.py --parallel
>> 
>> When I tested with a simple print operation in do instead of proj, parallel_objects seemed to handle
>> cases where Nobjects is not divisible by Nprocs just fine. But with my real script, which has proj in do, it seems to hang at the end. For example, if Nobjects is 3 and Nprocs is 2, the first two objects go through without a problem and the projection of the third completes, but then the process just hangs there. Why is that? (A minimal illustration of this kind of hang is sketched after this quoted message.)
>> 
>> 2. Is it possible to use only a portion of the Nprocs assigned? Playing around with the simple print operation, it seems that because of the way parallel_objects divides the work, some of it gets duplicated. E.g., when I run mpirun -np 5 but use parallel_objects(objects, njobs=3), I get
>> rank i_object
>> 0 1
>> 1 1
>> 2 2
>> 3 2
>> 4 3
>> so object 1 would still run simultaneously on ranks 0 and 1.
>> 
>> To prevent this, would something like the code below work?
>> 
>> from mpi4py import MPI
>>
>> size = MPI.COMM_WORLD.Get_size()
>> rank = MPI.COMM_WORLD.Get_rank()
>> njobs = 3
>> for ind in parallel_objects(objects, njobs):
>>     # intended: only one rank per job group actually calls do()
>>     if rank % int(size/njobs) != 0:
>>         continue
>>     else:
>>         do(ind)
>> 
>> Thanks,
>> Semyeong
>> 
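
For what it's worth, here is a minimal pure-mpi4py sketch of the failure mode that the "Opening MPI Barrier" log line above suggests: a collective call that not every rank reaches. This is only an illustration under that assumption, not yt code, and the script name in the comment is arbitrary.

from mpi4py import MPI

# Illustration only: 3 work items split round-robin across 2 ranks, with a
# collective Barrier() inside the per-item loop.
# Run with, e.g.:  mpirun -np 2 python barrier_demo.py   (arbitrary file name)
comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

objects = [0, 1, 2]  # Nobjects = 3, not divisible by Nprocs = 2
my_items = [i for i in objects if i % size == rank]  # rank 0 gets 2 items, rank 1 gets 1

for item in my_items:
    # stand-in for the per-object work (region, proj, frb, ...)
    comm.Barrier()  # rank 0 enters this a second time, rank 1 never does

print("rank %d done" % rank)

With this split the first Barrier() passes, but the second one is only ever entered by rank 0, so the run never finishes; that matches the symptom of the last object hanging right after its projection completes.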



