<html><head><meta http-equiv="content-type" content="text/html; charset=utf-8"></head><body dir="auto"><div>For what it's worth, this is essentially the same problem I reported the other day--projections in parallel_objects hanging.<br><br><span style="font-size: 13pt; background-color: rgba(255, 255, 255, 0);">John ZuHone</span><br><div><span style="background-color: rgba(255, 255, 255, 0);">Kavli Center for Astrophysics and Space Research<br>Massachusetts Institute of Technology<br><a href="x-apple-data-detectors://0" x-apple-data-detectors="true" x-apple-data-detectors-type="address" x-apple-data-detectors-result="0">77 Massachusetts Ave.</a>, 37-582G<br><a href="x-apple-data-detectors://1/0" x-apple-data-detectors="true" x-apple-data-detectors-type="address" x-apple-data-detectors-result="1/0">Cambridge, MA 02139</a><br>(w) <a href="tel:617-253-2354" x-apple-data-detectors="true" x-apple-data-detectors-type="telephone" x-apple-data-detectors-result="1/1">617-253-2354</a><br>(m) <a href="tel:781-708-5004" x-apple-data-detectors="true" x-apple-data-detectors-type="telephone" x-apple-data-detectors-result="1/2">781-708-5004</a><br><a href="mailto:jzuhone@space.mit.edu" x-apple-data-detectors="true" x-apple-data-detectors-type="link" x-apple-data-detectors-result="1/3">jzuhone@space.mit.edu</a><br><a href="mailto:jzuhone@gmail.com" x-apple-data-detectors="true" x-apple-data-detectors-type="link" x-apple-data-detectors-result="2">jzuhone@gmail.com</a><br><a href="http://www.jzuhone.com/" x-apple-data-detectors="true" x-apple-data-detectors-type="link" x-apple-data-detectors-result="3">http://www.jzuhone.com</a></span><br style="font-family: UICTFontTextStyleBody; -webkit-text-size-adjust: auto;"></div></div><div><br>On Dec 7, 2014, at 1:51 PM, Matthew Turk <<a href="mailto:matthewturk@gmail.com">matthewturk@gmail.com</a>> wrote:<br><br></div><blockquote type="cite"><div><span>Hi Semyeong,</span><br><span></span><br><span>Sounds like a mismatch 
barrier.</span><br><span></span><br><span>Can you try with the enzo_tiny_cosmology dataset from the website to</span><br><span>see if that works?  I am thinking there may be a corner case we</span><br><span>haven't seen in the domain decomp.  We'll get this fixed!</span><br><span></span><br><span>-Matt</span><br><span></span><br><span>On Sat, Dec 6, 2014 at 7:47 PM, Semyeong Oh <<a href="mailto:semyeong.oh@gmail.com">semyeong.oh@gmail.com</a>> wrote:</span><br><blockquote type="cite"><span>Hi Matt,</span><br></blockquote><blockquote type="cite"><span></span><br></blockquote><blockquote type="cite"><span>Counting the "Opening MPI Barrier on XX" lines in the log, I think what happens is</span><br></blockquote><blockquote type="cite"><span>that somehow a barrier is opened after one projection. Thus, the process which does the projections opens a different</span><br></blockquote><blockquote type="cite"><span>number of barriers than the process that’s idle and only opens a barrier due to this statement in parallel_objects:</span><br></blockquote><blockquote type="cite"><span>    if barrier:</span><br></blockquote><blockquote type="cite"><span>        my_communicator.barrier()</span><br></blockquote><blockquote type="cite"><span>and this is why it hangs after the second projection.</span><br></blockquote><blockquote type="cite"><span></span><br></blockquote><blockquote type="cite"><span>Setting barrier=False on parallel_objects wouldn’t work either, because then it hangs after the barrier after the first projection.</span><br></blockquote><blockquote type="cite"><span>Is this expected, and is there a workaround?</span><br></blockquote><blockquote type="cite"><span>I couldn’t pinpoint why this happens exactly, but I’m guessing it has something to do with the fact</span><br></blockquote><blockquote type="cite"><span>that yt supports doing the actual projection in parallel, not just using parallel_objects to parallelize over multiple objects.</span><br></blockquote><blockquote 
type="cite"><span></span><br></blockquote><blockquote type="cite"><span>Semyeong</span><br></blockquote><blockquote type="cite"><span></span><br></blockquote><blockquote type="cite"><blockquote type="cite"><span>On Dec 6, 2014, at 7:39 PM, Semyeong Oh <<a href="mailto:semyeong.oh@gmail.com">semyeong.oh@gmail.com</a>> wrote:</span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span></span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span>Hi Matt,</span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span></span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span>My output from using 2 processors on 3 objects is here: <a href="https://gist.github.com/smoh/6e396a7606a3bbff3450">https://gist.github.com/smoh/6e396a7606a3bbff3450</a></span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span>I’ve set loglevel to 1.</span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span>The first two objects ran fine, printing some output:</span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span>500 …</span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span>501 …</span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span>Then, the third object starts running on rank 0, while rank 1 sleeps. 
The projection happens, but after that, once it reaches this line,</span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span>P000 yt : [DEBUG    ] 2014-12-06 19:03:48,494 Opening MPI Barrier on 0</span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span>the process never ends.</span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span>The rest of the output is from sending SIGUSR1 to each process, which I hardly understand.</span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span>(There are two projections involved in my calculation.)</span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span></span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span>Any clues?</span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span></span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span>Thanks,</span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span>Semyeong</span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span></span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span></span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span>On Dec 6, 2014, at 11:53 AM, Matthew Turk <<a href="mailto:matthewturk@gmail.com">matthewturk@gmail.com</a>> wrote:</span><br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span></span><br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span>Hi Semyeong,</span><br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote 
type="cite"><span></span><br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span>This is somewhat odd.  When you say the process hangs, do you mean</span><br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span>that the process of projection hangs, or the yt script as a whole</span><br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span>hangs?  You should be able to send SIGUSR1 to the processes to get a</span><br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span>stack trace, which may help with debugging.  Or, if you Ctrl-C, it may</span><br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span>output a stack trace, which will help see where it's hanging.</span><br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span></span><br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span>-Matt</span><br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span></span><br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span>On Sat, Dec 6, 2014 at 5:14 AM, Semyeong Oh <<a href="mailto:semyeong.oh@gmail.com">semyeong.oh@gmail.com</a>> wrote:</span><br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span>Hi yt,</span><br></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote 
type="cite"><span></span><br></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span>I have two questions on using parallel_objects. I am using yt 2.6.</span><br></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span></span><br></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span>1. I have a problem of parallel_objects hanging at the end.</span><br></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span></span><br></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span>def do(i, pf):</span><br></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span> cube = pf.h.region(..)</span><br></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span> proj = pf.h.proj(…., source=cube)</span><br></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span> frb = proj.to_frb(..)</span><br></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span> ….</span><br></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span></span><br></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote 
type="cite"><blockquote type="cite"><blockquote type="cite"><span>objects = [list of indices]</span><br></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span>pf = load(..)</span><br></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span>for i in parallel_objects(objects):</span><br></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span>  do(i, pf)</span><br></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span></span><br></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span>and I run the script as</span><br></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span>mpirun -np Nprocs python myscript.py --parallel</span><br></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span></span><br></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span>When I tested with a simple print operation in do instead of proj, parallel_objects seemed to handle</span><br></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span>cases where Nobjects is not divisible by Nprocs just fine. But with my real script that has proj in do, it seems to hang at the end. 
For example, if Nobjects is 3 and Nprocs is 2, the first two objects go through without problems and the projection of the third completes, but then the process just hangs there. Why?</span><br></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span></span><br></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span>2. Is it possible to use only a portion of the Nprocs assigned? Also from playing around with a simple print operation, it seems that because of the way parallel_objects divides work, the work is duplicated, e.g., when I do mpirun -np 5 but have parallel_objects(objects, njobs=3):</span><br></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span>rank i_object</span><br></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span>0 1</span><br></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span>1 1</span><br></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span>2 2</span><br></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span>3 2</span><br></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span>4 3</span><br></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote 
type="cite"><span>…</span><br></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span>so object 1 would still run simultaneously on rank 0 and 1.</span><br></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span></span><br></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span>To prevent this, would something like below work?</span><br></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span></span><br></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span>size = MPI.COMM_WORLD.Get_size()</span><br></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span>rank = MPI.COMM_WORLD.Get_rank()</span><br></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span>njobs = 3</span><br></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span>for ind in parallel_objects(objects, njobs):</span><br></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span>  if rank % int(size/njobs) != 0:</span><br></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span>      
continue</span><br></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span>  else:</span><br></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span>      do(ind)</span><br></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span></span><br></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span>Thanks,</span><br></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span>Semyeong</span><br></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span></span><br></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span></span><br></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span></span><br></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span>_______________________________________________</span><br></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span>yt-users mailing list</span><br></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span><a 
href="mailto:yt-users@lists.spacepope.org">yt-users@lists.spacepope.org</a></span><br></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span><a href="http://lists.spacepope.org/listinfo.cgi/yt-users-spacepope.org">http://lists.spacepope.org/listinfo.cgi/yt-users-spacepope.org</a></span><br></blockquote></blockquote></blockquote></blockquote></div></blockquote></body></html>