<div>Hi everyone,</div><div><br></div><div>I am trying to improve the efficiency of my analysis script, which calculates attributes of haloes. After watching the workshop video on yt parallelism, I was motivated to give parallel_objects a try. I am basically trying to calculate, and then output, some properties of each halo found by parallel HOP. It turns out that even if I just output the (DM particle) mass of each halo, I am missing halo(s). It doesn't matter whether I run this in serial or in parallel: I end up missing the same number of haloes if I use parallel_objects() like this:</div>
<div><br></div><div><div>haloes = LoadHaloes(pf, HaloListname)</div></div><div><br></div><div><div>for sto, halo in parallel_objects(haloes, num_procs, storage = my_storage):</div></div><div><br></div><div>to iterate over the haloes. The problem goes away if I simply switch to:<div>
<br></div><div>for halo in haloes:</div></div><div><br></div><div>I noticed this when I ran it on an 800^3 dataset with around 50k haloes: I only got 4k haloes back. I then tried to narrow things down and was able to rule out the way I calculate the attributes, because even if I just output the mass from halo.total_mass(), which is simply read in from the .h5 file, I still end up missing haloes when using parallel_objects. For a 128^3 dataset with 85 haloes, I miss 3 and get 82 back, and for a 64^3 dataset with 22 haloes, I get back 21.</div>
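<div><br></div><div>For context, here is a minimal pure-Python sketch of the storage bookkeeping involved. It assumes, as in the yt documentation's examples, that after the loop the storage dictionary holds one entry per distinct sto.result_id. The names fake_parallel_objects and Result are hypothetical stand-ins written for illustration, not yt's implementation, and the masses are made-up numbers. It shows one way a count can come up short: if two iterations end up with the same result_id, the later result overwrites the earlier one.</div>

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, Iterator, Tuple


@dataclass
class Result:
    """Hypothetical stand-in for the per-iteration storage object."""
    result_id: Any = None
    result: Any = None


def fake_parallel_objects(
    objects: Iterable[Any], storage: Dict[Any, Any]
) -> Iterator[Tuple[Result, Any]]:
    """Serial, hypothetical mimic of a storage-collecting loop.

    Each object is paired with a fresh Result whose result_id defaults to
    the object's position. Once the loop body has run, the result is filed
    in ``storage`` under ``result_id``.
    """
    for index, obj in enumerate(objects):
        sto = Result(result_id=index)
        yield sto, obj
        # Resumes after the caller's loop body; a repeated result_id
        # silently overwrites the entry stored by an earlier iteration.
        storage[sto.result_id] = sto.result


if __name__ == "__main__":
    masses = [1.0e12, 5.0e11, 1.0e12, 2.0e11]  # made-up halo masses

    # Default, unique result_id (the position): every halo is kept.
    kept = {}
    for sto, mass in fake_parallel_objects(masses, kept):
        sto.result = mass
    print(len(kept))  # 4

    # Keying on a value that can repeat (the mass) loses an entry.
    clobbered = {}
    for sto, mass in fake_parallel_objects(masses, clobbered):
        sto.result_id = mass
        sto.result = mass
    print(len(clobbered))  # 3
```

<div>If that assumption holds for the yt version in use, comparing len(my_storage) with len(haloes), and checking which values sto.result_id takes inside the loop, would be a quick way to tell key collisions apart from haloes that were never processed.</div>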
<div><br></div><div>Has anyone else encountered this behavior, or can anyone confirm it?</div><div><br></div><div>From</div><div>G.S.</div>