[Yt-dev] quick question on particle IO

Matthew Turk matthewturk at gmail.com
Tue Oct 18 21:03:02 PDT 2011


Hi Geoffrey,

Thank you *very* much for your detailed response!

All of this sounds like memory errors; "Received signal 9" is usually the
kernel's OOM killer (or the resource manager) stepping in.  I don't think
it's a problem with Nautilus itself (although I personally experienced
problems with the old GPFS filesystem on Nautilus, long ago).

I have a few followup questions for Stephen:

 * Does parallel HOP still dynamically load balance?  To do so, does
it conduct histograms across datasets (i.e., similar to how we
subselect the particles for a region by striding over them) or does it
load, evaluate, discard?
 * What multiple of the total dataset memory size is necessary to
p-HOP an ideally load-balanced set of particles?
 * Are there any points in the code where the root processor is used
as a primary staging location, or where the arrays are duplicated in
some large amount on the root processor?
 * Are there any points where fields are duplicated?  What about
fancy indexing or other implicit copies?  (See the short NumPy sketch
below for the kind of copy I mean.)
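
To illustrate that last point: fancy indexing in NumPy always allocates a
new array, while a plain slice is just a view, so a stray pos[indices] on a
full particle field silently duplicates that field in memory.  A tiny check
(illustrative only):

    import numpy as np

    pos = np.random.random(10**6)
    idx = np.where(pos > 0.5)[0]

    view = pos[10:20]         # basic slice: a view, shares memory with pos
    copy = pos[idx]           # fancy indexing: allocates a brand-new array
    print(view.base is pos)   # True  -> no extra allocation
    print(copy.base is pos)   # False -> an implicit copy of roughly half of pos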

Do you think it is reasonable, on a large system, to halo find a
dataset of this size?  Is it feasible to construct resource estimates
for ideally-balanced datasets?
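
As a starting point, here is a rough back-of-envelope for the 3200 cube,
assuming eight 64-bit fields per particle (positions, velocities, mass, and
index; what p-HOP actually keeps resident will differ):

    n_particles = 3200**3            # about 3.28e10 particles
    fields = 8                       # assumed per-particle fields
    bytes_per_field = 8              # 64-bit values
    total = n_particles * fields * bytes_per_field
    print(total / 1024.0**4)         # ~1.9 TiB just to hold the particles once

That is already about half of Nautilus's 4 TB before any duplication from
padding, ghost regions, or intermediate arrays, so the margin looks thin even
in the ideally balanced case.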

Thanks for any ideas,

Matt

On Tue, Oct 18, 2011 at 11:50 PM, Geoffrey So <gsiisg at gmail.com> wrote:
> Sorry for the fragmented pieces of info; I've been trying to determine what
> the problem is with one of the sysadmins at Nautilus, so I'm not even sure
> yet whether it is yt's problem.
> Symptoms:
> parallelHF fails for the 3200 cube dataset, but not always at the same
> place, which leads us to think this might be a memory issue.
> 1) What are you attempting to do, precisely?
> Currently I'm trying to run parallelHF on pieces (subvolumes) of the dataset,
> since I've found that the memory requirement of the whole dataset exceeds the
> machine's available memory (Nautilus, with 4 TB of shared memory).
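> (For context, the subvolume setup is roughly of this shape; this is a
> simplified sketch rather than my actual regionPHOP.py, and the parallelHF
> keyword names may differ slightly between yt versions:)
>
>     from yt.mods import *
>     from yt.analysis_modules.halo_finding.api import parallelHF
>
>     pf = load("DD0000/DD0000")      # the 3200 cube (path illustrative)
>     n = 4                           # 4^3 = 64 subvolumes
>     delta = (pf.domain_right_edge - pf.domain_left_edge) / n
>     for i in range(n):
>         for j in range(n):
>             for k in range(n):
>                 center = pf.domain_left_edge + delta * (na.array([i, j, k]) + 0.5)
>                 sv = pf.h.region(center, center - delta / 2.0, center + delta / 2.0)
>                 halos = parallelHF(pf, subvolume=sv)
>                 halos.write_out("halos_%d_%d_%d.out" % (i, j, k))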
> 2) What type of data, and what size of data, are you applying this to?
> I'm doing parallelHF, with DM only, on a subvolume that's 1/64th of the
> original volume.
> 3) What is the version of yt you are using (changeset hash)?
> I was using the latest yt as of last week when I ran the unsuccessful runs;
> currently I'm trying Stephen's modification, which should help with memory:
> (dev-yt)Geoffreys-MacBook-Air:yt-hg gso$ hg identify
> 2efcec06484e (yt) tip
> I am going to modify my script and send it to the sys admin to run a test on
> the 800 cube first.
> I've been asked not to submit jobs on the 3200 dataset, because the last time
> I did, it brought half the machine to a standstill.
> 4) How are you launching yt?
> I was launching it with 512 cores and 2 TB of total memory, but they said to
> try decreasing the MPI task count, so I've also tried 256, 64, and 32.  They
> all failed after a while; a couple were doing fine during the parallelHF
> phase but suddenly ended with:
> MPI: MPI_COMM_WORLD rank 6 has terminated without calling MPI_Finalize()
> MPI: aborting job
> MPI: Received signal 9
> 5) What is the memory available to each individual process?
> I've usually launched the 3200 with 2 TB of memory, with MPI task counts
> varying from 32 to 512.
> 6) Under what circumstances does yt crash?
> I've also had:
> P100 yt : [INFO     ] 2011-10-03 08:03:06,125 Getting field particle_position_x from 112
> MPI: MPI_COMM_WORLD rank 153 has terminated without calling MPI_Finalize()
> MPI: aborting job
> MPI: Received signal 9
>
> asallocash failed: system error trying to write a message header - Broken pipe
> and with the same script
> P180 yt : [INFO     ] 2011-10-03 15:12:01,898 Finished with binary hierarchy reading
> Traceback (most recent call last):
>   File "regionPHOP.py", line 23, in <module>
>     sv = pf.h.region([i * delta[0] + delta[0] / 2.0,
>   File "/nics/b/home/gsiisg/NautilusYT/src/yt-hg/yt/data_objects/static_output.py", line 169, in hierarchy
>     self, data_style=self.data_style)
>   File "/nics/b/home/gsiisg/NautilusYT/src/yt-hg/yt/frontends/enzo/data_structures.py", line 162, in __init__
>     AMRHierarchy.__init__(self, pf, data_style)
>   File "/nics/b/home/gsiisg/NautilusYT/src/yt-hg/yt/data_objects/hierarchy.py", line 79, in __init__
>     self._detect_fields()
>   File "/nics/b/home/gsiisg/NautilusYT/src/yt-hg/yt/frontends/enzo/data_structures.py", line 405, in _detect_fields
>     self.save_data(list(field_list),"/","DataFields",passthrough=True)
>   File "/nics/b/home/gsiisg/NautilusYT/src/yt-hg/yt/utilities/parallel_tools/parallel_analysis_interface.py", line 216, in in_order
>     f1(*args, **kwargs)
>   File "/nics/b/home/gsiisg/NautilusYT/src/yt-hg/yt/data_objects/hierarchy.py", line 222, in _save_data
>     arr = myGroup.create_dataset(name,data=array)
>   File "/nics/b/home/gsiisg/NautilusYT/lib/python2.7/site-packages/h5py-1.3.1-py2.7-linux-x86_64.egg/h5py/highlevel.py", line 464, in create_dataset
>     return Dataset(self, name, *args, **kwds)
>   File "/nics/b/home/gsiisg/NautilusYT/lib/python2.7/site-packages/h5py-1.3.1-py2.7-linux-x86_64.egg/h5py/highlevel.py", line 1092, in __init__
>     space_id = h5s.create_simple(shape, maxshape)
>   File "h5s.pyx", line 103, in h5py.h5s.create_simple (h5py/h5s.c:952)
> h5py._stub.ValueError: Zero sized dimension for non-unlimited dimension (Invalid arguments to routine: Bad value)
>
> 7) How does yt report this crash to you, and is it deterministic?
>
> Many times there isn't any associated error output in the logs; the process
> just hangs and becomes non-responsive.  The admin has tried it a couple of
> times and has seen different errors on two different datasets, so right now
> it could also be that the dataset is corrupted, but so far it's not
> deterministic.
> 8) What have you attempted?  How did it change #6 and #7?
> I've tried:
> - adding the environment variables:
> export MPI_BUFS_PER_PROC=64
> export MPI_BUFS_PER_HOST=256
> with no change in behavior, still sometimes resulting in the MPI_Finalize()
> error
> - using my own installation of OpenMPI, which fails on import with:
>     from yt.mods import *
>   File "/nics/b/home/gsiisg/NautilusYT/src/yt-hg/yt/mods.py", line 44, in <module>
>     from yt.data_objects.api import \
>   File "/nics/b/home/gsiisg/NautilusYT/src/yt-hg/yt/data_objects/api.py", line 34, in <module>
>     from hierarchy import \
>   File "/nics/b/home/gsiisg/NautilusYT/src/yt-hg/yt/data_objects/hierarchy.py", line 40, in <module>
>     from yt.utilities.parallel_tools.parallel_analysis_interface import \
>   File "/nics/b/home/gsiisg/NautilusYT/src/yt-hg/yt/utilities/parallel_tools/parallel_analysis_interface.py", line 49, in <module>
>     from mpi4py import MPI
> ImportError: /nics/b/home/gsiisg/NautilusYT/lib/python2.7/site-packages/mpi4py/MPI.so: undefined symbol: mpi_sgi_inplace
> The system admin says there are bugs or incompatibilities with the network,
> and that I should use SGI's MPI by loading the module mpt/2.04, which I was
> using before trying my own installation of OpenMPI.
> - currently modifying my script with Stephen's proposed changes; once it runs
> on my laptop, I'll let the sys admin try it on the smaller 800 cube dataset
> before trying it on the 3200 dataset.  At least when his job hangs the
> machine, he can terminate it faster without waiting for someone to answer his
> emails.  Hopefully these tests won't be too much of a disruption to other
> Nautilus users.
> - I spoke briefly with Brian Crosby during the Enzo meeting about this; he
> said he's encountered MPI errors on Nautilus as well, but his issue might be
> different from mine.  This may or may not be a yt issue after all, but since
> it seems like multiple people are interested in yt's performance on Nautilus,
> I'll keep everyone updated with the latest developments.
>
> From
> G.S.
> On Tue, Oct 18, 2011 at 7:59 PM, Matthew Turk <matthewturk at gmail.com> wrote:
>>
>> Geoffrey,
>>
>> Parallel HOP definitely does not attempt to load all of the particles
>> simultaneously on all processors.  This is covered in the method
>> papers for both p-hop and yt, the documentation for yt, the source
>> code, and I believe on the yt-users mailing list a couple times when
>> discussing estimates for resource usage in p-hop.
>>
>> The struggles you have been having with Nautilus may in fact be a yt
>> problem, or an application-of-yt problem, a software problem on
>> Nautilus, or even (if Nautilus is being exposed to an excessive number
>> of cosmic rays, for instance) a hardware problem.  To properly debug
>> exactly what is going on, it would be productive for you to provide
>> the following to us:
>>
>> 1) What are you attempting to do, precisely?
>> 2) What type of data, and what size of data, are you applying this to?
>> 3) What is the version of yt you are using (changeset hash)?
>> 4) How are you launching yt?
>> 5) What is the memory available to each individual process?
>> 6) Under what circumstances does yt crash?
>> 7) How does yt report this crash to you, and is it deterministic?
>> 8) What have you attempted?  How did it change #6 and #7?
>>
>> We're interested in ensuring that yt functions well on Nautilus, and
>> that it is able to successfully halo find, analyze, etc.  However,
>> right now it feels like we're being given about 10% of a bug report,
>> and that is regrettably not enough to properly diagnose and repair the
>> problem.
>>
>> Thanks,
>>
>> Matt
>>
>> On Tue, Oct 18, 2011 at 7:51 PM, Geoffrey So <gsiisg at gmail.com> wrote:
>> > Ah yes, I think that answers our question.
>> > We were worried that all the particles were read in by each processor
>> > (which I told him I didn't think was the case, or it would have crashed my
>> > smaller 800 cube run long ago), but I wanted to get the answer from the
>> > pros.
>> > Thanks!
>> > From
>> > G.S.
>> >
>> > On Tue, Oct 18, 2011 at 4:21 PM, Stephen Skory <s at skory.us> wrote:
>> >>
>> >> Geoffrey,
>> >>
>> >> > "Is the particle IO in YT that calls h5py spawned by multiple
>> >> > processors
>> >> > or is it doing it serially?"
>> >>
>> >> For your purposes, h5py is only used to *write* particle data to disk
>> >> after the halos have been found (if you are saving them to disk, which
>> >> you must do explicitly, of course). And in this case, it will open up
>> >> one file using h5py per MPI task.
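>> >>
>> >> (Schematically, it's the usual one-file-per-task pattern, something like
>> >> the sketch below; this is illustrative only, not the actual yt code, and
>> >> the file name is made up:)
>> >>
>> >>     from mpi4py import MPI
>> >>     import h5py
>> >>     import numpy as np
>> >>
>> >>     rank = MPI.COMM_WORLD.rank
>> >>     # each task writes its own halo-particle file; no parallel HDF5 needed
>> >>     f = h5py.File("HopAnalysis_%04d.h5" % rank, "w")
>> >>     f.create_dataset("particle_index", data=np.arange(10, dtype="int64"))
>> >>     f.close()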
>> >>
>> >> I'm guessing that they're actually concerned about reading particle
>> >> data, because that is more disk intensive. This is done with functions
>> >> written in C that read the data, not h5py. Here each MPI task does its
>> >> own reading of data, and may open up multiple files to retrieve the
>> >> particle data it needs depending on the layouts of grids in the
>> >> .cpuNNNN files.
>> >>
>> >> Does that help?
>> >>
>> >> --
>> >> Stephen Skory
>> >> s at skory.us
>> >> http://stephenskory.com/
>> >> 510.621.3687 (google voice)


