[Yt-dev] quick question on particle IO

Geoffrey So gsiisg at gmail.com
Tue Oct 18 20:50:42 PDT 2011


Sorry for the fragmented pieces of info; I've been trying to track down the problem with one of the sysadmins at Nautilus, so I'm not even sure yet whether it is yt's problem.

Symptoms:
parallelHF fails for the 3200 cube dataset, but not always at the same place, which leads us to think this might be a memory issue.

1) What are you attempting to do, precisely?
Currently I'm trying to run parallelHF on subvolume pieces of the dataset, since I've found that the memory requirement of the whole dataset exceeds the machine's available memory (Nautilus, with 4 TB of shared memory).

2) What type of data, and what size of data, are you applying this to?
I'm running parallelHF, DM (dark matter) only, on a subvolume that's 1/64th of the original volume.
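
Roughly, the script loops over subregions and runs the halo finder on each one. A simplified sketch of the idea (not the exact regionPHOP.py; the dataset name, threshold, and output names are placeholders, and I'm assuming the subvolume keyword to parallelHF here):

    import numpy as np
    from yt.mods import *
    from yt.analysis_modules.halo_finding.api import *

    pf = load("DD0000/DD0000")   # placeholder dataset name

    n = 4                        # 4^3 = 64 pieces, i.e. 1/64th of the volume each
    delta = (pf.domain_right_edge - pf.domain_left_edge) / n

    # Walk the n x n x n grid of subvolumes and halo-find in each piece.
    for i in range(n):
        for j in range(n):
            for k in range(n):
                left = pf.domain_left_edge + delta * np.array([i, j, k])
                right = left + delta
                center = (left + right) / 2.0
                sv = pf.h.region(center, left, right)
                halos = parallelHF(pf, subvolume=sv, threshold=160.0)
                halos.write_out("halos_%d_%d_%d.out" % (i, j, k))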

3) What is the version of yt you are using (changeset hash)?
I was using the latest yt as of last week for the unsuccessful runs; currently I'm trying Stephen's modification, which should help with memory:
(dev-yt)Geoffreys-MacBook-Air:yt-hg gso$ hg identify
2efcec06484e (yt) tip
I am going to modify my script and send it to the sysadmin to run a test on the 800 cube first. I've been asked not to submit jobs on the 3200 cube, because the last time I did, it brought half the machine to a standstill.

4) How are you launching yt?
I was launching it with 512 cores and 2 TB of total memory, but they said to try decreasing the MPI task count, so I've also tried 256, 64, and 32 tasks. They all failed after a while; a couple were doing fine during the parallelHF phase but then suddenly ended with:
MPI: MPI_COMM_WORLD rank 6 has terminated without calling MPI_Finalize()
MPI: aborting job
MPI: Received signal 9

5) What is the memory available to each individual process?
I've usually launched the 3200 cube with 2 TB of memory, with MPI task counts varying from 32 to 512.
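
(If that 2 TB were split evenly across tasks, that would be roughly 64 GB per task at 32 tasks, 32 GB at 64, 8 GB at 256, and 4 GB at 512, though since Nautilus is shared memory the split isn't enforced per process.)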

6) Under what circumstances does yt crash?
I've also had:
P100 yt : [INFO     ] 2011-10-03 08:03:06,125 Getting field particle_position_x from 112
MPI: MPI_COMM_WORLD rank 153 has terminated without calling MPI_Finalize()
MPI: aborting job
MPI: Received signal 9

asallocash failed: system error trying to write a message header - Broken pipe

and, with the same script:
P180 yt : [INFO     ] 2011-10-03 15:12:01,898 Finished with binary hierarchy reading
Traceback (most recent call last):
  File "regionPHOP.py", line 23, in <module>
    sv = pf.h.region([i * delta[0] + delta[0] / 2.0,
  File "/nics/b/home/gsiisg/NautilusYT/src/yt-hg/yt/data_objects/static_output.py", line 169, in hierarchy
    self, data_style=self.data_style)
  File "/nics/b/home/gsiisg/NautilusYT/src/yt-hg/yt/frontends/enzo/data_structures.py", line 162, in __init__
    AMRHierarchy.__init__(self, pf, data_style)
  File "/nics/b/home/gsiisg/NautilusYT/src/yt-hg/yt/data_objects/hierarchy.py", line 79, in __init__
    self._detect_fields()
  File "/nics/b/home/gsiisg/NautilusYT/src/yt-hg/yt/frontends/enzo/data_structures.py", line 405, in _detect_fields
    self.save_data(list(field_list),"/","DataFields",passthrough=True)
  File "/nics/b/home/gsiisg/NautilusYT/src/yt-hg/yt/utilities/parallel_tools/parallel_analysis_interface.py", line 216, in in_order
    f1(*args, **kwargs)
  File "/nics/b/home/gsiisg/NautilusYT/src/yt-hg/yt/data_objects/hierarchy.py", line 222, in _save_data
    arr = myGroup.create_dataset(name,data=array)
  File "/nics/b/home/gsiisg/NautilusYT/lib/python2.7/site-packages/h5py-1.3.1-py2.7-linux-x86_64.egg/h5py/highlevel.py", line 464, in create_dataset
    return Dataset(self, name, *args, **kwds)
  File "/nics/b/home/gsiisg/NautilusYT/lib/python2.7/site-packages/h5py-1.3.1-py2.7-linux-x86_64.egg/h5py/highlevel.py", line 1092, in __init__
    space_id = h5s.create_simple(shape, maxshape)
  File "h5s.pyx", line 103, in h5py.h5s.create_simple (h5py/h5s.c:952)
h5py._stub.ValueError: Zero sized dimension for non-unlimited dimension (Invalid arguments to routine: Bad value)
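
(That last ValueError looks to me like h5py 1.3 refusing to create a zero-length dataset, which would mean the DataFields list being written out was empty on that task. Just as an illustration, something like this standalone snippet, with a throwaway file name, should hit the same message under h5py 1.3.x:)

    import numpy
    import h5py

    f = h5py.File("scratch.h5", "w")   # throwaway file name
    # An empty array has shape (0,); h5py 1.3.x rejects a zero-sized,
    # non-resizable dimension when it creates the dataspace.
    f.create_dataset("DataFields", data=numpy.array([]))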


7) How does yt report this crash to you, and is it deterministic?

Many times there isn't any associated error output in the logs; the process just hangs and becomes non-responsive. The admin has tried it a couple of times and has seen different errors on two different datasets, so right now it could also be that the dataset is corrupted. So far the failures are not deterministic.

8) What have you attempted?  How did it change #6 and #7?
I've tried:

- adding the environment variables:
export MPI_BUFS_PER_PROC=64
export MPI_BUFS_PER_HOST=256
with no change in behavior; it still sometimes ends with the MPI_Finalize() error

- using my own installation of OpenMPI, which then fails as soon as yt is imported:
    from yt.mods import *
  File "/nics/b/home/gsiisg/NautilusYT/src/yt-hg/yt/mods.py", line 44, in
<module>
    from yt.data_objects.api import \
  File "/nics/b/home/gsiisg/NautilusYT/src/yt-hg/yt/data_objects/api.py",
line 34, in <module>
    from hierarchy import \
  File
"/nics/b/home/gsiisg/NautilusYT/src/yt-hg/yt/data_objects/hierarchy.py",
line 40, in <module>
    from yt.utilities.parallel_tools.parallel_analysis_interface import \
  File
"/nics/b/home/gsiisg/NautilusYT/src/yt-hg/yt/utilities/parallel_tools/parallel_analysis_interface.py",
line 49, in <module>
    from mpi4py import MPI
ImportError:
/nics/b/home/gsiisg/NautilusYT/lib/python2.7/site-packages/mpi4py/MPI.so:
undefined symbol: mpi_sgi_inplace

The sysadmin says there are bugs or incompatibilities with the network, and that I should use SGI's MPI by loading the module mpt/2.04, which is what I was using before trying my own installation of OpenMPI.
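
(In case it's useful to anyone, a quick way to see which MPI library a given mpi4py build is actually linked against, without hitting that import error; this is just a sketch that runs ldd on the MPI.so from the path in the traceback:)

    # Locate the mpi4py extension module and list the shared libraries it
    # links against, without triggering "from mpi4py import MPI" itself.
    import os, subprocess
    import mpi4py

    so = os.path.join(os.path.dirname(mpi4py.__file__), "MPI.so")
    print(subprocess.Popen(["ldd", so], stdout=subprocess.PIPE).communicate()[0])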

- currently modifying my script with Stephen's proposed changes; once it runs on my laptop I will let the sysadmin try it on the small 800 cube dataset before trying it on the 3200 cube. At least when his own job hangs the machine, he can terminate it faster without waiting for someone to answer his emails. Hopefully these tests won't be too much of a disruption to other Nautilus users.

- I spoke briefly with Brian Crosby about this during the Enzo meeting; he said he has encountered MPI errors on Nautilus as well, but his issue might be different from mine. This may or may not be a yt issue after all, but since it seems like multiple people are interested in yt's performance on Nautilus, I'll keep everyone updated with the latest developments.

From
G.S.

On Tue, Oct 18, 2011 at 7:59 PM, Matthew Turk <matthewturk at gmail.com> wrote:

> Geoffrey,
>
> Parallel HOP definitely does not attempt to load all of the particles,
> simultaneously, on all processors.  This is covered in the method
> papers for both p-hop and yt, the documentation for yt, the source
> code, and I believe on the yt-users mailing list a couple times when
> discussing estimates for resource usage in p-hop.
>
> The struggles you have been having with Nautilus may in fact be a yt
> problem, or an application-of-yt problem, a software problem on
> Nautilus, or even (if Nautilus is being exposed to an excessive number
> of cosmic rays, for instance) a hardware problem.  It would probably
> be productive to properly debug exactly what is going on for you to
> provide to us:
>
> 1) What are you attempting to do, precisely?
> 2) What type of data, and what size of data, are you applying this to?
> 3) What is the version of yt you are using (changeset hash)?
> 4) How are you launching yt?
> 5) What is the memory available to each individual process?
> 6) Under what circumstances does yt crash?
> 7) How does yt report this crash to you, and is it deterministic?
> 8) What have you attempted?  How did it change #6 and #7?
>
> We're interested in ensuring that yt functions well on Nautilus, and
> that it is able to successfully halo find, analyze, etc.  However,
> right now it feels like we're being given about 10% of a bug report,
> and that is regrettably not enough to properly diagnose and repair the
> problem.
>
> Thanks,
>
> Matt
>
> On Tue, Oct 18, 2011 at 7:51 PM, Geoffrey So <gsiisg at gmail.com> wrote:
> > Ah yes, I think that answers our question.
> > We were worried that all the particles were read in by each processor
> (which
> > I told him I don't think it did, or it would have crashed my smaller 800
> > cube long ago), but I wanted to get the answer from pros.
> > Thanks!
> > From
> > G.S.
> >
> > On Tue, Oct 18, 2011 at 4:21 PM, Stephen Skory <s at skory.us> wrote:
> >>
> >> Geoffrey,
> >>
> >> > "Is the particle IO in YT that calls h5py spawned by multiple
> processors
> >> > or is it doing it serially?"
> >>
> >> For your purposes, h5py is only used to *write* particle data to disk
> >> after the halos have been found (if you are saving them to disk, which
> >> you must do explicitly, of course). And in this case, it will open up
> >> one file using h5py per MPI task.
> >>
> >> I'm guessing that they're actually concerned about reading particle
> >> data, because that is more disk intensive. This is done with functions
> >> written in C that read the data, not h5py. Here each MPI task does its
> >> own reading of data, and may open up multiple files to retrieve the
> >> particle data it needs depending on the layouts of grids in the
> >> .cpuNNNN files.
> >>
> >> Does that help?
> >>
> >> --
> >> Stephen Skory
> >> s at skory.us
> >> http://stephenskory.com/
> >> 510.621.3687 (google voice)
> >
> >
> >
> >
>