[yt-users] Parallel Hop and MPI Issues

Geoffrey So gsiisg at gmail.com
Thu Oct 4 23:12:17 PDT 2012


Hi Joseph,

The MemoryError on the last line of the traceback makes me suspect the
machine is running out of memory; that's just a guess, so it might not be the
case at all.  Can you tell us a little about the memory available on
your machine (GB per core) and the number of particles in your simulation?


In my past experience with Parallel HOP, a safe guideline has been
to budget 1MB of RAM per 5000 particles.  YT has since been optimized further,
so that number should be smaller now, but it's a safe place to start
if you're having trouble.  If you have 1 particle per cell,
then 1024**3/5000/32 ~ 6711 MB, so you'll need about 7GB per core if using 32
cores.  If your machine has 4GB per core, you might want to try 64 cores
for the job.
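
As a rough sanity check, here is the back-of-envelope arithmetic written out
in Python.  The 5000-particles-per-MB figure is only a rule of thumb, and the
1024**3 particle count assumes one dark matter particle per cell of the 1024^3
root grid, so treat the numbers as estimates rather than exact requirements:

# Rough Parallel HOP memory estimate (rule of thumb, not exact).
n_particles = 1024**3        # assumes one particle per cell of a 1024^3 grid
particles_per_mb = 5000.0    # guideline: ~1 MB of RAM per 5000 particles
n_cores = 32

mb_per_core = n_particles / particles_per_mb / n_cores
print("~%.1f GB per core on %d cores" % (mb_per_core / 1024.0, n_cores))
# -> ~6.6 GB per core on 32 cores, i.e. about 7GB

Doubling the core count halves the per-core requirement to roughly 3.3GB,
which should fit on a 4GB-per-core machine, e.g.:

mpirun -np 64 python findhalo.py --parallel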

Hope this helps.

From
G.S.

On Thu, Oct 4, 2012 at 10:14 PM, Joseph Smidt <josephsmidt at gmail.com> wrote:

> Hey everyone,
>
>     I am trying to use Parallel Hop in YT to analyze enzo data.  I
> installed mpi4py, forthon and did the whole "python setup.py install"
> afterwards.  I next try to find halos with this code on 2 nodes with
> 16 processors each (32 total):
>
>
> from yt.mods import *
> from yt.analysis_modules.halo_finding.api import *
>
> i = 5
> filename = 'RD%04d/RedshiftOutput%04d' % (i,i)
> pf = load(filename)
> halos = parallelHF(pf)
>
> dumpn = 'RD%04d/MergerHalos' %i
> halos.dump(dumpn)
>
>
> The output is rather long since it contains output from all 32 processors.
> The full output is here: http://paste.yt-project.org/show/2761/
>
>   However, here are some highlights:
>
> $ mpirun -np 32 python findhalo.py --parallel
> Reported: 2 (out of 2) daemons -  32 (out of 32) procs
> yt : [INFO     ] 2012-10-04 22:54:51,855 Global parallel computation
> enabled: 1 / 32
> yt : [INFO     ] 2012-10-04 22:54:51,855 Global parallel computation
> enabled: 21 / 32
> ....
> yt : [INFO     ] 2012-10-04 22:54:51,858 Global parallel computation
> enabled: 10 / 32
> --------------------------------------------------------------------------
> An MPI process has executed an operation involving a call to the
> "fork()" system call to create a child process.  Open MPI is currently
> operating in a condition that could result in memory corruption or
> other system errors; your MPI job may hang, crash, or produce silent
> data corruption.  The use of fork() (or system() or other calls that
> create child processes) is strongly discouraged.
>
> The process that invoked fork was:
>
>   Local host:          mu0002.localdomain (PID 9624)
>   MPI_COMM_WORLD rank: 3
>
> If you are *absolutely sure* that your application will successfully
> and correctly survive a call to fork(), you may disable this warning
> by setting the mpi_warn_on_fork MCA parameter to 0.
> --------------------------------------------------------------------------
> P000 yt : [INFO     ] 2012-10-04 22:54:55,571 Parameters: current_time
>              = 89.9505268216
> P000 yt : [INFO     ] 2012-10-04 22:54:55,571 Parameters:
> domain_dimensions         = [1024 1024 1024]
> P000 yt : [INFO     ] 2012-10-04 22:54:55,572 Parameters:
> domain_left_edge          = [ 0.  0.  0.]
> P000 yt : [INFO     ] 2012-10-04 22:54:55,572 Parameters:
> domain_right_edge         = [ 1.  1.  1.]
> P000 yt : [INFO     ] 2012-10-04 22:54:55,573 Parameters:
> cosmological_simulation   = 1
> P000 yt : [INFO     ] 2012-10-04 22:54:55,573 Parameters:
> current_redshift          = 5.99999153008
> P000 yt : [INFO     ] 2012-10-04 22:54:55,573 Parameters: omega_lambda
>              = 0.724
> ...
> P000 yt : [INFO     ] 2012-10-04 23:04:33,681 Getting particle_index
> using ParticleIO
> P001 yt : [INFO     ] 2012-10-04 23:05:09,222 Getting particle_index
> using ParticleIO
> Traceback (most recent call last):
>   File "findhalo.py", line 7, in <module>
>     halos = parallelHF(pf)
>   File
> "/usr/projects/magnetic/jsmidt/yt-x86_64/lib/python2.7/site-packages/yt-2.5dev-py2.7-linux-x86_64.egg/yt/analysis_modules/halo_finding/halo_objects.py",
> line 2268, in __init__
>     premerge=premerge, tree=self.tree)
>   File
> "/usr/projects/magnetic/jsmidt/yt-x86_64/lib/python2.7/site-packages/yt-2.5dev-py2.7-linux-x86_64.egg/yt/analysis_modules/halo_finding/halo_objects.py",
> line 1639, in __init__
>     HaloList.__init__(self, data_source, dm_only)
>   File
> "/usr/projects/magnetic/jsmidt/yt-x86_64/lib/python2.7/site-packages/yt-2.5dev-py2.7-linux-x86_64.egg/yt/analysis_modules/halo_finding/halo_objects.py",
> line 1067, in __init__
>     self._run_finder()
>   File
> "/usr/projects/magnetic/jsmidt/yt-x86_64/lib/python2.7/site-packages/yt-2.5dev-py2.7-linux-x86_64.egg/yt/analysis_modules/halo_finding/halo_objects.py",
> line 1648, in _run_finder
>     if np.unique(self.particle_fields["particle_index"]).size != \
>   File
> "/usr/projects/magnetic/jsmidt/yt-x86_64/lib/python2.7/site-packages/numpy/lib/arraysetops.py",
> line 193, in unique
>     return ar[flag]
> MemoryError
> mpirun: killing job...
> --------------------------------------------------------------------------
> mpirun noticed that process rank 0 with PID 6295 on node
> mu0001.localdomain exited on signal 0 (Unknown signal 0).
> --------------------------------------------------------------------------
> 32 total processes killed (some possibly by mpirun during cleanup)
>
>
>
>    Anyways, if anyone recognizes this or has any advice it would be
> appreciated.  Thanks.
>
> --
> ------------------------------------------------------------------------
> Joseph Smidt <josephsmidt at gmail.com>
>
> Theoretical Division
> P.O. Box 1663, Mail Stop B283
> Los Alamos, NM 87545
> Office: 505-665-9752
> Fax:    505-667-1931
> _______________________________________________
> yt-users mailing list
> yt-users at lists.spacepope.org
> http://lists.spacepope.org/listinfo.cgi/yt-users-spacepope.org
>
