[yt-users] Problem with ParallellAnalysisInterface._mpi_allsum

Anthony Harness anthony.harness at colorado.edu
Tue Jun 7 10:14:27 PDT 2011


Here's a list of my loaded modules:

  1) modules/3.1.6.5
  2) torque/2.4.8
  3) moab/5.4.3.s16991
  4) /opt/cray/xt-asyncpe/default/modulefiles/xtpe-istanbul
  5) tgusage/3.0-r2
  6) altd/1.0
  7) DefApps
  8) xtpe-target-cnl
  9) xt-service/2.2.74
 10) xt-os/2.2.74
 11) xt-boot/2.2.74
 12) xt-lustre-ss/2.2.74_1.6.5
 13) cray/job/1.5.5-0.1_2.0202.21413.56.7
 14) cray/csa/3.0.0-1_2.0202.21426.77.7
 15) cray/account/1.0.0-2.0202.19482.49.18
 16) cray/projdb/1.0.0-1.0202.19483.52.1
 17) Base-opts/2.2.74
 18) PrgEnv-gnu/2.2.74
 19) xt-asyncpe/4.9
 20) xt-pe/2.2.74
 21) xt-mpt/5.2.3
 22) pmi/2.1.2-1.0000.8396.13.1.ss
 23) xt-libsci/10.5.02
 24) gcc/4.5.3
 25) cray/MySQL/5.0.64-1.0202.2899.21.1
 26) xt-mpt/5.0.0
 27) yt/2.1






On Tue, Jun 7, 2011 at 10:59 AM, Matthew Turk <matthewturk at gmail.com> wrote:

> Hi Anthony,
>
> Stephen and I have chatted about this -- he brought up that some MPI
> implementations are more tolerant than others.  What are the contents
> of your module list?
>
> It's also possible that we need to specify types in the Allreduce
> call.  I am not sure why that would cause problems for you and not me,
> however.
>
> -Matt
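
For context, the typed call Matt mentions would look roughly like the sketch
below. This is only an illustration of mpi4py's buffer-protocol interface,
not the actual yt code; the array names and sizes are made up. Spelling out
the MPI datatype keeps mpi4py on the raw-buffer path instead of pickling, so
the size of the array stops mattering.

from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD

# Each rank contributes a (hypothetical) local histogram.
local = np.arange(10, dtype='float64')
total = np.empty_like(local)

# Buffer-based Allreduce: the [array, MPI.DOUBLE] pairs state the type
# explicitly, so mpi4py sends the raw buffer rather than a pickled object.
comm.Allreduce([local, MPI.DOUBLE], [total, MPI.DOUBLE], op=MPI.SUM)

if comm.rank == 0:
    print(total)   # elementwise sum across all ranks
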
>
> On Mon, Jun 6, 2011 at 7:40 PM, Matthew Turk <matthewturk at gmail.com>
> wrote:
> > Hi Anthony,
> >
> > Using the grid cutting method, which I thought might be causing
> > problems, I was again unable to reproduce the issue.  If you could,
> > would you mind running with --detailed and sending me (off-list) the
> > log file, so that I can try to examine the problematic output?
> >
> > -Matt
> >
> > On Mon, Jun 6, 2011 at 7:17 PM, Anthony Harness
> > <anthony.harness at colorado.edu> wrote:
> >> The array shouldn't be too small. The data contain 1024^3 cells (20
> >> million cells within the cut_region) and I am running it on 60 processors
> >> (120 doesn't work either). This is my script:
> >> from yt.mods import *
> >> from yt.analysis_modules.api import EnzoSimulation
> >> import numpy as na
> >> from krakenPlugins import *
> >> from mpi4py import MPI
> >> ###########################################################
> >> simName = '50Mpc_1024unigrid.par'
> >> dataDir = '/lustre/scratch/britton/box_size_study/50Mpc_1024/run_17f_cl_5D'
> >> es = EnzoSimulation('%s/%s' %(dataDir,simName), get_redshift_outputs=False)
> >> dataCntr = 0
> >> numBins = 1000
> >> allHisty = na.array([na.zeros(numBins+1)])
> >> allHistx = na.array([na.zeros(numBins+1)])
> >> es = es.allOutputs[:85]
> >> for output in es:
> >>     pf = load('%s%s' %(dataDir,output['filename'][1:]))
> >>     dd = pf.h.all_data()
> >>     pc = PlotCollection(pf)
> >>     cut = dd.cut_region(["grid['Metallicity'] <= 1.e-6",
> >>                          "grid['Temperature'] <= 10.**5.",
> >>                          "grid['Temperature'] >= 300.",
> >>                          "grid['Baryon_Overdensity'] >= 1.",
> >>                          "grid['Baryon_Overdensity'] <= 100."])
> >>     pc.add_profile_object(cut, ['Density','Ones'], weight=None,
> >>                           x_bins=numBins, x_log=True)
> >>     ones = pc.plots[-1].data["Ones"]
> >>     bod = pc.plots[-1].data["Density"]
> >>     allHisty = na.concatenate((allHisty,[ones]))
> >>     allHistx = na.concatenate((allHistx,[bod]))
> >>     dataCntr += 1
> >>     del pf,dd,pc,cut,ones,bod
> >> if MPI.COMM_WORLD.rank == 0:
> >>     print '***Saving to .npy file. UTC Time: %s***'
> >>     na.save('%s/histograms_y.npy' %saveDir, allHisty)
> >>     na.save('%s/histograms_x.npy' %saveDir, allHistx)
> >>
> >>
> >> On Mon, Jun 6, 2011 at 6:24 PM, Matthew Turk <matthewturk at gmail.com> wrote:
> >>>
> >>> Hi Anthony,
> >>>
> >>> I tried it on a small dataset and I was unable to reproduce it.  Do
> >>> you think that the array is small enough that some of the processors
> >>> aren't getting any data?  I was able to get the profile command to
> >>> work all the way down to arrays of size 19, run on 20 processors.
> >>>
> >>> Could you post the entirety of your script?
> >>>
> >>> -Matt
> >>>
> >>> On Mon, Jun 6, 2011 at 5:15 PM, Anthony Harness
> >>> <anthony.harness at colorado.edu> wrote:
> >>> > Hello,
> >>> >
> >>> > I am trying to add a profile object to a PlotCollection (via
> >>> > pc.add_profile_object(data, fields) ) while running in parallel on
> >>> > Kraken. I get the following error: "TypeError: message: expecting a
> >>> > list or tuple", which ultimately comes from mpi4py.MPI.Comm.Allreduce,
> >>> > which is called by ParallelAnalysisInterface._mpi_allsum(). In
> >>> > _mpi_allsum() there is the following comment: "# We use old-school
> >>> > pickling here on the assumption the arrays are relatively small
> >>> > ( < 1e7 elements )". The dataset I am working with is larger than
> >>> > 1e7 elements, so is _mpi_allsum not able to pass such a large array
> >>> > to Comm.Allreduce?
> >>> >
> >>> > Thanks,
> >>> > Anthony
> >>> >
> >>> > Here is the traceback:
> >>> >
> >>> > File "/yt-2.1stable-py2.7-linux-x86_64.egg/yt/data_objects/profiles.py", line 146, in add_fields
> >>> >     self._lazy_add_fields(fields, weight, accumulation)
> >>> > File "/yt-2.1stable-py2.7-linux-x86_64.egg/yt/data_objects/profiles.py", line 94, in _lazy_add_fields
> >>> >     for gi,grid in enumerate(self._get_grids(fields)):
> >>> > File "/yt-2.1stable-py2.7-linux-x86_64.egg/yt/utilities/parallel_tools/parallel_analysis_interface.py", line 134, in __iter__
> >>> >     if not self.just_list: self.pobj._finalize_parallel()
> >>> > File "/yt-2.1stable-py2.7-linux-x86_64.egg/yt/data_objects/profiles.py", line 122, in _finalize_parallel
> >>> >     self.__data[key] = self._mpi_allsum(self.__data[key])
> >>> > File "/yt-2.1stable-py2.7-linux-x86_64.egg/yt/utilities/parallel_tools/parallel_analysis_interface.py", line 185, in passage
> >>> >     return func(self, data)
> >>> > File "/yt-2.1stable-py2.7-linux-x86_64.egg/yt/utilities/parallel_tools/parallel_analysis_interface.py", line 1124, in _mpi_allsum
> >>> >     MPI.COMM_WORLD.Allreduce(data, tr, op=MPI.SUM)
> >>> > File "Comm.pyx", line 530, in mpi4py.MPI.Comm.Allreduce (src/mpi4py_MPI.c:43646)
> >>> > File "message.pxi", line 426, in mpi4py.MPI._p_msg_cco.for_allreduce (src/mpi4py_MPI.c:14446)
> >>> > File "message.pxi", line 33, in mpi4py.MPI.message_simple (src/mpi4py_MPI.c:11108)
> >>> > TypeError: message: expecting a list or tuple
> >>> >
> >>> >
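
As background for the traceback above: mpi4py offers two flavors of
reduction. The lowercase comm.allreduce() pickles arbitrary Python objects
(the "old-school pickling" the comment refers to), while the uppercase
comm.Allreduce() that _mpi_allsum calls works on raw buffers and raises a
TypeError like the one above when it cannot interpret its argument as a
buffer or a [buffer, MPI type] list/tuple. The snippet below is only an
illustrative sketch of that distinction under made-up names; it is not the
yt code path.

from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD

# Lowercase allreduce: pickles any Python object. Convenient for small
# data, but slow and memory-hungry for very large arrays.
summed_ranks = comm.allreduce(comm.rank, op=MPI.SUM)

# Uppercase Allreduce: operates on buffers, so arbitrarily large arrays
# are fine, but the arguments must be buffer-like (or [buffer, type] pairs).
big = np.ones(10**7, dtype='float64')
out = np.empty_like(big)
comm.Allreduce([big, MPI.DOUBLE], [out, MPI.DOUBLE], op=MPI.SUM)

if comm.rank == 0:
    print(summed_ranks)   # sum of the rank numbers
    print(out[0])         # equals the number of ranks
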
> _______________________________________________
> yt-users mailing list
> yt-users at lists.spacepope.org
> http://lists.spacepope.org/listinfo.cgi/yt-users-spacepope.org
>

