[yt-users] error while trying to find abundances through checkpoint files

Suoqing Ji suoqing at physics.ucsb.edu
Tue Oct 18 13:48:18 PDT 2016


Hi Tazkera,

I have access to Stampede, but I cannot reproduce your error either (though I used a slightly different script).

> apparently jobs can only be submitted through the $WORK folder on Stampede


Btw, I believe you can submit jobs from both the work and scratch directories on Stampede.

Could you run your code in interactive mode, by

srun -p normal -t 0:30:00 -n 16 -A _your_project_code_ --pty /bin/bash -l

and then enter

python abundance.py

and see what’s going on?
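
While you are in the interactive session, it might also help to check which h5py and HDF5 builds your Python actually picks up, since Nathan asked about the versions. Below is a minimal sketch, not taken from your script: the file name check_h5py.py and the two helper functions are hypothetical names of mine, and it assumes h5py is importable from the same yt-conda environment. The second helper repeats, in a single process, the root-group listing that fails in your traceback.

```python
# check_h5py.py (hypothetical file name): a diagnostic sketch, not part of
# the original abundance script.
from __future__ import print_function

import sys


def describe_h5py():
    """Return version details for h5py, or None if it cannot be imported."""
    try:
        import h5py
    except ImportError:
        return None
    return {
        "h5py": h5py.version.version,       # h5py package version
        "hdf5": h5py.version.hdf5_version,  # HDF5 library h5py was built against
        "mpi": h5py.get_config().mpi,       # True only for a parallel (MPI) build
        "location": h5py.__file__,          # shows which installation is in use
    }


def root_keys(path):
    """List the top-level names in an HDF5 file, or None if it cannot be read."""
    try:
        import h5py
        with h5py.File(path, "r") as fileh:
            # Same kind of call as the one that raises in the traceback.
            return sorted(fileh["/"].keys())
    except (ImportError, OSError, RuntimeError) as err:
        print("could not read %s: %s" % (path, err))
        return None


if __name__ == "__main__":
    print(describe_h5py())
    # Any checkpoint files given on the command line are opened one by one.
    for name in sys.argv[1:]:
        print(name, root_keys(name))
```

You could run it on one core with

python check_h5py.py your_checkpoint_file

If the listing works there but fails when several MPI processes run, that would point toward concurrent access rather than a damaged file, though this is only a guess on my part.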

Best wishes,
--
Suoqing JI
Ph.D Candidate
Department of Physics
University of California, Santa Barbara
http://web.physics.ucsb.edu/~suoqing

> On Oct 18, 2016, at 1:20 PM, tazkera haque <h.tazkera at gmail.com> wrote:
> 
> I typed 
> 
> $ module avail HDF5
> 
> ------------------------------------------------- /opt/apps/intel15/mvapich2_2_1/modulefiles --------------------------------------------------
>    phdf5/1.8.16
> 
> -------------------------------------------------------- /opt/apps/intel15/modulefiles --------------------------------------------------------
>    hdf5/1.8.16 (m,L)
> 
>   Where:
>    L:  Module is loaded
>    m:  built for host and native MIC
> 
> $ module avail h5py
> returns nothing, if that helps.
> 
> 
> On Tue, Oct 18, 2016 at 4:08 PM, Nathan Goldbaum <nathan12343 at gmail.com> wrote:
> Hi Tazkera,
> 
> When I tried googling your error message last night, I found that it's associated with more than one MPI process trying to access an HDF5 file at the same time, presumably using a serial version of the HDF5 library.
> 
> I just did a quick test and was unable to reproduce it on my laptop. Unfortunately, I don't have access to Stampede, so I can't try to reproduce it there.
> 
> Can you share exactly which h5py and HDF5 library versions you're using?
> 
> -Nathan
> 
> On Tue, Oct 18, 2016 at 3:05 PM, tazkera haque <h.tazkera at gmail.com> wrote:
> Hi Nathan,
> 
> Sorry to bother you again, but the problem persists even when working from the $WORK directory. I got the same error message again this morning with a different script. Do you see anything wrong with the script I attached? I have used it for a long time without any errors.
> 
> Thanks
> 
> On Tue, Oct 18, 2016 at 2:34 AM, tazkera haque <h.tazkera at gmail.com> wrote:
> Hi Nathan,
> 
> I figured out what was going wrong: I submitted my job script from the $SCRATCH folder. Apparently jobs can only be submitted through the $WORK folder on Stampede. Thanks for your prompt response, though.
> 
> On Tue, Oct 18, 2016 at 1:22 AM, tazkera haque <h.tazkera at gmail.com> wrote:
> Hi Nathan,
> 
> I tried with one file in my IPython notebook, and it seems to work there.
> 
> On Tue, Oct 18, 2016 at 12:54 AM, tazkera haque <h.tazkera at gmail.com> wrote:
> Yes, it's being run in parallel. I haven't checked with one core yet; I will let you know what happens.
> 
> On Tue, Oct 18, 2016 at 12:50 AM, Nathan Goldbaum <nathan12343 at gmail.com> wrote:
> Is the script being run in parallel? If so, does it crash if you run it on only one core?
> 
> Nathan
> 
> 
> On Monday, October 17, 2016, tazkera haque <h.tazkera at gmail.com> wrote:
> Hi people,
> 
> I am using yt 3.3.1 and submitting my SLURM script to Stampede.
> I was using this script to find the abundances of C, O, etc. from FLASH checkpoint files. While my script worked fine with the old yt (3.1), it suddenly crashed today and returned the following error:
> 
> yt : [INFO     ] 2016-10-17 23:22:24,295 Parameters: current_time              = 28.1847530806
> yt : [INFO     ] 2016-10-17 23:22:24,295 Parameters: domain_dimensions         = [128 128 128]
> yt : [INFO     ] 2016-10-17 23:22:24,296 Parameters: domain_left_edge          = [ -2.80000000e+10  -2.80000000e+10  -2.80000000e+10]
> yt : [INFO     ] 2016-10-17 23:22:24,296 Parameters: domain_right_edge         = [  2.80000000e+10   2.80000000e+10   2.80000000e+10]
> yt : [INFO     ] 2016-10-17 23:22:24,296 Parameters: cosmological_simulation   = 0.0
> Executin lessg abundance.py
> Traceback (most recent call last):
>   File "abundance2.py", line 304, in <module>
>     main(chkFilenames_own)
>   File "abundance2.py", line 59, in main
>     pf = yt.load(filenames[n])
>   File "/work/03858/thaque56/sw/yt-new-3.3/yt-conda/lib/python2.7/site-packages/yt/convenience.py", line 79, in load
>     if c._is_valid(*args, **kwargs): candidates.append(n)
>   File "/work/03858/thaque56/sw/yt-new-3.3/yt-conda/lib/python2.7/site-packages/yt/frontends/flash/data_structures.py", line 478, in _is_valid
>     if "bounding box" not in fileh["/"].keys() \
>   File "/work/03858/thaque56/sw/yt-new-3.3/yt-conda/lib/python2.7/site-packages/h5py/_hl/base.py", line 368, in keys
>     return list(self)
>   File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper (/home/ilan/minonda/conda-bld/work/h5py/_objects.c:2696)
>   File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper (/home/ilan/minonda/conda-bld/work/h5py/_objects.c:2654)
>   File "/work/03858/thaque56/sw/yt-new-3.3/yt-conda/lib/python2.7/site-packages/h5py/_hl/group.py", line 303, in __len__
>     return self.id.get_num_objs()
>   File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper (/home/ilan/minonda/conda-bld/work/h5py/_objects.c:2696)
>   File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper (/home/ilan/minonda/conda-bld/work/h5py/_objects.c:2654)
>   File "h5py/h5g.pyx", line 321, in h5py.h5g.GroupID.get_num_objs (/home/ilan/minonda/conda-bld/work/h5py/h5g.c:4194)
> RuntimeError: Can't determine (Bad symbol table node signature)
> [c560-102.stampede.tacc.utexas.edu:mpispawn_0][child_handler] MPI process (rank: 0, pid: 22137) exited with status 1
> TACC: MPI job exited with code: 1
> 
> TACC: Shutdown complete. Exiting.
> 
> I was wondering whether there is something wrong with my code or with the new yt. I am also attaching my code for you to look at. Thanks in advance.
> 
> Best
> Tazkera
> 
> _______________________________________________
> yt-users mailing list
> yt-users at lists.spacepope.org <mailto:yt-users at lists.spacepope.org>
> http://lists.spacepope.org/listinfo.cgi/yt-users-spacepope.org <http://lists.spacepope.org/listinfo.cgi/yt-users-spacepope.org>
