[yt-users] matplotlib issue on kraken

Kacper Kowalik xarthisius.kk at gmail.com
Wed Sep 5 13:20:25 PDT 2012


On 05.09.2012 22:09, Eric Hallman wrote:
> Matt,
>   I will try to narrow it down.  So far it seems to be in the "from matplotlib import figure" call.  I'll isolate it and see what is going on.
Hi Eric,
If it's a race condition, you can work around it by making the cache
local, e.g. use TMPDIR instead of PBS_O_WORKDIR for MPLCONFIGDIR:

export MPLCONFIGDIR="${TMPDIR}/.matplotlib/"
mkdir -p "${MPLCONFIGDIR}"

If TMPDIR is not set to a sane value by Kraken's PBS, use /dev/shm or /tmp
directly.
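
The same idea can also be sketched in Python at the very top of the job
script, before yt or matplotlib is imported. This is only a minimal sketch
under assumptions, not something tested on Kraken: the helper name
pick_local_mplconfigdir and the candidate order (TMPDIR, then /dev/shm, then
/tmp) are made up for illustration. It tolerates another process creating the
directory first, which is the kind of collision the "File exists" error
suggests. Ranks that share a node could still collide inside matplotlib
itself when it creates tex.cache, so treat this as a way to narrow the window
rather than a guaranteed fix.

```python
import errno
import os


def pick_local_mplconfigdir(candidates=None):
    """Return a writable directory for matplotlib's config and cache files.

    Hypothetical helper: tries each candidate base directory in order and
    creates a ".matplotlib" subdirectory in the first usable one.
    """
    if candidates is None:
        # Assumed order: TMPDIR first, then /dev/shm, then /tmp.
        candidates = [os.environ.get("TMPDIR"), "/dev/shm", "/tmp"]
    for base in candidates:
        # Skip unset, missing, or read-only locations.
        if not base or not os.path.isdir(base) or not os.access(base, os.W_OK):
            continue
        path = os.path.join(base, ".matplotlib")
        try:
            os.makedirs(path)
        except OSError as exc:
            # Another process may have created it in the meantime; that is fine
            # as long as what exists is really a directory.
            if exc.errno != errno.EEXIST or not os.path.isdir(path):
                raise
        return path
    raise RuntimeError("no writable local directory found for MPLCONFIGDIR")


if __name__ == "__main__":
    # Must happen before "from yt.mods import *" or any matplotlib import.
    os.environ["MPLCONFIGDIR"] = pick_local_mplconfigdir()
```

Creating the directory and ignoring "already exists" avoids the
check-then-create gap that a separate existence test leaves open.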
Cheers,
Kacper

> Thanks
> 
> Eric
> On Sep 5, 2012, at 4:07 PM, Matthew Turk wrote:
> 
>> That is weird, Eric.  I think you can possibly disable usetex to get
>> rid of the tex.cache file, but I'm not entirely sure.  This might be
>> an issue to isolate, by removing yt.mods and importing the individual
>> matplotlib items that throw the error in a script, and then raise it
>> with matplotlib-users.
>>
>> -Matt
>>
>> On Wed, Sep 5, 2012 at 3:59 PM, Eric Hallman <hallman at txcorp.com> wrote:
>>> Matt,
>>>  it's weird, but in serial or in parallel on a single node I see no
>>> problems.  I run interactively on one node and it skips right through this
>>> part.  I don't get it.
>>>
>>>
>>> On Sep 5, 2012, at 3:55 PM, Matthew Turk wrote:
>>>
>>> Hi Eric,
>>>
>>> The tex.cache issue seems to be the big one here.  Can you try, in
>>> serial, launching a single job that imports yt?  I think it just needs
>>> to be bootstrapped once.
>>>
>>> -Matt
>>>
>>> On Wed, Sep 5, 2012 at 3:50 PM, Eric Hallman <hallman at txcorp.com> wrote:
>>>
>>> Well the earlier traceback I posted below is a good start.  Seriously if I
>>> post the current error list it's going to be 10M of text.  I'll see what I
>>> can come up with in the short term and we can try later on IRC or something.
>>>
>>> Thanks
>>>
>>> Eric
>>>
>>>
>>> On Sep 5, 2012, at 3:45 PM, Nathan Goldbaum wrote:
>>>
>>>
>>> I'm sorry you're having so much trouble.  Unfortunately I'm probably not the
>>> best person to advise since I've never run jobs on Kraken.  Others on the
>>> list might be more helpful.
>>>
>>> One thing that would aid tracking down the problem is if you could paste the
>>> errors you're seeing somewhere so that one of us can take a look at it in
>>> detail.
>>>
>>> Cheers,
>>>
>>> Nathan
>>>
>>>
>>> On Sep 5, 2012, at 12:42 PM, Eric Hallman wrote:
>>>
>>>
>>> Nathan,
>>>
>>> using pmods in the import leads to an error explosion in the matplotlib
>>> imports.  It's even worse than the original test.
>>>
>>> Eric
>>>
>>> On Sep 5, 2012, at 12:42 PM, Nathan Goldbaum wrote:
>>>
>>>
>>> Hi Eric,
>>>
>>> Exactly right.  This is a drop-in replacement for yt.mods on high-latency
>>> parallel filesystems (like Kraken, unfortunately).
>>>
>>> There's some discussion on the dev mailing list:
>>>
>>> http://lists.spacepope.org/htdig.cgi/yt-dev-spacepope.org/2012-January/001760.html
>>>
>>> Unfortunately this isn't covered in the docs (except for a note in the
>>> changelog) but it should be in there.
>>>
>>> Cheers,
>>>
>>> Nathan
>>>
>>>
>>> On Sep 5, 2012, at 9:40 AM, Eric Hallman wrote:
>>>
>>>
>>> Nathan,
>>>
>>> I've been off yt for a while, I'm unaware of pmods.  It's specific to
>>> parallel I'm guessing?
>>>
>>> Eric
>>>
>>> On Sep 5, 2012, at 12:38 PM, Nathan Goldbaum wrote:
>>>
>>>
>>> Hi Eric,
>>>
>>> Have you tried from yt.pmods import * instead of the normal yt.mods?
>>>
>>> Cheers,
>>>
>>> Nathan
>>>
>>>
>>> On Sep 5, 2012, at 9:37 AM, Eric Hallman wrote:
>>>
>>>
>>> Hey everyone,
>>>
>>> so this issue seems like one I've had before, but I searched the lists and
>>> don't find this exact issue.
>>>
>>> On batch jobs on kraken, attempting to do halo finding, I get an almost
>>> immediate crash (with an eternal hang until the time limit is reached) due
>>> to matplotlib.  I've been unable to reproduce it in the interactive queue on
>>> kraken, which is frustrating.  I'm hoping someone has seen it and can
>>> comment.
>>>
>>> this is with yt/dev on kraken, and I set env variables to MPLCONFIGDIR like
>>> so:
>>>
>>> export MPLCONFIGDIR=${PBS_O_WORKDIR}/.matplotlib/
>>> [ ! -d ${MPLCONFIGDIR} ] && mkdir ${MPLCONFIGDIR}
>>>
>>> because if you don't, it fails immediately with perm issues.
>>>
>>> Anyway, it's a simple script and call
>>>
>>> aprun -n 12 python halo_finding.py --parallel
>>>
>>> but the details of the script are not too important, since the job fails
>>> when yt is imported, as so:
>>>
>>>
>>>
>>> Traceback (most recent call last):
>>>   File "halo_finding.py", line 1, in <module>
>>>     from yt.mods import *
>>>   File "/lustre/scratch/proj/sw/yt/dev/lib/python2.7/site-packages/yt-2.4dev-py2.7-linux-x86_64.egg/yt/mods.py", line 115, in <module>
>>>     from yt.visualization.api import \
>>>   File "/lustre/scratch/proj/sw/yt/dev/lib/python2.7/site-packages/yt-2.4dev-py2.7-linux-x86_64.egg/yt/visualization/api.py", line 34, in <module>
>>>     from plot_collection import \
>>>   File "/lustre/scratch/proj/sw/yt/dev/lib/python2.7/site-packages/yt-2.4dev-py2.7-linux-x86_64.egg/yt/visualization/plot_collection.py", line 26, in <module>
>>>     from matplotlib import figure
>>>   File "/lustre/scratch/proj/sw/yt/dev/lib/python2.7/site-packages/matplotlib/figure.py", line 18, in <module>
>>>     from axes import Axes, SubplotBase, subplot_class_factory
>>>   File "/lustre/scratch/proj/sw/yt/dev/lib/python2.7/site-packages/matplotlib/axes.py", line 18, in <module>
>>>     import matplotlib.contour as mcontour
>>>   File "/lustre/scratch/proj/sw/yt/dev/lib/python2.7/site-packages/matplotlib/contour.py", line 21, in <module>
>>>     import matplotlib.texmanager as texmanager
>>>   File "/lustre/scratch/proj/sw/yt/dev/lib/python2.7/site-packages/matplotlib/texmanager.py", line 72, in <module>
>>>     class TexManager:
>>>   File "/lustre/scratch/proj/sw/yt/dev/lib/python2.7/site-packages/matplotlib/texmanager.py", line 92, in TexManager
>>>     os.mkdir(texcache)
>>> OSError: [Errno 17] File exists: '/lustre/scratch/hallman/gigaCubes/run1024/.matplotlib/tex.cache'
>>>
>>>
>>>
>>> In each case, I have deleted tex.cache before I restart, thinking an old
>>> version persisted there, but the same error happens.  The most irritating
>>> thing is that the job does not kick out of the batch system, so the time
>>> continues to run on however many processors you have until the limit is
>>> reached (eternal hang!).
>>>
>>> I hope this is something obvious and I'm just dumb.  Let me know.
>>>
>>> Eric
>>>
>>>
>>> --
>>>
>>>
>>> Eric Hallman
>>>
>>>
>>> Tech-X Corporation               hallman at txcorp.com
>>>
>>>
>>> 5621 Arapahoe Ave, Suite A       Phone: (720) 254-5833
>>>
>>>
>>> Boulder, CO 80303                Fax:   (303) 448-7756
>>>
>>>
>>> --
>>>
>>>
>>>
>>>
>>>
>>>
>>> _______________________________________________
>>>
>>>
>>> yt-users mailing list
>>>
>>>
>>> yt-users at lists.spacepope.org
>>>
>>>
>>> http://lists.spacepope.org/listinfo.cgi/yt-users-spacepope.org


