[yt-users] matplotlib issue on kraken
Eric Hallman
hallman at txcorp.com
Wed Sep 5 13:32:11 PDT 2012
Kacper,
OK. That makes sense. I will give it a try. Thanks.
Eric
On Sep 5, 2012, at 4:20 PM, Kacper Kowalik wrote:
> On 05.09.2012 22:09, Eric Hallman wrote:
>> Matt,
>> I will try to narrow it down. So far it seems to be in the import of figure from matplotlib. I'll isolate it and see what is going on.
> Hi Eric,
> If it's a race condition you can work around it by making the cache
> local, e.g. use TMPDIR instead of PBS_O_WORKDIR for MPLCONFIGDIR:
>
> export MPLCONFIGDIR="${TMPDIR}/.matplotlib/"
> mkdir -p "${MPLCONFIGDIR}"
>
> If TMPDIR is not set to a sane value by Kraken's PBS, use /dev/shm or /tmp
> directly.
> Cheers,
> Kacper
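
A minimal Python sketch of the workaround described above, for cases where it is easier to set the variable inside the script than in the PBS job file. The helper name local_mpl_config_dir and the fallback order are assumptions made for illustration and are not part of yt or matplotlib; the only behaviour relied on is that matplotlib reads MPLCONFIGDIR from the environment when it is first imported.

```python
import os
import tempfile


def local_mpl_config_dir(environ=os.environ):
    """Pick a node-local directory for matplotlib's config/cache files.

    Hypothetical helper: tries TMPDIR first, then /dev/shm, then /tmp,
    and finally whatever directory the tempfile module reports.
    """
    candidates = [environ.get("TMPDIR"), "/dev/shm", "/tmp",
                  tempfile.gettempdir()]
    for base in candidates:
        # Skip unset, missing, or read-only locations.
        if base and os.path.isdir(base) and os.access(base, os.W_OK):
            path = os.path.join(base, ".matplotlib")
            try:
                os.mkdir(path)
            except OSError:
                # Another process may have created it first; that is fine.
                if not os.path.isdir(path):
                    raise
            return path
    raise RuntimeError("no writable temporary directory found")


# This has to run before the first "import matplotlib" (and therefore
# before "from yt.mods import *"), since the variable is read at import.
os.environ["MPLCONFIGDIR"] = local_mpl_config_dir()
```

Placed at the top of the script, this gives each node its own cache directory instead of one shared directory on Lustre. Ranks on the same node still share that directory, so this should narrow a race rather than rule it out.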
>
>> Thanks
>>
>> Eric
>> On Sep 5, 2012, at 4:07 PM, Matthew Turk wrote:
>>
>>> That is weird, Eric. I think you can possibly disable usetex to get
>>> rid of the tex.cache file, but I'm not entirely sure. This might be
>>> an issue to isolate, by removing yt.mods and importing the individual
>>> matplotlib items that throw the error in a script, and then raise it
>>> with matplotlib-users.
>>>
>>> -Matt
>>>
>>> On Wed, Sep 5, 2012 at 3:59 PM, Eric Hallman <hallman at txcorp.com> wrote:
>>>> Matt,
>>>> it's weird, but in serial or in parallel on a single node I see no
>>>> problems. I run interactively on one node and it skips right through this
>>>> part. I don't get it.
>>>>
>>>>
>>>> On Sep 5, 2012, at 3:55 PM, Matthew Turk wrote:
>>>>
>>>> Hi Eric,
>>>>
>>>> The tex.cache issue seems to be the big one here. Can you try, in
>>>> serial, launching a single job that imports yt? I think it just needs
>>>> to be bootstrapped once.
>>>>
>>>> -Matt
>>>>
>>>> On Wed, Sep 5, 2012 at 3:50 PM, Eric Hallman <hallman at txcorp.com> wrote:
>>>>
>>>> Well the earlier traceback I posted below is a good start. Seriously if I
>>>> post the current error list it's going to be 10M of text. I'll see what I
>>>> can come up with in the short term and we can try later on IRC or something.
>>>>
>>>> Thanks
>>>>
>>>> Eric
>>>>
>>>> On Sep 5, 2012, at 3:45 PM, Nathan Goldbaum wrote:
>>>>
>>>> I'm sorry you're having so much trouble. Unfortunately I'm probably not the
>>>> best person to advise since I've never run jobs on Kraken. Others on the
>>>> list might be more helpful.
>>>>
>>>> One thing that would aid tracking down the problem is if you could paste the
>>>> errors you're seeing somewhere so that one of us can take a look at it in
>>>> detail.
>>>>
>>>> Cheers,
>>>>
>>>> Nathan
>>>>
>>>> On Sep 5, 2012, at 12:42 PM, Eric Hallman wrote:
>>>>
>>>> Nathan,
>>>>
>>>> Using pmods in the import leads to an error explosion in the matplotlib
>>>> imports. It's even worse than the original test.
>>>>
>>>> Eric
>>>>
>>>> On Sep 5, 2012, at 12:42 PM, Nathan Goldbaum wrote:
>>>>
>>>> Hi Eric,
>>>>
>>>> Exactly right. This is a drop-in replacement for yt.mods on high-latency
>>>> parallel filesystems (like Kraken, unfortunately).
>>>>
>>>> There's some discussion on the dev mailing list:
>>>>
>>>> http://lists.spacepope.org/htdig.cgi/yt-dev-spacepope.org/2012-January/001760.html
>>>>
>>>> Unfortunately this isn't covered in the docs (except for a note in the
>>>> changelog) but it should be in there.
>>>>
>>>> Cheers,
>>>>
>>>> Nathan
>>>>
>>>> On Sep 5, 2012, at 9:40 AM, Eric Hallman wrote:
>>>>
>>>> Nathan,
>>>>
>>>> I've been off yt for a while, I'm unaware of pmods. It's specific to
>>>> parallel I'm guessing?
>>>>
>>>> Eric
>>>>
>>>> On Sep 5, 2012, at 12:38 PM, Nathan Goldbaum wrote:
>>>>
>>>> Hi Eric,
>>>>
>>>> Have you tried from yt.pmods import * instead of the normal yt.mods?
>>>>
>>>> Cheers,
>>>>
>>>> Nathan
>>>>
>>>> On Sep 5, 2012, at 9:37 AM, Eric Hallman wrote:
>>>>
>>>> Hey everyone,
>>>>
>>>> This issue seems like one I've had before, but I searched the lists and
>>>> didn't find this exact issue.
>>>>
>>>> On batch jobs on Kraken, attempting to do halo finding, I get an almost
>>>> immediate crash (with an eternal hang until the time limit is reached) due
>>>> to matplotlib. I've been unable to reproduce it in the interactive queue on
>>>> Kraken, which is frustrating. I'm hoping someone has seen it and can
>>>> comment.
>>>>
>>>> This is with yt/dev on Kraken, and I set the MPLCONFIGDIR environment
>>>> variable like so:
>>>>
>>>> export MPLCONFIGDIR=${PBS_O_WORKDIR}/.matplotlib/
>>>> [ ! -d ${MPLCONFIGDIR} ] && mkdir ${MPLCONFIGDIR}
>>>>
>>>> because if you don't, it fails immediately with permission issues.
>>>>
>>>> Anyway, it's a simple script, called with
>>>>
>>>> aprun -n 12 python halo_finding.py --parallel
>>>>
>>>> but the details of the script are not too important, since the job fails
>>>> when yt is imported, like so:
>>>>
>>>> Traceback (most recent call last):
>>>>   File "halo_finding.py", line 1, in <module>
>>>>     from yt.mods import *
>>>>   File "/lustre/scratch/proj/sw/yt/dev/lib/python2.7/site-packages/yt-2.4dev-py2.7-linux-x86_64.egg/yt/mods.py", line 115, in <module>
>>>>     from yt.visualization.api import \
>>>>   File "/lustre/scratch/proj/sw/yt/dev/lib/python2.7/site-packages/yt-2.4dev-py2.7-linux-x86_64.egg/yt/visualization/api.py", line 34, in <module>
>>>>     from plot_collection import \
>>>>   File "/lustre/scratch/proj/sw/yt/dev/lib/python2.7/site-packages/yt-2.4dev-py2.7-linux-x86_64.egg/yt/visualization/plot_collection.py", line 26, in <module>
>>>>     from matplotlib import figure
>>>>   File "/lustre/scratch/proj/sw/yt/dev/lib/python2.7/site-packages/matplotlib/figure.py", line 18, in <module>
>>>>     from axes import Axes, SubplotBase, subplot_class_factory
>>>>   File "/lustre/scratch/proj/sw/yt/dev/lib/python2.7/site-packages/matplotlib/axes.py", line 18, in <module>
>>>>     import matplotlib.contour as mcontour
>>>>   File "/lustre/scratch/proj/sw/yt/dev/lib/python2.7/site-packages/matplotlib/contour.py", line 21, in <module>
>>>>     import matplotlib.texmanager as texmanager
>>>>   File "/lustre/scratch/proj/sw/yt/dev/lib/python2.7/site-packages/matplotlib/texmanager.py", line 72, in <module>
>>>>     class TexManager:
>>>>   File "/lustre/scratch/proj/sw/yt/dev/lib/python2.7/site-packages/matplotlib/texmanager.py", line 92, in TexManager
>>>>     os.mkdir(texcache)
>>>> OSError: [Errno 17] File exists: '/lustre/scratch/hallman/gigaCubes/run1024/.matplotlib/tex.cache'
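
The OSError in the traceback above is what one would expect if several ranks each check for the directory, find it missing, and then all call os.mkdir on the same path: one call succeeds and the rest get Errno 17. That reading is an assumption based only on the last frame of the traceback, not on an inspection of matplotlib's source. Under that assumption, a creation helper that tolerates losing the race could look like the sketch below; ensure_dir is a hypothetical name, not a matplotlib function.

```python
import errno
import os
import tempfile


def ensure_dir(path):
    """Create path if needed, tolerating another process creating it first."""
    try:
        os.mkdir(path)
    except OSError as exc:
        # EEXIST with a directory present means someone else won the race.
        # Any other error, or a non-directory at that path, is re-raised.
        if exc.errno != errno.EEXIST or not os.path.isdir(path):
            raise
    return path


# Demonstration in a scratch directory (stand-in for MPLCONFIGDIR).
base = tempfile.mkdtemp()
cache = os.path.join(base, "tex.cache")
ensure_dir(cache)
ensure_dir(cache)  # second call returns quietly instead of raising Errno 17
```

The point of the design is to attempt the creation and interpret the failure, rather than test for existence first, since the test and the mkdir are not atomic on a shared filesystem.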
>>>>
>>>> In each case, I have deleted tex.cache before I restart, thinking an old
>>>> version persisted there, but the same error happens. The most irritating
>>>> thing is that the job does not kick out of the batch system, so the time
>>>> continues to run on however many processors you have until the limit is
>>>> reached (eternal hang!).
>>>>
>>>> I hope this is something obvious and I'm just dumb. Let me know.
>>>>
>>>> Eric
>>>>
>>>> --
>>>> Eric Hallman
>>>> Tech-X Corporation hallman at txcorp.com
>>>> 5621 Arapahoe Ave, Suite A Phone: (720) 254-5833
>>>> Boulder, CO 80303 Fax: (303) 448-7756
>>>> --
>>>>
>>>> _______________________________________________
>>>> yt-users mailing list
>>>> yt-users at lists.spacepope.org
>>>> http://lists.spacepope.org/listinfo.cgi/yt-users-spacepope.org
--
Eric Hallman
Tech-X Corporation hallman at txcorp.com
5621 Arapahoe Ave, Suite A Phone: (720) 254-5833
Boulder, CO 80303 Fax: (303) 448-7756
--