[yt-users] Crash when iterating over a long list of FLASH outputs

Matthew Turk matthewturk at gmail.com
Wed Jan 18 09:07:08 PST 2012


Hi Nathan,

On Tue, Jan 17, 2012 at 10:17 PM, Nathan Goldbaum <goldbaum at ucolick.org> wrote:
> Hi Matt,
>
> A test script that opens and closes the file using h5py runs with no problem.
>
> Adding pf._handle.close() to the end of the loop in my original script seems to fix the issue I brought up in my previous e-mail.  Thanks for the suggestion.
>
> Unfortunately, this leads to another failure somewhere else in the loop.  At some point, it tries to close a file and catches an exception:
>
> Exception ValueError: 'invalid file identifier (Invalid arguments to routine: Inappropriate type)' in <bound method FLASHStaticOutput.__del__ of DiskDyn_hdf5_plt_cnt_1741> ignored
>
> Some googling returns this discussion:
>
> http://code.google.com/p/h5py/issues/detail?id=220
>
> Apparently this error happens when h5py tries to close a file that has already been closed.  Can someone give me more details about how yt closes files?  I appear to be triggering a race condition somehow.

Well, okay.  So a few weeks ago, after another error you reported with
file handling, I went through and tried to ensure that whenever a
FLASH output is torn down, yt explicitly closes its file handle. (Keep
in mind that closing used to happen by default when an h5py File
object was GC'd, but it no longer does.)  Explicit > implicit, etc.
These calls to .close() on _handle were inserted into the __del__
method of the FLASHStaticOutput object.  Unfortunately, there are two
problems with putting them there.  The first is that there is a
persistent weak reference dictionary mapping PF hashes onto PF
objects, and the second is that the __del__ method is notoriously
flaky ( http://docs.python.org/reference/datamodel.html#object.__del__ ).
I could see both of these holding onto handles longer than they ought
to.  As a simple test, maybe inside your loop you could explicitly
garbage collect with gc.collect() (you have to import gc).
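
As a rough sketch of that test (this is not yt's actual code): the
Output class and the registry below are hypothetical stand-ins for
FLASHStaticOutput and the weak reference dictionary, and a plain file
object stands in for the h5py handle.  The idea is to close the handle
explicitly in the loop, make close() safe to call twice so a later
__del__ does not trip over an already-closed handle, and then force a
collection:

```python
import gc
import weakref


class Output:
    """Hypothetical stand-in for an output object that owns a file handle."""

    def __init__(self, path):
        self.path = path
        # Assumption: a plain file object stands in for the h5py handle.
        self._handle = open(path, "rb")

    def close(self):
        # Idempotent: a second call (e.g. from __del__) is a no-op.
        handle = getattr(self, "_handle", None)
        self._handle = None
        if handle is not None:
            handle.close()

    def __del__(self):
        # Safety net only; __del__ may run late, or not at all.
        self.close()


# Hypothetical stand-in for the weak reference dictionary of outputs.
registry = weakref.WeakValueDictionary()


def summarize(paths):
    """Open each file, record its size in bytes, and release it at once."""
    sizes = []
    for path in paths:
        pf = Output(path)
        registry[path] = pf
        try:
            sizes.append(len(pf._handle.read()))
        finally:
            pf.close()   # explicit close, not left to __del__
        del pf           # drop the last strong reference
        gc.collect()     # force a collection inside the loop
    return sizes
```

If the number of open handles still climbs with something shaped like
this, that would point at references being held somewhere other than
the loop itself.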

I'll try to add in some testing / tracking code to run this down a bit
before the workshop but I can't promise it'll be ready before talks
start on Tuesday.

-Matt

>
> -Nathan
>
> On Jan 17, 2012, at 6:56 PM, Matthew Turk wrote:
>
>> Hi Nathan,
>>
>> On Tue, Jan 17, 2012 at 8:49 PM, Nathan Goldbaum <goldbaum at ucolick.org> wrote:
>>> Hi all,
>>>
>>> I'm having a curious issue.  I'm trying to open a large number of FLASH
>>> parameter files, read in some vital statistics about each file, and then
>>> write out the data to an ASCII table.
>>>
>>> However, for some reason I'm finding that this crashes after 1020 data files
>>> have been opened. There's nothing special about the 1020th file since I can
>>> open it and inspect it interactively.  This may have something to do with me
>>> running this script on the lustre filesystem on Pleiades at NASA Ames.
>>
>> This is puzzling.  My guess is that something about HDF5's internal
>> reference counting, or h5py, or something like that is causing these
>> issues.  My initial guess for a workaround would be to try os.fork,
>> but having now tested it I'm not sure that will reset the file handles
>> and so forth.
>>
>>>
>>> My test script is pasted here:
>>>
>>> http://paste.yt-project.org/show/2039/
>>>
>>> And a sample traceback is pasted here:
>>>
>>> http://paste.yt-project.org/show/2040/
>>>
>>> It looks like the crash happens when yt tries to open the h5py handle for
>>> the data file.  It crashes as if the data file doesn't exist, even though it
>>> really does, I promise.  This seems to be related to an h5py issue that John
>>> ZuHone noticed back in July:
>>>
>>> http://lists.spacepope.org/htdig.cgi/yt-users-spacepope.org/2011-July/001703.html
>>>
>>> Adding 'del pf._handle' to the end of my loop does not fix the issue.
>>
>> Can you reproduce it inside a small loop, without yt, just using h5py
>> and calling .close() on each file?
>>
>> -Matt
>>
>>>
>>> Thanks for your help with this!
>>>
>>> Nathan Goldbaum
>>> Graduate Student
>>> Astronomy & Astrophysics, UCSC
>>> goldbaum at ucolick.org
>>> http://www.ucolick.org/~goldbaum
>>>
>>>
>>> _______________________________________________
>>> yt-users mailing list
>>> yt-users at lists.spacepope.org
>>> http://lists.spacepope.org/listinfo.cgi/yt-users-spacepope.org
>>>


