[Yt-dev] Release 1.7

Stephen Skory stephenskory at yahoo.com
Mon Mar 29 18:09:09 PDT 2010


All,

> think Stephen's working on getting a better (or at least different)
> kD-tree 
> implementation into the source tree and testing his structure
> function code 
> before this release goes 
> out.


The story is that there are lots of kD-trees out there and they all have their plusses and their minuses. I have various reasons why I wish there was a 'best' solution out there. I am seeing memory leaks, or increases of some kind, when running parallel HOP repeatedly (as in a loop like my merger tree). Both of my new features going in 1.7 use a kD-tree, although the merger tree doesn't need a fast one.

The fortran kdtree I'm using with parallel HOP right now is fast and accurate, but because of the static-build requirement for Kraken, I'm using Forthon which has its flaws. It's kind of opaque and I am still not 100% certain that it frees memory correctly. I've done lots of testing today and I think the answer is 'no', but I am not fully certain.

I spent time today also testing parallel HOP using the Scipy-spatial kD-tree. This is easier to build than the fortran one (if you separate it out from the rest of Scipy), but it's much slower and uses more memory than the fortran one. The reason I don't think that the fortran one leaks memory is I see similar kinds of memory increases with this kD-tree over cycles.

For example, here is the 'dowser' output from a three-cycle run on parallel HOP (using Fortran tree). The little graphs plot number of items of that type vs time. Note the items that have a distinct three-step rise. I think that's the memory leak. Matt thinks that this might have something to do with particle IO, but it might be someplace else.

http://stephenskory.com/memdebug/mem.html

There is a C++ kdtree out there, but I don't think it's any better than what we have, and probably worse in terms of speed an memory.

HOP/FOF have a kdtree internally which is fast (more or less identical for the two), but it gives wrong answers. That might be fixable, but I don't know.

I also included a python-native kd tree in the source some time ago, but I have now decided that it is crap. It's really very, very slow and it gives wrong answers.

What I want is to settle on a single kD-tree that isn't a pain, is fast, accurate and builds easily. Matt doesn't like Forthon and I can't blame him, but the alternatives are not great. We're hoping that Fwrap, announced at the scipy conference a year ago, will be a good solution, but it remains unfinished.

There are a couple places that use the python kD-tree (Merger Tree, halo nearest neighbor stuff) that I'd like to replace with an accurate kD-tree before 1.7 release, but I don't know which one to use. Scipy-spatial is easy to build, and speed is not a concern, but then that's another kD-tree when the fortran one is perfectly fine. Perhaps having multiple ones until Fwrap comes out is the best solution, or automatically building the fortran one if possible.

Thanks for reading this far.
 _______________________________________________________
sskory at physics.ucsd.edu o__ Stephen Skory
http://physics.ucsd.edu/~sskory/ _.>/ _Graduate Student
________________________________(_)_\(_)_______________




More information about the yt-dev mailing list