[yt-users] error when running on cluster nodes

Tsz Ka Li tszkali2 at illinois.edu
Fri Feb 18 13:28:20 PST 2011


Hi Matt,

I understand the difficulty. The problem is that I cannot run programs for
very long on the head node. Right now I am trying to build a merger tree for a
256^3 dark matter only run. The data size depends on the number of 
output dumps I use. Perhaps I can reduce the data size to make the 
analysis more efficient.

I am not sure if these are the ldd outputs you wanted. You may
ignore them if you do not find them useful.

on compute node:
[tszkali2@tur2-26 ~]$ ldd /turing/home/tszkali2/yt-ppc64/lib/python2.6/site-packages/matplotlib/_path.so
         linux-vdso32.so.1 =>  (0x00100000)
         libstdc++.so.6 => /usr/lib/libstdc++.so.6 (0x6fe54000)
         libm.so.6 => /lib/libm.so.6 (0x6fd63000)
         libgcc_s.so.1 => /lib/libgcc_s.so.1 (0x6fd28000)
         libpthread.so.0 => /lib/libpthread.so.0 (0x6fce5000)
         libc.so.6 => /lib/libc.so.6 (0x6fb21000)
         /lib/ld.so.1 (0x20510000)

on head node:
[tszkali2@turing-3 tszkali2]$ ldd /turing/home/tszkali2/yt-ppc64/lib/python2.6/site-packages/matplotlib/_path.so
         linux-vdso32.so.1 =>  (0x00100000)
         libstdc++.so.6 => /usr/lib/libstdc++.so.6 (0x6fe54000)
         libm.so.6 => /lib/libm.so.6 (0x6fd63000)
         libgcc_s.so.1 => /lib/libgcc_s.so.1 (0x6fd2b000)
         libpthread.so.0 => /lib/libpthread.so.0 (0x6fce8000)
         libc.so.6 => /lib/libc.so.6 (0x6fb24000)
         /lib/ld.so.1 (0x2033b000)
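
The two listings look the same apart from the load addresses. One quick way to
check that is to strip the addresses and compare what each library resolves to.
Below is a rough Python sketch of that comparison. It is not part of yt; the
helper names parse_ldd and diff_ldd are made up for illustration, and the two
strings are simply the ldd outputs pasted above.

```python
import re

COMPUTE_NODE = """
         linux-vdso32.so.1 =>  (0x00100000)
         libstdc++.so.6 => /usr/lib/libstdc++.so.6 (0x6fe54000)
         libm.so.6 => /lib/libm.so.6 (0x6fd63000)
         libgcc_s.so.1 => /lib/libgcc_s.so.1 (0x6fd28000)
         libpthread.so.0 => /lib/libpthread.so.0 (0x6fce5000)
         libc.so.6 => /lib/libc.so.6 (0x6fb21000)
         /lib/ld.so.1 (0x20510000)
"""

HEAD_NODE = """
         linux-vdso32.so.1 =>  (0x00100000)
         libstdc++.so.6 => /usr/lib/libstdc++.so.6 (0x6fe54000)
         libm.so.6 => /lib/libm.so.6 (0x6fd63000)
         libgcc_s.so.1 => /lib/libgcc_s.so.1 (0x6fd2b000)
         libpthread.so.0 => /lib/libpthread.so.0 (0x6fce8000)
         libc.so.6 => /lib/libc.so.6 (0x6fb24000)
         /lib/ld.so.1 (0x2033b000)
"""


def parse_ldd(text):
    """Map each library name to the path ldd resolved it to (addresses dropped)."""
    resolved = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Drop the trailing load address, e.g. "(0x6fd63000)".
        line = re.sub(r"\s*\(0x[0-9a-fA-F]+\)$", "", line)
        name, sep, target = line.partition("=>")
        # Lines without "=>" (the dynamic loader) map to themselves.
        resolved[name.strip()] = target.strip() if sep else name.strip()
    return resolved


def diff_ldd(a, b):
    """Return {name: (path_a, path_b)} for libraries that resolve differently."""
    pa, pb = parse_ldd(a), parse_ldd(b)
    names = sorted(set(pa) | set(pb))
    return dict(
        (name, (pa.get(name), pb.get(name)))
        for name in names
        if pa.get(name) != pb.get(name)
    )


print(diff_ldd(COMPUTE_NODE, HEAD_NODE))
```

For the two listings above this prints {}, meaning every library resolves to
the same path on both nodes and only the load addresses differ. A library that
showed up as "not found" on one node would appear in the result.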

I am grateful for your input anyway.

Thanks,
Tsz Ka



On 2/18/2011 2:49 PM, Matthew Turk wrote:
> Hi Tsz Ka,
>
> For me, otool is just in /usr/bin -- so it sounds like it may not be
> there.  There is a chance that 'ldd' will also work for this purpose.
>
> It sounds to me like this could be an issue of the DYLD_LIBRARY_PATH
> being confused, as well as possibly being unable to load various
> libraries.  I'm not sure that I'm able to debug this from a distance
> -- the combination of the ppc64 cluster (which I am unfamiliar with)
> and the particulars of the local disk are making things a bit
> difficult.  Any chance you could run the analysis on the head node?
> How big is your dataset?  It may not benefit too much from
> parallelism.
>
> -Matt
>
> On Fri, Feb 18, 2011 at 3:39 PM, Tsz Ka Li <tszkali2 at illinois.edu> wrote:
>> Hi Matt,
>> I am afraid otool is not installed on the Turing cluster. (I got command not
>> found.) Do you know of any substitute for that tool?
>> Actually I did not follow all of the Cray installation instructions, only
>> those in the section "Running on a compute node". So basically I just copied
>> the yt directory to the scratch disk and used the python there. Maybe this
>> does not make sense to you. Anyway, sorry for the confusion.
>> Thanks,
>> Tsz Ka
>>
>>
>> On 2/18/2011 9:26 AM, Matthew Turk wrote:
>>> Hi Tsz Ka,
>>>
>>> I have been thinking about it, and I am afraid I don't have any bright
>>> ideas.  The only thing I can think of is that we may be able to get
>>> more information about the dynamic loading problem by having you run:
>>>
>>> otool -L /turing/home/tszkali2/yt-ppc64/lib/python2.6/site-packages/matplotlib/_path.so
>>>
>>> I would again caution though that if you attempted to create a
>>> statically linked library, this likely resulted in major breakages.
>>> The static linking process described on the wiki is only really
>>> designed for the Cray Compute Node Linux distribution, and it has a
>>> fundamentally different mode of linking than Darwin/OSX.
>>>
>>> Could you send the output of that to us, run on both the compute nodes
>>> and on the head node?  I presume loading yt still runs on the head
>>> node?
>>>
>>> Thanks,
>>>
>>> Matt
>>>
>>> On Thu, Feb 17, 2011 at 12:38 PM, Tsz Ka Li <tszkali2 at illinois.edu> wrote:
>>>> Hi Stephen,
>>>> I did try to include the -V flag in the interactive mode (i.e. qsub -I -V)
>>>> and I saw no difference in the error outcome.
>>>> Thanks,
>>>> Tsz Ka
>>>>
>>>> On 2/17/2011 11:19 AM, Stephen Skory wrote:
>>>>> Tsz Ka,
>>>>>
>>>>>> I used the same python (~/yt-ppc64/bin/python) as I did in the command
>>>>>> line. I think I tried python2.6 but I got the same error. (Actually how
>>>>>> are they different?) The machine I am using is the Turing cluster in UI
>>>>>> (http://www.cse.illinois.edu/turing/), which is an Apple Xserve cluster
>>>>>> using G5 processors. You can find on the website some information about
>>>>>> it. Sorry I am not that familiar with cluster issues. I suppose the
>>>>>> nodes have full installation, though I don't know where to check. I am
>>>>>> using the default compilers on Turing, which are gcc for C/C++ and xlf
>>>>>> for Fortran.
>>>>> I'm going to chime in here. I noticed that on their FAQ page:
>>>>>
>>>>> http://www.cse.illinois.edu/turing/faq.html
>>>>>
>>>>> There's something about path problems on compute nodes. Your error could
>>>>> be due to a library problem (i.e. DYLD_LIBRARY_PATH). Have you tried
>>>>> submitting your job with a "#PBS -V" that keeps your environment for the
>>>>> job?
>>>>>
>>>>> Matt may have some other bright ideas...
>>>>>
>>>>>
>>>>>   Stephen Skory
>>>>> stephenskory at yahoo.com
>>>>> http://stephenskory.com/
>>>>> 510.621.3687 (google voice)
>>>>> _______________________________________________
>>>>> yt-users mailing list
>>>>> yt-users at lists.spacepope.org
>>>>> http://lists.spacepope.org/listinfo.cgi/yt-users-spacepope.org
>>>>>


