Posts by ConflictingEmotions
1) Message boards : Number crunching : your BOINC client is old - please install ... (Message 1864)
Posted 3042 days ago by ConflictingEmotions
Since when has the project required development versions of the client, which are usually unstable? This requirement is not mentioned on the main page.

If you read the different threads, there are a large number of problems with the 6.10.x series. I tried 6.10.4 and it completely messed up the GPU WU scheduling. While 6.10.5 took a while to come out, it was quickly replaced by 6.10.6.

Bruce
2) Message boards : Number crunching : Errors noticed on Wingmen machines (Message 640)
Posted 3077 days ago by ConflictingEmotions
Some of these errors are the same ones I mentioned elsewhere. The new Nvidia driver was meant to let GPUs in SLI be treated as independent devices, but it appears to have a race condition. On my system, disabling SLI mode worked, as did ensuring a large time gap between the two CUDA jobs.
3) Message boards : Number crunching : CUDA v1.10 crash (Message 617)
Posted 3079 days ago by ConflictingEmotions
My system has a GTX 295, so the crashes may also be due to the 190 driver series, which added: "Added support to make all GPUs within an SLI group available for CUDA applications to use." That could be a source of the driver crash, because I originally did not change how the GPUs are handled.

So what I have done is ensure that there is a time difference between jobs and also make sure that each GPU is treated separately. So far there have been no crashes.
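As a rough illustration of that workaround (one GPU per job, with staggered start times), here is a minimal sketch of a launch plan. It is hypothetical and not how the BOINC client schedules work: `plan_launches` and the 30-second default gap are assumptions for illustration. The only real mechanism used is the standard `CUDA_VISIBLE_DEVICES` environment variable, which restricts a process to the listed GPU.

```python
import os


def plan_launches(job_names, num_gpus, min_gap_s=30.0, start_s=0.0):
    """Hypothetical helper: give each job its own GPU and stagger start times.

    Devices are assigned round-robin, and consecutive jobs start at least
    min_gap_s seconds apart so two CUDA jobs never initialize together.
    """
    plan = []
    for i, name in enumerate(job_names):
        device = i % num_gpus  # round-robin device assignment
        env = dict(os.environ)
        env["CUDA_VISIBLE_DEVICES"] = str(device)  # job sees only this GPU
        plan.append({
            "job": name,
            "device": device,
            "start_s": start_s + i * min_gap_s,  # staggered start time
            "env": env,
        })
    return plan


if __name__ == "__main__":
    for entry in plan_launches(["wu_a", "wu_b"], num_gpus=2):
        print(entry["job"], "-> GPU", entry["device"], "at t =", entry["start_s"], "s")
```

The gap length is a guess; the point is only that the two jobs are pinned to different GPUs and never start at the same moment.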

This is the only WU that is a little different from the others:

http://boinc.thesonntags.com/collatz/result.php?resultid=423202
<core_client_version>6.6.36</core_client_version>
<![CDATA[
<message>
Incorrect function. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
Beginning processing...
Collatz CUDA v1.10 (GPU Optimized Application)
worker: trying boinc_get_init_data()...
Looking for checkpoint file...
No checkpoint file found. Starting at beginning.
Success in SetCUDABlockingSync for device 1
Beginning processing...
Collatz CUDA v1.10 (GPU Optimized Application)
worker: trying boinc_get_init_data()...
Looking for checkpoint file...
Checkpoint file found. Resuming at 2361183887666881210728

Success in SetCUDABlockingSync for device 0
CUDA Error: the launch timed out and was terminated
CUDA Kernel failed
called boinc_finish

</stderr_txt>
]]>
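The stderr above actually contains two runs of the same task: the first starts on device 1 with no checkpoint, and the second resumes from a checkpoint on device 0, where the kernel times out. A small parser makes that pattern easy to spot across many results. This is a sketch that assumes the stderr format shown in these posts; `parse_stderr` is a hypothetical helper, not part of BOINC or the Collatz application. As far as I know, "the launch timed out and was terminated" is the CUDA message for a kernel killed by the display driver's watchdog, which would be consistent with a display driver reset.

```python
import re


def parse_stderr(text):
    """Split a Collatz CUDA stderr_txt into runs and summarize each run.

    Assumes every run begins with "Beginning processing..." (as in the logs above).
    """
    runs = []
    for chunk in text.split("Beginning processing...")[1:]:
        device = re.search(r"SetCUDABlockingSync for device (\d+)", chunk)
        resume = re.search(r"Resuming at (\d+)", chunk)
        errors = re.findall(r"CUDA Error: (.+)", chunk)  # rest of the line
        runs.append({
            "device": int(device.group(1)) if device else None,
            "resumed_at": int(resume.group(1)) if resume else None,
            "errors": [e.strip() for e in errors],
        })
    return runs


SAMPLE = """Beginning processing...
Collatz CUDA v1.10 (GPU Optimized Application)
No checkpoint file found. Starting at beginning.
Success in SetCUDABlockingSync for device 1
Beginning processing...
Collatz CUDA v1.10 (GPU Optimized Application)
Checkpoint file found. Resuming at 2361183887666881210728
Success in SetCUDABlockingSync for device 0
CUDA Error: the launch timed out and was terminated
CUDA Kernel failed
"""

if __name__ == "__main__":
    for run in parse_stderr(SAMPLE):
        print(run)
```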
4) Message boards : Number crunching : CUDA v1.10 crash (Message 589)
Posted 3081 days ago by ConflictingEmotions
I still get a few of these. It looks like a race condition, because I have two GPUs and the reporting times are usually about 1 second apart.
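One way to test the race-condition theory is to check how often failures on *different* GPUs land within a second or two of each other. The sketch below assumes you have already collected failure times (in seconds) and device numbers; `near_simultaneous` and the 2-second window are illustrative choices, not anything the project provides. Many such pairs would support the theory but would not prove it.

```python
from itertools import combinations


def near_simultaneous(events, window_s=2.0):
    """Return pairs of failures on different GPUs within window_s seconds.

    events: list of (timestamp_seconds, device_id) tuples.
    """
    pairs = []
    for (t1, d1), (t2, d2) in combinations(sorted(events), 2):
        if d1 != d2 and abs(t2 - t1) <= window_s:
            pairs.append(((t1, d1), (t2, d2)))
    return pairs


if __name__ == "__main__":
    # Hypothetical failure times, not taken from real results.
    failures = [(100.0, 0), (100.9, 1), (500.0, 0), (900.0, 1)]
    print(near_simultaneous(failures))
```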
5) Message boards : Number crunching : CUDA v1.10 crash (Message 437)
Posted 3088 days ago by ConflictingEmotions
I have gotten 5 WUs that failed with the same error. Two of them also failed for my wingman:
163626
162988

This one, however, worked for my wingman:
155832

These appear to crash the display driver on my 64-bit Vista system (driver version 190.38), although the display driver appears to restart afterwards.

<core_client_version>6.6.36</core_client_version>
<![CDATA[
<message>
Incorrect function. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
Beginning processing...
Collatz CUDA v1.10 (GPU Optimized Application)
worker: trying boinc_get_init_data()...
Looking for checkpoint file...
No checkpoint file found. Starting at beginning.
Success in SetCUDABlockingSync for device 0
CUDA Error: the launch timed out and was terminated
CUDA Kernel failed
called boinc_finish

</stderr_txt>
]]>

Copyright © 2018 Jon Sonntag; All rights reserved.