CUDA v1.10 crash

Message boards : Number crunching : CUDA v1.10 crash

ConflictingEmotions
Joined: 5 Aug 09
Posts: 5
Credit: 21,236,620
RAC: 0
Message 437 - Posted: 7 Aug 2009, 21:40:28 UTC

I have gotten 5 WUs that fail with the same error. Two also failed for my wingman:
163626
162988

Although this one worked for my wingman:
155832

These appear to crash the display driver on my 64-bit Vista system (driver version 190.38), although the display driver then appears to restart.


<core_client_version>6.6.36</core_client_version>
<![CDATA[
<message>
Incorrect function. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
Beginning processing...
Collatz CUDA v1.10 (GPU Optimized Application)
worker: trying boinc_get_init_data()...
Looking for checkpoint file...
No checkpoint file found. Starting at beginning.
Success in SetCUDABlockingSync for device 0
CUDA Error: the launch timed out and was terminated
CUDA Kernel failed
called boinc_finish

</stderr_txt>
]]>

ConflictingEmotions
Joined: 5 Aug 09
Posts: 5
Credit: 21,236,620
RAC: 0
Message 589 - Posted: 15 Aug 2009, 2:07:20 UTC - in response to Message 437.

I still get a few of these. It seems like a race condition because I have two GPUs and the reporting time is usually about 1 sec apart.

Slicker
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 11 Jun 09
Posts: 2525
Credit: 740,580,099
RAC: 2
Message 596 - Posted: 15 Aug 2009, 15:25:42 UTC - in response to Message 589.

I still get a few of these. It seems like a race condition because I have two GPUs and the reporting time is usually about 1 sec apart.


"Success in SetCUDABlockingSync for device 0"
Just curious, does it ever show anything other than device 0? e.g. Does one show device 0 and the other device 1?
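One way to check the device question across several results is to scan each result's stderr text for the device index and any CUDA error line. This is only a minimal sketch: the two patterns are copied from the stderr output quoted in this thread, and `summarize_stderr` is a hypothetical helper, not part of BOINC or the Collatz app.

```python
import re

# Patterns taken from the stderr text quoted in this thread.
DEVICE_RE = re.compile(r"Success in SetCUDABlockingSync for device (\d+)")
ERROR_RE = re.compile(r"CUDA Error: (.+)")


def summarize_stderr(text):
    """Return (device indices, CUDA error messages) found in one result's stderr.

    Hypothetical helper for illustration only.
    """
    devices = [int(index) for index in DEVICE_RE.findall(text)]
    errors = [message.strip() for message in ERROR_RE.findall(text)]
    return devices, errors


# Sample copied from the first failed result posted above.
sample = """Beginning processing...
Collatz CUDA v1.10 (GPU Optimized Application)
worker: trying boinc_get_init_data()...
Looking for checkpoint file...
No checkpoint file found. Starting at beginning.
Success in SetCUDABlockingSync for device 0
CUDA Error: the launch timed out and was terminated
CUDA Kernel failed
called boinc_finish
"""

devices, errors = summarize_stderr(sample)
print("devices:", devices)
print("errors:", errors)
```

Running this over the stderr of each failed result would show whether a task ever reports a device other than 0, and whether it started on one device and resumed on another.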

ConflictingEmotions
Joined: 5 Aug 09
Posts: 5
Credit: 21,236,620
RAC: 0
Message 617 - Posted: 17 Aug 2009, 14:39:23 UTC - in response to Message 596.

My system has a GTX 295, so it may also be due to the 190 driver series: "Added support to make all GPUs within an SLI group available for CUDA applications to use." That could be a source of the driver crash because I originally did not change how the GPUs are handled.

So what I have done is ensure that there is a time difference between jobs and also make sure that each GPU is treated separately. So far there have been no crashes.

This is the only WU that is a little different from the others:

http://boinc.thesonntags.com/collatz/result.php?resultid=423202
<core_client_version>6.6.36</core_client_version>
<![CDATA[
<message>
Incorrect function. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
Beginning processing...
Collatz CUDA v1.10 (GPU Optimized Application)
worker: trying boinc_get_init_data()...
Looking for checkpoint file...
No checkpoint file found. Starting at beginning.
Success in SetCUDABlockingSync for device 1
Beginning processing...
Collatz CUDA v1.10 (GPU Optimized Application)
worker: trying boinc_get_init_data()...
Looking for checkpoint file...
Checkpoint file found. Resuming at 2361183887666881210728

Success in SetCUDABlockingSync for device 0
CUDA Error: the launch timed out and was terminated
CUDA Kernel failed
called boinc_finish

</stderr_txt>
]]>

[c@b]galle
Joined: 7 Aug 09
Posts: 2
Credit: 90,091,309
RAC: 0
Message 731 - Posted: 27 Aug 2009, 11:06:20 UTC

I don't know what's wrong. I'm crunching with a GTX 295, and when one of the WUs ends, the other shows an error half of the time (the other half it keeps working properly)... The screen goes black and then a Windows 7 message appears saying that there was some kind of problem with the nvidia drivers or whatever...

The system is an i7 920 OC'd to 3 GHz and the GTX 295.

I'd like to ask you about the possibility of nvidia and ATI cards working together ¬¬ I read it's possible under XP but not with Vista... any idea about Win7?

JockMacMad TSBT
Joined: 29 Jul 09
Posts: 12
Credit: 5,954,782
RAC: 0
Message 803 - Posted: 31 Aug 2009, 14:23:48 UTC

It works under 64-bit Windows 7 for sure; I have a machine doing so.

NekdDrgn
Joined: 28 Aug 09
Posts: 4
Credit: 10,790,546
RAC: 0
Message 1003 - Posted: 10 Sep 2009, 9:46:11 UTC

I get this all the time. I am running GTX260 core 216's in SLI under Windows 7 with 190.38 drivers (crashes happened with 190.62 also). I cannot run Collatz at all, and I don't want to disable SLI as it is a pain to go back and forth when other projects (seti, gpugrid) work fine with SLI enabled.

Any chance of this getting looked into? If not, I'll just not do this project with the machine having the issue.

Thanks,

-a/j

[B^S] DonaldXP
Joined: 24 Jul 09
Posts: 2
Credit: 10,753,057
RAC: 0
Message 1015 - Posted: 11 Sep 2009, 8:02:29 UTC
Last modified: 11 Sep 2009, 8:02:55 UTC

Hmm, I got error WUs as well, but I don't have an SLI setup,
just a 9800 GTX+...

<core_client_version>6.10.4</core_client_version>
<![CDATA[
<message>
- exit code -108 (0xffffff94)
</message>
<stderr_txt>
Beginning processing...
Collatz CUDA v1.10 (GPU Optimized Application)
worker: trying boinc_get_init_data()...
Error opening workunit
called boinc_finish

</stderr_txt>
]]>

Atomic Cow
Joined: 4 Sep 09
Posts: 5
Credit: 79,025
RAC: 0
Message 1023 - Posted: 11 Sep 2009, 14:05:25 UTC

I've got a lot of error results in the last week:
http://boinc.thesonntags.com/collatz/results.php?userid=1736&offset=0&show_names=0&state=5

Windows 7 RC(x64) build 7100
i7-920 2.67ghz
12gb RAM
2x 275gtx/896mb in sli
driver version 190.62

Seems to have caused a restart as well last night.

PeteS
Joined: 17 Jul 09
Posts: 20
Credit: 54,228,605
RAC: 0
Message 1042 - Posted: 13 Sep 2009, 18:28:05 UTC - in response to Message 1023.

Same problems, can't run this on my SLI 275GTX rig. After the driver crash I drop to 2D clocks and can't change back to full speed without a restart.

NekdDrgn
Joined: 28 Aug 09
Posts: 4
Credit: 10,790,546
RAC: 0
Message 1083 - Posted: 15 Sep 2009, 19:29:54 UTC

So... any thoughts on this, Admin? Are we just hitting some random bug with the 190 drivers and this app?

Bymark
Joined: 28 Jul 09
Posts: 78
Credit: 585,167,010
RAC: 1,095,405
Message 1084 - Posted: 15 Sep 2009, 19:47:19 UTC - in response to Message 1042.
Last modified: 15 Sep 2009, 19:59:36 UTC

Same problems, can't run this on my SLI 275GTX rig. After the driver crash I drop to 2D clocks and can't change back to full speed without a restart.


Really? This is why I run my 260 here: it failed for the same reason on gpugrid!
My one 260 is running fine here, the one that crashed on gpugrid. As I type, that 260 is here and 2 are on gpugrid (plus 5 4850s on ATI CAL on mv 0,20 and some slower CUDA cards here).

Slicker
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 11 Jun 09
Posts: 2525
Credit: 740,580,099
RAC: 2
Message 1085 - Posted: 15 Sep 2009, 20:13:57 UTC - in response to Message 1042.

Same problems, can't run this on my SLI 275GTX rig. After the driver crash I drop to 2D clocks and can't change back to full speed without a restart.


The 1.10 app is compiled with CUDA v2.1. I am fairly certain that in order to crunch with SLI enabled, CUDA v2.3 is required. Have you tried disabling SLI to see if that makes a difference?

FYI, v2.0, which is currently being tested, uses CUDA v2.3 and I believe will require nVidia driver v190.38 or higher.
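For anyone wanting to compare an installed driver against that minimum, here is a minimal sketch. It assumes the driver version is available as a dotted string such as "190.38" or "190.62", the form quoted in this thread. `parse_driver_version` and `meets_minimum_driver` are hypothetical helpers, not part of BOINC or the Collatz app, and "185.85" is used only as an illustrative older version string.

```python
def parse_driver_version(version):
    """Turn a dotted version string like '190.38' into a tuple of ints."""
    return tuple(int(part) for part in version.strip().split("."))


def meets_minimum_driver(installed, minimum="190.38"):
    """Return True if the installed driver is at least the minimum version.

    The default minimum is the v190.38 figure mentioned above for the v2.0 app;
    treat it as an assumption until the app is released.
    """
    return parse_driver_version(installed) >= parse_driver_version(minimum)


for candidate in ("190.38", "190.62", "185.85"):
    print(candidate, meets_minimum_driver(candidate))
```

Comparing tuples of integers avoids the pitfalls of comparing the strings directly or converting them to floats.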

Slicker
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 11 Jun 09
Posts: 2525
Credit: 740,580,099
RAC: 2
Message 1114 - Posted: 16 Sep 2009, 23:29:04 UTC

I don't run Vista or Win7 or have any boxes with SLI. So... any volunteers willing to help test the v2.0 CUDA app? If so, please PM me with the specifics of the OS, cpus, and CUDA card.




Copyright © 2018 Jon Sonntag; All rights reserved.