CUDA WU's errors


Message boards : Windows : CUDA WU's errors

dunx
Joined: 23 Jul 11
Posts: 3
Credit: 157,570,516
RAC: 0
Message 12616 - Posted: 25 Jul 2011, 16:23:08 UTC
Last modified: 25 Jul 2011, 16:24:06 UTC

HELP

As a "newbie" to this project I've little background so please allow me some slack...

I have 3 x GTX 460s in my main PC; this is the output of a typical errored-out WU.

Task 89368315

Name: collatz_2370710386473519786344_824633720832_3
Workunit: 38765543
Created: 23 Jul 2011 16:38:49 UTC
Sent: 23 Jul 2011 16:44:19 UTC
Received: 24 Jul 2011 23:24:13 UTC
Server state: Over
Outcome: Computation error
Client state: Compute error
Exit status: -1073741819 (0xffffffffc0000005)
Computer ID: 70649
Report deadline: 30 Jul 2011 16:44:19 UTC
Run time: 20,990.44
CPU time: 3.20
Validate state: Invalid
Credit: 0.00
Application version: collatz v2.05 (cuda31)
Stderr output:

<core_client_version>6.12.33</core_client_version>
<![CDATA[
<message>
- exit code -1073741819 (0xc0000005)
</message>
<stderr_txt>
Collatz v2.05 for CUDA 3.1
Start 2370710386473519786344
Checking 824,633,720,832 numbers
Calculate 65,536 items per loop
Loop 32 times per reduction
Sleep 1 ms while waiting
Device GeForce 8800 GTS
Memory 262 MB (196/262 free/total MB)


Unhandled Exception Detected...

- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x006AFA4F read attempt to address 0x030548E4

Engaging BOINC Windows Runtime Debugger...



********************


BOINC Windows Runtime Debugger Version 6.11.0


Dump Timestamp : 07/25/11 00:03:19
Install Directory :
Data Directory : C:\ProgramData\BOINC
Project Symstore :
LoadLibraryA( C:\ProgramData\BOINC\dbghelp.dll ): GetLastError = 126
Loaded Library : dbghelp.dll
LoadLibraryA( C:\ProgramData\BOINC\symsrv.dll ): GetLastError = 126
LoadLibraryA( symsrv.dll ): GetLastError = 126
LoadLibraryA( C:\ProgramData\BOINC\srcsrv.dll ): GetLastError = 126
LoadLibraryA( srcsrv.dll ): GetLastError = 126
LoadLibraryA( C:\ProgramData\BOINC\version.dll ): GetLastError = 126
Loaded Library : version.dll
Debugger Engine : 4.0.5.0
Symbol Search Path: C:\ProgramData\BOINC\slots\2;C:\ProgramData\BOINC\projects\boinc.thesonntags.com_collatz


ModLoad: 00400000 00072000 C:\ProgramData\BOINC\projects\boinc.thesonntags.com_collatz\collatz_2.05_windows_x86_64__cuda31.exe (-nosymbols- Symbols Loaded)
Linked PDB Filename :

ModLoad: 77a30000 00180000 C:\Windows\SysWOW64\ntdll.dll (6.1.7600.16385) (-exported- Symbols Loaded)
Linked PDB Filename : wntdll.pdb
File Version : 6.1.7600.16385 (win7_rtm.090713-1255)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version : 6.1.7600.16385

ModLoad: 75a90000 00100000 C:\Windows\syswow64\kernel32.dll (6.1.7600.16385) (-exported- Symbols Loaded)
Linked PDB Filename : wkernel32.pdb
File Version : 6.1.7600.16385 (win7_rtm.090713-1255)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version : 6.1.7600.16385

ModLoad: 77370000 00046000 C:\Windows\syswow64\KERNELBASE.dll (6.1.7600.16385) (-exported- Symbols Loaded)
Linked PDB Filename : wkernelbase.pdb
File Version : 6.1.7600.16385 (win7_rtm.090713-1255)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version : 6.1.7600.16385

ModLoad: 10000000 00040000 C:\ProgramData\BOINC\projects\boinc.thesonntags.com_collatz\cudart32_31_9.dll (6.14.11.3010) (-exported- Symbols Loaded)
Linked PDB Filename :
File Version : 6,14,11,3010
Company Name : NVIDIA Corporation
Product Name : NVIDIA CUDA 3.1.9 Runtime
Product Version : 6,14,11,3010

ModLoad: 00660000 00552000 C:\Windows\system32\nvcuda.dll (8.17.12.7550) (-exported- Symbols Loaded)
Linked PDB Filename :
File Version : 8.17.12.7550
Company Name : NVIDIA Corporation
Product Name : NVIDIA CUDA 4.0.1 driver
Product Version : 8.17.12.7550

ModLoad: 771d0000 00100000 C:\Windows\syswow64\USER32.dll (6.1.7600.16385) (-exported- Symbols Loaded)
Linked PDB Filename : wuser32.pdb
File Version : 6.1.7600.16385 (win7_rtm.090713-1255)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version : 6.1.7600.16385

ModLoad: 76200000 00090000 C:\Windows\syswow64\GDI32.dll (6.1.7600.16385) (-exported- Symbols Loaded)
Linked PDB Filename : wgdi32.pdb
File Version : 6.1.7600.16385 (win7_rtm.090713-1255)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version : 6.1.7600.16385

ModLoad: 772d0000 0000a000 C:\Windows\syswow64\LPK.dll (6.1.7600.16385) (-exported- Symbols Loaded)
Linked PDB Filename : wlpk.pdb
File Version : 6.1.7600.16385 (win7_rtm.090713-1255)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version : 6.1.7600.16385

ModLoad: 76020000 0009d000 C:\Windows\syswow64\USP10.dll (1.626.7600.16385) (-exported- Symbols Loaded)
Linked PDB Filename : usp10.pdb
File Version : 1.0626.7600.16385 (win7_rtm.090713-1255)
Company Name : Microsoft Corporation
Product Name : Microsoft(R) Uniscribe Unicode script processor
Product Version : 1.0626.7600.16385

ModLoad: 773c0000 000ac000 C:\Windows\syswow64\msvcrt.dll (7.0.7600.16385) (-exported- Symbols Loaded)
Linked PDB Filename : msvcrt.pdb
File Version : 7.0.7600.16385 (win7_rtm.090713-1255)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version : 7.0.7600.16385

ModLoad: 75ba0000 000a0000 C:\Windows\syswow64\ADVAPI32.dll (6.1.7600.16385) (-exported- Symbols Loaded)
Linked PDB Filename : advapi32.pdb
File Version : 6.1.7600.16385 (win7_rtm.090713-1255)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version : 6.1.7600.16385

ModLoad: 75800000 00019000 C:\Windows\SysWOW64\sechost.dll (6.1.7600.16385) (-exported- Symbols Loaded)
Linked PDB Filename : sechost.pdb
File Version : 6.1.7600.16385 (win7_rtm.090713-1255)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version : 6.1.7600.16385

ModLoad: 76110000 000f0000 C:\Windows\syswow64\RPCRT4.dll (6.1.7600.16385) (-exported- Symbols Loaded)
Linked PDB Filename : wrpcrt4.pdb
File Version : 6.1.7600.16385 (win7_rtm.090713-1255)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version : 6.1.7600.16385

ModLoad: 755a0000 00060000 C:\Windows\syswow64\SspiCli.dll (6.1.7600.16385) (-exported- Symbols Loaded)
Linked PDB Filename : wsspicli.pdb
File Version : 6.1.7600.16385 (win7_rtm.090713-1255)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version : 6.1.7600.16385

ModLoad: 75590000 0000c000 C:\Windows\syswow64\CRYPTBASE.dll (6.1.7600.16385) (-exported- Symbols Loaded)
Linked PDB Filename : cryptbase.pdb
File Version : 6.1.7600.16385 (win7_rtm.090713-1255)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version : 6.1.7600.16385

ModLoad: 758f0000 0019d000 C:\Windows\syswow64\SETUPAPI.dll (6.1.7600.16385) (-exported- Symbols Loaded)
Linked PDB Filename : setupapi.pdb
File Version : 6.1.7600.16385 (win7_rtm.090713-1255)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version : 6.1.7600.16385

ModLoad: 76550000 00027000 C:\Windows\syswow64\CFGMGR32.dll (6.1.7600.16385) (-exported- Symbols Loaded)
Linked PDB Filename : cfgmgr32.pdb
File Version : 6.1.7600.16385 (win7_rtm.090713-1255)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version : 6.1.7600.16385

Get Product Name Failed.
ModLoad: 772e0000 0008f000 C:\Windows\syswow64\OLEAUT32.dll (6.1.7600.16385) (-exported- Symbols Loaded)
Linked PDB Filename : oleaut32.pdb
File Version : 6.1.7600.16385
Company Name : Microsoft Corporation
Product Name :
Product Version : 6.1.7600.16385

ModLoad: 77470000 0015c000 C:\Windows\syswow64\ole32.dll (6.1.7600.16385) (-exported- Symbols Loaded)
Linked PDB Filename : ole32.pdb
File Version : 6.1.7600.16385 (win7_rtm.090713-1255)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version : 6.1.7600.16385

ModLoad: 75f70000 00012000 C:\Windows\syswow64\DEVOBJ.dll (6.1.7600.16385) (-exported- Symbols Loaded)
Linked PDB Filename : devobj.pdb
File Version : 6.1.7600.16385 (win7_rtm.090713-1255)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version : 6.1.7600.16385

ModLoad: 75d60000 00060000 C:\Windows\system32\IMM32.DLL (6.1.7600.16385) (-exported- Symbols Loaded)
Linked PDB Filename : wimm32.pdb
File Version : 6.1.7600.16385 (win7_rtm.090713-1255)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version : 6.1.7600.16385

ModLoad: 75730000 000cc000 C:\Windows\syswow64\MSCTF.dll (6.1.7600.16385) (-exported- Symbols Loaded)
Linked PDB Filename : msctf.pdb
File Version : 6.1.7600.16385 (win7_rtm.090713-1255)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version : 6.1.7600.16385

ModLoad: 6e690000 0025d000 C:\Windows\system32\nvapi.dll (8.17.12.7550) (-exported- Symbols Loaded)
Linked PDB Filename : c:\dvs\p4\build\sw\rel\gpu_drv\r275\r275_37\drivers\nvapi\_out\win7_wow64_release\nvapi.pdb
File Version : 8.17.12.7550
Company Name : NVIDIA Corporation
Product Name : NVIDIA Windows drivers
Product Version : 8.17.12.7550

ModLoad: 775d0000 00057000 C:\Windows\syswow64\SHLWAPI.dll (6.1.7600.16385) (-exported- Symbols Loaded)
Linked PDB Filename : shlwapi.pdb
File Version : 6.1.7600.16385 (win7_rtm.090713-1255)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version : 6.1.7600.16385

ModLoad: 76580000 00c49000 C:\Windows\syswow64\SHELL32.dll (6.1.7600.16385) (-exported- Symbols Loaded)
Linked PDB Filename : shell32.pdb
File Version : 6.1.7600.16385 (win7_rtm.090713-1255)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version : 6.1.7600.16385

ModLoad: 712a0000 00009000 C:\Windows\system32\VERSION.dll (6.1.7600.16385) (-exported- Symbols Loaded)
Linked PDB Filename : version.pdb
File Version : 6.1.7600.16385 (win7_rtm.090713-1255)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version : 6.1.7600.16385

ModLoad: 76310000 0002d000 C:\Windows\syswow64\WINTRUST.dll (6.1.7600.16385) (-exported- Symbols Loaded)
Linked PDB Filename : wintrust.pdb
File Version : 6.1.7600.16385 (win7_rtm.090713-1255)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version : 6.1.7600.16385

ModLoad: 75c40000 0011c000 C:\Windows\syswow64\CRYPT32.dll (6.1.7600.16385) (-exported- Symbols Loaded)
Linked PDB Filename : crypt32.pdb
File Version : 6.1.7600.16385 (win7_rtm.090713-1255)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version : 6.1.7600.16385

ModLoad: 75b90000 0000c000 C:\Windows\syswow64\MSASN1.dll (6.1.7600.16385) (-exported- Symbols Loaded)
Linked PDB Filename : msasn1.pdb
File Version : 6.1.7600.16385 (win7_rtm.090713-1255)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version : 6.1.7600.16385

ModLoad: 71170000 000eb000 C:\Windows\system32\dbghelp.dll (6.1.7600.16385) (-exported- Symbols Loaded)
Linked PDB Filename : dbghelp.pdb
File Version : 6.1.7600.16385 (win7_rtm.090713-1255)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version : 6.1.7600.16385



*** Dump of the Process Statistics: ***

- I/O Operations Counters -
Read: 0, Write: 0, Other 0

- I/O Transfers Counters -
Read: 0, Write: 0, Other 0

- Paged Pool Usage -
QuotaPagedPoolUsage: 0, QuotaPeakPagedPoolUsage: 0
QuotaNonPagedPoolUsage: 0, QuotaPeakNonPagedPoolUsage: 0

- Virtual Memory Usage -
VirtualSize: 0, PeakVirtualSize: 0

- Pagefile Usage -
PagefileUsage: 0, PeakPagefileUsage: 0

- Working Set Size -
WorkingSetSize: 0, PeakWorkingSetSize: 0, PageFaultCount: 0

*** Dump of thread ID 1520 (state: Initialized): ***

- Information -
Status: Base Priority: Normal, Priority: Normal, , Kernel Time: 0.000000, User Time: 0.000000, Wait Time: 0.000000

- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x006AFA4F read attempt to address 0x030548E4

- Registers -
eax=030548e0 ebx=00000000 ecx=0018f6e8 edx=00000000 esi=00000000 edi=030548e0
eip=006afa4f esp=0018f6c8 ebp=00000000
cs=0023 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00010202

- Callstack -
ChildEBP RetAddr Args to Child
00000000 00000000 00000000 00000000 00000000 00000000 nvcuda!cuD3D11CtxCreate+0x0

*** Dump of thread ID 2392 (state: Initialized): ***

- Information -
Status: Base Priority: Normal, Priority: Normal, , Kernel Time: 0.000000, User Time: 0.000000, Wait Time: 0.000000

- Registers -
eax=00000000 ebx=00000000 ecx=00000000 edx=00000000 esi=0295ff48 edi=00000000
eip=77a4fd31 esp=0295ff04 ebp=0295ff6c
cs=0023 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00000202

- Callstack -
ChildEBP RetAddr Args to Child
0295ff6c 77383520 00000064 00000000 0295ff88 00449b54 ntdll!ZwDelayExecution+0x0
0295ff7c 00449b54 00000064 0295ff94 75aa3677 00000000 KERNELBASE!Sleep+0x0
0295ff88 75aa3677 00000000 0295ffd4 77a69d72 00000000 collatz_2.05_windows_x86_64__cu!+0x0
0295ff94 77a69d72 00000000 7566a414 00000000 00000000 kernel32!BaseThreadInitThunk+0x0
0295ffd4 77a69d45 00449b40 00000000 00000000 00000000 ntdll!RtlInitializeExceptionChain+0x0
0295ffec 00000000 00449b40 00000000 00000000 00000000 ntdll!RtlInitializeExceptionChain+0x0


*** Debug Message Dump ****


*** Foreground Window Data ***
Window Name :
Window Class :
Window Process ID: 0
Window Thread ID : 0

Exiting...

</stderr_txt>
]]>

Any help from those wise volunteers out there?

TIA.

dunx
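The two exit-status values in the task report are the same 32-bit code read two ways: -1073741819 is the signed interpretation of 0xC0000005, which Windows defines as STATUS_ACCESS_VIOLATION (matching the "Access Violation (0xc0000005)" line in the dump), and 0xffffffffc0000005 is that negative value sign-extended to 64 bits. Likewise, the repeated "GetLastError = 126" lines are ERROR_MOD_NOT_FOUND: the debugger did not find those DLLs in the BOINC data directory (and, per the log, then loaded the system copies of dbghelp.dll and version.dll). A minimal Python sketch of the conversion follows; the helper names are hypothetical, and the lookup tables cover only the codes that appear in this thread.

```python
# Hypothetical helpers for reading exit statuses from a BOINC task report.
# Only the codes seen in this thread are listed; these are not complete tables.
NTSTATUS_NAMES = {0xC0000005: "STATUS_ACCESS_VIOLATION"}
WIN32_ERROR_NAMES = {126: "ERROR_MOD_NOT_FOUND"}


def to_unsigned32(code: int) -> int:
    """Reinterpret a signed exit status (e.g. -1073741819) as unsigned 32-bit."""
    return code & 0xFFFFFFFF


def to_signed32(code: int) -> int:
    """Reinterpret a 32-bit value (e.g. 0xC0000005) as a signed integer."""
    code &= 0xFFFFFFFF
    return code - (1 << 32) if code & 0x80000000 else code


def sign_extend64(code: int) -> int:
    """Return the 64-bit two's-complement bit pattern of a signed 32-bit code."""
    return to_signed32(code) & 0xFFFFFFFFFFFFFFFF


def describe_exit_status(code: int) -> str:
    unsigned = to_unsigned32(code)
    name = NTSTATUS_NAMES.get(unsigned, "unknown status")
    return f"{to_signed32(code)} = 0x{unsigned:08X} ({name})"


if __name__ == "__main__":
    print(describe_exit_status(-1073741819))  # -1073741819 = 0xC0000005 (STATUS_ACCESS_VIOLATION)
    print(hex(sign_extend64(-1073741819)))    # 0xffffffffc0000005
    print(WIN32_ERROR_NAMES.get(126))         # ERROR_MOD_NOT_FOUND
```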

zombie67 [MM]
Volunteer tester
Joined: 3 Jul 09
Posts: 156
Credit: 612,751,063
RAC: 276
Message 12696 - Posted: 30 Aug 2011, 12:39:39 UTC

Did this ever get resolved? I think I am having the exact same problem. I am getting 100% failures with this machine:

http://boinc.thesonntags.com/collatz/show_host_detail.php?hostid=73675

It is an SB Win7 64-bit machine with dual GTX 590s (4 GPUs), driver version 275.50. This machine has no problems with PG, distrrtgen, or gpugrid.
____________
Dublin, California
Team: SETI.USA

Odicin
Joined: 20 Feb 11
Posts: 14
Credit: 538,690,376
RAC: 0
Message 12697 - Posted: 31 Aug 2011, 15:51:20 UTC

I have the same problem on one of my machines: http://boinc.thesonntags.com/collatz/forum_thread.php?id=779, and I'm not the only one: http://boinc.thesonntags.com/collatz/forum_thread.php?id=780.

Regards Odi
____________

Umlüx@Springerreisen
Joined: 5 Aug 10
Posts: 1
Credit: 43,221,290
RAC: 0
Message 12702 - Posted: 5 Sep 2011, 8:53:57 UTC

Me too!
http://boinc.thesonntags.com/collatz/show_host_detail.php?hostid=34379

Mini Collatz WUs are working, but the rest have a 100% error rate.

mikey
Joined: 11 Aug 09
Posts: 3242
Credit: 1,693,887,027
RAC: 5,445,511
Message 12704 - Posted: 5 Sep 2011, 11:54:09 UTC - in response to Message 12702.

> Me too!
> http://boinc.thesonntags.com/collatz/show_host_detail.php?hostid=34379
>
> Mini Collatz WUs are working, but the rest have a 100% error rate.


You can go into your account and choose to get only the mini units or only the full-length units. If you set up the minis on, say, the Default venue and the full-length units on Home, you can manage all your PCs the way you want, and maybe crunch more with fewer errors.

icook2
Joined: 19 Sep 11
Posts: 1
Credit: 115,223
RAC: 0
Message 12757 - Posted: 21 Sep 2011, 18:05:08 UTC - in response to Message 12704.

I have the same problem with a CUDA-capable Quadro FX 880M GPU. All Collatz GPU tasks fail at the end of the task. Someone needs to fix this; it's a shame to let these GPUs go to waste, since in my case they crunch these problems 4x faster than the CPU. Until I hear this issue is resolved, I'm going to devote my GPU to other BOINC tasks that work.

Slicker
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 11 Jun 09
Posts: 2525
Credit: 740,580,099
RAC: 1
Message 12764 - Posted: 23 Sep 2011, 5:15:41 UTC

It seems that the CUDA 2.3 app is much more stable than the one compiled under CUDA 3.1. Maybe I should just remove the 3.1 apps so everyone gets the CUDA 2.3 app by default. Thoughts?

mikey
Joined: 11 Aug 09
Posts: 3242
Credit: 1,693,887,027
RAC: 5,445,511
Message 12766 - Posted: 23 Sep 2011, 12:38:53 UTC - in response to Message 12764.

> It seems that the CUDA 2.3 app is much more stable than the one compiled under CUDA 3.1. Maybe I should just remove the 3.1 apps so everyone gets the CUDA 2.3 app by default. Thoughts?


What is the downside to going back? If the WUs run in similar times, do similar amounts of work, and nothing is required of the user, then go back. Maybe pick a few people to try it first to make sure it still works okay. Positive results, to me, are much better than non-positive ones.

Slicker
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 11 Jun 09
Posts: 2525
Credit: 740,580,099
RAC: 1
Message 12783 - Posted: 25 Sep 2011, 17:21:31 UTC - in response to Message 12766.

>> It seems that the CUDA 2.3 app is much more stable than the one compiled under CUDA 3.1. Maybe I should just remove the 3.1 apps so everyone gets the CUDA 2.3 app by default. Thoughts?
>
> What is the downside to going back? If the WUs run in similar times, do similar amounts of work, and nothing is required of the user, then go back. Maybe pick a few people to try it first to make sure it still works okay. Positive results, to me, are much better than non-positive ones.


The code is the same. The only difference is the CUDA version used to compile it. If the CUDA 3.1 driver brought speed improvements, you might see a very slight increase in performance, but if the 3.1 app doesn't work at all, using the CUDA 2.3 version is a 100% increase in performance.

When I said "remove the 3.1 app", I meant as the standard app. I'd still leave it out there as an opt app, since it seems to work for some.

uBronan
Joined: 30 Aug 09
Posts: 110
Credit: 29,134,968
RAC: 0
Message 12790 - Posted: 27 Sep 2011, 12:07:51 UTC
Last modified: 27 Sep 2011, 12:19:28 UTC

I think some of the errors can be explained by mixing different cards or, in most cases, by unstable drivers.
I read the first post: the poster says he runs 3 x GTX 460, but the error log is for an 8800 GTS.
I remember a friend of mine who ran 2 x 580 cards but kept his old card to do the PhysX work, because he wanted to get maximum performance from his new cards.
I wonder if PhysX can cause problems as well; on my CUDA box I removed PhysX, since I don't see any need for it on a server. xD
To find out what is really going on, more information needs to be gathered about the hardware, which drivers are installed, and whether PhysX is installed.
I think you need the most stable version as the standard app, and the newer one as an add-on for testing via app_info.xml, until it has proven to run almost error-free.
Maybe it's a good idea to include two examples of a good app_info.xml for the different platforms (x86 and x64) to avoid problems, and/or a short help file with options and installation instructions.
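As an illustration of the kind of example app_info.xml suggested above, here is a rough sketch. It assumes BOINC's anonymous-platform layout (`app_info`, `app`, `file_info`, `app_version`, `coproc`, `file_ref`) and takes the app name `collatz`, version 2.05, plan class `cuda31`, and the two file names from the task log in the first post; the `avg_ncpus` value and the helper names are illustrative assumptions, not project-recommended settings. To keep the examples in this thread in one language, it builds the XML with Python's standard `xml.etree.ElementTree` instead of listing it by hand.

```python
import xml.etree.ElementTree as ET

# Assumed values, taken from the task/stderr output in the first post.
APP_NAME = "collatz"
VERSION_NUM = 205  # BOINC encodes version 2.05 as the integer 205
PLAN_CLASS = "cuda31"
EXE = "collatz_2.05_windows_x86_64__cuda31.exe"
RUNTIME_DLL = "cudart32_31_9.dll"


def sub(parent, tag, text=None):
    """Append a child element, optionally with text content."""
    el = ET.SubElement(parent, tag)
    if text is not None:
        el.text = str(text)
    return el


def build_app_info(avg_ncpus=0.05):
    """Build an anonymous-platform app_info tree (a sketch, not an official file)."""
    root = ET.Element("app_info")
    app = sub(root, "app")
    sub(app, "name", APP_NAME)

    # One <file_info> per file the app needs; both are marked executable in this sketch.
    for fname in (EXE, RUNTIME_DLL):
        fi = sub(root, "file_info")
        sub(fi, "name", fname)
        sub(fi, "executable")

    av = sub(root, "app_version")
    sub(av, "app_name", APP_NAME)
    sub(av, "version_num", VERSION_NUM)
    sub(av, "plan_class", PLAN_CLASS)
    sub(av, "avg_ncpus", avg_ncpus)
    sub(av, "max_ncpus", avg_ncpus)
    coproc = sub(av, "coproc")  # one NVIDIA GPU per task
    sub(coproc, "type", "CUDA")
    sub(coproc, "count", 1)

    main_ref = sub(av, "file_ref")  # the executable BOINC launches
    sub(main_ref, "file_name", EXE)
    sub(main_ref, "main_program")
    dll_ref = sub(av, "file_ref")  # supporting runtime DLL
    sub(dll_ref, "file_name", RUNTIME_DLL)
    return root


if __name__ == "__main__":
    root = build_app_info()
    ET.indent(root)  # pretty-printing; requires Python 3.9+
    print(ET.tostring(root, encoding="unicode"))
```

The printed XML would be saved as app_info.xml in the project directory (C:\ProgramData\BOINC\projects\boinc.thesonntags.com_collatz, per the log) next to the two files it names; check every value against the files you actually have before relying on it.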

DaleStan
Send message
Joined: 22 Mar 10
Posts: 2
Credit: 257,264,356
RAC: 4,468
Message 12842 - Posted: 22 Oct 2011, 15:25:26 UTC - in response to Message 12783.

> The only difference is the CUDA version used to compile it. If the CUDA 3.1 driver brought speed improvements, you might see a very slight increase in performance, but if the 3.1 app doesn't work at all, using the CUDA 2.3 version is a 100% increase in performance.

Just chiming in that this is the behaviour I am seeing too.

I just got a new computer and was getting cuda23 WUs that were crunching properly. Then I noticed I had an out-of-date video driver, so I upgraded, and now I'm getting cuda31 WUs that error out after five hours of crunching.

I just banished PhysX to my CPU, so we'll see what happens.

Odicin
Joined: 20 Feb 11
Posts: 14
Credit: 538,690,376
RAC: 0
Message 12852 - Posted: 25 Oct 2011, 6:35:35 UTC

I updated to the latest NVIDIA driver, 285.62, and now everything runs fine on my hosts.

Regards Odi
____________

DaleStan
Joined: 22 Mar 10
Posts: 2
Credit: 257,264,356
RAC: 4,468
Message 12967 - Posted: 19 Nov 2011, 6:11:56 UTC

Likewise. Banishing PhysX (with the previous driver version, which might have been 260?) seemed to fix it; upgrading to 285.62 moved PhysX back to the GPU, but did not reintroduce the failing workunits.

kmanley57
Joined: 1 Apr 12
Posts: 6
Credit: 32,623,691
RAC: 0
Message 13765 - Posted: 12 Apr 2012, 17:46:11 UTC - in response to Message 12967.

I have noticed that ALL my cuda23 WUs seem to stop processing after about an hour. If I suspend the stalled task, BOINC starts another WU; when that new cuda23 WU stops processing too, I come back and resume the hung one, which then picks up again.

Both installed cards do the same thing.

Processor: 4 AuthenticAMD AMD FX(tm)-4100 Quad-Core Processor [Family 21 Model 1 Stepping 2]
Processor: 2.00 MB cache
Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 htt pni ssse3 cx16 sse4_1 sse4_2 syscall nx lm svm sse4a osvw ibs xop skinit wdt lwp fma4 page1gb rdtscp
OS: Microsoft Windows Vista: Ultimate x86 Edition, Service Pack 2, (06.00.6002.00)
Memory: 2.99 GB physical, 6.17 GB virtual
Disk: 76.69 GB total, 15.69 GB free
Local time is UTC -6 hours
NVIDIA GPU 0: GeForce 9800 GT (driver version 296.10, CUDA version 4.20, compute capability 1.1, 1024MB, 976MB available, 504 GFLOPS peak)
NVIDIA GPU 1: GeForce GTS 450 (driver version 296.10, CUDA version 4.20, compute capability 2.1, 2048MB, 1954MB available, 601 GFLOPS peak)
OpenCL: NVIDIA GPU 0 (not used): GeForce 9800 GT (driver version 296.10, device version OpenCL 1.0 CUDA, 1024MB, 976MB available)
OpenCL: NVIDIA GPU 1 (not used): GeForce GTS 450 (driver version 296.10, device version OpenCL 1.1 CUDA, 2048MB, 1954MB available)

Here is a WU that I had to suspend and then restart to get it to complete:

Collatz Conjecture 2.03 collatz (cuda23) collatz_2374834517528311212392_824633720832_1 02:20:44 (00:08:43) 4/12/2012 10:36:12 AM 4/12/2012 10:38:16 AM 0.01C + 1 NVIDIA GPU (d1) Reported: OK * MegaMan

Task 113014382 Workunit 49474699

STDERR report attached to task:

<core_client_version>7.0.25</core_client_version>
<![CDATA[
<stderr_txt>
Running Collatz Conjecture (3x+1) CUDA GPU application v2.01
based on version 1.2 by Gipsel
instructed by BOINC client to use device 1
Reading input file ... done.
Checking 824633720832 numbers starting with 2374834517528311212392
No checkpoint data found.
Running Collatz Conjecture (3x+1) CUDA GPU application v2.01
based on version 1.2 by Gipsel
instructed by BOINC client to use device 0
Reading input file ... done.
Checking 824633720832 numbers starting with 2374834517528311212392
Resuming from checkpoint ... done
needed 1759 steps for 2374834517679537690358
444630872738552 total executed steps for 824633720832 numbers

WU completed.
GPU time: 3771.35 seconds
Elapsed time: 3776.15
called boinc_finish

</stderr_txt>
]]>

I had one WU before I upgraded to the latest version of BOINC, and it also hung/stopped.

Task 112735111 Workunit 49376680

<core_client_version>6.12.34</core_client_version>
<![CDATA[
<stderr_txt>
Running Collatz Conjecture (3x+1) CUDA GPU application v2.01
based on version 1.2 by Gipsel
instructed by BOINC client to use device 0
Reading input file ... done.
Checking 824633720832 numbers starting with 2374796169998708222312
No checkpoint data found.
Running Collatz Conjecture (3x+1) CUDA GPU application v2.01
based on version 1.2 by Gipsel
instructed by BOINC client to use device 1
Reading input file ... done.
Checking 824633720832 numbers starting with 2374796169998708222312
Resuming from checkpoint ... done
needed 1661 steps for 2374796170036590065311
422192759689365 total executed steps for 824633720832 numbers

WU completed.
GPU time: 4408.63 seconds
Elapsed time: 4434.02
called boinc_finish

</stderr_txt>
]]>
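The "needed ... steps for ..." and "... total executed steps for ... numbers" lines in the stderr output above appear to report the start value with the longest trajectory found in the work unit and the total step count over all numbers checked. As a rough illustration only, here is a Python sketch of that bookkeeping over a small range. It assumes the plain 3x+1 convention in which every halving and every 3x+1 counts as one step; the thread does not say which convention the application counts, so these numbers are not expected to match the app's. The function names are hypothetical.

```python
def collatz_steps(n: int) -> int:
    """Steps for n to reach 1, counting each halving and each 3x+1 as one step."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    steps = 0
    while n != 1:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        steps += 1
    return steps


def scan_range(start: int, count: int) -> tuple[int, int, int]:
    """Check `count` consecutive numbers beginning at `start`.

    Returns (number with the most steps, its step count, total steps over
    the range), loosely mirroring the two summary lines in the stderr output.
    """
    best_n, best_steps, total = start, -1, 0
    for n in range(start, start + count):
        s = collatz_steps(n)
        total += s
        if s > best_steps:
            best_n, best_steps = n, s
    return best_n, best_steps, total


if __name__ == "__main__":
    best_n, best_steps, total = scan_range(1, 10)
    print(f"needed {best_steps} steps for {best_n}")        # needed 19 steps for 9
    print(f"{total} total executed steps for 10 numbers")   # 67 total executed steps for 10 numbers
```

Python integers are arbitrary precision, so `collatz_steps` also accepts 22-digit start values like those in the logs, but a pure-Python loop is far too slow for the 824,633,720,832 numbers in one work unit; that is the job the GPU application does.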

Claggy
Joined: 27 Sep 09
Posts: 288
Credit: 14,320,498
RAC: 0
Message 13766 - Posted: 12 Apr 2012, 18:10:24 UTC - in response to Message 13765.

> I have noticed that ALL my cuda23 WUs seem to stop processing after about an hour. [...]
>
> Both installed cards do the same thing.
>
> [...]
> NVIDIA GPU 0: GeForce 9800 GT (driver version 296.10, CUDA version 4.20, compute capability 1.1, 1024MB, 976MB available, 504 GFLOPS peak)
> NVIDIA GPU 1: GeForce GTS 450 (driver version 296.10, CUDA version 4.20, compute capability 2.1, 2048MB, 1954MB available, 601 GFLOPS peak)
> [...]


NVIDIA drivers 295.xx and 296.xx have a bug: when a monitor connected via the DVI (or HDMI) port goes to sleep, the CUDA device becomes unavailable.
Either connect via the VGA port, set the monitor to never go to sleep and physically turn it off, downgrade to 290.53 or earlier drivers, or upgrade to 301.xx or later drivers.

Claggy
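The driver ranges in the post above can be encoded as a quick check against the versions BOINC prints at startup (e.g. "driver version 296.10"). This is a sketch with hypothetical function names; it encodes only the ranges stated in the post, not an official NVIDIA list, so versions the post does not mention (290.54 through 294.xx, and 297.xx through 300.xx) are reported as unknown.

```python
from typing import Optional


def parse_driver_version(text: str) -> tuple[int, int]:
    """Split a driver version such as '296.10' into (296, 10)."""
    major, _, minor = text.strip().partition(".")
    return int(major), int(minor or 0)


def monitor_sleep_bug(version: str) -> Optional[bool]:
    """True if the post lists this driver as affected, False if it lists it
    as safe, None if the post says nothing about it."""
    major, minor = parse_driver_version(version)
    if major in (295, 296):
        return True
    if (major, minor) <= (290, 53) or major >= 301:
        return False
    return None


if __name__ == "__main__":
    for v in ("285.62", "290.53", "296.10", "297.03", "301.42"):
        print(v, monitor_sleep_bug(v))
```

The 296.10 driver in the quoted host details is flagged as affected, which is why the monitor-sleep workarounds are relevant there.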

kmanley57
Joined: 1 Apr 12
Posts: 6
Credit: 32,623,691
RAC: 0
Message 13768 - Posted: 12 Apr 2012, 18:36:25 UTC - in response to Message 13766.

I did those 'fixes' several weeks ago, before I started running any WUs from Collatz, so I'm hoping that is not it. I have a GPU monitoring program (EVGA PrecisionX) running on it so I can watch the usage drop off; then I go to my monitoring machine running BoincTasks and suspend and resume the task from there.

I had a number of failed results on other projects that led me to add the loopback connectors, which did not do anything anyway. This is the only project where the WU just stops. I may have to splurge on an ATI card, put it in there, and run both types to see if it happens with that too.

On a side note, one of the two cards does have monitors connected to both video connectors. I do not use them, so M.S. turns them off! But they are connected.
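The manual routine described above (watch GPU usage drop off, then suspend and resume the task) is essentially a stall check. Below is a hypothetical sketch of the detection half only, assuming you can sample a task's fraction-done value periodically from whatever tool you already use (BoincTasks, boinccmd, etc.). The sampling source, the 15-minute window, and the names are all assumptions; the sketch decides whether a task looks stalled and does not talk to BOINC itself.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    t_seconds: float      # when the sample was taken
    fraction_done: float  # task progress, 0.0 .. 1.0


def looks_stalled(samples: list[Sample], window_seconds: float = 15 * 60,
                  min_progress: float = 1e-6) -> bool:
    """Return True if progress rose by less than `min_progress` over the last
    `window_seconds`. Returns False until the samples span a full window."""
    if not samples:
        return False
    latest = samples[-1]
    # Use the most recent sample that is at least `window_seconds` older than the latest.
    baseline = None
    for s in reversed(samples):
        if latest.t_seconds - s.t_seconds >= window_seconds:
            baseline = s
            break
    if baseline is None:
        return False  # not enough history yet
    return latest.fraction_done - baseline.fraction_done < min_progress
```

A fixed time window rather than a single "no change since last sample" test avoids false alarms when progress is only reported at checkpoints; what to do once a task looks stalled (suspend/resume, or just an alert) is left to the user.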

kmanley57
Joined: 1 Apr 12
Posts: 6
Credit: 32,623,691
RAC: 0
Message 13770 - Posted: 12 Apr 2012, 18:54:52 UTC

I have enabled the monitors in M.S. I will see if that makes any difference.

mikey
Joined: 11 Aug 09
Posts: 3242
Credit: 1,693,887,027
RAC: 5,445,511
Message 13777 - Posted: 13 Apr 2012, 12:43:46 UTC - in response to Message 13770.

> I have enabled the monitors in M.S. I will see if that makes any difference.


I am not sure it makes any difference. I use a 'dummy plug', which makes the card think there is a monitor when there really isn't. Here is a link to make your own if you need it: http://www.overclock.net/t/384733/the-30-second-dummy-plug

kmanley57
Joined: 1 Apr 12
Posts: 6
Credit: 32,623,691
RAC: 0
Message 13779 - Posted: 13 Apr 2012, 20:07:37 UTC - in response to Message 13777.

> I am not sure it makes any difference. I use a 'dummy plug', which makes the card think there is a monitor when there really isn't. Here is a link to make your own if you need it: http://www.overclock.net/t/384733/the-30-second-dummy-plug


No, I have a dummy plug on the one unused video connector. But extending the desktop onto the other monitors resolved the cards/monitors going to sleep, or whatever was being done to them.

The other GPU projects I run did not need me to do this, but ?????

mikey
Joined: 11 Aug 09
Posts: 3242
Credit: 1,693,887,027
RAC: 5,445,511
Message 13786 - Posted: 14 Apr 2012, 12:23:55 UTC - in response to Message 13779.

>> I am not sure it makes any difference. I use a 'dummy plug', which makes the card think there is a monitor when there really isn't. Here is a link to make your own if you need it: http://www.overclock.net/t/384733/the-30-second-dummy-plug
>
> No, I have a dummy plug on the one unused video connector. But extending the desktop onto the other monitors resolved the cards/monitors going to sleep, or whatever was being done to them.
>
> The other GPU projects I run did not need me to do this, but ?????


Hey as long as it is working for you that is what counts.






Copyright © 2018 Jon Sonntag; All rights reserved.