Posts by BobMALCS
1) Message boards : Number crunching : Computation error ? (Message 18832)
Posted 1410 days ago by BobMALCS
Slicker,

Just to round off this thread. I've been playing around with the configuration files and the numbers I get confirm what you have stated.

Except for one oddity. If I specify anything from 0.01 to 0.99 CPU then everything works as stated. However, if I specify 1 CPU then BOINC drops one of my CPU tasks and runs a Collatz task instead, which uses a lot more CPU time with no significant reduction in elapsed time. So rather than having 3 CPU tasks plus a Collatz task, I get 2 CPU tasks and 1 Collatz task. Looking at the CPU usage figures confirms this.
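One way to picture the numbers above is a toy model in which the client only gives up a CPU task once the CPU fractions reserved by GPU tasks add up to a whole core. This is an assumption made purely for illustration and is not BOINC's actual scheduler logic; the function name `cpu_tasks_running` is hypothetical.

```python
import math


def cpu_tasks_running(boinc_cores: int, gpu_task_cpu_shares: list[float]) -> int:
    """Toy model (an assumption, not BOINC code): CPU-only tasks that still fit.

    boinc_cores          -- cores BOINC is allowed to use (3 of 4 in this thread)
    gpu_task_cpu_shares  -- CPU fraction reserved by each running GPU task
    """
    # Only whole cores are taken away from the CPU-only tasks in this model.
    reserved_cores = math.floor(sum(gpu_task_cpu_shares))
    return max(boinc_cores - reserved_cores, 0)


if __name__ == "__main__":
    for share in (0.01, 0.5, 0.99, 1.0):
        print(f"{share:.2f} CPU per GPU task -> {cpu_tasks_running(3, [share])} CPU tasks")
```

Under this model any value below 1 leaves all 3 CPU tasks running, while exactly 1 leaves 2, which matches what I see.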

Thanks for the help to everybody.

Bob
2) Message boards : Number crunching : Computation error ? (Message 18819)
Posted 1412 days ago by BobMALCS
I do not have an app_config for Collatz, although I do have one for a couple of other projects. The implication is that I MUST have an app_config for Collatz. I have not deliberately limited Collatz in any way.

Looking at the client_state.xml file in the Collatz section I see the following (which I have not modified in any way).

<app_version>
<app_name>solo_collatz</app_name>
<version_num>604</version_num>
<platform>windows_x86_64</platform>
<avg_ncpus>0.010000</avg_ncpus>
<max_ncpus>1.000000</max_ncpus>

<flops>3870824289727.138200</flops>
<plan_class>cuda55</plan_class>
<file_ref>
<file_name>solo_collatz_6.04_windows_x86_64__cuda55.exe</file_name>
<main_program/>
</file_ref>
<file_ref>
<file_name>solo_collatz_6.04_windows_x86_64__cuda55.pdb</file_name>
<copy_file/>
</file_ref>
<file_ref>
<file_name>cudart64_55.dll</file_name>
<copy_file/>
</file_ref>
<file_ref>
<file_name>solo_collatz_windows_x86_64_cuda.config</file_name>
<open_name>collatz.config</open_name>
<copy_file/>
</file_ref>
<coproc>
<type>NVIDIA</type>
<count>1.000000</count>
</coproc>
<gpu_ram>48234496.000000</gpu_ram>
<dont_throttle/>
</app_version>

The avg_ncpus and max_ncpus lines imply to me (possibly totally wrongly) that a Collatz task requires up to 1 CPU. However, it seems likely that BOINC applies the minimum value unless specified otherwise.

OK. I have no problem creating an app_config. However, if the above statements influence the CPU usage then they should be looked at.
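For anyone wanting to check the same values on their own machine, here is a minimal read-only sketch that pulls the CPU and GPU figures out of an `<app_version>` block. It uses a trimmed copy of the block above as sample text; the helper name `cpu_gpu_request` is hypothetical, and nothing here writes to client_state.xml.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the <app_version> block quoted above (sample data only).
APP_VERSION = """
<app_version>
    <app_name>solo_collatz</app_name>
    <plan_class>cuda55</plan_class>
    <avg_ncpus>0.010000</avg_ncpus>
    <max_ncpus>1.000000</max_ncpus>
    <coproc>
        <type>NVIDIA</type>
        <count>1.000000</count>
    </coproc>
</app_version>
"""


def cpu_gpu_request(xml_text: str) -> dict:
    """Return the CPU/GPU figures declared in one <app_version> element."""
    node = ET.fromstring(xml_text)
    return {
        "app_name": node.findtext("app_name"),
        "avg_ncpus": float(node.findtext("avg_ncpus")),
        "max_ncpus": float(node.findtext("max_ncpus")),
        "gpu_type": node.findtext("coproc/type"),
        "gpu_count": float(node.findtext("coproc/count")),
    }


if __name__ == "__main__":
    print(cpu_gpu_request(APP_VERSION))
```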
3) Message boards : Number crunching : Computation error ? (Message 18816)
Posted 1412 days ago by BobMALCS
Zydor,

Read and understood your msg. Seems like a good plan to follow.

The CPU time puzzles me, possibly through lack of knowledge. I thought that the CPU time specified for a task was merely an indication to BOINC for scheduling purposes and that the tasks took whatever CPU time they needed when running. I have allocated 3 cores (out of 4) for BOINC, so there is 1 core free (as I understand it) for the GPU. So I run 3 CPU-only tasks and 1 or 2 tasks (depending on what is available) on the GPU.

My CPU is an Intel Core i5 Quad 2500K (Sandy Bridge) OC to 4.3GHz. It is 100% stable.

Bob
4) Message boards : Number crunching : Computation error ? (Message 18815)
Posted 1412 days ago by BobMALCS
Slicker,

Thanks for fixing it so quickly.

Bob
5) Message boards : Number crunching : Computation error ? (Message 18809)
Posted 1414 days ago by BobMALCS
Zydor,

OK. Have done so.

Will also be interested to see if there is any effect on the task duration and GPU load.

I run several projects so not quite sure when I'll get the requisite data.


Bob
6) Message boards : Number crunching : Computation error ? (Message 18807)
Posted 1414 days ago by BobMALCS
Slicker,

As it appears, at the moment, to possibly be a software problem, I'll leave the .xml file as it was until I hear otherwise. If there's any other info I can give you let me know.

Bob
7) Message boards : Number crunching : Computation error ? (Message 18805)
Posted 1414 days ago by BobMALCS
Zydor, Slicker,

FYI my GPU is an 'ASUS GEFORCE GTX 760 DirectCU II OC'. The GPU runs at 1097MHz. With Collatz the temperature runs at about 68/69C and the GPU at 99%. I have not altered any of the GPU parameters.

If I remember correctly, I've had 4 tasks fail with this error and they had all been suspended just before the error. However, as noted above, several other tasks have been suspended and resumed with no problem. I am well aware that this could be a difficult problem to resolve as it could be hardware, or software, or both. It is only a small number of tasks that fail; not enough to really concern me but enough for me to report them.

I am reluctant to modify the GPU parameters as it performs perfectly 99% of the time. But I have no problem playing with the .xml file.

Possibly an aside, but I tried GPUGRID for a (short) while and found that at least 50% of the tasks failed due to some sort of error. This was probably due to the program doing its best to melt my GPU. The only solution seemed to be either to cool it with liquid nitrogen or to severely downclock the GPU, neither of which I was inclined to do because of the knock-on effects on other projects and programs.

For the moment I'll follow Zydor's suggestion and modify the .xml file.

Thanks for the interest and replies.

Bob
8) Message boards : Number crunching : Computation error ? (Message 18792)
Posted 1414 days ago by BobMALCS
The checkpoint doesn't always cause a problem though.

-------------------------------

Name solo_collatz_2370406018805725374831_824633720832_0
Workunit 918184
Created 7 Mar 2014, 8:13:24 UTC
Sent 7 Mar 2014, 8:18:46 UTC
Received 7 Mar 2014, 11:15:02 UTC
Server state Over
Outcome Success
Client state Done
Exit status 0 (0x0)
Computer ID 139595
Report deadline 14 Mar 2014, 12:18:46 UTC
Run time 1,029.48
CPU time 9.55
Validate state Valid
Credit 7,709.13
Application version solo_collatz v6.03 (cuda55)
Stderr output

<core_client_version>7.2.39</core_client_version>
<![CDATA[
<stderr_txt>
Collatz Conjecture v6.01 Windows x86_64 for CUDA 5.5
Based on the AMD Brook+ kernels by Gipsel
verbose=1
threads=8
items_per_kernel=22
kernels_per_reduction=8
sleep=1
Config: verbose=1 items_per_kernel=4194304 kernels_per_reduction=256 threads=256 sleep=1
Name GeForce GTX 760
Compute 3.0
Parameters --device 0
Start 2370406019630359095663
Checking 824633720832 numbers
Numbers/Kernel 4194304
Kernels/Reduction 256
Numbers/Reduction 1073741824
Reductions/WU 768
Threads 256
Using: verbose=1 items_per_kernel=4194304 kernels_per_reduction=256 threads=256 sleep=1
Suspending...
verbose=1
threads=8
items_per_kernel=22
kernels_per_reduction=8
sleep=1
Config: verbose=1 items_per_kernel=4194304 kernels_per_reduction=256 threads=256 sleep=1
Name GeForce GTX 760
Compute 3.0

Resuming at 2370406020427075529071
Using: verbose=1 items_per_kernel=4194304 kernels_per_reduction=256 threads=256 sleep=1

Highest Steps 1736 for 2370406019757317067903
Total Steps 439992266314100
Avg Steps 533
CPU time 20.5141 seconds
Total time 1262.9 seconds
10:13:24 (7140): called boinc_finish

</stderr_txt>
]]>

-------------------------------
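The figures in the log above are consistent with each other if the short config values (items_per_kernel=22, kernels_per_reduction=8) are read as powers of two. That reading is an assumption based only on the numbers printed in the log, not on the application's source; the function name `work_unit_layout` is hypothetical.

```python
def work_unit_layout(items_exp: int, kernels_exp: int, numbers_to_check: int):
    """Recompute the per-work-unit figures printed in the stderr log.

    Assumes the config values are exponents of 2 (inferred from the log).
    """
    numbers_per_kernel = 2 ** items_exp
    kernels_per_reduction = 2 ** kernels_exp
    numbers_per_reduction = numbers_per_kernel * kernels_per_reduction
    reductions, remainder = divmod(numbers_to_check, numbers_per_reduction)
    return (numbers_per_kernel, kernels_per_reduction,
            numbers_per_reduction, reductions, remainder)


if __name__ == "__main__":
    # "Checking 824633720832 numbers" from the successful task above.
    print(work_unit_layout(22, 8, 824633720832))
    # (4194304, 256, 1073741824, 768, 0)
```

The result matches the Numbers/Kernel, Kernels/Reduction, Numbers/Reduction and Reductions/WU lines in the log.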

Bob
9) Message boards : Number crunching : Computation error ? (Message 18784)
Posted 1415 days ago by BobMALCS
Ooops. Should be visible now.

Bob
10) Message boards : Number crunching : Computation error ? (Message 18780)
Posted 1415 days ago by BobMALCS
Is this computation error caused by something I can (potentially) fix, or do I just live with it?

-------------------------------------

Name solo_collatz_2370392632320376678767_412316860416_0
Workunit 883154
Created 6 Mar 2014, 17:44:17 UTC
Sent 6 Mar 2014, 17:51:05 UTC
Received 6 Mar 2014, 18:55:56 UTC
Server state Over
Outcome Computation error
Client state Compute error
Exit status -1 (0xffffffffffffffff) Unknown error number
Computer ID 139595
Report deadline 13 Mar 2014, 21:51:05 UTC
Run time 321.95
CPU time 2.87
Validate state Invalid
Credit 0.00
Application version solo_collatz v6.03 (cuda55)
Stderr output

<core_client_version>7.2.39</core_client_version>
<![CDATA[
<message>
(unknown error) - exit code -1 (0xffffffff)
</message>
<stderr_txt>
Collatz Conjecture v6.01 Windows x86_64 for CUDA 5.5
Based on the AMD Brook+ kernels by Gipsel
verbose=1
threads=8
items_per_kernel=22
kernels_per_reduction=8
sleep=1
Config: verbose=1 items_per_kernel=4194304 kernels_per_reduction=256 threads=256 sleep=1
Name GeForce GTX 760
Compute 3.0
Parameters --device 0
Start 2370392632732693539183
Checking 412316860416 numbers
Numbers/Kernel 4194304
Kernels/Reduction 256
Numbers/Reduction 1073741824
Reductions/WU 384
Threads 256
Using: verbose=1 items_per_kernel=4194304 kernels_per_reduction=256 threads=256 sleep=1
Suspending...
verbose=1
threads=8
items_per_kernel=22
kernels_per_reduction=8
sleep=1
Config: verbose=1 items_per_kernel=4194304 kernels_per_reduction=256 threads=256 sleep=1
Name GeForce GTX 760
Compute 3.0

Resuming at 2370392632903418489199
Using: verbose=1 items_per_kernel=4194304 kernels_per_reduction=256 threads=256 sleep=1
At offset 376992749827 got 1723 from the GPU when expecting 565
Error: GPU steps do not match CPU steps. Workunit processing has been aborted.
18:52:49 (8316): called boinc_finish

</stderr_txt>
]]>

-------------------------------------
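The "GPU steps do not match CPU steps" message suggests the application spot-checks a GPU result against a CPU calculation. A minimal sketch of that kind of check follows. It uses the textbook step count (halve if even, 3n+1 if odd); the application may count steps differently, so the values need not match the log. The names `collatz_steps` and `check_gpu_result` are hypothetical.

```python
def collatz_steps(n: int) -> int:
    """Number of textbook Collatz steps needed for n to reach 1."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    steps = 0
    while n != 1:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        steps += 1
    return steps


def check_gpu_result(n: int, gpu_steps: int) -> tuple[bool, int]:
    """Compare a step count reported by the GPU with a CPU recomputation."""
    expected = collatz_steps(n)
    return gpu_steps == expected, expected


if __name__ == "__main__":
    ok, expected = check_gpu_result(27, 112)  # deliberately wrong GPU value
    if not ok:
        print(f"got 112 from the GPU when expecting {expected}")
```

Python integers are arbitrary precision, so the same function works for start values as large as the ones in the log.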

Bob
11) Message boards : Number crunching : Problem getting tasks (Message 18679)
Posted 1429 days ago by BobMALCS
Requested update to project. Got following messages.

20/02/2014 18:28:30 | Collatz Conjecture | update requested by user
20/02/2014 18:28:40 | Collatz Conjecture | Sending scheduler request: Requested by user.
20/02/2014 18:28:40 | Collatz Conjecture | Requesting new tasks for NVIDIA
20/02/2014 18:28:42 | Collatz Conjecture | [error] No start tag in scheduler reply

Problem somewhere ?






Copyright © 2018 Jon Sonntag; All rights reserved.