Collatz 4.xx CUDA and CPU Applications Released for Windows x86_64

Profile Slicker
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 11 Jun 09
Posts: 2525
Credit: 740,580,099
RAC: 1
Message 16114 - Posted: 26 Mar 2013, 13:45:21 UTC

Two new v4.xx applications were released today, one for 64-bit Windows and the other for CUDA 5.0 on 64-bit Windows. These now contain internal validation, which will reduce the number of "inconclusive" results and, once all platforms and plan classes have been upgraded, allow the quorum to be reduced to 1. Versions for Linux, OS X, and 32-bit Windows will be released as testing on each is completed.

Profile Pooh Bear 27
Joined: 1 Aug 10
Posts: 54
Credit: 108,227,920
RAC: 0
Message 16116 - Posted: 26 Mar 2013, 17:49:24 UTC

Curious as to why my CUDA host gets CUDA50 for mini but CUDA42 for regular? The new one is much faster and clearly my card can handle it, so what are the server's criteria for sending me the slower application?

Profile Slicker
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 11 Jun 09
Posts: 2525
Credit: 740,580,099
RAC: 1
Message 16117 - Posted: 26 Mar 2013, 18:01:39 UTC - in response to Message 16116.

Is either machine Win32?

Profile Pooh Bear 27
Joined: 1 Aug 10
Posts: 54
Credit: 108,227,920
RAC: 0
Message 16119 - Posted: 26 Mar 2013, 20:00:31 UTC - in response to Message 16117.

Single machine, single card.
121799
As you can see it is getting and validating CUDA50 for mini, and getting and validating CUDA40 for regular.

Profile Pooh Bear 27
Joined: 1 Aug 10
Posts: 54
Credit: 108,227,920
RAC: 0
Message 16121 - Posted: 26 Mar 2013, 23:33:05 UTC - in response to Message 16119.

Now I got a Mini CUDA23. Something on the server side is not always getting the correct information.

zombie67 [MM]
Volunteer tester
Joined: 3 Jul 09
Posts: 156
Credit: 612,751,766
RAC: 309
Message 16123 - Posted: 27 Mar 2013, 0:29:14 UTC

I am getting errors on my Win7 64-bit machine with an 8800 GT.

http://boinc.thesonntags.com/collatz/result.php?resultid=137495594

____________
Dublin, California
Team: SETI.USA

Profile Slicker
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 11 Jun 09
Posts: 2525
Credit: 740,580,099
RAC: 1
Message 16126 - Posted: 27 Mar 2013, 15:05:36 UTC - in response to Message 16121.
Last modified: 27 Mar 2013, 15:36:01 UTC

You seem to have gotten both CUDA23 and CUDA50 mini_collatz WUs.
You see, the BOINC server assumes that people can't figure out which app to run. It also assumes that it is smarter than the people and can figure out which app they should be running. So, the default BOINC server setup will send both 32-bit and 64-bit apps to 64-bit computers because some project admins haven't figured out that their 64-bit apps actually run slower than their 32-bit apps. (Sad, but true.) I believe BOINC does the same with its GPU apps. There is no max driver version logic, so while a machine with CUDA 2.3 can only run CUDA 2.3 apps, a machine with CUDA 5.0 can run either CUDA 2.3 or CUDA 5.0 apps. I deprecated the 31, 40, and 42 apps, but for some reason BOINC keeps sending out CUDA23 apps. Given that most projects have only one CUDA app, you probably don't run into this elsewhere. Also, for max compatibility, they probably have it compiled for CUDA 2.3 or 3.x, so you never run into issues on those projects.
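To make the "minimum but no maximum" rule concrete, here is a minimal Python model of it. It assumes each app version declares only a minimum CUDA version and that the server treats every non-deprecated version the host meets as eligible. The `AppVersion` class, the `eligible_versions` function and the plan-class strings are hypothetical illustrations, not BOINC's actual scheduler code, and the model ignores whatever else the real scheduler weighs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AppVersion:
    """Hypothetical app version that declares only a minimum CUDA version."""
    plan_class: str
    min_cuda: tuple  # (major, minor) the host's driver must report at least
    deprecated: bool = False


def eligible_versions(host_cuda, versions):
    """Return every non-deprecated plan class the host qualifies for.

    Only a minimum is checked; there is no maximum, so a newer host
    qualifies for old and new builds alike.
    """
    return [v.plan_class for v in versions
            if not v.deprecated and host_cuda >= v.min_cuda]


VERSIONS = [
    AppVersion("cuda23", (2, 3)),
    AppVersion("cuda31", (3, 1), deprecated=True),
    AppVersion("cuda40", (4, 0), deprecated=True),
    AppVersion("cuda42", (4, 2), deprecated=True),
    AppVersion("cuda50", (5, 0)),
]

print(eligible_versions((5, 0), VERSIONS))  # ['cuda23', 'cuda50']
print(eligible_versions((2, 3), VERSIONS))  # ['cuda23']
```

Under this model nothing tells the scheduler to prefer cuda50 on a CUDA 5.0 host, which would be consistent with the mixed CUDA23/CUDA50 mini assignments reported above.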

I don't want to remove the CUDA23 app yet since those who run into problems with the CUDA5 app may want to run the CUDA23 app via an app_info until the kinks get worked out.

Profile Slicker
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 11 Jun 09
Posts: 2525
Credit: 740,580,099
RAC: 1
Message 16127 - Posted: 27 Mar 2013, 15:10:24 UTC - in response to Message 16123.

Can you edit the collatz.config file in the project folder and set "verbose=1" so that it will show more info in the output?

If the errors are due to the driver getting reset and/or being overloaded, overheated, etc., you can try lowering the values of kernels_per_reduction or items_per_kernel (or both) in the collatz.config until it no longer has errors.

Profile Slicker
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 11 Jun 09
Posts: 2525
Credit: 740,580,099
RAC: 1
Message 16128 - Posted: 27 Mar 2013, 15:11:40 UTC - in response to Message 16121.

Or the client isn't sending it. There are reported issues with the 7.x.x client and how it reports the driver versions for GPUs. I don't think that is the case here, since you have received and run a couple of mini WUs for CUDA5.

Profile Slicker
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 11 Jun 09
Posts: 2525
Credit: 740,580,099
RAC: 1
Message 16129 - Posted: 27 Mar 2013, 15:33:34 UTC
Last modified: 27 Mar 2013, 15:34:02 UTC

If errors continue, try adding the following line to the collatz.config file in the project folder:
reduce_on_cpu=1

If you have to do this, please let me know. If enough people have problems, I'll make this the default.

Profile chip
Joined: 8 May 11
Posts: 30
Credit: 41,295,305
RAC: 0
Message 16135 - Posted: 28 Mar 2013, 16:23:22 UTC

With version 3.17 (CUDA50), the shortest calculation time on my GTX 470 came from these settings:

items_per_kernel=21
kernels_per_reduction=10
threads=7

What has changed in the new 4.02 version? By default, the config settings are:

kernels_per_reduction=8
threads=6

Profile Slicker
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 11 Jun 09
Posts: 2525
Credit: 740,580,099
RAC: 1
Message 16139 - Posted: 28 Mar 2013, 17:18:22 UTC - in response to Message 16135.

* About 70% of the code changed, although the main logic still uses the optimization described on the Collatz Wikipedia page, with a lookup table of a thousand elements (compared to 5 in the example on the wiki page).
* The kernels changed so that the CUDA and OpenCL kernels are as similar as possible so that enhancements to one can easily be made to the other.
* Verification was added so that errors in GPU calculations are identified as soon as possible. This should reduce the load on the server, since it no longer has to identify them, and should improve overall processing speed, since failed WUs will be resent sooner.
* The application has been modified so that multiple WUs can be run at one time. That means people can now use an app_config.xml rather than having to set up an app_info.xml or spend days doing trial and error on the app parameters/config to find the best settings.
* The default settings allow the app to work on older, slower GPUs without errors. Newer, faster GPUs can either change the settings in the config or use an app_config.xml to run multiple versions at once. Using too high a setting will cause the app to fail. Settings are GPU model dependent so the settings chosen are those that should work on the vast majority of GPUs.
* The CUDA applications internally contain code for compute capabilities 1.0, 1.1, 1.2, 2.0, 2.1, 3.0, and 3.5, so the app chooses the kernel that best matches the GPU's capabilities and takes advantage of newer GPUs' features.
* The ability of the applications to do self-verification will allow the project to eliminate the quorum of 2 which will mean twice the work gets done in the same amount of time and that users will get credit as soon as the result is returned rather than have it be "pending" for days or weeks.
* The application uses a separate 256-bit large integer library to verify the results using the non-optimized method (without a lookup table).
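The lookup-table shortcut and the naive cross-check described above can be sketched in Python. This is a minimal illustration, not the app's code: it assumes a 10-bit table (1,024 entries, roughly the "thousand elements" mentioned), counts steps as the total number of 3n+1 and n/2 operations needed to reach 1, and uses Python's arbitrary-precision integers in place of the app's 256-bit integer library. The names `build_table`, `steps_naive`, `steps_with_table` and `first_mismatch` are hypothetical.

```python
K = 10                      # table covers the low K bits (assumed size: 2**10 = 1024)
MASK = (1 << K) - 1


def build_table(k):
    """For each k-bit suffix b, precompute the effect of k shortcut steps.

    Writing n = 2**k * a + b, k applications of T(x) = x/2 (x even) or
    (3x+1)/2 (x odd) give 3**c * a + d, where c counts the odd steps.
    Each odd T-step is two ordinary steps (3x+1, then /2), so the
    ordinary step count for the jump is k + c.
    """
    table = []
    for b in range(1 << k):
        x, odd = b, 0
        for _ in range(k):
            if x & 1:
                x = (3 * x + 1) >> 1
                odd += 1
            else:
                x >>= 1
        table.append((3 ** odd, x, k + odd))
    return table


TABLE = build_table(K)


def steps_naive(n):
    """Count ordinary Collatz steps from n (n >= 1) down to 1, one at a time."""
    steps = 0
    while n != 1:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        steps += 1
    return steps


def steps_with_table(n):
    """Same count, jumping K shortcut steps at a time while n >= 2**K."""
    steps = 0
    while n > MASK:         # n >= 2**K, so the jump cannot pass through 1 early
        mult, add, jump = TABLE[n & MASK]
        n = mult * (n >> K) + add
        steps += jump
    return steps + steps_naive(n)


def first_mismatch(start, reported):
    """Return (offset, got, expected) for the first wrong count, else None."""
    for offset, got in enumerate(reported):
        expected = steps_naive(start + offset)
        if got != expected:
            return offset, got, expected
    return None


print(steps_naive(27), steps_with_table(27))  # 111 111
```

`first_mismatch` mirrors the shape of the "At offset ... got ... from the GPU when expecting ..." messages quoted later in this thread, but how the real app chooses which offsets to re-check is not stated here.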

Profile chip
Joined: 8 May 11
Posts: 30
Credit: 41,295,305
RAC: 0
Message 16141 - Posted: 28 Mar 2013, 17:59:50 UTC - in response to Message 16139.

* The application has been modified so that multiple WUs can be run at one time. That means people can now use an app_config.xml rather than having to set up an app_info.xml or spend days doing trial and error on the app parameters/config to find the best settings.

Please specify the allowable ranges for the items_per_kernel, kernels_per_reduction and threads settings.

* The ability of the applications to do self-verification will allow the project to eliminate the quorum of 2 which will mean twice the work gets done in the same amount of time and that users will get credit as soon as the result is returned rather than have it be "pending" for days or weeks.

Will you turn on the adaptive replication mechanism?

Profile Slicker
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 11 Jun 09
Posts: 2525
Credit: 740,580,099
RAC: 1
Message 16143 - Posted: 28 Mar 2013, 20:54:30 UTC - in response to Message 16141.

If the GPU load is < 98% or 99%, my suggestion would be to use an app_config
(http://boinc.berkeley.edu/trac/wiki/ClientAppConfig) and use 0.5 or 0.33 for the gpu_usage and 0.01 for the cpu_usage rather than adjust the collatz.config settings.
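A minimal Python sketch that produces such a file with the standard library. The app short names `collatz` and `mini_collatz` are assumptions; check the names your client actually uses in client_state.xml before relying on them. A gpu_usage of 0.5 lets the client schedule two tasks per GPU; 0.33 would allow three.

```python
import xml.etree.ElementTree as ET


def app_config_xml(app_names, gpu_usage=0.5, cpu_usage=0.01):
    """Build app_config.xml text that sets GPU/CPU usage for each named app."""
    root = ET.Element("app_config")
    for name in app_names:
        app = ET.SubElement(root, "app")
        ET.SubElement(app, "name").text = name
        gpu_versions = ET.SubElement(app, "gpu_versions")
        ET.SubElement(gpu_versions, "gpu_usage").text = str(gpu_usage)
        ET.SubElement(gpu_versions, "cpu_usage").text = str(cpu_usage)
    return ET.tostring(root, encoding="unicode")


# App names are assumptions; confirm them in client_state.xml.
xml_text = app_config_xml(["collatz", "mini_collatz"])
print(xml_text)
```

Save the printed text as app_config.xml in the Collatz project folder; the client reads it at startup.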

Will you turn on the adaptive replication mechanism?

I may look into that, but I'm a firm believer in K.I.S.S., and most of the BOINC algorithms used to schedule, estimate times, determine error rates, etc. seem overly complicated (and therefore prone to error) to me. Since the WUs are not dependent upon each other, it really doesn't matter if a WU times out and gets resent or if a person only returns one in 10 good WUs. Or rather, it won't matter once no wingman is waiting on it. At present, I cannot set the quorum to 1 because the 2.x and 3.x apps do NOT have internal validation, and the number of errors is in the thousands every day. People just don't know about it until their wingman returns his result and it is declared inconclusive, requiring another person to confirm the result. With the new apps, people will know about the errors ASAP.

zombie67 [MM]
Volunteer tester
Joined: 3 Jul 09
Posts: 156
Credit: 612,751,766
RAC: 309
Message 16144 - Posted: 28 Mar 2013, 23:14:38 UTC - in response to Message 16127.
Last modified: 28 Mar 2013, 23:21:55 UTC

I will do that. Do I have to restart BOINC for that to take effect? I ask because I have a couple of XXL RNA tasks many days in with no checkpointing. So if I have to restart BOINC, then it will have to wait until they complete.

Edit: I changed it and ran some more tasks without restarting BOINC. Looks like the change was seen. Here is a task with verbose=1. What does it mean?

http://boinc.thesonntags.com/collatz/result.php?resultid=137647959

Stderr output

<core_client_version>7.0.56</core_client_version>
<![CDATA[
<message>
pȝ - exit code -1 (0xffffffff)
</message>
<stderr_txt>
Collatz Conjecture v4.02 x86_64 for CUDA 5.0
Based on the AMD Brook+ kernels by Gipsel
verbose=1
cpu=0
items_per_kernel=16
kernels_per_reduction=8
threads=8
sleep=1
solo=0
At offset 145 got 16777217 from the GPU when expecting 857
Error: GPU steps do not match CPU steps. Workunit processing has been aborted.
16:17:39 (6860): called boinc_finish

</stderr_txt>
]]>

Profile Slicker
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 11 Jun 09
Posts: 2525
Credit: 740,580,099
RAC: 1
Message 16145 - Posted: 29 Mar 2013, 18:48:41 UTC - in response to Message 16144.

add the line:
reduce_on_cpu=1

That should fix it. For whatever reason, certain GPUs "lose it" when doing the reduction. The really strange part is that it uses the sample reduction code from NVidia, with the only change being that instead of just adding floats to get a sum, it uses a quad-integer and tracks the max steps, the offset (e.g. counter) of the number with the max steps, and the total steps as a 64-bit integer. Why it works correctly on some GPUs but not others is beyond me. Then again, the same problem exists with the OpenCL version. Stranger yet, the CUDA 2.3 version used the exact same reduction kernel, so it must have to do with the way the NVidia compiler optimizes the code for specific compute versions or something. Even with the app using the CPU for the reductions, it will still use only a few seconds of CPU time in total.
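To illustrate the reduction described above, here is a minimal Python sketch of a combine step over (max steps, offset of the max, total steps) and a pairwise tree reduction in the style of NVIDIA's shared-memory sample. It is not the app's kernel: the `Partial` type, the function names and the tie-break (keep the smaller offset when two maxima are equal) are assumptions. Because this combine is associative and commutative, a correct tree reduction and a plain sequential loop on the CPU must produce the same result.

```python
from functools import reduce
from typing import NamedTuple

UINT64 = (1 << 64) - 1


class Partial(NamedTuple):
    max_steps: int    # largest step count seen so far
    max_offset: int   # offset (counter) of the number with that count
    total_steps: int  # running total, kept to 64 bits


def combine(a, b):
    """Merge two partial results; ties on max_steps keep the smaller offset."""
    if (b.max_steps, -b.max_offset) > (a.max_steps, -a.max_offset):
        best = b
    else:
        best = a
    return Partial(best.max_steps, best.max_offset,
                   (a.total_steps + b.total_steps) & UINT64)


def tree_reduce(parts):
    """Pairwise reduction, halving the list each pass (requires a non-empty list)."""
    level = list(parts)
    while len(level) > 1:
        paired = [combine(level[i], level[i + 1])
                  for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            paired.append(level[-1])   # odd element carries over to the next pass
        level = paired
    return level[0]


steps = [3, 7, 7, 1, 5]
parts = [Partial(s, i, s) for i, s in enumerate(steps)]
print(tree_reduce(parts))                            # Partial(max_steps=7, max_offset=1, total_steps=23)
print(reduce(combine, parts) == tree_reduce(parts))  # True
```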

zombie67 [MM]
Volunteer tester
Joined: 3 Jul 09
Posts: 156
Credit: 612,751,766
RAC: 309
Message 16146 - Posted: 30 Mar 2013, 0:08:27 UTC

That fixed it. Thanks!

BetelgeuseFive
Joined: 14 Nov 09
Posts: 26
Credit: 3,052,082
RAC: 1
Message 16148 - Posted: 30 Mar 2013, 8:51:08 UTC
Last modified: 30 Mar 2013, 8:52:36 UTC

Sorry, please ignore this message. The answer is already in this thread.

Tom


Hi there,

All my cuda50 tasks seem to fail and it seems I'm not the only one with this problem (check the following workunit):

http://boinc.thesonntags.com/collatz/workunit.php?wuid=60034060

The error message is:

At offset 1835041 got 16777217 from the GPU when expecting 839
Error: GPU steps do not match CPU steps. Workunit processing has been aborted.

Other cuda50 tasks fail with the same message (but different numbers).

Has anyone else noticed this?
Is there a way to prevent receiving cuda50 tasks?

Tom

BetelgeuseFive
Joined: 14 Nov 09
Posts: 26
Credit: 3,052,082
RAC: 1
Message 16154 - Posted: 31 Mar 2013, 9:30:41 UTC


Hi all,

I changed the collatz.config file and added "reduce_on_cpu=1" to fix the

Error: GPU steps do not match CPU steps. Workunit processing has been aborted.

problem.

This worked fine until I restarted BOINC.
The BOINC log showed:

31-3-2013 10:43:25 | Collatz Conjecture | Started download of collatz.config
31-3-2013 10:43:27 | Collatz Conjecture | Finished download of collatz.config

And the reduce_on_cpu line had been removed.

Is there any way to fix this permanently, or do I have to wait for a new version of the clients?

Tom

Claggy
Joined: 27 Sep 09
Posts: 288
Credit: 14,320,498
RAC: 0
Message 16156 - Posted: 31 Mar 2013, 11:37:30 UTC - in response to Message 16154.


You don't have to wait for a new client. BOINC is in constant development; the latest (7.0.59) is a release candidate and is so much better than 7.0.28. Try it.

Claggy

Copyright © 2018 Jon Sonntag; All rights reserved.