Collatz Sieve 1.10 Released for Windows

Message boards : News : Collatz Sieve 1.10 Released for Windows

Profile Slicker
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Avatar
Send message
Joined: 11 Jun 09
Posts: 2525
Credit: 740,580,099
RAC: 1
Message 21084 - Posted: 19 Aug 2015, 16:03:28 UTC

While trying to port the new sieve app to OS X, I discovered a couple of bugs, one of which was causing the 64-bit apps to fail. That has led to releasing v1.10 for both 32- and 64-bit Windows.

Profile Slicker
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Avatar
Send message
Joined: 11 Jun 09
Posts: 2525
Credit: 740,580,099
RAC: 1
Message 21085 - Posted: 19 Aug 2015, 16:07:19 UTC

Visual Studio did not catch the error "if (x = 1)", whereas the old version of the GCC compiler used on OS X knew it should be "if (x == 1)". There's a big difference between assigning a value and comparing one. That version 11 of a non-free compiler doesn't catch this is a bit unsettling. It looks like new versions may be developed on Linux first from now on.
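
A minimal C sketch of the difference (the variable and messages are only illustrative):

#include <stdio.h>

int main(void)
{
    int x = 5;

    /* Comparison: tests whether x equals 1; x keeps its value. */
    if (x == 1)
        printf("x compared equal to 1\n");

    /* Assignment: overwrites x with 1, and the condition is the value
       just assigned (1, i.e. true), so this branch always runs. */
    if (x = 1)
        printf("x was overwritten with 1\n");

    printf("x is now %d\n", x);   /* prints 1 */
    return 0;
}

GCC and Clang flag the second condition with -Wall (the -Wparentheses warning), and Visual C++ reports C4706 ("assignment within conditional expression") at warning level /W4.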

Corsair
Avatar
Send message
Joined: 16 Nov 09
Posts: 21
Credit: 819,511,128
RAC: 3,901,966
Message 21086 - Posted: 19 Aug 2015, 16:08:23 UTC

1.10 x64 intel_gpu working on i7 HD 4600 and i5 HD 4000
1.10 x64 amd_gpu working on AMD APU R7 & GPU HD7770

Thanks.

When I get work for NVIDIA, I'll report back with any findings.
____________
over the sailor's grave doesn't grow grass.

Corsair

fzs600
Send message
Joined: 15 Nov 10
Posts: 1
Credit: 226,039,491
RAC: 83,953
Message 21087 - Posted: 19 Aug 2015, 16:14:10 UTC - in response to Message 21086.

19 Aug 2015, 16:07:15 UTC | 19 Aug 2015, 16:11:38 UTC | Completed and validated | 150.76 | 122.76 | 1,118.68 | Collatz Sieve v1.10 (opencl_nvidia_gpu)

It is good.

Corsair
Avatar
Send message
Joined: 16 Nov 09
Posts: 21
Credit: 819,511,128
RAC: 3,901,966
Message 21090 - Posted: 19 Aug 2015, 18:10:27 UTC - in response to Message 21086.

1.10 x64 nvidia_gpu working well too.
____________
over the sailor's grave doesn't grow grass.

Corsair

Profile Thomas Krajewsky
Avatar
Send message
Joined: 2 Feb 10
Posts: 13
Credit: 379,296,674
RAC: 0
Message 21096 - Posted: 20 Aug 2015, 13:38:51 UTC - in response to Message 21085.

Visual Studio did not catch the error "if (x = 1)", whereas the old version of the GCC compiler used on OS X knew it should be "if (x == 1)". There's a big difference between assigning a value and comparing one. That version 11 of a non-free compiler doesn't catch this is a bit unsettling. It looks like new versions may be developed on Linux first from now on.


If your programming language is C, it's not an error, it's a feature of the language (though not a really good one)!
But a warning should come from your compiler, that's true.

Are you doing exercises for the next International Obfuscated C Code Contest? ;-)
http://www.ioccc.org/
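
For context, the classic legitimate use of that "feature" — an assignment whose value is tested directly — looks like this (a standard textbook idiom, not code from the project):

#include <stdio.h>

int main(void)
{
    int c;

    /* The value of the assignment (the character just read) is what the
       loop condition tests; the extra parentheses around the assignment
       also silence GCC/Clang's -Wparentheses warning. */
    while ((c = getchar()) != EOF)
        putchar(c);

    return 0;
}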

Profile Skivelitis2
Avatar
Send message
Joined: 28 Mar 15
Posts: 17
Credit: 246,425,890
RAC: 1,188,860
Message 21099 - Posted: 20 Aug 2015, 14:56:17 UTC

v1.10 working well!

Intel HD Graphics
Driver: 10.18.10.3408
Win 8.1

Profile Slicker
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Avatar
Send message
Joined: 11 Jun 09
Posts: 2525
Credit: 740,580,099
RAC: 1
Message 21111 - Posted: 21 Aug 2015, 14:30:19 UTC

While it now runs OK, v1.10 still seems to be having some validation errors due to it reporting a best number that is outside the range of the start and stop numbers. There must still be a bug in how it calculates the offset from the starting number. It looks like the port for OS X and Linux will need to be delayed until I fix the bug in the code. There's no sense in deploying apps with known issues.
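
A minimal sketch of the invariant involved, with hypothetical names (start, stop, offset, and best are illustrative, not the project's actual fields):

#include <stdint.h>

/* The reported best number is derived from the starting number plus an
   offset computed in the kernel, so a correct result must satisfy
   start <= best <= stop; a bad offset makes this check fail. */
static int best_in_range(uint64_t start, uint64_t stop, uint64_t offset)
{
    uint64_t best = start + offset;   /* overflow ignored for illustration */
    return best >= start && best <= stop;
}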

Anthony Ayiomamitis
Send message
Joined: 21 Jan 15
Posts: 48
Credit: 1,061,888,498
RAC: 10,080,968
Message 21115 - Posted: 22 Aug 2015, 21:04:51 UTC - in response to Message 21111.

It seems the problem is now localized to the lower boundary, with the reported best number falling below it. I had another such invalidation yesterday, but the good thing is that this issue is quite rare (I will easily process a couple of hundred work units and encounter this problem just once).

Rymorea
Send message
Joined: 14 Oct 14
Posts: 100
Credit: 200,411,819
RAC: 4
Message 21116 - Posted: 23 Aug 2015, 0:07:59 UTC

I have done more than 1,000 Collatz Sieve v1.10 (opencl_amd_gpu) and (opencl_nvidia_gpu) tasks and only 14 were invalid: 12 AMD and 2 NVIDIA. Those are reasonable results, I think.
____________
Seti@home Classic account User ID 955 member since 8 Sep 1999 classic CPU time 539,770 hours

Profile Slicker
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Avatar
Send message
Joined: 11 Jun 09
Posts: 2525
Credit: 740,580,099
RAC: 1
Message 21125 - Posted: 25 Aug 2015, 3:12:59 UTC

It looks like there is a bug with the offset calculation in the kernel. That is a result of my changing it to work with multiple sieve sizes to keep GPUs from running out of RAM and crashing the video driver.

To fix it, I need to find an efficient way to do 64-bit addition, or multiple 32-bit additions with carry logic, in the kernel. Since I don't think OpenCL 1.0 supported 64-bit logic, that presents a bit of a challenge, which may result in different kernels running on older vs. newer hardware.
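
For reference, the usual way to express that carry logic with only 32-bit operations (a plain C sketch with illustrative names; the same pattern maps directly onto OpenCL C):

#include <stdint.h>

/* Emulated 64-bit add: two 32-bit halves plus an explicit carry. */
typedef struct { uint32_t lo, hi; } u64_emul;

static u64_emul add64_emul(u64_emul a, u64_emul b)
{
    u64_emul r;
    r.lo = a.lo + b.lo;
    uint32_t carry = (r.lo < a.lo);   /* a wrapped low word means a carry out */
    r.hi = a.hi + b.hi + carry;
    return r;
}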

Profile sosiris
Send message
Joined: 11 Dec 13
Posts: 123
Credit: 55,800,869
RAC: 0
Message 21128 - Posted: 25 Aug 2015, 15:28:58 UTC - in response to Message 21125.
Last modified: 25 Aug 2015, 15:29:15 UTC

It looks like there is a bug with the offset calculation in the kernel. That is a result of my changing it to work with multiple sieve sizes to keep GPUs from running out of RAM and crashing the video driver.

To fix it, I need to find an efficient way to do 64-bit addition, or multiple 32-bit additions with carry logic, in the kernel. Since I don't think OpenCL 1.0 supported 64-bit logic, that presents a bit of a challenge, which may result in different kernels running on older vs. newer hardware.



I heard that one could test whether A+B >= 2^32 (both 32-bit uint) with (A >= -B), because -B == 2^32 - B in OpenCL (it's two's complement), so A >= -B is the same as A >= 2^32 - B, i.e. A + B >= 2^32 (as long as B != 0, since -0 is 0 and A >= 0 always holds). So this may be what you want:


// Treats each uint4 as a 128-bit integer in four 32-bit words (s0 = least
// significant). Note the (a >= -b) carry test assumes b != 0 in that lane.
uint4 addu4u4(const uint4 a, const uint4 b)
{
    uint4 s; // sum
    uint c;  // carry
    s.s0 = a.s0 + b.s0;
    c = (a.s0 >= -b.s0);
    s.s1 = a.s1 + b.s1 + c;
    c = (a.s1 >= -b.s1) |
        (c > s.s1); // for the case that a.s1 + b.s1 == 0xffffffff && c == 1
    s.s2 = a.s2 + b.s2 + c;
    c = (a.s2 >= -b.s2) | (c > s.s2);
    s.s3 = a.s3 + b.s3 + c;
    return s;
}


But still, using INT64 makes the code cleaner. I'm not sure if it's faster or slower; maybe it depends on the hardware. And if my memory serves me right, OpenCL has supported INT64 since version 1.0. (FP64 is an optional feature in version 1.0, though.)
____________
Sosiris, team BOINC@Taiwan

Profile Slicker
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Avatar
Send message
Joined: 11 Jun 09
Posts: 2525
Credit: 740,580,099
RAC: 1
Message 21149 - Posted: 28 Aug 2015, 3:00:09 UTC

I have a working solution. Actually, I found a second bug in the code while testing the fix for the first one, and it looks like it can now run hundreds of kernels per reduction if the GPU is fast enough. I'm having trouble testing on my laptop because when I get close to running at max speed, the GPU overheats and slows down. It looks like I'll need to start testing on the desktops instead.

The Intel GPU is 64-bit, so it will handle the cl_ulong data type without emulation. I'm now running through all the test WUs using various config settings and on different plan classes, on both 32 and 64 bit, to make sure that the results are exactly the same. Linux unit testing starts tomorrow. If it works, I'll release 1.20 for beta testing shortly.

Profile sosiris
Send message
Joined: 11 Dec 13
Posts: 123
Credit: 55,800,869
RAC: 0
Message 21150 - Posted: 28 Aug 2015, 3:47:31 UTC - in response to Message 21149.

I did some testing on my Intel GPU too because 'full adders' are also required in my new kernel.

Using uint and carry: 3.3 ms.
Using ulong: 0.32 ms.
It looks like my Intel GPU handles 64-bit integers very well.

code:




#define NLENGTH 16
#define NLOOPS 1024

// 512-bit accumulator as 16 x 32-bit words, carries propagated manually.
__kernel void add32(
    __global uint *g_result,
    __constant uint *c_addVal
){
    uint val[NLENGTH];
    uint add[NLENGTH];
    for (int i = 0; i < NLENGTH; ++i)
    {
        val[i] = g_result[i];
        add[i] = c_addVal[i];
    }
    for (int i = 0; i < NLOOPS; ++i)
    {
        uint carryIn = 0;
        for (int j = 0; j < NLENGTH; ++j)
        {
            uint sum = val[j] + add[j];
            uint carryOut = sum < val[j];          // wrapped sum means a carry out
            sum += carryIn;
            carryIn = carryOut | (sum < carryIn);  // for the case that val+add == 0xffffffff && carryIn == 1
            val[j] = sum;
        } // for(j)
    } // for(i)
    for (int i = 0; i < NLENGTH; ++i)
    {
        g_result[i] = val[i];
    }
} // add32()

// Same accumulator, but each 32-bit lane is added in a 64-bit temporary,
// so the carry is simply the high word of the previous addition.
__kernel void add64(
    __global uint *g_result,
    __constant uint *c_addVal
){
    uint val[NLENGTH];
    uint add[NLENGTH];
    for (int i = 0; i < NLENGTH; ++i)
    {
        val[i] = g_result[i];
        add[i] = c_addVal[i];
    }
    for (int i = 0; i < NLOOPS; ++i)
    {
        ulong addRes = 0;
        for (int j = 0; j < NLENGTH; ++j)
        {
            // widen before adding so the carry out of the low 32 bits is kept
            addRes = (ulong)val[j] + add[j] + (addRes >> 32);
            val[j] = convert_uint(addRes);
        } // for(j)
    } // for(i)
    for (int i = 0; i < NLENGTH; ++i)
    {
        g_result[i] = val[i];
    }
} // add64()


I tested the kernels in the Intel Kernel Builder; both produced the same results.
My GPU is the HD 4000 in an i7-3610QM.
____________
Sosiris, team BOINC@Taiwan

Profile Slicker
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Avatar
Send message
Joined: 11 Jun 09
Posts: 2525
Credit: 740,580,099
RAC: 1
Message 21156 - Posted: 28 Aug 2015, 14:19:23 UTC

It looks like I may need to create a 64-bit kernel for GPUs that support 64-bit integers, as a 10x speed increase would put the Intel GPUs closer to the performance of the NVIDIA and AMD GPUs, at least for the Collatz project.
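
One way to ship both variants from a single kernel source is a build-time switch: the host passes a define (e.g. "-DUSE_INT64=1", an illustrative name) in the options argument of clBuildProgram only for devices where 64-bit integers are fast. A sketch, not the project's actual setup:

// Adds a 32-bit value x into an n-word little-endian accumulator.
__kernel void accumulate(__global uint *words, const uint n, const uint x)
{
#ifdef USE_INT64
    ulong acc = (ulong)words[0] + x;      // native 64-bit add keeps the carry
    words[0] = convert_uint(acc);
    uint carry = convert_uint(acc >> 32);
#else
    uint sum = words[0] + x;              // 32-bit add, carry detected manually
    uint carry = (sum < x);
    words[0] = sum;
#endif
    // propagate the carry through the remaining words
    for (uint i = 1; i < n && carry; ++i)
    {
        uint s = words[i] + carry;
        carry = (s < carry);
        words[i] = s;
    }
}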

Profile sosiris
Send message
Joined: 11 Dec 13
Posts: 123
Credit: 55,800,869
RAC: 0
Message 21158 - Posted: 29 Aug 2015, 3:34:13 UTC - in response to Message 21156.
Last modified: 29 Aug 2015, 3:45:14 UTC

*edited out
____________
Sosiris, team BOINC@Taiwan

Profile sosiris
Send message
Joined: 11 Dec 13
Posts: 123
Credit: 55,800,869
RAC: 0
Message 21159 - Posted: 29 Aug 2015, 3:44:34 UTC - in response to Message 21158.

I was suspicious about the result and did a deeper analysis today. It turned out the 32-bit kernel got stuck in the for loop but the 64-bit one did not (weird!). They actually took the same number of GPU cycles in the add-and-carry statements. As for half adders (no carry in, just a+b), the 32-bit and 64-bit kernels are equally fast, too.

____________
Sosiris, team BOINC@Taiwan

Profile sosiris
Send message
Joined: 11 Dec 13
Posts: 123
Credit: 55,800,869
RAC: 0
Message 21171 - Posted: 1 Sep 2015, 10:10:04 UTC - in response to Message 21159.

I looked at the assembly code the OpenCL compiler generated for the AMD GPUs and found that 64-bit additions used about the same number of instructions as 32-bit ones, i.e. they are about the same speed. So ulongs might be better than do-it-yourself 64-bit integers for additions.
____________
Sosiris, team BOINC@Taiwan

