Posts by Slicker

1) Message boards : Number crunching : Task stuck uploading! (Message 3429)
Posted 30 Sep 2021 by Profile Slicker
Post:
The web site and boinc server run on the same machine.
The server status shows which boinc processes are running (https://boinc.thesonntags.com/collatz/server_status.php)
But in this case, using the latest BOINC client is the issue since it includes the root certificate for LetsEncrypt dated 10/5/21 but not the previous one that is good until then. You could try to downgrade to an older BOINC client but I don't have experience doing that without losing work.
2) Questions and Answers : Macintosh : Consistent "clBuildProgram() failed with error (-11)" after upgrading iMac to 10.15.7 (Catalina) (Message 3407)
Posted 2 Sep 2021 by Profile Slicker
Post:
Error code -11 is actually the error from Apple's OpenCL compiler, not BOINC. It occurs while OpenCL is compiling the OpenCL kernels. This is done before any calculations are done. Also, Collatz doesn't use floating point arithmetic. It only uses bit shifts and addition. Apple has had "glitches" with their OpenCL compiler in the past. I even submitted code to both Apple and nVidia years ago and the same error keeps popping up. If I disable optimization, it works fine but takes way longer to complete a work unit. Apple has also had issues where warnings were considered errors, and some of the OpenCL optimizations are dependent on the version of OpenCL that is being run. If the compiler identifies itself as OpenCL v2 vs OpenCL v1 then the kernel uses the v2 optimizations. If the compiler is only "kind of" v2 (like Microsoft's browsers were "kind of" HTML 4 or HTML 5 compliant) and an optimization isn't fully supported, it will fail to compile. On Windows and Linux, the display driver contains the OpenCL compiler, so upgrading the nVidia driver installs the latest OpenCL compiler as well. But Apple includes OpenCL as part of the operating system, so you really can't change it. You are stuck until a new version of OS X is released.
3) Message boards : Number crunching : Back-end broken (/tmp out of space?) (Message 3406)
Posted 2 Sep 2021 by Profile Slicker
Post:
The failed ones (mine at least) all fail within 2s.
This has happened every time there has been a problem in the past.
Basically a bad set of jobs gets generated(?) and has to flush through the system (each one gets sent to 6 systems before it "dies").
Just be patient and expect a few very quick failures for a few weeks.


Correct. Very well said. I am trying to go through all the workunit and result files and make sure there aren't any orphaned records in the database and vice versa.
4) Message boards : Number crunching : no credit - WHY?? (Message 3401)
Posted 31 Aug 2021 by Profile Slicker
Post:
A power outage which exceeded the amount of gas in the generator overnight caused a major problem. When it came back online, between all the user uploads, backups, data dumps for the stats, and the feeder timing out causing the work generator to once again create unneeded work, the disk ran out of space so it couldn't validate anything because it couldn't write to the database, write to the disk, or write to the log files. After taking a day off work today to fix everything, things are getting back to normal. It's going to take some time as I have to figure out which WUs and results are out of sync with the file system and there could potentially be hundreds of thousands of them. But at least it's back online and responding now. The validator backlog seems to be cleared up now.
5) Message boards : Number crunching : This project has just run off the rails (Message 3328)
Posted 12 May 2021 by Profile Slicker
Post:
Tap your heels three times and say I want to go home...
kidding....
If you have more details I will attempt to duplicate the issue.
6) Questions and Answers : Web site : Account Access issues (Message 3318)
Posted 3 May 2021 by Profile Slicker
Post:
I changed my email password. I forgot to change the phpmailer settings on the server. After a few days of no email being sent from the server, Google changed the security settings on my account as it decided that the Collatz server no longer needed access. So, even after updating the phpmailer configuration, Google still refused to forward the mail it received from the Collatz server. I finally figured out what they changed and changed it back. Email, whether for forgotten passwords, message board notifications, or whatever, should all be working now.
7) Questions and Answers : Windows : Due date before now (Message 3317)
Posted 3 May 2021 by Profile Slicker
Post:
Unfortunately, BOINC's time estimates are based on floating point operations per second (FLOPS) and Collatz apps only do integer math (IOPS) so the time estimates can be way off. The WUs run in a linear fashion, so if it takes 1 day to do 20%, then it will take 5 days to complete regardless of how long the BOINC client thinks it will take. The GPUs work much better for this project because stream processors on GPUs work really well when given a small simple equation to run in parallel. A PC's CPU has to be a jack of all trades in comparison, and in this project, pales in comparison to how fast the GPUs can work.
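
As a rough illustration of the linear extrapolation described above, here is a minimal Python sketch. The function name and arguments are made up for this example and are not part of BOINC or the Collatz apps; it only assumes that progress really is linear.

```python
def estimate_total_days(elapsed_days, fraction_done):
    """Extrapolate the total run time, assuming progress is linear.

    elapsed_days  -- time spent so far (any unit works, days used here)
    fraction_done -- progress as a fraction, e.g. 0.20 for 20%
    """
    if not 0 < fraction_done <= 1:
        raise ValueError("fraction_done must be in the range (0, 1]")
    return elapsed_days / fraction_done


# 1 day for 20% of the work unit
total = estimate_total_days(1.0, 0.20)
remaining = total - 1.0
print(f"total: about {total:.1f} days, remaining: about {remaining:.1f} days")
```

The same calculation works with hours or seconds as long as the elapsed time and the result use the same unit.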
8) Message boards : Number crunching : Export Stats Files (Message 3316)
Posted 3 May 2021 by Profile Slicker
Post:
The export stats files may be stuck again?

The files in the stats directory haven't been updated since 08:52 this morning.


The stats get updated 3 times a day or every 8 hours. If requested for a competition that has specific start and end times, I will be happy to run it manually for those special requests.
9) Message boards : News : Email Fixed (Message 3315)
Posted 3 May 2021 by Profile Slicker
Post:
The email connectivity has been fixed.
10) Message boards : Number crunching : "Computational error" (Message 3282)
Posted 4 Apr 2021 by Profile Slicker
Post:
There are a number of possibilities:
1. Bad WU due to server running out of space on 3/20/21. The problem was rectified the same day, but with 734K workunit records and again as many files, and given the files are scattered across 1K folders, it's a bit of a needle in a haystack to find out if that is the problem since it requires custom code to investigate and with new WUs being created and others being deleted when finished, it's a moving target unless I shut down the project for extended periods of time.
2. Apple's OpenCL driver. They may have been the first to implement OpenCL, but they sure weren't the ones to actually adhere to the standards.
3. nVidia drivers. On more than one occasion, they have released drivers that improved performance in games or supported new GPUs at the cost of causing OpenCL errors with existing apps. They usually come out with a fix pretty quickly when that is the case.
4. BOINC client. The error description doesn't sound like that would be the case but there have been OS X issues in the last year that got by the testers and required new releases just for the Mac version.

You can check #1 more easily than I can since all you need to do is look in your BOINC data folder and if there are any WUs that are zero bytes in length, you'll know #1 is the issue since I would assume that BOINC would complain if it was assigned a WU but couldn't download the file. But assuming anything about BOINC usually gets me into trouble (like telling it to only send 64-bit apps to 64-bit machines and the BOINC server still sends 32-bit apps because it overrides the project admin's wishes just in case the 32-bit app runs faster than the 64-bit app.)
If anyone finds that #1 is the issue, let me know which WUs are causing the problem so I can regenerate the associated file to fix it.
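
For anyone who would rather not eyeball the folder, the zero-byte check can be scripted. This is only a sketch: the function name is made up, and the location of the BOINC data folder (and of the project's subfolder inside it) differs per platform and install, so the path has to be supplied by whoever runs it. Other empty files can exist for unrelated reasons, so only hits among the Collatz WU files are relevant.

```python
import os


def find_zero_byte_files(data_dir):
    """Return the sorted paths of all empty regular files under data_dir."""
    empty = []
    for root, _dirs, files in os.walk(data_dir):
        for name in files:
            path = os.path.join(root, name)
            try:
                if os.path.isfile(path) and os.path.getsize(path) == 0:
                    empty.append(path)
            except OSError:
                # The client may delete a file while we scan; skip it.
                continue
    return sorted(empty)


# Usage (the path below is a placeholder, not a real default):
# for path in find_zero_byte_files("/path/to/BOINC/data/folder"):
#     print(path)
```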
11) Message boards : Science : How far has Collatz conjecture been computationally verified? (Message 3166)
Posted 13 Feb 2021 by Profile Slicker
Post:
Crunch3r is a moderator and a volunteer developer. He spent a lot of time helping with the initial application in order to tweak the compiler settings to get the most performance out of every app (both CPU and GPU) for each platform. He even sent AMD bug fixes for their compiler because their compiler created bugs. When all the GPUs switched to OpenCL, they ALL had bugs in their compilers. Apple was the worst. Code that ran under 32 and 64 bit Windows and Linux would fail under Apple. It wouldn't even compile the kernels. It reminded me of the way Microsoft's IE didn't adhere to HTML standards and never really has (until now, when Edge is Chrome based).

You can't count on processing until a number is less than the starting number because with tens of thousands of work units being run simultaneously, you have no idea whether the "lower than your starting number" has actually been completed.

The goal of the project isn't to find the highest steps. The goal is to find a number that doesn't resolve. That's the reason for using 256 bit math. Just in case it overflows.
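
To illustrate why a fixed width matters, here is a plain, non-optimized Python sketch that counts steps and reports when an intermediate value would no longer fit in 256 bits. It is not the project's code (which isn't public); Python integers are unbounded, so the 256-bit limit is emulated with an explicit check, and the function name is made up. It also only catches overflow, not a hypothetical number that loops forever without growing.

```python
LIMIT_256 = (1 << 256) - 1  # largest value an unsigned 256-bit integer can hold


def collatz_steps(n, limit=LIMIT_256):
    """Count 3x+1 and x/2 steps until n reaches 1.

    Raises OverflowError if an intermediate value exceeds `limit`,
    which is where fixed-width integer math would overflow.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    steps = 0
    while n != 1:
        if n % 2 == 0:
            n //= 2
        else:
            n = 3 * n + 1
            if n > limit:
                raise OverflowError("value no longer fits in the integer width")
        steps += 1
    return steps


print(collatz_steps(27))  # 111
```

Passing a small `limit` (for example 1000) shows the overflow path without needing a 78-digit starting number, since 27 climbs above 9000 before coming back down.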

The project has used GMP. It's slow as molasses. That was version 1.0 of the apps.

Anything returned by the GPU is then checked by the CPU using a brute force, non optimized method to check the result. So your incredible speed won't ever be seen in the app. In addition, the results are also checked by the server. There are also checks with total steps, average steps, high steps, and randomly checked numbers, all of which don't use ANY optimizations to assure that the results are valid.

Using smaller WUs just crashes the server. BOINC requires WUs use shared memory and be pre-loaded in order to send to clients. Unless I re-write the BOINC server code, there is a limit on how many work units can be cached in advance. BOINC also requires the use of MySQL which doesn't handle memory nearly as well as Oracle or SQL Server. As such, unless the entire database fits in RAM, performance goes down drastically.

If the work generator daemon's request of the number of work units available times out, the default action is to generate more work units. The result is that the next query will take even longer and it will create more work units. Eventually, you end up with hundreds of thousands of work units even though you didn't need any. That means you really CAN'T guarantee any number that is less than the starting number has been resolved.
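
The runaway behaviour can be shown with a toy model. Everything here is an assumption made for illustration: the real BOINC daemons are not structured like this, and the thresholds and batch size are invented. The model only says that the count query times out once the table is larger than some size, and that a timeout is treated as "no work available".

```python
def simulate_generator(rows, batch, timeout_rows, cycles):
    """Toy model of the work generator feedback loop.

    rows         -- work unit rows currently in the database
    batch        -- work units created whenever the count query times out
    timeout_rows -- table size above which the count query times out
    cycles       -- number of generator passes to simulate
    Returns the row count after each pass, starting with the initial count.
    """
    history = [rows]
    for _ in range(cycles):
        query_timed_out = rows > timeout_rows
        if query_timed_out:
            # Timeout is treated as "no work", so more work gets created,
            # which makes the next query even slower.
            rows += batch
        history.append(rows)
    return history


print(simulate_generator(100_000, 10_000, 150_000, 3))  # stays flat
print(simulate_generator(160_000, 10_000, 150_000, 3))  # keeps growing
```

Once the table crosses the timeout size, nothing in the loop can bring it back down, which is the point made above.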

Clients can abort work units at any time. And they do. So you can't expect a lower starting number to have been validated as they may have aborted the work unit.

A client may overclock their GPU to the point where the output is garbage. Every once in a while, the garbage is so bad that the validator crashes attempting to read it. That causes the daemons to all stop working since it will be several minutes before the validator is automatically restarted and when it starts, it will just crash again. The offending work unit needs to be terminated in order for it to continue. I don't monitor the project 24/7. In just a few hours, the backlog can grow such that the database exceeds the RAM. When that happens, everything goes to shit.

You state that you can't use a sieve but then you choose to ignore certain numbers in your own app. Explain how that isn't the same thing. A sieve eliminates checking numbers that are known to resolve to 4,2,1. I don't claim that by using a sieve the max steps is always reported. That's because that is not the goal. The goal is to find a number that doesn't resolve. And, depending upon the sieve size used, some numbers with higher steps than have been found will be ignored. But they will resolve. So then it becomes a "do you want BOINC credits or to find the highest steps?" question. If you eliminate all the numbers which are 2x of a known number that resolves, then it is guaranteed to resolve since x/2 will end up being only 1 step more. There are numerous papers written on how sieves CAN be used to reduce the numbers needed to be checked if trying to find numbers that don't resolve in 4,2,1.
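
As a small illustration of the sieve idea, the sketch below works out which residues modulo 2^k still need checking. It is the textbook construction, not the project's actual sieve, and the function names are made up. It writes every number as n = 2^k * a + b and follows the first k halving or (3n+1)/2 steps symbolically; if the value is guaranteed to fall below n for every a >= 1, that residue b can be skipped, provided all smaller numbers have already been shown to resolve.

```python
def shortcut_step(n):
    """One step of the shortcut map: n/2 if even, (3n+1)/2 if odd."""
    return n // 2 if n % 2 == 0 else (3 * n + 1) // 2


def surviving_residues(k):
    """Residues b mod 2**k that are NOT proven to drop below their start
    within k shortcut steps, so they still have to be checked."""
    modulus = 1 << k
    survivors = []
    for b in range(modulus):
        coef, rem = modulus, b  # current value is coef * a + rem
        dropped = False
        for _ in range(k):
            # coef is still even here, so the parity depends only on rem.
            if rem % 2 == 0:
                coef, rem = coef // 2, rem // 2
            else:
                coef, rem = 3 * coef // 2, (3 * rem + 1) // 2
            # Sufficient for coef*a + rem < modulus*a + b whenever a >= 1.
            if coef < modulus and coef + rem < modulus + b:
                dropped = True
                break
        if not dropped:
            survivors.append(b)
    return survivors


print(surviving_residues(2))  # [3]
print(surviving_residues(4))  # [7, 11, 15]
```

With k = 4 only 3 of every 16 numbers are left to check. Numbers below 2^k (the a = 0 case) are not covered by the argument and would need to be checked directly.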

Small work units also require more network bandwidth. Comcast has threatened to cut off my Internet due to use of too much bandwidth. Switching to another provider would cost about $24K/year. That's not going to happen since this is paid for by myself.

Lastly, until any application is run against a non-optimized 3x+1 or x/2 algorithm, the results can't be considered valid. That means a GPU app which takes 3 minutes to complete may take a CPU 2 weeks to complete. And, one valid work unit does not mean all work units will be valid. That's why it takes MONTHS to validate a new application. Do that for 32 bit and 64 bit for Win XP, Win 7, Win 10, Linux (multiple versions), and OS X. At over $200/month for electricity just to test apps for several months, you can appreciate why, unless there is a major breakthrough, I don't stop everything to work on it.

That and I have a full time job as well as a number of hobbies and this is only one of them. Given the 70% reduction due to Covid-19, I'm not exactly excited to spend more on electricity. While I'm fortunate to not be unemployed, I don't have an infinite budget and since this is a hobby, I have no intentions to ever take donations to run it.

So, I would reiterate what several others have said. If you feel so deeply that this project is doing it wrong, start your own BOINC project and find out for yourself what it really takes for an app to work. It may be many times faster, but once you add in the BOINC overhead and bullet proof error checking, you may find it doesn't work nearly as fast as you expect.
12) Message boards : Science : Open source (Message 3145)
Posted 8 Feb 2021 by Profile Slicker
Post:
See https://en.wikipedia.org/wiki/Collatz_conjecture . All of the optimizations there have been included in the application. I've had a few people offer optimizations for snippets of the code. What many don't understand is that ideas such as "skip checkpointing and just make the files small enough to finish in under 5 minutes" may sound like an obvious solution, right up until you have 70,000 computers asking for 2 days of work when each workunit only takes 10 seconds to run (so that a CPU workunit takes under 5 minutes). That results in millions of workunits a day and unless someone has a super computer and endless bandwidth to donate, it doesn't work. Been there. Done that. MySQL doesn't handle that kind of data well, especially when the entire database doesn't fit in memory. I've tried different sized workunits and that didn't work since the BOINC scheduler was always out of work for one or the other depending upon which people chose to implement. When competitions, such as the BOINC Pentathlon, begin, people hoard workunits to get the credits during the competition. With multi-host verification, the database grows to an extent that it becomes slow as molasses, resulting in the server daemons getting backed up for hours. If they get backed up too far, they just create more work. It's a catch-22. No work? Create more. Database is slowed down even more so the inserts are backed up and don't show up so the query times out. So, still no work? Create even more. The database is slowed even more. The next thing you know, you have created a million workunits all of which need verification and only 65K in progress. It will take weeks for them to get validated and in the mean time, people bitch because they haven't gotten credit for the work they've done. THAT's why this project worked very hard to come up with self validating workunits. 
Anything done via a high speed optimized method gets re-checked using TWO much slower non-optimized methods to guarantee the values are correct.
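
To put rough numbers on the 70,000-computer scenario above, here is a back-of-the-envelope calculation. It assumes every host fills a 2-day cache and runs one workunit at a time, which is a simplification; the function name is made up. The 10 second and 5 minute run times are the GPU and CPU figures quoted above, so the two results bracket the load.

```python
SECONDS_PER_DAY = 86_400


def workunits_requested(hosts, days_of_work, seconds_per_wu):
    """Workunits needed to give every host `days_of_work` days of work,
    assuming each host runs one workunit at a time."""
    per_host = days_of_work * SECONDS_PER_DAY // seconds_per_wu
    return hosts * per_host


print(workunits_requested(70_000, 2, 10))   # 1209600000 if every host ran 10 s WUs
print(workunits_requested(70_000, 2, 300))  # 40320000 if every host ran 5 min WUs
```

Either way the database would have to hold tens of millions of rows or more just to fill the caches, before any resends or validation records are counted.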

So, as much as I'd love to have the fastest app possible, the reality is that it has to be balanced with the BOINC client, the way the BOINC server processes the results, the way that the open source database can handle the data, and the ability to guarantee that the result is verified even though only one computer processed the result.

I've considered (and even started) porting BOINC to run with SQL Server but that would take several years' effort for a single individual and by the time it was done, it would be years behind the current version.

In reality, solving this mathematical conjecture will do nothing. It won't cure world hunger. It won't change the way kids learn math. It isn't related to any other math conjecture or theories that I know of. For $1200 (the approx. value of the reward for solving it), if the reward still exists today, no one will get rich (or likely even pay for the power used to participate in the project), even if a computed result lets someone disprove the conjecture. And since the only accepted paper so far is that it is impossible to prove, the only option left is to disprove. No one has done that for decades. In other words, it's a hobby as it has no known applications.

The application is not, nor will ever be, open source. If you have questions about how something is done in particular, I'll be happy to answer them and even post code snippets. Due to the number of cheaters, I won't ever allow non-approved optimized apps. But, if someone would like to take a shot at creating a super fast version that is bullet proof, I'll be happy to provide the function input and output requirements. It will still take months to verify all the versions of the app (Win32, Win64, Win32 GPU, Win64 GPU, Linux32, Linux64, Linux32 GPU, Linux64 GPU, OS X CPU... you get the idea...)

I don't get paid to do this. I don't get paid to host this. I don't get paid to support this. I don't get paid for the servers that run this or the power to keep them running or the disk space required to back them up. Nor will I. It's a hobby. I won't pay you to do your hobbies, so I don't expect you to pay for mine.

NONE of the contributed apps or optimizations has ever run as well as a sub-process of the BOINC client as it did stand alone. There's a fair amount of overhead having a process be controlled and respond to commands from the parent process, especially if the process has child processes (e.g. GPU kernels) that it also has to track and control. That's not to say that any version of the apps can't be improved. I'm sure every one can.

So, ask one question at a time and I will try and answer it. Post a novel and I'll likely ignore it. That way I can also respond with "asked and answered" to those who are too lazy to read, have no reading comprehension, or are products of the current school system and only read the first sentence of any document before "Squirrel!" (My 6 year old German Shorthaired Pointer has better attention to detail than many people I know. Just saying...LOL)
13) Message boards : Science : How far has Collatz conjecture been computationally verified? (Message 3138)
Posted 7 Feb 2021 by Profile Slicker
Post:
The project started at the number it did because it picked up where the previous 3x+1@home project left off when it shut down.

Send me examples of CPU and GPU code which uses 256 bit integers that are still 4-6 times faster and I'll look into integrating it.
A little faster as a standalone app is meaningless, since the project's app does have to do internal verification (e.g. check the best result with two non-optimized algorithms that are known to be correct, to make sure the output matches).

There's also constantly checking whether to checkpoint, write out checkpoint data so that it can be suspended and start back up later, check to see if BOINC wants it to quit, track percentage complete, track time remaining to finish, and do error or overflow checking with error logging. It also has to run on everything from an nVidia 8400GS with 16 stream processors and 64MB RAM to the latest AMD and nVidia GPUs. It also has to be able to cross compile on all platforms, work on both 32 and 64 bit machines, and be compatible with at least one encryption library that is guaranteed to encrypt and decrypt the same on all client platforms and the server.
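
To give a feel for the bookkeeping listed above, here is a stripped-down Python sketch of a loop that checkpoints, tracks the fraction done, and stops cleanly when asked. It is not how the real apps are written: the checkpoint format, the function names, and the `should_quit` callback (standing in for whatever signal the BOINC client sends) are all invented for this example.

```python
import json
import os


def count_steps(n):
    """Plain 3x+1 and x/2 step count for a positive integer."""
    steps = 0
    while n != 1:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        steps += 1
    return steps


def save_checkpoint(path, state):
    """Write to a temp file first so a crash never leaves a half-written file."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)


def run_with_checkpoints(start, end, checkpoint_path,
                         checkpoint_every=1000, should_quit=lambda: False):
    """Sum step counts over [start, end), resuming from a checkpoint if present.

    Returns (total_steps, finished).
    """
    state = {"next": start, "total_steps": 0, "fraction_done": 0.0}
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            state = json.load(f)
    n, total = state["next"], state["total_steps"]
    while n < end:
        total += count_steps(n)
        n += 1
        quit_requested = should_quit()
        if quit_requested or (n - start) % checkpoint_every == 0:
            fraction = (n - start) / (end - start)
            save_checkpoint(checkpoint_path, {"next": n, "total_steps": total,
                                              "fraction_done": fraction})
            if quit_requested:
                return total, False
    save_checkpoint(checkpoint_path, {"next": n, "total_steps": total,
                                      "fraction_done": 1.0})
    return total, True
```

Even in this toy version, every iteration pays for a quit check and a checkpoint test that a standalone benchmark loop would not have.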
14) Message boards : Number crunching : Export Stats Files (Message 3119)
Posted 5 Feb 2021 by Profile Slicker
Post:
I've been really busy trying to stay in business. Revenues are down 70% this year and I have had to add thousands of lines of code and storage space to allow e-signatures for my customers since the government has changed the requirements for submitting reimbursement requests. So, life has been a bit depressing and I haven't had a lot of time for my hobbies lately. Hopefully, with the vaccines and things starting to get back to normal, things will improve. So, I apologize if I haven't been as attentive to the project as I should be. I'll try to do better.
15) Message boards : News : Back Online (Message 2960)
Posted 29 Oct 2020 by Profile Slicker
Post:
We had an extended outage today thanks to Xfinity/Comcast. On the up side, they finished their repair 30 minutes ahead of schedule, but it still took about 4 hours. I don't know if the high winds today had anything to do with it or not, but everything is working normally again.
16) Message boards : Number crunching : Optimizing the apps (Message 2730)
Posted 26 May 2020 by Profile Slicker
Post:
I have no idea what that means and haven't tried using a one instead.


The 980 config you posted says "verbose=0".

Was it a mistake or is there a reason to set verbose to 0?

I'd guess mistake, so maybe it should be corrected to 1?


It is a copy from the old website before Collatz switched to the current one, so no it wasn't a typo or mistake on my part.
I run 2 GTX980's and they are doing just fine with the zero.

My point was I have no clue what "verbose" even means in this context so a 1 or a zero wasn't tested by me any more than the settings for my GTX1080Ti gpu's were tested by me. Jon posted the original post and then people said changing this or that setting to this or that was faster so I copied the new settings and pasted them here.


Verbose just logs more information which helps with debugging if the apps aren't behaving as expected.
17) Message boards : Science : Open source (Message 2603)
Posted 4 May 2020 by Profile Slicker
Post:
I've never released the source code for this project and do not intend to ever do so. There were people who were cheating which required encrypting the data returned to the server. Releasing the source code would make it easier for people to cheat again.
18) Message boards : Number crunching : Help: Adjust CPU usage for GPU (Message 2445)
Posted 29 Mar 2020 by Profile Slicker
Post:
Yes, the app_config is the place to make the changes.
The name field contains the app name. Note that the app name doesn't include the platform or plan class. At present, collatz_sieve is the only app name. Just change the cpu_usage to 0.30. If you have already optimized the application for your GPU to get it close to 100%, then set the gpu_usage to 1.0.
<app_config>
  <app>
    <name>collatz_sieve</name>
    <gpu_versions>
      <gpu_usage>1.0</gpu_usage>
      <cpu_usage>0.3</cpu_usage>
    </gpu_versions>
  </app>
</app_config>
19) Message boards : News : Choose Wisely (Message 2443)
Posted 26 Mar 2020 by Profile Slicker
Post:
I would like to urge participants to switch to Rosetta@Home or Folding@Home (non-BOINC application but also distributed computing) to aid with Coronavirus research until a vaccine has been developed and tested. While it would be nice to solve a decades old mathematical conjecture, it would be better to do research to save lives. Please consider it. Please note that Folding@Home has both CPU and GPU applications.
20) Questions and Answers : Web site : How far has Collatz conjecture been computationally verified? (Message 1946)
Posted 25 Sep 2019 by Profile Slicker
Post:
The WU in progress with the highest number is 6,415,879,838,212,923,850,752. However, that does not mean all numbers up to that have been checked. Each WU checks approximately 53 trillion numbers, but they aren't necessarily completed in order. When people abort them or abandon the project, or they timeout or error out, they get resent to another computer. That sometimes happens half a dozen times before a WU gets completed. With 271K WUs in progress, it is impossible to give an exact answer.




©2021 Jon Sonntag; All rights reserved