Posts by Slicker

1) Message boards : News : Use at your own risk (Message 804)
Posted 17 hours ago by Slicker
Post:
The validate inconclusive result issue should be fixed now. For whatever reason, some Windows apps are not reporting the elapsed time correctly, and one of the validator rules checks that the WU runtime is reasonable. For example, if a WU finishes in only 3 seconds, there's no way it is valid. So rather than taking the elapsed time from the WU, the validator now relies on the run time reported by the BOINC client for the non-GPU Windows apps. There were 87 such results that have now been revalidated and granted credit.
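
The runtime rule described above might look roughly like the following. This is only a sketch, not the project's validator code: the function names and the 60-second threshold are made up for illustration, and the only details taken from the post are that a 3-second WU cannot be valid and that non-GPU Windows apps now use the client-reported run time.

```python
# Hypothetical threshold; the post only says that 3 seconds is far too short.
MIN_PLAUSIBLE_SECONDS = 60.0


def effective_runtime(app_elapsed, client_elapsed, is_windows, is_gpu):
    """Pick which elapsed time the runtime check should trust."""
    # Non-GPU Windows apps may report a bogus elapsed time,
    # so fall back to the run time reported by the BOINC client.
    if is_windows and not is_gpu:
        return client_elapsed
    return app_elapsed


def runtime_is_reasonable(app_elapsed, client_elapsed, is_windows, is_gpu,
                          min_seconds=MIN_PLAUSIBLE_SECONDS):
    """Return True when the trusted runtime is long enough to be believable."""
    runtime = effective_runtime(app_elapsed, client_elapsed, is_windows, is_gpu)
    return runtime >= min_seconds
```

With this sketch, a Windows CPU result whose app reported 3 seconds but whose client reported 5400 seconds passes the check, while the same numbers from any other app type would still be rejected.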
2) Message boards : Number crunching : Validation Pending? (Message 803)
Posted 17 hours ago by Slicker
Post:
The validate inconclusive result issue should be fixed now. For whatever reason, some Windows apps are not reporting the elapsed time correctly, and one of the validator rules checks that the WU runtime is reasonable. For example, if a WU finishes in only 3 seconds, there's no way it is valid. So rather than taking the elapsed time from the WU, the validator now relies on the run time reported by the BOINC client for the non-GPU Windows apps. There were 87 such results that have now been revalidated and granted credit.
3) Message boards : News : Use at your own risk (Message 735)
Posted 7 days ago by Slicker
Post:
4 more validator versions later and results are now being granted credit.

Now to figure out how to resubmit all the WUs for re-validation so they can get credit. MySQL runs out of memory if I submit them all at once, but submitting 1k at a time could take me weeks. In the meantime, others may turn in a valid result, negating the original even though it was valid.
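
Splitting the work into 1k batches is simple to sketch. The post does not say how the results are actually resubmitted, so this only shows the batching part, with a made-up list of result ids standing in for the real ones.

```python
def batches(ids, size=1000):
    """Yield consecutive slices of at most `size` ids."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


# Hypothetical ids; the real ones would come from the project database.
result_ids = list(range(1, 2501))
batch_sizes = [len(batch) for batch in batches(result_ids)]
```

Each batch could then be handed to whatever re-validation step is used, one at a time, so that no single submission is large enough to exhaust memory.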
4) Message boards : News : Use at your own risk (Message 723)
Posted 8 days ago by Slicker
Post:
I'm still trying to figure out why the WUs are getting marked as invalid when they are not.... I'm tempted to go back to the old validator source code, since that worked OK, and add in the new encryption code. In the meantime, feel free to suspend or switch to other projects until I get this figured out.
5) Message boards : News : Use at your own risk (Message 600)
Posted 14 days ago by Slicker
Post:
The issue is that someone is trying to hack the output and their crap is causing the validator to crash. I'm trying to add code to filter their invalid results so it will stop crashing on every bad WU returned.
6) Message boards : News : Use at your own risk (Message 560)
Posted 22 days ago by Slicker
Post:
There's no joy in Mudville!

The previous fix for the CPU credits wasn't working. Or rather, it didn't fix (and by fix I mean remove) the stupid BOINC code that assumes all projects use floating point arithmetic. So, I commented out several hundred more lines of creditnew madness in both the validator.cpp and credit.cpp BOINC source code and then recompiled the server daemons.

I also manually changed the credits for the 204 WUs that were valid but not granted any credit.

If you run into a problem, please provide the host id and result id as it makes it a lot easier to track down the problems. (Thanks, Conan, for providing that info which led to this latest fix.)
7) Message boards : News : Use at your own risk (Message 559)
Posted 22 days ago by Slicker
Post:
> CUDA: NVIDIA GPU 0: GeForce GTX 1050 (driver version 390.48, CUDA version 9.1, compute capability 6.1, 1997MB, 1698MB available, 1862 GFLOPS peak)

Collatz only has OpenCL apps, not CUDA, and from the description above, there's no OpenCL installed. Check out https://wiki.tiker.net/OpenCLHowTo
8) Message boards : Number crunching : Constantly no credits on Sieve v1.40 work units (Message 544)
Posted 24 days ago by Slicker
Post:
Fixed? Not fixed?
9) Message boards : News : Use at your own risk (Message 543)
Posted 24 days ago by Slicker
Post:
> You think!


Yep, I do! Thanks for being so patient (cough, cough). ;-)
10) Message boards : News : Use at your own risk (Message 542)
Posted 24 days ago by Slicker
Post:
I added the opencl_ati_gpu plan class specifications for both i686 and x64 versions for Linux just in case the ati_opencl was the cause. In theory, any project can make up any plan class they want. I'm not so sure that works in reality. I had the plan classes listed as opencl_amd and those weren't working for Windows apps. So, let me know if that solves the issue. If not, send me a private message with the host id so I can set the BOINC scheduler to log debug information. Then after you do an update, I can check the server log.

But, I hope the plan class change will fix the issue. The previous server version I was using didn't use the plan_class_spec.xml file and I just coded the plan class info in C++. This is supposed to be easier and not require coding, but it sure seems like it's more work! (That, or I code faster than I write valid XML.)
11) Message boards : News : Use at your own risk (Message 526)
Posted 19 Jun 2018 by Slicker
Post:
> 05/06/2018 18:42:54 | collatz | Tasks for CPU are available, but your preferences are set to not accept them
> 05/06/2018 18:42:54 | collatz | Tasks for NVIDIA GPU are available, but your preferences are set to not accept them
> 05/06/2018 18:42:54 | collatz | Tasks for Intel GPU are available, but your preferences are set to not accept them
> 05/06/2018 18:42:54 | collatz | New computer location: work

> Notice the work location is for the PC with the ATI 7790 GPU.
> The projects\boinc.thesonntags.com_collatz dir continues to stay empty.


Enable sched_op_debug to get more detailed info. You will get more detail about it requesting work which may explain why it isn't getting work.

Have you installed the AMD OpenCL drivers? The ones installed by Windows will most likely be missing the OpenCL drivers.

If you still can't figure it out, what is the Host ID? (I'm not going to waste time wading through hidden hosts and searching the log files for every computer you own.)
12) Message boards : News : Use at your own risk (Message 525)
Posted 19 Jun 2018 by Slicker
Post:
I've had to comment out a bunch more of the creditnew code. The validation logic is utterly stupid when a project doesn't use FLOPS, since all estimates (and also some of the work fetch logic and validation logic) compare FLOPS measured on the device to estimated FLOPS for the workunit. When a project uses IOPS instead of FLOPS, those estimates can be off by an order of magnitude. When that happens, BOINC thinks you are cheating (an outlier) so it won't grant credit even if the result was valid. Yes, it was logging that it was valid, then granting 0.0 credit and then changing the state back to inconclusive. I guess logic works differently on the West Coast than on the rest of the planet, as I would expect that a valid result _should_ actually get credit. I commented out all the "outlier" code, so it should work now. Let me know if it doesn't.
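
To see how an estimate that is off by an order of magnitude can trip an outlier test, here is a deliberately simplified sketch. It is not BOINC's creditnew code: the function, the factor-of-10 tolerance, and the example numbers are all invented for illustration.

```python
def looks_like_outlier(estimated_ops, measured_ops_per_sec, actual_seconds,
                       tolerance=10.0):
    """Flag a result whose runtime is far from what the estimate predicts.

    `tolerance` is a made-up factor, not a BOINC constant.
    """
    expected_seconds = estimated_ops / measured_ops_per_sec
    ratio = actual_seconds / expected_seconds
    return ratio > tolerance or ratio < 1.0 / tolerance


# A workunit estimated at 1e15 operations on a device benchmarked at
# 1e10 floating point operations per second is "expected" to take 1e5 s.
# If the integer code really finishes in 5,000 s, the ratio is 0.05.
flagged = looks_like_outlier(1e15, 1e10, 5_000.0)
not_flagged = looks_like_outlier(1e15, 1e10, 80_000.0)
```

In this toy version the first result is flagged even though nothing is wrong with it; the benchmark simply measured a different kind of arithmetic than the app performs.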

Credit has been granted to as many valid yet inconclusive results as I could find. It was a royal PITA to find the issue, identify the improper logic (especially since it is in code that they don't want projects to change), and then search the validator log files containing the 60k results returned per day to find the 20 (that's 20, not 20k) that came from CPUs on average per day so I could manually grant the credit to them. I could only go back a few weeks since I don't archive the log files and once someone else completed the workunit, it is no longer on the server and the only history stored is about the person whose data was valid.
13) Message boards : Number crunching : Team creation (Message 511)
Posted 14 Jun 2018 by Slicker
Post:
I don't know if team.inc (a php include file) changed or whether the BOINC developers screwed up the creation of the database, but the make_team function in team.inc was failing due to 4 fields missing in the insert statement. After changing the database to add default values for total_credit, expavg_credit, expavg_time, and seti_id, it now works.
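
The failure mode is easy to reproduce in miniature. The sketch below uses an in-memory SQLite database as a stand-in for MySQL, with a cut-down table that has only a name plus the four fields mentioned above; the column types and the table names are assumptions, and the real BOINC team table has many more columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Without defaults: every omitted NOT NULL column makes the insert fail.
conn.execute("""
    CREATE TABLE team_strict (
        name TEXT NOT NULL,
        total_credit REAL NOT NULL,
        expavg_credit REAL NOT NULL,
        expavg_time REAL NOT NULL,
        seti_id INTEGER NOT NULL
    )""")

# With defaults: the same insert succeeds and the columns are zero-filled.
conn.execute("""
    CREATE TABLE team_defaults (
        name TEXT NOT NULL,
        total_credit REAL NOT NULL DEFAULT 0,
        expavg_credit REAL NOT NULL DEFAULT 0,
        expavg_time REAL NOT NULL DEFAULT 0,
        seti_id INTEGER NOT NULL DEFAULT 0
    )""")


def try_insert(table):
    """Insert a team naming only the `name` column; report success."""
    try:
        conn.execute(f"INSERT INTO {table} (name) VALUES (?)", ("Example Team",))
        return True
    except sqlite3.IntegrityError:
        return False


strict_ok = try_insert("team_strict")
defaults_ok = try_insert("team_defaults")
```

Adding defaults on the database side, as described above, fixes the insert without having to touch the PHP code.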
14) Message boards : Number crunching : Resource allocation (Message 425)
Posted 23 May 2018 by Slicker
Post:
> Still: I set the resource share to 100 in the Collatz project preferences webpage, and can't get anything other than 30 in the BOINC Manager overview for Collatz


And did you set all other projects to 0 so that you don't get work from them? If you have three projects and all set at 100, then Collatz will only get 1/3 of the resource share.
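
The arithmetic is just each project's share divided by the sum of all shares. A quick sketch, with hypothetical project names other than Collatz:

```python
def share_fractions(shares):
    """Map each project to its fraction of the total resource share."""
    total = sum(shares.values())
    if total == 0:
        return {name: 0.0 for name in shares}
    return {name: value / total for name, value in shares.items()}


# Three projects all set to 100: each gets one third.
equal = share_fractions({"collatz": 100, "project_b": 100, "project_c": 100})

# Other projects set to 0: Collatz gets everything.
collatz_only = share_fractions({"collatz": 100, "project_b": 0, "project_c": 0})
```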

Have you detached from the collatz project and then re-attached? It won't download work if you haven't done that since the server was replaced.

Turn on sched_op_debug in the options so you can actually see what BOINC is really doing when it requests work. The generic message is worthless for debugging issues.
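
Besides the Manager's event log options, the flag can be set in the client's cc_config.xml under the log_flags section. The helper below is a small sketch that takes the current file contents as a string and returns them with sched_op_debug turned on; the function name is made up, and reading and writing the file in your BOINC data directory is left out on purpose.

```python
import xml.etree.ElementTree as ET


def enable_sched_op_debug(xml_text="<cc_config/>"):
    """Return cc_config XML text with <sched_op_debug>1</sched_op_debug> set."""
    root = ET.fromstring(xml_text)
    log_flags = root.find("log_flags")
    if log_flags is None:
        log_flags = ET.SubElement(root, "log_flags")
    flag = log_flags.find("sched_op_debug")
    if flag is None:
        flag = ET.SubElement(log_flags, "sched_op_debug")
    flag.text = "1"
    return ET.tostring(root, encoding="unicode")


updated = enable_sched_op_debug()
```

After saving the result as cc_config.xml, have the client re-read its config files (or restart it) and then do a project update so the extra scheduler messages show up in the event log.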
15) Questions and Answers : Macintosh : Mac CPU task : too long to be true ? (Message 396)
Posted 17 May 2018 by Slicker
Post:
There's not much I can do to fix the estimates as BOINC incorrectly assumes that the project does floating point math. Once you finish a couple of work units, the estimates should improve. I already set the "estimate is exact" flag on the application but it doesn't seem to have helped much.

I'll double check that the app is compiled with the correct optimization flags and that the symbols are stripped out.
16) Message boards : Number crunching : Resource allocation (Message 390)
Posted 17 May 2018 by Slicker
Post:
see https://boinc.berkeley.edu/dev/forum_thread.php?id=8257

and also

http://boinc.berkeley.edu/wiki/REC-based_scheduler
17) Message boards : News : Use at your own risk (Message 389)
Posted 17 May 2018 by Slicker
Post:
> None of the WUs should ever end up as inconclusive because they are either valid or not. The validation is done within the WU, e.g. the CPU WUs double check every new "high" using a separate algorithm and if they don't agree, it fails. If they do, it should validate. There shouldn't be an "inconclusive". I'm going to turn off the file deleter so that once I figure out what is going on I can re-validate the tasks so you should get credit.


> Thanks for this Slicker,

> Just had another WU that had validated (like my 1st one), disappear from my account list (just like my 1st one). This time it was on a Linux machine and ran for about 420,000 seconds.
> Both were awarded Zero credit.
> Have one still there (an inconclusive) and another still running.

> I am glad that you are letting me post as I still have Zero RAC, so thanks for that.

> Thanks for all your hard work.

> Conan


The issue was because BOINC, being stupid as usual, was rejecting the WUs because it didn't like the FLOPS count. LOL. There are no FLOPS in Collatz. Only integer calculations. So, I edited credit.cpp and commented out all the stupid code and re-validated all inconclusive WUs.
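
The idea of double checking a value with a separate algorithm inside the WU can be illustrated like this. The post does not say which two algorithms the apps use, so both functions below are stand-ins: one counts Collatz steps the plain way, the other folds each odd step together with the halving that always follows it.

```python
def steps_simple(n):
    """Count Collatz steps to reach 1, one operation at a time."""
    steps = 0
    while n != 1:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        steps += 1
    return steps


def steps_shortcut(n):
    """Same count, but an odd step and the halving after it are done together."""
    steps = 0
    while n != 1:
        if n % 2 == 0:
            n //= 2
            steps += 1
        else:
            n = (3 * n + 1) // 2  # 3n + 1 is always even when n is odd
            steps += 2
    return steps


def verified_steps(n):
    """Return the step count only if both methods agree; otherwise fail."""
    first, second = steps_simple(n), steps_shortcut(n)
    if first != second:
        raise ValueError(f"step counts disagree for {n}: {first} vs {second}")
    return first
```

Because the check either passes or raises, a result produced this way is either valid or invalid, which is why an "inconclusive" state should not be possible.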
18) Message boards : News : Use at your own risk (Message 372)
Posted 15 May 2018 by Slicker
Post:
> Has it ever been a case where the validator has required a certain number of samples to get a pattern before granting credit to everyone?


Yes, that's the way the project started. But some hosts trash over 1000 WUs a day and their owners aren't smart enough to check them. With so many failures, it could take months to get credit for a WU, which also meant the WU stayed in the database for months, increasing its size and causing performance issues. MySQL works best when the entire database fits in RAM, and since very little data is re-used in BOINC, the cache hits aren't the greatest, so there's a lot of disk I/O if it doesn't fit in RAM. That, and the only way for it to work is for me to hard code all the parameters that you have in the config file since changing the sieve size. That would mean de-optimizing it so that it can run on the oldest and slowest GPU. That would be horrible for the new GPUs. They'd go from 99% utilization to 20% utilization with credit reduction to match.
19) Message boards : Number crunching : Optimizing the apps (Message 368)
Posted 14 May 2018 by Slicker
Post:
One way to check the speed on various settings without having to run the entire WU is to:

1. Copy the app to a temp folder.
2. Copy the collatz config file to the temp folder but rename it to collatz.config
3. Copy a collatz WU file to the temp folder and rename it to in.txt
4. Run the WU for 15 minutes.
5. Copy stderr.txt to stderr_test_N.txt, changing N to a new number each time
6. Delete the boinc_lockfile
7. Delete the out.txt (probably won't exist unless the WU finished)
8. Delete the checkpoint.txt file
9. Delete the stderr.txt file
10. Edit the config and try new settings
11. Go back to step 4
12. Compare the new stderr to the previous one and see which reports numbers in less time, e.g. 1234567890 - 123 steps @ 1:03 vs 1234567890 - 123 steps @ 0:57

For GPU apps, you will also need to have an init_data.xml file in the temp folder to tell it which GPU type and number to use. You can copy one from https://github.com/BOINC/boinc/tree/master/samples/openclapp/INIT_DATA%20test%20files

Note that when changing the sieve size, it creates a new sieve file which will be re-used on subsequent runs so the time will be reduced by 1-2 seconds on subsequent tests with the same sieve size.
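
Steps 5 through 9 are the repetitive part and can be scripted. The helper below is a sketch using the file names from the list above; the function name and the idea of passing the run number as an argument are just for illustration.

```python
from pathlib import Path
import shutil

# Files to clear between test runs (steps 6 through 9).
STALE_FILES = ("boinc_lockfile", "out.txt", "checkpoint.txt", "stderr.txt")


def archive_and_reset(folder, run_number):
    """Keep stderr.txt as stderr_test_N.txt, then delete the per-run files.

    Returns the path of the archived copy, or None if there was no stderr.txt.
    """
    folder = Path(folder)
    stderr = folder / "stderr.txt"
    archived = None
    if stderr.exists():
        archived = folder / f"stderr_test_{run_number}.txt"
        shutil.copy2(stderr, archived)
    for name in STALE_FILES:
        # missing_ok needs Python 3.8+; out.txt often will not exist.
        (folder / name).unlink(missing_ok=True)
    return archived
```

Call it after each 15 minute run with the temp folder and the next run number, then edit the config and start the app again.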
20) Message boards : News : Use at your own risk (Message 366)
Posted 14 May 2018 by Slicker
Post:
> Now all is good! Thanks.

> Barry


I re-ran the update versions, which inserts the server records required for the scheduler to send the work. I'm not 100% sure why it got screwed up but I think it had to do with the opencl_nvidia vs opencl_nvidia_gpu plan class stuff that happened last Thursday. Once again, more people weighing in on what might be wrong helps me get it back on track faster. Thanks guys! I also found a bug in the BOINC error reporting from this so I'll be sure to forward that to the BOINC developers as well.




©2018 Jon Sonntag; All rights reserved