Posts by Slicker

41) Message boards : Number crunching : Team creation (Message 511)
Posted 14 Jun 2018 by Profile Slicker
Post:
I don't know if team.inc (a php include file) changed or whether the BOINC developers screwed up the creation of the database, but the make_team function in team.inc was failing due to 4 fields missing in the insert statement. After changing the database to add default values for total_credit, expavg_credit, expavg_time, and seti_id, it now works.
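A minimal sketch of why adding default values fixes an insert that leaves fields out. It assumes a heavily simplified, hypothetical `team` table and uses SQLite from Python's standard library for illustration; the real BOINC database runs on MySQL, has many more columns, and its behavior depends on the server's SQL mode.

```python
import sqlite3

# Hypothetical, simplified "team" table. Only the four fields named in the
# post are modeled; the real BOINC schema is larger and lives in MySQL.
SCHEMA_WITHOUT_DEFAULTS = """
CREATE TABLE team (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    total_credit REAL NOT NULL,
    expavg_credit REAL NOT NULL,
    expavg_time REAL NOT NULL,
    seti_id INTEGER NOT NULL
)
"""

SCHEMA_WITH_DEFAULTS = """
CREATE TABLE team (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    total_credit REAL NOT NULL DEFAULT 0,
    expavg_credit REAL NOT NULL DEFAULT 0,
    expavg_time REAL NOT NULL DEFAULT 0,
    seti_id INTEGER NOT NULL DEFAULT 0
)
"""


def try_insert(schema):
    """Create the table, then insert a row that omits the four fields.

    Returns True if the insert succeeds, False if the database rejects it.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(schema)
        try:
            conn.execute("INSERT INTO team (name) VALUES (?)", ("Test team",))
        except sqlite3.IntegrityError:
            return False
        return True
    finally:
        conn.close()
```

With this sketch, `try_insert(SCHEMA_WITHOUT_DEFAULTS)` is rejected and `try_insert(SCHEMA_WITH_DEFAULTS)` goes through, which mirrors the fix described above.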
42) Message boards : Number crunching : Resource allocation (Message 425)
Posted 23 May 2018 by Profile Slicker
Post:
Still: I set the resource share to 100 in Collatz project preference webpage, and can't get anything else than 30 in the boinc manager overview for Collatz


And did you set all other projects to 0 so that you don't get work from them? If you have three projects and all set at 100, then Collatz will only get 1/3 of the resource share.
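The arithmetic behind that 1/3 can be sketched as follows. This only shows the proportional split implied by the resource share settings; the actual BOINC scheduler takes more into account, and the project names other than Collatz are made up for the example.

```python
def share_fractions(shares):
    """Return each project's fraction of the total resource share.

    `shares` maps a project name to its resource share setting.
    """
    total = sum(shares.values())
    if total == 0:
        # Nothing has a share, so nothing gets a fraction.
        return {name: 0.0 for name in shares}
    return {name: value / total for name, value in shares.items()}


# Three projects all set to 100: each one gets a third.
equal = share_fractions({"collatz": 100, "project_b": 100, "project_c": 100})

# The other projects set to 0: Collatz gets everything.
only_collatz = share_fractions({"collatz": 100, "project_b": 0, "project_c": 0})
```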

Have you detached from the collatz project and then re-attached? It won't download work if you haven't done that since the server was replaced.

Turn on sched_op_debug in the options so you can actually see what BOINC is really doing when it requests work. The generic message is worthless for debugging issues.
43) Questions and Answers : Macintosh : Mac CPU task : too long to be true ? (Message 396)
Posted 17 May 2018 by Profile Slicker
Post:
There's not much I can do to fix the estimates as BOINC assumes incorrectly that the project does floating point math. Once you finish a couple work units, the estimates should improve. I already set the "estimate is exact" flag on the application but it doesn't seem to have helped much.

I'll double check that the app is compiled with the correct optimization flags and that the symbols are stripped out.
44) Message boards : Number crunching : Resource allocation (Message 390)
Posted 17 May 2018 by Profile Slicker
Post:
see https://boinc.berkeley.edu/dev/forum_thread.php?id=8257

and also

http://boinc.berkeley.edu/wiki/REC-based_scheduler
45) Message boards : News : Use at your own risk (Message 389)
Posted 17 May 2018 by Profile Slicker
Post:
None of the WUs should ever end up as inconclusive because they are either valid or not. The validation is done within the WU. e.g. the CPU WUs double check every new "high" using a separate algorithm and if they don't agree, it fails. If they do, it should validate. There shouldn't be an "inconclusive". I'm going to turn off the file deleter so that once I figure out what is going on I can re-validate the tasks so you should get credit.


Thanks for this Slicker,

Just had another WU that had validated (like my 1st one), disappear from my account list (just like my 1st one). This time it was on a Linux machine and ran for about 420,000 seconds.
Both were awarded Zero credit.
Have one still there (an inconclusive) and another still running.

I am glad that you are letting me post as I still have Zero RAC, so thanks for that.

Thanks for all your hard work.

Conan


The issue was because BOINC, being stupid as usual, was rejecting the WUs because it didn't like the FLOPS count. LOL. There are no FLOPS in Collatz. Only integer calculations. So, I edited credit.cpp and commented out all the stupid code and re-validated all inconclusive WUs.
46) Message boards : News : Use at your own risk (Message 372)
Posted 15 May 2018 by Profile Slicker
Post:
Has it ever been a case where the validator has required a certain number of samples to get a pattern before granting credit to everyone?


Yes, that's the way the project started. But some hosts trash over 1000 WUs a day and their owners aren't smart enough to check them. With so many failures, it could take months to get credit for a WU, which also meant the WU stayed in the database for months, which increased its size and caused performance issues. MySQL works best when the entire database fits in RAM, and since very little data is re-used in BOINC, the cache hits aren't the greatest, so there's a lot of disk I/O if it doesn't fit in RAM. That, and the only way for it to work is for me to hard code all the parameters that you have in the config file since changing the sieve size. That would mean de-optimizing it so that it can run on the oldest and slowest GPU. That would be horrible for the new GPUs. They'd go from 99% utilization to 20% utilization with credit reduction to match.
47) Message boards : Number crunching : Optimizing the apps (Message 368)
Posted 14 May 2018 by Profile Slicker
Post:
One way to check the speed on various settings without having to run the entire WU is to:

1. Copy the app to a temp folder.
2. Copy the collatz config file to the temp folder but rename it to collatz.config
3. Copy a collatz WU file to the temp folder and rename it to in.txt
4. Run the WU for 15 minutes.
5. Copy stderr.txt to stderr_test_N.txt, changing N to a new number each time.
6. Delete the boinc_lockfile.
7. Delete the out.txt (probably won't exist unless the WU finished).
8. Delete the checkpoint.txt file.
9. Delete the stderr.txt file.
10. Edit the config and try new settings.
11. Go back to step 4.
12. Compare the new stderr to the previous one and see which reports numbers in less time, e.g. 1234567890 - 123 steps @ 1:03 vs 1234567890 - 123 steps @ 0:57
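The comparison in step 12 can be sketched in Python. This assumes each relevant stderr line looks exactly like the example above (`number - N steps @ m:ss`); the real stderr output may contain other lines or a different time format, so treat the pattern as an assumption to adjust.

```python
import re

# Assumed line format, taken from the example in step 12:
#   1234567890 - 123 steps @ 1:03
LINE_RE = re.compile(r"^(\d+) - (\d+) steps @ (\d+):(\d{2})$")


def parse_timings(text):
    """Map (number, steps) to the elapsed time in seconds for each matching line."""
    timings = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            number, steps, minutes, seconds = match.groups()
            timings[(int(number), int(steps))] = int(minutes) * 60 + int(seconds)
    return timings


def compare_runs(old_text, new_text):
    """Return new-minus-old seconds for every entry present in both runs.

    A negative value means the new settings reached that number sooner.
    """
    old = parse_timings(old_text)
    new = parse_timings(new_text)
    return {key: new[key] - old[key] for key in old.keys() & new.keys()}
```

To use it, read two of the saved stderr_test_N.txt files and pass their contents to `compare_runs`; entries that appear in only one of the files are ignored.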

For GPU apps, you will also need to have an init_data.xml file in the temp folder to tell it which GPU type and number to use. You can copy one from https://github.com/BOINC/boinc/tree/master/samples/openclapp/INIT_DATA%20test%20files

Note that when changing the sieve size, it creates a new sieve file which will be re-used on subsequent runs so the time will be reduced by 1-2 seconds on subsequent tests with the same sieve size.
48) Message boards : News : Use at your own risk (Message 366)
Posted 14 May 2018 by Profile Slicker
Post:
Now all is good! Thanks.

Barry


I re-ran the update versions which inserts the server records required for the scheduler to send the work. I'm not 100% sure why it got screwed up but I think it had to do with opencl_nvidia vs opencl_nvidia_gpu plan class stuff that happened last Thursday. Once again, more people weighing in on what might be wrong helps me get it back on track faster. Thanks guys! I also found a bug in the BOINC error reporting from this so I'll be sure to forward that to the BOINC developers as well.
49) Message boards : News : Use at your own risk (Message 365)
Posted 14 May 2018 by Profile Slicker
Post:
None of the WUs should ever end up as inconclusive because they are either valid or not. The validation is done within the WU. e.g. the CPU WUs double check every new "high" using a separate algorithm and if they don't agree, it fails. If they do, it should validate. There shouldn't be an "inconclusive". I'm going to turn off the file deleter so that once I figure out what is going on I can re-validate the tasks so you should get credit.
50) Message boards : News : Use at your own risk (Message 333)
Posted 13 May 2018 by Profile Slicker
Post:
OK guys, you have convinced me that something changed and it may not be on your end. At present, I don't think it was on mine either since I didn't upgrade the BOINC server code or anything else on the server since before Friday. So.... I now need to check if there are any Linux automatic security updates that may have screwed it up and/or anything else. As I said, I've never experienced a "code signing:" error before so I'm not sure what to do to fix it.

I am working on both 1.4 CPU and 1.4 CPU w/ intrinsic apps to reduce any cpu errors on older machines. Results are looking good, but that doesn't help those with GPU issues.
51) Message boards : News : Use at your own risk (Message 328)
Posted 13 May 2018 by Profile Slicker
Post:
the issue at the moment as far as I can tell is simply that no nvidia work units are being produced.


There's actually no such thing. The WUs can be sent to any platform (PC, Mac, Linux) and any plan class (cpu, nvidia gpu, amd gpu (a.k.a. ati) or intel gpu). Note that there are no CUDA workunits as there are no CUDA apps. Only OpenCL. The ATI, Intel, and nVidia apps are the exact same app (just renamed to allow for separate collatz config files).

I've not seen the "old code signing key" error before so I'm not sure how one goes about fixing it other than to disconnect and then re-connect to the project which should get all new project info including any code signing keys.
52) Message boards : News : Use at your own risk (Message 327)
Posted 13 May 2018 by Profile Slicker
Post:
12.05.2018 23:43:09 | | OpenCL: NVIDIA GPU 0: GeForce GTX 1060 3GB (driver version 391.35, device version OpenCL 1.2 CUDA, 3072MB, 2487MB available, 4111 GFLOPS peak)

12.05.2018 23:36:25 | collatz | update requested by user
12.05.2018 23:36:28 | collatz | sched RPC pending: Requested by user
12.05.2018 23:36:28 | collatz | [sched_op] Starting scheduler request
12.05.2018 23:36:28 | collatz | Sending scheduler request: Requested by user.
12.05.2018 23:36:28 | collatz | Requesting new tasks for NVIDIA GPU
12.05.2018 23:36:28 | collatz | [sched_op] CPU work request: 0.00 seconds; 0.00 devices
12.05.2018 23:36:28 | collatz | [sched_op] Miner ASIC work request: 0.00 seconds; 0.00 devices
12.05.2018 23:36:28 | collatz | [sched_op] NVIDIA GPU work request: 82632.46 seconds; 1.00 devices
12.05.2018 23:36:28 | collatz | [sched_op] Intel GPU work request: 0.00 seconds; 0.00 devices
12.05.2018 23:36:29 | collatz | Scheduler request completed: got 0 new tasks
12.05.2018 23:36:29 | collatz | [sched_op] Server version 711
12.05.2018 23:36:29 | collatz | Project requested delay of 121 seconds
12.05.2018 23:36:29 | collatz | [sched_op] Deferring communication for 00:02:01
12.05.2018 23:36:29 | collatz | [sched_op] Reason: requested by project

I don't think it's a driver issue, as all nVidia GPU users report they are unable to receive work.

Regards
Senilix


The scheduler log is reporting: "received old code sign key"
Per the BOINC source code:
// if the client has an old code sign public key,
// send it the new one, with a signature based on the old one.
// If they don't have a code sign key, send them one.
// Return false if they have a key we don't recognize
// (in which case we won't send them work).
//

I think the latter is true since it isn't sending any work. Try disconnecting and then re-joining the project and see if that works. That should cause it to get all new info including any code signing keys.
53) Message boards : News : Use at your own risk (Message 321)
Posted 12 May 2018 by Profile Slicker
Post:
Same problem on a Windows box here.

11/05/2018 09:56:06 | collatz | Sending scheduler request: To fetch work.
11/05/2018 09:56:06 | collatz | Requesting new tasks for NVIDIA GPU
11/05/2018 09:56:08 | collatz | Scheduler request completed: got 0 new tasks

Updated the drivers
11/05/2018 08:59:55 | | CUDA: NVIDIA GPU 0: GeForce GTX 1080 Ti (driver version 391.35, CUDA version 9.1, compute capability 6.1, 4096MB, 3550MB available, 11974 GFLOPS peak)

11/05/2018 09:28:29 | | CUDA: NVIDIA GPU 0: GeForce GTX 1080 Ti (driver version 397.64, CUDA version 9.2, compute capability 6.1, 4096MB, 3550MB available, 11974 GFLOPS peak)


Other project's GPU tasks -- PrimeGrid & Amicable Numbers -- seem to be working OK.

Can we get the test app checkbox back? :)


No OpenCL version is listed in your post. It appears to still be a driver issue.

Enable sched_op_debug in the BOINC client options if you really want to know what BOINC is doing. Without it, the BOINC client tells you what you want to hear, not what it is actually doing. It may be asking for 0 seconds of work.
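Once sched_op_debug is on, the requested amounts can be pulled out of the event log with a short Python sketch like the one below. The pattern is based only on the `[sched_op] ... work request` lines quoted in these threads; other BOINC client versions may word the lines differently.

```python
import re

# Matches lines such as:
#   ... | collatz | [sched_op] NVIDIA GPU work request: 82632.46 seconds; 1.00 devices
REQUEST_RE = re.compile(
    r"\[sched_op\] (?P<resource>.+?) work request: "
    r"(?P<seconds>\d+(?:\.\d+)?) seconds; (?P<devices>\d+(?:\.\d+)?) devices"
)


def work_requests(log_text):
    """Map each resource name (CPU, NVIDIA GPU, ...) to the seconds requested."""
    requests = {}
    for line in log_text.splitlines():
        match = REQUEST_RE.search(line)
        if match:
            requests[match.group("resource")] = float(match.group("seconds"))
    return requests


def asked_for_nothing(log_text):
    """True if sched_op lines were found and every one of them requested 0 seconds."""
    requests = work_requests(log_text)
    return bool(requests) and all(seconds == 0.0 for seconds in requests.values())
```

If `asked_for_nothing` returns True for a scheduler request, the client was not actually asking for work, so "got 0 new tasks" is the expected answer.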
54) Message boards : News : New Windows 64-bit version (Message 320)
Posted 12 May 2018 by Profile Slicker
Post:
A new version (1.40) of the Windows 64-bit CPU application has been released.
55) Message boards : News : Linux Apps Avaliable (Message 293)
Posted 10 May 2018 by Profile Slicker
Post:
... it will be a few hours of work to dot the i's and cross the t's but I'll get it fixed ASAP.


Do 10 Mai 2018 09:11:29 CEST | collatz | Requesting new tasks for NVIDIA GPU
Do 10 Mai 2018 09:11:29 CEST | collatz | [sched_op] CPU work request: 0.00 seconds; 0.00 devices
Do 10 Mai 2018 09:11:29 CEST | collatz | [sched_op] NVIDIA GPU work request: 87940.65 seconds; 1.00 devices
Do 10 Mai 2018 09:11:30 CEST | collatz | Scheduler request completed: got 0 new tasks


No success.
Did you finish the changes?
If the app page shows recent numbers, nobody got a linux task so far.


What about the "_gpu" at the end of the app name?
Does it correspond to "<plan_class>opencl_nvidia_gpu</plan_class>"?

If you kept your naming convention
Old app: collatz_sieve_1.21_x86_64-pc-linux-gnu__opencl_nvidia_gpu
New app (expected): collatz_sieve_1.40_x86_64-pc-linux-gnu__opencl_nvidia_gpu
app page (expected): 1.40 (opencl_nvidia_gpu)

I'm not sure if this "_gpu" tells the scheduler to decide between cpu and gpu apps and answer a request accordingly.


I implemented the plan_class_spec.xml (see https://boinc.berkeley.edu/trac/wiki/AppPlanSpec#FieldsforGPUapps) with every combination of names (opencl_ati, opencl_amd, ati_opencl, amd_opencl, etc.) although prior to that, WUs were being returned successfully for all apps so people were getting work.
56) Message boards : Number crunching : Optimizing the apps (Message 286)
Posted 10 May 2018 by Profile Slicker
Post:
Thanks for the help. I have tried this config file and while I can see no real sign of increased performance, nothing has crashed yet either.


Did you tell boinc manager to re-read the config files to pick up the changes?


Yes I did. I also just tried bumping up the lut_size to 17 and no apparent change in performance, but no crashes yet either (knock on wood!)


FYI, bumping up the lut size on GPUs can slow down the processing. The goal is to find the lut size and cache that fit within the GPU's cached RAM. Anything larger is slower and anything smaller is slower. For example, on my laptop's nVidia 970M, while it can do 1024 threads, it works best at 256 with a lut of 12 and a sieve of 28 or 29. It doesn't have the oomph to support lut 14 or sieve 31 without swapping memory and/or overheating such that it throttles itself back to a slower speed.
57) Message boards : News : Linux Apps Avaliable (Message 285)
Posted 10 May 2018 by Profile Slicker
Post:
Can't get work for my GPUs (NVIDIA and ATI) running on linux.
What makes me wonder is that the app page shows different naming patterns for windows and linux apps:

windows: (opencl_ati_gpu), (opencl_intel_gpu), (opencl_nvidia_gpu)
linux: (ati_opencl), (intel_opencl), (nvidia_opencl)

According to my backups, the old linux apps had the same naming patterns than the windows apps.


Thank you so much for pointing out the discrepancy. I was pulling my hair out trying to find the issue when it all looked OK. Given the way BOINC sets up the app names in folders with xml files, it will be a few hours of work to dot the i's and cross the t's but I'll get it fixed ASAP. Thanks again.
58) Message boards : News : Linux Apps Avaliable (Message 270)
Posted 8 May 2018 by Profile Slicker
Post:
Thank you Slicker!

However, my Linux doesn't download anything from collatz.
Tue 08 May 2018 11:53:03 AM JST | collatz | Sending scheduler request: To fetch work.
Tue 08 May 2018 11:53:03 AM JST | collatz | Requesting new tasks for CPU and NVIDIA GPU
Tue 08 May 2018 11:53:05 AM JST | collatz | Scheduler request completed: got 0 new tasks
Tue 08 May 2018 11:53:05 AM JST | collatz | No tasks sent


This repeats every two minutes. Its project directory is created (/var/lib/boinc/projects/boinc.thesonntags.com_collatz), but it contains nothing.

Why?


See Number Crunching FAQ.
59) Message boards : Number crunching : FAQ (Message 263)
Posted 7 May 2018 by Profile Slicker
Post:
Q: How come I'm not getting any work?
A: Your computer may already have enough work. Just because the boinc log says it requested work, it may have requested 0 seconds. The ONLY way to see what it really asked for is to enable sched_op_debug in Boinc Manager via the Options, Event Log Options screen.

If you want work for your GPU you need to have OpenCL drivers installed. The Windows drivers installed automatically by Microsoft may not contain the required OpenCL files. Try installing the version from the AMD, nVidia, or Intel web sites.

Check what preferences you have set for the Collatz project via the web site. You won't get work if you don't have it enabled.

Lastly, BOINC bases its calculations on how many floating point operations your computer can do. Unfortunately, Collatz only uses integers which causes the estimates to be way off. In addition, the GPU applications can run anywhere from twice as fast (older slower GPUs and Intel embedded GPUs) to hundreds of times faster. For example, it thinks my Android phone is 1/4 the speed of my i7 laptop when in reality, it is about 1/400 the speed.

Q: All the workunits have errors. What's wrong?
A: The Windows versions require the Microsoft C Runtime library. If you are running a 64-bit version of Windows, you will need BOTH 32 and 64 bit versions since BOINC will likely send you both even though the server has been set to prefer sending 64-bit apps to 64-bit operating systems.

Q: When are the new apps going to be available for my computer?
A: It takes about 40 hours to test each individual application to make sure it calculates correctly. That's 25 apps x 40 hours each for OS X, Windows, and Linux. So, it takes 1,000 hours to run through all the tests, and if there's a bug, start over. Since this is not my full time job just as crunching is not your full time job, I have limited time to spend doing it.
60) Message boards : News : Linux Apps Avaliable (Message 262)
Posted 7 May 2018 by Profile Slicker
Post:
Linux versions of the Collatz Sieve application are now available. This includes 32 and 64 bit versions of the cpu apps, ATI OpenCL, Intel OpenCL, and nVidia OpenCL apps.




©2022 Jon Sonntag; All rights reserved