Posts by Vid Vidmar*
1) Message boards : Number crunching : CUDA tasks erorr out after suspend / resume (Message 7625)
Posted 2821 days ago by Vid Vidmar*
6.10.43 seems to have fixed this problem for me (I haven't seen an error for quite some time). In my case, this is the only GPU project (on this cruncher), so no other task swapping besides CPU benchmarks is going on.
BR
2) Message boards : Science : Largest number checked? (Message 6842)
Posted 2850 days ago by Vid Vidmar*
I've been thinking about this search. Are we looking through numbers one by one, or do you do some kind of sieving beforehand (most obvious that come to mind are powers of 2)? If not, would it make sense to store numbers found during a search that are bigger than the starting number, so that they can be eliminated from later searches?
Just a thought.
BR

[edit]spelling and grammar[/edit]
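The elimination idea in this post can be sketched roughly as follows. This is only an illustration, not how the project's applications work: the function name and the `seen` set are hypothetical, and the early exit (stop once the trajectory drops below the starting number) assumes every smaller number has already been verified.

```python
def check_range_with_elimination(start, stop):
    """Walk n in [start, stop); skip any n already visited by an earlier trajectory.

    Returns (checked, skipped). Assumption: all numbers below `start` are
    already known to reach 1, so a trajectory may stop once it drops below n.
    """
    seen = set()              # values above the current n that earlier trajectories hit
    checked, skipped = [], []
    for n in range(start, stop):
        if n in seen:
            seen.discard(n)   # no longer needed once n has been passed
            skipped.append(n)
            continue
        checked.append(n)
        x = n
        while x >= n and x != 1:
            x = x // 2 if x % 2 == 0 else 3 * x + 1
            if n < x < stop:
                seen.add(x)   # remember it so a later search can skip it
    return checked, skipped


if __name__ == "__main__":
    checked, skipped = check_range_with_elimination(2, 20)
    print("checked:", checked)
    print("skipped:", skipped)
```

The set only stores values inside the range still to be searched, so its size stays bounded by the range width. Whether the bookkeeping costs less than simply walking the skipped numbers is a separate question on a GPU.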
3) Message boards : Number crunching : 2 WUs on the same ATI GPU (Message 6697)
Posted 2856 days ago by Vid Vidmar*
Guys, I think Gipsel has already stated somewhere that running 2 concurrent tasks on any card here [Collatz] doesn't provide any advantages over running 1.
Remember, this isn't MW where mem. utilization is negligible.


While one WU finishes and the next one starts, the GPU sits idle if only one WU is run at a time. Here that time is very short in proportion to the WU's run time. But on MW a delay of even 0.5s compared to 80+s of computation on fastest GPUs means >5% loss (and in truth, by watching GPUz that delay is even longer, so figures of 5-10% improvement seem quite right).
BR
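As a back-of-the-envelope check of the idle-time argument, the lost share of GPU time can be estimated as gap / (gap + compute time). The helper names below are made up for illustration. With the figures quoted, a 0.5 s gap on an 80 s task works out to roughly 0.6%, while a 5% loss corresponds to a gap of about 4 s, which would fit the remark that the delay seen in GPU-Z is longer than 0.5 s.

```python
def idle_fraction(gap_s, compute_s):
    """Share of wall-clock time the GPU sits idle between consecutive tasks."""
    return gap_s / (gap_s + compute_s)


def gap_for_loss(loss, compute_s):
    """Gap length (seconds) that produces the given idle share."""
    return loss * compute_s / (1.0 - loss)


if __name__ == "__main__":
    print(f"0.5 s gap, 80 s task: {idle_fraction(0.5, 80.0):.2%} idle")
    print(f"gap needed for 5% loss on an 80 s task: {gap_for_loss(0.05, 80.0):.1f} s")
```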
4) Message boards : Number crunching : CUDA tasks erorr out after suspend / resume (Message 6395)
Posted 2877 days ago by Vid Vidmar*

Have you tried Boinc 6.10.x?, it doesn't just have ATI GPU support added, there are loads of other changes.

Claggy


Changes that aren't all for the better. I run 6.10.x on only one computer just because of two ATI cards in it.

[edit]shortened quotation[/edit]
5) Message boards : Number crunching : CUDA tasks erorr out after suspend / resume (Message 6391)
Posted 2878 days ago by Vid Vidmar*
From alpha mailing list:
I checked in:

- client: if suspending apps because of CPU benchmarks,
leave them in memory

-- David


Better than nothing, I guess...
6) Message boards : Number crunching : CUDA tasks erorr out after suspend / resume (Message 6331)
Posted 2883 days ago by Vid Vidmar*
The GPU is supposed to be unloaded when the benchmark is run but something may be interfering, or it is not coded correctly.


There isn't any code in the cpu, ati, or cuda apps specific to boinc doing benchmarks. Since we know that it can and does checkpoint and recover from checkpoints, my guess is that boinc is ignoring the fact that the kernel is running even though it is running as a critical section which is supposed to imply (afaik) that it cannot be interrupted. I'd be happy to try something else, but unfortunately, people want to run multiple cuda projects at once so it has to play nice with others and removing the critical section start/end calls allows other WUs to start before the current WU is done which often leads to a lack of GPU resources which causes both to crash.

What I find interesting is that I have never run into that problem with any of my three cuda cards. However, none are 200 series cards, none are X2s, and no machine has multiple GPUs. I'm also running the boinc 6.10.18 client versus Vid's 6.6.41. While there are literally hundreds of code changes that are different between the two client versions, it is unknown whether those would have any effect on this particular scenario.

Some projects still award credit according to cpu benchmarks. However, if I'm not running one of those projects, why do I need the benchmarks updated weekly? It really only needs to be done if the user changes the CPU percent, number of CPUs, or the processor changes because it would affect the scheduler. Other than that, why not set it and forget it, especially if it isn't needed and never changes?


This particular computer is an AMD 3500+ (single core) and the card is GF 9400GT, running only collatz on GPU and MW on CPU. I decided on 6.6.41 as it's the last without ATI stuff (I just need to run CUDA apps on this one) being stuffed in so rudely as it was done in 6.10+.

Considering benchmarks, my thoughts exactly. However, it's most unlikely this will ever change (in official releases anyway). And all it would require would be a server flag "<fixed_credit />" or something similar.
BR
7) Message boards : Number crunching : CUDA tasks erorr out after suspend / resume (Message 6320)
Posted 2883 days ago by Vid Vidmar*
I can't say that I've ever had that happen but I'm running the 6.10.xx boinc clients on the 3 different cuda cards I'm using. My guess is that the app thinks it has suspended all the tasks but boinc isn't smart enough to know that you can't cancel a GPU task while it is running a kernel. There is no choice but to wait for the kernel to finish. My guess is that Boinc gets impatient, ignores the fact that the kernel is running in a critical section, and just kills it.

What happens if you enable the "keep app in memory when suspended"? Does it work then or do you get the same results?


These were results with "leave app. in memory". I have this set to "yes" on every venue.
BR,
Vid
8) Message boards : Number crunching : CUDA tasks erorr out after suspend / resume (Message 6314)
Posted 2884 days ago by Vid Vidmar*
Hi.
As suggested, I am also posting this here, as it may be a problem with the collatz CUDA app. I have noticed that after BOINC's weekly benchmarks, the CUDA 2.03 app bombs out with a "too many restarts" error. In fact, I was also able to find another occasion when a CUDA task went through a suspend/resume cycle and failed to complete.

The log:
24.2.2010 10:38:39 Collatz Conjecture Computation for task collatz_1266776455_119464_2 finished
24.2.2010 10:38:39 Collatz Conjecture Starting collatz_1266776455_155301_1
24.2.2010 10:38:39 Collatz Conjecture Starting task collatz_1266776455_155301_1 using collatz version 203
24.2.2010 10:38:41 Collatz Conjecture Started upload of collatz_1266776455_119464_2_0
24.2.2010 10:38:44 Collatz Conjecture Finished upload of collatz_1266776455_119464_2_0
24.2.2010 11:10:51 Running CPU benchmarks
24.2.2010 11:10:51 Suspending computation - running CPU benchmarks
24.2.2010 11:11:22 Benchmark results:
24.2.2010 11:11:22 Number of CPUs: 1
24.2.2010 11:11:22 1961 floating point MIPS (Whetstone) per CPU
24.2.2010 11:11:22 3471 integer MIPS (Dhrystone) per CPU
24.2.2010 11:11:23 Resuming computation
24.2.2010 11:12:05 Collatz Conjecture Task collatz_1266776455_155301_1 exited with zero status but no 'finished' file
24.2.2010 11:12:05 Collatz Conjecture If this happens repeatedly you may need to reset the project.
24.2.2010 11:12:05 Collatz Conjecture Restarting task collatz_1266776455_155301_1 using collatz version 203
24.2.2010 11:12:46 Collatz Conjecture Task collatz_1266776455_155301_1 exited with zero status but no 'finished' file
24.2.2010 11:12:46 Collatz Conjecture If this happens repeatedly you may need to reset the project.
24.2.2010 11:12:46 Collatz Conjecture Restarting task collatz_1266776455_155301_1 using collatz version 203
24.2.2010 11:13:27 Collatz Conjecture Task collatz_1266776455_155301_1 exited with zero status but no 'finished' file
24.2.2010 11:13:27 Collatz Conjecture If this happens repeatedly you may need to reset the project.
24.2.2010 11:13:27 Collatz Conjecture Restarting task collatz_1266776455_155301_1 using collatz version 203
24.2.2010 11:14:08 Collatz Conjecture Task collatz_1266776455_155301_1 exited with zero status but no 'finished' file
24.2.2010 11:14:08 Collatz Conjecture If this happens repeatedly you may need to reset the project.
24.2.2010 11:14:08 Collatz Conjecture Restarting task collatz_1266776455_155301_1 using collatz version 203
24.2.2010 11:14:49 Collatz Conjecture Task collatz_1266776455_155301_1 exited with zero status but no 'finished' file
24.2.2010 11:14:49 Collatz Conjecture If this happens repeatedly you may need to reset the project.
24.2.2010 11:14:49 Collatz Conjecture Restarting task collatz_1266776455_155301_1 using collatz version 203
24.2.2010 11:15:30 Collatz Conjecture Task collatz_1266776455_155301_1 exited with zero status but no 'finished' file
24.2.2010 11:15:30 Collatz Conjecture If this happens repeatedly you may need to reset the project.
24.2.2010 11:15:30 Collatz Conjecture Restarting task collatz_1266776455_155301_1 using collatz version 203
24.2.2010 11:16:11 Collatz Conjecture Task collatz_1266776455_155301_1 exited with zero status but no 'finished' file
24.2.2010 11:16:11 Collatz Conjecture If this happens repeatedly you may need to reset the project.
...
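For anyone wanting to check their own message log for the same pattern, a small script along these lines could count the failed exits per task. It is only a sketch: the function names are made up, the timestamp pattern is assumed from the log excerpt above, and lines without a leading timestamp are treated as wrapped continuations of the previous entry.

```python
import re
from collections import Counter

# Timestamp format assumed from the excerpt above, e.g. "24.2.2010 11:12:05 ".
TIMESTAMP = re.compile(r"^\d{1,2}\.\d{1,2}\.\d{4} \d{2}:\d{2}:\d{2} ")
FAILED_EXIT = re.compile(r"Task (\S+) exited with zero status but no 'finished' file")


def join_wrapped(lines):
    """Merge continuation lines (no leading timestamp) into the previous entry."""
    entries = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if TIMESTAMP.match(line) or not entries:
            entries.append(line)
        else:
            entries[-1] += " " + line
    return entries


def count_failed_exits(lines):
    """Return a Counter mapping task name to the number of failed exits logged."""
    counts = Counter()
    for entry in join_wrapped(lines):
        match = FAILED_EXIT.search(entry)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    import sys
    for task, n in count_failed_exits(sys.stdin).items():
        print(f"{task}: {n} exits without a 'finished' file")
```

Feeding the saved log to the script on standard input prints one line per affected task.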
9) Message boards : Number crunching : Collatz as backup project (Message 6098)
Posted 2898 days ago by Vid Vidmar*
Hello all.
Recently, after a long wait and a lot of b*****g about it, BOINC devs blessed us with the possibility of setting *true* backup projects. However, this blessing also requires some server-side updates to allow users to set the resource share to 0, which tells BOINC CC that this is a backup project. Currently the server changes / "corrects" a resource share value of 0 to 100 (default). It does keep very small values > 0, however 0.00000...01 != 0, thus I have been unable to make collatz behave like a true backup project (which, when MW is down again today, would be much appreciated). My question is: when can we expect this option to work here?
BR
10) Message boards : Number crunching : Can I crunch with my GPU? Please help me find out! (Message 5637)
Posted 2920 days ago by Vid Vidmar*
11) Message boards : Number crunching : HD5870 and New Opti Application (Message 5136)
Posted 2937 days ago by Vid Vidmar*
6.10.25 here, XP 64, 5870 and 9.12 drivers (no CCC), 2.07 app. running along with MW, working like a charm. Now, if the GPU FIFO rule was eliminated, I'd very much like to bump my cache here.
12) Message boards : Number crunching : Are you ready? (Message 4211)
Posted 2967 days ago by Vid Vidmar*
And yet again, all effort is being made to draw more and more people to run a broken piece of software, instead of fixing the software first, under the assumption that average User Joe wouldn't even notice the software is broken in the first place. Sheesh! Are you sure you're working for UCB and not M$? :P
13) Message boards : Number crunching : Are you ready? (Message 4178)
Posted 2968 days ago by Vid Vidmar*
Rom, we still have the issue with BOINC only doing FIFO for ATI GPU work (don't know about nvidia) and not following the resource share.


For some reason I think that is by design. I don't remember why. Although this should be taken up in BOINC alpha.

I have, several times.

It was added as a work around to the problems Richard was reporting where SaH CUDA tasks were being started as soon as they completed download. In fact, some of them were started before initialization was complete. Several change sets were applied to eventually fix the two issues, one of which was initialization was not being done properly (result vs. task if I recall correctly) ... the other change was a minor fix to flags ...

{edit}
Oh, and other people have also noted the problem both on Alpha and I think on Dev.


And even more would (me included), if there wasn't such an attitude towards outside suggestions on those lists. Why would I even try explaining my opinion about something that JM7 and DA have already made up their minds is the right solution? Got burned that way once (and a half), don't want to do it again.
BR
14) Message boards : Number crunching : The "I want to use THREE Gpus!" Question (Message 4155)
Posted 2969 days ago by Vid Vidmar*
Also, if you are using different GPUs, BOINC will use only the fastest one, unless you tell it to use them all via cc_config.xml.

http://boinc.berkeley.edu/wiki/Client_configuration


<use_all_gpus>0|1</use_all_gpus>

If 1, use all GPUs (otherwise only the most capable ones are used). New in 6.6.25


I think the criterion is not speed, but rather amount of (v)ram, since it uses both my 4870 1G and 5870 1G, without any need for that flag.
BR
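For reference, the flag quoted above goes inside the `<options>` section of cc_config.xml. A minimal sketch that builds such a file is shown below; the function name is made up, and where the file has to be saved (the BOINC data directory) depends on the installation.

```python
import xml.etree.ElementTree as ET


def build_cc_config(use_all_gpus):
    """Return cc_config.xml content with the use_all_gpus option set to 0 or 1."""
    root = ET.Element("cc_config")
    options = ET.SubElement(root, "options")
    flag = ET.SubElement(options, "use_all_gpus")
    flag.text = "1" if use_all_gpus else "0"
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Print the content; save it as cc_config.xml in the BOINC data directory.
    print(build_cc_config(True))
```

The client has to re-read its config file, or be restarted, before the change takes effect.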
15) Message boards : Number crunching : Are you ready? (Message 4154)
Posted 2969 days ago by Vid Vidmar*
Rom, we still have the issue with BOINC only doing FIFO for ATI GPU work (don't know about nvidia) and not following the resource share.


In fact, it's not even doing FIFO properly. I have a bunch (52) of collatz WUs on my ATI cruncher, since yesterday 19:22 UTC, but BOINC has been busy downloading and running MW WUs since before and after that time. Not a single collatz WU has been crunched so far. CC is 6.10.17, resource shares are: 1 collatz, 100 MW, connection interval is 0 and additional cache is 0.024 days (36 min).
Now, it makes me wonder, what will it take for those collatz WUs to start running, and when they do, will I see similar but reversed behavior? [shrug]
BR
16) Message boards : Number crunching : CAL 9.2 8.12 (Message 4004)
Posted 2973 days ago by Vid Vidmar*
My whole Pharm is running on CAT v9.10 except for the NVIDIA Cards of course & Client v6.10.16 ... I don't think I've ever seen a VPU Recovery running the Collatz WU's, see a lot running the MWay WU's though ...


Hey PB!
Have you tried the somewhat newer 9.11? It seems that this driver fixed crashes on MW with my 4870/5870 combo on XP 64bit. I know what your feelings toward that project are, but give it a try, as I would be interested in your results.
BR


Seems like I have because I installed them on my i7 and another Box but if I remember right I got VPU Errors on them too with the 9.11 Drivers. I may try them without the CCC which I've tried before too but still got the VPU Crashes but when Milkyway is giving out consistent work I may try it again.


Oh, yes, forgot to mention no CCC here, drivers only. And so far, it seems everything runs smoothly, even without any command line options.
BR
17) Message boards : Number crunching : CAL 9.2 8.12 (Message 3994)
Posted 2974 days ago by Vid Vidmar*
My whole Pharm is running on CAT v9.10 except for the NVIDIA Cards of course & Client v6.10.16 ... I don't think I've ever seen a VPU Recovery running the Collatz WU's, see a lot running the MWay WU's though ...


Hey PB!
Have you tried the somewhat newer 9.11? It seems that this driver fixed crashes on MW with my 4870/5870 combo on XP 64bit. I know what your feelings toward that project are, but give it a try, as I would be interested in your results.
BR
18) Message boards : Number crunching : Proposed new BOINC credit system (Message 3424)
Posted 2994 days ago by Vid Vidmar*
Paul, I was really just kidding. I am also aware that plans such as awarding credit for network bandwidth, storage and other resources were part of the plan in the past, and how, as time passed, things developed away from that plan. Being subscribed to the boinc mailing lists, I also see how an inner circle has developed which is indeed very resistant to outside suggestions. When I once asked for information about some parameter usage, described what I intended to use it for and dared to make some remarks and suggestions, not only did I not get an answer to what I had asked in the first place, I was told, in a way, that my plan was stupid, and was ignored thereafter. In that respect I must say I really admire your persistence and the effort you put into it, even if I don't always agree with you. However, the points and questions you presented regarding the new credit concept are spot on. And yes, they made it through to the boinc_dev mailing list.

BR,
19) Message boards : Number crunching : Proposed new BOINC credit system (Message 3416)
Posted 2994 days ago by Vid Vidmar*
Paul, thanks for your analysis.

You are welcome ...

I forgot to mention NCI projects though ...


Well those are trivial, aren't they? NCI == 0 computation == 0 credits, right? And these would be the easiest to normalize across platforms and across NCI projects. :^)

BR
20) Message boards : Number crunching : Specify which GPU(s) to use? (Message 3403)
Posted 2995 days ago by Vid Vidmar*
I think I saw a note somewhere that we may get a flag (probably in cc config) that will allow the user to say to not use GPU 0 ... at least for those that do not want to have desktop lag but who do have multiple GPUs and want to run BOINC on the others ...

Not sure how that would work ... would be nice if you had the option to not run on GPU 0 when the system is in use but to run on the other GPUs ... and then if the system is not in use to run on all .. but who knows how UCB will implement this flag ... likely it will be a never run on GPU 0 only as that is the least useful way to configure ...


And ofc. UCB will make it (if ever) a global setting, not project specific. I, for example, don't have problems with Collatz using both the 4870 and the 5870, as it's stable enough not to crash with VPU recoveries every 12 - 24h (in fact, I don't remember Collatz bringing my ATIs down even once). MW, however, has trouble with the 4870 and 9.10 driver combination, so I'd like to exclude that card from running that project, which unfortunately isn't possible with the 6.10.x line of clients. Anyway, that question is now quite moot, as I'm leaving MW, prompted by yet another ingenious move by the admins over there.
BR



Copyright © 2018 Jon Sonntag; All rights reserved.