Checkpointing

Message boards : Number crunching : Checkpointing

Rob.B
Joined: 30 Jul 09
Posts: 28
Credit: 11,805,038
RAC: 0
Message 308 - Posted: 30 Jul 2009, 19:25:46 UTC
Last modified: 30 Jul 2009, 19:36:36 UTC

Hi.

Just joined, must admit I'm a little disappointed with the lack of checkpointing on CUDA. My system swapped out a Collatz CUDA WU at about 45%, and when it restarted it reset to 0%.

I hope the CPU WUs checkpoint OK, as the single task I have queued is showing 44 hrs as estimated run time; that's bound to get swapped out at some stage.

Additionally, the one CUDA WU I have completed (about 1 hr 15 min wall clock time) is showing only 3.3 seconds CPU (I suppose that could be true) and a claimed credit of 0.02. I hope I will be granted more than that for an hour's worth of GPU, regardless of the CPU usage.

Details of low score WU: http://boinc.thesonntags.com/collatz/workunit.php?wuid=56890

Rob.B

DoctorNow
Joined: 12 Jul 09
Posts: 30
Credit: 102,805,175
RAC: 0
Message 314 - Posted: 31 Jul 2009, 6:01:32 UTC - in response to Message 308.
Last modified: 31 Jul 2009, 6:03:54 UTC

I hope the CPU WUs checkpoint OK, as the single task I have queued is showing 44 hrs as estimated run time; that's bound to get swapped out at some stage.

No, the CPU tasks have no checkpoints yet. You can only let them run continuously with the "leave apps in memory" option activated; otherwise a task always starts from scratch.

Additionally, the one CUDA WU I have completed (about 1 hr 15 min wall clock time) is showing only 3.3 seconds CPU (I suppose that could be true) and a claimed credit of 0.02. I hope I will be granted more than that for an hour's worth of GPU, regardless of the CPU usage.

Don't worry, they give enough credits. ;)
Btw: look in the task ID of your WUs, there you can see the real runtime.

Slicker
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 11 Jun 09
Posts: 2525
Credit: 740,580,099
RAC: 2
Message 320 - Posted: 31 Jul 2009, 12:57:58 UTC - in response to Message 308.

Hi.

Just joined, must admit I'm a little disappointed with the lack of checkpointing on CUDA. My system swapped out a Collatz CUDA WU at about 45%, and when it restarted it reset to 0%.

I hope the CPU WUs checkpoint OK, as the single task I have queued is showing 44 hrs as estimated run time; that's bound to get swapped out at some stage.

Additionally, the one CUDA WU I have completed (about 1 hr 15 min wall clock time) is showing only 3.3 seconds CPU (I suppose that could be true) and a claimed credit of 0.02. I hope I will be granted more than that for an hour's worth of GPU, regardless of the CPU usage.

Details of low score WU: http://boinc.thesonntags.com/collatz/workunit.php?wuid=56890

Rob.B


Credit is awarded using:
total_steps_calculated / fixed_divisor = credit

The fixed_divisor is calculated using a 98.7 credit per day PIII 800 MHz machine running the stock application. I try to adjust the credit so that what that machine claims is what gets awarded. As a result, WUs are getting between 72 and 84 credits each. So, 70+ credits for 3.3 seconds of CPU isn't too shabby.
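To make the formula above concrete, here is a minimal Python sketch. Only the formula and the 98.7 credits per day figure come from the post; the step rate of the reference machine and the step count of the WU are made-up placeholder numbers, and the function names are hypothetical.

```python
# Hypothetical illustration of: total_steps_calculated / fixed_divisor = credit
REFERENCE_CREDIT_PER_DAY = 98.7  # from the post: PIII 800 MHz, stock application


def fixed_divisor(reference_steps_per_day: float) -> float:
    """Steps per credit, chosen so the reference machine earns 98.7 credits/day."""
    return reference_steps_per_day / REFERENCE_CREDIT_PER_DAY


def credit(total_steps_calculated: float, divisor: float) -> float:
    """Credit depends only on the steps computed, not on CPU or GPU seconds."""
    return total_steps_calculated / divisor


# ASSUMPTION: placeholder values, not real project data.
reference_steps_per_day = 9.87e11
wu_steps = 7.8e11

divisor = fixed_divisor(reference_steps_per_day)
print(f"credit for this WU: {credit(wu_steps, divisor):.1f}")
```

Because CPU time never enters the calculation, a GPU task that reports only a few seconds of CPU is credited the same as a CPU task that computes the same number of steps.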

71 minutes seems awfully long for a 9800 card. You might want to try the 190.xx nVidia drivers. It only takes 10-12 minutes to run a WU on my 9800+ GTX card. If yours took 71 minutes, then either the drivers or something else is slowing it down.

Liuqyn
Joined: 8 Jul 09
Posts: 26
Credit: 164,516,656
RAC: 0
Message 334 - Posted: 1 Aug 2009, 23:05:07 UTC - in response to Message 320.

Agreed, that is too slow. My 9600 GT is doing about 21 minutes running the 182.50 driver on Vista64, though I will probably update it when I get home anyway.

Rob.B
Joined: 30 Jul 09
Posts: 28
Credit: 11,805,038
RAC: 0
Message 337 - Posted: 2 Aug 2009, 6:10:16 UTC - in response to Message 320.
Last modified: 2 Aug 2009, 6:15:44 UTC

Hi,

The 9800 completes in about 12 to 15 min, but it was crunching GPUGRID at the time. Collatz was running on a 9400 in the same chassis; it's just that BOINC seems to report both as 9800s. I have the flops set to the speed of the slowest card so that the 9400 does not get overrun, which also prevents CUDA scheduling issues.

Tried running the OP App: chaos. Should have checked the XML first; it seems not to support CUDA. Ended up having to detach to get things back under control WU-wise. Will fix the XML when I get a chance and give it another go.

Rob.

Slicker
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 11 Jun 09
Posts: 2525
Credit: 740,580,099
RAC: 2
Message 433 - Posted: 7 Aug 2009, 17:56:18 UTC - in response to Message 337.

Hi,

The 9800 completes in about 12 to 15 min, but it was crunching GPUGRID at the time. Collatz was running on a 9400 in the same chassis; it's just that BOINC seems to report both as 9800s. I have the flops set to the speed of the slowest card so that the 9400 does not get overrun, which also prevents CUDA scheduling issues.

Tried running the OP App: chaos. Should have checked the XML first; it seems not to support CUDA. Ended up having to detach to get things back under control WU-wise. Will fix the XML when I get a chance and give it another go.

Rob.


Rather than return the info about each coprocessor, BOINC returns only the info about the fastest CUDA coprocessor to the server. So, the server is oblivious as to whether the other CUDA card will be overworked.

Valter
Joined: 28 Aug 09
Posts: 3
Credit: 6,848,946
RAC: 0
Message 789 - Posted: 31 Aug 2009, 8:43:55 UTC

Hi.

I have an 8400 GS. When I download a WU, it says it will take 43 hours to complete. Without checkpointing, this is impossible.

Now I see that you state it should take less than 1h30. Is my card too old for this project?

Thanks and regards,

Valter Aguiar
Brazil.

Providence Christian School
Joined: 13 Aug 09
Posts: 6
Credit: 151,193,621
RAC: 405,285
Message 810 - Posted: 31 Aug 2009, 18:09:33 UTC

Any progress on getting checkpointing to work? I seem to lose a lot of time due to lack of checkpointing.

Rob.B
Joined: 30 Jul 09
Posts: 28
Credit: 11,805,038
RAC: 0
Message 983 - Posted: 9 Sep 2009, 17:37:11 UTC - in response to Message 810.

Any progress on getting checkpointing to work? I seem to lose a lot of time due to lack of checkpointing.



Me too, please put some effort into sorting this out!

Slicker
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 11 Jun 09
Posts: 2525
Credit: 740,580,099
RAC: 2
Message 984 - Posted: 9 Sep 2009, 18:31:13 UTC - in response to Message 983.
Last modified: 9 Sep 2009, 18:33:12 UTC

Any progress on getting checkpointing to work? I seem to lose a lot of time due to lack of checkpointing.



Me too, please put some effort into sorting this out!


I'm hoping it will work in v2.0 as all flavors (CPU, ATI, and CUDA) will use the same checkpointing routines.

In the meantime, you could set the BOINC preferences to keep the app in memory. Unfortunately, that won't help if you reboot the machine.
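For anyone wondering what checkpointing buys you here, below is a rough sketch of the general idea in Python. It is not the project's code and says nothing about how the v2.0 routines will actually work: the function names, the JSON file layout, and the fixed checkpoint interval are all assumptions made for illustration, and it runs on the CPU only. A real BOINC application would normally ask the client when to checkpoint through the BOINC API (boinc_time_to_checkpoint and boinc_checkpoint_completed) rather than use a fixed interval.

```python
import json
import os
import tempfile


def collatz_steps(n: int) -> int:
    """Number of Collatz steps needed to reach 1 from n (n must be >= 1)."""
    steps = 0
    while n != 1:
        n = 3 * n + 1 if n % 2 else n // 2
        steps += 1
    return steps


def load_checkpoint(path: str, start: int):
    """Return (next_n, total_steps) from the checkpoint, or a fresh start."""
    try:
        with open(path) as f:
            state = json.load(f)
        return state["next_n"], state["total_steps"]
    except (FileNotFoundError, ValueError, KeyError):
        return start, 0


def save_checkpoint(path: str, next_n: int, total_steps: int) -> None:
    """Write to a temp file, then rename, so a crash never leaves a half-written file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump({"next_n": next_n, "total_steps": total_steps}, f)
    os.replace(tmp, path)


def run_work_unit(start, end, path, checkpoint_every=1000, stop_after=None):
    """Sum Collatz steps over [start, end), resuming from the checkpoint if present.

    stop_after is a hypothetical knob that simulates the task being swapped out:
    the function returns None without saving, so work since the last checkpoint is lost.
    """
    next_n, total_steps = load_checkpoint(path, start)
    processed = 0
    while next_n < end:
        if stop_after is not None and processed >= stop_after:
            return None
        total_steps += collatz_steps(next_n)
        next_n += 1
        processed += 1
        if processed % checkpoint_every == 0:
            save_checkpoint(path, next_n, total_steps)
    save_checkpoint(path, next_n, total_steps)
    return total_steps
```

With something like this in place, a task that gets swapped out at 45% would restart from its last saved position instead of from 0%, and it would also survive a reboot, which "leave apps in memory" does not cover.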

P.S. An 8400GS will do a WU in 2.3 hours as seen here



Copyright © 2018 Jon Sonntag; All rights reserved.