Pull DP: Add Timeout to BITS Job to Eliminate Stuck BITS Jobs
I love Pull Distribution Points. Love, love, love, just love the little buggers. If they had cheeks you bet your bottom dollar I’d pinch them. The combination of dedupe, LEDBAT, and Pull DPs is just brilliance.
There is, however, one major drawback. At scale (100s of Pull DPs) you are pretty much guaranteed that at any given time a handful of them are ‘stuck on stupid’. There is a huge backlog of jobs waiting to transfer, but the Pull DP is not downloading content and refuses to process new jobs.
In almost every case this boils down to BITS jobs gone wild. I don’t know if the Pull DP is just losing track of the jobs, but in such cases you can pull up the list of BITS jobs and it’s clear that the lights are on but nobody’s home. The list contains the maximum number of BITS jobs allowed, but few or none of them are progressing or doing anything. Often, troubleshooting individual jobs leads to some ‘transient’ error that’s been going on for days/weeks/lifetimes. Solution: whack the entire BITS job queue, cross your fingers, and hope that the Pull DP figures it all out.
I’d like to recommend that ConfigMgr add a timeout to every BITS job. For extra points, make it a user-configurable option at the site and/or DP level. If the job doesn’t complete (or maybe just doesn’t make any progress) within that period, whack the job and start again. If there are multiple sources for the Pull DP, switch to another source. Yes, this will cause some partially downloaded content to be lost and redownloaded. I’ll take that over having to manually whack the whole BITS queue myself any day of the week. I’m perfectly fine taking a conservative approach with a large timeout (5/10/15/30 minutes), but at some point you need to cut your losses and try again so that the transfer can succeed.
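To make the ask concrete, here is a minimal sketch of the policy in Python. It is not how ConfigMgr or BITS works internally, and it touches no real API: `TransferJob`, `record_progress`, `is_stalled`, and `restart_if_stalled` are all hypothetical names I made up to model a job, its list of sources, and a no-progress timeout.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class TransferJob:
    """Hypothetical stand-in for one BITS job on a Pull DP (not a real API)."""
    name: str
    sources: List[str]                 # source DPs the Pull DP may pull from
    source_index: int = 0              # which source is currently in use
    bytes_transferred: int = 0
    last_progress: Optional[datetime] = None
    restarts: int = 0

    @property
    def source(self) -> str:
        return self.sources[self.source_index]


def record_progress(job: TransferJob, bytes_now: int, now: datetime) -> None:
    """Move the progress timestamp only when the byte count actually grows."""
    if job.last_progress is None or bytes_now > job.bytes_transferred:
        job.bytes_transferred = bytes_now
        job.last_progress = now


def is_stalled(job: TransferJob, now: datetime, timeout: timedelta) -> bool:
    """A job is stalled when it has made no progress for at least `timeout`."""
    return job.last_progress is not None and now - job.last_progress >= timeout


def restart_if_stalled(job: TransferJob, now: datetime, timeout: timedelta) -> bool:
    """Cut your losses: drop partial content and retry from the next source."""
    if not is_stalled(job, now, timeout):
        return False
    job.source_index = (job.source_index + 1) % len(job.sources)
    job.bytes_transferred = 0          # partial content is lost and redownloaded
    job.last_progress = now            # the timeout clock starts over
    job.restarts += 1
    return True
```

The timeout here measures time since the last byte arrived rather than total job age, so a large package that is slowly but steadily moving is left alone. With a single source, the rotation simply retries against the same source.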
(FWIW: I don’t see any reason this shouldn’t apply equally to the Client, which is essentially what a Pull DP is.)
You got to know when to hold 'em.
Know when to fold 'em.
Know when to walk away.
Know when to run.