Something unexpected with provisioning policies

3 Replies

Allan Herrod 5 years 9 months ago

Sean, OK, I have discussed this scenario at length with the engineering team and we now understand it and agree, in principle, that it is not exactly working in the most intuitive manner. It IS, however, working as designed. But we think we might want to change it to work better in the future. Let me explain what is happening now.

The way you have your Smart Staging Policies chained together is by using inventory.xxx attributes. There is nothing wrong with this approach, but it does have some drawbacks that are related to the behavior you are seeing. By using this approach, each Policy after Policy #1 is dependent on the Policy just before it. This means that if, for any reason, a Job is executed for a Policy in the middle of the chain, the next Policy in line will automatically fire, because it becomes Applicable as soon as that Job defines the inventory.xxx attribute it depends on.

When a Policy becomes Applicable and Non-Compliant for a device, a Job is sent to the Relay Server for that device and the Job file on the Relay Server is marked as being in the Delivered state. When the Agent on the device checks in with the Relay Server and sees a Job file in the Delivered state, it renames the Job file on the Relay Server to change it to the Started state and begins executing it. When it finishes executing the Job, it renames the Job file on the Relay Server to change it to the Completed state and uploads the Job log. Once ALL Jobs that are going to be processed on a given check-in are completed, the Discovery Document showing the final state (set of device attributes) of the device is uploaded to the Relay Server.

If a device is re-Staged when there is a Job on the Relay Server that is in the Delivered state, that Job remains on that Relay Server. This is the case even if the device is deleted from the MSP Server.
That last part is where we think we may want to make an improvement - if the device is deleted from the MSP Server, it makes sense that any Jobs that are pending for that device should be removed from the Relay Server. But any such change will have to be in a future MSP Server, which is quite some ways off at this point.

Anyway, as it stands now, if the device is cleaned and then checks in, the FIRST thing it does is look for and process any Job files that it finds on the Relay Server. This is BEFORE it has sent up a new Discovery Document. If there is a Job on the Relay Server for that device that is in the Delivered state, then the device will execute that Job. If that was a Job for Policy 7 in a 10 Policy chain, then the result will be that once the Discovery Document is sent up, Policy 8 will become Applicable because that Job caused the inventory.xxx attribute required by Policy 8 to be set. Hence the sequence picks up from Policy 8.

Whether it will ALSO fire Policy 1 depends on how Policy 1 is defined and the initial state set by re-Staging plus the result of executing the Job for Policy 7. If Policy 1 is driven SOLELY by Compliance (i.e. no special Applicability Rules), then it will likely fire AFTER the Discovery Document is sent up, which means that Jobs for Policy 1 and 8 might be sent together. It is also possible that executing the Job for Policy 8 could cause Policy 1 to not be Applicable. It all depends on how the Rules for Policy 1 are defined.

If you re-Stage a device while there is a Job in the Delivered state for that device, there is basically NOTHING you can do to prevent that Job from being executed, aside from manually removing that Job file from the Relay Server. In a future version of MSP, we may have the MSP Server remove the Job from the Relay Server when the device is deleted, but it is not really reasonable to ask the customer to go delete a device from MSP when they re-Stage that device.
So, while it DOES make sense to delete the Job when the device is deleted, that really is NOT the solution to this issue.

What makes more sense is that, when you are intentionally re-Staging a device, you should somehow instruct the MSP Agent on the device to fail any pending Jobs BEFORE cleaning the device. That will ensure that after the device is cleaned, there will be no Jobs to execute until the MSP Server gets the new post-re-Staging Discovery Document and sends Jobs that are appropriate to the device's new post-re-Staging state.

We are seeing more and more cases where customers are re-Staging devices to accomplish various restore and recovery scenarios. We have already made some improvements in MSP 3.3.1 to handle some such cases better. We are discussing possible ways to allow a special re-Staging Profile to be defined which will remove any pending Jobs, to make this kind of thing more reliable and intuitive. Note that since Staging could be used for other things, such as to JUST change WLAN settings, any such behavior would have to be explicitly requested as part of a special Staging operation that was being used for such a purpose.

Also, you might want to consider a more positive interlock on your Policies. When you initially Stage a device, with the goal of kicking off Smart Staging, consider having an attribute applied via an Attribute Settings Object. Say there is an attribute called "UserAttribute.Staging.FirstTime" which is set to "1". In the Applicability Rule of Policy 1, explicitly test for that attribute having a value of "1". Then, as part of the content delivered by Policy 1, change that attribute to a value of "0". For all Policies after Policy 1, explicitly include a Rule that checks for "UserAttribute.Staging.FirstTime" being "0". This could be done in addition to the Rule that checks for the inventory.xxx attributes.
The effect of this is that no Policies after Policy 1 can possibly fire following re-Staging until after Policy 1 fires, because Staging will intentionally set the value of "UserAttribute.Staging.FirstTime" to "1", thus preventing all other Policies from being Applicable.

Now, realize that if a Job is already pending for Policy 7, then that Job WILL be executed. There is no avoiding that. But due to the change in Rules, Policy 8 will NOT execute since the re-Staging set "UserAttribute.Staging.FirstTime" to "1". So, once the Discovery Document goes up, the MSP Server will send a Job for Policy 1, as desired. Note, however, that when it gets to Policy 7, it will NOT send a Job because the Job previously executed will have made it Compliant.

Also note that once Policy 1 has executed, both Policy 2 and 8 may become Applicable, because "UserAttribute.Staging.FirstTime" has been set to "0". So, if you REALLY want to prevent any Job except the unavoidable one (the one that was on the Relay Server when you re-Staged) from executing out of order, then you need a better way to sequence Policies.

If I were trying to do that, I would have Policy 2 triggered by "UserAttribute.Staging.FirstTime" set to "0" AND "UserAttribute.Staging.Step" set to "2". Then, in Policy 1, I would set "UserAttribute.Staging.FirstTime" to "0" and "UserAttribute.Staging.Step" to "2", and Policy 2 would check for both of those. Policy 2 would also set "UserAttribute.Staging.Step" to "3", and Policy 3 would check for "UserAttribute.Staging.FirstTime" being "0" and "UserAttribute.Staging.Step" being "3". This would provide a foolproof interlock if continued through all the Policies.

As I mentioned previously, NOTHING (aside from a special patch from us to remove Job files from the Relay Server) is going to prevent a Delivered Job that is sitting on the Relay Server from being executed by the device following re-Staging.
But if you define your Policy rules correctly, it IS possible to get the Policy chain back on track fairly automatically, with the out-of-place Job being the only one that was executed out of sequence.

If you want to give the above a try and need help, contact me. If you urgently need a patch to have the device delete Jobs, let us know and we can try to cobble something together.

Allan
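
To make the ordering in the reply above easier to follow, here is a rough toy simulation in Python. It is only a sketch under stated assumptions, not MSP code: the `inventory.pN` names are hypothetical stand-ins for the inventory.xxx attributes, a Policy counts as Compliant once its marker attribute exists, and the server is assumed to send every Applicable, Non-Compliant Job after each Discovery Document. It models only the "UserAttribute.Staging.FirstTime" interlock, not the additional Step attribute.

```python
from typing import Dict, List

FIRST_TIME = "UserAttribute.Staging.FirstTime"  # attribute name from the reply


def marker(k: int) -> str:
    # Hypothetical stand-in for the inventory.xxx attribute left by Policy k.
    return f"inventory.p{k}"


def applicable(k: int, attrs: Dict[str, str], interlock: bool) -> bool:
    if k == 1:
        # Without the interlock, Policy 1 is driven solely by Compliance.
        return attrs.get(FIRST_TIME) == "1" if interlock else True
    chained = marker(k - 1) in attrs  # depends on the previous Policy
    if interlock:
        return chained and attrs.get(FIRST_TIME) == "0"
    return chained


def run_job(k: int, attrs: Dict[str, str], interlock: bool) -> None:
    attrs[marker(k)] = "installed"
    if interlock and k == 1:
        attrs[FIRST_TIME] = "0"  # Policy 1 clears the first-time flag


def simulate(n: int, stale_job: int, interlock: bool) -> List[int]:
    # State right after re-Staging: device cleaned, one Delivered Job
    # still sitting on the Relay Server.
    attrs: Dict[str, str] = {FIRST_TIME: "1"} if interlock else {}
    pending = [stale_job]
    order: List[int] = []
    while pending:
        # Check-in: pending Jobs are executed first ...
        for k in pending:
            run_job(k, attrs, interlock)
            order.append(k)
        # ... then the Discovery Document goes up and the server sends
        # Jobs for every Applicable, Non-Compliant Policy.
        pending = [
            k for k in range(1, n + 1)
            if applicable(k, attrs, interlock) and marker(k) not in attrs
        ]
    return order


print(simulate(10, 7, interlock=False))  # [7, 1, 8, 2, 9, 3, 10, 4, 5, 6]
print(simulate(10, 7, interlock=True))   # [7, 1, 2, 8, 3, 9, 4, 10, 5, 6]
```

In both runs the stale Job for Policy 7 executes first and is never re-sent, since it is already Compliant. Without the interlock, Policy 8 goes out together with Policy 1. With it, Policy 1 runs alone first, after which Policies 2 and 8 can both become Applicable, which is the case the Step attribute is meant to close.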

roberto cottone 5 years 9 months ago

Sean, are you really sure that the application folder on the device was completely wiped and set back to default? Could you check that the folder \application\airbeam\pkg is empty? Every successfully installed package stores a package apd file in this folder. If an apd file exists, it tells MSP that the package is installed on the device, regardless of whether the package components are actually on the device.

Thomas

Sean Wheatley 5 years 9 months ago

Quite sure, yes, although I didn't check each terminal. At first I thought I must have just been doing a clean and not a clean boot and blank. But I did check some units, and the Application folder was empty except for the standard folders, and on the ones I checked the pkg folder was empty. The device must have cleared down, otherwise other policies would have started and subsequently registered compliant status. It's as if MSP kept a record of just this one package and remembered the device from the last time it was staged. Maybe a clear-down of the relay server would have helped, but I'll give that a go next time. It's no longer an issue, because I simply deactivated the last few policies, waited for all of the devices and earlier policies to catch up, then re-activated the last few policies. It's just weird that the previously staged (and then cleaned) units jumped to the 7th policy when it should not have been applicable.