1) 10:58 AM/ 2/25/10
2) 3 Day
3) MC7090
4) WM 5.0 BSP 43
5) 2073532 I have a customer who has multiple MC7090 devices that lock up every time they suspend; they will not resume. A warm boot is required to recover. EmScript logs are attached; the last lockup was recorded on 2/23/2010 at 9:32 AM. Can someone take a look at these and see if they can tell what the issue is?
13 Replies
I'll try to look later today, but it sounds like the next step here is to enable RM logging. There is a variable in the script that toggles this behaviour. I also find it somewhat difficult to run through some of the logs since they have time updates as part of their run, so I am hesitant to read too much into the time gaps, but I do agree they seem to imply a long sleep. Gene
OK, the second set of logs, to go along with the CaptureDump.exe kdmp file, is attached. The customer reported that this happened on March 8, 2010 at 3:00 PM. I looked at the Resource.CSV.bk.csv. At line 855 I see the device was rebooted. Even though the time is not set properly (the time is 1/31/2005 6:56 and gets reset to 1/31/2005 0:00), the logging continues until line 860, where a time change sets it to 3/8/2010 at 15:01, which leads me to believe the boot at line 855 is the one we are looking for. Line 855 is the cold boot. Before that, the app TMSMobile is running, but I see no CPU hogging (CPU load is in the 10-11% range) and memory load is fine (29%). Looking at the last MemMap 36, everything is fine; they are running well below DLL crunch. I still can't tell why the device would be suspending and not resuming.
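For anyone who wants to repeat this check without scrolling through the CSV by hand, here is a minimal sketch that flags clock discontinuities in a list of timestamps: a backward jump (such as a reset after a boot) or a large forward jump (such as the time being set correctly again). The timestamp format, the function name `find_clock_jumps`, and the one-day threshold are all assumptions for illustration; the real Resource.CSV layout may differ, and the sample values are made up, only loosely modeled on the times mentioned above.

```python
from datetime import datetime, timedelta

# Hypothetical timestamp layout; adjust to match the real log.
TS_FORMAT = "%m/%d/%Y %H:%M"


def find_clock_jumps(timestamps, forward_limit=timedelta(days=1)):
    """Return (line_number, kind, previous, current) for each clock discontinuity.

    kind is "backward" when the clock moves back (for example a reset after
    a boot) and "forward" when it leaps ahead by more than forward_limit
    (for example the time being set correctly again).
    Line numbers are 1-based positions in the input list.
    """
    jumps = []
    previous = None
    for line_number, text in enumerate(timestamps, start=1):
        current = datetime.strptime(text, TS_FORMAT)
        if previous is not None:
            if current < previous:
                jumps.append((line_number, "backward", previous, current))
            elif current - previous > forward_limit:
                jumps.append((line_number, "forward", previous, current))
        previous = current
    return jumps


# Made-up sample values for illustration only, not copied from the customer logs.
sample = [
    "01/31/2005 06:55",
    "01/31/2005 06:56",
    "01/31/2005 00:00",  # clock moves backward
    "01/31/2005 00:01",
    "03/08/2010 15:01",  # clock leaps forward
]

for line_number, kind, previous, current in find_clock_jumps(sample):
    print(line_number, kind, previous, "->", current)
```

A flagged row only says the clock changed; it does not by itself tell a cold boot from a manual time change, so the surrounding rows still need a look.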
I don't know if this makes a difference, but they have 2 instances of TMSMobile.exe running in the ProcList and MemMap 36.
I am having trouble understanding how one set of logs shows emScript was stopped (or the device was not running) and the other set does not. If the information we are giving is correct we may be looking at a couple of issues, but right now I need to better understand when the process snap was captured in the second set of logs. There was no resource.csv to give me context. When you have some time, come on by and let's go through this. Gene
Gene, I have to get the second set. I don't have them yet, so I can't say whether the second set shows it running or not. I will stop by when I do.
OK. Also, if the logs don't conclude much, we may want to look into having them send us a unit while it is in the failed state (and charged before shipping). I suspect the terminal is not locked but just appears to be so.
Gene, Attached is the 1 log generated by the CaptureDump program. Can you check these out to see if they provide more info?
Why are all these directed to me? The forums are supposed to be for all to enjoy. :) Did we get confirmation on some of the other questions I posed? In any case, I looked at the dumps with UIDumpviewer.exe (google for it) and didn't see anything jump out as far as CPU time in device.exe. I did not look too much into the other processes since I also saw emScript was running. This indicates the device was probably NOT locked up at the time the snaps were taken, or the emScript logs would be more helpful. If this snap was taken during a lockup, we should wait the 30 mins and see what the emScript logs show. I also see that you did zip up a memory snapshot, but I do not want to read it until you can tell me what it was or when it was taken. In short, I need to understand if these were taken during a lockup, and I need some help in looking into this.
To resolve the issue they are cold booting the unit during a "lockup". A lockup is defined as the device, from the perspective of the end user, suspending and not resuming. From what you're saying, looking at the CaptureDump, it seems as if the device never fully suspends or gets locked up on resume. It's funny because in the last "lockup" you looked at in the logs, EMScript clearly stopped running. I am getting the EMScript logs from this unit and will check whether, at the time specified by the customer, the dump was taken or logging was halted.
CPU is not 100%, no obvious memory crunch, memory looks fine.
So it seems that the logging stopped at the time you said. This is normal on a suspend, but if they tried to resume and the tool didn't continue, then this would indicate a hard lockup. We should confirm whether they suspended the device at that time or simply closed the tool down. If this was truly a lockup, we have a dump app on the ECSG compass site that can help determine what is running during the lockup state. The tool needs to be run prior to the lockup. I also found other items of interest in the resource.csv:
- The time of day reset upon the reboot. I would have thought the time would persist.
- When the time did set properly, it was about 3+ hours after the logging stopped.
- For some reason the system seems to do a double boot prior to the time change (perhaps this is something they do upon startup). I also saw appcenter running; could this be something it is doing?
- The backup battery seems to drop to 75% while the main battery is on. I am not sure if this is normal, but it leads me to believe that maybe they are not fully charged (just a guess).
I would suggest trying out the dump app on the ECSG site as the next step, after we get answers on the time reset and confirmation that they are not killing emScript.
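To put a number on how long the logging was silent, a rough sketch like the one below can list the gaps between consecutive log entries. Everything here is an assumption for illustration: the timestamp format, the one-minute logging interval, the tolerance factor, and the name `find_logging_gaps` are hypothetical, and the sample values are made up rather than taken from the customer's resource.csv.

```python
from datetime import datetime, timedelta

# Hypothetical timestamp layout; adjust to match the real log.
TS_FORMAT = "%m/%d/%Y %H:%M"


def find_logging_gaps(timestamps, expected_interval=timedelta(minutes=1), tolerance=3):
    """Return (previous, current, gap) for each silent stretch in the log.

    A stretch counts as a gap when two consecutive entries are farther apart
    than tolerance * expected_interval. Both values are assumptions and
    should be set from the logging interval actually configured.
    """
    limit = expected_interval * tolerance
    parsed = [datetime.strptime(text, TS_FORMAT) for text in timestamps]
    gaps = []
    for previous, current in zip(parsed, parsed[1:]):
        gap = current - previous
        if gap > limit:
            gaps.append((previous, current, gap))
    return gaps


# Made-up sample values for illustration only.
sample = [
    "02/23/2010 09:30",
    "02/23/2010 09:31",
    "02/23/2010 09:32",
    "02/23/2010 12:40",  # logging picks up again after a long silence
]

for previous, current, gap in find_logging_gaps(sample):
    print("no entries between", previous, "and", current, "gap:", gap)
```

A gap on its own cannot tell a normal suspend from a hard lockup or from someone closing the tool, which is why the confirmation from the customer is still needed.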
Have you looked at the logs? Do you have any specific questions regarding what you see?
On 2/23 at 9:30 all I see is a reboot. I don't see anything indicating a lockup other than that it stopped logging. Looking around this specific time frame, can you determine a reason why?