MC2180 - Locking Up

1) Time/Date               11:15am 06/12/2012

2) Response time

3) Product                    MC2180

4) OS Version              01.34.0017

5) Clarify Case #         2605287

I have been troubleshooting this customer's issue for a while.
The customer is having an intermittent issue with devices locking up.
Frequency: out of 9 devices, every device locks up at least once every 5 days.
The customer is running the same software on these devices as on his other devices
(MC3090, MC7090, and MC5574), which run it without issue.

I talked to the PM for the MC2180 and to the engineers; they gave us "FUSION_X_1.01.1.0.041R_CE60C.CAB" to try on the customer's devices. It probably solved a lot of issues, but the lockup still occurs.

I had the customer install eMscript on the devices.
He had a lockup this morning; I am attaching the eMscript log.
In the log you can see very clearly that he loses WiFi.
The customer stated that the user used the device with no problem and left it in the cradle after the shift;
in the morning it was locked up.
The customer stated that in the morning he saw he did not have a WiFi connection to his network.
In Fusion the radio was enabled; he tried to turn the radio off and back on but could not,
so he tried to warm boot the device, which wiped out the device (it acted like a cold boot).


(Please note the issue has occurred outside of the cradle 50% of the time, because he only has 3 cradles (backordered).)
You can see in the log that the device is not in a low-voltage state.
You can see that the issue occurs from line 258 onward ...

I know all hands have been concentrating on the MC65, but I really need help analyzing this log.
I tried to get a couple of logs analyzed but am having difficulty finding someone who has time; I tried myself but understood little of it.
Any assistance would be greatly appreciated.

David Meyer

Hi Glenn,

It sounds like this issue is not a lockup at all, but a connection issue.  Is that true? If so, you may want to transfer the case over to WID so they can help debug the connectivity issue.

Dave


Glenn Sobel

I can't determine if it is hardware or software, or if it is the application or wireless.
I don't know if the WiFi loss is a symptom or a cause.
I added that CAB file because the PM said they caught some leaks in Fusion.
It did not resolve the issue.

The PM for the MC2180 and an engineer thought it was perhaps a memory load issue,
but then retracted that thought. I am not sure they have the same experience
reviewing these logs as ECRT, which is why I was hoping for clarification of the analysis from the log.


David Meyer

I looked at the logs, and if eMscript keeps logging, it's not a lockup.  I think the customer needs to give you more information.

The logs at the time you pointed out show a loss of WiFi connectivity, then a couple more messages, and then finally the device just suspends due to inactivity.  The device resumes much later and still isn't locked up.  The end user reboots eventually (way at the bottom of the log).

I am guessing that the application was frozen because it doesn't handle being out of network or something like that, but I think you need to clarify with the customer what is really being seen, what time the device was "locked up", and what can really be done with the device at that time.

Also, if it is really locked up, have the customer get an RTLog of the failure.  If it is really a WiFi issue, have the customer save Fusion logs as a start.


Glenn Sobel

Thanks Dave, this is helping.
The customer has reported that some devices have fully locked up with a white screen and some do not fully lock up. Sorry if I mislabeled this specific instance.
If you look at lines 259 - 260, there is a 5 minute gap.
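
A gap like this can be located mechanically rather than by eye. The following is a minimal Python sketch, not a Motorola or eMscript tool: the eMscript log layout is not shown in this thread, so the timestamp pattern, the `find_gaps` name, and the sample lines are all assumptions made for illustration.

```python
import re
from datetime import datetime, timedelta

# Assumed timestamp shape: "YYYY-MM-DD" followed by "HH:MM:SS", separated by
# a space, a comma, or "T". The real eMscript layout may differ.
TS = re.compile(r"(\d{4}-\d{2}-\d{2})[ ,T]+(\d{2}:\d{2}:\d{2})")


def find_gaps(lines, threshold=timedelta(minutes=5)):
    """Return (line_before, line_after, gap) for each silent period >= threshold."""
    gaps = []
    prev_no, prev_ts = None, None
    for no, line in enumerate(lines, start=1):
        m = TS.search(line)
        if not m:
            continue  # lines without a timestamp are skipped
        ts = datetime.strptime(m.group(1) + " " + m.group(2), "%Y-%m-%d %H:%M:%S")
        if prev_ts is not None and ts - prev_ts >= threshold:
            gaps.append((prev_no, no, ts - prev_ts))
        prev_no, prev_ts = no, ts
    return gaps


# Made-up sample lines, not taken from the customer's log.
sample = [
    "2012-06-12 02:10:00 wifi: associated",
    "2012-06-12 02:10:30 wifi: signal lost",
    "2012-06-12 02:15:45 power: suspend",
]
gaps = find_gaps(sample)
for before, after, gap in gaps:
    print(f"lines {before}-{after}: no entries for {gap}")
```

A silent period does not by itself distinguish a suspend from a hang, so each gap still has to be read against the entries on either side of it.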

Also, at this time the device is unattended, but you can see DWCtlApp.exe launched, which is DataWedge.
I looked at the ResourseMonitor.mot file; by default the RTLog is enabled. I don't know why it is not being generated in the folder. (Does it make a difference for an MC2180 CE6 device?)

int iRTLogEnable    = true  // LOG RETAIL MESSAGES TO FILE IF THIS IS TRUE

I will try to get the customer to save the Fusion log if the device is responding.



Glenn Sobel

The customer has reported another incident of the issue.
The attached eMscript log does have the RTLog included.
The customer states:
" Ok just had an issue. The user started his shift. He scanned two pallets, put the unit in his pocket. Pulled it out a few minutes later and unit was totally dead. I could not get it to respond in any way. I had to pull the battery"

I asked what time the issue occurred and if there was any error message, etc..

" 2:17 right before the date change to 2009. The date changing to 2009 was me pulling the battery.
The screen was totally black, no matter what I did nothing would happen. I even held the power button on for 5-10 seconds, nothing. Until I pulled the battery"

As per the customer, this was a total lockup.
You will see long periods where the device is not getting an IP address even though Fusion does see the ESSID of the AP.
Dave, can you review this log and advise? If this is definitely a wireless issue, I will escalate to WID; I need confirmation.



Glenn Sobel

The previous posting was from a device that locked up yesterday.
This one is from a device that was found this morning in a total lockup.

As per customer
"

I have another one. This morning when I came in to make sure all units were working. All were fine but one. That unit was unresponsive to anything I did except if I put it on the charger the light on the top saying it was charging would blink. I held the power button down for 5-10 secs and the unit cold booted(this was done @5:00am this morning). I have no idea if it locked up on a user or if it just locked up while sitting there as when I get in nobody is here.

Attached is the file

This unit has that update on it that you sent me

 

Rob

"

Looking at the RTLog, I believe this was occurring around the time of the lockup:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
1, 2012-06-14, 05:06:30, 00000000h, 00000002h, RTLogInit: RTLog Reinitialized. 
2, 2012-06-14, 05:06:30, 00000000h, 00000002h, RTLogInit: RTLog Params:    Version=v11    State=Active    pCntrStruct=0xa0005000    pbRingBuffer=0xa0005778    pbRingBuffWrite=0xa00057d4    pbRingBuffWrap=0xa0024000    RingBuffSize=0x0001e888    MaxMsgLen=384    Attributes=0x4c8e    MaxClgBuffLen=0x00800000 
3, 2012-06-14, 05:06:30, 00000000h, 00000002h, RTLogInit: RTLog Started. 
4, 2012-06-14, 05:06:30, 00000000h, 00000002h, Windows CE Kernel for ARM (Thumb Enabled) Built on Apr  5 2011 at 17:47:22 
5, 2012-06-14, 05:06:30, 00000000h, 00000002h, +Amagansett: +OEMInit 
6, 2012-06-14, 05:06:30, 00000000h, 00000002h, OEMInit:AFTER  misc.dwMemorySize = 0x8000000. 
7, 2012-06-14, 05:06:30, 00000000h, 00000002h, OEMInit: Coldboot 
8, 2012-06-14, 05:06:30, 00000000h, 00000002h, OEMInit: dwNKDrWatsonSize = 0x20000. 
9, 2012-06-14, 05:06:30, 00000000h, 00000002h, +OEMInit: OALIntrInit 
10, 2012-06-14, 05:06:30, 00000000h, 00000002h, CPLD Ver = [1.0]  
11, 2012-06-14, 05:06:30, 00000000h, 00000002h, DebugSer: * 0x40700000 
12, 2012-06-14, 05:06:30, 00000000h, 00000002h, +OEMInit: OALTimerInit 
13, 2012-06-14, 05:06:30, 00000000h, 00000002h, +OEMInit: OALKitlStart 
14, 2012-06-14, 05:06:30, 00000000h, 00000002h, +OEMKitlStartup 
15, 2012-06-14, 05:06:30, 00000000h, 00000002h, WARN: Empty device ID buffer, using default 
16, 2012-06-14, 05:06:30, 00000000h, 00000002h, OALKitlStart, ETHKITL 
17, 2012-06-14, 05:06:30, 00000000h, 00000002h, -OEMKitlStartup(rc = 1) 
18, 2012-06-14, 05:06:30, 00000000h, 00000002h, +OEMInit: lcd_clock_update_init 
19, 2012-06-14, 05:06:30, 00000000h, 00000002h, +OEMInit: DisableSRAM 
20, 2012-06-14, 05:06:30, 00000000h, 00000002h, OEMInit: GPDR0   =c003c0e3 
21, 2012-06-14, 05:06:30, 00000000h, 00000002h, OEMInit: GPDR1   =00000000 
22, 2012-06-14, 05:06:30, 00000000h, 00000002h, OEMInit: GPDR2   =00100000 
23, 2012-06-14, 05:06:30, 00000000h, 00000002h, OEMInit: GPDR3   =00000002 
24, 2012-06-14, 05:06:30, 00000000h, 00000002h, OEMInit: MDCNFG  =e000074d 
25, 2012-06-14, 05:06:30, 00000000h, 00000002h, OEMInit: BSP_VERSION 34.00.0001 
26, 2012-06-14, 05:06:30, 00000000h, 00000002h, OEMInit: Processor Vendor: Marvell 
27, 2012-06-14, 05:06:30, 00000000h, 00000002h, OEMInit: Processor ClockSpeed: 624 MHz 
28, 2012-06-14, 05:06:30, 00000000h, 00000002h, OEMInit: Processor Name: PXA32X-624MHz 
29, 2012-06-14, 05:06:30, 00000000h, 00000002h, OEMInit: Processor Core: PXA32X 
30, 2012-06-14, 05:06:30, 00000000h, 00000002h, DeviceID: MOTOROLA MC2180_12058521120280 
31, 2012-06-14, 05:06:30, 00000000h, 00000002h, OEMInit: L2 Cache enabled!!! 
32, 2012-06-14, 05:06:30, 00000000h, 00000002h, -OEMInit 
33, 2012-06-14, 05:06:30, 00000000h, 00000002h, OEMGetExtensionDRAM: Memory size determined by monitor 0x8000000 
34, 2012-06-14, 05:06:30, 00000000h, 00000002h, OEMGetExtensionDRAM: MainMemoryEndAddress 0x87FE0000 
35, 2012-06-14, 05:06:30, 00400002h, 00410002h, RTLog: Actual CE-Core events initialized = 48 
36, 2012-06-14, 05:06:30, 00400002h, 00410002h, RTLog: RTLog_KernReportSysPostInit 
37, 2012-06-14, 05:06:30, 00400002h, 00410002h, OALIoCtlHalPostInit 
38, 2009-01-01, 00:00:03, 00400002h, 005d0002h, RTLog: RTLog_KernReportSysHalInitReg 
39, 2009-01-01, 00:00:07, 00400002h, 00a2000eh, Datalight FlashFX project for Windows CE 6.0 on the Motorola MPA2.0
40, 2009-01-01, 00:00:07, 00400002h, 00a2000eh,  
41, 2009-01-01, 00:00:07, 00400002h, 00a2000eh, Datalight FlashFX Pro v4.1a Build 1687FS-2
42, 2009-01-01, 00:00:07, 00400002h, 00a2000eh,  
43, 2009-01-01, 00:00:07, 00400002h, 00a2000eh, Copyright (c) 1993-2010 Datalight, Inc.  All Rights Reserved Worldwide.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

Dave, do these logs show any indication of the issue?
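
For excerpts in the comma-separated layout above, boot markers and clock resets can be flagged with a short script. This is a rough Python sketch under stated assumptions: the field order (sequence, date, time, two hex fields, message) is inferred from the excerpt only, the meaning of the two hex fields is not known here and is ignored, and `parse_rtlog_line` and `scan_rtlog` are made-up names. The sample lines are copied from entries 7, 37, and 38 of the excerpt.

```python
from datetime import datetime


def parse_rtlog_line(line):
    """Split one line into (seq, timestamp, message); return None if it does not fit."""
    # maxsplit=5 keeps commas inside the message text intact.
    parts = [p.strip() for p in line.split(",", 5)]
    if len(parts) != 6 or not parts[0].isdigit():
        return None
    try:
        ts = datetime.strptime(parts[1] + " " + parts[2], "%Y-%m-%d %H:%M:%S")
    except ValueError:
        return None
    return int(parts[0]), ts, parts[5]


def scan_rtlog(lines):
    """Return (seq, label) pairs for cold boot markers and backward clock jumps."""
    events = []
    prev_ts = None
    for line in lines:
        rec = parse_rtlog_line(line)
        if rec is None:
            continue
        seq, ts, msg = rec
        if "OEMInit: Coldboot" in msg:  # marker string as it appears in the excerpt
            events.append((seq, "cold boot"))
        if prev_ts is not None and ts < prev_ts:
            events.append((seq, "clock moved backward"))
        prev_ts = ts
    return events


sample = [
    "7, 2012-06-14, 05:06:30, 00000000h, 00000002h, OEMInit: Coldboot ",
    "37, 2012-06-14, 05:06:30, 00400002h, 00410002h, OALIoCtlHalPostInit ",
    "38, 2009-01-01, 00:00:03, 00400002h, 005d0002h, RTLog: RTLog_KernReportSysHalInitReg ",
]
events = scan_rtlog(sample)
for seq, label in events:
    print(f"entry {seq}: {label}")
```

A clock that jumps back to 2009-01-01 is consistent with the customer's earlier remark that the date changed to 2009 when the battery was pulled. Only the cold boot marker appears in the excerpt; what a warm boot logs is not shown, so the sketch does not try to detect one.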



David Meyer

This RTLog looks like a lockup on suspend.  The device starts suspending but does not finish suspending.  I don't see any reason to suspect WiFi in this case.  You should work to find a reproduction scenario for this, and then an SPR can be opened.


David Meyer

My comment was for the first log.  The 6/14 logs don't show a true lockup, as you can see logs when the user put the device in a cradle, and it also responded to a Warmboot request (user did not cold boot -- just warm boot).


Glenn Sobel

The customer stated that when the device is in this "lockup state"
he warm boots the device, but the device cold boots, in that all applications and settings are deleted.
I am going to ask the customer to change one device so it never suspends, and to send me the logs to see whether the device still loses connection to his AP.
I have a device here on my desk with the same OS, Fusion, and the customer's app loaded. It cannot connect to his server, but I am running his app and changed the suspend setting to 1 minute to try to reproduce the issue, which is now my biggest problem: he has failures every day and I can't duplicate it.


Glenn Sobel

Dave,
I am losing ground here and the customer is being impacted.
Another issue just occurred and the customer lost 2 hours of collected data.
The issue occurred while the user was using the device.

(eMscriptSL02-06142012-1.zip)
" Had another issue
This time one of the units that was set to stay on all the time. The unit never locked up  but I could not get the radio to turn on even though it was not turned off. I was able to use the warm boot option from start button. It still did a cold boot.
I am starting to get a lot of pressure on these issues. The unit that just locked up had two hours of work on it that will now need to be recreated.
I’m attaching the logs as well as the wireless log.

Help"

I asked for reproduction steps, customer responded
"The user was just doing his work out in the whse. Device was acting normal. He went to sync the unit up and it would not. That was when he gave it to me. I tried to also it would not come on"

On this device I told the customer to remove the suspend setting and set it to never suspend.
He said he set it at 10am. The customer stated the user uploaded data into their database at 9:53, 10:04, 10:12, and 10:22. The resource log stopped logging at 10:25. At about 1:00pm the customer tried to upload to his database and could not, even though the WiFi radio was on. He could not ActiveSync the device either. He warm booted, and the device cold booted.


I asked
So basically it is batch scanning..
Then at some point send data wirelessly to your server?
Is that correct?
The issue did not occur until the user tried to send the data?
he replied "Correct. The wireless is always on. But we batch send.

I am not saying that the issue started when he went to send. It happened sometime between the last time he sent and when he just tried to send"

I cannot nail down any reproduction steps. This is the 4th device in less than 24 hours, and I am not sure where to go from here.

(eMscriptSL06-06132012.zip)
The customer sent me another log and I went over it step by step. At 5:02 the device was working.
The customer did a warm boot (as I instructed him to do for all of his devices); no issues.
The customer opened the radio and was connected; he synced to his database OK.
The device was left in the cradle around 6:15. Two hours later the customer checked the device and it was unresponsive; he had to warm boot, which cold booted the device.

Any suggestions would be appreciated.
I have not been able to duplicate the issue on my device. I have his app loaded but cannot log in to his server. It seems to me the issue may be related to uploading data to his server;
it seems to happen some time after that occurs.
His database is an Access database.



Glenn Sobel

clarification about S06 device last lockup..

Correct me if I am wrong,

For S06, you stated that you

  • came in to the office and verified the device was functional
    [Customer: Correct]
  • you warm booted the device
    [Customer: Correct]
  • you checked the radio, it was enabled and you were connected to your network
    [Customer: Correct]
  • you submitted a sync command to your server
    [Customer: The Server is always sitting there listening, When you press the Sync button on the handheld is what initiates the Sync]
    (did you verify that the sync was completed successfully or was it still syncing before you placed the device in to the cradle?)
    [Customer: When I press the Sync I get a confirmation that it made the connection. It does not have to send information to get this]
  • you placed the device in the cradle
  • the device was left for 2 hours before you realized that it was not responding
    [Customer: The time that this unit was sitting there it was not in the cradle]

If this is true, I would like to see if you can repeat these steps and see if you can duplicate the issue again. If you can duplicate this every time… then we will start to focus on removing each step until the issue is not reproducible any more.

I know you are busy working, and I do not wish to impact you or your business. Is this possible? I appreciate your help…
[Customer: The steps above are what I do every morning. I do not know why yesterday was any different. These units sit alone unattended often. What is happening, from what I can see so far, is totally random. I understand what you are saying about recreating the steps but I don’t know what they are. Can the engineers not see anything in the logs?]


