MC7596 Lockup Issue

Case 2051779: multiple units are locking up. This happens while the units are cradled, on the shelf, or in use. The issue is intermittent. Along with the customer's app, the customer is using AppCenter 941. OS: WM 6.1, loader image 1.27.03.

TA sent an eMscript log of a lockup that occurred 01/12/2010 at 11:59 pm.
In the log at that time (line 580):
the foreground process shows services.exe,
the foreground window shows "Notification Error [Dialog]"

Attaching the eMscript logs; I need help with the analysis.

George Dellaratta
Please define "lock-up"?  Does ActiveSync work?  What error dialog was displayed at the time of failure?


Glenn Sobel
Attaching document from TA on testing performed


Jarod Fox
From the retail message log, it appears that a call into shell32.exe caused a stack error, ultimately causing ScheduleColdBoot.exe to crash:

Fatal Stack Error, Terminating thread 86863000, pexi = 0d92fd24
Exception 'Data Abort' Thread=86863000 AKY=00001021 PC=804353b4 BVA=0d92fd4c
 R0=0d92fd5c  R1=80401c08  R2=0d930080  R3=00000180
 R4=0d930080  R5=80401c08  R6=0d930130  R7=86863000
 R8=0d930224  R9=00000004 R10=1b93fe90 R11=80402444
R12=0d92fd5c  SP=0d92fd5c  Lr=80435a20 Psr=6000001f
CertVerify: \Program Files\Afaria\XCEScheduler.exe trust = 2
CertVerify: \Program Files\Afaria\XCEScheduler.exe trust = 2
CertVerify: \Program Files\Afaria\XCEScheduler.exe trust = 2
CertVerify: \Program Files\Afaria\XeClientCe.dll trust = 2
Data Abort: Thread=86863000 Proc=806a91f0 'shell32.exe'
AKY=ffffffff PC=8041c9fc(NK.EXE+0x0001c9fc) RA=00000001(???+0x00000001) BVA=0c000040 FSR=00000007
Exception 'Data Abort' Thread=86863000 AKY=ffffffff PC=00000000 BVA=0c000040
 R0=00000019  R1=e68b37aa  R2=0d93fb60  R3=00000000
 R4=0d93fb50  R5=80420d90  R6=00000001  R7=804218ac
 R8=00000000  R9=01a8f5fc R10=0d93fb0c R11=00000000
R12=806a8d40  SP=ffffffff  Lr=00000000 Psr=2000001f
RaiseException: Thread=85d0c808 Proc=806a96a0 'ScheduleColdBoot.exe'
AKY=00000401 PC=8040cd84(NK.EXE+0x0000cd84) RA=8040a0b0(NK.EXE+0x0000a0b0) BVA=00000001 FSR=00000001

If we had the kdmp file, it might be possible to isolate this further.
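
To make the sequence above easier to see in a long retail message log, here is a minimal Python sketch that pulls out only the fault header lines and the process each one names. The function name `find_faults` and the regular expression are illustrative assumptions based on the line shapes in this excerpt, not part of eMscript or any Motorola tool.

```python
import re

# Matches fault header lines shaped like the ones in the excerpt above, e.g.
#   Data Abort: Thread=86863000 Proc=806a91f0 'shell32.exe'
#   RaiseException: Thread=85d0c808 Proc=806a96a0 'ScheduleColdBoot.exe'
# The pattern is an assumption derived from this log only.
FAULT_RE = re.compile(
    r"^(?P<kind>Data Abort|RaiseException): "
    r"Thread=(?P<thread>[0-9a-fA-F]+) "
    r"Proc=(?P<proc>[0-9a-fA-F]+) "
    r"'(?P<name>[^']+)'"
)


def find_faults(lines):
    """Return (kind, thread, process name) for each fault header line."""
    faults = []
    for line in lines:
        m = FAULT_RE.match(line.strip())
        if m:
            faults.append((m.group("kind"), m.group("thread"), m.group("name")))
    return faults


# Lines copied from the retail message log quoted above.
sample = [
    "Fatal Stack Error, Terminating thread 86863000, pexi = 0d92fd24",
    "Exception 'Data Abort' Thread=86863000 AKY=00001021 PC=804353b4 BVA=0d92fd4c",
    r"CertVerify: \Program Files\Afaria\XCEScheduler.exe trust = 2",
    "Data Abort: Thread=86863000 Proc=806a91f0 'shell32.exe'",
    "RaiseException: Thread=85d0c808 Proc=806a96a0 'ScheduleColdBoot.exe'",
]

for kind, thread, name in find_faults(sample):
    print(f"{kind}: thread {thread} in {name}")
```

In this excerpt the thread that hit the fatal stack error (86863000) is the same thread that later data-aborts in shell32.exe, which is what the analysis above relies on.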


Glenn Sobel
Customer (Delta Airlines) is continuing to test.
The customer stated he has only seen the device lock up in the cradle.
A warm boot or cold boot recovers the device.
A lockup means the device is totally unresponsive.

Software loaded on the device:
launch pad - security software (same as AppCenter - approves icons)
AppCenter - also loaded, to disable the phone keys and ActiveSync
NetMotion
Afaria client - kicks off XCEScheduler.exe, which tries to remotely reboot the device
ScheduleColdBoot.exe - automatically cold boots the device at a certain time; it was added after the lockups began, to reduce lockups

Lockups occur anywhere from 1 to 8 days apart. The customer has always seen the lockup happen in the cradle, and the times have varied.
= = = =
The customer has sent me another eMscript log. The lockup has been documented to occur 01/19/2010 at 10:46 pm (lines 1259 - 1261) in the resource.csv file.
The customer sent a picture of the error message shown on screen at the time of the lockup:
"Can not execute "\Program Files\ Afaria\XCEScheduler.exe""
I have requested that the customer ask Afaria for more info on this app,
specifically a better description of what it does and whether it makes a call to shell32.exe.
Preliminary analysis of the difference between a device that has the issue and one that does not (from a loaded-software aspect): the device that locks up has AppCenter and the Afaria client loaded on it. I requested that the customer test with both applications removed.
The RIL version on the devices may also be a contributing factor.
While the customer stated he will do this, he is asking whether the eMscript log can be analyzed.
Please see the attached files.
Please let me know if anything in them points to the cause.
Below is a comparison of two devices: one that does not lock up and one that does.
====================
One device that does not lock up (I had the customer read off the programs listed in Remove Programs):
MC7596-PZCSKRWA9WR - SN 8303520800509 - symbol manage class lib, ilium screen capture, dal launch pad, ms.net cf20 enu string, dal ret component base name, dal btprint 1.0.2.1, dal roving agent 1.3.5, motorola spr 16841_wt40xx_ce50, delta airlines update battery status, delta in sched cold boot, F5 networks inc F5 SSL Agent.

= = =
One device that does lock up:
MC7596-PKCSKQWA9WR - SN 9071520800535 - afaria client, dal launch pad, dal reset nmpw 1.0, dal roving agent 1.3.5, dal btprint 1.0.2.0 stratix roving agent entries..
ianywhere afaria remote control, netmotion ability exe, delta in sched cold boot, ilium screen capture, motorola inc emscript, ms.net cf20 enu string, odyssey appcenter
====
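
The two program lists above can be compared mechanically. The following is a minimal Python sketch using set differences. The entries are lowercased copies of the names as the customer read them off, with spelling normalized. Splitting "dal btprint 1.0.2.0 stratix roving agent entries" into two items is a guess on my part, and the variable names are purely illustrative.

```python
# Programs on the unit that does NOT lock up (SN 8303520800509).
no_lockup = {
    "symbol manage class lib",
    "ilium screen capture",
    "dal launch pad",
    "ms.net cf20 enu string",
    "dal ret component base name",
    "dal btprint 1.0.2.1",
    "dal roving agent 1.3.5",
    "motorola spr 16841_wt40xx_ce50",
    "delta airlines update battery status",
    "delta in sched cold boot",
    "f5 networks inc f5 ssl agent",
}

# Programs on the unit that DOES lock up (SN 9071520800535).
# Assumption: "dal btprint 1.0.2.0" and "stratix roving agent entries"
# are two separate entries in the original list.
locks_up = {
    "afaria client",
    "dal launch pad",
    "dal reset nmpw 1.0",
    "dal roving agent 1.3.5",
    "dal btprint 1.0.2.0",
    "stratix roving agent entries",
    "ianywhere afaria remote control",
    "netmotion ability exe",
    "delta in sched cold boot",
    "ilium screen capture",
    "motorola inc emscript",
    "ms.net cf20 enu string",
    "odyssey appcenter",
}

only_on_locking = sorted(locks_up - no_lockup)  # candidates to remove first
only_on_stable = sorted(no_lockup - locks_up)   # e.g. a fix present on one unit only
common = sorted(locks_up & no_lockup)           # unlikely to explain the difference

print("Only on the unit that locks up:")
for name in only_on_locking:
    print("  " + name)
print("Only on the unit that does not lock up:")
for name in only_on_stable:
    print("  " + name)
```

Note that eMscript itself shows up only on the locking unit because it was loaded there for logging, and that the two units also differ in btprint version and in the Motorola SPR entry, so the difference is wider than AppCenter and Afaria alone.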



Glenn Sobel
FYI, the error and log were generated after updating to the latest RIL.


Anonymous (not verified)
Glenn,

The retail message logs also show two exceptions right before the failure:

eMscript TimeStamp: 1/19/10 10:45:22 PM
CertVerify: \Program Files\Afaria\XCEScheduler.exe trust = 2
CertVerify: \Program Files\Afaria\XeClientCe.dll trust = 2
CertVerify: \Program Files\Afaria\XCEScheduler.exe trust = 2
CertVerify: \Program Files\Afaria\XeClient.Exe trust = 2
CertVerify: bde.dll trust = 2
Data Abort: Thread=86322000 Proc=806a92e0 'gwes.exe'
AKY=00001041 PC=00048a2c(gwes.exe+0x00038a2c) RA=00048a24(gwes.exe+0x00038a24) BVA=0e000001 FSR=000000f3
TLSKERN_NOFAULT set... bypassing kernel debugger.
CertVerify: XeInvMgr.dll trust = 2
CertVerify: \Temp\warmboot.exe trust = 2
+EnableKeybrd 0
Data Abort: Thread=874db730 Proc=806a8f20 'device.exe'
AKY=00000005 PC=03d1b490(ohci2.dll+0x0000d490) RA=03d1b364(ohci2.dll+0x0000d364) BVA=06000000 FSR=00000007

We would need to check whether there are any .kdmp files in \windows\system\dumpfiles (make sure "show all files" is selected). If you can grab those from a failure case, that may give us a hint about what is going on here.

They would need to use Remote File Viewer (from Visual Studio) or similar to pull these files off, as they are marked as system files.
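
Until a .kdmp file is available, the module and offset recorded with each fault are the most specific location information in the retail message log. As a rough aid, here is a small Python sketch that extracts them from a line such as the ohci2.dll one above. The pattern and the name `fault_locations` are assumptions based only on the format seen in this log.

```python
import re

# Matches fields such as  PC=03d1b490(ohci2.dll+0x0000d490)
# and                     RA=00000001(???+0x00000001)
# The format is inferred from this log excerpt only.
LOC_RE = re.compile(
    r"\b(PC|RA)=([0-9a-fA-F]{8})\(([^()+]+)\+0x([0-9a-fA-F]+)\)"
)


def fault_locations(line):
    """Return (register, module, offset) for each PC/RA field in a line."""
    return [
        (reg, module, int(offset, 16))
        for reg, _address, module, offset in LOC_RE.findall(line)
    ]


# Line copied from the retail message log quoted above.
line = (
    "AKY=00000005 PC=03d1b490(ohci2.dll+0x0000d490) "
    "RA=03d1b364(ohci2.dll+0x0000d364) BVA=06000000 FSR=00000007"
)

for reg, module, offset in fault_locations(line):
    print(f"{reg}: {module} + {offset:#x}")
```

Collecting these pairs across several failure logs would show whether the same module and offset keep recurring, which is worth knowing before asking the customer for more captures.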


Herbert De Jong
Glenn, you mention that AppCenter is in the picture here. On phone devices there is something specific needed to overcome an issue with the Today screen.
This is not explained very clearly in the AppCenter admin guide, but it has resolved AppCenter lockups in a couple of cases.

Maybe it is worth a try.

1) Install 2.0 build 941 on a unit and let it create the default Appcenter.cfg file.

This is because build 941 creates some additional default settings.

You can recognize a CFG file from build 941 by the version number at the top.

2) Also add the following to the CFG for WM6.1 phone devices:

additional 'Approved Window Titles' for Shell32

Program Shell32 ApprovedWindow *|DesktopExplorerWindow

The AppCenter 941 admin guide puts it this way:

For the Palm Treo, also add *|DesktopExplorerWindow to the approved

Read "all WM devices" where it says "Treo".

3) Add your own settings, either manually or through the GUI.

One of the new default settings in build 941 is the TodayScreenInterval setting, which is explained in AppCenter PPC 2.0 Release Notes (Build 941).pdf:


Version v2.0 Build 941

1. AppCenter now supports connectivity icons for Edge, 3G, HSDPA and EVDO. Previously only GPRS and 1XRTT were supported.


2. The new TodayScreenInterval setting allows control over previously hard coded behavior that fixes a Windows bug. On WM6 devices with phones, the phone app doesn't create its dialog windows until the Today screen displays. If an app such as AppCenter or Pocket IE is configured to start with the device, the Today screen often doesn't display and the green phone button then can't display the missing phone dialog. When TodayScreenInterval is non-zero AppCenter brings the Today screen to the foreground after the specified number of seconds. AppCenter only does this on WM6 devices with phones and only when the phone dialogs are missing. Unless the Today screen is approved, AppCenter should immediately send it back to the background. The default is 10 seconds. A setting of 0 disables this behavior.


3. AppCenter now disables the Today screen when it is running.


4. AppCenter now adds the version number to the config file.



Glenn Sobel
Can ECRT please look at this eMscript Log.
The issue occurred between line 1454 & 1455 in the Resource.csv file.
This is the email from the customer.

There was no \windows\system\dumpfiles folder found on the device.

Here’s what I also noticed about the device in general:

- The device will not attempt to Active Sync
- AppCenter clock frozen at 3:29am (01/30/10)
- Touch Screen is non-responsive
- The right Radio Status LED is blinking green
- The HSDPA data connection indicates an active session according to the AppCenter indicator
- The AppCenter signal strength indicator shows a strong (4 bar) signal
- The AppCenter Battery status indicates that the battery is being charged, even with the unit outside of the cradle.
- The MC75 will not power down in or out of the cradle.
- In the charger, the center Power LED blinks then turns solid indicating that the device recognizes external power for charging.
- The Keyboard is locked, as the blue and orange shift keys have no effect.
- The MC75 will warm boot.
- The last successful Afaria Connection occurred at 5:28:17 on 1/28/10.

David Dwyer
Sr. Systems Engineer
Stratix Corporation



David Meyer
I took a look at the log, and nothing jumps out at me. The battery voltage, memory usage, CPU utilization, etc., all looked normal at the time of failure.


Glenn Sobel
Found this in retailmsg.txt:

eMscript TimeStamp: 1/30/10 3:28:57 AM
CertVerify: \Program Files\Afaria\XCEScheduler.exe trust = 2
CertVerify: \Program Files\Afaria\XCEScheduler.exe trust = 2
CertVerify: \Program Files\Afaria\XCEScheduler.exe trust = 2
CertVerify: \Program Files\Afaria\XCEScheduler.exe trust = 2
CertVerify: \Program Files\Afaria\XCEScheduler.exe trust = 2
CertVerify: \Program Files\Afaria\XCEScheduler.exe trust = 2
CertVerify: \Program Files\Afaria\XCEScheduler.exe trust = 2
CertVerify: \Program Files\Afaria\XCEScheduler.exe trust = 2
CertVerify: \Program Files\Afaria\XCEScheduler.exe trust = 2
CertVerify: \Program Files\Afaria\XCEScheduler.exe trust = 2
CertVerify: \Program Files\Afaria\XCEScheduler.exe trust = 2
CertVerify: \Program Files\Afaria\XCEScheduler.exe trust = 2
Fatal Stack Error, Terminating thread 8697bbd0, pexi = 0d92ff24
Exception 'Data Abort' Thread=8697bbd0 AKY=00008021 PC=804353b4 BVA=0d92ff4c
 R0=0d92ff5c  R1=80401c08  R2=0d930280  R3=00000180
 R4=0d930280  R5=80401c08  R6=0d930330  R7=8697bbd0
 R8=0d930424  R9=00000004 R10=2193fe90 R11=80402444
R12=0d92ff5c  SP=0d92ff5c  Lr=80435a20 Psr=6000001f
CertVerify: \Program Files\Afaria\XCEScheduler.exe trust = 2
Windows CE Kernel for ARM (Thumb Enabled) Built on Jul 19 2008 at 13:10:02
ProcessorType=0411  Revision=7
sp_abt=ffff1000 sp_irq=ffff0800 sp_undef=ffffc800 OEMAddressTable = 80409ab4
CPLD Ver = [3.11]


Glenn Sobel
Can you confirm that my assumption is correct?

Is this the correct indicator of the device locking up?

CertVerify: \Program Files\Afaria\XCEScheduler.exe trust = 2
Fatal Stack Error, Terminating thread 8697bbd0, pexi = 0d92ff24
Exception 'Data Abort' Thread=8697bbd0 AKY=00008021 PC=804353b4 BVA=0d92ff4c

Is there any way to determine from the logs whether one piece of software alone is the issue, or whether a combination of applications is causing it?
I am trying to get the customer to reduce complexity, but they have been reluctant up to this point.


Anonymous (not verified)
Glenn,

The retail message log again indicates a fatal stack error around the failure time, which is most likely to blame for the lockup.

What I would recommend is to start eliminating components (applications) one by one to see what is contributing to these stack errors and other exceptions the log shows.


Jarod Fox
It appears XCEScheduler is running 12 times in less than a minute, which seems to kick off the fatal stack error. Is there a reason it is running so many instances?

eMscript TimeStamp: 1/30/10 3:28:57 AM
CertVerify: \Program Files\Afaria\XCEScheduler.exe trust = 2
CertVerify: \Program Files\Afaria\XCEScheduler.exe trust = 2
CertVerify: \Program Files\Afaria\XCEScheduler.exe trust = 2
CertVerify: \Program Files\Afaria\XCEScheduler.exe trust = 2
CertVerify: \Program Files\Afaria\XCEScheduler.exe trust = 2
CertVerify: \Program Files\Afaria\XCEScheduler.exe trust = 2
CertVerify: \Program Files\Afaria\XCEScheduler.exe trust = 2
CertVerify: \Program Files\Afaria\XCEScheduler.exe trust = 2
CertVerify: \Program Files\Afaria\XCEScheduler.exe trust = 2
CertVerify: \Program Files\Afaria\XCEScheduler.exe trust = 2
CertVerify: \Program Files\Afaria\XCEScheduler.exe trust = 2
CertVerify: \Program Files\Afaria\XCEScheduler.exe trust = 2
Fatal Stack Error, Terminating thread 8697bbd0, pexi = 0d92ff24
Exception 'Data Abort' Thread=8697bbd0 AKY=00008021 PC=804353b4 BVA=0d92ff4c
 R0=0d92ff5c  R1=80401c08  R2=0d930280  R3=00000180
 R4=0d930280  R5=80401c08  R6=0d930330  R7=8697bbd0
 R8=0d930424  R9=00000004 R10=2193fe90 R11=80402444
R12=0d92ff5c  SP=0d92ff5c  Lr=80435a20 Psr=6000001f
CertVerify: \Program Files\Afaria\XCEScheduler.exe trust = 2
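
To check whether this burst pattern repeats across the other logs, here is a minimal Python sketch that counts how many consecutive CertVerify lines for XCEScheduler.exe come directly before each "Fatal Stack Error" line. It assumes each CertVerify line corresponds to one launch attempt, which the log alone does not confirm. The function name `certverify_runs_before_fault` is hypothetical.

```python
def certverify_runs_before_fault(lines, needle="XCEScheduler.exe"):
    """For each 'Fatal Stack Error' line, return the number of consecutive
    CertVerify lines mentioning `needle` that directly precede it."""
    results = []
    run = 0
    for line in lines:
        text = line.strip()
        if text.startswith("CertVerify:") and needle in text:
            run += 1             # extend the current burst
        elif text.startswith("Fatal Stack Error"):
            results.append(run)  # record the burst length at the fault
            run = 0
        else:
            run = 0              # any other line breaks the burst
    return results


# Rebuilt from the retail message excerpt quoted above.
sample = (
    ["eMscript TimeStamp: 1/30/10 3:28:57 AM"]
    + [r"CertVerify: \Program Files\Afaria\XCEScheduler.exe trust = 2"] * 12
    + ["Fatal Stack Error, Terminating thread 8697bbd0, pexi = 0d92ff24"]
)

print(certverify_runs_before_fault(sample))
```

For comparison, the 1/12 excerpt earlier in this thread shows the XCEScheduler CertVerify lines after its fatal stack error rather than before, so a count like this is a hint to raise with Afaria, not proof of cause.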


Glenn Sobel
Thank you for all the responses.
I will point out to the partner that the application is running 12 times in less than a minute, and suggest he contact that company for help in answering why.
Thank you again.