Augmented Reality Extended: Developing Enterprise Applications with Zebra Mobile Computers
Learn more about this topic and others at Zebra's APPFORUM in Las Vegas, Oct. 1-2, 2019.
By Pat Narendra, PhD
Innovator, Emerging Technologies
Zebra Technologies
Summary
Zebra’s TC52, TC57, TC72 and TC77 mobile computers are Augmented Reality enabled, as introduced in the companion Zebra YourEdge blog. This opens a vista of new use cases across key enterprise verticals—retail, warehousing, manufacturing, transportation and logistics, and healthcare—that Zebra’s ISV partners and developers can leverage to grow their business. In this article we’ll introduce how to develop enterprise applications with augmented reality frameworks.
Zebra is working with its developer and customer ecosystems to help create impactful augmented reality (AR) applications for these markets. Zebra also offers webinars, APPFORUM sessions, and examples that illustrate AR use cases in these key verticals.
Smartphone AR – A Quick Review
Creating an AR experience in the “camera viewfinder” means overlaying virtual 3D objects on the camera’s field of view: the virtual objects are projected onto the camera’s image plane in real time and composited over the live camera video.
To do this, we need to:
1. Develop a 3D model of the environment around the camera.
2. Track the camera pose (position and orientation) relative to the 3D model from step 1.
3. Anchor the virtual object to the 3D model from step 1.
4. Render the virtual object to the camera plane every frame and superimpose it on the camera feed.
The game changer in the recent smartphone AR frameworks is a mechanism to do steps 1 and 2 above with a single camera, utilizing the device’s Inertial Measurement Unit (IMU, the combination of a 3-axis accelerometer and a 3-axis gyro). Let us briefly dive into how it works (caveat: this is just an intuitive explanation, and implementations on different platforms will of course vary).
Imagine we had two stereoscopic cameras with a known baseline (distance between them) and relative orientation. By matching corresponding scene elements seen by the two cameras, we can get a depth map to various matched points (i.e., a point cloud) in the scene, using triangulation. Since this point cloud is nominally fixed in space, we can estimate the relative change in position and orientation of the camera at each instant.
But how is this done with just one camera? The answer lies in the gyroscope (which measures angular rotation rate around 3 axes) and the accelerometer (which measures linear acceleration in three directions), both sampling at more than 1000 samples per second. Now, if we took two video frames separated by, say, 1/10 of a second (3 frames apart at 30 fps), and assuming the camera had moved a few centimeters laterally in the meantime, we could do the stereo matching as above and get the depth map, right? But what about the unknown baseline (the distance between the two camera views) and the orientation difference? Simplistically, we can reconstruct this baseline and alignment between the two looks using the accelerometer data (roughly 100 samples over that 0.1 second) integrated twice and the gyro data integrated once. This is precisely why, at the beginning of every smartphone AR session, you are instructed to slowly wave the phone (in a lateral motion) pointing at the ground and other objects until you get feedback that the device is discovering and mapping its environment (such as the ground plane). It is also why each individual phone needs to be precisely calibrated to account for the placement of the IMU sensors and the camera relative to one another.
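To make the double-integration idea concrete, here is a tiny sketch (my own illustration, not any framework’s code). It ignores the hard parts real frameworks handle, such as gravity removal, sensor bias, and rotating the samples into a common frame using the gyro:

```kotlin
// Estimate the camera translation ("baseline") between two video frames by
// integrating accelerometer samples twice. Assumes the samples are already
// expressed in a common world frame with gravity and bias removed.
data class ImuSample(val t: Double, val ax: Double, val ay: Double, val az: Double)

fun estimateBaseline(samples: List<ImuSample>): Triple<Double, Double, Double> {
    var vx = 0.0; var vy = 0.0; var vz = 0.0   // integrate once: velocity
    var px = 0.0; var py = 0.0; var pz = 0.0   // integrate twice: displacement
    for (i in 1 until samples.size) {
        val dt = samples[i].t - samples[i - 1].t
        vx += samples[i].ax * dt; vy += samples[i].ay * dt; vz += samples[i].az * dt
        px += vx * dt; py += vy * dt; pz += vz * dt
    }
    return Triple(px, py, pz)                  // translation between the first and last sample
}
```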
Why is specific AR enablement necessary to support AR applications?
AR frameworks typically require calibration of the device’s sensors (camera and IMU). This is primarily because motion tracking is sensitive: it combines the camera image and the motion-sensor input to estimate the relative pose of the device over time. Each device type undergoes testing for compliance on the IMU, camera and platform features, including the processor and memory.
In typical smartphone AR frameworks, the heavy lifting of steps 1-4 above is implemented in the framework itself (usually as a service or a fragment which your app can extend or invoke).
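As one concrete illustration of what “extend or invoke a fragment” can look like, here is a minimal sketch assuming Google’s ARCore with the Sceneform UX library on Android; the class name WarehouseArFragment and the tap behavior are hypothetical, and other frameworks expose equivalent entry points.

```kotlin
import android.content.Context
import com.google.ar.sceneform.AnchorNode
import com.google.ar.sceneform.ux.ArFragment

// Minimal sketch, assuming ARCore + Sceneform: the framework fragment performs
// steps 1-4 (mapping, tracking, anchoring, rendering); the app only reacts to
// taps on detected planes and attaches its own content.
class WarehouseArFragment : ArFragment() {
    override fun onAttach(context: Context) {
        super.onAttach(context)
        setOnTapArPlaneListener { hitResult, _, _ ->
            // Anchor a node at the tapped point on the detected plane
            val anchorNode = AnchorNode(hitResult.createAnchor())
            anchorNode.setParent(arSceneView.scene)
            // Attach your renderable (a 3D model or an inflated 2D view) to anchorNode here
        }
    }
}
```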
AR-enabled Zebra Devices:
Now might be a good time to review the companion enterprise AR blog, where you will see a detailed overview of the use cases as well as the announcement of AR enablement on the Zebra portfolio.
TC52
TC72
TC57
TC77
Developing Enterprise AR applications with Zebra Mobile Computers
If you are creating an enterprise application with AR functionality integrated into an Android workflow, you should choose the Android AR frameworks.
Once you set up your development environment and work through the sample apps, you will be ready to tackle extending them to your own enterprise use cases.
Persistence is critical to a successful Enterprise AR app
In contrast to consumer applications (games, novelty and measure apps, for example), you will have to address the following challenges to provide a great user experience in Enterprise applications:
- The AR scene model needs to persist across sessions, time, users, and devices, and may even need to be reproduced in a totally different location! Most consumer applications of the AR frameworks are transient sessions. Once you put the device in your pocket, you may not be able to consistently recover the state of the augmented reality you had before. This is not an issue when you are trying out virtual furniture in your living room, but enterprise use cases require persistence, as we will see below.
- It is critical to make your AR solutions robust within the workflow constraints of the enterprise user. The enterprise associate needs to be able to seamlessly execute the function on demand, switch contexts and later return to the function without even being aware of the “AR” limitations underlying the app (like getting “lost” when the camera loses track).
Scene Model Persistence
The virtual models in the framework are represented as Node objects, which contain the local and world pose of the object, as well as the “renderable” attached to the node (derived from a 3D model or a 2D view). Each node has a parent (either the scene itself or another node). The local pose is relative to the parent. You can anchor a node to a plane (horizontal or vertical), a feature point, or just the scene itself, depending on your use case.
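Here is a minimal sketch of that structure, assuming Sceneform’s Node and AnchorNode classes; the anchor and the renderable passed in are placeholders supplied by the surrounding app.

```kotlin
import com.google.ar.core.Anchor
import com.google.ar.sceneform.AnchorNode
import com.google.ar.sceneform.ArSceneView
import com.google.ar.sceneform.Node
import com.google.ar.sceneform.math.Vector3
import com.google.ar.sceneform.rendering.Renderable

// Minimal sketch: a node anchored to a detected plane (or other anchor), with a
// child node placed 25 cm above it using a *local* pose relative to its parent.
fun placeLabel(arSceneView: ArSceneView, anchor: Anchor, labelRenderable: Renderable) {
    val anchorNode = AnchorNode(anchor)               // anchor from a plane hit, feature point, etc.
    anchorNode.setParent(arSceneView.scene)           // parent = the scene itself

    val labelNode = Node()
    labelNode.setParent(anchorNode)                   // parent = another node
    labelNode.localPosition = Vector3(0f, 0.25f, 0f)  // local pose, relative to the parent
    labelNode.renderable = labelRenderable            // a ModelRenderable (3D) or ViewRenderable (2D view)
}
```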
As we noted earlier, the “pose” of a node in an AR session is relative to a world coordinate system unique to that session (as long as AR is “tracking”). When you create a new session (or lose tracking and do not recover), this world coordinate system will have changed, and the node pose needs to be remapped from the old to the new coordinate system.
One method to create persistence is to build the virtual models in a hierarchical tree structure, at the root of which is a “Persistent Node”. When you build the scene, attach the scene model nodes to this Persistent Node as local poses relative to the Persistent Node (simply setting the Persistent Node as a node’s parent and then setting that node’s world pose in the current scene accomplishes this).
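A minimal sketch of the pattern, again assuming Sceneform; the root pose values and the renderable are placeholders:

```kotlin
import com.google.ar.sceneform.Node
import com.google.ar.sceneform.Scene
import com.google.ar.sceneform.math.Quaternion
import com.google.ar.sceneform.math.Vector3
import com.google.ar.sceneform.rendering.Renderable

// All content hangs off one root node, so only the root's world pose
// has to be recovered in a new session; every child follows automatically.
fun buildPersistentScene(
    scene: Scene,
    rootWorldPosition: Vector3,       // where the root sits in *this* session's coordinates
    rootWorldRotation: Quaternion,
    shelfLabelRenderable: Renderable  // hypothetical content
): Node {
    val persistentNode = Node()
    persistentNode.setParent(scene)
    persistentNode.worldPosition = rootWorldPosition
    persistentNode.worldRotation = rootWorldRotation

    val shelfLabel = Node()
    shelfLabel.setParent(persistentNode)                   // child of the persistent root
    shelfLabel.localPosition = Vector3(1.2f, 0.4f, -0.8f)  // fixed offset, independent of the session
    // (alternatively, set shelfLabel.worldPosition and the local pose is derived from the parent)
    shelfLabel.renderable = shelfLabelRenderable
    return persistentNode
}
```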
The next step is to persist the Node objects (the local pose, world pose, attached renderables, etc.) into a serializable object which can be stored and retrieved locally (shared preferences, SQLite, SD card, etc.) or in the cloud (Firebase storage, AWS, on-premises storage, etc.). The Node class is itself not serializable (since it is a closed class), but it is straightforward to deconstruct and reconstruct it into serializable properties. I found it useful to extend the Node class (adding the attributes necessary to characterize the specific renderable constructs I needed) and to create functions in the extended class that build the persistent attributes from the Node properties and vice versa. That way, you always have a persistable state of a Node available.
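One way to sketch this in Kotlin (PersistedNode, the renderable “key”, and the storage layer are all hypothetical and app-specific; I use extension functions here rather than a subclass, but either works):

```kotlin
import com.google.ar.sceneform.Node
import com.google.ar.sceneform.math.Quaternion
import com.google.ar.sceneform.math.Vector3

// A plain, serializable snapshot of a Node (store it with JSON, SQLite,
// Firebase, etc. as your use case dictates).
data class PersistedNode(
    val name: String,
    val localPosition: FloatArray,   // x, y, z relative to the parent
    val localRotation: FloatArray,   // quaternion x, y, z, w
    val renderableKey: String        // app-specific key used to rebuild the renderable
)

fun Node.toPersisted(renderableKey: String) = PersistedNode(
    name = name,
    localPosition = floatArrayOf(localPosition.x, localPosition.y, localPosition.z),
    localRotation = floatArrayOf(localRotation.x, localRotation.y, localRotation.z, localRotation.w),
    renderableKey = renderableKey
)

fun PersistedNode.toNode(parent: Node): Node {
    val node = Node()
    node.name = name
    node.setParent(parent)
    node.localPosition = Vector3(localPosition[0], localPosition[1], localPosition[2])
    node.localRotation = Quaternion(localRotation[0], localRotation[1], localRotation[2], localRotation[3])
    // Look up and attach the renderable for renderableKey here
    return node
}
```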
Geo Registration
When you are trying to restart the AR session and restore the persisted state of the virtual objects, you must locate the current pose of the “Persistent Node” to which you linked the entire virtual scene in the previous session. If you can do this, then you can place all the other nodes automatically, since as children of this node they will be correctly placed in the scene.
This represents the core challenge you will face in developing a successful enterprise AR experience. You will have to simultaneously navigate the following constraints:
- User friendliness: you need to make the reacquisition process as seamless as possible. Place yourself in your user’s shoes. Integrate this into their workflow. Not requiring any more motions or actions than what they are already used to is a winner.
- Infrastructure dependencies: Some environments will allow adding enablers to simplify the AR registration (large QR codes – see below) while others are extremely restrictive.
- Consistency: don’t depend on long-term tracking stability of an AR session. However good the experience is when you, a careful developer, run the demo in a controlled environment, you can be assured that a typical user will break it. You should minimize both the time for reacquisition and the length of the AR session between acquisition and process completion. You will expend a significant portion of your development effort on this task.
Following are some mechanisms at your disposal to “anchor” the Persistent Node to recover its pose from session to session:
Image Markers
Early AR frameworks depended entirely on image markers, and the new smartphone frameworks have extended them by integrating with the SLAM component described above. In a nutshell, an image marker is a planar image, large enough (typically 15-30 cm or more on a side) to be recognized by the smartphone camera, which then deduces its pose relative to the marker. In the context of the above task of reacquiring the pose of the Persistent Node, you can anchor the Persistent Node to the image anchor found by the framework when the camera is looking at the image marker. Every time you need to reacquire, you must train the camera on the image marker again.
A few things to remember about image markers.
- The marker images need to have distinct trackable reference points. By the way, most QR codes (meeting the minimum size requirement, of course) will satisfy this criterion.
- You need to build a database of the markers in your app and map the markers to the world space, either using the CAD layout of the space or using AR itself to generate the model (see the sketch after this list).
- The size of the marker is important. While a minimum size might be ~15 cm on a side, the larger the better, both for accuracy (Solid Geometry 101) and to be visible and tracked from afar. (I have found a 1-meter poster to be usable at about 6 meters head-on.)
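Here is a minimal sketch of the marker approach, assuming ARCore’s augmented-images API on Android; the marker name, bitmap, and the session/scene references are placeholders, and the Persistent Node’s stored local pose is assumed to be its offset from the marker.

```kotlin
import android.graphics.Bitmap
import com.google.ar.core.AugmentedImage
import com.google.ar.core.AugmentedImageDatabase
import com.google.ar.core.Frame
import com.google.ar.core.Session
import com.google.ar.core.TrackingState
import com.google.ar.sceneform.AnchorNode
import com.google.ar.sceneform.Node
import com.google.ar.sceneform.Scene

// 1) Register the marker image with its physical width (in meters).
fun configureMarkers(session: Session, markerBitmap: Bitmap) {
    val db = AugmentedImageDatabase(session).apply {
        addImage("aisle-12-endcap", markerBitmap, 0.30f)   // hypothetical marker, 30 cm wide
    }
    session.configure(session.config.apply { augmentedImageDatabase = db })
}

// 2) On each frame, re-anchor the Persistent Node when the marker is tracked.
fun onFrame(frame: Frame, scene: Scene, persistentNode: Node) {
    for (img in frame.getUpdatedTrackables(AugmentedImage::class.java)) {
        if (img.trackingState == TrackingState.TRACKING && img.name == "aisle-12-endcap") {
            val markerAnchor = AnchorNode(img.createAnchor(img.centerPose))
            markerAnchor.setParent(scene)
            // Re-parent the Persistent Node to the marker; its stored local pose is
            // its offset from the marker, so all child nodes fall back into place.
            persistentNode.setParent(markerAnchor)
        }
    }
}
```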
Depending on your use case, image markers may satisfy your geo registration need. But remember your system constraints while you consider this approach – especially as it requires changes to the infrastructure environment to implement this. For example, using this in a retail (or any consumer marketing sensitive environment) would be problematic. But a warehouse picking operation may well tolerate this. Your customer will tell you unambiguously.
Cloud Anchors
Some frameworks offer cloud anchors. A user points the device at the center of interest while moving around it to generate a sparse 3D map of the immediate environment. This map is uploaded to a cloud service. To reacquire, the user (or a different user) downloads the anchor and performs the inverse mapping process to relate the cloud anchor back to the current coordinate system, just as we did in the image marker example above.
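As a sketch of what this can look like with ARCore’s Cloud Anchors API (other frameworks differ; the local anchor, the saved anchor ID, and the scene references are placeholders, and the asynchronous state polling is elided):

```kotlin
import com.google.ar.core.Anchor
import com.google.ar.core.Config
import com.google.ar.core.Session
import com.google.ar.sceneform.AnchorNode
import com.google.ar.sceneform.Node
import com.google.ar.sceneform.Scene

// Hosting uploads a sparse feature map around the anchor; resolving maps the
// hosted anchor back into the *current* session's coordinate system.
fun enableCloudAnchors(session: Session) {
    session.configure(session.config.apply { cloudAnchorMode = Config.CloudAnchorMode.ENABLED })
}

// On the device that creates the scene:
fun host(session: Session, localAnchor: Anchor): String {
    val hosted = session.hostCloudAnchor(localAnchor)
    // Poll hosted.cloudAnchorState until SUCCESS before relying on the id.
    return hosted.cloudAnchorId          // persist this id for later sessions / other users
}

// In a later session (same or different device):
fun resolve(session: Session, scene: Scene, persistentNode: Node, savedCloudAnchorId: String) {
    val resolved = session.resolveCloudAnchor(savedCloudAnchorId)
    // Once resolved.cloudAnchorState reports SUCCESS, hang the Persistent Node off it.
    val anchorNode = AnchorNode(resolved)
    anchorNode.setParent(scene)
    persistentNode.setParent(anchorNode)
}
```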
Are cloud anchors right for your use case?
- Depending on the provider you may have a limited time window of persistence of the cloud anchor. In some cases, it is less than 24 hours.
- The acquisition timeline of a cloud anchor may be an eternity in a workflow-constrained use case.
- You have no control over what aspect of the scene to anchor on and what to register as transient. For example, there may be a person wearing a bright hat sitting in a chair in your field of view – clearly a transient feature amidst several persistent ones (shelves, etc.). You know that person will get up and walk away, but you have no control over whether your current cloud anchor will include that portion of the scene.
- Sending cloud anchor data (even in a “digested form”) to a cloud service may be antithetical to some enterprise customer policies of protecting their enterprise intelligence.
- If you are comfortable with creating your own “cloud anchor” you may try using CV techniques to build a cloud anchor overlay and integrate with the smartphone AR framework. If you are looking for examples of this, dig into “cross platform” spatial anchors.
Advanced Enterprise Enablers
Given the overwhelming importance of solving the persistence / geo registration challenges in creating successful enterprise AR experiences, Zebra is creating platform building blocks to overcome the above challenges across several enterprise use cases. Stay tuned for future posts here on the Zebra Developer portal.
Editor’s Note: We invite the Zebra developer community to join us at one of Zebra’s 2019 APPFORUM series events to learn more about Augmented Reality in the enterprise:
- October 1-2: Las Vegas, Nevada, US
Pat Narendra