Tagging Your World

Categories:
technology
Tags:
augmented reality, bionic eye, google, google goggles, layar, meta-tag, sekai camera, sixthsense, srengine, tagging the world

Though AR has been around for a while (an ACM special issue in 1993, with dedicated conferences beginning in 1997), it seems to be at a point where it is finally hitting the mainstream. A number of development libraries have existed for some time (ARToolKit, FLARToolKit, ARTag, D'Fusion @Home, Mixed Reality Toolkit (MRT), Unifeye Viewer Xtra, Mirage Builder, and Studierstube Tracker). But with the advent of the iPhone App Store and the Android Market, we are finally starting to see some practical applications that feel like a head start towards a meta-tagged world. I'll continually add more here and try to group them as I see fit. The basic premise of these applications is to let users create a digital layer over the physical one. I think the SixthSense video demo shows this concept brilliantly (though most of the demo is more concept than reality).

Using the camera to detect features in a scene:

(1) Google Goggles

(2) SixthSense (the demo is more concept than reality)

(3) SREngine

Augmented Reality Browsers (see a comparison of a few here):

(1) Layar

(2) WWSignpost

(3) Wikitude (though this seems to have a few more applications)

Using Wi-Fi hotspots, GPS, a digital compass, and an online POI database to localize:

(1) Bionic Eye

(2) Sekai Camera

If you managed to watch through the last video, you're probably still laughing as hard as I was. The panel brought up a few interesting points, though perhaps some of these came from a misunderstanding of how Sekai Camera works. I'm still not entirely sure about Sekai Camera's methods and haven't played with it myself, but after talking with Simon Biggs from the ECA it seems that everything is done via wireless communication. For instance, networked iPhones can talk to each other and know each other's locations via Wi-Fi (e.g. MAC addresses), Bluetooth, Bonjour, whatever. Then when your smart device finds a wireless access point, it knows that object is in view and starts localizing it based on the device's gyroscope and compass readings (for orientation) and signal strength (for distance). What they are trying to accomplish is paradigm-shifting: an entirely tagged world. This type of mentality leads one to tie the physical world to digital information rather than the digital to the physical. They are not the first to think of this idea, let alone attempt a solution (e.g. SixthSense).
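To make that orientation-plus-signal-strength idea concrete, here is a minimal sketch in Python of how a phone might place a tagged beacon. This is my own illustration, not anything Sekai Camera has published: distance comes from a log-distance path-loss model on RSSI, and direction comes from the compass bearing. The calibration constants (`TX_POWER_DBM`, `PATH_LOSS_EXPONENT`) are assumptions a real system would have to fit per environment.

```python
import math

# Assumed calibration constants -- real deployments fit these per environment.
TX_POWER_DBM = -40.0       # expected RSSI at 1 m from the beacon
PATH_LOSS_EXPONENT = 2.5   # ~2 in free space, higher indoors

def distance_from_rssi(rssi_dbm):
    """Log-distance path-loss model: rough range to a Wi-Fi/Bluetooth beacon."""
    return 10 ** ((TX_POWER_DBM - rssi_dbm) / (10 * PATH_LOSS_EXPONENT))

def locate_tag(phone_lat, phone_lon, compass_bearing_deg, rssi_dbm):
    """Project the estimated range along the compass bearing to guess the
    tagged object's position relative to the phone (flat-earth approximation)."""
    d = distance_from_rssi(rssi_dbm)            # metres
    bearing = math.radians(compass_bearing_deg)
    d_north = d * math.cos(bearing)
    d_east = d * math.sin(bearing)
    # Convert metre offsets to degrees (very rough, fine over tens of metres).
    lat = phone_lat + d_north / 111_111
    lon = phone_lon + d_east / (111_111 * math.cos(math.radians(phone_lat)))
    return lat, lon, d

# Example: a beacon heard at -67 dBm while the phone faces roughly north-east.
print(locate_tag(55.9533, -3.1883, 45.0, -67.0))
```

Signal strength indoors is noisy enough that any single reading gives a coarse estimate at best, which is presumably why orientation sensors are needed at all.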

It reminds me of my undergraduate research, where we advocated for a peer-to-peer or ad-hoc networking solution rather than an infrastructure one. The idea was simple: why does a cell phone signal have to go from my phone all the way to a cell tower if I just want to call someone next door? Can't the signal just go next door? Suddenly, every cell phone becomes a tower (or node) and I no longer have any worries about signal strength. We showed that in highly populated scenarios, such as evacuations or crowds leaving concerts, infrastructure signals are unable to localize cell phones, whereas an ad-hoc solution grows in power from the sheer number of additional nodes. So why hasn't this been adopted? It seemed to be one of the criticisms brought against Sekai Camera as well: what if people don't want to adopt Sekai Camera's infrastructure?
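As a toy illustration of why more nodes help (my own sketch, not code from that research project): if each nearby phone reports a range to a target, localization becomes a least-squares problem where every extra neighbour adds another equation.

```python
import numpy as np

def trilaterate(node_positions, ranges):
    """Least-squares position estimate from ranges to known neighbour nodes.
    Linearises the circle equations against the first node; every node beyond
    three adds an extra equation, so the estimate improves as neighbours join."""
    p = np.asarray(node_positions, dtype=float)   # shape (n, 2)
    r = np.asarray(ranges, dtype=float)           # shape (n,)
    x0, y0, r0 = p[0, 0], p[0, 1], r[0]
    A = 2 * (p[1:] - p[0])                        # (n-1, 2)
    b = (r0**2 - r[1:]**2
         + p[1:, 0]**2 - x0**2
         + p[1:, 1]**2 - y0**2)
    est, *_ = np.linalg.lstsq(A, b, rcond=None)
    return est

# Four phones acting as nodes, each reporting a (noisy) range to the target.
nodes = [(0, 0), (10, 0), (0, 10), (12, 9)]
ranges = [7.1, 7.1, 7.1, 8.1]
print(trilaterate(nodes, ranges))   # roughly (5, 5)
```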

It may be time to start thinking about objects not as simple physical devices but as something smart(er). QR codes, fiducials, RFID, Wi-Fi, Bluetooth, and the like are becoming ever more prevalent in the simplest of devices, and objects are no longer purely physical. They can carry digital information as part of the object itself rather than having it detected after the fact. Just think about your shopping experience and how many objects already have digital information in the form of an RFID tag. People are getting RFIDs implanted under their skin, tied to their bank accounts. We are sentient beings in the making…

But what if your device isn't smart? What Sekai Camera seems to lack is further localization via the camera (as is done solely in Google Goggles via OCR and in SREngine via static scene recognition). Computer vision methods also let you detect and track interest points, fiducial markers, and QR codes. On top of that, smart devices are equipped with additional sensors such as gyroscopes, digital compasses, accelerometers, haptic surfaces, and proximity sensors. Combine all of this and it seems like there is almost too much information. In fact, our phones are starting to look more and more like living entities in our pockets…
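Here is a minimal sketch of the camera-side cues I'm talking about, using OpenCV in Python. This is purely illustrative (not what Google Goggles, SREngine, or Sekai Camera actually run): it pulls a QR payload out of a frame if one is present, and extracts generic interest points that a recognition back-end could match against a database of tagged scenes.

```python
import cv2

def describe_frame(frame_bgr):
    """Extract camera-based cues from a single frame: any QR code payload,
    plus ORB interest points for scene matching."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)

    # QR codes: an object can announce its own tag directly.
    data, points, _ = cv2.QRCodeDetector().detectAndDecode(gray)

    # Interest points: features a recognition service could match against
    # an online database of tagged scenes or objects.
    orb = cv2.ORB_create(nfeatures=500)
    keypoints, descriptors = orb.detectAndCompute(gray, None)

    return {
        "qr_payload": data or None,
        "qr_corners": points,
        "num_keypoints": len(keypoints),
        "descriptors": descriptors,
    }

if __name__ == "__main__":
    cap = cv2.VideoCapture(0)          # webcam as a stand-in for a phone camera
    ok, frame = cap.read()
    if ok:
        info = describe_frame(frame)
        print(info["qr_payload"], info["num_keypoints"])
    cap.release()
```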

I find the mixed reality layer, or augmented physical layer, or whatever you want to call it, something researchers and developers need to start thinking about (and certainly many already have, with projects like SeaDragon, Google Goggles, and SixthSense), especially if we want to feel connected to our physical world again. We are moving to a future where computing no longer needs to sit on our desks but can occur anywhere. As such, we need to start thinking of computing as a process coupled with space, where the space can be anywhere – the sidewalk, your living room, your kitchen – in a similar manner to Clark and Chalmers' thesis of the extended mind, or active externalism, i.e. the mind and environment as actively coupled systems.

One problem these situated AR methods may face in detecting un-smart objects is: what if the world changes (and it does)? Let's say I have managed to tag every possible un-smart object in the world and my phone is capable of localizing each object based on the information in my online database. Then I can walk up to my desk and know that my cup should be there. My interest point detector will look for something that resembles a cup and voilà, I am able to track the cup on my desk. But what if I move that cup to the kitchen and my phone doesn't see this happen?
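Here is roughly what that database-driven lookup might look like, and where it breaks. This is a hedged sketch, not any particular product's pipeline: the object's stored ORB descriptors (imagined as coming from the online POI database) are matched against the current frame, and when the cup has wandered off to the kitchen the match count collapses. The threshold values are assumptions.

```python
import cv2
import numpy as np

MIN_MATCHES = 25  # assumed threshold; below this we declare the object missing

def find_tagged_object(stored_descriptors, frame_bgr):
    """Match an object's stored ORB descriptors against the current camera
    frame; report whether the object still appears to be in view."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    orb = cv2.ORB_create(nfeatures=1000)
    _, frame_descriptors = orb.detectAndCompute(gray, None)
    if frame_descriptors is None:
        return False, 0

    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(stored_descriptors, frame_descriptors)
    good = [m for m in matches if m.distance < 50]   # loose quality cut-off
    return len(good) >= MIN_MATCHES, len(good)

# Hypothetical usage, with descriptors for "my cup" fetched from the database:
# stored = np.load("cup_descriptors.npy")
# present, n = find_tagged_object(stored, camera_frame)
# print("cup on desk" if present else f"cup missing ({n} matches)")
```

The database only ever knows where the cup was last seen, which is exactly the staleness problem described above.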

Time to start throwing away the old cups, folks.