@INPROCEEDINGS{mannacmmm96, author = "Mann, Steve", title = "`Smart Clothing': Wearable Multimedia and `Personal Imaging' to restore the balance between people and their intelligent environments", pages = "163--174", month = "Nov. 18-22", year = 1996, address = "Boston, MA", publisher = "Proceedings, {ACM} Multimedia 96", }
Steve Mann
http://www.wearcam.org/
MIT Building E15-389;
author currently with University of Toronto
10 King's College Road, Room 2001,
Tel. (416) 946-3387; Fax. (416) 971-2326
Current portable computers and PDAs fail to truly become part of our daily lives in the sense that we need to stop what we are doing and expend conscious effort to use them. They also do not have the situational awareness that they should have: while they are not being explicitly used, they are unable to remain attentive to possible ways to help the user.
Environmental technology in the form of ubiquitous computing, ubiquitous surveillance, and smart spaces, has attempted to bring multimedia computing seamlessly into our daily lives, promising a future world with cameras and microphones everywhere, connected to invisible computing, always attentive to our every movement or conversation. This raises some serious privacy issues. Even if we ignore these issues, there is still a problem of user-control, customization, and reliance on an infrastructure that will not (and probably should not) become totally ubiquitous.
In response to these problems, a personal, wearable, multimedia computer, with head-mounted camera(s)/display, sensors, etc. is proposed for use in day-to-day living within the surrounding social fabric of the individual. Examples of practical uses include: face identification (memory aid for names), way-finding via sequences of freeze-frames, shared visual memory/environment maps, and other personal note-taking together with visual images.
Anecdotal personal experiences, over several years of use, are reported, and privacy issues are addressed, in particular, with a discussion of how personal `smart clothing' has counteracted or at least reached a healthy balance with environmental surveillance.
KEYWORDS augmented reality, mediated reality, ubiquitous computing, smart spaces, video surveillance, mobile multimedia, wearable computing, personal imaging, video orbits, pencigraphic image compositing.
PROBLEM STATEMENT The advent of the personal computer brought computers closer to us -- from distant computer rooms of the mainframe era right to our desks. However, computing still remains accessible only when we're sitting at our desks, and is thus not really part of our personal day-to-day living.
Later, portable computing made it possible to carry this personal environment in a briefcase or pocket. However, laptop computers and PDAs fail to become part of our daily lives in the sense that we need to stop what we are doing to use them. They are far from providing a seamless interaction in the context of day-to-day living.
Other devices that we often carry, such as cellular telephones, pagers, wristwatches, personal sound systems, tape recorders, camcorders, and perhaps pocket calculators or measurement instruments such as pocket multimeters duplicate much of the same functionality many times over, and don't communicate with each other. These items are far more encumbering than would be a single item that performed all of the tasks that each was meant to do.
Other developments, such as ubiquitous computing [1] have attempted to bring computing seamlessly into our daily lives. Ubiquitous multimedia computing and smart spaces [2][3][4] would seem to suggest a future world in which we're surrounded with computing, as well as cameras, microphones, and other forms of perceptual intelligence during all facets of our daily lives.
There are two problems with smart spaces: (1) Not all environments will ever be so equipped. Even if most spaces were equipped with cameras and microphones, they may not serve our needs or be of direct benefit to us, as those who would install the infrastructure could not possibly predict our needs and exact preferences. Even if we desired that they know everything that could be known about us (and provided as much information as possible to them), it is still doubtful that their systems would totally eliminate the need for personal technology. (2) There is no guarantee that the organizations behind the infrastructure would not put their needs before ours. In particular, the prospect of ubiquitous surveillance would no doubt be attractive to many organizations, and the temptation to use it for other purposes (besides helping the users of the space) suggests that we might not want a world in which our every movement and conversation is being monitored by an external entity.
Moreover, there are numerous well-documented examples in which organizations have abused their capability to covertly monitor their members. For example, it was recently reported that Dunkin Donuts was monitoring and recording conversations within their establishments, using hidden microphones, in addition to their surveillance cameras, and the use of hidden cameras in department store change rooms and employee locker rooms has also been documented[5][6].
Environmental perceptual intelligence is not new. Smart floors, electric eyes, motion sensors, smart lights, etc., have been used for a number of years. Some examples of smart spaces appear in Fig 1.
Figure 1:
Examples of ``smart spaces''. (a) ``Intelligent highways''
and ubiquitous surveillance. Systems like this are used for
both traffic monitoring, and ``public safety''. In Baltimore,
for example, the government is installing approximately 200 cameras
throughout the city to keep watch over the general activities of
its citizenry. (b) Smart ceilings (fifteen ceiling domes or
dark windows visible here) monitor people in the space,
purportedly for their benefit. Sophisticated
machine vision algorithms are often used to track
shoppers' activities [7]
and make inferences about possible suspicious
behaviour. (c) Machines with dark windows monitor users'
activities, purportedly for the users' own protection,
although organizations are often secretive about the exact
nature of these systems (hence the use of very dark glass to
hide the apparatus behind it). (d) `Smart toilets', with dark
windows, provide an awareness of the user's state to a
miniature computer system or the like contained inside the box
with the window, to assist the user in flushing the toilet.
However, as perceptual intelligence becomes networked in a multimedia environment, there will be more cameras installed and a wider variety of officially stated reasons for their existence. As cameras are installed for various purposes, people should be aware of the (very likely) possibility that they will have a plurality of purposes. Indeed, many video conferencing systems and multimedia computers have means to cover up cameras when they are not in use. This is because many people feel uncomfortable with an exposed camera lens pointed at them, even if the camera is turned off. People are justified in this feeling; multimedia/visual intelligence systems should be designed with physical lens covers and not just on/off switches (in good designs the on/off switch and lens cover are combined so that sliding the cover over the lens automatically shuts off the power). The problem statement, in summary, is that neither laptop computers and PDAs, nor ubiquitous multimedia computing/surveillance offer the kind of truly personal environment that would best suit our needs.
PROPOSED SOLUTION: `SMART CLOTHING' `Smart clothing' is the combination of wearable multimedia computing, personal imaging (through the use of one or more wearable video cameras), and wireless communications into a rig that is comfortably worn in an active ``always ready'' mode, not just carried in a briefcase or the like. It is proposed as an alternative to being forced to choose between portable multimedia computers/PDAs and environmental technologies such as ubiquitous computing/surveillance.
`Smart clothing' is a step toward truly personal computing and enhanced situational awareness, with less (or possibly no) reliance on a centralized infrastructure.
The first `smart clothing' prototype, designed and built by the author (Fig 2), comprised a modular personal, wearable, multimedia computer system together with one or more head-mounted cameras, a head-mounted display, and other sensors (one or more microphones, biosensors, two wearable radar systems, etc.), connected wirelessly to the Internet. The modular nature of the system allows portions to be left out or included, depending on the occasion.
Figure 2: Wearable computer systems designed and built by author for experiments in personal imaging: Early (1980) apparatus was somewhat cumbersome. Before the advent of the TNCs in 1981, communications was handled with two separate radios at each end. The bulky 1.5 inch CRT required a bicycle helmet for support, and could only display 40 characters per row of text. Later, a waist-mounted television was found to be less cumbersome, but failed to provide the constancy of user-interface. With the advent of miniature CRTs in the late 1980s, a comfortable eyeglass-based system became practical (1990), which was recently transferred to a more modern (early nineties) visor (VVSport TV, replacing the LCD with a CRT). A single hat-mounted antenna provided communications in the ham bands. More recently (1995) with the advent of cellular communications, a base-level of connectivity (at reduced speed) remains in effect when the unit is too far from its home base to use the high speed ham radio unit. Furthermore, in applications where the high speed ham radio link is not necessary, users of this technology will not need to pass a radio exam or obtain a special radio license as the commercially available cellular technology may be used.
Currently, `smart clothing' provides the following functionalities:
HISTORICAL BACKGROUND The proposed `smart clothing' was not the world's first wearable computer with wireless communications. Thomas Bass, in his book ``The Eudaemonic Pie'', describes shoe-based computers of the 1970s that were designed and built by physicists and other researchers in California, for the purpose of assisting them at playing roulette. It was remarkable that they were able to design these computer systems to be so unobtrusive as to pass the ultimate test of unobtrusiveness - the ``casino test'' -- surviving the scrutiny of the croupiers and pit bosses. It is also truly ironic that these computers were being used -- privately -- in a place where privacy would otherwise be unthinkable (under the `eye in the sky', the partially-silvered ceiling mirrors and the ubiquitous ceiling domes of wine-dark opacity that are pervasive throughout most casinos).
The author's suggestion of a community of nomads wearing wirelessly connected multimedia systems is not entirely new either. For many years, ham radio hobbyists have used voice communications on their hand-held battery-operated radios -- often wearing them with headsets and hands-free boom microphones, sometimes even together with ``antenna hats'' quite similar to the author's -- to stay in touch with each other as part of their day-to-day lifestyle, in all facets of their lives. Some even sent and received television pictures, using battery operated tetherless equipment. The ham radio community, perhaps the predecessor of the internet newsgroups and the like, is also perhaps the predecessor of online wearable wireless multimedia living.
Ivan Sutherland, a pioneer in computer graphics, described a head-mounted display with half-silvered mirrors so that the wearer could see a virtual world superimposed on reality [8] [9]. Sutherland's work, as well as more recent related work [10][11][12][13][14] is characterized by its tethered nature. The wearer is tethered to a workstation which is generally powered from an AC outlet -- in this sense it differs from the `smart clothing' which is entirely battery operated and tetherless.
OTHER RELATED WORK Other recent work in wearable computing[15] provides a task-specific system, in particular, a repair manual for use by soldiers. To make use simple, and to keep the soldier focused on the task at hand, the only input is a knob and pushbutton, so that menu items from a specific program may be selected.
`Smart clothing' differs from employer-owned technology, or technology controlled by an external entity (the most extreme case being devices used to track criminals[16]). In particular, it is owned, operated, and controlled by the wearer.
It is primarily intended for day-to-day living within the surrounding social fabric of the individual[17].
`PERSONAL IMAGING' THEORETICAL BACKGROUND The theoretical background for personal imaging is based on regarding the camera as a measurement instrument, in particular, an array of directional lightmeters. This theory is described in detail in[18][19][20]. Some of the basic ideas behind the theory are summarized here, in particular, those pertaining to automatic generation of image composites from multiple pictures of the same scene.
PENCIGRAPHIC IMAGE COMPOSITING. Consider, for simplicity, a 1-D `camera' in a 2-D static world. It is desired to know how two pictures, taken from the same location in space (but with different orientations, and possibly a change in focal length if the camera has zoom lens capability), are related to one another. The situation is depicted in Fig 3(a),
Figure 3: Pencigraphic imaging:
(a) Two pictures of a static scene, taken
from a common location, are related by a projectivity about
the center of projection (COP).
In particular, both images
contribute to the same
pencil of lines passing through the COP.
The coordinate transformation from one image to the other
is given by considering the mapping between
domain and range . (b) This form of coordinate
transformation gives rise to
the `projective chirping effect'[21],
where uniformly spaced points in the domain (depicted as circles)
map to a ``chirping'' lattice in the range (depicted by squares).
In this example, the vanishing point (point of zero
spatial frequency) is located at , while the
`exploding point' (point of infinite spatial frequency)
is given by the range of . (c) The plot of range
coordinate as a function of the domain coordinate is in the
form of a rectangular hyperbola, but shifted
by the `exploding point', in the domain and the
vanishing point, in the range. A sinusoid is
depicted in the domain together with the chirp function
that it maps to in the range. (d) The `projective
chirping phenomenon'
depicted on the face of a building having uniformly spaced windows.
The best-fit chirp function captures
this `periodicity-in-perspective'.
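where, for simplicity of visualization, the moving camera is depicted as two separate cameras drawn together on the same figure. From the figure, the coordinate transformation from $x_1$ to $x_2$ is given by
\begin{eqnarray}
x_2 & = & z_2 \tan(\arctan(x_1/z_1) - \theta), \; \; \forall x_1 \neq o_1 \nonumber \\
& = & (a x_1 + b)/(c x_1 + 1), \; \; \forall x_1 \neq o_1
\end{eqnarray}
where $a = z_2/z_1$, $b = -z_2 \tan(\theta)$, $c = \tan(\theta)/z_1$, and $o_1 = z_1 \tan(\pi/2 + \theta) = -1/c$ is the location of the singularity in the domain.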
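As a small numerical check of this transformation, the sketch below evaluates both forms and confirms that they agree. It assumes the rotation form x_2 = z_2 tan(arctan(x_1/z_1) - theta), with z_1 and z_2 the two focal lengths and theta the angle between the two optical axes; the function names and the sample values are illustrative only.

```python
import math

def project_by_rotation(x1, z1, z2, theta):
    """Map x1 to x2 by rotating the ray through the COP by theta."""
    return z2 * math.tan(math.atan(x1 / z1) - theta)

def rational_parameters(z1, z2, theta):
    """Parameters (a, b, c) of the equivalent form (a*x1 + b)/(c*x1 + 1)."""
    a = z2 / z1
    b = -z2 * math.tan(theta)
    c = math.tan(theta) / z1
    return a, b, c

def project_rational(x1, a, b, c):
    """Projective coordinate transformation in rational form."""
    return (a * x1 + b) / (c * x1 + 1.0)

# Illustrative values (assumed): focal lengths and a pan of 0.2 rad.
z1, z2, theta = 1.0, 1.5, 0.2
a, b, c = rational_parameters(z1, z2, theta)
samples = [-0.5, 0.0, 0.3, 1.0]
differences = [
    abs(project_by_rotation(x, z1, z2, theta) - project_rational(x, a, b, c))
    for x in samples
]
singularity = -1.0 / c  # domain point that maps to infinity

print("largest disagreement:", max(differences))
print("singularity o_1:", singularity)
```

The two forms agree because of the tangent subtraction identity; the rational form is the more convenient one for estimation, since it has only three free parameters.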
Given a set of images that lie in the same orbit of the group, we wish to find for each image pair, that operator in the group which takes one image to the other image.
If two frames, say, $f_1$ and $f_2$, are in the same orbit, then there is a group operation $p$ such that the mean-squared error (MSE) between $f_1$ and $p \circ f_2$ is zero. In practice, however, we find which element of the group takes one image ``nearest'' the other, for there will be a certain amount of parallax, noise, interpolation error, edge effects, changes in lighting, depth of focus, etc.
A fully automatic featureless parameter estimator for estimating the parameters of the true projective group of coordinate transformations (e.g. the parameters in (1)) has been proposed[18]. In particular:
[Displayed matrix equation: a linear system for the parameters; its entries are not recoverable from this copy.]
Its terms are built from the spatial derivative of the `reference image' and the temporal derivative (between the two images).
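To make the idea of a featureless estimator concrete, here is a minimal 1-D sketch. It is one common way to set up such a problem and is not claimed to be the formulation of [18]: it assumes the brightness-constancy relation u E_x + E_t = 0, with flow u = (a x + b)/(c x + 1) - x, multiplied through by (c x + 1) so that it becomes linear in a, b, c. The function name, the test signal, and the parameter values are all illustrative assumptions.

```python
import numpy as np

def estimate_projective_1d(f, g, x):
    """Least-squares estimate of (a, b, c) in x2 = (a*x1 + b)/(c*x1 + 1).

    f is the reference image, g the second image, both sampled at x.
    Valid only for small displacements (first-order approximation).
    """
    E_x = np.gradient(f, x, edge_order=2)  # spatial derivative of the reference image
    E_t = g - f                            # temporal derivative between the two images
    # (a*x + b - x*(c*x + 1))*E_x + (c*x + 1)*E_t = 0, rearranged as A @ [a, b, c] = rhs
    A = np.column_stack([x * E_x, E_x, x * E_t - x**2 * E_x])
    rhs = x * E_x - E_t
    params, *_ = np.linalg.lstsq(A, rhs, rcond=None)
    return params

def reference(t):
    """Smooth synthetic 1-D 'scene' (assumed for the demonstration)."""
    return np.sin(3 * t) + 0.5 * np.cos(5 * t)

x = np.linspace(-1.0, 1.0, 2001)
a_true, b_true, c_true = 1.001, 0.001, 0.001

f = reference(x)
# A scene point at x in f appears at (a*x + b)/(c*x + 1) in g, so g samples
# the scene at the inverse-mapped coordinate.
g = reference((x - b_true) / (a_true - c_true * x))

a_hat, b_hat, c_hat = estimate_projective_1d(f, g, x)
print("estimated:", a_hat, b_hat, c_hat)
```

No feature points are detected or matched; every sample contributes one linear equation. For larger motions an estimator of this kind would normally be iterated, re-warping one image toward the other between passes.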
Images that are in approximately the same orbit of the projective group of coordinate transformations often arise from a quick movement of the head when wearing the apparatus described previously. An example of a `pencigraphic image composite' captured using WearCam, in which 117 images are seamlessly joined together into one larger image is shown in Fig 4.
Figure 4: A `pencigraphic image composite'
created using the `personal imaging'
apparatus described in the previous section. As the author
faced the cashier, and looked up toward the ceiling, 117 somewhat
low-resolution NTSC images were captured. The processed image,
showing the author's
gaze pattern, is in some sense perhaps more like a
painting than a photograph -- the image contents are
``painted'' onto an empty image canvas merely by looking.
Furthermore, even though the image was captured from a
standard NTSC video camera,
(giving 240-line images after deinterlacing),
the resulting image is much larger (3730 pixels high)
than any of the input images.
Note the creative use of the
distortionless (rectilinear)
but extremely wide-angle perspective
-- the ceiling dome looming almost directly overhead is put near
the `exploding point' for maximum resolution enhancement.
Other examples appear in
http://wearcam.org/pencigraphy/gallery.html
The `pencigraphic image representation' (regarding the camera as a collection of light-measuring instruments), together with the WearCam apparatus, forms the basis of `personal imaging'.
HISTORICAL BACKGROUND Efforts to combine multiple images of the same scene have been around for many years. Photogrammeters have combined images manually, in non-overlapping sections. Photogrammetry is a well-developed field of study[22]. Artists, most notably, David Hockney[23], often assemble multiple pictures of the same scene by hand, using the medium expressively.
Research in combining multiple pictures electronically has previously relied on a pure-translational model[24] of the form $x_2 = x_1 + b$ or an affine model[25] of the form $x_2 = a x_1 + b$. However, these models fail to capture the `chirping' effect (described in the previous section) which is clearly quite pronounced in almost any practical imaging situation. With feature correspondences, the projective coordinate transformation may be determined but it was not until 1993 that a fully automatic featureless method of doing so appeared in the literature[18].
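The difference between these models can be seen numerically. The following sketch, with parameter values chosen purely for illustration, maps uniformly spaced 1-D points through a translation, an affine map, and a projective map of the form (a x + b)/(c x + 1), and prints the spacing of the mapped points.

```python
import numpy as np

x1 = np.linspace(-1.0, 1.0, 9)  # uniformly spaced points in the domain

# Illustrative parameters (assumed); the singularity -1/c = -2.5 lies outside the domain.
a, b, c = 1.5, 0.25, 0.4

translated = x1 + b                      # pure translation
affine = a * x1 + b                      # affine
projective = (a * x1 + b) / (c * x1 + 1) # projective

for name, x2 in [("translation", translated),
                 ("affine", affine),
                 ("projective", projective)]:
    # Spacing between neighbouring mapped points.
    print(name, np.round(np.diff(x2), 3))
```

Translation and affine maps keep the spacing constant, so neither can represent a lattice whose spacing changes steadily across the image; only the projective map produces the `chirping' lattice.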
Other related work, such as Apple's Quick Time VR (QTVR) requires a special apparatus, comprising a tripod with a precisely calibrated rotating camera stage, so that camera movements are known. Such an approach is unsuitable for the freeform `painting with looks' approach to `personal imaging'.
EXPERIMENTAL APPARATUS Only a brief description of the apparatus is provided here; more detailed information can be found elsewhere in the literature[26].
The author's current apparatus, with miniature camera and miniature (one inch diag.) 24-bit color screen in the eyeglasses, and miniaturized Pentium 90 system (64M RAM, 1.2G hard drive, etc) in the waist bag is still a somewhat cumbersome prototype. Clearly it is experimental in nature, with the knowledge that trends in miniaturization, and new technologies like conductive threads, conformal antennas, etc., will soon make this apparatus blend into ordinary clothing.
The multimedia computer is connected to the internet over TCP/IP, using one or more forms of wireless communication. In particular, these include a 1987 WA4DSY system which provides a data rate of 56kbps, together with a G3RUH system with variable data rate, and an older dual band radio. The older system is essentially a backup system -- higher output power together with a low data rate of 1200bps give it a robustness that keeps the author online, for example, in the sub-basements of buildings where the other systems do not work that well.
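To give a feel for what these data rates mean in practice, the sketch below computes the time needed to send one image over each link. The image size is an assumed figure, and protocol overhead and retransmissions are ignored.

```python
def transfer_seconds(num_bytes, bits_per_second):
    """Idealized transfer time: payload bits divided by the raw link rate."""
    return num_bytes * 8 / bits_per_second

IMAGE_BYTES = 20_000  # assumed size of one compressed image

LINKS = {
    "56 kbps link": 56_000,
    "1200 bps backup link": 1_200,
}

for name, rate in LINKS.items():
    print(f"{name}: {transfer_seconds(IMAGE_BYTES, rate):.1f} s per image")
```

Under these assumptions the fast link moves an image in about three seconds, while the backup link needs over two minutes, which is why the slower system serves only as a fallback.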
Antennas are currently mounted in a hat which is lined with copper mesh, (the 1980 rig made use of a distributed antenna network sewn into a jacket). Work is underway to make the antennas completely unobtrusive, as they were in the 1980 rig, but while maintaining the same high level of performance of the current rig, through the use of conformal antenna technology.
THE WEARCAM PERSPECTIVE `WearCam' is an Internet-connected multimedia computer with its own Internet address, but the items are attached in a natural fashion, so that they can be used, at times, without (much) conscious effort, and while performing other activities such as walking, shopping, or riding a bicycle.
The image from the camera(s) is presented on the display in a way that is natural and intuitive. A camera and microphone are attached to the display in such a way as to give a first person vantage -- the camera ``sees'' exactly what the wearer sees, rather than the second person perspective of traditional multimedia applications where the camera ``sees'' a picture of the user.
OTHER SENSORS The apparatus includes a ProComp 8 channel analog to digital converter which facilitates measurement of voltages and such. In particular, the author's `smart shoes' (shoes with an array of sensors that provide information about footstep force and velocity, etc.) and `smart undergarments' that sense, for example, heart rate, respiration, and skin resistance, are connected to this analog to digital converter, providing the capability to log the information into a file, or to access it online.
Other sensors such as infrared and radar, enhance and extend the author's sensory capabilities, and have been used for various experiments in synthetic synesthesia, which might someday be of assistance to the visually challenged[27].
The author's apparatus is somewhat reminiscent of the ``Winnebiko/Behemoth'' work of Steve Roberts, N4RVE[28], except that it is built into clothing rather than a bicycle.
`PERSONAL IMAGING' `Personal imaging' is an important aspect of `smart clothing', thus the camera and display are important parts of the apparatus. The current computer screen is actually quite similar to the display in the author's 1980 version (it is still a miniature CRT operating at the same 5 to 6 kV anode voltage), except that it provides high-quality color imagery; the field rate is three times as high (180Hz instead of 60Hz), and a color filter is used to sequence through red, green, and blue. This provides full color fidelity with good shadow detail as well as good highlight detail (LCDs often provide only washed-out pastel colors). A spinning color filter wheel was replaced with an electronic color shutter to reduce physical size and increase reliability by eliminating moving parts.
EXPERIMENTS The apparatus, a wearable multimedia computer system equipped with camera(s) and a wireless Internet connection [29], enabled experimentation with imaging applications in ordinary day-to-day situations, not just in a lab. Possible applications of this `WearCam' to the handicapped have been suggested [27], in particular its use as a personal visual assistant (PVA) and as a visual memory prosthetic.
WearCam was originally designed and built as a `visual memory prosthetic' to attain an improved awareness of visual experiences (e.g. to gain an enhanced awareness of light and shade), and to help overcome visual amnesia. Note that this goal is the opposite of that embodied in both virtual reality and personal sound systems (in the sense that they reduce awareness of actual reality).
WearCam embodies a temporal visual filter that provides computer-induced flashbacks (possibly together with annotation). This functionality has made it possible to experiment with practical applications such as providing assistance in such visual tasks as remembering faces.
EXPERIMENTS IN NETWORKED ONLINE COMMUNITIES Internet connectivity is due to antennas the author has erected on rooftops of various buildings. In particular, one of these is coordinated through the New England Spectrum Management Council to operate as an ``open'' gateway (available for use by the general ham radio community), and is listed as being an open gateway.
Furthermore, with the current (or soon-to-be) availability of commercial systems such as Metricom, Wavelan, and Motorola, getting more `smart clothing' online will soon be trivial.
Experiments with two WearCam users have allowed an exchange of viewpoint, so that each person sees through the other person's eyes[30].
Instead of just two people, suppose we have a community (network) of individuals wearing the apparatus. This could be a homogeneous community (all wearing the same form of the apparatus), but for simplicity of implementation, consider a heterogeneous community (Fig 5) wearing various prototypes of the apparatus. These prototypes are widely different in their design but provide similar functionality with ability to communicate with one another.
Figure 5: The `safety net': a network of individuals,
three with WearCams: author at left with 1990 CRT-based WearCam;
fourth and fifth (cellular hat) from left
wearing LCD-based WearCams more recently designed and built
by the author
but lacking the dynamic range and color fidelity
of the older CRT-based WearCam. Even this newer display
technology is somewhat obtrusive.
The displays are the most obtrusive
parts of these systems (cameras, microphones, computers,
and input devices of the three
wearcams are almost completely invisible).
It will take some time before a high-quality unobtrusive
display becomes available.
EXPERIMENTS WITH THE `VISUAL MEMORY PROSTHETIC' WearCam presents the visual world to the wearer on a computer screen, so that the visual experience can be augmented, diminished, or otherwise altered, under program control if desired. This facilitates experiments on visual perception in which the experience of the visual world must pass through a `visual filter'[30]. The `visual memory prosthetic' is based primarily on a `visual filter' that doesn't (assuming proper calibration) bend any rays of light passing through it, but only delays these light rays. Such a temporal-only `visual filter' might, for example, function like a pair of eyeglasses made of hypothetical slowglass[31] and merely delay the lightfield. It might also produce a stroboscopic or freeze-frame effect, as well as more sophisticated temporal visual filters such as for the face-recognition/reminder experiments to be described later.
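Two of the simplest temporal-only filters, a pure delay and a freeze-frame (sample and hold), can be sketched as generators over a stream of frames. This is an illustration of the concept rather than the WearCam implementation; the function names are hypothetical, and integers stand in for video frames.

```python
from collections import deque

def delay_filter(frames, delay):
    """Yield each frame `delay` frames late; nothing is shown until the buffer fills."""
    buffer = deque()
    for frame in frames:
        buffer.append(frame)
        if len(buffer) > delay:
            yield buffer.popleft()

def sample_and_hold(frames, hold):
    """Freeze-frame: capture every `hold`-th frame and repeat it until the next capture."""
    held = None
    for index, frame in enumerate(frames):
        if index % hold == 0:
            held = frame
        yield held

delayed = list(delay_filter(range(6), delay=2))
frozen = list(sample_and_hold(range(7), hold=3))
print(delayed)
print(frozen)
```

Neither filter alters the content of a frame; each only changes when a frame is shown, which is the sense in which such a filter delays light rays without bending them.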
RESULTS AND DISCUSSION EDGERTONIAN EYES Early experiments with a variety of different `visual filters', involved experimenting with the WearCam apparatus in day-to-day activities[30]. Each of these filters provided a different visual reality. For one such filter, the author experimented by applying a repeating freeze-frame effect to WearCam. With this video sample and hold, it was found that nearly periodic patterns would appear to freeze at certain speeds. For example, while looking out the window of a car, periodic railings that were a complete blur without the apparatus were found to appear in sharp focus with the apparatus, creating a heightened sense of awareness of subtle differences between each period of a periodic structure -- an awareness that exceeded even that attained standing still examining the structure.
Rotational periodicity (such as in the blades of a spinning airplane propeller) would appear as objects rotating slowly backwards or forwards, in much the same way as objects do under the stroboscopic lights of Harold Edgerton[32].
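The freezing and slow forward or backward motion are ordinary sampling aliasing. As a rough sketch (the function name and the example rates are assumptions), the apparent frequency of a periodic pattern viewed through a sample-and-hold running at a given rate is the true frequency minus the nearest multiple of the sampling rate:

```python
def apparent_frequency(true_hz, sample_hz):
    """Aliased frequency of a periodic pattern sampled at sample_hz.

    Positive means slow forward motion, negative means slow backward
    motion, and zero means the pattern appears frozen.
    """
    return true_hz - round(true_hz / sample_hz) * sample_hz

# Assumed example: patterns repeating near 30 times per second,
# viewed through a freeze-frame filter capturing 10 frames per second.
for pattern_hz in (29, 30, 31):
    print(pattern_hz, "Hz pattern appears as", apparent_frequency(pattern_hz, 10), "Hz")
```

For a propeller, the relevant pattern frequency is the rotation rate multiplied by the number of blades, since the blades are indistinguishable from one another.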
FLASHBACKS AND FREEZE-FRAMES Of greater interest than just being able to see that which is invisible to the naked eye, was the fact that sometimes the effect would cause certain things to be remembered much better. In particular, it was found that faces would be remembered much better with this freeze-frame effect.
DEJA VU A free-running visual memory prosthetic was found to be helpful in way-finding. With the ability to compare past and present imagery, it became quite evident when one had been in a particular place before (e.g. if one were going around in ``circles''). When this comparison is done automatically, the computer attempts to compare all previous images with the current image (from what the camera is currently pointed at). With some form of visual processing, WearCam can also function as the visual equivalent of Starner's remembrance agent[33] (a text interface that constantly watches what the user types and automatically reminds the user of related text files).
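The automatic comparison can be illustrated with the simplest possible matcher: score every stored image against the current one and report the closest. The sketch below uses plain mean-squared error on same-sized arrays, which is an assumption made for brevity; it does not compensate for camera motion, so a practical system would first register the images. The names and the synthetic data are illustrative.

```python
import numpy as np

def mse(image_a, image_b):
    """Mean-squared error between two images of the same shape."""
    return float(np.mean((image_a.astype(float) - image_b.astype(float)) ** 2))

def best_match(current, history):
    """Return (index, error) of the stored image closest to `current`."""
    errors = [mse(current, past) for past in history]
    index = int(np.argmin(errors))
    return index, errors[index]

# Synthetic stand-ins for previously captured images.
rng = np.random.default_rng(0)
history = [rng.random((24, 32)) for _ in range(5)]

# The "current" view is a noisy revisit of stored image number 3.
current = history[3] + 0.01 * rng.standard_normal((24, 32))

index, error = best_match(current, history)
print("closest stored image:", index)
```

A threshold on the returned error would decide whether the match is close enough to report that the place has been visited before.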
`VISUAL CLEW' The `visual clew' is a computer-assisted way-finding system that runs on WearCam.
We've all no doubt been lost at one time or another. One might enter a shopping complex, or the like, and be unable to find the way back to the car or subway stop at the end of the day.
It was found that by building up a stack of images, either consciously deciding to capture an image at each branch point, or putting the apparatus in a free-running mode, the problem of getting lost was solved, at least to the extent that one could find one's way back to where one started from, whenever desired.
The current implementation of `visual clew' uses the Wearable Wireless Webcam: each image is appended to the web page, and then, when it is time to return (e.g. to find one's way back to the parking lot) the Web browser (Mosaic) is invoked, to browse through pages of previously captured images. Alternate implementations have been built using a local image stack but the Web stack allowed for shared ``visual memory'', so that, for example, two people could use such an apparatus to find each other, should their paths happen to have crossed at some point.
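The local image stack variant can be sketched in a few lines. The class below is a hypothetical illustration, not the author's code: images are pushed at branch points on the way in and popped in reverse order on the way back, and strings stand in for image data.

```python
class VisualClew:
    """Stack of images captured at branch points, for retracing a route."""

    def __init__(self):
        self._stack = []

    def capture(self, image, note=""):
        """Store an image (and an optional note) at the current branch point."""
        self._stack.append((image, note))

    def retrace(self):
        """Yield stored entries most recent first, the order met on the way back."""
        while self._stack:
            yield self._stack.pop()

clew = VisualClew()
clew.capture("image-of-parking-lot", "start")
clew.capture("image-of-escalator", "went up")
clew.capture("image-of-food-court", "turned left")

way_back = [note for _image, note in clew.retrace()]
print(way_back)
```

A shared version would replace the private list with storage that several wearers can read, which is what the Web-based stack provides.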
A method of determining if images are in the same orbit of the projective group of coordinate transformations has been described[34][20]. This `video orbits' approach, together with other image recognition systems, can be used to automate the way-finding `visual clew' system. In particular, if one applies the following use strategy, it is very easy for even a small battery-powered computer with only moderate processing capability to recognize a previous image:
Alternatively, instead of capturing a single image at each branch point, a partial environment map may be captured at each branch point (stop for a couple of seconds, and `paint with looks' -- making a quick glance around to generate an image composite). Then if the same location is encountered, the two collections of input images will be in the same orbit of the projective group of coordinate transformations.
ANNOTATED `FLASHBACKS': WEARABLE FACE-RECOGNIZER Many people (author included) have difficulty remembering faces. The `visual memory prosthetic' was found to assist in the task of remembering faces, through the use of computer-induced flashbacks. In addition to helping the human remember, and therefore recognize, faces, computers are also capable of directly recognizing faces. Previous work by others is based on using a fixed, tethered camera [35][36][37][38]. The kinds of applications for this work might include video surveillance with a fixed camera and people moving through its field of view. The FBI-funded FERET project comprises a large database (more than 7000 faces) that can be searched quickly on a workstation-class system.
Automatic face recognition has raised extensive privacy concerns[39]: `` Privacy International is calling on the UK government to prohibit ... Computerised Face Recognition (CFR) systems that have the capacity to automatically compare faces captured on CCTV, with a database of facial images. Several police and commercial organisations are developing this technology. ... should ...re-establish some democratic mechanism in the development of wide-scale urban CCTV systems ... grave risk that the CCTV industry is out of control... quashing public debate...the systems have challenged some fundamental tenets of justice, and created the threat of a surveillance society. Other more traditional approaches to law enforcement and social justice are being undermined...''
However, computational resources, attached to a person, suggest the possibility of turning the tables on the traditional third-person perspective (such as a ceiling-mounted surveillance camera), and, instead, using face recognition from a person-worn perspective. In particular, if face recognition is used ubiquitously by ordinary individuals as part of their day-to-day living, it will give rise to a more democratic society in which policemen, as well as ordinary citizens (and shopkeepers as well as shoppers, bank tellers as well as bankers) will be recognizable and accountable for their actions.
A variety of face-recognizer implementations are possible with the current WearCam apparatus. These range from storing the database of candidate faces on the body-worn apparatus and using local processing, to connecting remotely from the apparatus to the database, possibly using remote processing.
Several different implementations of the capture/display configuration (e.g. having both the camera and display rotated 90 degrees, having the camera rotated 90 degrees with the display still in landscape orientation, etc) were tried. It was found that the best overall configuration was to have the camera rotated 90 degrees (portrait) but with the display still in landscape orientation. Anecdotes on the author's experiences ``living in a rot90 world'' (reminiscent of George Stratton's upside-down glasses[40]) appear in[30].
Improvements to the `wearable face recognizer' included providing means of alignment, using a registration template (Fig 6) displayed on top of the video to facilitate manual alignment of the face, through rotating the head to aim the camera appropriately. (The cursors themselves were implemented as a JPEG image.)
Figure 6:
Template-based wearable face-recognizer
(a) As candidate approaches, an effort is made to
orient the apparatus (by turning of the head) so that
the candidate is centered. This is easy because the
full-motion color
video input stream appears on the computer screen
together with the template.
(b) At some point, the distance to the candidate will
be such that the scale (size of the face on the image
plane) will be appropriate, and, while still keeping the
orientation appropriate, the match is made.
(c) After the match is made, the template image drops
away, revealing a radioteletype (RTTY) window behind
it, upon which is displayed the desired information
(for example, the name of the candidate, ``Alan Alda'',
and possibly additional parameters or other
relevant information).
Further implementational details of the wearable face recognizer and visual memory prosthetic are available in[27] and [30].
VIDEO ORBITS/`PERSONAL IMAGING' When shopping, friends and relatives can remotely look at whatever one is looking at (the author has been surprised from time to time, for example, with unexpected email about the fruits and vegetables in the WearCam's view, or a reminder to pick up some milk, which was especially surprising when, for example, a remote visitor happened upon a portion of the environment map that was captured by accident).
In more recent years, with the advent of the World Wide Web, a new form of connectivity has been explored using the `Wearable Wireless Webcam' (http://wearcam.org). The application of WearCam has been an exploration of a new form of personal visual connectivity. As a tool for visual artists (in some sense the pencigraphic image composites are an expressive art form), such a device can reduce the time from first seeing something of visual interest, to showing an image in a gallery (completed exhibition), down to a fraction of a second.
ILLUSORY RIGID PLANAR PATCH Examples of illusory rigid planar patches arise directly from projective coordinate transformations applied to individual images. In particular, the wearcam apparatus displays both video (imagery) and text on the NTSC-svideo screen. Typically a large font (e.g. such that there are on the order of 30 characters across) is used so that the text will be very easy to read even when one is jogging or the eyeglasses are being jostled around. An example of a text window superimposed on an image composite appears in Fig 7.
Figure 7: Here the author is conducting banking business, and a problem
has been encountered with the author's bank card.
A radioteletype (RTTY) session appears
as a rectangular text window
on the sixth image in a seven-image pencigraphic composite,
creating the illusion of a rigid planar patch.
Text is sideways because the author had adapted to
seeing images rotated 90 degrees -- text appeared normal
(unrotated) to author. The second window on the right denotes
the viewpoint of another person, remotely (the author's spouse).
This viewpoint corresponds to the lower right-hand quadrant of
the first image in the sequence. Although the author's gaze
is currently fixed on the paperwork in front of him, the
remote viewer is navigating the environment map independent of
where the author is looking. In particular, the remote viewer
recognizes the face of the teller serving the customer to the
author's right, and sends a message into the author's RTTY
window, upon his current point of gaze.
Note also the 3 video surveillance cameras visible at the top of
the image composite. This would suggest that both the author and
the bank have a visual record of the transaction.
In the future, just
as both parties of such a transaction keep signed copies of all
documents, so too will both parties keep `visual memories' of
the transaction, giving rise to balance between
wearable technology and environmental technology.
Face recognition may be used together with `pencigraphic imaging'. In particular, the group of projective coordinate transformations may be used to sustain the illusion of a rigid planar patch. Thus once the face is recognized by the computer, a virtual name tag is created using the illusory rigid planar patch. Then when the wearer looks to the right, for example, causing the video imagery to move to the left across the image sensor (and viewer's screen), the text moves to the left to follow it. Not only does the text move to the left, it also ``keystones'' and `rechirps' itself to maintain the illusion of a rigid planar patch in 3-D space (Fig 8).
Figure 8: Six frames from an image sequence in which author looked from
left to right. The illusory rigid planar patch is created by
featureless tracking of the video orbit of the projective group
of coordinate transformations. Notice that in frame six,
even though the face is no longer in view of the apparatus, it
continues to be `tracked'. What is actually being tracked is
the entire orbit (the projective coordinate transformation
arising from other objects in the room).
Here the illusory rigid planar patch is a list of grocery
items previously purchased by the author from this same cashier.
Such an apparatus may be useful when returning items to a
department store, as it assists one's memory of what was purchased,
when, and from whom. A clear memory of details often helps in
avoiding disputes regarding terms of the sale.
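The keystoning of the text window when the wearer looks to the right can be sketched numerically. The example below is only an illustration under stated assumptions: an idealized pinhole camera that rotates about its optical center, a focal length `f` given in pixels, image coordinates measured from the image center, and a hypothetical text-window size. None of these values come from the WearCam apparatus.

```python
import math


def pan_homography(theta_rad, f):
    """Projective transformation induced by panning the camera by theta.

    Assumes a pinhole camera with focal length f (pixels) and the principal
    point at the origin, so that H = K R K^-1 with K = diag(f, f, 1).
    """
    c, s = math.cos(theta_rad), math.sin(theta_rad)
    return [[c, 0.0, -s * f],
            [0.0, 1.0, 0.0],
            [s / f, 0.0, c]]


def warp(H, p):
    """Apply the 3x3 projective transformation H to point p = (x, y)."""
    x, y = p
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)


# Hypothetical text window: corners listed as bottom-left, bottom-right,
# top-right, top-left, centered in the image before the head turns.
patch = [(-40.0, -10.0), (40.0, -10.0), (40.0, 10.0), (-40.0, 10.0)]

H = pan_homography(math.radians(10.0), f=500.0)
warped = [warp(H, p) for p in patch]

# Heights of the left and right edges after the pan.
left_height = warped[3][1] - warped[0][1]
right_height = warped[2][1] - warped[1][1]
```

Under these assumptions every corner moves to the left, and the left edge of the window becomes taller than the right edge, so the rectangle is no longer a rectangle on the screen. Redrawing the text through the same transformation is what keeps it looking attached to the scene.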
ACTIVE BADGES AND PERSON-TRACKING Card keys and active badges both represent technologies that either keep track of where individual people are located, or comprise hardware that has the potential to do so.
MIT has attempted, somewhat unsuccessfully, on two different occasions, to deploy a card-reader system for access control to various buildings. In both cases, students have strongly opposed the initiative. In particular, there was widespread concern regarding the privacy implications of such a system.
TURNING THE SYSTEM INSIDE-OUT Both the card key and active badge systems rely on a ``smart'' element built into the architecture (card reader or IR receiver) and a ``dumb'' element (card or beacon) carried or worn by the user. The ``smart'' element is networked to a central computer system, while the ``dumb'' element has no communications or networking capability whatsoever.
Suppose, however, that we swap the two. Suppose that the user carries or wears the ``smart'' element, and the building architecture is endowed with the ``dumb'' element. Thus, for example, the user might wear the infrared (IR) receiver, and have this connected to his/her `smart clothing', while numerous beacons would be distributed throughout the building. This means that there is no need to network the beacons, no need to wire the building. The system relies on the communications infrastructure each user wears.
However, now the location of the user is known to the user's clothing, and thus the user has control over who can and cannot know his/her location. A user might, for example, define an access control list comprising faculty advisor, thesis advisor, colleagues, etc. The user's clothing would automatically encrypt the user's location (as determined by the last beacon ``seen'' by the user's clothing) and transmit this information to the desired recipients. Any intercepted communication would be unintelligible to those not on the access control list.
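The inside-out arrangement can be sketched in a few lines. This is a minimal illustration, not the prototype's software: the class name, method names, and room identifier are hypothetical, and the encryption step is deliberately left out (a real system would encrypt the location separately for each recipient on the list).

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class WearableLocator:
    """State held by the wearer's clothing, not by the building."""
    access_list: Set[str] = field(default_factory=set)
    last_beacon: Optional[str] = None

    def on_beacon(self, room_tag: str) -> None:
        # The un-networked beacon only announces its identity;
        # the worn receiver remembers the last one it saw.
        self.last_beacon = room_tag

    def location_for(self, requester: str) -> Optional[str]:
        # Disclose the location only to parties the wearer has listed.
        if requester in self.access_list:
            return self.last_beacon
        return None


# Hypothetical usage.
locator = WearableLocator(access_list={"thesis advisor", "colleague"})
locator.on_beacon("room-A")
visible_to_advisor = locator.location_for("thesis advisor")
visible_to_stranger = locator.location_for("stranger")
```

Because the beacons carry no network connection, the only copy of the wearer's location lives in `locator`, and the access list is under the wearer's control.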
In a prototype system, the author deployed a number of ``room tags'' -- name tags fixed at known locations in the building (Fig 9(a)), and fixed an IR receiver to his glasses (Fig 9(b)), and connected this to the clothing-based computer.
In addition to giving the user control over his/her personal whereabouts, the system may also be used to provide location-dependent computer-induced flashbacks (Fig 9), adding new dimensions to the visual memory prosthetic. For example, entering my office, I was surprised to find that the lost sweater I had been looking for that day had been sitting on my desk only a short while ago. Thus I `remembered' that I had worn it that day, and that therefore I must have lost it somewhere nearby.
Figure 9: A paradigm reversal for person-tracking
(a) `Room tag' (one of many name tags that may be
easily deployed throughout the building).
(b) Paradigm reversal: tags and receivers swap places.
Receiver is attached to author's eyeglasses
and plugged into `smart clothing'.
(c) Computer-induced flashback triggered by revisiting
same room as before. Note name tag is visible in upper
left of picture. Induced `memory' of sweater, sitting on desk
in lower left corner of picture, proved to be helpful.
ACHIEVING BALANCE: COOPERATION BETWEEN SMART SPACES AND SMART CLOTHING The proliferation of ever more intrusive environmental technology has created an imbalance between individuals and their environment. This section presents some examples of how wearable technology might work cooperatively with the environmental technology. An example of environmental control (HVAC) using smart underwear is presented.
SMART UNDERWEAR AND BIOSENSORS Even when we take off our `smart clothing' at night, we might still choose to keep on our `smart underwear' which controls the heater or air conditioner in the room. Upon arriving home, late at night, one is generally too hot from just climbing the stairs, etc., so when first going to sleep, the underwear tells the heater to turn off, but after a couple of hours sleeping, when one's metabolism slows down, the underwear senses the resulting changes in one's body temperature/conductivity, and turns up the heat. Our clothing of the future may some day be interoperable and interconnected, so that it keeps track of our physical condition and allows us to decrypt this information for evaluation by a doctor or other professional of our choosing. Further description of the `smart underwear' prototype, and anecdotes on the author's experience designing and using it, appear in[41].
The natural place for our medical records is right in our clothing. Having a patient wear his or her entire medical history would solve many of the medical records privacy problems we face today. With various biosensors, the most current and up-to-date information would be readily available within the very clothing that's taking the measurements. This approach would eliminate the need for, and the possible abuses that can arise with, a central database of medical records, and would eliminate the need for a person to venture through bureaucratic procedures to access his or her own medical information. It would also eliminate the problems associated with smart cards, as clothing is almost always worn, while cards may be misplaced and inaccessible in times of emergency care. Epidemiological research would still be possible with the patient's data -- participating patients could make the data accessible to organizations doing the research, but this would be done through a query to each participating patient's online `smart clothing' each time the data were needed, so that the patient's clothing would be kept ``in the loop'', that is, access logs would be automatically generated in the `smart clothing', so that patients could trace the history/usage of their data at a later date if desired.
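The ``in the loop'' idea, in which every research query passes through the patient's own clothing and leaves a log entry there, might look roughly like the following. This is a sketch under assumptions: the class, field names, and requester label are hypothetical, and no particular storage or network protocol is implied.

```python
from datetime import datetime, timezone


class ClothingRecordStore:
    """Medical data held in the wearer's clothing, with a local access log."""

    def __init__(self, records, participating):
        self._records = dict(records)
        self.participating = participating   # wearer's own choice
        self.access_log = []                 # stays with the wearer

    def query(self, requester, field_name):
        # Every request is logged, whether or not it is granted.
        granted = self.participating and field_name in self._records
        self.access_log.append(
            (datetime.now(timezone.utc), requester, field_name, granted))
        return self._records[field_name] if granted else None


# Hypothetical usage.
store = ClothingRecordStore({"resting_heart_rate": 62}, participating=True)
value = store.query("study-1", "resting_heart_rate")
```

The patient can later read `store.access_log` to see who asked for which item and when.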
CRIMEWATCH AND `SAFETY NET' In a networked wearable multimedia community, people might pay attention mostly to their immediate surroundings, but may, at times, get an image from someone who thinks there might be danger. Such a transmission might be triggered by a `maybe I'm in distress' button pressed by the wearer, or automatically. Automated distress detection may be facilitated through a heart rate monitor and an activity meter/pedometer, such as the sensors in the author's shoes. The heart rate divided by the footstep activity gives a `visual saliency' index. Should someone pull out a gun and demand cash from the wearer, the `smart clothing' might respond appropriately (video capture/transmission at maximal frame rate, etc.) by virtue of the sudden increase in heart rate for no apparent reason (e.g. without any increase in physical exertion). As a personal safety device, ubiquitous use of `smart clothing' might have the potential to turn the world into a small-town community -- a global village as barriers of time and space fall. A community of individuals networked in this way would look out for each other's safety in the form of a `neighbourhood watch'. This `safety net' could be used for a `virtual safewalk': a participant, about to walk home or enter an underground parking garage late at night, sees `eye-to-eye' with one or more people (perhaps in a different time zone, say somewhere in the world where it is morning, so the virtual escort has fresh alert eyes). Neal Stephenson's Global Neighborhood Watch[42] comes to mind, but now with free-roaming tetherless connectivity.
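The `visual saliency' index described above (heart rate divided by footstep activity) can be sketched as follows. The text gives only the ratio; the `+ 1` in the denominator (to avoid dividing by zero when the wearer is standing still), the resting baseline, and the threshold factor are all assumptions introduced here for illustration.

```python
def saliency_index(heart_rate_bpm, steps_per_min):
    """Heart rate divided by footstep activity.

    The +1 in the denominator is an assumption added so that the index
    stays finite when the wearer is standing still.
    """
    return heart_rate_bpm / (1.0 + steps_per_min)


def maybe_in_distress(heart_rate_bpm, steps_per_min,
                      resting_bpm=70.0, factor=1.5):
    """Flag a heart rate that is high for no apparent physical reason.

    resting_bpm and factor are hypothetical tuning values: the alarm fires
    when the index exceeds `factor` times the index of a wearer at rest.
    """
    baseline = saliency_index(resting_bpm, 0.0)
    return saliency_index(heart_rate_bpm, steps_per_min) > factor * baseline


# Hypothetical readings.
at_rest = maybe_in_distress(70, 0)          # calm, standing still
frightened = maybe_in_distress(140, 0)      # heart racing, not moving
jogging = maybe_in_distress(150, 160)       # heart racing, explained by exertion
```

With these values only the `frightened` case is flagged; a real system would presumably use that flag to raise the video frame rate or alert the `safety net'.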
This level of connectivity raises many new social and ethical questions that will need to be addressed as we further experiment with more WearCams.
SMART CLOTHING v. SMART UNIFORMS `Smart clothing' represents a significant future direction for computing. The recent proliferation of wearable computers (there are about five companies making wearable computers now) suggests that we're moving in that direction. However, many of the applications of wearable computers so far envisioned, such as the land warrior (military), the intelligent maintenance aid, or various applications in the workplace[15] might better be described as `smart uniforms'. A `smart uniform' is issued to a soldier or employee at the start of a job, and then taken away after the job is completed.
There is a fundamental difference in the way that people feel about their own clothing as compared to a uniform. Although people can become quite familiar with their uniforms, whether worn in prison, the military, certain workplaces, or old-fashioned schools, the individuality of personal clothing, and the pleasures associated with its selection and wearing should be extended to computing. The full power and enjoyment of this synergy between human and machine will be realized only when the computer is owned, operated, and controlled by the wearer, giving rise to truly personal computing. Indeed, examples of wearable technology at the extreme opposite to the personal wearable, are the wearable ID transponders that have been rejected by many employees, and the devices attached to criminals to keep track of them[16]. These devices are owned, operated, and controlled by a remote entity. Some such devices even have the capability to provide the wearer with an ``electrical corrective signal'' (euphemism for electric shock) when the wearer does something against the will of the entity that controls the system (e.g. ventures outside a prescribed boundary). This prospect is at least as troublesome as the pole-top surveillance cameras discussed earlier.
ANECDOTAL DISCUSSIONS ON THE ``SAFETY VERSUS PRIVACY'' ARGUMENT When I first joined the MIT Media Lab, I expressed concern regarding the possible development of surveillance technologies, such as ubiquitous use of video cameras, face recognition and the like. My advisor, trying to relieve my concerns regarding a possible Big-Brother future, presented me with the argument of her advisor (Sandy Pentland) who was the director of the research on face recognition: `` Cameras make the world a smaller place, kind of like a small town. You give up privacy in exchange for safety. In a small town, if you were suffering from a heart attack and collapsed on the floor of your kitchen, chances are better that someone would come to your rescue. Perhaps a neighbour would come over to borrow some sugar, and, since your door would be unlocked, would just come right in and see you had collapsed and come to your aid.'' Although this analogy makes perfect logical sense -- on the safety versus privacy axis, the small town of the past and future with ubiquitous surveillance are very similar -- there was still something very disturbing about it.
If we look along a different dimension, characterized by symmetry, the small town and a future with ubiquitous surveillance are exact opposites. In a small town, the sheriff knows what everyone's up to, but everyone also knows what the sheriff is up to. This symmetry is generally not the case with regard to surveillance systems like the ones used in the UK, which, although often installed by governments, are operated as closed (secret) systems, the imagery being unavailable to ordinary citizens.
Phil Patton [5] discusses the surveillance dilemma, making reference to the ubiquitous ``ceiling domes of wine-dark opacity'', making mention that ``many department stores use hidden cameras behind one-way mirrors in fitting rooms'', and in general, that there is much more video surveillance than we might at first think.
With so much video surveillance in place, and growing at a tremendous rate, one wonders if privacy is a lost cause. If we are going to be under video surveillance, we may as well keep our own ``memory'' of the events around us, analogous to a contract in which both parties keep a signed copy. Falsification of video surveillance recordings is a point addressed in the movie Rising Sun, and in William Mitchell's book, The Reconfigured Eye [43]. However, if there is a chance that individuals might have their own account of what happened, organizations using surveillance would be much less likely to risk falsifying surveillance data. Even though it is easy to falsify images [43], when accounts of what happened differ, further investigation would be called for. Careful analysis (e.g. kinematic constraints on moving objects in the scene, the way shadows reflect in shiny surfaces, etc) of two or more differing accounts of what happened would likely uncover falsification that would otherwise remain unnoticed. The same technology that is used to demonstrate a person has removed an item from a department store without paying may be used by a person to demonstrate that he or she did, in fact, pay. One can only imagine what would have happened if the only video recording of the Rodney King beating were one that had been made by police, using a network of police surveillance cameras, such as the camera networks used in some cities in the UK[44]. Of course, most officials are honest, and would have no reason to be any more paranoid of an average citizen's camera operating together with their own network of cameras.
Once we accept that images can be easily falsified, we will someday soon (some have already) begin regarding images as having no more truth than verbal or written accounts. The difference between whether images themselves are presented to a jury from someone's WearCam, or a verbal description of incredible detail is provided by that same person through their looking into their own WearCam's memory, will matter less and less as technology evolves.
Privacy concerns regarding WearCam itself have also been raised. However, living in a society in which cameras were not permitted, unless they were worn on people, would be far preferable to today's society -- then at least we'd know we had privacy when we were alone -- our instincts like ``don't pick your nose in front of other people'' would work in harmony with ubiquitous personal imaging.
CONCLUSIONS `Smart clothing', a wearable multimedia computer and `personal imaging' system, has been proposed. Its practical utility in day-to-day applications has been explored, in particular, through experiments in way-finding, face recognition, creation of partial environment maps and shared visual spaces, etc. A framework for `personal imaging' has been set forth, in particular, through the use of a wearable camera system, regarding the camera as an array of directional lightmeters, and through consideration of `video orbits' (orbits of the projective group of coordinate transformations acting on images).
Much like the cellular telephone, pager, pocket calculator, notebook computer, pocket organizer, wristwatch, etc., that it will subsume and replace, the most successful wearable technology will be owned, operated, and controlled by the wearer. Much like the wearer's own clothing, this technology will arise out of the wearer's own choosing.
`Smart clothing' raises some interesting ethical and social issues, which we will need to face when the apparatus makes its way into widespread use. The boundaries between seeing and viewing, and between remembering and recording will crumble. When we purchase a new appliance, we may well `memorize' the face behind the store counter. A week later, our spouse, taking the appliance back for a refund, could `recall' the name and face of the clerk she never met.
In some sense `smart clothing' turns the tables on video surveillance -- the shopkeeper who illegally chains fire exits shut will risk getting caught on candid camera just as the shoplifter risks getting caught on the shop's surveillance cameras. Although miniature camcorders have brought to light incidents such as the Rodney King beating, they lack wireless communications to safeguard the image data from being seized or destroyed.
Because WearCam is tetherless, it has been possible to wear the apparatus for several years, in day-to-day interactions. Getting multimedia computing off the desk and onto the streets, shops, and into other establishments, has raised some interesting social issues.
`Smart Clothing' offers an alternative to centralized surveillance. It suggests a future in which people, through prosthesis, might have both improved visual memory and improved ability to share it. But it also suggests a hope that the visual memory be distributed among people, and be less likely to be abused than if it existed in a centralized form, as is more common with a network of surveillance cameras, such as is commonly used on the streets in the UK. The proliferation of hidden cameras everywhere has the potential to threaten our privacy, but suppose the only cameras were the prosthetic elements of other individuals. Then at least one would still have privacy when one was alone.
Furthermore, as images become easier and easier to edit[43], WearCam, like other cameras, will begin to be regarded as being more like a visual memory aid and an artist's tool than a source of evidence of fact.
ACKNOWLEDGEMENTS Thanks to: Roz Picard; Hiroshi Ishii; Thad Starner; Neil Gershenfeld; Sandy Pentland; Ted Adelson; Jennifer Healey; Matt Reynolds, KB2ACE; and Steve Roberts, N4RVE, for many useful technical discussions; Julie Scher for inviting me as a guest-lecturer to her class (a course on surveillance -- this paper originated as a course handout for my guest lecture), causing a great deal of useful discussion of the social issues; those at Computers, Freedom, and Privacy who contributed ideas during an informal demonstration I gave at CFP-96; and the reviewers/organizers of the conference for a thorough constructive criticism of the first draft. In particular, I would like to thank Carole Goble for a very thorough proof-reading of the preliminary draft, and many useful comments. Thanks also to HP labs, ProComp, VirtualVision, Compaq, Kopin, Colorlink, Ed Gritz, Miyota, BelTronics, M/A-Com, and Virtual Research for lending or donating equipment that made my experiments possible.