Latest News

Real-Time Object Recognition with ofxCaffe

[Image: Screen Shot 2015-01-03 at 12.57.23 PM]

I’ve spent a little time with Caffe over the holiday break to try to understand how it might work in the context of real-time visualization and object recognition in more natural scenes and videos. Right now, I’ve implemented the following deep convolutional networks using the 1280×720 webcam on my 2014 MacBook Pro:

The image above depicts the output of an 8×8 grid detection, with brighter regions indicating higher probabilities of the class “snorkel” (automatically selected by the network as the highest-probability class out of 1000 possible classes).

So far I have spent some time understanding how Caffe keeps each layer’s data during a forward/backward pass, and how the deeper layers could be “visualized” in a meaningful way.


Past the 1st layer, the dimensions do not make sense for producing meaningful visualizations. However, recent work by Zeiler and Fergus suggests the use of deconvnets as a potential solution for visualizing how an input image may produce a maximum response in a particular “neuron”. Their work attempts to reconstruct each layer of a convnet from its output. Another recent work by Simonyan et al. also attempts the problem. I haven’t yet wrapped my head around how that would work in this framework, but would really love to implement it soon. The following screenshots therefore only show the first 3 channels of any layer with more than 3 channels.


[Image: Screen Shot 2015-01-03 at 12.57.53 PM]

What the real-time implementation has taught me so far is that it is horrible at detecting objects in everyday scenes. This is true even of the Hybrid model, which should have some notion of both objects and scenes, though it’s not clear these labels should even live within the same “hierarchy”. All of the models are very good (from my very simple webcam tests) when the scene is un-occluded, presents only the object of focus, and has a fairly plain background. R-CNN seems focused on aiding with exactly this problem, though I have not implemented region proposals at the moment. Without such a constraint, the network will often jitter between a number of different possible object categories even though the scene itself does not change very much. This may tell us something about how task, attention, or other known cognitive functions interact with these networks.

Also, it is really fast.

I’ve also noticed that when I show my hand, it suggests with high probability (> 0.5) that it is a band-aid. Or that my face is “ski goggles” or “sunglasses”. These are fairly poor guesses, but understandable considering what the models have been trained on. That may also start to tell us about the nature of the problem, and how a single object class for an entire image depicting a scene is not necessarily how we should learn objects.

Lastly, it is quite easy to set up your own data, visualize the gradients, and get a sense of what is happening in real time. The data is also agnostic to format, dimensionality, etc. I’d love to eventually try training something in this framework with other datasets such as sound, fMRI, and/or eye-movements.

Update 1: Code available here: ofxCaffe

Extracting Automatically Labeled Volumetric ROIs from MRI

Performing a region-of-interest analysis on MRI data requires knowing where the regions are in your subject data. Typically, this has been done using hand-drawn masks in a 3D viewer. However, recent research has made the process mostly automatic, and the open-source community has implemented everything you will need to automatically create labeled volumetric regions of interest [1-3]. With FreeSurfer 5.3, we have the option of performing cortical parcellation using 4 different atlases:

Destrieux atlas: aparc.a2009s
Desikan-Killiany atlas: aparc
Mindboggle: aparc.DKTatlas40
Brodmann areas: BA and BA.thresh

We’ll first use FreeSurfer’s recon-all tool to perform a cortical reconstruction of our anatomical scans. Download FreeSurfer and register your copy. You’ll be sent an e-mail with a license. Follow the instructions and create the license file “.license” inside your FreeSurfer home directory (check the environment variable FREESURFER_HOME, e.g., “$ echo $FREESURFER_HOME”). Then run the setup script inside “$FREESURFER_HOME/” to set up the necessary paths.

Next, make sure you have set the environment variable SUBJECTS_DIR to where you’d like your analysis to go (e.g., “$ export SUBJECTS_DIR=/some/directory”). For our example, we’ll keep this as a directory called “freesurfer” in our home directory, “~/”. Each subject we analyze will have its own folder inside SUBJECTS_DIR (i.e., “~/freesurfer” for our example), with the name specified by recon-all’s “subjid” parameter. We’ll come back to that if it doesn’t make sense yet.
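As a minimal sketch of that layout (using “15jul13rr”, one of the example subject ids from the cluster script below):

```shell
# Sketch of the SUBJECTS_DIR layout; "15jul13rr" is one of the
# example subject ids used in the cluster script below.
export SUBJECTS_DIR=~/freesurfer
mkdir -p "$SUBJECTS_DIR"

# after `recon-all -subjid 15jul13rr ...` finishes, that subject's
# reconstruction lives in its own folder inside SUBJECTS_DIR:
echo "$SUBJECTS_DIR/15jul13rr"
```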

We’re ready to run recon-all now. It can take up to 12 hours, even on a new machine. Since we have a few subjects, we used our computing cluster to perform the analysis. The cluster we use requires us to submit jobs to a queue using qsub. The bash script below loops through all our subjects and runs recon-all on each of them in a new “job” submitted through qsub. Our subjects’ anatomical data is stored in NIfTI format, “.nii”, in the directory ~/anatomical.

# set the environment variable to where our subject data will be stored
export SUBJECTS_DIR=~/freesurfer

# these are the names of the subjects for which we have high-res scans
declare -a arr=("15jul13rr" "16jul13ad" "17jul13bs" "18jul13ys")

# we are going to use this temporary directory while processing data
mkdir -p /global/scratch/pkm

# loop through our subjects
for i in "${arr[@]}"
do
   echo "$i"

   # destination for this subject's scan in the temp directory
   datasetlocation=/global/scratch/pkm/"$i"_anat.nii

   # copy the scan to the temp directory
   cp ../anatomical/"$i"_anat.nii "$datasetlocation"

   # submit a job with the current subject's file
   qsub -v subject=$i,location=$datasetlocation run_freesurfer.pbs
done

This PBS file is our job script, which takes one subject’s MRI data and runs recon-all on it.

#!/bin/bash -l

# Name your job (used in the PBS output file names)
#PBS -N $subject

# request the queue (if omitted, serial is the default)
#PBS -q default

# request 1 node and 1 processor per node
#PBS -l nodes=1:ppn=1

# Specify how much time you think the job will run
#PBS -l walltime=36:00:00

# By default, PBS scripts execute in your home directory, not the
# directory from which they were submitted. The following line
# places you in the directory from which the job was submitted.
cd $PBS_O_WORKDIR

# Set the environment variable again as this is on another machine
export SUBJECTS_DIR=~/freesurfer

# Filename for logging
mkdir -p logs
filename="$subject"_`date +%Y%m%d-%H%M%S`_freesurfer.log

# Run recon-all
recon-all -i $location -subjid $subject -all &> logs/$filename

If you don’t have a cluster, the command to run is simply:

recon-all -i YOUR_NIFTI_DATA.nii -subjid SOME_IDENTIFIER -all

where YOUR_NIFTI_DATA may be something like ’15julrr_anat.nii’, and SOME_IDENTIFIER could be ’15julrr’. This identifier names the directory which will be created inside SUBJECTS_DIR.
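Either way, once recon-all finishes you can sanity-check where things went. A minimal sketch, assuming SUBJECTS_DIR from earlier and the example id ’15julrr’ (the annotation filenames below follow FreeSurfer’s conventions for the atlases listed above):

```shell
# Hypothetical sketch: where the cortical parcellations end up,
# assuming SUBJECTS_DIR=~/freesurfer and subject id "15julrr".
export SUBJECTS_DIR=~/freesurfer
subjid=15julrr

# the per-hemisphere annotation files for two of the atlases above:
echo "$SUBJECTS_DIR/$subjid/label/lh.aparc.annot"
echo "$SUBJECTS_DIR/$subjid/label/lh.aparc.a2009s.annot"
```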

Next we’ll make use of AFNI’s toolset (a free download). The first tool is @SUMA_Make_Spec_FS, which prepares the reconstruction for SUMA and stores the results inside the subject’s surf directory in its own folder, SUMA. This process takes about 15-30 minutes. We then use @SUMA_AlignToExperiment to re-align our data and labels to the experiment’s anatomy.
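As a rough sketch, the two calls might look like the following; the subject id “15jul13rr” and the experiment volume name “experiment+orig” are assumptions, and the exact flags can vary across AFNI versions (check each command’s -help). The commands are echoed so the sketch runs even without AFNI installed:

```shell
# Hypothetical sketch of the two AFNI/SUMA steps for one subject;
# names here are assumptions, and `echo` is used so this runs
# without AFNI installed.
sid=15jul13rr

# prepare the FreeSurfer reconstruction for SUMA (results land in a
# SUMA/ folder inside the subject's surf directory):
echo @SUMA_Make_Spec_FS -sid "$sid"

# re-align the surface volume (and its atlas labels) to the
# experiment's anatomical volume:
echo @SUMA_AlignToExperiment -exp_anat experiment+orig \
     -surf_anat "${sid}"_SurfVol+orig
```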

Finally, we are ready to extract VOIs. We’ll use whereami to do this. We first find what the name of our region is within our atlas code:

$ whereami -show_atlases
$ whereami -show_atlas_code -atlas ATLAS_CODE

where ATLAS_CODE is an atlas identifier from the output of -show_atlases. If we use “aparc.a2009s+aseg_rank”, then we are using the Destrieux atlas (see [1]), and we can see the codes for our atlas like so:

$ whereami -show_atlas_code -atlas aparc.a2009s+aseg_rank
++ Input coordinates orientation set by default rules to RAI

Atlas aparc.a2009s+aseg_rank,      194 regions
----------- Begin regions for aparc.a2009s+aseg_rank atlas-----------
... (the 194 region labels are listed here) ...
----------- End regions for aparc.a2009s+aseg_rank atlas --------------

To extract one of these regions as a mask, we can use whereami:

$ whereami -atlas aparc.a2009s+aseg_rank_Alnd_Exp \
           -mask_atlas_region aparc.a2009s+aseg_rank_Alnd_Exp::ctx_lh_G_temporal_middle

More reading:

[1]. Destrieux, C., Fischl, B., Dale, A., & Halgren, E. (2010). Automatic parcellation of human cortical gyri and sulci using standard anatomical nomenclature. NeuroImage, 53(1), 1–15. doi:10.1016/j.neuroimage.2010.06.010

[2]. Fischl, B., et al. (2004). Automatically parcellating the human cerebral cortex. Cerebral Cortex, 14(1), 11–22.

[3]. Desikan, R. S., et al. (2006). An automated labeling system for subdividing the human cerebral cortex on MRI scans into gyral based regions of interest. NeuroImage, 31(3), 968–980.

Many thanks to Beau Sievers and Carolyn Parkinson for their help in this process.

YouTube’s “Copyright School” Smash Up

Ever wonder what happens when you’ve been accused of violating copyright multiple times on YouTube? First, you get redirected to YouTube’s “Copyright School” whenever you visit YouTube, forcing you to watch a Happy Tree Friends cartoon in which the main character is dressed as an actual pirate:

Second, I’m guessing, your account will be banned. Third, you cry and wonder why you ever violated copyright in the first place.

In my case, I’ve disputed every one of the 4 copyright violation notices that I’ve received on grounds of Fair Use and Fair Dealing. Here’s what happens when you file a dispute using YouTube’s online form (click for high-res):

3 of the 4 have been dropped after I filed disputes, though I’m still waiting to hear the response to the above dispute. Read the dispute letter to Sony ATV and UMPG Publishing in full here.

The picture above shows a few stills from what my Smash Ups look like. The process is part of my ongoing research into how existing content can be transformed into artistic styles reminiscent of analytic cubist, figurative, and futurist paintings. The videos are created using content-based information retrieval techniques that I assume are very similar to (though likely not as advanced as) the techniques used to flag the video as a duplicate copy in the first place: YouTube’s Content ID system. Until Sony and UMPG respond, the infringing video is still available on YouTube:

Regardless of my disputes, I’m now redirected to YouTube’s Copyright School whenever I visit YouTube (until I successfully complete the test):

A bit about Happy Tree Friends: it is, according to Wikipedia, “extremely violent, with almost every episode featuring blood, pain, and gruesome deaths…depicting bloodshed and dismemberment in a vivid manner.” Never mind. I’m a copyright violator; I can handle a little dismemberment. In fact, that is exactly what I’ve done to the “Copyright School” video: dismembered it with the content of 70 videos of Happy Tree Friends, using the same process which brought me to YouTube’s “Copyright School” in the first place:

I hope Russell the Pirate doesn’t feel his copyright is being violated.

Related: Copyright Violation Notice from “Rightster”, Intention in Copyright, EFF Wins Renewal of Smartphone Jailbreaking Rights Plus New Legal Protections for Video Remixing, An open letter to Sony ATV and UMPG

An open letter to Sony ATV and UMPG

Dear Sony ATV Publishing, UMPG Publishing, and other concerned parties,

I ask you to please withdraw your copyright violation notice on my video, “PSY – GANGNAM STYLE (?????) M/V (YouTube SmashUp)”, as I believe my use of any copyrighted material is protected under Fair Use or Fair Dealing. This video was created by an automated process as part of an art project developed during my PhD at Goldsmiths, University of London.

The process which creates the audio and video is entirely automated, meaning the accused video is created by an algorithm. This algorithm begins by creating a large database of tiny fragments of audio and video (less than 1 second of audio per fragment) using 9 videos from YouTube’s top 10 list. In this database, the tiny fragments of video and audio are stored as unrelated pieces of information, described only by a short series of 10-15 numbers. These numbers represent low-level features describing the texture and shape of the fragment of audio or video. These tiny fragments are then matched to the tiny fragments of audio and video detected within the target for resynthesis, in this case the number one YouTube video at the time, “PSY – GANGNAM STYLE (?????) M/V”.

To reiterate, the content from the target video, “PSY – GANGNAM STYLE (?????) M/V”, is not used in the resulting synthesis. That is, the process creates a new video not by merely copying the target video, but by attempting to re-create it out of entirely different material: the remaining 9 top-10 YouTube videos. Abstractly, there may appear to be a similar form or structure, due to the collection of many fragments being organized in a similar way to the target for resynthesis. These fragments, however, come from a very large collection of material very different from the original content’s own. The content used in the resynthesis itself comes only from the large database of tiny fragments of audio and video segmented from the 9 other videos. As a result, I would argue that any use of content within this video falls under Fair Use or Fair Dealing.

This art project’s purpose is to highlight an important aspect of how computers and humans perceive, and how copyright itself may be dealt with within a computational arts practice which by its nature has to make use of existing content. The work further seeks to transform existing material into something entirely different, such that experiencing a resynthesized video reveals a new understanding of one’s own perception. The content used is fragmented in nature and assembled using a coarse idea of audiovisual scene understanding with no notion of semantics. As a result, the video itself is very abstract and at times incomprehensible. Further, its effect on the publisher’s market, as evidenced by the very low view count on YouTube, is marginal at best. I therefore ask you to please withdraw your copyright claim.

Parag K. Mital

Related: Copyright Violation Notice from “Rightster”, Intention in Copyright, EFF Wins Renewal of Smartphone Jailbreaking Rights Plus New Legal Protections for Video Remixing, YouTube’s Copyright School

[UPDATE Dec 8, 2012: All copyright violation notices have been dropped and the video is publicly accessible.]

Copyright Violation Notice from “Rightster”

I’ve been working on an art project which takes the top 10 videos on YouTube and tries to resynthesize the #1 video using the remaining 9 videos. The computational model is based on low-level human perception and uses only very abstract features such as edges, textures, and loudness. I’ve created a new synthesis each week using the top 10 of that week, in the hope that, one day, I will be able to resynthesize my own video in the top 10. It is essentially a viral algorithm, though whether it will succeed remains to be seen.

The database of content used in the recreation of the above video comes from the following videos:
#2 News Anchor FAIL Compilation 2012 || PC
#3 Flo Rida – Whistle [Official Video]
#4 Carly Rae Jepsen – Call Me Maybe
#5 Jennifer Lopez – Goin’ In ft. Flo Rida
#6 Taylor Swift – We Are Never Ever Getting Back Together
#7 will.i.am – This Is Love ft. Eva Simons
#8 Call Me Maybe – Carly Rae Jepsen (Chatroulette Version)
#9 Justin Bieber – As Long As You Love Me ft. Big Sean
#10 Rihanna – Where Have You Been

It looks and sounds like an abstract mess.

Today, I received a somewhat automated copyright violation notice from YouTube (shown below) suggesting my smash up “11 Month Old Twins Dancing to Daddy’s Guitar (YouTube Smash Up)” (shown above via Vimeo instead of YouTube) is infringing the “audiovisual content administered by: Rightster” (their website describes them as: “Services that optimise the distribution and monetisation of live + on demand video for sports rights holders, news networks, event owners and publishers“), and my account has been placed under “Not a good standing“. Acknowledging infringement seems to be the path suggested by YouTube’s automated copyright infringement system (see pictures below). Though perhaps I should instead dispute under fair-use terms, and risk my account being banned if my dispute is deemed “fraudulent“.

[UPDATE: 17/10/12]: I’ve disputed the claim and my account status is still “Not in a good standing” (though I have no idea what this means). YouTube says they will temporarily put the video back online though this may change at any time:

“After your dispute has been submitted, your video will soon be available on YouTube without ads for third parties. This is a temporary status and might change at any time. Learn more about copyright on YouTube.
As a result, your account has been penalized and is not in good standing. Deleting the video will remove the penalty due to this claim.”

Related: YouTube’s “Copyright School”, Intention in Copyright, EFF Wins Renewal of Smartphone Jailbreaking Rights Plus New Legal Protections for Video Remixing

3D Musical Browser

I’ve been interested in exploring ways of navigating media archives. Typically, you may use iTunes and go from artist to artist, or you may have tediously classified your collection into genres. Some may even still browse their music through a file browser, making sure the folders and filenames of their collection are descriptive of the artist, album, year, etc. But what about how the content actually sounds?

Wouldn’t it be nice to hear all the music which shares similar sounds, or similar phrases of sounds? Research in the last 10-15 years has developed methods precisely to solve this problem; they fall under the umbrella term content-based information retrieval (CBIR): uncovering the relationships within an archive through the information in the content itself. For images, Google’s Search by Image, which only recently became public, is a great example. For audio, audioDB and Shazam are good examples of discovering music through the way it sounds, that is, through the content-based relationships of the audio itself. However, each of these interfaces presents a list of matches to an image or audio query, making it difficult to explore the content-based relationships within a specific set of material.

The video above demonstrates interaction with a novel 3D browser of a collection of music by one artist, Daphne Oram. The sounds are grouped in 3D space based on the way they sound, clustering similar sounding material together. Each of the 3 axes describes a grouping of sound frequencies: a timbre, or texture of sound. The position of a sound along one of these axes indicates how much of that group of frequencies is present in the sound file.

Exploring her work in this browser really demonstrates the variety of sounds she achieved. It also makes exploring the collection fun, as there is an engaging visual form, and you get to hear the material right away.

The browser has also been built to be a real-time tool for creating new sounds. Mousing over any of the tiny boxes (representing parts of audio files in the collection) triggers the clip to play. Since similar sounding clips are grouped closer together, one can “perform” the collection along perceptually coherent axes by moving the mouse along any of the axes.

Intention in Copyright

The following article is written for the LUCID Studio for Speculative Art based in India.


My work in audiovisual resynthesis aims to create models of how humans represent and attend to audiovisual scenes. Using pattern recognition of both audio and visual material, these models use large corpora of learned audiovisual material which can be matched to ongoing streams of incoming audio or visual material. The way audio and visual material is stored and segmented within the model is based heavily on neurobiology and behavioral evidence (the details are saved for another post). I have called the underlying model Audiovisual Content-based Information Description/Distortion (or ACID for short).

As an example, a live stream of audio may be matched to a database of learned sounds from recordings of nature, creating a re-synthesis of the audio environment at present using only pre-recorded material from nature itself. These learned sounds may be fragments of a bird chirping, or the sound of footsteps. Incoming sounds of someone talking may then be synthesized using the closest sounding material to that person talking, perhaps a bird chirp or a footstep. Instead of a live stream, one can also re-synthesize a pre-recorded stream. Consider using a database of nature recordings and, instead of the live-stream, now use a pre-existing recording of Michael Jackson. The following video demonstrates the output using Michael Jackson’s “Beat It”.

Everything you hear comes from nature recordings (by Chris Watson). Try to notice which elements of Michael Jackson’s original recording remain “meaningful”. The beat of the song is incredibly predominant (@ 33 seconds). Some aspects of the lyrics are also present, heavily reinforced cross-modally by the visuals (e.g. @ 1:27), though no words are audible, as the database contained no words. Also consider what meaningful information might be present in the opposite scenario, i.e., using a database of Michael Jackson and re-synthesizing nature sounds.

As a side note, I have developed a similar approach for visual resynthesis, taking segments of visual objects, rather than segments of audio, as the basis of the resynthesis algorithm. The example below demonstrates a resynthesis of the introduction to The Simpsons using only material from the introduction to Family Guy:

More examples are on my vimeo channel.

Audio Collage

The idea of audio collage is not new. Electronic musicians have investigated the technique within the practice of musique concrète, and the advent of digital sampling with devices such as the Fairlight CMI made the practice much more accessible. Plunderphonics, a technique by John Oswald in which he manually chopped up and resynthesized a number of copyrighted albums, created entirely new landscapes and sounds from existing material. It was formalized in his essay, “Plunderphonics, or Audio Piracy as a Compositional Prerogative”, where he questioned where copyright can begin to claim ownership. After 12 years of developing the technique, he famously made use of Michael Jackson recordings in his EP Plunderphonics, perhaps the most extensive use of sampling to date, which featured an image of Michael Jackson’s head on a naked woman’s body on the album cover.

The album itself made use of material from a variety of artists, all credited, such as Count Basie, Dolly Parton, Beethoven, and Michael Jackson, to create unheard-of sounds reminiscent of Stravinsky and The Beatles. Oswald believed that by not selling the album, he was not infringing on anyone’s copyright. However, he faced legal pressure from CBS and Jackson’s attorneys and was forced to stop releasing the album and destroy all remaining copies. Also of note are other artists such as Negativland, who experienced a similar legal battle with U2, and Cassetteboy, whose audiovisual collage depicting BBC material of Queen Elizabeth was stripped from YouTube. Numerous documentaries including RiP: A Remix Manifesto and Good Copy Bad Copy, books such as Cutting Across Media: Appropriation Art, Interventionist Collage, and Copyright Law and Lessig’s Remix, and funded studies such as Recut, Reframe, Recycle have also focused on the topic (thanks to Nathan Harmer for additional links).

Perception is Inference

I have extended these questions into the very nature of perception, claiming the computational models I employ are plausible models of our own psychological modeling of audio and vision itself. I try to make explicit one possible mechanism of perception as a meaning-making inference machine, an idea which dates back at least to Helmholtz in 1869. The very nature of inference entails that understanding requires prior experiences: prior models, a set of known examples. By taking small fragments of sound and rearranging them, these perceptual units lose their context, and necessarily their original meaning, and are bound together only by the ongoing environment to create a new meaning based on the organization of the existing environment. In other words, without the environment to make a sound, there is no synthesis of a sound. The meaning is therefore created both by the viewer and by the target of the synthesis. The fragments within the database have no intentionality or meaning attributed to them until they are re-organized, resynthesized, and re-appropriated within a new context. In the video above, this context is Michael Jackson’s “Beat It”; however, not a single fragment of Michael Jackson’s “Beat It” appears in the audio output.

No Infringement of Copyright Intended

Where then does copyright hold stake within the computational models I have created? The ongoing environment provides the intentional influence of how the sounds are to be rearranged. If Michael Jackson were to appear in the environment and sing a song that also appears in the corpus, it is likely the synthesis would re-create Michael Jackson’s song. As our own mental machinery encompasses having heard Michael Jackson before, we are able to recognize Michael Jackson. However, now consider a database containing tiny fragments of Michael Jackson’s recordings and a target of birds chirping and taxis honking. Then, the only semblance of Michael Jackson is based on the tiny fragments that appear, and the organization of a song of Michael Jackson is no longer present.

Now the question arises, “Does copyright hold stake over the ongoing environment’s intentions?”, requiring that no one perform Michael Jackson for fear of the copyrighted re-synthesis? Or does copyright instead hold stake in the subtle fragments of sounds which were sampled from a copyrighted track? Let us say I set up this computational model as an installation environment where the database contains both Michael Jackson songs and nature recordings. As a target for synthesis, I have a microphone feeding live audio to the computational model. Until the microphone hears something *like* Michael Jackson, creating a synthesis using the tiniest fragment of a Michael Jackson recording, it seems I will not have violated copyright.

Even still, how could we have understood that a tiny fragment of a resynthesis was copyrighted in the first place? We would have had to match every tiny fragment within our own perceptual machinery (i.e., recognition) to understand that we had heard this fragment within a different, copyrighted context. But is it not the case that any meaning-making inference will necessarily be matched to a prior experience? If so, then isn’t copyright claiming our own experiences as copyright? Where does the context of sound take place within copyright? Could I not listen to the blowing of trees and be reminded of Michael Jackson? Or, more likely, hear the wind and be reminded of a Chris Watson recording?

The question I am getting at is, “How does the meaning elicited by the new resynthesis change our notion of copyright?” Consider the opposite scenario, where Michael Jackson’s songs are no longer in the database, and instead we have nature recordings, as in the video above. What does copyright have to say about using the organization of Michael Jackson’s songs while using entirely different content? In this case it seems I am using the artist’s full intention, though not breaking any copyright, as the material is not evidently materialized in the resynthesis.

Early 20th Century

The early Dada movement of the 1920s also looked at resynthesis within a narrative context. During a surrealist rally in the 1920s, Tristan Tzara is famously noted as standing in front of a theater and pulling random fragments of text out of a hat before a riot ensued and wrecked the theater. T.S. Eliot’s The Waste Land and John Dos Passos’ U.S.A. trilogy are also early examples of the cut-up technique popularized by Tzara. The technique was further made famous by Brion Gysin and William Burroughs during the 1950s in their poetry, which made heavy use of the cut-up technique in a variety of fashions. Gysin came across the technique by accident while cutting paper with a razor blade on top of layers of newspaper (originally there to protect the table). He noticed the cut-up fragments of newspaper created juxtapositions of image and text that were strangely coherent and meaningful. The two key terms to stress are ‘coherent’ and ‘meaningful’, which our perceptual systems cannot help but create from the world. In fact, a number of theories of neuro-cognitive behavior are also based on these premises. Interested readers are encouraged to read Ronald Rensink’s work theorizing the phenomenon of visual change blindness into “Coherence Theory”, and Shihab Shamma’s work on modeling auditory perception within a “Temporal Coherence Theory”.

Fredric Jameson also discusses the nature of collage, an early analytic cubist practice developed by Picasso and Braque at the start of the century, as a process which creates a new meaning from existing material. In contrast to pastiche or bricolage, collage seeks a new meaning, whereas pastiche is often used pejoratively to denote a random intention or an imitation of existing intentions, e.g., 16th-century forgeries/imitations.


The distinction between collage and pastiche seems incredibly relevant to practitioners making use of sampling and collage, though I doubt such a distinction would help anyone in court. It is curious to think of the opposing scenario, where the material content is not copyrighted, though the intention or organization of the content is. How would a digital future handle this distinction without referring to the intentions of an artist? Further, how could it prove an artist’s intentions in the first place?

Related: Copyright Violation Notice from “Rightster”, YouTube’s “Copyright School”, EFF Wins Renewal of Smartphone Jailbreaking Rights Plus New Legal Protections for Video Remixing

Course @ CEMA Srishti School of Design, Bangalore, IN

From November 21st to the 2nd of December, I’ll have the pleasure of leading a course and workshop with Prayas Abhinav at the Center for Experimental Media Arts in the Srishti School of Design in Bangalore, IN. Many thanks to Meena Vari for all her help in organizing the project.

Stories are flowing trees

Key words:  3D, interactive projects, data, histories, urban, creative coding, technology, sculpture, projection mapping

Project Brief:

Urban realities are more like fictions, constructed through folklore, media and policy. Compressing these constructions across time would offer some possibilities for the emergence of complexity and new discourse. Using video projections adapted for 3D surfaces, urban histories will become data and information – supple, malleable, and material.

The project will begin with a one-week workshop by Parag Mital on “Creative Coding” using the openFrameworks platform for C/C++ coding.

About the Artists:

Prayas Abhinav

Presently he teaches at the Srishti School of Art, Design and Technology and is a researcher at the Center for Experimental Media Arts (CEMA). He has taught in the past at Dutch Art Institute (DAI) and Center for Environmental Planning and Technology (CEPT).
He has been supported by fellowships by Openspace India (2009), TED (2009), Center for Media Studies (CMS) (2006), Public Service Broadcasting Trust (PSBT) (2006), Sarai/CSDS (2005). He has presented his projects and proposals in the last few years at Periferry, Guwahati (2010), Exit Art, New York (2010), Futuresonic, Manchester (2009), Wintercamp, Amsterdam (2009), 48c: Public Art Ecology (2008), Khoj (2008), Urban Climate Camp, ISEA (2008), Sensory Urbanism, Glasgow (2008), First Monday, Chicago (2006), The Paris Accord (2006) and PSBT/Prasar Bharti (2006).
He has also participated in the exhibitions Myth ?? Reality (2011) at The Guild, Mumbai, Continuum Transfunctioner (2010) at exhibit 320 in Delhi, Contested Space – Incursions (2010) at Gallery Seven Arts in Delhi and Astonishment of Being (2009) at the Birla Academy of Art and Culture in Kolkata.

Parag K Mital (London)

Parag K Mital is an American-born, London-based PhD student in Arts and Computational Technology at Goldsmiths, University of London, working on augmented realities and audiovisual resynthesis. As an audiovisual installation artist, his work encourages the audience to directly question the processes surrounding perception through introspection and curiosity from experiencing real-time models of audiovisual perception. His work has traveled extensively in London, Athens, and Moscow, including the London Science Museum and the British Film Institute. As an educator he has taught at Edinburgh University and Goldsmiths, University of London, and is due to deliver a course on Audiovisual Processing for iPhone/iPad at the Victoria & Albert Museum in London.

Workshop in “Creative Coding” using the openFrameworks platform

The workshop will cover the basics of openFrameworks, a C/C++ creative coding platform. This course will also introduce students to digital signal processing techniques in the synthesis and analysis of audio and visual signals for interactive techniques, using custom-made libraries developed at Goldsmiths, University of London. Depending on interest, participants will also receive a tutorial on developing for the iPhone/iPad in order to create real-time audiovisual apps.

Memory Mosaicing

A product of my PhD research is now available on the iPhone App Store (for a small cost!): View in App Store.

This application is motivated by my interest in experiencing an Augmented Perception, and is of course very much inspired by some of the work here at Goldsmiths. Applying existing approaches in soundspotting/mosaicing to a real-time stream situated in the real world lets you play with your own sonic memories, and certainly requires an open ear for new experiences. Succinctly, the app records segments of sounds in real-time using its own listening model as you walk around different environments (or sit at your desk). These segments are built up the longer the app is left running, forming a database (a working memory model) against which to understand new sounds. Incoming sounds are then matched to this database and the closest matching sound is played instead. What you get is a polyphony of sound memories triggered by the incoming feed of audio, and an app which sounds more like your environment the longer it is left to run. A somewhat gimmicky feature of this app is the ability to learn a song from your iTunes Library. This lets you experience your sonic world as your favorite hip-hop song, or whatever you listen to.
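
At its core, the matching step is a nearest-neighbor search over feature vectors. Here is a minimal sketch of that step (the names and feature representation are hypothetical; the app's actual listening model is more involved):

```cpp
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical sketch: each stored sound segment is summarized by a small
// feature vector; an incoming frame is matched to its nearest neighbor.
struct Segment {
    std::vector<float> features;  // e.g. spectral features of the segment
    int id;                       // index into the recorded audio buffers
};

// Squared Euclidean distance between two equal-length feature vectors.
static float distance2(const std::vector<float>& a, const std::vector<float>& b) {
    float d = 0.0f;
    for (size_t i = 0; i < a.size(); i++) {
        float diff = a[i] - b[i];
        d += diff * diff;
    }
    return d;
}

// Return the id of the database segment closest to the query features.
int matchSegment(const std::vector<Segment>& database, const std::vector<float>& query) {
    int best = -1;
    float bestDist = std::numeric_limits<float>::max();
    for (size_t i = 0; i < database.size(); i++) {
        float d = distance2(database[i].features, query);
        if (d < bestDist) {
            bestDist = d;
            best = database[i].id;
        }
    }
    return best;
}
```

At runtime, each incoming frame is summarized the same way, and playing back the matched segment instead of the input is what yields the polyphony of sound memories.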

Hope you have a chance to try it out and please forward to anyone of interest.

Concatenative Video Synthesis (or Video Mosaicing)


Working closely with my adviser Mick Grierson, I have developed a way to resynthesize existing videos using material from another set of videos. This process starts by learning a database of objects that appear in the set of videos to synthesize from. The target video to resynthesize is then broken into objects in a similar manner, but also matched to objects in the database. What you get is a resynthesis of the video that appears as beautiful disorder. Here are two examples, the first using Family Guy to resynthesize The Simpsons. And the second using Jan Svankmajer’s Food to resynthesize Jan Svankmajer’s Dimensions of Dialogue.
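
A minimal sketch of the matching idea, assuming a much simpler region descriptor than the actual implementation (which learns object segmentations): summarize each region by a coarse normalized histogram and compare candidates with a chi-squared-style distance.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sketch: summarize an image region by a coarse intensity
// histogram; the closest database region under this distance is the match.
std::vector<float> histogram(const std::vector<float>& pixels, int bins = 8) {
    std::vector<float> h(bins, 0.0f);
    for (float p : pixels) {
        int b = (int)(p * bins);           // pixels assumed in [0, 1)
        if (b >= bins) b = bins - 1;
        h[b] += 1.0f / pixels.size();      // normalize so bins sum to 1
    }
    return h;
}

// Chi-squared-style distance between two normalized histograms.
float histDistance(const std::vector<float>& a, const std::vector<float>& b) {
    float d = 0.0f;
    for (size_t i = 0; i < a.size(); i++) {
        float s = a[i] + b[i];
        if (s > 0.0f) d += (a[i] - b[i]) * (a[i] - b[i]) / s;
    }
    return d;
}
```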

Google Earth + Atlantis Space Shuttle

I managed to catch the live feed of the Atlantis Space Shuttle launch yesterday. What I found really interesting, though, was a real-time virtual reality view of the space shuttle launch from inside Google Earth. Screen capture with obligatory 12x speedup to retain attention span below:

Lunch Bites @ CULTURE Lab, Newcastle University

I was recently invited to the CULTURE lab at Newcastle University by its director, Atau Tanaka. I would say it has the resources and creative power of 5 departments all housed in one spacious building. In the 12-some studios housed over 3 floors, over the course of 2 short days, I found people building multitouch tables, controlling synthesizers with the touch of fabric, and researching augmented spatial sonic realities. There is a full suite of workshop tools including a laser cutter, multiple multi-channel sound studios, a full stage/theater with stage lighting and multiple projection, a radio lab, and tons of light and interesting places to sit and do whatever you feel like doing. The other thing I found really interesting is there are no “offices”. Instead, the staff are dispersed amongst the students in the twelve-some studios, picking a new desk perhaps whenever they need a change of scenery? If you are ever in the area, it is certainly worth a visit, and I’m sure the people there will be very open to telling you what they are up to.

I also had the pleasure to give a talk on my PhD research in Resynthesizing Audiovisual Perception with Augmented Reality at the Lunch BITES seminar series. Slides are below, though the embedded media is removed. Comments are welcome!


Creative Community Spaces in India

Jaaga – Creative Common Ground

CEMA – Center for Experimental Media Arts at Srishti School of Art, Design and Technology

Bar1 – non-profit exchange programme by artists for artists to foster the local, Indian and international mutual exchange of ideas and experiences through guest residencies in Bangalore

Sarai – a space for research, practice, and conversation about the contemporary media and urban constellations.
New Delhi

Khoj/International Artists’ Association – artist led, alternative forum for experimentation and international exchange
New Delhi

Periferry – a nomadic space for hybrid art practices and a laboratory for cross-disciplinary practice. The project focuses on creating a network space for negotiating the challenges of contemporary cultural production. It is located on a ferry barge on the river Brahmaputra, docked in Guwahati, Assam.
Narikolbari, Guwahati

Point of View – non-profit organization that brings the points of view of women into community, social, cultural and public domains through media, art and culture.

Majilis – a center for rights discourse and inter-disciplinary arts initiatives

Camp – not an “artists collective” but a space, in which ideas and energies gather, and become interests and forms.

ChitraKarKhana – A fully independent, small scale unit for experimental media.

Facial Appearance Modeling/Tracking

I’ve been working on developing a method for automatic head-pose tracking, and along the way have come to model facial appearances. I start by initializing a facial bounding box using the Viola-Jones detector, a well-known and robust detector for trained object classes. This allows me to localize the face. Once I know where the 2D plane of the face is in an image, I can register an Active Shape Model like so:

After multiple views of the possible appearance variations of my face, including slight rotations, I construct an appearance model.

The idea I am working with is using the first components of variations of this appearance model for determining pose. Here I show the first two basis vectors and the images they reconstruct:

As you may notice, these two basis vectors very neatly encode rotation. By projecting a new image onto the model and looking at its coefficients along these basis vectors, you can also interpret pose.
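
The appearance model above boils down to PCA on registered face images: the basis vectors are the leading eigenvectors of the data covariance. Here is a self-contained sketch of extracting the first component via power iteration, on synthetic data (all names are hypothetical, not the actual tracking code):

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical sketch: find the first principal component of mean-centered
// data via power iteration on the covariance. In the appearance model,
// projecting a registered face onto such components gives the coefficients
// used to read off pose.
std::vector<float> firstPrincipalComponent(const std::vector<std::vector<float> >& data,
                                           int iterations = 100) {
    size_t n = data.size(), dim = data[0].size();
    // compute the mean of the data
    std::vector<float> mean(dim, 0.0f);
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < dim; j++) mean[j] += data[i][j] / n;
    // power iteration: v <- C v (normalized), where C = sum (x-m)(x-m)^T
    std::vector<float> v(dim, 1.0f);
    for (int it = 0; it < iterations; it++) {
        std::vector<float> cv(dim, 0.0f);
        for (size_t i = 0; i < n; i++) {
            float dot = 0.0f;
            for (size_t j = 0; j < dim; j++) dot += (data[i][j] - mean[j]) * v[j];
            for (size_t j = 0; j < dim; j++) cv[j] += dot * (data[i][j] - mean[j]);
        }
        float norm = 0.0f;
        for (size_t j = 0; j < dim; j++) norm += cv[j] * cv[j];
        norm = std::sqrt(norm);
        for (size_t j = 0; j < dim; j++) v[j] = cv[j] / norm;
    }
    return v;
}
```

For data that varies mostly along one direction (as rotation does in the registered face images), this first component captures that variation.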

Short Time Fourier Transform using the Accelerate framework

Using the libraries pkmFFT and pkm::Mat, you can very easily perform a highly optimized short-time Fourier transform (STFT) with direct access to a floating-point based object.

Get the code on my github:
Depends also on:

 *  pkmSTFT.h
 *  STFT implementation making use of Apple's Accelerate Framework (pkmFFT)
 *  Created by Parag K. Mital - 
 *  Contact:
 *  Copyright 2011 Parag K. Mital. All rights reserved.
 *	Permission is hereby granted, free of charge, to any person
 *	obtaining a copy of this software and associated documentation
 *	files (the "Software"), to deal in the Software without
 *	restriction, including without limitation the rights to use,
 *	copy, modify, merge, publish, distribute, sublicense, and/or sell
 *	copies of the Software, and to permit persons to whom the
 *	Software is furnished to do so, subject to the following
 *	conditions:
 *	The above copyright notice and this permission notice shall be
 *	included in all copies or substantial portions of the Software.
 *  Usage:
 *  // be sure to either use malloc or __attribute__ ((aligned (16)))
 *  size_t buffer_size = 4096;
 *  float *sample_data = (float *) malloc (sizeof(float) * buffer_size);
 *  pkm::Mat magnitude_matrix, phase_matrix;
 *  pkmSTFT *stft;
 *  stft = new pkmSTFT(512);
 *  stft->STFT(sample_data, buffer_size, magnitude_matrix, phase_matrix);
 *  stft->ISTFT(sample_data, buffer_size, magnitude_matrix, phase_matrix);
 *  delete stft;

#include "pkmFFT.h"
#include "pkmMatrix.h"

class pkmSTFT
{
public:
	pkmSTFT(size_t size)
	{
		fftSize = size;
		numFFTs = 0;
		fftBins = fftSize/2;
		hopSize = fftSize/4;
		windowSize = fftSize;
		bufferSize = 0;
		initializeFFTParameters(fftSize, windowSize, hopSize);
	}
	~pkmSTFT()
	{
		delete FFT;
	}
	void initializeFFTParameters(size_t _fftSize, size_t _windowSize, size_t _hopSize)
	{
		fftSize = _fftSize;
		hopSize = _hopSize;
		windowSize = _windowSize;
		// fft constructor
		FFT = new pkmFFT(fftSize);
	}
	void STFT(float *buf, size_t bufSize, pkm::Mat &M_magnitudes, pkm::Mat &M_phases)
	{
		// pad input buffer to a multiple of the fft size
		size_t padding = ceilf((float)bufSize/(float)fftSize) * fftSize - bufSize;
		float *padBuf;
		if (padding) {
			printf("Padding %zu sample buffer with %zu samples\n", bufSize, padding);
			padBufferSize = bufSize + padding;
			padBuf = (float *)malloc(sizeof(float)*padBufferSize);
			// set padding to 0
			memset(&(padBuf[bufSize]), 0, sizeof(float)*padding);
			// copy original buffer into padded one
			memcpy(padBuf, buf, sizeof(float)*bufSize);
		}
		else {
			padBuf = buf;
			padBufferSize = bufSize;
		}
		// create output fft matrix
		numWindows = (padBufferSize - fftSize)/hopSize + 1;
		if (M_magnitudes.rows != numWindows || M_magnitudes.cols != fftBins) {
			printf("Allocating %zu bins x %zu windows matrix for STFT\n", fftBins, numWindows);
			M_magnitudes.reset(numWindows, fftBins, true);
			M_phases.reset(numWindows, fftBins, true);
		}
		// stft
		for (size_t i = 0; i < numWindows; i++) {
			// get current row of the magnitude/phase matrices
			float *magnitudes = M_magnitudes.row(i);
			float *phases = M_phases.row(i);
			float *buffer = padBuf + i*hopSize;
			FFT->forward(0, buffer, magnitudes, phases);
		}
		// release padded buffer
		if (padding) {
			free(padBuf);
		}
	}
	void ISTFT(float *buf, size_t bufSize, pkm::Mat &M_magnitudes, pkm::Mat &M_phases)
	{
		size_t padding = ceilf((float)bufSize/(float)fftSize) * fftSize - bufSize;
		float *padBuf;
		if (padding) {
			printf("Padding %zu sample buffer with %zu samples\n", bufSize, padding);
			padBufferSize = bufSize + padding;
			// calloc so the overlap-add below starts from silence
			padBuf = (float *)calloc(padBufferSize, sizeof(float));
		}
		else {
			padBuf = buf;
			padBufferSize = bufSize;
		}
		// overlap-add each inverse-transformed window
		for (size_t i = 0; i < numWindows; i++) {
			float *buffer = padBuf + i*hopSize;
			float *magnitudes = M_magnitudes.row(i);
			float *phases = M_phases.row(i);
			FFT->inverse(0, buffer, magnitudes, phases);
		}
		memcpy(buf, padBuf, sizeof(float)*bufSize);
		// release padded buffer
		if (padding) {
			free(padBuf);
		}
	}
private:
	pkmFFT				*FFT;
	size_t				sampleRate, fftSize, fftBins, hopSize, windowSize,
						bufferSize, padBufferSize, numFFTs, numWindows;
};

Real FFT/IFFT with the Accelerate Framework

Apple’s Accelerate Framework can really speed up your code without much effort, and it will also run on an iPhone. Even so, I banged my head a few times trying to get a straightforward real FFT and IFFT working, even after consulting the Accelerate documentation (reference and source code), stackoverflow (here and here), and an existing implementation (thanks to Chris Kiefer and Mick Grierson). The previously mentioned examples weren’t very clear: they did not handle the case of overlapping FFTs (which I needed for an STFT), or they did not recover the power spectrum, or they just didn’t work for me (lots of blaring noise).

Get the code on my github:

 *  pkmFFT.h
 *  Real FFT wrapper for Apple's Accelerate Framework
 *  Created by Parag K. Mital - 
 *  Contact:
 *  Copyright 2011 Parag K. Mital. All rights reserved.
 *	Permission is hereby granted, free of charge, to any person
 *	obtaining a copy of this software and associated documentation
 *	files (the "Software"), to deal in the Software without
 *	restriction, including without limitation the rights to use,
 *	copy, modify, merge, publish, distribute, sublicense, and/or sell
 *	copies of the Software, and to permit persons to whom the
 *	Software is furnished to do so, subject to the following
 *	conditions:
 *	The above copyright notice and this permission notice shall be
 *	included in all copies or substantial portions of the Software.
 *  Additional resources: 
 *  This code is a very simple interface for Accelerate's fft/ifft code.
 *  It was built out of hacking Maximilian (Mick Grierson and Chris Kiefer) and
 *  the above mentioned resources for performing a windowed FFT which could
 *  be used underneath of an STFT implementation
 *  Usage:
 *  // be sure to either use malloc or __attribute__ ((aligned (16)))
 *  float *sample_data = (float *) malloc (sizeof(float) * 4096);
 *  float *allocated_magnitude_buffer =  (float *) malloc (sizeof(float) * 2048);
 *  float *allocated_phase_buffer =  (float *) malloc (sizeof(float) * 2048);
 *  pkmFFT *fft;
 *  fft = new pkmFFT(4096);
 *  fft->forward(0, sample_data, allocated_magnitude_buffer, allocated_phase_buffer);
 *  fft->inverse(0, sample_data, allocated_magnitude_buffer, allocated_phase_buffer);
 *  delete fft;

#include <Accelerate/Accelerate.h>

class pkmFFT
{
public:
	pkmFFT(int size = 4096, int window_size = 4096)
	{
		fftSize = size;					// sample size
		fftSizeOver2 = fftSize/2;
		log2n = log2f(fftSize);			// bins
		log2nOver2 = log2n/2;

		in_real = (float *) malloc(fftSize * sizeof(float));
		out_real = (float *) malloc(fftSize * sizeof(float));
		split_data.realp = (float *) malloc(fftSizeOver2 * sizeof(float));
		split_data.imagp = (float *) malloc(fftSizeOver2 * sizeof(float));

		windowSize = window_size;
		window = (float *) malloc(sizeof(float) * windowSize);
		memset(window, 0, sizeof(float) * windowSize);
		vDSP_hann_window(window, window_size, vDSP_HANN_DENORM);
		scale = 1.0f/(float)(4.0f*fftSize);

		// allocate the fft object once
		fftSetup = vDSP_create_fftsetup(log2n, FFT_RADIX2);
		if (fftSetup == NULL) {
			printf("\nFFT_Setup failed to allocate enough memory.\n");
		}
	}
	~pkmFFT()
	{
		free(in_real);
		free(out_real);
		free(split_data.realp);
		free(split_data.imagp);
		free(window);
		vDSP_destroy_fftsetup(fftSetup);
	}
	void forward(int start,
				 float *buffer,
				 float *magnitude,
				 float *phase)
	{
		// multiply by window
		vDSP_vmul(buffer, 1, window, 1, in_real, 1, fftSize);
		// convert to split complex format with evens in real and odds in imag
		vDSP_ctoz((COMPLEX *) in_real, 2, &split_data, 1, fftSizeOver2);
		// calc fft
		vDSP_fft_zrip(fftSetup, &split_data, 1, log2n, FFT_FORWARD);
		split_data.imagp[0] = 0.0;
		for (size_t i = 0; i < fftSizeOver2; i++) {
			// compute power
			float power = split_data.realp[i]*split_data.realp[i] +
						  split_data.imagp[i]*split_data.imagp[i];
			// compute magnitude and phase
			magnitude[i] = sqrtf(power);
			phase[i] = atan2f(split_data.imagp[i], split_data.realp[i]);
		}
	}
	void inverse(int start,
				 float *buffer,
				 float *magnitude,
				 float *phase,
				 bool dowindow = true)
	{
		// convert magnitude/phase back to split complex format
		float *real_p = split_data.realp, *imag_p = split_data.imagp;
		for (size_t i = 0; i < fftSizeOver2; i++) {
			*real_p++ = magnitude[i] * cosf(phase[i]);
			*imag_p++ = magnitude[i] * sinf(phase[i]);
		}
		vDSP_fft_zrip(fftSetup, &split_data, 1, log2n, FFT_INVERSE);
		vDSP_ztoc(&split_data, 1, (COMPLEX*) out_real, 2, fftSizeOver2);
		vDSP_vsmul(out_real, 1, &scale, out_real, 1, fftSize);
		// multiply by window w/ overlap-add
		if (dowindow) {
			float *p = buffer + start;
			for (size_t i = 0; i < fftSize; i++) {
				*p++ += out_real[i] * window[i];
			}
		}
	}
private:
	size_t				fftSize, fftSizeOver2, windowSize,
						log2n, log2nOver2;
	float				*in_real, *out_real, *window;
	float				scale;
    FFTSetup			fftSetup;
    COMPLEX_SPLIT		split_data;
};

Augmented Sonic Reality

I recently gave two talks, one for the PhDs based in the Electronic Music Studios, and another for the PhDs in Arts and Computational Technology. I received some very valuable feedback, and having to incorporate what I’ve been working on in a somewhat presentable manner also had a lot of benefit. The talk abstract (which is very abstract) is posted below with a few references listed. Please feel free to comment and open a discussion, or post any references that may be of interest.

An augmented sonic reality aims to register digital sound content with an existing physical space. Perceptual mappings between an agent in such an environment and the augmented content should be both continuous and effective, meaning the intentions of an agent should be taken into consideration in any affective augmentations. How can an embedded intelligence such as an iPhone equipped with detailed sensor information such as microphone, accelerometer, gyrometer, and GPS readings infer the behaviors of its user in creating affective, realistic, and perceivable augmented sonic realities tied to their situated experiences? Further, what can this augmented domain reveal about our own ongoing sensory experience of our sonic environment?

Keywords: augmented, reality, sonic, enactive, perception, memory, behavior, sensors, gesture, embodied, situated, acoustic, ecology, liminality

Augoyard and Torgue, “Sonic Experience”, McGill-Queen’s University Press, 2005.
Arfib, D. “Organised Sound”, 2002.
E. Corteel, “Synthesis of directional sources using Wave Field Synthesis, possibilities and limitations.” EURASIP Journal on Advances in Signal Processing, special issue on Spatial Sound and Virtual Acoustics, January, 2007
Lemaitre, G., Houix, O., Visell, Y., Franinovic, K., Misdariis, N., Susini, P. “Toward the Design and Evaluation of Continuous Sound in Tangible Interfaces: The Spinotron”, International Journal of Human Computer Studies, no 67, 2009
K. Nguyen, C. Suied, I. Viaud-Delmon, O. Warusfel, “Spatial audition in a static virtual environment : the role of auditory-visual interaction.” Journal of Virtual Reality and Broadcasting, 2009
M. Noisternig, B. Katz, S. Siltanen, L. Savioja, “Framework for Real-Time Auralization in Architectural Acoustics.” Acta acustica united with Acustica, vol. 94, no 6, November, 2008
R. Murray Schafer, “The Soundscape”, Destiny Books, 1977.
J. Tardieu, P. Susini, F. Poisson, P. Lazareff, S. McAdams, “Perceptual study of soundscapes in train stations.” Applied Acoustics, vol. 69, no 12, December, 2008
Strategies of mapping between gesture data and synthesis model parameters using perceptual spaces. D. Arfib, J. M. Couturier, L. Kessous, V. Verfaille. Organised Sound, International Journal of Music Technology, Volume 7, Issue 2 , pages 127-144, 2002.
D. Arfib, J-M. Couturier, L. Kessous , “Expressiveness and digital musical instrument design”, in Journal of New Music Research,Vol. 34, No. 1, pages 125 – 136, 2005.
N. d’Alessandro, O. Babacan, B. Bozkurt, T. Dubuisson, A. Holzapfel, L. Kessous, A. Moinet, M. V. Lieghe, “RAMCESS 2.X framework – expressive voice analysis for realtime and accurate synthesis of singing”, Journal On Multimodal User Interfaces, Springer Berlin/Heidelberg, Vol. 2, Nr. 2, September, pages 133-144, 2008.
L. Kessous, G. Castellano, G. Caridakis, Multimodal emotion recognition in speech-based interaction using facial expression, body gesture and acoustic analysis, Journal on Multimodal User Interfaces, Vol. 3, Issue 1, Springer Berlin/Heidelberg, December 12, pages 33-48, 2009.
G.Caridakis, K. Karpouzis, M. Wallace, L. Kessous, N.Amir, Multimodal user’s affective state analysis in naturalistic interaction, Journal on Multimodal User Interfaces, Vol. 3, Issue 1, Springer Berlin/Heidelberg, December 15, pages 49-66, 2009.
Anton Batliner, Stefan Steidl, Bjoern Schuller, Dino Seppi, Turid Vogt, Johannes Wagner, Laurence Devillers, Laurence Vidrascu, Noam Amir, Loic Kessous and Vered Aharonson. Whodunnit – Searching for the Most Important Feature Types Signalling Emotion-Related User States in Speech. Computer Speech and Language, volume 25, No. 1, pages 4–28, 2011.

Tim J Smith guest blogs for David Bordwell

Tim J Smith, expert in scene perception and film cognition and part of The DIEM project [1], recently starred as a guest blogger for David Bordwell, a leading film theorist with an impressive list of books and publications widely used in film cognition and film art research [2]. In his article featured on David’s site, Tim expands on his research on film cognition, including continuity editing [3], attentional synchrony [4], and the project we worked on in 2008-2010 as part of The DIEM Project. Since Tim’s feature on David Bordwell’s blog, The DIEM Project has seen a surge of publicity, with our Vimeo video views climbing past 200,000 in a single day and features on dvice, slashfilm, gizmodo, Roger Ebert’s Facebook/Twitter, and the front page of

Not to mention, our tools and visualizations are finally reaching an audience with interests in film, photography, and cognition. If you haven’t yet seen some of our videos, please head over to our Vimeo page, where you can see a range of videos embedded with eye-tracking data from participants and many different visualizations of models of eye-movements using machine learning, or start by reading Tim’s post on You can also visit our website and create your own visualizations with our completely open-source tool, CARPE, and our completely royalty-free database, the DIEM database. I’ve linked a few of my favorite videos below, all made with CARPE, with the last one showing a unique visualization of a movie as its motion:

Montage of 4 Visualizations of Eye-movements during Charlie Bit My Finger from TheDIEMProject on Vimeo.

tv the simpsons 860×528 web from TheDIEMProject on Vimeo.

Eye movements during the Video Republic from TheDIEMProject on Vimeo.

Eye Movements during a Movie Trailer of Ice Age 3 from TheDIEMProject on Vimeo.

Eye-movements during 50 People, 1 Question (Brooklyn) from TheDIEMProject on Vimeo.

[1]. The DIEM Project was funded by the Leverhulme Trust (Grant Ref F/00-158/BZ) and the ESRC (RES 062-23-1092) and awarded to John M Henderson.
[4]. Mital, P.K., Smith, T. J., Hill, R. and Henderson, J. M., “Clustering of gaze during dynamic scene viewing is predicted by motion,” Cognitive Computation

Responsive Ecologies Documentation

As part of a system of numerous dynamic connections and networks, we are reactive agents within a deterministic complex system of cause and effect. The consequence of our actions upon ourselves, the society we live in and the broader natural world is conditioned by how we perceive our involvement. The awareness of how we have impacted a situation is often realised and processed subconsciously; the extent and scope of these actions can be far beyond our knowledge, our consideration, and, importantly, beyond our sensory reception. With this in mind, how can we associate our actions, many of which may be overlooked as customary, with, for instance, honey bee depopulation syndrome or the declining numbers of Siberian Tigers?

Responsive Ecologies is part of an ongoing collaboration with ZSL London Zoo and Musion Academy. Collectively we have been exploring innovative means of public engagement, to generate an awareness and understanding of nature and the effects of climate change. All of the contained footage has come from filming sessions within the Zoological Society; this coincidentally has raised some interesting questions on the spectacle of captivity, an issue which we have tried to reflect upon in the construction and presentation of this installation. The nature of interaction within Responsive Ecologies means that a visitor to the space cannot simply view the installation but must become a part of its environment. When attempting to perceive the content within the space, the visitor reshapes the installation. Everybody has a degree of impact, whether directed or incidental, and when interacting as a group it is interesting to see how collective behaviour can develop and influence the outcome of the work.

captincaptin and Parag K Mital exhibited Responsive Ecologies at the Watermans between 6th December 2010 and 21st January 2011. The installation took the form of a 360-degree multi-screen projection, or CAVE (Cave Automatic Virtual Environment). Visitors to the exhibition would enter the CAVE through a passageway leading from the gallery entrance. All four sides of the CAVE were back-projected, with each side connecting to form a large continuous projection. The presence of people within the space would be tracked and used to deconstruct and interlace the video in response to their movement. The video documentation below was taken from the installation (throughout this video the camera is panning around the space in order to record all sides of the CAVE).

Responsive Ecologies from pkmital on Vimeo.

Source code is also available with an example part of the actual installation (10 second clip as the whole video is far too long) – though you may also need 4 monitors/projectors or scale down the size of the screen in the code (SCREEN_WIDTH and SCREEN_HEIGHT variables in testApp.h):

More information:

Streaming Motion Capture Data from the Kinect using OSC on Mac OSX

This guide will help you get PrimeSense NITE’s Skeleton Tracking running inside XCode on Mac OS X. It will also help you stream that data in case you’d like to use it in another environment such as Max. An example Max patch is also available.

PrimeSense NITE Skeletonization and Motion Capture to Max/MSP via OSC from pkmital on Vimeo.


0.) 1 Microsoft Kinect or other PrimeSense device.

1.) Install XCode and Java Developer Package located here: – if you require a Mac OSX Developer account, just register at since it is free.

2.) Install Macports:

3.) Install libtool and libusb > 1.0.8:

$ sudo port install libtool +universal
$ sudo port install libusb-devel +universal

4.) Get the OpenNI Binaries for Mac OSX:

5.) Install OpenNI by unzipping the file OpenNI-Bin-MacOSX (-v1.0.0.25 at the time of writing) and running,

$ sudo ./

6.) Get SensorKinect from avin2:

7.) Install SensorKinect by unzipping and running

$ sudo ./

8.) Install OpenNI Compliant Middleware NITE from Primesense for Mac OSX:

9.) Install NITE by unzipping and running

$ sudo ./

When prompted for a key, enter the key listed on the openni website.

Getting it up and running:

1.) Download the OSC example from here: – you can still download the project without having git, just look for the “Downloads” link to the right of the screen.

2.) After downloading (and extracting if you downloaded the zip file), navigate to ./StickFigure and open the XCode Project file.

3.) Compile and run, and hopefully there are no problems…

4.) Try and visualize the data in Max/MSP using the Max patch bundled inside the git repository mentioned in step 1:

With this, you can also do multiple-person skeletonization and motion tracking, though if you are using the OSC information, you might want to include an OSC tag for which person’s joints are being sent.
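
One way to tag each joint with its user is to build the user id into the OSC address itself. A minimal sketch, assuming a hypothetical address scheme (the actual StickFigure example may format its messages differently):

```cpp
#include <cstdio>
#include <string>

// Hypothetical address scheme: prefix each joint message with the user id
// so a Max patch can route /skeleton/1/... and /skeleton/2/... separately.
std::string jointAddress(int userId, const std::string& jointName) {
    char buf[64];
    std::snprintf(buf, sizeof(buf), "/skeleton/%d/%s", userId, jointName.c_str());
    return std::string(buf);
}
```

A receiving Max patch can then use an OSC-route object on the `/skeleton/1` and `/skeleton/2` prefixes to separate each tracked person’s joints.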

Copyright © 2010 Parag K Mital. All rights reserved. Made with Wordpress. RSS