Latest Entries

Memory Mosaic iOS – Generative Audio Mashup App

I had a chance to incorporate some updates into Memory Mosaic, an iOS app I started developing during my PhD in Audiovisual Scene Synthesis. The app organizes sounds in real time and clusters them based on similarity. Using the microphone on the device, or an iTunes song, any onset triggers a new audio segment to be created and stored in a database. The example video below shows how this works with Do Make Say Think’s Minim and The Landlord is Dead:

Memory Mosaic iOS from Parag K Mital on Vimeo.

Here’s an example of using it with live instruments:

Memory Mosaic – Technical Demo (iOS App) from Parag K Mital on Vimeo.
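To make the idea a bit more concrete, here is a rough sketch in C++ of the general mechanism described above. This is not the app’s actual code: the energy-based onset detector, the two-number feature, and the thresholds are all simplified stand-ins. Each detected segment is described by a tiny feature vector, matched against everything heard so far, and then added to the growing database:

    #include <algorithm>
    #include <cmath>
    #include <cstddef>
    #include <limits>
    #include <vector>

    // A stored audio segment: its samples plus a tiny feature vector.
    struct Segment {
        std::vector<float> samples;
        std::vector<float> features;   // e.g. {rms, zero-crossing rate}
    };

    // Simple 2-dimensional description of a buffer of samples.
    static std::vector<float> describe(const std::vector<float> &s) {
        float energy = 0.0f, crossings = 0.0f;
        for (size_t i = 0; i < s.size(); ++i) {
            energy += s[i] * s[i];
            if (i > 0 && (s[i] >= 0.0f) != (s[i - 1] >= 0.0f))
                crossings += 1.0f;
        }
        const float n = s.empty() ? 1.0f : static_cast<float>(s.size());
        return { std::sqrt(energy / n), crossings / n };   // {rms, zero-crossing rate}
    }

    class MemoryMosaicSketch {
    public:
        // Feed one hop of audio; returns the closest stored segment once a
        // new segment has been closed, or nullptr while still accumulating.
        const Segment *process(const std::vector<float> &hop) {
            const float rms = describe(hop)[0];
            if (rms > onsetThreshold) {             // inside a segment: keep accumulating
                current.insert(current.end(), hop.begin(), hop.end());
            } else if (!current.empty()) {          // energy dropped: close the segment
                Segment seg{ current, describe(current) };
                current.clear();
                const int match = closestIndex(seg.features);
                database.push_back(seg);            // grow the "sound memory"
                if (match >= 0)
                    return &database[match];        // play this instead of the input
            }
            return nullptr;
        }

    private:
        // Nearest neighbor in feature space (squared Euclidean distance).
        int closestIndex(const std::vector<float> &query) const {
            int best = -1;
            float bestDist = std::numeric_limits<float>::max();
            for (size_t i = 0; i < database.size(); ++i) {
                float d = 0.0f;
                for (size_t j = 0; j < query.size(); ++j) {
                    const float diff = query[j] - database[i].features[j];
                    d += diff * diff;
                }
                if (d < bestDist) { bestDist = d; best = static_cast<int>(i); }
            }
            return best;
        }

        float onsetThreshold = 0.02f;     // arbitrary energy threshold
        std::vector<float> current;       // samples of the segment being built
        std::vector<Segment> database;    // all segments heard so far
    };

The actual listening model and features in Memory Mosaic are more involved, but the segment–describe–match loop above is the basic shape of the idea.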

The app also works with AudioBus, meaning you can use it with other apps too, adding effects chains, or sampling from another app’s output.  Available on the iOS App Store:… Continue reading...

C.A.R.P.E. version 0.1.1 release


I’ve updated C.A.R.P.E., a graphical tool for visualizing eye-movements and processing audio/video, to include a graphical timeline (thanks to ofxTimeline by James George/YCAM), support for audio playback/scrubbing (using pkmAudioWaveform), audio saving, and various bug fixes. This release has changed some parameters of the XML file and added others. Please refer to this example XML file for how to set up your own data:

<?xml version="1.0" encoding="UTF-8" ?>
        <!-- Some debugging output will be put in the file here if this is set. -->
        <!-- Automatically start processing the first stimuli [true] -->
        <!-- Automatically exit CARPE once all stimulis have finished processing [true] -->
        <!-- Should skip frames to produce a real-time sense (not good to outputting files) or not (better for outputting files) [false] -->
        <!-- Automatically determine clusters of eye-movements [false] -->
        <!-- Create Alpha-map based on eye-movement density map [false] -->
        <!-- Show contour lines around eye-movement density map [false] -->
        <!-- Show circles for every fixation [true] -->
        <!-- Show lines between fixations [true] -->
        <!-- Show a JET colormapped heatmap of the eye-movements [true] -->
        <!-- options are 0 = "jet" (rgb map 
Continue reading...

Toolkit for Visualizing Eye-Movements and Processing Audio/Video

Original video still (without the eye-movement and heatmap overlay) copyright Dropping Knowledge Video Republic.

From 2008 – 2010, I worked on the Dynamic Images and Eye-Movements (D.I.E.M.) project, led by John Henderson, with Tim Smith and Robin Hill. We worked together to collect nearly 200 participants’ eye-movements on nearly 100 short films ranging from 30 seconds to 5 minutes in length. The database is freely available and covers a wide range of film styles, from advertisements to movie and music trailers to news clips. During my time on the project, I developed an open-source toolkit to complement D.I.E.M., called C.A.R.P.E., or Computational Algorithmic Representation and Processing of Eye-movements (Tim’s idea!), for visualizing and processing the data we collected, and used it for writing up a journal paper describing a strong correlation between tightly clustered eye-movements and the motion in a scene. We also posted visualizations of our entire corpus on our Vimeo channel. The project came to a halt, and so did the visualization software. I’ve since picked up the ball and rewritten it entirely from the ground up.

The image below shows how you can represent the movie, the motion in the scene of the movie (represented in … Continue reading...

Handwriting Recognition with LSTMs and ofxCaffe

Long Short-Term Memory (LSTM) is a Recurrent Neural Network (RNN) architecture designed to model temporal sequences (e.g. audio, sentences, video) and their long-range dependencies better than conventional RNNs [1]. There is a lot of excitement in the machine learning community about LSTMs (and DeepMind’s counterpart, “Neural Turing Machines” [2], or Facebook’s “Memory Networks” [3]), as they overcome a fundamental limitation of conventional RNNs and achieve state-of-the-art benchmark performance on a number of tasks [4,5] (a standard formulation of the LSTM cell is given after the list below):

  • Text-to-speech synthesis (Fan et al., Microsoft, Interspeech 2014)
  • Language identification (Gonzalez-Dominguez et al., Google, Interspeech 2014)
  • Large vocabulary speech recognition (Sak et al., Google, Interspeech 2014)
  • Prosody contour prediction (Fernandez et al., IBM, Interspeech 2014)
  • Medium vocabulary speech recognition (Geiger et al., Interspeech 2014)
  • English to French translation (Sutskever et al., Google, NIPS 2014)
  • Audio onset detection (Marchi et al., ICASSP 2014)
  • Social signal classification (Brueckner & Schulter, ICASSP 2014)
  • Arabic handwriting recognition (Bluche et al., DAS 2014)
  • TIMIT phoneme recognition (Graves et al., ICASSP 2013)
  • Optical character recognition (Breuel et al., ICDAR 2013)
  • Image caption generation (Vinyals et al., Google, 2014)
  • Video to textual description (Donahue et al., 2014)
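For reference, here is the standard modern formulation of the LSTM cell (textbook material, not something specific to this post). The gates decide what to write into, erase from, and read out of an additive cell state:

    \begin{aligned}
    i_t &= \sigma(W_{xi} x_t + W_{hi} h_{t-1} + b_i) && \text{(input gate)}\\
    f_t &= \sigma(W_{xf} x_t + W_{hf} h_{t-1} + b_f) && \text{(forget gate)}\\
    o_t &= \sigma(W_{xo} x_t + W_{ho} h_{t-1} + b_o) && \text{(output gate)}\\
    \tilde{c}_t &= \tanh(W_{xc} x_t + W_{hc} h_{t-1} + b_c) && \text{(candidate cell state)}\\
    c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{(cell state)}\\
    h_t &= o_t \odot \tanh(c_t) && \text{(hidden output)}
    \end{aligned}

Because the cell state c_t is carried forward through element-wise gating rather than repeated matrix multiplication and squashing, error gradients do not vanish as quickly over long time lags, which is the limitation of conventional RNNs referred to above.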

The current dynamic state … Continue reading...

Real-Time Object Recognition with ofxCaffe

[Screenshot: webcam frame overlaid with the 8×8 grid detection heatmap]

I’ve spent a little time with Caffe over the holiday break to try to understand how it might work in the context of real-time visualization/object recognition in more natural scenes/videos. Right now, I’ve implemented the following Deep Convolutional Networks using the 1280×720 webcam on my 2014 MacBook Pro:

The above image depicts the output of an 8×8 grid detection, where brighter regions indicate higher probabilities of the class “snorkel” (automatically selected by the network as the most probable of 1000 possible classes).
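To illustrate just that visualization step (this is not ofxCaffe’s code; it assumes the per-cell class probabilities have already been pulled out of the network, and the struct and names below are hypothetical), selecting the most probable class and turning its per-cell probabilities into a brightness map might look like this:

    #include <cstddef>
    #include <vector>

    // probs[cell][class]: softmax output for each of the 8x8 grid cells,
    // e.g. 64 cells x 1000 ImageNet classes (assumed already computed).
    struct GridDetection {
        int gridWidth = 8, gridHeight = 8, numClasses = 1000;
        std::vector<std::vector<float>> probs;

        // Pick the class with the highest probability in any cell.
        int bestClass() const {
            int best = 0;
            float bestProb = -1.0f;
            for (const auto &cell : probs)
                for (int c = 0; c < numClasses; ++c)
                    if (cell[c] > bestProb) { bestProb = cell[c]; best = c; }
            return best;
        }

        // Brightness map (0..1) of one class's probability over the grid,
        // ready to be upsampled and blended over the camera image.
        std::vector<float> heatmap(int classId) const {
            std::vector<float> map(gridWidth * gridHeight, 0.0f);
            for (size_t i = 0; i < map.size() && i < probs.size(); ++i)
                map[i] = probs[i][classId];
            return map;
        }
    };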

So far I have spent some time understanding how Caffe keeps each layer’s data during a forward/backward pass, and how the deeper layers could be “visualized” in a … Continue reading...

Extracting Automatically Labeled Volumetric ROIs from MRI

Performing a region-of-interest analysis on MRI requires knowing where the regions are in your subject data. Typically, this has been done using hand-drawn masks in a 3D viewer. However, recent research has made the process mostly automatic, and the open-source community has implemented everything you will need to automatically create labeled volumetric regions of interest [1-3]. With FreeSurfer 5.3, we have the option of performing cortical parcellation using 4 different atlases:

Destrieux atlas: aparc.a2009s
Desikan-Killiany atlas: aparc
Mindboggle: aparc.DKTatlas40
Brodman areas: BA and BA.thresh

We’ll first use FreeSurfer’s recon-all tool to perform a cortical reconstruction of our anatomical scans. Download FreeSurfer and register your copy. You’ll be sent an e-mail with a license. Follow the instructions and create the license file “.license” inside your FreeSurfer home directory (check the environment variable FREESURFER_HOME, e.g., “$ echo $FREESURFER_HOME”). Then run the script “$FREESURFER_HOME/” to set up the necessary paths.

Next, make sure you have set the environment variable SUBJECTS_DIR to where you’d like your analysis to go (e.g., “$ export SUBJECTS_DIR=/some/directory”). For our example, we’ll keep this to a directory called “freesurfer” in our home directory, “~/”. Each subject we analyze will have its own folder inside … Continue reading...

YouTube’s “Copyright School” Smash Up

Ever wonder what happens when you’ve been accused of violating copyright multiple times on YouTube? First, you get redirected to YouTube’s “Copyright School” whenever you visit YouTube, forcing you to watch a Happy Tree Friends cartoon in which the main character is dressed as an actual pirate:

Second, I’m guessing, your account will be banned. Third, you cry and wonder why you ever violated copyright in the first place.

In my case, I’ve disputed every one of the 4 copyright violation notices that I’ve received on grounds of Fair Use and Fair Dealing. Here’s what happens when you file a dispute using YouTube’s online form (click for high-res):

3 of the 4 were dropped after I filed disputes, though I’m still waiting on the response to the above dispute. Read the dispute letter to Sony ATV and UMPG Publishers in full here.

The picture above shows a few stills of what my Smash Ups look like. The process, described in greater detail elsewhere, is part of my ongoing research into how existing content can be transformed into artistic styles reminiscent of analytic cubist, figurative, and futurist paintings. The process to create the videos … Continue reading...

An open letter to Sony ATV and UMPG

Dear Sony ATV Publishing, UMPG Publishing, and other concerned parties,

I ask you to please withdraw your copyright violation notice on my video, “PSY – GANGNAM STYLE (강남스타일) M/V (YouTube SmashUp)”, as I believe my use of any copyrighted material is protected under Fair Use or Fair Dealing. This video was created by an automated process as part of an art project developed during my PhD at Goldsmiths, University of London.

The process which creates the audio and video is entirely automated, meaning the accused video is created by an algorithm. This algorithm begins by creating a large database of tiny fragments of audio and video (less than 1 second of audio per fragment) using 9 videos from YouTube’s top 10 list. The tiny fragments of video and audio are stored in this database as unrelated pieces of information, each described only by a short series of 10-15 numbers. These numbers represent low-level features describing the texture and shape of the fragment of audio or video. These tiny fragments are then matched to the tiny fragments of audio and video detected within the target for resynthesis, in this case the number one YouTube video … Continue reading...

Copyright Violation Notice from “Rightster”

I’ve been working on an art project which takes the top 10 videos on YouTube and tries to resynthesize the #1 video using the remaining 9 videos. The computational model is based on low-level human perception and uses only very abstract features such as edges, textures, and loudness. I’ve created a new synthesis each week using the top 10 of that week, in the hopes that, one day, I will be able to resynthesize my own video in the top 10. It is essentially a viral algorithm, though whether it will succeed remains to be proven.

The database of content used in the recreation of the above video comes from the following videos:
#2 News Anchor FAIL Compilation 2012 || PC
#3 Flo Rida – Whistle [Official Video]
#4 Carly Rae Jepsen – Call Me Maybe
#5 Jennifer Lopez – Goin’ In ft. Flo Rida
#6 Taylor Swift – We Are Never Ever Getting Back Together
#7 will.i.am – This Is Love ft. Eva Simons
#8 Call Me Maybe – Carly Rae Jepsen (Chatroulette Version)
#9 Justin Bieber – As Long As You Love Me ft. Big Sean
#10 Rihanna – Where Have You Been

It … Continue reading...

3D Musical Browser

I’ve been interested in exploring ways of navigating media archives. Typically, you may use iTunes and go from artist to artist, or have managed to tediously classify your collection into genres. Some may still even browse their music through a file browser, perhaps making sure the folders and filenames of their collection are descriptive of the artist, album, year, etc… But what about how the content actually sounds?

Wouldn’t it be nice to hear all music which shares similar sounds, or similar phrases of sounds? Research over the last 10-15 years has developed methods to solve precisely this problem; they fall under the umbrella term content-based information retrieval (CBIR): uncovering the relationships within an archive through the information in the content itself. For images, Google’s Search by Image is a great example which only recently became public. For audio, audioDB and Shazam are good examples of discovering music through the way it sounds, or the content-based relationships of the audio itself. However, each of these interfaces presents a list of matches to an image or audio query, making it difficult to explore the content-based relationships of a specific set of material.

The video above demonstrates interaction with a novel 3D browser … Continue reading...

Intention in Copyright

The following article is written for the LUCID Studio for Speculative Art based in India.


My work in audiovisual resynthesis aims to create models of how humans represent and attend to audiovisual scenes. Using pattern recognition of both audio and visual material, these models build large corpora of learned audiovisual material which can be matched to ongoing streams of incoming audio or video. The way audio and visual material is stored and segmented within the model is based heavily on neurobiology and behavioral evidence (the details are saved for another post). I have called the underlying model Audiovisual Content-based Information Description/Distortion (or ACID for short).

As an example, a live stream of audio may be matched to a database of learned sounds from recordings of nature, creating a re-synthesis of the audio environment at present using only pre-recorded material from nature itself. These learned sounds may be fragments of a bird chirping, or the sound of footsteps. Incoming sounds of someone talking may then be synthesized using the closest sounding material to that person talking, perhaps a bird chirp or a footstep. Instead of a live stream, one can also re-synthesize a pre-recorded stream. Consider using a database … Continue reading...

Course @ CEMA Srishti School of Design, Bangalore, IN

From November 21st to the 2nd of December, I’ll have the pleasure of leading a course and workshop with Prayas Abhinav at the Center for Experimental Media Arts in the Srishti School of Design in Bangalore, IN. Many thanks to Meena Vari for all her help in organizing the project.

Stories are flowing trees

Key words:  3D, interactive projects, data, histories, urban, creative coding, technology, sculpture, projection mapping

Project Brief:

Urban realities are more like fictions, constructed through folklore, media and policy. Compressing these constructions across time would offer some possibilities for the emergence of complexity and new discourse. Using video projections adapted for 3D surfaces, urban histories will become data and information – supple, malleable, and material.

The project will begin with a one-week workshop by Parag Mital on “Creative Coding” using the openFrameworks platform for C/C++ coding.

About the Artists:

Prayas Abhinav

Presently he teaches at the Srishti School of Art, Design and Technology and is a researcher at the Center for Experimental Media Arts (CEMA). He has taught in the past at Dutch Art Institute (DAI) and Center for Environmental Planning and Technology (CEPT).
He has been supported by fellowships from Openspace India (2009), TED (2009), … Continue reading...

Memory Mosaicing

A product of my PhD research is now available on the iPhone App Store (for a small cost!): View in App Store.

This application is motivated by my interest in experiencing an Augmented Perception and is of course very much inspired by some of the work here at Goldsmiths. Applying existing approaches in soundspotting/mosaicing to a real-time stream situated in the real world allows one to play with their own sonic memories, and certainly requires an open ear for new experiences. Succinctly, the app records segments of sounds in real time using its own listening model as you walk around in different environments (or sit at your desk). These segments are constantly built up the longer the app is left running to form a database (a working memory model) with which to understand new sounds. Incoming sounds are then matched to this database and the closest matching sound is played instead. What you get is a polyphony of sound memories triggered by the incoming feed of audio, and an app which sounds more like your environment the longer it is left to run. A sort of gimmicky feature of this app is the ability to learn a song from your … Continue reading...

Concatenative Video Synthesis (or Video Mosaicing)


Working closely with my adviser Mick Grierson, I have developed a way to resynthesize existing videos using material from another set of videos. The process starts by learning a database of objects that appear in the set of videos to synthesize from. The target video to resynthesize is then broken into objects in a similar manner, and these are matched to objects in the database. What you get is a resynthesis of the video that appears as beautiful disorder. Here are two examples: the first uses Family Guy to resynthesize The Simpsons, and the second uses Jan Svankmajer’s Food to resynthesize his Dimensions of Dialogue.

Continue reading...

Google Earth + Atlantis Space Shuttle

I managed to catch the live feed of the Atlantis Space Shuttle launch yesterday. What I found really interesting, though, was the real-time virtual reality of the space shuttle launch from inside Google Earth. A screen capture, with the obligatory 12x speed-up to retain attention spans, is below:

Continue reading...

Lunch Bites @ CULTURE Lab, Newcastle University

I was recently invited to the CULTURE Lab at Newcastle University by its director, Atau Tanaka. I would say it has the resources and creative power of 5 departments all housed in one spacious building. Over the course of 2 short days in the dozen or so studios spread over 3 floors, I found people building multitouch tables, controlling synthesizers with the touch of fabric, and researching augmented spatial sonic realities. There is a full suite of workshop tools, including a laser cutter, multiple multi-channel sound studios, a full stage/theater with stage lighting and multiple projectors, a radio lab, and tons of light and interesting places to sit and do whatever you feel like doing. The other thing I found really interesting is that there are no “offices”. Instead, the staff are dispersed amongst the students in the twelve-some studios, picking a new desk perhaps whenever they need a change of scenery. If you are ever in the area, it is certainly worth a visit, and I’m sure the people there will be very open to telling you what they are up to.

I also had the pleasure of giving a talk on my PhD research in Resynthesizing Audiovisual Perception with Augmented Reality at the Lunch … Continue reading...

Creative Community Spaces in INDIA

Jaaga – Creative Common Ground

CEMA – Center for Experimental Media Arts at Srishti School of Art, Design and Technology

Bar1 – non-profit exchange programme by artists for artists to foster the local, Indian and international mutual exchange of ideas and experiences through guest residencies in Bangalore

Sarai – a space for research, practice, and conversation about contemporary media and urban constellations.
New Delhi

Khoj/International Artists’ Association – an artist-led, alternative forum for experimentation and international exchange
New Delhi

Periferry – To create a nomadic space for hybrid art practices. It is a laboratory for cross-disciplinary practices. The project focuses on the creation of a network space for negotiating the challenges of contemporary cultural production. It is located on a ferry barge on the river Brahmaputra and is docked in Guwahati, Assam.
Narikolbari, Guwahati

Point of View – non-profit organization that brings the points of view of women into community, social, cultural and public domains through media, art and culture.

Majilis – a center for rights discourse and inter-disciplinary arts initiatives

Camp – not an “artists collective” but a space in which ideas and energies … Continue reading...

Facial Appearance Modeling/Tracking

I’ve been working on developing a method for automatic head-pose tracking, and along the way have come to model facial appearances. I start by initializing a facial bounding box using the Viola-Jones detector, a well-known and robust object detector. This allows me to center the face. Once I know where the 2D plane of the face is in an image, I can register an Active Shape Model like so:

After capturing multiple views of the possible appearance variations of my face, including slight rotations, I construct an appearance model.
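As a rough sketch of what building such a model can look like (this substitutes a plain PCA over registered, cropped face images, i.e. an eigenface-style model, for the actual appearance model used here; the function names and inputs are hypothetical), OpenCV’s cv::PCA gives the mean face, the basis vectors, and a way to reconstruct a face from only its first few components:

    #include <opencv2/core.hpp>
    #include <vector>

    // Build an eigenface-style appearance model from registered, equally
    // sized, single-channel face crops.
    cv::PCA buildAppearanceModel(const std::vector<cv::Mat> &faces, int numComponents) {
        // Flatten each face image into one row of the data matrix.
        cv::Mat data(static_cast<int>(faces.size()),
                     static_cast<int>(faces[0].total()), CV_32F);
        for (size_t i = 0; i < faces.size(); ++i) {
            cv::Mat row;
            faces[i].convertTo(row, CV_32F);
            row.reshape(1, 1).copyTo(data.row(static_cast<int>(i)));
        }
        // PCA over rows: pca.mean is the mean face, pca.eigenvectors are the
        // "basis vectors" of appearance variation.
        return cv::PCA(data, cv::Mat(), cv::PCA::DATA_AS_ROW, numComponents);
    }

    // Reconstruct a face from only its first k components, to see what kind
    // of appearance variation (e.g. slight rotation) those components encode.
    cv::Mat reconstructFromFirstComponents(const cv::PCA &pca, const cv::Mat &face, int k) {
        cv::Mat flat;
        face.convertTo(flat, CV_32F);
        cv::Mat coeffs = pca.project(flat.reshape(1, 1));
        for (int c = k; c < coeffs.cols; ++c)       // zero everything past the first k
            coeffs.at<float>(0, c) = 0.0f;
        cv::Mat recon = pca.backProject(coeffs);
        return recon.reshape(1, face.rows);         // back to the original image shape
    }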

The idea I am working with is to use the first components of variation of this appearance model to determine pose. Here I show the first two basis vectors and the images they reconstruct:

As you may notice, these two basis vectors very neatly encode rotation. By looking at the eigenvalues of the model, you can also interpret pose.… Continue reading...

Short Time Fourier Transform using the Accelerate framework

Using the libraries pkmFFT and pkm::Mat, you can very easily perform a highly optimized short-time Fourier transform (STFT) with direct access to a floating-point based object.
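For anyone new to the idea, an STFT just slides a windowed frame along the signal by a hop size and takes an FFT of each frame. The sketch below shows that loop using Accelerate’s Hann window; the fft.forward call is a hypothetical stand-in for a real-FFT wrapper such as pkmFFT, and this is not pkmSTFT’s actual API:

    #include <Accelerate/Accelerate.h>
    #include <vector>

    // Sketch of an STFT loop: window each frame, FFT it, keep the magnitudes.
    // RealFFT is hypothetical; substitute your own wrapper (e.g. pkmFFT).
    template <typename RealFFT>
    std::vector<std::vector<float>> stft(const float *signal, int numSamples,
                                         int frameSize, int hopSize, RealFFT &fft) {
        // Hann window computed once with Accelerate.
        std::vector<float> window(frameSize);
        vDSP_hann_window(window.data(), frameSize, vDSP_HANN_NORM);

        std::vector<std::vector<float>> magnitudes;
        std::vector<float> frame(frameSize), mag(frameSize / 2);
        for (int start = 0; start + frameSize <= numSamples; start += hopSize) {
            // Apply the window to the current frame.
            vDSP_vmul(signal + start, 1, window.data(), 1, frame.data(), 1, frameSize);
            // Forward real FFT of the windowed frame (hypothetical call).
            fft.forward(frame.data(), mag.data());
            magnitudes.push_back(mag);
        }
        return magnitudes;
    }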

Get the code on my github:
Depends also on:

 *  pkmSTFT.h
 *  STFT implementation making use of Apple's Accelerate Framework (pkmFFT)
 *  Created by Parag K. Mital - 
 *  Contact:
 *  Copyright 2011 Parag K. Mital. All rights reserved.
 *	Permission is hereby granted, free of charge, to any person
 *	obtaining a copy of this software and associated documentation
 *	files (the "Software"), to deal in the Software without
 *	restriction, including without limitation the rights to use,
 *	copy, modify, merge, publish, distribute, sublicense, and/or sell
 *	copies of the Software, and to permit persons to whom the
 *	Software is furnished to do so, subject to the following
 *	conditions:
 *	The above copyright notice and this permission notice shall be
 *	included in all copies or substantial portions of the Software.
Continue reading...

Real FFT/IFFT with the Accelerate Framework

Apple’s Accelerate Framework can really speed up your code without much extra thought, and it will also run on an iPhone. Even so, I banged my head a few times trying to get a straightforward real FFT and IFFT working, even after consulting the Accelerate documentation (reference and source code), Stack Overflow (here and here), and an existing implementation (thanks to Chris Kiefer and Mick Grierson). The previously mentioned examples weren’t very clear: they did not handle the case of overlapping FFTs, which I needed for an STFT, or they did not recover the power spectrum, or they just didn’t work for me (lots of blaring noise).
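In case it saves someone else the same headache, here is a minimal, self-contained sketch of a forward real FFT and magnitude spectrum using vDSP directly. This mirrors the general approach rather than reproducing pkmFFT’s code; note that vDSP packs the Nyquist bin into imagp[0] and returns values scaled by 2:

    #include <Accelerate/Accelerate.h>
    #include <cmath>
    #include <vector>

    // Real forward FFT of n samples (n a power of two) with Accelerate's vDSP,
    // returning the n/2 magnitude values of the positive-frequency bins.
    std::vector<float> realFFTMagnitudes(const std::vector<float> &input) {
        const int n = static_cast<int>(input.size());
        const vDSP_Length log2n =
            static_cast<vDSP_Length>(std::log2(static_cast<double>(n)));

        FFTSetup setup = vDSP_create_fftsetup(log2n, kFFTRadix2);

        // vDSP wants the real signal packed into split-complex form:
        // even samples -> realp, odd samples -> imagp.
        std::vector<float> realp(n / 2), imagp(n / 2);
        DSPSplitComplex split = { realp.data(), imagp.data() };
        vDSP_ctoz(reinterpret_cast<const DSPComplex *>(input.data()), 2, &split, 1, n / 2);

        // In-place forward real FFT. Output is scaled by 2 relative to the
        // mathematical DFT, and imagp[0] holds the Nyquist bin, not a phase.
        vDSP_fft_zrip(setup, &split, 1, log2n, FFT_FORWARD);
        float scale = 0.5f;
        vDSP_vsmul(realp.data(), 1, &scale, realp.data(), 1, n / 2);
        vDSP_vsmul(imagp.data(), 1, &scale, imagp.data(), 1, n / 2);

        // Magnitude of each complex bin. (For the inverse direction, call
        // vDSP_fft_zrip with FFT_INVERSE, unpack with vDSP_ztoc, and divide by n.)
        std::vector<float> magnitudes(n / 2);
        vDSP_zvabs(&split, 1, magnitudes.data(), 1, n / 2);

        vDSP_destroy_fftsetup(setup);
        return magnitudes;
    }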

Get the code on my github:

 *  pkmFFT.h
 *  Real FFT wraper for Apple's Accelerate Framework
 *  Created by Parag K. Mital - 
 *  Contact:
 *  Copyright 2011 Parag K. Mital. All rights reserved.
 *	Permission is hereby granted, free of charge, to any person
 *	obtaining a copy of this software and associated documentation
 *	files (the "Software"), to deal in the Software without
 *	restriction, including without limitation the rights to use,
Continue reading...

Copyright © 2010 Parag K Mital. All rights reserved. Made with WordPress.