Archived entries for perception

YouTube’s “Copyright School”

Ever wonder what happens when you’ve been accused of violating copyright multiple times on YouTube? First, you get a redirect to YouTube’s “Copyright School” whenever you visit YouTube, forcing you to watch a cartoon of Happy Tree Friends where the main character is dressed as an actual pirate:

Second, I’m guessing, your account will be banned. Third, you cry and wonder why you ever violated copyright in the first place.

In my case, I’ve disputed every one of the 4 copyright violation notices that I’ve received on grounds of Fair Use and Fair Dealing. Here’s what happens when you file a dispute using YouTube’s online form:






3 of the 4 have been dropped after I filed disputes, though I’m still waiting to hear the response to the above dispute. Read the dispute letter to Sony ATV and UMPG Publishers in full here.

The picture above shows a few stills from what my Smash Ups look like. The process, described in greater detail on createdigitalmotion.com, is part of my ongoing research into how existing content can be transformed into artistic styles reminiscent of analytic cubist, figurative, and futurist paintings. To create the videos, it uses content-based information retrieval techniques that I would assume are very similar to (though likely not as advanced as) the techniques used by YouTube’s Content ID System to flag the video as a duplicate copy in the first place. Until Sony and UMPG respond, the infringing video is still available on YouTube:

Regardless of my disputes, I’m now redirected to YouTube’s Copyright School whenever I visit YouTube (until I successfully complete the test):

A bit about Happy Tree Friends: it is, according to Wikipedia, “extremely violent, with almost every episode featuring blood, pain, and gruesome deaths…depicting bloodshed and dismemberment in a vivid manner.” Never mind. I’m a copyright violator, I can handle a little dismemberment. In fact, that is exactly what I’ve done to the “Copyright School” video: dismembered it with the content of 70 videos of Happy Tree Friends, using the same process which brought me to YouTube’s “Copyright School” in the first place:

I hope Russell the Pirate doesn’t feel his copyright is being violated.

Related: Copyright Violation Notice from “Rightster”, Intention in Copyright, EFF Wins Renewal of Smartphone Jailbreaking Rights Plus New Legal Protections for Video Remixing, An open letter to Sony ATV and UMPG

[...]

An open letter to Sony ATV and UMPG

Dear Sony ATV Publishing, UMPG Publishing, and other concerned parties,

I ask you to please withdraw your copyright violation notice on my video, “PSY – GANGNAM STYLE (강남스타일) M/V (YouTube SmashUp)”, as I believe my use of any copyrighted material is protected under Fair Use or Fair Dealing. This video was created by an automated process as part of an art project developed during my PhD at Goldsmiths, University of London: http://pkmital.com/home/projects/visual-smash-up/ and http://pkmital.com/home/projects/youtube-smash-up/

The process which creates the audio and video is entirely automated, meaning the accused video is created by an algorithm. This algorithm begins by creating a large database of tiny fragments of audio and video (less than 1 second of audio per fragment) using 9 videos from YouTube’s top 10 list. In this database, the tiny fragments of video and audio are stored as unrelated pieces of information and described only by a short series of 10-15 numbers. These numbers represent low-level features describing the texture and shape of the fragment of audio or video. These tiny fragments are then matched to the tiny fragments of audio and video detected within the target for resynthesis, in this case the number one YouTube video at the time, “PSY – GANGNAM STYLE (강남스타일) M/V”.

To reiterate, the content from the target video, “PSY – GANGNAM STYLE (강남스타일) M/V”, is not used in the resulting synthesis. That is, the process creates a new video not by merely copying the target video, but by attempting to re-create it out of entirely different material: the remaining 9 of the top 10 YouTube videos. Abstractly, there may appear to be a similar form or structure due to the collection of many fragments organized in a similar way to the target for resynthesis. These fragments, however, come from a very large collection of material very different from the original content’s own. The content used in the resynthesis itself is only from the large database of tiny fragments of audio and video segmented from 9 other videos. As a result, I would argue the use of any content within this video is only through Fair Use or Fair Dealing of the content.

This art project’s purpose is to highlight an important aspect of how computers and humans perceive, and how copyright itself may be dealt with within a computational arts practice which by its nature has to make use of existing content. The nature of this work further seeks to transform existing material into something entirely different, such that experiencing a resynthesized video reveals a new understanding of one’s own perception. The content used is fragmented in nature and assembled using a coarse idea of audiovisual scene understanding with no notion of semantics. As a result, the video itself is very abstract and at times incomprehensible. Further, its effect on the publisher’s market, as noted by the very low view rate on YouTube, is marginal at best. I therefore ask you to please withdraw your copyright claim.

Sincerely,
Parag K. Mital

Related: Copyright Violation Notice from “Rightster”, Intention in Copyright, EFF Wins Renewal of Smartphone Jailbreaking Rights Plus New Legal Protections for Video Remixing, YouTube’s Copyright School

[UPDATE Dec 8, 2012: All copyright violation notices have been dropped and the video is publicly accessible.]

[...]

3D Musical Browser

I’ve been interested in exploring ways of navigating media archives. Typically, you may use iTunes and go from artist to artist, or have managed to tediously classify your collection into genres. Some may still even browse their music through a file browser, perhaps making sure the folders and filenames of their collection are descriptive of the artist, album, year, etc… Though what about how the content actually sounds?

Wouldn’t it be nice to hear all music which shares similar sounds, or similar phrases of sounds? Research in the last 10-15 years has developed methods to solve precisely this problem. They fall under the umbrella term content-based information retrieval (CBIR) algorithms, which uncover the relationships within an archive through the information in the content itself. For images, Google’s Search by Image is a great example which only recently became public. For audio, audioDB and Shazam are good examples of discovering music through the way it sounds, or the content-based relationships of the audio itself. However, each of these interfaces presents a list of matches to an image or audio query, which makes it difficult to explore the content-based relationships of a specific set of material.

The video above demonstrates interaction with a novel 3D browser of a collection of music by one artist, Daphne Oram. The sounds are grouped in 3D space based on the way they sound, clustering together similar sounding material. Each of the 3 axes describes a grouping of sound frequencies, that is, a timbre or texture of sound. The further a sound sits along one of these axes, the more of that group of frequencies is present in the sound file.
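To make the idea of frequency-group axes concrete, here is a minimal sketch of how a clip could be turned into a 3D position. It is not the browser's actual code: the helper names `band_position` and `tone`, the use of three equal-width frequency bands, and the synthetic test tones are all assumptions made for illustration. The real groupings may well have been learned from the collection rather than fixed in advance.

```python
import cmath
import math

def band_position(samples, n_bands=3):
    """Map a clip to one coordinate per frequency band (hypothetical helper).

    Each coordinate is the share of the clip's spectral energy in that band.
    """
    n = len(samples)
    # Naive DFT power for bins 1 .. n/2 - 1 (DC and Nyquist are skipped).
    power = []
    for k in range(1, n // 2):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(samples))
        power.append(abs(coeff) ** 2)
    total = sum(power) or 1.0
    # Equal-width bands: an assumption, not the browser's actual grouping.
    edges = [b * len(power) // n_bands for b in range(n_bands + 1)]
    return tuple(sum(power[edges[b]:edges[b + 1]]) / total
                 for b in range(n_bands))

def tone(cycles, n=64):
    """Synthetic test clip: a sine wave with `cycles` periods in n samples."""
    return [math.sin(2 * math.pi * cycles * t / n) for t in range(n)]

low = band_position(tone(3))    # low-pitched clip
high = band_position(tone(28))  # high-pitched clip
print([round(v, 3) for v in low])
print([round(v, 3) for v in high])
```

In this sketch the low tone lands almost entirely on the first axis and the high tone on the third, so clips with similar spectral content end up near each other in the space.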

Exploring her work in this browser really demonstrates the variety of sounds she achieved. It also makes exploring the collection really fun, as there is a playful visual form and you get to hear the material right away.

The browser has also been built to be a real-time tool for creating new sounds. Mousing over any of the tiny boxes (representing parts of audio files in the collection) triggers the clip to play. Since similar sounding clips are grouped closer together, one can “perform” the collection along perceptually coherent axes by moving the mouse along any of the axes.

[...]

Intention in Copyright

The following article is written for the LUCID Studio for Speculative Art based in India.

Introduction

My work in audiovisual resynthesis aims to create models of how humans represent and attend to audiovisual scenes. Using pattern recognition of both audio and visual material, these models use large corpora of learned audiovisual material which can be matched to ongoing streams of incoming audio or visual material. The way audio and visual material is stored and segmented within the model is based heavily on neurobiology and behavioral evidence (the details are saved for another post). I have called the underlying model Audiovisual Content-based Information Description/Distortion (or ACID for short).

As an example, a live stream of audio may be matched to a database of learned sounds from recordings of nature, creating a re-synthesis of the audio environment at present using only pre-recorded material from nature itself. These learned sounds may be fragments of a bird chirping, or the sound of footsteps. Incoming sounds of someone talking may then be synthesized using the closest sounding material to that person talking, perhaps a bird chirp or a footstep. Instead of a live stream, one can also re-synthesize a pre-recorded stream. Consider using a database of nature recordings and, instead of the live-stream, now use a pre-existing recording of Michael Jackson. The following video demonstrates the output using Michael Jackson’s “Beat It”.

Everything you hear comes from nature recordings (by Chris Watson). Try to realize what elements of Michael Jackson’s original recording remain “meaningful”. The beat of the song is incredibly predominant (@ 33 seconds). Some aspects of the lyrics also come through, reinforced cross-modally by the visuals (e.g. @ 1:27), though no words are audible as the database contained no words. Also consider what meaningful information may be present in the opposite scenario, i.e. using a database of Michael Jackson, and re-synthesizing nature sounds.
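The matching step described above, where each incoming fragment is replaced by the closest-sounding fragment in the database, can be sketched as a nearest-neighbour search over feature vectors. This is only an illustration under stated assumptions: the function names `nearest_fragment` and `resynthesize`, the two-number descriptors, and the choice of Euclidean distance are hypothetical, and the actual model segments and describes its material in a more involved way.

```python
import math

def nearest_fragment(query, database):
    """Return the index of the database feature vector closest to `query`."""
    best_index, best_dist = -1, math.inf
    for i, features in enumerate(database):
        d = math.dist(query, features)  # Euclidean distance (an assumption)
        if d < best_dist:
            best_index, best_dist = i, d
    return best_index

def resynthesize(target_features, database):
    """For each target fragment, pick the closest fragment from the database.

    Only indices into the database are returned: the target's own material
    never appears in the output, only its organization does.
    """
    return [nearest_fragment(q, database) for q in target_features]

# Hypothetical descriptors, e.g. for a bird chirp, a footstep, and wind.
corpus = [(0.9, 0.1), (0.1, 0.8), (0.5, 0.5)]
# Hypothetical descriptors for three fragments of the target recording.
target = [(0.8, 0.2), (0.2, 0.9), (0.45, 0.55)]

print(resynthesize(target, corpus))  # → [0, 1, 2]
```

Playing back the selected database fragments in the order of the returned indices yields the re-synthesis: the target contributes the sequence, the database contributes the sound.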

As a side note, I have developed a similar approach for visual resynthesis, taking segments of visual objects as the basis of the resynthesis algorithm, rather than segments of audio. The example below demonstrates a resynthesis of the introduction to The Simpsons using only material from the introduction to Family Guy:

More examples are on my Vimeo channel.

Audio Collage

The idea of audio collage is not new. Electronic musicians have investigated the technique within the practice of musique concrète. The advent of digital sampling with devices such as the Fairlight CMI also made the practice much more accessible. Plunderphonics, a technique by John Oswald in which he manually chopped and resynthesized a number of copyrighted albums, created entirely new landscapes and sounds of material. It was also formalized in his essay, “Plunderphonics, or Audio Piracy as a Compositional Prerogative”, where he questioned where copyright can begin to claim ownership. He famously made use of Michael Jackson recordings after 12 years of developing the technique in his EP Plunderphonics, perhaps the most extensive use of sampling to date, which featured an image of Michael Jackson on a naked woman on the album cover.

The album itself made use of material from a variety of artists, all credited, such as Count Basie, Dolly Parton, Beethoven, and Michael Jackson to recreate unheard of sounds reminiscent of Stravinsky and The Beatles. Oswald believed that by not selling the album, he was not infringing on anyone’s copyright. However, under legal pressure from CBS and Jackson’s attorneys, he was forced to stop releasing the album and to destroy all remaining copies. Other artists are also of note, such as Negativland, which experienced similar legal battles with U2, and Cassetteboy, whose video collage of BBC material depicting Queen Elizabeth was stripped from YouTube. Numerous documentaries including RiP: A Remix Manifesto and Good Copy Bad Copy, books such as Cutting Across Media: Appropriation Art, Interventionist Collage, and Copyright Law and Lessig’s Remix, and funded studies such as Recut, Reframe, Recycle have also focused on the topic (Thanks to Nathan Harmer for additional links).

Perception is Inference

I have extended these questions into the very nature of perception, claiming the computational models I employ are plausible models of our own psychological modeling of audio and vision itself. I try to make explicit one possible mechanism of perception as a meaning-making inference machine, an idea which dates back at least to Helmholtz in 1869. The very nature of inference entails that understanding requires prior experiences, prior models, a set of known examples. By taking the small fragments of sound and rearranging them, these perceptual units lose their context, and necessarily their original meaning, and are only bound together by the ongoing environment to create a new meaning based on organization of the existing environment. In other words, without the environment to make a sound, there is no synthesis of a sound. The meaning therefore is created both by the viewer, and the target of the synthesis. The fragments within the database have no intentionality or meaning attributed to them until they are re-organized, resynthesized, and re-appropriated within a new context. In the video above, this context is Michael Jackson’s “Beat It”; however, not a single fragment of Michael Jackson’s “Beat It” appears in the audio output.

No Infringement of Copyright Intended

Where then does copyright hold stake within the computational models I have created? The ongoing environment provides the intentional influence of how the sounds are to be rearranged. If Michael Jackson were to appear in the environment and sing a song that also appears in the corpus, it is likely the synthesis would re-create Michael Jackson’s song. As our own mental machinery encompasses having heard Michael Jackson before, we are able to recognize Michael Jackson. However, now consider a database containing tiny fragments of Michael Jackson’s recordings and a target of birds chirping and taxis honking. Then, the only semblance of Michael Jackson is based on the tiny fragments that appear, and the organization of a song of Michael Jackson is no longer present.

Now the question appears, “Does copyright hold stake over the ongoing environment’s intentions?”, requiring no one to perform Michael Jackson for fear of the copyrighted re-synthesis? Or does copyright instead hold stake in the subtle fragments of sounds which were sampled from a copyrighted track? Let us say I set up this computational model as an installation environment where the database contains both Michael Jackson songs and nature recordings. As a target for synthesis, I have a microphone feeding live audio to the computational model. Until the microphone hears something *like* Michael Jackson, creating a synthesis using the tiniest fragment of a Michael Jackson recording, it seems I will not have violated copyright.

Even still, how could we have understood that a tiny fragment of a resynthesis was copyrighted in the first place? We would have had to match every tiny fragment within our own perceptual machinery (i.e. recognition) to have understood that we had heard this fragment within a different context, that is, a copyrighted one. Though, is it not the case that any meaning-making inference will necessarily be matched to a prior experience? If so, then isn’t copyright claiming our own experiences as copyright? Where does the context of sound take place within copyright? Could I not listen to the wind blowing through trees and be reminded of Michael Jackson? Or more likely, hear the wind and be reminded of a Chris Watson recording?

The question I am getting at is “How does the meaning elicited by the new resynthesis change our notion of copyright?” Consider the opposite scenario, where Michael Jackson’s songs are no longer in the database, and instead we have nature recordings, as the video above does. What does copyright have to say for using the organization of Michael Jackson’s songs though using entirely different content? In this case it seems I am using the artist’s full intention, though not breaking any copyright as the material is not evidently materialized in the resynthesis.

Early 20th Century

The early Dada movement of the 1920s also looked at resynthesis within a narrative context. During a surrealist rally in the 1920s, Tristan Tzara is famously noted as standing in front of a theater and pulling random fragments of text out of a hat before the riot ensued and wrecked the theater. T.S. Eliot’s The Waste Land and John Dos Passos’ U.S.A. trilogy are also early examples of the cut-up technique popularized by Tzara. The technique was further made famous by Brion Gysin and William Burroughs during the 1950s in their poetry, which made heavy use of the cut-up technique in a variety of fashions. Gysin came across the technique by accident while cutting paper with a razor blade on top of layers of newspaper (originally there to protect the table). He noticed the cut-up fragments of newspaper created a juxtaposition of image and text that was strangely coherent and meaningful. The two key terms to stress are ‘coherent’ and ‘meaningful’, qualities which our perceptual systems cannot help but create from the world. In fact, a number of theories of neuro-cognitive behavior are also based on these premises. Interested readers are encouraged to read Ronald Rensink’s work on theorizing the phenomenon of visual Change Blindness into “Coherence Theory”, and Shihab Shamma’s work on modeling auditory perception within a “Temporal Coherence Theory”.

Fredric Jameson also discusses the nature of collage, an early analytic cubist practice developed by Picasso and Braque which graced the start of the century, as a process which creates a new meaning from existing material. In contrast to pastiche or bricolage, collage seeks a new meaning, whereas pastiche is often used pejoratively to denote a random intention or imitation of existing intentions, e.g. 16th century forgeries/imitations.

Conclusion

The distinction between collage and pastiche seems incredibly relevant to practitioners making use of sampling and collage. However, I doubt such a distinction would help anyone in court. Though it is curious to think of the opposing scenario, where the material content is not copyrighted, though the intention or organization of the content is. How does a digital future handle this distinction without referring to the intentions of an artist? Further, how could it prove an artist’s intentions in the first place?

Related: Copyright Violation Notice from “Rightster”, YouTube’s “Copyright School”, EFF Wins Renewal of Smartphone Jailbreaking Rights Plus New Legal Protections for Video Remixing

[...]

Augmented Sonic Reality

I recently gave two talks, one for the PhDs based in the Electronic Music Studios, and another for the PhDs in Arts and Computational Technology. I received some very valuable feedback, and having to incorporate what I’ve been working on in a somewhat presentable manner also had a lot of benefit. The talk abstract (which is very abstract) is posted below with a few references listed. Please feel free to comment and open a discussion, or post any references that may be of interest.

Abstract:
An augmented sonic reality aims to register digital sound content with an existing physical space. Perceptual mappings between an agent in such an environment and the augmented content should be both continuous and effective, meaning the intentions of an agent should be taken into consideration in any affective augmentations. How can an embedded intelligence such as an iPhone equipped with detailed sensor information such as microphone, accelerometer, gyrometer, and GPS readings infer the behaviors of its user in creating affective, realistic, and perceivable augmented sonic realities tied to their situated experiences? Further, what can this augmented domain reveal about our own ongoing sensory experience of our sonic environment?

Keywords: augmented, reality, sonic, enactive, perception, memory, behavior, sensors, gesture, embodied, situated, acoustic, ecology, liminality

References:
Augoyard and Torgue, “Sonic Experience”, McGill-Queen’s University Press, 2005.
Arfib, D. “Organised Sound”, 2002.
E. Corteel, “Synthesis of directional sources using Wave Field Synthesis, possibilities and limitations.” EURASIP Journal on Advances in Signal Processing, special issue on Spatial Sound and Virtual Acoustics, January, 2007
Lemaitre, G., Houix, O., Visell, Y., Franinovic, K., Misdariis, N., Susini, P. “Toward the Design and Evaluation of Continuous Sound in Tangible Interfaces: The Spinotron”, International Journal of Human Computer Studies, no 67, 2009
K. Nguyen, C. Suied, I. Viaud-Delmon, O. Warusfel, “Spatial audition in a static virtual environment : the role of auditory-visual interaction.” Journal of Virtual Reality and Broadcasting, 2009
M. Noisternig, B. Katz, S. Siltanen, L. Savioja, “Framework for Real-Time Auralization in Architectural Acoustics.” Acta acustica united with Acustica, vol. 94, no 6, November, 2008
R. Murray Schafer, “The Soundscape”, Destiny Books, 1977.
J. Tardieu, P. Susini, F. Poisson, P. Lazareff, S. McAdams, “Perceptual study of soundscapes in train stations.” Applied Acoustics, vol. 69, no 12, December, 2008
D. Arfib, J. M. Couturier, L. Kessous, V. Verfaille, “Strategies of mapping between gesture data and synthesis model parameters using perceptual spaces”, Organised Sound, International Journal of Music Technology, Volume 7, Issue 2, pages 127-144, 2002.
D. Arfib, J-M. Couturier, L. Kessous, “Expressiveness and digital musical instrument design”, Journal of New Music Research, Vol. 34, No. 1, pages 125-136, 2005.
N. d’Alessandro, O. Babacan, B. Bozkurt, T. Dubuisson, A. Holzapfel, L. Kessous, A. Moinet, M. V. Lieghe, “RAMCESS 2.X framework – expressive voice analysis for realtime and accurate synthesis of singing”, Journal On Multimodal User Interfaces, Springer Berlin/Heidelberg, Vol. 2, Nr. 2, September, pages 133-144, 2008.
L. Kessous, G. Castellano, G. Caridakis, “Multimodal emotion recognition in speech-based interaction using facial expression, body gesture and acoustic analysis”, Journal on Multimodal User Interfaces, Vol. 3, Issue 1, Springer Berlin/Heidelberg, December 12, pages 33-48, 2009.
G. Caridakis, K. Karpouzis, M. Wallace, L. Kessous, N. Amir, “Multimodal user’s affective state analysis in naturalistic interaction”, Journal on Multimodal User Interfaces, Vol. 3, Issue 1, Springer Berlin/Heidelberg, December 15, pages 49-66, 2009.
Anton Batliner, Stefan Steidl, Bjoern Schuller, Dino Seppi, Turid Vogt, Johannes Wagner, Laurence Devillers, Laurence Vidrascu, Noam Amir, Loic Kessous and Vered Aharonson. Whodunnit – Searching for the Most Important Feature Types Signalling Emotion-Related User States in Speech. Computer Speech and Language, volume 25, No. 1, pages 4–28, 2011.

[...]

Dynamic Scene Perception Eye-Movement Data Videos and Analysis

Over the past 2 years, I have been working under the direction of Prof John M Henderson together with Dr Tim J Smith and Dr Robin Hill on the DIEM project (Dynamic Images and Eye-Movements). Our project has focused on investigating active visual cognition by eye-tracking numerous participants watching a wide variety of short videos.

We are in the process of making all of our data freely available for research use. We have also worked on tools for analyzing eye-movements during such dynamic scenes.

CARPE, more bombastically known as Computational Algorithmic Representation and Processing of Eye-movements, lets you visualize eye-movement data together with the video it was recorded against in a number of ways. It currently supports low-level feature visualizations, clustering of eye-movements, model selection, heat-map visualizations, blending, contour visualizations, peek-through visualizations, movie output, binocular data input, and more. The videos shown above on our Vimeo page were all created using this tool. Head over to Google Code to check out the source code or download the binary. We are still streamlining the workflow by creating manuals for new users and uploading more of the eye-tracking and video data, so keep checking back if you are interested.

[...]

Perception Related Videos

Seeing as how many interesting talks are being collected in online video databases such as videolectures, mit world, academic earth, opencourseware, ted, uctv, or the nih videocasts, just to name a few, I’ve started to collect a few of the interesting talks that are related to perception.

You can find them here.

[...]


Copyright © 2010 Parag K Mital. All rights reserved.