I've been naughty (and busy). I skipped two project milestones: 

  2. Working prototype for output.
  3. Working front-ends for both input and output (searching for products and scraping recommendations).

The idea was that each user could have multiple research profiles that they trained differently and compared.
In turn, each research profile could carry out multiple searches on Amazon to influence recommendations.

I was at milestone 3. I had a stable version of RARS running for the past month with thousands of recommendations stored in it for the benefit of those marking my MA final project.

The logic (partly) represented by this chart was working!

If it had been possible, milestone 4 was going to be beta testing the application with hapless members of the public. After receiving my marks for my MA, I decided to tweak RARS a little. Everything promptly broke: the database couldn't talk to the web application, which couldn't talk to the web server. I rather rashly decided to completely terminate the cloud server on which RARS was running.

My Game of Cat-Duck and Mouse with Amazon

I had put a few lines in my snu-snu authentication source code to print the HTML of the web page at key junctures. This made it possible to determine whether the headless Chrome instance was reaching the right pages when attempting to access Amazon.

Logging the source code also led to this chuckle-inducing discovery:

I spun up a new cloud server on Amazon Web Services and had to start from scratch. When I attempted to use RARS to log into the old dummy Amazon accounts I had used, I was met with this:

Take it from me: combing through HTML source code is a fun Sunday afternoon activity.

This page was new. I had already encountered two others:

  1. One telling me that if I wanted to carry out automated interaction with Amazon, I could ask them and/or use their API.
  2. One asking for a captcha after an attempted log-in.

These two were fairly easy to deal with. With the first, I would just assign a new Elastic IP to the cloud server. Yes, that's right: on top of free cloud computing for a year, Amazon give you random IP addresses that their e-commerce platform doesn't seem to recognise as belonging to them. With the second, I would either abandon the account or log into it over Tor and fill in the captcha, restoring Amazon's trust in the account.

This third roadblock was insurmountable. I tried rotating IP addresses. I tried generating a new email address. No joy. It was possible that the fact that so many suspicious accounts had been created using email addresses at my domain name had led to it being blacklisted. I considered trying to find a service that would let me create email accounts without a phone number over Tor. Yandex used to; a few others still might.... At this point, I felt that trying to find another email provider would be about as effective as a broom in keeping the tide at bay. The only other option would be to fetch the relevant emails for the given account over IMAP, extract the code and get Chrome to type it into the page. Even if I wanted to, my day job wouldn't allow me to do this within a reasonable timeframe.
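
For the record, the IMAP route would probably have looked something like the sketch below. It's untested, and the mail server, search criteria and regular expression are placeholders rather than anything that would work against a real inbox.

# Hypothetical sketch of the IMAP route I decided against: fetch the most
# recent email from Amazon and pull out the verification code. The server,
# search criteria and regex are placeholders, and it assumes a plain-text,
# single-part message.
import email
import imaplib
import re

def fetch_verification_code(user, password, server="imap.example.com"):
    connection = imaplib.IMAP4_SSL(server)
    connection.login(user, password)
    connection.select("INBOX")
    _, data = connection.search(None, '(FROM "amazon.co.uk")')
    message_ids = data[0].split()
    if not message_ids:
        return None
    _, message_data = connection.fetch(message_ids[-1], "(RFC822)")
    message = email.message_from_bytes(message_data[0][1])
    body = message.get_payload(decode=True).decode(errors="ignore")
    code_match = re.search(r"\b(\d{6})\b", body)  # assume a six-digit code
    return code_match.group(1) if code_match else None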

I fell at milestone 3.141592653589793... For some reason my progress approximates pi quite well; I have no idea why.

Where from Here?

I knew at the beginning of this project that even if it was viable to create a wrapper around fake Amazon accounts for sociological research, it was never going to be scalable. Why did I do it? For one thing, I was doing an MA at Goldsmiths; the normal logics didn't apply. This was the year for a dippy, impractical project. Like other things a person can end up doing at Goldsmiths, a lifetime of projects like these would be liable to get me institutionalised.

There are a handful of things I can do with the few thousand lines of code that I wrote for this project, and the skills I learned along the way.

  • Adapt snu-snu into a browser plugin that spams Amazon with obfuscatory searches, rendering its recommendations meaningless.
  • Create a browser-automation-based library that provides a high-level interface for interacting with Facebook. This could be used to automate your own Facebook behaviour or to make a bot. It would be interesting to teach neural nets optimal social networking behaviour.
  • Make more web applications. While this is more humdrum than the others, it has the advantage of being economically viable.

Today I overcame a sizeable hurdle in developing my web application for researching Amazon's recommendation system. Apologies if this post is too technical; I barely understand what I've been doing myself – certainly not enough to put it in plain English.

For the last week, I have been attempting to make it so that submitting the Django form below causes the server to log into Amazon and search for the listed items using the scraping/browser-automation library I wrote called snu-snu. This is by no means a flashy front-end but it means I can test the basic functionality of what I'm building. I had partial success with this working directly on the computer I'm using for development. However, I couldn't be certain that what would work on one machine would work on Amazon's or any other servers.

The solution I was advised to use was to wrap everything using Docker so that the environment in which I deployed my application would be a carbon copy of my development environment. Based on my limited understanding, Docker is a tool that systematically builds individual instances of Linux (called containers) for different parts of a project. For example, one might have separate containers for web servers such as Nginx or Apache, databases such as MySQL or PostgreSQL and content management systems like WordPress or Joomla. These containers aren't as isolated as virtual machines – they aren't each allocated chunks of system resources, for example – but still only interact with each other inasmuch as is required. What makes them very useful is that their content and configuration are defined in human-readable Dockerfiles that can be deployed anywhere and produce identical containers when built.

The Docker setup I'm using combines the following:

  • Postgres: a database.

  • Celery: a system that enables tasks to be queued and scheduled so that they don't interfere with a user's interaction with a website; the user can continue using the site while time-consuming code is executed.

  • RabbitMQ: a message-brokering system used by Celery to schedule and manage tasks.

  • Django: a Python-based web application framework.

  • Nginx: a web server.

I needed to make it so that a task triggered by a user's interaction with Django would be queued in Celery and then be carried out by snu-snu using selenium and a headless browser (i.e. one configured not to output to a screen). This last part I had to figure out myself, or so I thought.
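
The Celery half of that is easy enough to sketch. The task below is illustrative rather than a copy of my real code, and the snu-snu function it calls is a made-up name:

# A rough sketch of how a Django-triggered task gets queued for Celery.
# The task body and the snu-snu function name are illustrative, not the
# real ones.
from celery import shared_task

import snusnu  # the scraping/browser-automation library


@shared_task
def run_searches(email, password, queries):
    # This runs on a Celery worker, not in the Django request cycle, so
    # the user isn't left staring at a loading page while Chrome works.
    return snusnu.search_amazon(email, password, queries)  # made-up name


# In a Django view, the task is queued rather than called directly:
# run_searches.delay(email, password, queries)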

I spent a lot of time trying to modify the Dockerfile for the Django container so that the webdriver would be accessible to the code. Even though I made some progress, this was an appalling way to go about solving the problem; even if I had managed to cobble something together, I would not have received updates from any of the repositories I was pillaging. After failing at this for many hours, I decided to go about it another way. As a lot of Docker containers are based on the stable and minimal Debian Linux distribution, I created a Debian virtual machine and started trying to manually get Selenium working with a headless browser. This only worked when, on top of installing a virtual framebuffer (which simulates outputting to a screen), I installed display managers and desktop environments. I stopped at this point, as these pieces of software have no place on a server.

I was lucky to be dissuaded from the questionable path I had set out on, though it didn't feel that way at the time. I read more documentation and decided to create a container just for the headless browser, based on an image created by someone else. I had no idea how my Python code in Django would communicate with this new container. Luckily I had help from someone more experienced, who showed me how to set up the container with an image and access a Selenium-controlled browser remotely. Even then, the image I chose didn't work.

After a little more searching I found an image called selenium/standalone-chrome, which is maintained by SeleniumHQ themselves. All this required was that port 4444 be opened on the container so that Django could talk to it. Four lines of markup in a docker-compose file were all it took to arrive at this simple and perfectly encapsulated solution. The trick with coding seems to be knowing the four lines that work from the near-infinite number of combinations of lines that don't.
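
For the curious, connecting to that container from the Django/Celery side looks roughly like the snippet below. The hostname headless_chrome is just whatever the service happens to be called in docker-compose, and the exact keyword arguments vary between Selenium versions:

# Minimal sketch of talking to the selenium/standalone-chrome container.
# "headless_chrome" is whatever the service is named in docker-compose;
# argument names vary between Selenium versions.
from selenium import webdriver

driver = webdriver.Remote(
    command_executor="http://headless_chrome:4444/wd/hub",
    options=webdriver.ChromeOptions(),
)
driver.get("https://www.amazon.co.uk")
print(driver.title)
driver.quit()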

The below screenshot shows the server logs for when the Amazon searches specified in the form above were carried out:

As you can see, after about 13 minutes the task had been successfully completed by Celery. If you look at the lines further up, you can see the container headless_chrome_1 interacting with amazon.co.uk and the output from snu-snu being displayed via the container celery_1.

This post may be gibberish to you but I'm feeling optimistic, as I have a working prototype for controlling snu-snu via a web interface. My next post will probably deal with my prototype for displaying recommendations scraped from Amazon.

My MA practical project was the result of honing down initially very broad research into machine learning. As you can see from the diagram below – which I would never claim is even remotely exhaustive in what it purports to represent – the system is far too extensive and complex to be investigated by one student over a few months. As I whittled down my interests, I ended up writing a piece of software called snu-snu that trains Amazon's recommendation system by carrying out searches, adding items to wish-lists and scraping the resulting recommendations from Amazon to be saved for later analysis. I am calling the more recent permutation of this software RARS, which stands for Research Amazon's Recommendation System.

The areas edged green show my initial interests for this project.

In this post, I will introduce you to some of the ideas I've been working with so far, the software I developed and my plans for deploying this software as a more user-friendly web application.

The Ideas

My initial research was a little aimless. I ran some simple natural language processing on a number of books and queried and trained Amazon with the most frequently occurring words (sans articles such as 'the' and other boring words). The results were largely inexplicable and I was certain that any interpretation on my part had about as much epistemological clout as a horoscope. Particularly hard to account for were the recommendations in the table below, which were garnered by keywords from none other than Adolf Hitler's Mein Kampf:

In my preliminary research, I discovered that recommendation systems rely on a number of algorithms. Some of these – such as clustering and collaborative filtering – group users according to similar behaviour. Based on this, the recommendations output by Amazon for a set of related accounts might be read as part of an algorithmically generated social profile. Bearing this in mind, the above can only be explained by Hitler's political proclivities being typical of those of parents of toddlers – or more sinisterly, of childless neo-Nazis with penchants for children's toys.

Because the conclusions I could draw were so nebulous, I have been looking for a more systematic approach. Inspired by Richard Rogers' book Digital Methods, I'm interested in an empirical methodology that works mainly with digitally native objects. One such approach could see me surveying members of the public to see which categories of consumer goods, such as handbags and boots, they associate with stereotyped groups such as the middle classes, those who are feminine, people of colour or homosexuals. I could then empirically test Amazon to see whether the logics of its recommendation system produced something analogous to our social categorisations.

I have also found material that complements this research from a less empirically grounded source: the ideas of social subjection and machinic enslavement put forward by sociologist and philosopher Maurizio Lazzarato. From page 12 of his book Signs and Machines: Capitalism and the Production of Subjectivity:

In capitalism, the production of subjectivity works in two ways through what Deleuze and Guattari call apparatuses [dispositifs] of social subjection and machinic enslavement. Social subjection equips us with an identity, a sex, a body, a profession, a nationality, and so on. In response to the needs of the social division of labour, it in this way manufactures individuated subjects, their consciousness, representations and behaviour[…] Machinic enslavement dismantles the individuated subject, consciousness and representations, acting on both the pre-individual and supra-individual levels.

On my reading, collaborative filtering operates at the junction of the forces of social subjection and machinic enslavement for two reasons: first, individual consumers become "dividuals", reduced to aggregations of electronic data and processed in ways that cross the boundaries between different subjects, between subject and object, human and non-human; and second, a categorisation simultaneously occurs that may echo the one that takes place at the subjective level of class, race, gender and so forth. It does seem methodologically unusual to mix empirical research with highly abstract sociology, but I might as well take advantage of the freedom my course affords me in this regard.

The Old Software

The snusnu Python library is currently hosted on my GitHub at https://github.com/simoncrowe/snu-snu. I wouldn't recommend trying to use this unless you know a bit of Python and don't mind getting your hands dirty. It requires Selenium and ChromeDriver for most of its functionality, which can take a bit of work to set up. (Note: the screenshots below are old and snusnu currently needs to be treated as a module. To use it you need to either enter 'pip install git+https://github.com/simoncrowe/snusnu.git' in your terminal or run the Python interpreter from snu-snu's parent directory and import what you need from it, e.g. 'from snusnu import terminal'.)

Using the script terminal.py you can log into Amazon and either enter as many queries as you want to have carried out or – as in the screenshot below – specify a JSON file containing a list of queries.

The JSON lists of queries are generated by another script called text_process.py. This employs frequency analysis and part-of-speech tagging from NLTK (the Natural Language Toolkit) to derive a list of words from some text. In the screenshot below, The Waves by Virginia Woolf is being processed.
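
In outline, text_process.py does something like the following. This is a heavily simplified sketch: the part-of-speech tags it keeps and the cut-off are illustrative rather than the values I actually use, and it assumes the relevant NLTK corpora have already been downloaded.

# A simplified sketch of what text_process.py does: tokenise a text, keep
# the more interesting parts of speech, and dump the most frequent words
# to JSON. The tags kept and the cut-off here are illustrative, and the
# NLTK corpora (tokeniser, tagger, stopwords) must be downloaded first.
import json

import nltk
from nltk.corpus import stopwords


def queries_from_text(path, how_many=100):
    with open(path) as text_file:
        tokens = nltk.word_tokenize(text_file.read())
    tagged = nltk.pos_tag([t.lower() for t in tokens if t.isalpha()])
    boring = set(stopwords.words("english"))
    words = [word for word, tag in tagged
             if tag.startswith(("NN", "JJ")) and word not in boring]
    return [word for word, _ in nltk.FreqDist(words).most_common(how_many)]


with open("the_waves_queries.json", "w") as json_file:
    json.dump(queries_from_text("the_waves.txt"), json_file)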

The New Software

In order to better understand what I could re-use from the first version of my software, I drew up a flowchart to show one of my tutors, part of which is below. He quite quickly suggested that I drop the terminal interface, stick the core functionality on a server and write a web application as a front end to it.

Over the past few weeks, I've been getting my head around the Django web application framework and attempting to integrate my existing code into it. I've made a bit of progress so far, most of which I owe to the help of fellow student and software developer Fabio Natali. He has advised me on how to go about developing and deploying the application and has critiqued a number of wireframes I've sketched out for the user interface. The image below is from the latest set of wireframes and is nearing something workable.

My somewhat ambitious plan for the next month is to develop a simple web application for investigating Amazon's recommendation system and ideally deploy it for free on Amazon Web Services. I'll write further posts as the project progresses.

I recently decided, in earnest, to finish Switch, a project which I've grappled with on and off since 2011. The project was, in part at least, a result of the psychological effects of spending the 21st year of my life in limbo, without a passport or income, in debt and with an uncertain future. More importantly, it was the product of a desire to work with the computer game as a serious medium for literary experimentation. Regardless of its merits, I've since had trouble engaging with a project from that dark phase of my life. Soon after the project began to flow and I almost lived and breathed it, my isolation ended and I rushed back to university. Perhaps tellingly, the following summer, instead of working on the text, or even the visual art or programming sides of the project, I decided to write a specialised program to help me author the complex dialogue scripts. While this was a sound idea, it also served as an escape from the daunting task of writing dialogue for something so bleak. Another year of university removed me further from the origin of the project. In the summer of 2013, after I graduated, I decided to have another go at the 3D art and programming side of it, making some progress. The dialogue only came in dribs and drabs, mainly when I felt glum enough to engage with the desolate scenario, but not too depressed to write. Perhaps I was approaching it in the wrong way.

No purgatorial chamber would be complete without real-time shadows.

Some months later, I got a job and resumed the martial arts training I'd started at university. I was busy and found other projects to do on weekends. Unsurprisingly, my decision to finish the project didn't come from a happy place. Over my short life, the few decisions I'm most proud of arose from dissatisfaction and disgust rather than glee. This is one of the reasons I'd avoid antidepressants unless I was actually unable to function, rather than merely keeping going, however grimly and sullenly – I think Nietzsche had a point about the value of adversity. I realised that there was nowhere to hide from the emotions, ideas and influences (Beckett, Tarkovsky, Béla Tarr) that gave rise to this project, and that I need to put it behind me before I go back to university again and perhaps find another direction. Further, I've come to appreciate that there isn't anything innately negative about minimalistic fiction like this; it merely limits the number of creative possibilities so that one can focus on fully harnessing them.

I was introduced to Unity by one of the technical tutors at university, and, like any good geek, excitedly quizzed him about it after his presentation. I went back to my room, downloaded it and ended up making some quite odd interactive 3D artwork. This was in late 2009 and Unity 2.5 had just been released for Windows – previous versions only ran on Mac. They had also just scrapped the $200 charge for the indie version, which retained most of the functionality but lacked the fancy graphics. It was very artist-friendly; I only started learning to code in 2011. It was at this point that I began Switch, now working in Unity 3.0, which had many more features, including Beast light-mapping, which, like many other features, was only available with a Pro licence. This meant that I needed to come up with another method of achieving a decent standard of lighting. This piece was to take place in one bare room, and while I didn't want it to look perfect, I wasn't going to be satisfied with the realism offered by direct lighting. My answer was a feature of Blender 2.4 that had already been scrapped in the latest versions. Radiosity is an imperfect method of simulating light scatter between 3D surfaces, using each quad (a polygonal face with four edges) of the model to store light energy and emit the excess to other quads. This method is time-consuming and limited in applicability. Fortunately, for a single room, it was adequate. I ended up baking the lighting for most of the room into one 2048 x 2048 pixel image, and put together a composite texture in GIMP.

Following the pattern that started when they made a pared-down version of their software free for hobbyists, students and independent developers five-odd years ago, Unity 5 Personal now has all of the graphical features of its paid equivalents, but lacks some new features that would mainly benefit the larger, more established developers the paid versions are aimed at. All of my work in 2011 could be seen as futile now that the free version of Unity has global illumination as standard. As I'll be publishing this project to Unity's Web Player, having my light map and diffuse texture mixed on the same layer will save loading time, and, personally, I feel this method required a bit more artistry. While I could justify this, I couldn't resist playing with the new real-time shadows and reflections.

Now my procedural water has a reflection probe!

Thanks to the Unity team's generous business model, which I believe has helped them expand their community and brand to a global scale in less than a decade, I now have a graphically capable free games development kit at my disposal. It may not quite compete with CryEngine or Unreal Engine, but it's closer than ever. This is further motivation to finish this project quickly, so that I have time to experiment with some of my more visually lush ideas.

NB: this article was written in 2013 and is extremely out-of-date. Unity now has native support for blendshapes.

The humble blinking animation: Without it, our characters wouldn't look quite as alive.

This post is about my frustrated attempts to make a character blink, which eventually led me to make the Unity3d games engine a bit more animation friendly – which for me means: a bit more like Maya.

Blend-Shapes in Maya

In Maya, if you want blinking or any other facial animation, you use blend-shapes. This is known as 'morphing' or 'shape interp[olation]' in other 3d packages. When blend-shapes are used, the mesh you see is a product of several different meshes. The influence of each other mesh on your base mesh (usually expressionless, if a face) is determined by a weight value, which can be key-framed. This allows the animator to smoothly morph between different intensities of different facial expressions, even mixing expressions together.

I used Maya's blend-shapes for this animation:

Blow-In 2 (without subs) from Simon Crowe on Vimeo.

All of the blend shapes were modified duplicates of the main head mesh:

Alternatives to Blend Shapes in Unity3d

Unfortunately, the Unity3d games engine, to my knowledge at least, lacks support for blend-shapes. The alternative is using 'bones' (a hierarchical skeleton of transform objects) to animate faces – these are called joints in Maya. For most animal models, the bones correspond to actual bones:

I stole this image from Scott Petrovic's blog. (click image to visit) I hope he doesn't mind!

 

This system of animation is very well suited to limbs and spines, but is not at all suited to the complex muscle structures of the face, where movement does not primarily result from the use of bones as levers.

Wait! Maybe I'm digressing a little. A simple action like blinking would be possible with bones, right? Well, I attempted to set up my character's eyelashes in Blender, and gave up after Blender didn't do what I expected it to do (i.e. what Maya would have done). My idea was to group the vertices of the top eyelash to a bone centred inside the eyeball, which would rotate up and down to make the character blink. While this might have worked, I suspect it would have been incredibly frustrating to get it to look right.

Making Unity3d (Sort-of) Do Blend Shapes

I turned my attention to implementing blend shapes in Unity – something I'd worked out how to do, in theory at least, years ago. I didn't do the sensible thing and look for other people's blend shape scripts and plug-ins; I dived right in!

To my credit, my idea worked. All the problems were Unity-specific. I put the nuts and bolts of the system in an abstract class, so all actual implementations could just focus on changing the weights of each blend shape. Here is that class, with a few comments to attempt to explain the logic of the code:

using UnityEngine;
[RequireComponent (typeof (MeshFilter))]
public abstract class BlendShape : MonoBehaviour {
	public Mesh[] shapes;
	public float[] weights;
	protected Vector3[][] differenceVerts;
	protected Vector3[] baseVerts;
	protected Mesh baseMesh;
	
	void Start () {
		MeshFilter baseMeshFilter =
			(MeshFilter)this.GetComponent("MeshFilter");
		baseMesh = baseMeshFilter.mesh;
		baseVerts = new Vector3[baseMesh.vertices.Length];
		for (int vertIdx = 0; vertIdx < baseVerts.Length; vertIdx ++) {
			baseVerts[vertIdx] = baseMesh.vertices[vertIdx];    
		}
		differenceVerts = new Vector3[shapes.Length][];
		for (int shapeIdx = 0; shapeIdx < shapes.Length; shapeIdx ++) {
			differenceVerts[shapeIdx] =
				new Vector3[shapes[shapeIdx].vertices.Length];
			for (int i = 0; i < differenceVerts[shapeIdx].Length; i ++) {
				differenceVerts[shapeIdx][i] =
					shapes[shapeIdx].vertices[i] - baseVerts[i];
				// For each vertex in each blend shape mesh, we calculate
				// and store the difference between that vertex's position
				// and the position of the corresponding vertex in the base
				// mesh.
			}
		}
		Initialize();     // The Initialize method must be overridden in any
				  // class that inherits from this one.
	}
	void Update () {
		ProcessWeights();     // ProcessWeights can be overridden and filled
				      // with anything that needs to be done every
				      // frame to ensure weights are correct.
		Vector3 addendVert;
		Vector3[] newVerts = new Vector3[baseMesh.vertices.Length];
		for (int vertIdx = 0; vertIdx < baseVerts.Length; vertIdx ++) {
			addendVert = Vector3.zero;
			for (int shapeIdx = 0;
				shapeIdx < shapes.Length; shapeIdx ++) {
				addendVert += differenceVerts[shapeIdx][vertIdx]
				* weights[shapeIdx];
				// The differences in position we calculated earlier are
				// added together for each vertex - each multiplied by
				// a particular blend shape's weight.
			}
			newVerts[vertIdx] = baseVerts[vertIdx] + addendVert;
			// The sum of all weighted differences is added to the base
			// vertices, producing blended vertex positions!
		}
		baseMesh.vertices = newVerts;
	}
	protected abstract void ProcessWeights();
	protected abstract void Initialize();
}

 

If we make sure that all our blend shapes are copies of the original mesh with the same number of vertices and assume that, when Unity imports the meshes, this will remain the case – this should work. The last assumption is a big one, and can easily be wrong. Without going into details of how I managed to get my eyelash mesh to work with my script, here is the code I used to animate the blink:

using UnityEngine;
using System.Collections;
 
public class BlinkBlendShape : BlendShape {
	float lastBlinkEnd = 0;
	float blinkStart;
	float lerpEnd;
	float currentWait;
	bool blinking;
	bool hasStarted;
	bool hasPaused;
	public float wait = 6;
	public float waitRandomMin = -0.75f;
	public float waitRandomMax = 0.75f;
	public float startDuration = 0.08f;
	public float pauseDuration = 0.05f;
	public float endDuration = 0.075f;
	
	protected override void Initialize() {
		currentWait = wait + Random.Range(waitRandomMin, waitRandomMax);
	}
	protected override void ProcessWeights () {
		if (blinking) {
			if (!hasStarted) {
				if (Time.time < lerpEnd) {
					weights[0] = 1 -
						((lerpEnd - Time.time) / startDuration);
				} else {
					hasStarted = true;
					lerpEnd = Time.time + pauseDuration;
				}
			} else if (!hasPaused) {
				if (Time.time > lerpEnd) {
					hasPaused = true;
					lerpEnd = Time.time + endDuration;
				}
			} else {
				if (Time.time < lerpEnd) {
					weights[0] = (lerpEnd - Time.time) / endDuration;
				} else {
					blinking = false;    
					lastBlinkEnd = Time.time;
					currentWait = wait
					    + Random.Range(waitRandomMin, waitRandomMax);
					hasStarted = false;
					hasPaused = false;
				}
			}
		} else {
			if (Time.time >= lastBlinkEnd + currentWait) {
				blinking = true;
				lerpEnd = Time.time + startDuration;
			}
		}
	}
}

 

Blinking should be a simple action, but my BlinkBlendShape class is actually longer than its parent, BlendShape. One reason for this is that I used random time values to make it more realistic. No-one blinks exactly every five seconds!

I shudder at the thought of how long and complex a script, or set of scripts, would have to be to allow for key-framed blend shapes like those in Maya. Perhaps something with simple methods like StartSmiling(), StopSmiling() or SetSmileWeight(float weight, float transitionDuration), would be more suitable for a non-linear game.

A Random Thought: Besides Facial Animations, What Could We Use Blend Shapes for in a Game?

  • Changing levels of fluid in containers.
  • Wobbling viscous projectiles.
  • Rapidly growing plants.
  • Parasites crawling beneath skin.
  • Blooming flowers.
  • Boiling liquids.
  • Gas clouds.
  • Pulsating organs.
  • Undulating Surfaces.
  • Flowing, oozing, flopping substances.
  • Porcupine-like bristles.
  • Sphincters.
  • Soap bubbles.
  • Springs, maybe...
  • Massive slow-moving raindrops, like on the moon Titan.
  • Shape-shifting lizards! (Within reason – depending on how well the changing vertex positions worked with a skinned mesh.)

To conclude...

The image at the top of the page was the result of my own blend shape script. It works, at least for blinking, with a few limitations:

  • When Unity imports a mesh, there is no guarantee that the number of vertices will be the same as in your original model. One reason for this is that if a mesh has hard edges, Unity will use two separate vertices along each hard edge to make it render as sharp. Another is that if your mesh has UVs, Unity has to make two vertices for every vertex that lies on a seam in the UV map.
  • Two sided meshes (where each outward facing vertex has an inward-facing counterpart) like my original eyelash mesh, import with unpredictable vertex numbering. There are probably other types of meshes that do this.
  • It probably wouldn't be that efficient when working with multiple head meshes, rather than a single one-sided eyelash mesh of around 100 vertices. It's a script when it should really be a plug-in; it uses C# – a high-level language – to do something that could be more efficient in a lower-level language like C++.

This post was more of an update on how I'm getting along with a project, than a feasible solution to a coding problem. Perhaps someone will use this as a starting point for a better implementation or direct me to a highly mature Blend-Shapes-in-Unity project with far fewer limitations.

Having recently graduated, until I get a job, I can't justify squandering my free time. I have no strong ideas for projects, but I do have this half-finished interactive prose piece called Switch that I started about two years ago.

This project is cold in two senses. It is cold because my ideas and circumstances have changed since I last worked on it. My brain isn't wired in such a way that I can work with it fluidly any more. It is also cold because of its starkness and negativity. I envisaged the entire thing as one joyless and hopeless cycle that could run ad infinitum. True to its absurdist (read: Beckettian) roots, all this bleakness would not be delivered without a good measure of humour. Nevertheless, the theme appeals to me less now, since although my immediate future is unlikely to be easy, at least I can be certain of staying in England, and of the possibility of paid work and more education, things which were far from certain two years ago.

What would be more emotionally appropriate to me now would be a traditional narrative in which adversity is eventually overcome. However, this project is what I have. This kind of art is always going to be a product of its maker's life. The project will no doubt change because I have changed. The outcome will be different to what it would have been had I continued working on it rather than returning to university.

The only major piece of news about this project is that I have almost finished the application I'll use to edit my dialogue scripts. I tweaked the code a little this morning and it really does seem to work! Below is a screenshot of it in action.

While it may be a bit clunky, I came to the conclusion that this was the best way to edit a dialogue graph that didn't always conform to a tree structure. I'll probably write another post outlining the program's features. I may even put the source code and executable up on this website, along with the Unity3d code for the dialogue GUI, so that others can use it.

The content of the game hasn't changed a great deal. While the game was running, I opened a door and a drawer on the cupboard, turned on one of the hobs and both taps, and pushed the wheelchair around to demonstrate the interactivity I had started adding. I still need to add the ability to pick up and hold items, and to use tools.

Despite how awkward it will feel at first, I really do want to get stuck into this project. I've put quite a lot of work into it already, and I would like to see it finished.

Looking Back on the Loughborough University Degree Show

A screenshot of the final program used in the degree show installation

What I showed at the Loughborough University Degree Show deviated from the plans I've outlined in previous posts in only one respect: I decided to change site, after being advised that the construction work I had planned to carry out on the original site would be too time-consuming and probably beyond my skill set.

I went from this,

to this:

This wasn't quite as bad as it may seem because the office was sparsely furnished and its dimensions were similar to those that the other space would have had, had the barriers been constructed. If anything, it was easier to justify using the smaller room.

I was adamant about following through with my plan of not making the desktop hi-fi speakers I was using visible, as I thought exposing them would be aesthetically sloppy and conceptually muddy. I ended up deciding to enhance the anthropomorphism of the speakers, which I had already planned to hint at by placing them at average human head height.

I came up with two separate designs. Both of them were intended to be white, human-sized boxes that would house speakers behind fabric screens.

With the first, I was initially unsure as to how I'd attach the wooden frame to the inside of the MDF box, in order to avoid any of the edge of the cloth showing. To sidestep this problem, I put together another design, in which a large piece of fabric would be stretched across a 3D frame.

This worked in theory, but in practice my workmanship was too poor and the fabric too creased. The first problem could have been solved with greater care and better tools, but the second was an insurmountable obstacle.

I returned to the first idea and worked out the system of two frames screwed together, one of them holding the stretched fabric and the other one sliding neatly into the MDF box. I was too busy to record the construction process; all I have is an image of the finished boxes, all painted white and positioned on top of the white vinyl flooring which I laid over the carpet (and a layer of hardboard).

It wasn't the fabric cube – whose only function was to be a confluence of sounds, to be shot through with virtual voices – I had hoped for, but I thought it was at least presentable in the context of a degree show.

Though it only occurred to me to capture footage during the last three days of the installation, I got the impression my installation was visited quite frequently. Some visitors were amused, some puzzled and some underwhelmed. What pleased me was that in a lot of cases, my audience did end up discussing social identity in a gallery-type setting, which is all I could really have hoped for.

Ultimately, most of what I built had to be disposed of, though some was recycled for the Free Range show.

Looking Forward to the Free Range Show

The work I'm showing at the Free Range show uses the same algorithm, but instead of having six speakers emitting synthesized speech from one conversation, three pairs of headphones will be emitting speech from three dialogues. These will have to be generated beforehand rather than being produced in real time based on webcam input. I've made a new program, based on the previous one, to achieve this. I've also modified the program so that I can choose to include all the derogatory terms I omitted from the London show. At least those recordings won't go to waste, and hopefully I won't cause too much of a stir.

Unfortunately, because my program has become incredibly memory-hungry, I have to limit each audio file to a couple of minutes; otherwise I end up with an OutOfMemory error.

In terms of presentation, I made a box out of some scraps from my boxes from the Loughborough show that will hold my laptop. I've attached a rudimentary wooden headphone stand to this box. I'm unsure as to how successful this version of the piece will be.

Talking About Social Identity is the somewhat cumbersome working title of my final project for the BA I'm taking in fine art.

My main claim is that our social identities – ours and others' conceptions of our sex, race, class, gender and sexuality – are constructed through an ongoing process of conversation and negotiation. I'm investigating this claim by building a sound installation that simulates a conversation between up to six virtual participants. These participants (literally, electronic speakers) negotiate their own and each other's social identity according to a simple set of rules based on their belonging, or not belonging, to different identity groups such as middle-class or feminine. In this blog series I'm going to look at the major processes involved in making the sound installation as well as some theory and planning. Throughout the remainder of this post, I will introduce three key components of the project: motion tracking, conversation simulation, and speech sample extraction.

Motion Tracking

As I want this installation to be interactive, I need a way to monitor the audience's behaviour. This is where motion tracking comes in; I plan to install a web cam above the installation space, and use it to track people's positions in the space. At this point, I haven't decided precisely how I'm going to use the motion tracking data, though I do have some ideas which I will discuss in a future post.

I have, however, put a GUI together which allows me to easily use a motion detection and tracking algorithm from the open-source computer vision library AForge.NET. While waiting for the delivery of the webcam I hope to use for this project, I have experimented with the motion tracking using my laptop's inbuilt webcam.

While the camera's optics are poor and its sensor low quality, the algorithm – which looks for differences between the current frame of camera input and the background frame – can still detect motion. In the above image, I've set the background frame to one in which my hand and the pair of scissors were not in the field of view. The fact that the right handle of the pair of scissors is not recognised as a motion object against the darker background indicates a weakness in this kind of motion detection: its reliance on the difference between pixel values. Practically, I intend to address this weakness in three ways: get a better camera, apply a contrast filter to the camera input before motion detection is carried out, and, most importantly, make my installation space as white as possible, with consistent lighting.
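
The core frame-differencing idea is simple enough to sketch in a few lines. This is just the idea, written with NumPy, not the AForge.NET code I'm actually using:

# The frame-differencing idea behind the motion detection, sketched with
# NumPy rather than the AForge.NET code I'm actually using. Frames are
# assumed to be greyscale arrays of the same shape.
import numpy as np


def motion_mask(background, current, threshold=25):
    # Pixels whose brightness differs from the background frame by more
    # than the threshold are treated as motion.
    difference = np.abs(current.astype(int) - background.astype(int))
    return difference > threshold


def contains_motion(background, current, minimum_pixels=500):
    # A crude overall verdict: enough changed pixels counts as movement.
    return np.count_nonzero(motion_mask(background, current)) >= minimum_pixels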

Conversation Simulation

A lot of what you can see in the above image is unchanged since the first version of my conversation generation program that I made at the very end of 2012.

The topmost section gives me access to variables which determine, for example, how different the beliefs of my conversation's participants (which I call interlocutors) will be to start with and how much memory they have to store past utterances (things which they, and others, have said). The only major change I have made here is adding fields for disposition: a property which represents how participants in a conversation would feel about a belief, were it true. If my terminology makes no sense to you at this point, please bear with me. I will write about these settings and the algorithm that they control in much more detail in future posts.

The middle section for file output is fairly self-explanatory. This was the only form of output I used in 2012's version of the project. It allowed for a conversation, of a limited length, to be saved as a WAV file, each participant's speech occupying a different channel. In the version of the program I'm working on now, I want it to be possible to output in real-time and to file, simultaneously. I have not implemented this yet and will likely post something about the technical challenges it poses. The bottom section houses settings specific to real-time output. At the moment, I have settings for the positions of six speakers. These positions will roughly correspond to where each speaker is in my web cam's field of view and allow me to use audience members' positions relative to the speakers to change how they function. For example, a speaker could be more likely to emit speech if an audience member was closer to it.
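
Just to pin down the vocabulary before those future posts, each interlocutor's state amounts to something like the toy sketch below. It's in Python for brevity and the actual program's data structures differ:

# A toy sketch of the state each interlocutor carries, just to pin the
# terminology down; the actual program's data structures differ.
from collections import deque


class Interlocutor(object):
    def __init__(self, beliefs, dispositions, memory_size=20):
        # strength of belief in belonging to each identity group,
        # e.g. {"middle-class": 0.7, "feminine": 0.2}
        self.beliefs = beliefs
        # how the interlocutor would feel about each belief were it true
        self.dispositions = dispositions
        # a limited memory of past utterances (its own and others')
        self.memory = deque(maxlen=memory_size)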

Speech Sample Extraction

This is one of the most time consuming, and certainly the most repetitive of the processes involved in this project. I have to manually extract approximately 4,704 words in total from voice recordings and save them as individual files to be used to synthesise speech.

The above image shows the extraction of the word Caucasoids, a word which I will have to extract on 24 separate occasions. While I would like to be able to get a computer to do all of this automatically, the quality of these samples is going to have a major impact on the final outcome of this project. The computer saves me a lot of time in that I can automatically amplify each file so all the loudest points are equally loud and remove any background noise. However, this is where it stops being automatic: each word needs to be listened to several times until I find the best range of it to save to file. The words also have to sound good in context and this often means cutting them short, especially when a word is likely to be in the middle of a sentence. I can allow words such as nouns, which are likely to be at the end of a sentence, to trail off and have greater emphasis; with adjectives, I have to compromise, as they can be both in the middle and at the end of sentences. There is also the matter of attempting to repair mispronunciations and missing words. These tasks, and some issues relating to how I want the piece to be interpreted, will be discussed in a future post.

I hope you gained something from reading this. At least know that I gained something from writing it, as I find it easier to clarify my ideas once I have expressed them in writing.

The process is usually straightforward. I'll select the part of a WAV file from a voice-acting session that best encapsulates a given word for use in my speech synthesis, and save it as a separate file. This file is named after that word and placed in an appropriate folder.

Below are screenshots of both ends of the word "acts", zoomed in so that I can ensure that it begins and ends with silence.

You may have noticed that the word I have extracted does not include all the sound that the voice actor made when saying that word. This is often the case, because actors read words out individually and placed more emphasis on them, effectively treating them as words at the ends of sentences. Naturally, I sometimes do include all of the recorded word if it is a noun and therefore certain to be at the end of a sentence (within the limited set of sentence structures I'm using).

Which elements of the original spoken words I use may also have consequences for how my work is read. If I go to great effort to make the voices sound real by leaving emphasis, breathing, tics and trailing off in the final words, people may assume that the main focus of my work is how people speak. Conversely, if I strip my work of the natural irregularities of spoken language, people might assume that I'm more concerned with the overarching system that dictates what they can say. To put it another way: do I want people to think I'm looking at social structure or social agency? What about both? I must also consider the fact that my limited technical abilities prevent me from seamlessly blending different words together, which results in a talking-clock effect. Therefore, including irregular elements of speech may further the sense of machine-speech, because the different words will seem even more disparate in source. This problem doesn't have an easy solution, though fortunately the tension can be read as useful, in that I tend to think of this project as exploring the mixing of abstract models of human social identity with more concrete social processes.

Problems and Solutions

Occasionally, things go wrong. Words are missing or mispronounced in my original recordings. In these cases, where possible, I've mixed together parts of separate words to get an approximation of the desired word. In the example below, the word "Mongoloid" is created from two separate sources:

The two parts were repositioned and their gain (volume) envelopes tweaked until they could be mixed into a convincing word.

In other cases, I have been less fortunate with missing and malformed words and have resorted to copying words spoken in one of the other tones of voice; all voice actors contributed three sets of words with different emotive intonations.

I recently made the mistake of extracting all 196 of a set of words without first increasing the volume of the source recording so that all of its peaks were at 0 decibels (maximum volume). Instead of re-doing the entire recording, which would have taken several hours, I found a free audio editing program, Wavosaur, with batch processing. At first I was disappointed by the limited selection of pre-defined batch operations. However, I soon discovered that I could process multiple audio files using any VST plug-in I could get my hands on. Enter Blue Cat's Gain Suite, a completely free VST plug-in for adjusting audio volume. Despite Wavosaur's crude interface for accessing VST parameters, it worked well enough.

Hours of work reduced to seconds. Even if I take into account time spent researching the problem, I'm happy with this solution. Try doing something like this for free on a Mac!

Cleaning up After Myself

Being a tad dyspraxic means I need to spell-check everything, including my file names. Computer programs aren't forgiving of typos and misspellings; they just crash.

The best method I could come up with is to use Command Prompt to save a list of the files in a directory to a text file.

I then remove the file extensions and run this text through an online spell-checker. I have my own spell-checkers, but I'm worried that years of abuse have immunised them to some of my misspellings, as I may have simply added the misspelled words to their dictionaries.
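
A few lines of Python would do the same job as the Command Prompt step; this is a quick sketch and the folder name is made up:

# A quick Python alternative to the Command Prompt method: list the WAV
# files in a folder, strip the extensions and write them out to be
# spell-checked. The folder name is made up for the example.
import os

names = [os.path.splitext(name)[0]
         for name in os.listdir("recordings/neutral")
         if name.lower().endswith(".wav")]

with open("filenames.txt", "w") as listing:
    listing.write("\n".join(names))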

At this point I will correct the name of Asains.WAV and any other offending files. The need for all files to be named correctly intersects with a consideration of how I'm accessing the words (audio files) from my program. This will be the topic of a future post.

If you've made it this far, gentle reader, I commend your tolerance for dullness. The topic of this post is extremely boring and I've gone to little effort to liven it up.

It was only recently confirmed which space I am to use for my degree show installation, giving me roughly a month of certainty. Fortunately, I was assigned my first choice: the space I put the most into planning for. Even so, my idea was far from solid. This post documents my attempts to start putting my ideas into the world by selecting materials and altering my design with them in mind.

Overview

I have just finished updating the 3D model of my installation. I am likely to use the image below for my entry in the online version of the degree show catalogue.

 

The dimensions of this model correspond very closely to those of the materials I've ordered. The platform may not look that sturdy, as no diagonal wooden struts or fixing plates/brackets are included. Nonetheless, it is 20cm wider than I originally planned for, and has three additional pairs of vertical supports. I have made the barriers between the speakers and the space for the audience considerably shorter, in order to save on cloth and wood. Even after taking this step, I need 29m of 1.4m-wide white curtain lining and nearly 100 metres of timber for all of the barriers – not to mention screws, staples, T-brackets and L-brackets.

Platform

The platform will be made of 45mm x 45mm timber, joined using steel brackets and screws, and if the structure still seems unsound, reinforced with diagonal struts made from offcuts. It is also likely that I will need to attach my platform to the walls, as the platform needs to take my weight; I've left the choice of masonry screws/bolts until after I've discussed the matter with technical staff.

 

At the bottom of the above image, you can see the three 1.24m x 2.48m sheets of 12mm-thick MDF I'll be using for the top of the platform.

Barriers

The barriers need to be opaque and white, and must not muffle the sound from the speakers. This rules out barriers made entirely out of wood. I have considered a number of fabrics to stretch over a wooden frame. 3-pass blackout material was too thick and airtight, and muffled sound to some extent. I also considered canvas, which would have worked but would not have been white unless painted, in which case it would have tightened as it dried and put a lot of stress on my frames. I eventually opted for some cheap white poly-cotton curtain lining.

The barriers will be made in sections, probably before I set up my installation, and need to be very precisely sized if they are to make an effective wall. This is so important that I may leave some of the smaller sections until I know exactly how long they need to be.

Misc.

All of the other major components of the installation are electronic.

The three pairs of speakers are Genius SP-HF1800A desktop stereo models. I bought them for their good sound quality and the fact that the right and left speakers came separately and could be connected using an RCA extension lead, meaning that they could go anywhere relative to each other. After an initial failure, I have acquired some good-quality RCA extensions that don't cause the signal strength to attenuate.

The webcam is a Logitech C905. It has a fairly wide-angle Carl Zeiss lens with a middle-macro focal range. What this means is that I can purposefully adjust the focus to be too near, so that the images of people in the space are more blob-like. This should help my motion tracking software to read them as individual moving objects. In order to position this webcam above the space, I needed a USB extension, and not just any would do: it had to have an active repeater to reinforce the signal over a distance. I bought such an extension and it worked perfectly when I tested it. If it proves not to be long enough, I may find myself testing the claim that these active-repeater USB extensions can be daisy-chained together.

Taking into account the speakers, webcam, cables, timber, fastenings and cloth, this installation could easily come to £700. Including tools and inevitable unforeseen costs, this could climb even higher, hopefully not reaching quadruple figures.