Bruce Sterling


1. There’s a concept from science fiction criticism which has become a favourite of mine. Indeed it seems fundamental to this 21st century glocal postmodernity of ours, the concept of consensus reality.
1.1 It is worth remembering that this consensus often refers to the beliefs of the society in the work under criticism, in which marmalade may be money, spaceships may fly faster than light, and handheld communicators with vid screens may be ubiquitous.

2. The idea of consensus reality neatly captures several insights.
2.1 Reality proper, what Kant called the unsynthesized manifold, is unavoidably mediated by our senses and brain.
2.2 Our model of the world is socially constructed by a group we live in.
2.3 Powerful institutions of mainstream thought – like large newspapers – work within certain parameters of perception.
2.3.1 The first page of search engine results is representative. Search engines are consensus reality engines. Common sense engines, in Bruce Sterling’s words.
2.4 Something in the consensus is inevitably and always wrong.
2.4.1 The consensus contains arguments with known for and against positions.
2.4.1.1 The argument itself can be wrong, irrelevant, a meaningless side effect, not resolvable as either pro or con, etc.
2.5 Broad consensus realities often have enduring correlations with events.
2.6 Consensus is reinforced by breadth.

3. Kuhn’s concept of a scientific paradigm resembles a consensus reality, but is far more systematic.
3.1 Consensus reality includes cultural convention and everyday discussion including obvious internal logical contradictions.
3.2 Consensus reality is intuitive.
3.3 Consensus reality allows for surprises – chance events – but not for unanticipated ones.
3.3.1 “Black swans” are demonstrations of consensus reality.
3.3.2 Commuting to work is also demonstrative.

4. A reality based community responds to empirical sense-data.
4.1 Measures.
4.2 Adjusts in response to changes in data.
4.3 Follows technique.
4.3.1 Technique may be systematic. It may have a model.
4.3.1.1 The model may be tested empirically and systematically.
4.3.1.2 One might use a randomised controlled trial, or survey, or historical data source, or blind peer review.
4.4 Reality based communities survive by adaptation.
4.5 Strongly reality based communities would necessarily be scientific communities.
4.5.1 No serious political community today is also a scientific community.
4.5.1.1 Establishing professional pools of expertise for these processes is necessary but not sufficient.
4.5.1.1.1 Any such group analysing a public problem is inherently political.
4.5.1.1.2 This is technocracy.

5. The consensus reality based community is always broad, often well-established and always vulnerable to disruption of its reality.
5.1 This is the nature of Karl Rove’s insult.
5.1.1 By always anchoring themselves in well established consensus reality, Rove’s opponents fail to react to events initiated by his faction which change the broad understanding of reality.
5.1.2 Rove’s faction has since, with amusing consistency, repeatedly shown itself not to be reality based.
5.1.2.1 This faction acts as an alternative consensus reality based community.
5.1.3 In rejecting the dominant consensus reality, and its rhetoric of objective evaluation, they went straight on and also rejected a reality base for their community.
5.1.3.1 This is not a survival technique.
5.1.3.2 On the day of the 2012 US Presidential election, both major parties expected to win.
5.2 The consensus reality based community may even tacitly acknowledge it is not reality based.
5.2.1 This is a society in which the consensus ritual detaches from its social meaning.
5.2.2 Incongruence between political consensus reality and reality manifests in scandal.
5.2.2.1 Fin de siècle Vienna.
5.2.2.2 Late Ming China.
5.2.3 Incongruence between social consensus reality and geophysics and biology manifests in natural disaster.
5.2.3.1 The Aral Sea.
5.2.4 Incongruence between financial consensus reality and economic and psychological reality manifests in financial crisis.
5.2.4.1 CDOs and CDSs.
5.2.4.2 South Sea Bubble.
5.2.4.3 Louisiana.
5.2.4.4 Tulips.

6. The siblings of consensus reality are the consensus future and the consensus past.
6.1 Revision is the change of the consensus past.
6.2 Changes to the consensus future feel like betrayal or relief.

We test the proposition that the number of movies in the same calendar year combining strippers and a specific subgenre cannot exceed one without the genre being fatally destroyed in popularity. We name this the CRwM Stripper Genre Collision Hypothesis after its originator, CRwM, writing at And Now The Screaming Starts. Simple computational analysis of the English language wikipedia corpus provides some non-definitive support.

There’s No Computational Sociology In The Champagne Room

CRwM writes:

When no systematic approach is available, then the best you can do is pick an arbitrary point and map out the trajectory of the waning trend from your specific frame of reference. So long as you admit your frame of reference is singular, you’ll allow others, each observing the same phenomenon from their distinct frames of reference, to make whatever calculations to sync up the observations. Applying this pragmatic solution to the problem, I’m defining my frame of reference thusly: A trend is creatively bankrupt when, within a single year, you get two films mashing-up the trend with strippers. — ibid

Though there may indeed be no objective measure for trend fizzlation, we can perhaps be more systematic in our investigation of any given frame of reference. Our approach is to leverage techniques from computational sociology to brute force search a partially structured film description corpus. Though IMDB is probably the best data source readily available to the public, its USD 15,000 license fee means using it will have to wait until my long awaited funding from the Institute of Piss Farting About comes through. In the meantime, wikipedia is actually a pretty decent source. Though the English language version does not have great coverage of, e.g., foreign films, I would hypothesise that the gaps are congruent with the focus on trendspotting in this exercise. Wikipedia is, in the words of Bruce Sterling, a kind of common sense engine. It is suited to sampling what we think we know.

Though the approach might be naive enough to be described as folk computational sociology, I prefer to think of it as punk rock.

Spins A Web, Any Size

Though there is a very active tools community around wikipedia, most of it seems to be focused on productivity scripts for editors. Things like auto classification and flagging scripts are popular, and no doubt very useful to the editorial group, if the robot history on my very occasional contributions is any guide. From a brief google literature search, the search toolset seemed to be either very simple and widely available (use google to hit a single page) or confined to sophisticated papers on vector based searches implemented on the server side. Our cinematic exploration seemed to fall between these extremes.

My first crack at getting movie data out of wikipedia was to hit the film category page for a year and script a primitive web spider to suck down all the data from that starting point. The top entry on stack overflow also happens to suggest this.

Though I did get this on the way to working, and it can be seen as movieSpider.py at the github repo mentioned later, it’s a lousy approach. Not only do you have to tool about with faked headers because wikipedia doesn’t really want you to do this, you also hit the same pages over and over again while troubleshooting. You have to deal with the relatively unstructured format of HTML with embedded tags, which implies bucketfuls of heuristics to pull out anything meaningful. Plus, if you get it working, you will want to expand the time range, and end up downloading a fair chunk of wikipedia anyway.

Takeaway Corpi

It turns out that wikipedia hosts backups of its entire database in convenient xml export formats. These are partitioned by language and include current version archives (without all the history and discussion). These data dumps are available here. At a couple of gig compressed, even the fairly pathetic caps and bandwidth rates of, say, Australian broadband can deal with one in a day or so while you amuse yourself playing badminton. Once uncompressed, a recent version of English language wikipedia takes up around 30 GB, or in other words, can fit on an iPhone 4.

Once available locally, running searches is quicker, particularly while debugging a script. Extracting a subset also becomes much simpler. Pros seem to rebuild the entire database, including indexes. Indexes didn’t seem much use to me here, as I was hitting the full content of a page, but maybe I’ve underestimated the power of the word indexing in a basic local database. At any rate, the structured data was sufficient for me.

A short python script of a few hundred lines lets us pull out a particular subset of wikipedia according to a regular expression run as a search on the page content. If there is a hit, we save the entire page. This is found in movie.py and available at github. Building the subset file takes about seven hours on my machine. Using the regex [Category:[0-9]??? films], we can pull in any page that mentions films of a particular year. The resultant subset is a decent film corpus weighing in at a trim 292 MB.
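For the curious, the filtering step above might look something like the sketch below. This is not movie.py itself, just a minimal illustration of the idea: stream the xml export one page at a time and keep the pages whose text matches a regex. The names matching_pages, local_name and YEAR_FILMS are made up for this example, the pattern is my guess at a stricter spelling of the category regex, and the two sample pages are invented.

```python
import io
import re
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def matching_pages(source, pattern):
    """Yield (title, text) for each <page> whose wikitext matches pattern.

    source is a filename or a binary file object holding a MediaWiki
    xml export. Hypothetical helper, not the function used in movie.py.
    """
    regex = re.compile(pattern)
    for _event, elem in ET.iterparse(source, events=("end",)):
        if local_name(elem.tag) != "page":
            continue
        title, text = "", ""
        for child in elem.iter():
            name = local_name(child.tag)
            if name == "title":
                title = child.text or ""
            elif name == "text":
                text = child.text or ""
        if regex.search(text):
            yield title, text
        elem.clear()  # drop the page's children so their text is not retained


# Assumed pattern: a category link naming a four-digit year of films,
# optionally followed by a sort key after '|'.
YEAR_FILMS = r"\[\[Category:[0-9]{4} films\s*[\]|]"

# Invented two-page export, standing in for the real multi-gigabyte dump.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page><title>Zombie Strippers</title>
    <revision><text>A horror comedy. [[Category:2008 films]]</text></revision></page>
  <page><title>Badminton</title>
    <revision><text>A racquet sport. [[Category:Racquet sports]]</text></revision></page>
</mediawiki>"""

hits = [title for title, _text in matching_pages(io.BytesIO(SAMPLE), YEAR_FILMS)]
print(hits)  # ['Zombie Strippers']
```

Against the real dump you would pass a filename, or a file object from bz2.open if you would rather not uncompress the archive first. Because the parse is streaming, no index or database rebuild is needed, at the cost of reading the whole file on every run.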

This same script can be used with minor modification for searching other spaces that attract wikipedia editors of a particularly pedantic and taxonomical breed. Their painstaking sifting of the world into categories is what makes tricks like this possible. You could, for instance, use it to build a wiki subset of military battles with a regex like [Category:Battles involving.*].

Once the subset database is built, we can run a similar expression search across it, but one aware of the structure of film pages – that they have a title, and a category indicating a year. We can therefore attempt a quantitative validation of the search CRwM did by pure pop culture brainpower:

$ python movie.py -e stripper -e zombie

The result of this is

Zombie Strippers -- 2008
Kiss the Bride (2008 film) -- 2008 # false positive
Zombies! Zombies! Zombies! -- 2007
I Am Virgin -- 2010
Big Tits Zombie -- 2010
Can't Hardly Wait -- 1998 # song by White Zombie on soundtrack
The Incredibly Strange Creatures Who Stopped Living and Became Mixed-Up Zombies -- 1964
Hide and Creep -- 2004
end 8 Hits: 8 Scanned: 60757

This is not a fully automated process – as I have annotated above, Kiss The Bride, though it possibly would have been enriched by either zombies or strippers, has neither. A review of Zombie Strippers is instead cited in its footnotes. Similarly this brute force text search is ignorant of synonyms – any great zombie burlesque films of the 1920s are liable to be skimmed over without comment.

We also find that the editorial consensus at wikipedia disagrees with CRwM on one crucial point – it asserts Zombie Strippers and Zombies! Zombies! Zombies! were not made in the same calendar year.
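The second-stage search can be sketched along the following lines. Again, this is an illustration rather than the contents of movie.py: it assumes the -e terms are combined with AND, that matching is a plain case-insensitive substring test, and that the year comes from the first year-of-films category on the page. The function search_films and the toy wikitext in PAGES are invented for the example, though the titles and years are the ones discussed above.

```python
import re

# Captures the year from a category link such as [[Category:2008 films]].
YEAR_CATEGORY = re.compile(r"\[\[Category:([0-9]{4}) films\s*[\]|]")


def search_films(pages, terms):
    """Return (title, year) for pages that mention every term.

    pages is an iterable of (title, wikitext) pairs. Pages without a
    year-of-films category are skipped. Matching is a case-insensitive
    substring test, so footnotes and soundtrack credits count as mentions.
    """
    wanted = [term.lower() for term in terms]
    hits = []
    for title, text in pages:
        year = YEAR_CATEGORY.search(text)
        if year is None:
            continue
        lowered = text.lower()
        if all(term in lowered for term in wanted):
            hits.append((title, int(year.group(1))))
    return hits


# Toy wikitext, invented for illustration only.
PAGES = [
    ("Zombie Strippers",
     "Zombies overrun a stripper club. [[Category:2008 films]]"),
    ("Kiss the Bride (2008 film)",
     "A comedy.<ref>Review of Zombie Strippers</ref> [[Category:2008 films]]"),
    ("The Sound of Music (film)",
     "A musical. [[Category:1965 films]]"),
]

for title, year in search_films(PAGES, ["stripper", "zombie"]):
    print(f"{title} -- {year}")
```

The Kiss the Bride entry shows how a footnote alone is enough to produce a hit, which is why the output still needs a human pass.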

Applications

Though the script should be a productivity boost to film subgenre scholars, it still requires a great deal of human insight to make its results valuable. It works best with very concrete and widely recognized subgenre identifiers. Any more complex critical viewpoint is obscured by the lack of a shared jargon. For instance, CRwM’s example of From Dusk till Dawn as being a pomo deadpan crime flick is hardly controversial, but insufficiently universal to appear in wikipedia entries across the subgenre.

Some other notable datapoints:

There are no stripper werewolf films. The blue movie scene near the end of American Werewolf In London is insufficiently focused to qualify.
This technique confirms no vampire stripper collision counts exceeding one in a calendar year.
Two films in 1964 were both musicals and featured strippers: Robin and the 7 Hoods and the aforementioned and new to the author The Incredibly Strange Creatures Who Stopped Living and Became Mixed-Up Zombies. Since The Sound of Music appeared in 1965, and the genre survived at least until the popularity of Cabaret, we posit that either Robin contains insufficient stripper content to qualify as a stripper musical, or that musicals, as a full-blown genre, are outside the scope of the CRwM Stripper Genre Collision Hypothesis.
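Once the false positives have been weeded out by hand, checking the hypothesis itself is a matter of grouping hits by year and flagging any year holding more than one film. A minimal sketch, with collisions as a made-up helper name and the two input lists taken from the films and years discussed above:

```python
from collections import defaultdict


def collisions(hits, limit=1):
    """Map each year with more than `limit` films to the titles in that year.

    hits is an iterable of (title, year) pairs that have already been
    checked by a human for false positives.
    """
    by_year = defaultdict(list)
    for title, year in hits:
        by_year[year].append(title)
    return {year: titles
            for year, titles in sorted(by_year.items())
            if len(titles) > limit}


# The pair CRwM placed in the same year, with wikipedia's years.
zombie_pair = [
    ("Zombie Strippers", 2008),
    ("Zombies! Zombies! Zombies!", 2007),
]

# The two 1964 stripper musicals noted above.
stripper_musicals = [
    ("Robin and the 7 Hoods", 1964),
    ("The Incredibly Strange Creatures Who Stopped Living "
     "and Became Mixed-Up Zombies", 1964),
]

print(collisions(zombie_pair))        # {} : no shared year
print(collisions(stripper_musicals))  # 1964 is flagged with both titles
```

Whether a flagged year really counts as a collision is still a judgement call about how much stripper content each film contains.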

We believe our results should be repeatable, keeping in mind the central role of a critical human eye in this endeavour. For the convenience of those cinephiles who are interested in the output but not technically inclined, a sorted listing of every stripper film in wikipedia is provided. This paragraph seems a fair bit creepier now than when we first thought of it.

Conclusion and future work

Searching for “stripper zombie” on wikipedia yields 108 results of varying quality. Using the techniques above this can be narrowed to six films. A film subset database built from an expression seems like something someone else could use. Say, to pitch a zombie werewolf stripper musical. Perhaps one that’s incredibly mixed up.

The two-stripper-flicks-a-year thing isn’t meant as a value judgment. It’s simply a law of the universe. — TNCITCM, you know, the article this whole post is about.

Conflated Automatons

Adam rambles

The Consensus Reality Based Community

A Quantitative Exploration of the CRwM Stripper Genre Collision Hypothesis