Data and the Future of Indie Film: Part 2

Kieran Masterton
Published in Assemble Academy
Mar 17, 2016

--

In the first part of this series I touched on why data is so important to the film business and why I think indie filmmakers should care about data. I also proposed a vision for a system that, I believe, could bring some of the data mining capabilities currently employed by the Majors to indie film producers. In this article, the second part of the series, I want to give some context by explaining how the Majors are already using data to produce, distribute and market their products; explore how indies could adopt similar approaches; and also discuss some of the existing projects working to open up data in the film industry.

Data Mining and the Majors

“Years ago competing films would be listed on a wall chart in a conference room. Today, this information is contained in a sophisticated movie marketing system”

Hollywood have over 100 years of film making and selling experience to call upon. This also means that, when making decisions about the production or release of a film, they have over 100 years of filmic data at their disposal. In his book, Hollywood in the New Millennium, recently published by the BFI, academic Tino Balio discusses Warner Brothers’ reliance on market research and their data mining capabilities when making key strategic decisions. Balio references Daniel Fellman, Warner Bros. President of Domestic Distribution, describing the studio’s ‘4 Star’ computer system in The Movie Business Book. Fellman discusses the extent of the system’s capabilities and the company’s applications for the data held within:

[4 Star] tracks every Warner project, as well as our competitor’s projects, in development, production or release. Our customized system can display the box office history of any actor, producer, director or film in seconds; we can analyze a marketplace, release schedule, daily grosses, reviews, demographics, trailers, TV spots, print ads, poster, Web sites, and year-to-date box office performance. Whatever information is needed regarding talent, box office receipts or research to aid in the decision making process, it’s in there. It’s also persuasive when talking with film-makers and making a case for critical decisions, such as a release date.

Fellman goes on to demonstrate some of the capabilities of the ‘4 Star’ system with two key examples. The first charts comparative grosses for one weekend and details revenue figures for “FSS” (Friday-Saturday-Sunday); “FSS WK #”, the number of weeks in release; “FSS RUNS”, the number of theatres; and “CUM”, the cumulative figures since release. The data in the example is dense and not easily digestible at first glance, but it covers the entire industry’s revenue, including WB’s own films, for one weekend. His second example is less dense but no less powerful: it charts the entire industry’s release schedule for the weeks following the opening of Harry Potter and the Sorcerer’s Stone on November 16th 2001 through to January 4th 2002. This schedule demonstrates how Warner’s system can give execs a picture of the marketplace for any given window, allowing them to position their products to achieve maximum profitability.
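To make those column headings a little more concrete, here is a minimal Python sketch of how a weekend chart of this kind could be put together. It is not a reconstruction of Warner's system: the film titles and figures are invented, and the field names are my own assumptions about what each column holds.

```python
from dataclasses import dataclass


@dataclass
class WeekendRow:
    """One film's line on a hypothetical weekend chart (all figures in dollars)."""
    title: str
    fri: int
    sat: int
    sun: int
    week_number: int   # weeks in release
    runs: int          # number of theatres
    prior_cume: int    # gross earned before this weekend

    @property
    def fss(self) -> int:
        # Friday-Saturday-Sunday gross
        return self.fri + self.sat + self.sun

    @property
    def cume(self) -> int:
        # Cumulative gross since release, including this weekend
        return self.prior_cume + self.fss


def weekend_chart(rows):
    """Rank every film in the market by its weekend gross, highest first."""
    return sorted(rows, key=lambda row: row.fss, reverse=True)


# Invented sample data, for illustration only.
rows = [
    WeekendRow("Film A", 1_000_000, 1_500_000, 1_000_000, 2, 700, 6_000_000),
    WeekendRow("Film B", 4_000_000, 5_000_000, 3_000_000, 1, 3_000, 0),
]

chart = weekend_chart(rows)
for rank, row in enumerate(chart, start=1):
    print(rank, row.title, row.fss, row.week_number, row.runs, row.cume)
```

Even a toy version like this shows why the chart is dense: every row combines the weekend's takings with how long the film has been out, how widely it is playing and what it has earned to date.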

Fellman’s examples make clear the strong emphasis the Majors place on using data to facilitate decision making. These examples might not seem that impressive by today’s standards, when you can browse to Box Office Mojo and get what appears to be similar data. However, it’s important to remember that Fellman was writing in the early ’00s and the studios’ systems have developed even further in the intervening years. Likewise, it’s critical to note that being able to access this data is only the start: it’s how you query the data, the context you place it in and the insights you draw from it that bring real value.

Warner Brothers’ approach to data is symptomatic of the behaviour of other major studios and indeed of most multinational corporations trying to achieve a competitive advantage in the global marketplace. It makes sense to leverage the information you amass about your industry to better understand your marketplace and outmanoeuvre your competition.

We know that Hollywood’s dominance is being mildly disturbed due to massive changes in the distribution landscape, but they still retain a strong foothold because they know more than anyone about selling filmic products. It’s much easier to decide to extend a theatrical window, for example, if you have real-time data combined with an existing data model that tells you how a particular type of film has performed during previous expansions. Not only that, but it’s easier still if you also have information available on the optimum number of theatres, screens and the best location of those theatres. If Google can predict the opening weekend box office gross of a movie based on search trends, then I promise you Hollywood’s internal data mining operation can tell you the same, based on historical film release data.

Producing Films in the Dark

Right now, none of these capabilities are available to independent filmmakers, producers or distributors. Unlike the Majors, Indies do not have the luxury of decades of data collection to call upon in the development, production and distribution of their products. Indies are producing films in the dark. However, it is not beyond the grasp of these producers to achieve something similar to, or even better than, the system Fellman describes. It requires real cooperation, and it’s in this cooperation that the rub will lie. As described in the first part of this series, it is technically possible for Indies to build a distributed piece of software akin to Warner’s 4 Star data mining operation. Vitally, however, unlike the Majors’ in-house systems, it requires trust and the sharing of meaningful data to really be effective.

Suppose for a moment that these trust issues were resolved and that a unified, distributed, and importantly, open system existed for accessing film data and analytics. This system would revolutionise the way Indies finance, produce, market and distribute their films. For example, instead of asking investors to take a stab in the dark, filmmakers seeking funding could present an accurate, contextualised analysis of how their previous projects performed and demonstrate projections based on how, given a specific release strategy, other projects of a similar nature behaved in the market. These data models and projections would also grow in accuracy as more data enters the system, thus building trust in its abilities as more and more companies contribute.
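As a rough illustration of what a projection from comparable titles might look like, here is a minimal Python sketch. Everything in it is an assumption on my part: the records are invented, the fields (genre, release strategy, budget, revenue) are just one plausible way to define "similar", and a real system would need far more data and a far better model.

```python
from statistics import median

# Invented records standing in for data shared by other producers.
past_releases = [
    {"title": "Comp 1", "genre": "documentary", "strategy": "day-and-date", "budget": 200_000, "revenue": 150_000},
    {"title": "Comp 2", "genre": "documentary", "strategy": "day-and-date", "budget": 250_000, "revenue": 400_000},
    {"title": "Comp 3", "genre": "documentary", "strategy": "day-and-date", "budget": 300_000, "revenue": 260_000},
    {"title": "Comp 4", "genre": "documentary", "strategy": "theatrical-first", "budget": 220_000, "revenue": 90_000},
    {"title": "Comp 5", "genre": "horror", "strategy": "day-and-date", "budget": 240_000, "revenue": 700_000},
    {"title": "Comp 6", "genre": "documentary", "strategy": "day-and-date", "budget": 2_000_000, "revenue": 1_500_000},
]


def comparable_titles(releases, genre, strategy, budget, tolerance=0.5):
    """Keep releases with the same genre and strategy and a budget within +/- tolerance."""
    low, high = budget * (1 - tolerance), budget * (1 + tolerance)
    return [
        r for r in releases
        if r["genre"] == genre and r["strategy"] == strategy and low <= r["budget"] <= high
    ]


def projection(comps):
    """Summarise comparable revenues as a low / median / high range."""
    if not comps:
        return None  # nothing comparable, so no projection
    revenues = sorted(r["revenue"] for r in comps)
    return {
        "low": revenues[0],
        "median": median(revenues),
        "high": revenues[-1],
        "sample_size": len(revenues),
    }


comps = comparable_titles(past_releases, "documentary", "day-and-date", budget=250_000)
result = projection(comps)
print(result)
```

The sample size matters as much as the range: a projection built on three comparable films is a talking point, not a forecast, which is exactly why the system becomes more useful as more companies contribute.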

Obviously, as with the systems used by the Majors, there are unforeseen circumstances and edge cases. Nobody, let alone a computer system, could, for example, have known that Madeleine McCann would disappear shortly before the proposed release of Gone Baby Gone in the UK. Such unforeseeable situations are beyond the vision of even the most advanced data analysis system. However, importantly, the kind of system I’m proposing and the sort of projects some in our industry are already working on, could get Indies a lot closer to more accurately understanding the marketplace for their products.

Data Projects of Interest

For the last couple of years I’ve been obsessed with finding anyone doing anything interesting with open film data. There are already some companies, organisations and individuals working on projects which explore the power of data in indie film in an open way. That said, none of the endeavours discussed below have ambitions on the scale of what I proposed in part 1 of this series, but they exemplify the open attitude to data I’m trying to advocate. What follows is a brief overview of what’s happening in the open film data space; if you know of any projects I’ve missed then I’d love to hear about them.

Stephen Follows’ Film Industry Data Blog

I’ve followed Stephen for a while on Twitter and find the work he does fascinating. On his blog he explores what data tells us about the film industry, in particular the British film industry. But this isn’t just a superficial glance at some meaningless statistics; this is carefully constructed data and analysis by someone who knows what he’s talking about.

The insights that Stephen offers in his blog are the kind of insights I’d like to see publicly displayed by an automated, fully-distributed data mining system for indie film. For a flavour of what Stephen writes, why not check out his latest post, Does Hollywood use the same movie release pattern every year?

VHX Stats

As you probably know, VHX are a comprehensive software-as-a-service video distribution company. Their particular focus is on multi-platform SVOD services, but they also offer a transactional and rental service. The team at VHX, in an attempt to further the process of opening VOD data, have released VHX Stats. The site, updated once a day, displays stats about the number of sales VHX have made, where they’ve made them and how the user discovered the content.

While it isn’t possible to query this data or see data by content type, etc., it is encouraging that VHX are forward thinking enough to start to share their data with the world. My hope is that eventually companies will make more data like this available via their APIs for use in an open data mining tool.

The Sundance Transparency Project

This project began life with a humble pilot study in 2013, but has more recently launched into the wild as a fully functional app. Created by Sundance, in partnership with Cinereach, the online tool aggregates financial data about the film industry and allows filmmakers using the system to query the data via a user-friendly interface.

Here at Assemble we are the lead tech partner to the Transparency Project and our founder, James Franklin, has been intimately involved with the design and build of the project. I believe that right now this project is our best hope for a collective enterprise, open tool for indie film data insight. It’s definitely worth keeping an eye on!

Producer Foundry and Media Research Associates’ State of the Film Industry Survey

Sponsored by Fandor, Stage 32 and IndieWire, this survey aims to take the pulse of the film industry and expose industry data that typically doesn’t see the light of day. It’s a bit of a shame that those outside the United States can’t fill in the survey because the postal code field won’t allow for non-US codes. However, I’m going to be really interested to read the results of the survey once they’re published. If you’re a filmmaker you should certainly consider completing the survey.

The Numbers Sites

These have been around for a while but, surprisingly, a lot of people don’t know just how much data these sites make available. The likes of BoxOffice.com, Box Office Mojo and The Numbers are all great for getting raw data about films with theatrical distribution deals and for basic comparison of film performance. However, frustratingly, they lack VOD data and, as I’m sure you’re aware, VOD is on the rampage, eating up broadcast and home entertainment markets. As a result, these resources are becoming less and less useful without this vital data. Likewise, while they expose some of the information available to the Majors in their systems, they don’t allow for custom queries or insights. Also, needless to say, these sites don’t carry data for the vast number of indie films that didn’t take the traditional distribution route.

What’s Next?

It’s obvious from this brief overview of film data projects that only Sundance’s Transparency Project is really coming close to the ambition I described in my first article. However, I think it’s clear that there’s an appetite to solve this problem of producing films in the dark. The question, as always, is one of cooperation. In the next part of this series I’m going to propose how we could take a step towards making a fully-distributed indie film data solution a reality. In the meantime, if you’ve come across any interesting film-centric data projects please message me on Twitter.
