Film Databases

Published: 28 May 2012

The visit to Shanghai is nearly over; I head to Sydney tomorrow night. I had grand ambitions for this month, as a concentrated effort toward the FilmsFolded essays. Things never work out as planned, of course. Sickness, kids. But there has been a whole lot of work started, more in foundations and databases than in content.

That is the bad news, that content hasn't yet flourished. But the good news is that these database efforts really seem worth the investment and I have a respectable start. One of the diverting efforts is for an online kutachi bibliography. That has taken second priority, and I will report on it in the kutachi section of the blog in a few days. We'll get that partially online soon.

The main effort is on the IMDB comments, cleaning them and designing how they will fit into the essays.

From now on, notes will be published in draft — or rougher — form, as the ideas emerge piecemeal. When an essay, film or some other note matures, we will let you know here, so your RSS feed will report specific advances in the work.

A Complete Database

Designing a Filemaker Pro database.

A previous note reports on the integration and the design for the reborn FilmsFolded.com UI. After working that out, I turned to the desired underlying web publishing frameworks and unsuccessfully interviewed some potential coders. I think I will defer that end of the thing for a month, because once I focused on the database itself, it grew in immediate importance and effort.

Toby wrote some scraper code that fueled the previous iterations. It just goes to IMDB (and, in more recent versions, themoviedb.org) and copies some stuff. He also wrote some accompanying scripts that apparently prettified the content, but given the state of the comments the results were inconsistent.

So I decided that, to do this right, the database has to be built pretty much from scratch, so that it is as useful as we need.

The Old Presentation

(Image) A wayback machine archive of a FF page.

This is not a faithful representation of the original page. Colors are missing and most entries have a poster displayed. But you can see here the information that was scraped: English title, year, tagline, comment title and comment text.

I have begun a Filemaker Pro (version 12) database. I’ll worry about how it gets published and married up to the web UI later. For now, the effort is just to give us a clean, useful record, and this involves a lot of repetitive, dumb work. When I get to Virginia Beach, I will hire a student to do the heavy lifting.

The reborn database is enriched beyond what we had before. We now have new fields for:

  • nationality
  • both English and (unicode) non-English titles if the film is non-English
  • year of film (we had this before, but it is not in the record I have)
  • date of original posting to IMDB
  • TheMovieDB.org entry number (if there is one)
  • the Tedg rating for films, including those before the ratings started — and incorporating the changes in the lists of fours.
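
The field list above can be pictured as a single record type. The Python sketch below is only an illustration: the real database is built in Filemaker Pro, and every field name, type and default here is an assumption, as is the idea that the rating is stored as an integer.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class FilmRecord:
    """One film entry. All field names are hypothetical, not the real schema."""
    english_title: str
    year: int                               # year of the film
    nationality: str
    original_title: Optional[str] = None    # Unicode title, only for non-English films
    imdb_posted: Optional[date] = None      # date the comment was first posted to IMDB
    tmdb_id: Optional[int] = None           # TheMovieDB.org entry number, if there is one
    rating: Optional[int] = None            # Tedg rating (integer scale assumed)

    def display_title(self) -> str:
        """Show the original title alongside the English one when both exist."""
        if self.original_title:
            return f"{self.english_title} ({self.original_title})"
        return self.english_title


# Placeholder data, not a real entry from the database.
example = FilmRecord(
    english_title="Example Film",
    year=2000,
    nationality="France",
    original_title="Film d'exemple",
)
```

The optional fields default to None so that a record can exist before the repetitive cleanup work has filled it in.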

In terms of the comments themselves:

  • spelling errors are being corrected, but I have retained my portmanteau spellings (runon words that usually would be hyphenated)
  • typographic conventions have been normalized to conform to the rest of the site (bullets, em-dash, quotes)
  • similarly, though the original IMDB text is unstyled, I do introduce italics for emphasis and for film names
  • I have a tag for whether a comment has been rewritten or extended, though no rewriting has yet occurred
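
Part of the typographic cleanup in the list above could in principle be automated. The following Python sketch is a guess at what such a pass might look like, not the procedure actually used: the function name is hypothetical, and the rules (a double hyphen becomes a spaced em-dash, straight quotes become curly quotes) are assumed conventions. Spelling fixes and italics still need a human eye.

```python
import re


def normalize_typography(text: str) -> str:
    """Apply a few assumed house conventions to raw comment text."""
    # A double hyphen, with or without surrounding spaces, becomes a spaced em-dash.
    text = re.sub(r"\s*--\s*", " \u2014 ", text)
    # A double quote at the start of the text, or after whitespace or an
    # opening bracket, is an opening quote; every remaining one closes.
    text = re.sub(r'(^|[\s(\[])"', "\\1\u201c", text)
    text = text.replace('"', "\u201d")
    # Same rule for single quotes; the remaining ones, apostrophes included,
    # become right single quotes.
    text = re.sub(r"(^|[\s(\[])'", "\\1\u2018", text)
    text = text.replace("'", "\u2019")
    return text
```

A pass like this is deliberately conservative. It does not touch spelling, so the portmanteau words survive untouched.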

A Record in the New Database

(Image) A screenshot of an entry.

The main fields are obvious here. I have not yet created the facility to characterize folding types, and instead have put some temporary comment fields at the bottom.

(I have a previous Filemaker database, a rather huge affair with a hodgepodge of information in it, including hundreds of other films. None of this is essential to the current project. It is a low priority, but that additional information may be linked in for my desktop use if it is easy. It is too early to know the difficulty. None will appear in the web interface.)

The big improvement from the current effort is that the comments will be integrated into the essays by being tagged with the folding conventions those essays address. I will be starting this effort in parallel with cleaning the database. This will probably be done through a separate table of qualities that is relationally linked to the film list.
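
The idea of a separate table of qualities linked to the film list is a standard many-to-many relationship. The sketch below shows one way it could be laid out, using Python's built-in sqlite3 module rather than Filemaker Pro. The table names, column names and sample rows are all placeholders invented for illustration; the actual design is not yet settled.

```python
import sqlite3
from typing import List

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE film (
        id    INTEGER PRIMARY KEY,
        title TEXT NOT NULL
    );
    CREATE TABLE quality (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL UNIQUE
    );
    -- Join table: one row per (film, quality) pairing.
    CREATE TABLE film_quality (
        film_id    INTEGER NOT NULL REFERENCES film(id),
        quality_id INTEGER NOT NULL REFERENCES quality(id),
        PRIMARY KEY (film_id, quality_id)
    );
""")

# Placeholder rows, not real entries.
conn.executemany("INSERT INTO film (id, title) VALUES (?, ?)",
                 [(1, "Example Film A"), (2, "Example Film B")])
conn.executemany("INSERT INTO quality (id, name) VALUES (?, ?)",
                 [(1, "placeholder-quality-1"), (2, "placeholder-quality-2")])
conn.executemany("INSERT INTO film_quality (film_id, quality_id) VALUES (?, ?)",
                 [(1, 1), (1, 2), (2, 1)])


def films_with_quality(name: str) -> List[str]:
    """Return the titles of all films tagged with the named quality."""
    rows = conn.execute(
        """SELECT film.title FROM film
           JOIN film_quality ON film_quality.film_id = film.id
           JOIN quality ON quality.id = film_quality.quality_id
           WHERE quality.name = ?
           ORDER BY film.title""",
        (name,),
    )
    return [title for (title,) in rows]
```

With this layout an essay on a given convention could pull every tagged comment with a single query, and a film can carry any number of tags without changing the film table.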

The Fours

The format for the Fours, and the list.

Largely independent of the database of comments, I have been working over on this FilmsFolded.TedGoranson side preparing the complementary groundwork. I’ll only mention the Fours here, because that is where some progress has been made.

My intent is to go through every film listed as a Four, and do some detailed annotation. As it happens, the Fours are not the very best examples for the folding essays, and work on them will be a sort of parallel effort. A helpful reader has provided me with detailed synopses of two films, and I will be starting with those. The synopses need to be quite detailed because behind the scenes we will be modeling some dynamics.

As it stands today, you can go to the section on Fours and see the current list and review the format for how I intend to handle each of these examples.

© copyright Ted Goranson, 2012