We’ve just been awarded another patent: for collaborative film interpretation. This one won’t be implemented in our system for some time, but it will be of interest to someone, possibly you. At the end, we’ll explain how we’ll implement it in redframer. In the meantime, we’d like to work with another clever team on taking advantage of it.
We can’t do sensor-forward recognition well, and we can’t support real-time, sensor-forward adaptive interpretation.
If you have a film (or image), the current state of the art in feature identification is not bad, depending on the application. We can often tell that something is a face, for instance, and relate it to a small set of known individuals. Another example: sequences of aerially captured radar images can discriminate between farm vehicles and tanks, or can identify potentially threatening behavior in groups. Typically, a video capture device sends its output to another device that performs the recognition. The capture and recognition functions are separate in time and space.
What you can’t do well without us is semantic recognition close to the sensor. You might want this if your sensor is on an autonomous aircraft and it detects an immediate threat it needs to act on. A more common reason is that you might want to instantly reconfigure the sensor/camera to give you more of the kind of information you need. Another reason is that you may have many sensors ‘looking’ at things. We’ll visit that in a moment.
The reason for not doing well in this regard is simple: the technology for capturing images is different from the technology for thinking about them.
What We Can't Do At All
Military guy: suppose you have many sensors in the air, each of which can give you only low-confidence knowledge on its own, but which together can produce a very high level of insight, allowing them to swarm collaboratively without heavy communication.
Social Platform guy: suppose you have millions of new videos a day coming into the platform and you want to know what is going on in aggregate, both for your business strategist friends and your users.
Intelligence guy: suppose you think of your monitoring of voice and text messages as streams and you want in real time to be able to ‘connect dots’ reliably.
What we can’t do is connect the reasoning systems of vastly many sources in real time so that adaptive feature recognition, assembly and deduction can be performed.
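To make the swarm scenario concrete, here is a minimal sketch of one standard way many low-confidence reports can combine into high confidence: naive-Bayes log-odds pooling, which assumes the reports are independent. This is an illustration only, not the patented method, and `fuse_detections` is a name we made up for this sketch.

```python
import math

def fuse_detections(probs, prior=0.5):
    """Fuse independent low-confidence detection probabilities
    into one posterior via naive-Bayes log-odds pooling."""
    logit = lambda p: math.log(p / (1 - p))
    # Each sensor contributes its evidence (logit minus the prior's logit).
    total = logit(prior) + sum(logit(p) - logit(prior) for p in probs)
    return 1 / (1 + math.exp(-total))

# Twenty sensors, each only 60% sure a target is present,
# together exceed 99% confidence.
posterior = fuse_detections([0.6] * 20)
```

Note that each sensor only has to transmit a single probability (or even a quantized logit), which is the "without heavy communication" point: the fusion step needs the evidence, not the raw video.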
How We Will Use This in redframer
There are about 50,000–60,000 feature films of interest to prospective redframer users. Over time, users will attach deep knowledge to these, but many elements can be identified with this system.
A simple example is building an influence network of fight/chase choreography as it has evolved.
Another is identifying certain conventions of how actors modulate action. Philip Seymour Hoffman had an amazing ability to deliver a line and then delay the associated facial expression by almost a second, so that the spoken information lives in the time frame of the film while the visible information lives in the time frame of the audience. Some directors (like Charlie Kaufman) know how to amplify this cinematically.
Suppose a user focuses on this technique, assumes it has a history in film, and triggers the system to trace back and find how we came to understand and value it.
How the Streamer Works
All reasoning systems boil down to picking the right abstractions to work with and relating them in ways to do what you need. Our group always starts with the abstractions. What we wanted were:
- Features that could be taken directly from a stream feed with special attention to video features. These should include visible info, like edges and patterns, but also implied features like change or omission.
- Features that have semantic weight, meaning that if we operate on them in certain ways, the results can be considered logical deductions. These don’t have to have semantics (in the ordinary ontologically defined sense) when extracted but have to come closer to ordinary facts the more they are handled.
- Features that are lightweight in two ways that often conflict. They have to be able to be communicated to possibly hundreds of thousands of peer processors in extremely terse statements; at the same time they have to enable operations that scale immensely within local processor arrays.
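As an illustration of how terse such feature statements could be, here is a hypothetical 16-byte wire format for a single statement. The field layout (sensor id, frame index, feature code, confidence, position) is our invention for this sketch, not anything specified in the patent.

```python
import struct

# Hypothetical 16-byte wire format for one feature statement:
# sensor id (u32), frame index (u32), feature code (u16),
# confidence scaled to 0-65535 (u16), x and y position (u16 each).
FMT = "<IIHHHH"

def pack_feature(sensor, frame, code, conf, x, y):
    """Encode one feature statement as 16 little-endian bytes."""
    return struct.pack(FMT, sensor, frame, code, int(conf * 65535), x, y)

def unpack_feature(msg):
    """Decode a 16-byte feature statement back into Python values."""
    sensor, frame, code, conf, x, y = struct.unpack(FMT, msg)
    return sensor, frame, code, conf / 65535, x, y
```

At 16 bytes per statement, broadcasting one statement per frame to hundreds of thousands of peers stays within ordinary network budgets, while the decoded fields remain cheap to index and aggregate inside a local processor array.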
As for the operations themselves, they had to be co-invented along with the features. They are:
- Operators that are lightweight enough that they can be handled, if need be, in hardware by processors located in or forming part of the sensor. These should optionally use existing digital signal processing and/or field-programmable gate arrays.
Though we mention FPGA hardware here, the idea is to be as hardware-agnostic as possible. We don’t care what the architecture or instruction set is, though reprogrammability will be useful. Some early focus was on military airborne sensors and their onboard processing and filtering systems, which typically use application-specific integrated circuits.
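As an example of the kind of trivially parallel, sensor-side operator described above, here is a sketch of a per-pixel ‘change’ detector (one of the implied features mentioned earlier). It is written in Python for clarity; the point is that the same per-pixel comparison maps naturally onto a DSP, an FPGA, or an ASIC next to the sensor. The function names and threshold are our assumptions for this sketch.

```python
def change_mask(prev, curr, threshold=16):
    """Per-pixel change detector over two equal-length flat lists
    of 0-255 intensities; each pixel is independent, so the loop
    parallelizes trivially in hardware."""
    return [1 if abs(a - b) > threshold else 0 for a, b in zip(prev, curr)]

def change_fraction(prev, curr, threshold=16):
    """Reduce the mask to a single terse statistic a sensor could
    broadcast: the fraction of the frame that changed."""
    mask = change_mask(prev, curr, threshold)
    return sum(mask) / len(mask)
```

A single number like `change_fraction` is exactly the sort of lightweight, semantically loadable output that can be shipped to peers, while the full mask stays local for heavier in-array operations.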