Scientists create AI that ‘watches’ videos by mimicking the brain


Imagine an artificial intelligence (AI) model that can watch and understand moving images with the subtlety of a human brain. Now, scientists at Scripps Research have made this a reality by creating MovieNet: an innovative AI that processes videos much like how our brains interpret real-life scenes as they unfold over time.

This brain-inspired AI model, detailed in a study published in the Proceedings of the National Academy of Sciences on November 19, 2024, can perceive moving scenes by simulating how neurons — or brain cells — make real-time sense of the world. Conventional AI excels at recognizing still images, but MovieNet introduces a method for machine-learning models to recognize complex, changing scenes — a breakthrough that could transform fields from medical diagnostics to autonomous driving, where discerning subtle changes over time is crucial. MovieNet is also more accurate and environmentally sustainable than conventional AI.

“The brain doesn’t just see still frames; it creates an ongoing visual narrative,” says senior author Hollis Cline, PhD, the director of the Dorris Neuroscience Center and the Hahn Professor of Neuroscience at Scripps Research. “Static image recognition has come a long way, but the brain’s capacity to process flowing scenes — like watching a movie — requires a much more sophisticated form of pattern recognition. By studying how neurons capture these sequences, we’ve been able to apply similar principles to AI.”

To create MovieNet, Cline and first author Masaki Hiramoto, a staff scientist at Scripps Research, examined how the brain processes real-world scenes as short sequences, similar to movie clips. Specifically, the researchers studied how tadpole neurons responded to visual stimuli.

“Tadpoles have a very good visual system, plus we know that they can detect and respond to moving stimuli efficiently,” explains Hiramoto.

He and Cline identified neurons that respond to movie-like features — such as shifts in brightness and image rotation — and can recognize objects as they move and change. Located in the brain’s visual processing region known as the optic tectum, these neurons assemble parts of a moving image into a coherent sequence.

Think of this process as similar to a lenticular puzzle: each piece alone may not make sense, but together they form a complete image in motion. Different neurons process various “puzzle pieces” of a real-life moving image, which the brain then integrates into a continuous scene.

The researchers also found that the tadpoles’ optic tectum neurons distinguished subtle changes in visual stimuli over time, capturing information in dynamic clips of roughly 100 to 600 milliseconds rather than in still frames. These neurons are highly sensitive to patterns of light and shadow, and each neuron’s response to a specific part of the visual field helps construct a detailed map of a scene to form a “movie clip.”
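To give a rough sense of what a 100 to 600 millisecond clip means for ordinary video, the sketch below converts a clip duration into a number of frames and slices a recording into such clips. It is only an illustration and not MovieNet's actual method: the function name clip_windows, the 30 frames-per-second rate, the 2-second recording, and the 100 millisecond stride are all assumptions chosen for the example.

```python
def clip_windows(num_frames, fps, clip_ms, stride_ms):
    """Return (start, end) frame-index pairs for fixed-length clips.

    Hypothetical helper for illustration only; the end index is exclusive.
    """
    # Convert durations in milliseconds to whole numbers of frames.
    clip_len = max(1, round(clip_ms * fps / 1000))
    stride = max(1, round(stride_ms * fps / 1000))
    # Slide a window of clip_len frames across the recording.
    return [(start, start + clip_len)
            for start in range(0, num_frames - clip_len + 1, stride)]


# Assumed example: a 2-second recording at 30 frames per second (60 frames).
for clip_ms in (100, 600):
    windows = clip_windows(num_frames=60, fps=30, clip_ms=clip_ms, stride_ms=100)
    frames_per_clip = windows[0][1] - windows[0][0]
    print(f"{clip_ms} ms clip: {frames_per_clip} frames, {len(windows)} clips")
```

At the assumed frame rate, the shortest clips in the reported range span only a few frames and the longest span well under a second of footage, which is why a single still frame cannot carry the same information.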

Cline and Hiramoto trained MovieNet to emulate this brain-like processing and encode video clips as a series of small, recognizable visual cues. This permitted the AI model to distinguish subtle differences among dynamic scenes.

To test MovieNet, the researchers showed it video clips of tadpoles swimming under different conditions. Not only did MovieNet achieve 82.3 percent accuracy in distinguishing normal versus abnormal swimming behaviors, but it exceeded the abilities of trained human observers by about 18 percent. It even outperformed existing AI models such as Google’s GoogLeNet — which achieved just 72 percent accuracy despite its extensive training and processing resources.

“This is where we saw real potential,” points out Cline.

The team determined that MovieNet was not only better than current AI models at understanding changing scenes but also used less data and processing time. MovieNet’s ability to simplify data without sacrificing accuracy also sets it apart from conventional AI. By breaking down visual information into essential sequences, MovieNet effectively compresses data like a zipped file that retains critical details.

Beyond its high accuracy, MovieNet is an eco-friendly AI model. Conventional AI processing demands immense energy, leaving a heavy environmental footprint. MovieNet’s reduced data requirements offer a greener alternative that conserves energy while performing at a high standard.

“By mimicking the brain, we’ve managed to make our AI far less demanding, paving the way for models that aren’t just powerful but sustainable,” says Cline. “This efficiency also opens the door to scaling up AI in fields where conventional methods are costly.”

In addition, MovieNet has the potential to reshape medicine. As the technology advances, it could become a valuable tool for identifying subtle changes in early-stage conditions, such as detecting irregular heart rhythms or spotting the first signs of neurodegenerative diseases like Parkinson’s. For example, small motor changes related to Parkinson’s that are often hard for human eyes to discern could be flagged by the AI early on, giving clinicians valuable time to intervene.

Furthermore, MovieNet’s ability to perceive changes in tadpole swimming patterns when tadpoles were exposed to chemicals could lead to more precise drug screening techniques, as scientists could study dynamic cellular responses rather than relying on static snapshots.

“Current methods miss critical changes because they can only analyze images captured at intervals,” remarks Hiramoto. “Observing cells over time means that MovieNet can track the subtlest changes during drug testing.”

Looking ahead, Cline and Hiramoto plan to continue refining MovieNet’s ability to adapt to different environments, enhancing its versatility and potential applications.

“Taking inspiration from biology will continue to be a fertile area for advancing AI,” says Cline. “By designing models that think like living organisms, we can achieve levels of efficiency that simply aren’t possible with conventional approaches.”

This work, described in the study “Identification of movie encoding neurons enables movie recognition AI,” was supported by funding from the National Institutes of Health (RO1EY011261, RO1EY027437 and RO1EY031597), the Hahn Family Foundation and the Harold L. Dorris Neurosciences Center Endowment Fund.
