M2M-Gen: Background Music Generation for Japanese Manga using Large Language Models

Abstract

This paper introduces M2M-Gen, a multimodal framework for generating background music tailored to Japanese manga. The key challenges in this task are the lack of available data and the absence of a baseline. M2M-Gen is an automated pipeline that produces background music for an input manga book. We first use the dialogues in a manga to detect scene boundaries, then perform emotion classification and generate detailed captions for each page within a scene. GPT-4o transforms these detailed music captions into high-level musical directives that guide a text-to-music model to produce music aligned with the manga's evolving narrative. The effectiveness of M2M-Gen is confirmed through extensive subjective evaluations, which show that it can significantly enhance the manga reading experience by pairing each scene with music that complements it.

M2M-Gen Pipeline
Fig. 1 Our background music generation pipeline takes a manga book as input and generates a tailored audio music file for manga scenes.
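
The stage order described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: every name here (`Page`, `Scene`, `split_into_scenes`, `run_pipeline`, and the `toy_*` functions) is hypothetical. The toy functions are trivial stand-ins for the real components (dialogue-based boundary detection, emotion classification, captioning, GPT-4o, and the text-to-music model), and running emotion classification once per page is an assumption.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Page:
    number: int
    dialogues: List[str]


@dataclass
class Scene:
    pages: List[Page]
    emotions: List[str] = field(default_factory=list)
    captions: List[str] = field(default_factory=list)
    directive: str = ""


def split_into_scenes(pages: List[Page],
                      is_boundary: Callable[[Page, Page], bool]) -> List[Scene]:
    """Group consecutive pages; is_boundary(prev, cur) marks a new scene."""
    scenes: List[Scene] = []
    for page in pages:
        if not scenes or is_boundary(scenes[-1].pages[-1], page):
            scenes.append(Scene(pages=[page]))
        else:
            scenes[-1].pages.append(page)
    return scenes


def run_pipeline(pages, is_boundary, classify_emotion, caption_page,
                 to_directive, text_to_music) -> Tuple[List[Scene], List[str]]:
    """Chain the stages: scenes -> emotions -> captions -> directive -> music."""
    scenes = split_into_scenes(pages, is_boundary)
    clips = []
    for scene in scenes:
        scene.emotions = [classify_emotion(p) for p in scene.pages]
        scene.captions = [caption_page(p, e)
                          for p, e in zip(scene.pages, scene.emotions)]
        scene.directive = to_directive(scene.captions)
        clips.append(text_to_music(scene.directive))
    return scenes, clips


# Toy stand-ins (assumptions for illustration only, not real models).
def toy_is_boundary(prev: Page, cur: Page) -> bool:
    # Contrived marker; the real system infers boundaries from dialogue.
    return bool(cur.dialogues) and cur.dialogues[0].startswith("[NEW SCENE]")


def toy_classify_emotion(page: Page) -> str:
    return "tense" if any("!" in line for line in page.dialogues) else "calm"


def toy_caption_page(page: Page, emotion: str) -> str:
    return f"page {page.number}: {emotion} mood, {len(page.dialogues)} line(s)"


def toy_to_directive(captions: List[str]) -> str:
    # In the paper this step is performed by GPT-4o; here we just join text.
    return "Background music for: " + "; ".join(captions)


def toy_text_to_music(directive: str) -> str:
    # A real text-to-music model would return audio; we return a label.
    return f"<audio for: {directive}>"
```

Passing each stage in as a callable keeps the control flow separate from the models, so any stand-in can be swapped for a real component without changing `run_pipeline`.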

The following sections provide examples of background music generated for manga scenes using M2M-Gen, a baseline and a random model.

ARMS (Genre - Battle)

Courtesy of Kato Masaki

M2M-Gen

Baseline

Random

Tasogare Tsushin (Genre - Horror)

Courtesy of Tanaka Masato

M2M-Gen

Baseline

Random

Totteoki No ABC (Genre - Romantic Comedy)

Courtesy of Aida Mayumi

M2M-Gen

Baseline

Random

Nichijou Soup (Genre - Humour)

Courtesy of Shindou Uni

M2M-Gen

Baseline

Random

Gakuen Noise (Genre - Battle)

Courtesy of Inohara Daisuke

M2M-Gen

Baseline

Random

Kuroido Ganka (Genre - Suspense)

Courtesy of Taira Masamie

M2M-Gen

Baseline

Random