"Towards fully automated manga translation." "Due to the high cost of translation, most comics have not been translated and are only available in their domestic markets. What if all comics could be immediately translated into any language?"
"What makes the translation of comics difficult?" Utterances by a character are divided up into multiple bubbles, interlaced with bubbles from other characters. They are not necessarily aligned in a straightforward way such as from left to right or right to left or top to bottom. It is necessary to figure out the visual structure just to figure out what the correct order is to arrange the bubbles. Not only that, but the meaning of the pictures has to be deciphered to do a correct translation, for example, some Japanese words can be translated into both "he", "him", "she", or "her", and without understanding the pictures, these ambiguities can't be resolved.
So what these researchers did was build a manga translation dataset, a massive "parallel corpus" -- manga that have already been translated between languages, with the translated versions paired up with the originals -- that an AI system can train on, along with a system for extracting pictures and text from the corpus. From this they built a "multimodal context-aware translation system" that fully automates the manga translation process. By "multimodal", they mean the same system takes input in the form of both text and pictures (so two "modes"), and by "context-aware", they mean the language translation takes into account the context provided by the pictures.
The translation system receives a manga image and a set of texts. The image represents a whole page, not just a single frame, and the system figures out whether all the frames depict the same scene or whether there are multiple scenes. Each of the texts has a "bounding box" that indicates where it came from in the image. Faster R-CNN, a convolutional object-detection network, is used to detect the frames and determine which scene or scenes they belong to.
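The input described above can be sketched as a simple data structure: a page image plus a list of texts, each with its bounding box. This is a minimal illustration of the shape of the data, not the paper's actual code; the class and field names are made up for clarity.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class TextBox:
    text: str                         # text recognized inside one speech bubble
    bbox: Tuple[int, int, int, int]   # (x, y, width, height) in page coordinates

@dataclass
class PageInput:
    image_path: str        # the whole manga page, not a single frame
    texts: List[TextBox]   # every detected text, with where it sits on the page

page = PageInput(
    image_path="page_001.png",   # hypothetical file name
    texts=[
        TextBox("こんにちは", (620, 40, 90, 60)),
        TextBox("元気?", (180, 55, 70, 50)),
    ],
)
print(len(page.texts))  # 2
```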
Next the reading order is determined. This is first estimated by a simple algorithm that assumes frames are arranged in rows read top to bottom, with the frames within each row read right to left (for Japanese). This gets the reading order about 92% correct. The remaining 8% of irregular cases are handled by a supervised learning algorithm.
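The simple row-based heuristic can be sketched as follows. This is an illustrative reconstruction under stated assumptions (frames grouped into rows by their top edge, then sorted rightmost-first within a row), not the paper's exact algorithm; the `row_tolerance` parameter is invented for the sketch.

```python
def reading_order(frames, row_tolerance=40):
    """Order frame bounding boxes for Japanese reading: rows top to bottom,
    frames within a row right to left.

    frames: list of (x, y, w, h) tuples in page coordinates.
    """
    # Group frames into rows: frames whose top edges are within
    # row_tolerance pixels of a row's first frame join that row.
    rows = []
    for f in sorted(frames, key=lambda f: f[1]):  # scan top to bottom
        for row in rows:
            if abs(row[0][1] - f[1]) <= row_tolerance:
                row.append(f)
                break
        else:
            rows.append([f])
    # Within each row, read right to left (largest x first).
    ordered = []
    for row in rows:
        ordered.extend(sorted(row, key=lambda f: -f[0]))
    return ordered

frames = [(0, 0, 100, 100), (200, 0, 100, 100), (100, 150, 100, 100)]
print(reading_order(frames))
# top row right-to-left first, then the lower frame
```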
Next a system called illustration2vec is used to create a vector describing the scene (this is the "embedding" idea, though they don't use the word). The output of this is tags such as "boy" and "girl".
To do the actual translation, the input text is all concatenated together, translated all at once, and then broken back apart into the proper speech bubbles on the other end. To make this work, though, tags from illustration2vec are inserted into the text, which provide context to the translation system.
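The concatenate-tag-translate-split flow can be sketched like this. The token formats (`<tag:...>`, `<sep>`) are invented for illustration and are not the paper's exact markup; the point is only that scene tags are prepended as context and separator tokens let the translated output be split back into bubbles.

```python
def build_translation_input(bubble_texts, scene_tags):
    """Concatenate the ordered bubble texts into one source string,
    prefixed with scene tags from illustration2vec as context."""
    context = " ".join(f"<tag:{t}>" for t in scene_tags)
    return context + " " + " <sep> ".join(bubble_texts)

def split_translation(translated):
    """Split the translated output back into per-bubble strings."""
    return [part.strip() for part in translated.split("<sep>")]

src = build_translation_input(["彼は来た", "行こう"], ["boy", "girl"])
print(src)  # <tag:boy> <tag:girl> 彼は来た <sep> 行こう

# After the (hypothetical) translation model runs on src:
bubbles = split_translation("He came <sep> Let's go")
print(bubbles)  # ['He came', "Let's go"]
```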
All of this glosses over additional complex problems the researchers had to solve to make this work, such as identifying where text is in an image, figuring out what style the text is written in (since text in manga is lettered in various styles), and extracting the text itself.

Towards Fully Automated Manga Translation
We tackle the problem of machine translation of manga, Japanese comics. Manga translation involves two important problems in machine translation: context-aware and multimodal translation. Since text and images are mixed up in an unstructured fashion in manga, obtaining context from the image is essential for manga translation. However, how to extract context from the image and integrate it into MT models remains an open problem. In addition, corpora and benchmarks to train and evaluate such models are currently unavailable. In this paper, we make the following four contributions that establish the foundation of manga translation research. First, we propose a multimodal context-aware translation framework. We are the first to incorporate context information obtained from the manga image. It enables us to translate texts in speech bubbles that cannot be translated without using context information (e.g., texts in other speech bubbles, gender of speakers, etc.). Second, for training the model, we propose an approach to automatic corpus construction from pairs of original manga and their translations, by which a large parallel corpus can be constructed without any manual labeling. Third, we created a new benchmark to evaluate manga translation. Finally, on top of our proposed methods, we devised a first comprehensive system for fully automated manga translation.