\documentclass[final]{beamer}

\usepackage[T1]{fontenc}
\usepackage{lmodern}
\usepackage[size=custom,width=120,height=72,scale=1.0]{beamerposter}
\usetheme{gemini}
\usecolortheme{cam}
\usepackage{graphicx}
\usepackage{booktabs}
\usepackage[numbers]{natbib}
\usepackage{tikz}
\usepackage{pgfplots}
\pgfplotsset{compat=1.14}
\usepackage{anyfontsize}

\definecolor{nipspurple}{RGB}{94,46,145}
\setbeamercolor{headline}{bg=white, fg=black}
\setbeamercolor{block title}{bg=nipspurple, fg=white}
\addtobeamertemplate{block begin}{
  \setlength{\textpaddingtop}{0.2em}
  \setlength{\textpaddingbottom}{0.2em}
}{}

\newlength{\sepwidth}
\newlength{\colwidth}
\setlength{\sepwidth}{0.025\paperwidth}
\setlength{\colwidth}{0.3\paperwidth}

\newcommand{\separatorcolumn}{\begin{column}{\sepwidth}\end{column}}

\title{Paper2Poster: Towards Multimodal Poster Automation from Scientific Papers}

\author{Wei Pang\textsuperscript{1}, Kevin Qinghong Lin\textsuperscript{2}, Xiangru Jian\textsuperscript{1}, Xi He\textsuperscript{1}, Philip Torr\textsuperscript{3}}

\institute[shortinst]{\textsuperscript{1}University of Waterloo; \textsuperscript{2}National University of Singapore; \textsuperscript{3}University of Oxford}

\footercontent{
  \href{https://paper2poster.github.io/}{https://paper2poster.github.io/} \hfill
  Generated by Paper2Poster \hfill
}

\logoright{\includegraphics[height=5cm]{logos/right_logo.png}}
\logoleft{\includegraphics[height=4cm]{logos/left_logo.png}}

\setbeamerfont{title}{size=\huge}
\setbeamerfont{author}{size=\Large}
\setbeamerfont{institute}{size=\large}
\setbeamerfont{block title}{size=\Large}
\setbeamerfont{block body}{size=\large}

\begin{document}

\addtobeamertemplate{headline}{}
{
  \begin{tikzpicture}[remember picture,overlay]
    % Empty placeholder node; the logos are set with \logoleft and \logoright.
    \node [anchor=north west, inner sep=3cm] at ([xshift=0.0cm,yshift=1.0cm]current page.north west) {};
  \end{tikzpicture}
}

\begin{frame}[t]
\begin{columns}[t]
\separatorcolumn
\begin{column}{\colwidth}

\begin{block}{Why Posters Are Hard}
  We tackle \textbf{single-page multimodal compression}: a dense paper must become a legible poster under \textcolor{red}{tight spatial constraints}. Pure LLM or VLM approaches \textbf{struggle with layout}, losing \textit{reading order} and \textbf{overflow control}. We show that \textcolor{blue}{visual-in-the-loop} planning is key to \textbf{clarity}, \textbf{balance}, and \textbf{engagement}.

  \begin{figure}
    \centering
    \includegraphics[width=0.80\linewidth]{figures/paper-picture-1.png}
  \end{figure}
\end{block}

\begin{block}{Benchmark \& Task}
  We introduce \textbf{Paper2Poster}, a benchmark for the task of generating a \textbf{single-page}, well-balanced poster that faithfully conveys a paper's core ideas. The protocol measures \textit{what matters}: \textbf{visual alignment}, \textbf{text fluency}, \textbf{holistic quality}, and knowledge transfer via \textcolor{blue}{PaperQuiz}. This setup \textbf{standardizes evaluation} for automated poster generation.
\end{block}

\begin{block}{Curated Diverse Dataset}
  The dataset spans \textcolor{blue}{100} paper--poster pairs (NeurIPS, ICML, ICLR). Papers average \textcolor{blue}{22.6} pages and \textcolor{blue}{20K+} tokens; posters average \textcolor{blue}{1.4K} tokens. We observe \textbf{14.4$\times$} text compression and \textbf{2.6$\times$} figure reduction. Topics include CV (\textcolor{blue}{19\%}), NLP (\textcolor{blue}{17\%}), and RL (\textcolor{blue}{10\%}), a spread that tests \textbf{robustness}.

  \begin{figure}
    \centering
    \includegraphics[width=0.80\linewidth]{figures/paper-picture-6.png}
  \end{figure}
\end{block}

\end{column}
\separatorcolumn
\begin{column}{\colwidth}

\begin{block}{Four-Pronged Evaluation}
  Our \textbf{four-pronged} suite tests end-to-end quality: (1) Visual Quality via \textcolor{blue}{AltCLIP} similarity and \textbf{figure relevance}; (2) Textual Coherence via \textcolor{blue}{PPL} (perplexity under Llama-2-7B); (3) VLM-as-Judge across \textbf{6 criteria}; and (4) \textcolor{blue}{PaperQuiz}, whose length-aware penalties reward \textbf{dense, readable} designs.

  \begin{figure}
    \centering
    \includegraphics[width=0.80\linewidth]{figures/paper-picture-7.png}
  \end{figure}
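
  For reference, a sketch of the perplexity score, assuming the standard token-level definition (the symbols $N$, $x_i$, and $p_\theta$ are illustrative notation, not taken from the paper). For poster text of $N$ tokens $x_1, \dots, x_N$ scored by a language model $p_\theta$:
  \[
    \mathrm{PPL} = \exp\!\left(-\frac{1}{N}\sum_{i=1}^{N}\log p_\theta\left(x_i \mid x_{<i}\right)\right)
  \]
  Under this definition, lower PPL indicates text that the scoring model finds more predictable, which is read here as more fluent.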
\end{block}

\begin{block}{PosterAgent Pipeline}
  PosterAgent is \textbf{top-down} and \textbf{visual-in-the-loop}. The \textit{Parser} builds a semantic asset library; the \textit{Planner} aligns text--visual pairs and uses \textcolor{blue}{binary-tree} layouts to preserve \textbf{reading order}. The \textit{Painter-Commenter} loop renders panels, applies \textcolor{blue}{zoom-in} VLM feedback, and fixes \textbf{overflow} and \textbf{alignment}, yielding concise, coherent posters.

  \begin{figure}
    \centering
    \includegraphics[width=0.80\linewidth]{figures/paper-picture-8.png}
  \end{figure}
\end{block}

\begin{block}{Main Results}
  Across metrics, \textbf{PosterAgent} variants beat multi-agent baselines. We attain \textcolor{blue}{state-of-the-art figure relevance} and near-\textbf{human} visual similarity. Posters generated directly as images by GPT-4o look appealing but show \textcolor{red}{noisy text} and high \textcolor{red}{PPL}. VLM-as-Judge scores place PosterAgent-4o at \textcolor{blue}{3.72} overall, approaching ground-truth posters.

  \begin{figure}
    \centering
    \includegraphics[width=0.80\linewidth]{figures/paper-table-1.png}
  \end{figure}
\end{block}

\end{column}
\separatorcolumn
\begin{column}{\colwidth}

\begin{block}{PaperQuiz Insights}
  \textcolor{blue}{PaperQuiz} tracks human judgment and rewards \textbf{informative brevity}. With length penalties applied, ground-truth posters lead and \textbf{PosterAgent} tops the automated methods. Open-source \textcolor{blue}{Qwen-2.5} stacks stay \textbf{competitive}. Stronger reader VLMs exploit \textbf{structured layouts}, which outperform blog-like outputs and \textcolor{red}{text-garbling} image generations.

  \begin{figure}
    \centering
    \includegraphics[width=0.80\linewidth]{figures/paper-picture-9.png}
  \end{figure}
\end{block}

\begin{block}{Efficient, Open, Scalable}
  Our pipeline cuts token usage by \textcolor{blue}{60--87\%}. PosterAgent-4o uses \textcolor{blue}{101K} tokens (\textcolor{blue}{\$0.55}); PosterAgent-Qwen uses \textcolor{blue}{47.6K} (\textcolor{blue}{\$0.0045}). Runtime is $\approx$\,\textcolor{blue}{4.5 min}. \textcolor{red}{Bottleneck}: sequential panel refinement. \textbf{Future work}: parallel refinement, external knowledge, and a human in the loop to boost \textbf{engagement}.

  \begin{figure}
    \centering
    \includegraphics[width=0.80\linewidth]{figures/paper-table-8.png}
  \end{figure}
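
  A rough sanity check on cost, derived only from the rounded token counts and dollar amounts quoted in this block. These are blended rates over all tokens in a run, not a provider's list price, and the results are approximate:
  \[
    \frac{\$0.55}{0.101\ \text{M tokens}} \approx \$5.4 \text{ per 1M tokens},
    \qquad
    \frac{\$0.0045}{0.0476\ \text{M tokens}} \approx \$0.095 \text{ per 1M tokens}
  \]
  On these numbers, the Qwen variant costs about $0.55 / 0.0045 \approx 122\times$ less per run while using roughly half the tokens.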
\end{block}

\end{column}
\separatorcolumn
\end{columns}
\end{frame}

\end{document}