% Unofficial University of Cambridge Poster Template
% https://github.com/andiac/gemini-cam
% a fork of https://github.com/anishathalye/gemini
% also refer to https://github.com/k4rtik/uchicago-poster

\documentclass[final]{beamer}

% ====================
% Packages
% ====================

\usepackage[T1]{fontenc}
\usepackage{lmodern}
\usepackage[size=custom,width=120,height=72,scale=1.0]{beamerposter}
\usetheme{gemini}
\usecolortheme{cam}
\usepackage{graphicx}
\usepackage{booktabs}
\usepackage[numbers]{natbib}
\usepackage{tikz}
\usepackage{pgfplots}
\pgfplotsset{compat=1.14}
\usepackage{anyfontsize}

\definecolor{nipspurple}{RGB}{94,46,145}
\setbeamercolor{headline}{bg=white, fg=black}
\setbeamercolor{block title}{bg=nipspurple, fg=white}
\addtobeamertemplate{block begin}{
  \setlength{\textpaddingtop}{0.2em}%
  \setlength{\textpaddingbottom}{0.2em}%
}{}
% ====================
% Lengths
% ====================

% If you have N columns, choose \sepwidth and \colwidth such that
% (N+1)*\sepwidth + N*\colwidth = \paperwidth
\newlength{\sepwidth}
\newlength{\colwidth}
\setlength{\sepwidth}{0.025\paperwidth}
\setlength{\colwidth}{0.3\paperwidth}
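%
% Worked check for the three-column layout used below (N = 3):
%   (3+1)*0.025 + 3*0.3 = 0.1 + 0.9 = 1.0, i.e. exactly \paperwidth.
%
% A minimal sketch of a four-column alternative, assuming the separator
% width stays at 0.025\paperwidth. These values are illustrative only and
% are not part of the original template; the body would also need a fourth
% column and a fifth \separatorcolumn.
%   (4+1)*0.025 + 4*0.21875 = 0.125 + 0.875 = 1.0
% \setlength{\sepwidth}{0.025\paperwidth}
% \setlength{\colwidth}{0.21875\paperwidth}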

\newcommand{\separatorcolumn}{\begin{column}{\sepwidth}\end{column}}

% ====================
% Title
% ====================

\title{Paper2Poster: Towards Multimodal Poster Automation from Scientific Papers}

\author{Wei Pang\textsuperscript{1}, Kevin Qinghong Lin\textsuperscript{2}, Xiangru Jian\textsuperscript{1}, Xi He\textsuperscript{1}, Philip Torr\textsuperscript{3}}

\institute[shortinst]{\textsuperscript{1} University of Waterloo; \textsuperscript{2} National University of Singapore; \textsuperscript{3} University of Oxford}

% ====================
% Footer (optional)
% ====================

\footercontent{
  \href{https://paper2poster.github.io/}{https://paper2poster.github.io/} \hfill
  Generated by Paper2Poster \hfill
  }
% (can be left out to remove footer)

% ====================
% Logo (optional)
% ====================

% use this to include logos on the left and/or right side of the header:
\logoright{\includegraphics[height=5cm]{logos/right_logo.png}}
\logoleft{\includegraphics[height=4cm]{logos/left_logo.png}}

% ====================
% Body
% ====================


% --- injected font tweaks ---
\setbeamerfont{title}{size=\huge}
\setbeamerfont{author}{size=\Large}
\setbeamerfont{institute}{size=\large}
\setbeamerfont{block title}{size=\Large}
\setbeamerfont{block body}{size=\large}
\begin{document}

% Refer to https://github.com/k4rtik/uchicago-poster
% logo: https://www.cam.ac.uk/brand-resources/about-the-logo/logo-downloads
\addtobeamertemplate{headline}{}
{
    \begin{tikzpicture}[remember picture,overlay]
      % Empty placeholder node: TikZ requires node content and a terminating semicolon.
      \node [anchor=north west, inner sep=3cm] at ([xshift=0.0cm,yshift=1.0cm]current page.north west) {};
    \end{tikzpicture}
}

\begin{frame}[t]
\begin{columns}[t]
\separatorcolumn
\begin{column}{\colwidth}
\begin{block}{Why Posters Are Hard}
We tackle \textbf{single-page multimodal compression}: dense papers must become legible posters under \textcolor{red}{tight spatial constraints}. Pure LLM or VLM approaches \textbf{struggle with layout}, missing \textit{reading order} and \textbf{overflow control}. We show that \textcolor{blue}{visual-in-the-loop} planning is key to \textbf{clarity}, \textbf{balance}, and \textbf{engagement}.

\begin{figure}
\centering
\includegraphics[width=0.80\linewidth]{figures/paper-picture-1.png}
\end{figure}

\end{block}

\begin{block}{Benchmark \& Task}
We introduce the \textbf{Paper2Poster} benchmark and its task: generate a \textbf{single-page}, well-balanced poster that faithfully conveys a paper's core ideas. The protocol measures \textit{what matters}: \textbf{visual alignment}, \textbf{text fluency}, \textbf{holistic quality}, and knowledge transfer via \textcolor{blue}{PaperQuiz}. Our setup \textbf{standardizes evaluation} for automated poster generation.
\end{block}

\begin{block}{Curated Diverse Dataset}
The dataset spans \textcolor{blue}{100} paper--poster pairs (NeurIPS, ICML, ICLR). Papers average \textcolor{blue}{22.6} pages and \textcolor{blue}{20K+} tokens; posters average \textcolor{blue}{1.4K} tokens. We observe \textbf{14.4$\times$} text compression and \textbf{2.6$\times$} figure reduction. Coverage includes CV (\textcolor{blue}{19\%}), NLP (\textcolor{blue}{17\%}), and RL (\textcolor{blue}{10\%}), a topical spread that supports \textbf{robustness}.

\begin{figure}
\centering
\includegraphics[width=0.80\linewidth]{figures/paper-picture-6.png}
\end{figure}

\end{block}

\end{column}
\separatorcolumn
\begin{column}{\colwidth}
\begin{block}{Four-Pronged Evaluation}
Our \textbf{four-pronged} suite tests end-to-end quality: Visual Quality via \textcolor{blue}{AltCLIP} similarity and \textbf{figure relevance}; Textual Coherence via perplexity (\textcolor{blue}{PPL}, Llama-2-7B); VLM-as-Judge across \textbf{6 criteria}; and \textcolor{blue}{PaperQuiz}, whose length-aware penalties reward \textbf{dense, readable} designs.

\begin{figure}
\centering
\includegraphics[width=0.80\linewidth]{figures/paper-picture-7.png}
\end{figure}

\end{block}

\begin{block}{PosterAgent Pipeline}
PosterAgent is \textbf{top-down, visual-in-the-loop}. The \textit{Parser} builds a semantic asset library; the \textit{Planner} aligns text--visual pairs and uses \textcolor{blue}{binary-tree} layouts to preserve \textbf{reading order}. The \textit{Painter-Commenter} renders panels, applies \textcolor{blue}{zoom-in} VLM feedback, and fixes \textbf{overflow} and \textbf{alignment}, yielding concise, coherent posters.

\begin{figure}
\centering
\includegraphics[width=0.80\linewidth]{figures/paper-picture-8.png}
\end{figure}

\end{block}

\begin{block}{Main Results}
Across metrics, \textbf{PosterAgent} variants beat multi-agent baselines. We attain \textcolor{blue}{leading figure relevance} and near-\textbf{human} visual similarity. GPT-4o pixel posters look good but show \textcolor{red}{noisy text} and high \textcolor{red}{PPL}. VLM-as-Judge scores place PosterAgent-4o at \textcolor{blue}{3.72} overall, approaching ground-truth (GT) posters.

\begin{figure}
\centering
\includegraphics[width=0.80\linewidth]{figures/paper-table-1.png}
\end{figure}

\end{block}

\end{column}
\separatorcolumn
\begin{column}{\colwidth}
\begin{block}{PaperQuiz Insights}
\textcolor{blue}{PaperQuiz} tracks human judgment and rewards \textbf{informative brevity}. With penalties, GT posters lead; \textbf{PosterAgent} tops automated methods. Open-source \textcolor{blue}{Qwen-2.5} stacks stay \textbf{competitive}. Stronger reader VLMs exploit \textbf{structured layouts}, outperforming blog-like or \textcolor{red}{text-garbling} image generations.

\begin{figure}
\centering
\includegraphics[width=0.80\linewidth]{figures/paper-picture-9.png}
\end{figure}

\end{block}

\begin{block}{Efficient, Open, Scalable}
Our pipeline cuts token usage by \textcolor{blue}{60--87\%}. PosterAgent-4o uses \textcolor{blue}{101K} tokens (\textcolor{blue}{\$0.55}); PosterAgent-Qwen uses \textcolor{blue}{47.6K} (\textcolor{blue}{\$0.0045}). Runtime $\approx$ \textcolor{blue}{4.5 min}. \textcolor{red}{Bottleneck}: sequential panel refinement. \textbf{Future work} on parallelism, external knowledge, and human-in-the-loop editing may boost \textbf{engagement}.

\begin{figure}
\centering
\includegraphics[width=0.80\linewidth]{figures/paper-table-8.png}
\end{figure}

\end{block}

\end{column}
\separatorcolumn
\end{columns}
\end{frame}

\end{document}