We develop infrastructure for the evaluation of generated text.
Display benchmark results in a resizable iframe
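
As a minimal sketch of one way a resizable iframe could be produced, the snippet below builds an HTML fragment that wraps an iframe in a container the user can drag to resize. The function name `resizable_iframe`, its parameters, and the example URL are hypothetical and not taken from this project; the approach relies only on the standard CSS `resize` property, which takes effect only when `overflow` is set to something other than `visible`.

```python
from html import escape


def resizable_iframe(src: str, width: int = 800, height: int = 600) -> str:
    """Return an HTML fragment: an iframe inside a user-resizable wrapper.

    Hypothetical helper for illustration only; not part of any library.
    """
    # CSS `resize` has no effect while `overflow` is `visible`,
    # so the wrapper sets `overflow: auto` alongside `resize: both`.
    wrapper_style = (
        f"resize: both; overflow: auto; "
        f"width: {width}px; height: {height}px;"
    )
    # The iframe fills the wrapper, so resizing the wrapper resizes the frame.
    iframe_style = "width: 100%; height: 100%; border: 0;"
    # Escape the URL so characters such as `&` and `"` are safe in an attribute.
    safe_src = escape(src, quote=True)
    return (
        f'<div style="{wrapper_style}">'
        f'<iframe src="{safe_src}" style="{iframe_style}"></iframe>'
        "</div>"
    )


if __name__ == "__main__":
    # Placeholder URL; substitute the page that hosts the benchmark results.
    print(resizable_iframe("https://example.com/results"))
```

The wrapper element is used because browser support for applying `resize` directly to an iframe has been inconsistent; sizing the iframe to 100% of a resizable container is a common workaround.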