LaTeX Importer

Complete LaTeX to MDX (Markdown + JSX) importer optimized for Astro with advanced support for references, interactive equations, and components.

🚀 Quick Start

# Complete LaTeX → MDX conversion with all features
node index.mjs

# For step-by-step debugging
node latex-converter.mjs    # LaTeX → Markdown
node mdx-converter.mjs      # Markdown → MDX

📁 Structure

latex-importer/
├── index.mjs                   # Complete LaTeX → MDX pipeline
├── latex-converter.mjs         # LaTeX → Markdown with Pandoc
├── mdx-converter.mjs           # Markdown → MDX with Astro components
├── reference-preprocessor.mjs  # LaTeX references cleanup
├── post-processor.mjs          # Markdown post-processing
├── bib-cleaner.mjs             # Bibliography cleaner
├── filters/
│   └── equation-ids.lua        # Pandoc filter for KaTeX equations
├── input/                      # LaTeX sources
│   ├── main.tex
│   ├── main.bib
│   └── sections/
└── output/                     # Results
    ├── main.md                 # Intermediate Markdown
    └── main.mdx                # Final MDX for Astro

✨ Key Features

🎯 Smart References

  • Invisible anchors: Automatic conversion of \label{} to <span id="..." style="position: absolute;"></span>
  • Clean links: Identifier cleanup (: → -, removing prefixes sec:, fig:, eq:)
  • Cross-references: Full support for \ref{} with functional links
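The identifier cleanup and anchor format described above can be sketched as a few string helpers. This is a hypothetical reconstruction, not the code in reference-preprocessor.mjs: the names `cleanId`, `labelToAnchor`, and `cleanReferences` are illustrative, and it assumes only `\label{}` and `\ref{}` are rewritten (commands such as `\eqref` or `\autoref` are left alone). `cleanReferences` keeps the LaTeX commands in place so Pandoc can still resolve them as cross-references.

```javascript
// Prefixes stripped from label names, as listed in this README.
const PREFIXES = ["sec:", "fig:", "eq:"];

// Hypothetical helper: "sec:intro" -> "intro", "fig:model:arch" -> "model-arch".
function cleanId(label) {
  let id = label.trim();
  for (const prefix of PREFIXES) {
    if (id.startsWith(prefix)) {
      id = id.slice(prefix.length);
      break; // strip at most one known prefix
    }
  }
  return id.replace(/:/g, "-"); // any remaining colons become hyphens
}

// Hypothetical helper: build the invisible anchor emitted for a \label{}.
function labelToAnchor(label) {
  return `<span id="${cleanId(label)}" style="position: absolute;"></span>`;
}

// Hypothetical helper: rewrite the arguments of \label{...} and \ref{...}
// to cleaned ids, leaving the commands themselves for Pandoc.
function cleanReferences(tex) {
  return tex.replace(/\\(label|ref)\{([^}]+)\}/g, (_, cmd, label) => `\\${cmd}{${cleanId(label)}}`);
}
```

For example, `cleanReferences("\\label{sec:intro} See \\ref{fig:plot}.")` yields `\label{intro} See \ref{plot}.`, so the anchor id and every link target agree. Labels without a known prefix (e.g. `tab:results`) keep their text and only have colons hyphenated.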

🧮 Interactive Equations

  • KaTeX IDs: Conversion of \label{eq:...} to \htmlId{id}{equation}
  • Equation references: Clickable links to mathematical equations
  • KaTeX trust mode: \htmlId{} only renders when KaTeX runs with trust: true (see the Astro configuration below)
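The equation rewrite is done by a Pandoc Lua filter (filters/equation-ids.lua) in this project; the JavaScript below only mirrors the idea on a single equation string. It is a sketch under stated assumptions: `equationWithId` and `equationLink` are hypothetical names, only the first `\label{}` in an equation is used, and the id follows the same `eq:`-stripping and `:` → `-` rule as other labels.

```javascript
// Hypothetical helper: "eq:main:loss" -> "main-loss".
function eqId(label) {
  return label.replace(/^eq:/, "").replace(/:/g, "-");
}

// Wrap a labelled equation body in KaTeX's \htmlId{id}{...}.
function equationWithId(tex) {
  const match = tex.match(/\\label\{([^}]*)\}/);
  if (!match) return tex.trim(); // unlabelled equations pass through unchanged
  const body = tex.replace(match[0], "").trim(); // drop the \label{} itself
  return `\\htmlId{${eqId(match[1])}}{${body}}`;
}

// Markdown link that jumps to the equation's id.
function equationLink(label, text = "equation") {
  return `[${text}](#${eqId(label)})`;
}
```

With these helpers, `E = mc^2 \label{eq:equation-name}` becomes `\htmlId{equation-name}{E = mc^2}`, and `equationLink("eq:equation-name")` gives `[equation](#equation-name)`, matching the conversion result shown further down.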

🎨 Automatic Styling

  • Highlights: \highlight{text} → <span class="highlight">text</span>
  • Auto cleanup: Removal of numbering (1), (2), etc.
  • Astro components: Images → Figure with automatic imports
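The `\highlight{text}` rule needs brace matching, because the highlighted text can itself contain braces (e.g. `\highlight{a \textbf{b}}`), which a simple `[^}]*` regex would cut short. A minimal sketch, assuming a hypothetical `convertHighlights` function that does not handle escaped braces (`\{`, `\}`) or nested `\highlight` calls and leaves an unbalanced command untouched:

```javascript
// Hypothetical helper: \highlight{...} -> <span class="highlight">...</span>,
// tracking brace depth so nested groups stay inside the span.
function convertHighlights(tex) {
  const cmd = "\\highlight{";
  let out = "";
  let i = 0; // start of the not-yet-copied input
  for (;;) {
    const start = tex.indexOf(cmd, i);
    if (start === -1) break;
    let depth = 1;
    let j = start + cmd.length;
    while (j < tex.length && depth > 0) {
      if (tex[j] === "{") depth++;
      else if (tex[j] === "}") depth--;
      j++;
    }
    if (depth !== 0) break; // unbalanced braces: copy the rest verbatim
    const inner = tex.slice(start + cmd.length, j - 1);
    out += tex.slice(i, start) + `<span class="highlight">${inner}</span>`;
    i = j;
  }
  return out + tex.slice(i);
}
```

For instance, `convertHighlights("a \\highlight{b {c} d} e")` returns `a <span class="highlight">b {c} d</span> e`.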

🔧 Robust Pipeline

  • LaTeX preprocessor: Reference cleanup before Pandoc
  • Lua filter: Equation processing in Pandoc AST
  • Post-processor: Markdown cleanup and optimization
  • MDX converter: Final transformation with Astro components
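The Pandoc step (LaTeX → Markdown with the Lua filter) can be assembled as an argument list before spawning the process. This is a sketch: `buildPandocArgs` and its default paths are hypothetical, taken from the folder layout in the Structure section, and `mediaDir` is an assumed location. The output format string is the one named in the architecture notes; `--from`, `--to`, `--lua-filter`, `--extract-media`, and `--output` are standard Pandoc options.

```javascript
// Hypothetical helper: Pandoc arguments for the LaTeX -> Markdown stage.
function buildPandocArgs({
  input = "input/main.tex",
  output = "output/main.md",
  filter = "filters/equation-ids.lua",
  mediaDir = "output/assets", // assumed image destination
} = {}) {
  return [
    input,
    "--from=latex",
    "--to=gfm+tex_math_dollars+raw_html", // Markdown flavour named in this README
    `--lua-filter=${filter}`, // rewrites \label{eq:...} into \htmlId{...}{...}
    `--extract-media=${mediaDir}`, // copies images referenced by the source
    `--output=${output}`,
  ];
}
```

From Node, the array can be passed to `spawnSync("pandoc", buildPandocArgs(), { stdio: "inherit" })` imported from `node:child_process`. Keeping the arguments in a pure function makes the invocation easy to inspect when debugging the step-by-step converters.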

📊 Example Workflow

# 1. Prepare LaTeX sources
cp my-paper/* input/

# 2. Complete automatic conversion
node index.mjs

# 3. Generated results
ls output/
# → main.md (Intermediate Markdown)
# → main.mdx (Final MDX for Astro)
# → assets/image/ (extracted images)

📋 Conversion Result

The pipeline generates an MDX file optimized for Astro with:

---
title: "Your Article Title"
description: "Generated from LaTeX"
---

import Figure from '../components/Figure.astro';
import figure1 from '../assets/image/figure1.png';

## Section with invisible anchor
<span id="introduction" style="position: absolute;"></span>

Here is some text with <span class="highlight">highlighted words</span>.

Reference to an interactive [equation](#equation-name).

Equation with KaTeX ID:
$$
\htmlId{equation-name}{E = mc^2}
$$

<Figure src={figure1} alt="Description" />
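The image → `Figure` rewrite and the generated imports in the example above can be sketched as one string transform. This is a hypothetical reconstruction, not the real mdx-converter.mjs: `toMdx` is an illustrative name, and it assumes Pandoc emitted plain `![alt](path)` images (no titles, no `<img>` tags) that all live under `../assets/image/`, as in the example.

```javascript
// Hypothetical sketch of the Markdown -> MDX step.
function toMdx(markdown, { title, description }) {
  const imports = new Map(); // JS identifier -> asset path

  const body = markdown.replace(/!\[([^\]]*)\]\(([^)\s]+)\)/g, (_, alt, src) => {
    const file = src.split("/").pop();
    // Derive a valid JS identifier from the file name (figure1.png -> figure1).
    let name = file.replace(/\.[^.]+$/, "").replace(/[^A-Za-z0-9_$]/g, "_");
    if (/^[0-9]/.test(name)) name = `img_${name}`; // identifiers cannot start with a digit
    imports.set(name, `../assets/image/${file}`);
    const safeAlt = alt.replace(/"/g, "&quot;");
    return `<Figure src={${name}} alt="${safeAlt}" />`;
  });

  // Frontmatter: JSON.stringify yields a double-quoted string that YAML accepts.
  const header = [
    "---",
    `title: ${JSON.stringify(title)}`,
    `description: ${JSON.stringify(description)}`,
    "---",
    "",
  ];
  if (imports.size > 0) {
    header.push("import Figure from '../components/Figure.astro';");
    for (const [name, path] of imports) header.push(`import ${name} from '${path}';`);
    header.push("");
  }
  return header.join("\n") + "\n" + body;
}
```

Two images whose file names differ only in directory would collide on the same identifier here; a fuller converter would need to de-duplicate names.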

โš™๏ธ Required Astro Configuration

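To use equations with IDs, add the following to astro.config.mjs:

import { defineConfig } from 'astro/config';
import remarkMath from 'remark-math';
import rehypeKatex from 'rehype-katex';

export default defineConfig({
  markdown: {
    remarkPlugins: [remarkMath], // Parses $...$ and $$...$$ into math nodes
    rehypePlugins: [
      [rehypeKatex, { trust: true }], // ← Important: KaTeX only renders \htmlId{} when trust is enabled
    ],
  },
});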

๐Ÿ› ๏ธ Prerequisites

  • Node.js with ESM support
  • Pandoc (brew install pandoc)
  • Astro to use the generated MDX

🎯 Technical Architecture

4-Stage Pipeline

  1. LaTeX Preprocessing (reference-preprocessor.mjs)

    • Cleanup of \label{} and \ref{}
    • Conversion \highlight{} → CSS spans
    • Removal of prefixes and problematic characters
  2. Pandoc + Lua Filter (equation-ids.lua)

    • LaTeX โ†’ Markdown conversion with gfm+tex_math_dollars+raw_html
    • Equation processing: \label{eq:name} → \htmlId{name}{equation}
    • Automatic image extraction
  3. Markdown Post-processing (post-processor.mjs)

    • KaTeX, Unicode, grouping commands cleanup
    • Correction of attributes containing colons (:)
    • Code snippet injection
  4. MDX Conversion (mdx-converter.mjs)

    • Image transformation → Figure components
    • Correction of escaped HTML spans
    • Automatic imports generation
    • MDX frontmatter
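Stage 3's attribute correction can be sketched as an ordered list of regex rules. The README does not list the post-processor's actual rules, so the ones below are assumptions: hyphenate colons in `id` attributes and in `#fragment` links (the same `:` → `-` identifier rule used for anchors), leave external URLs alone, and normalize non-breaking spaces. `RULES` and `postProcess` are hypothetical names.

```javascript
// Hypothetical rule list; not the actual rules in post-processor.mjs.
const hyphenate = (value) => value.replace(/:/g, "-");

const RULES = [
  // id="sec:intro" -> id="sec-intro"
  { pattern: /(\s)id="([^"]*)"/g, replace: (_, sp, v) => `${sp}id="${hyphenate(v)}"` },
  // href="#sec:intro" -> href="#sec-intro" (only fragment links, so URLs keep their colons)
  { pattern: /href="#([^"]*)"/g, replace: (_, v) => `href="#${hyphenate(v)}"` },
  // Markdown fragment links: ](#sec:intro) -> ](#sec-intro)
  { pattern: /\]\(#([^)\s]*)\)/g, replace: (_, v) => `](#${hyphenate(v)})` },
  // Non-breaking spaces (U+00A0) -> ordinary spaces
  { pattern: /\u00a0/g, replace: " " },
];

// Apply every rule, in order, to the Markdown text.
function postProcess(markdown) {
  return RULES.reduce((text, rule) => text.replace(rule.pattern, rule.replace), markdown);
}
```

Keeping the rules as data makes it straightforward to add or reorder cleanups without touching the driver function.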

📊 Conversion Statistics

Counts from converting one scientific document:

  • 87 labels detected and processed
  • 48 invisible anchors created
  • 13 highlight spans with CSS class
  • 4 equations with \htmlId{} KaTeX
  • 40 images converted to components
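Counts like these can be recomputed from the LaTeX source and the generated MDX by matching the output formats described in this README. A minimal sketch with hypothetical names (`countMatches`, `conversionStats`); the patterns assume the exact anchor, highlight, `\htmlId`, and `<Figure` forms shown in the conversion result.

```javascript
// Number of matches of a global regex in a string (0 if none).
function countMatches(text, pattern) {
  return (text.match(pattern) || []).length;
}

// Hypothetical counter over the LaTeX input and the MDX output.
function conversionStats(latexSource, mdx) {
  return {
    labels: countMatches(latexSource, /\\label\{[^}]*\}/g),
    anchors: countMatches(mdx, /<span id="[^"]*" style="position: absolute;"><\/span>/g),
    highlights: countMatches(mdx, /<span class="highlight">/g),
    equationIds: countMatches(mdx, /\\htmlId\{/g),
    figures: countMatches(mdx, /<Figure\b/g),
  };
}
```

Labels and anchors need not match one-to-one: equation labels become `\htmlId{}` ids rather than anchor spans.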

✅ Project Status

🎉 Complete Features

  • ✅ LaTeX → MDX pipeline: End-to-end conversion
  • ✅ Cross-references: Working internal links
  • ✅ Interactive equations: KaTeX support with clickable IDs
  • ✅ Automatic styling: Highlights and Astro components
  • ✅ Robustness: Automatic cleanup of escaping artifacts
  • ✅ Optimization: Clean code without unnecessary elements

🚀 Production Ready

The toolkit converts complex scientific LaTeX documents to MDX for Astro, including references, interactive equations, and styling.