tfrere HF Staff committed on
Commit
d268a8f
·
1 Parent(s): e514a7b

feat: Add Notion import support via NOTION_PAGE_ID env var

- Add ENABLE_NOTION_IMPORT flag in Dockerfile
- Add notion:import script in package.json
- Modify notion-importer to use NOTION_PAGE_ID env var directly
- Auto-extract title and generate slug from Notion page
- Add comprehensive French documentation in NOTION_IMPORT.md

Dockerfile ADDED
@@ -0,0 +1,82 @@
+ # Build stage: use the official Playwright image as the base for building the application
+ # (Node.js, browsers, and system dependencies are preinstalled)
+ FROM mcr.microsoft.com/playwright:v1.55.0-jammy AS build
+
+ # Install git, git-lfs, and wget (wget is used to download Pandoc below)
+ RUN apt-get update && apt-get install -y git git-lfs wget && apt-get clean
+
+ # Install Pandoc 3.8 from GitHub releases (always installed; only used when ENABLE_LATEX_CONVERSION=true)
+ RUN wget -qO- https://github.com/jgm/pandoc/releases/download/3.8/pandoc-3.8-linux-amd64.tar.gz | tar xzf - -C /tmp && \
+     cp /tmp/pandoc-3.8/bin/pandoc /usr/local/bin/ && \
+     cp /tmp/pandoc-3.8/bin/pandoc-lua /usr/local/bin/ && \
+     rm -rf /tmp/pandoc-3.8
+
+ # Set the working directory in the container
+ WORKDIR /app
+
+ # Copy package.json and package-lock.json
+ COPY app/package*.json ./
+
+ # Install dependencies
+ RUN npm install
+
+ # Copy the rest of the application code
+ COPY app/ .
+
+ # Conditionally convert LaTeX to MDX if ENABLE_LATEX_CONVERSION=true
+ ARG ENABLE_LATEX_CONVERSION=false
+ RUN if [ "$ENABLE_LATEX_CONVERSION" = "true" ]; then \
+         echo "🔄 LaTeX importer enabled - running latex:convert..."; \
+         npm run latex:convert; \
+     else \
+         echo "⏭️ LaTeX importer disabled - skipping..."; \
+     fi
+
+ # Conditionally import from Notion if ENABLE_NOTION_IMPORT=true
+ ARG ENABLE_NOTION_IMPORT=false
+ ARG NOTION_TOKEN
+ ARG NOTION_PAGE_ID
+ RUN if [ "$ENABLE_NOTION_IMPORT" = "true" ]; then \
+         echo "🔄 Notion importer enabled - running notion:import..."; \
+         npm run notion:import; \
+     else \
+         echo "⏭️ Notion importer disabled - skipping..."; \
+     fi
+
+ # Ensure `public/data` is a real directory with real files (not a symlink)
+ # This handles the case where `public/data` is a symlink in the repo, which
+ # would be broken inside the container after COPY.
+ RUN set -e; \
+     if [ -e public ] && [ ! -d public ]; then rm -f public; fi; \
+     mkdir -p public; \
+     if [ -L public/data ] || { [ -e public/data ] && [ ! -d public/data ]; }; then rm -f public/data; fi; \
+     mkdir -p public/data; \
+     cp -a src/content/assets/data/. public/data/
+
+ # Build the application
+ RUN npm run build
+
+ # Generate the PDF (light theme, full wait)
+ RUN npm run export:pdf -- --theme=light --wait=full
+
+ # Use an official Nginx runtime as the base image for serving the application
+ FROM nginx:alpine
+
+ # Copy the built application from the build stage
+ COPY --from=build /app/dist /usr/share/nginx/html
+
+ # Copy a custom Nginx configuration file
+ COPY nginx.conf /etc/nginx/nginx.conf
+
+ # Create necessary directories and set permissions
+ RUN mkdir -p /var/cache/nginx /var/run /var/log/nginx && \
+     chmod -R 777 /var/cache/nginx /var/run /var/log/nginx /etc/nginx/nginx.conf
+
+ # Switch to non-root user
+ USER nginx
+
+ # Expose port 8080
+ EXPOSE 8080
+
+ # Command to run the application
+ CMD ["nginx", "-g", "daemon off;"]
NOTION_IMPORT.md ADDED
@@ -0,0 +1,186 @@
+ # 📖 Notion Import Guide
+
+ This guide explains how to set up automatic import from Notion during the build of your Hugging Face Space.
+
+ ## 🎯 How it works
+
+ During the Docker build on Hugging Face Spaces, if the environment variables are configured, the script:
+ 1. Fetches your Notion page
+ 2. Automatically extracts the title and generates the slug
+ 3. Converts the content to MDX
+ 4. Builds the application with the new content
+
+ **Benefit:** Edit your article in Notion, then click "Factory Reboot" in HF Spaces → the site is updated automatically!
+
+ ## ⚙️ Configuration on Hugging Face Spaces
+
+ ### 1. Create a Notion integration
+
+ 1. Go to https://www.notion.so/my-integrations
+ 2. Click "New integration"
+ 3. Give it a name (e.g. "HF Article Importer")
+ 4. Select your workspace
+ 5. Click "Submit"
+ 6. **Copy the token** (format: `secret_xxxxx...`, or `ntn_xxxxx...` as in `env.example`)
+
+ ### 2. Share your Notion page with the integration
+
+ 1. Open your Notion page
+ 2. Click "Share" (top right)
+ 3. Click "Invite"
+ 4. Search for the name of your integration
+ 5. Select it and grant the "Can read content" permission
+ 6. Click "Invite"
+
+ ### 3. Get your Notion page ID
+
+ The ID is in the URL of your page:
+ ```
+ https://www.notion.so/Mon-Article-27877f1c9c9d804d9c82f7b3905578ff
+                                   └───────────────┬──────────────┘
+                                            This is the ID!
+ ```
+
+ Example: `27877f1c9c9d804d9c82f7b3905578ff`
+
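Review note: the ID lookup described in step 3 can also be done programmatically. The sketch below is only an illustration; `extractNotionPageId` is a hypothetical helper that does not exist in this commit, and it assumes the ID is the trailing 32 hexadecimal characters of the last URL path segment, as in the example above.

```javascript
// Hypothetical helper (not part of this commit): pull the page ID out of a Notion URL.
function extractNotionPageId(pageUrl) {
  // Using pathname ignores any query string or fragment on the URL.
  const { pathname } = new URL(pageUrl);
  const lastSegment = pathname.split('/').filter(Boolean).pop() ?? '';
  // Tolerate a dashed UUID by stripping dashes, then take the final 32 hex characters.
  const match = lastSegment.replace(/-/g, '').match(/[0-9a-f]{32}$/i);
  return match ? match[0].toLowerCase() : null;
}

console.log(extractNotionPageId('https://www.notion.so/Mon-Article-27877f1c9c9d804d9c82f7b3905578ff'));
// → 27877f1c9c9d804d9c82f7b3905578ff
```

Returning `null` instead of throwing lets a caller print a friendly hint when someone pastes a URL that carries no page ID.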
+ ### 4. Configure the environment variables on HF Spaces
+
+ 1. Go to your Space's Settings
+ 2. Open the "Repository secrets" section
+ 3. Add these 3 variables:
+
+ | Variable | Value | Secret? |
+ |----------|-------|---------|
+ | `ENABLE_NOTION_IMPORT` | `true` | No |
+ | `NOTION_TOKEN` | `secret_xxx...` | **Yes** ✅ |
+ | `NOTION_PAGE_ID` | `27877f1c...` | No |
+
+ **Important:** Check the "Secret" box for `NOTION_TOKEN` only!
+
+ ### 5. Rebuild your Space
+
+ 1. Go to the "Settings" tab
+ 2. Click "Factory reboot"
+ 3. Wait for the rebuild (~5-10 minutes)
+ 4. Your Notion article is now published! 🎉
+
+ ## 🔄 Update workflow
+
+ ```
+ ┌─────────────────────────┐
+ │  1. Edit in Notion      │
+ │     (private draft)     │
+ └───────────┬─────────────┘
+             │
+             ▼
+ ┌─────────────────────────┐
+ │  2. Check the content   │
+ │     (Notion preview)    │
+ └───────────┬─────────────┘
+             │
+             ▼
+ ┌─────────────────────────┐
+ │  3. HF Spaces →         │
+ │     "Factory Reboot"    │
+ └───────────┬─────────────┘
+             │
+             ▼
+ ┌─────────────────────────┐
+ │  4. Wait 5-10 min       │
+ │     (Docker build)      │
+ └───────────┬─────────────┘
+             │
+             ▼
+ ┌─────────────────────────┐
+ │  5. Site updated! ✅    │
+ │     (zero downtime)     │
+ └─────────────────────────┘
+ ```
+
+ ## 🧪 Local testing
+
+ Before publishing, you can test locally:
+
+ ```bash
+ # 1. Create a .env file in app/scripts/notion-importer/
+ cd app/scripts/notion-importer
+ cp env.example .env
+
+ # 2. Edit .env with your credentials
+ # NOTION_TOKEN=secret_xxx
+ # NOTION_PAGE_ID=abc123
+
+ # 3. Install dependencies
+ npm install
+
+ # 4. Run the import
+ node index.mjs
+
+ # 5. The content is copied to app/src/content/article.mdx
+ #    Images go to app/src/content/assets/image/
+
+ # 6. Start the Astro dev server
+ cd ../.. # Back to app/
+ npm run dev
+
+ # 7. Open http://localhost:4321
+ ```
+
+ ## 📋 Supported features
+
+ ### ✅ Supported automatically
+ - Formatted text (bold, italic, inline code)
+ - Headings (h1, h2, h3, etc.)
+ - Lists (ordered, unordered)
+ - Images (downloaded and converted)
+ - External links
+ - Code blocks with syntax highlighting
+ - Callouts → `Note` component
+ - Tables → Styled component
+ - Quotes
+ - LaTeX equations (inline and block)
+
+ ### ⚠️ Manual conversion required
+ - Notion databases → Create in MDX
+ - Toggles → Use `Accordion`
+ - Complex embeds → Use `HtmlEmbed`
+ - Charts → Use `Trackio` or d3.js
+
+ ## 🔧 Disabling the Notion import
+
+ To go back to editing the MDX manually:
+
+ 1. HF Spaces → Settings → Repository secrets
+ 2. Set `ENABLE_NOTION_IMPORT` to `false`
+ 3. Or delete the environment variables
+
+ The site will keep working with the last imported content, provided the generated MDX was committed to the repository.
+
+ ## 🆘 Troubleshooting
+
+ ### Error "❌ NOTION_TOKEN not found"
+ → Check that you created the `NOTION_TOKEN` variable in the HF secrets
+
+ ### Error "❌ Could not find Notion page"
+ → Check that you shared the page with your Notion integration
+
+ ### The import does not run during the build
+ → Check that `ENABLE_NOTION_IMPORT=true` (without quotes)
+
+ ### The build fails during the import
+ → Check the build logs in HF Spaces to see the exact error
+
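Review note: most of the troubleshooting cases above come down to a missing or empty variable. A small preflight check could report them all at once before the import starts. This is a minimal sketch under assumptions: `findMissingNotionEnv` is a hypothetical function that is not in this commit, and the exact-match test on `'true'` mirrors the string comparison in the Dockerfile.

```javascript
// Hypothetical preflight check (not part of this commit).
// Returns the names of required variables that are unset or blank.
function findMissingNotionEnv(env) {
  // Same exact-match rule as the Dockerfile: only the literal string "true" enables the import.
  if (env.ENABLE_NOTION_IMPORT !== 'true') return [];
  return ['NOTION_TOKEN', 'NOTION_PAGE_ID'].filter(
    (name) => typeof env[name] !== 'string' || env[name].trim() === ''
  );
}

console.log(findMissingNotionEnv({ ENABLE_NOTION_IMPORT: 'true', NOTION_TOKEN: 'ntn_xxx' }));
// → [ 'NOTION_PAGE_ID' ]
```

In a real script the argument would be `process.env`, and a non-empty result would be a reason to stop the build with a clear message.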
+ ## 💡 Tips
+
+ 1. **Test locally first**: Avoid surprises in production
+ 2. **Clear structure**: Use h1, h2, h3 headings properly in Notion
+ 3. **Optimized images**: Images are downloaded and embedded
+ 4. **Git commits**: For real versioning, also commit the generated MDX files
+ 5. **Drafts**: Keep private pages for your Notion drafts
+
+ ## 📚 Further reading
+
+ - [Notion API documentation](https://developers.notion.com/)
+ - [Hugging Face Spaces documentation](https://huggingface.co/docs/hub/spaces)
+ - [Notion Importer README](./app/scripts/notion-importer/README.md)
+
app/package.json ADDED
Binary file (2.54 kB).
app/scripts/notion-importer/README.md ADDED
@@ -0,0 +1,291 @@
+ # Notion Importer
+
+ Complete Notion to MDX (Markdown + JSX) importer optimized for Astro with advanced media handling, interactive components, and seamless integration.
+
+ ## 🚀 Quick Start
+
+ ### Method 1: Using NOTION_PAGE_ID (Recommended)
+
+ ```bash
+ # Install dependencies
+ npm install
+
+ # Set up environment variables
+ cp env.example .env
+ # Edit .env with your Notion token and page ID
+
+ # Complete Notion → MDX conversion (fetches title/slug automatically)
+ NOTION_TOKEN=secret_xxx NOTION_PAGE_ID=abc123 node index.mjs
+
+ # Or use .env file
+ node index.mjs
+ ```
+
+ ### Method 2: Using pages.json (Legacy)
+
+ ```bash
+ # Install dependencies
+ npm install
+
+ # Set up environment variables
+ cp env.example .env
+ # Edit .env with your Notion token
+
+ # Configure pages in input/pages.json
+ # {
+ #   "pages": [
+ #     {
+ #       "id": "your-page-id",
+ #       "title": "Title",
+ #       "slug": "slug"
+ #     }
+ #   ]
+ # }
+
+ # Complete Notion → MDX conversion
+ node index.mjs
+
+ # For step-by-step debugging
+ node notion-converter.mjs # Notion → Markdown
+ node mdx-converter.mjs # Markdown → MDX
+ ```
+
+ ## 📁 Structure
+
+ ```
+ notion-importer/
+ ├── index.mjs              # Complete Notion → MDX pipeline
+ ├── notion-converter.mjs   # Notion → Markdown with notion-to-md v4
+ ├── mdx-converter.mjs      # Markdown → MDX with Astro components
+ ├── post-processor.mjs     # Markdown post-processing
+ ├── package.json           # Dependencies and scripts
+ ├── env.example            # Environment variables template
+ ├── input/                 # Configuration
+ │   └── pages.json         # Notion pages to convert
+ └── output/                # Results
+     ├── *.md               # Intermediate Markdown
+     ├── *.mdx              # Final MDX for Astro
+     └── media/             # Downloaded media files
+ ```
+
+ ## ✨ Key Features
+
+ ### 🎯 **Advanced Media Handling**
+ - **Local download**: Automatic download of all Notion media (images, files, PDFs)
+ - **Path transformation**: Smart path conversion for web accessibility
+ - **Figure components**: Automatic conversion to Astro `Figure` components with zoom/download
+ - **Media organization**: Structured media storage by page ID
+
+ ### 🧮 **Interactive Components**
+ - **Callouts → Notes**: Notion callouts converted to Astro `Note` components
+ - **Enhanced tables**: Tables wrapped in styled containers
+ - **Code blocks**: Enhanced with copy functionality
+ - **Automatic imports**: Smart component and image import generation
+
+ ### 🎨 **Smart Formatting**
+ - **Link fixing**: Notion internal links converted to relative links
+ - **Artifact cleanup**: Removal of Notion-specific formatting artifacts
+ - **Frontmatter generation**: Automatic YAML frontmatter from Notion properties
+ - **Astro compatibility**: Full compatibility with Astro MDX processing
+
+ ### 🔧 **Robust Pipeline**
+ - **Notion preprocessing**: Advanced page configuration and media strategy
+ - **Post-processing**: Markdown cleanup and optimization
+ - **MDX conversion**: Final transformation with Astro components
+ - **Auto-copy**: Automatic copying to Astro content directory
+
+ ## 📊 Example Workflow
+
+ ```bash
+ # 1. Configure your Notion pages
+ # Edit input/pages.json with your page IDs
+
+ # 2. Complete automatic conversion
+ NOTION_TOKEN=your_token node index.mjs --clean
+
+ # 3. Generated results
+ ls output/
+ # → getting-started.md (Intermediate Markdown)
+ # → getting-started.mdx (Final MDX for Astro)
+ # → media/ (downloaded images and files)
+ ```
+
+ ### 📋 Conversion Result
+
+ The pipeline generates Astro-ready MDX files such as:
+
+ ```mdx
+ ---
+ title: "Getting Started with Notion"
+ published: "2024-01-15"
+ tableOfContentsAutoCollapse: true
+ ---
+
+ import Figure from '../components/Figure.astro';
+ import Note from '../components/Note.astro';
+ import gettingStartedImage from './media/getting-started/image1.png';
+
+ ## Introduction
+
+ Here is some content with a callout:
+
+ <Note type="info" title="Important">
+ This is a converted Notion callout.
+ </Note>
+
+ And an image:
+
+ <Figure
+   src={gettingStartedImage}
+   alt="Getting started screenshot"
+   zoomable
+   downloadable
+   layout="fixed"
+ />
+ ```
+
+ ## ⚙️ Required Astro Configuration
+
+ To use the generated MDX files, ensure your Astro project has the required components:
+
+ ```astro
+ ---
+ // src/components/Figure.astro
+ export interface Props {
+   src: any;
+   alt?: string;
+   caption?: string;
+   zoomable?: boolean;
+   downloadable?: boolean;
+   layout?: string;
+   id?: string;
+ }
+
+ const { src, alt, caption, zoomable, downloadable, layout, id } = Astro.props;
+ ---
+
+ <figure {id} class="figure">
+   <img src={src} alt={alt} />
+   {caption && <figcaption>{caption}</figcaption>}
+ </figure>
+ ```
+
+ ## 🛠️ Prerequisites
+
+ - **Node.js** with ESM support
+ - **Notion Integration**: Set up an integration in your Notion workspace
+ - **Notion Token**: Copy the "Internal Integration Token"
+ - **Shared Pages**: Share the specific Notion page(s) with your integration
+ - **Astro** to use the generated MDX
+
+ ## 🎯 Technical Architecture
+
+ ### 4-Stage Pipeline
+
+ 1. **Notion Preprocessing** (`notion-converter.mjs`)
+    - Configuration loading from `pages.json`
+    - Notion API client initialization
+    - Media download strategy configuration
+
+ 2. **Notion-to-Markdown** (notion-to-md v4)
+    - Page conversion with `NotionConverter`
+    - Media downloading with `downloadMediaTo()`
+    - File export with `DefaultExporter`
+
+ 3. **Markdown Post-processing** (`post-processor.mjs`)
+    - Notion artifact cleanup
+    - Link fixing and optimization
+    - Table and code block enhancement
+
+ 4. **MDX Conversion** (`mdx-converter.mjs`)
+    - Component transformation (Figure, Note)
+    - Automatic import generation
+    - Frontmatter enhancement
+    - Astro compatibility optimization
+
+ ## 📊 Configuration Options
+
+ ### Pages Configuration (`input/pages.json`)
+
+ ```json
+ {
+   "pages": [
+     {
+       "id": "your-notion-page-id",
+       "title": "Page Title",
+       "slug": "page-slug"
+     }
+   ]
+ }
+ ```
+
+ ### Environment Variables
+
+ Copy `env.example` to `.env` and configure:
+
+ ```bash
+ cp env.example .env
+ # Edit .env with your actual Notion token (and NOTION_PAGE_ID for Method 1)
+ ```
+
+ Required variables:
+ ```bash
+ NOTION_TOKEN=secret_your_notion_integration_token_here
+ ```
+
+ ### Command Line Options
+
+ ```bash
+ # Full workflow
+ node index.mjs --clean --token=your_token
+
+ # Notion to Markdown only
+ node index.mjs --notion-only
+
+ # Markdown to MDX only
+ node index.mjs --mdx-only
+
+ # Custom paths
+ node index.mjs --input=my-pages.json --output=converted/
+ ```
+
+ ## 📊 Conversion Statistics
+
+ For a typical Notion page:
+ - **Media files** automatically downloaded and organized
+ - **Callouts** converted to interactive Note components
+ - **Images** transformed to Figure components with zoom/download
+ - **Tables** enhanced with proper styling containers
+ - **Code blocks** enhanced with copy functionality
+ - **Links** fixed for proper internal navigation
+
+ ## ✅ Project Status
+
+ ### 🎉 **Complete Features**
+ - ✅ **Notion → MDX Pipeline**: Full end-to-end functional conversion
+ - ✅ **Media Management**: Automatic download and path transformation
+ - ✅ **Component Integration**: Seamless Astro component integration
+ - ✅ **Smart Formatting**: Intelligent cleanup and optimization
+ - ✅ **Robustness**: Error handling and graceful degradation
+ - ✅ **Flexibility**: Modular pipeline with step-by-step options
+
+ ### 🚀 **Production Ready**
+ The toolkit is now **100% operational** for converting Notion pages to MDX/Astro with all advanced features (media handling, component integration, smart formatting).
+
+ ## 🔗 Integration with notion-to-md v4
+
+ This toolkit leverages the powerful [notion-to-md v4](https://notionconvert.com/docs/v4/guides/) library with:
+
+ - **Advanced Media Strategies**: Download, upload, and direct media handling
+ - **Custom Renderers**: Block transformers and annotation transformers
+ - **Exporter Plugins**: File, buffer, and stdout output options
+ - **Database Support**: Full database property and frontmatter transformation
+ - **Page References**: Smart internal link handling
+
+ ## 📚 Additional Resources
+
+ - [notion-to-md v4 Documentation](https://notionconvert.com/docs/v4/guides/)
+ - [Notion API Documentation](https://developers.notion.com/)
+ - [Astro MDX Documentation](https://docs.astro.build/en/guides/integrations-guide/mdx/)
+ - [Media Handling Strategies](https://notionconvert.com/blog/mastering-media-handling-in-notion-to-md-v4-download-upload-and-direct-strategies/)
+ - [Frontmatter Transformation](https://notionconvert.com/blog/how-to-convert-notion-properties-to-frontmatter-with-notion-to-md-v4/)
app/scripts/notion-importer/env.example ADDED
@@ -0,0 +1,2 @@
+ NOTION_TOKEN=ntn_xxx
+ NOTION_PAGE_ID=xxx
app/scripts/notion-importer/index.mjs ADDED
@@ -0,0 +1,322 @@
+ #!/usr/bin/env node
+
+ import { config } from 'dotenv';
+ import { join, dirname, basename } from 'path';
+ import { fileURLToPath } from 'url';
+ import { copyFileSync, existsSync, mkdirSync, readFileSync, writeFileSync, readdirSync, statSync } from 'fs';
+ import { convertNotionToMarkdown } from './notion-converter.mjs';
+ import { convertToMdx } from './mdx-converter.mjs';
+ import { Client } from '@notionhq/client';
+
+ // Load environment variables from .env file
+ config();
+
+ const __filename = fileURLToPath(import.meta.url);
+ const __dirname = dirname(__filename);
+
+ // Default configuration
+ const DEFAULT_INPUT = join(__dirname, 'input', 'pages.json');
+ const DEFAULT_OUTPUT = join(__dirname, 'output');
+ const ASTRO_CONTENT_PATH = join(__dirname, '..', '..', 'src', 'content', 'article.mdx');
+ const ASTRO_ASSETS_PATH = join(__dirname, '..', '..', 'src', 'content', 'assets', 'image');
+ const ASTRO_BIB_PATH = join(__dirname, '..', '..', 'src', 'content', 'bibliography.bib');
+
+ function parseArgs() {
+   const args = process.argv.slice(2);
+   const config = {
+     input: DEFAULT_INPUT,
+     output: DEFAULT_OUTPUT,
+     clean: false,
+     notionOnly: false,
+     mdxOnly: false,
+     token: process.env.NOTION_TOKEN,
+     pageId: process.env.NOTION_PAGE_ID
+   };
+
+   for (const arg of args) {
+     if (arg.startsWith('--input=')) {
+       config.input = arg.slice(arg.indexOf('=') + 1); // slice keeps any '=' inside the value
+     } else if (arg.startsWith('--output=')) {
+       config.output = arg.slice(arg.indexOf('=') + 1);
+     } else if (arg.startsWith('--token=')) {
+       config.token = arg.slice(arg.indexOf('=') + 1);
+     } else if (arg.startsWith('--page-id=')) {
+       config.pageId = arg.slice(arg.indexOf('=') + 1);
+     } else if (arg === '--clean') {
+       config.clean = true;
+     } else if (arg === '--notion-only') {
+       config.notionOnly = true;
+     } else if (arg === '--mdx-only') {
+       config.mdxOnly = true;
+     }
+   }
+
+   return config;
+ }
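Review note: `parseArgs` accepts `--page-id=`, although the help text does not list it, and every `--key=value` branch repeats the same splitting logic. One possible way to factor that out is sketched below. `parseFlag` is a hypothetical helper, not something this commit defines, and it splits on the first `=` only so that values containing `=` survive intact.

```javascript
// Hypothetical helper (not part of this commit): split one CLI argument into name and value.
function parseFlag(arg) {
  if (!arg.startsWith('--')) return null; // not a long option
  const eq = arg.indexOf('=');
  // Bare switches such as --clean carry no value; report them as true.
  if (eq === -1) return { name: arg.slice(2), value: true };
  // Split on the first '=' only, so '=' characters inside the value are preserved.
  return { name: arg.slice(2, eq), value: arg.slice(eq + 1) };
}

console.log(parseFlag('--token=abc=def'));
// → { name: 'token', value: 'abc=def' }
```

A caller could then map `name` onto the config object with a small lookup table instead of a chain of `startsWith` checks.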
+
+ function showHelp() {
+   console.log(`
+ 🚀 Notion to MDX Toolkit
+
+ Usage:
+   node index.mjs [options]
+
+ Options:
+   --input=PATH     Input pages configuration file (default: input/pages.json)
+   --output=PATH    Output directory (default: output/)
+   --token=TOKEN    Notion API token (or set NOTION_TOKEN env var)
+   --clean          Clean output directory before processing
+   --notion-only    Only convert Notion to Markdown (skip MDX conversion)
+   --mdx-only       Only convert existing Markdown to MDX
+   --help, -h       Show this help
+
+ Environment Variables:
+   NOTION_TOKEN     Your Notion integration token
+
+ Examples:
+   # Full conversion workflow
+   NOTION_TOKEN=your_token node index.mjs --clean
+
+   # Only convert Notion pages to Markdown
+   node index.mjs --notion-only --token=your_token
+
+   # Only convert existing Markdown to MDX
+   node index.mjs --mdx-only
+
+   # Custom paths
+   node index.mjs --input=my-pages.json --output=converted/ --token=your_token
+
+ Configuration File Format (pages.json):
+   {
+     "pages": [
+       {
+         "id": "your-notion-page-id",
+         "title": "Page Title",
+         "slug": "page-slug"
+       }
+     ]
+   }
+
+ Workflow:
+   1. Notion → Markdown (with media download)
+   2. Markdown → MDX (with Astro components)
+   3. Copy to Astro content directory
+ `);
+ }
+
+ function ensureDirectory(dir) {
+   if (!existsSync(dir)) {
+     mkdirSync(dir, { recursive: true });
+   }
+ }
+
+ async function cleanDirectory(dir) {
+   if (existsSync(dir)) {
+     const { execSync } = await import('child_process');
+     execSync(`rm -rf "${dir}"/*`, { stdio: 'inherit' }); // the shell glob skips dotfiles such as .temp-pages.json
+   }
+ }
+
+ function readPagesConfig(inputFile) {
+   try {
+     const content = readFileSync(inputFile, 'utf8');
+     return JSON.parse(content);
+   } catch (error) {
+     console.error(`❌ Error reading pages config: ${error.message}`);
+     return { pages: [] };
+   }
+ }
+
+ /**
+  * Create a temporary pages.json from the NOTION_PAGE_ID environment variable.
+  * Extracts the title and generates a slug from the Notion page.
+  */
+ async function createPagesConfigFromEnv(pageId, token, outputPath) {
+   try {
+     console.log('🔍 Fetching page info from Notion API...');
+     const notion = new Client({ auth: token });
+     const page = await notion.pages.retrieve({ page_id: pageId });
+
+     // Extract title (falls back to 'Article' when no title property is found)
+     let title = 'Article';
+     if (page.properties.title && page.properties.title.title && page.properties.title.title.length > 0) {
+       title = page.properties.title.title[0].plain_text;
+     } else if (page.properties.Name && page.properties.Name.title && page.properties.Name.title.length > 0) {
+       title = page.properties.Name.title[0].plain_text;
+     }
+
+     // Generate slug from title
+     const slug = title
+       .toLowerCase()
+       .replace(/[^\w\s-]/g, '')
+       .replace(/\s+/g, '-')
+       .replace(/-+/g, '-')
+       .replace(/^-+|-+$/g, ''); // strip leading/trailing hyphens (trim() only removes whitespace)
+
+     console.log(` ✅ Found page: "${title}" (slug: ${slug})`);
+
+     // Create pages config
+     const pagesConfig = {
+       pages: [{
+         id: pageId,
+         title: title,
+         slug: slug
+       }]
+     };
+
+     // Write to temporary file
+     writeFileSync(outputPath, JSON.stringify(pagesConfig, null, 4));
+     console.log(` ✅ Created temporary pages config`);
+
+     return pagesConfig;
+   } catch (error) {
+     console.error(`❌ Error fetching page from Notion: ${error.message}`);
+     throw error;
+   }
+ }
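Review note: the title lookup above only checks properties named `title` or `Name` and reads the first rich-text fragment, and the slug regex drops accented letters entirely. A possible alternative is sketched below, assuming the page object exposes a `type` field on each property, as Notion page objects do. Both `extractTitle` and `slugify` are hypothetical names that are not part of this commit.

```javascript
// Hypothetical helpers (not part of this commit).

// Find the title property by its type rather than its name, and join every
// rich-text fragment so titles with mixed formatting are not truncated.
function extractTitle(page, fallback = 'Article') {
  const titleProp = Object.values(page.properties ?? {}).find((prop) => prop.type === 'title');
  const text = (titleProp?.title ?? []).map((part) => part.plain_text).join('').trim();
  return text || fallback;
}

// Build a URL-friendly slug: fold accents to ASCII, keep letters, digits,
// spaces and hyphens, then collapse separators and trim stray hyphens.
function slugify(title) {
  return title
    .normalize('NFKD')
    .replace(/[\u0300-\u036f]/g, '')
    .toLowerCase()
    .replace(/[^a-z0-9\s-]/g, '')
    .trim()
    .replace(/[\s-]+/g, '-')
    .replace(/^-+|-+$/g, '');
}

const page = { properties: { Name: { type: 'title', title: [{ plain_text: 'Mon ' }, { plain_text: 'Article: Été 2024!' }] } } };
console.log(slugify(extractTitle(page)));
// → mon-article-ete-2024
```

Folding accents before filtering matters for French titles: with the `\w`-based regex, "Été" would lose both accented letters instead of becoming "ete".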
+
+ function copyToAstroContent(outputDir) {
+   console.log('📋 Copying MDX files to Astro content directory...');
+
+   try {
+     // Ensure Astro directories exist
+     mkdirSync(dirname(ASTRO_CONTENT_PATH), { recursive: true });
+     mkdirSync(ASTRO_ASSETS_PATH, { recursive: true });
+
+     // Copy MDX file
+     const files = readdirSync(outputDir);
+     const mdxFiles = files.filter(file => file.endsWith('.mdx'));
+     if (mdxFiles.length > 0) {
+       const mdxFile = join(outputDir, mdxFiles[0]); // Take the first MDX file
+       copyFileSync(mdxFile, ASTRO_CONTENT_PATH);
+       console.log(` ✅ Copied MDX to ${ASTRO_CONTENT_PATH}`);
+     }
+
+     // Copy images
+     const mediaDir = join(outputDir, 'media');
+     if (existsSync(mediaDir)) {
+       const imageExtensions = ['.png', '.jpg', '.jpeg', '.gif', '.svg'];
+       let imageCount = 0;
+
+       function copyImagesRecursively(dir) {
+         const files = readdirSync(dir);
+         for (const file of files) {
+           const filePath = join(dir, file);
+           const stat = statSync(filePath);
+
+           if (stat.isDirectory()) {
+             copyImagesRecursively(filePath);
+           } else if (imageExtensions.some(ext => file.toLowerCase().endsWith(ext))) {
+             const filename = basename(filePath);
+             const destPath = join(ASTRO_ASSETS_PATH, filename);
+             copyFileSync(filePath, destPath);
+             imageCount++;
+           }
+         }
+       }
+
+       copyImagesRecursively(mediaDir);
+       console.log(` ✅ Copied ${imageCount} image(s) to ${ASTRO_ASSETS_PATH}`);
+
+       // Update image paths in MDX file
+       const mdxContent = readFileSync(ASTRO_CONTENT_PATH, 'utf8');
+       let updatedContent = mdxContent.replace(/\.\/media\//g, './assets/image/');
+       // Remove the subdirectory from image paths since we copy images directly to assets/image/
+       updatedContent = updatedContent.replace(/\.\/assets\/image\/[^\/]+\//g, './assets/image/');
+       writeFileSync(ASTRO_CONTENT_PATH, updatedContent);
+       console.log(` ✅ Updated image paths in MDX file`);
+     }
+
+     // Create an empty bibliography.bib (overwrites any existing file)
+     writeFileSync(ASTRO_BIB_PATH, '');
+     console.log(` ✅ Created empty bibliography at ${ASTRO_BIB_PATH}`);
+
+   } catch (error) {
+     console.warn(` ⚠️ Failed to copy to Astro: ${error.message}`);
+   }
+ }
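Review note: the two-step path rewrite above first maps `./media/` to `./assets/image/` and then strips one subdirectory, which would also touch any pre-existing `./assets/image/<dir>/` path in the MDX. A single pass that only rewrites `./media/...` references is sketched below as one possible alternative. `rewriteMediaPaths` is a hypothetical name, and the sketch assumes paths never contain spaces, quotes, or closing parentheses.

```javascript
// Hypothetical helper (not part of this commit): flatten ./media/<subdirs>/file
// references to ./assets/image/file in one pass.
function rewriteMediaPaths(mdx) {
  // Match "./media/" plus any number of intermediate directories, but stop
  // before the file name (the last segment has no trailing slash).
  return mdx.replace(/\.\/media\/(?:[^\/\s'")]+\/)*/g, './assets/image/');
}

console.log(rewriteMediaPaths("import img from './media/getting-started/image1.png';"));
// → import img from './assets/image/image1.png';
```

Because images are copied by base name into a single directory, two pages that both contain an `image1.png` would overwrite each other; that limitation applies to the sketch and to the original code alike.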
+
+
+ async function main() {
+   const args = process.argv.slice(2);
+
+   if (args.includes('--help') || args.includes('-h')) {
+     showHelp();
+     process.exit(0);
+   }
+
+   const config = parseArgs();
+
+   console.log('🚀 Notion to MDX Toolkit');
+   console.log('========================');
+
+   try {
+     // Prepare input config file
+     let inputConfigFile = config.input;
+     let pageIdFromEnv = null;
+
+     // If NOTION_PAGE_ID is provided via env var, create temporary pages.json
+     if (config.pageId && config.token) {
+       console.log('✨ Using NOTION_PAGE_ID from environment variable');
+       const tempConfigPath = join(config.output, '.temp-pages.json');
+       ensureDirectory(config.output);
+       await createPagesConfigFromEnv(config.pageId, config.token, tempConfigPath);
+       inputConfigFile = tempConfigPath;
+       pageIdFromEnv = config.pageId;
+     } else if (!existsSync(config.input)) {
+       console.error(`❌ No NOTION_PAGE_ID environment variable and no pages.json found at: ${config.input}`);
+       console.log('💡 Either set NOTION_PAGE_ID env var or create input/pages.json');
+       process.exit(1);
+     }
+
+     if (config.clean) {
+       console.log('🧹 Cleaning output directory...');
+       await cleanDirectory(config.output);
+     }
+
+     if (config.mdxOnly) {
+       // Only convert existing Markdown to MDX
+       console.log('📝 MDX conversion only mode');
+       await convertToMdx(config.output, config.output);
+       copyToAstroContent(config.output);
+
+     } else if (config.notionOnly) {
+       // Only convert Notion to Markdown
+       console.log('📄 Notion conversion only mode');
+       await convertNotionToMarkdown(inputConfigFile, config.output, config.token);
+
+     } else {
+       // Full workflow
+       console.log('🔄 Full conversion workflow');
+
+       // Step 1: Convert Notion to Markdown
+       console.log('\n📄 Step 1: Converting Notion pages to Markdown...');
+       await convertNotionToMarkdown(inputConfigFile, config.output, config.token);
+
+       // Step 2: Convert Markdown to MDX with Notion metadata
+       console.log('\n📝 Step 2: Converting Markdown to MDX...');
+       const pagesConfig = readPagesConfig(inputConfigFile);
+       const firstPage = pagesConfig.pages && pagesConfig.pages.length > 0 ? pagesConfig.pages[0] : null;
+       const pageId = pageIdFromEnv || (firstPage ? firstPage.id : null);
+       await convertToMdx(config.output, config.output, pageId, config.token);
+
+       // Step 3: Copy to Astro content directory
+       console.log('\n📋 Step 3: Copying to Astro content directory...');
+       copyToAstroContent(config.output);
+     }
+
+     console.log('\n🎉 Conversion completed successfully!');
+
+   } catch (error) {
+     console.error('❌ Error:', error.message);
+     process.exit(1);
+   }
+ }
+
+ // Export functions for use as module
+ export { convertNotionToMarkdown, convertToMdx };
+
+ // Run CLI if called directly (compare filesystem paths; a raw file:// string breaks on spaces and on Windows)
+ if (process.argv[1] === __filename) {
+   main();
+ }